SAPIENS is a project founded by academics at Queen Mary University of London (QMUL) and the Instituto Politécnico Nacional (IPN) in Mexico City, who want to apply their research expertise to improving society and life in big cities.
SAPIENS idea: can we use some basic free level of traffic data to predict pollution levels?
We propose to investigate how traffic and pollution data can be analysed together, and how statistical modelling and machine learning can be used to learn behaviours and obtain predictions.
SAPIENS starts from the data. Our first case study is Mexico City, where we analyse pollution levels and traffic intensities.
Why it is needed
Pollution has a devastating effect on our lives if we live in big cities.
These days, an enormous amount of data is available on various aspects of city life. Some data sets come directly from citizens themselves, and some open data is freely available for use.
SAPIENS idea: can we use the basic free level of traffic data to learn about pollution?
Pollution data are obtained from the Mexico City Data Agency.
Mexico City has 27 stations throughout the city. Each station records measurements for up to nine pollutants every hour. The Mexico City Data Agency releases the cleaned pollution data three months after it is collected.
We further clean up the data, including in our analysis only complete sets of nine pollutants and focusing on daylight hours.
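The cleaning step described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the record layout, the pollutant names in `POLLUTANTS`, and the 07:00 to 19:00 daylight window are all assumptions made for the example.

```python
from datetime import datetime

# Assumed pollutant names; the text only says "up to nine pollutants".
POLLUTANTS = ("CO", "NO", "NO2", "NOX", "O3", "PM10", "PM25", "PMCO", "SO2")
# Assumed daylight window in local time: start inclusive, end exclusive.
DAYLIGHT_START, DAYLIGHT_END = 7, 19


def is_complete(record):
    """True if every one of the nine pollutants has a non-missing reading."""
    readings = record["readings"]
    return all(readings.get(name) is not None for name in POLLUTANTS)


def is_daylight(record):
    """True if the hourly measurement falls inside the assumed daylight window."""
    return DAYLIGHT_START <= record["time"].hour < DAYLIGHT_END


def clean(records):
    """Keep only complete, daylight-hour records."""
    return [r for r in records if is_complete(r) and is_daylight(r)]


# Toy records (hypothetical values): one good, one at night, one incomplete.
full = {name: 1.0 for name in POLLUTANTS}
records = [
    {"station": "A", "time": datetime(2023, 1, 1, 9), "readings": dict(full)},
    {"station": "A", "time": datetime(2023, 1, 1, 23), "readings": dict(full)},
    {"station": "B", "time": datetime(2023, 1, 1, 10),
     "readings": {**full, "O3": None}},
]
kept = clean(records)
```

Only the first toy record survives both filters: the second falls outside the assumed daylight window and the third is missing one pollutant.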
We use basic Google Maps images, currently focusing on traffic levels within a 10 km² area around each sensor. Additionally, we have other sources of traffic data, such as HERE and TomTom.
In general, traffic information is collected live every hour to determine traffic levels (green, orange, red, dark red). The mapping can be optimised based on how we plan to use the input data and the model we choose to employ.
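One possible way to turn a map image into the four traffic levels is to assign each pixel to the nearest reference colour. The sketch below assumes this nearest-colour approach; the RGB values in `REFERENCE` and the tolerance `MAX_DISTANCE_SQ` are hypothetical placeholders, not the colours used by any particular map provider.

```python
# Hypothetical reference colours for the four traffic levels.
REFERENCE = {
    "green": (99, 214, 104),
    "orange": (255, 151, 77),
    "red": (242, 60, 50),
    "dark_red": (129, 31, 31),
}
# Assumed tolerance: pixels farther than this from every reference colour
# are treated as background (roads without data, buildings, labels).
MAX_DISTANCE_SQ = 40 ** 2


def classify_pixel(rgb):
    """Return the closest traffic level, or None for a non-traffic pixel."""
    best_level, best_dist = None, MAX_DISTANCE_SQ
    for level, ref in REFERENCE.items():
        dist = sum((a - b) ** 2 for a, b in zip(rgb, ref))
        if dist <= best_dist:
            best_level, best_dist = level, dist
    return best_level


def traffic_fractions(pixels):
    """Fraction of traffic pixels at each level (all zeros if none found)."""
    counts = {level: 0 for level in REFERENCE}
    for rgb in pixels:
        level = classify_pixel(rgb)
        if level is not None:
            counts[level] += 1
    total = sum(counts.values())
    return {lvl: (n / total if total else 0.0) for lvl, n in counts.items()}


# Toy image: two greenish pixels, one reddish pixel, one white background pixel.
pixels = [(100, 215, 105), (240, 60, 50), (255, 255, 255), (99, 214, 104)]
fractions = traffic_fractions(pixels)
```

Changing `REFERENCE` or the tolerance is one way the mapping could be tuned to the input data and the chosen model.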
We aim to obtain hyperlocal pollution predictions from live traffic. This would enable cities to enhance pollution information and set local alerts even without dedicated pollution sensors or hyperlocal monitoring.
Our approach to traffic data could represent a game-changing opportunity for cities in the developing world to inform and protect their citizens.
As a baseline model, we use partial least squares regression.
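To illustrate what a partial least squares (PLS) baseline does, here is a minimal pure-Python sketch of single-response PLS fitted with the NIPALS procedure. The text does not say which PLS variant or how many components the project uses, so both are assumptions here, as are the function names `fit_pls1` and `predict_pls1`. In practice a library implementation such as scikit-learn's `PLSRegression` would be the more usual choice.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit single-response PLS by NIPALS. X is a list of rows, y a list."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weights: direction in input space with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left in y that X can explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict one response by replaying the deflation steps on a new row."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat


# Toy data (hypothetical): the response is exactly 2*x1 + 3*x2 + 1.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 * a + 3 * b + 1 for a, b in X]
model = fit_pls1(X, y, n_components=2)
prediction = predict_pls1(model, [6, 2])
```

With as many components as there are independent inputs, PLS coincides with ordinary least squares, so the toy fit recovers the linear relation. The benefit of PLS shows up with fewer components than inputs, which is the typical setting when many traffic inputs are strongly correlated.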
The four traffic levels in 15 aggregated rings are used as traffic inputs.
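A sketch of how such inputs might be assembled is shown below. It assumes the rings are equal-width concentric rings centred on the sensor and that each ring contributes one count per traffic level, giving 15 × 4 = 60 inputs; the actual ring geometry and aggregation used by the project are not specified in the text, and the helper names are hypothetical.

```python
import math

LEVELS = ("green", "orange", "red", "dark_red")
N_RINGS = 15


def ring_index(dx, dy, max_radius):
    """Index of the equal-width ring containing (dx, dy), or None if outside."""
    r = math.hypot(dx, dy)
    if r >= max_radius:
        return None
    return int(r / max_radius * N_RINGS)


def ring_features(labelled_pixels, max_radius):
    """Flatten per-ring, per-level pixel counts into one input vector.

    labelled_pixels: iterable of (dx, dy, level), with offsets measured
    from the sensor position in pixels.
    """
    counts = [[0] * len(LEVELS) for _ in range(N_RINGS)]
    for dx, dy, level in labelled_pixels:
        ring = ring_index(dx, dy, max_radius)
        if ring is not None:
            counts[ring][LEVELS.index(level)] += 1
    # Order: ring 0 (all levels), ring 1 (all levels), and so on.
    return [c for ring_counts in counts for c in ring_counts]


# Toy input: three pixels inside a 150-pixel radius and one outside it.
labelled = [(0, 0, "green"), (1, 0, "red"), (149, 0, "dark_red"),
            (200, 0, "green")]
features = ring_features(labelled, max_radius=150)
```

Each hourly snapshot around a sensor then becomes one 60-element row of the input matrix for the regression.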
Marcella Bona, Queen Mary University of London, UK
Nathan Heatley, Queen Mary University of London, UK
Jia-Chen Hua, Queen Mary University of London, UK
Adriana Lara, Instituto Politecnico Nacional, Mexico
Valeria Legaria Santiago, Instituto Politecnico Nacional, Mexico & Queen Mary University of London, UK
Fernando Moreno-Gomez, Instituto Politecnico Nacional, Mexico
Jocelyn Richardson, Queen Mary University of London, UK