Using the World Air Quality Index API to make pollution maps
I like looking at live data visualizations, like the World Air Quality Index (WAQI). It puts together thousands of measuring station data feeds and shows them on a map. However, the main visualization is not really a mapping, but rather a map with markers. Is it possible to use the same data and generate an interactive map? This post is a write-down of my attempt.
If you're wondering what the Air Quality Index is, here is some more information (from here):
Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.
If my understanding of this calculator is correct, then the AQI is just a mapping, that depends on the considered pollutant, from concentration in the air to a number between 0 and 500.
The WAQI API¶
To access the WAQI API, we need a token, which can be obtained by registering an account with them (see here).
Once we have this, we can use the API to fetch some data. Here's how it goes in Python. Most API calls start with the same base URL, so let's declare this first, as well as the loading of my personal token.
import requests
base_url = "https://api.waqi.info"
# read secret token from file containing the token as a single string
# you need to create this file if you want to reproduce this analysis
token = open('.waqitoken').read()
City request¶
We can do a basic city request with the following code (see http://aqicn.org/json-api/doc/#api-City_Feed-GetCityFeed for details) that prints the Air Quality Index (AQI) in the location that best matches the search string:
city = 'Paris'
r = requests.get(base_url + f"/feed/{city}/?token={token}")
"City: {}, Air quality index: {}".format(r.json()['data']['city']['name'], r.json()['data']['aqi'])
'City: Paris, Air quality index: 20'
city = 'New York'
r = requests.get(base_url + f"/feed/{city}/?token={token}")
"City: {}, Air quality index: {}".format(r.json()['data']['city']['name'], r.json()['data']['aqi'])
'City: New York, Air quality index: 50'
city = 'Beijing'
r = requests.get(base_url + f"/feed/{city}/?token={token}")
"City: {}, Air quality index: {}".format(r.json()['data']['city']['name'], r.json()['data']['aqi'])
'City: Beijing (北京), Air quality index: 152'
city = 'Berlin'
r = requests.get(base_url + f"/feed/{city}/?token={token}")
"City: {}, Air quality index: {}".format(r.json()['data']['city']['name'], r.json()['data']['aqi'])
'City: Berlin, Germany, Air quality index: 34'
Searching by coordinates¶
The API also allows to search by latitude / longitude coordinates (see http://aqicn.org/json-api/doc/#api-Geolocalized_Feed-GetGeolocFeed):
lat, lng = 43.6604, 7.1531
r = requests.get(base_url + f"/feed/geo:{lat};{lng}/?token={token}")
"Station: {}, Air quality index: {}".format(r.json()['data']['city']['name'], r.json()['data']['aqi'])
'Station: Nice Aeroport, PACA, France, Air quality index: 23'
Map query¶
Since our goal is to make a map, the most useful API query is the one allowing to define a latitude / longitude range box so that the API returns all measurement stations within (see http://aqicn.org/json-api/doc/#api-Map_Queries-GetMapStations). Let's try it out and print the names of the returned stations when searching within the Paris area.
# (latitude, longitude) bottom left, (latitude, longitude) top right
latlngbox = "48.639956,1.761273,49.159944,2.947797"
r = requests.get(base_url + f"/map/bounds/?latlng={latlngbox}&token={token}")
[f"Station: {d['station']['name']}, latitude: {d['lat']}, longitude: {d['lon']}, air quality: {d['aqi']}" for d in r.json()['data']]
['Station: Gennevilliers, Paris, latitude: 48.9302, longitude: 2.29421, air quality: 15', 'Station: Place De Lopera, Paris, latitude: 48.8704, longitude: 2.33241, air quality: 37', 'Station: Autoroute A1 - Saint-denis, Paris, latitude: 48.9251, longitude: 2.35654, air quality: 58', 'Station: Gonesse, Paris, latitude: 48.9908, longitude: 2.44461, air quality: 39', 'Station: La Defense, Paris, latitude: 48.8913, longitude: 2.24064, air quality: 8', 'Station: Lognes, Paris, latitude: 48.8403, longitude: 2.63463, air quality: 17', 'Station: Boulevard Peripherique Auteuil, Paris, latitude: 48.8502, longitude: 2.25258, air quality: 38', 'Station: Paris 18eme, Paris, latitude: 48.8917, longitude: 2.34563, air quality: 12', 'Station: Zone Rurale Nord - Saint-martin-du-tertre, Paris, latitude: 49.1003, longitude: 2.34378, air quality: 18', 'Station: Zone Rurale Nord-ouest - Fremainville, Paris, latitude: 49.0631, longitude: 1.8664, air quality: 25', 'Station: Route Nationale 2 - Pantin, Paris, latitude: 48.9022, longitude: 2.3907, air quality: 23', 'Station: Nogent-sur-marne, Paris, latitude: 48.8408, longitude: 2.4844, air quality: -', 'Station: Boulevard Peripherique Est, Paris, latitude: 48.8386, longitude: 2.41278, air quality: 51', 'Station: Place Victor Basch, Paris, latitude: 48.8277, longitude: 2.3267, air quality: -', 'Station: Bobigny, Paris, latitude: 48.9024, longitude: 2.45261, air quality: -', 'Station: Cergy-pontoise, Paris, latitude: 49.0459, longitude: 2.04104, air quality: 23', 'Station: Paris, latitude: 48.856614, longitude: 2.3522219, air quality: 51', 'Station: Paris Stade Lenglen, Paris, latitude: 48.8303, longitude: 2.26972, air quality: 52', 'Station: Boulevard Haussmann, Paris, latitude: 48.8733, longitude: 2.32957, air quality: 20', 'Station: Vitry-sur-seine, Paris, latitude: 48.7759, longitude: 2.37577, air quality: 25', 'Station: Tremblay-en-france, Paris, latitude: 48.9557, longitude: 2.57441, air quality: 39', 'Station: Avenue Des Champs Elysees, Paris, latitude: 48.8686, longitude: 2.31166, air quality: -']
Mapping the data¶
The above data is exactly what we need to make a map. To do that, we first turn it into a DataFrame, which is more easy to process.
import pandas as pd
def make_dataframe(r):
"""Extracts data from request r and returns a DataFrame."""
rows = []
for item in r.json()['data']:
rows.append([item['lat'], item['lon'], item['aqi'], item['station']['name']])
df = pd.DataFrame(rows, columns=['lat', 'lon', 'aqi', 'name'])
df['aqi'] = pd.to_numeric(df.aqi, errors='coerce')
return df
df = make_dataframe(r)
df.head()
lat | lon | aqi | name | |
---|---|---|---|---|
0 | 48.9302 | 2.29421 | 15.0 | Gennevilliers, Paris |
1 | 48.8704 | 2.33241 | 37.0 | Place De Lopera, Paris |
2 | 48.9251 | 2.35654 | 58.0 | Autoroute A1 - Saint-denis, Paris |
3 | 48.9908 | 2.44461 | 39.0 | Gonesse, Paris |
4 | 48.8913 | 2.24064 | 8.0 | La Defense, Paris |
Using this dataframe, we can make an interactive scatter plot using Holoviews:
import holoviews as hv
hv.extension('bokeh', logo=False)
scatter = hv.Scatter(df.dropna(), kdims='lon', vdims=['lat', 'aqi', 'name'])
scatter.opts(color='aqi', size=10, padding=.1, tools=['hover'], colorbar=True, cmap='magma', width=500, height=400, clim=(0, 60))
This is nice, but it would be nicer if could have a background map as well. Fortunately, the Holoviews team has put together a Geoviews package that allows us to do that.
import geoviews as gv
import geoviews.tile_sources as gvts
gv.extension('bokeh', 'matplotlib', logo=False)
points = gv.Points(df.dropna(), ['lon', 'lat'], ['aqi', 'name'])
points.opts(size=10, color='aqi', cmap='magma', tools=['hover'], colorbar=True, width=500, height=400, padding=.1, clim=(0, 60))
gvts.OSM * points
C:\Anaconda3\lib\site-packages\datashader\transfer_functions.py:21: FutureWarning: xarray subclass Image should explicitly define __slots__ class Image(xr.DataArray):
An Inverse Distance Weighting map¶
The previous plot is nice, since it allows us to visually inspect the data. But it doesnt cover the map. We can do better by using IDW and interpolating the existing points over a grid.
The implementation is copied on https://stackoverflow.com/questions/3104781/inverse-distance-weighted-idw-interpolation-with-python.
import numpy as np
def distance_matrix(x0, y0, x1, y1):
obs = np.vstack((x0, y0)).T
interp = np.vstack((x1, y1)).T
# Make a distance matrix between pairwise observations
# Note: from <http://stackoverflow.com/questions/1871536>
# (Yay for ufuncs!)
d0 = np.subtract.outer(obs[:,0], interp[:,0])
d1 = np.subtract.outer(obs[:,1], interp[:,1])
return np.hypot(d0, d1)
def simple_idw(x, y, z, xi, yi):
dist = distance_matrix(x,y, xi,yi)
# In IDW, weights are 1 / distance
weights = 1.0 / dist
# Make weights sum to one
weights /= weights.sum(axis=0)
# Multiply the weights for each interpolated point by all observed Z-values
zi = np.dot(weights.T, z)
return zi
To use the above functions, let's now define a grid using meshgrid and interpolate over it.
latlngbox_num = list(map(float, latlngbox.split(',')))
lats = np.linspace(latlngbox_num[0], latlngbox_num[2], num=150)
lons = np.linspace(latlngbox_num[1], latlngbox_num[3], num=151)
meshgrid_shape = (lats.size, lons.size)
xi, yi = np.meshgrid(lons, lats)
xi, yi = xi.ravel(), yi.ravel()
df = df.dropna()
x, y = df.lon.values, df.lat.values
z = df.aqi.values
zi = simple_idw(x, y, z, xi, yi)
zi = zi.reshape(meshgrid_shape)
Let's display the result and compare it to our previous scatter plot.
interpolated = hv.Image(zi[::-1, :])
interpolated.opts(colorbar=True, cmap='magma', tools=['hover'], clim=(0, 60), width=500, height=400)
(scatter + interpolated).cols(1)
Finally, let's put this into a geoviews map so that we get a nice background.
import xarray as xr
ds = xr.DataArray(zi, dims=['lat', 'lon'], coords={'lat': lats, 'lon': lons}).to_dataset(name='aqi')
aqi_ds = gv.Dataset(ds, ['lon', 'lat'], 'aqi')
gvts.OSM * gv.Image(aqi_ds).opts(alpha=0.5, width=500, height=400, colorbar=True, clim=(0, 60), cmap='magma')
Map of Europe¶
With all these tools in place, we can move on to our final project. Let's make an air quality index map of Europe.
latlngbox = "34.85,-12.7,60.845,30.498"
r = requests.get(base_url + f"/map/bounds/?latlng={latlngbox}&token={token}")
df = make_dataframe(r)
latlngbox_num = list(map(float, latlngbox.split(',')))
lats = np.linspace(latlngbox_num[0], latlngbox_num[2], num=350)
lons = np.linspace(latlngbox_num[1], latlngbox_num[3], num=351)
meshgrid_shape = (lats.size, lons.size)
xi, yi = np.meshgrid(lons, lats)
xi, yi = xi.ravel(), yi.ravel()
df = df.dropna()
x, y = df.lon.values, df.lat.values
z = df.aqi.values
zi = simple_idw(x, y, z, xi, yi)
zi = zi.reshape(meshgrid_shape)
arr = xr.DataArray(zi, dims=['lat', 'lon'], coords={'lat': lats, 'lon': lons}).to_dataset(name='aqi')
aqi_ds = gv.Dataset(arr, ['lon', 'lat'], 'aqi')
aqi_map = gvts.CartoEco * gv.Image(aqi_ds).opts(alpha=0.8, width=500, height=400, colorbar=True, cmap='magma')
aqi_map
Additionaly, we can add a contour plot on top of this.
gvts.CartoEco * aqi_ds.to(gv.FilledContours, ['lon', 'lat']).opts(alpha=0.8, width=500, height=400,
colorbar=True, cmap='magma', levels=10, color_levels=10,
tools=['hover'])
Conclusions¶
Judging from the above, it looks like the european air quality is split in two between West and East Europe, with clusters of worse air quality around industrial areas. This is a nice finishing point for this analysis, which has taken us from the nicely working WAQI API to some geographical plotting using GeoViews.
This post was entirely written using the Jupyter Notebook. Its content is BSD-licensed. You can see a static view or download this notebook with the help of nbviewer at 20191126_WorldAirQualityIndexAPI.ipynb.