Visualizations with Dash Leaflet Series Part 2 - Basics - Review of Dash, Geopandas and Dash-Leaflet
Posted by Haw-minn Lu in Visualizations
Overview
In this section, we discuss geographic data handling, give a brief overview of Dash, and introduce dash-leaflet in greater detail.
Geographic Data Handling
Where to get geographic data
There are two types of geographic data that we use in this tutorial: map data and geometries. Map data is the underlying map used by Leaflet. We go into greater detail in the section below on dash-leaflet, but the vast majority of maps are based on the OpenStreetMap project. The other category of geographic data we refer to as geometries. These describe objects as points or boundaries. For example, a city may be represented by a latitude and longitude, though technically cities have municipal boundaries and actually span an area. For the purposes of this tutorial, we will use the uscities.csv file obtained from simplemaps.com, distributed under the Creative Commons Attribution 4.0 license. They have a number of other geographic files that are free and distributed under that license.
Another good resource is the US Census Bureau, which has almost every imaginable boundary file for the US at this link. However, the Census Bureau frequently reorganizes its site, so an online search can be used should the provided link fail. The files are distributed as geographic information system (GIS) shapefiles and keyhole markup language (KML) files, all together in a single zip file. They are offered in several resolutions, depending on the detail desired. These files from the Census Bureau are not copyrighted and do not necessarily require attribution, though it is always a good idea to give credit where credit is due. For the purposes of this tutorial we will use the cb_2019_us_county_20m.zip file for county boundaries and the cb_2019_us_state_20m.zip file for state boundaries. The lowest resolution shapes are selected to reduce the memory footprint and processing required. For higher quality, use the 500k version of the files.
GeoPandas and Data Formats
We've already mentioned GIS shapefiles and KML files. Now we introduce two more formats: GeoJSON and Geobuf. We won't get into the fine details of either, but leaflet
supports the use of both. GeoJSON is a JSON file format for encoding a variety of geographic data structures. Geobuf is a compact binary file format for encoding the same kinds of structures. For our purposes here, knowing what they are is all that is needed.
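To make the GeoJSON description above a little more concrete, here is a minimal hand-written sketch of a FeatureCollection holding one point and one polygon. The feature names and coordinates are made up for illustration and are not taken from any of the data files used in this tutorial.

```python
import json

# A hand-written FeatureCollection; names and coordinates are illustrative only.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "example point"},
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [-98.0, 39.0]},
        },
        {
            "type": "Feature",
            "properties": {"name": "example box"},
            # A Polygon is a list of rings; each ring repeats its first position at the end.
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-99.0, 38.0], [-97.0, 38.0], [-97.0, 40.0],
                                 [-99.0, 40.0], [-99.0, 38.0]]],
            },
        },
    ],
}

# GeoJSON is plain JSON, so the standard json module can write and read it.
geojson_text = json.dumps(feature_collection)
round_tripped = json.loads(geojson_text)
geometry_types = [f["geometry"]["type"] for f in round_tripped["features"]]
print(geometry_types)
```

Note the coordinate order: GeoJSON stores longitude first and latitude second, which is the reverse of how coordinates are usually spoken.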
Fortunately, geopandas can read most geographic data files automatically. It can even read the zip files provided by the US Census Bureau without unzipping them. So let's start by loading the county census shapefile and examining the geopandas dataframe.
import geopandas as gpd
zipfile = "zip://cb_2019_us_county_20m.zip"
gdf = gpd.read_file(zipfile)
gdf
This dataframe looks and feels like a pandas dataframe, but you should note that there is a geometry column, which holds some strangely formatted objects. Each is in fact a shapely.geometry object. For the purposes here, no deep knowledge of shapely or geopandas is needed. Suffice it to say that a GeoDataFrame is essentially a pandas DataFrame that has a geometry column. In all other respects it behaves like a pandas DataFrame, and most methods and tools operate in the same fashion.
Turning to the other data file, uscities.csv, which does not explicitly have a geometry column, we would also like to convert it to a geopandas GeoDataFrame. Upon inspection of the csv file, there are latitude and longitude columns, lat and lng respectively. The following snippet of code converts it to a geopandas GeoDataFrame.
import geopandas as gpd
import pandas as pd
df = pd.read_csv('uscities.csv')
# points_from_xy expects x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lng, df.lat))
The last line creates a GeoDataFrame from a DataFrame by adding a geometry column, which the function points_from_xy builds by converting each pair of x (longitude) and y (latitude) values into a shapely point.
We use GeoDataFrame because it provides a powerful set of geographic and geometric tools on top of pandas. As we progress to our examples, two operations in particular, bounds and centroid, will be important. It should be noted that the dash-leaflet documentation elects to do any data manipulation in pandas and then uses the dash_leaflet.express helper function dicts_to_geojson to convert the extracted dict objects to GeoJSON. Using geopandas allows you to skip this step, as geopandas can directly generate GeoJSON-formatted data.
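To give a feel for what bounds and centroid compute, here is a minimal pure-Python sketch for a single closed ring of (x, y) pairs. The function names ring_bounds and ring_centroid are our own for illustration and are not part of geopandas; in practice you would use the bounds and centroid attributes that geopandas provides.

```python
def ring_bounds(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def ring_centroid(ring):
    """Area-weighted centroid of a simple closed ring (shoelace formula).

    The ring must repeat its first vertex at the end and enclose a non-zero area.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


# An illustrative 2 by 2 degree box, x = longitude and y = latitude.
box = [(-99.0, 38.0), (-97.0, 38.0), (-97.0, 40.0), (-99.0, 40.0), (-99.0, 38.0)]
box_bounds = ring_bounds(box)
box_centroid = ring_centroid(box)
print(box_bounds, box_centroid)
```

The bounds tell us how large a view is needed to show a shape, and the centroid gives a reasonable point on which to center it.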
Another helper function in dash_leaflet.express is geojson_to_geobuf, which can be used to encode a GeoJSON structure into a Geobuf. Geobuf has the advantage of being a much more space-efficient format.
Dash Basics
Dash is a framework for building web applications and is especially good for developing analytic applications. It is built on Flask, Plotly.js and React.js.
This example, taken from the Dash User Guide, demonstrates the basic anatomy of a Dash app.
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)

app.layout = html.Div([
    html.H6("Change the value in the text box to see callbacks in action!"),
    html.Div(["Input: ",
              dcc.Input(id='my-input', value='initial value', type='text')]),
    html.Br(),
    html.Div(id='my-output'),
])


# Runs whenever the 'value' of 'my-input' changes and rewrites 'my-output'
@app.callback(
    Output(component_id='my-output', component_property='children'),
    Input(component_id='my-input', component_property='value')
)
def update_output_div(input_value):
    return 'Output: {}'.format(input_value)


if __name__ == '__main__':
    app.run_server(debug=True)
This basic structure is all you will need for this tutorial. If you are interested in other components, we encourage you to explore the documentation in greater detail. The basic building blocks are components, and for this tutorial dash_core_components, dash_html_components and dash_leaflet will provide the necessary components. The app instantiation of Dash is the core object around which the app is built. For those who build Flask apps, this is similar to the instantiation of Flask. The layout attribute of the app object defines the HTML layout of the app. As you can see, the components from dash_html_components mirror HTML tags. Pretty much any HTML tag has a corresponding Dash component in dash_html_components. The core components contain more of the interactive components, such as input, dropdown, and graph.
The next block of code defines the interactivity. The callback
decorator is passed input and output dependencies. In this case the value
property of the my-input
component triggers the callback which updates the children
property of the my-output
component.
Finally, app.run_server
starts the webserver. Dash can be run inline in Jupyter notebooks with the use of JupyterDash
as mentioned in part 1. To use JupyterDash
, assign your app
to an instantiation of JupyterDash
rather than Dash
. Additionally, run_server
also takes a mode
argument and if set to 'inline'
will display the app inline in a Jupyter notebook.
Dash Leaflet
Dash-leaflet as mentioned in part 1 is a lightweight wrapper to leaflet.js
. It ports most react
components in leaflet
to dash. There are numerous components, but the three that will be of the most importance to us here are the Map
, TileLayer
and GeoJSON
components.
The Map component can be thought of as the parent container which houses our visualizations. The TileLayer and GeoJSON components are layers within the Map component. We saw this in our Getting Started example, where a Map component has a TileLayer as a child.
app.layout = dl.Map(dl.TileLayer(), style={'width': '1000px', 'height': '500px'})
If you layer children in a Map component, they are rendered in order, with the last child rendered on top. You will want the TileLayer to be the bottom layer, hence it should be first. The height and width styling is supplied to make the map look good when deployed inline to a Jupyter notebook. These dimensions may change to suit the aesthetic of your particular visualization.
The Map Component
The Map component, as well as the other layer components, has the standard Dash properties: id, children, style and className. It has a lot of other properties that aren't really documented but can be found in the source for Map.react.js in the components subdirectory of the source. They are too numerous to go into here, but a few useful properties include click_lat_lng, which when used as an Input to a callback provides the location on the map of a mouse click, and dbl_click_lat_lng, which reports the location of a mouse double click. Aside from the styling properties style and className, another useful property is bounds, which acts as a bounding box specifying how much of the map is to be rendered. Two other properties, center and zoom, are useful as well. The property center centers the map at a given latitude and longitude, and zoom provides the zoom level at which to display the map. For example, to display the continental United States, a setting of center=[39, -98], zoom=4 yields nice results. These properties allow us to programmatically change the view of the map presented to the user.
app.layout = dl.Map(dl.TileLayer(),
                    center=[39, -98],
                    zoom=4,
                    style={'width': '1000px', 'height': '500px'})
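Since geopandas reports bounds as (min x, min y, max x, max y) while Leaflet expresses a bounding box as two [latitude, longitude] corners, a small conversion is handy when setting the view from data. The sketch below assumes that the bounds property of the Map component follows Leaflet's [[south, west], [north, east]] convention; the helper names to_leaflet_bounds and bounds_center are our own and not part of dash-leaflet.

```python
def to_leaflet_bounds(min_x, min_y, max_x, max_y):
    """Convert (min_x, min_y, max_x, max_y) into [[south, west], [north, east]].

    x is longitude and y is latitude, so each corner swaps the order.
    """
    return [[min_y, min_x], [max_y, max_x]]


def bounds_center(leaflet_bounds):
    """Midpoint of a [[south, west], [north, east]] box as [latitude, longitude]."""
    (south, west), (north, east) = leaflet_bounds
    return [(south + north) / 2, (west + east) / 2]


# Illustrative box around the center used above; not taken from real data.
example_bounds = to_leaflet_bounds(-99.0, 38.0, -97.0, 40.0)
example_center = bounds_center(example_bounds)
print(example_bounds, example_center)
```

The simple midpoint is only a rough center and misbehaves for boxes that straddle the 180 degrees longitude line.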
The TileLayer component
The TileLayer is responsible for grabbing map tiles from a tile server. There are a number of free servers as well as commercial servers that operate on a freemium model but require incorporating an access key. OpenStreetMap is really the parent of most tile servers; even many of the commercial map servers use OpenStreetMap as their underlying dataset. For a list of map servers, you can look at OpenStreetMap's wiki article on tile servers.
The TileLayer component has the standard Dash properties mentioned above for the Map component. Two other important properties are the url property and the attribution property. The url property specifies the URL template for the tile server; by default OpenStreetMap is used, with the corresponding URL https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png. The x, y and z components indicate the tile position and zoom level, and s indicates the server. In the case of OpenStreetMap, there are 3 servers, indicated by a, b and c. Some tile servers take an s value; some don't. The attribution is an HTML string with links to properly give credit to the map server. This should be set in compliance with the terms of use of the underlying tile server; besides, it is always the right thing to give credit where credit is due.
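To illustrate how the placeholders in the url template get filled in, the sketch below computes which tile contains a given latitude and longitude using the standard Web Mercator tile numbering, and then expands the template. Leaflet does all of this for you; the helper names tile_index and tile_url are our own and exist only for illustration.

```python
import math

OSM_TEMPLATE = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"


def tile_index(lat, lon, zoom):
    """Return the (x, y) index of the tile containing a point at a zoom level."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(lat, lon, zoom, subdomain="a", template=OSM_TEMPLATE):
    """Fill the {s}, {z}, {x} and {y} placeholders of a tile URL template."""
    x, y = tile_index(lat, lon, zoom)
    return template.format(s=subdomain, z=zoom, x=x, y=y)


# The tile that contains the continental US center used earlier, at zoom 4.
center_tile_url = tile_url(39, -98, 4)
print(center_tile_url)
```

This sketch only builds the address and does not fetch anything. Any real use of a tile server should follow that server's usage policy.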
The GeoJSON component
The GeoJSON component is where most of the magic happens. In short, it is the layer best equipped for infusing data into the maps. The properties of the GeoJSON component play a large part in the customization of the map visualization and behavior described in this tutorial. As such, these properties are described in greater detail in subsequent parts of this tutorial. Probably the most important property of the GeoJSON layer is the data property, which supplies the layer with geographic data. The format property specifies whether the data is formatted as 'geojson' or 'geobuf', as introduced at the beginning of this part of the tutorial. In a nutshell, the GeoJSON component renders all point data in data as a marker (which we'll discuss later) and all polygonal data in data as an outline. Another property we'll need for our examples is zoomToBounds, which if set to True zooms the map to the bounds of the data whenever it changes. For instance, if the data is an outline of a state, it will center and zoom the map to the state.
Some Simple Examples
To close out this section of the tutorial, we'll draw a few outlines on a map using the county and state US Census shapefiles. The full example is provided in this notebook. We will make interactive maps of the states and of the counties of Alaska (why Alaska? It will be clear in a moment).
The first thing we do is process the data appropriately.
zipfile = "zip://cb_2019_us_state_20m.zip"
gdf = gpd.read_file(zipfile)
# GEOID values to exclude from the continental view
us_territories_geoids = ['78', '69', '72', '60', '66']
alaska_hawaii_geoids = ['02', '15']
excluded_geoids = us_territories_geoids + alaska_hawaii_geoids
continental_us_states = gdf[~gdf['GEOID'].isin(excluded_geoids)]
Besides reading the shape file, we also filter out US territories that are provided by the Census file. Additionally, to display the continental United States we filter out Alaska and Hawaii.
Next the GeoPandas DataFrame is converted to a GeoJSON object. The to_json method converts the DataFrame to a JSON string, which then needs to be loaded into a Python dictionary. Then we create the layout for the app.
import json

# to_json returns a string, so parse it into a dictionary for dash-leaflet
geojson = json.loads(continental_us_states.to_json())
app.layout = dl.Map(
    [
        dl.TileLayer(),
        dl.GeoJSON(data=geojson, zoomToBounds=True, zoomToBoundsOnClick=True)
    ],
    style={
        'width': '1000px',
        'height': '500px'
    })
The zoomToBounds property sets the bounds based on the supplied data, so we don't need to supply the center or zoom properties to the map. The zoomToBoundsOnClick property adds a little interactivity. If set to True, clicking on a feature (i.e. a state in our case) will zoom the map to the data associated with that feature. There are many more customizations we can apply to this type of visualization, which we will explore in greater depth in the next section.
The Anti Meridian Problem
Suppose we decide to display all 50 states. Suddenly it seems the zoom level is out of whack.
To illustrate the problem further, we produce a similar visualization based on the counties of Alaska. To show the use of the geobuf type, we convert the GeoJSON to geobuf here. If geobuf is used, the format property must be set to 'geobuf'. As for the data preprocessing, we filter the GeoDataFrame read from the Census file on the state Federal Information Processing Standards (FIPS) code for Alaska, '02'.
import dash_leaflet.express as dlx

zipfile = "zip://cb_2019_us_county_20m.zip"
gdf = gpd.read_file(zipfile)
alaska = gdf[gdf['STATEFP'] == '02']
geojson = json.loads(alaska.to_json())
geobuf = dlx.geojson_to_geobuf(geojson)
app.layout = dl.Map(
    [
        dl.TileLayer(),
        dl.GeoJSON(data=geobuf, format='geobuf', zoomToBounds=True, zoomToBoundsOnClick=True)
    ],
    style={
        'width': '1000px',
        'height': '500px'
    })
Again, we get that zoomed-out view. If you click on the counties, the zoom-in feature works as expected, until you click on the Aleutian Islands in the lower left. If you click on that, you get the world view again. So what is wrong here? Both for this county, Aleutians West, and for the outline of Alaska in our state map, there are some polygons on the opposite side of the Anti Meridian (the 180 degrees longitude line). If you examine the longitudes of Aleutians West, some are in the -170s and some are in the 170s. The other clue is that if you look at the extreme right, you will see the western half of the county highlighted in that part of the map.
There are two solutions. One is to simply drop the polygons on the wrong side of the Anti Meridian. The other is to take the longitudes that are in the 170s and put them in the -180s by subtracting 360. This is like saying "181 degrees West" instead of "179 degrees East": while not canonical, it is still accurate.
To correct this, we use shapely and some fancy pandas coding which is beyond the scope of this tutorial. Nonetheless, it is presented here and in the sample notebook and can be adapted for other applications.
First we extract the geometry of Aleutians West
aleutians_west=alaska[alaska['GEOID']=='02016'].geometry.to_list()[0]
Next we find all polygons west of the Anti Meridian (those with positive longitudes) and shift them to an equivalent negative longitude by subtracting 360 from the longitude.
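import shapely.affinity
from shapely.geometry import MultiPolygon

corrected_poly = []
# .geoms iterates over the member polygons (direct iteration fails in Shapely 2.0)
for each in aleutians_west.geoms:
    # A positive first longitude means the polygon lies west of the Anti Meridian
    if each.exterior.coords.xy[0][0] > 0:
        corrected_poly.append(shapely.affinity.translate(each, -360, 0))
    else:
        corrected_poly.append(each)
fixed_aleutian = MultiPolygon(corrected_poly)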
It should be noted that we know this geometry is a MultiPolygon. If you were to adapt this to be general purpose, you would need to account for the possibility that your shape is simply a Polygon.
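As a rough idea of what a more general version might look like, here is a pure-Python sketch that applies the same shift to a GeoJSON-style geometry dictionary and accepts either a Polygon or a MultiPolygon. It is not the code used in the sample notebook. The function names are our own, the coordinates are made up, and the test on the first longitude being positive is an assumption that suits data like Alaska's but not shapes that legitimately sit in the eastern hemisphere.

```python
def _shift_polygon(rings):
    """Shift all rings of one polygon by -360 degrees of longitude
    if its exterior ring starts at a positive longitude."""
    if rings[0][0][0] > 0:
        return [[[p[0] - 360.0, *p[1:]] for p in ring] for ring in rings]
    return [[list(p) for p in ring] for ring in rings]


def fix_antimeridian(geometry):
    """Return a copy of a Polygon or MultiPolygon geometry dict with the shift applied."""
    kind = geometry["type"]
    if kind == "Polygon":
        return {"type": kind, "coordinates": _shift_polygon(geometry["coordinates"])}
    if kind == "MultiPolygon":
        return {"type": kind,
                "coordinates": [_shift_polygon(rings) for rings in geometry["coordinates"]]}
    raise ValueError(f"unsupported geometry type: {kind}")


def longitude_span(geometry):
    """Width in degrees of longitude covered by a Polygon or MultiPolygon."""
    polygons = ([geometry["coordinates"]] if geometry["type"] == "Polygon"
                else geometry["coordinates"])
    lons = [p[0] for rings in polygons for ring in rings for p in ring]
    return max(lons) - min(lons)


# Two illustrative squares on either side of the 180 degrees longitude line.
example = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[172.0, 52.0], [173.0, 52.0], [173.0, 53.0], [172.0, 53.0], [172.0, 52.0]]],
        [[[-179.0, 51.0], [-178.0, 51.0], [-178.0, 52.0], [-179.0, 52.0], [-179.0, 51.0]]],
    ],
}
fixed = fix_antimeridian(example)
span_before = longitude_span(example)
span_after = longitude_span(fixed)
print(span_before, span_after)
```

The span shrinking from nearly the whole globe to a few degrees is exactly what lets the map zoom in properly.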
Finally, we insert it back into the GeoDataFrame. The at accessor was used since, by trial and error, it seems to be the one method that works.
alaska.at[alaska[alaska['GEOID']=='02016'].index[0], 'geometry']=fixed_aleutian
Now if you repeat the code shown above, Alaska comes into view without being wrapped around the world.
Conclusion
In this installment, we showed the basics of working with geographic data files, Dash and Dash Leaflet, and demonstrated some basic visualizations with state and county boundaries. In the next installment, we will take the outline visualization shown here even further.