West Health Data Science Blog

Applied Data Science to Lower Healthcare Costs for Successful Aging

Mon 22 March 2021

Visualizations with Dash Leaflet Series Part 2 - Basics - Review of Dash, Geopandas and Dash-Leaflet

Posted by Haw-minn Lu in Visualizations   

Overview

In this section, we discuss geographic data handling, give a brief overview of Dash, and introduce dash-leaflet in greater detail.

Geographic Data Handling

Where to get geographic data

There are two types of geographic data that we use in this tutorial: map data and geometries. Map data is the underlying map used by Leaflet. We go into greater detail in the section on dash-leaflet below, but the vast majority of maps are based on the OpenStreetMap project. The other category of geographic data we refer to as geometries. These describe objects as points or boundaries. For example, a city may be represented by a single latitude and longitude, though technically cities have municipal boundaries and span an area. For the purposes of this tutorial, we will use the uscities.csv file obtained from simplemaps.com, distributed under the Creative Commons Attribution 4.0 license. They have a number of other geographic files that are free and distributed under that license.

Another good resource is the US Census Bureau, which has almost every imaginable boundary file for the US at this link. However, the Census Bureau frequently reorganizes its site, so an online search can be used should the provided link fail. The files are distributed as geographic information system (GIS) shapefiles and Keyhole Markup Language (KML) files, all together in a single zip file. Several resolutions are offered depending on the detail desired. These files from the Census Bureau are not copyrighted and do not necessarily require attribution, though it is always a good idea to give credit where credit is due. For the purposes of this tutorial we will use the cb_2019_us_county_20m.zip file for county boundaries and the cb_2019_us_state_20m.zip file for state boundaries. The lowest resolution shapes are selected to reduce the memory footprint and processing required. For higher quality, use the 500k version of the files.

GeoPandas and Data Formats

We've already mentioned GIS shapefiles and KML files. Now we introduce two more formats: GeoJSON and Geobuf. We won't get into the fine details of either, but dash-leaflet supports both. GeoJSON is a JSON-based format for encoding a variety of geographic data structures. Geobuf is a compact binary format for encoding the same kinds of structures. For our purposes, knowing what they are is all that is needed.

Fortunately, geopandas can read most geographic data files automatically. It can even read the zip files provided by the US Census Bureau without unzipping them. So let's start by loading the county census shapefile and examining the resulting geopandas GeoDataFrame.
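To make the GeoJSON format a little more concrete, here is a minimal sketch of a FeatureCollection written as a Python dictionary, using only the standard library. The feature name and the coordinates are made up for illustration. Note that GeoJSON positions list longitude first and latitude second.

```python
import json

# A hypothetical square "region"; GeoJSON positions are [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example Region"},  # illustrative attribute
            "geometry": {
                "type": "Polygon",
                # One exterior ring; the first and last positions must match.
                "coordinates": [[
                    [-100.0, 40.0],
                    [-99.0, 40.0],
                    [-99.0, 41.0],
                    [-100.0, 41.0],
                    [-100.0, 40.0],
                ]],
            },
        }
    ],
}

# GeoJSON is plain JSON, so it round-trips through the json module.
geojson_text = json.dumps(feature_collection)
round_tripped = json.loads(geojson_text)
ring = round_tripped["features"][0]["geometry"]["coordinates"][0]
print(round_tripped["features"][0]["geometry"]["type"], len(ring))
```

Geobuf carries the same kind of structure in a binary encoding, which is why it is smaller but not human readable.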

import geopandas as gpd
zipfile = "zip://cb_2019_us_county_20m.zip"
gdf = gpd.read_file(zipfile)
gdf

This dataframe looks and feels like a pandas DataFrame, but you should note that there is a geometry column, which holds some strangely formatted objects. They are in fact shapely.geometry objects. For the purposes here, no deep knowledge of shapely or geopandas is needed. Suffice it to say that a GeoDataFrame is essentially a pandas DataFrame that has a geometry column. In all other respects it behaves like a pandas DataFrame, and most methods and tools operate in the same fashion.

Turning to the other data file, uscities.csv, which does not explicitly have a geometry column, we would also like to convert it to a geopandas GeoDataFrame. Upon inspection of the csv file, there are latitude and longitude columns, lat and lng respectively. The following snippet of code converts it to a geopandas GeoDataFrame.

import geopandas as gpd
import pandas as pd
df = pd.read_csv('uscities.csv')
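# points_from_xy expects x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lng, df.lat))

The last line creates a GeoDataFrame from a DataFrame by adding a geometry column, in which each pair of values is converted into a shapely point by the function points_from_xy. Since x corresponds to longitude and y to latitude, the lng column is passed first.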
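Coordinate order is an easy thing to get wrong, so here is a small standard-library sketch of the same idea without geopandas: each row becomes a GeoJSON-style point with longitude as x and latitude as y. Only the lat and lng column names come from the file described above; the city column and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Stand-in for uscities.csv; only the lat and lng column names come from the
# tutorial, the city column and the (approximate) rows are illustrative.
sample_csv = io.StringIO(
    "city,lat,lng\n"
    "San Diego,32.7157,-117.1611\n"
    "Anchorage,61.2181,-149.9003\n"
)

def row_to_point_feature(row):
    """Build a GeoJSON Point feature; x is longitude and y is latitude."""
    return {
        "type": "Feature",
        "properties": {"city": row["city"]},
        "geometry": {
            "type": "Point",
            "coordinates": [float(row["lng"]), float(row["lat"])],
        },
    }

features = [row_to_point_feature(row) for row in csv.DictReader(sample_csv)]
for feature in features:
    lon, lat = feature["geometry"]["coordinates"]
    print(feature["properties"]["city"], lon, lat)
```

A quick sanity check after any such conversion is that every y value falls between -90 and 90; swapped columns for US cities would fail that check immediately.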

The reason for choosing GeoDataFrame is that it provides a powerful set of geographic and geometric tools on top of pandas. As we progress through our examples, two operations in particular, bounds and centroid, will be important. It should be noted that the dash-leaflet documentation elects to do any data manipulation in pandas and then uses the dash_leaflet.express helper function dicts_to_geojson to convert the extracted dicts to GeoJSON. Using geopandas allows you to skip this step, as geopandas can directly generate GeoJSON formatted data.

Another helper function in dash_leaflet.express is geojson_to_geobuf, which can be used to encode a GeoJSON structure into a Geobuf. Geobuf has the advantage of being a much more space efficient format.

Dash Basics

Dash is a framework for building web applications and is especially good for developing analytic applications. It is built on Flask, Plotly.js and React.js.

This example taken from the Dash User Guide demonstrates the basic anatomy of a Dash app.

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output


app = dash.Dash(__name__)

app.layout = html.Div([
    html.H6("Change the value in the text box to see callbacks in action!"),
    html.Div(["Input: ",
              dcc.Input(id='my-input', value='initial value', type='text')]),
    html.Br(),
    html.Div(id='my-output'),

])

@app.callback(
    Output(component_id='my-output', component_property='children'),
    Input(component_id='my-input', component_property='value')
)
def update_output_div(input_value):
    return 'Output: {}'.format(input_value)


if __name__ == '__main__':
    app.run_server(debug=True)

This basic structure is all you will need for this tutorial. If you are interested in other components, we encourage you to explore the Dash documentation in greater detail. The basic building blocks are components, and for this tutorial dash_core_components, dash_html_components and dash_leaflet provide the necessary ones. The app object, an instance of Dash, is the core object around which the app is built. For those who build Flask apps, this is similar to instantiating Flask. The layout attribute of the app object defines the HTML layout of the app. As you can see, the components from dash_html_components mirror HTML tags. Pretty much any HTML tag has a corresponding Dash component in dash_html_components. The core components contain more of the interactive components, such as input, dropdown, and graph.

The next block of code defines the interactivity. The callback decorator is passed input and output dependencies. In this case the value property of the my-input component triggers the callback which updates the children property of the my-output component.

Finally, app.run_server starts the web server. Dash can be run inline in Jupyter notebooks with the use of JupyterDash, as mentioned in part 1. To use JupyterDash, create your app as an instance of JupyterDash rather than Dash. With JupyterDash, run_server also takes a mode argument which, if set to 'inline', displays the app inline in the Jupyter notebook.

Dash Leaflet

Dash-leaflet, as mentioned in part 1, is a lightweight wrapper around Leaflet.js. It ports most of the React components in leaflet to Dash. There are numerous components, but the three that will be most important to us here are the Map, TileLayer and GeoJSON components.

The Map component can be thought of as the parent container that houses our visualizations. The TileLayer and GeoJSON components are layers within the Map component. We saw this in our Getting Started example, where a Map component has a TileLayer as a child.

app.layout = dl.Map(dl.TileLayer(), style={'width': '1000px', 'height': '500px'})

If you layer children in a Map component, they are rendered from left to right, with the last child rendered on top. You will want the TileLayer to be the bottom layer, so it should come first. The height and width styling is supplied to make the map look good when deployed inline in a Jupyter notebook. You may want to adjust these dimensions to suit the aesthetics of your particular visualization.

The Map Component

The Map component, like the other layer components, has the standard Dash properties: id, children, style and className. It has many other properties that aren't really documented but can be found in the source for Map.react.js in the components subdirectory of the source tree. They are too numerous to go into here, but a few useful ones include click_lat_lng, which when used as an Input to a callback provides the location on the map of a mouse click, and dbl_click_lat_lng, which reports the location of a mouse double click. Aside from the styling properties style and className, another useful property is bounds, which acts as a bounding box that determines how much of the map is rendered. Two other properties, center and zoom, are useful as well. The center property centers the map at a given latitude and longitude, and zoom sets the zoom level at which the map is displayed. For example, to display the continental United States, a setting of center=[39, -98], zoom=4 yields nice results. These properties allow us to programmatically change the view of the map presented to the user.

app.layout = dl.Map(dl.TileLayer(),
                    center=[39, -98],
                    zoom=4,
                    style={'width': '1000px', 'height': '500px'})
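If you would rather derive the view from your data than pick a center by hand, a bounding box and its midpoint are easy to compute. The sketch below is plain Python and assumes the corner order [[south, west], [north, east]] that Leaflet uses for bounds; the helper names and the corner points are illustrative and not part of dash-leaflet.

```python
def bounds_of(positions):
    """Return [[south, west], [north, east]] for (longitude, latitude) pairs."""
    lons = [lon for lon, _ in positions]
    lats = [lat for _, lat in positions]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]

def center_of(bounds):
    """Midpoint of a bounding box as [latitude, longitude]."""
    (south, west), (north, east) = bounds
    return [(south + north) / 2, (west + east) / 2]

# Rough, illustrative corner points around the lower 48 states.
positions = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]
bounds = bounds_of(positions)
center = center_of(bounds)
print(bounds, center)
```

The midpoint lands close to the hand-picked center of [39, -98]. With geopandas, the bounds and centroid attributes of a geometry column give the same kind of information without any hand-written helpers.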

The TileLayer component

The TileLayer is responsible for grabbing map tiles from a tile server. There are a number of free servers, as well as commercial servers that operate on a freemium model and require an access key. OpenStreetMap is really the parent of all tile servers; many of the commercial map servers use OpenStreetMap as their underlying dataset. For a list of map servers, you can look at OpenStreetMap's wiki article on tile servers.

The TileLayer component has the standard Dash properties mentioned above for the Map component. Two other important properties are the url property and the attribution property. The url property specifies the URL template for the tile server; by default OpenStreetMap is used. It has a corresponding url of https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png. The x, y and z components indicate the tile position and zoom level, and s indicates the server. In the case of OpenStreetMap, there are 3 servers indicated by a, b and c. Some tile servers take an s value and some don't. The attribution is an HTML string with links that properly give credit to the map server. This should be set in compliance with the terms of use of the underlying tile server, plus it is always the right thing to give credit where credit is due.
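To see how the placeholders in the url template get filled in, the sketch below computes which tile contains a given point and builds the matching URL. It uses the usual Web Mercator tile numbering documented on the OpenStreetMap wiki, which is an assumption on our part since this post does not spell out the formula. The function names are made up for illustration; Leaflet does all of this for you internally.

```python
import math

OSM_TEMPLATE = "https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"

def lat_lng_to_tile(lat, lng, zoom):
    """Return the (x, y) tile indices containing a point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each side at this zoom level
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_url(lat, lng, zoom, server="a", template=OSM_TEMPLATE):
    """Fill in the template placeholders for the tile covering a point."""
    x, y = lat_lng_to_tile(lat, lng, zoom)
    return template.format(s=server, z=zoom, x=x, y=y)

# The map center and zoom level used earlier in this post.
url = tile_url(39, -98, 4)
print(url)
```

Each increase in zoom doubles the number of tiles along each side, which is why the z value is part of the tile address.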

The GeoJSON component

The GeoJSON component is where most of the magic happens. In short, it is the layer best equipped for infusing data into the maps. The properties of the GeoJSON component play a large part in the customization of the map visualization and behavior described in this tutorial, so they are described in greater detail in subsequent parts. Probably the most important property of the GeoJSON layer is the data property, which supplies the layer with geographic data. The format property specifies whether the data is formatted as 'geojson' or 'geobuf', the two formats introduced at the beginning of this part of the tutorial. In a nutshell, the GeoJSON component renders all point data in data as markers (which we'll discuss later) and all polygonal data in data as outlines. Another property we'll need for our examples is zoomToBounds, which if set to True zooms the map to the bounds of the data whenever it changes. For instance, if the data is an outline of a state, it will center and zoom the map to the state.

Some Simple Examples

To close out this section of the tutorial, we'll draw a few outlines on a map using the county and state US Census shapefiles. The full example is provided in this notebook. We will make an interactive map of the states and another of the counties of Alaska (why Alaska? It will be clear in a moment).

First thing we do is to process the data appropriately.

zipfile = "zip://cb_2019_us_state_20m.zip"
gdf = gpd.read_file(zipfile)
us_territories_geoids = ['78','69','72','60','66']
alaska_hawaii_geoids = ['02','15']
continental_us_states=gdf[~gdf['GEOID'].isin(us_territories_geoids+alaska_hawaii_geoids)]

Besides reading the shape file, we also filter out US territories that are provided by the Census file. Additionally, to display the continental United States we filter out Alaska and Hawaii.

Next the geopandas GeoDataFrame is converted to a GeoJSON object. The to_json method converts the GeoDataFrame to a JSON string, which then needs to be loaded into a Python dictionary. Then we create the layout for the app.

import json
import dash_leaflet as dl
geojson=json.loads(continental_us_states.to_json())
app.layout = dl.Map([
    dl.TileLayer(),
    dl.GeoJSON(data=geojson, zoomToBounds=True, zoomToBoundsOnClick=True)
],
                    style={
                        'width': '1000px',
                        'height': '500px'
                    })

The zoomToBounds property sets the bounds based on the supplied data, so we don't need to supply the center or zoom properties to the map. The zoomToBoundsOnClick property adds a little interactivity. If set to True, clicking on a feature (i.e. a state in our case) zooms the map to the data associated with that feature. There are many more customizations we can apply to this type of visualization, which we will cover in greater depth in the next section.

The Anti Meridian Problem

Suppose we decide to display all 50 states. Suddenly it seems the zoom level is out of whack.

To illustrate the problem further, we produce a similar visualization based on the counties of Alaska. To show the use of the geobuf type we convert the geojson to geobuf here. If geobuf is used the format property must be set to 'geobuf'. As for the data preprocessing we filter the GeoDataFrame read from the Census by filtering on the state Federal Information Processing Standards (FIPS) code for Alaska, '02'.

import dash_leaflet.express as dlx
zipfile = "zip://cb_2019_us_county_20m.zip"
gdf = gpd.read_file(zipfile)
alaska = gdf[gdf['STATEFP']=='02'].copy()  # copy so later edits do not act on a slice of gdf
geojson=json.loads(alaska.to_json())
geobuf = dlx.geojson_to_geobuf(geojson)
app.layout = dl.Map([
    dl.TileLayer(),
    dl.GeoJSON(data=geobuf, format='geobuf', zoomToBounds=True, zoomToBoundsOnClick=True)
],
                    style={
                        'width': '1000px',
                        'height': '500px'
                    })

Again, we get that zoomed out view. If you click on the counties, the zoom in feature works as expected, until you click on the Aleutian Islands at the lower left. If you click on those, you get the world view again. So what is wrong here? Both this county, Aleutians West, and the outline for Alaska in our state map have some polygons on the opposite side of the Anti Meridian (the 180 degrees longitude line). If you examine the longitudes of Aleutians West, some are in the -170s and some are in the 170s. The other clue is that if you look at the extreme right of the map, you will see the western half of the county highlighted there.

There are two solutions. One is to simply drop the polygons on the wrong side of the Anti Meridian. The other is to take the longitudes that are in the 170s and move them into the -180s by subtracting 360. This is like saying "181 degrees West" instead of "179 degrees East": while not canonical, it is still accurate.
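The effect of the second option is easy to see with a few numbers. The following plain-Python sketch uses made-up (longitude, latitude) pairs on both sides of the Anti Meridian and compares the width of the enclosing box before and after shifting the positive longitudes by 360 degrees. The helper names are hypothetical and only serve the illustration.

```python
def shift_east_longitudes(positions):
    """Move positive longitudes to the equivalent value below -180."""
    return [(lon - 360.0 if lon > 0 else lon, lat) for lon, lat in positions]

def longitude_span(positions):
    """Width in degrees of the box that encloses the positions."""
    lons = [lon for lon, _ in positions]
    return max(lons) - min(lons)

# Illustrative (longitude, latitude) pairs on both sides of the Anti Meridian.
islands = [(-178.5, 51.8), (-176.6, 51.9), (179.4, 51.4), (172.9, 52.9)]

shifted = shift_east_longitudes(islands)
print(longitude_span(islands), longitude_span(shifted))
```

Before the shift the box spans nearly the whole globe, which is why the map zooms all the way out; after the shift it is only about ten degrees wide.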

To correct this, we used shapely and some fancy pandas coding which is beyond the scope of this tutorial. Nonetheless, it is presented here and in the sample notebook and can be adapted for other applications.

First we extract the geometry of Aleutians West

aleutians_west=alaska[alaska['GEOID']=='02016'].geometry.to_list()[0]

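Next we find all polygons that lie across the Anti Meridian, i.e. those with positive (east) longitudes, and shift them to the equivalent negative values (basically subtracting 360 from their longitudes).

from shapely import affinity
from shapely.geometry import MultiPolygon
corrected_poly=[]
# .geoms iterates over the member polygons of the MultiPolygon
for each in aleutians_west.geoms:
    # a positive first longitude means the polygon lies across the Anti Meridian
    if each.exterior.coords.xy[0][0]>0:
        corrected_poly.append(affinity.translate(each,-360,0))
    else:
        corrected_poly.append(each)
fixed_aleutian=MultiPolygon(corrected_poly)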

It should be noted that we know the geometry here is a MultiPolygon. If you were to adapt this for general purpose use, you would need to account for the possibility that your shape is simply a Polygon.

Finally we insert it back into the GeoDataFrame. The at accessor is used because, after some trial and error, it seems to be the one method that works.
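As a rough idea of what a more general version could look like, the sketch below works on GeoJSON geometry dictionaries rather than shapely objects and branches on the geometry type. It assumes two-dimensional positions and, like the code above, decides per polygon by looking at the first longitude of the exterior ring. The function names are hypothetical; with shapely objects the equivalent branch would test the geom_type attribute.

```python
def shift_polygon(rings):
    """Shift every ring of one polygon if its exterior starts east of 0."""
    exterior = rings[0]
    offset = -360.0 if exterior[0][0] > 0 else 0.0
    return [[[lon + offset, lat] for lon, lat in ring] for ring in rings]

def fix_antimeridian(geometry):
    """Return a shifted copy of a GeoJSON Polygon or MultiPolygon."""
    kind = geometry["type"]
    if kind == "Polygon":
        return {"type": kind, "coordinates": shift_polygon(geometry["coordinates"])}
    if kind == "MultiPolygon":
        parts = [shift_polygon(rings) for rings in geometry["coordinates"]]
        return {"type": kind, "coordinates": parts}
    raise ValueError("unsupported geometry type: " + kind)

# Illustrative triangles on either side of the Anti Meridian.
east_part = [[[179.0, 51.0], [179.5, 51.0], [179.5, 51.5], [179.0, 51.0]]]
west_part = [[[-178.0, 51.0], [-177.5, 51.0], [-177.5, 51.5], [-178.0, 51.0]]]

multi = {"type": "MultiPolygon", "coordinates": [east_part, west_part]}
single = {"type": "Polygon", "coordinates": east_part}

fixed_multi = fix_antimeridian(multi)
fixed_single = fix_antimeridian(single)
print(fixed_multi["coordinates"][0][0][0], fixed_single["coordinates"][0][0])
```

Polygons that already sit on the negative side pass through unchanged, so the function is safe to apply to every feature in a collection.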

alaska.at[alaska[alaska['GEOID']=='02016'].index[0], 'geometry']=fixed_aleutian

Now if you repeat the code shown above, Alaska comes into view without being wrapped around the world.

Conclusion

In this installment, we covered the basics of working with geographic data files, Dash and Dash Leaflet, and demonstrated some basic visualizations with state and county boundaries. In the next installment, we will take the outline visualization shown here even further.