West Health Data Science Blog

Applied Data Science to Lower Healthcare Costs for Successful Aging

Tue 16 February 2021

Visualizations with Dash Leaflet Series - Part 1: Setup and Configuration

Posted by Haw-minn Lu in Visualizations   

Introduction

This tutorial series came about when the West Health Policy Center and the University of Pittsburgh School of Pharmacy asked the Data Science team here at the West Health Institute to provide a highly interactive map visualization for their study, Access to Potential COVID-19 Vaccine Administration Facilities: A Geographic Information Systems Analysis. We describe the visualization in more detail later.

During the evaluation process, Dash Leaflet proved to be a powerful though underdocumented library. As a result, acquiring sufficient expertise to implement the visualization required many sources of documentation as well as some source code sleuthing. This tutorial series serves as a vehicle to share this expertise with the community at large.

What is Dash Leaflet?

Dash Leaflet is a lightweight wrapper around the popular Leaflet.js library that also includes supercluster, MapBox's open source clustering library. The wrapper presents the various React components of Leaflet.js as dash components that are easy to configure for those comfortable with the dash framework.

Why Dash Leaflet?

There are other map visualization libraries and python wrappers for Leaflet.js available, so why pick Dash Leaflet among these choices? In our case, it came down to our requirements: Dash Leaflet was the only one that met them.

At a minimum, the requirements were that the solution include a reasonably fast clustering implementation, handle data on the order of 80,000 features, be low cost or free, and be deployable as a standalone app.

MapBox was a leading candidate. It provides map visualization capabilities, has a very fast clustering algorithm that runs client side, and even has components in dash. However, MapBox works on a freemium model and could become too costly for a non-profit project such as the aforementioned study. Additionally, using MapBox would require key management to be incorporated. MapBox might have remained the option of choice had we not found Dash Leaflet.

Jupyter's ipyleaflet was another potential candidate. In truth, like MapBox, it has a lot more customizations baked in (such as predefined marker shapes). The Achilles heel of ipyleaflet was that there was no standard method for deploying it as a standalone app. The documentation recommended the use of voila for deployment, but for our particular use case this resulted in a startup time of up to 15 minutes per session, which simply wasn't acceptable.

Despite all these benefits, Dash Leaflet is very barebones and does not include many of the conveniences of some other packages, so some additional effort is required to perform customizations.

Resources

Documentation for Dash Leaflet is somewhat scant. The bulk of it consists of usage examples with references to the Leaflet.js documentation. Since many of the Dash Leaflet components mirror those of Leaflet.js, one can get a majority of the needed information there. One deficiency is that it is often unclear which properties are exposed to dash. Figuring this out often requires a little sleuthing in the source code, in particular in the src/lib/components/ directory. The code itself is sufficiently documented to make this determination.
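
As a complement to reading the source, the constructor signature of a component class can also be inspected from python. The sketch below uses only the standard inspect module and assumes that the installed components declare their properties as explicit keyword arguments, as dash-generated component classes typically do. The helper name component_keywords and the stand-in FakeMarker class are hypothetical and exist only for illustration.

```python
import inspect


def component_keywords(component_cls):
    """Return the keyword names accepted by a component's constructor."""
    params = inspect.signature(component_cls.__init__).parameters
    skipped_kinds = (
        inspect.Parameter.VAR_POSITIONAL,
        inspect.Parameter.VAR_KEYWORD,
    )
    # Skip `self` and the catch-all *args / **kwargs entries.
    return [
        name
        for name, param in params.items()
        if name != "self" and param.kind not in skipped_kinds
    ]


class FakeMarker:
    """Hypothetical stand-in for a component class, for demonstration only."""

    def __init__(self, children=None, id=None, position=None, **kwargs):
        self.position = position


print(component_keywords(FakeMarker))
```

With dash-leaflet installed, the same call on a real component class, for example component_keywords(dl.GeoJSON) after import dash_leaflet as dl, should list the properties that can be set from dash, and help(dl.GeoJSON) prints the generated docstring. Treat the result as a starting point and confirm it against the source for the version you have installed.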

A Complex Example

We use the aforementioned COVID-19 vaccination visualization as an example. The purpose of this section is to give an overview of the custom features and explain at a high level how they were implemented. If you are interested in the subject matter, please refer to the study mentioned above.

If this visualization takes some time to start up initially, it is a consequence of our use of AWS Lambda functions to deploy the app, not a consequence of the Dash Leaflet framework. You can deploy to a dedicated server if you want to avoid this issue.

The visualization has two active modes. The overview mode presents the national view. After a state is selected, the visualization enters a detail mode, in which the selection of various options can yield either a state or a county view.

When the visualization is first shown, a national view is presented. Each state has a custom marker indicating the number of potential vaccination centers as projected by the study. Over each marker is a tooltip which breaks down the vaccination centers into four categories. These markers are precalculated at this level and do not use clusters. They are intended to give a similar look and feel to the clusters that are presented in the state and county views. To this end, the markers are placed at the centroid of each state. Hovering over a state highlights it. Clicking on a state, or selecting it from the dropdown menus on the lower left, selects the state and switches to the state view.
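
The precalculated state markers can be thought of as a simple aggregation over the list of sites, done once ahead of time. The following is a minimal sketch under stated assumptions: the record layout, the four category names and the helper names are hypothetical placeholders, not the study's actual data or categories.

```python
from collections import Counter, defaultdict

# Hypothetical category names; the study's actual four categories may differ.
CATEGORIES = ["pharmacy", "hospital", "clinic", "health center"]

# Hypothetical site records: (state, category).
SITES = [
    ("California", "pharmacy"),
    ("California", "pharmacy"),
    ("California", "hospital"),
    ("Nevada", "clinic"),
]


def count_by_state(sites):
    """Tally sites per state and category once, ahead of time."""
    counts = defaultdict(Counter)
    for state, category in sites:
        counts[state][category] += 1
    return counts


def tooltip_text(state, counts):
    """Build the text shown in the tooltip over a state's marker."""
    per_category = counts[state]
    lines = [f"{state}: {sum(per_category.values())} sites"]
    # A Counter returns 0 for categories with no sites in this state.
    lines += [f"{category}: {per_category[category]}" for category in CATEGORIES]
    return "\n".join(lines)


counts = count_by_state(SITES)
print(tooltip_text("California", counts))
```

Because the totals are computed up front, the national view only has to draw one marker per state rather than cluster every site.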

Under the hood, clicking on a state triggers a callback which sets the value of the dropdown menu. Because of this, zooming is governed by the dropdown menu's callback, so the automatic zooming features built into Leaflet cannot be used.
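
The chain described here, where a click updates the dropdown and the dropdown alone drives the zoom, might be sketched as follows. This is an illustration under stated assumptions, not the code behind the visualization: the component ids (states, state-dropdown, map), the bounds table and the helper names are hypothetical, and the click_feature and bounds property names depend on the installed dash-leaflet version.

```python
# Hypothetical bounding boxes: state name -> [[south, west], [north, east]].
STATE_BOUNDS = {
    "California": [[32.5, -124.5], [42.0, -114.1]],
    "Nevada": [[35.0, -120.0], [42.0, -114.0]],
}
NATIONAL_BOUNDS = [[24.5, -125.0], [49.5, -66.9]]


def state_from_click(feature):
    """Map a clicked GeoJSON feature to a dropdown value (None if no click)."""
    if not feature:
        return None
    return feature.get("properties", {}).get("name")


def bounds_for_state(state):
    """Return the map bounds for a dropdown value, or the national view."""
    return STATE_BOUNDS.get(state, NATIONAL_BOUNDS)


def register_callbacks(app):
    """Wire the two helpers into a dash app (requires dash to be installed)."""
    # Imported here so the helpers above can be used and tested without dash.
    from dash.dependencies import Input, Output

    # A click on the state layer only updates the dropdown ...
    # ("click_feature" is an assumed, version-dependent property name.)
    @app.callback(Output("state-dropdown", "value"), Input("states", "click_feature"))
    def _select_state(feature):
        return state_from_click(feature)

    # ... and the dropdown's callback alone decides where the map zooms.
    @app.callback(Output("map", "bounds"), Input("state-dropdown", "value"))
    def _zoom_to_state(state):
        return bounds_for_state(state)
```

Routing both the click and the menu selection through the dropdown's value keeps a single source of truth for the selected state, at the cost of handling the zoom manually.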

Once the state is selected, the visualization enters the detail mode. At this point, the counties for the selected state are shown. For basic navigation, clicking on the county activates the dropdown menu's callback for each state. Hovering over any county displays an information box in the upper right corner with statistics about the county. Clicking on a neighboring state selects that state. At the detail level, potential vaccine centers are clustered using the MapBox supercluster library. An automatic feature of the library is that clicking on a cluster marker zooms in on the cluster until it separates into smaller clusters or individual sites. A custom tooltip is derived from the contents of a given cluster.

Finally, the individual potential vaccine sites have different symbols for markers. The marker legend is also a filter which activates the dropdown callback for facility type.

All of these features and customizations will be explained in subsequent blog articles.

Getting Set Up

Our github repository contains Jupyter Notebooks for all the articles.

If you are skilled in python, Jupyter Notebooks are not required.

Prerequisites

For this tutorial series, the reader is expected to be moderately skilled in programming in python and using plotly dash, and to be comfortable in the Jupyter environment.

While not required, familiarity with pandas, geopandas and javascript is helpful.

Required Libraries

To run the notebooks, it is assumed that the following python packages are installed:

  • dash
  • jupyter-dash
  • dash-leaflet
  • pandas
  • dash-extensions
  • geopandas

If you are using conda or mamba, please note that at the time of writing, conda's dependency handling has an issue when installing jupyter-dash: dash must be installed with conda before jupyter-dash, or an obsolete version of dash will be installed.
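
A quick way to confirm what actually got installed, including whether an obsolete dash slipped in, is to query the package metadata. This is a minimal sketch using the standard library's importlib.metadata (python 3.8 or later); the distribution names are assumed to match the names the packages are published under, and installed_versions is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names, assumed to match the published package names.
REQUIRED = [
    "dash",
    "jupyter-dash",
    "dash-leaflet",
    "pandas",
    "dash-extensions",
    "geopandas",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:16} {version or 'NOT INSTALLED'}")
```

Running this in the first notebook cell gives a record of the environment that is handy when comparing against the versions the notebooks were written for.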

A Dockerfile is also provided, which enables the reader to run the jupyter environment inside a docker container. If you use docker, it is recommended that you install jupyter-server-proxy.

Getting Started

To ensure your environment is properly set up, the notebook contains the "Getting Started" example from the Dash-Leaflet documentation ported to the Jupyter environment. This notebook is designed to be compatible with the docker environment. If you are not running in the docker environment, the following line is not necessary and can be commented out:

JupyterDash.infer_jupyter_proxy_config()

This line enables JupyterDash to use Jupyter Server Proxy to proxy all connections through a single exposed port. Although JupyterDash is a wrapper that allows dash to run inside a notebook, dash still opens a local flask server on a selected port.

Additionally, JupyterDash replaces Dash in the creation of the app. The final change is that the JupyterDash app's run_server method takes a mode parameter. By setting mode to 'inline', the visualization will display inside the notebook.

Finally, one might notice that the port number is randomly selected. This permits running simultaneous dash apps within a container. Remember that while JupyterDash inlines dash, it still uses a port, even if that port is internal to the container. With a randomly selected port, a port number conflict is unlikely.
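
Putting these pieces together, a notebook cell along the following lines illustrates the changes discussed in this section. It is a sketch rather than the notebook's exact code: the port range, the map center and zoom, and the helper names pick_port and run_inline are assumptions made for illustration.

```python
import random


def pick_port(low=8050, high=8999):
    """Pick a random port; the range is an arbitrary choice for this sketch."""
    return random.randint(low, high)


def run_inline():
    """Build the minimal map app and display it inside the notebook."""
    # Imported here so pick_port() stays usable without dash installed.
    import dash_leaflet as dl
    from jupyter_dash import JupyterDash

    # Only needed when running behind Jupyter Server Proxy (e.g. in docker).
    JupyterDash.infer_jupyter_proxy_config()

    # JupyterDash takes the place of dash.Dash when creating the app.
    app = JupyterDash(__name__)
    app.layout = dl.Map(
        dl.TileLayer(),
        center=[39, -98],  # roughly the middle of the contiguous US
        zoom=4,
        style={"width": "1000px", "height": "500px"},
    )

    # mode="inline" renders the app in the output cell of the notebook.
    app.run_server(mode="inline", port=pick_port())
```

Calling run_inline() in a cell should display a basic tile map. If two apps happen to draw the same port, rerunning the cell picks a new one.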

Outline

The following parts are tentatively scheduled for this blog series. This section will be updated as the series is published.

  • Part 2 - Basics - Review of Dash, Geopandas and Dash-Leaflet
  • Part 3 - Feature Outline Maps
  • Part 4 - Scatter Plot Maps
  • Part 5 - Cluster Maps
  • Part 6 - Complex Visualizations Using Multiple Layers
  • Part 7 - Special Tricks and Hacks