West Health Data Science Blog

Applied Data Science to Lower Healthcare Costs for Successful Aging

Mon 10 February 2020

Interactive Spotify networks with Pyvis

Posted by Giancarlo Perrone in Network   

Interactive network graphs with Pyvis

Successful data science hinges on discovering meaningful relationships in data through intuitive visualizations. Representing relationships with a network graph is a common approach, but generating an interactive and fluid graph visualization can be a challenge, especially with large datasets. In this blog post we introduce Pyvis, a Python module built on the mature VisJS JavaScript library, which enables fluid, interactive visualizations of complex network graphs.

We also recommend checking out the standalone JavaScript library, visjs, to see even more examples of its capabilities. Documentation can be found here.

from IPython.display import HTML
HTML('<img src="extra/showcase.gif" height=500px width=500px>')

Installation

To install pyvis, run the following in a terminal:

pip install pyvis

You can verify that pyvis was installed correctly by importing it:

import pyvis
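A silent import only shows that the package can be found. To also confirm which version was installed, a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8) could look like this. The helper name `installed_version` is hypothetical and not part of pyvis:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyvis")
print(f"pyvis version: {version}" if version else "pyvis is not installed")
```

Knowing the version is handy when reading the documentation, since defaults may differ between releases.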

Getting started

The standard way to use pyvis' Network interface is to import it directly:

from pyvis.network import Network

Use a Network instance to create your first graph:

g = Network()

This blog post is written in a Jupyter Notebook, and pyvis supports experimentation within Jupyter. To use pyvis from a notebook, pass the notebook flag upon instantiation:

g = Network(notebook=True)

Dataset exploration

To demonstrate a practical example of building intuitive relational visualizations, I will explore Spotify data on artists and their related artists. I have already aggregated and stored the data as a CSV, so let's look at what we're working with.

from pyvis.network import Network
import pandas as pd
df = pd.read_csv("../spotify_graph.csv", index_col=0)

Our dataframe is as basic as it gets. Each unique artist in the artist column is one of the 20 most-listened-to artists for this particular user. The corresponding artist in the related column specifies a link, or relationship, between the two artists.

df.head()

Now, let's create our graph:

g = Network(notebook=True, width="750px", height="800px", bgcolor="#3c4647", font_color="white")
g

At this point, we can add the individual nodes in each column

for artist in df.artist:
    g.add_node(artist, label=artist, color="#26d18f")
for related in df.related:
    g.add_node(related, label=related, color="#8965c7")
g
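A name can appear in both columns, for instance when one top artist is also listed as related to another. One way to make the coloring explicit is to decide a single color per unique name before adding any nodes. The sketch below uses only the standard library; the helper `node_colors` and the placeholder names are hypothetical, and the rule that top artists take precedence is an assumption rather than something pyvis enforces:

```python
from typing import Dict, Iterable

TOP_COLOR = "#26d18f"      # color used above for the `artist` column
RELATED_COLOR = "#8965c7"  # color used above for the `related` column


def node_colors(artists: Iterable[str], related: Iterable[str]) -> Dict[str, str]:
    """Map each unique name to one color; top artists win over related ones."""
    colors = {name: RELATED_COLOR for name in related}
    colors.update({name: TOP_COLOR for name in artists})
    return colors


# Placeholder names, not rows from the real dataset.
colors = node_colors(["artist_a", "artist_b"], ["artist_b", "artist_c"])
for name, color in colors.items():
    print(name, color)
    # With a pyvis Network `g`: g.add_node(name, label=name, color=color)
```

With the dataframe from this post, the call would be `node_colors(df.artist, df.related)`.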

To display our results, we use the show() method and provide a name for the HTML output:

g.show("extra/snodes.html")

The above graph doesn't really tell us anything of value, but we can easily distinguish between our source nodes and destination nodes.

Let's add the edges to see how the top 20 artists link to their related artists.

g.add_edges(list(zip(df.artist, df.related)))
g.show("extra/sconnect_init.html")
g

Now our network is connected! It is easy to identify the clusters in our data and, again, to distinguish the central nodes from the outgoing ones. Here, we can zoom in on the network and click and drag nodes to navigate the data. In this network, you will notice that certain clusters are interconnected, while others form disconnected islands. The islands are the outliers of the listening pattern, as their links are not aligned with the majority of the data.
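The clusters and islands can also be listed programmatically rather than spotted by eye. The following is a minimal sketch, not part of pyvis, that groups nodes into connected components with a depth-first walk over the edge list. The function name `connected_components` and the placeholder edges are assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group nodes into clusters, treating every edge as undirected."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    # Largest cluster first, so small islands end up at the tail.
    return sorted(components, key=len, reverse=True)


# Placeholder edges, not rows from the real dataset.
example_edges = [("a", "b"), ("b", "c"), ("d", "c"), ("x", "y")]
components = connected_components(example_edges)
print(components)
```

With the dataframe from this post, `connected_components(zip(df.artist, df.related))` would return the interconnected clusters first and the islands last.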

The interconnectivity is not obvious at first. For example, the King Krule and Tame Impala clusters contain multiple edges, meaning that a particular artist appears several times in the related set of artists.

Looking back at our DataFrame, we can examine the related column to see how many times each related artist appears as a result of having a relationship with a top artist.

counts = df.related.value_counts() 
counts

Using the above data, we can scale each node by how often it appears and include that frequency in the node metadata.

for node in g.nodes:
    # cast to a plain int so the value is serialized as a number, not a string
    freq = int(counts.get(node["id"], 1))
    # nodes with a value will scale their size
    # nodes with a title will include a hover tooltip on the node
    node.update({"value": freq, "title": f"Frequency: {freq}"})
g.nodes[0]

We can also tweak the colors of the edges. By default, pyvis colors each edge based on the color of its source node. By configuring our Network to inherit edge colors from the destination nodes, we can more easily see how our source artists relate to the related artists:

g.inherit_edge_colors("to")
g.show("extra/smodded.html")

Edge metadata

Just as we added hover tooltips to the nodes, we can add them to the edges. As with nodes, the title attribute of an edge can be supplied to render a hoverable tooltip that displays edge information:

for e in g.edges:
    edge_label = f'{e["from"]} ---> {e["to"]}'
    e.update({"title": edge_label})
g.edges[0]

Try hovering over an edge below to see the effect!

g.show("extra/sedgetitles.html")

Now it is much easier to tell which artists contribute to the listening pattern displayed. The bigger the node, the more often it appears as a related artist of the top-listened artists. Furthermore, nodes with green edges between them appear in each other's related artists lists.
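Mutual relationships of this kind can also be listed directly from the edge list. A minimal sketch, using only the standard library, that reports pairs linked in both directions is shown here. The function name `mutual_pairs` and the placeholder edges are hypothetical:

```python
from typing import Iterable, List, Set, Tuple


def mutual_pairs(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Return sorted pairs (a, b) where both a -> b and b -> a are present."""
    edge_set: Set[Tuple[str, str]] = set(edges)
    pairs = {
        tuple(sorted((source, target)))
        for source, target in edge_set
        if source != target and (target, source) in edge_set
    }
    return sorted(pairs)


# Placeholder edges, not rows from the real dataset.
example_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c")]
mutual = mutual_pairs(example_edges)
print(mutual)
```

With the dataframe from this post, the call would be `mutual_pairs(zip(df.artist, df.related))`.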

Layout

Due to pyvis' default physics settings, initial network graphs may contain undesirable features. For example, our graph so far has spaced out the clusters and drawn them nicely, but the denser clusters contain overlapping nodes. Internal physics parameters can be tweaked to produce the desired layout, and pyvis offers a sandbox-like approach to generating layout parameters.

Achieve this with the show_buttons() method, which attaches a set of tweakable options to the graph so that settings can be adjusted live. Use the scroll wheel on the resulting graph to access and tweak the modifiers.

g.show_buttons(filter_="physics") # only show physics options
g.show("extra/sbuttons.html")

We need to tweak the Network's options. Examining the options member, we see the following:

g.options

After adjusting parameters in the settings UI until you are satisfied with the layout, you can copy and paste the output from the generate options button to update the Network's internal options dict. I found the barnesHut solver to work best with the settings below:

g.options.__dict__.update({
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -8500,
      "centralGravity": 0.95,
      "springLength": 195
    },
    "minVelocity": 0.75
  }
})
g.options
g.conf = False # this is needed at the moment when reverting from a show_buttons call
g.options.__dict__.update({
    "configure": {
        "enabled": False
    }
})

g.show("extra/sfinal.html")
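To reuse a layout across notebooks, it can help to keep the tuned settings in a plain dict and serialize them to JSON. The sketch below uses only the standard library and the values from this post; the variable names are hypothetical. pyvis also documents a `set_options` method that takes options as a string, but it may replace the existing options rather than merge into them, so check the documentation for your installed version before relying on it:

```python
import json

# Physics settings copied from the values used above.
physics_options = {
    "physics": {
        "barnesHut": {
            "gravitationalConstant": -8500,
            "centralGravity": 0.95,
            "springLength": 195,
        },
        "minVelocity": 0.75,
    },
    # Hide the settings UI again in the final render.
    "configure": {"enabled": False},
}

options_json = json.dumps(physics_options, indent=2)
print(options_json)

# Round-trip check: the JSON text decodes back to the same settings.
assert json.loads(options_json) == physics_options
```

The resulting text can be saved alongside the notebook and loaded again with `json.loads` when the graph is rebuilt.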

Conclusion

Pyvis is a powerful tool to incorporate into any data analysis task dealing with relationships between data points or topological structures. With a familiar approach to building network data structures, you have seen that visualizing the results is quick and easy. Additional customization is supported to accommodate specific layout needs as well, all within a Jupyter Notebook environment.

Pyvis is open source; feedback, feature requests, and contributions are welcome! Check out pyvis on GitHub, or read the documentation on Read the Docs.