Interactive Data Visualization in Python With Bokeh
Bokeh prides itself on being a library for interactive data visualization.
Unlike popular counterparts in the Python visualization space, like Matplotlib and Seaborn, Bokeh renders its graphics using HTML and JavaScript. This makes it a great candidate for building web-based dashboards and applications. However, it’s an equally powerful tool for exploring and understanding your data or creating beautiful custom charts for a project or report.
This tutorial uses a number of examples on a real-world dataset to get you up and running with Bokeh.
You’ll learn how to:
- Transform your data into visualizations, using Bokeh
- Customize and organize your visualizations
- Add interactivity to your visualizations
So let’s jump in.
From Data to Visualization
Building a visualization with Bokeh involves the following steps:
- Prepare the data
- Determine where the visualization will be rendered
- Set up the figure(s)
- Connect to and draw your data
- Organize the layout
- Preview and save your beautiful data creation
Let’s explore each step in more detail.
Prepare the Data
Any good data visualization starts with—you guessed it—data. If you need a quick refresher on handling data in Python, definitely check out the growing number of excellent Real Python tutorials on the subject.
This step commonly involves data handling libraries like Pandas and NumPy and is all about transforming your data into the form best suited to your intended visualization.
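As a rough illustration of this step, the sketch below uses Pandas to turn a few game results into a running win count, the kind of tidy, sorted table that is easy to plot later. The column names and values are invented for this example and are not part of the NBA dataset used later in the tutorial.

```python
import pandas as pd

# Hypothetical raw data: one row per game, deliberately out of order
games = pd.DataFrame({
    'date': ['2018-01-03', '2018-01-01', '2018-01-02'],
    'won': [1, 1, 0],
})

# Convert the date strings to datetimes and sort chronologically
games['date'] = pd.to_datetime(games['date'])
games = games.sort_values('date').reset_index(drop=True)

# Derive the running win total that a chart would actually display
games['wins_to_date'] = games['won'].cumsum()
```

Doing this kind of reshaping up front usually keeps the plotting code itself short.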
Determine Where the Visualization Will Be Rendered
At this step, you’ll determine how you want to generate and ultimately view your visualization. In this tutorial, you’ll learn about two common options that Bokeh provides: generating a static HTML file and rendering your visualization inline in a Jupyter Notebook.
Set up the Figure(s)
From here, you’ll assemble your figure, preparing the canvas for your visualization. In this step, you can customize everything from the titles to the tick marks. You can also set up a suite of tools that can enable various user interactions with your visualization.
Connect to and Draw Your Data
Next, you’ll use Bokeh’s multitude of renderers to give shape to your data. Here, you have the flexibility to draw your data from scratch using the many available marker and shape options, all of which are easily customizable. This functionality gives you incredible creative freedom in representing your data.
Additionally, Bokeh has some built-in functionality for building things like stacked bar charts and plenty of examples for creating more advanced visualizations like network graphs and maps.
Organize the Layout
If you need more than one figure to express your data, Bokeh’s got you covered. Not only does Bokeh offer the standard grid-like layout options, but it also allows you to easily organize your visualizations into a tabbed layout in just a few lines of code.
In addition, your plots can be quickly linked together, so a selection on one will be reflected on any combination of the others.
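Here is a minimal sketch of a two-figure layout, assuming nothing beyond Bokeh itself. It demonstrates only one simple form of linking, a shared x-range so that panning or zooming one figure moves the other; linked selections are not shown. The figure titles and data are made up for illustration.

```python
from bokeh.layouts import row
from bokeh.plotting import figure

x = [1, 2, 3]

# Two small figures; the second reuses the first figure's x-range object
left = figure(title='Left')
left.line(x, [1, 4, 9])

right = figure(title='Right', x_range=left.x_range)
right.line(x, [3, 2, 1])

# Arrange the figures side by side
layout = row(left, right)
```

Passing layout to show() instead of a single figure should render both plots next to each other.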
Preview and Save Your Beautiful Data Creation
Finally, it’s time to see what you created.
Whether you’re viewing your visualization in a browser or notebook, you’ll be able to explore your visualization, examine your customizations, and play with any interactions that were added.
If you like what you see, you can save your visualization to an image file. Otherwise, you can revisit the steps above as needed to bring your data vision to reality.
That’s it! Those six steps are the building blocks for a tidy, flexible template that can be used to take your data from the table to the big screen:
"""Bokeh Visualization Template

This template is a general outline for turning your data into a
visualization using Bokeh.
"""
# Data handling
import pandas as pd
import numpy as np

# Bokeh libraries
from bokeh.io import output_file, output_notebook
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.layouts import row, column, gridplot
from bokeh.models.widgets import Tabs, Panel

# Prepare the data

# Determine where the visualization will be rendered
output_file('filename.html')  # Render to static HTML, or
output_notebook()  # Render inline in a Jupyter Notebook

# Set up the figure(s)
fig = figure()  # Instantiate a figure() object

# Connect to and draw the data

# Organize the layout

# Preview and save
show(fig)  # See what I made, and save if I like it
The template previews a few code snippets that are common to each step, and you’ll see how to fill in the rest as you move through the tutorial!
Generating Your First Figure
There are multiple ways to output your visualization in Bokeh. In this tutorial, you’ll see these two options:
- output_file('filename.html') will write the visualization to a static HTML file.
- output_notebook() will render your visualization directly in a Jupyter Notebook.
It’s important to note that neither function will actually show you the visualization. That doesn’t happen until show() is called. However, they will ensure that, when show() is called, the visualization appears where you intend it to.
If you call both output_file() and output_notebook() in the same execution, the visualization will be rendered both to a static HTML file and inline in the notebook. However, if for whatever reason you run multiple output_file() commands in the same execution, only the last one will be used for rendering.
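The "last one wins" behavior can be checked with a short sketch like the one below. It uses save() rather than show() so that no browser window opens, and it writes into a temporary directory; both choices, along with the file names, are assumptions made for this example.

```python
import os
import tempfile

from bokeh.io import output_file, reset_output, save
from bokeh.plotting import figure

# Start from a clean output state
reset_output()

fig = figure(title='Last output_file() wins')
fig.line([1, 2, 3], [1, 2, 3])

# Write into a temporary directory so nothing is left in the working directory
out_dir = tempfile.mkdtemp()

# Two output_file() calls in the same execution: only the second one is used
output_file(os.path.join(out_dir, 'first.html'))
output_file(os.path.join(out_dir, 'second.html'))

# save() writes the HTML file without opening a browser and returns its path
saved_path = save(fig)
print(os.path.basename(saved_path))
```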
This is a great opportunity to give you your first glimpse at a default Bokeh figure() using output_file():
# Bokeh Libraries
from bokeh.io import output_file
from bokeh.plotting import figure, show

# The figure will be rendered in a static HTML file called output_file_test.html
output_file('output_file_test.html',
            title='Empty Bokeh Figure')

# Set up a generic figure() object
fig = figure()

# See what it looks like
show(fig)
As you can see, a new browser window opened with a tab called Empty Bokeh Figure and an empty figure. Not shown is the file generated with the name output_file_test.html in your current working directory.
If you were to run the same code snippet with output_notebook() in place of output_file(), assuming you have a Jupyter Notebook fired up and ready to go, you will get the following:
# Bokeh Libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# The figure will be right in my Jupyter Notebook
output_notebook()

# Set up a generic figure() object
fig = figure()

# See what it looks like
show(fig)
As you can see, the result is the same, just rendered in a different location.
More information about both output_file() and output_notebook() can be found in the Bokeh official docs.
Note: Sometimes, when rendering multiple visualizations sequentially, you’ll see that past renders are not being cleared with each execution. If you experience this, import and run the following between executions:
# Import reset_output (only needed once)
from bokeh.plotting import reset_output

# Use reset_output() between subsequent show() calls, as needed
reset_output()
Before moving on, you may have noticed that the default Bokeh figure comes pre-loaded with a toolbar. This is an important sneak preview into the interactive elements of Bokeh that come right out of the box. You’ll find out more about the toolbar and how to configure it in the Adding Interaction section at the end of this tutorial.
Getting Your Figure Ready for Data
Now that you know how to create and view a generic Bokeh figure either in a browser or Jupyter Notebook, it’s time to learn more about how to configure the figure() object.
The figure() object is not only the foundation of your data visualization but also the object that unlocks all of Bokeh’s available tools for visualizing data. The Bokeh figure is a subclass of the Bokeh Plot object, which provides many of the parameters that make it possible to configure the aesthetic elements of your figure.
To show you just a glimpse into the customization options available, let’s create the ugliest figure ever:
# Bokeh Libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# The figure will be rendered inline in my Jupyter Notebook
output_notebook()

# Example figure
fig = figure(background_fill_color='gray',
             background_fill_alpha=0.5,
             border_fill_color='blue',
             border_fill_alpha=0.25,
             plot_height=300,
             plot_width=500,
             h_symmetry=True,
             x_axis_label='X Label',
             x_axis_type='datetime',
             x_axis_location='above',
             x_range=('2018-01-01', '2018-06-30'),
             y_axis_label='Y Label',
             y_axis_type='linear',
             y_axis_location='left',
             y_range=(0, 100),
             title='Example Figure',
             title_location='right',
             toolbar_location='below',
             tools='save')

# See what it looks like
show(fig)
Once the figure() object is instantiated, you can still configure it after the fact. Let’s say you want to get rid of the gridlines:
# Remove the gridlines from the figure() object
fig.grid.grid_line_color = None

# See what it looks like
show(fig)
The gridline properties are accessible via the figure’s grid attribute. In this case, setting grid_line_color to None effectively removes the gridlines altogether. More details about figure attributes can be found below the fold in the Plot class documentation.
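The same pattern applies to other parts of the figure. The sketch below changes a few properties after creation and reads the grid colors back to confirm the change; the title and label text are placeholders chosen for this example.

```python
from bokeh.plotting import figure

fig = figure(title='Before')

# Properties can be changed after the figure has been created
fig.title.text = 'After'
fig.xaxis.axis_label = 'X Label'
fig.grid.grid_line_color = None  # Applies to both the x and y grids

# Individual grid objects are reachable by indexing xgrid / ygrid
x_grid_color = fig.xgrid[0].grid_line_color
y_grid_color = fig.ygrid[0].grid_line_color
```

Attributes like grid, xaxis, and yaxis act on every matching object at once, which is why a single assignment to fig.grid clears both sets of gridlines.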
Note: If you’re working in a notebook or IDE with auto-complete functionality, this feature can definitely be your friend! With so many customizable elements, it can be very helpful in discovering the available options.
Otherwise, doing a quick web search, with the keyword bokeh and what you are trying to do, will generally point you in the right direction.
There is tons more I could touch on here, but don’t feel like you’re missing out. I’ll make sure to introduce different figure tweaks as the tutorial progresses. Here are some other helpful links on the topic:
- The Bokeh Plot class is the superclass of the figure() object, from which figures inherit a lot of their attributes.
- The figure class documentation is a good place to find more detail about the arguments of the figure() object.
Here are a few specific customization options worth checking out:
- Text Properties covers all the attributes related to changing font styles, sizes, colors, and so forth.
- TickFormatters are built-in objects specifically for formatting your axes using Python-like string formatting syntax.
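As one possible example of a tick formatter, the sketch below attaches a NumeralTickFormatter to the y-axis so that large values are shown with thousands separators. The data is made up, and this is only one of several formatter classes Bokeh provides.

```python
from bokeh.models import NumeralTickFormatter
from bokeh.plotting import figure

fig = figure(title='Formatted y-axis')
fig.line([1, 2, 3], [1000, 25000, 1500000])

# Show y-axis tick labels with thousands separators, e.g. 25,000
fig.yaxis.formatter = NumeralTickFormatter(format='0,0')
```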
Sometimes, it isn’t clear how your figure needs to be customized until it actually has some data visualized in it, so next you’ll learn how to make that happen.
Drawing Data With Glyphs
An empty figure isn’t all that exciting, so let’s look at glyphs: the building blocks of Bokeh visualizations. A glyph is a vectorized graphical shape or marker that is used to represent your data, like a circle or square. More examples can be found in the Bokeh gallery. After you create your figure, you are given access to a bevy of configurable glyph methods.
Let’s start with a very basic example, drawing some points on an x-y coordinate grid:
# Bokeh Libraries
from bokeh.io import output_file
from bokeh.plotting import figure, show

# My x-y coordinate data
x = [1, 2, 1]
y = [1, 1, 2]

# Output the visualization to a static HTML file
output_file('first_glyphs.html', title='First Glyphs')

# Create a figure with no toolbar and axis ranges of [0,3]
fig = figure(title='My Coordinates',
             plot_height=300, plot_width=300,
             x_range=(0, 3), y_range=(0, 3),
             toolbar_location=None)

# Draw the coordinates as circles
fig.circle(x=x, y=y,
           color='green', size=10, alpha=0.5)

# Show plot
show(fig)
Once your figure is instantiated, you can see how it can be used to draw the x-y coordinate data using customized circle glyphs.
Here are a few categories of glyphs:
- Marker includes shapes like circles, diamonds, squares, and triangles and is effective for creating visualizations like scatter and bubble charts.
- Line covers things like single, step, and multi-line shapes that can be used to build line charts.
- Bar/Rectangle shapes can be used to create traditional or stacked bar (hbar) and column (vbar) charts as well as waterfall or gantt charts.
Information about the glyphs above, as well as others, can be found in Bokeh’s Reference Guide.
These glyphs can be combined as needed to fit your visualization needs. Let’s say I want to create a visualization that shows how many words I wrote per day to make this tutorial, with an overlaid trend line of the cumulative word count:
import numpy as np

# Bokeh libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# My word count data
day_num = np.linspace(1, 10, 10)
daily_words = [450, 628, 488, 210, 287, 791, 508, 639, 397, 943]
cumulative_words = np.cumsum(daily_words)

# Output the visualization directly in the notebook
output_notebook()

# Create a figure with labeled axes and a fixed y-axis range
fig = figure(title='My Tutorial Progress',
             plot_height=400, plot_width=700,
             x_axis_label='Day Number', y_axis_label='Words Written',
             x_minor_ticks=2, y_range=(0, 6000),
             toolbar_location=None)

# The daily words will be represented as vertical bars (columns)
fig.vbar(x=day_num, bottom=0, top=daily_words,
         color='blue', width=0.75,
         legend='Daily')

# The cumulative sum will be a trend line
fig.line(x=day_num, y=cumulative_words,
         color='gray', line_width=1,
         legend='Cumulative')

# Put the legend in the upper left corner
fig.legend.location = 'top_left'

# Let's check it out
show(fig)
To combine the columns and lines on the figure, they are simply created using the same figure() object.
Additionally, you can see above how seamlessly a legend can be created by setting the legend property for each glyph. The legend was then moved to the upper left corner of the plot by assigning 'top_left' to fig.legend.location.
You can check out much more info about styling legends. Teaser: they will show up again later in the tutorial when we start digging into interactive elements of the visualization.
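One caveat if you are on a newer Bokeh release: the legend keyword used in this tutorial's glyph calls was later deprecated in favor of legend_label. The sketch below assumes such a release and uses a shortened, made-up version of the word count data.

```python
from bokeh.plotting import figure

days = [1, 2, 3]
daily = [450, 628, 488]
cumulative = [450, 1078, 1566]

fig = figure(title='Legend sketch')

# legend_label replaces the older legend keyword in newer Bokeh releases
fig.vbar(x=days, bottom=0, top=daily, width=0.75,
         color='blue', legend_label='Daily')
fig.line(x=days, y=cumulative,
         color='gray', legend_label='Cumulative')

# The legend object is available once a glyph carries a legend label
fig.legend.location = 'top_left'
```

If your installed version still accepts legend, the tutorial's code should work as written; otherwise swapping in legend_label is likely the only change needed for the legend.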
A Quick Aside About Data
Anytime you are exploring a new visualization library, it’s a good idea to start with some data in a domain you are familiar with. The beauty of Bokeh is that nearly any idea you have should be possible. It’s just a matter of how you want to leverage the available tools to do so.
The remaining examples will use publicly available data from Kaggle, which has information about the National Basketball Association’s (NBA) 2017-18 season, specifically:
- 2017-18_playerBoxScore.csv: game-by-game snapshots of player statistics
- 2017-18_teamBoxScore.csv: game-by-game snapshots of team statistics
- 2017-18_standings.csv: daily snapshots of team standings
This data has nothing to do with what I do for work, but I love basketball and enjoy thinking about ways to visualize the ever-growing amount of data associated with it.
If you don’t have data to play with from school or work, think about something you’re interested in and try to find some data related to that. It will go a long way in making both the learning and the creative process faster and more enjoyable!
To follow along with the examples in the tutorial, you can download the datasets from the links above and read them into a Pandas DataFrame using the following commands:
import pandas as pd

# Read the csv files
player_stats = pd.read_csv('2017-18_playerBoxScore.csv', parse_dates=['gmDate'])
team_stats = pd.read_csv('2017-18_teamBoxScore.csv', parse_dates=['gmDate'])
standings = pd.read_csv('2017-18_standings.csv', parse_dates=['stDate'])
This code snippet reads the data from the three CSV files and automatically interprets the date columns as datetime objects.
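If you want to see what parse_dates does without downloading anything, here is a small sketch that reads a two-row CSV from memory. The rows and the teamPTS column are illustrative stand-ins and are not taken from the actual files.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for one of the CSV files (illustrative values only)
csv_text = "gmDate,teamAbbr,teamPTS\n2017-10-17,GS,121\n2017-10-17,HOU,122\n"

# Without parse_dates, the gmDate column would be read as plain strings
games = pd.read_csv(io.StringIO(csv_text), parse_dates=['gmDate'])
print(games.dtypes)
```

Having real datetime values matters later, when a figure is created with x_axis_type='datetime'.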
It’s now time to get your hands on some real data.
Using the ColumnDataSource Object
The examples above used Python lists and Numpy arrays to represent the data, and Bokeh is well equipped to handle these datatypes. However, when it comes to data in Python, you are most likely going to come across Python dictionaries and Pandas DataFrames, especially if you’re reading in data from a file or external data source.
Bokeh is well equipped to work with these more complex data structures and even has built-in functionality to handle them, namely the ColumnDataSource.
You may be asking yourself, “Why use a ColumnDataSource when Bokeh can interface with other data types directly?”
For one, whether you reference a list, array, dictionary, or DataFrame directly, Bokeh is going to turn it into a ColumnDataSource behind the scenes anyway. More importantly, the ColumnDataSource makes it much easier to implement Bokeh’s interactive affordances.
The ColumnDataSource is foundational in passing the data to the glyphs you are using to visualize. Its primary functionality is to map names to the columns of your data. This makes it easier for you to reference elements of your data when building your visualization. It also makes it easier for Bokeh to do the same when building your visualization.
The ColumnDataSource can interpret three types of data objects:
- Python dict: The keys are names associated with the respective value sequences (lists, arrays, and so forth).
- Pandas DataFrame: The columns of the DataFrame become the reference names for the ColumnDataSource.
- Pandas groupby: The columns of the ColumnDataSource reference the columns as seen by calling groupby.describe().
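The three input types can be compared side by side with a sketch like the following. The team and wins values are made up for illustration, and printing column_names is just one convenient way to inspect what each source ends up containing.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up data for illustration only
data = {'team': ['A', 'A', 'B', 'B'], 'wins': [1, 2, 0, 1]}
df = pd.DataFrame(data)

# 1. From a dict: the keys become the column names
dict_cds = ColumnDataSource(data=data)

# 2. From a DataFrame: the DataFrame columns (plus its index) become columns
df_cds = ColumnDataSource(df)

# 3. From a groupby: columns follow groupby.describe(), such as 'wins_mean'
group_cds = ColumnDataSource(df.groupby('team'))

print(dict_cds.column_names)
print(df_cds.column_names)
print(group_cds.column_names)
```

Whatever names appear in column_names are the strings you can pass to glyph methods together with source=.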
Let’s start by visualizing the race for first place in the NBA’s Western Conference in 2017-18 between the defending champion Golden State Warriors and the challenger Houston Rockets. The daily win-loss records of these two teams are stored in a DataFrame named west_top_2:
>>> west_top_2 = (standings[(standings['teamAbbr'] == 'HOU') | (standings['teamAbbr'] == 'GS')]
...               .loc[:, ['stDate', 'teamAbbr', 'gameWon']]
...               .sort_values(['teamAbbr','stDate']))
>>> west_top_2.head()
        stDate teamAbbr  gameWon
9   2017-10-17       GS        0
39  2017-10-18       GS        0
69  2017-10-19       GS        0
99  2017-10-20       GS        1
129 2017-10-21       GS        1
From here, you can load this DataFrame into two ColumnDataSource objects and visualize the race:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource

# Output to file
output_file('west-top-2-standings-race.html',
            title='Western Conference Top 2 Teams Wins Race')

# Isolate the data for the Rockets and Warriors
rockets_data = west_top_2[west_top_2['teamAbbr'] == 'HOU']
warriors_data = west_top_2[west_top_2['teamAbbr'] == 'GS']

# Create a ColumnDataSource object for each team
rockets_cds = ColumnDataSource(rockets_data)
warriors_cds = ColumnDataSource(warriors_data)

# Create and configure the figure
fig = figure(x_axis_type='datetime',
             plot_height=300, plot_width=600,
             title='Western Conference Top 2 Teams Wins Race, 2017-18',
             x_axis_label='Date', y_axis_label='Wins',
             toolbar_location=None)

# Render the race as step lines
fig.step('stDate', 'gameWon',
         color='#CE1141', legend='Rockets',
         source=rockets_cds)
fig.step('stDate', 'gameWon',
         color='#006BB6', legend='Warriors',
         source=warriors_cds)

# Move the legend to the upper left corner
fig.legend.location = 'top_left'

# Show the plot
show(fig)
Notice how the respective ColumnDataSource objects are referenced when creating the two lines. You simply pass the original column names as input parameters and specify which ColumnDataSource to use via the source property.
The visualization shows the tight race throughout the season, with the Warriors building a pretty big cushion around the middle of the season. However, a bit of a late-season slide allowed the Rockets to catch up and ultimately surpass the defending champs to finish the season as the Western Conference number-one seed.
Note: In Bokeh, you can specify colors either by name, hex value, or RGB color code.
For the visualization above, a color is being specified for the respective lines representing the two teams. Instead of using CSS color names like 'red' for the Rockets and 'blue' for the Warriors, you might have wanted to add a nice visual touch by using the official team colors in the form of hex color codes. Alternatively, you could have used tuples representing RGB color codes: (206, 17, 65) for the Rockets, (0, 107, 182) for the Warriors.
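To see that the hex codes and RGB tuples above describe the same colors, you can convert one form to the other. The rgb_to_hex helper below is a hypothetical function written for this note and is not part of Bokeh.

```python
def rgb_to_hex(rgb):
    """Convert an (R, G, B) tuple of 0-255 integers to a hex color string."""
    red, green, blue = rgb
    return '#{:02X}{:02X}{:02X}'.format(red, green, blue)


# The team colors used in the standings example
rockets_hex = rgb_to_hex((206, 17, 65))
warriors_hex = rgb_to_hex((0, 107, 182))
print(rockets_hex, warriors_hex)
```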
ColumnDataSource objects can do more than just serve as an easy way to reference DataFrame columns. The ColumnDataSource object has three built-in filters that can be used to create views on your data using a CDSView object:
- GroupFilter selects rows from a ColumnDataSource based on a categorical reference value
- IndexFilter filters the ColumnDataSource via a list of integer indices
- BooleanFilter allows you to use a list of boolean values, with True rows being selected
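As a rough sketch, each of the three filters can be constructed as shown below, using a made-up list of team abbreviations in place of a real column. Only the filter objects are built here; how they are attached to a CDSView may differ between Bokeh versions, so treat this as an illustration of the filter arguments rather than a complete recipe.

```python
from bokeh.models import BooleanFilter, GroupFilter, IndexFilter

# Made-up column of team abbreviations, standing in for a data source column
teams = ['HOU', 'GS', 'HOU', 'GS']

# Categorical match on a named column of the ColumnDataSource
group_filter = GroupFilter(column_name='teamAbbr', group='HOU')

# Explicit row positions
index_filter = IndexFilter(indices=[0, 2])

# One boolean per row; True rows are kept
boolean_filter = BooleanFilter(booleans=[team == 'HOU' for team in teams])
```

For this toy column, all three filters describe the same two rows, which shows that the choice between them is mostly a matter of which form your selection criteria already take.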
In the previous example, two ColumnDataSource objects were created, one each from a subset of the west_top_2 DataFrame. The next example will recreate the same output from one ColumnDataSource based on all of west_top_2 using a GroupFilter that creates a view on the data:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource, CDSView, GroupFilter

# Output to file
output_file('west-top-2-standings-race.html',
            title='Western Conference Top 2 Teams Wins Race')

# Create a ColumnDataSource
west_cds = ColumnDataSource(west_top_2)

# Create views for each team
rockets_view = CDSView(source=west_cds,
                       filters=[GroupFilter(column_name='teamAbbr', group='HOU')])
warriors_view = CDSView(source=west_cds,
                        filters=[GroupFilter(column_name='teamAbbr', group='GS')])

# Create and configure the figure
west_fig = figure(x_axis_type='datetime',
                  plot_height=300, plot_width=600,
                  title='Western Conference Top 2 Teams Wins Race, 2017-18',
                  x_axis_label='Date', y_axis_label='Wins',
                  toolbar_location=None)

# Render the race as step lines
west_fig.step('stDate', 'gameWon',
              source=west_cds, view=rockets_view,
              color='#CE1141', legend='Rockets')
west_fig.step('stDate', 'gameWon',
              source=west_cds, view=warriors_view,
              color='#006BB6', legend='Warriors')

# Move the legend to the upper left corner
west_fig.legend.location = 'top_left'

# Show the plot
show(west_fig)
Notice how the GroupFilter is passed to CDSView in a list. This allows you to combine multiple filters to isolate exactly the data you need from the ColumnDataSource.
For information about integrating data sources, check out the Bokeh user guide’s post on the ColumnDataSource and other source objects available.
The Western Conference ended up being an exciting race, but say you want to see if the Eastern Conference was just as tight. Not only that, but you’d like to view them in a single visualization. This is a perfect segue to the next topic: layouts.
Organizing Multiple Visualizations With Layouts
The Eastern Conference standings came down to two rivals in the Atlantic Division: the Boston Celtics and the Toronto Raptors. Before replicating the steps used to create west_top_2, let’s try to put the ColumnDataSource to the test one more time using what you learned above.
In this example, you’ll see how to feed an entire DataFrame into a ColumnDataSource and create views to isolate the relevant data:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource, CDSView, GroupFilter

# Output to file
output_file('east-top-2-standings-race.html',
            title='Eastern Conference Top 2 Teams Wins Race')

# Create a ColumnDataSource
standings_cds = ColumnDataSource(standings)

# Create views for each team
celtics_view = CDSView(source=standings_cds,
                       filters=[GroupFilter(column_name='teamAbbr', group='BOS')])
raptors_view = CDSView(source=standings_cds,
                       filters=[GroupFilter(column_name='teamAbbr', group='TOR')])

# Create and configure the figure
east_fig = figure(x_axis_type='datetime',
                  plot_height=300, plot_width=600,
                  title='Eastern Conference Top 2 Teams Wins Race, 2017-18',
                  x_axis_label='Date', y_axis_label='Wins',
                  toolbar_location=None)

# Render the race as step lines
east_fig.step('stDate', 'gameWon',
              color='#007A33', legend='Celtics',
              source=standings_cds, view=celtics_view)
east_fig.step('stDate', 'gameWon',
              color='#CE1141', legend='Raptors',
              source=standings_cds, view=raptors_view)

# Move the legend to the upper left corner
east_fig.legend.location = 'top_left'

# Show the plot
show(east_fig)
The ColumnDataSource was able to isolate the relevant data within a 5,040-by-39 DataFrame without breaking a sweat, saving a few lines of Pandas code in the process.
Looking at the visualization, you can see that the Eastern Conference race was no slouch. After the Celtics roared out of the gate, the Raptors clawed all the way back to overtake their division rival and finish the regular season with five more wins.
With our two