New Plot Types in Seaborn’s Latest Release

阿新 • Published: 2018-12-29

scatterplot and lineplot examples

For this article, I will use a small data set showing the number of traffic fatalities by county in the state of Minnesota. I am including only the top 10 counties and have added some data columns that I thought might be interesting and would showcase how seaborn supports rapid visualization of different relationships. The base data was taken from the NHTSA web site and augmented with data from the MN State demographic center.

County	Twin_Cities	Pres_Election	Public_Transport(%)	Travel_Time	Population	2012	2013	2014	2015	2016
0	Hennepin	Yes	Clinton	7.2	23.2	1237604	33	42	34	33	45
1	Dakota	Yes	Clinton	3.3	24.0	418432	19	19	10	11	28
2	Anoka	Yes	Trump	3.4	28.2	348652	25	12	16	11	20
3	St. Louis	No	Clinton	2.4	19.5	199744	11	19	8	16	19
4	Ramsey	Yes	Clinton	6.4	23.6	540653	19	12	12	18	15
5	Washington	Yes	Clinton	2.3	25.8	253128	8	10	8	12	13
6	Olmsted	No	Clinton	5.2	17.5	153039	2	12	8	14	12
7	Cass	No	Trump	0.9	23.3	28895	6	5	6	4	10
8	Pine	No	Trump	0.8	30.3	28879	14	7	4	9	10
9	Becker	No	Trump	0.5	22.7	33766	4	3	3	1	9

Here’s a quick overview of the non-obvious columns:

Twin_Cities: The cities of Minneapolis and St. Paul are frequently combined and called the Twin Cities. As the largest metro area in the state, I thought it would be interesting to see if there were any differences across this category.
Pres_Election: Another categorical variable that shows which candidate won that county in the 2016 Presidential election.
Public_Transport(%): The percentage of the population that uses public transportation.
Travel_Time: The mean travel time to work for individuals in that county.
2012 - 2016: The number of traffic fatalities in that year.

If you want to play with the data yourself, it’s available in the repo along with the notebook.

Let’s get started with the imports and data loading:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

sns.set()
df = pd.read_csv("https://raw.githubusercontent.com/chris1610/pbpython/master/data/MN_Traffic_Fatalities.csv")

These are the basic imports we need. Of note is that recent versions of seaborn do not automatically set the style. That’s why I explicitly use sns.set() to turn on the seaborn styles. Finally, let’s read in the CSV file from GitHub.

Before we get into relplot(), we will show the basic usage of scatterplot() and lineplot(), and then explain how to use the more powerful relplot() to draw these types of plots across different rows and columns.

For the first simple example, let’s look at the relationship between the 2016 fatalities and the average Travel_Time. In addition, let’s identify the data based on the Pres_Election column.

sns.scatterplot(x='2016', y='Travel_Time', style='Pres_Election', data=df)

There are a couple of things to note from this example:

By using a pandas dataframe, we can just pass in the column names to define the X and Y variables.
We can use the same column name approach to alter the marker style.
Seaborn takes care of picking a marker style and adding a legend.
This approach supports easily changing the views in order to explore the data.
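To make the "easily changing the views" point concrete, here is a minimal sketch that maps two columns at once: color from Twin_Cities and marker style from Pres_Election. The `sample` frame is a hand-copied subset of the table above (an assumption made so the snippet runs without downloading the CSV), and the Agg backend is selected only so it works without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set()

# Five rows copied by hand from the table above (illustrative subset only).
sample = pd.DataFrame({
    "County": ["Hennepin", "Dakota", "Anoka", "St. Louis", "Cass"],
    "Twin_Cities": ["Yes", "Yes", "Yes", "No", "No"],
    "Pres_Election": ["Clinton", "Clinton", "Trump", "Clinton", "Trump"],
    "Travel_Time": [23.2, 24.0, 28.2, 19.5, 23.3],
    "2016": [45, 28, 20, 19, 10],
})

fig, ax = plt.subplots()
# hue controls the color, style controls the marker shape.
sns.scatterplot(x="2016", y="Travel_Time", hue="Twin_Cities",
                style="Pres_Election", data=sample, ax=ax)

# Collect the legend text seaborn generated for both mappings.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Swapping which column goes to hue or style is a one-word change, which is what makes this kind of exploration quick.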

If we’d like to look at the variation by county population:

sns.scatterplot(x='2016', y='Travel_Time', size='Population', data=df)

In this case, Seaborn buckets the population into 4 categories and adjusts the size of the circle based on that county’s population. A little later in the article, I will show how to adjust the size of the circles so they are larger.
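As a rough sketch of one way to control the circle sizes, scatterplot() accepts a sizes argument giving the smallest and largest marker area. The (50, 400) range and the hand-copied `sample` frame below are assumptions chosen for illustration, not values from the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Five rows copied by hand from the table above (illustrative subset only).
sample = pd.DataFrame({
    "County": ["Hennepin", "Dakota", "Anoka", "St. Louis", "Cass"],
    "Travel_Time": [23.2, 24.0, 28.2, 19.5, 23.3],
    "Population": [1237604, 418432, 348652, 199744, 28895],
    "2016": [45, 28, 20, 19, 10],
})

fig, ax = plt.subplots()
# sizes=(min, max) sets the marker-area range that Population is mapped onto.
sns.scatterplot(x="2016", y="Travel_Time", size="Population",
                sizes=(50, 400), data=sample, ax=ax)

# The first collection on the axes holds the plotted points.
point_sizes = ax.collections[0].get_sizes()
```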

Before we go any further, we need to create a new data frame that contains the data in tidy format. In the original data frame, there is a column for each year that contains the relevant traffic fatality value. Seaborn works much better if the data has one row per county and year, with the year in a Year column and the count in a Fatalities column.

The pandas melt function makes this transformation easy:

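# Keep the descriptive columns as identifiers and stack the year columns
# into Year/Fatalities pairs.
df_melted = pd.melt(df, id_vars=['County', 'Twin_Cities', 'Pres_Election',
                                 'Public_Transport(%)', 'Travel_Time', 'Population'],
                    value_vars=['2016', '2015', '2014', '2013', '2012'],
                    value_name='Fatalities',
                    var_name='Year'  # a plain string; recent pandas rejects a list here
                   )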

Here’s what the data looks like for Hennepin County:

County	Twin_Cities	Pres_Election	Public_Transport(%)	Travel_Time	Population	Year	Fatalities
0	Hennepin	Yes	Clinton	7.2	23.2	1237604	2016	45
10	Hennepin	Yes	Clinton	7.2	23.2	1237604	2015	33
20	Hennepin	Yes	Clinton	7.2	23.2	1237604	2014	34
30	Hennepin	Yes	Clinton	7.2	23.2	1237604	2013	42
40	Hennepin	Yes	Clinton	7.2	23.2	1237604	2012	33

If this is a little confusing, here is an illustration of what happened:
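A toy version of the same reshaping may help. This sketch melts a two-county, two-year frame (values copied from the table above; the names `wide` and `long` are just illustrative) so that each year column becomes a pair of Year and Fatalities values:

```python
import pandas as pd

# Wide format: one column per year.
wide = pd.DataFrame({
    "County": ["Hennepin", "Dakota"],
    "2015": [33, 11],
    "2016": [45, 28],
})

# Long (tidy) format: one row per county and year.
long = pd.melt(wide, id_vars=["County"], value_vars=["2015", "2016"],
               var_name="Year", value_name="Fatalities")
print(long)
```

The two year columns disappear, and the frame grows from 2 rows to 4: every (County, Year) combination gets its own row.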

Now that we have the data in tidy format, we can see what the trend of fatalities looks like over time using the new lineplot() function:

sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities')

This example introduces the hue keyword, which changes the color of the line based on the value in the Twin_Cities column. This plot also shows the statistical background inherent in Seaborn plots. Because each year has several counties in each group, lineplot() draws the mean of those observations as the line, and the shaded band is a confidence interval (95% by default) around that mean. Due to the small number of samples, this interval is large.
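To see roughly where the line and the band come from, the sketch below computes the group means for 2016 by hand and estimates a 95% interval with a simple percentile bootstrap. The `bootstrap_ci` helper is hypothetical and written only for illustration; seaborn's internal procedure may differ in detail, so the numbers will not match the plot exactly.

```python
import numpy as np
import pandas as pd

# 2016 fatalities from the table above, in the order:
# Hennepin, Dakota, Anoka, Ramsey, Washington (Twin Cities),
# then St. Louis, Olmsted, Cass, Pine, Becker (not Twin Cities).
fatalities_2016 = pd.DataFrame({
    "Twin_Cities": ["Yes"] * 5 + ["No"] * 5,
    "Fatalities": [45, 28, 20, 15, 13, 19, 12, 10, 10, 9],
})

# The line in the plot passes through the per-group mean for each year.
means = fatalities_2016.groupby("Twin_Cities")["Fatalities"].mean()

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap interval for the mean (illustrative helper)."""
    values = np.asarray(values)
    # Resample with replacement and take the mean of each resample.
    resampled = rng.choice(values, size=(n_boot, len(values)), replace=True)
    boot_means = resampled.mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

intervals = {group: bootstrap_ci(rows["Fatalities"])
             for group, rows in fatalities_2016.groupby("Twin_Cities")}
```

With only five counties per group, each resample can change the mean noticeably, which is why the shaded band is so wide.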
