Awesome seaborn for Data Visualisation — Part 1

阿新 • • 發佈：2018-12-28

Awesome seaborn for Data Visualisation — Part 1

Data visualization is one of the most important parts of in Data Science. It is often said that a picture is worth a thousand words, and this is no different in Data Science. Often times, communicating relationships between variables in Data or other findings can be made easier with Data Visualizations. Many executives do not have time to read long reports or technical writings about machine learning projects. They need visualizations that can tell them all they need to know about data at one glance. This is where data visualization comes in. There are several Data Visualization tools that can help you turn data into insightful pictures. One such is the

Seaborn visualization package.

What is Seaborn?

Seaborn is a data visualization package for python that is based on the Matplotlib. Seaborn allows Data Scientists to make high-level and interactive visualizations that are better than anything Matplotlib offers. Users can also combine several variables and make visualizations more beautiful than regular Matplotlib.

In this tutorial, we will look at everything about getting started with Seaborn for data visualization. We will first look at how to plot single categorical or numerical variables with Seaborn. Thereafter, we will look at two variables on a single plot, then finally multivariate and 3D plots.

In performing this tutorial, we will be using two datasets:

A dataset containing various information about automobile vehicles
A dataset containing information about police killings in the United States of America

We will explore the relationship between various variables in both these datasets, all with the aid of the Seaborn package.

The first thing to do is to import the following libraries

Pandas for data manipulation
Matplotlib for basic visualization
Seaborn for improved and interactive visualization

import pandas as pd import seaborn as sns import matplotlib.pyplot as plt %matplotlib inline

The first dataset is a set of vehicle information. The second data set is information about police killings in the United States

#Import the data sets vehicles = pd.read_csv('vehicles.csv') police_killings = pd.read_csv('PoliceKillingsUS.csv', encoding = 'Windows-1252')

#Check the general info of the dataset police_killings.info() vehicles.info()

Below is the execution result:

<class 'pandas.core.frame.DataFrame'>RangeIndex: 2535 entries, 0 to 2534Data columns (total 13 columns):id                         2535 non-null int64name                       2535 non-null objectmanner_of_death            2535 non-null objectarmed                      2526 non-null objectage                        2458 non-null float64gender                     2535 non-null objectrace                       2340 non-null objectcity                       2535 non-null objectstate                      2535 non-null objectsigns_of_mental_illness    2535 non-null boolthreat_level               2535 non-null objectflee                       2470 non-null objectbody_camera                2535 non-null booldtypes: bool(2), float64(1), int64(1), object(9)memory usage: 222.9+ KB<class 'pandas.core.frame.DataFrame'>RangeIndex: 37843 entries, 0 to 37842Data columns (total 16 columns):barrels08         37843 non-null float64co2TailpipeGpm    37843 non-null float64cylinders         37720 non-null float64drive             36654 non-null objecteng_dscr          22440 non-null objectfuelCost08        37843 non-null int64fuelType          37843 non-null objectfuelType1         37843 non-null objectmake              37843 non-null objectmodel             37843 non-null objectmpgData           37843 non-null objectphevBlended       37843 non-null booltrany             37832 non-null objectVClass            37843 non-null objectyear              37843 non-null int64youSaveSpend      37843 non-null int64dtypes: bool(1), float64(3), int64(3), object(9)memory usage: 4.4+ MB

#get a glimpse of both datasetspolice_killings.head(3)vehicles.head(3)

Univariate Visualization

Univariate visualizations are used to describe plots that have single variables on them. These variables can either be categorical or numerical on Seaborn.

For Numerical Variables:

Seaborn allows us plot distribution plots with added features that beat any other visualization library. We can add Kernel Density Estimates Plots (KDE) and a rug of the actual values of the variables. In the example below, we look at the distribution of victims’ age in the police shooting dataset with a kernel density plot and a rug of the variable

'''visualize the dsitribution of victims age in the police shooting dataset with a kernel density plot and a rug of the variables'''

sns.distplot(police_killings['age'].dropna(), kde=True, rug=True)

We can also decide to plot only the KDE as shown below:

sns.distplot(vehicles['fuelCost08'].dropna(), hist = False)

We can also plot boxplots as seen below. The plot is a boxplot of barrels in the vehicles dataset

sns.boxplot(vehicles['barrels08'].dropna())

For Categorical Variables:

For categorical variables, we can use countplots in Seaborn to visualize the frequency of each category of a variable as shown below:

Here, we view a count of the type of wheel drives:

sns.countplot(y="drive", data=vehicles)

Here, we view a count of the manner of death in the police killings:

sns.countplot(x='manner_of_death', data=police_killings)

I will continue the post explaining more visualization techniques.

I have shared the iPython code here

See you next post ?

Awesome seaborn for Data Visualisation — Part 1

Awesome seaborn for Data Visualisation — Part 1

Awesome seaborn for Data Visualisation — Part 1

SAS Programming for R Users, Part 1 針對R使用者的SAS程式設計，第1部分 Lynda課程中文字幕

Advanced SAS Programming for R Users, Part 1 針對R使用者的高階SAS程式設計，第1部分 Lynda課程中文字幕

Scala for Data Science Engineering — Part 1

1.2 Why Python for Data Analysis（為什麼使用Python做資料分析）

Modern Data Lake with Minio : Part 1

Understanding Feature Engineering (Part 1) — Continuous Numeric Data

Work + Grad School: A Data Scientist’s Survival Guide: Part 1

Datalore 1.0: Intelligent Web Application for Data Analysis

Scala for Data Science Engineering — Part 2

New social infrastructure for an algorithmic world: Coping not Coding, Part 1

How to create role based accounts for your Saas App using FEAN? (Part 1)

Part 1: Build an Action to make Google Assistant find the food you are craving for.

Selecting Subsets of Data in Pandas: Part 1

Understanding Data and Machine Learning Models with Visualizations (Part 1)

Creating visualizations to better understand your data and models (Part 1)

Instant, Full-Text Search Engine for Dropbox (Part 1) | Dropbox Tech Blog

Java Card Technology for Smart Card's Architecture and Programmer's Guide (Zhiqun Chen)翻譯版(PART 1)

Store, Protect, Optimize Your Healthcare Data with AWS: Part 1

Lesson 2 Building your first web page: Part 1

Awesome seaborn for Data Visualisation — Part 1

Awesome seaborn for Data Visualisation — Part 1

相關推薦