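For data scientists whose technology stack is Python, Jupyter is a reporting tool that has to be mentioned. In the R community, the well-known ggplot2 is the usual visualization framework, but people are much less familiar with the visualization options for Python and for interactive reports built around Jupyter. This article compares several commonly used solutions to help you choose.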

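Selection criteria

Declarative or imperative

The charts that data practitioners use fall into three common categories: GIS visualization, network visualization, and statistical charts. In most scenarios, then, we do not want to touch very low-level commands based on points, lines, and areas, so choosing a framework with a good abstraction matters a great deal.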

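The abstraction generally regarded as the better one is based on the book The Grammar of Graphics (Statistics and Computing), and ggplot2 in R is essentially a good implementation of it. We can use these plotting commands almost as if we were writing natural language. Borrowing a term from computer science, we will call this style of plotting "declarative".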

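By contrast, in the following situations we may not care for this kind of plotting command:

The chart is quite simple and drawing speed matters; large frameworks tend to be heavier (only relatively speaking, of course);
We want to fine-tune details very precisely; in large frameworks fine-tuning tends to be relatively complicated, or degenerates into one command after another;
We are innovators in statistical graphics and want to try out new visualization practices.

In these cases, a framework that is simple to operate and exposes low-level drawing commands is clearly more pleasant to work with. As above, we borrow the term "imperative" to describe this kind of framework.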
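The difference between the two styles can be sketched without any plotting library. In the sketch below, `render`, the `spec` dictionary, and the toy rows are all hypothetical and exist only for illustration; they are not the API of ggplot2 or of any framework discussed here. The declarative side states which column maps to which visual channel, while the imperative side spells out every primitive by hand.

```python
# Hypothetical mini-renderer, for illustration only (not any library's API).

def render(spec, rows):
    """Turn a declarative spec into low-level ('point', x, y, color) commands."""
    palette = ["red", "green", "blue"]
    # One color per distinct value of the column mapped to "color".
    groups = sorted({row[spec["color"]] for row in rows})
    color_of = {g: palette[i % len(palette)] for i, g in enumerate(groups)}
    return [
        ("point", row[spec["x"]], row[spec["y"]], color_of[row[spec["color"]]])
        for row in rows
    ]

# Toy rows; the column names merely echo the mtcars dataset.
rows = [
    {"wt": 2.6, "mpg": 21.0, "gear": 4},
    {"wt": 3.2, "mpg": 21.4, "gear": 3},
    {"wt": 3.4, "mpg": 18.7, "gear": 3},
]

# Declarative: say WHAT maps to what; the renderer decides how to draw it.
spec = {"geom": "point", "x": "wt", "y": "mpg", "color": "gear"}
declarative = render(spec, rows)

# Imperative: issue every drawing primitive yourself.
imperative = [
    ("point", 2.6, 21.0, "green"),
    ("point", 3.2, 21.4, "red"),
    ("point", 3.4, 18.7, "red"),
]
```

Both lists describe the same drawing. The declarative version stays three lines long however many rows arrive, while the imperative version gives full control over each individual point.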

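Interactive or not

Unlike the traditional delivery of static charts, one major benefit of the web-based Jupyter is that it can draw interactive charts (R Notebook has recently implemented this too), so whether to go interactive is also a trade-off to weigh.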

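Advantages of interactive charts:

They can carry more data dimensions and information;
On the client side, users can do more, such as zooming, selecting, and exporting;
The corresponding JavaScript code can be handed to BI engineers for productionization;
They look flashier, which can be a reason to choose them depending on the report's audience.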

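Advantages of non-interactive charts:

Exporting the report directly to a static file is relatively trouble-free, and no information is lost in conversion;
Images can be separated from the report and reused, when needed, as deliverables for other work;
There is no need to spend a lot of time loading assorted front-end frameworks when running the Notebook.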
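As a minimal sketch of the "image separate from the report" point, a static chart can be rendered straight to PNG bytes and handed to any other workflow. The example assumes matplotlib is installed; the data and variable names are made up for illustration, and the `matplotlib.use("Agg")` line is only there so the snippet runs without a display (inside a notebook you would normally leave it out).

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Draw a small static chart.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# Render to an in-memory PNG instead of embedding it in the notebook.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
png_bytes = buffer.getvalue()
```

The resulting bytes can be written to a file, attached to an email, or dropped into another document, none of which needs a running kernel or a JavaScript runtime.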

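Kernel interaction or not

In Jupyter, most commands obtain their data from the kernel, and most plotting approaches simply display the output produced after code in the Notebook has interacted with the kernel. The ipywidgets framework, however, can bind code in a Code Cell to front-end widgets in the Notebook (buttons, for example) so that the widgets drive the kernel and produce different plots. In some plotting frameworks, every element of a chart can even interact with the kernel directly.

With these frameworks you can build more complex visualization applications in the Notebook. The drawback is that, because they depend on the kernel, the interactions stop working when the report is delivered or presented as an offline file.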
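A minimal sketch of this kind of binding, assuming the ipywidgets package is installed and the code runs in a notebook. `sine_samples` and `show_slider` are hypothetical names chosen for illustration; `interact` is the ipywidgets helper that builds a widget from a function's arguments.

```python
import math

def sine_samples(frequency=1.0):
    """Runs in the kernel: 8 samples of sin(frequency * x) for x in [0, 2*pi)."""
    return [round(math.sin(frequency * 2 * math.pi * i / 8), 3) for i in range(8)]

def show_slider():
    """Call from a notebook cell; requires the ipywidgets package."""
    from ipywidgets import interact  # imported lazily so the module loads without it
    # Each slider move sends the new value to the kernel, which re-runs sine_samples.
    return interact(sine_samples, frequency=(0.5, 5.0, 0.5))
```

Calling `show_slider()` in a cell displays a slider for `frequency`. Because every update is computed by the kernel, the slider does nothing once the notebook is exported to a static HTML file, which is exactly the drawback described above.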

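Framework overview

matplotlib

The best-known plotting framework is matplotlib, which provides the low-level commands underlying almost every static plotting framework in Python. By the classification above, matplotlib is a non-interactive, "imperative" plotting framework.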

Python

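## matplotlib example
import numpy as np
import matplotlib.pyplot as plt  # explicit imports instead of "from pylab import *"

# 256 evenly spaced points over [-pi, pi]
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
C, S = np.cos(X), np.sin(X)

plt.plot(X, C)  # cosine curve
plt.plot(X, S)  # sine curve

plt.show()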


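Its strengths are relative speed and a rich set of low-level operations. Its weaknesses are verbose syntax and a built-in default style that is not very attractive.

matplotlib needs some configuration in Jupyter to look its best; see this article for details.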
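The specific settings recommended by the referenced article are not reproduced here. As a rough sketch of what such configuration commonly looks like, the snippet below adjusts a few matplotlib defaults through `rcParams`. The chosen style and values are assumptions made for illustration, and the `matplotlib.use("Agg")` line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # only for running outside a notebook
import matplotlib.pyplot as plt

# In a notebook cell you would typically start with the magic: %matplotlib inline
plt.style.use("ggplot")                    # one of matplotlib's built-in style sheets
plt.rcParams["figure.figsize"] = (8, 5)    # default figure size in inches
plt.rcParams["figure.dpi"] = 120           # sharper inline images
plt.rcParams["axes.titlesize"] = "large"   # larger axes titles
```

Settings made this way apply to every figure created afterwards in the same session, so they usually live in the first cell of a notebook.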

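ggplot and plotnine

For people migrating from R, ggplot and plotnine are a godsend: they clone essentially all of ggplot2's syntax. Comparing the two, plotnine gives better results. Both packages are still built on matplotlib, so do not forget the %matplotlib inline statement when using them. plotnine has also ported ggplot2's well-designed configuration syntax and logic.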

Python

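## plotnine example
from plotnine import ggplot, aes, geom_point, stat_smooth, facet_wrap
from plotnine.data import mtcars  # sample dataset bundled with plotnine

(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear'))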


Seaborn

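Strictly speaking, seaborn is an extension package for matplotlib that adds many very useful wrappers on top of it. matplotlib plus seaborn covers most statistical plotting needs and most business scenarios, and the syntax is also more "declarative".

The drawback is the high level of abstraction: a chart the API does not offer essentially cannot be drawn at all, and it is poorly suited to combining different kinds of charts. In addition, the configuration syntax falls back to the "imperative" style, and is relatively complex and inconsistent.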

Python

## seaborn example
import seaborn as sns

sns.set(color_codes=True)
iris = sns.load_dataset("iris")   # fetches the sample dataset on first use
species = iris.pop("species")     # drop the non-numeric column before clustering
g = sns.clustermap(iris)


plotly

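plotly is a cross-platform interactive plotting package built on JavaScript. Because its developers work primarily in JavaScript, the whole syntax resembles writing JSON configuration, and its character falls between "declarative" and "imperative". The version that does not use the hosted service is free.

The advantages are a low learning cost and the ability to port code to the JavaScript version quickly; the drawback is that the syntax is relatively verbose.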
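To make the "JSON configuration" point concrete, the sketch below builds a figure description as plain Python dictionaries and serializes it with the standard library only. The `data`/`layout` split mirrors the structure plotly figures use; the specific values are a shortened, illustrative subset, and nothing here calls plotly itself.

```python
import json

# A figure is essentially nested dictionaries and lists: traces plus a layout.
figure = {
    "data": [
        {
            "type": "scatter",
            "x": ["January", "February", "March"],
            "y": [28.8, 28.5, 37.0],
            "name": "High 2014",
            "line": {"color": "rgb(205, 12, 24)", "width": 4},
        },
    ],
    "layout": {
        "title": "Average High Temperatures in New York",
        "xaxis": {"title": "Month"},
    },
}

# The same structure, as JSON text that a JavaScript front end could consume.
figure_json = json.dumps(figure, indent=2)
```

Because the description is just data, the same `data` and `layout` objects can be passed to the JavaScript library (for example via `Plotly.newPlot`) with little or no rewriting, which is what makes porting between the two languages quick.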

Python

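## plotly example
# plotly.plotly (the hosted-service client) moved to the separate chart_studio
# package in plotly 4; the offline module renders in the notebook without an account.
import plotly.offline as py
import plotly.graph_objs as go

py.init_notebook_mode()  # load the JavaScript library into the notebook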

# Add data
month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
         'August', 'September', 'October', 'November', 'December']
high_2000 = [32.5, 37.6, 49.9, 53.0, 69.1, 75.4, 76.5, 76.6, 70.7, 60.6, 45.1, 29.3]
low_2000 = [13.8, 22.3, 32.5, 37.2, 49.9, 56.1, 57.7, 58.3, 51.2, 42.8, 31.6, 15.9]
high_2007 = [36.5, 26.6, 43.6, 52.3, 71.5, 81.4, 80.5, 82.2, 76.0, 67.3, 46.1, 35.0]
low_2007 = [23.6, 14.0, 27.0, 36.8, 47.6, 57.7, 58.9, 61.2, 53.3, 48.5, 31.0, 23.6]
high_2014 = [28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9]
low_2014 = [12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1]

# Create and style traces
trace0 = go.Scatter(
    x = month,
    y = high_2014,
    name = 'High 2014',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4)
)
trace1 = go.Scatter(
    x = month,
    y = low_2014,
    name = 'Low 2014',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,)
)
trace2 = go.Scatter(
    x = month,
    y = high_2007,
    name = 'High 2007',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dash') # dash options include 'dash', 'dot', and 'dashdot'
)
trace3 = go.Scatter(
    x = month,
    y = low_2007,
    name = 'Low 2007',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,
        dash = 'dash')
)
trace4 = go.Scatter(
    x = month,
    y = high_2000,
    name = 'High 2000',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4,
        dash = 'dot')
)
trace5 = go.Scatter(
    x = month,
    y = low_2000,
    name = 'Low 2000',
    line = dict(
        color = ('rgb(22, 96, 167)'),
        width = 4,
        dash = 'dot')
)
data = [trace0, trace1, trace2, trace3, trace4, trace5]

# Edit the layout
layout = dict(title = 'Average High and Low Temperatures in New York',
              xaxis = dict(title = 'Month'),
              yaxis = dict(title = 'Temperature (degrees F)'),
              )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-line')


Note: to use this framework in Jupyter, you need to call init_notebook_mode() to load the JavaScript framework.

bokeh

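bokeh is a promising open-source interactive visualization framework maintained by PyData.

The framework offers both low-level statements and "declarative" plotting commands. The syntax is relatively clear, but its configuration statements share a problem common to many visualization frameworks: they do not match the "declarative" commands and lack a sensible structure. In addition, some common interactive effects are only available through low-level commands, which makes it rather inconvenient for quickly building a dashboard or a chart.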

Python

## Bokeh example
import numpy as np
import scipy.special

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file

p1 = figure(title="Normal Distribution (μ=0, σ=0.5)",tools="save",
            background_fill_color="#E8DDCB")

mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2

p1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
# legend_label replaces the older legend= keyword (Bokeh 1.4 and later)
p1.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p1.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p1.legend.location = "center_right"
p1.legend.background_fill_color = "darkgrey"
p1.xaxis.axis_label = 'x'
p1.yaxis.axis_label = 'Pr(x)'



p2 = figure(title="Log Normal Distribution (μ=0, σ=0.5)", tools="save",
            background_fill_color="#E8DDCB")

mu, sigma = 0, 0.5

measured = np.random.lognormal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8.0, 1000)
pdf = 1/(x* sigma * np.sqrt(2*np.pi)) * np.exp(-(np.log(x)-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((np.log(x)-mu)/(np.sqrt(2)*sigma)))/2

p2.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p2.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p2.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p2.legend.location = "center_right"
p2.legend.background_fill_color = "darkgrey"
p2.xaxis.axis_label = 'x'
p2.yaxis.axis_label = 'Pr(x)'



p3 = figure(title="Gamma Distribution (k=1, θ=2)", tools="save",
            background_fill_color="#E8DDCB")

k, theta = 1.0, 2.0

measured = np.random.gamma(k, theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 20.0, 1000)
pdf = x**(k-1) * np.exp(-x/theta) / (theta**k * scipy.special.gamma(k))
cdf = scipy.special.gammainc(k, x/theta)  # gammainc is already regularized, so no division by gamma(k)

p3.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p3.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p3.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p3.legend.location = "center_right"
p3.legend.background_fill_color = "darkgrey"
p3.xaxis.axis_label = 'x'
p3.yaxis.axis_label = 'Pr(x)'



p4 = figure(title="Weibull Distribution (λ=1, k=1.25)", tools="save",
            background_fill_color="#E8DDCB")

lam, k = 1, 1.25

measured = lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8, 1000)
pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k)
cdf = 1 - np.exp(-(x/lam)**k)

p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="#036564", line_color="#033649")
p4.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend_label="PDF")
p4.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend_label="CDF")

p4.legend.location = "center_right"
p4.legend.background_fill_color = "darkgrey"
p4.xaxis.axis_label = 'x'
p4.yaxis.axis_label = 'Pr(x)'



output_file('histogram.html', title="histogram.py example")

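# gridplot takes the figures as a list; Bokeh 3.x renames plot_width/plot_height to width/height
show(gridplot([p1, p2, p3, p4], ncols=2, plot_width=400, plot_height=400, toolbar_location=None))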
