python: Plotting with pandas (summary): plot formatting

Python for Data Analysis

Chapter 8: Plotting and Visualization

pandas plotting tools

22.5 Plot Formatting

22.5.1 Controlling the Legend

The legend is shown by default. Set the legend argument to False to hide it:
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt


>>> df = pd.DataFrame(np.random.randn(1000, 4), index=range(1,1001), columns=list('ABCD'))
>>> df = df.cumsum()
>>> df.plot(legend=False)
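To confirm the default, the following minimal sketch plots the same frame twice and checks whether a legend object was attached to each axes. It is not part of the original notes: the fixed seed, the Agg backend (chosen only so the script runs without a display) and the names ax_default and ax_hidden are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=range(1, 1001), columns=list("ABCD")).cumsum()

ax_default = df.plot()             # legend is drawn by default
ax_hidden = df.plot(legend=False)  # legend is suppressed

# Axes.get_legend() returns None when no legend is attached to the axes
has_default_legend = ax_default.get_legend() is not None
has_hidden_legend = ax_hidden.get_legend() is not None
print(has_default_legend, has_hidden_legend)
```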

22.5.2 Scales

Pass logy=True to get a log-scale Y axis, whose ticks are labeled as powers of 10:
>>> ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000',periods=1000))


>>> ts = np.exp(ts.cumsum())
>>> ts.plot(logy=True)
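The effect of logy can be inspected on the axes themselves. The sketch below is an illustration under stated assumptions (fixed seed, Agg backend, side-by-side axes named ax_linear and ax_log); it draws the same series on a linear and a logarithmic Y axis and reads back the scale of each.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(1000),
               index=pd.date_range("2000-01-01", periods=1000))
ts = np.exp(ts.cumsum())  # strictly positive values, so a log scale is valid

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ts.plot(ax=ax_linear, title="linear Y axis")
ts.plot(ax=ax_log, logy=True, title="logy=True")

linear_scale = ax_linear.get_yscale()
log_scale = ax_log.get_yscale()
print(linear_scale, log_scale)
```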

22.5.3 Plotting on a Secondary Y-axis

To plot data on a secondary y-axis, pass secondary_y=True:
>>> df.A.plot()
>>> df.B.plot(secondary_y=True, style='g')
The resulting figure holds two plots that share one X axis: A is scaled against the left Y axis and B against the right Y axis.
To plot only some columns of a DataFrame on the secondary y-axis, pass their names to the secondary_y keyword:
>>> plt.figure()


>>> ax = df.plot(secondary_y=['A', 'B'])  # columns A and B use the right Y axis
>>> ax.set_ylabel('CD scale')  # label for the left Y axis
>>> ax.right_ax.set_ylabel('AB scale')  # label for the right Y axis
df has four columns, A to D. Columns A and B are assigned to the right Y axis, and df.plot() returns the left-hand Axes object, stored here as ax.
ax.set_ylabel labels the left Y axis, while ax.right_ax gives access to the right Y axis.
The legend of the finished plot shows which columns are scaled against the right Y axis:
-- A (right)
-- B (right)
-- C
-- D
Note that the columns plotted on the secondary y-axis are automatically marked with “(right)” in the legend. To turn off the automatic marking, use the mark_right=False keyword:
>>> df.plot(secondary_y=['A', 'B'], mark_right=False)
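The legend marking can be verified by reading the legend texts back from the figure. This is a small sketch, not the original author's code: legend_labels is a hypothetical helper introduced here, and it looks on both the left and the right axes because the axes that carries the legend is treated as an implementation detail.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=range(1, 1001), columns=list("ABCD")).cumsum()


def legend_labels(ax):
    """Hypothetical helper: return the legend texts of a secondary_y plot."""
    legend = ax.get_legend()
    if legend is None and hasattr(ax, "right_ax"):
        legend = ax.right_ax.get_legend()
    return [text.get_text() for text in legend.get_texts()]


ax_marked = df.plot(secondary_y=["A", "B"])
ax_plain = df.plot(secondary_y=["A", "B"], mark_right=False)

marked = legend_labels(ax_marked)  # A and B carry the "(right)" suffix
plain = legend_labels(ax_plain)    # suffix switched off
print(marked)
print(plain)
```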

22.5.4 Suppressing Tick Resolution Adjustment

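pandas automatically adjusts tick resolution for time-series data with a regular frequency. In the limited cases where pandas cannot infer the frequency information (for example, in an externally created twinx), you can suppress this behavior so that the plots align.
Pass x_compat=True to suppress it: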
>>> ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000',periods=1000))
>>> df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=list('ABCD'))
>>> df = df.cumsum()
>>> plt.figure()
>>> df.A.plot(x_compat=True)
If you have more than one plot that needs to be suppressed, the use method in pandas.plotting.plot_params can be used in a with statement:
>>> plt.figure()
>>> with pd.plotting.plot_params.use('x_compat', True):  # x_compat=True applies to every plot inside the block
...     df.A.plot(color='r')
...     df.B.plot(color='g')
...     df.C.plot(color='b')
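What the with statement does can be seen by reading the option before, inside and after the block: plot_params.use sets the option temporarily and restores the previous value on exit. The sketch below is an illustration only; the variable names before, inside and after and the Agg backend are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=1000)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=index, columns=list("ABCD")).cumsum()

before = pd.plotting.plot_params["x_compat"]  # default value of the option

plt.figure()
with pd.plotting.plot_params.use("x_compat", True):
    # Every plot created inside the block behaves as if x_compat=True was passed
    inside = pd.plotting.plot_params["x_compat"]
    df["A"].plot(color="r")
    df["B"].plot(color="g")

after = pd.plotting.plot_params["x_compat"]  # restored when the block exits
print(before, inside, after)
```

This saves repeating x_compat=True on each call, and the setting cannot leak into later plots because it is reset even if a plot call raises.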

22.5.6 Subplots

Each Series in a DataFrame can be plotted on a different axis by passing subplots=True:
>>> df.plot(subplots=True, figsize=(6, 6))

22.5.7 Using Layout and Targeting Multiple Axes

The layout of subplots can be specified with the layout keyword, which accepts (rows, columns). The layout keyword can also be used in hist and boxplot. If the input is invalid, a ValueError is raised.
The number of axes that rows x columns can hold must be at least the number of required subplots. If the layout can hold more axes than required, the blank axes are not drawn. Similar to a NumPy array's reshape method, you can use -1 for one dimension to have the number of rows or columns calculated automatically from the other.
>>> df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)  # 4 plots in a 2x3 grid; the 2 spare axes stay blank
>>> df.plot(subplots=True, layout=(2, 2), figsize=(6, 6), sharex=False)  # 4 plots in a 2x2 grid
>>> df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)  # fix 2 rows; the column count is derived from the number of DataFrame columns
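The layout rules can be checked from the array of axes that df.plot returns. The following sketch is an illustration under assumptions (fixed seed, Agg backend, names such as axes_auto and axes_wide are made up here); it shows the shape resolved for -1, the hidden spare axes, and the ValueError for a layout that is too small.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=range(1, 1001), columns=list("ABCD")).cumsum()

# -1 lets pandas derive the missing dimension: 4 columns in 2 rows -> 2 columns
axes_auto = df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)

# A 2x3 grid has 6 slots for 4 plots; the 2 spare axes are created but hidden
axes_wide = df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False)
visible = sum(ax.get_visible() for ax in axes_wide.flat)

# A grid with fewer slots than columns is rejected
error = None
try:
    df.plot(subplots=True, layout=(1, 3))
except ValueError as exc:
    error = str(exc)

print(axes_auto.shape, axes_wide.shape, visible, error)
```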
Also, you can pass multiple axes created beforehand as a list-like via the ax keyword. This allows more complicated layouts. The number of axes passed must equal the number of subplots being drawn.
When multiple axes are passed via the ax keyword, the layout, sharex and sharey keywords do not affect the output. You should explicitly pass sharex=False and sharey=False, otherwise you will see a warning.
A more complex example: a 4x4 grid of 16 axes, with the four columns drawn on the main diagonal and their negated copies on the anti-diagonal.
>>> fig, axes = plt.subplots(4, 4, figsize=(6, 6))
>>> plt.subplots_adjust(wspace=0.5, hspace=0.5)
>>> target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]
>>> target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]
>>> df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False)
>>> (-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False)
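Another option is passing an ax argument to Series.plot() to plot on a particular axis.
The example below arranges the four plots in a 2x2 grid and gives each one a title instead of a legend; all four lines are drawn in the same default color: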
>>> fig, axes = plt.subplots(nrows=2, ncols=2)
>>> df['A'].plot(ax=axes[0,0]); axes[0,0].set_title('A')
>>> df['B'].plot(ax=axes[0,1]); axes[0,1].set_title('B')
>>> df['C'].plot(ax=axes[1,0]); axes[1,0].set_title('C')
>>> df['D'].plot(ax=axes[1,1]); axes[1,1].set_title('D')
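The four near-identical lines above can also be written as a loop over the flattened axes array. This is only a sketch of one possible variant, assuming the title keyword of Series.plot is used in place of a separate set_title call; the seed and the Agg backend are likewise assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 4)),
                  index=range(1, 1001), columns=list("ABCD")).cumsum()

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 6))

# axes.flat walks the 2x2 grid row by row, pairing each axes with one column
for ax, column in zip(axes.flat, df.columns):
    df[column].plot(ax=ax, title=column)

titles = [ax.get_title() for ax in axes.flat]
print(titles)
```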

22.5.8 Plotting With Error Bars

Horizontal and vertical error bars can be supplied to the xerr and yerr keyword arguments of plot(). The error values can be specified in a variety of formats:
• As a DataFrame or dict of errors with column names matching the columns attribute of the plotting DataFrame or matching the name attribute of the Series;
• As a str indicating which of the columns of the plotting DataFrame contain the error values;
• As raw values (list, tuple, or np.ndarray). Must be the same length as the plotting DataFrame/Series.
Asymmetrical error bars are also supported, but raw error values must be provided in this case. For a Series of length M, provide a 2xM array giving the lower and upper (or left and right) errors. For a DataFrame with M rows and N columns, asymmetrical errors should be an Nx2xM array.
>>> # Generate the data
>>> ix3 = pd.MultiIndex.from_arrays([['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], ['foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar']], names=['letter', 'word'])
>>> df3 = pd.DataFrame({'data1': [3, 2, 4, 3, 2, 4, 3, 2], 'data2': [6, 5, 7, 5, 4, 5, 6, 5]}, index=ix3)
>>> # Group by index labels and take the means and standard deviations for each group
>>> gp3 = df3.groupby(level=('letter', 'word'))
>>> means = gp3.mean()
>>> errors = gp3.std()
>>> # Create the axes that the bar chart is drawn on
>>> fig, ax = plt.subplots()
>>> means.plot.bar(yerr=errors, ax=ax)
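The raw-value formats can be illustrated with a single Series. The sketch below uses made-up numbers (the values in s, lower and upper are assumptions, not data from the original) and contrasts a symmetric error list with an asymmetric array whose first row holds the lower errors and whose second row holds the upper errors.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up values, for illustration only
s = pd.Series([3.0, 2.0, 4.0, 3.0], index=list("abcd"), name="data1")
lower = [0.2, 0.1, 0.3, 0.2]
upper = [0.5, 0.4, 0.6, 0.3]

# Asymmetric errors for a Series: one row of lower and one row of upper errors
asym = np.array([lower, upper])

fig, (ax_sym, ax_asym) = plt.subplots(1, 2, figsize=(8, 3))
s.plot.bar(yerr=lower, ax=ax_sym, title="symmetric")    # same error above and below
s.plot.bar(yerr=asym, ax=ax_asym, title="asymmetric")   # separate lower/upper errors

print(asym.shape)
```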

22.5.9 Plotting Tables

Plotting with a matplotlib table is supported in DataFrame.plot() and Series.plot() through the table keyword, which accepts a bool, DataFrame or Series. The simple way to draw a table is to specify table=True. The data will be transposed to meet matplotlib's default layout.
>>> plt.show()
>>> fig, ax = plt.subplots(1, 1)
>>> df = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
>>> ax.get_xaxis().set_visible(False)  # hide the x-axis ticks
>>> df.plot(table=True, ax=ax)
A table drawn below the plot lists each column's value at every x position.
>>> fig, ax = plt.subplots(1, 1)
>>> ax.get_xaxis().set_visible(False)
>>> df.plot(table=np.round(df.T, 2), ax=ax)
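The table again sits below the plot, but its contents are now passed explicitly as a two-dimensional object (the transposed DataFrame), rounded to two decimal places.
Finally, there is a helper function pandas.plotting.table that creates a table from a DataFrame or Series and adds it to a matplotlib.Axes. This function accepts the keywords that the matplotlib table accepts.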
>>> from pandas.plotting import table
>>> fig, ax = plt.subplots(1, 1)
>>> table(ax, np.round(df.describe(), 2), loc='upper right', colWidths=[0.2, 0.2, 0.2])
>>> df.plot(ax=ax, ylim=(0, 2), legend=None)
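Both ways of attaching a table can be compared side by side by counting the table artists on each axes. This is a sketch under assumptions: the fixed seed, the Agg backend, and the names ax_auto and ax_helper are introduced only for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import table

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

fig, (ax_auto, ax_helper) = plt.subplots(1, 2, figsize=(10, 4))

# Variant 1: table=True lets pandas build the table from the plotted data
ax_auto.xaxis.set_visible(False)  # the table replaces the x tick labels
df.plot(table=True, ax=ax_auto)

# Variant 2: the helper places an arbitrary frame (here summary statistics)
table(ax_helper, np.round(df.describe(), 2),
      loc="upper right", colWidths=[0.2, 0.2, 0.2])
df.plot(ax=ax_helper, ylim=(0, 2), legend=False)

n_auto = len(ax_auto.tables)
n_helper = len(ax_helper.tables)
print(n_auto, n_helper)
```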

22.5.10 Colormaps

DataFrame plotting supports the colormap argument, which accepts either a Matplotlib colormap or a string that is the name of a colormap registered with Matplotlib.
Plots often need many distinct colors, which is what the colormap argument provides. The functions of the colormap module are documented at http://matplotlib.org/api/cm_api.html, and the available colormaps are shown at http://scipy.github.io/old-wiki/pages/Cookbook/Matplotlib/Show_colormaps
>>> df = pd.DataFrame(np.random.randn(1000, 10), index=range(1,1001))
>>> df = df.cumsum()
>>> plt.figure()
>>> df.plot(colormap='cubehelix')
Alternatively, use df.plot(colormap='gist_rainbow') or df.plot(colormap='prism'). If df has many columns, gist_rainbow is a good choice for the line colors.
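To see that a colormap hands each column its own color, the line colors can be read back from the axes. The sketch below is an illustration, not part of the original notes; the seed, the Agg backend and the conversion to hex strings via matplotlib.colors.to_hex are choices made here for easy comparison.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 10)),
                  index=range(1, 1001)).cumsum()

ax = df.plot(colormap="cubehelix", legend=False)

# One line per column; convert each color to a hex string for comparison
line_colors = [to_hex(line.get_color()) for line in ax.get_lines()]
print(line_colors)
```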

Colormaps can also be used with other plot types, like bar charts:
>>> dd = pd.DataFrame(np.random.randn(10, 10)).abs()
>>> dd = dd.cumsum()
>>> plt.figure()
>>> dd.plot.bar(colormap='Greens')

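Parallel coordinates charts:
>>> from pandas.plotting import parallel_coordinates
>>> plt.figure()
>>> # data is not defined in this section; it is assumed to be a DataFrame whose 'Name' column holds the class labels
>>> parallel_coordinates(data, 'Name', colormap='gist_rainbow')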

Andrews curves charts:
>>> from pandas.plotting import andrews_curves
>>> plt.figure()
>>> andrews_curves(data, 'Name', colormap='winter')
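Because data is never defined in these notes, the two calls above cannot be run as written. The sketch below substitutes a small synthetic frame: the column names w, x, y, z and the class labels alpha, beta, gamma are hypothetical stand-ins, chosen only so that a 'Name' column with class labels exists.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves, parallel_coordinates

rng = np.random.default_rng(0)

# Hypothetical stand-in for `data`: four numeric columns plus a class column
data = pd.DataFrame(rng.standard_normal((30, 4)), columns=["w", "x", "y", "z"])
data["Name"] = np.repeat(["alpha", "beta", "gamma"], 10)

fig, (ax_pc, ax_ac) = plt.subplots(1, 2, figsize=(10, 4))

# Each row becomes one line; the colormap assigns one color per class in 'Name'
parallel_coordinates(data, "Name", colormap="gist_rainbow", ax=ax_pc)
andrews_curves(data, "Name", colormap="winter", ax=ax_ac)

pc_classes = sorted(t.get_text() for t in ax_pc.get_legend().get_texts())
n_curves = len(ax_ac.get_lines())
print(pc_classes, n_curves)
```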

22.6 Plotting directly with matplotlib (filled bands)

>>> price = pd.Series(np.random.randn(150).cumsum(), index=pd.date_range('2000-1-1', periods=150, freq='B'))
>>> ma = price.rolling(20).mean()
>>> mstd = price.rolling(20).std()
>>> plt.figure()
>>> plt.plot(price.index, price, 'k')
>>> plt.plot(ma.index, ma, 'b')
>>> plt.fill_between(mstd.index, ma-2*mstd, ma+2*mstd, color='b', alpha=0.2)
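fill_between shades the region between ma-2*mstd and ma+2*mstd: the black line is the price, the blue line is the 20-period moving average ma, and the translucent blue band extends two rolling standard deviations on either side of it.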
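The band can be examined numerically as well as visually. In the sketch below, which is an illustration under assumptions (fixed seed, Agg backend, and the names window, lower, upper, n_undefined and share_inside are introduced here), the first window - 1 points of the rolling statistics are undefined, and the share of prices falling inside the band is computed over the remaining points.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
price = pd.Series(rng.standard_normal(150).cumsum(),
                  index=pd.date_range("2000-01-01", periods=150, freq="B"))

window = 20
ma = price.rolling(window).mean()   # moving average
mstd = price.rolling(window).std()  # rolling standard deviation
lower, upper = ma - 2 * mstd, ma + 2 * mstd

fig, ax = plt.subplots()
ax.plot(price.index, price, "k", label="price")
ax.plot(ma.index, ma, "b", label="moving average")
ax.fill_between(mstd.index, lower, upper, color="b", alpha=0.2)
ax.legend()

# The rolling window needs `window` observations before it yields a value
n_undefined = int(ma.isna().sum())
# Fraction of prices inside the band, over the points where the band exists
share_inside = float(((price >= lower) & (price <= upper)).sum() / ma.notna().sum())
print(n_undefined, round(share_inside, 2))
```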

22.7 Trellis plotting interface

For details, see seaborn (https://github.com/mwaskom/seaborn) and the visualization chapter of the pandas documentation (http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html).