Data Analysis with Python: Getting Started with pandas, Hierarchical Indexing
阿新 • Published: 2018-12-15
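Hierarchical indexing
Hierarchical indexing is an important pandas feature. It lets you have multiple index levels on a single axis, so that you can work with higher-dimensional data in a lower-dimensional form.
levels holds the unique values of each index level.
labels holds, for each index level, the integer positions into levels that identify every entry (pandas 0.24 and later call this attribute codes).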
>>> from pandas import DataFrame, Series
>>> import pandas as pd
>>> import numpy as np
>>> data = Series(np.random.randn(10),
...               index=[['a','a','a','b','b','b','c','c','d','d'],
...                      [1,2,3,1,2,3,1,2,2,3]])
>>> data
a  1   -0.070153
   2    0.017225
   3    0.905866
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
d  2    0.812828
   3   -0.820116
dtype: float64
>>> data.index
MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])
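To make the relationship between levels and labels concrete, here is a minimal sketch on a hypothetical three-row index (not the one from the session above). It assumes only that the attribute is named `codes` on pandas 0.24 or later and `labels` on older releases, and falls back accordingly.

```python
import pandas as pd

# Hypothetical toy index, smaller than the one in the session above.
idx = pd.MultiIndex.from_arrays([["a", "a", "b"], [1, 2, 2]])

# levels: the unique values of each level.
levels = [level.tolist() for level in idx.levels]

# Integer positions into the levels. The attribute is `codes` in
# pandas 0.24+ and `labels` in older releases.
raw = idx.codes if hasattr(idx, "codes") else idx.labels
codes = [[int(x) for x in c] for c in raw]

print(levels)
print(codes)
```

Each entry of the index can be rebuilt by looking up its code in the matching level, which is why the same value is stored only once no matter how often it repeats.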
With a hierarchical index, selecting a subset of the data is straightforward, and you can also select on the inner level.
>>> data['b']
1   -0.156584
2    0.213097
3    0.263765
dtype: float64
>>> data['b':'c']
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
dtype: float64
>>> data.loc[['b','d']]  # written as data.ix[...] on older pandas; .ix was removed in pandas 1.0
b  1   -0.156584
   2    0.213097
   3    0.263765
d  2    0.812828
   3   -0.820116
dtype: float64
>>> data[:,2]
a    0.017225
b    0.213097
c    1.175804
d    0.812828
dtype: float64
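The same kinds of selection can be written with explicit indexers. The sketch below uses a hypothetical five-element Series with fixed values so the results are easy to check by hand. `xs` is one way to pick an inner-level key and is used here as an alternative to the `data[:,2]` form above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values 0.0 .. 4.0 under a two-level index.
s = pd.Series(
    np.arange(5.0),
    index=[["a", "a", "b", "b", "c"], [1, 2, 1, 2, 2]],
)

outer = s.loc["b"]           # every row under the outer key 'b'
several = s.loc[["a", "c"]]  # rows under several outer keys
inner = s.xs(2, level=1)     # inner key 2 across all outer keys
one = s.loc[("b", 2)]        # a single value addressed by both levels

print(outer, several, inner, one, sep="\n\n")
```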
The data can be rearranged into a new DataFrame with the unstack method. The inverse operation, stack, converts it back.
>>> data.unstack()
          1         2         3
a -0.070153  0.017225  0.905866
b -0.156584  0.213097  0.263765
c -0.141315  1.175804       NaN
d       NaN  0.812828 -0.820116
>>> data.unstack().stack()
a  1   -0.070153
   2    0.017225
   3    0.905866
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
d  2    0.812828
   3   -0.820116
dtype: float64
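A small round trip shows where the NaN entries come from: unstack has to fill every (row, column) combination, including those that never existed in the Series. The sketch uses hypothetical data and calls `dropna()` explicitly after `stack()`, so the result does not depend on how a given pandas version treats missing entries when stacking.

```python
import pandas as pd

# Hypothetical Series in which the combination ('b', 1) is missing.
s = pd.Series([1.0, 2.0, 3.0], index=[["a", "a", "b"], [1, 2, 2]])

wide = s.unstack()            # inner level becomes the columns
back = wide.stack().dropna()  # columns return to the inner level

print(wide)
print(back)
```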
Each level of an index can be given a name.
>>> frame = DataFrame(np.arange(12).reshape((4,3)),
...                   index=[['a','a','b','b'],[1,2,1,2]],
...                   columns=[['Onio','Onio','Colorado'],['Green','Red','Green']])
>>> frame
     Onio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
>>> frame.stack()
           Colorado  Onio
a 1 Green       2.0     0
    Red         NaN     1
  2 Green       5.0     3
    Red         NaN     4
b 1 Green       8.0     6
    Red         NaN     7
  2 Green      11.0     9
    Red         NaN    10
>>> frame.index.names=['key1','key2']
>>> frame.columns.names = ['state','color']
>>> frame
state      Onio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
>>> frame['Onio']
    Green Red
a 1     0   1
  2     3   4
b 1     6   7
  2     9  10
>>> frame.swaplevel('key1','key2')
state      Onio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
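Note that swaplevel only exchanges the levels and leaves the rows in their original order, as the output above shows. A common follow-up, sketched here on a hypothetical two-column frame, is to sort on the new outer level with `sort_index` so that equal keys end up next to each other.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with named index levels.
df = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=["x", "y"],
)

swapped = df.swaplevel("key1", "key2")      # levels exchanged, rows untouched
ordered = swapped.sort_index(level="key2")  # rows regrouped by the new outer level

print(swapped)
print(ordered)
```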
Summary statistics by level
>>> frame.sum(level='key2')
state  Onio     Colorado
color Green Red    Green
key2
1         6   8       10
2        12  14       16
>>> frame.sum(level='color',axis=1)
color      Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
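The `level` argument to `sum` used above belongs to older pandas releases. To my knowledge it was deprecated in the 1.x series and is no longer accepted from pandas 2.0 on. A sketch of an equivalent that should work on both old and new versions is to group by the level and then sum. The frame is rebuilt here so the block is self-contained, and the column-wise case goes through a transpose, which is one possible approach rather than the only one.

```python
import numpy as np
import pandas as pd

# Same shape and labels as the frame in the session above.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [["Onio", "Onio", "Colorado"], ["Green", "Red", "Green"]],
        names=["state", "color"],
    ),
)

# Row-wise: sum all rows that share the same key2 value.
by_key2 = frame.groupby(level="key2").sum()

# Column-wise: transpose, group the former columns by color, transpose back.
by_color = frame.T.groupby(level="color").sum().T

print(by_key2)
print(by_color)
```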
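Using a DataFrame's columns
DataFrame's set_index function converts one or more columns into the row index and creates a new DataFrame.
By default those columns are removed from the DataFrame, but they can be kept by passing drop=False. reset_index does the opposite and moves the index levels back into columns.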
>>> frame = DataFrame({'a':range(7),'b':range(7,0,-1),
...                    'c':['one','one','one','two','two','two','two'],
...                    'd':[0,1,2,0,1,2,3]})
>>> frame
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
>>> frame2 = frame.set_index(['c','d'])
>>> frame2
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
>>> frame.set_index(['c','d'],drop=False)
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
>>> frame2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
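As a compact recap of the session above, the sketch below runs the same three calls on a hypothetical three-row frame and records which columns survive each step. Note that set_index returns a new object and leaves the original frame unchanged.

```python
import pandas as pd

# Hypothetical frame: 'c' and 'd' will become the index levels.
df = pd.DataFrame({"a": [0, 1, 2], "c": ["one", "one", "two"], "d": [0, 1, 0]})

indexed = df.set_index(["c", "d"])           # 'c' and 'd' leave the columns
kept = df.set_index(["c", "d"], drop=False)  # 'c' and 'd' stay as columns too
restored = indexed.reset_index()             # index levels become columns again

print(list(indexed.columns))
print(list(kept.columns))
print(list(restored.columns))
```

After reset_index the former index levels come first in the column order, which matches the `c d a b` header in the output above.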