Data Analysis with Python: Getting Started with pandas, Hierarchical Indexing

Hierarchical indexing

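Hierarchical indexing is an important pandas feature: it lets you have multiple index levels on a single axis, so that you can work with higher-dimensional data in a lower-dimensional form.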

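levels holds the distinct values of each index level.

labels holds, for each level, the integer position within levels of every row's value (pandas 0.24 and later call this attribute codes).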

>>> from pandas import DataFrame, Series
>>> import pandas as pd
>>> import numpy as np
>>> data = Series(np.random.randn(10),index=[['a','a','a','b','b','b','c','c','d','d'],[1,2,3,1,2,3,1,2,2,3]])
>>> data
a  1   -0.070153
   2    0.017225
   3    0.905866
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
d  2    0.812828
   3   -0.820116
dtype: float64
>>> data.index
MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])
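
To make the relationship between levels and labels concrete, here is a minimal sketch that rebuilds the index tuples by looking each code up in its level. It assumes pandas 0.24 or later, where the attribute is named codes rather than labels; the variable names and the Series values are illustrative only.

```python
import numpy as np
import pandas as pd

# Same index layout as in the session above; the values are arbitrary.
index = pd.MultiIndex.from_arrays(
    [list("aaabbbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]]
)
data = pd.Series(np.arange(10.0), index=index)

# levels: the distinct values of each level.
# codes (named labels before pandas 0.24): per-row positions into levels.
outer_levels, inner_levels = data.index.levels
outer_codes, inner_codes = data.index.codes

# Looking each code up in its level rebuilds the original index tuples.
rebuilt = [
    (outer_levels[i], inner_levels[j])
    for i, j in zip(outer_codes, inner_codes)
]
print(rebuilt[:4])
```

Storing small integer codes instead of repeating the values themselves is what keeps a hierarchical index compact when the outer keys repeat many times.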

With a hierarchical index, selecting subsets of the data is straightforward, and you can also select on the inner level.

>>> data['b']
1   -0.156584
2    0.213097
3    0.263765
dtype: float64
>>> data['b':'c']
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
dtype: float64
>>> data.loc[['b','d']]
b  1   -0.156584
   2    0.213097
   3    0.263765
d  2    0.812828
   3   -0.820116
dtype: float64
>>> data[:,2]
a    0.017225
b    0.213097
c    1.175804
d    0.812828
dtype: float64
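
The selections above can also be written with explicit label-based accessors. The following is a sketch rather than the author's own code: it uses .loc and Series.xs, with deterministic values in place of the random ones so the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]]
)
data = pd.Series(np.arange(10.0), index=index)

# Outer-level selection: a single key drops that level from the result.
b_rows = data.loc["b"]

# Inner-level selection: every row whose second level equals 2.
inner_by_loc = data.loc[:, 2]
inner_by_xs = data.xs(2, level=1)

# One exact element, addressed by a full (outer, inner) tuple.
single = data.loc[("c", 2)]

print(b_rows)
print(inner_by_xs)
print(single)
```

xs states the level explicitly, which can be easier to read than positional slicing once an index has more than two levels.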

The data can be rearranged into a new DataFrame with the unstack method; stack is the inverse operation and turns it back into a Series.

>>> data.unstack()
          1         2         3
a -0.070153  0.017225  0.905866
b -0.156584  0.213097  0.263765
c -0.141315  1.175804       NaN
d       NaN  0.812828 -0.820116
>>> data.unstack().stack()
a  1   -0.070153
   2    0.017225
   3    0.905866
b  1   -0.156584
   2    0.213097
   3    0.263765
c  1   -0.141315
   2    1.175804
d  2    0.812828
   3   -0.820116
dtype: float64
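
As a small sketch of the same round trip, the example below also pivots the outer level and counts the NaN placeholders that unstack creates for missing combinations. The values are illustrative, and the explicit dropna() is an assumption made so that the result does not depend on whether a given pandas version drops NaN rows inside stack.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [list("aaabbbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]]
)
data = pd.Series(np.arange(10.0), index=index)

# Default: the innermost level (1, 2, 3) becomes the columns.
wide = data.unstack()

# Pivot the outer level instead, so 'a'..'d' become the columns.
wide_outer = data.unstack(level=0)

# Combinations absent from the index, (c, 3) and (d, 1), show up as NaN.
missing = int(wide.isna().sum().sum())

# Stacking and dropping the NaN placeholders recovers the original rows.
restored = wide.stack().dropna()

print(wide.shape, wide_outer.shape, missing, len(restored))
```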

Each index level can be given a name

>>> frame = DataFrame(np.arange(12).reshape((4,3)),index=[['a','a','b','b'],[1,2,1,2]],columns=[['Onio','Onio','Colorado'],['Green','Red','Green']])
>>> frame
     Onio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
>>> frame.stack()
           Colorado  Onio
a 1 Green       2.0     0
    Red         NaN     1
  2 Green       5.0     3
    Red         NaN     4
b 1 Green       8.0     6
    Red         NaN     7
  2 Green      11.0     9
    Red         NaN    10
>>> frame.index.names=['key1','key2']
>>> frame.columns.names = ['state','color']
>>> frame
state      Onio     Colorado
color     Green Red    Green
key1 key2                   
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11

>>> frame['Onio']
     Green  Red
a 1      0    1
  2      3    4
b 1      6    7
  2      9   10
>>> frame.swaplevel('key1','key2')
state      Onio     Colorado
color     Green Red    Green
key2 key1                   
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
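
Note that swaplevel only exchanges the levels and leaves the row order as it was. A minimal sketch, assuming you want the rows grouped by the new outer level afterwards, is to follow it with sort_index; here the level names are supplied at construction time through MultiIndex.from_arrays, which is an alternative to assigning names afterwards.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [["Onio", "Onio", "Colorado"], ["Green", "Red", "Green"]],
        names=["state", "color"],
    ),
)

# swaplevel only exchanges the levels; the row order is left untouched.
swapped = frame.swaplevel("key1", "key2")

# Sorting on the new outer level groups the rows by key2.
by_key2 = swapped.sort_index(level=0)

print(by_key2)
```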

Summary statistics by level

>>> frame.groupby(level='key2').sum()
state  Onio     Colorado
color Green Red    Green
key2                    
1         6   8       10
2        12  14       16
>>> frame.T.groupby(level='color').sum().T
color      Green  Red
key1 key2            
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
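
Level-wise aggregation is not limited to sums. The sketch below uses groupby(level=...), which is available across pandas versions, to compute totals per key2, totals per color (by transposing, grouping and transposing back), and means per key1. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [["Onio", "Onio", "Colorado"], ["Green", "Red", "Green"]],
        names=["state", "color"],
    ),
)

# Row-wise: one output row per distinct key2 value.
sum_by_key2 = frame.groupby(level="key2").sum()

# Column-wise: transpose, group the (former) columns by color, transpose back.
sum_by_color = frame.T.groupby(level="color").sum().T

# Any other reduction works the same way, for example the mean per key1.
mean_by_key1 = frame.groupby(level="key1").mean()

print(sum_by_key2)
print(sum_by_color)
print(mean_by_key1)
```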

Using a DataFrame's columns

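DataFrame's set_index function converts one or more columns into the row index and returns a new DataFrame.

By default those columns are removed from the DataFrame, but they can be kept by passing drop=False. reset_index does the reverse, moving the index levels back into columns.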

>>> frame = DataFrame({'a':range(7),'b':range(7,0,-1),'c':['one','one','one','two','two','two','two'],'d':[0,1,2,0,1,2,3]})
>>> frame
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
>>> frame2 = frame.set_index(['c','d'])
>>> frame2
       a  b
c   d      
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
>>> frame.set_index(['c','d'],drop=False)
       a  b    c  d
c   d              
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
>>> frame2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
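
One consequence of drop=False is worth spelling out: because the columns are still present, a plain reset_index on that frame would try to insert them a second time. The following sketch checks the round trip and shows the clash; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

# set_index followed by reset_index returns the same data; only the
# column order changes, because the former levels come back first.
indexed = frame.set_index(["c", "d"])
round_trip = indexed.reset_index()
same_data = round_trip[frame.columns].equals(frame)

# With drop=False the columns c and d still exist, so reset_index would
# insert duplicates and raises ValueError instead.
kept = frame.set_index(["c", "d"], drop=False)
try:
    kept.reset_index()
    clash = False
except ValueError:
    clash = True

# Passing drop=True to reset_index discards the index levels instead.
plain = kept.reset_index(drop=True)

print(same_data, clash)
```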