pandas 19 - Creating a Hierarchical Index (MultiIndex) (tcy)

Creating a hierarchical index (MultiIndex)  2018/12/14

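Purpose: store and manipulate data with an arbitrary number of dimensions in a lower-dimensional data structure, such as a 1-D Series or a 2-D DataFrame.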

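Functions:

pd.MultiIndex.from_tuples(tuples, sortorder=None, names=None) # convert a list of tuples to a MultiIndex
  # Parameters: tuples : list / sequence of tuples; each tuple is the index of one row/column. sortorder : int or None
pd.MultiIndex.from_arrays(arrays, sortorder=None, names=None) # convert arrays to a MultiIndex
  # Parameters: arrays : list / sequence of arrays, one array per level, all of the same length
pd.MultiIndex.from_product(iterables, sortorder=None, names=None) # build a MultiIndex from the Cartesian product of the iterables
  # Parameters: iterables : list / sequence of iterables, one iterable per level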
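
The three constructors differ only in how the input is laid out. As a minimal sketch (the variable names `outer`, `inner`, and `same_index` are illustrative, not from the original post), the snippet below builds the same four-entry index in all three ways and compares the results with `MultiIndex.equals`:

```python
import pandas as pd

# Illustrative data: two outer labels, each paired with two inner labels
outer = ['s1', 's1', 's2', 's2']
inner = ['ss1', 'ss2', 'ss1', 'ss2']
names = ['first', 'second']

# One tuple per row
from_tuples = pd.MultiIndex.from_tuples(list(zip(outer, inner)), names=names)
# One array per level
from_arrays = pd.MultiIndex.from_arrays([outer, inner], names=names)
# Unique values per level; every combination is generated
from_product = pd.MultiIndex.from_product([['s1', 's2'], ['ss1', 'ss2']], names=names)

# equals() compares the entries of the indexes
same_index = from_tuples.equals(from_arrays) and from_arrays.equals(from_product)
print(same_index)
```

from_product is the shortest form when every combination of the levels is present; from_tuples and from_arrays also handle indexes in which some combinations are missing.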

Examples:

Example 1:
import numpy as np
import pandas as pd

arrays = [['s1', 's1', 's2', 's2', 's3', 's3', 's4', 's4'],['ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2']]
tuples = list(zip(*arrays))# [('s1', 'ss1'),('s1', 'ss2'),('s2', 'ss1'),('s2', 'ss2'),('s3', 'ss1'),('s3', 'ss2'),('s4', 'ss1'),('s4', 'ss2')]

index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

Example 2:
arrays = [['s1', 's1', 's2', 's2', 's3', 's3', 's4', 's4'], ['ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second')) 

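Example 3: # pair every element of the first iterable with every element of the second
iterables = [['s1', 's2', 's3', 's4'], ['ss1', 'ss2']]
index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

# Output from an older pandas; since pandas 0.24 the `labels` attribute is called `codes`.
# MultiIndex(levels=[['s1', 's2', 's3', 's4'], ['ss1', 'ss2']],
# labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],names=['first', 'second'])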
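
The printed representation above stores each level's unique values once (`levels`) and then refers to them by integer position (`labels`). The sketch below, which assumes pandas 0.24 or later where these positions are exposed as `index.codes`, rebuilds the tuples by hand to show how the two pieces fit together. The names `levels`, `codes`, and `rebuilt` are illustrative.

```python
import pandas as pd

iterables = [['s1', 's2', 's3', 's4'], ['ss1', 'ss2']]
index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

# Unique values of each level
levels = [list(level) for level in index.levels]
# Integer positions into each level (assumes pandas >= 0.24; formerly `labels`)
codes = [c.tolist() for c in index.codes]

# Look up every position in its level to recover the original tuples
rebuilt = [
    tuple(levels[lvl][codes[lvl][row]] for lvl in range(index.nlevels))
    for row in range(len(index))
]
print(rebuilt == list(index))
```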
 

Applications:

Example 4: # use the MultiIndex as the index of a Series
s = pd.Series(np.arange(8), index=index)

Example 5: # build a MultiIndex automatically by passing a list of arrays directly to Series or DataFrame
arrays = [np.array(['s1', 's1', 's2', 's2', 's3', 's3', 's4', 's4']),
          np.array(['ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2'])]
s = pd.Series(np.arange(8), index=arrays)
s.index.names = ['first', 'second']

# first second
# s1 ss1       0
#    ss2       1
# s2 ss1       2
#    ss2       3
# s3 ss1       4
#    ss2       5
# s4 ss1       6
#    ss2       7
# dtype: int32 
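
To illustrate the stated purpose (2-D data held in a 1-D structure), the following sketch selects from the Series of Example 5 and then spreads the inner level into columns. The variable names `sub`, `value`, and `table` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['s1', 's1', 's2', 's2', 's3', 's3', 's4', 's4']),
          np.array(['ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2', 'ss1', 'ss2'])]
s = pd.Series(np.arange(8), index=arrays)
s.index.names = ['first', 'second']

# Partial key: everything under 's2', indexed by the remaining 'second' level
sub = s.loc['s2']
# Full key: a single value
value = s.loc[('s3', 'ss2')]
# Move the 'second' level into columns, giving a 4 x 2 DataFrame
table = s.unstack('second')
print(table)
```
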
Example 6: # use the MultiIndex as the columns (and rows) of a DataFrame
df = pd.DataFrame(np.arange(24).reshape(3, 8), index=['A', 'B', 'C'], columns=index)
'''
first   s1      s2      s3      s4
second ss1 ss2 ss1 ss2 ss1 ss2 ss1 ss2
A       0   1   2   3   4   5   6   7
B       8   9  10  11  12  13  14  15
C      16  17  18  19  20  21  22  23
'''
pd.DataFrame(np.arange(36).reshape(6, 6), index=index[:6], columns=index[:6])
'''
first         s1      s2      s3
second       ss1 ss2 ss1 ss2 ss1 ss2
first second
s1    ss1      0   1   2   3   4   5
      ss2      6   7   8   9  10  11
s2    ss1     12  13  14  15  16  17
      ss2     18  19  20  21  22  23
s3    ss1     24  25  26  27  28  29
      ss2     30  31  32  33  34  35
'''
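
As a short sketch of how hierarchical columns are addressed once the DataFrame exists, the snippet below rebuilds the first DataFrame of Example 6 and selects a group of columns, a single column, and a single cell. The names `block`, `column`, and `cell` are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['s1', 's2', 's3', 's4'], ['ss1', 'ss2']],
                                   names=['first', 'second'])
df = pd.DataFrame(np.arange(24).reshape(3, 8), index=['A', 'B', 'C'], columns=index)

# Outer label only: all columns under 's1', now labelled 'ss1' and 'ss2'
block = df['s1']
# Full tuple: exactly one column
column = df[('s2', 'ss1')]
# Row label plus full column tuple: one value
cell = df.loc['B', ('s3', 'ss2')]
print(block)
```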