1. 程式人生 > >pandas_cookbook學習(八)

pandas_cookbook學習(八)

切片一個數據集:

In [122]: df = pd.DataFrame(data={'Case' : ['A','A','A','B','A','A','B','A','A'],
   .....:                         'Data' : np.random.randn(9)})
   .....: 

In [123]: dfs = list(zip(*df.groupby((1*(df['Case']=='B')).cumsum().rolling(window=3,min_periods=1).median())))[-1]

In [124]
: dfs[0] Out[124]: Case Data 0 A 0.174068 1 A -0.439461 2 A -0.741343 3 B -0.079673 In [125]: dfs[1] Out[125]: Case Data 4 A -0.922875 5 A 0.303638 6 B -0.917368 In [126]: dfs[2] Out[126]: Case Data 7 A -1.624062 8 A -0.758514

資料透視表:

In [127]: df = pd.DataFrame(
data={'Province' : ['ON','QC','BC','AL','AL','MN','ON'], .....: 'City' : ['Toronto','Montreal','Vancouver','Calgary','Edmonton','Winnipeg','Windsor'], .....: 'Sales' : [13,6,16,8,4,3,1]}) .....: In [128]: table = pd.pivot_table(df,values=[
'Sales'],index=['Province'],columns=['City'],aggfunc=np.sum,margins=True) In [129]: table.stack('City') Out[129]: Sales Province City AL All 12.0 Calgary 8.0 Edmonton 4.0 BC All 16.0 Vancouver 16.0 MN All 3.0 Winnipeg 3.0 ... ... All Calgary 8.0 Edmonton 4.0 Montreal 6.0 Toronto 13.0 Vancouver 16.0 Windsor 1.0 Winnipeg 3.0 [20 rows x 1 columns]
In [130]: grades = [48,99,75,80,42,80,72,68,36,78]

In [131]: df = pd.DataFrame( {'ID': ["x%d" % r for r in range(10)],
   .....:                     'Gender' : ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
   .....:                     'ExamYear': ['2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'],
   .....:                     'Class': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
   .....:                     'Participated': ['yes','yes','yes','yes','no','yes','yes','yes','yes','yes'],
   .....:                     'Passed': ['yes' if x > 50 else 'no' for x in grades],
   .....:                     'Employed': [True,True,True,False,False,False,False,True,True,False],
   .....:                     'Grade': grades})
   .....: 

In [132]: df.groupby('ExamYear').agg({'Participated': lambda x: x.value_counts()['yes'],
   .....:                     'Passed': lambda x: sum(x == 'yes'),
   .....:                     'Employed' : lambda x : sum(x),
   .....:                     'Grade' : lambda x : sum(x) / len(x)})
   .....: 
Out[132]: 
          Participated  Passed  Employed      Grade
ExamYear                                           
2007                 3       2         3  74.000000
2008                 3       3         0  68.500000
2009                 3       2         2  60.666667

以年度資料形式展現:

In [133]: df = pd.DataFrame({'value': np.random.randn(36)},
   .....:                   index=pd.date_range('2011-01-01', freq='M', periods=36))
   .....: 

In [134]: pd.pivot_table(df, index=df.index.month, columns=df.index.year,
   .....:                values='value', aggfunc='sum')
   .....: 
Out[134]: 
        2011      2012      2013
1  -0.560859  0.120930  0.516870
2  -0.589005 -0.210518  0.343125
3  -1.070678 -0.931184  2.137827
4  -1.681101  0.240647  0.452429
5   0.403776 -0.027462  0.483103
6   0.609862  0.033113  0.061495
7   0.387936 -0.658418  0.240767
8   1.815066  0.324102  0.782413
9   0.705200 -1.403048  0.628462
10 -0.668049 -0.581967 -0.880627
11  0.242501 -1.233862  0.777575
12  0.313421 -3.520876 -0.779367