1. 程式人生 > >pandas_cookbook學習(七)

pandas_cookbook學習(七)

根據索引值將每一組資料滯後一項:

In [112]: df = pd.DataFrame(
   .....:    {u'line_race': [10, 10, 8, 10, 10, 8],
   .....:     u'beyer': [99, 102, 103, 103, 88, 100]},
   .....:     index=[u'Last Gunfighter', u'Last Gunfighter', u'Last Gunfighter',
   .....:            u'Paynter', u'Paynter', u'Paynter']); df
   ..
...: Out[112]: line_race beyer Last Gunfighter 10 99 Last Gunfighter 10 102 Last Gunfighter 8 103 Paynter 10 103 Paynter 10 88 Paynter 8 100 #level=0的意思是將索引分開 In [113]: df['beyer_shifted'] = df.groupby(
level=0)['beyer'].shift(1) In [114]: df Out[114]: line_race beyer beyer_shifted Last Gunfighter 10 99 NaN Last Gunfighter 10 102 99.0 Last Gunfighter 8 103 102.0 Paynter 10 103 NaN Paynter 10
88 103.0 Paynter 8 100 88.0

獲得每一組資料的最大值

In [115]: df = pd.DataFrame({'host':['other','other','that','this','this'],
   .....:                    'service':['mail','web','mail','mail','web'],
   .....:                    'no':[1, 2, 1, 2, 1]}).set_index(['host', 'service'])
   .....: 

In [116]: mask = df.groupby(level=0).agg('idxmax')

In [117]: df_count = df.loc[mask['no']].reset_index()

In [118]: df_count
Out[118]: 
    host service  no
0  other     web   2
1   that    mail   1
2   this    mail   2
In [119]: df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A'])

In [120]: df.A.groupby((df.A != df.A.shift()).cumsum()).groups
Out[120]: 
{1: Int64Index([0], dtype='int64'),
 2: Int64Index([1], dtype='int64'),
 3: Int64Index([2], dtype='int64'),
 4: Int64Index([3, 4, 5], dtype='int64'),
 5: Int64Index([6], dtype='int64'),
 6: Int64Index([7, 8], dtype='int64')}

In [121]: df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum()
Out[121]: 
0    0
1    1
2    0
3    1
4    2
5    3
6    0
7    1
8    2
Name: A, dtype: int64