pandas_cookbook學習(八)
阿新 • • 發佈:2018-11-14
切片一個數據集:
In [122]: df = pd.DataFrame(data={'Case' : ['A','A','A','B','A','A','B','A','A'],
.....: 'Data' : np.random.randn(9)})
.....:
In [123]: dfs = list(zip(*df.groupby((1*(df['Case']=='B')).cumsum().rolling(window=3,min_periods=1).median())))[-1]
In [124] : dfs[0]
Out[124]:
Case Data
0 A 0.174068
1 A -0.439461
2 A -0.741343
3 B -0.079673
In [125]: dfs[1]
Out[125]:
Case Data
4 A -0.922875
5 A 0.303638
6 B -0.917368
In [126]: dfs[2]
Out[126]:
Case Data
7 A -1.624062
8 A -0.758514
資料透視表:
In [127]: df = pd.DataFrame( data={'Province' : ['ON','QC','BC','AL','AL','MN','ON'],
.....: 'City' : ['Toronto','Montreal','Vancouver','Calgary','Edmonton','Winnipeg','Windsor'],
.....: 'Sales' : [13,6,16,8,4,3,1]})
.....:
In [128]: table = pd.pivot_table(df,values=[ 'Sales'],index=['Province'],columns=['City'],aggfunc=np.sum,margins=True)
In [129]: table.stack('City')
Out[129]:
Sales
Province City
AL All 12.0
Calgary 8.0
Edmonton 4.0
BC All 16.0
Vancouver 16.0
MN All 3.0
Winnipeg 3.0
... ...
All Calgary 8.0
Edmonton 4.0
Montreal 6.0
Toronto 13.0
Vancouver 16.0
Windsor 1.0
Winnipeg 3.0
[20 rows x 1 columns]
In [130]: grades = [48,99,75,80,42,80,72,68,36,78]
In [131]: df = pd.DataFrame( {'ID': ["x%d" % r for r in range(10)],
.....: 'Gender' : ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
.....: 'ExamYear': ['2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'],
.....: 'Class': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
.....: 'Participated': ['yes','yes','yes','yes','no','yes','yes','yes','yes','yes'],
.....: 'Passed': ['yes' if x > 50 else 'no' for x in grades],
.....: 'Employed': [True,True,True,False,False,False,False,True,True,False],
.....: 'Grade': grades})
.....:
In [132]: df.groupby('ExamYear').agg({'Participated': lambda x: x.value_counts()['yes'],
.....: 'Passed': lambda x: sum(x == 'yes'),
.....: 'Employed' : lambda x : sum(x),
.....: 'Grade' : lambda x : sum(x) / len(x)})
.....:
Out[132]:
Participated Passed Employed Grade
ExamYear
2007 3 2 3 74.000000
2008 3 3 0 68.500000
2009 3 2 2 60.666667
以年度資料形式展現:
In [133]: df = pd.DataFrame({'value': np.random.randn(36)},
.....: index=pd.date_range('2011-01-01', freq='M', periods=36))
.....:
In [134]: pd.pivot_table(df, index=df.index.month, columns=df.index.year,
.....: values='value', aggfunc='sum')
.....:
Out[134]:
2011 2012 2013
1 -0.560859 0.120930 0.516870
2 -0.589005 -0.210518 0.343125
3 -1.070678 -0.931184 2.137827
4 -1.681101 0.240647 0.452429
5 0.403776 -0.027462 0.483103
6 0.609862 0.033113 0.061495
7 0.387936 -0.658418 0.240767
8 1.815066 0.324102 0.782413
9 0.705200 -1.403048 0.628462
10 -0.668049 -0.581967 -0.880627
11 0.242501 -1.233862 0.777575
12 0.313421 -3.520876 -0.779367