pandas Team Study: Task 4

I. Grouping with groupby

Usage: df.groupby([grouping keys])[columns to operate on]

For example, to group student heights by school and gender:

df.groupby(['School', 'Gender'])['Height']
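As a minimal, self-contained sketch of this pattern, the snippet below builds a small made-up DataFrame (the rows and the name `toy` are illustrative assumptions, not the course dataset) and computes the mean height for each school and gender pair.

```python
import pandas as pd

# Toy data for illustration only; not the dataset used in the course
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

# Selecting a column from the GroupBy object computes nothing yet;
# an aggregation such as mean() produces one value per group
mean_height = toy.groupby(['School', 'Gender'])['Height'].mean()
print(mean_height)
```

The result is a Series indexed by (School, Gender) pairs, so a single group can be read with `mean_height.loc[('B', 'Female')]`.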

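Practice: using the lower and upper quartiles as cut points, split weight into three groups (high, normal, low) and compute the mean height of each group.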

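import numpy as np
import pandas as pd

low = df['Weight'].quantile(0.25)    # lower quartile
high = df['Weight'].quantile(0.75)   # upper quartile
# A chained comparison such as low < df['Weight'] < high raises ValueError on a Series,
# so bin the weights with pd.cut instead. Bins are right-inclusive: (-inf, low], (low, high], (high, inf).
# Rows with a missing weight get no label and are dropped by groupby.
weight_group = pd.cut(df['Weight'], bins=[-np.inf, low, high, np.inf], labels=['low', 'normal', 'high'])
df.groupby(weight_group, observed=True)['Height'].mean()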
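On comparing a Series against two bounds at once: Python expands `low < s < high` into `(low < s) and (s < high)`, and `and` needs a single truth value, which a Series does not have. The sketch below, using made-up weights, shows the failure and the element-wise alternative with `&`.

```python
import pandas as pd

weights = pd.Series([40, 50, 60, 70, 80])  # made-up values for illustration
low = weights.quantile(0.25)   # 50.0
high = weights.quantile(0.75)  # 70.0

try:
    low < weights < high       # evaluated as (low < weights) and (weights < high)
    chained_ok = True
except ValueError:
    chained_ok = False         # the truth value of a Series is ambiguous

# Element-wise version: combine the two boolean Series with &
# (the parentheses are required because & binds tighter than < and >)
in_range = (weights > low) & (weights < high)
print(chained_ok, in_range.tolist())
```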

The ngroups attribute gives the number of groups:

a = df.groupby(['School', 'Gender'])
a.ngroups
Out[33]: 8

Going further, the groups attribute returns a dictionary that maps each group name to the list of index labels belonging to that group:

a.groups.keys()
Out[37]: dict_keys([('Fudan University', 'Female'), ('Fudan University', 'Male'), ('Peking University', 'Female'), ('Peking University', 'Male'), ('Shanghai Jiao Tong University', 'Female'), ('Shanghai Jiao Tong University', 'Male'), ('Tsinghua University', 'Female'), ('Tsinghua University', 'Male')])
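To make the shape of that dictionary concrete, here is a small sketch on made-up data (the DataFrame `toy` is an assumption for illustration). Each key is a tuple of group values, and each value holds the row labels that fall in that group.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

g = toy.groupby(['School', 'Gender'])

# Convert each Index of row labels to a plain list for easier reading
index_by_group = {name: list(idx) for name, idx in g.groups.items()}
print(g.ngroups)
print(index_by_group)

# get_group returns the rows of one group as a DataFrame
b_female = g.get_group(('B', 'Female'))
```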

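The specific group categories can also be obtained directly with drop_duplicates. The result contains the same combinations as above, listed in order of first appearance rather than sorted: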

In [11]: df[['School', 'Gender']].drop_duplicates()
Out[11]: 
                           School  Gender
0   Shanghai Jiao Tong University  Female
1               Peking University    Male
2   Shanghai Jiao Tong University    Male
3                Fudan University  Female
4                Fudan University    Male
5             Tsinghua University  Female
9               Peking University  Female
16            Tsinghua University    Male

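Practice: the previous subsection showed that drop_duplicates can produce the specific group categories. Now use the groups attribute to achieve a similar result.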

a = df.groupby(['School', 'Gender'])
list(a.groups.keys())
Out[43]: 
[('Fudan University', 'Female'),
 ('Fudan University', 'Male'),
 ('Peking University', 'Female'),
 ('Peking University', 'Male'),
 ('Shanghai Jiao Tong University', 'Female'),
 ('Shanghai Jiao Tong University', 'Male'),
 ('Tsinghua University', 'Female'),
 ('Tsinghua University', 'Male')]
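If a DataFrame closer to the drop_duplicates output is wanted, the keys can be passed to the DataFrame constructor. The sketch below uses made-up data (`toy` is an illustrative assumption) and checks that both approaches yield the same set of combinations.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

g = toy.groupby(['School', 'Gender'])

# Build a two-column DataFrame from the group keys (sorted by key)
keys_df = pd.DataFrame(list(g.groups.keys()), columns=['School', 'Gender'])

# drop_duplicates keeps the order of first appearance instead
dedup = toy[['School', 'Gender']].drop_duplicates()

# Compare as sets, since the two results may differ in row order
same = set(dedup.itertuples(index=False, name=None)) == set(g.groups.keys())
print(keys_df)
print(same)
```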