pandas Team Study: Task 4
阿新 • Published: 2020-12-26
1. Grouping with groupby
Usage: df.groupby([grouping keys])[columns to operate on]
For example, to group student heights by school and gender:
df.groupby(['School', 'Gender'])['Height']
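The expression above only builds a grouped object; nothing is computed until an aggregation is called on it. A minimal sketch, assuming a small hypothetical DataFrame named toy in place of the course data (the school names 'A' and 'B' and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the example above
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
    'Height': [160.0, 175.0, 158.0, 162.0],
})

# Selecting a column from the grouped frame gives a SeriesGroupBy object
grouped = toy.groupby(['School', 'Gender'])['Height']

# Calling an aggregation produces a Series indexed by (School, Gender)
mean_height = grouped.mean()
print(mean_height)
```

Only combinations that actually occur in the data appear in the result, so this toy frame yields three groups rather than four.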
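Exercise: Using the lower and upper quartiles as cut points, split weight into three groups (high, normal, low) and compute the mean height of each group.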
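low = df['Weight'].quantile(0.25)   # lower quartile
high = df['Weight'].quantile(0.75)  # upper quartile
condition1 = df['Weight'] > high
condition2 = df['Weight'] < low
# A chained comparison (low < s < high) raises ValueError on a Series,
# so combine two boolean masks with & instead
condition3 = (df['Weight'] >= low) & (df['Weight'] <= high)
# Each result is indexed by False/True; the True row is the mean height of that group
df_high = df.groupby(condition1)['Height'].mean()
df_mid = df.groupby(condition3)['Height'].mean()
df_low = df.groupby(condition2)['Height'].mean()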
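The three separate groupby calls can also be collapsed into one by grouping on a single label Series. The following is a sketch under stated assumptions rather than the official solution: toy is a hypothetical DataFrame with made-up values, and labels is an illustrative helper name. Note that in this simple version a row with a missing Weight would keep the default 'normal' label, so missing values may need to be dropped first.

```python
import pandas as pd

# Hypothetical toy data standing in for the course DataFrame
toy = pd.DataFrame({
    'Weight': [40.0, 50.0, 60.0, 70.0, 80.0],
    'Height': [150.0, 160.0, 170.0, 180.0, 190.0],
})

low = toy['Weight'].quantile(0.25)   # lower quartile
high = toy['Weight'].quantile(0.75)  # upper quartile

# Start with every row labelled 'normal', then overwrite the two tails
labels = pd.Series('normal', index=toy.index, name='WeightGroup')
labels[toy['Weight'] > high] = 'high'
labels[toy['Weight'] < low] = 'low'

# A Series aligned with the frame's index can be passed directly to groupby
mean_height = toy.groupby(labels)['Height'].mean()
print(mean_height)
```

Grouping by an external Series works because groupby aligns it with the DataFrame on the index, so no extra column has to be added to the data.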
The ngroups attribute gives the number of groups:
a = df.groupby(['School', 'Gender'])
a.ngroups
Out[33]: 8
Furthermore, the groups attribute returns a dictionary that maps each group name to the list of index labels belonging to that group:
a.groups.keys()
Out[37]: dict_keys([('Fudan University', 'Female'),
('Fudan University', 'Male'),
('Peking University', 'Female'),
('Peking University', 'Male'),
('Shanghai Jiao Tong University', 'Female'),
('Shanghai Jiao Tong University', 'Male'),
('Tsinghua University', 'Female'),
('Tsinghua University', 'Male')])
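The output above shows only the keys. To see what the dictionary's values look like, here is a minimal sketch on a hypothetical four-row frame (toy, g, and index_lists are illustrative names, and the data is made up):

```python
import pandas as pd

# Hypothetical toy data with a default 0..3 integer index
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female'],
})

g = toy.groupby(['School', 'Gender'])

# Number of distinct (School, Gender) combinations present in the data
n_groups = g.ngroups

# Each value in g.groups is an Index of row labels; convert to plain lists for display
index_lists = {key: list(idx) for key, idx in g.groups.items()}
print(n_groups)
print(index_lists)
```

The row labels stored in groups are the DataFrame's index values, so they can be passed to toy.loc to pull out the rows of one group.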
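Alternatively, calling drop_duplicates on the grouping columns gives the distinct group categories directly. The result contains the same combinations as above, listed in order of first appearance: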
In [11]: df[['School', 'Gender']].drop_duplicates()
Out[11]:
                           School  Gender
0   Shanghai Jiao Tong University  Female
1               Peking University    Male
2   Shanghai Jiao Tong University    Male
3                Fudan University  Female
4                Fudan University    Male
5             Tsinghua University  Female
9               Peking University  Female
16            Tsinghua University    Male
Exercise: The previous subsection showed that drop_duplicates yields the specific group categories. Now use the groups attribute to achieve a similar result.
a = df.groupby(['School', 'Gender'])
list(a.groups.keys())
Out[43]:
[('Fudan University', 'Female'),
('Fudan University', 'Male'),
('Peking University', 'Female'),
('Peking University', 'Male'),
('Shanghai Jiao Tong University', 'Female'),
('Shanghai Jiao Tong University', 'Male'),
('Tsinghua University', 'Female'),
('Tsinghua University', 'Male')]
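To check that the two approaches agree, the key list from groups can be compared with the rows returned by drop_duplicates. This is a sketch on hypothetical data (toy, from_groups, and from_dedup are illustrative names); the comparison uses sets because the two results are generally ordered differently.

```python
import pandas as pd

# Hypothetical toy data, deliberately not in sorted order
toy = pd.DataFrame({
    'School': ['B', 'A', 'B', 'A'],
    'Gender': ['Male', 'Female', 'Male', 'Male'],
})

# Group categories taken from the groups dictionary
from_groups = list(toy.groupby(['School', 'Gender']).groups.keys())

# Group categories taken from drop_duplicates, as plain tuples in order of first appearance
dedup = toy[['School', 'Gender']].drop_duplicates()
from_dedup = list(dedup.itertuples(index=False, name=None))

# Same combinations, regardless of ordering
same_categories = set(from_groups) == set(from_dedup)
print(from_groups)
print(from_dedup)
print(same_categories)
```

With the default sort=True, groupby typically lists its keys in sorted order, which is why the exercise output is alphabetical while the drop_duplicates output follows the row order of the data.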