python pandas 聚合與分組函式
阿新 • • 發佈:2019-01-08
1 主要內容
- DataFrame.groupby().sum()
- DataFrame.groupby().agg()
- pandas.concat([DataFrame1,DataFrame2])
- pandas.merge(DataFrame1,DataFrame2,parameters….)
- DataFrame1.join(DataFrame2,lsuffix=’列名 on DataFrame1’,rsuffix=’列名 on DataFrame2’)
- 幫助文件的獲取
2 例項
- 構造dataframe如下所示:
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
3 chocolate 3 6 5.247564 4 hot
4 iceream 6 3 4.319450 4 cold
5 iceream 5 4 2.912291 1 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
實現程式如下所示:
import pandas as pd
from numpy import random
from numpy.random import rand
import numpy as np
random.seed(42)
df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10),
'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'],
'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'],
'price':10 * rand(10),'number':random.randint(1,9,10)})
print df
2 groupby函式應用
程式碼
groupby1 = df.groupby(['user_id'])
i = 0
for user_id,group in groupby1:
i = i + 1
print "group", i , user_id
print group
結果
group 1 1
food food_id number price user_id weather
5 iceream 5 4 2.912291 1 cold
group 2 2
food food_id number price user_id weather
2 iceream 8 7 3.042422 2 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
group 3 3
food food_id number price user_id weather
0 soup 4 6 1.81825 3 cold
group 4 4
food food_id number price user_id weather
1 soup 8 6 1.834045 4 hot
3 chocolate 3 6 5.247564 4 hot
4 iceream 6 3 4.319450 4 cold
9 chocolate 2 1 3.663618 4 hot
3 groupby和sum等函式結合使用
程式碼
print groupby1.sum()#對除了groupby索引以外的每個數值列進行求和
print groupby1['food_id','number'].sum()#對除了groupby索引以外的特定數值列進行求和
print df.groupby(['user_id'],as_index=False).sum()#預設as_index=True
#當然除了sum,還有mean,min,max,median,mode,std,mad等等,操作方法同理
#groupby()中的形參可用help(df.groupby)來檢視
#常用的引數axis=0,表示對行進行操作,即指定列中不同值進行分組;axis=1,表示對列進行分組
結果
output[1]:
food_id number price
user_id
1 5 4 2.912291
2 24 26 13.477336
3 4 6 1.818250
4 19 16 15.064678
output[2]:
food_id number
user_id
1 5 4
2 24 26
3 4 6
4 19 16
output[3]:
user_id food_id number price
0 1 5 4 2.912291
1 2 24 26 13.477336
2 3 4 6 1.818250
3 4 19 16 15.064678
4 agg函式
程式碼
print df.groupby(['weather','food']).agg([np.mean,np.median])
結果
output[4]:
food_id number price \
mean median mean median mean median
weather food
cold iceream 5.250000 5.5 5.25 5.5 4.098173 3.680936
soup 4.000000 4.0 6.00 6.0 1.818250 1.818250
hot chocolate 2.500000 2.5 3.50 3.5 4.455591 4.455591
soup 7.333333 8.0 6.00 6.0 2.050143 1.834045
user_id
mean median
weather food
cold iceream 2.250000 2
soup 3.000000 3
hot chocolate 4.000000 4
soup 2.666667 2
5 concat()
程式碼
print "df :3\n",df[:3]
print "df :4\n",df[6:]
print pd.concat([df[:3],df[6:]],axis=0)
結果
df :3
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
df :4
food food_id number price user_id weather
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
df.concat
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
6 merge()和join()
程式碼
df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})
df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})
print "df1\n",df1
print "df2\n",df2
print "Merge() on Key\n",pd.merge(df1,df2,on='EmpNr')
print "inner join with Merge()\n",pd.merge(df1,df2,how='inner')
print "Dests join tips\n",df1.join(df2,lsuffix='Dest',rsuffix='Tips')
結果
df1
Dest EmpNr
0 The Hague 5
1 Amsterdam 3
2 Rotterdam 9
df2
Amount EmpNr
0 10.0 5
1 5.0 9
2 2.5 7
Merge() on Key
Dest EmpNr Amount
0 The Hague 5 10.0
1 Rotterdam 9 5.0
inner join with Merge()
Dest EmpNr Amount
0 The Hague 5 10.0
1 Rotterdam 9 5.0
Dests join tips
Dest EmpNrDest Amount EmpNrTips
0 The Hague 5 10.0 5
1 Amsterdam 3 5.0 9
2 Rotterdam 9 2.5 7
6幫助文件獲取方式
1.help(pd.concat)
2.dir(pd.concat)
3.pd.concat?
...
7 參考文獻
利用python進行資料分析筆記
python資料分析,Ivan Idris著