pandas group by
GroupBy with MultiIndex
In [24]: s
Out[24]:
first  second
bar    one      -0.575247
       two       0.254161
baz    one      -1.143704
       two       0.215897
foo    one       1.193555
       two      -0.077118
qux    one      -0.408530
       two      -0.862495
dtype: float64
In [25]: grouped = s.groupby(level=0)
In [26]: grouped.sum()
Out[26]:
first
bar   -0.321085
baz   -0.927807
foo    1.116437
qux   -1.271025
dtype: float64
In [27]: s.groupby(level=’second’).sum()
Out[27]:
second
one -0.933926
two -0.469555
dtype: float64
In [28]: s.sum(level=’second’)
Out[28]:
second
one -0.933926
two -0.469555
dtype: float64
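The session above does not show how s was built. The sketch below is one way to reconstruct it, assuming a two-level MultiIndex named first and second and reusing the values printed above. It uses only groupby(level=...), since the s.sum(level=...) spelling belongs to older pandas releases and, as far as I know, was removed in pandas 2.0.

```python
import pandas as pd

# Assumed reconstruction of `s`: the index layout and values are copied
# from the printed output above, not from the original setup code.
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
values = [-0.575247, 0.254161, -1.143704, 0.215897,
          1.193555, -0.077118, -0.408530, -0.862495]
s = pd.Series(values, index=index)

# Group by position of the index level ...
by_first = s.groupby(level=0).sum()
# ... or by the name of the index level.
by_second = s.groupby(level="second").sum()

print(by_first)
print(by_second)
```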
Iterating through groups
In [35]: grouped = df.groupby('A')
In [36]: for name, group in grouped:
   ....:     print(name)
   ....:     print(group)
   ....:
bar
     A      B         C         D
1  bar    one -0.042379 -0.089329
3  bar  three -0.009920 -0.945867
5  bar    two  0.495767  1.956030
foo
     A      B         C         D
0  foo    one -0.919854 -1.131345
2  foo    two  1.247642  0.337863
4  foo    two  0.290213 -0.932132
6  foo    one  0.362949  0.017587
7  foo  three  1.548106 -0.016692
In [37]: for name, group in df.groupby(['A', 'B']):
   ....:     print(name)
   ....:     print(group)
   ....:
('bar', 'one')
     A    B         C         D
1  bar  one -0.042379 -0.089329
('bar', 'three')
     A      B        C         D
3  bar  three -0.00992 -0.945867
('bar', 'two')
     A    B         C        D
5  bar  two  0.495767  1.95603
('foo', 'one')
     A    B         C         D
0  foo  one -0.919854 -1.131345
6  foo  one  0.362949  0.017587
('foo', 'three')
     A      B         C         D
7  foo  three  1.548106 -0.016692
('foo', 'two')
     A    B         C         D
2  foo  two  1.247642  0.337863
4  foo  two  0.290213 -0.932132
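The df used in the loops above is never defined in this excerpt. The sketch below rebuilds it from the printed rows (an assumption, not the original setup code) and shows that the group name is a plain label for one key and a tuple for a list of keys.

```python
import pandas as pd

# Assumed reconstruction of `df` from the rows printed above.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
    "D": [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692],
})

# One key: each name is a single label such as 'bar'.
single_key_names = [name for name, _ in df.groupby("A")]

# Two keys: each name is a tuple such as ('foo', 'one').
rows_per_pair = {name: len(group) for name, group in df.groupby(["A", "B"])}

print(single_key_names)
print(rows_per_pair)
```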
Aggregation
In [38]: grouped = df.groupby('A')
In [39]: grouped.aggregate(np.sum)
Out[39]:
            C         D
A
bar  0.443469  0.920834
foo  2.529056 -1.724719
In [40]: grouped = df.groupby(['A', 'B'])
In [41]: grouped.aggregate(np.sum)
Out[41]:
                  C         D
A   B
bar one   -0.042379 -0.089329
    three -0.009920 -0.945867
    two    0.495767  1.956030
foo one   -0.556905 -1.113758
    three  1.548106 -0.016692
    two    1.537855 -0.594269
In [42]: grouped = df.groupby(['A', 'B'], as_index=False)
In [43]: grouped.aggregate(np.sum)
Out[43]:
     A      B         C         D
0  bar    one -0.042379 -0.089329
1  bar  three -0.009920 -0.945867
2  bar    two  0.495767  1.956030
3  foo    one -0.556905 -1.113758
4  foo  three  1.548106 -0.016692
5  foo    two  1.537855 -0.594269
In [44]: df.groupby('A', as_index=False).sum()
Out[44]:
     A         C         D
0  bar  0.443469  0.920834
1  foo  2.529056 -1.724719
In [45]: df.groupby(['A', 'B']).sum().reset_index()
Out[45]:
     A      B         C         D
0  bar    one -0.042379 -0.089329
1  bar  three -0.009920 -0.945867
2  bar    two  0.495767  1.956030
3  foo    one -0.556905 -1.113758
4  foo  three  1.548106 -0.016692
5  foo    two  1.537855 -0.594269
In [46]: grouped.size()
Out[46]:
A    B
bar  one      1
     three    1
     two      1
foo  one      2
     three    1
     two      2
dtype: int64
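The aggregation calls above can be tried on a df rebuilt from the printed rows (an assumed reconstruction). The sketch selects the numeric columns C and D explicitly, because recent pandas versions, as far as I know, no longer drop non-numeric columns such as B silently when summing.

```python
import pandas as pd

# Assumed reconstruction of `df` from the rows printed earlier.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
    "D": [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692],
})

# Group keys become the index of the result.
totals = df.groupby("A")[["C", "D"]].sum()

# as_index=False keeps the group keys as ordinary columns ...
flat = df.groupby(["A", "B"], as_index=False)[["C", "D"]].sum()
# ... which gives the same columns as resetting the index afterwards.
same = df.groupby(["A", "B"])[["C", "D"]].sum().reset_index()

# size() counts the rows in each group.
sizes = df.groupby(["A", "B"]).size()

print(totals)
print(flat)
print(sizes)
```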
Applying multiple functions at once
In [48]: grouped = df.groupby('A')
In [49]: grouped['C'].agg([np.sum, np.mean, np.std])
Out[49]:
          sum      mean       std
A
bar  0.443469  0.147823  0.301765
foo  2.529056  0.505811  0.966450
In [50]: grouped['D'].agg({'result1' : np.sum,
   ....:                   'result2' : np.mean})
   ....:
Out[50]:
      result2   result1
A
bar  0.306945  0.920834
foo -0.344944 -1.724719
In [51]: grouped.agg([np.sum, np.mean, np.std])
Out[51]:
            C                             D
          sum      mean       std       sum      mean       std
A
bar  0.443469  0.147823  0.301765  0.920834  0.306945  1.490982
foo  2.529056  0.505811  0.966450 -1.724719 -0.344944  0.645875
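The dict form in In [50], which renames the outputs of a single column, comes from an older pandas release; to my knowledge it raises an error in pandas 1.0 and later. A rough equivalent that should work on current versions is keyword-based named aggregation, sketched here on an assumed reconstruction of the A, C and D columns.

```python
import pandas as pd

# Assumed reconstruction of the relevant columns of `df`.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
    "D": [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692],
})
grouped = df.groupby("A")

# A list of functions gives one output column per function.
stats_c = grouped["C"].agg(["sum", "mean", "std"])

# Named aggregation: keyword = output column name, value = function.
renamed_d = grouped["D"].agg(result1="sum", result2="mean")

print(stats_c)
print(renamed_d)
```

The string names "sum", "mean" and "std" stand in for np.sum, np.mean and np.std from the session; both spellings select the same group-wise reductions.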
Applying different functions to DataFrame columns
In [52]: grouped.agg({'C' : np.sum,
   ....:              'D' : lambda x: np.std(x, ddof=1)})
   ....:
Out[52]:
            C         D
A
bar  0.443469  1.490982
foo  2.529056  0.645875
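A self-contained version of the per-column dict follows, again on an assumed reconstruction of df. It also checks that the lambda with ddof=1 agrees with the built-in group-wise std(), which uses the same sample standard deviation by default.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the relevant columns of `df`.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
    "D": [-1.131345, -0.089329, 0.337863, -0.945867,
          -0.932132, 1.956030, 0.017587, -0.016692],
})

# Dict keys are column names, values are the function for that column.
result = df.groupby("A").agg({
    "C": "sum",
    "D": lambda x: np.std(x, ddof=1),
})

# Reference value: the built-in sample standard deviation per group.
builtin_std = df.groupby("A")["D"].std()

print(result)
```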
Transformation
In [56]: index = date_range('10/1/1999', periods=1100)
In [57]: ts = Series(np.random.normal(0.5, 2, 1100), index)
In [58]: ts = ts.rolling(window=100, min_periods=100).mean().dropna()
In [61]: key = lambda x: x.year
In [62]: zscore = lambda x: (x - x.mean()) / x.std()
In [63]: transformed = ts.groupby(key).transform(zscore)
In [70]: compare = DataFrame({'Original': ts, 'Transformed': transformed})
In [71]: compare.plot()
Out[71]: <matplotlib.axes._subplots.AxesSubplot at 0xa10a7e6c>
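The sketch below repeats the z-score transform without the plot and checks that each year's group ends up with mean 0 and standard deviation 1. The seeded generator and the use of ts.index.year as the grouping key are choices made for this illustration; the session uses np.random.normal and a lambda key.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (an assumption of this sketch).
rng = np.random.default_rng(0)
index = pd.date_range("1999-10-01", periods=1100)
ts = pd.Series(rng.normal(0.5, 2, 1100), index=index)

# 100-day rolling mean; the first 99 values are NaN and get dropped.
ts = ts.rolling(window=100, min_periods=100).mean().dropna()

def zscore(x):
    # Standardize one group: subtract its mean, divide by its std.
    return (x - x.mean()) / x.std()

# transform returns a result with the same index as the input.
transformed = ts.groupby(ts.index.year).transform(zscore)

# Per-year statistics after the transform.
group_means = transformed.groupby(transformed.index.year).mean()
group_stds = transformed.groupby(transformed.index.year).std()

print(group_means)
print(group_stds)
```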
Another common data transform is to replace missing data with the group mean.
In [77]: f = lambda x: x.fillna(x.mean())
In [78]: transformed = grouped.transform(f)
In [79]: grouped_trans = transformed.groupby(key)
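The grouped object in In [78] is not defined in this excerpt. As a minimal sketch of the same idea, the example below uses a small made-up frame (the column names key and value are hypothetical) and fills each missing value with the mean of its own group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in each group.
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
})

# x.mean() skips NaN, so each gap is filled with its group's mean.
fill_with_group_mean = lambda x: x.fillna(x.mean())
filled = df.groupby("key")["value"].transform(fill_with_group_mean)

print(filled)
```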
Flexible apply
In [109]: grouped = df.groupby('A')['C']
In [110]: def f(group):
   .....:     return DataFrame({'original' : group,
   .....:                       'demeaned' : group - group.mean()})
   .....:
In [111]: grouped.apply(f)
Out[111]:
   demeaned  original
0 -1.425665 -0.919854
1 -0.190202 -0.042379
2  0.741831  1.247642
3 -0.157743 -0.009920
4 -0.215598  0.290213
5  0.347944  0.495767
6 -0.142862  0.362949
7  1.042295  1.548106
In [112]: def f(x):
   .....:     return Series([ x, x**2 ], index = ['x', 'x^s'])
   .....:
In [113]: s
Out[113]:
0     9.0
1     8.0
2     7.0
3     5.0
4    19.0
5     1.0
6     4.2
7     3.3
dtype: float64
In [114]: s.apply(f)
Out[114]:
      x     x^s
0   9.0   81.00
1   8.0   64.00
2   7.0   49.00
3   5.0   25.00
4  19.0  361.00
5   1.0    1.00
6   4.2   17.64
7   3.3   10.89
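The group-wise apply from In [109] to In [111] is sketched below on an assumed reconstruction of columns A and C. The explicit group_keys=False and the final sort_index() are additions of this sketch: how apply labels and orders its result has changed between pandas versions, and these two steps aim to reproduce the flat 0 to 7 index printed above.

```python
import pandas as pd

# Assumed reconstruction of the relevant columns of `df`.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [-0.919854, -0.042379, 1.247642, -0.009920,
          0.290213, 0.495767, 0.362949, 1.548106],
})

def demean(group):
    # `group` is the Series of C values belonging to one value of A.
    return pd.DataFrame({"original": group,
                         "demeaned": group - group.mean()})

# group_keys=False keeps the original row labels instead of adding A
# as an extra index level; sort_index() restores the original row order.
result = df.groupby("A", group_keys=False)["C"].apply(demean).sort_index()

print(result)
```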
Grouping with a Grouper specification
In [123]: import datetime as DT
In [124]: df = DataFrame({
   .....:     'Branch' : 'A A A A A A A B'.split(),
   .....:     'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
   .....:     'Quantity': [1,3,5,1,8,1,9,3],
   .....:     'Date' : [
   .....:         DT.datetime(2013,1,1,13,0),
   .....:         DT.datetime(2013,1,1,13,5),
   .....:         DT.datetime(2013,10,1,20,0),
   .....:         DT.datetime(2013,10,2,10,0),
   .....:         DT.datetime(2013,10,1,20,0),
   .....:         DT.datetime(2013,10,2,10,0),
   .....:         DT.datetime(2013,12,2,12,0),
   .....:         DT.datetime(2013,12,2,14,0),
   .....:         ]})
In [125]: df
Out[125]:
  Branch Buyer                Date  Quantity
0      A  Carl 2013-01-01 13:00:00         1
1      A  Mark 2013-01-01 13:05:00         3
2      A  Carl 2013-10-01 20:00:00         5
3      A  Carl 2013-10-02 10:00:00         1
4      A   Joe 2013-10-01 20:00:00         8
5      A   Joe 2013-10-02 10:00:00         1
6      A   Joe 2013-12-02 12:00:00         9
7      B  Carl 2013-12-02 14:00:00         3
In [126]: df.groupby([pd.Grouper(freq='1M',key='Date'),'Buyer']).sum()
Out[126]:
                  Quantity
Date       Buyer
2013-01-31 Carl          1
           Mark          3
2013-10-31 Carl          6
           Joe           9
2013-12-31 Carl          3
           Joe           9
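A self-contained variant of the Grouper call follows. It differs from the session in two ways, both choices of this sketch: it uses the month-start alias 'MS' instead of '1M', because the spelling of the month-end alias has, as far as I know, changed in recent pandas releases, and it selects the Quantity column so the string column Branch is not summed. The group labels are therefore the first day of each month rather than the last.

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    "Branch": "A A A A A A A B".split(),
    "Buyer": "Carl Mark Carl Carl Joe Joe Joe Carl".split(),
    "Quantity": [1, 3, 5, 1, 8, 1, 9, 3],
    "Date": [
        DT.datetime(2013, 1, 1, 13, 0),
        DT.datetime(2013, 1, 1, 13, 5),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 10, 1, 20, 0),
        DT.datetime(2013, 10, 2, 10, 0),
        DT.datetime(2013, 12, 2, 12, 0),
        DT.datetime(2013, 12, 2, 14, 0),
    ],
})

# Bucket rows by calendar month of 'Date', then by buyer.
monthly = df.groupby(
    [pd.Grouper(key="Date", freq="MS"), "Buyer"]
)["Quantity"].sum()

print(monthly)
```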
Taking the nth row of each group
In [136]: df = DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])
In [137]: g = df.groupby('A')
In [138]: g.nth(0)
Out[138]:
    B
A
1 NaN
5   6
In [139]: g.nth(-1)
Out[139]:
   B
A
1  4
5  6
In [140]: g.nth(1)
Out[140]:
   B
A
1  4
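The nth calls can be reproduced as follows. The sketch reads only the B values, because the index of the result depends on the pandas version: the output above is indexed by A, while newer releases, to my knowledge, return the selected rows with their original labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# First row of each group; NaN is kept, nothing is skipped.
first_rows = g.nth(0)["B"].tolist()
# Last row of each group.
last_rows = g.nth(-1)["B"].tolist()
# Second row of each group; group 5 has only one row and drops out.
second_rows = g.nth(1)["B"].tolist()

print(first_rows)
print(last_rows)
print(second_rows)
```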