pandas group by

阿新 • Published: 2019-01-17

GroupBy with MultiIndex

In [24]: s
Out[24]:
    first second
bar one -0.575247
    two 0.254161
baz one -1.143704
    two 0.215897
foo one 1.193555
    two -0.077118
qux one -0.408530
    two -0.862495
dtype: float64
In [25]: grouped = s.groupby(level=0)
In [26]: grouped.sum()
Out[26]:
first
bar -0.321085
baz -0.927807
foo 1.116437
qux -1.271025
dtype: float64
In [27]: s.groupby(level='second').sum()
Out[27]:
second
one -0.933926
two -0.469555
dtype: float64
In [28]: s.sum(level='second')  # pandas < 2.0 only; later versions need s.groupby(level='second').sum()
Out[28]:
second
one -0.933926
two -0.469555
dtype: float64
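
The session above starts from a Series `s` that already exists. As a minimal, self-contained sketch, the same two groupings can be reproduced on a small MultiIndex Series. The values here are made up for easy arithmetic (an assumption, not the random numbers printed above).

```python
import pandas as pd

# Hypothetical data: round numbers instead of the random values in the session.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Group by the position of an index level ...
by_first = s.groupby(level=0).sum()
# ... or by the name of an index level.
by_second = s.groupby(level="second").sum()

print(by_first)   # bar 3.0, baz 7.0
print(by_second)  # one 4.0, two 6.0
```

Grouping by level name tends to be easier to read than grouping by position, and it keeps working if the levels are later reordered.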

Iterating through groups

In [35]: grouped = df.groupby('A')
In [36]: for name, group in grouped:
   ....:     print(name)
   ....:     print(group)
   ....:
bar
     A      B         C         D
1  bar    one -0.042379 -0.089329
3  bar  three -0.009920 -0.945867
5  bar    two  0.495767  1.956030
foo
     A      B         C         D
0  foo    one -0.919854 -1.131345
2  foo    two  1.247642  0.337863
4  foo    two  0.290213 -0.932132
6  foo    one  0.362949  0.017587
7  foo  three  1.548106 -0.016692
In [37]: for name, group in df.groupby(['A', 'B']):
   ....:     print(name)
   ....:     print(group)
   ....:
('bar', 'one')
     A    B         C         D
1  bar  one -0.042379 -0.089329
('bar', 'three')
     A      B        C         D
3  bar  three -0.00992 -0.945867
('bar', 'two')
     A    B         C        D
5  bar  two  0.495767  1.95603
('foo', 'one')
     A    B         C         D
0  foo  one -0.919854 -1.131345
6  foo  one  0.362949  0.017587
('foo', 'three')
     A      B         C         D
7  foo  three  1.548106 -0.016692
('foo', 'two')
     A    B         C         D
2  foo  two  1.247642  0.337863
4  foo  two  0.290213 -0.932132
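
Iteration yields `(name, group)` pairs, where `name` is a scalar for a single key and a tuple for a list of keys. A small sketch of both cases follows. The frame is a made-up four-row example (an assumption), and `get_group` is shown as a way to fetch one group without looping.

```python
import pandas as pd

# Hypothetical four-row frame, smaller than the df used in the session.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Single key: each iteration yields (key, sub-frame).
sizes = {name: len(group) for name, group in df.groupby("A")}

# Two keys: the group name is a tuple of key values, in sorted key order.
names = [name for name, _ in df.groupby(["A", "B"])]

# get_group returns the rows of one group directly.
bar_rows = df.groupby("A").get_group("bar")

print(sizes)
print(names)
print(bar_rows)
```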

Aggregation

In [38]: grouped = df.groupby('A')
In [39]: grouped.aggregate(np.sum)
Out[39]:
    C        D
A
bar 0.443469 0.920834
foo 2.529056 -1.724719
In [40]: grouped = df.groupby(['A', 'B'])
In [41]: grouped.aggregate(np.sum)
Out[41]:
          C         D
A   B
bar one   -0.042379 -0.089329
    three -0.009920 -0.945867
    two   0.495767 1.956030
foo one   -0.556905 -1.113758
    three 1.548106 -0.016692
    two   1.537855 -0.594269
In [42]: grouped = df.groupby(['A', 'B'], as_index=False)
In [43]: grouped.aggregate(np.sum)
Out[43]:
  A   B     C         D
0 bar one   -0.042379 -0.089329
1 bar three -0.009920 -0.945867
2 bar two   0.495767 1.956030
3 foo one   -0.556905 -1.113758
4 foo three 1.548106 -0.016692
5 foo two   1.537855 -0.594269
In [44]: df.groupby('A', as_index=False).sum()
Out[44]:
  A   C        D
0 bar 0.443469 0.920834
1 foo 2.529056 -1.724719
In [45]: df.groupby(['A', 'B']).sum().reset_index()
Out[45]:
  A   B     C         D
0 bar one   -0.042379 -0.089329
1 bar three -0.009920 -0.945867
2 bar two   0.495767 1.956030
3 foo one   -0.556905 -1.113758
4 foo three 1.548106 -0.016692
5 foo two   1.537855 -0.594269
In [46]: grouped.size()
Out[46]:
A   B
bar one   1
    three 1
    two   1
foo one   2
    three 1
    two   2
dtype: int64
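
The difference between the default result, `as_index=False`, and `reset_index()` is easiest to see side by side. The sketch below uses a made-up, all-numeric frame (an assumption), so the sums do not depend on how a given pandas version treats string columns.

```python
import pandas as pd

# Hypothetical frame with only numeric value columns.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})

indexed = df.groupby("A").sum()                # 'A' becomes the index
flat = df.groupby("A", as_index=False).sum()   # 'A' stays an ordinary column
same = df.groupby("A").sum().reset_index()     # equivalent flat layout
counts = df.groupby("A").size()                # number of rows per group

print(indexed)
print(flat)
print(counts)
```

`as_index=False` avoids the extra `reset_index()` call when the result is going to be merged or exported as a flat table.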

Applying multiple functions at once

In [48]: grouped = df.groupby('A')
In [49]: grouped['C'].agg([np.sum, np.mean, np.std])
Out[49]:
    sum      mean     std
A
bar 0.443469 0.147823 0.301765
foo 2.529056 0.505811 0.966450
In [50]: grouped['D'].agg({'result1': np.sum,
   ....:                   'result2': np.mean})  # dict renaming: pandas < 1.0 only
   ....:
Out[50]:
    result2   result1
A
bar 0.306945  0.920834
foo -0.344944 -1.724719
In [51]: grouped.agg([np.sum, np.mean, np.std])
Out[51]:
            C                             D
          sum      mean       std       sum      mean       std
A
bar  0.443469  0.147823  0.301765  0.920834  0.306945  1.490982
foo  2.529056  0.505811  0.966450 -1.724719 -0.344944  0.645875
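
The calls above come from an old pandas release. A hedged sketch of how the same ideas look with string function names and keyword-style named aggregation (available since pandas 0.25) is given below. The frame is a made-up example, not the `df` from the session.

```python
import pandas as pd

# Hypothetical frame with round numbers.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})
grouped = df.groupby("A")

# A list of function names gives one output column per function.
stats = grouped["C"].agg(["sum", "mean", "std"])

# Named aggregation: the keyword is the output column name.
named = grouped["D"].agg(result1="sum", result2="mean")

print(stats)
print(named)
```

With named aggregation the output columns come out in the order the keywords were written, which avoids the shuffled `result2, result1` order seen in the old dict-based output.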

Applying different functions to DataFrame columns

In [52]: grouped.agg({'C': np.sum,
   ....:              'D': lambda x: np.std(x, ddof=1)})
   ....:
Out[52]:
    C        D
A
bar 0.443469 1.490982
foo 2.529056 0.645875
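
Passing a dict to `agg` on a DataFrame groupby maps each column to its own function. A minimal sketch on a made-up frame (an assumption), using a string name for one column and a lambda for the other:

```python
import pandas as pd

# Hypothetical frame with round numbers.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})

# Keys are column names, values are the function applied to that column.
result = df.groupby("A").agg({
    "C": "sum",
    "D": lambda x: x.std(ddof=1),  # sample standard deviation
})

print(result)
```

Columns that are not listed in the dict are left out of the result.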

Transformation

In [56]: index = pd.date_range('10/1/1999', periods=1100)
In [57]: ts = pd.Series(np.random.normal(0.5, 2, 1100), index)
In [58]: ts = ts.rolling(window=100, min_periods=100).mean().dropna()
In [61]: key = lambda x: x.year
In [62]: zscore = lambda x: (x - x.mean()) / x.std()
In [63]: transformed = ts.groupby(key).transform(zscore)
In [70]: compare = pd.DataFrame({'Original': ts, 'Transformed': transformed})
In [71]: compare.plot()
Out[71]: <matplotlib.axes._subplots.AxesSubplot at 0xa10a7e6c>
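
What `transform` guarantees is that the result has the same index as the input, with the function applied within each group. The sketch below checks this for a per-year z-score. The seed, the series length and the omission of the rolling mean are choices made here for brevity, not part of the original session.

```python
import numpy as np
import pandas as pd

# Hypothetical series: two years of daily values with a fixed seed.
rng = np.random.default_rng(0)
index = pd.date_range("1999-10-01", periods=730)
ts = pd.Series(rng.normal(0.5, 2, len(index)), index=index)

zscore = lambda x: (x - x.mean()) / x.std()

# The key function is called on each index label (a Timestamp).
transformed = ts.groupby(lambda d: d.year).transform(zscore)

# Within each year the transformed values should have mean ~0 and std ~1.
per_year = transformed.groupby(lambda d: d.year).agg(["mean", "std"])
print(per_year)
```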

Another common data transform is to replace missing data with the group mean.

In [77]: f = lambda x: x.fillna(x.mean())
In [78]: transformed = grouped.transform(f)
In [79]: grouped_trans = transformed.groupby(key)
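
The snippet above relies on a `grouped` object defined elsewhere. As a self-contained sketch of the same idea, the example below fills missing values with the mean of their own group. The column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value per country.
df = pd.DataFrame({
    "country": ["US", "US", "US", "UK", "UK"],
    "value": [1.0, np.nan, 3.0, 10.0, np.nan],
})

# Each NaN is replaced by the mean of the non-missing values in its group.
filled = df.groupby("country")["value"].transform(lambda x: x.fillna(x.mean()))

print(filled.tolist())  # [1.0, 2.0, 3.0, 10.0, 10.0]
```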

Flexible apply

In [109]: grouped = df.groupby('A')['C']
In [110]: def f(group):
   .....:     return pd.DataFrame({'original': group,
   .....:                          'demeaned': group - group.mean()})
   .....:
In [111]: grouped.apply(f)
Out[111]:
  demeaned original
0 -1.425665 -0.919854
1 -0.190202 -0.042379
2 0.741831 1.247642
3 -0.157743 -0.009920
4 -0.215598 0.290213
5 0.347944 0.495767
6 -0.142862 0.362949
7 1.042295 1.548106
In [112]: def f(x):
   .....:     return pd.Series([x, x**2], index=['x', 'x^s'])
   .....:
In [113]: s
Out[113]:
0 9.0
1 8.0
2 7.0
3 5.0
4 19.0
5 1.0
6 4.2
7 3.3
dtype: float64
In [114]: s.apply(f)
Out[114]:
  x   x^s
0 9.0 81.00
1 8.0 64.00
2 7.0 49.00
3 5.0 25.00
4 19.0 361.00
5 1.0 1.00
6 4.2 17.64
7 3.3 10.89
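
`apply` accepts a function that returns an object of any shape, which is what makes it more flexible than `agg` or `transform`. Below is a minimal sketch of the demeaning example on made-up data. `group_keys=False` is passed explicitly here (a choice made for this sketch) so that the result keeps the original row labels regardless of the default in the installed pandas version.

```python
import pandas as pd

# Hypothetical frame: group means are 2.0 for foo and 4.0 for bar.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 6.0],
})

def demean(group):
    # Return one row per input row, with the group mean subtracted.
    return pd.DataFrame({
        "original": group,
        "demeaned": group - group.mean(),
    })

result = df.groupby("A", group_keys=False)["C"].apply(demean)
print(result)
```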

Grouping with a Grouper specification

In [123]: import datetime as DT
In [124]: df = pd.DataFrame({
   .....:     'Branch': 'A A A A A A A B'.split(),
   .....:     'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(),
   .....:     'Quantity': [1, 3, 5, 1, 8, 1, 9, 3],
   .....:     'Date': [
   .....:         DT.datetime(2013, 1, 1, 13, 0),
   .....:         DT.datetime(2013, 1, 1, 13, 5),
   .....:         DT.datetime(2013, 10, 1, 20, 0),
   .....:         DT.datetime(2013, 10, 2, 10, 0),
   .....:         DT.datetime(2013, 10, 1, 20, 0),
   .....:         DT.datetime(2013, 10, 2, 10, 0),
   .....:         DT.datetime(2013, 12, 2, 12, 0),
   .....:         DT.datetime(2013, 12, 2, 14, 0),
   .....:     ]})
In [125]: df
Out[125]:
  Branch Buyer Date                Quantity
0 A      Carl  2013-01-01 13:00:00 1
1 A      Mark  2013-01-01 13:05:00 3
2 A      Carl  2013-10-01 20:00:00 5
3 A      Carl  2013-10-02 10:00:00 1
4 A      Joe   2013-10-01 20:00:00 8
5 A      Joe   2013-10-02 10:00:00 1
6 A      Joe   2013-12-02 12:00:00 9
7 B      Carl  2013-12-02 14:00:00 3
In [126]: df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer']).sum()
Out[126]:
                 Quantity
Date       Buyer
2013-01-31 Carl  1
           Mark  3
2013-10-31 Carl  6
           Joe   9
2013-12-31 Carl  3
           Joe   9
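
`pd.Grouper` lets a datetime column be binned by frequency while still being combined with ordinary keys. The sketch below uses a reduced, made-up version of the table and the month-start alias `'MS'` rather than the month-end alias. That substitution is deliberate: the month-end alias was renamed from `'M'` to `'ME'` in pandas 2.2, while `'MS'` is the same across versions. The group labels are therefore the first day of each month.

```python
import pandas as pd

# Hypothetical, shortened version of the purchases table.
df = pd.DataFrame({
    "Buyer": ["Carl", "Mark", "Carl", "Joe"],
    "Quantity": [1, 3, 5, 8],
    "Date": pd.to_datetime([
        "2013-01-01 13:00",
        "2013-01-01 13:05",
        "2013-10-01 20:00",
        "2013-10-02 10:00",
    ]),
})

# Bin 'Date' into calendar months, then split each month by buyer.
monthly = df.groupby(
    [pd.Grouper(key="Date", freq="MS"), "Buyer"]
)["Quantity"].sum()

print(monthly)
```

Selecting `["Quantity"]` before summing keeps non-numeric columns out of the aggregation.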

Taking the nth row of each group

In [136]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])
In [137]: g = df.groupby('A')
In [138]: g.nth(0)
Out[138]:
  B
A
1 NaN
5 6
In [139]: g.nth(-1)
Out[139]:
  B
A
1 4
5 6
In [140]: g.nth(1)
Out[140]:
  B
A
1 4
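
A short sketch of the same `nth` calls as a standalone script. One caveat, stated as far as I know: from pandas 2.0 `nth` keeps the original row labels instead of indexing the result by the group key, so the printed index may differ from the output above. Selecting column `B` and comparing values sidesteps that difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

first = g.nth(0)["B"]    # first row of each group, NaN is not skipped
last = g.nth(-1)["B"]    # negative positions count from the end
second = g.nth(1)["B"]   # groups with fewer than two rows contribute nothing

print(first.tolist())
print(last.tolist())
print(second.tolist())
```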
