Pandas GroupBy Objects
阿新 • Published: 2018-12-10
Creating a GroupBy object
A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
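Parameters:
- by : mapping, function, str, or iterable
- axis : int, default 0
- level : int, level name, or sequence of such, default None (for a MultiIndex, the index level or levels to group by)
- as_index : boolean, default True (use the `by` columns as the index of the result)
- sort : boolean, default True (sort the group keys)
- group_keys : boolean, default True (when calling apply, add the group keys to the index to identify the pieces)
- squeeze : boolean, default False (reduce the dimensionality of the return type if possible)
Returns:
- GroupBy object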
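The parameters above are easier to read next to a small example. The following is a minimal sketch, assuming a made-up DataFrame with hypothetical columns `team`, `year` and `score`; it only exercises `by`, `as_index` and `level`. The `squeeze` parameter is left out because it was removed in later pandas releases.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "year": [2017, 2017, 2018, 2018, 2018],
    "score": [10, 20, 30, 40, 50],
})

# by: group a DataFrame by a column name
by_team = df.groupby("team")

# Series.groupby: group a Series by an external key of the same length
score_by_team = df["score"].groupby(df["team"])

# as_index=True (default): the group keys become the index of the result
totals_indexed = by_team["score"].sum()

# as_index=False: the group keys stay as an ordinary column
totals_flat = df.groupby("team", as_index=False)["score"].sum()

# level: group by one level of a MultiIndex instead of a column
indexed = df.set_index(["team", "year"])
by_year_level = indexed.groupby(level="year")["score"].sum()

print(totals_indexed)
print(totals_flat)
print(by_year_level)
```

Nothing is computed when `groupby()` is called; the work happens when an aggregation such as `sum()` is applied to the returned object.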
Indexing and iteration
Attribute | Description |
---|---|
GroupBy.groups | dict {group name -> group labels} |
GroupBy.indices | dict {group name -> group indices} |
pandas.Grouper | A Grouper allows the user to specify a groupby instruction for a target object |
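To make the difference between group labels and group indices concrete, here is a short sketch on a hypothetical DataFrame with a string index. It also iterates over the groups and uses `get_group()`, which belongs to the same family of accessors.

```python
import pandas as pd

# Hypothetical sample data with string row labels
df = pd.DataFrame(
    {"team": ["a", "b", "a", "b", "a"], "score": [10, 20, 30, 40, 50]},
    index=["r0", "r1", "r2", "r3", "r4"],
)
grouped = df.groupby("team")

# Iterating yields (group name, sub-DataFrame) pairs
names = [name for name, group in grouped]

# groups maps each group name to the row labels of its members
labels_a = list(grouped.groups["a"])

# indices maps each group name to the integer positions of its members
positions_a = list(grouped.indices["a"])

# get_group returns the rows of a single group
team_b = grouped.get_group("b")

# A Grouper object describes the same grouping instruction explicitly
by_key = df.groupby(pd.Grouper(key="team"))["score"].sum()

print(names, labels_a, positions_a)
print(team_b)
print(by_key)
```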
Function application
Function application is often used together with the NumPy library and lambda expressions.
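As an illustration of combining NumPy with lambdas, the sketch below uses the three usual entry points `agg`, `transform` and `apply` on a SeriesGroupBy. The data and the particular functions are assumptions chosen for the example, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "score": [10.0, 20.0, 30.0, 40.0],
})
scores = df.groupby("team")["score"]

# agg: reduce each group to a single value (here the max - min range)
spread = scores.agg(lambda s: np.ptp(s.to_numpy()))

# transform: return a result aligned with the original rows
centered = scores.transform(lambda s: s - s.mean())

# apply: run an arbitrary function on each group
log_total = scores.apply(lambda s: np.log(s).sum())

print(spread)
print(centered)
print(log_total)
```

`agg` and `apply` return one entry per group here, while `transform` keeps the original index, which makes it convenient for adding a derived column back to the DataFrame.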
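Descriptive statistics
Functions common to DataFrame and Series
Function | Description |
---|---|
Statistical functions | |
GroupBy.sum() | Sum of each group |
GroupBy.ohlc() | Compute open, high, low and close values of a group, excluding missing values |
GroupBy.cumcount([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
GroupBy.mean(*args, **kwargs) | Mean of each group, excluding missing values |
GroupBy.prod() | Compute prod of group values |
GroupBy.var([ddof]) | Variance of each group, excluding missing values |
GroupBy.std([ddof]) | Standard deviation of each group, excluding missing values |
GroupBy.sem([ddof]) | Standard error of the mean of each group, excluding missing values |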
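Descriptive functions | |
GroupBy.size() | Size of each group (missing values included) |
GroupBy.count() | Number of elements in each group, excluding missing values |
GroupBy.max() | Maximum of each group |
GroupBy.min() | Minimum of each group |
GroupBy.median() | Median of each group |
Indexing functions | |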
GroupBy.first() | Compute first of group values |
GroupBy.head([n]) | Returns first n rows of each group. |
GroupBy.last() | Compute last of group values |
GroupBy.tail([n]) | Returns last n rows of each group |
GroupBy.nth(n[, dropna]) | The n-th row of each group |
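Several of these functions differ only in how they treat missing values, which is easiest to see on data that contains a NaN. The following sketch uses a hypothetical DataFrame; the values in the comments follow from that data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in group "a"
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "score": [1.0, np.nan, 3.0, 4.0, 6.0],
})
scores = df.groupby("team")["score"]

sums = scores.sum()        # a: 4.0, b: 10.0 (NaN is skipped)
means = scores.mean()      # a: 2.0, b: 5.0
variances = scores.var()   # a: 2.0, b: 2.0 (ddof=1 by default)

sizes = scores.size()      # a: 3, b: 2 (counts every row)
counts = scores.count()    # a: 2, b: 2 (counts non-missing values only)

firsts = scores.first()    # first non-missing value of each group

# head(1) keeps the first row of each group with its original index
heads = df.groupby("team").head(1)

# cumcount numbers the rows inside each group starting from 0
order = df.groupby("team").cumcount()

print(sizes, counts, sep="\n")
print(heads)
print(order.tolist())
```

Note that `head()` and `cumcount()` return results indexed like the original rows, whereas the aggregations return one entry per group.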
Functions that differ between DataFrame and Series
Function | Description |
---|---|
DataFrameGroupBy.agg(arg, *args, **kwargs) | Aggregate using input function or dict of {column -> function} |
DataFrameGroupBy.all([axis, bool_only, …]) | Return whether all elements are True over requested axis |
DataFrameGroupBy.any([axis, bool_only, …]) | Return whether any element is True over requested axis |
DataFrameGroupBy.bfill([limit]) | Backward fill the values |
DataFrameGroupBy.corr([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values |
DataFrameGroupBy.count() | Compute count of group, excluding missing values |
DataFrameGroupBy.cov([min_periods]) | Compute pairwise covariance of columns, excluding NA/null values |
DataFrameGroupBy.cummax([axis, skipna]) | Return cumulative max over requested axis. |
DataFrameGroupBy.cummin([axis, skipna]) | Return cumulative minimum over requested axis. |
DataFrameGroupBy.cumprod([axis]) | Cumulative product for each group |
DataFrameGroupBy.cumsum([axis]) | Cumulative sum for each group |
DataFrameGroupBy.describe([percentiles, …]) | Generate various summary statistics, excluding NaN values. |
DataFrameGroupBy.diff([periods, axis]) | 1st discrete difference of object |
DataFrameGroupBy.ffill([limit]) | Forward fill the values |
DataFrameGroupBy.fillna([value, method, …]) | Fill NA/NaN values using the specified method |
DataFrameGroupBy.hist(data[, column, by, …]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
DataFrameGroupBy.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
DataFrameGroupBy.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
DataFrameGroupBy.mad([axis, skipna, level]) | Return the mean absolute deviation of the values for the requested axis |
DataFrameGroupBy.pct_change([periods, …]) | Percent change over given number of periods. |
DataFrameGroupBy.plot | Class implementing the .plot attribute for groupby objects |
DataFrameGroupBy.quantile([q, axis, …]) | Return values at the given quantile over requested axis, a la numpy.percentile. |
DataFrameGroupBy.rank([axis, method, …]) | Compute numerical data ranks (1 through n) along axis. |
DataFrameGroupBy.resample(rule, *args, **kwargs) | Provide resampling when using a TimeGrouper |
DataFrameGroupBy.shift([periods, freq, axis]) | Shift each group by periods observations |
DataFrameGroupBy.size() | Compute group sizes |
DataFrameGroupBy.skew([axis, skipna, level, …]) | Return unbiased skew over requested axis |
DataFrameGroupBy.take(indices[, axis, …]) | Analogous to ndarray.take |
DataFrameGroupBy.tshift([periods, freq, axis]) | Shift the time index, using the index’s frequency if available. |
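Many of the functions in this table work within each group but return a result aligned with the original rows. The sketch below shows this for `cumsum`, `shift`, `rank` and `ffill` on a hypothetical DataFrame. It deliberately avoids `mad` and `tshift`, which were removed in later pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "reading" contains missing values
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "a"],
    "score": [1.0, 2.0, 10.0, 30.0, 4.0],
    "reading": [5.0, np.nan, np.nan, 7.0, np.nan],
})
grouped = df.groupby("team")

# Running total inside each group
running = grouped.cumsum()

# Previous value inside each group (NaN for the first row of a group)
previous = grouped.shift(1)

# Rank of each value inside its group, starting from 1
ranks = grouped.rank()

# Forward fill never carries a value from one group into another
filled = grouped.ffill()

print(running["score"].tolist())
print(previous["score"].tolist())
print(ranks["score"].tolist())
print(filled["reading"].tolist())
```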
Functions available only for Series
Function | Description |
---|---|
SeriesGroupBy.nlargest(*args, **kwargs) | Return the largest n elements. |
SeriesGroupBy.nsmallest(*args, **kwargs) | Return the smallest n elements. |
SeriesGroupBy.nunique([dropna]) | Returns number of unique elements in the group |
SeriesGroupBy.unique() | Return np.ndarray of unique values in the object. |
SeriesGroupBy.value_counts([normalize, …]) |
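A SeriesGroupBy is what you get after selecting a single column from a grouped DataFrame. The following sketch, on hypothetical data, shows the Series-only functions from the table; `value_counts`, which has no description above, counts how often each value occurs within each group.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "city": ["x", "y", "x", "z", "z"],
    "score": [5, 9, 7, 4, 8],
})

# The two largest scores of each team, indexed by (team, original row label)
top2 = df.groupby("team")["score"].nlargest(2)

# Number of distinct cities per team
n_cities = df.groupby("team")["city"].nunique()

# The distinct cities themselves, one array per team
cities = df.groupby("team")["city"].unique()

# Occurrences of each city within each team, indexed by (team, city)
city_counts = df.groupby("team")["city"].value_counts()

print(top2)
print(n_cities)
print(cities)
print(city_counts)
```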
Functions available only for DataFrame
Function | Description |
---|---|
DataFrameGroupBy.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame objects. |
DataFrameGroupBy.boxplot(grouped[, …]) | Make box plots from DataFrameGroupBy data. |