Pandas GroupBy Objects

Creating a GroupBy object

A GroupBy object is created by calling pandas.DataFrame.groupby() or pandas.Series.groupby().

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

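Parameters:

  1. by : mapping, function, str, or iterable
  2. axis : int, default 0
  3. level : int, level name, or sequence of such, default None (with a MultiIndex, the index level or levels to group by)
  4. as_index : boolean, default True (use the by keys as the index of the result)
  5. sort : boolean, default True (sort the group keys)
  6. group_keys : boolean, default True (when calling apply, add the group keys to the index to identify the pieces)
  7. squeeze : boolean, default False (reduce the dimensionality of the return type if possible; deprecated in pandas 1.1 and removed in 2.0)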

Returns:

  1. GroupBy object
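
The calls above can be sketched as follows. This is a minimal illustration, not part of the original reference: the sample data and the column names team and score are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [1, 2, 3, 4, 5],
})

# Group a DataFrame by one column. With the default as_index=True,
# the group keys become the index of the aggregated result.
by_team = df.groupby("team")
totals = by_team["score"].sum()

# as_index=False keeps the grouping column as an ordinary column.
totals_flat = df.groupby("team", as_index=False)["score"].sum()

# A Series can be grouped by another Series that shares its index.
series_totals = df["score"].groupby(df["team"]).sum()

print(totals)
print(totals_flat)
```

Nothing is computed when groupby() is called; the work happens when an aggregation such as sum() is applied to the returned object.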

Indexing and iteration

Attribute           Description
GroupBy.groups      dict {group name -> group labels}
GroupBy.indices     dict {group name -> group indices}
pandas.Grouper      A Grouper allows the user to specify a groupby instruction for a target object
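
A short sketch of how these attributes behave, under the assumption of a small made-up frame with a string index (so that labels and integer positions are visibly different). The data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with a non-default index so that labels != positions.
df = pd.DataFrame(
    {"team": ["a", "b", "a", "b", "a"], "score": [1, 2, 3, 4, 5]},
    index=list("vwxyz"),
)
grouped = df.groupby("team")

# .groups maps each group name to the index labels of its rows.
labels = {name: list(idx) for name, idx in grouped.groups.items()}

# .indices maps each group name to the integer positions of its rows.
positions = {name: pos.tolist() for name, pos in grouped.indices.items()}

# Iterating over a GroupBy yields (group name, sub-frame) pairs.
sizes = {name: len(frame) for name, frame in grouped}

# get_group() pulls out a single group as a DataFrame.
team_a = grouped.get_group("a")

# pd.Grouper expresses the same grouping instruction as an object.
via_grouper = df.groupby(pd.Grouper(key="team"))["score"].sum()

print(labels)
print(positions)
```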

Function application

Function application is often combined with NumPy functions and lambda expressions.
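
As an illustration of that combination, the sketch below passes lambdas that wrap NumPy calls to agg(), transform() and apply(). The data, the column names and the choice of statistics are assumptions made for the example, not something the original text prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
})
scores = df.groupby("team")["score"]

# agg(): reduce each group to one value (here the range, via NumPy).
spread = scores.agg(lambda s: np.ptp(s.to_numpy()))

# transform(): return a result aligned with the original rows
# (here each score minus its group mean).
centered = scores.transform(lambda s: s - s.mean())

# apply(): the most general form; the function receives each group
# and may return a scalar, a Series or a DataFrame.
medians = scores.apply(lambda s: float(np.median(s.to_numpy())))

print(spread)
print(centered)
```

agg() is the natural choice for one value per group, transform() for a same-shaped result; apply() is more flexible but usually slower.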

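Descriptive statistics

Functions common to DataFrame and Series

Function                        Description
Statistical functions
GroupBy.sum()                   Compute the sum of each group
GroupBy.ohlc()                  Compute open, high, low and close values of each group, excluding missing values
GroupBy.cumcount([ascending])   Number each item in each group from 0 to the length of that group - 1
GroupBy.mean(*args, **kwargs)   Compute the mean of each group, excluding missing values
GroupBy.prod()                  Compute the product of group values
GroupBy.var([ddof])             Compute the variance of each group, excluding missing values
GroupBy.std([ddof])             Compute the standard deviation of each group, excluding missing values
GroupBy.sem([ddof])             Compute the standard error of the mean of each group, excluding missing values
Descriptive functions
GroupBy.size()                  Compute group sizes
GroupBy.count()                 Count the elements in each group, excluding missing values
GroupBy.max()                   Compute the maximum of each group
GroupBy.min()                   Compute the minimum of each group
GroupBy.median()                Compute the median of each group
Indexing functions
GroupBy.first()                 Compute first of group values
GroupBy.head([n])               Return the first n rows of each group
GroupBy.last()                  Compute last of group values
GroupBy.tail([n])               Return the last n rows of each group
GroupBy.nth(n[, dropna])        Take the nth row of each group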
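
A few of these functions are easy to confuse, so the sketch below contrasts them on a made-up frame that contains one missing value. The data and column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data; group "a" starts with a missing score.
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "score": [np.nan, 1.0, 3.0, 4.0, 6.0],
})
grouped = df.groupby("team")
scores = grouped["score"]

size = scores.size()     # rows per group, missing values included
count = scores.count()   # rows per group, missing values excluded
total = scores.sum()     # missing values are skipped
mean = scores.mean()     # missing values are skipped

# first() returns the first non-null value of each group, while
# head(1) returns the first row of each group as it is stored.
first = scores.first()
head = grouped.head(1)

# cumcount() numbers the rows inside each group, starting at 0.
position = grouped.cumcount()

# ohlc() returns open/high/low/close columns per group.
ohlc = scores.ohlc()

print(size, count, sep="\n")
print(ohlc)
```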

Functions that behave differently for DataFrame and Series

Function                                            Description
DataFrameGroupBy.agg(arg, *args, **kwargs)          Aggregate using input function or dict of {column -> function}
DataFrameGroupBy.all([axis, bool_only, …])          Return whether all elements are True over requested axis
DataFrameGroupBy.any([axis, bool_only, …])          Return whether any element is True over requested axis
DataFrameGroupBy.bfill([limit])                     Backward fill the values
DataFrameGroupBy.corr([method, min_periods])        Compute pairwise correlation of columns, excluding NA/null values
DataFrameGroupBy.count()                            Compute count of group, excluding missing values
DataFrameGroupBy.cov([min_periods])                 Compute pairwise covariance of columns, excluding NA/null values
DataFrameGroupBy.cummax([axis, skipna])             Return cumulative max over requested axis
DataFrameGroupBy.cummin([axis, skipna])             Return cumulative minimum over requested axis
DataFrameGroupBy.cumprod([axis])                    Cumulative product for each group
DataFrameGroupBy.cumsum([axis])                     Cumulative sum for each group
DataFrameGroupBy.describe([percentiles, …])         Generate various summary statistics, excluding NaN values
DataFrameGroupBy.diff([periods, axis])              First discrete difference of object
DataFrameGroupBy.ffill([limit])                     Forward fill the values
DataFrameGroupBy.fillna([value, method, …])         Fill NA/NaN values using the specified method
DataFrameGroupBy.hist(data[, column, by, …])        Draw histogram of the DataFrame's series using matplotlib / pylab
DataFrameGroupBy.idxmax([axis, skipna])             Return index of first occurrence of maximum over requested axis
DataFrameGroupBy.idxmin([axis, skipna])             Return index of first occurrence of minimum over requested axis
DataFrameGroupBy.mad([axis, skipna, level])         Return the mean absolute deviation of the values for the requested axis
DataFrameGroupBy.pct_change([periods, …])           Percent change over given number of periods
DataFrameGroupBy.plot                               Class implementing the .plot attribute for groupby objects
DataFrameGroupBy.quantile([q, axis, …])             Return values at the given quantile over requested axis, a la numpy.percentile
DataFrameGroupBy.rank([axis, method, …])            Compute numerical data ranks (1 through n) along axis
DataFrameGroupBy.resample(rule, *args, **kwargs)    Provide resampling when using a TimeGrouper
DataFrameGroupBy.shift([periods, freq, axis])       Shift each group by periods observations
DataFrameGroupBy.size()                             Compute group sizes
DataFrameGroupBy.skew([axis, skipna, level, …])     Return unbiased skew over requested axis
DataFrameGroupBy.take(indices[, axis, …])           Analogous to ndarray.take
DataFrameGroupBy.tshift([periods, freq, axis])      Shift the time index, using the index's frequency if available
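
To make a handful of these concrete, here is a minimal sketch using agg() with a column-to-function dict plus three row-preserving operations. The frame and its column names are hypothetical, and only calls that are stable across pandas versions are used.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [1, 2, 3, 4, 5],
    "minutes": [10, 20, 30, 40, 50],
})
grouped = df.groupby("team")

# agg() with a dict applies a different function to each column.
summary = grouped.agg({"score": "sum", "minutes": "max"})

# Cumulative operations return one row per input row.
running = grouped[["score", "minutes"]].cumsum()

# shift() moves values down within each group, never across groups.
previous = grouped["score"].shift(1)

# rank() ranks values within each group (1 = largest here).
ranks = grouped["score"].rank(ascending=False)

print(summary)
print(running)
```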

Functions supported only for Series

Function                                    Description
SeriesGroupBy.nlargest(*args, **kwargs)     Return the largest n elements
SeriesGroupBy.nsmallest(*args, **kwargs)    Return the smallest n elements
SeriesGroupBy.nunique([dropna])             Return the number of unique elements in the group
SeriesGroupBy.unique()                      Return np.ndarray of unique values in the object
SeriesGroupBy.value_counts([normalize, …])
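
The Series-only functions can be tried out as below. This is a sketch on invented data; the columns team, city and score are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "city": ["x", "y", "x", "z", "z"],
    "score": [1, 5, 3, 4, 2],
})
cities = df.groupby("team")["city"]
scores = df.groupby("team")["score"]

# Number of distinct values per group.
distinct = cities.nunique()

# Array of distinct values per group.
values = cities.unique()

# Frequency of each value within each group; the result is indexed
# by (group key, value).
freq = cities.value_counts()

# The two largest scores in each group; the result is indexed by
# (group key, original row label).
top2 = scores.nlargest(2)

print(distinct)
print(freq)
print(top2)
```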

Functions supported only for DataFrame

Function                                          Description
DataFrameGroupBy.corrwith(other[, axis, drop])    Compute pairwise correlation between rows or columns of two DataFrame objects
DataFrameGroupBy.boxplot(grouped[, …])            Make box plots from DataFrameGroupBy data