Python extension library: pandas
阿新 • Published: 2018-11-13
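pd.qcut(x, q, retbins=False)
Bins the array x into q equal-frequency (quantile-based) bins, so that each bin holds roughly the same number of values. retbins controls whether the bin edges are also returned; when it is True, the edges come back as a second return value (a NumPy array). For array-like input the main return value is a Categorical: it records, for each value of x, the interval that value falls into, followed by the ordered list of all intervals. Calling .value_counts() on the returned object shows the count for each bin, and .describe() shows the counts and freqs for each interval. Note that if the input is a pd.Series, qcut returns a Series, and its describe() produces the usual Series summary instead of the per-interval table. To get the per-interval view, pass pd.Series.values instead.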
>>> import pandas as pd
>>> a = pd.qcut([1, 1, 2, 3, 4, 4, 5, 6, 7], 3)
>>> a
[(0.999, 2.667], (0.999, 2.667], (0.999, 2.667], (2.667, 4.333], (2.667, 4.333], (2.667, 4.333], (4.333, 7.0], (4.333, 7.0], (4.333, 7.0]]
Categories (3, interval[float64]): [(0.999, 2.667] < (2.667, 4.333] < (4.333, 7.0]]
>>> a.value_counts()
(0.999, 2.667]    3
(2.667, 4.333]    3
(4.333, 7.0]      3
dtype: int64
>>> a.describe()
                counts     freqs
categories
(0.999, 2.667]       3  0.333333
(2.667, 4.333]       3  0.333333
(4.333, 7.0]         3  0.333333
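The retbins option and the Series-versus-array difference described earlier can be tried with a short script. This is only a minimal sketch that reuses the sample values from the session above; the variable names (values, cats, edges, binned_series, binned_array) are illustrative and not part of the original post.

```python
import pandas as pd

# Same sample data as in the interactive session above.
values = [1, 1, 2, 3, 4, 4, 5, 6, 7]

# retbins=True additionally returns the bin edges as a NumPy array.
cats, edges = pd.qcut(values, 3, retbins=True)
print(type(edges).__name__)
print(edges)  # four edges delimiting the three bins

# With a Series as input, qcut returns a Series of category dtype,
# so describe() gives the generic Series summary, not a per-interval table.
s = pd.Series(values)
binned_series = pd.qcut(s, 3)
print(binned_series.describe())

# Passing the underlying array yields a Categorical instead,
# whose value_counts() lists the count for each interval.
binned_array = pd.qcut(s.values, 3)
print(type(binned_array).__name__)
print(binned_array.value_counts())
```

If only the per-bin counts are needed, binned_series.value_counts() should work on the Series result as well, so converting to .values mainly matters when the Categorical object itself is wanted.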