Using the pandas Series
阿新 • Published: 2020-08-08
Get familiar with the two workhorse data structures in pandas: Series and DataFrame.
Series
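A Series is a one-dimensional array-like object. It holds a sequence of values (of types similar to NumPy types) together with an associated set of data labels, called its index.
Create a Series object: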
import pandas as pd
obj = pd.Series([4, 7, -5, 3])
obj
0    4
1    7
2   -5
3    3
dtype: int64
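- The index is on the left and the values are on the right. Because no index was specified, the default index runs from 0 to N-1, where N is the length of the data.
- The values and the index are available through the values and index attributes:
obj.values
array([ 4,  7, -5,  3], dtype=int64)
obj.index
RangeIndex(start=0, stop=4, step=1)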
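As a small sketch of how these two attributes relate to the data (the names vals, labels and pairs are illustrative and not from the original post), the values come back as a NumPy array and the default index can be turned into a plain list:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# .values exposes the underlying data as a NumPy array
vals = obj.values
print(type(vals).__name__)   # ndarray

# The default index is a RangeIndex running from 0 to N-1
labels = list(obj.index)
print(labels)                # [0, 1, 2, 3]

# Index and values have the same length: one label per value
pairs = list(zip(obj.index, obj.values))
```

The dtype shown when printing the array can differ by platform, so the sketch avoids relying on it.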
- You can supply your own index so that each data point is identified by a label:
obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
d    4
b    7
a   -5
c    3
dtype: int64
obj2.index
Index(['d', 'b', 'a', 'c'], dtype='object')
obj2['a']
-5
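Labels can be used for more than single lookups. The following is a minimal sketch, assuming the same obj2 as above; the names subset, updated, has_b and has_e are hypothetical, and a copy is modified so that obj2 itself stays unchanged:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Select several values at once by passing a list of labels;
# the result follows the order of the requested labels
subset = obj2[['c', 'a', 'd']]

# Assign through a label on a copy, leaving obj2 untouched
updated = obj2.copy()
updated['d'] = 6

# Membership tests look at the index labels, not the values
has_b = 'b' in obj2   # True
has_e = 'e' in obj2   # False
```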
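- Filter with a boolean array. NumPy functions such as np.exp are applied element-wise and keep the link between index and value:
import numpy as np
obj2[obj2 > 2]
d    4
b    7
c    3
dtype: int64
np.exp(obj2)
d      54.598150
b    1096.633158
a       0.006738
c      20.085537
dtype: float64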
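To make the filtering step more explicit, here is a short sketch that splits it in two. The intermediate names mask, filtered and doubled are illustrative only:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# The comparison yields a boolean Series that shares obj2's index
mask = obj2 > 2

# Indexing with the mask keeps only the entries where it is True
filtered = obj2[mask]

# Scalar arithmetic is also element-wise; labels stay attached
doubled = obj2 * 2
```

Writing obj2[obj2 > 2] is the same operation with the mask built inline.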
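- A Series can be built from a dict. If an index is passed as well, the values are taken in that order: labels missing from the dict become NaN, and dict keys that are not in the index are dropped.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = pd.Series(sdata, index=states)
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64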
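The effect of passing an index alongside the dict can be checked directly. This is a sketch under the same data as above; the names missing, dropped and round_trip are hypothetical:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj4 = pd.Series(sdata, index=states)

# Labels requested in the index but absent from the dict become NaN
missing = obj4[obj4.isnull()].index.tolist()          # ['California']

# Dict keys that are not in the requested index are left out
dropped = [k for k in sdata if k not in obj4.index]   # ['Utah']

# Without an explicit index, a Series converts back to the same dict
round_trip = pd.Series(sdata).to_dict()
```

Because NaN is a floating-point value, obj4 is stored as float64 even though the dict holds integers.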
- pandas provides the isnull and notnull functions to detect missing data. Both are also available as Series methods:
pd.isnull(obj4)
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
pd.notnull(obj4)
California    False
Ohio           True
Oregon         True
Texas          True
dtype: bool
obj4.isnull()
California     True
Ohio          False
Oregon        False
Texas         False
dtype: bool
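Once missing entries are detected, a common next step is to count, drop, or fill them. The sketch below goes slightly beyond the original text: dropna and fillna are standard Series methods, while the names n_missing, present and filled are illustrative.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# True counts as 1 when summed, giving the number of missing entries
n_missing = int(obj4.isnull().sum())   # 1

# dropna() returns a new Series without the missing entries
present = obj4.dropna()

# fillna() returns a new Series with NaN replaced by the given value
filled = obj4.fillna(0)
```

Neither call modifies obj4 itself; each returns a new Series.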
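- In arithmetic operations, a Series automatically aligns on index labels. A label present in only one operand yields NaN in the result:
obj4
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
dtype: float64
obj3
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
obj3 + obj4
California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64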
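If NaN for non-overlapping labels is not wanted, one option is the add method with fill_value, which treats a label missing from one side as the given value. This is a sketch of that alternative rather than something the original post covers; total and filled_total are hypothetical names.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Plain addition aligns on labels; the result index is the union
total = obj3 + obj4

# With fill_value=0, a label missing on one side is treated as 0.
# 'Utah' (only in obj3) becomes 5000.0; 'California' stays NaN
# because it is missing on both sides (absent in obj3, NaN in obj4).
filled_total = obj3.add(obj4, fill_value=0)
```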
- Both the Series object itself and its index have a name attribute:
obj4.name = 'population'
obj4.index.name = 'state'
obj4
state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Name: population, dtype: float64
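One place where these names show up is when a Series is converted to a DataFrame. As a brief sketch (the variable frame is illustrative), reset_index uses the index name and the Series name as column headers:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

obj4.name = 'population'
obj4.index.name = 'state'

# reset_index() moves the index into a regular column; the index name
# and the Series name become the two column headers
frame = obj4.reset_index()
print(list(frame.columns))   # ['state', 'population']
```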