
Using Series in pandas

Get familiar with the two workhorse data structures in pandas: Series and DataFrame.

Series

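A Series is a one-dimensional array-like object. It holds a sequence of values (with data types similar to those in NumPy) together with an array of data labels, called the index.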

Creating a Series object

import pandas as pd

obj = pd.Series([4,7,-5,3])
obj
0    4
1    7
2   -5
3    3
dtype: int64
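  • The index is shown on the left and the values on the right. The default index runs from 0 to N-1 (where N is the length of the data)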

  • The values and the index are available through the values and index attributes

    obj.values
    array([ 4,  7, -5,  3], dtype=int64)
    obj.index
    RangeIndex(start=0, stop=4, step=1)
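
As a small sketch of the two attributes above, the snippet below rebuilds the same Series and pulls out its values and default labels. It uses to_numpy(), which the pandas documentation recommends over .values in recent versions; the variable names are only illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# to_numpy() returns the underlying values as a NumPy array
values = obj.to_numpy()

# With no explicit index, pandas builds a RangeIndex from 0 to N-1
index_labels = list(obj.index)

print(values)
print(index_labels)
```

The default RangeIndex stores only start, stop and step, so it does not materialize all N labels in memory.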
    
  • You can pass your own index so that each data point is identified by a label

    obj2 = pd.Series([4,7,-5,3],index=['d','b','a','c'])
    
    obj2
    d    4
    b    7
    a   -5
    c    3
    dtype: int64
    
    obj2.index
    Index(['d', 'b', 'a', 'c'], dtype='object')
    
    obj2['a']
    -5
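
Label-based access goes a little further than reading a single value. The sketch below, using the same obj2 as above, shows assignment through a label, selection with a list of labels, and a membership test; the variable names subset, has_b and has_e are only illustrative.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Assign through a label, much like setting a key in a dict
obj2['d'] = 6

# A list of labels selects several entries, in the order requested
subset = obj2[['c', 'a', 'd']]

# Membership tests check the index labels, not the values
has_b = 'b' in obj2
has_e = 'e' in obj2

print(subset)
print(has_b, has_e)
```

In this respect a Series behaves somewhat like a fixed-length, ordered dict that maps index labels to values.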
    
  • Filter with a boolean array, or apply NumPy functions; both keep each value attached to its label

    obj2[obj2 > 2]
    d    4
    b    7
    c    3
    dtype: int64
    
    import numpy as np
    np.exp(obj2)
    d      54.598150
    b    1096.633158
    a       0.006738
    c      20.085537
    dtype: float64
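
To make the filtering step more explicit, the sketch below splits it in two: first build the boolean mask, then index with it. It also shows scalar arithmetic and one more NumPy function (np.sqrt, chosen here only as an example) to illustrate that the labels survive each operation.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# The comparison produces a boolean Series that shares obj2's index
mask = obj2 > 2

# Indexing with the mask keeps only the entries where it is True
filtered = obj2[mask]

# Scalar arithmetic is applied element-wise and keeps the labels
doubled = obj2 * 2

# NumPy ufuncs also return a Series with the same index
roots = np.sqrt(filtered)

print(mask)
print(doubled)
print(roots)
```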
    
  • A Series can be built from a dict; the dict keys become the index

    sdata = {'Ohio':35000,'Texas':71000,'Oregon':16000,'Utah':5000}
    obj3 = pd.Series(sdata)
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    
    states = ['California','Ohio','Oregon','Texas']
    obj4 = pd.Series(sdata, index=states)
    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
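
The obj4 output above is worth unpacking: when a dict is combined with an explicit index, pandas looks up each requested label in the dict. The sketch below checks the three consequences one by one; the names california_missing, utah_present and dtype_name are only illustrative.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']

obj4 = pd.Series(sdata, index=states)

# 'California' is not a key in sdata, so its value is NaN
california_missing = pd.isna(obj4['California'])

# 'Utah' is in sdata but not in states, so it is left out
utah_present = 'Utah' in obj4

# NaN is a float, so the whole Series is stored as float64
dtype_name = str(obj4.dtype)

print(california_missing, utah_present, dtype_name)
```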
    
  • Use the isnull and notnull functions in pandas to detect missing data; both are also available as Series methods

    pd.isnull(obj4)
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
    
    pd.notnull(obj4)
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
    
    
    obj4.isnull()
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
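
Because isnull and notnull return boolean Series, they combine naturally with the boolean filtering shown earlier. A minimal sketch, assuming the same obj4 as above (isna is the alias of isnull available in current pandas versions):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# isna() is an alias of isnull(), so both give the same mask
same_mask = obj4.isna().equals(obj4.isnull())

# True counts as 1, so summing the mask counts the missing entries
n_missing = int(obj4.isnull().sum())

# Filtering with notnull() keeps only the entries that have a value
complete = obj4[obj4.notnull()]

print(same_mask, n_missing)
print(complete)
```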
    
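  • Arithmetic between two Series aligns automatically on index labels; a label missing from either side yields NaN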

    obj4
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
        
    obj3
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    
    obj3+obj4
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64
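
If NaN for one-sided labels is not what you want, the add method accepts a fill_value argument. The sketch below compares it with the plain + operator on the same two Series. Note that a label missing on both sides (here 'California', which is absent from obj3 and NaN in obj4) still comes out as NaN.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Plain addition: the result covers the union of labels,
# with NaN wherever either operand has no value
total = obj3 + obj4

# add() with fill_value=0 substitutes 0 when only one side is missing
total_filled = obj3.add(obj4, fill_value=0)

print(total)
print(total_filled)
```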
    
  • Both the Series object itself and its index have a name attribute

    obj4.name = 'population'
    obj4.index.name = 'state'
    obj4
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64
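
One place where these names become useful is when a Series is turned into a DataFrame, the other data structure mentioned at the start. The sketch below is only an illustration of that hand-off, using to_frame() and reset_index(); frame and table are hypothetical variable names.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])
obj4.name = 'population'
obj4.index.name = 'state'

# to_frame() uses the Series name as the column name
frame = obj4.to_frame()

# reset_index() moves the named index into a regular column
table = obj4.reset_index()

print(frame.columns.tolist())
print(table.columns.tolist())
```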