
Basic Operations on Pandas Series


The Series Concept

So far we have been working with the DataFrame, the core structure in pandas.
If we break a DataFrame down, any single row or single column of it is a Series.

  • Series:collection of values
  • DataFrame: collection of Series objects
import pandas as pd
fandango = pd.read_csv("fandango_score_comparison.csv")

# Extract one column; a single column is a Series
series_film = fandango["FILM"]
# Check the type of this column: <class 'pandas.core.series.Series'>, i.e. a Series
print(type(series_film))
# Access values by position with slicing
print(series_film[0:5])
series_rt = fandango["RottenTomatoes"]
print(series_rt[0:5])
fandango.head()
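You can check the row/column relationship without the CSV file. The following is a minimal sketch on a small hypothetical DataFrame; the film names and scores are made up for illustration. It shows that a column and a row both come back as a Series. For a column, the index holds the row labels. For a row taken with .iloc, the index holds the column names.

```python
import pandas as pd

# Hypothetical toy data standing in for fandango_score_comparison.csv
df = pd.DataFrame(
    {"FILM": ["Film A", "Film B", "Film C"], "RottenTomatoes": [74, 85, 80]}
)

column = df["RottenTomatoes"]  # one column -> Series indexed by the row labels
row = df.iloc[0]               # one row -> Series indexed by the column names

print(type(column).__name__)   # Series
print(type(row).__name__)      # Series
print(list(row.index))         # ['FILM', 'RottenTomatoes']
```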

Series Internals and Creating Series Objects

Internals: built on ndarray objects

Import the Series object from pandas

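The .values attribute of a Series returns an ndarray object. It is an attribute, not a method, so it takes no parentheses.

In other words, a DataFrame is made up of Series, and each Series stores its values in a NumPy ndarray alongside an index.

  • Note that element access on the resulting ndarray differs from element access on a DataFrame:
    • a DataFrame needs the .loc[] indexer
    • an ndarray does not; it can be indexed and sliced like an ordinary list

Many pandas objects are built on top of NumPy, so many operations carry over between the two libraries.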
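The sketch below contrasts the two access styles on a small hypothetical DataFrame. It also shows Series.to_numpy(), the method newer pandas versions recommend over .values for getting the underlying array. The data and names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"score": [74, 85, 80]}, index=["Film A", "Film B", "Film C"])
series = df["score"]

arr = series.values          # the underlying NumPy array
arr2 = series.to_numpy()     # same data via the method form

print(type(arr))             # <class 'numpy.ndarray'>
print(arr[0:2])              # plain positional slicing, like a list: [74 85]

# On the DataFrame, a single element is addressed by row label and column name
print(df.loc["Film B", "score"])  # 85
```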

Creating a Series: Series()

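A Series is created with the Series() constructor. Pass two arrays: one supplies the values, the other supplies the index (labels) for those values.

Because a Series wraps an ndarray, a Series such as a DataFrame column can also be passed as the index. Be careful when passing a Series as the data, though. pandas then aligns it on its existing index rather than pairing values by position, which is why the code below passes .values.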
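A caveat applies when the data argument is itself a Series. pandas reindexes it by the new labels instead of pairing values by position, so labels that do not already exist become NaN. The minimal sketch below uses made-up film names to show the difference. Passing .values gives the positional pairing.

```python
import pandas as pd

scores = pd.Series([74, 85, 80])                  # default index 0, 1, 2
names = pd.Series(["Film A", "Film B", "Film C"])  # a Series is fine as the index

# Data given as an ndarray: values are paired with the new labels by position
by_position = pd.Series(scores.values, index=names)
print(by_position)

# Data given as a Series: pandas reindexes it by the new labels.
# None of 'Film A', 'Film B', 'Film C' exist in scores.index, so every value is NaN
aligned = pd.Series(scores, index=names)
print(aligned.isna().all())  # True
```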

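Getting values by index

How to access elements of a Series:

  • Series_Name[Index] returns a single value, e.g. <class 'numpy.int64'> in the example below
  • Series_Name[[Index1, Index2]] returns a Series containing both the labels and the values, e.g. <class 'pandas.core.series.Series'>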
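from pandas import Series
# The .values attribute of a Series returns an ndarray
film_names = series_film.values
print(type(film_names))
# Output: <class 'numpy.ndarray'>
# Element access here differs from a DataFrame: a DataFrame uses the .loc[] indexer,
# whereas an ndarray can simply be sliced
print(film_names[0:10])

rt_scores = series_rt.values
print(rt_scores[0:10])

series_custom = Series(rt_scores, index=film_names)
# A plain array can only be indexed by integer positions (0, 1, 2, ...),
# but a Series can also use strings as index labels.
# Passing two ndarrays builds a key-value-like structure
print(type(series_custom[["Minions (2015)", "Leviathan (2014)"]]))
print(type(series_custom["Cinderella (2015)"]))
# Like looking up a value by key in a dict: pass a key, get a value.
# The first lookup passes a list of two keys (returns a Series); the second passes one key (returns a scalar)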
<class 'numpy.ndarray'>
['Avengers: Age of Ultron (2015)' 'Cinderella (2015)' 'Ant-Man (2015)'
 'Do You Believe? (2015)' 'Hot Tub Time Machine 2 (2015)'
 'The Water Diviner (2015)' 'Irrational Man (2015)' 'Top Five (2014)'
 'Shaun the Sheep Movie (2015)' 'Love & Mercy (2015)']
[74 85 80 18 14 63 42 86 99 89]
<class 'pandas.core.series.Series'>
<class 'numpy.int64'>
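series_custom = Series(rt_scores, index=film_names)
# Ways to index a Series:
# 1: string labels as the index
print(series_custom[["Minions (2015)", "Leviathan (2014)"]])
# 2: integer positions (this slice is positional, equivalent to series_custom.iloc[5:10])
series_custom[5:10]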
Minions (2015)      54
Leviathan (2014)    99
dtype: int64

The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64
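Plain square brackets choose between labels and positions based on what is passed, which can be ambiguous. The .loc (labels) and .iloc (positions) indexers make the choice explicit. A label slice includes its end point, while a positional slice does not. This is a minimal sketch on a hypothetical Series.

```python
import pandas as pd

# Hypothetical toy Series with string labels
s = pd.Series([74, 85, 80, 18], index=["Film A", "Film B", "Film C", "Film D"])

print(s.loc["Film B"])              # 85, a single scalar
print(s.loc[["Film A", "Film D"]])  # a Series with the two requested labels
print(s.loc["Film B":"Film C"])     # label slice: includes 'Film C'
print(s.iloc[1:3])                  # positional slice: positions 1 and 2 only
```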

Sorting a Series

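Sorting is used less often on a Series. One way is to sort the index labels with Python's built-in sorted() function and then reorder the Series to match. This is analogous to sorting a DataFrame with its *sort_values()* method.

The *reindex()* method conforms the Series to the newly sorted index, reordering the values accordingly. DataFrame has a *reindex()* method that works the same way. Do not confuse it with *reset_index()*, which replaces the index with a default integer index.

To sort by index or by value directly, call:

  • the sort_index() method
  • the sort_values() method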
origin_index = series_custom.index.tolist()
# print(origin_index)  -> the film-name strings
sorted_index = sorted(origin_index)
# print(sorted_index)  -> the strings in ascending order
sorted_by_index = series_custom.reindex(sorted_index)
print(sorted_by_index)
'71 (2015)                          97
5 Flights Up (2015)                 52
A Little Chaos (2015)               40
A Most Violent Year (2014)          90
About Elly (2015)                   97
                                    ..
What We Do in the Shadows (2015)    96
When Marnie Was There (2015)        89
While We're Young (2015)            83
Wild Tales (2014)                   96
Woman in Gold (2015)                52
Length: 146, dtype: int64
# Sort by index
sc2 = series_custom.sort_index()
# Sort by value
sc3 = series_custom.sort_values()
print(sc3[0:10])
Paul Blart: Mall Cop 2 (2015)     5
Hitman: Agent 47 (2015)           7
Hot Pursuit (2015)                8
Fantastic Four (2015)             9
Taken 3 (2015)                    9
The Boy Next Door (2015)         10
The Loft (2015)                  11
Unfinished Business (2015)       11
Mortdecai (2015)                 12
Seventh Son (2015)               12
dtype: int64
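The sorted() plus reindex() approach gives the same result as sort_index(), which the sketch below checks on a small hypothetical Series. The sketch also shows two more details. sort_values() accepts ascending=False, and reindex() inserts NaN for a label that is not in the original index.

```python
import pandas as pd

# Hypothetical toy Series
s = pd.Series([85, 18, 74], index=["Cinderella", "Avengers", "Minions"])

# Approach used above: sort the labels in Python, then reindex
manual = s.reindex(sorted(s.index.tolist()))

# Built-in equivalent
by_index = s.sort_index()
print(manual.equals(by_index))  # True

# Sort by value, highest first
top = s.sort_values(ascending=False)
print(top.index.tolist())       # ['Cinderella', 'Minions', 'Avengers']

# A label missing from the original index gets NaN
missing = s.reindex(["Minions", "Unknown"])
print(missing.isna().tolist())  # [False, True]
```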

The values of a Series behave like an ndarray, so after importing NumPy you can apply its functions to a Series directly.

# The values in a Series object are treated as an ndarray, the core data type in NumPy
import numpy as np
# Add the Series to itself, element by element
print(np.add(series_custom, series_custom))
# Apply the sine function to each value (not displayed: only a cell's last expression is shown)
np.sin(series_custom)
# Return the highest value (a single scalar, not a Series)
np.max(series_custom)
Avengers: Age of Ultron (2015)               148
Cinderella (2015)                            170
Ant-Man (2015)                               160
Do You Believe? (2015)                        36
Hot Tub Time Machine 2 (2015)                 28
                                            ... 
Mr. Holmes (2015)                            174
'71 (2015)                                   194
Two Days, One Night (2014)                   194
Gett: The Trial of Viviane Amsalem (2015)    200
Kumiko, The Treasure Hunter (2015)           174
Length: 146, dtype: int64

100
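Element-wise NumPy functions return a Series that keeps the original index, while reductions such as np.max return a single scalar. This is a minimal sketch on a hypothetical three-film Series. The equivalent Series method, s.max(), gives the same maximum.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series
s = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])

doubled = np.add(s, s)      # same as s + s; the result is still a Series
print(type(doubled).__name__, doubled.index.tolist())

sines = np.sin(s)           # element-wise, index preserved
highest = np.max(s)         # reduction: a single scalar
print(highest)              # 85
print(s.max() == highest)   # True
```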

Using a list of True/False values as the index (boolean indexing)

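# Use a boolean Series (one True/False per element) as the index
series_greater_than_50 = series_custom[series_custom > 50]
# Note: this prints the full, unfiltered Series (see the output below);
# print(series_greater_than_50) shows only the scores above 50
print(series_custom)

criteria_one = series_custom > 50
criteria_two = series_custom < 75
both_criteria = series_custom[criteria_one & criteria_two]
print(both_criteria)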
Avengers: Age of Ultron (2015)                74
Cinderella (2015)                             85
Ant-Man (2015)                                80
Do You Believe? (2015)                        18
Hot Tub Time Machine 2 (2015)                 14
                                            ... 
Mr. Holmes (2015)                             87
'71 (2015)                                    97
Two Days, One Night (2014)                    97
Gett: The Trial of Viviane Amsalem (2015)    100
Kumiko, The Treasure Hunter (2015)            87
Length: 146, dtype: int64
Avengers: Age of Ultron (2015)                                            74
The Water Diviner (2015)                                                  63
Unbroken (2014)                                                           51
Southpaw (2015)                                                           59
Insidious: Chapter 3 (2015)                                               59
The Man From U.N.C.L.E. (2015)                                            68
Run All Night (2015)                                                      60
5 Flights Up (2015)                                                       52
Welcome to Me (2015)                                                      71
Saint Laurent (2015)                                                      51
Maps to the Stars (2015)                                                  60
Pitch Perfect 2 (2015)                                                    67
The Age of Adaline (2015)                                                 54
The DUFF (2015)                                                           71
Ricki and the Flash (2015)                                                64
Unfriended (2015)                                                         60
American Sniper (2015)                                                    72
The Hobbit: The Battle of the Five Armies (2014)                          61
Paper Towns (2015)                                                        55
Big Eyes (2014)                                                           72
Maggie (2015)                                                             54
Focus (2015)                                                              57
The Second Best Exotic Marigold Hotel (2015)                              62
The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015)    67
Escobar: Paradise Lost (2015)                                             52
Into the Woods (2014)                                                     71
Inherent Vice (2014)                                                      73
Magic Mike XXL (2015)                                                     62
Woman in Gold (2015)                                                      52
The Last Five Years (2015)                                                60
Jurassic World (2015)                                                     71
Minions (2015)                                                            54
Spare Parts (2015)                                                        52
dtype: int64
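When the conditions are written inline instead of stored in variables first, each comparison needs its own parentheses. The reason is that & and | bind more tightly than > and <. The sketch below uses a hypothetical Series to show AND, OR, and negation with ~.

```python
import pandas as pd

# Hypothetical toy Series
s = pd.Series([74, 85, 18, 63], index=["Film A", "Film B", "Film C", "Film D"])

mid_range = s[(s > 50) & (s < 75)]   # element-wise AND
print(mid_range.index.tolist())      # ['Film A', 'Film D']

extremes = s[(s < 20) | (s > 80)]    # element-wise OR
print(extremes.index.tolist())       # ['Film B', 'Film C']

not_high = s[~(s > 80)]              # ~ negates a boolean Series
print(not_high.index.tolist())       # ['Film A', 'Film C', 'Film D']
```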

Series that share the same index labels can be added and subtracted; pandas matches the values by label

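# First create two Series with the same index
# (index=fandango['FILM'] keeps "FILM" as the index name, which is why it appears in the output)
rt_critics = Series(fandango["RottenTomatoes"].values, index=fandango['FILM'])
# Note: this reuses the "RottenTomatoes" (critics) column, so the mean below equals the critics score.
# For real user scores, use the dataset's user column (RottenTomatoes_User in the FiveThirtyEight CSV)
rt_users = Series(fandango["RottenTomatoes"].values, index=fandango['FILM'].values)
rt_mean = (rt_critics + rt_users) / 2
print(rt_mean)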
FILM
Avengers: Age of Ultron (2015)                74.0
Cinderella (2015)                             85.0
Ant-Man (2015)                                80.0
Do You Believe? (2015)                        18.0
Hot Tub Time Machine 2 (2015)                 14.0
                                             ...  
Mr. Holmes (2015)                             87.0
'71 (2015)                                    97.0
Two Days, One Night (2014)                    97.0
Gett: The Trial of Viviane Amsalem (2015)    100.0
Kumiko, The Treasure Hunter (2015)            87.0
Length: 146, dtype: float64
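Arithmetic between two Series matches values by label, not by position. A label that appears in only one of them produces NaN unless you supply a fill_value. This is a minimal sketch with hypothetical critic and user scores. The user Series is deliberately in a different order and is missing one film.

```python
import pandas as pd

critics = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])
# Hypothetical user scores: different order, and 'Film C' is missing
users = pd.Series([90, 70], index=["Film B", "Film A"])

mean = (critics + users) / 2    # values are matched by label, not by position
print(mean)
# Film A -> 72.0, Film B -> 87.5, Film C -> NaN (label present in only one Series)

# fill_value treats a missing label as 0 before adding
total = critics.add(users, fill_value=0)
print(total["Film C"])          # 80.0
```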