Using the Pandas apply Function in Practice

阿新 • Published: 2020-07-22

The Most Useful Function in Pandas

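Pandas is a very handy data-structure package for Python that provides many useful data-manipulation methods. Many algorithm libraries also either require pandas data structures as input or provide an interface to them.

A close look at the pandas API documentation turns up many useful functions. The commonly used file reading and writing functions alone include the following: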

Format Type	Data Description	Reader	Writer
text	CSV	read_csv	to_csv
text	JSON	read_json	to_json
text	HTML	read_html	to_html
text	Local clipboard	read_clipboard	to_clipboard
binary	MS Excel	read_excel	to_excel
binary	HDF5 Format	read_hdf	to_hdf
binary	Feather Format	read_feather	to_feather
binary	Parquet Format	read_parquet	to_parquet
binary	Msgpack	read_msgpack	to_msgpack
binary	Stata	read_stata	to_stata
binary	SAS	read_sas
binary	Python Pickle Format	read_pickle	to_pickle
SQL	SQL	read_sql	to_sql
SQL	Google Big Query	read_gbq	to_gbq

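Once the data has been read, pandas offers many functions for processing it, but in my view the most useful one is this:

The apply function

The apply function offers the most freedom of any function in `pandas`. Its signature, as documented when this article was written, is:

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

The most useful parameter is the first one, func. It takes a function, much like a function pointer in C/C++. (The broadcast and reduce parameters were removed in pandas 1.0 in favor of result_type; the parameters used in this article are unaffected.)

You implement this function yourself, and what it receives depends on axis. With axis=1, for example, each row is passed to your function as a Series. Inside the function you compute a result from the row's fields and return it; apply iterates over every row of the DataFrame and assembles the results into a Series, which it returns.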
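
The row-by-row behavior described above can be sketched on a small in-memory table. The column names (width, height) and the helper area are hypothetical, chosen only for illustration; they are not from the original article.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
df = pd.DataFrame({"width": [2, 3, 4], "height": [5, 6, 7]})

def area(row):
    # With axis=1, `row` is a Series indexed by the column labels.
    return row["width"] * row["height"]

# One call per row; the return values are collected into a Series.
areas = df.apply(area, axis=1)

# With axis=0 (the default), each column is passed as a Series instead.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

print(areas.tolist())       # [10, 18, 28]
print(col_range.to_dict())  # {'width': 2, 'height': 2}
```

The result of the axis=1 call has one entry per row and shares the DataFrame's index, which is why it can be assigned straight back as a new column.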

For example, suppose we read in a table like this one:

[Figure: screenshot of the table read from the spreadsheet]

Suppose we want the time difference between the PublishedTime and ReceivedTime fields of the table. The following functions do it:

import datetime  # used to compute the difference between two dates

import pandas as pd

def dataInterval(data1, data2):
    # Parse both 'YYYY-MM-DD' strings and return data1 - data2 in whole days.
    d1 = datetime.datetime.strptime(data1, '%Y-%m-%d')
    d2 = datetime.datetime.strptime(data2, '%Y-%m-%d')
    delta = d1 - d2
    return delta.days

def getInterval(arrLike):  # called by apply once per row; arrLike is that row as a Series
    PublishedTime = arrLike['PublishedTime']
    ReceivedTime = arrLike['ReceivedTime']
    # print(PublishedTime.strip(), ReceivedTime.strip())
    days = dataInterval(PublishedTime.strip(), ReceivedTime.strip())  # strip surrounding whitespace
    return days

if __name__ == '__main__':
    fileName = "NS_new.xls"
    df = pd.read_excel(fileName)
    df['TimeInterval'] = df.apply(getInterval, axis=1)
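
As a cross-check on the apply version, the same day difference can also be computed column-wise with pd.to_datetime. This is a different technique from the one the article uses, shown only for comparison. The rows below are invented stand-ins for NS_new.xls, whose contents are not reproduced here.

```python
import pandas as pd

# Hypothetical rows standing in for NS_new.xls; the values are invented.
df = pd.DataFrame({
    "ReceivedTime": [" 2019-01-10 ", "2019-03-01"],
    "PublishedTime": ["2019-02-09", " 2019-03-31"],
})

# Parse each column in one call, stripping surrounding whitespace first.
published = pd.to_datetime(df["PublishedTime"].str.strip(), format="%Y-%m-%d")
received = pd.to_datetime(df["ReceivedTime"].str.strip(), format="%Y-%m-%d")

# Subtracting datetime columns yields timedeltas; .dt.days extracts whole days.
df["TimeInterval"] = (published - received).dt.days

print(df["TimeInterval"].tolist())  # [30, 30]
```

For a simple subtraction like this the column-wise form is usually faster, while apply remains the more flexible choice when the per-row logic does not map onto built-in column operations.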

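Sometimes we want to pass extra arguments to our own function. The args and **kwds parameters of apply make this possible. Taking the same time-difference function, suppose I want to pass in the column labels myself, so that the function need not be modified every time a label changes. The code is as follows: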

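import datetime  # used to compute the difference between two dates

import pandas as pd

def dataInterval(data1, data2):
    # Parse both 'YYYY-MM-DD' strings and return data1 - data2 in whole days.
    d1 = datetime.datetime.strptime(data1, '%Y-%m-%d')
    d2 = datetime.datetime.strptime(data2, '%Y-%m-%d')
    delta = d1 - d2
    return delta.days

def getInterval_new(arrLike, before, after):  # before/after are the column labels to compare
    before = arrLike[before]
    after = arrLike[after]
    # print(before.strip(), after.strip())
    days = dataInterval(after.strip(), before.strip())  # strip surrounding whitespace
    return days


if __name__ == '__main__':
    fileName = "NS_new.xls"
    df = pd.read_excel(fileName)
    # axis=1 is needed in every call so that rows, not columns, are passed to the function.
    df['TimeInterval'] = df.apply(getInterval_new, axis=1, args=('ReceivedTime', 'PublishedTime'))  # calling style 1
    # The following call is equivalent to the one above.
    df['TimeInterval'] = df.apply(getInterval_new, axis=1, **{'before': 'ReceivedTime', 'after': 'PublishedTime'})  # calling style 2
    # The following call is equivalent to the one above.
    df['TimeInterval'] = df.apply(getInterval_new, axis=1, before='ReceivedTime', after='PublishedTime')  # calling style 3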

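The modified getInterval_new function takes two extra parameters, so we have to supply them ourselves when calling apply. Any of the three calling styles shown in the code works.

Finally, all the code for this article can be downloaded from: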
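
The forwarding of extra arguments does not depend on dates, so the equivalence of the three styles can be checked on a minimal numeric table. The function span and the columns start and end are hypothetical names introduced only for this sketch.

```python
import pandas as pd

# Hypothetical numeric data, made up for illustration only.
df = pd.DataFrame({"start": [1, 5, 9], "end": [4, 7, 20]})

def span(row, before, after):
    # `before` and `after` are column labels supplied by the caller of apply.
    return row[after] - row[before]

# Style 1: positional extras via args (a tuple, in parameter order).
by_args = df.apply(span, axis=1, args=("start", "end"))
# Style 2: keyword extras unpacked from a dict.
by_dict = df.apply(span, axis=1, **{"before": "start", "after": "end"})
# Style 3: keyword extras written out directly.
by_kwargs = df.apply(span, axis=1, before="start", after="end")

print(by_args.tolist())  # [3, 2, 11]
print(by_args.equals(by_dict) and by_dict.equals(by_kwargs))  # True
```

Keyword forwarding works only for names that apply does not itself define; a user function with a parameter called axis or args, for instance, would need the positional args form instead.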

https://github.com/Dongzhixiao/Python_Exercise/tree/master/pandas_apply
