Pandas Tutorial Series (9): String Handling in Pandas

阿新 • Published: 2020-10-21

String Handling in Pandas

We have already used a string handling function in an earlier part:

　　df['bWendu'].str.replace('℃', '').astype('int32')

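String handling in Pandas:

1. Usage: first get the str attribute of a Series, then call functions on that attribute;

2. It only works on string columns, not on numeric columns;

3. A DataFrame has no str attribute and no string handling methods;

4. Series.str is not the native Python string type; it provides its own set of methods, most of which closely mirror the native str methods.

For the full list of Series.str string methods, see the documentation: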
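
The rules above can be checked with a minimal sketch. The DataFrame here is hypothetical in-memory data standing in for the weather CSV (only the column names bWendu and aqi come from this tutorial, and aqi is assumed to be numeric):

```python
import pandas as pd

# Hypothetical sample data; the real tutorial reads these columns from a CSV
df = pd.DataFrame({
    "bWendu": ["3℃", "2℃", "-1℃"],  # string column
    "aqi": [59, 49, 28],             # numeric column (assumed)
})

# Rule 1: get the str attribute of a Series, then call a method on it
lengths = df["bWendu"].str.len()

# Rule 2: using .str on a numeric column raises AttributeError
try:
    df["aqi"].str.len()
    numeric_error = None
except AttributeError as exc:
    numeric_error = str(exc)

# Rule 3: the DataFrame itself has no str attribute
dataframe_has_str = hasattr(df, "str")

print(lengths.tolist())
print(numeric_error)
print(dataframe_has_str)
```

Catching the AttributeError instead of letting it propagate keeps the sketch runnable from top to bottom.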

　　https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handing

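This part demonstrates:

1. Getting the str attribute of a Series and calling various string handling functions on it

2. Using the boolean Series returned by str methods such as startswith and contains for conditional queries

3. Chaining operations that need several str calls

4. Handling strings with regular expressions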

1. Load the 2018 Beijing weather data

import pandas as pd

file_path = "../../datas/files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
print(df.head())
print(df.dtypes)

2. Get the str attribute of a Series and use various string handling functions

import pandas as pd

file_path = "../../datas/files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
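print('*' * 25, 'First few rows', '*' * 25)
print(df.head())
print('*' * 25, 'Data type of each column', '*' * 25)
print(df.dtypes)

print('*' * 25, 'Get the str attribute of the Series', '*' * 25)
print(df['bWendu'].str)

print('*' * 25, 'String replace function', '*' * 25)
print(df['bWendu'].str.replace('℃', ''))

print('*' * 25, 'Check whether each value is numeric', '*' * 25)
print(df['bWendu'].str.isnumeric())

print('*' * 25, 'str.len() on the aqi column', '*' * 25)
# If aqi is a numeric column, .str cannot be used on it and raises AttributeError
try:
    print(df['aqi'].str.len())
except AttributeError as e:
    print(e)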
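
A small self-contained sketch of the same calls may help when the CSV file is not at hand. The sample values are hypothetical. It also shows a detail that is easy to miss: isnumeric() returns False for negative values because of the minus sign, even though astype('int32') converts them without trouble.

```python
import pandas as pd

# Hypothetical temperature strings in the same format as the bWendu column
temps = pd.Series(["3℃", "12℃", "-1℃"])

# Remove the unit; regex=False treats the pattern as a literal string
stripped = temps.str.replace("℃", "", regex=False)

print(stripped.str.isnumeric().tolist())  # [True, True, False]
print(stripped.str.len().tolist())        # [1, 2, 2]
print(stripped.astype("int32").tolist())  # [3, 12, -1]
```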

3. Use the boolean Series returned by str methods such as startswith and contains for conditional queries

import pandas as pd

file_path = "../../datas/files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

print('*' * 25, 'First few rows', '*' * 25)
print(df.head())
print('*' * 25, 'Data type of each column', '*' * 25)
print(df.dtypes)

condition = df['ymd'].str.startswith('2018-03')
print(condition)
print(df[condition].head())
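
The code above only uses startswith. As a rough sketch of contains, and of combining two conditions, the following uses hypothetical in-memory data; the weather column and its values are assumptions made for illustration and are not taken from the CSV.

```python
import pandas as pd

# Hypothetical data; "weather" is an illustrative column name
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02", "2018-04-01"],
    "weather": ["sunny", "cloudy~sunny", "light rain", "sunny~cloudy"],
})

# Each str method returns a boolean Series that can index the DataFrame
is_march = df["ymd"].str.startswith("2018-03")
is_sunny = df["weather"].str.contains("sunny", regex=False)

march = df[is_march]
sunny = df[is_sunny]

# Boolean Series combine with & (and), | (or), ~ (not)
march_and_sunny = df[is_march & is_sunny]

print(march_and_sunny)
```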

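4. Chaining operations that need several str calls

How do we extract a numeric month such as 201803?

1. First convert a date such as 2018-03-31 into the form 20180331

2. Then extract the month string 201803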

import pandas as pd

file_path = "../../datas/files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

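print('*' * 25, 'First few rows', '*' * 25)
print(df.head())
print('*' * 25, 'Data type of each column', '*' * 25)
print(df.dtypes)

# First convert a date such as 2018-03-31 into the form 20180331
print('*' * 50)
print(df['ymd'].str.replace('-', ''))

# Every call returns a new Series, so .str must be accessed again before the next string method
# df['ymd'].str.replace('-', '').slice(0, 6)    # wrong: Series has no slice method
print('*' * 50)
print(df['ymd'].str.replace('-', '').str.slice(0, 6))

# slice is equivalent to slicing syntax, which can be used directly on .str
print('*' * 50)
print(df['ymd'].str.replace('-', '').str[0:6])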
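
The chaining rule can be verified on a tiny hypothetical Series. This sketch also adds a final astype('int64') step, which is an assumption about what "numeric month" means here, to turn the month string into an actual integer.

```python
import pandas as pd

# Hypothetical dates in the same format as the ymd column
ymd = pd.Series(["2018-03-31", "2018-12-01"])

# Both spellings give the same result
by_slice = ymd.str.replace("-", "", regex=False).str.slice(0, 6)
by_index = ymd.str.replace("-", "", regex=False).str[0:6]

# Forgetting the second .str fails, because replace returns a plain Series
try:
    ymd.str.replace("-", "", regex=False).slice(0, 6)
    chain_error = None
except AttributeError as exc:
    chain_error = type(exc).__name__

# Optional last step: convert the month strings to integers
month_numbers = by_slice.astype("int64")

print(by_slice.tolist())
print(chain_error)
print(month_numbers.tolist())
```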

5. Handling strings with regular expressions

import pandas as pd

file_path = "../../datas/files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

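print('*' * 25, 'First few rows', '*' * 25)
print(df.head())
print('*' * 25, 'Data type of each column', '*' * 25)
print(df.dtypes)


# Add a new column
print('*' * 25, 'Add a new column', '*' * 25)
def get_nianyyueri(x):
    year, month, day = x['ymd'].split('-')
    return f'{year}年{month}月{day}日'

df['中文日期'] = df.apply(get_nianyyueri, axis=1)
print(df['中文日期'])

# Question: how do we remove the three Chinese characters 年, 月 and 日 from "2018年12月31日"?
# Method 1: chained replace
# print(df['中文日期'].str.replace('年', '').str.replace('月', '').str.replace('日', ''))

# Method 2: regular expression replace (recommended)
# Older pandas versions treated the pattern as a regex by default; since pandas 2.0
# the default is regex=False, so pass regex=True explicitly
print('*' * 25, 'Regular expression replace', '*' * 25)
print(df['中文日期'].str.replace('[年月日]', '', regex=True))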
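
To make the effect of the regex flag concrete, here is a minimal sketch on hypothetical date strings. It also uses Series.str.extract, which is not covered in the text above and is shown only as one possible alternative for pulling the year, month and day out with capture groups.

```python
import pandas as pd

# Hypothetical values in the same format as the 中文日期 column
dates = pd.Series(["2018年12月31日", "2018年01月01日"])

# regex=True: the character class removes any of the three characters
digits_only = dates.str.replace("[年月日]", "", regex=True)

# regex=False: the pattern is a literal string, which never occurs, so nothing changes
literal = dates.str.replace("[年月日]", "", regex=False)

# Alternative: capture groups with str.extract return one column per group
parts = dates.str.extract(r"(\d{4})年(\d{2})月(\d{2})日")
parts.columns = ["year", "month", "day"]

print(digits_only.tolist())
print(literal.tolist())
print(parts)
```

Passing regex explicitly in every call avoids depending on the default of the installed pandas version.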
