Python Standard Library Notes (2): the re Module

阿新 • • Published: 2022-05-04


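The re module provides a set of powerful regular expression tools. They let you quickly check whether a given string matches a given pattern from its start (the match function) or contains the pattern anywhere (the search function). A regular expression is a string pattern written in a compact (and rather cryptic) syntax.

1. Common functions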

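Function	Description
match(pattern, string, flags=0)	Returns a Match object if the beginning of string matches the regular expression pattern; otherwise returns None
search(pattern, string, flags=0)	Scans string and returns a Match object for the first position where pattern matches; otherwise returns None
sub(pattern, repl, string, count=0, flags=0)	Replaces the parts of string that match pattern with repl, at most count times (count=0 replaces every match)
subn(pattern, repl, string, count=0, flags=0)	Like sub, but returns a tuple of the new string and the number of replacements made
split(pattern, string, maxsplit=0, flags=0)	Splits string at the substrings matched by pattern
findall(pattern, string, flags=0)	Returns the non-overlapping matches of pattern in string as a list
compile(pattern, flags=0)	Compiles the regular expression pattern into a Pattern object, whose match and search methods can then be used
purge()	Clears the regular expression cache
escape(string)	Backslash-escapes the characters in string that have a special meaning in a regular expression (older Python versions escape every non-alphanumeric character)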
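As a small illustration of compile() and escape() from the table above, the sketch below builds a pattern from literal text that contains regex metacharacters. The sample strings and variable names are made up for this example.

```python
import re

# Hypothetical literal text that contains regex metacharacters ('.' and '+').
literal = 'a.b+c'

# escape() backslash-escapes the metacharacters so they match literally.
pattern = re.compile(re.escape(literal))

found = pattern.search('x a.b+c y')    # the literal text is present
missed = pattern.search('x aXbbc y')   # only the unescaped pattern would match this

print(found.group())                   # a.b+c
print(missed)                          # None
print(bool(re.search(literal, 'x aXbbc y')))  # True: unescaped, '.' and '+' are special
```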

2. Special characters

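Syntax	Description
.	Matches any character except a newline
^	Matches at the start of the string
$	Matches at the end of the string
*	Matches the preceding character 0 or more times
+	Matches the preceding character 1 or more times
?	Matches the preceding character 0 or 1 times
{m,n}	Matches the preceding character m to n times
\	Escapes any special character
[]	Defines a set of characters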
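The entries in the table can be tried out one by one. The following is a minimal sketch with made-up sample strings, using findall() and search() only to make the matches visible.

```python
import re

# '.' matches any single character except a newline.
print(re.findall(r'a.c', 'abc a-c a\nc'))      # ['abc', 'a-c']

# '^' and '$' anchor the match to the start and end of the string.
print(bool(re.search(r'^abc$', 'abc')))        # True
print(bool(re.search(r'^abc$', 'xabc')))       # False

# '*', '+', '?' and '{m,n}' control how often the preceding item repeats.
print(re.findall(r'ab*', 'a ab abbb'))         # ['a', 'ab', 'abbb']
print(re.findall(r'ab+', 'a ab abbb'))         # ['ab', 'abbb']
print(re.findall(r'ab?', 'a ab abbb'))         # ['a', 'ab', 'ab']
print(re.findall(r'ab{2,3}', 'ab abb abbbb'))  # ['abb', 'abbb']

# '\' escapes a special character; '[]' defines a set of characters.
print(re.findall(r'\.', 'a.b.c'))              # ['.', '.']
print(re.findall(r'[abc]', 'a1b2c3'))          # ['a', 'b', 'c']
```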

3. Module functions

re.match(pattern, string, flags=0)

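Matches from the start of the string. If pattern matches there, a Match object is returned (Match objects are described in section 4); otherwise None is returned. flags sets the matching mode (described in section 6) and controls how the regular expression is matched.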

import re

a = 'abcdefg'
print(re.match(r'abc', a))  # match succeeds
print(re.match(r'abc', a).group())
print(re.match(r'cde', a))  # match fails

Output:
<re.Match object; span=(0, 3), match='abc'>
abc
None

search(pattern, string, flags=0)

Looks for a substring anywhere in the string that matches pattern. If one is found, a Match object is returned; otherwise None is returned.

import re

a = 'abcdefg'
print(re.search(r'bc', a))
print(re.search(r'bc', a).group())
print(re.search(r'123', a))

Output:
<re.Match object; span=(1, 3), match='bc'>
bc
None
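The difference between match() and search() is easiest to see when the same pattern is given to both. A minimal sketch, reusing the sample string from the examples above:

```python
import re

text = 'abcdefg'

# match() only tries at position 0, so 'cde' is not found.
print(re.match(r'cde', text))            # None

# search() scans forward and finds 'cde' starting at index 2.
print(re.search(r'cde', text).span())    # (2, 5)

# Anchoring the pattern with '^' makes search() behave like match() here.
print(re.search(r'^cde', text))          # None
```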

sub(pattern, repl, string, count=0, flags=0)

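Substitution: replaces the parts of string that match pattern with repl, at most count times (any further matches are left untouched; the default count=0 replaces all of them), and returns the resulting string.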

import re

a = 'a1b2c3'
print(re.sub(r'\d+', '0', a))  # replace digits with '0'
print(re.sub(r'\s+', '0', a))  # replace whitespace with '0' (no whitespace here, so nothing changes)

Output:
a0b0c0
a1b2c3
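The count argument and the fact that repl may be a function are worth a quick look as well. The sketch below assumes a small helper, double(), that is invented for this example.

```python
import re

text = 'a1b2c3'

# count=1 replaces only the first match; the rest are left alone.
print(re.sub(r'\d', '0', text, count=1))   # a0b2c3


def double(match):
    """Hypothetical helper: return the matched digit multiplied by two."""
    return str(int(match.group()) * 2)


# repl may also be a function that receives each Match object.
print(re.sub(r'\d', double, text))         # a2b4c6
```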

subn(pattern, repl, string, count=0, flags=0)

Works like sub(), except that it returns a tuple containing the new string and the number of replacements made.

import re

a = 'a1b2c3'
print(re.subn(r'\d+', '0', a))  # replace digits with '0'

Output:
('a0b0c0', 3)

split(pattern, string, maxsplit=0, flags=0)

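A regex version of str.split(): splits string at the substrings matched by pattern. If pattern contains capturing parentheses, the text matched by the groups is also included in the returned list. maxsplit is the maximum number of splits to perform (0 means no limit).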

import re

a = 'a1b1c'
print(re.split(r'\d', a))
print(re.split(r'(\d)', a))

Output:
['a', 'b', 'c']
['a', '1', 'b', '1', 'c']
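The maxsplit argument limits how many splits are made. A short sketch with a made-up sample string:

```python
import re

text = 'a1b2c3d'

# With the default maxsplit=0 every match is used as a separator.
print(re.split(r'\d', text))               # ['a', 'b', 'c', 'd']

# maxsplit=1 splits once; the remainder is returned as the last element.
print(re.split(r'\d', text, maxsplit=1))   # ['a', 'b2c3d']
```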

findall(pattern, string, flags=0)

Returns the non-overlapping substrings of string that match pattern, as a list.

import re

a = 'a1b2c3d4'
print(re.findall(r'\d', a))

Output:
['1', '2', '3', '4']
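What findall() returns depends on the capturing groups in the pattern, which is easy to trip over. The sketch below uses a made-up sample string to show the three cases.

```python
import re

text = 'x=1, y=22'

# No group: each list item is the whole match.
print(re.findall(r'\w=\d+', text))       # ['x=1', 'y=22']

# One group: each list item is the text captured by that group.
print(re.findall(r'\w=(\d+)', text))     # ['1', '22']

# Several groups: each list item is a tuple of the captured texts.
print(re.findall(r'(\w)=(\d+)', text))   # [('x', '1'), ('y', '22')]
```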

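4. Match objects

A successful re.match() or re.search() call returns a Match object. It holds a good deal of information about the match, which can be read through the attributes and methods that Match provides. For example: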

>>> import re

>>> text = 'he has 2 books and 1 pen'
>>> ob = re.search(r'(\d+)', text)

>>> print(ob.string)  # the text that was searched
he has 2 books and 1 pen

>>> print(ob.re)  # the Pattern object used for the match
re.compile('(\\d+)')

>>> print(ob.group())  # the whole match (or the given groups, when indexes are passed)
2

>>> print(ob.groups())  # all captured groups, as a tuple
('2',)
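A Match object also records where the match was found and gives access to each group by index. A minimal sketch on the same sentence, with a two-group pattern chosen for this example:

```python
import re

text = 'he has 2 books and 1 pen'
ob = re.search(r'(\d+) (\w+)', text)

print(ob.group(0))    # 2 books   (the whole match)
print(ob.group(1))    # 2         (first group)
print(ob.group(2))    # books     (second group)
print(ob.groups())    # ('2', 'books')

# start(), end() and span() give the position of the match in the text.
print(ob.start(), ob.end())            # 7 14
print(text[ob.start():ob.end()])       # 2 books
print(ob.span())                       # (7, 14)
```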

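5. Pattern objects

A Pattern object is returned by re.compile(). It has many methods with the same names as the re module functions, and they behave in much the same way. For example: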

>>> import re
>>> pa = re.compile(r'(\d+)')

>>> print(pa.split('he has 2 books and 1 pen'))
['he has ', '2', ' books and ', '1', ' pen']

>>> print(pa.findall('he has 2 books and 1 pen'))
['2', '1']

>>> print(pa.sub('much', 'he has 2 books and 1 pen'))
he has much books and much pen
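A typical reason to compile a pattern is to reuse it across many strings. The sketch below assumes a small made-up list of lines and counts the numbers in each one.

```python
import re

# Compile once, then reuse the Pattern object for every line.
pa = re.compile(r'\d+')

# Hypothetical input lines for this example.
lines = ['he has 2 books', 'no digits here', '10 pens and 3 cups']

counts = [len(pa.findall(line)) for line in lines]
print(counts)        # [1, 0, 2]

# The original pattern string is kept on the object.
print(pa.pattern)    # \d+
```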

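6. Flags

Flags can be combined with the bitwise OR operator '|' so that several take effect at once, for example re.I | re.M. Some common flags are listed below.

re.I (re.IGNORECASE): ignore case

>>> pa = re.compile('abc', re.I)
>>> pa.findall('AbCdEfG')
['AbC']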

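re.L (re.LOCALE): locale-dependent character classes

This flag exists to support character sets in multilingual environments. Take the escape \w: in an English locale it stands for [a-zA-Z0-9_], that is, all English letters, digits and the underscore, so in a French environment some accented French characters would not match unless the L flag is added. It is of little use for Chinese text, where it still cannot match Chinese characters. In Python 3 the flag can only be used with bytes patterns and its use is discouraged: str patterns are Unicode-aware by default, so \w already matches accented and Chinese characters.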
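In Python 3, str patterns are Unicode-aware by default, so \w matches accented and Chinese characters without any locale flag, and re.ASCII restricts it to [a-zA-Z0-9_]. A short sketch with a made-up sample string:

```python
import re

text = 'café 中文'

# Default for str patterns in Python 3: \w follows Unicode rules.
print(re.findall(r'\w+', text))             # ['café', '中文']

# re.ASCII limits \w to [a-zA-Z0-9_], so 'é' and the Chinese characters are skipped.
print(re.findall(r'\w+', text, re.ASCII))   # ['caf']
```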

re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line

>>> pa = re.compile(r'^\d+')
>>> pa.findall('123 456\n789 012\n345 678')
['123']

>>> pa_m = re.compile(r'^\d+', re.M)
>>> pa_m.findall('123 456\n789 012\n345 678')
['123', '789', '345']
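Flags are combined with '|'. The sketch below joins re.I and re.M on a made-up three-line string, so that '^abc' is tried at the start of every line and without regard to case.

```python
import re

text = 'ABC\nabc\nxabc'

# re.M alone: '^' matches at each line start, but case still matters.
print(re.findall(r'^abc', text, re.M))           # ['abc']

# re.I | re.M: both flags are in effect at once.
print(re.findall(r'^abc', text, re.I | re.M))    # ['ABC', 'abc']
```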

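re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'

By default '.' matches any character except the newline \n. With this flag, the dot matches any character at all, including a newline.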
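The effect of re.S shows up as soon as the text contains a newline. A minimal sketch with a made-up two-line string:

```python
import re

text = 'line1\nline2'

# Without re.S the dot refuses to match the newline between the two lines.
print(re.findall(r'line1.line2', text))         # []

# With re.S the dot matches the newline as well.
print(re.findall(r'line1.line2', text, re.S))   # ['line1\nline2']
```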

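re.U (re.UNICODE): interpret characters according to the Unicode character set (in Python 3 this is already the default for str patterns)
re.X (re.VERBOSE): verbose mode

# In this mode a regular expression may span several lines; whitespace is ignored and comments are allowed. The two patterns below are equivalent.
import re

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
# In this mode, to match a literal space you must write '\ ' (a backslash followed by a space)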
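Because verbose mode ignores plain whitespace in the pattern, a literal space has to be written explicitly. The sketch below shows two ways of doing so, backslash-space and a character set '[ ]', on a made-up sample string.

```python
import re

# Backslash-space keeps the space as part of the pattern in verbose mode.
escaped = re.compile(r"""
    \d+   # first number
    \     # a literal space, written as backslash followed by a space
    \d+   # second number
""", re.X)

# A character set containing a space works too and is easier to see.
bracketed = re.compile(r"""
    \d+   # first number
    [ ]   # a literal space inside a character set
    \d+   # second number
""", re.X)

text = '10 20 and 3040'
print(escaped.findall(text))     # ['10 20']
print(bracketed.findall(text))   # ['10 20']
```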
