Python3 Regular Expressions (Part 2): The re Module

阿新 • Published: 2019-01-16

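Part 1, Python3 Regular Expressions (Part 1): Basic Syntax Rules, covered the basic rules of regular expressions. This post covers how to use regular expressions to match strings in Python, that is, how to use the functions in the re module.
Import the re module before using it: import re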

I. Commonly Used Functions in the re Module

1.compile()

Source code description:

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    # Compiles a regular expression pattern and returns a Pattern object
    return _compile(pattern, flags)

Parameters:

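pattern: the regular expression
flags: changes how the regular expression matches; these are the six modes (iLmsux) described in the basic syntax post. The default (0) is normal mode.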

Example code:

pattern = re.compile(r"\d")
result = pattern.match("123")
print(result.group())
# Output: 1. The pattern contains a single \d, so only one digit is matched

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.match("AbcD")
print(result.group())
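# Output: AbcD. re.X ignores the space in the pattern and re.I ignores case, which shows that several modes can be combined with |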
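Compiling mainly pays off when a pattern is reused or is long enough to need documenting. The sketch below, assuming Python 3, compiles an illustrative date pattern once with re.X so it can carry inline comments, then reuses it; the pattern, the inputs and the helper name parse_date are made up for this example.

```python
import re

# re.X (VERBOSE) ignores whitespace in the pattern and allows '#' comments,
# so a longer pattern can be laid out and documented line by line.
DATE_PATTERN = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.X)


def parse_date(text):
    """Hypothetical helper: return (year, month, day) strings, or None if text does not start with a date."""
    m = DATE_PATTERN.match(text)
    return m.groups() if m else None


for s in ["2019-01-16", "2019/01/16"]:
    print(s, parse_date(s))
```

Module-level functions such as re.match() also compile their pattern and keep a cache of recently used ones, so compile() is less about speed than about keeping the pattern and its flags together in one reusable object.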

2.match()

Source code description:

1. def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning a match object, or None if no match was found."""
    # Matches pattern at the start of string and returns a Match object, or None if no match is found.
    return _compile(pattern, flags).match(string)

2. def match(self, string, pos=0, endpos=sys.maxsize):
    """Matches zero or more characters at the beginning of the string."""
    # Pattern-object version: lets you specify where matching starts (pos) and stops (endpos)
    pass

Parameters:

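# pattern and flags have the same meaning as in compile()
string: the string to match against
pos: index at which matching starts; defaults to 0
endpos: index at which matching stops (exclusive); defaults to the end of the string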

Example code:

result = re.match(r"a+\d", "aA123", re.I)
print(result.group())
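# Output: aA1. With re.I, a+ matches "aA" and \d matches "1"; once the whole pattern has matched, the match succeeds and the matched substring is returned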

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.match("0AbcD5", 1, 5)
print(result.group())
# Output: AbcD. Matching covers the substring from index 1 up to (but not including) index 5
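One subtlety of pos: it moves where match() starts, but it is not quite the same as slicing the string first. Per the re documentation, ^ still refers to the real start of the string, while endpos behaves as if the string were truncated there. A small sketch with illustrative patterns:

```python
import re

text = "xdog"

plain = re.compile(r"dog")
anchored = re.compile(r"^dog")

# pos=1 starts matching at index 1, so the unanchored pattern succeeds.
print(plain.match(text, 1).group())       # dog

# '^' only matches at the real start of the string (index 0), not at pos.
print(anchored.match(text, 1))            # None

# Slicing creates a new string whose index 0 is 'd', so '^' matches there.
print(anchored.match(text[1:]).group())   # dog

# endpos acts like truncating the string, so '$' can match at index 3.
print(re.compile(r"dog$").match("dogx", 0, 3).group())  # dog
```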

3.search()

Source code description:

1. def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning a match object, or None if no match was found."""
    # Similar to match(), except that search() looks for a match at any position in the string, whereas match() only tries once, at the start (or at pos)
    return _compile(pattern, flags).search(string)

2. def search(self, string, pos=0, endpos=sys.maxsize):
    """Scan through string looking for a match, and return a corresponding match instance. Return None if no position in the string matches."""
    # Pattern-object version: lets you restrict the search to a slice of the string
    pass

Parameters:

# Same as for match()

Example code:

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.search("0A2aBcd7")
print(result.group())
# Output: aBcd. search() returns the first match found at any position in the string

pattern = re.compile(r"abc d", re.I|re.X)
matchResult = pattern.match("0AbcD5")
searchResult = pattern.search("0AbcD5")
# matchResult is None
# searchResult.group() is AbcD
# The pattern starts with a, but the string starts with 0, so match() fails here while search() keeps scanning
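Because match() and search() return None when nothing is found, calling .group() directly on a failed result such as matchResult would raise AttributeError. A defensive sketch; find_first is a hypothetical helper name, not part of re:

```python
import re

pattern = re.compile(r"abc d", re.I | re.X)


def find_first(text, anchored=False):
    """Return the matched substring, or None if there is no match.

    anchored=True uses match() (start of string only); otherwise search() is used.
    """
    m = pattern.match(text) if anchored else pattern.search(text)
    if m is None:
        return None
    return m.group()


print(find_first("0AbcD5", anchored=True))   # None
print(find_first("0AbcD5"))                  # AbcD
```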

4.group(), groups() and groupdict()

Source code description:

1. def group(self, *args):
    """Return one or more subgroups of the match."""
    # Match object method: returns one or more subgroups of the match
    pass

2. def groups(self, default=None):
    """Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern."""
    # Returns the text matched by every group, as a tuple
    pass

3. def groupdict(self, default=None):
    """Return a dictionary containing all the named subgroups of the match, keyed by the subgroup name."""
    # Returns the text matched by every named group, as a dict keyed by group name
    pass

Parameters:

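*args of group(): with one argument, returns a single substring; with several arguments, returns a tuple of substrings. With no argument it behaves like group(0) and returns the entire match.
default of groups(): value used for groups that did not take part in the match; defaults to None
default of groupdict(): value used for named groups that did not take part in the match; defaults to None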

Example code:

pattern = re.compile(r"([\w]+) ([\w]+)")
m = pattern.match("Hello World Hi Python")
print(m.group())
# Output: Hello World. The first group matched Hello and the second matched World, which completes the pattern
print(m.group(1))
# Output: Hello, the text matched by the first group
print(m.group(2))
# Output: World, the text matched by the second group
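group() also accepts several group numbers at once, and a Match object records where each group matched. A short sketch reusing the pattern above; span() is a standard Match method that this post does not otherwise cover.

```python
import re

pattern = re.compile(r"([\w]+) ([\w]+)")
m = pattern.match("Hello World Hi Python")

# Several arguments return a tuple, one entry per requested group.
print(m.group(1, 2))                     # ('Hello', 'World')
# Group 0 is the whole match, the same as calling group() with no arguments.
print(m.group(0))                        # Hello World
# start()/end()/span() give the position of a group in the original string.
print(m.start(2), m.end(2), m.span(2))   # 6 11 (6, 11)
```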

pattern = re.compile(r"([\w]+)\.?([\w]+)?")
m = pattern.match("Hello")
print(m.groups())
# Output: ('Hello', None). The first element is Hello, matched by the first group; the second group did not match, so None is returned
print(m.groups("Python"))
# Output: ('Hello', 'Python'). The second group did not match, so the default passed to groups() is used

pattern = re.compile(r"(?P<first_str>\w+) (?P<last_str>\w+)")
m = pattern.match("Hello Python")
print(m.groupdict())
# Output: {'first_str': 'Hello', 'last_str': 'Python'}. The default argument works the same way as in groups()
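Named groups can also be read one at a time by name, and groupdict() takes a default just like groups(). In this sketch the second name is made optional with a non-capturing group (?:...) so that it can be missing; the "N/A" default is an arbitrary choice.

```python
import re

pattern = re.compile(r"(?P<first_str>\w+)(?: (?P<last_str>\w+))?")

m = pattern.match("Hello")
print(m.groupdict())          # {'first_str': 'Hello', 'last_str': None}
print(m.groupdict("N/A"))     # {'first_str': 'Hello', 'last_str': 'N/A'}

m = pattern.match("Hello Python")
# A named group can be accessed by its name as well as by its number.
print(m.group("last_str"))    # Python
print(m.group(2))             # Python
```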

5.findall()

Source code description:

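def findall(self, string, pos=0, endpos=sys.maxsize):
    """Return a list of all non-overlapping matches of pattern in string."""
    # Returns a list of all matching substrings in the string
    # Key point: the result is a plain list of strings, not Match objects, so there is no group() method
    pass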

Parameters:

# Same as for match()

Example code:

pattern = re.compile(r'\d+')
m = pattern.findall('a1b2c33d4')
print(m)
# Output: ['1', '2', '33', '4'], every run of digits in the string

m = pattern.findall('a1b2c33d4', 1, 6)
print(m)
# Output: ['1', '2', '3'], searching only within "1b2c3" (indices 1 to 5)
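A behaviour that often surprises people: when the pattern contains capturing groups, findall() returns the groups rather than the whole match, with one tuple per match when there are several groups. A sketch on the same string:

```python
import re

s = "a1b2c33d4"

# No groups: each item is the whole match.
print(re.findall(r"[a-z]\d+", s))      # ['a1', 'b2', 'c33', 'd4']
# One group: each item is only that group's text.
print(re.findall(r"[a-z](\d+)", s))    # ['1', '2', '33', '4']
# Several groups: each item is a tuple of the groups.
print(re.findall(r"([a-z])(\d+)", s))  # [('a', '1'), ('b', '2'), ('c', '33'), ('d', '4')]
# (?:...) groups without capturing, so the whole match is returned again.
print(re.findall(r"(?:[a-z])\d+", s))  # ['a1', 'b2', 'c33', 'd4']
```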

6.finditer()

Source code description:

def finditer(self, string, pos=0, endpos=sys.maxsize):
    """Return an iterator over all non-overlapping matches for the pattern in string. For each match, the iterator returns a match object."""
    # Returns an iterator that yields a Match object for every match in the string
    pass

Parameters:

# Same as for match()

Example code:

pattern = re.compile(r'\d+')
m = pattern.finditer('a1b2c33d4')
print(m)
# Output: an iterator, e.g. <callable_iterator object at 0x0000017A8C0F8240> (the address varies)

print(next(m).group())
# Each call to next() returns the next Match object; this first call prints 1
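In practice finditer() is usually consumed with a for loop rather than repeated next() calls, which also avoids the StopIteration raised once the matches run out. Each item is a Match object, so positions are available as well:

```python
import re

pattern = re.compile(r"\d+")

# Collect each match together with its (start, end) position.
results = [(m.group(), m.span()) for m in pattern.finditer("a1b2c33d4")]
for text, span in results:
    print(text, span)
# 1 (1, 2)
# 2 (3, 4)
# 33 (5, 7)
# 4 (8, 9)
```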

8.split()

Source code description:

def split(self, string, maxsplit=0):
    """Split string by the occurrences of pattern."""
    # Splits the string at every match and returns the pieces as a list
    pass

Parameters:

maxsplit: maximum number of splits; if omitted (0), the string is split at every match.

Example code:

pattern = re.compile(r'\d+')
m = pattern.split('a1b2c3d4e')
print(m)
# Output: ['a', 'b', 'c', 'd', 'e'], split at every digit

m = pattern.split('a1b2c3d4e', 3)
print(m)
# Output: ['a', 'b', 'c', 'd4e']. Only three splits are made; the rest is left intact
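Two details of split() worth knowing: if the pattern contains a capturing group, the separators themselves are kept in the result, and a match at the very start or end of the string produces an empty string at that end. A sketch:

```python
import re

# Capturing the digits keeps them in the output list.
print(re.split(r"(\d+)", "a1b2c3d4e"))   # ['a', '1', 'b', '2', 'c', '3', 'd', '4', 'e']
# Leading and trailing separators yield empty strings.
print(re.split(r"\d+", "1a2b3"))         # ['', 'a', 'b', '']
```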

9.sub()

Source code description:

def sub(self, repl, string, count=0):
    """Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."""
    # Replaces the matched substrings with repl and returns the resulting new string
    pass

Parameters:

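repl: the replacement (a string, or a function that takes a Match object and returns a string)
string: the original string
count: maximum number of replacements; the default 0 replaces every match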

Example code:

pattern = re.compile(r'\s+')
text = "Process finished with exit code 0"
m = pattern.sub('-', text, 3)
print(m)
# Output: Process-finished-with-exit code 0. Only the first three whitespace runs were replaced with '-'
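Besides a plain string, repl may contain backreferences to groups (\1 or \g<name>), or be a function that receives each Match object and returns the replacement text. A sketch of both; increment is a hypothetical helper name:

```python
import re

# Backreferences: swap two words.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World"))        # World Hello
# Named-group backreference with \g<name>.
print(re.sub(r"(?P<word>\w+)", r"<\g<word>>", "Hi Python"))  # <Hi> <Python>


# A function as repl: increment every number found in the text.
def increment(m):
    return str(int(m.group()) + 1)


print(re.sub(r"\d+", increment, "Process finished with exit code 0"))
# Process finished with exit code 1
```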

10.subn()

Source code description:

def subn(self, repl, string, count=0):
    """Return the tuple (new_string, number_of_subs_made) found by replacing the leftmost non-overlapping occurrences of pattern with the replacement repl."""
    # Returns a tuple of (new string, number of substitutions made)
    pass

Parameters:

Same as for sub()

Example code:

pattern = re.compile(r'\s+')
text = "Process finished with exit code 0"
m = pattern.subn('-', text)
print(m)
# Output: ('Process-finished-with-exit-code-0', 5)
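The count returned by subn() is handy when you need to know whether anything was actually replaced. A sketch; normalize_spaces is a hypothetical helper, and note that single spaces also count as runs here because \s+ matches them too:

```python
import re


def normalize_spaces(text):
    """Collapse each run of whitespace into one space; return (new_text, runs_replaced)."""
    return re.subn(r"\s+", " ", text)


new_text, n = normalize_spaces("a  b\t\tc")
print(new_text, n)   # a b c 2
if n == 0:
    print("nothing to normalize")
```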

II. Summary

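The previous section covered only the most commonly used functions and methods of the re module. In the source you will also find a few other simple or less commonly used ones: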

fullmatch(string, pos=0, endpos=sys.maxsize)
start(group=0)
end(group=0)
escape(string)
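A quick sketch of these four. escape() is called as the module-level function re.escape(), while fullmatch() belongs to Pattern objects and start()/end() to Match objects; the inputs are illustrative.

```python
import re

p = re.compile(r"\d+")

# fullmatch() succeeds only if the whole string matches; match() only needs a prefix.
print(p.fullmatch("123a"))          # None
print(p.match("123a").group())      # 123

# start()/end() report where the match lies in the searched string.
m = p.search("abc123def")
print(m.start(), m.end())           # 3 6

# escape() backslash-escapes regex metacharacters so the text is matched literally.
literal = re.escape("1+1=2?")
print(re.fullmatch(literal, "1+1=2?") is not None)   # True
```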

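Once you have mastered the methods above, these four are easy to pick up.
That's all for the re module. If you find any mistakes, please point them out; I'm still learning too. Thanks.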

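A handy tool for testing regular expressions: Regular Expression Testing Tool
Coming next: Python3 Regular Expressions (Part 3): Greedy vs. Non-greedy Matching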
