Python Pattern Matching and Regular Expressions
阿新 • Published: 2020-12-13
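How to use regular expressions:
1> Import the regular expression module with import re
2> Create a Regex object with the re.compile() function (remember to use a raw string)
3> Pass the string you want to search to the Regex object's search() method. It returns a Match object, or None if the pattern is not found
4> Call the Match object's group() method to get the actual matched text as a string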
import re
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('my number is 425-444-3467')
print('phone number found ' + mo.group())
======================================================
result:
phone number found 425-444-3467
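Because search() returns None when the pattern is not found, calling group() on the result without checking raises AttributeError. A minimal sketch of guarding against that; the helper name find_phone is hypothetical and not part of the original notes:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone number found in text, or None if there is none."""
    mo = phoneNumRegex.search(text)
    if mo is None:          # no match: do not call mo.group()
        return None
    return mo.group()

print(find_phone('my number is 425-444-3467'))
print(find_phone('no number here'))
```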
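Grouping with parentheses: group() returns the entire match, and group(n) returns the text matched by the n-th pair of parentheses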
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('my number is 425-444-3467')
print(mo.group())
print(mo.group(1))
print(mo.group(2))
========================================================
result:
425-444-3467
425
444-3467
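To get all groups at once, the Match object also has a groups() method that returns them as a tuple. The sketch below additionally assumes a phone format with the area code in literal parentheses (not used in the original), which must be escaped as \( and \):

```python
import re

# \( and \) match literal parentheses; the unescaped ones create groups.
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('my number is (425) 444-3467')

# groups() returns every group as a tuple, which can be unpacked directly.
areaCode, mainNumber = mo.groups()
print(areaCode)
print(mainNumber)
```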
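Matching multiple alternatives with the pipe: use the pipe character | when you want to match one of several expressions. If more than one alternative occurs in the searched string, the one that occurs first is returned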
import re
heroRegex = re.compile(r'Batman|Tina Fey')
mo = heroRegex.search('Batman and Tina Fey')
print(mo.group())
mo = heroRegex.search('Tina Fey and Batman')
print(mo.group())
=============================================
result:
Batman
Tina Fey
The pipe can also be used inside a group to match one of several alternatives after a shared prefix:
import re
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batcopter lost a wheel')
print(mo.group())
print(mo.group(1))
====================================================
result:
Batcopter
copter
Optional matching with the question mark: ? means the preceding group occurs zero or one time
import re
batRegex = re.compile(r'Bat(wo)?man')
mo = batRegex.search('Batman lost a wheel')
print(mo.group())
mo = batRegex.search('Batwoman lost a wheel')
print(mo.group())
=============================================
result:
Batman
Batwoman
Matching zero or more times with the star:
import re
batRegex = re.compile(r'Bat(wo)*man')
mo = batRegex.search('Batman lost a wheel')
print(mo.group())
mo = batRegex.search('Batwoman lost a wheel')
print(mo.group())
mo = batRegex.search('Batwowowowoman lost a wheel')
print(mo.group())
===================================================
result:
Batman
Batwoman
Batwowowowoman
Matching one or more times with the plus:
import re
batRegex = re.compile(r'Bat(wo)+man')
mo = batRegex.search('Batman lost a wheel')
print(mo)
mo = batRegex.search('Batwoman lost a wheel')
print(mo.group())
mo = batRegex.search('Batwowowowoman lost a wheel')
print(mo.group())
==================================================
result:
None
Batwoman
Batwowowowoman
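Matching a specific number of repetitions with curly braces: {2,3} means the preceding group occurs two to three times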
import re
haRegex = re.compile(r'(Ha){2,3}')
mo = haRegex.search('Ha')
print(mo)
mo = haRegex.search('HaHa')
print(mo.group())
mo = haRegex.search('HaHaHa')
print(mo.group())
mo = haRegex.search('HaHaHaHa')
print(mo.group())
===============================
result:
None
HaHa
HaHaHa
HaHaHa
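Curly braces also accept a single number or an open-ended range. A short sketch of those two forms; the variable names are illustrative only:

```python
import re

exactlyThree = re.compile(r'(Ha){3}')    # exactly three repetitions
threeOrMore = re.compile(r'(Ha){3,}')    # three or more, no upper bound

text = 'HaHaHaHa'
exactMatch = exactlyThree.search(text).group()
openMatch = threeOrMore.search(text).group()
tooShort = exactlyThree.search('HaHa')   # only two repetitions available

print(exactMatch)
print(openMatch)
print(tooShort)
```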
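Greedy and non-greedy matching: Python regular expressions are greedy by default, e.g. (Ha){3,5} matches as many repetitions as possible. Add ? after {3,5} to make the match non-greedy, so that it matches as few repetitions as possible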
import re
haRegex = re.compile(r'(Ha){2,3}?')
mo = haRegex.search('Ha')
print(mo)
mo = haRegex.search('HaHa')
print(mo.group())
mo = haRegex.search('HaHaHa')
print(mo.group())
mo = haRegex.search('HaHaHaHa')
print(mo.group())
==================================
result:
None
HaHa
HaHa
HaHa
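The same ? suffix works after * and +. A small sketch contrasting .* with .*? on a string that contains two closing angle brackets (the sample text is an assumption chosen for illustration):

```python
import re

text = '<To serve man> for dinner.>'

greedyRegex = re.compile(r'<.*>')      # runs to the last '>'
nongreedyRegex = re.compile(r'<.*?>')  # stops at the first '>'

greedyMatch = greedyRegex.search(text).group()
nongreedyMatch = nongreedyRegex.search(text).group()

print(greedyMatch)
print(nongreedyMatch)
```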
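The findall() method: search() returns a Match object for the first match only, while findall() returns every match in the string. When the pattern contains groups, findall() returns a list of tuples, one tuple per match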
import re
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('person01: 425-444-3467, person02: 425-678-4678')
print(mo.group())
mo = phoneNumRegex.findall('person01: 425-444-3467, person02: 425-678-4678')
print(mo)
============================================================================
result:
425-444-3467
[('425', '444-3467'), ('425', '678-4678')]
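For comparison, when the pattern has no groups, findall() returns a plain list of matched strings instead of tuples. A minimal sketch using the same input as above:

```python
import re

# Same phone pattern, but without any parentheses (no groups).
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
numbers = phoneNumRegex.findall('person01: 425-444-3467, person02: 425-678-4678')
print(numbers)
```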
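Character classes: \d matches any digit, \s any whitespace character, and \w any word character (letter, digit, or underscore)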
import re
# one or more digits, then one whitespace character, then one or more word characters
countRegex = re.compile(r'\d+\s\w+')
mo = countRegex.findall('8 c, 7 rrr, qqg, 19 tt')
print(mo)
====================================================
result:
['8 c', '7 rrr', '19 tt']
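Making your own character classes: when the shorthand classes \d, \s, and \w are too broad, you can define your own set of characters inside square brackets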
import re
vowelRegex = re.compile(r'[aeiou]')
mo = vowelRegex.findall('Hello world')
print(mo)
=========================================
result:
['e', 'o', 'o']
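Square brackets also support ranges with a hyphen and negation with a leading ^. A brief sketch; the sample strings are assumptions for illustration:

```python
import re

# A leading ^ inside the brackets negates the class: anything that is not a vowel.
nonVowelRegex = re.compile(r'[^aeiouAEIOU]')
nonVowels = nonVowelRegex.findall('Hello world')

# A hyphen defines a range: any single digit from 0 through 5.
lowDigitRegex = re.compile(r'[0-5]')
lowDigits = lowDigitRegex.findall('Room 4096')

print(nonVowels)
print(lowDigits)
```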
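The caret and dollar sign characters:
1> A caret ^ at the start of the pattern means the match must occur at the beginning of the string
2> A dollar sign $ at the end of the pattern means the match must occur at the end of the string
3> Using ^ and $ together means the entire string must match the pattern, e.g. r'^\d+$' matches a string made up only of digits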
import re
beginRegex = re.compile(r'^Hello')
mo = beginRegex.findall('Hello world')
print(mo)
endRegex = re.compile(r'\d$')
mo = endRegex.findall('Hello world')
print(mo)
mo = endRegex.findall('Hello world4')
print(mo)
beginEndRegex = re.compile(r'^\d+$')
mo = beginEndRegex.findall('45y889')
print(mo)
mo = beginEndRegex.findall('45889')
print(mo)
====================================
result:
['Hello']
[]
['4']
[]
['45889']
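One detail worth knowing: $ also matches just before a newline at the very end of the string, so r'^\d+$' accepts '45889\n'. If that matters, the Pattern object's fullmatch() method is a stricter alternative. A minimal sketch; the helper name is_all_digits is hypothetical:

```python
import re

digitsRegex = re.compile(r'\d+')

def is_all_digits(text):
    """True only if the whole string, with nothing left over, is digits."""
    return digitsRegex.fullmatch(text) is not None

print(is_all_digits('45889'))
print(is_all_digits('45y889'))
print(is_all_digits('45889\n'))

# For comparison, ^...$ with search() tolerates one trailing newline.
dollarAcceptsNewline = re.search(r'^\d+$', '45889\n') is not None
print(dollarAcceptsNewline)
```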
The wildcard character: . matches any single character except a newline
import re
atRegex = re.compile(r'.at')
mo = atRegex.findall('The cat in the hat sat on the first mat')
print(mo)
==================================================================
result:
['cat', 'hat', 'sat', 'mat']
Matching any text with .*: .* matches zero or more of any character except a newline
import re
nameRegex = re.compile(r'First Name:(.*) Last Name:(.*)')
mo = nameRegex.search('First Name:Broad Last Name:Cast')
print(mo.group(1))
print(mo.group(2))
==========================================================
result:
Broad
Cast
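The second argument to re.compile() (flags can be combined with |):
1> DOTALL: makes . match every character, including newlines
2> IGNORECASE: matches without regard to case
3> VERBOSE: ignores whitespace and comments inside the pattern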
import re
lastRegex = re.compile(r'(.*)last', re.DOTALL | re.IGNORECASE | re.VERBOSE)
mo = lastRegex.search('First Name:Broad Last Name:Cast'
                      'sdf sdf fs dffsdf dfdfasdf ')
print(mo.group())
===========================================================================
result:
First Name:Broad Last
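In the example above only IGNORECASE changes the result, since the input has no newline and the pattern has no whitespace or comments. The sketch below shows DOTALL and VERBOSE each making a visible difference; the sample text and variable names are assumptions for illustration:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

# Without DOTALL, . stops at the newline; with DOTALL it continues past it.
firstLine = re.compile(r'.*').search(text).group()
allLines = re.compile(r'.*', re.DOTALL).search(text).group()

# With VERBOSE, the pattern can span several lines and carry comments.
phoneRegex = re.compile(r'''
    (\d{3})          # area code
    -                # separator
    (\d{3}-\d{4})    # main number
''', re.VERBOSE)
parts = phoneRegex.search('call 425-444-3467').groups()

print(firstLine)
print(allLines)
print(parts)
```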
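Substitution with regular expressions: the sub() method takes two required arguments. The first is the replacement string substituted for each match; the second is the string to search. It returns the string with the substitutions applied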
import re
agentRegex = re.compile(r'Agent \w+')
mo = agentRegex.sub('Agent xx', 'Agent Alice gave the secret documents to Agent Bob')
print(mo)
====================================================================================
result:
Agent xx gave the secret documents to Agent xx
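The replacement string can also refer back to groups in the pattern with \1, \2, and so on. A minimal sketch that keeps the first letter of each agent's name and masks the rest; the masking format is an assumption chosen for illustration:

```python
import re

# Group 1 captures only the first letter of the name; \w* consumes the rest.
agentRegex = re.compile(r'Agent (\w)\w*')

# \1 in the replacement inserts whatever group 1 matched.
censored = agentRegex.sub(r'Agent \1***',
                          'Agent Alice gave the secret documents to Agent Bob')
print(censored)
```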