Optional Matching in Regex: Detailed Regular Expression Usage in Python

阿新 • Published: 2021-01-22

I. What Is a Regular Expression?

A regular expression describes a rule. It is used to match, or search for, the parts of a string that follow that rule.

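Common functions:

re.match(pattern, string, flags=0)
    Tries to match at the start of the string; returns a match object on success, otherwise None.
re.search(pattern, string, flags=0)
    Scans the whole string and returns the first successful match (a match object), or None.
re.findall(pattern, string, flags=0) (important)
    Scans the whole string and returns a list of all substrings that match pattern.
    Note: if pattern contains groups, the list holds the group matches instead (a tuple per match when there is more than one group).
    Example: re.findall(r"\d", "chuan1zhi2") >> ["1", "2"]
re.sub(pattern, repl, string, count=0, flags=0) (important)
    Replaces every match in string with repl and returns the new string.
    Example: re.sub(r"\d", "_", "chuan1zhi2") >> "chuan_zhi_"
Parameters:
    pattern :   the pattern string.
    repl :      the replacement string; it can also be a function.
    string :    the original string to search and replace in.
    count :     maximum number of replacements; the default 0 replaces all matches.
    flags:      matching options:
        re.I    case-insensitive matching
        re.S    makes . match any character, including a newline
        re.M    multiline mode; changes how ^ and $ behave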
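A minimal sketch tying the four functions and three flags together. The sample text is made up for illustration; only documented `re` calls are used.

```python
import re

text = "Chuan1zhi2\nchuan3"  # made-up sample data

# match only tries position 0, so it fails here because 'C' is not a digit
at_start = re.match(r"\d", text)
# search scans the whole string and stops at the first match
first = re.search(r"\d", text).group()
# findall collects every match into a list of strings
all_digits = re.findall(r"\d", text)
# sub replaces matches; count=2 stops after the first two replacements
replaced = re.sub(r"\d", "_", text, count=2)
# re.I ignores case; re.M lets ^ match at the start of every line
line_starts = re.findall(r"^chuan", text, flags=re.I | re.M)

print(at_start)        # None
print(first)           # 1
print(all_digits)      # ['1', '2', '3']
print(repr(replaced))  # 'Chuan_zhi_\nchuan3'
print(line_starts)     # ['Chuan', 'chuan']
```

Flags can be combined with `|`, as in the last call.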

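II. Regular Expression Matching Rules

(1) Matching a single character:

Code     Function
    .        matches any single character (except \n)
    [ ]      matches any one of the characters listed in [ ]
    \d       matches a digit, i.e. 0-9
    \D       matches a non-digit
    \s       matches a whitespace character: space, \t, \r, \n, \v, \f
    \S       matches a non-whitespace character
    \w       matches a word character: a-z, A-Z, 0-9, _, and Chinese characters
    \W       matches a non-word character, i.e. anything other than a letter, digit, underscore, or Chinese character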

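Examples:

Note: re.match always starts matching at the first character of the string. If the pattern does not match there, re.match returns None, and calling .group() on None raises an AttributeError.

import re

# .    matches any single character (except \n)
ret = re.match(".", "Msd")
print(ret.group())  # M

# [ ]  matches any one of the characters listed in [ ]
ret = re.match("[as]", "asd")
print(ret.group())  # a

# \d   matches a digit, i.e. 0-9
ret = re.match(r"\d", "213Msd")
print(ret.group())  # 2

# \D   matches a non-digit
ret = re.match(r"\D", "asd123")
print(ret.group())  # a

# \s   matches whitespace: space, \t, \r, \n, \v, \f
ret = re.match(r"hello\sword", "hello word")
print(ret.group())  # hello word

# \S   matches non-whitespace
ret = re.match(r"hello\Sword", "hello_word")
print(ret.group())  # hello_word

# \w   matches a word character: a-z, A-Z, 0-9, _, and Chinese characters
ret = re.match(r"hello word \w", "hello word A")
print(ret.group())  # hello word A

# \W   matches a non-word character: not a letter, digit, underscore, or Chinese character
ret = re.match(r"hello word\W", "hello word!")
print(ret.group())  # hello word!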
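Because re.match returns None on failure, real code usually checks the result before calling .group(). A minimal sketch; the helper name `leading_digit` is hypothetical.

```python
import re

def leading_digit(s):
    """Hypothetical helper: return the first character of s if it is a digit, else None."""
    match_obj = re.match(r"\d", s)
    # Check for None first; None.group() would raise AttributeError
    if match_obj:
        return match_obj.group()
    return None

print(leading_digit("213Msd"))  # 2
print(leading_digit("Msd213"))  # None
```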

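Exercise

One character of a password: the password consists of letters, digits, and underscores. How would you match it by listing the allowed characters?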
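One possible answer, as a sketch: list every allowed character inside [ ]. The last two checks illustrate why listing is stricter than \w, which in Python 3 also accepts characters such as Chinese ones.

```python
import re

# Every allowed character is listed explicitly inside [ ]
password_char = r"[a-zA-Z0-9_]"

ok = re.match(password_char, "_abc")
bad = re.match(password_char, "#abc")
print(ok.group())  # _
print(bad)         # None

# \w accepts more than this set in Python 3 (for example Chinese characters)
print(re.match(r"\w", "你好") is not None)          # True
print(re.match(password_char, "你好") is not None)  # False
```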

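(2) Matching multiple characters:

Code     Function
*        matches the preceding character 0 or more times, i.e. it is optional
+        matches the preceding character 1 or more times, i.e. at least once
?        matches the preceding character 0 or 1 times, i.e. either once or not at all
{m}      matches the preceding character exactly m times
{m,n}    matches the preceding character m to n times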

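Examples:

import re

# *  matches the preceding character 0 or more times, i.e. it is optional
# Requirement: the first letter is uppercase, followed by lowercase letters that may or may not be present
ret = re.match("[A-Z][a-z]*", "M")
print(ret.group())  # M

ret = re.match("[A-Z][a-z]*", "Aabcdef")
print(ret.group())  # Aabcdef

# +  matches the preceding character 1 or more times, i.e. at least once
# Requirement: the string starts with t and ends with o, with at least one character in between
match_obj = re.match("t.+o", "two")
print(match_obj.group())  # two

match_obj = re.match("t.+o", "twasdfo")
print(match_obj.group())  # twasdfo

# ?  matches the preceding character 0 or 1 times, i.e. either once or not at all
# Requirement: match both "https" and "http", since the s may or may not be present
match_obj = re.match("https?", "http")
print(match_obj.group())  # http

match_obj = re.match("https?", "https")
print(match_obj.group())  # https

# {m}  matches the preceding character exactly m times
ret = re.match("[a-zA-Z0-9_]{6}", "12a3g45678")
print(ret.group())  # 12a3g4

# {m,n}  matches the preceding character m to n times
# Requirement: a password of 8 to 20 characters made of upper- or lowercase letters, digits, and underscores
ret = re.match("[a-zA-Z0-9_]{8,20}", "1ad12f23s34455ff66")
print(ret.group())  # 1ad12f23s34455ff66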
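One caveat with the password example: re.match only anchors the start, so a string longer than 20 characters still "matches" its first 20. A sketch using re.fullmatch (available since Python 3.4), which requires the entire string to fit the pattern:

```python
import re

pattern = r"[a-zA-Z0-9_]{8,20}"
too_long = "a" * 25

# re.match anchors only at the start, so it matches the first 20 characters
partial = re.match(pattern, too_long)
# re.fullmatch succeeds only if the entire string matches the pattern
whole = re.fullmatch(pattern, too_long)
valid = re.fullmatch(pattern, "abc_12345")

print(len(partial.group()))  # 20
print(whole)                 # None
print(valid.group())         # abc_12345
```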

Exercise

How would you use a regular expression to match data such as qq:10567?
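One possible answer, as a sketch. It assumes (not stated in the exercise) that a QQ number has 5 to 11 digits and never starts with 0; adjust the quantifier if your data differs.

```python
import re

# Assumption: a QQ number has 5 to 11 digits and does not start with 0
qq_pattern = r"qq:[1-9]\d{4,10}"

match_obj = re.match(qq_pattern, "qq:10567")
if match_obj:
    print(match_obj.group())  # qq:10567

print(re.match(qq_pattern, "qq:01234"))  # None: leading zero
```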

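(3) Matching the start and end:

Code     Function
^        matches the start of the string
$        matches the end of the string

Examples:

import re

# Requirement: match data that starts with a digit
match_obj = re.match(r"^\d.*", "3hello")
print(match_obj.group())  # 3hello

# Requirement: match data that ends with a digit
match_obj = re.match(r".*\d$", "hello5")
print(match_obj.group())  # hello5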
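^ and $ matter more with search and findall, which, unlike match, do not start at position 0 on their own. A sketch with made-up text, also showing how the re.M flag from the parameter list changes $:

```python
import re

text = "3 apples\nbanana 5"  # made-up two-line sample

# Outside re.match, ^ and $ still anchor to the start and end of the whole string
starts_with_digit = re.search(r"^\d", text).group()
last_words = re.findall(r"\w+$", text)
# With re.M, ^ and $ also match at every line boundary
last_words_per_line = re.findall(r"\w+$", text, re.M)

print(starts_with_digit)    # 3
print(last_words)           # ['5']
print(last_words_per_line)  # ['apples', '5']
```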

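(4) Matching everything except the specified characters

[^chars]: matches any character except the listed ones
Example:

import re

# Requirement: match the first character as long as it is not one of a, e, i, o, u
match_obj = re.match("[^aeiou]", "hello")
print(match_obj.group())  # h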
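A negated class combines naturally with findall and with quantifiers. A short sketch; the comma-separated input is made up:

```python
import re

# [^aeiou] matches any single character that is not a lowercase vowel
consonants = re.findall(r"[^aeiou]", "hello")
# [^,]+ matches a run of characters up to the first comma
first_field = re.match(r"[^,]+", "apple,banana,pear").group()

print(consonants)   # ['h', 'l', 'l']
print(first_field)  # apple
```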

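(5) Matching groups

Code             Function
|                matches either the expression on the left or the one on the right
(ab)             treats the characters in parentheses as one group
\num             refers back to the string matched by group num
(?P<name>)       gives a group a name
(?P=name)        refers back to the string matched by the group named name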

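Examples:

import re

# |  matches either the expression on the left or the one on the right
# Requirement: from the list ["apple", "banana", "orange", "pear"], match apple and pear
# Fruit list
fruit_list = ["apple", "banana", "orange", "pear"]
# Iterate over the data
for value in fruit_list:
    # |  matches either the expression on the left or the one on the right
    match_obj = re.match("apple|pear", value)
    if match_obj:
        print("%s is what I want" % match_obj.group())
    else:
        print("%s is not what I want" % value)

# (ab)  treats the characters in parentheses as one group
# Requirement: match email addresses at 163, 126, qq, and similar providers
# "hello@163.com" is a hypothetical sample address; \. matches a literal dot
match_obj = re.match(r"[a-zA-Z0-9_]{4,20}@(163|126|qq|sina|yahoo)\.com", "hello@163.com")
print(match_obj.group())   # hello@163.com
# group(1) is the first group; groups are numbered from left to right starting at 1
print(match_obj.group(1))  # 163

# \num  refers back to the string matched by group num
# Requirement: match <html><h1>www.baidu.com</h1></html>
match_obj = re.match(r"<([a-zA-Z1-6]+)><([a-zA-Z1-6]+)>.*</\2></\1>", "<html><h1>www.baidu.com</h1></html>")
print(match_obj.group())

# (?P=name)  refers back to the string matched by the group named name
# Requirement: match <html><h1>www.baidu.com</h1></html>
match_obj = re.match("<(?P<name1>[a-zA-Z1-6]+)><(?P<name2>[a-zA-Z1-6]+)>.*</(?P=name2)></(?P=name1)>", "<html><h1>www.baidu.com</h1></html>")
print(match_obj.group())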
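Named groups can also be read back by name. A sketch reusing the HTML pattern; the extra group `body` is a hypothetical addition to capture the content between the tags:

```python
import re

html = "<html><h1>www.baidu.com</h1></html>"
# Same pattern as above, plus a hypothetical extra group named "body"
pattern = r"<(?P<name1>[a-zA-Z1-6]+)><(?P<name2>[a-zA-Z1-6]+)>(?P<body>.*)</(?P=name2)></(?P=name1)>"

match_obj = re.match(pattern, html)
print(match_obj.group("name2"))  # h1
print(match_obj.group("body"))   # www.baidu.com
print(match_obj.groupdict())     # {'name1': 'html', 'name2': 'h1', 'body': 'www.baidu.com'}

# The back-references reject a mismatched closing tag
print(re.match(pattern, "<html><h1>www.baidu.com</h2></html>"))  # None
```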

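III. Advanced Usage of re

1. search

import re

# Requirement: match the number of fruits
# search scans the string but returns only the first match
match_obj = re.search(r"\d+", "There are 20 fruits, 10 of which are apples")
print(match_obj.group())  # 20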
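To see the "only the first match" behavior, compare search with findall on the same sentence; a minimal sketch:

```python
import re

text = "There are 20 fruits, 10 of which are apples"
# search stops at the first match
first_number = re.search(r"\d+", text).group()
# findall returns every match
all_numbers = re.findall(r"\d+", text)

print(first_number)  # 20
print(all_numbers)   # ['20', '10']
```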

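2. findall

import re

# findall: finds every match of the pattern and returns them all in a list
rsf = re.findall(r'\d', 'dhf12343df33d3f')
print(rsf)  # ['1', '2', '3', '4', '3', '3', '3', '3']

# If the pattern has no (), the whole pattern is used to extract matches
# If the pattern has (), only the text matched inside the parentheses is returned; the parts outside only locate the data
# re.DOTALL is the same flag as re.S
rs = re.findall('a.+bc', 'a\nbc', re.DOTALL)
print(rs)  # ['a\nbc']
rs = re.findall('a(.+)bc', 'a\nbc', re.DOTALL)
print(rs)  # ['\n']
rs = re.findall(r'a(.+)b(.\d*)c', 'a\nb654c', re.DOTALL)
print(rs)  # [('\n', '654')]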

3. sub: replace the matched data

import re

# sub: replaces every match with the second argument
res = re.sub(r'\d', '_', 'dhfk32df423j')
print(res)  # dhfk__df___j
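The parameter list notes that repl can also be a function and that count limits replacements. A sketch of both; the function name `double` and the sample text are hypothetical:

```python
import re

def double(match_obj):
    # Hypothetical repl function: called once per match, returns the replacement text
    return str(int(match_obj.group()) * 2)

doubled = re.sub(r"\d+", double, "apples 10, pears 3")
# count=2 limits sub to the first two matches
limited = re.sub(r"\d", "_", "dhfk32df423j", count=2)

print(doubled)  # apples 20, pears 6
print(limited)  # dhfk__df423j
```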

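4. split: split a string at each match and return a list

import re

# Split the string
# maxsplit=1 limits the number of splits; by default the string is split at every match
names = "劉德華,劉亦菲,成龍"
result = re.split(",", names)
print(result)  # ['劉德華', '劉亦菲', '成龍']

result = re.split(",", names, maxsplit=1)
print(result)  # ['劉德華', '劉亦菲,成龍']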
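Where re.split beats str.split is when the separator varies. A sketch, assuming made-up input that mixes commas, semicolons, and extra spaces:

```python
import re

names = "劉德華, 劉亦菲;  成龍"  # made-up input with mixed separators
# Split on a comma or semicolon, swallowing any spaces around it
result = re.split(r"\s*[,;]\s*", names)
print(result)  # ['劉德華', '劉亦菲', '成龍']
```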

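5. compile: compiles a regular expression pattern in advance and returns a pattern object

Purpose: the pattern is parsed and compiled once into a reusable pattern object, which saves work when the same pattern is applied many times. (The re module also caches recently used patterns, so the gain for occasional calls is small.)

import re

# Precompile
regex = re.compile(r'\d+')
# Find
rs = regex.findall('chuang35431233')
print(rs)  # ['35431233']
# Replace
rs = regex.sub('_', 'sdf234hdkfs32')
print(rs)  # sdf_hdkfs_
# Note where the flag goes: it must be passed to compile(), not to the pattern object's methods
regex = re.compile('.', re.S)
rs = regex.findall('a\nb')
print(rs)  # ['a', '\n', 'b']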
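A sketch of where compile pays off, reusing one pattern object in a loop over made-up lines. It also shows why flags belong in compile(): a pattern object's second positional argument is `pos`, the index where the search starts.

```python
import re

digits = re.compile(r"\d+")
lines = ["order 17", "no number here", "room 204, floor 2"]  # made-up data
# One compiled pattern object reused for every line
found = [digits.findall(line) for line in lines]
print(found)  # [['17'], [], ['204', '2']]

any_char = re.compile(".", re.S)
print(any_char.findall("a\nb"))     # ['a', '\n', 'b']
# The second positional argument of a pattern method is pos (start index), not flags
print(any_char.findall("a\nb", 1))  # ['\n', 'b']
```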

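IV. Other Uses of Regular Expressions

1. The r prefix in Python strings compared with its use in regular expressions

import re

# In an ordinary string literal, the r prefix (raw string) turns \ (the escape character) into an ordinary \
print(len('\n'))   # 1
print(len(r'\n'))  # 2
# In a raw-string regex, write exactly as many \ as the string being matched contains
rs = re.findall('a\nb', 'a\nb')
print(rs)   # ['a\nb']
# Without r, 'a\\nb' reaches re as a\nb (a newline escape), so it cannot match a literal backslash
rs1 = re.findall('a\\nb', 'a\\nb')
print(rs1)  # []
rs2 = re.findall(r'a\nb', 'a\nb')
print(rs2)  # ['a\nb']
rs3 = re.findall(r'a\\nb', 'a\\nb')
print(rs3)  # ['a\\nb']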
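In practice the r prefix mostly saves doubling backslashes in patterns such as \d. A sketch; the path string is made up:

```python
import re

# "\\d" (escaped backslash) and r"\d" (raw string) are the same two characters
same = "\\d" == r"\d"
print(same)  # True
print(re.findall("\\d", "a1b2"), re.findall(r"\d", "a1b2"))  # ['1', '2'] ['1', '2']

# To match one literal backslash, the regex itself needs \\, i.e. r"\\" or "\\\\"
path = "C:\\temp"  # made-up path containing a single backslash
print(re.findall(r"\\", path))   # ['\\']
print(re.findall("\\\\", path))  # ['\\']
```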

2. Matching Chinese characters

import re

word = '你好，世界， hello'
rs = re.findall('[\u4e00-\u9fa5]+', word)
print(rs)  # ['你好', '世界']
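For comparison, a sketch of \w on the same text: in Python 3, str patterns are Unicode-aware, so \w also picks up Chinese characters along with ASCII letters, while the explicit range keeps only the Chinese ones. Neither matches the full-width comma.

```python
import re

word = "你好，世界， hello"
# \w on a str pattern is Unicode-aware, so it matches Chinese and ASCII word characters
words = re.findall(r"\w+", word)
# The explicit range \u4e00-\u9fa5 keeps only (common) Chinese characters
chinese = re.findall(r"[\u4e00-\u9fa5]+", word)

print(words)    # ['你好', '世界', 'hello']
print(chinese)  # ['你好', '世界']
```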

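3. Greedy and non-greedy matching

import re

# Greedy: matches as much as possible. Forms: .*, .+
string_a = '<meta http-equiv="X-UA-Compatible">\n\t<meta http-equiv="content-type">'
ret = re.findall("<.*>", string_a, re.S)
print(ret)  # a single match covering the whole string
# Non-greedy: matches as little as possible. Forms: .*?, .+?
string_b = '<meta http-equiv="X-UA-Compatible">\n\t<meta http-equiv="content-type">'
ret = re.findall("<.*?>", string_b, re.S)
print(ret)  # ['<meta http-equiv="X-UA-Compatible">', '<meta http-equiv="content-type">']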
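A common use of non-greedy matching is pulling values out with a group. A sketch on the same string: with re.S, the greedy group runs from the first quote to the last quote in the whole string, while the non-greedy group stops at each first closing quote.

```python
import re

html = '<meta http-equiv="X-UA-Compatible">\n\t<meta http-equiv="content-type">'
# Greedy: (.*) runs to the LAST quote in the string (re.S lets . cross the newline)
greedy = re.findall(r'http-equiv="(.*)"', html, re.S)
# Non-greedy: (.*?) stops at the FIRST closing quote, so each value is captured separately
lazy = re.findall(r'http-equiv="(.*?)"', html, re.S)

print(greedy)  # ['X-UA-Compatible">\n\t<meta http-equiv="content-type']
print(lazy)    # ['X-UA-Compatible', 'content-type']
```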