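Some Simple Regular-Expression Matching in Python

阿新 • Published: 2018-04-02

Topics: metacharacters, greedy matching, non-greedy matching, grouping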

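Using metacharacters

re.findall(regex, string)

Purpose: finds every non-overlapping substring of string that the regular expression regex matches and returns the matches in a list. The sessions below are IPython transcripts and assume that import re has already been run.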

* Literal strings

Pattern: abc

Rule: ordinary characters match themselves

Example: abc

In [3]: re.findall('abc','abcdeabc')

Out[3]: ['abc', 'abc']

* Alternation: matching one of several patterns

Metacharacter: re1|re2

Rule: matches whatever the regular expression re1 matches, or whatever re2 matches

Example: ab|bc --> ab, bc

In [5]: re.findall('ab|de','abcdeabc')

Out[5]: ['ab', 'de', 'ab']
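One detail that is easy to miss is that | has the lowest precedence, so it splits the whole pattern unless a group limits its scope. The sketch below illustrates this; the sample text and variable names are made up for the example, and (?:...) is a non-capturing group, which this article does not otherwise cover.

```python
import re

text = "gray grey griy"  # illustrative sample text

# Without a group, | separates the entire pattern: 'gra' OR 'ey'.
unscoped = re.findall(r"gra|ey", text)

# A non-capturing group (?:...) limits | to the single letter.
scoped = re.findall(r"gr(?:a|e)y", text)

print(unscoped)  # ['gra', 'ey']
print(scoped)    # ['gray', 'grey']
```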

* The dot "."

Metacharacter: .

Rule: matches any single character except a newline

Example: f.o --> foo, fao, f@o

In [6]: re.findall('f.o','foo,f@oabfabo')

Out[6]: ['foo', 'f@o']
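By default the dot does not match a newline; the re.DOTALL flag changes that. A minimal sketch, with an invented sample string:

```python
import re

text = "f\no foo"  # the first 'f' and 'o' are separated by a newline

# Default: '.' refuses to match '\n', so only 'foo' is found.
default = re.findall(r"f.o", text)

# With re.DOTALL, '.' matches any character, newline included.
dotall = re.findall(r"f.o", text, re.DOTALL)

print(default)  # ['foo']
print(dotall)   # ['f\no', 'foo']
```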

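* Matching the start of a string

Metacharacter: ^

Rule: matches the position at the start of the string

Example: ^From matches From only when it appears at the very beginning of the string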

In [9]: re.findall('^From','From China')

Out[9]: ['From']

In [10]: re.findall('^From','I come From China')

Out[10]: []
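When the input has several lines, ^ still matches only at the very start unless the re.MULTILINE flag is given, in which case it also matches right after each newline. A short sketch with made-up text:

```python
import re

text = "From China\nFrom Peru\nI come From Chile"  # illustrative input

# Default: ^ matches only at the start of the whole string.
start_only = re.findall(r"^From", text)

# re.MULTILINE: ^ also matches at the start of every line.
each_line = re.findall(r"^From", text, re.MULTILINE)

print(start_only)  # ['From']
print(each_line)   # ['From', 'From']
```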

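* Matching the end of a string

Metacharacter: $

Rule: matches the position at the end of the string; append $ to a pattern to require that the string end with it

Example: py$ --> matches py only at the end of a string, as in test.py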

In [17]: re.findall('py$','test.py')

Out[17]: ['py']

In [18]: re.findall('py$','python')

Out[18]: []

* Matching zero or more repetitions

Metacharacter: *

Rule: matches the preceding character or subexpression zero or more times

Example: ab* --> a, ab, abbbbbbbb

In [23]: re.findall('.*','askjdfh89w4234')

Out[23]: ['askjdfh89w4234', '']

In [24]: re.findall('.*','askjdfh89w4234sdfhhg')

Out[24]: ['askjdfh89w4234sdfhhg', '']

In [25]: re.findall('ab*','a')

Out[25]: ['a']

In [26]: re.findall('ab*','abbbb')

Out[26]: ['abbbb']
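The trailing '' in the .* results above may look odd. Because * accepts zero repetitions, the pattern can also match the empty string at the very end of the input. The sketch below uses re.finditer to show where each match sits; the sample text is invented.

```python
import re

text = "abc"

# Each match as (start, end): the whole text, then an empty match at the end.
spans = [m.span() for m in re.finditer(r".*", text)]
print(spans)  # [(0, 3), (3, 3)]

# '+' needs at least one character, so no empty match appears.
non_empty = re.findall(r".+", text)
print(non_empty)  # ['abc']
```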

* Matching one or more repetitions

Metacharacter: +

Rule: matches the preceding character or subexpression one or more times

Example: ab+ --> ab, abbbbbbbb

In [28]: re.findall('ab+','abbbb')

Out[28]: ['abbbb']

In [29]: re.findall('ab+','a')

Out[29]: []

* Matching zero or one occurrence

Metacharacter: ?

Rule: matches the preceding character or subexpression zero times or once

Example: ab? --> a or ab

In [31]: re.findall('ab?','a')

Out[31]: ['a']

In [32]: re.findall('ab?','ab')

Out[32]: ['ab']

* Matching an exact number of repetitions

Metacharacter: {N}, where N is a number

Rule: matches the preceding character or subexpression exactly N times

Example: ab{3} --> abbb

In [34]: re.findall('ab{3}','abbbbbb')

Out[34]: ['abbb']

In [35]: re.findall('ab{3}','abb')

Out[35]: []

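* Matching a range of repetitions

Metacharacter: {M,N}, where M and N are numbers

Rule: matches the preceding character or subexpression between M and N times

Example: ab{3,8} --> abbb, abbbbbbbb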

In [36]: re.findall('ab{3,8}','abbb')

Out[36]: ['abbb']

In [37]: re.findall('ab{3,8}','abbbbbbbbbbb')

Out[37]: ['abbbbbbbb']
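Either bound of {M,N} can be left out: {M,} means "at least M" and {,N} means "from 0 up to N". A small sketch with invented sample strings:

```python
import re

# {2,}: at least two 'b' characters after the 'a'.
at_least_two = re.findall(r"ab{2,}", "ab abb abbbbb")

# {,2}: zero, one, or two 'b' characters after the 'a'.
at_most_two = re.findall(r"ab{,2}", "a ab abbbbb")

print(at_least_two)  # ['abb', 'abbbbb']
print(at_most_two)   # ['a', 'ab', 'abb']
```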

* Character sets

Metacharacter: [abcd]

Rule: matches any single character listed inside the brackets

Example: b[abcd]t --> bat, bbt, bct, bdt

In [40]: re.findall('b[abc123]t','bat,b1tba3t')

Out[40]: ['bat', 'b1t']

In [41]: re.findall('[ab][cd]','acadbcbd')

Out[41]: ['ac', 'ad', 'bc', 'bd']

* Character ranges

Metacharacter: [a-zA-Z0-9] [a-z] [0-9] [a-zA-Z] [3-8] [b-x]

Rule: matches any single character that falls within one of the ranges inside the brackets

Example: [a-zA-Z0-9]+ matches any non-empty string made up of letters and digits

In [43]: re.findall('[a-zA-Z0-9]+','safd1324')

Out[43]: ['safd1324']

In [44]: re.findall('[a-zA-Z0-9]+','adf$&^%123')

Out[44]: ['adf', '123']
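Inside brackets a hyphen between two characters defines a range, while a hyphen placed first or last is taken literally. The sketch below contrasts the two; the sample text is made up.

```python
import re

text = "a-c b"

# 'a-c' is a range: matches a, b, or c (but not the hyphen).
in_range = re.findall(r"[a-c]", text)

# A trailing hyphen is literal: matches a, c, or '-'.
literal_hyphen = re.findall(r"[ac-]", text)

print(in_range)        # ['a', 'c', 'b']
print(literal_hyphen)  # ['a', '-', 'c']
```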

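* Negated character sets

Metacharacter: [^...], where ... is anything allowed in the two forms above

Rule: matches any single character that is not in the bracketed set

Example: [^aeiou] matches any single character other than a, e, i, o, u

[^a-z] matches any single character that is not a lowercase letter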

In [46]: re.findall('[^a-z]','abc1j2^&d')

Out[46]: ['1', '2', '^', '&']

In [47]: re.findall('[^aeiou]','hello world')

Out[47]: ['h', 'l', 'l', ' ', 'w', 'r', 'l', 'd']

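* Matching digit and non-digit characters

Metacharacter: \d (like [0-9]) \D (like [^0-9])

Rule: \d matches any single digit character (for str patterns in Python 3 this also covers other Unicode decimal digits unless re.ASCII is set)

\D matches any single non-digit character

Example: \d{3} --> '123'

In [49]: re.findall(r'\d{3}','hello 1234')

Out[49]: ['123']

In [50]: re.findall(r'\D{3}','hello 1234')

Out[50]: ['hel', 'lo ']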
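For str patterns in Python 3, \d is slightly broader than [0-9]: it also accepts decimal digits from other scripts. The sketch below uses full-width digits (written as \uff14\uff12 escapes) as an invented example and shows that re.ASCII restricts \d to 0-9.

```python
import re

# "42" in ASCII digits, followed by "42" in full-width digits.
text = "room 42 and \uff14\uff12"

# Default (Unicode) matching: both digit runs are found.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII: only the plain 0-9 digits count.
ascii_digits = re.findall(r"\d+", text, re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['42']
```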

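* Matching word and non-word characters

Metacharacter: \w (like [a-zA-Z0-9_]) \W (like [^a-zA-Z0-9_])

Rule: \w matches any single letter, digit, or underscore

\W matches any single character that is not a letter, digit, or underscore

Example: \w{3} --> 'a23'

In [51]: re.findall(r'[A-Z]\w*','Hello World')

Out[51]: ['Hello', 'World']

In [52]: re.findall(r'\w+-\d+','xiaoming-56')

Out[52]: ['xiaoming-56']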
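The underscore is the main practical difference between \w and [a-zA-Z0-9]. A quick sketch on an invented identifier:

```python
import re

text = "user_name-42"

# \w includes the underscore, so 'user_name' stays in one piece.
with_w = re.findall(r"\w+", text)

# The explicit class has no underscore, so the name is split.
with_class = re.findall(r"[a-zA-Z0-9]+", text)

print(with_w)      # ['user_name', '42']
print(with_class)  # ['user', 'name', '42']
```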

* Matching whitespace and non-whitespace characters

Metacharacter: \s (space \t \n \r \f \v) \S

Rule: \s matches any single whitespace character

\S matches any single non-whitespace character

Example: hello\s+world --> hello world

In [58]: re.findall(r'hello\s+world','hello world')

Out[58]: ['hello world']

In [60]: re.findall(r'\S*','helloworld&* ask')

Out[60]: ['helloworld&*', '', 'ask', '']

In [61]: re.findall(r'\s','a b c\n')

Out[61]: [' ', ' ', '\n']
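As a check on what \s covers, the sketch below tests the six ASCII whitespace characters and the NUL character, and then shows a common use: splitting on runs of whitespace. The sample strings are invented.

```python
import re

# Space, tab, newline, carriage return, form feed, vertical tab.
ascii_whitespace = " \t\n\r\f\v"
matched = re.findall(r"\s", ascii_whitespace)
print(len(matched))  # 6

# The NUL character '\0' is not whitespace.
nul_matches = re.findall(r"\s", "\0")
print(nul_matches)  # []

# Splitting on one or more whitespace characters.
fields = re.split(r"\s+", "a  b\tc\nd")
print(fields)  # ['a', 'b', 'c', 'd']
```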

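* Matching the start and end of a string

Metacharacter: \A (like ^) \Z (like $)

Rule: \A matches the position at the start of the string

\Z matches the position at the very end of the string (unlike $, it does not match before a trailing newline)

Example: \Aabc\Z or ^abc$ --> abc

In [70]: re.findall(r'\Aabc\Z','abcabc')

Out[70]: []

In [66]: re.findall(r'\Aabc\Z','abc')

Out[66]: ['abc']

In [68]: re.findall(r'efg\Z','hi,abcdefg')

Out[68]: ['efg']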
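\A and \Z are close to ^ and $ but not identical. The sketch below shows two differences, using invented inputs: $ tolerates one trailing newline while \Z does not, and re.MULTILINE changes ^ but leaves \A alone.

```python
import re

text = "test.py\n"  # note the trailing newline

# $ matches just before a final newline; \Z only at the true end.
dollar = re.findall(r"py$", text)
strict_end = re.findall(r"py\Z", text)
print(dollar)      # ['py']
print(strict_end)  # []

lines = "abc\nabc"

# Under re.MULTILINE, ^ matches at each line start; \A does not.
caret_multiline = re.findall(r"^abc", lines, re.MULTILINE)
a_multiline = re.findall(r"\Aabc", lines, re.MULTILINE)
print(caret_multiline)  # ['abc', 'abc']
print(a_multiline)      # ['abc']
```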

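* Matching word boundaries and non-boundaries

Metacharacter: \b \B

Rule: \b matches the empty position between a word character (\w) and a non-word character, or at the start or end of the string next to a word character; \B matches any position that is not such a boundary

A run of consecutive word characters counts as one word; everything else is not part of a word

Example: "This is a %test%"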

In [74]: re.findall(r'\btest\b','This is a %test%')

Out[74]: ['test']

In [75]: re.findall(r'\bThis\b','This is a %test%')

Out[75]: ['This']

In [76]: re.findall(r'\bis\b','This is a %test%')

Out[76]: ['is']

In [77]: re.findall(r'\Bis\b','This is a %test%')

Out[77]: ['is']

In [78]: re.findall(r'is\b','This is a %test%')

Out[78]: ['is', 'is']
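The r prefix matters for \b in particular: in an ordinary string literal, "\b" is the backspace character, so the pattern never reaches the regex engine as a word boundary. The sketch below shows this, and also that an underscore counts as a word character; the second sample text is invented.

```python
import re

text = "This is a %test%"

# Without r, '\b' is a backspace character, so nothing matches.
without_raw = re.findall("\btest\b", text)

# With r, the regex engine sees \b and treats it as a word boundary.
with_raw = re.findall(r"\btest\b", text)

print(without_raw)  # []
print(with_raw)     # ['test']

# '_' and digits are word characters, so 'cat_1' has no boundary after 'cat'.
whole_words = re.findall(r"\bcat\b", "cat cat_1 concat")
print(whole_words)  # ['cat']
```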

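Summary of metacharacters

Literal characters: match themselves

Matching a single character: . [] \d \D \w \W \s \S

Matching repetitions: * + ? {}

Matching the start, the end, and boundaries: ^ $ \A \Z \b \B

Other: | [^...]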

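Raw strings and escaping

r"hello world" --> a raw string

What makes a raw string different: Python does not process backslash escape sequences in it

"hello \n world" --> \n is a single newline character

r"hello \n world" --> \n is two characters, a backslash followed by n

When to add r

A raw string stops Python from interpreting backslash escapes in the string literal, so it is best to add r whenever the regular expression itself contains "\"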
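The difference between the two literals can be checked directly by comparing lengths. A minimal sketch:

```python
import re

plain = "hello \n world"   # \n is one newline character
raw = r"hello \n world"    # \n is a backslash followed by 'n'

print(len(plain))  # 13
print(len(raw))    # 14

# A raw string is the same as doubling every backslash by hand.
same_text = (raw == "hello \\n world")
print(same_text)  # True

# The regex engine understands \n itself, so the raw pattern still finds the newline.
newlines = re.findall(r"\n", plain)
print(newlines)  # ['\n']
```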

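Escaping in regular expressions

To match a character that has a special meaning in regular expressions, escape it with a backslash in the pattern. For example, to match a literal * in a string, the regular expression is "\*"

The special characters are:

\ . ^ $ * + ? | () [] {}

Quotation marks (" and ') are not special to the regular-expression engine; they need escaping only where they would otherwise end the Python string literal

Matching * in a string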

In [86]: re.findall(r'\*','* is not \\, \\ is not ?')

Out[86]: ['*']

In [87]: re.findall('\\*','* is not \\, \\ is not ?')

Out[87]: ['*']

Matching "\" in a string

In [89]: re.findall('\\\\','* is not \\, \\ is not ?')

Out[89]: ['\\', '\\']

In [90]: re.findall(r'\\','* is not \\, \\ is not ?')

Out[90]: ['\\', '\\']
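When the text to match comes from elsewhere and may contain special characters, the standard library's re.escape can add the backslashes automatically. A small sketch; the search text and needle are invented for the example.

```python
import re

haystack = "is 1+1=2? yes"
needle = "1+1=2?"  # contains the special characters + and ?

# Used directly, + and ? act as quantifiers, so the literal text is not found.
unescaped = re.findall(needle, haystack)

# re.escape backslash-escapes the special characters first.
escaped = re.findall(re.escape(needle), haystack)

print(unescaped)  # []
print(escaped)    # ['1+1=2?']
```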

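Greedy and non-greedy matching

Greedy mode: by default, regular expressions are greedy. The quantifiers * + ? {M,N} match as much as they can while still letting the overall pattern match.

e.g.

ab* can match a, ab, abbb, ... so when many b characters follow, it consumes as many of them as possible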

In [96]: re.findall(r'ab*','abbbbbbb')

Out[96]: ['abbbbbbb']

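Non-greedy mode: matches as little as possible while still satisfying the regular expression

To switch a quantifier from greedy to non-greedy, append "?" to it

That is: *? +? ?? {M,N}?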

In [100]: re.findall(r'ab*?','abbbbbbb')

Out[100]: ['a']

In [101]: re.findall(r'ab+?','abbbbbbb')

Out[101]: ['ab']

In [102]: re.findall(r'ab??','abbbbbbb')

Out[102]: ['a']

In [103]: re.findall(r'ab{2,4}?','abbbbbbb')

Out[103]: ['abb']
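The difference is most visible when a delimiter appears more than once. The sketch below extracts tags from an invented HTML-like string: the greedy pattern runs to the last >, while the non-greedy one stops at the first.

```python
import re

text = "<b>bold</b> and <i>italic</i>"  # illustrative input

# Greedy: .* runs on to the last '>' in the text.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```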

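Grouping in regular expressions

((ab)*(cd))

Regular expression: (ab)*cd

1. A regular expression can be divided into groups, marked by parentheses (). Each pair of parentheses is a subgroup; each subgroup is part of the overall regular expression and is also a small regular expression in its own right

2. When there are several subgroups, they are numbered first, second, ... by the position of their opening parenthesis, counted from left to right. An outer group therefore comes before the groups nested inside it, and groups at the same level are counted from left to right

3. Grouping changes what * + ? {} repeat: the whole group is treated as one unit and repeated as a unit

4. When a repeated subgroup matches several parts of the target string, only one of them (the last) is kept. Note also that when the pattern contains groups, re.findall returns the groups' contents rather than the whole match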

In [113]: re.findall(r'(ab)+cd','ababcdef')

Out[113]: ['ab']

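5. Every group can be given a name, and groups can then be told apart by name.

Format: (?P<word>hello)

This gives the subgroup (hello) the name "word"

A named subgroup is referred to by name: (?P=word) is a backreference that matches the same text that the group named word matched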

In [123]: re.findall(r'((?P<word>hello)\s+(?P=word))','hello hello')

Out[123]: [('hello hello', 'hello')]
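The grouping rules above can be checked with re.search, which returns a match object exposing every group. The sketch below is illustrative only: the variable names are invented, and (?:...) is a non-capturing group, which the article does not otherwise cover.

```python
import re

# Rule 2: groups are numbered by their opening parenthesis, left to right.
m = re.search(r"((ab)*(cd))", "ababcd")
numbered = m.groups()
print(numbered)  # ('ababcd', 'ab', 'cd')

# Rule 4: a repeated group keeps only its last repetition,
# while group(0) is always the whole match.
m2 = re.search(r"(ab)+cd", "ababcdef")
whole, last_repeat = m2.group(0), m2.group(1)
print(whole, last_repeat)  # ababcd ab

# A non-capturing group lets findall return the whole match instead.
whole_matches = re.findall(r"(?:ab)+cd", "ababcdef")
print(whole_matches)  # ['ababcd']

# Rule 5: named groups can be read back by name.
m3 = re.search(r"(?P<word>hello)\s+(?P=word)", "hello hello")
named = m3.group("word")
by_name = m3.groupdict()
print(named, by_name)  # hello {'word': 'hello'}

# (?P=word) repeats the matched text, not the pattern:
# 'world' fits \w+ but is not the same text as 'hello'.
no_match = re.search(r"(?P<word>\w+)\s+(?P=word)", "hello world")
print(no_match)  # None
```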
