RE2 in Practice: A C++ Regular Expression Library

Introduction to RE2

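RE2 is an efficient, principled regular expression library written in C++ by Rob Pike and Russ Cox of Google. Both are also leading figures behind the Go language, and Go's regexp package is a Go implementation of RE2.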

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines such as those used in PCRE, Perl, and Python. RE2 supports Linux and most Unix platforms, but not Windows (you can hack it yourself if necessary).

RE2's Features

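Backtracking engines are typically full of features and convenient syntactic sugar, but they can be forced into taking exponential time on even small inputs. RE2 uses automata theory to guarantee that a regular expression search runs in time linear in the size of the input. RE2 implements memory limits, so searches can be constrained to a fixed amount of memory. RE2 is engineered to use a small, fixed C++ stack footprint no matter what inputs or regular expressions it must process. This makes RE2 useful in multithreaded environments where thread stacks cannot grow arbitrarily large.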

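When the input is large, RE2 is often much faster than backtracking engines, because automata theory lets it apply optimizations that the other engines cannot.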

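Unlike most automata-based engines, RE2 implements almost all the common Perl and PCRE features and syntactic sugar. It finds the leftmost-first match, the same match that Perl would find, and it can return submatch information. The most notable exception is that RE2 drops support for backreferences¹ and generalized zero-width assertions, because they cannot be implemented efficiently.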

For users who prefer a simpler syntax, RE2 has a POSIX mode. This mode accepts only the POSIX egrep operators and implements leftmost-longest overall matching.

¹ Technical note: there's a difference between submatches and backreferences. Submatches let you find out what certain subexpressions matched after the match is over, so that you can find out, after matching dogcat against (cat|dog)(cat|dog), that \1 is dog and \2 is cat. Backreferences let you use those subexpressions during the match, so that (cat|dog)\1 matches catcat and dogdog but not catdog or dogcat.

RE2 supports submatch extraction, but not backreferences.

If you absolutely need backreferences and generalized assertions, which RE2 does not support, take a look at irregexp, Google Chrome's regular expression engine.

Getting Hands-On with RE2

Installation

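You can download a release tarball, unpack it, and install from it. Here is another way, which builds from the source repository. The Mercurial repository below was hosted on Google Code, which has since shut down. RE2 is now developed on GitHub at https://github.com/google/re2.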

You need Mercurial SCM and a C++ compiler (g++ or a compatible clone).

Download the code, then build, test, and install it:


    hg clone http://re2.googlecode.com/hg re2
    cd re2
    make test
    sudo make install
    make testinstall

On BSD systems, use gmake instead of make.

Using the RE2 Library

To build a C++ application with RE2, include the header re2/re2.h in your code. When linking, add -lre2, plus -lpthread in multithreaded environments.

Syntax

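In POSIX mode, RE2 accepts standard POSIX (egrep) regular expression syntax. In Perl mode, RE2 accepts most Perl operators. The only exceptions are operators that require backtracking (and potentially exponential running time) to implement. These include backreferences (submatching is still supported) and generalized assertions. RE2 defaults to Perl mode.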

High-Level C++ Interface

There are two basic operations:

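RE2::FullMatch: requires the regexp to match the entire input text.
RE2::PartialMatch: looks for a match of the regexp anywhere in the input text. It returns the leftmost-longest match in POSIX mode, and in Perl mode the same match that Perl would have chosen.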

For example:

vi re2_high_interface_test.cc


#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    assert(RE2::FullMatch("hello", "h.*o"));
    assert(!RE2::FullMatch("hello", "e"));

    assert(RE2::PartialMatch("hello", "h.*o"));
    assert(RE2::PartialMatch("hello", "e"));

    std::cout << "Ok" << std::endl;
    return 0;
}

Compile the program:

 g++ -o re2_high_interface_test re2_high_interface_test.cc -lre2

Run re2_high_interface_test. If everything works, it prints Ok.

Submatch Extraction

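Both matching functions accept additional arguments in which to store submatches. Each argument can be a string, an integer type, or a StringPiece. A StringPiece is a pointer into the original input text together with a byte count. It behaves like a string but has no storage of its own. As with any pointer, be careful when you use a StringPiece: do not use it after the original text has been deleted or has gone out of scope.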

Example:

vi re2_submatch_ex_test.cc


#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    int i;
    std::string s;
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s, &i));
    assert(s == "ruby");
    assert(i == 1234);

    // Fails: "ruby" cannot be parsed as an integer.
    assert(!RE2::FullMatch("ruby", "(.+)", &i));

    // Success; does not extract the number.
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s));

    // Success; skips NULL argument.
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", (void*)NULL, &i));

    // Fails: integer overflow keeps value from being stored in i.
    assert(!RE2::FullMatch("ruby:123456789123", "(\\w+):(\\d+)", &s, &i));

    std::cout << "Ok" << std::endl;
    return 0;
}

g++ -o re2_submatch_ex_test re2_submatch_ex_test.cc -lre2

Precompiled Regular Expressions

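The examples above compile the regular expression anew on every call. Instead, you can compile the regular expression once into an RE2 object and reuse that object on each call.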

Example:

vi re2_prec_re_test.cc


#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    int i;
    std::string s;
    RE2 re("(\\w+):(\\d+)");
    assert(re.ok());  // compiled; if not, see re.error();

    assert(RE2::FullMatch("ruby:1234", re, &s, &i));
    assert(RE2::FullMatch("ruby:1234", re, &s));
    assert(RE2::FullMatch("ruby:1234", re, (void*)NULL, &i));
    assert(!RE2::FullMatch("ruby:123456789123", re, &s, &i));

    std::cout << "Ok" << std::endl;
    return 0;
}

g++ -o re2_prec_re_test re2_prec_re_test.cc -lre2

Options

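The RE2 constructor takes an optional second argument that changes RE2's default options. For example, the predefined Quiet option suppresses the error message when the regular expression fails to parse: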

vi re2_options_test.cc


#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    RE2 re("(ab", RE2::Quiet);  // don't write to stderr for parser failure
    assert(!re.ok());  // can check re.error() for details

    std::cout << "Ok" << std::endl;
    return 0;
}

Compile the program:

g++ -o re2_options_test re2_options_test.cc -lre2

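Other useful predefined options are Latin1 (disables UTF-8) and POSIX (uses POSIX syntax and leftmost-longest matching).

You can also define your own RE2::Options object and configure it. All the options are documented in re2/re2.h.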

Unicode Normalization

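RE2 operates on Unicode code points and makes no attempt at normalization. For example, the regular expression /ü/ (U+00FC, u with diaeresis) does not match the input "ü" written as U+0075 U+0308 (u followed by a combining diaeresis). Normalization is a long, involved topic. If you need such matches, the simplest solution is to normalize both the regular expressions and the input in a processing step before they reach RE2. For more details on the topic, see http://www.unicode.org/reports/tr15/.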

Additional Tips and Tricks

For advanced usage, such as constructing your own argument lists, using RE2 as a lexer, or parsing hexadecimal, octal, and C-radix numbers, see re2/re2.h.

Backtracking vs. Non-Backtracking

The material below is based on the slides of the talk "sregex: matching Perl 5 regexes on data streams".

What backtracking means

Implementing backtracking

Rob Pike's algorithm

Thompson's construction algorithm

Wrappers for RE2

Syntax Supported by RE2

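The list below shows the regular expression syntax accepted by RE2, along with syntax accepted by PCRE, Perl, and Vim. Syntax that RE2 does not support is marked (NOT SUPPORTED). The tags PCRE, PERL, and VIM indicate which engine an unsupported form comes from.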

Single characters:
`.`	any character, including newline (s=true)
`[xyz]`	character class
`[^xyz]`	negated character class
`\d`	Perl character class
`\D`	negated Perl character class
`[:alpha:]`	ASCII character class
`[:^alpha:]`	negated ASCII character class
`\pN`	Unicode character class (one-letter name)
`\p{Greek}`	Unicode character class
`\PN`	negated Unicode character class (one-letter name)
`\P{Greek}`	negated Unicode character class
Composites:
`xy`	`x` followed by `y`
`x|y`	`x` or `y` (prefer `x`)
Repetitions:
`x*`	zero or more `x`, prefer more
`x+`	one or more `x`, prefer more
`x?`	zero or one `x`, prefer one
`x{n,m}`	`n` or `n`+1 or ... or `m` `x`, prefer more
`x{n,}`	`n` or more `x`, prefer more
`x{n}`	exactly `n` `x`
`x*?`	zero or more `x`, prefer fewer
`x+?`	one or more `x`, prefer fewer
`x??`	zero or one `x`, prefer zero
`x{n,m}?`	`n` or `n`+1 or ... or `m` `x`, prefer fewer
`x{n,}?`	`n` or more `x`, prefer fewer
`x{n}?`	exactly `n` `x`
`x\{}`	(≡ `x*`) (NOT SUPPORTED) VIM
`x\{-}`	(≡ `x*?`) (NOT SUPPORTED) VIM
`x\{-n}`	(≡ `x{n}?`) (NOT SUPPORTED) VIM
`x\=`	(≡ `x?`) (NOT SUPPORTED) VIM
Possessive repetitions:
`x*+`	zero or more `x`, possessive (NOT SUPPORTED)
`x++`	one or more `x`, possessive (NOT SUPPORTED)
`x?+`	zero or one `x`, possessive (NOT SUPPORTED)
`x{n,m}+`	`n` or ... or `m` `x`, possessive (NOT SUPPORTED)
`x{n,}+`	`n` or more `x`, possessive (NOT SUPPORTED)
`x{n}+`	exactly `n` `x`, possessive (NOT SUPPORTED)
Grouping:
`(re)`	numbered capturing group
`(?P<name>re)`	named & numbered capturing group
`(?<name>re)`	named & numbered capturing group (NOT SUPPORTED)
`(?'name're)`	named & numbered capturing group (NOT SUPPORTED)
`(?:re)`	non-capturing group
`(?flags)`	set flags within current group; non-capturing
`(?flags:re)`	set flags during re; non-capturing
`(?#text)`	comment (NOT SUPPORTED)
`(?|x|y|z)`	branch numbering reset (NOT SUPPORTED)
`(?>re)`	possessive match of `re` (NOT SUPPORTED)
`re\@>`	possessive match of `re` (NOT SUPPORTED) VIM
`%(re)`	non-capturing group (NOT SUPPORTED) VIM
Flags:
`i`	case-insensitive (default false)
`m`	multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
`s`	let `.` match `\n` (default false)
`U`	ungreedy: swap meaning of `x*` and `x*?`, `x+` and `x+?`, etc (default false)
Flag syntax is `xyz` (set) or `-xyz` (clear) or `xy-z` (set `xy`, clear `z`).
Empty strings:
`^`	at beginning of text or line (`m`=true)
`$`	at end of text (like `\z` not `\Z`) or line (`m`=true)
`\A`	at beginning of text
`\b`	at word boundary (`\w` on one side and `\W`, `\A`, or `\z` on the other)
`\B`	not a word boundary
`\G`	at beginning of subtext being searched (NOT SUPPORTED) PCRE
`\G`	at end of last match (NOT SUPPORTED) PERL
`\Z`	at end of text, or before newline at end of text (NOT SUPPORTED)
`\z`	at end of text
`(?=re)`	before text matching `re` (NOT SUPPORTED)
`(?!re)`	before text not matching `re` (NOT SUPPORTED)
`(?<=re)`	after text matching `re` (NOT SUPPORTED)
`(?<!re)`	after text not matching `re` (NOT SUPPORTED)
`re\&`	before text matching `re` (NOT SUPPORTED) VIM
`re\@=`	before text matching `re` (NOT SUPPORTED) VIM
`re\@!`	before text not matching `re` (NOT SUPPORTED) VIM
`re\@<=`	after text matching `re` (NOT SUPPORTED) VIM
`re\@<!`	after text not matching `re` (NOT SUPPORTED) VIM
`\zs`	sets start of match (= \K) (NOT SUPPORTED) VIM
`\ze`	sets end of match (NOT SUPPORTED) VIM
`\%^`	beginning of file (NOT SUPPORTED) VIM
`\%$`	end of file (NOT SUPPORTED) VIM
`\%V`	on screen (NOT SUPPORTED) VIM
`\%#`	cursor position (NOT SUPPORTED) VIM
`\%'m`	mark `m` position (NOT SUPPORTED) VIM
`\%23l`	in line 23 (NOT SUPPORTED) VIM
`\%23c`	in column 23 (NOT SUPPORTED) VIM
`\%23v`	in virtual column 23 (NOT SUPPORTED) VIM
Escape sequences:
`\a`	bell (≡ `\007`)
`\f`	form feed (≡ `\014`)
`\t`	horizontal tab (≡ `\011`)
`\n`	newline (≡ `\012`)
`\r`	carriage return (≡ `\015`)
`\v`	vertical tab character (≡ `\013`)
`\*`	literal `*`, for any punctuation character `*`
`\123`	octal character code (up to three digits)
`\x7F`	hex character code (exactly two digits)
`\x{10FFFF}`	hex character code
`\C`	match a single byte even in UTF-8 mode
`\Q...\E`	literal text `...` even if `...` has punctuation
`\1`	backreference (NOT SUPPORTED)
`\b`	backspace (NOT SUPPORTED) (use `\010`)
`\cK`	control char ^K (NOT SUPPORTED) (use `\001` etc)
`\e`	escape (NOT SUPPORTED) (use `\033`)
`\g1`	backreference (NOT SUPPORTED)
`\g{1}`	backreference (NOT SUPPORTED)
`\g{+1}`	backreference (NOT SUPPORTED)
`\g{-1}`	backreference (NOT SUPPORTED)
`\g{name}`	named backreference (NOT SUPPORTED)
`\g<name>`	subroutine call (NOT SUPPORTED)
`\g'name'`	subroutine call (NOT SUPPORTED)
`\k<name>`	named backreference (NOT SUPPORTED)
`\k'name'`	named backreference (NOT SUPPORTED)
`\lX`	lowercase `X` (NOT SUPPORTED)
`\ux`	uppercase `x` (NOT SUPPORTED)
`\L...\E`	lowercase text `...` (NOT SUPPORTED)
`\K`	reset beginning of `$0` (NOT SUPPORTED)
`\N{name}`	named Unicode character (NOT SUPPORTED)
`\R`	line break (NOT SUPPORTED)
`\U...\E`	upper case text `...` (NOT SUPPORTED)
`\X`	extended Unicode sequence (NOT SUPPORTED)
`\%d123`	decimal character 123 (NOT SUPPORTED) VIM
`\%xFF`	hex character FF (NOT SUPPORTED) VIM
`\%o123`	octal character 123 (NOT SUPPORTED) VIM
`\%u1234`	Unicode character 0x1234 (NOT SUPPORTED) VIM
`\%U12345678`	Unicode character 0x12345678 (NOT SUPPORTED) VIM
Character class elements:
`x`	single character
`A-Z`	character range (inclusive)
`\d`	Perl character class
`[:foo:]`	ASCII character class `foo`
`\p{Foo}`	Unicode character class `Foo`
`\pF`	Unicode character class `F` (one-letter name)
Named character classes as character class elements:
`[\d]`	digits (≡ `\d`)
`[^\d]`	not digits (≡ `\D`)
`[\D]`	not digits (≡ `\D`)
`[^\D]`	not not digits (≡ `\d`)
`[[:name:]]`	named ASCII class inside character class (≡ `[:name:]`)
`[^[:name:]]`	named ASCII class inside negated character class (≡ `[:^name:]`)
`[\p{Name}]`	named Unicode property inside character class (≡ `\p{Name}`)
`[^\p{Name}]`	named Unicode property inside negated character class (≡ `\P{Name}`)
Perl character classes:
`\d`	digits (≡ `[0-9]`)
`\D`	not digits (≡ `[^0-9]`)
`\s`	whitespace (≡ `[\t\n\f\r ]`)
`\S`	not whitespace (≡ `[^\t\n\f\r ]`)
`\w`	word characters (≡ `[0-9A-Za-z_]`)
`\W`	not word characters (≡ `[^0-9A-Za-z_]`)
`\h`	horizontal space (NOT SUPPORTED)
`\H`	not horizontal space (NOT SUPPORTED)
`\v`	vertical space (NOT SUPPORTED)
`\V`	not vertical space (NOT SUPPORTED)
ASCII character classes:
`[:alnum:]`	alphanumeric (≡ `[0-9A-Za-z]`)
`[:alpha:]`	alphabetic (≡ `[A-Za-z]`)
`[:ascii:]`	ASCII (≡ `[\x00-\x7F]`)
`[:blank:]`	blank (≡ `[\t ]`)
`[:cntrl:]`	control (≡ `[\x00-\x1F\x7F]`)
`[:digit:]`	digits (≡ `[0-9]`)
`[:graph:]`	graphical (≡ `[!-~]` ≡ `` [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] ``)
`[:lower:]`	lower case (≡ `[a-z]`)
`[:print:]`	printable (≡ `[ -~]` ≡ `[ [:graph:]]`)
`[:punct:]`	punctuation (≡ `` [!-/:-@[-`{-~] ``)
`[:space:]`	whitespace (≡ `[\t\n\v\f\r ]`)
`[:upper:]`	upper case (≡ `[A-Z]`)
`[:word:]`	word characters (≡ `[0-9A-Za-z_]`)
`[:xdigit:]`	hex digit (≡ `[0-9A-Fa-f]`)
Unicode character class names--general category:
`C`	other
`Cc`	control
`Cf`	format
`Cn`	unassigned code points (NOT SUPPORTED)
`Co`	private use
`Cs`	surrogate
`L`	letter
`LC`	cased letter (NOT SUPPORTED)
`L&`	cased letter (NOT SUPPORTED)
`Ll`	lowercase letter
`Lm`	modifier letter
`Lo`	other letter
`Lt`	titlecase letter
`Lu`	uppercase letter
`M`	mark
`Mc`	spacing mark
`Me`	enclosing mark
`Mn`	non-spacing mark
`N`	number
`Nd`	decimal number
`Nl`	letter number
`No`	other number
`P`	punctuation
`Pc`	connector punctuation
`Pd`	dash punctuation
`Pe`	close punctuation
`Pf`	final punctuation
`Pi`	initial punctuation
`Po`	other punctuation
`Ps`	open punctuation
`S`	symbol
`Sc`	currency symbol
`Sk`	modifier symbol
`Sm`	math symbol
`So`	other symbol
`Z`	separator
`Zl`	line separator
`Zp`	paragraph separator
`Zs`	space separator
Unicode character class names--scripts:
`Arabic`	Arabic
`Armenian`	Armenian
`Balinese`	Balinese
`Bengali`	Bengali
`Bopomofo`	Bopomofo
`Braille`	Braille
`Buginese`	Buginese
`Buhid`	Buhid
`Canadian_Aboriginal`	Canadian Aboriginal
`Carian`	Carian
`Cham`	Cham
`Cherokee`	Cherokee
`Common`	characters not specific to one script
`Coptic`	Coptic
`Cuneiform`	Cuneiform
`Cypriot`	Cypriot
`Cyrillic`	Cyrillic
`Deseret`	Deseret
`Devanagari`	Devanagari
`Ethiopic`	Ethiopic
`Georgian`	Georgian
`Glagolitic`	Glagolitic
`Gothic`	Gothic
`Greek`	Greek
`Gujarati`	Gujarati
`Gurmukhi`	Gurmukhi
`Han`	Han
`Hangul`	Hangul
`Hanunoo`	Hanunoo
`Hebrew`	Hebrew
`Hiragana`	Hiragana
`Inherited`	inherit script from previous character
`Kannada`	Kannada
`Katakana`	Katakana
`Kayah_Li`	Kayah Li
`Kharoshthi`	Kharoshthi
`Khmer`	Khmer
`Lao`	Lao
`Latin`	Latin
`Lepcha`	Lepcha
`Limbu`	Limbu
`Linear_B`	Linear B
`Lycian`	Lycian
`Lydian`	Lydian
`Malayalam`	Malayalam
`Mongolian`	Mongolian
`Myanmar`	Myanmar
`New_Tai_Lue`	New Tai Lue (aka Simplified Tai Lue)
`Nko`	Nko
`Ogham`	Ogham
`Ol_Chiki`	Ol Chiki
`Old_Italic`	Old Italic
`Old_Persian`	Old Persian
`Oriya`	Oriya
`Osmanya`	Osmanya
`Phags_Pa`	'Phags Pa
`Phoenician`	Phoenician
`Rejang`	Rejang
`Runic`	Runic
`Saurashtra`	Saurashtra
`Shavian`	Shavian
`Sinhala`	Sinhala
`Sundanese`	Sundanese
`Syloti_Nagri`	Syloti Nagri
`Syriac`	Syriac
`Tagalog`	Tagalog
`Tagbanwa`	Tagbanwa
`Tai_Le`	Tai Le
`Tamil`	Tamil
`Telugu`	Telugu
`Thaana`	Thaana
`Thai`	Thai
`Tibetan`	Tibetan
`Tifinagh`	Tifinagh
`Ugaritic`	Ugaritic
`Vai`	Vai
`Yi`	Yi
Vim character classes:
`\i`	identifier character (NOT SUPPORTED) VIM
`\I`	`\i` except digits (NOT SUPPORTED) VIM