Regex Notes 6: The Boundary Matchers ^, $, \A, \Z, \z

Regex :

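This post introduces the boundary matchers ^ and $ in regular expressions and compares them with \A, \Z and \z.
All regular expressions in this post were tested in Java.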

State :

The following definitions are taken from the Java 7 documentation:

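^ and $ mark the beginning and the end of a line; \A and \z mark the beginning and the end of the input; \Z also marks the end of the input, except that the input may or may not end with a final line terminator (\n, \r, \r\n, \u0085, \u2028, \u2029).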
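To make the difference between \Z and \z concrete, here is a minimal sketch that appends each of the terminators listed above to a short input and tries both anchors. The class name FinalTerminatorDemo and its helper methods are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class FinalTerminatorDemo {
    // The line terminators listed in the definitions above.
    static final String[] TERMINATORS = {"\n", "\r", "\r\n", "\u0085", "\u2028", "\u2029"};

    // True if "Apple" is found at the end of the input, ignoring one final terminator (\Z).
    static boolean endsWithBigZ(String input) {
        return Pattern.compile("Apple\\Z").matcher(input).find();
    }

    // True only if "Apple" is found at the very end of the input (\z).
    static boolean endsWithSmallZ(String input) {
        return Pattern.compile("Apple\\z").matcher(input).find();
    }

    public static void main(String[] args) {
        for (String terminator : TERMINATORS) {
            String input = "Apple" + terminator;
            // Each iteration prints "\Z: true, \z: false".
            System.out.println("\\Z: " + endsWithBigZ(input) + ", \\z: " + endsWithSmallZ(input));
        }
    }
}
```

Without a trailing terminator, both helpers return true for the input "Apple".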

Line & Input (the difference between a line and the input):

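A line is a fragment of the string that ends at a line terminator; the input is the entire string. For example, "Ggicci is a good guy.\nGgicci's real name is OOXX." as a whole is one input, and "Ggicci is a good guy." is one line within it.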
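As a rough illustration of the distinction, the sketch below takes the single input from the example and pulls its lines out one by one, using ^ and $ with the MULTILINE flag. The class LineVsInputDemo and the method lines are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class LineVsInputDemo {
    // Collects every line of the input; with MULTILINE, ^ and $ match at line boundaries.
    static List<String> lines(String input) {
        Matcher matcher = Pattern.compile("^.*$", Pattern.MULTILINE).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Ggicci is a good guy.\nGgicci's real name is OOXX.";
        // One input, two lines.
        for (String line : lines(input)) {
            System.out.println("line: " + line);
        }
    }
}
```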

Single-line Mode & Multi-line Mode:

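When writing matching code in Java, the flags parameter of the Pattern class's static method static Pattern compile(String regex, int flags) accepts, among others, the two flags DOTALL and MULTILINE. DOTALL makes the expression . match any character, including line terminators; this is what is usually called single-line mode. DOTALL by itself does not change ^ and $: whenever MULTILINE is not set, ^ matches only at the beginning of the input and $ only at its end (in Java, $ also matches just before a final line terminator). MULTILINE makes ^ and $ recognize line terminators, so they match at the beginning and end of every line; this is multi-line mode. In the rest of this post, "single-line mode" simply means that MULTILINE is not set.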
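The two flags are independent of each other, which is easy to check. The sketch below is only an illustration (FlagDemo and found are hypothetical names): DOTALL changes what . matches, MULTILINE changes where ^ and $ match, and the two can be combined with |.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class FlagDemo {
    // True if the regex, compiled with the given flags, is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Google\nApple";

        // DOTALL only affects the dot.
        System.out.println(found("Google.Apple", 0, input));              // false: . does not match \n
        System.out.println(found("Google.Apple", Pattern.DOTALL, input)); // true

        // MULTILINE only affects ^ and $.
        System.out.println(found("^Apple", Pattern.DOTALL, input));       // false: ^ is still start of input
        System.out.println(found("^Apple", Pattern.MULTILINE, input));    // true: ^ matches after \n

        // Both flags can be combined.
        System.out.println(found("^Google$.^Apple$", Pattern.DOTALL | Pattern.MULTILINE, input)); // true
    }
}
```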

Sample_1 : how ^ and $ match differently in single-line and multi-line mode

Single-line mode:
Input: "Google\nApple" Regex: ^Google\nApple$ Matches: "Google\nApple"
Input: "Google\nApple" Regex: ^Google$ Matches: nothing
Multi-line mode:
Input: "Google\nApple" Regex: ^Google\nApple$ Matches: "Google\nApple"
Input: "Google\nApple" Regex: ^Google$ Matches: "Google"

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Sample1 {
        public static void main(String[] args) {
            String source = "Google\nApple";
            // Enable exactly one of the four patterns below.
            Pattern pattern = Pattern.compile("^Google\nApple$"); //--> "Google\nApple"
            //Pattern pattern = Pattern.compile("^Google$"); //--> no match, nothing is printed
            //Pattern pattern = Pattern.compile("^Google\nApple$", Pattern.MULTILINE); //--> "Google\nApple"
            //Pattern pattern = Pattern.compile("^Google$", Pattern.MULTILINE); //--> "Google"
            Matcher matcher = pattern.matcher(source);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }

Sample_2 : the difference between \z and \Z

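First, \z and \Z refer to the whole input in both single-line and multi-line mode. \z is the easier one to understand: \A and \z always match the input in its entirety. \Z, by contrast, matches at the end of the input whether or not a final line terminator is present.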

Input: "Google\nApple" Regex: \AGoogle\nApple\z Matches: "Google\nApple"
Input: "Google\nApple" Regex: \AGoogle\nApple\Z Matches: "Google\nApple"
Input: "Google\nApple\n" Regex: \AGoogle\nApple\z Matches: nothing
Input: "Google\nApple\n" Regex: \AGoogle\nApple\Z Matches: "Google\nApple" --> \Z allows one line terminator at the end of the input; here it is \n, but \r or \r\n would work just as well

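    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Sample2 {
        public static void main(String[] args) {
            String source = "Google\nApple";
            //String source = "Google\nApple\n";
            Pattern pattern = Pattern.compile("\\AGoogle\nApple\\Z"); // tolerates a final terminator
            //Pattern pattern = Pattern.compile("\\AGoogle\nApple\\z"); // requires the very end of input
            Matcher matcher = pattern.matcher(source);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }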

Conclusion :

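\A and \z match the whole input, exactly and completely, in both single-line and multi-line mode
\A and \Z match the whole input, with or without a final line terminator, in both single-line and multi-line mode
^ and $ match the whole input in single-line mode, where ^ behaves like \A and, in Java, $ behaves like \Z (it tolerates a final line terminator, unlike \z); in multi-line mode they match individual lines and recognize line terminators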
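These conclusions can be checked side by side on an input that ends with a line terminator. The sketch below is only an illustration (EndAnchorDemo and found are hypothetical names); it shows how $ behaves in Java when MULTILINE is not set, and that \Z and \z ignore MULTILINE.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class EndAnchorDemo {
    // True if the regex, compiled with the given flags, is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Google\nApple\n"; // note the final line terminator

        // Without MULTILINE: $ and \Z tolerate the final \n, \z does not.
        System.out.println(found("Apple$", 0, input));    // true
        System.out.println(found("Apple\\Z", 0, input));  // true
        System.out.println(found("Apple\\z", 0, input));  // false

        // Only MULTILINE lets $ match at the end of an inner line.
        System.out.println(found("Google$", 0, input));                   // false
        System.out.println(found("Google$", Pattern.MULTILINE, input));   // true

        // \Z and \z are not affected by MULTILINE.
        System.out.println(found("Google\\Z", Pattern.MULTILINE, input)); // false
        System.out.println(found("Apple\\z", Pattern.MULTILINE, input));  // false
    }
}
```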
