
Regular Expressions: Capturing Groups and Backreferences

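I. Capturing Groups

Capturing groups are the grouping mechanism of regular expressions. To repeat a sequence of characters as a unit, you need a group, which is written with parentheses "()". Once defined, the group can be quantified or referenced again later in the pattern.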
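
As a small illustration of repeating a group, the sketch below applies a quantifier to a parenthesized group. The class name QuantifiedGroupDemo, the helper isRepeatedAb, and the pattern "(ab)+" are made up for this example and are not part of the original demos.

```java
import java.util.regex.Pattern;

public class QuantifiedGroupDemo {
    // Hypothetical pattern for illustration: the group (ab) repeated one or more times.
    private static final Pattern REPEATED_AB = Pattern.compile("(ab)+");

    // Returns true when the whole input consists of "ab" repeated at least once.
    static boolean isRepeatedAb(String input) {
        return REPEATED_AB.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The quantifier applies to the whole group, so "ababab" is accepted.
        System.out.println(isRepeatedAb("ababab"));
        // Without the parentheses, "ab+" would repeat only the "b" and accept "abb".
        System.out.println(isRepeatedAb("abb"));
    }
}
```

The parentheses are what make "+" apply to "ab" as a whole rather than to the single character "b".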

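Capturing groups come in two kinds: ordinary (numbered) capturing groups and named capturing groups.

The following simple demos illustrate each of them:

1. Ordinary capturing groups

Reading the regular expression from the left, each opening parenthesis "(" starts a new group, and group numbers start at 1. Group 0 stands for the entire expression.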

public void test8() {
        String DATE_STRING = "2021-07-01";
        String P_COMM = "(\\d{4})-((\\d{2})-(\\d{2}))";
        Pattern pattern = Pattern.compile(P_COMM);
        Matcher matcher = pattern.matcher(DATE_STRING);
        matcher.find(); // required: group() can only be called after a successful match
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }

Output:

matcher.group(0) value:2021-07-01
matcher.group(1) value:2021
matcher.group(2) value:07-01
matcher.group(3) value:07
matcher.group(4) value:01
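
The demo above calls find() before reading any group. A minimal sketch of why this matters, assuming the same date pattern: the helper names yearOf and groupBeforeFindThrows and the class name FindBeforeGroupDemo are hypothetical and only serve this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindBeforeGroupDemo {
    private static final Pattern DATE = Pattern.compile("(\\d{4})-((\\d{2})-(\\d{2}))");

    // Returns group 1 (the year) if the input contains a date, otherwise null.
    static String yearOf(String input) {
        Matcher matcher = DATE.matcher(input);
        // Check the result of find() instead of assuming that a match exists.
        return matcher.find() ? matcher.group(1) : null;
    }

    // Returns true if group() throws IllegalStateException when called before find().
    static boolean groupBeforeFindThrows(String input) {
        Matcher matcher = DATE.matcher(input);
        try {
            matcher.group(0); // no match has been attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2021-07-01"));
        System.out.println(yearOf("no date here"));
        System.out.println(groupBeforeFindThrows("2021-07-01"));
    }
}
```

Checking the boolean returned by find() also avoids the same exception when the input simply does not contain a match.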

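2. Named capturing groups

In a named capturing group, the opening parenthesis is immediately followed by "?<name>" and then the sub-expression: (?<name>Expression). Named groups are still numbered like ordinary groups, so they can be read either by name or by number.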

public void test9() {
        String P_NAMED = "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))";
        String DATE_STRING = "2021-07-01";

        Pattern pattern = Pattern.compile(P_NAMED);
        Matcher matcher = pattern.matcher(DATE_STRING);
        matcher.find();
        System.out.printf("\n===========Access by name=============");
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group('year') value:%s", matcher.group("year"));
        System.out.printf("\nmatcher.group('md') value:%s", matcher.group("md"));
        System.out.printf("\nmatcher.group('month') value:%s", matcher.group("month"));
        System.out.printf("\nmatcher.group('date') value:%s", matcher.group("date"));
        matcher.reset();
        System.out.printf("\n===========Access by number=============");
        matcher.find();
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }

Output:

===========Access by name=============
matcher.group(0) value:2021-07-01
matcher.group('year') value:2021
matcher.group('md') value:07-01
matcher.group('month') value:07
matcher.group('date') value:01
===========Access by number=============
matcher.group(0) value:2021-07-01
matcher.group(1) value:2021
matcher.group(2) value:07-01
matcher.group(3) value:07
matcher.group(4) value:01
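
Group names can also be used in a replacement string. The following is a sketch rather than part of the original demos: it assumes a simplified date pattern without the nested "md" group, and the class name NamedGroupReplaceDemo and the helper toDayFirst are invented for illustration. In Matcher.replaceAll, "${name}" refers to the text captured by the named group.

```java
import java.util.regex.Pattern;

public class NamedGroupReplaceDemo {
    // Simplified variant of the date pattern, with one named group per field.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<date>\\d{2})");

    // Rewrites yyyy-MM-dd dates as dd/MM/yyyy by referring to the groups by name.
    static String toDayFirst(String input) {
        return DATE.matcher(input).replaceAll("${date}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(toDayFirst("2021-07-01"));
    }
}
```

Compared with "$3/$2/$1", the named form keeps working if another group is later added to the pattern.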

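3. Non-capturing groups

Placing "?:" immediately after the opening parenthesis, followed by the sub-expression, forms a non-capturing group: (?:Expression). A non-capturing group still groups its contents but is not assigned a number, so the groups after it are numbered as if it were not there.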

  public void test10() {
        String P_UNCAP = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
        String DATE_STRING = "2021-07-01";

        Pattern pattern = Pattern.compile(P_UNCAP);
        Matcher matcher = pattern.matcher(DATE_STRING);
        matcher.find();
        System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
        System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
        System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
        System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));

        // Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 4
        System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
    }

Output:

matcher.group(0) value:2021-07-01
matcher.group(1) value:07-01
matcher.group(2) value:07
matcher.group(3) value:01
java.lang.IndexOutOfBoundsException: No group 4
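
One way to avoid the IndexOutOfBoundsException is to ask the matcher how many capturing groups the pattern declares. The sketch below compares the two date patterns used so far. The class name GroupCountDemo and the helper countGroups are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Counts the capturing groups declared in a pattern (group 0 is not included).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // All four parenthesized groups capture.
        System.out.println(countGroups("(\\d{4})-((\\d{2})-(\\d{2}))"));
        // The year group is non-capturing, so only three groups are counted.
        System.out.println(countGroups("(?:\\d{4})-((\\d{2})-(\\d{2}))"));
    }
}
```

Because groupCount() depends only on the pattern, it can be checked before calling group(n) with a computed index.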

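II. Backreferences

1. Backreferences depend on groups. A group treats the part enclosed in () as a single unit. Groups are numbered from the outside in and from left to right.

2. A backreference is written \1, \2, and so on.
\1 refers to the text matched by the first group in ().
\2 refers to the text matched by the second group in ().
Example: String regex = "^(\\d)\\1$";
This pattern matches exactly two characters: (\d) matches one digit, and \1 must then match the same text that group 1 captured.
For str = "22", (\d) captures 2, so \1 also requires 2, and "22" matches.
For str = "23", (\d) captures 2, so \1 requires 2, but the next character is 3, and "23" does not match.
The demos below illustrate this: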
    @Test
    public void test1() {
        String reg = "([a-z]{3}[1-9]{3})[a-z]{3}[1-9]{3}";
        String str = "asd123asd123";
        // conventional form, without a backreference: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test2() {
        String reg = "([a-z]{3}[1-9]{3})\\1";
        String str = "asd123asd123";
        // same check using a backreference: true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test3() {
        String str = "1234567123123123";
        // only "123123" can match
        Pattern p = Pattern.compile("(\\d\\d\\d)\\1");
        Matcher m = p.matcher(str);
        // 1
        System.out.println(m.groupCount());
        while (m.find()) {
            String word = m.group();
            // 123123 7 13
            System.out.println(word + " " + m.start() + " " + m.end());
        }
    }

    @Test
    public void test4() {
        String pattern = "\\b(\\w+)\\b[\\w\\W]*\\b\\1\\b";
        Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        String phrase = "unique is not duplicate but unique, Duplicate is duplicate.";
        Matcher m = p.matcher(phrase);
        while (m.find()) {
            String val = m.group();
            System.out.println("Matching subsequence is \"" + val + "\"");
            System.out.println("Duplicate word: " + m.group(1) + "\n");
        }
    }

    @Test
    public void test5() {
        String reg = "(\\w)(\\w)\\2\\1";
        String str = "abba";
        // true
        System.out.println(Pattern.matches(reg, str));
    }

    @Test
    public void test7() {
        String reg = "([a-z]{3})([1-9]{3})\\1\\2";
        String str = "asd123asd123";
        // true
        System.out.println(Pattern.matches(reg, str));

        String reg1 = "([a-z]{3})([1-9]{3})\\2\\1";
        String str1 = "asd123123asd";
        // true
        System.out.println(Pattern.matches(reg1, str1));
    }
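
The "22" versus "23" walkthrough at the start of this section can be checked in the same way. The sketch below also shows the named form of a backreference, \k<name>, which Java supports for named capturing groups. The class name BackreferenceDemo, the helper names, and the group name "digit" are chosen for illustration only.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // Numbered backreference: \1 must repeat whatever group 1 captured.
    private static final Pattern NUMBERED = Pattern.compile("^(\\d)\\1$");

    // Named backreference: \k<digit> refers to the named group "digit".
    private static final Pattern NAMED = Pattern.compile("^(?<digit>\\d)\\k<digit>$");

    // True when the input is one digit followed by the same digit.
    static boolean isDoubledDigit(String input) {
        return NUMBERED.matcher(input).matches();
    }

    // Same check, written with a named group and a named backreference.
    static boolean isDoubledDigitNamed(String input) {
        return NAMED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubledDigit("22"));      // second character repeats the captured 2
        System.out.println(isDoubledDigit("23"));      // 3 differs from the captured 2
        System.out.println(isDoubledDigitNamed("22"));
        System.out.println(isDoubledDigitNamed("23"));
    }
}
```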