
POJ 3461 Oulipo (KMP template problem)

Problem link: POJ 3461

Description:

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input:

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

  • One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
  • One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output:
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input:
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output:
1
3
0

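Problem summary:

There are several test cases. In each one, the first line gives a word and the second line gives a text. Output how many times the word occurs in the text; overlapping occurrences all count.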
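As a baseline, the counting rule can be sketched with a brute-force scan that tries every start position. This is only an illustration: `countNaive` is a hypothetical helper name, not part of the original solution, and with |W| up to 10,000 and |T| up to 1,000,000 its worst case of roughly |W|·|T| character comparisons is far too slow to submit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): counts occurrences of
// `word` in `text` by testing every start position. Overlapping occurrences
// are counted because the start position advances by one each time.
int countNaive(const std::string& word, const std::string& text) {
    if (word.empty() || word.size() > text.size()) return 0;
    int count = 0;
    for (std::size_t start = 0; start + word.size() <= text.size(); ++start) {
        // compare() returns 0 when text[start, start + |word|) equals word
        if (text.compare(start, word.size(), word) == 0) ++count;
    }
    return count;
}
```

On the second sample, `countNaive("AZA", "AZAZAZA")` finds matches starting at positions 0, 2 and 4, which is why the expected answer is 3 rather than 2.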

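Approach:

This is fairly plain KMP, with one detail that relies on the next array KMP computes, where next[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. After a complete match of the word, restarting the word from its beginning gets TLE, so instead set the index j = next[j-1]. The reasoning is the usual prefix/suffix argument: the part of the word that is both a prefix and a suffix has already been matched against the text, so it does not need to be matched again.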
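The idea can be sketched in a compact form using `std::string` and `std::vector`. The names `buildFail`, `countKmp` and `fail` are assumptions made for this illustration (`fail` plays the role of the next array); this is a minimal sketch of the technique, not the submitted code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fail[i] is the length of the longest proper prefix of
// word[0..i] that is also a suffix of it (the role of the next array).
std::vector<int> buildFail(const std::string& word) {
    std::vector<int> fail(word.size(), 0);
    int j = 0;  // length of the border currently being extended
    for (std::size_t i = 1; i < word.size(); ++i) {
        while (j > 0 && word[i] != word[j]) j = fail[j - 1];  // fall back
        if (word[i] == word[j]) ++j;
        fail[i] = j;  // store the matched length itself
    }
    return fail;
}

// Hypothetical helper: counts (possibly overlapping) occurrences of word in text.
int countKmp(const std::string& word, const std::string& text) {
    if (word.empty()) return 0;
    const std::vector<int> fail = buildFail(word);
    const int m = static_cast<int>(word.size());
    int count = 0;
    int j = 0;  // number of pattern characters currently matched
    for (char c : text) {
        while (j > 0 && c != word[j]) j = fail[j - 1];
        if (c == word[j]) ++j;
        if (j == m) {
            ++count;
            j = fail[j - 1];  // keep the shared border instead of restarting at 0
        }
    }
    return count;
}
```

For the word AZA the table is {0, 0, 1}, so after each full match the scan continues with j = 1: the trailing 'A' is reused as the first character of the next candidate, and the text index never moves backwards. That keeps the total work linear in |W| + |T|.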

Accepted (AC) code:

#include <cstring>
#include <cstdio>
// No `using namespace std;` here: on C++11 and later a global array named
// `next` would clash with std::next and the reference becomes ambiguous.
char str[1000010];   // the input text
char pattern[10010]; // the input word, i.e. the pattern string
int next[10010];     // next[i]: longest proper prefix of pattern[0..i] that is also its suffix
int n1, n2;          // lengths of str and pattern
// Standard next (prefix) array
void getnext()
{
    std::memset(next, 0, sizeof(next));
    int i, j;
    for (i = 1, j = 0; i < n2; )
    {
        if (pattern[i] == pattern[j]) {
            // Store the current matched length. Using next[i-1]+1 here is
            // wrong once j has fallen back (e.g. for the pattern "AABAAA").
            next[i] = j + 1;
            i++; j++;
        } else {
            if (j != 0)
            {
                j = next[j-1];
            } else next[i++] = 0;
        }
    }
}
// Standard KMP scan
int kmp()
{
    int ans = 0; // number of occurrences found
    int i, j;
    for (i = 0, j = 0; i <= n1; ) // note the bound: this layout needs <= so a match ending at the last character is counted
    {
        if (j == n2)
        {
            j = next[j-1]; // key step: fall back via the next array instead of restarting at 0
            ans++;
            if (i == n1) break;
        }
        if (str[i] == pattern[j])
        {
            i++;
            j++;
        } else {
            if (j != 0) j = next[j-1];
            else i++;
        }
    }
    return ans;
}
int main()
{
    int T;
    std::scanf("%d", &T);
    while (T--)
    {
        std::scanf("%s%s", pattern, str);
        n1 = (int)std::strlen(str);
        n2 = (int)std::strlen(pattern);
        getnext();
        std::printf("%d\n", kmp());
    }
    return 0;
}