2018 ICPC Beijing Regional: Approximate Matching (AC automaton + DP)
String matching, a common problem in DNA sequence analysis and text editing, is to find the occurrences of one certain string (called pattern) in a larger string (called text). In some cases, the pattern is not required to be exactly in the text, and minor differences are acceptable (due to possible typing mistakes). When given a pattern string and a text string, we say pattern P is approximately matched within text S, if there is a substring of S which is at most one letter different from P. Note that the length of this substring and the pattern must be identical. For example, pattern “abb” is approximately matched in text “babc” but not matched in “bbac”.
It is easy to check if a pattern is approximately matched in a text. So your task is to count the number of all text strings of length m in which the given pattern can be approximately matched, and both of the patterns and texts are binary strings in order not to handle big integers.
Input
The first line of input is a single integer T (1 ≤ T ≤ 666), the number of test cases. Each test case begins with a line of two integers n,m (1 ≤ n,m ≤ 40), denoting the length of pattern string and text string. Then a single line of binary string P follows, which denotes the pattern. Note that there will be at most 15 test cases in which n ≥ 16.
Output
For each test case, output a single line with one integer, representing the answer.
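The problem defines a notion of "similar": two strings of equal length are similar if they differ in at most one position. Given a binary pattern of length n, count the binary strings of length m that contain a length-n substring similar to the pattern.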
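For checking small cases by hand, the definition can be turned directly into a brute-force counter. This is only a reference sketch, not the intended solution: the names `approxMatch` and `bruteForceCount` are made up for illustration, and the cap of 20 on m is an arbitrary assumption so that enumerating all 2^m texts stays cheap.

```cpp
#include <cassert>
#include <string>

// True if some substring of s with length |p| differs from p in at most one position.
bool approxMatch(const std::string& p, const std::string& s) {
    if (p.size() > s.size()) return false;
    for (std::size_t start = 0; start + p.size() <= s.size(); ++start) {
        int diff = 0;
        for (std::size_t k = 0; k < p.size(); ++k)
            if (s[start + k] != p[k]) ++diff;
        if (diff <= 1) return true;
    }
    return false;
}

// Counts binary texts of length m in which p is approximately matched,
// by trying all 2^m texts. The limit of 20 is an assumption of this sketch.
long long bruteForceCount(const std::string& p, int m) {
    assert(m >= 0 && m <= 20);
    long long count = 0;
    for (long long mask = 0; mask < (1LL << m); ++mask) {
        std::string text(m, '0');
        for (int b = 0; b < m; ++b)
            if ((mask >> b) & 1) text[b] = '1';
        if (approxMatch(p, text)) ++count;
    }
    return count;
}
```

With the example from the statement, `approxMatch("abb", "babc")` holds and `approxMatch("abb", "bbac")` does not. A counter like this is handy for comparing against the automaton solution on small n and m.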
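Supposedly this is a standard template-style problem, but it was the first time I had seen one. It changed how I think about the AC automaton. The AC automaton is awesome!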
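Because the strings are binary and very short (at most 40 characters), we can brute-force every string that differs from the pattern in exactly one position, insert those strings together with the pattern itself (at most 41 strings in total) into an AC automaton, and then run a DP on the automaton.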
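The enumeration step can be sketched on its own as follows. The helper name `candidates` is hypothetical, and unlike the full solution (which flips the pattern in place with XOR) this version copies the string for clarity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns p itself followed by the |p| strings obtained by flipping exactly one bit of p.
std::vector<std::string> candidates(const std::string& p) {
    std::vector<std::string> out;
    out.push_back(p);
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i] == '0' || p[i] == '1');  // this sketch assumes a binary pattern
        std::string q = p;
        q[i] = (q[i] == '0') ? '1' : '0';
        out.push_back(q);
    }
    return out;
}
```

For the pattern "010" this yields "010", "110", "000" and "011": n + 1 distinct strings, all of length n.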
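This works because of a property of the trie: two inserted strings that differ in any position end at different nodes. Every candidate string therefore gets its own terminal state, and whenever the DP reaches a terminal state, the texts that arrived there can be counted toward the answer.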
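I chose the AC automaton because, once it has been turned into a trie graph, moving forward (following next) and falling back (jumping along fail) are handled in a highly uniform way, which makes the DP very easy to write. A generalized SAM should also support this DP, but it is harder to write.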
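Then comes a neat trick: instead of creating a terminal node for each string, point the last edge of every string at one slot of the dp array that is otherwise never used (node 0 in the code below). During the DP, every text that completes a match is funneled into that slot.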
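The collection-point idea is easier to see on a toy automaton. The sketch below is not the automaton of this problem: it uses a hand-written three-state table for "the text contains 11", chosen only for illustration, with state 0 as the absorbing collection point and state 1 as the root, the same numbering as the solution. The function name `countContaining11` and the bound of 60 on m are assumptions.

```cpp
#include <cassert>
#include <vector>

// Counts binary texts of length m that contain "11", using an absorbing state 0.
long long countContaining11(int m) {
    assert(m >= 0 && m <= 60);  // keeps every count within long long
    const int nxt[3][2] = {
        {0, 0},  // state 0, collection point: both edges loop back, so a match is never lost
        {1, 2},  // state 1, root: '0' stays here, '1' moves to state 2
        {1, 0},  // state 2, last char was '1': a second '1' jumps to the collection point
    };
    std::vector<long long> cur(3, 0), nextRow(3, 0);
    cur[1] = 1;  // the empty text sits at the root
    for (int i = 0; i < m; ++i) {
        nextRow.assign(3, 0);
        for (int j = 0; j < 3; ++j)
            for (int c = 0; c < 2; ++c)
                nextRow[nxt[j][c]] += cur[j];
        cur.swap(nextRow);
    }
    return cur[0];  // everything that ever matched has been collected here
}
```

Because state 0 only has self-loops, a text that reaches it keeps contributing for each of its remaining characters, so no separate summation over terminal states is needed at the end.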
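Finally, the DP is dp[i][j], where i is the length of the text built so far and j is a state of the AC automaton. Start with dp[0][root] = 1 and push this 1 forward. Because of the edges that replace the terminal states, the counts end up at the answer collection point, so the answer is dp[m][0].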
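Written out as a recurrence, the DP looks roughly like this. The symbols are my own notation rather than the original author's: δ(j, c) stands for the `next` table after `getFail` has completed it, and "sink" is node 0.

```latex
\begin{aligned}
dp_0(j) &= \begin{cases} 1 & j = \text{root} \\ 0 & \text{otherwise} \end{cases} \\
dp_{i+1}(k) &= \sum_{\substack{j,\; c \in \{0,1\} \\ \delta(j, c) = k}} dp_i(j), \qquad 0 \le i < m \\
\text{answer} &= dp_m(\text{sink})
\end{aligned}
```

Since every state has exactly two outgoing edges, the values dp_i(j) summed over all states j always equal 2^i, which is a convenient sanity check while debugging.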
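However, this trick can create cycles among the automaton's next edges (node 0 points to itself), and the BFS in getFail would then loop forever, so getFail needs a check: it never enqueues node 0.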
The AC automaton: the automaton that runs even without fail.jpg
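#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const int maxn = 50005;
int t, n, m;     // number of test cases, pattern length, text length
char s[maxn];    // the pattern; build() flips its characters in place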
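struct AC_Automaton {
    int next[maxn][2];   // trie edges; after getFail() the complete transition table
    int fail[maxn];
    ll dp[41][2005];     // dp[i][j]: number of texts of length i that end in state j
    int sz, root;
    // Allocates a fresh node with no outgoing edges.
    int newNode() {
        for(int i = 0; i < 2; i++) {
            next[sz][i] = -1;
        }
        fail[sz] = -1;
        return sz++;
    }
    void init() {
        sz = 1;                        // node 0 is reserved as the answer collection point
        memset(dp, 0, sizeof(dp));
        root = newNode();              // the root is node 1
        next[0][0] = next[0][1] = 0;   // node 0 is absorbing: both edges loop back to it
    }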
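    // Inserts the current contents of s into the trie.
    void add() {
        int p = root, c;
        for(int i = 0, len = strlen(s); i < len; i++) {
            c = s[i] - '0';
            if(i == len - 1) {
                // The last character leads to the collection point instead of a new terminal node.
                next[p][c] = 0;
                return;
            }
            if(next[p][c] == -1) {
                next[p][c] = newNode();
            }
            p = next[p][c];
        }
    }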
    // BFS over the trie: computes fail links and fills in missing edges (trie graph).
    void getFail() {
        queue<int> q;
        fail[root] = root;
        for(int i = 0; i < 2; i++) {
            if(~next[root][i]) {   // the edge exists (is not -1)
                fail[next[root][i]] = root;
                q.push(next[root][i]);
            } else {
                next[root][i] = root;
            }
        }
        while(!q.empty()) {
            int p = q.front();
            q.pop();
            for(int i = 0; i < 2; i++) {
                if(~next[p][i]) {
                    fail[next[p][i]] = next[fail[p]][i];
                    // Node 0 has self-loops; enqueueing it would make the BFS run forever.
                    if(next[p][i]) {
                        q.push(next[p][i]);
                    }
                } else {
                    next[p][i] = next[fail[p]][i];
                }
            }
        }
    }
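    // Inserts the pattern and every string that differs from it in exactly one position.
    void build() {
        init();
        add();
        char xorer = '0' ^ '1';   // XOR with this value swaps '0' and '1'
        for(int i = 0; s[i]; i++) {
            s[i] ^= xorer;        // flip position i
            add();
            s[i] ^= xorer;        // restore it
        }
        getFail();
    }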
    void solve() {
        build();
        dp[0][root] = 1;   // the empty text sits at the root
        for(int i = 0; i < m; i++) {
            for(int j = 0; j < sz; j++) {
                dp[i + 1][next[j][0]] += dp[i][j];
                dp[i + 1][next[j][1]] += dp[i][j];
            }
        }
        // Every text that matched at some point has been collected in node 0.
        printf("%lld\n", dp[m][0]);
    }
} ac;
int main() {
    scanf("%d", &t);
    while(t--) {
        scanf("%d%d%s", &n, &m, s);
        ac.solve();
    }
    return 0;
}