
4821 String (hash + map for deduplication)

String

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 4430    Accepted Submission(s): 1341

Problem Description

Given a string S and two integers L and M, we consider a substring of S “recoverable” if and only if:
(i) it is of length M*L;
(ii) it can be constructed by concatenating M pairwise “diversified” substrings of S, each of length L. Two strings are considered “diversified” if they differ in at least one position.

Two substrings of S are considered “different” if they are cut from different parts of S. For example, the string "aa" has 3 different substrings: "aa", "a" and "a". Your task is to calculate the number of different “recoverable” substrings of S.

Input

The input contains multiple test cases and continues until end of file. The first line of each test case has two space-separated integers M and L. The second line of each test case has a string S, which consists only of lowercase letters. The length of S is at most 10^5, and 1 ≤ M * L ≤ the length of S.

Output

For each test case, output the answer in a single line.

Sample Input

3 3
abcabcbcaabc

Sample Output

2


Problem summary: given a string S, count the substrings of S that are formed by concatenating M pairwise-distinct blocks, each of length L.

------------------------------------------------------------------------------------------------------------------------------

First, let's recall how binary numbers work.

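To convert any binary number to decimal, we weight each digit by a power of 2 (taking the binary number 1101101 as an example):

$$1101101_{(2)} = 1\cdot2^6 + 1\cdot2^5 + 0\cdot2^4 + 1\cdot2^3 + 1\cdot2^2 + 0\cdot2^1 + 1\cdot2^0 = 109_{(10)}$$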
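Reading the digits from left to right, the same value can be accumulated with Horner's rule (multiply by 2, then add the next digit), which is exactly the shape the hash recurrence takes. A minimal C++ sketch; `binaryToDecimal` is a hypothetical helper, not part of the original solution:

```cpp
#include <cassert>
#include <string>

// Horner's rule: value = value * 2 + digit, reading bits left to right.
// The prefix-hash recurrence has the same shape with base x instead of 2.
unsigned long long binaryToDecimal(const std::string& bits) {
    unsigned long long value = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');  // input must be a binary string
        value = value * 2 + static_cast<unsigned long long>(c - '0');
    }
    return value;
}
```

For "1101101" this accumulates 1, 3, 6, 13, 27, 54, 109.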

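Hashing uses the same principle. We compute a hash for every prefix (suffixes would work too; I am used to 1-based indexing, so I prefer prefixes): Hash[i] = Hash[i - 1] * x + s[i], for 1 ≤ i ≤ n, with Hash[0] = 0.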
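A minimal sketch of this recurrence, assuming the same choices as the solution code (base x = 31 and character code s[i] - 'a'); `prefixHashes` is a hypothetical helper name. Because 'a' maps to 0, strings of different lengths such as "b" and "ab" get the same value. That is harmless here only because every comparison in this problem is between substrings of the same length L.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

// 1-based prefix hashes: H[0] = 0, H[i] = H[i-1] * x + code(s[i]).
// Assumptions (matching the solution code): code(c) = c - 'a', x = 31.
std::vector<ull> prefixHashes(const std::string& s, ull x = 31) {
    std::vector<ull> H(s.size() + 1, 0);
    for (std::size_t i = 1; i <= s.size(); ++i) {
        char c = s[i - 1];                 // s[i] in 1-based terms
        assert(c >= 'a' && c <= 'z');      // lowercase input only
        H[i] = H[i - 1] * x + static_cast<ull>(c - 'a');
    }
    return H;
}
```

For "abc" this yields H = {0, 0, 1, 33}: 0, then 0·31 + 1, then 1·31 + 2.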

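In general, unrolling the recurrence gives:

$$\mathrm{Hash}[i] = s[1]\,x^{i-1} + s[2]\,x^{i-2} + \cdots + s[i]\,x^{0} = \sum_{k=1}^{i} s[k]\,x^{i-k}$$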

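For the substring from l to r, the hash value is then:

$$\mathrm{Hash}(l, r) = \mathrm{Hash}[r] - \mathrm{Hash}[l-1]\cdot x^{r-l+1} = \sum_{k=l}^{r} s[k]\,x^{r-k}$$

Multiplying Hash[l - 1] by x^{r-l+1} raises each of its terms s[k]x^{l-1-k} to s[k]x^{r-k}, which exactly matches the terms for k < l inside Hash[r], so the subtraction leaves only the contribution of s[l..r].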
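Precomputing the powers of x alongside the prefix hashes makes each substring hash an O(1) lookup. A sketch under the same assumptions (x = 31, codes s[i] - 'a', 1-based indices); the `PrefixHash` struct and its `get` method are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

// O(1) substring hashes with 1-based indices.
// Assumptions: base x = 31, code(c) = c - 'a', arithmetic modulo 2^64.
struct PrefixHash {
    std::vector<ull> H;  // H[i] = hash of s[1..i]
    std::vector<ull> P;  // P[i] = x^i

    explicit PrefixHash(const std::string& s, ull x = 31)
        : H(s.size() + 1, 0), P(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            H[i] = H[i - 1] * x + static_cast<ull>(s[i - 1] - 'a');
            P[i] = P[i - 1] * x;
        }
    }

    // Hash(l, r) = H[r] - H[l-1] * x^(r-l+1)
    ull get(int l, int r) const {
        assert(1 <= l && l <= r && r < static_cast<int>(H.size()));
        return H[r] - H[l - 1] * P[r - l + 1];
    }
};
```

For "abcabc", `get(1, 3)` and `get(4, 6)` both hash "abc" and are equal, while `get(2, 4)` ("bca") differs.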

But what if n is large? Won't the values overflow?

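So we store hash values in unsigned long long. When a value overflows, it wraps around automatically, which amounts to reducing it modulo 2^64. This can give two different strings the same hash value, but for ordinary data the probability is extremely low (unless you are very unlucky). It is not a guarantee, though: strings built from the Thue–Morse sequence are known to make polynomial hashes modulo 2^64 collide for any base.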

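Therefore, we can test whether two strings are equal by comparing their hash values: equal strings always have equal hashes, and unequal strings collide only with small probability.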
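Unsigned overflow in C++ is well defined and wraps modulo 2^64, so the subtraction in the substring formula stays consistent even after the prefix values have wrapped many times. The sketch below (hypothetical helpers `directHash` and `rangeHash`, same assumed base 31 and codes c - 'a') checks this by hashing a 300-character substring both ways. `rangeHash` rebuilds the prefix arrays on every call purely for illustration. Agreement of the two values shows that the modular arithmetic is consistent; it says nothing about collisions between different strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hash a string from scratch with the same recurrence (wraps modulo 2^64).
ull directHash(const std::string& t, ull x = 31) {
    ull h = 0;
    for (char c : t) h = h * x + static_cast<ull>(c - 'a');
    return h;
}

// Hash of s[l..r] (1-based) via prefix hashes. Rebuilds the prefix arrays
// on every call, which is fine for a demonstration but not for real use.
ull rangeHash(const std::string& s, int l, int r, ull x = 31) {
    assert(1 <= l && l <= r && r <= static_cast<int>(s.size()));
    std::vector<ull> H(s.size() + 1, 0), P(s.size() + 1, 1);
    for (std::size_t i = 1; i <= s.size(); ++i) {
        H[i] = H[i - 1] * x + static_cast<ull>(s[i - 1] - 'a');
        P[i] = P[i - 1] * x;
    }
    return H[r] - H[l - 1] * P[r - l + 1];
}
```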

-----------------------------------------------------------------------------------------------------------------------------

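Enumerating the start position one character at a time and checking all M*L characters of each window costs roughly (len - ML) * (ML), which will certainly exceed the time limit. Instead, we use a sliding-window idea that moves in steps of L characters: collect M blocks of length L, check whether they are pairwise distinct, then drop the first block of L characters from the front, append the next L characters at the end, and check again. We only need to start sliding from the first L positions, because a window starting at position i + L is already reached by the slide that began at position i. Each of the L starting offsets slides about len / L times, and each step costs O(log M) map operations, so the total is O(len log M) on top of the O(len) hash precomputation.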
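To see why L starting offsets are enough, the sketch below (hypothetical `windowStartVisits`) replays the loop structure of the solution and counts how often each window start 1..len - M*L + 1 is visited. Every start is visited exactly once: offset i reaches i, i + L, i + 2L, and so on.

```cpp
#include <cassert>
#include <vector>

// Replays the loop structure of the solution: offsets i = 1..L, then window
// starts i, i + L, i + 2L, ... while [start, start + M*L - 1] fits in 1..len.
// Returns visits[start] for start = 1..len - M*L + 1 (index 0 unused).
std::vector<int> windowStartVisits(int len, int M, int L) {
    assert(M >= 1 && L >= 1);
    int lastStart = len - M * L + 1;       // last start whose window fits
    std::vector<int> visits(lastStart > 0 ? lastStart + 1 : 1, 0);
    for (int i = 1; i <= L && i <= lastStart; ++i) {
        for (int start = i; start <= lastStart; start += L) {
            ++visits[start];
        }
    }
    return visits;
}
```

For the sample (len = 12, M = L = 3), offsets 1, 2 and 3 visit the starts {1, 4}, {2} and {3}.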

#include <bits/stdc++.h>
using namespace std;
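typedef unsigned long long ull;   // arithmetic wraps modulo 2^64
const int maxn = 1e5 + 5;
const int seed = 31;              // base x of the polynomial hash
ull _base[maxn],_hash[maxn];      // _base[i] = seed^i, _hash[i] = hash of s[1..i]
map<ull,int> mp;                  // block hash -> occurrences in the current window
int M,L;
char s[maxn];
void init()
{
    _base[0] = 1;
    // fill _base[0..maxn-1]; the array has exactly maxn entries
    for(int i = 1; i < maxn; i++) {
        _base[i] = _base[i - 1] * seed;
    }
}
// Hash of s[l..r] (1-based): _hash[r] - _hash[l-1] * seed^(r-l+1)
ull str_hash(int l,int r)
{
    return _hash[r] - _hash[l - 1] * _base[r - l + 1];
}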
int main(void)
{

    init();
    while(scanf("%d %d",&M,&L) != EOF) {
        scanf("%s",s + 1);
        int len = strlen(s + 1);
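        // Prefix hashes of s[1..len]; 'a' maps to 0, which is safe because
        // only blocks of the same length L are ever compared.
        _hash[0] = 0;
        for(int i = 1; i <= len; i++) {
            _hash[i] = _hash[i - 1] * seed + (s[i] - 'a');
        }
        int ans = 0;
        // Only the first L offsets are needed: a window starting at i + L is
        // reached when the slide from offset i moves forward by one block.
        // The first window [i, i + M*L - 1] must fit inside [1, len].
        for(int i = 1; i <= L && i + M * L - 1 <= len; i++) {
            mp.clear();
            // First window: M consecutive blocks of length L.
            for(int j = i; j < i + M * L; j += L) {
                mp[str_hash(j,j + L - 1)]++;
            }
            if((int)mp.size() == M) {   // all M blocks pairwise distinct
                ans++;
            }
            // Slide one block at a time: j is the start of the entering block.
            for(int j = i + M * L; j <= len - L + 1; j += L) {
                ull out = str_hash(j - M * L,j - M * L + L - 1);   // leaving block
                if(--mp[out] == 0) {
                    mp.erase(out);
                }
                mp[str_hash(j,j + L - 1)]++;
                if((int)mp.size() == M) {
                    ans++;
                }
            }
        }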
        printf("%d\n",ans);
    }
    return 0;
}
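One way to gain confidence in the hashed version is to compare it against a direct brute force on small random inputs. The sketch below is a hypothetical test aid, not part of the original post: `countBrute` compares the M blocks as real strings with `std::set`, and `countHashed` restates the sliding-window solution with 0-based indices and the same assumed base 31. For blocks of length up to 12, base-31 values stay below 2^64, so equal-length blocks cannot collide there; any disagreement on such inputs points to an indexing bug rather than a hash collision.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef unsigned long long ull;

// Brute-force reference: for each window start, compare the M blocks as
// real strings. Roughly (len - M*L + 1) * M*L work; small inputs only.
int countBrute(int M, int L, const std::string& s) {
    assert(M >= 1 && L >= 1);
    int len = static_cast<int>(s.size()), ans = 0;
    for (int start = 0; start + M * L <= len; ++start) {
        std::set<std::string> blocks;
        for (int k = 0; k < M; ++k) blocks.insert(s.substr(start + k * L, L));
        if (static_cast<int>(blocks.size()) == M) ++ans;
    }
    return ans;
}

// Hash + sliding-window count (0-based), same idea as the solution code.
int countHashed(int M, int L, const std::string& s, ull x = 31) {
    assert(M >= 1 && L >= 1);
    int len = static_cast<int>(s.size()), ans = 0;
    std::vector<ull> H(len + 1, 0), P(len + 1, 1);
    for (int i = 0; i < len; ++i) {
        H[i + 1] = H[i] * x + static_cast<ull>(s[i] - 'a');
        P[i + 1] = P[i] * x;
    }
    auto block = [&](int a) { return H[a + L] - H[a] * P[L]; };  // s[a, a+L)
    for (int i = 0; i < L && i + M * L <= len; ++i) {
        std::map<ull, int> cnt;
        for (int k = 0; k < M; ++k) ++cnt[block(i + k * L)];
        if (static_cast<int>(cnt.size()) == M) ++ans;
        // j is the start of the entering block; block j - M*L leaves.
        for (int j = i + M * L; j + L <= len; j += L) {
            ull out = block(j - M * L);
            if (--cnt[out] == 0) cnt.erase(out);
            ++cnt[block(j)];
            if (static_cast<int>(cnt.size()) == M) ++ans;
        }
    }
    return ans;
}
```

On the sample, both functions return 2; on "abcdefghi" with M = L = 3 (a string exactly M*L long) both return 1.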