String Matching: Rabin–Karp algorithm (Part 2)

阿新 • Published: 2018-12-17

The previous post (https://blog.csdn.net/To_be_to_thought/article/details/84890018) only introduced the naive Rabin–Karp algorithm; this one focuses on optimizing it.

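Let the pattern P have length L and the text S have length n; the goal is to find the position of P in a single pass over S. As noted in the previous post, computing hash(P) costs O(L), and hashing every length-L substring of S from scratch costs O(nL) in total. Whenever a substring's hash equals hash(P), that substring is compared with P character by character, which costs O(L) per comparison. Overall, the naive method takes O(nL) time.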
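The naive scheme described above can be sketched roughly as follows. This is only an illustration: the class name `NaiveHashSearch`, the helper names, and the constants `BASE = 256` and `MODULUS = 101` are assumptions chosen for the example, not something fixed by the algorithm.

```java
final class NaiveHashSearch {
    static final int BASE = 256;    // assumed radix: one char is one "digit"
    static final int MODULUS = 101; // assumed small prime modulus

    // Hash of s[from .. from+len-1], recomputed from scratch in O(len).
    static int hash(String s, int from, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = (h * BASE + s.charAt(from + i)) % MODULUS;
        }
        return h;
    }

    // Returns the first index of p in s, or -1 if p does not occur.
    // Every window is re-hashed in O(L), so the total cost is O(nL).
    static int find(String s, String p) {
        int n = s.length(), len = p.length();
        if (len == 0) return 0;
        int hp = hash(p, 0, len);
        for (int i = 0; i + len <= n; i++) {
            // Equal hashes may be a collision, so confirm with a direct comparison.
            if (hash(s, i, len) == hp && s.regionMatches(i, p, 0, len)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("48902107", "90210")); // prints 2
    }
}
```

The inner call to `hash` is the part that repeats work: consecutive windows differ by only one character at each end, yet all L characters are processed again.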

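Notice that these substrings overlap heavily. For example, the 5-character substrings "algor" and "lgori" of the string "algorithms" share the four letters "lgor". Reusing that shared part to avoid recomputation is the idea behind the "rolling hash".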

An example: P = "90210", S = "48902107".

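The 5-character substrings of S are "48902", "89021", "90210" and "02107", and the alphabet is the set of digits $\{0, 1, \dots, 9\}$.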

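The hash of a substring k, read as a base-10 integer, for a modulus m:

$$\mathrm{hash}(k) = k \bmod m$$

The recurrence (similar to a sliding window) drops the leading digit, shifts the rest by one position, and appends the next digit. For the first two substrings:

$$89021 = (48902 - 4 \cdot 10^{4}) \cdot 10 + 1, \qquad \mathrm{hash}(89021) = \big((\mathrm{hash}(48902) - 4 \cdot 10^{4}) \cdot 10 + 1\big) \bmod m$$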

With a rolling hash, the hash of each substring after the first is computed in O(1), so computing the hashes of all substrings drops to O(n).
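To make the digit example concrete, here is a small sketch that rolls a base-10 hash across S = "48902107". The class name `RollingDigitsDemo`, the method `windowHashes`, and the modulus 101 are assumptions picked purely for illustration.

```java
final class RollingDigitsDemo {
    // Returns the hash of every length-len window of a digit string (base 10).
    // Assumes 1 <= len <= digits.length() and that digits contains only '0'..'9'.
    static int[] windowHashes(String digits, int len, int modulus) {
        int n = digits.length();
        int[] out = new int[n - len + 1];

        // high = 10^(len-1) mod modulus, the weight of the leading digit.
        int high = 1;
        for (int i = 0; i < len - 1; i++) {
            high = (high * 10) % modulus;
        }

        // First window: computed directly in O(len).
        int t = 0;
        for (int i = 0; i < len; i++) {
            t = (t * 10 + (digits.charAt(i) - '0')) % modulus;
        }
        out[0] = t;

        // Every later window: O(1) update from the previous one.
        for (int i = 0; i + len < n; i++) {
            int leading = digits.charAt(i) - '0';
            int incoming = digits.charAt(i + len) - '0';
            t = (10 * (t - leading * high) + incoming) % modulus;
            if (t < 0) {
                t += modulus; // Java's % can yield a negative result
            }
            out[i + 1] = t;
        }
        return out;
    }

    public static void main(String[] args) {
        // Windows: 48902, 89021, 90210, 02107
        System.out.println(java.util.Arrays.toString(windowHashes("48902107", 5, 101)));
        // prints [18, 40, 17, 87]
    }
}
```

With this modulus, the pattern "90210" also hashes to 90210 mod 101 = 17, which equals the third window's hash, so only that window would need a character-by-character check.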

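More generally, for a base b and a modulus m:

$$H(S_i) = \Big(\sum_{j=0}^{L-1} s_{i+j} \, b^{L-1-j}\Big) \bmod m$$

$$H(S_{i+1}) = \big(b \cdot (H(S_i) - s_i \cdot b^{L-1}) + s_{i+L}\big) \bmod m$$

Here $S_i = s_i s_{i+1} \cdots s_{i+L-1}$ is the (i+1)-th length-L substring of the text (i ranges from 0 to n-L), and $s_i$ is the character at index i of the text. The recurrence applies for i < n-L, that is, as long as a next substring exists.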
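As a sanity check on the general recurrence, the following sketch compares the rolled value against a hash recomputed from scratch for every window. The names `RollingHashCheck`, `direct`, `roll` and `consistent` are hypothetical, and the code assumes b and m are small enough that the intermediate products fit in a `long`.

```java
final class RollingHashCheck {
    // Hash of s[i .. i+len-1] computed from scratch: sum of s[i+j] * b^(len-1-j), mod m.
    static long direct(String s, int i, int len, long b, long m) {
        long h = 0;
        for (int j = 0; j < len; j++) {
            h = (h * b + s.charAt(i + j)) % m;
        }
        return h;
    }

    // One O(1) step: remove the outgoing char, shift by b, append the incoming char.
    // highPow must equal b^(len-1) mod m. Math.floorMod keeps the result in [0, m).
    static long roll(long prev, char outgoing, char incoming, long highPow, long b, long m) {
        return Math.floorMod(b * (prev - outgoing * highPow) + incoming, m);
    }

    // True if rolling reproduces the direct hash for every length-len window of s.
    static boolean consistent(String s, int len, long b, long m) {
        if (len <= 0 || len > s.length()) return true; // no windows to compare
        long highPow = 1;
        for (int k = 0; k < len - 1; k++) {
            highPow = (highPow * b) % m;
        }
        long h = direct(s, 0, len, b, m);
        for (int i = 0; i + len < s.length(); i++) {
            h = roll(h, s.charAt(i), s.charAt(i + len), highPow, b, m);
            if (h != direct(s, i + 1, len, b, m)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(consistent("algorithms", 5, 256, 101)); // prints true
    }
}
```

Using `Math.floorMod` here is a design choice for the sketch; adding m when the remainder is negative works just as well.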

The code is as follows:

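class Solution {
    // Radix of the polynomial hash: each char is treated as one "digit".
    public static int base=256;
    // Small prime modulus; it keeps all intermediate products within int range.
    public static int module=101;

    // Character-by-character comparison, used to confirm a hash hit.
    public static boolean match(String str1,String str2)
    {
        assert str1.length()==str2.length();
        for(int i=0;i<str1.length();i++)
        {
            if(str1.charAt(i)!=str2.charAt(i))
                return false;
        }
        return true;
    }
    
    // Returns the index of the first occurrence of needle in haystack, or -1.
    public int strStr(String haystack, String needle) {
        // An empty pattern matches at index 0.
        if(needle.length()==0)
            return 0;
        int m=needle.length(),n=haystack.length(),h=1;
        if(n==0 || n<m)
            return -1;
        // h = base^(m-1) mod module, the weight of the leading character.
        for(int i=0;i<m-1;i++)
            h=(h*base)%module;
        // p: hash of the pattern, t: hash of the current window of the text.
        int p=0,t=0;
        for(int i=0;i<m;i++)
        {
            p=(p*base+needle.charAt(i))%module;
            t=(t*base+haystack.charAt(i))%module;
        }
        for(int i=0;i<n-m+1;i++)
        {
            // Equal hashes may be a collision, so verify the candidate window.
            if(t==p)
            {
                if(match(needle,haystack.substring(i,i+m)))
                    return i;
            }
            if(i<n-m)
            {
                // Roll the window: drop haystack[i], shift, append haystack[i+m].
                t=( base * (t-haystack.charAt(i) * h) + haystack.charAt(i+m) )%module;
                // Java's % can return a negative value; map it back into [0, module).
                if(t<0)
                    t=t+module;
            }
        }
        return -1;
    }
}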
