Hash Tables: String Search--Data Structure
Naive Algorithm
For each position i from 0 to |T| − |P|, check character-by-character whether T[i..i + |P| − 1] = P or not. If yes, append i to the result.
AreEqual(S1, S2)
  if |S1| ≠ |S2|:
    return False
  for i from 0 to |S1| − 1:
    if S1[i] ≠ S2[i]:
      return False
  return True
FindPatternNaive(T, P)
  result ← empty list
  for i from 0 to |T| − |P|:
    if AreEqual(T[i..i + |P| − 1], P):
      result.Append(i)
  return result
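The naive search above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the names are_equal and find_pattern_naive are simply Python-style renderings of the pseudocode names.

```python
def are_equal(s1: str, s2: str) -> bool:
    """Character-by-character comparison, mirroring AreEqual."""
    if len(s1) != len(s2):
        return False
    for a, b in zip(s1, s2):
        if a != b:
            return False
    return True


def find_pattern_naive(text: str, pattern: str) -> list[int]:
    """Return every index i such that text[i : i + len(pattern)] equals pattern."""
    result = []
    # Positions 0 .. |T| - |P| inclusive; the range is empty if the pattern is longer.
    for i in range(len(text) - len(pattern) + 1):
        if are_equal(text[i:i + len(pattern)], pattern):
            result.append(i)
    return result
```

Each of the |T| − |P| + 1 positions may cost up to |P| character comparisons, which gives the O(|T||P|) bound that the hashing approach tries to beat.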
Rabin-Karp’s Algorithm
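RabinKarp(T, P)
  p ← big prime, x ← random(1, p − 1)
  result ← empty list
  pHash ← PolyHash(P, p, x)
  for i from 0 to |T| − |P|:
    tHash ← PolyHash(T[i..i + |P| − 1], p, x)
    if pHash ≠ tHash:
      continue
    if AreEqual(T[i..i + |P| − 1], P):
      result.Append(i)
  return result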
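A rough Python sketch of this first version is shown below. It assumes PolyHash(S, p, x) is the polynomial hash, the sum of S[j]·x^j taken mod p, which this section uses but does not define. The prime 1_000_000_007 is an illustrative stand-in for the "big prime", and the names poly_hash and rabin_karp_basic are hypothetical.

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Polynomial hash: sum of ord(s[j]) * x**j over j, taken mod p (assumed definition)."""
    h = 0
    for ch in reversed(s):  # Horner's rule, highest power first
        h = (h * x + ord(ch)) % p
    return h


def rabin_karp_basic(text: str, pattern: str) -> list[int]:
    p = 1_000_000_007              # assumed value for the "big prime"
    x = random.randint(1, p - 1)   # random multiplier in [1, p - 1]
    m = len(pattern)
    result = []
    p_hash = poly_hash(pattern, p, x)
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        t_hash = poly_hash(window, p, x)  # recomputed from scratch: O(|P|) per window
        if p_hash != t_hash:
            continue                      # different hashes: certainly not a match
        if window == pattern:             # equal hashes: confirm, as AreEqual does
            result.append(i)
    return result
```

Because every window hash is recomputed from scratch, this version still does O(|P|) work per position and is no faster than the naive search; the gain comes only once the window hashes are precomputed.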
Improving
PrecomputeHashes(T, |P|, p, x)
  H ← array of length |T| − |P| + 1
  S ← T[|T| − |P|..|T| − 1]
  H[|T| − |P|] ← PolyHash(S, p, x)
  y ← 1
  for i from 1 to |P|:
    y ← (y · x) mod p
  for i from |T| − |P| − 1 down to 0:
    H[i] ← (x · H[i + 1] + T[i] − y · T[i + |P|]) mod p
  return H
Running time: O(|P| + |P| + |T| − |P|) = O(|T| + |P|) (PolyHash of the last window, the loop computing y, and the main loop)
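The sketch below illustrates the precomputation in Python and checks it against hashing every window directly. It assumes the polynomial hash sums ord(s[j])·x^j mod p (a definition this section relies on but does not state) and that the pattern length m is at most the length of the text. The function names are illustrative.

```python
def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed definition: sum of ord(s[j]) * x**j over j, taken mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """Hashes of all length-m windows of text, filled from right to left."""
    n = len(text)
    h = [0] * (n - m + 1)
    h[n - m] = poly_hash(text[n - m:], p, x)  # last window, hashed directly
    y = 1
    for _ in range(m):                        # y = x**m mod p
        y = (y * x) % p
    for i in range(n - m - 1, -1, -1):
        # Shift the window one step left: multiply by x, add the new first
        # character, and drop the character that fell off the right end.
        h[i] = (x * h[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return h


def all_windows_agree(text: str, m: int, p: int, x: int) -> bool:
    """Sanity check: the recurrence must match direct hashing of each window."""
    h = precompute_hashes(text, m, p, x)
    return all(h[i] == poly_hash(text[i:i + m], p, x) for i in range(len(h)))
```

Python's % operator returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here; in languages where % can return a negative value, the result would have to be normalized.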
RabinKarp(T, P)
  p ← big prime, x ← random(1, p − 1)
  result ← empty list
  pHash ← PolyHash(P, p, x)
  H ← PrecomputeHashes(T, |P|, p, x)
  for i from 0 to |T| − |P|:
    if pHash ≠ H[i]:
      continue
    if AreEqual(T[i..i + |P| − 1], P):
      result.Append(i)
  return result
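Putting the pieces together, the improved algorithm might look like this in Python. As before, this is a sketch under assumptions: the polynomial hash definition, the prime 1_000_000_007, and the function names are not taken from this section, and the early return for a pattern longer than the text is an added guard.

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed definition: sum of ord(s[j]) * x**j over j, taken mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """Hashes of all length-m windows, computed right to left in O(|T| + |P|)."""
    n = len(text)
    h = [0] * (n - m + 1)
    h[n - m] = poly_hash(text[n - m:], p, x)
    y = pow(x, m, p)  # x**m mod p
    for i in range(n - m - 1, -1, -1):
        h[i] = (x * h[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return h


def rabin_karp(text: str, pattern: str) -> list[int]:
    m = len(pattern)
    if m > len(text):
        return []                    # added guard: no window is long enough
    p = 1_000_000_007                # assumed value for the "big prime"
    x = random.randint(1, p - 1)
    p_hash = poly_hash(pattern, p, x)
    hashes = precompute_hashes(text, m, p, x)
    result = []
    for i in range(len(text) - m + 1):
        if p_hash != hashes[i]:
            continue                 # O(1) rejection for most positions
        if text[i:i + m] == pattern: # full comparison only when hashes agree
            result.append(i)
    return result
```

The returned list does not depend on the random choice of x, because every hash match is confirmed by a direct comparison; x only affects how many false alarms trigger that comparison.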
Improved Running Time
h(P) is computed in O(|P|)
PrecomputeHashes runs in O(|T| + |P|)
Total time spent in AreEqual is O(q|P|) on average, where q is the number of occurrences of P in T
Average running time is O(|T| + (q + 1)|P|)
Usually q is small, so this is much less than O(|T||P|)
Conclusion
Hash tables are useful for storing Sets and Maps
It is possible to search and modify hash tables in O(1) on average!
Must use good hash families and randomization
Hashes are also useful while working with strings and texts
There are many more applications in distributed systems and data science