Hash Tables: String Search--Data Structure

Naive Algorithm

For each position i from 0 to |T| − |P|, check character by character whether T[i..i+|P|−1] = P. If so, append i to the result.

AreEqual(S1, S2)
if |S1| ≠ |S2|:
  return False
for i from 0 to |S1| − 1:
  if S1[i] ≠ S2[i]:
    return False
return True

FindPatternNaive(T, P)
result ← empty list
for i from 0 to |T| − |P|:
  if AreEqual(T[i..i+|P|−1], P):
    result.Append(i)
return result
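
The two routines above translate almost line for line into Python. The following is a minimal sketch; the function names `are_equal` and `find_pattern_naive` are illustrative renderings of the pseudocode names, not part of the original notes.

```python
def are_equal(s1: str, s2: str) -> bool:
    """Character-by-character comparison, mirroring AreEqual."""
    if len(s1) != len(s2):
        return False
    for a, b in zip(s1, s2):
        if a != b:
            return False
    return True


def find_pattern_naive(text: str, pattern: str) -> list[int]:
    """Return every index i at which pattern occurs in text."""
    result = []
    m = len(pattern)
    # The range is empty when the pattern is longer than the text.
    for i in range(len(text) - m + 1):
        if are_equal(text[i:i + m], pattern):
            result.append(i)
    return result


print(find_pattern_naive("abacaba", "aba"))  # [0, 4]
```

Every one of the |T| − |P| + 1 positions can cost up to |P| comparisons, which is where the O(|T||P|) bound comes from.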

Rabin-Karp’s Algorithm 

RabinKarp(T, P)
p ← big prime, x ← random(1, p − 1)
result ← empty list
pHash ← PolyHash(P, p, x)
for i from 0 to |T| − |P|:
  tHash ← PolyHash(T[i..i+|P|−1], p, x)
  if pHash ≠ tHash:
    continue
  if AreEqual(T[i..i+|P|−1], P):
    result.Append(i)
return result
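
A Python sketch of this first version follows. The notes do not define PolyHash, so `poly_hash` below assumes the usual polynomial hash (the sum of S[j]·x^j taken mod p); the prime 1 000 000 007 is likewise an assumed stand-in for "big prime".

```python
import random

P_PRIME = 1_000_000_007  # assumed value; the notes only say "big prime"


def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed PolyHash: sum of ord(s[j]) * x**j, reduced mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def rabin_karp(text: str, pattern: str, p: int = P_PRIME) -> list[int]:
    x = random.randint(1, p - 1)  # both endpoints are inclusive
    result = []
    p_hash = poly_hash(pattern, p, x)
    m = len(pattern)
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # Different hashes prove the strings differ: skip the comparison.
        if poly_hash(window, p, x) != p_hash:
            continue
        # Equal hashes may still be a collision, so confirm.
        if window == pattern:
            result.append(i)
    return result


print(rabin_karp("abacaba", "aba"))  # [0, 4]
```

Because the final comparison always runs on a hash match, the result is correct for any choice of x. Note that this version still hashes every window from scratch, so it is no faster than the naive algorithm.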

Improving the Running Time

PrecomputeHashes(T, |P|, p, x)
H ← array of length |T| − |P| + 1
S ← T[|T| − |P|..|T| − 1]
H[|T| − |P|] ← PolyHash(S, p, x)
y ← 1
for i from 1 to |P|:
  y ← (y × x) mod p
for i from |T| − |P| − 1 down to 0:
  H[i] ← (x·H[i+1] + T[i] − y·T[i+|P|]) mod p
return H

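PrecomputeHashes runs in O(|P| + |P| + |T| − |P|) = O(|T| + |P|): one PolyHash call, the loop computing y, and one constant-time step per remaining window.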
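
The update rule for H[i] can be checked by hand. The derivation below assumes PolyHash(S, p, x) is the polynomial hash, i.e. the sum of S[j]·x^j taken mod p; the notes do not spell this out, but it is the definition consistent with the recurrence. Here y = x^{|P|} mod p, as computed by the first loop.

```latex
\begin{align*}
H[i+1]    &\equiv \sum_{j=0}^{|P|-1} T[i+1+j]\,x^{j} \pmod p \\
x\,H[i+1] &\equiv \sum_{j=1}^{|P|} T[i+j]\,x^{j} \pmod p \\
H[i]      &\equiv \sum_{j=0}^{|P|-1} T[i+j]\,x^{j} \pmod p \\
          &\equiv x\,H[i+1] + T[i] - x^{|P|}\,T[i+|P|] \pmod p
\end{align*}
```

Multiplying by x shifts every term of the next window up by one power; adding T[i] supplies the missing constant term, and subtracting y·T[i+|P|] removes the one term that has slid out of the window.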

RabinKarp(T, P)
p ← big prime, x ← random(1, p − 1)
result ← empty list
pHash ← PolyHash(P, p, x)
H ← PrecomputeHashes(T, |P|, p, x)
for i from 0 to |T| − |P|:
  if pHash ≠ H[i]:
    continue
  if AreEqual(T[i..i+|P|−1], P):
    result.Append(i)
return result
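
The improved algorithm might look as follows in Python. This is a sketch under the same assumptions as the pseudocode: `poly_hash` is an assumed polynomial-hash definition of PolyHash (the notes do not define it), and the default prime is an arbitrary illustrative choice.

```python
import random


def poly_hash(s: str, p: int, x: int) -> int:
    """Assumed PolyHash: sum of ord(s[j]) * x**j, reduced mod p."""
    h = 0
    for ch in reversed(s):
        h = (h * x + ord(ch)) % p
    return h


def precompute_hashes(text: str, m: int, p: int, x: int) -> list[int]:
    """H[i] is the hash of text[i:i+m]; requires m <= len(text)."""
    n = len(text)
    H = [0] * (n - m + 1)
    H[n - m] = poly_hash(text[n - m:], p, x)
    y = pow(x, m, p)  # same value as the pseudocode's loop: x**m mod p
    for i in range(n - m - 1, -1, -1):
        # Python's % yields a non-negative result for a positive modulus,
        # so the subtraction needs no extra correction.
        H[i] = (x * H[i + 1] + ord(text[i]) - y * ord(text[i + m])) % p
    return H


def rabin_karp(text: str, pattern: str, p: int = 1_000_000_007) -> list[int]:
    m = len(pattern)
    if m > len(text):
        return []
    x = random.randint(1, p - 1)
    p_hash = poly_hash(pattern, p, x)
    H = precompute_hashes(text, m, p, x)
    result = []
    for i in range(len(text) - m + 1):
        if H[i] != p_hash:
            continue
        if text[i:i + m] == pattern:  # rule out hash collisions
            result.append(i)
    return result


print(rabin_karp("abacaba", "aba"))  # [0, 4]
```

In languages with fixed-width integers the subtraction can go negative before the modulo, so a port to C or Java would typically add p back before reducing.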

Improved Running Time

h(P) is computed in O(|P|)
PrecomputeHashes runs in O(|T| + |P|)
Total time spent in AreEqual is O(q|P|) on average, where q is the number of occurrences of P in T
Average running time: O(|T| + (q + 1)|P|)
Usually q is small, so this is much less than O(|T||P|)

Conclusion

Hash tables are useful for storing sets and maps
It is possible to search and modify hash tables in O(1) on average
Good hash families and randomization are required
Hashes are also useful when working with strings and texts
There are many more applications in distributed systems and data science