Hands-on case: string hash table operations

阿新 • Published: 2020-07-20

Summary: what to do when you need a string hash table and the C library does not provide one.

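Colleagues preparing for the C trusted programming certification often ask: the C library has no string hash table, so what do you do if the exam calls for one? None of the past exam questions has strictly required a hash table in C (at least I could solve them all with plain C), but it is still worth being prepared. Here is a representative problem to try: https://leetcode-cn.com/problems/substring-with-concatenation-of-all-words/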

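Given a string s and a list of words, words, all of the same length, find the starting positions of every substring of s that is exactly a concatenation of all the words in words.
Note that the substring must match the words exactly, with no other characters in between, but the order in which the words are concatenated does not matter.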

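Example:
Input:
  s = "barfoothefoobarman",
  words = ["foo","bar"]
Output: [0,9]
Explanation:
The substrings starting at indices 0 and 9 are "barfoo" and "foobar", respectively.
The order of the output does not matter; [9,0] is also a valid answer.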

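Programming language aside, this problem is fairly easy with a hash table. In C you can roll your own hash table, which is more than enough for problems of this kind.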

For the approach, see Solution 2 in https://leetcode-cn.com/problems/substring-with-concatenation-of-all-words/solution/xiang-xi-tong-su-de-si-lu-fen-xi-duo-jie-fa-by-w-6/ ; here I only cover the simplest way to build a hash table.

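First, choose a hash function. I use the djb2 algorithm (see http://www.cse.yorku.ca/~oz/hash.html): its collision rate is quite low, its distribution is even, and it is simple to implement, just two or three lines of code. Remember the two key numbers (5381 and 33). The page describes it as follows: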

If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions i know. it has excellent distribution and speed on many different sets of keys and table sizes.

C code

unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    /* extra parentheses: the assignment is intentional, the loop stops at the NUL terminator */
    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}
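As a quick check of the function above, here is a minimal sketch that adapts it to `const char *` input and reduces the result to an array index with a modulo. The names `djb2`, `bucketIndex`, and `tableSize` are hypothetical, chosen for illustration; they do not come from the referenced page.

```c
#include <stddef.h>

/* djb2 over a NUL-terminated string: hash = hash * 33 + c, starting from 5381. */
static unsigned long djb2(const char *str) {
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*str++) != 0)
        hash = ((hash << 5) + hash) + (unsigned long)c;
    return hash;
}

/* Hypothetical helper: reduce the hash to an index in [0, tableSize). */
static size_t bucketIndex(const char *str, size_t tableSize) {
    return (size_t)(djb2(str) % tableSize);
}
```

For example, with a table of 1000 slots, `bucketIndex("a", 1000)` is 670, because 5381 * 33 + 97 = 177670.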

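With a string hash function, an arbitrary string can be turned into a number, and that number, reduced modulo the table size, can serve as an array index (the key) for storing information. How large should the hash table be? In general it must be larger than the number of items stored: with at most 100 items, for example, the table needs more than 100 slots. This problem does not state the maximum number of words, so the size has to be estimated. Over several rounds of submissions I recorded how many test cases passed for each table size. The results suggest that the largest word count in the tests is probably around 300, with an average below 50: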

table size -> test cases passed
5 -> 110/173
10 -> 143/173
50 -> 170/173
100 -> 170/173
300 -> 172/173
400 -> 173/173
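The trial-and-error sizing above appears to be needed because a fixed-size table that stores only counts cannot tell colliding words apart. As a sketch of an alternative (not the approach used in this article), the table can be sized from the number of words and store the keys themselves, resolving collisions by linear probing. The names `Slot`, `Table`, `tableInit`, `tableFind`, `tableAdd`, and `tableGet` are hypothetical, and the sketch assumes all keys share one length, as in this problem.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key; /* points into caller-owned memory; NULL marks an empty slot */
    int count;
} Slot;

typedef struct {
    Slot *slots;
    size_t cap;    /* number of slots, always larger than the number of keys */
    size_t keyLen; /* every key has this length */
} Table;

static unsigned long djb2n(const char *s, size_t len) {
    unsigned long h = 5381;
    while (len--)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 0 on success, -1 if allocation fails. */
static int tableInit(Table *t, size_t maxKeys, size_t keyLen) {
    t->cap = maxKeys * 2 + 1; /* leaves at least half of the slots empty */
    t->keyLen = keyLen;
    t->slots = malloc(t->cap * sizeof(Slot));
    if (t->slots == NULL)
        return -1;
    for (size_t i = 0; i < t->cap; ++i) {
        t->slots[i].key = NULL;
        t->slots[i].count = 0;
    }
    return 0;
}

/* Returns the slot holding key, or the empty slot where key would be stored. */
static Slot *tableFind(Table *t, const char *key) {
    size_t i = djb2n(key, t->keyLen) % t->cap;
    while (t->slots[i].key != NULL &&
           memcmp(t->slots[i].key, key, t->keyLen) != 0)
        i = (i + 1) % t->cap; /* linear probing: try the next slot */
    return &t->slots[i];
}

/* Increments the count for key; at most maxKeys distinct keys may be added. */
static void tableAdd(Table *t, const char *key) {
    Slot *s = tableFind(t, key);
    if (s->key == NULL)
        s->key = key;
    ++s->count;
}

/* Returns the count for key, or 0 if key was never added. */
static int tableGet(Table *t, const char *key) {
    return tableFind(t, key)->count;
}
```

After `tableInit(&t, 3, 3)` and adding "foo", "bar", "foo", `tableGet(&t, "foo")` returns 2 and `tableGet(&t, "baz")` returns 0. The caller releases the table with `free(t.slots)`.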

Here is my solution:

C code

#include <stdlib.h>
#include <string.h>

// Maximum number of words, i.e. the hash table size; an estimate that depends on the actual data
#define MAXWORDCOUNT 1000

static int wordCount[MAXWORDCOUNT];
static int currWordCount[MAXWORDCOUNT];

// ref: http://www.cse.yorku.ca/~oz/hash.html
// The result is reduced modulo MAXWORDCOUNT so it can be used directly as a table index.
unsigned long DJBHash(const char* s, int len) {
    unsigned long hash = 5381; // empirical seed: low collision probability, even distribution
    while (len--) {
        hash = (((hash << 5) + hash) + *(s++)) % MAXWORDCOUNT; /* hash * 33 + c */
    }
    return hash;
}

int* findSubstring(char * s, char ** words, int wordsSize, int* returnSize){
    memset(wordCount, 0, sizeof(wordCount));
    *returnSize = 0;

    const int kSLen = strlen(s);
    if (kSLen == 0 || wordsSize == 0) return NULL;

    const int kWordLen = strlen(words[0]);

    // Store the word counts in the hash table; key: word, value: number of occurrences
    for (int i = 0; i < wordsSize; ++i)
        ++wordCount[DJBHash(words[i], kWordLen)];

    int *result = malloc(sizeof(int) * kSLen);
    if (result == NULL) return NULL;

    for (int i = 0; i < kWordLen; ++i) {
        for (int j = i; j + kWordLen * wordsSize <= kSLen; j += kWordLen) {
            // Count the words in the current window: all of them for the first
            // window, only the newly entered last word after that
            for (int k = (j == i ? 0 : wordsSize - 1); k < wordsSize; ++k)
                ++currWordCount[DJBHash(s + j + k * kWordLen, kWordLen)];

            // Check whether the two tables are equal, i.e. whether the words in the
            // window match the given dictionary (words that collide are not told apart)
            if (memcmp(wordCount, currWordCount, sizeof(wordCount)) == 0)
                result[(*returnSize)++] = j;

            // Drop the first word before the window slides forward
            --currWordCount[DJBHash(s + j, kWordLen)];
        }
        // Reset the window table
        memset(currWordCount, 0, sizeof(currWordCount));
    }
    return result;
}
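Because the solution above compares counts by hash index only, two different words that happen to share an index are indistinguishable to it. A hash-free brute-force version is useful for cross-checking its output on small inputs. The following is a minimal sketch of such a checker, not part of the original solution; `windowMatches` and `findSubstringBrute` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the window starting at s + start is a concatenation of all
   words in some order, 0 otherwise. used is scratch space of wordsSize bytes. */
static int windowMatches(const char *s, int start, char **words,
                         int wordsSize, int wordLen, char *used) {
    memset(used, 0, (size_t)wordsSize);
    for (int k = 0; k < wordsSize; ++k) {
        const char *piece = s + start + k * wordLen;
        int found = 0;
        /* Claim any still-unused word equal to this piece. */
        for (int w = 0; w < wordsSize && !found; ++w) {
            if (!used[w] && memcmp(piece, words[w], (size_t)wordLen) == 0) {
                used[w] = 1;
                found = 1;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}

/* Returns a malloc'ed array of start indices in ascending order (caller frees),
   or NULL when there is nothing to search or allocation fails. */
int *findSubstringBrute(const char *s, char **words, int wordsSize,
                        int *returnSize) {
    *returnSize = 0;
    int sLen = (int)strlen(s);
    if (sLen == 0 || wordsSize == 0)
        return NULL;
    int wordLen = (int)strlen(words[0]);
    int total = wordLen * wordsSize;
    if (wordLen == 0 || total > sLen)
        return NULL;

    int *result = malloc(sizeof(int) * (size_t)sLen);
    char *used = malloc((size_t)wordsSize);
    if (result == NULL || used == NULL) {
        free(result);
        free(used);
        return NULL;
    }
    for (int start = 0; start + total <= sLen; ++start)
        if (windowMatches(s, start, words, wordsSize, wordLen, used))
            result[(*returnSize)++] = start;
    free(used);
    return result;
}
```

On the example from the problem statement, s = "barfoothefoobarman" and words = ["foo","bar"], this returns the indices 0 and 9. It runs in time proportional to the string length times the square of the word count, so it is meant for checking, not for submission.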
