[LeetCode] Sentence Similarity II

阿新 • Published: 2018-12-27

Given two sentences words1, words2 (each represented as an array of strings), and a list of similar word pairs pairs, determine if two sentences are similar.

For example, words1 = ["great", "acting", "skills"] and words2 = ["fine", "drama", "talent"] are similar, if the similar word pairs are pairs = [["great", "good"], ["fine", "good"], ["acting","drama"], ["skills","talent"]].

Note that the similarity relation is transitive. For example, if "great" and "good" are similar, and "fine" and "good" are similar, then "great" and "fine" are similar.

Similarity is also symmetric. For example, "great" and "fine" being similar is the same as "fine" and "great" being similar.

Also, a word is always similar with itself. For example, the sentences words1 = ["great"], words2 = ["great"], pairs = [] are similar, even though there are no specified similar word pairs.

Finally, sentences can only be similar if they have the same number of words. So a sentence like words1 = ["great"] can never be similar to words2 = ["doubleplus","good"].

Note:

The length of words1 and words2 will not exceed 1000.
The length of pairs will not exceed 2000.
The length of each pairs[i] will be 2.
The length of each words[i] and pairs[i][j] will be in the range [1, 20].

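This problem extends the earlier Sentence Similarity problem. There, similarity between words was not transitive; here it is, which makes the problem harder. That's fine: we can still reach for the classic trio of BFS, DFS, and Union Find. Let's start with BFS. At its core this is a connectivity problem on an undirected graph, so the first step is to build a data structure for the graph. For each node we need to record all the nodes directly connected to it, so we map each node to the set of its neighbors. For example, for the three similar pairs (a, b), (b, c), and (c, d), we get the following mapping: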

a -> {b}

b -> {a, c}

c -> {b, d}

d -> {c}
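
The mapping above can be sketched in a few lines. This is only an illustration: the `Graph` alias and the names `buildGraph` and `demoBuildGraph` are made up for this example and do not appear in the solutions below.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias: word -> set of directly similar words.
using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Build an undirected adjacency map: each pair (u, v) adds v to u's
// neighbor set and u to v's neighbor set.
Graph buildGraph(const std::vector<std::pair<std::string, std::string>>& pairs) {
    Graph g;
    for (const auto& p : pairs) {
        g[p.first].insert(p.second);
        g[p.second].insert(p.first);
    }
    return g;
}

// Quick check on the three pairs (a, b), (b, c), (c, d) from the text.
void demoBuildGraph() {
    const Graph g = buildGraph({{"a", "b"}, {"b", "c"}, {"c", "d"}});
    assert(g.size() == 4);                       // a, b, c, d
    assert(g.at("a").size() == 1 && g.at("a").count("b") == 1);
    assert(g.at("b").count("a") == 1 && g.at("b").count("c") == 1);
    assert(g.at("d").size() == 1 && g.at("d").count("c") == 1);
}
```

Storing both directions is what makes the graph undirected, so a later search can walk from either word of a pair to the other.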

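To check whether a and d are similar we have to use transitivity. From a we can only reach b, and from b we can reach a and c. To avoid looping forever, we put every node we have visited into a set, visited. Then b can only move on to c, and c can only move on to d, which shows that a and d are similar. So we use a for loop to compare the two words at each position. If they are identical, we skip straight to the next position. Otherwise we create a visited set and a queue, push the word from words1 into the queue, and create a boolean succ that records whether the target has been found. The rest is a standard BFS: pop an element from the queue; if its neighbors include the corresponding word from words2, set succ to true and break. Otherwise add the popped node to visited, iterate over its neighbors, and push those not yet visited into the queue to keep the loop going. If the queue empties without succ being set, the two words are not similar and we return false. See the code below: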

Solution 1:

#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Adjacency map: word -> set of directly similar words (undirected).
        unordered_map<string, unordered_set<string>> m;
        for (const auto& pair : pairs) {
            m[pair.first].insert(pair.second);
            m[pair.second].insert(pair.first);
        }
        for (int i = 0; i < words1.size(); ++i) {
            if (words1[i] == words2[i]) continue;
            // BFS from words1[i], looking for words2[i].
            unordered_set<string> visited;
            queue<string> q{{words1[i]}};
            bool succ = false;
            while (!q.empty()) {
                auto t = q.front(); q.pop();
                if (m[t].count(words2[i])) {
                    succ = true; break;
                }
                visited.insert(t);
                // Enqueue only neighbors that have not been processed yet.
                for (const auto& a : m[t]) {
                    if (!visited.count(a)) q.push(a);
                }
            }
            if (!succ) return false;
        }
        return true;
    }
};
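
The BFS above starts a fresh search for every word position. As a possible variation (not the approach used in this post), one could label each connected component once and then compare labels. The sketch below assumes that reading of the problem; the names `labelComponents`, `similarByComponents`, and `demoComponents` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using std::string;
using std::vector;

// Give every word that appears in `pairs` a component id, using one BFS
// per component. Words in the same component get the same id.
std::unordered_map<string, int> labelComponents(const vector<std::pair<string, string>>& pairs) {
    std::unordered_map<string, std::unordered_set<string>> graph;
    for (const auto& p : pairs) {
        graph[p.first].insert(p.second);
        graph[p.second].insert(p.first);
    }
    std::unordered_map<string, int> id;
    int next = 0;
    for (const auto& entry : graph) {
        if (id.count(entry.first)) continue;      // already labelled
        std::queue<string> q;
        q.push(entry.first);
        id[entry.first] = next;
        while (!q.empty()) {
            const string cur = q.front();
            q.pop();
            for (const string& nb : graph.at(cur)) {
                // emplace succeeds only for words not labelled yet
                if (id.emplace(nb, next).second) q.push(nb);
            }
        }
        ++next;
    }
    return id;
}

// Two sentences are similar if, position by position, the words are equal
// or carry the same component id.
bool similarByComponents(const vector<string>& words1, const vector<string>& words2,
                         const vector<std::pair<string, string>>& pairs) {
    if (words1.size() != words2.size()) return false;
    const auto id = labelComponents(pairs);
    for (std::size_t i = 0; i < words1.size(); ++i) {
        if (words1[i] == words2[i]) continue;
        auto a = id.find(words1[i]), b = id.find(words2[i]);
        if (a == id.end() || b == id.end() || a->second != b->second) return false;
    }
    return true;
}

// Quick check using the example from the problem statement.
void demoComponents() {
    const vector<std::pair<string, string>> pairs = {
        {"great", "good"}, {"fine", "good"}, {"acting", "drama"}, {"skills", "talent"}};
    const vector<string> w1 = {"great", "acting", "skills"};
    const vector<string> w2 = {"fine", "drama", "talent"};
    assert(similarByComponents(w1, w2, pairs));
}
```

With this layout every word is visited once while labelling, and each position in the sentences then costs two hash lookups.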

Next comes the recursive version. The idea is exactly the same as above, with the main work moved into a recursive function. See the code below:

Solution 2:

#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Adjacency map: word -> set of directly similar words (undirected).
        unordered_map<string, unordered_set<string>> m;
        for (const auto& pair : pairs) {
            m[pair.first].insert(pair.second);
            m[pair.second].insert(pair.first);
        }
        for (int i = 0; i < words1.size(); ++i) {
            unordered_set<string> visited;
            if (!helper(m, words1[i], words2[i], visited)) return false;
        }
        return true;
    }
    // DFS: returns true if target can be reached from cur.
    bool helper(unordered_map<string, unordered_set<string>>& m, const string& cur, const string& target, unordered_set<string>& visited) {
        if (cur == target) return true;
        visited.insert(cur);
        for (const string& word : m[cur]) {
            if (!visited.count(word) && helper(m, word, target, visited)) return true;
        }
        return false;
    }
};
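
For comparison, the same depth-first search can be written with an explicit stack instead of recursion. This is a minimal sketch, not part of the original solutions; the `Graph` alias and the names `reachable` and `demoReachable` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias: word -> set of directly similar words.
using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Iterative DFS: returns true if dst can be reached from src.
bool reachable(const Graph& g, const std::string& src, const std::string& dst) {
    if (src == dst) return true;                  // a word is similar to itself
    std::unordered_set<std::string> visited{src};
    std::vector<std::string> todo{src};           // used as a stack
    while (!todo.empty()) {
        const std::string cur = todo.back();
        todo.pop_back();
        const auto it = g.find(cur);
        if (it == g.end()) continue;              // word has no recorded neighbors
        for (const std::string& nb : it->second) {
            if (nb == dst) return true;
            if (visited.insert(nb).second) todo.push_back(nb);
        }
    }
    return false;
}

// Quick check on the chain a - b - c - d used earlier in the post.
void demoReachable() {
    const std::vector<std::pair<std::string, std::string>> pairs = {
        {"a", "b"}, {"b", "c"}, {"c", "d"}};
    Graph g;
    for (const auto& p : pairs) {
        g[p.first].insert(p.second);
        g[p.second].insert(p.first);
    }
    assert(reachable(g, "a", "d"));
    assert(!reachable(g, "a", "x"));
}
```

Because the graph is only read through `find`, it can be passed by const reference, and words that never appear in a pair are handled without inserting empty entries.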

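The next solution is the awesome Union Find. Its core is a getRoot function: if two elements belong to the same group, calling getRoot on them returns the same value. There are two steps. The first is to build the groups. Suppose that at the start every element is on its own, each in a different group. Then for each given pair we call getRoot on both words to find their roots. If the two words have never been linked, their roots are different, and we link them by pointing one root at the other. Once all relations are built, the second step is to check whether any two elements belong to the same group, which only requires comparing their roots. It feels a bit like deep learning: first train a model, then test it. Haha, just kidding, the two have nothing to do with each other. The data structure for storing the groups is sometimes an array and sometimes a hash map, depending on the input type. If the elements are integers, a root array is enough; if they are strings, as in this problem, we need a hash map from each node to its ancestor. Note that the ancestor stored for a node is not necessarily its final root, but the final root always maps to itself. So getRoot is designed to find the final root: return when a node equals the node it maps to, otherwise keep following the mapping. It can be written recursively or iteratively; it doesn't matter. Note that the existence check on the first line of getRoot acts as initialization. It could also be done outside the function; the point is that every element starts in its own group. See the code below: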

Solution 3:

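#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // m maps each word to its ancestor; a root maps to itself.
        unordered_map<string, string> m;
        // Step 1: merge the groups of the two words in every pair.
        for (const auto& pair : pairs) {
            string x = getRoot(pair.first, m), y = getRoot(pair.second, m);
            if (x != y) m[x] = y;
        }
        // Step 2: words at the same position must share a root.
        for (int i = 0; i < words1.size(); ++i) {
            if (getRoot(words1[i], m) != getRoot(words2[i], m)) return false;
        }
        return true;
    }
    string getRoot(string word, unordered_map<string, string>& m) {
        // A word seen for the first time starts as its own root.
        if (!m.count(word)) m[word] = word;
        return word == m[word] ? word : getRoot(m[word], m);
    }
};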
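
Since getRoot can also be written iteratively, here is a sketch of that variant. It additionally applies path compression, which the solution above does not use. The `Parent` alias and the names `findRoot`, `similarByUnionFind`, and `demoUnionFind` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias: word -> ancestor; a root maps to itself.
using Parent = std::unordered_map<std::string, std::string>;

// Iterative find with path compression: walk up to the root, then point
// every node on the walked path directly at that root.
std::string findRoot(const std::string& word, Parent& parent) {
    parent.emplace(word, word);                   // first sight: own root
    std::string root = word;
    while (parent[root] != root) root = parent[root];
    std::string cur = word;
    while (cur != root) {                         // compress the path
        const std::string next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    return root;
}

bool similarByUnionFind(const std::vector<std::string>& words1,
                        const std::vector<std::string>& words2,
                        const std::vector<std::pair<std::string, std::string>>& pairs) {
    if (words1.size() != words2.size()) return false;
    Parent parent;
    for (const auto& p : pairs) {                 // step 1: build the groups
        const std::string x = findRoot(p.first, parent);
        const std::string y = findRoot(p.second, parent);
        if (x != y) parent[x] = y;
    }
    for (std::size_t i = 0; i < words1.size(); ++i) {   // step 2: compare roots
        if (findRoot(words1[i], parent) != findRoot(words2[i], parent)) return false;
    }
    return true;
}

// Quick check using the example from the problem statement.
void demoUnionFind() {
    const std::vector<std::pair<std::string, std::string>> pairs = {
        {"great", "good"}, {"fine", "good"}, {"acting", "drama"}, {"skills", "talent"}};
    const std::vector<std::string> w1 = {"great", "acting", "skills"};
    const std::vector<std::string> w2 = {"fine", "drama", "talent"};
    assert(similarByUnionFind(w1, w2, pairs));
}
```

Path compression keeps later lookups short when long chains of pairs such as (a, b), (b, c), (c, d) are merged one after another.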
