187. Repeated DNA Sequences

阿新 • Published: 2017-06-25


Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Link: http://leetcode.com/problems/repeated-dna-sequences/

6/25/2017

I haven't practiced problems in a long time, and for this one I again referred to other people's answers.

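48 ms, faster than 80% of submissions. The time complexity is O(N*k) with k = 10: the N comes from iterating over the string, and the k comes from each substring call, which copies (and later hashes) 10 characters.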

Note the loop condition: the end position is i <= s.length() - 10, so that the last window is included.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return res;
        }
        // Number of times each 10-letter window has been seen so far.
        Map<String, Integer> substringCount = new HashMap<>();
        // <= so that the window starting at s.length() - 10 is included.
        for (int i = 0; i <= s.length() - 10; i++) {
            String substring = s.substring(i, i + 10);
            int count = substringCount.getOrDefault(substring, 0);
            // Report a window exactly once, on its second occurrence.
            if (count == 1) {
                res.add(substring);
            }
            substringCount.put(substring, count + 1);
        }
        return res;
    }
}
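To see why the loop bound matters, here is a small sketch that lists the windows visited with and without the inclusive bound. The class `LoopBoundDemo` and the helper `windows` are hypothetical names made up for this illustration; they are not part of the submitted solution.

```java
import java.util.ArrayList;
import java.util.List;

class LoopBoundDemo {
    // Hypothetical helper: collects every 10-letter window the loop visits.
    // inclusive = true uses i <= s.length() - 10, false uses i < s.length() - 10.
    static List<String> windows(String s, boolean inclusive) {
        List<String> out = new ArrayList<>();
        int last = s.length() - 10;
        for (int i = 0; inclusive ? i <= last : i < last; i++) {
            out.add(s.substring(i, i + 10));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "AAAAAAAAAAA"; // 11 letters, so there are two overlapping windows
        System.out.println(windows(s, true).size());  // prints 2
        System.out.println(windows(s, false).size()); // prints 1
    }
}
```

With the exclusive bound the final window is never visited, so a repeat that ends at the last character of the string would be missed.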

Other people's solutions:

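Similar to Rabin-Karp: since there are only 4 distinct characters, each one can be represented with 2 bits, so a 10-letter window takes 20 bits and fits in an int (4^10 = 2^20 < 2^32). The map then only has to hash and compare integers rather than strings, which is more efficient. The link below explains it.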

https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
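The 2-bit idea can be sketched roughly as follows. This is my own minimal version and not the code from the linked post; the class name `TwoBitDna` and the mapping A=0, C=1, G=2, T=3 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoBitDna {
    // Assumed mapping for this sketch: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Map<Integer, Integer> seen = new HashMap<>(); // window key -> times seen
        final int mask = (1 << 20) - 1;               // 10 characters * 2 bits
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character and drop the one that left the window.
            key = ((key << 2) | code(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            int count = seen.merge(key, 1, Integer::sum);
            if (count == 2) {
                res.add(s.substring(i - 9, i + 1)); // report once, on the second occurrence
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Because the 20-bit key encodes the whole window exactly, two different windows can never share a key, so no string comparison is needed to rule out collisions.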

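A similar approach, except that it uses base 8 (3 bits per character). The link explains it, but I'll write it out in a little more detail.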

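t holds the int hash of the current 10 characters, computed by the expression in the loop below. Note the 0x3FFFFFFF: it took me a while to see that it keeps only the lowest 30 bits. The reason is that after & 7 each character keeps only its lowest 3 bits (the ASCII codes give A = 1, C = 3, G = 7, T = 4, all distinct), so 10 characters take exactly 30 bits, and the mask removes the influence of anything older than the last 10 characters.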

https://discuss.leetcode.com/topic/8487/i-did-it-in-10-lines-of-c

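#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<unsigned, int> m;  // window hash -> times seen
    vector<string> r;
    unsigned t = 0;                  // rolling 30-bit hash, 3 bits per character
    for (size_t i = 0; i < s.size(); i++) {
        // Shift in the low 3 bits of s[i]; the mask drops the character that left the window.
        t = ((t << 3) & 0x3FFFFFFF) | (s[i] & 7);
        if (i >= 9 && m[t]++ == 1)   // second occurrence of a full window: report once
            r.push_back(s.substr(i - 9, 10));
    }
    return r;
}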
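For comparison with my Java solution, here is a rough Java port of the same 3-bit trick. It is a sketch under my own naming (`ThreeBitDna`, `code`, `MASK` are not from the linked post), and it only starts counting once a full 10-letter window is available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ThreeBitDna {
    static final int MASK = 0x3FFFFFFF; // lowest 30 bits = 10 characters * 3 bits

    // Low 3 bits of the ASCII code: 'A' -> 1, 'C' -> 3, 'G' -> 7, 'T' -> 4.
    static int code(char c) {
        return c & 7;
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Map<Integer, Integer> seen = new HashMap<>(); // window hash -> times seen
        int t = 0;
        for (int i = 0; i < s.length(); i++) {
            // Make room for 3 new bits, mask off the oldest character, add the new one.
            t = ((t << 3) & MASK) | code(s.charAt(i));
            if (i >= 9 && seen.merge(t, 1, Integer::sum) == 2) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(code('A') + " " + code('C') + " " + code('G') + " " + code('T')); // prints 1 3 7 4
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

The 3-bit version needs no lookup table because the ASCII codes already differ in their low 3 bits; the price is a 30-bit key instead of a 20-bit one.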

More discussion:

https://discuss.leetcode.com/category/195/repeated-dna-sequences
