
187. Repeated DNA Sequences


Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Link: http://leetcode.com/problems/repeated-dna-sequences/

6/25/2017

It has been a long time since I last practiced problems, and this solution is also based on someone else's answer.

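48 ms, faster than 80% of submissions. Time complexity is O(N*k) with k = 10: the N comes from scanning the string, and the k comes from building and hashing each 10-character substring.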

Note the for-loop condition: it must be i <= s.length() - 10 so that the last 10-character window is included.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s == null || s.length() < 10) {
            return res;
        }
        // Number of times each 10-letter substring has been seen so far.
        Map<String, Integer> substringCount = new HashMap<String, Integer>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String substring = s.substring(i, i + 10);
            if (substringCount.containsKey(substring)) {
                int count = substringCount.get(substring);
                // Report a substring only on its second occurrence.
                if (count == 1) {
                    res.add(substring);
                }
                substringCount.put(substring, count + 1);
            } else {
                substringCount.put(substring, 1);
            }
        }
        return res;
    }
}

Other people's solutions:

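This one is similar to Rabin-Karp. Because there are only 4 distinct characters, each character can be encoded in 2 bits, so a 10-character window takes 20 bits and fits in a single int (4^10 = 2^20 < 2^32). The map then only has to compare integers rather than strings, which makes it more efficient. The link has an explanation: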

https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
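The 2-bit idea can be sketched roughly as follows. This is not the code from the linked post, only a minimal illustration written for these notes: the class name `TwoBitDnaHash`, the helper `code`, and the particular mapping A=0, C=1, G=2, T=3 are assumptions, and any assignment of distinct 2-bit codes would work equally well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoBitDnaHash {
    // Hypothetical mapping of each nucleotide to a distinct 2-bit code.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("unexpected nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s == null || s.length() < 10) {
            return res;
        }
        Map<Integer, Integer> seen = new HashMap<>();
        final int mask = (1 << 20) - 1; // low 20 bits = 10 characters * 2 bits
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character and drop the one that left the window.
            hash = ((hash << 2) & mask) | code(s.charAt(i));
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            // merge returns the updated count; report on the second sighting only.
            if (seen.merge(hash, 1, Integer::sum) == 2) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }
}
```

Since every 10-character window maps to a unique 20-bit value, equal keys always mean equal substrings, so no string comparison is needed to confirm a match.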

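The next one is similar, except that it works in base 8 (3 bits per character). The link explains it, but I will spell it out in a bit more detail.

t holds the int hash of the most recent 10 characters, computed by the expression in the code. Note the 0x3FFFFFFF: once I thought it through, it keeps only the lowest 30 bits. Why? Applying & 7 to a character keeps only its lowest 3 bits, so 10 characters take exactly 30 bits, and the mask removes the influence of any character more than 10 positions back.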

https://discuss.leetcode.com/topic/8487/i-did-it-in-10-lines-of-c

vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<int, int> m;
    vector<string> r;
    for (int t = 0, i = 0; i < s.size(); i++)
        if (m[t = t << 3 & 0x3FFFFFFF | s[i] & 7]++ == 1)
            r.push_back(s.substr(i - 9, 10));
    return r;
}
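For comparison with my Java solution, here is a rough Java port of the same 3-bit trick, with the operator precedence written out in parentheses. It is only a sketch for these notes: the class name `ThreeBitDnaHash` is made up, and the explicit `i >= 9` check is my own addition for readability rather than something the C++ version does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ThreeBitDnaHash {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        if (s == null) {
            return res;
        }
        Map<Integer, Integer> seen = new HashMap<>();
        int t = 0;
        for (int i = 0; i < s.length(); i++) {
            // The low 3 bits of the ASCII codes are distinct:
            // 'A' & 7 = 1, 'C' & 7 = 3, 'G' & 7 = 7, 'T' & 7 = 4.
            // Masking with 0x3FFFFFFF keeps 30 bits, i.e. the last 10 characters.
            t = ((t << 3) & 0x3FFFFFFF) | (s.charAt(i) & 7);
            // Count only full windows; report a window on its second sighting.
            if (i >= 9 && seen.merge(t, 1, Integer::sum) == 2) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }
}
```

Because none of the four characters maps to 0 under `& 7`, a window that is not yet full can never produce the same value as a full one, which is why the C++ version gets away without an explicit length check.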

More discussion:

https://discuss.leetcode.com/category/195/repeated-dna-sequences
