09 Finding a Motif in DNA
Problem
Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).
The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2:5] = "UGCU".
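The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).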
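The notation above is 1-based with both ends included, whereas Python indexes from 0 and excludes the end of a slice. A minimal sketch of the mapping, assuming Python 3; `positions_of` and `substring` are hypothetical helper names used only for illustration:

```python
def positions_of(symbol, s):
    """Return the 1-based positions of every occurrence of symbol in s."""
    return [i + 1 for i, ch in enumerate(s) if ch == symbol]


def substring(s, j, k):
    """Return s[j:k] in the problem's notation (1-based, both ends inclusive)."""
    # Python slices are 0-based and exclude the end index,
    # so the start shifts down by one and the end stays at k.
    return s[j - 1:k]


s = "AUGCUUCAGAAAGGUCUUACG"
print(positions_of("U", s))  # [2, 5, 6, 15, 17, 18]
print(substring(s, 2, 5))    # UGCU
```

This is why the solutions for this problem add 1 to every index Python reports.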
Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.
Sample Dataset
GATATATGCATATACTT
ATAT
Sample Output
2 4 10
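# -*- coding: UTF-8 -*-
### 9. Finding a Motif in DNA ###

# Method 1: use regex.finditer
import regex  # third-party module, more powerful than re

# overlapped=True returns every match, including overlapping ones
matches = regex.finditer('ATAT', 'GATATATGCATATACTT', overlapped=True)
for match in matches:
    print(match.start() + 1)  # convert the 0-based index to a 1-based position

# Method 2: brute-force search
seq = 'GATATATGCATATACTT'
pattern = 'ATAT'

def find_motif(seq, pattern):
    position = []
    # + 1 so that a window ending at the last symbol of seq is also checked
    for i in range(len(seq) - len(pattern) + 1):
        if seq[i:i + len(pattern)] == pattern:
            position.append(str(i + 1))
    print('\t'.join(position))

find_motif(seq, pattern)

# Method 3: re with a lookahead
import re

seq = 'GATATATGCATATACTT'
# (?=ATAT) is a zero-width lookahead: it succeeds wherever ATAT follows
# without consuming it, so overlapping occurrences are all reported
print([i.start() + 1 for i in re.finditer('(?=ATAT)', seq)])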
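A further variant, not part of the original three methods, uses only the built-in `str.find` and returns the locations instead of printing them. It is a sketch assuming Python 3; `motif_locations` is a hypothetical name chosen for illustration:

```python
def motif_locations(s, t):
    """Return the 1-based start positions of every occurrence of t in s."""
    locations = []
    start = s.find(t)
    while start != -1:
        locations.append(start + 1)
        # Resume one symbol past the last hit so overlapping matches are kept.
        start = s.find(t, start + 1)
    return locations


print(" ".join(map(str, motif_locations("GATATATGCATATACTT", "ATAT"))))  # 2 4 10
```

Returning a list keeps the search separate from the output format, so the same function can print space-separated positions as in the sample output or feed another step.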