1. 程式人生 > >利用Needleman–Wunsch算法進行DNA序列全局比對

利用Needleman–Wunsch算法進行DNA序列全局比對

gap aac print man sat odi sequence org 命令

生物信息學原理作業第二彈:利用Needleman–Wunsch算法進行DNA序列全局比對。

具體原理:https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm。

貼上python代碼:

 1 # -*- coding: utf-8 -*-
 2 """
 3 Created on Sat Nov 25 18:20:01 2017
 4 
 5 @author: zxzhu
 6 後需修改:
 7 1.加命令行參數
 8 2.給出多種比對結果
 9 """
10 
11 import numpy as np
12 import pandas as pd
13 sequence1 = AACGTACTCA 14 sequence2 = TCGTACTCA 15 s1 = ‘‘ 16 s2 = ‘‘ 17 gap = -4 18 score_matrix = pd.read_excel(score.xlsx) #score matrix 19 best_matrix = np.empty(shape= (len(sequence2)+1,len(sequence1)+1),dtype = int) 20 21 def get_match_score(s1,s2): 22 score = score_matrix[s1][s2]
23 return score 24 25 for i in range(len(sequence2)+1): 26 for j in range(len(sequence1)+1): 27 if i == 0: 28 best_matrix[i][j] = gap * j 29 30 elif j == 0: 31 best_matrix[i][j] = gap *i 32 else: 33 match = get_match_score(sequence2[i-1],sequence1[j-1])
34 gap1_score = best_matrix[i-1][j]+gap 35 gap2_score = best_matrix[i][j-1]+gap 36 match_score = best_matrix[i-1][j-1]+match 37 best_matrix[i][j] = max(gap1_score,gap2_score,match_score) 38 print(best_matrix) 39 i,j = len(sequence2),len(sequence1) 40 while(i>0 or j>0): 41 match = get_match_score(sequence2[i-1],sequence1[j-1]) 42 if i>0 and j>0 and best_matrix[i][j] == best_matrix[i-1][j-1]+match: 43 s1 += sequence1[j-1] 44 s2 += sequence2[i-1] 45 i-=1;j-=1 46 elif i>0 and best_matrix[i,j] == best_matrix[i-1,j]+gap: 47 s1+=- 48 s2+=sequence2[i-1] 49 i-=1 50 else: 51 s1+=sequence1[j-1] 52 s2+=- 53 j-=1 54 print(s1[::-1]+\n+s2[::-1])

後面會加入命令行。

多種結果這裏只取了一種,這個問題有待解決。

如果有其他的方法我會及時添加。

利用Needleman–Wunsch算法進行DNA序列全局比對