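Python: Formatting a FASTA File with a Fixed Number of Bases per Line
The program requirements are as follows:
1) The program reads its arguments from the command line with the optparse module. There are four options: --infile, --outfile, --width and -h;
2) --infile takes the name of the input FASTA file;
3) --outfile takes the name of the formatted FASTA output file;
4) --width takes the number of bases to write per line; it must be between 10 and 200, otherwise the program reports an error;
5) -h prints the usage message.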
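The option handling described in the list above could be sketched roughly as follows. This is only an illustration: the function name parse_cli and the help strings are hypothetical, and the required-option check is an assumption that the requirements do not state explicitly. It relies on optparse's type="int" conversion and on parser.error(), which prints the usage message to stderr and exits with status 2.

```python
from optparse import OptionParser


def parse_cli(argv):
    """Parse argv and return (infile, outfile, width).

    parse_cli is a hypothetical helper name used only for this sketch.
    """
    parser = OptionParser()
    parser.add_option("--infile", dest="infile", metavar="FILE",
                      help="input FASTA file")
    parser.add_option("--outfile", dest="outfile", metavar="FILE",
                      help="formatted FASTA output file")
    # type="int" makes optparse reject non-integer values by itself.
    parser.add_option("--width", dest="width", type="int", metavar="INT",
                      help="bases per line, from 10 to 200")
    options, _ = parser.parse_args(argv)

    # Assumption: all three options are mandatory.
    if options.infile is None or options.outfile is None or options.width is None:
        parser.error("--infile, --outfile and --width are all required")
    # Requirement 4: the width must lie in the range 10..200.
    if not 10 <= options.width <= 200:
        parser.error("--width must be between 10 and 200")
    return options.infile, options.outfile, options.width
```

Calling parse_cli(sys.argv[1:]) at the top of a script would then either return three validated values or stop with a usage message.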
Input file in.fa
>1
GCGGATTAAA
AGCAGAAAGAAAACAAGCTTTTCATTTAA
TCAGTTGCTTGTGTATCAAGTTA
CATAAA
AAATCAAA
>2
TATAAGTTACACTCTGGC
TTTGTAATTCT
GCAAGGGCAGGCCCGGGAAGCCT
ATGCAAA
AGCACATGAAATGAAAAGTTTAGTTGGCATCAAGA
TA
>3
GACCAATAACCTTGCCGGTGGCAGTGTCCAGATCAATGTCA
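Command line (the options are optparse long options, so they take two dashes; the output below is wrapped at 25 bases per line):
python thir_test.py --infile in.fa --outfile out.fa --width 25
Output file out.fa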
>1
GCGGATTAAAAGCAGAAAGAAAACA
AGCTTTTCATTTAATCAGTTGCTTG
TGTATCAAGTTACATAAAAAATCAA
A
>2
TATAAGTTACACTCTGGCTTTGTAA
TTCTGCAAGGGCAGGCCCGGGAAGC
CTATGCAAAAGCACATGAAATGAAA
AGTTTAGTTGGCATCAAGATA
>3
GACCAATAACCTTGCCGGTGGCAGT
GTCCAGATCAATGTCA
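The re-wrapping that produces this output can be illustrated on record 3 alone. The sketch below is a minimal stand-alone example, not the script itself: the function name wrap is hypothetical, and it assumes the record's sequence lines have already been joined into one string.

```python
def wrap(seq, width):
    """Split seq into consecutive pieces of at most width characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


# Record 3 from in.fa, which is a single 41-base line in the input.
record_3 = "GACCAATAACCTTGCCGGTGGCAGTGTCCAGATCAATGTCA"

for line in wrap(record_3, 25):
    print(line)
# prints:
# GACCAATAACCTTGCCGGTGGCAGT
# GTCCAGATCAATGTCA
```

Only the last piece can be shorter than the requested width, which is why each record in out.fa ends with a partial line.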
Viewing the help text
$ python thir_test.py -h
Usage: thir_test.py [options]
Options:
  -h, --help      show this help message and exit
  --infile=FILE   give a fasta file to me
  --outfile=FILE  the name of output file [fasta]
  --width=int     the seq_length of each line
Script thir_test.py
#!/usr/bin/env python
import sys
from optparse import OptionParser

from Bio import SeqIO

parser = OptionParser()
parser.add_option("--infile", dest="infile", help="give a fasta file to me", metavar="FILE")
parser.add_option("--outfile", dest="outfile", help="the name of output file [fasta]", metavar="FILE")
# type="int" lets optparse convert the value and reject non-integers.
parser.add_option("--width", dest="width", type="int", help="the seq_length of each line", metavar="int")
(options, args) = parser.parse_args()

# A missing option would otherwise surface later as an obscure TypeError.
if options.infile is None or options.outfile is None or options.width is None:
    parser.error("--infile, --outfile and --width are all required")

width = options.width
if width < 10 or width > 200:
    sys.stderr.write("ERROR: The width value must be between 10 and 200!\n")
    sys.exit(1)


def seq_width(seq, width, outfile):
    """Write seq to outfile, width bases per line."""
    for start in range(0, len(seq), width):
        outfile.write(str(seq[start:start + width]) + "\n")


# open() replaces the Python 2-only file() builtin, and the output file is
# opened once; the with-block closes both files when the loop finishes.
with open(options.infile, "r") as fafile, open(options.outfile, "w") as outfile:
    for seq_record in SeqIO.parse(fafile, "fasta"):
        outfile.write(">" + seq_record.id + "\n")
        seq_width(seq_record.seq, width, outfile)
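If Biopython is not available, the same reformatting can be approximated with the standard library alone. The following is a rough sketch under stated assumptions, not a replacement for SeqIO.parse: the names read_fasta and reformat are hypothetical, the reader does no validation of the sequence characters, and it keeps only the first word of each header line, which is what seq_record.id holds for FASTA input.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Keep only the first word of the header line.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def reformat(in_handle, out_handle, width):
    """Rewrite every record with at most width bases per line."""
    for record_id, seq in read_fasta(in_handle):
        out_handle.write(">" + record_id + "\n")
        for start in range(0, len(seq), width):
            out_handle.write(seq[start:start + width] + "\n")


# Record 3 from in.fa, split across two lines here purely for illustration.
sample = io.StringIO(">3\nGACCAATAACCTTGCCGGTGG\nCAGTGTCCAGATCAATGTCA\n")
result = io.StringIO()
reformat(sample, result, 25)
print(result.getvalue(), end="")
```

Working on file-like handles rather than file names keeps the sketch easy to try with in-memory strings; with real files, the handles would come from open().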