
Python: reformatting a FASTA file to a fixed number of bases per line

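The program must meet the following requirements:

1) Command-line input is handled with the optparse module. There are four options: --infile, --outfile, --width and -h (optparse only accepts multi-letter option names with a double dash);

2) --infile receives the name of the input FASTA file;

3) --outfile receives the name of the formatted FASTA output file;

4) --width receives the number of bases to print on each line. Only values from 10 to 200 are allowed; anything else is reported as an error;

5) -h prints the program's usage instructions.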
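For comparison, here is one way the five requirements could be expressed with argparse, which the Python documentation recommends over optparse for new scripts. This is a sketch, not the author's code: the helper names width_type and build_parser are hypothetical. The range check lives in a custom type function, so an out-of-range or non-numeric width goes through argparse's normal error path (usage message on stderr, exit status 2).

```python
import argparse


def width_type(value):
    """argparse 'type' callable: accept only integers from 10 to 200 inclusive."""
    width = int(value)  # a ValueError here is turned into a usage error by argparse
    if not 10 <= width <= 200:
        raise argparse.ArgumentTypeError(
            "width must be between 10 and 200, got %d" % width)
    return width


def build_parser():
    # -h/--help is generated automatically, just as with optparse
    parser = argparse.ArgumentParser(
        description="Rewrap a FASTA file to a fixed number of bases per line.")
    parser.add_argument("--infile", required=True, metavar="FILE",
                        help="input FASTA file")
    parser.add_argument("--outfile", required=True, metavar="FILE",
                        help="output FASTA file")
    parser.add_argument("--width", required=True, type=width_type, metavar="INT",
                        help="bases per line (10-200)")
    return parser
```

In a script, `args = build_parser().parse_args()` then supplies `args.infile`, `args.outfile` and `args.width`; `required=True` replaces the manual "is this option missing?" check that optparse needs.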

Input file in.fa:

>1
GCGGATTAAA
AGCAGAAAGAAAACAAGCTTTTCATTTAA
TCAGTTGCTTGTGTATCAAGTTA
CATAAA
AAATCAAA
>2
TATAAGTTACACTCTGGC
TTTGTAATTCT
GCAAGGGCAGGCCCGGGAAGCCT
ATGCAAA
AGCACATGAAATGAAAAGTTTAGTTGGCATCAAGA
TA
>3

GACCAATAACCTTGCCGGTGGCAGTGTCCAGATCAATGTCA
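Each record is spread over several lines of varying length, and record 3 has a blank line after its header, so a reader has to keep joining lines until the next ">". The script below delegates this to Biopython's SeqIO; as a hedged sketch of the same idea without Biopython, a generator (the name read_fasta is hypothetical) could look like this. Unlike SeqRecord.id, which keeps only the first word of the header, it returns the whole header text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines.

    Sequence lines of one record are concatenated, blank lines are skipped,
    and any sequence lines before the first header are discarded.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # e.g. the empty line after ">3"
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # flush the last record
```

Used as `for name, seq in read_fasta(open("in.fa")): ...`, it yields records of 76, 96 and 41 bases for the in.fa above.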

Command line:

python thir_test.py --infile in.fa --outfile out.fa --width 25

Output file out.fa:

>1
GCGGATTAAAAGCAGAAAGAAAACA
AGCTTTTCATTTAATCAGTTGCTTG
TGTATCAAGTTACATAAAAAATCAA
A
>2
TATAAGTTACACTCTGGCTTTGTAA
TTCTGCAAGGGCAGGCCCGGGAAGC
CTATGCAAAAGCACATGAAATGAAA
AGTTTAGTTGGCATCAAGATA
>3
GACCAATAACCTTGCCGGTGGCAGT
GTCCAGATCAATGTCA
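Every full line in this output holds 25 bases and the last line of each record holds the remainder (record 1: 76 = 3 × 25 + 1; record 3: 41 = 25 + 16). A minimal sketch of that chunking with plain slicing, using a hypothetical helper name wrap_sequence:

```python
def wrap_sequence(seq, width):
    """Split seq into consecutive chunks of `width` bases; the last chunk may be shorter."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [seq[i:i + width] for i in range(0, len(seq), width)]


record3 = "GACCAATAACCTTGCCGGTGGCAGTGTCCAGATCAATGTCA"
for line in wrap_sequence(record3, 25):
    print(line)
# GACCAATAACCTTGCCGGTGGCAGT
# GTCCAGATCAATGTCA
```

Stepping `range` by `width` avoids the separate "last partial line" branch: the final slice simply stops at the end of the string.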

Viewing the help text:

$ python thir_test.py -h
Usage: thir_test.py [options]

Options:
  -h, --help      show this help message and exit
  --infile=FILE   give a fasta file to me
  --outfile=FILE  the name of output file [fasta]
  --width=int     the seq_length of each line

Script thir_test.py:

#!/usr/bin/env python
import sys
from optparse import OptionParser

from Bio import SeqIO

# OptionParser adds -h/--help automatically
parser = OptionParser()
parser.add_option("--infile", dest="infile", help="give a fasta file to me", metavar="FILE")
parser.add_option("--outfile", dest="outfile", help="the name of output file [fasta]", metavar="FILE")
# type="int" makes optparse reject non-integer values with a usage error
parser.add_option("--width", dest="width", type="int", help="the seq_length of each line", metavar="int")
(options, args) = parser.parse_args()

# All three options are required
if options.infile is None or options.outfile is None or options.width is None:
    parser.error("--infile, --outfile and --width are all required")

width = options.width
if width < 10 or width > 200:
    sys.stderr.write("ERROR: The width value must be between 10 and 200!\n")
    sys.exit(1)


def seq_width(seq, width, outfile):
    """Write seq to outfile, `width` bases per line; the last line holds the remainder."""
    for start in range(0, len(seq), width):
        outfile.write(str(seq[start:start + width]) + "\n")


# Open each file once (Python 2's file() is gone in Python 3); "with" closes both handles
with open(options.infile) as fafile, open(options.outfile, "w") as outfile:
    for seq_record in SeqIO.parse(fafile, "fasta"):
        outfile.write(">" + seq_record.id + "\n")
        seq_width(seq_record.seq, width, outfile)
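If Biopython is not installed, a whole-file variant using only the standard library is possible. This sketch (the function name reformat_fasta_text is hypothetical) assumes the file fits in memory and that ">" appears only at the start of header lines. It cuts lines with textwrap.wrap, passing break_on_hyphens=False so that gap characters ("-") in aligned sequences cannot end a line early, and it copies headers whole rather than reducing them to SeqRecord.id.

```python
import textwrap


def reformat_fasta_text(text, width):
    """Return FASTA text rewrapped to `width` bases per line (10-200 allowed)."""
    if not 10 <= width <= 200:
        raise ValueError("width must be between 10 and 200")
    out = []
    # Each record starts with '>'; anything before the first header is dropped
    for block in text.split(">")[1:]:
        lines = block.splitlines()
        header = lines[0].strip()
        seq = "".join(line.strip() for line in lines[1:])  # blank lines add nothing
        out.append(">" + header)
        # With no whitespace in seq, wrap() breaks it into chunks of exactly `width`
        out.extend(textwrap.wrap(seq, width, break_on_hyphens=False))
    return "".join(line + "\n" for line in out)
```

Reading in.fa with `src.read()` and writing `reformat_fasta_text(text, 25)` to a file reproduces the out.fa shown above.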