
Doing the Same Task in Java


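  I couldn't resist: I wanted to see how much faster Java would be on the task from the previous post, since compiled languages are usually much faster than scripting languages. Without further ado, here is the code: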

import java.io.FileReader;
import java.io.BufferedReader;
import java.io.BufferedWriter;  
import java.io.FileWriter;
import java.io.IOException;

public class Split {
    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();

        // Large (5,000,000-char) buffers reduce the number of disk reads and writes.
        // try-with-resources flushes and closes both streams, even if an exception is thrown.
        try (BufferedReader reader = new BufferedReader(new FileReader("head_10000000.vcf"), 5000000);
             BufferedWriter writer = new BufferedWriter(new FileWriter("result.tsv"), 5000000)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Skip VCF header lines. Testing each line here avoids the
                // NullPointerException of a nested loop when the file ends on a header line.
                if (line.startsWith("#")) {
                    continue;
                }
                // Column 8 (index 7) is the INFO field.
                String info = line.split("\t")[7];
                // Take the text between ";AF=" and the next ";".
                String af = info.split(";AF=")[1].split(";")[0];
                writer.write(line + " " + af);
                writer.newLine();
            }
        }

        long endTime = System.currentTimeMillis();
        System.out.println("run time:" + (endTime - startTime) + "ms");
    }
}
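
One caveat: splitting on ";AF=" only works when AF is present and is not the first key in the INFO field. Below is a minimal sketch of a more forgiving lookup. The class and method names (AfExtractor, extractAf) are made up for illustration and are not part of the original program.

```java
import java.util.Optional;

// Hypothetical helper, not part of the original Split program.
final class AfExtractor {
    private AfExtractor() {}

    /** Returns the value of the exact key "AF" in a VCF INFO string, if present. */
    static Optional<String> extractAf(String info) {
        for (String entry : info.split(";")) {
            // "AFR_AF=..." and "LDAF=..." do not start with "AF=", so they are skipped.
            if (entry.startsWith("AF=")) {
                return Optional.of(entry.substring(3));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractAf("AF=0.01;AFR_AF=0.03")); // prints "Optional[0.01]"
        System.out.println(extractAf("AA=C;AN=2184"));        // prints "Optional.empty"
    }
}
```

Returning an Optional leaves it to the caller to decide what to do with a record that has no AF entry, instead of failing with an ArrayIndexOutOfBoundsException.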

 Program output:

run time:47473ms

 Checking the results:

$ wc -l result.tsv 
10000000 result.tsv

$ sed -n '3435534p' result.tsv
2 29509274 rs114511873 C A 100 PASS AA=C;AN=2184;AVGPOST=0.9997;VT=SNP;THETA=0.0006;AC=14;SNPSOURCE=LOWCOV;LDAF=0.0065;ERATE=0.0003;RSQ=0.9798;AF=0.01;AFR_AF=0.03 0.01

$ sed -n '7546563p' result.tsv
3 84580386 rs191768644 T C 100 PASS RSQ=0.6088;AA=T;AN=2184;VT=SNP;AVGPOST=0.9991;SNPSOURCE=LOWCOV;AC=1;THETA=0.0007;ERATE=0.0002;LDAF=0.0008;AF=0.0005;AFR_AF=0.0020 0.0005

$ sed -n '987345p' result.tsv
1 74709013 rs185004386 A C 100 PASS AN=2184;LDAF=0.0018;THETA=0.0005;VT=SNP;AA=A;SNPSOURCE=LOWCOV;RSQ=0.7110;ERATE=0.0003;AVGPOST=0.9987;AC=3;AF=0.0014;ASN_AF=0.01 0.0014
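
Spot checks like these could also be automated. Here is a rough sketch that tests whether the value appended to an output line matches the AF entry in its INFO column. It assumes the line has only the eight VCF columns shown above plus the appended value, separated by whitespace. The names ResultCheck and isConsistent are hypothetical.

```java
// Hypothetical checker, not part of the original post.
final class ResultCheck {
    private ResultCheck() {}

    /** True if the last field of the line equals the AF value in the INFO column (index 7). */
    static boolean isConsistent(String outputLine) {
        String[] fields = outputLine.trim().split("\\s+");
        if (fields.length < 9) {
            return false; // expected 8 VCF columns plus the appended AF value
        }
        String appended = fields[fields.length - 1];
        for (String entry : fields[7].split(";")) {
            if (entry.startsWith("AF=")) {
                return entry.substring(3).equals(appended);
            }
        }
        return false; // no AF entry in INFO
    }

    public static void main(String[] args) {
        // Line 3435534 of result.tsv, as shown above.
        String sample = "2 29509274 rs114511873 C A 100 PASS "
                + "AA=C;AN=2184;AVGPOST=0.9997;VT=SNP;THETA=0.0006;AC=14;SNPSOURCE=LOWCOV;"
                + "LDAF=0.0065;ERATE=0.0003;RSQ=0.9798;AF=0.01;AFR_AF=0.03 0.01";
        System.out.println(isConsistent(sample)); // prints "true"
    }
}
```

Running isConsistent over every line of result.tsv would check the whole file rather than three sampled lines.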

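  We checked the total line count of the file and spot-checked a few randomly chosen lines, and the results are correct. Compared with the performance of the R version from the previous post, this result is shocking! The gap is enormous!!!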

Time(writing the Java code + compiling + running) < Time(running the R script)
