NGS (HISAT, StringTie, Ballgown)

#HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ~64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.

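HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents the whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp, and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes), combined with several alignment strategies, enable efficient alignment of RNA-seq reads, in particular reads that span multiple exons. HISAT's memory footprint is relatively low (~4.3 GB for the human genome). HISAT builds on the Bowtie2 implementation to handle most of the operations on the FM index.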
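The index figures quoted above can be checked with a little arithmetic. The following is only a rough sketch: it assumes the local regions tile the genome without overlap and have exactly 64,000 bp each, which is a simplification rather than HISAT's actual index layout. The function names `local_index_count` and `local_index_id` are made up for this illustration and are not part of HISAT.

```python
import math

# Assumption for illustration: fixed-size, non-overlapping regions.
REGION_BP = 64_000


def local_index_count(genome_length_bp: int, region_bp: int = REGION_BP) -> int:
    """Number of fixed-size regions needed to tile a genome of the given length."""
    return math.ceil(genome_length_bp / region_bp)


def local_index_id(position_bp: int, region_bp: int = REGION_BP) -> int:
    """Zero-based id of the region that contains a zero-based genomic position."""
    return position_bp // region_bp


# 48,000 regions of 64,000 bp cover about 3.07 billion bases,
# which is roughly the size of the human genome.
covered_bp = 48_000 * REGION_BP
print(covered_bp)                     # 3072000000
print(local_index_count(covered_bp))  # 48000
print(local_index_id(1_000_000))      # 15
```

Under these assumptions, a read anchored at position 1,000,000 would be extended using the local index with id 15, so only a small 64,000 bp index is consulted instead of the whole-genome one.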

#Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.

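StringTie is a fast and highly efficient assembler of RNA-seq alignments into potential transcripts. It uses a novel network flow algorithm, together with an optional de novo assembly step, to assemble and quantify full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software such as Ballgown, Cuffdiff, or other programs (DESeq2, edgeR, etc.).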

#Analysis of raw reads from RNA sequencing (RNA-seq) makes it possible to reconstruct complete gene structures, including multiple splice variants, without relying on previously established annotations. Downstream statistical modeling of summarized gene or transcript expression data output from these pipelines is facilitated by the Bioconductor project.

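Ballgown is a software package designed to facilitate flexible differential expression analysis of RNA-seq data. It also provides functions to organize, visualize, and analyze the expression measurements for your transcriptome assembly.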
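The three tools are usually chained: align reads, sort the alignments, then assemble and quantify transcripts in a form Ballgown can read. The sketch below only builds the command lines and prints them; it does not run anything. Several things here are assumptions and not from the text above: it uses the `hisat2` executable (the successor to HISAT) and `samtools` for sorting, and the helper `build_pipeline`, the sample name, and all file paths are hypothetical placeholders.

```python
import shlex
from pathlib import PurePosixPath


def build_pipeline(sample, index_prefix, reads_1, reads_2,
                   annotation_gtf, out_dir="ballgown", threads=4):
    """Return the align, sort, and assemble commands for one paired-end sample.

    All names and paths are placeholders; nothing is executed here.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    out_gtf = str(PurePosixPath(out_dir) / sample / f"{sample}.gtf")

    # Spliced alignment; --dta reports alignments tailored for transcript assemblers.
    align = ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
             "-1", reads_1, "-2", reads_2, "-S", sam]

    # StringTie expects coordinate-sorted alignments.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]

    # -G: reference annotation; -e: estimate only the reference transcripts;
    # -B: also write the Ballgown table files (*.ctab) next to the output GTF.
    assemble = ["stringtie", bam, "-p", str(threads), "-G", annotation_gtf,
                "-e", "-B", "-o", out_gtf]

    return [align, sort, assemble]


commands = build_pipeline(
    sample="sample1",                 # hypothetical sample name
    index_prefix="genome_index",      # hypothetical index prefix
    reads_1="sample1_1.fastq.gz",     # hypothetical read files
    reads_2="sample1_2.fastq.gz",
    annotation_gtf="annotation.gtf",  # hypothetical annotation file
)
for cmd in commands:
    print(shlex.join(cmd))
```

Writing each sample into its own subdirectory under one parent directory matches how Ballgown's R function `ballgown()` expects to find its input (one folder per sample under `dataDir`), but the directory names used here are only an example.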