How much data should you sequence? How many G? How many reads? How do you convert between them?

阿新 • Published: 2018-11-19

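lncRNAs are expressed at low levels, so to detect changes in lncRNA expression you need to sequence more deeply than for ordinary RNA-seq.

If you want to cover both SNPs and low-expression lncRNAs, sequence deeper still.

So how much data do you actually need?

Let's see what the authoritative ENCODE guidelines say about RNA-seq sequencing depth: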

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011)

The ENCODE Consortium

Sequencing depth.

The amount of sequencing needed for a given sample is determined by the goals of the experiment and the nature of the RNA sample. Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome). Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms requires more extensive sequencing.

The ability to detect reliably low copy number transcripts/isoforms depends upon the depth of sequencing and on a sufficiently complex library. For experiments from a typical mammalian tissue or in which sensitivity of detection is important, a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.

[Specialized studies in which the prevalence of different RNAs has been intentionally altered (e.g. “normalizing” using DSN) as part of sample preparation need more than the read amounts (>30M paired end reads) used for simple comparison (see above). Reasons for this include:

(1) overamplification of inserts as a result of an additional round of PCR after DSN and

(2) much more broad coverage given the nature of A(-) and low abundance transcripts.]

In plain terms, the guidance says:

Choose the sequencing depth according to the goal of the study:

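Goal 1: with a polyA-selected library (which captures only transcripts carrying a polyA tail, mostly from protein-coding genes), comparing the transcriptional profiles of samples needs only about 30M paired-end reads longer than 30 nt, of which 20-25M should map to the genome or the known transcriptome.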

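Goal 2: to discover new transcripts and quantify known isoforms (a single gene can produce several isoforms through alternative splicing) while still detecting low-abundance transcripts or isoforms, you need 100-200M paired-end reads of 76 bp or longer.

lncRNA-seq falls into this category.

Note: ENCODE sequences human and mouse samples; other species are outside the scope of this recommendation.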

In addition, miRNA sequencing needs only 10M single-end reads of 50 bp each.

ChIP-seq needs 20M single-end reads of 50 bp each.

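Sales reps quote only the number of G and never the number of reads, so how do you convert a read count into G?

It depends on the read length: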

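PE150, or 2*150, means paired-end sequencing with each read 150 bp long.

150 bp x 2 ends x number of reads = data volume

For example, for 50M reads: 150 bp x 2 ends x 50M reads = 15,000M bases = 15G

Note: in paired-end sequencing, one RNA fragment (counted here as one read) yields two sequences, one from each end.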

SE50, or 1*50, means single-end sequencing with each read 50 bp long.

50 bp x 1 end x number of reads = data volume

For example, for 20M reads: 50 bp x 1 end x 20M reads = 1,000M bases = 1G
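The two conversions above can be written as a small Python sketch. The function names reads_to_gb and gb_to_reads are made up for illustration, and the code follows this article's convention that one "read" is one sequenced fragment (so a paired-end read contributes two ends). The 6 Gb quote in the last line is a hypothetical figure, not a recommendation.

```python
def reads_to_gb(n_reads, read_length, paired=True):
    """Gigabases (Gb) for n_reads fragments sequenced at read_length bp per end."""
    ends = 2 if paired else 1          # PE contributes two ends, SE one
    return n_reads * read_length * ends / 1e9


def gb_to_reads(gigabases, read_length, paired=True):
    """Number of fragments that a given data volume in Gb corresponds to."""
    ends = 2 if paired else 1
    return gigabases * 1e9 / (read_length * ends)


# The two worked examples from the text
print(reads_to_gb(50e6, 150))                # PE150, 50M reads
print(reads_to_gb(20e6, 50, paired=False))   # SE50, 20M reads

# Going the other way: a hypothetical 6 Gb quote at PE150
print(gb_to_reads(6, 150))
```

The first two calls reproduce the 15G and 1G figures above; the last shows how to turn a quote given in G back into a read count so it can be compared with a reads-based recommendation.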

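One more thing: the G here is a number of bases (gigabases, Gb), which is not the same as the file size you see on disk (gigabytes, GB).

Sequencing companies usually deliver compressed FASTQ files, which contain the read ID, the bases, and a quality score for each base.

小哈 looked at the file size and felt there was not enough data, which is an experience-based guess; to know exactly how much was sequenced, just run FastQC or RSeQC.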
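If you only want the raw totals without running a full QC tool, a minimal Python sketch like the one below can count reads and bases directly from a FASTQ file. It assumes the common layout of exactly four lines per record with no wrapped sequence lines; count_fastq is a hypothetical helper name, not part of FastQC or RSeQC.

```python
import gzip


def count_fastq(path):
    """Return (number of records, total bases) for a FASTQ file.

    Assumes 4 lines per record (ID, sequence, '+', qualities).
    Files ending in .gz are read through gzip; others as plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_records = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:                  # second line of each record is the sequence
                n_records += 1
                n_bases += len(line.strip())
    return n_records, n_bases
```

For a paired-end run, call it on both files (for example hypothetical names such as sample_R1.fastq.gz and sample_R2.fastq.gz) and add the two base counts; dividing the sum by 1e9 gives the data volume in Gb, which is the number to compare with what was ordered.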
