ChIP-seq study notes

阿新 • Published: 2018-12-26

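ChIP-seq
Workflow diagram
Books and references
Tools
UCSC
Installation
Usage
Principle
Manual
Swiss online analysis tools
Short-read alignment tools
BWA
Workflow
Format processing
Sequence alignment
Peak calling
Motif
Visualization
Output documents
Upstream and downstream analysis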

ChIP-seq

Workflow diagram

[Figure: ChIP-seq workflow diagram, compiled by 怪毛匠子]

Books and references

生物資訊學 (Bioinformatics), by 許忠能

生物資訊學：計算的視角 (Bioinformatics: A Computational Perspective), translated by 李嶺

Tools

UCSC

http://genome.ucsc.edu

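Installation

Get the source

Download MACS-1.4.2-1.tar.gz from either of these locations:

http://liulab.dfci.harvard.edu/MACS/Download.html

http://github.com/downloads/taoliu/MACS/MACS-1.4.2-1.tar.gz

Unpack the archive, which creates the MACS-1.4.2 directory, then install from inside it:

tar xvzf MACS-1.4.2-1.tar.gz

cd MACS-1.4.2

python setup.py install --prefix /your_directory/

--prefix sets the installation directory. After installing to a custom prefix, update the environment variables (if you install with sudo instead, this step is not needed). Note that there must be no spaces around the = sign:

export PATH=/your_directory/bin:$PATH

export PYTHONPATH=/your_directory/lib/python2.X/site-packages/:$PYTHONPATH

Run macs14 -h to verify the installation and view the MACS usage help.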

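Usage

Suppose we have a mouse CTCF ChIP-seq data set, CTCF.fastq. First, map the reads to the mouse genome (mm10 here). Assume the genome index has already been built and is stored under /path_to/.

bowtie -m 1 -S -q /path_to/mm10 CTCF.fastq CTCF.sam

-m 1: keep only reads that map to exactly one location

-S: write the output in SAM format

-q: the input file is in fastq format

Peak calling

macs14 -t CTCF.sam -n CTCF -g mm

-t: the treatment (experimental) data file, as opposed to the control

-n: prefix for the output file names

-g: approximate genome size, given as a number. MACS has built-in sizes for some genomes: "mm" for mouse, "hs" for human, "ce" for C. elegans, and "dm" for Drosophila.

A successful run produces the following files:

CTCF_model.r, CTCF_peaks.bed, CTCF_peaks.xls, CTCF_summits.bed

Of these, CTCF_model.r stores the "bimodal model" as R code. To run it, enter this in a terminal:

Rscript CTCF_model.r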
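To take a quick look at a peaks file such as CTCF_peaks.bed, something along these lines may help. This is only a sketch: it assumes the file has five tab-separated columns (chromosome, start, end, peak name, score), and the names Peak, parse_bed and peaks_per_chromosome, as well as the sample lines, are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # BED start coordinate (0-based)
    end: int     # BED end coordinate (exclusive)
    name: str
    score: float


def parse_bed(lines: Iterable[str]) -> List[Peak]:
    """Parse 5-column BED lines, skipping blank, comment, track and browser lines."""
    peaks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score = line.split("\t")[:5]
        peaks.append(Peak(chrom, int(start), int(end), name, float(score)))
    return peaks


def peaks_per_chromosome(peaks: Iterable[Peak]) -> Counter:
    """Count how many peaks fall on each chromosome."""
    return Counter(p.chrom for p in peaks)


# Hypothetical sample lines standing in for the contents of CTCF_peaks.bed.
sample = [
    "chr1\t100\t350\tMACS_peak_1\t85.3",
    "chr1\t900\t1200\tMACS_peak_2\t60.1",
    "chr2\t5000\t5400\tMACS_peak_3\t120.7",
]
sample_peaks = parse_bed(sample)
for chrom, n in sorted(peaks_per_chromosome(sample_peaks).items()):
    print(chrom, n)
```

For a real file, pass an open file handle instead of the sample list, for example parse_bed(open("CTCF_peaks.bed")).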

Principle

Manual

Swiss online analysis tools

http://ccg.vital-it.ch/chipseq/

Short-read alignment tools

SOAP: for single-end reads

MAQ

BWA

Bowtie: very fast and well suited to ChIP-seq

BWA

Download

http://bio-bwa.sourceforge.net/bwa.shtml

Steps

Step 1: build the index

Build the index files from the reference genome (e.g. reference.fa). The example below uses the human hg18 reference:

bwa index -a bwtsw human_hg18_ref.fa

Step 2: find the SA coordinates

For paired-end data (leftRead.fastq and rightRead.fastq), process the two files separately:

bwa aln reference.fa leftRead.fastq > leftRead.sai

bwa aln reference.fa rightRead.fastq > rightRead.sai

For single-end data (singleRead.fastq):

bwa aln reference.fa singleRead.fastq > singleRead.sai

To run with multiple threads, add the -t option; the -f option names the output file. For example:

bwa aln -c -t 3 -f leftreads.sai reference.fa leftreads.fastq

Step 3: convert the SA coordinates to SAM

For paired-end data:

bwa sampe -f pair-end.sam reference.fa leftRead.sai rightRead.sai leftRead.fastq rightRead.fastq

For single-end data:

bwa samse -f single.sam reference.fa singleRead.sai singleRead.fastq
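The three steps above can be chained from a script. The sketch below only builds the command lines and does not run bwa. It uses just the subcommands and options shown in these notes (index -a bwtsw, aln -t/-f, sampe -f, samse -f). The function name bwa_commands and its parameters are hypothetical.

```python
import os
from typing import List, Optional


def bwa_commands(reference: str, reads1: str, reads2: Optional[str] = None,
                 out_sam: str = "aligned.sam", threads: int = 1) -> List[List[str]]:
    """Return the bwa command lines (as argument lists) for steps 1 to 3."""

    def sai(fastq: str) -> str:
        # leftRead.fastq -> leftRead.sai
        return os.path.splitext(fastq)[0] + ".sai"

    # Step 1: build the index.
    cmds = [["bwa", "index", "-a", "bwtsw", reference]]

    # Step 2: one "bwa aln" run per fastq file.
    fastqs = [reads1] + ([reads2] if reads2 else [])
    for fq in fastqs:
        cmds.append(["bwa", "aln", "-t", str(threads), "-f", sai(fq), reference, fq])

    # Step 3: sampe for paired-end data, samse for single-end data.
    if reads2:
        cmds.append(["bwa", "sampe", "-f", out_sam, reference,
                     sai(reads1), sai(reads2), reads1, reads2])
    else:
        cmds.append(["bwa", "samse", "-f", out_sam, reference, sai(reads1), reads1])
    return cmds


for cmd in bwa_commands("reference.fa", "leftRead.fastq", "rightRead.fastq",
                        out_sam="pair-end.sam", threads=3):
    print(" ".join(cmd))
```

To actually run the steps, each argument list could be passed to subprocess.run(cmd, check=True), provided bwa is installed and on the PATH.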

Workflow

Format processing

Format: fastq

Tools: FASTQ Groomer, samtools
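For orientation, a fastq record is four lines: a header starting with "@", the sequence, a separator line starting with "+", and a quality string of the same length as the sequence. Below is a minimal reader. It assumes plain four-line records with no wrapped sequences, and it assumes Phred+33 quality encoding, which may not match every data set. The names FastqRecord, read_fastq and phred33_scores are illustrative.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line fastq entries."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError("truncated fastq record: " + header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed fastq record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield FastqRecord(header[1:], seq, qual)


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 (Sanger) encoding."""
    return [ord(c) - 33 for c in quality]


# Hypothetical two-record example.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "!!II"]
for rec in read_fastq(example):
    print(rec.name, rec.sequence, phred33_scores(rec.quality))
```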

Sequence alignment

Tool: bowtie; input: fastq; output: SAM/BAM
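A simple sanity check on the SAM output is to count mapped and unmapped reads. In SAM, the second column is the FLAG field, and bit 0x4 marks an unmapped read. The sketch below works on plain-text SAM only (not BAM); the function name count_alignments and the sample lines are made up for illustration.

```python
from typing import Iterable, Tuple


def count_alignments(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) record counts from plain-text SAM lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x4:  # FLAG bit 0x4: read is unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical SAM lines: one header line, two mapped reads, one unmapped read.
sample_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr2\t200\t255\t4M\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = count_alignments(sample_sam)
print("mapped:", mapped, "unmapped:", unmapped)
```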

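Peak calling

Tool: MACS (peak calling)

Input: mapped reads

Output: peaks (BED), report (HTML)

Parameters:

Link:

Motif

http://blog.163.com/zju_whw/blog/static/225753129201532104815301/

There are two ways to represent a motif:

1. Consensus sequence: the motif is written as a string of letters. When a position can hold more than one base, an IUPAC code (International Union of Pure and Applied Chemistry) is used; for example, a position that can be either "A" or "G" is written "R". See http://www.bioinformatics.org/sms2/iupac.html

2. Matrix-based: a matrix records how often A, G, C, and T occur at each position. There are three variants: the count matrix, the PFM (position frequency matrix), and the PWM (position weight matrix). The count matrix holds the raw count of each base at each position, the PFM holds the fraction of each base at each position, and the PWM is obtained by taking logarithms (typically of each frequency relative to a background frequency).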
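Both representations can be derived from a set of aligned binding sites of equal length. The sketch below builds a count matrix, a PFM, a PWM and an IUPAC consensus. Several choices here are assumptions rather than anything fixed by these notes: a uniform background of 0.25, a pseudocount of 1, log base 2, and a 25% threshold for including a base in the consensus. Input sites are assumed to contain only A, C, G and T, and the function names are illustrative.

```python
import math
from typing import Dict, List

BASES = "ACGT"

# IUPAC nucleotide codes, keyed by the set of bases allowed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def count_matrix(sites: List[str]) -> List[Dict[str, int]]:
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts


def frequency_matrix(counts, pseudocount: float = 0.0):
    """PFM: fraction of each base per position (optionally smoothed)."""
    pfm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pfm.append({b: (column[b] + pseudocount) / total for b in BASES})
    return pfm


def weight_matrix(counts, background: float = 0.25, pseudocount: float = 1.0):
    """PWM: log2 of smoothed frequency over background (pseudocount must be > 0)."""
    pfm = frequency_matrix(counts, pseudocount)
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pfm]


def consensus(counts, min_fraction: float = 0.25) -> str:
    """IUPAC consensus: keep bases seen in at least min_fraction of the sites."""
    letters = []
    for column in counts:
        total = sum(column.values())
        kept = frozenset(b for b in BASES if column[b] / total >= min_fraction)
        letters.append(IUPAC.get(kept, "N"))
    return "".join(letters)


# Hypothetical aligned sites.
sites = ["ACGT", "GCGT", "ACGA", "GCGT"]
counts = count_matrix(sites)
print(consensus(counts))                # RCGW
print(frequency_matrix(counts)[0])      # fractions at position 1
print(weight_matrix(counts)[1])         # log-odds at position 2
```

The pseudocount keeps the logarithm defined for bases that never occur at a position; without it, a zero count would make log2 fail.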

1. Tool: HOMER (Hypergeometric Optimization of Motif EnRichment)

Input:

Output:

Parameters:

Link: http://homer.salk.edu/homer/

Download: http://homer.salk.edu/homer/configureHomer.pl

http://blog.163.com/zju_whw/blog/static/225753129201532104815301/

2. Tool: RSAT (RSA-Tools)

http://floresta.eead.csic.es/rsat/peak-motifs_form.cgi

http://floresta.eead.csic.es/rsat/RSAT_home.cgi

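Visualization

Peak track visualization

UCSC

GREAT

Input: BED file

http://bejerano.stanford.edu/great/public/html/

Motif analysis tools

Output documents

Figures, quality metrics, FDR

Upstream and downstream analysis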
