Chip-seq peak annontation

阿新 • • 發佈：2018-03-10

4.5 seq output value pro bili 5.1 right ace

PeRl

narrowPeak/boardPeak

narrowPeak/boardPeak 是ENCODE可提供下載的兩種 Chip-seq 經過參考人類基因組mapping後的關於peak的數據.
其他類型的seq數據儲存個數可以參看FAQformat

narrowPeak

數據按照以下規則儲存:

1. string chrom: "Reference sequence chromosome or scaffold"
2. uint   chromStart: "Start position in chromosome"
3. uint   chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint   score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1]  strand: "+ or - or . for unknown"
7. float  signalValue: "Measurement of average enrichment for the region"
8. float  pValue: "Statistical significance of signal value (-log10). Set to -1 if not used."
9. float  qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if n-ot used."
10. int   peak: "Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-sour-ce called."

boardPeak

數據按照以下規則儲存:

1. string chrom: "Reference sequence chromosome or scaffold"
2. uint   chromStart: "Start position in chromosome"
3. uint   chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint   score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1]  strand: "+ or - or . for unknown"
7. float  signalValue: "Measurement of average enrichment for the region"
8. float  pValue: "Statistical significance of signal value (-log10). Set to -1 if<BR> not used."
9. float  qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if n-ot used."

在接下去的peak annotation中,只演示narrowPeak.bed格式數據.

示例數據

演示數據來自ENCODE 的 H3K9me3 的Chip-seq,樣本ID為ENCFF199BLM.

樣本的基本信息:

Homo sapiens liver male adult (32 years)
- Target: H3K9me3
- Lab: Bing Ren, UCSD
- Project: Roadmap

narrowPeak 數據基本信息:

##   seqnames     start       end       name score strand signalValue  pValue
## 1    chr10 100134639 100134831  Peak_7974   178      .     4.41261 6.68330
## 2    chr10 100446376 100446664  Peak_4189   244      .     5.13248 8.41215
## 3    chr10 100779568 100779699 Peak_32369   101      .     3.34436 4.50936
## 4    chr10  10088147  10088346 Peak_28509   112      .     4.05830 4.84890
## 5    chr10 101149252 101149594  Peak_6146   211      .     4.89252 7.53305
## 6    chr10 101173156 101173424 Peak_32985    94      .     3.45278 4.33257
##    qValue peak
## 1 2.96490  132
## 2 4.37217  165
## 3 1.47374  105
## 4 1.69566   90
## 5 3.67439  174
## 6 1.30993  237

構建註釋文件

在peak annotation中需要一個用於參考的註釋文件,需要包括一下信息: 1. 染色體名 2. 轉錄本起始位點 3. 轉錄本終止位點 4. gene的名字 5. 正反鏈

註釋文件可以直接從外部導入,也可以利用biomaRt 生成

library(ChIPpeakAnno)
library(biomaRt)
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
mart <- useDataset("hsapiens_gene_ensembl",mart)
# 需要篩選的特征
props <- c("ensembl_gene_id", "external_gene_name", "transcript_biotype", "chromosome_name", "start_position", "end_position", "strand")

# 篩選染色體號
lincRNA <- subset(
  getBM(attributes=props, mart=mart, filters = "chromosome_name", values = c(1:22,"X","Y")),
  transcript_biotype == "lincRNA"
  )
# 在染色體前加"chr"保持和narrowPeak數據一致
lincRNA[,4] <- paste0("chr", lincRNA[,4])

得到的數據包含以下信息:

##   ensembl_gene_id external_gene_name transcript_biotype chromosome_name
## 1 ENSG00000276255       RP5-881P19.7            lincRNA            chr1
## 2 ENSG00000234277          LINC01641            lincRNA            chr1
## 3 ENSG00000238107      RP11-495P10.5            lincRNA            chr1
## 4 ENSG00000274020          LINC01138            lincRNA            chr1
## 5 ENSG00000225620      RP11-569A11.2            lincRNA            chr1
## 6 ENSG00000237520       RP11-443B7.2            lincRNA            chr1
##   start_position end_position strand
## 1      228073909    228076550     -1
## 2      227393591    227431035      1
## 3      148295180    148297556      1
## 4      148290889    148519604     -1
## 5      202632428    202632911      1
## 6      234957231    234959989      1

進行peak annotation

有了這個註釋文件以後,我們就可以根據我們想要的篩選規則對peak進行註釋,主要用到的包是 ChIPpeakAnno.用到的函數為annotatePeakInBatch.

# 將前面得到的註釋文件轉換為RangedData對象
library(ChIPpeakAnno)
myCustomAnno <- RangedData(
  IRanges(
    start=lincRNA[,"start_position"], 
    end=lincRNA[,"end_position"],
    names=lincRNA[,"ensembl_gene_id"]),
  space=lincRNA[,"chromosome_name"],
  strand=lincRNA[,"strand"])
# 讀入需要註釋的narrowPeak數據
bed_file <- read.table("ENCFF199BLM.bed", header = T, sep = "\t", stringsAsFactors = F)
#　將peak數據轉換為GRanges
peaks <- toGRanges(bed_file, format="narrowPeak", colNames = colnames(bed_file))
#　根據需要進行篩選
anno <- annotatePeakInBatch(peaks, AnnotationData=myCustomAnno, 
                            output="overlapping", 
                            FeatureLocForDistance="TSS",
                            bindingRegion=c(-2000, 2000))

從anno中提取我們需要的lincRNA的ID

result_lincRNA <- anno@elementMetadata$feature
head(result_lincRNA)

## [1] "ENSG00000237579" "ENSG00000235180" "ENSG00000232259" "ENSG00000204365"
## [5] "ENSG00000226578" "ENSG00000260137"

Chip-seq peak annontation

4.5 seq output value pro bili 5.1 right ace Chip-seq peak annontation Chip-seq peak annontation PeRl

chip-seq技術

1.ChIP-seq簡介染色質免疫共沉澱技術（Chromatin-immunoprecipitation）也稱結合位點分析法，被用於蛋白質與DNA的互動作用。該技術將染色質免疫沉澱與大規模並行DNA測序相結合起來鑑定與DNA相關蛋白結合部位。其可被用於精確繪製任意

Chip-seq流程報告

一、摘要實驗旨在瞭解Chip-seq的基本原理。通過模仿文獻《Targeting super enhancer associated oncogenes in oesophageal squamous cell carcinoma》的流程，學會利用NCBI和EBI

ChIP-seq motif ROC 相關資料

【怪毛匠子】獨家整理不可以轉載 https://bioconductor.org/help/course-materials/2010/EMBL2010/100609-multtestindepfilt-huber.pdf ROC sensitivity…… &nbs

ChIP-seq 學習內容

chip-seq 流程圖書籍資料工具 UCSU 安裝使用原理手冊 Swiss線上分析工具短序列比對工具 BWA 流程格式處理序列比對 peak-calling motif 視覺化輸出文件上下游分析 ch

解決mysql報錯：- Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ'

_for tran contains column schema mysql eat table express mysql執行報錯： - Expression #1 of ORDER BY clause is not in GROUP BY clause and cont

Chip-seq peak annontation

Chip-seq peak annontation

PeRl

narrowPeak/boardPeak

narrowPeak

boardPeak

示例數據

構建註釋文件

進行peak annotation

Chip-seq peak annontation

chip-seq技術

Chip-seq流程報告

ChIP-seq motif ROC 相關資料

ChIP-seq 學習內容

解決mysql報錯：- Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'information_schema.PROFILING.SEQ'

162. Find Peak Element

LeetCode 162. Find Peak Element 20170706

【BioCode】根據seq與位點信息截取窗口

scRNA-seq單細胞測序數據分析工具匯總

[HEOI 2016] seq

Seq和Ack

xgene：WGS，突變與癌，RNA-seq，WES

xgene：之ROC曲線、ctDNA、small-RNA seq、甲基化seq、單細胞DNA, mRNA

RNA-Seq分析軟件HTSeq的安裝

shell 中seq的用法 echo -n用法

lintcode75- Find Peak Element- medium

《轉載》TCP三次握手中SYN，ACK，Seq三者的關系

.NET下使用 Seq結構化日誌系統

sbt編譯spark程序提示value toDF is not a member of Seq()

Chip-seq peak annontation

Chip-seq peak annontation

PeRl

narrowPeak/boardPeak

narrowPeak

boardPeak

示例數據

構建註釋文件

進行peak annotation

相關推薦