單細胞分析實錄(8): 展示marker基因的4種圖形(一)
阿新 • • 發佈:2021-01-09
今天的內容講講單細胞文章中經常出現的展示細胞marker的圖:tsne/umap圖、熱圖、堆疊小提琴圖、氣泡圖,每個圖我都會用兩種方法繪製。
> 使用的資料來自文獻:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月發表在Cell Research上的關於鼻咽癌的文章,資料下載:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222848618-152201216.png)
髓系細胞數量相對少一些,為了方便演示,選它作為例子。
```
library(Seurat)
library(tidyverse)
library(harmony)
Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F)
Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F)
```
> 匯入資料的時候需要注意一個地方:從cell ranger得到的矩陣,每一列的列名會在CB後面加上"-1"這個字串,在R裡面匯入資料時,會自動轉化為".1",在做匹配的時候需要注意一下。我已經提前轉換為"_1"
```
> summary(as.numeric(Myeloid["CD14",]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.689 1.080 2.111 4.500
> summary(as.numeric(Myeloid["PTPRC",]))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.200 1.681 1.607 2.124 3.520
```
考慮到下載的表達矩陣,表達值都不是整數,且大於0,推測該矩陣已經經過了標準化,因此下面的流程會跳過這一步
### 0. Seurat流程
我們直接把註釋結果賦值到[email protected]矩陣中,後面省去聚類這一步
```
mye.seu=CreateSeuratObject(Myeloid)
[email protected]$CB=rownames([email protected])
[email protected]=merge([email protected],Myeloid_anno,by="CB")
rownames([email protected])[email protected]$CB
#替代LogNormalize這一步
mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts
mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000)
mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu))
mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE)
mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE)
mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20)
mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20)
#少了聚類
DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme(
axis.line = element_blank(),
axis.ticks = element_blank(),axis.text = element_blank()
)
ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm")
```
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222849738-1322101526.png)
基本符合原圖,三個亞群分得開,三個亞群分不開。
***
接下來,我用4種方式展示marker基因,這些基因可以在文獻的補充材料裡面找到。
### 1. tsne展示marker基因
```
FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+
scale_x_continuous("")+scale_y_continuous("")+
theme_bw()+ #改變ggplot2的主題
theme( #進一步修改主題
panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景線
axis.ticks = element_blank(),axis.text = element_blank(), #去掉座標軸刻度和數字
legend.position = "none", #去掉圖例
plot.title = element_text(hjust = 0.5,size=14) #改變標題位置和字型大小
)
ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm")
```
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850349-101913774.png)
另一種方法就是把tsne的座標和基因的表達值提取出來,用ggplot2畫,其實不是很必要,因為FeaturePlot也是基於ggplot2的,我還是演示一下
```
mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",])
colnames(mat1)="exp"
mat2=Embeddings(mye.seu,"tsne")
mat3=merge(mat2,mat1,by="row.names")
#資料格式如下:
> head(mat3)
Row.names tSNE_1 tSNE_2 exp
1 N01_AAACGGGCATTTCAGG_1 5.098727 32.748145 0.000
2 N01_AAAGATGCAATGTAAG_1 -24.394040 26.176422 0.000
3 N01_AACTCAGGTAATAGCA_1 11.856730 8.086553 0.000
4 N01_AACTCAGGTCTTCGTC_1 10.421878 12.660407 0.000
5 N01_AACTTTCAGGCCATAG_1 33.555756 -10.437406 1.606
6 N01_AAGACCTTCGAATGGG_1 -23.976967 11.897753 0.738
mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+
scale_color_gradient(low = "grey",high = "purple")+theme_bw()
ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm")
```
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850898-20484357.png)
用ggplot2的好處就是圖形修改很方便,畢竟ggplot2大家都很熟悉
### 2. 熱圖展示marker基因
畫圖前,需要給每個細胞一個身份,因為我們跳過了聚類這一步,此處需要手動賦值
```
Idents(mye.seu)="celltype"
library(xlsx)
markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1)
markerdf1$gene=as.character(markerdf1$gene)
# 這個表格整理自原文的附表,選了53個基因
#資料格式
# > head(markerdf1)
# gene celltype
# 1 S100B DC2(CD1C+)
# 2 HLA-DQB2 DC2(CD1C+)
# 3 FCER1A DC2(CD1C+)
# 4 CD1A DC2(CD1C+)
# 5 PKIB DC2(CD1C+)
# 6 NDRG2 DC2(CD1C+)
DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data")
ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm")
```
label = F不在熱圖的上方標註細胞型別,
slot = "scale.data"使用scale之後的矩陣畫圖,預設就是這個
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851364-2051865684.png)
接下來用pheatmap畫,在佈局上可以自由發揮
```
library(pheatmap)
[email protected][,c("CB","celltype")]
colanno=colanno%>%arrange(celltype)
rownames(colanno)=colanno$CB
colanno$CB=NULL
colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype))
```
先對細胞進行排序,按照celltype的順序,然後對基因排序
```
rowanno=markerdf1
rowanno=rowanno%>%arrange(celltype)
```
提取scale矩陣的行列時,按照上面的順序
```
mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)]
mat4[mat4>=2.5]=2.5
mat4[mat4 < (-1.5)]= -1.5 #小於負數時,加括號!
```
下面就是繪圖程式碼了,我加了分界線,使其看上去更有區分度
```
pheatmap(mat4,cluster_rows = F,cluster_cols = F,
show_colnames = F,
annotation_col = colanno,
gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]),
gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]),
filename="heatmap.2.pdf",width=11,height = 7
)
```
![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851925-2023792964.png)
***
先寫到這兒吧(原本以為能寫完的),剩下的氣泡圖、堆疊小提琴圖改天再補上。
> 因水平有限,有錯誤的地方,歡迎批評