單細胞分析實錄(8): 展示marker基因的4種圖形（一）

阿新 • • 發佈：2021-01-09

今天的內容講講單細胞文章中經常出現的展示細胞marker的圖：tsne/umap圖、熱圖、堆疊小提琴圖、氣泡圖，每個圖我都會用兩種方法繪製。 > 使用的資料來自文獻：Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma. 去年7月發表在Cell Research上的關於鼻咽癌的文章，資料下載：https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE150430 ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222848618-152201216.png) 髓系細胞數量相對少一些，為了方便演示，選它作為例子。 ``` library(Seurat) library(tidyverse) library(harmony) Myeloid=read.table("Myeloid_mat.txt",header = T,row.names = 1,sep = "\t",stringsAsFactors = F) Myeloid_anno=read.table("Myeloid_anno.txt",header = T,sep = "\t",stringsAsFactors = F) ``` > 匯入資料的時候需要注意一個地方：從cell ranger得到的矩陣，每一列的列名會在CB後面加上"-1"這個字串，在R裡面匯入資料時，會自動轉化為".1"，在做匹配的時候需要注意一下。我已經提前轉換為"_1" ``` > summary(as.numeric(Myeloid["CD14",])) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 0.000 0.689 1.080 2.111 4.500 > summary(as.numeric(Myeloid["PTPRC",])) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 1.200 1.681 1.607 2.124 3.520 ``` 考慮到下載的表達矩陣，表達值都不是整數，且大於0，推測該矩陣已經經過了標準化，因此下面的流程會跳過這一步 ### 0. Seurat流程我們直接把註釋結果賦值到[email protected]矩陣中，後面省去聚類這一步 ``` mye.seu=CreateSeuratObject(Myeloid) [email protected]$CB=rownames([email protected]) [email protected]=merge([email protected],Myeloid_anno,by="CB") rownames([email protected])[email protected]$CB #替代LogNormalize這一步 mye.seu[["RNA"]]@data=mye.seu[["RNA"]]@counts mye.seu <- FindVariableFeatures(mye.seu, selection.method = "vst", nfeatures = 2000) mye.seu <- ScaleData(mye.seu, features = rownames(mye.seu)) mye.seu <- RunPCA(mye.seu, npcs = 50, verbose = FALSE) mye.seu=mye.seu %>% RunHarmony("sample", plot_convergence = TRUE) mye.seu <- RunUMAP(mye.seu, reduction = "harmony", dims = 1:20) mye.seu <- RunTSNE(mye.seu, reduction = "harmony", dims = 1:20) #少了聚類 DimPlot(mye.seu, reduction = "tsne", group.by = "celltype", pt.size=1)+theme( axis.line = element_blank(), axis.ticks = element_blank(),axis.text = element_blank() ) ggsave("tsne1.pdf",device = "pdf",width = 17,height = 14,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222849738-1322101526.png) 基本符合原圖，三個亞群分得開，三個亞群分不開。 *** 接下來，我用4種方式展示marker基因，這些基因可以在文獻的補充材料裡面找到。 ### 1. tsne展示marker基因 ``` FeaturePlot(mye.seu,features = "CCR7",reduction = "tsne",pt.size = 1)+ scale_x_continuous("")+scale_y_continuous("")+ theme_bw()+ #改變ggplot2的主題 theme( #進一步修改主題 panel.grid.major = element_blank(),panel.grid.minor = element_blank(), #去掉背景線 axis.ticks = element_blank(),axis.text = element_blank(), #去掉座標軸刻度和數字 legend.position = "none", #去掉圖例 plot.title = element_text(hjust = 0.5,size=14) #改變標題位置和字型大小 ) ggsave("CCR7.pdf",device = "pdf",width = 10,height = 10.5,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850349-101913774.png) 另一種方法就是把tsne的座標和基因的表達值提取出來，用ggplot2畫，其實不是很必要，因為FeaturePlot也是基於ggplot2的，我還是演示一下 ``` mat1=as.data.frame(mye.seu[["RNA"]]@data["CCR7",]) colnames(mat1)="exp" mat2=Embeddings(mye.seu,"tsne") mat3=merge(mat2,mat1,by="row.names") #資料格式如下： > head(mat3) Row.names tSNE_1 tSNE_2 exp 1 N01_AAACGGGCATTTCAGG_1 5.098727 32.748145 0.000 2 N01_AAAGATGCAATGTAAG_1 -24.394040 26.176422 0.000 3 N01_AACTCAGGTAATAGCA_1 11.856730 8.086553 0.000 4 N01_AACTCAGGTCTTCGTC_1 10.421878 12.660407 0.000 5 N01_AACTTTCAGGCCATAG_1 33.555756 -10.437406 1.606 6 N01_AAGACCTTCGAATGGG_1 -23.976967 11.897753 0.738 mat3%>%ggplot(aes(tSNE_1,tSNE_2))+geom_point(aes(color=exp))+ scale_color_gradient(low = "grey",high = "purple")+theme_bw() ggsave("CCR7.2.pdf",device = "pdf",width = 13.5,height = 12,units = "cm") ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222850898-20484357.png) 用ggplot2的好處就是圖形修改很方便，畢竟ggplot2大家都很熟悉 ### 2. 熱圖展示marker基因畫圖前，需要給每個細胞一個身份，因為我們跳過了聚類這一步，此處需要手動賦值 ``` Idents(mye.seu)="celltype" library(xlsx) markerdf1=read.xlsx("ref_marker.xlsx",sheetIndex = 1) markerdf1$gene=as.character(markerdf1$gene) # 這個表格整理自原文的附表，選了53個基因 #資料格式 # > head(markerdf1) # gene celltype # 1 S100B DC2(CD1C+) # 2 HLA-DQB2 DC2(CD1C+) # 3 FCER1A DC2(CD1C+) # 4 CD1A DC2(CD1C+) # 5 PKIB DC2(CD1C+) # 6 NDRG2 DC2(CD1C+) DoHeatmap(mye.seu,features = markerdf1$gene,label = F,slot = "scale.data") ggsave("heatmap.pdf",device = "pdf",width = 23,height = 16,units = "cm") ``` label = F不在熱圖的上方標註細胞型別， slot = "scale.data"使用scale之後的矩陣畫圖，預設就是這個 ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851364-2051865684.png) 接下來用pheatmap畫，在佈局上可以自由發揮 ``` library(pheatmap) [email protected][,c("CB","celltype")] colanno=colanno%>%arrange(celltype) rownames(colanno)=colanno$CB colanno$CB=NULL colanno$celltype=factor(colanno$celltype,levels = unique(colanno$celltype)) ``` 先對細胞進行排序，按照celltype的順序，然後對基因排序 ``` rowanno=markerdf1 rowanno=rowanno%>%arrange(celltype) ``` 提取scale矩陣的行列時，按照上面的順序 ``` mat4=mye.seu[["RNA"]]@scale.data[rowanno$gene,rownames(colanno)] mat4[mat4>=2.5]=2.5 mat4[mat4 < (-1.5)]= -1.5 #小於負數時，加括號！ ``` 下面就是繪圖程式碼了，我加了分界線，使其看上去更有區分度 ``` pheatmap(mat4,cluster_rows = F,cluster_cols = F, show_colnames = F, annotation_col = colanno, gaps_row=as.numeric(cumsum(table(rowanno$celltype))[-6]), gaps_col=as.numeric(cumsum(table(colanno$celltype))[-6]), filename="heatmap.2.pdf",width=11,height = 7 ) ``` ![](https://img2020.cnblogs.com/other/2265439/202101/2265439-20210108222851925-2023792964.png) *** 先寫到這兒吧（原本以為能寫完的），剩下的氣泡圖、堆疊小提琴圖改天再補上。 > 因水平有限，有錯誤的地方，歡迎批評

單細胞分析實錄(8): 展示marker基因的4種圖形（一）

單細胞分析實錄(8): 展示marker基因的4種圖形（一）

單細胞分析實錄(9): 展示marker基因的4種圖形（二）

Linux 4.10.8 根文件系統制作（一）---環境搭建

Java併發（4）深入分析java執行緒池框架及實現原理（一）

更改Mysql 密碼的4種方法（轉）

java保留兩位小數4種方法（轉載）

4.flume實戰（一）

Redis3.0.4基礎操作（一）

【Flask】4個session（一）狀態保持及請求/應用向下文

spring-boot-admin原始碼分析及單機監控spring-boot-monitor的實現（一）

jumpserver1.4.4使用體驗（一）

sqoop-1.4.7安裝（一）

4種認證（authentication）或授權（authorization）方式

java網路程式設計：8、基於TCP的socket程式設計（一）簡單的socket通訊_一個客戶端

Spring 4.1.6（一）

.NET 4.0改進（一）

C#中操作Word（8）—— 向Word中插入圖表的三種方法（一）

單細胞分析實錄(5): Seurat標準流程

單細胞分析實錄(7): 差異表達分析/細胞型別註釋

【集合框架】JDK1.8源碼分析之HashMap（一）轉載

單細胞分析實錄(8): 展示marker基因的4種圖形（一）

相關推薦