
Eigensoft smartpca PCA analysis fails with: warning (mapfile): bad chrom: Segmentation fault


Problem

I have long used smartpca from Eigensoft for population-genetics PCA. It has always run smoothly, and the results have been reasonably reliable.

Today, however, it failed with the following error:

$ ~/miniconda3/bin/smartpca -p smartpca.par
parameter file: smartpca.par
### THE INPUT PARAMETERS
##PARAMETER NAME: VALUE
genotypename: plink.ped
snpname: plink.pedsnp
indivname: plink.pedind
evecoutname: pca.vec
evaloutname: pca.val
numoutlieriter: 0
numchrom: 1000000
## smartpca version: 16000
norm used

warning (mapfile): bad chrom: 100	100:1816	0	1816
warning (mapfile): bad chrom: 101	101:1388	0	1388
warning (mapfile): bad chrom: 101	101:1922	0	1922
warning (mapfile): bad chrom: 102	102:1286	0	1286
warning (mapfile): bad chrom: 103	103:867	0	867
warning (mapfile): bad chrom: 104	104:149	0	149
warning (mapfile): bad chrom: 105	105:1532	0	1532
warning (mapfile): bad chrom: 106	106:1201	0	1201
warning (mapfile): bad chrom: 107	107:1113	0	1113
warning (mapfile): bad chrom: 108	108:255	0	255
Segmentation fault

One possible cause is a chromosome number of 0. In smartpca, 0 means the chromosome information is missing.

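I checked the first column (chromosome number) of my map file: it starts at 1 and contains no 0. I have also previously run data whose chromosome names begin with chr or scaffold, without any error.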
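The check described above can be scripted. The following is a minimal sketch, assuming the map file is whitespace-delimited with the chromosome label in column 1, as in the warning lines in the log. The function name `summarize_chroms` is hypothetical and is not part of Eigensoft or PLINK.

```shell
# Summarize column 1 (chromosome label) of a whitespace-delimited map file.
# Prints the number of distinct labels and the number of rows labelled "0".
# summarize_chroms is a hypothetical helper, not an Eigensoft or PLINK tool.
summarize_chroms() {
    awk '
        NF == 0 { next }                  # skip blank lines
        {
            seen[$1]++                    # remember every label in column 1
            if ($1 == "0") zero++         # count rows whose label is exactly "0"
        }
        END {
            n = 0
            for (c in seen) n++
            printf "distinct_chroms=%d zero_rows=%d\n", n, zero
        }
    ' "$1"
}
```

Calling `summarize_chroms plink.pedsnp` (the `snpname` file from the parameter file above) reports both numbers on one line. A distinct-label count in the thousands points to the fragmented-contig situation rather than to a zero label.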

Solution

I found the cause in a Google Groups thread:

I have got Smartpca within EIGENSOFT (6.0.1) to work without converting with convertf - it will take map/ped directly. I have modified the output map/ped that stacks outputs.

EIGENSOFT and PLINK don't work well with thousands of chromosomes/contigs - so I would suggest removing that info from the map file - replace the first column with all '1' for example. I do have some chromosome info so I have chromosomes 1-37 for assigned loci and I used '40' for unassigned loci. I don't think smartpca likes a zero in the first column of the map file.

example map file:

[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map)

ped file - I have the phenotype (col 6) set to missing (-9) and smartpca complains about it - but it works. 

example ped file:

[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped)

example parfile:

[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile) 

-Ryan

https://groups.google.com/g/stacks-users/c/rkN9Q5G6hXg

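So smartpca does not cope with thousands of scaffolds/contigs (I checked my data: it has more than 3,000 contigs), and in a PCA analysis the chromosome number does not affect the final result. The highly fragmented contigs can therefore all be assigned a single chromosome number.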

sed 's/contig[0-9]*/20/g' map.vcf
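As written, the `sed` command prints the edited text to standard output, and it only rewrites labels of the form `contig` followed by digits. An alternative sketch, assuming a whitespace-delimited map file with the chromosome label in column 1, overwrites the whole first column regardless of how the contigs are named. The function name `set_single_chrom` and the output file name below are hypothetical.

```shell
# Replace column 1 of a whitespace-delimited map file with one chromosome label.
#   $1: input map/snp file
#   $2: label to assign to every row (for example 1 or 20)
# The result goes to standard output with tab-separated fields; blank lines are dropped.
# set_single_chrom is a hypothetical helper, not an Eigensoft or PLINK tool.
set_single_chrom() {
    awk -v chrom="$2" 'BEGIN { OFS = "\t" } NF > 0 { $1 = chrom; print }' "$1"
}
```

For example, `set_single_chrom plink.pedsnp 20 > plink.single_chrom.pedsnp` writes a new file and leaves the original untouched. `snpname` in the parameter file would then need to point at the new file.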

With that change, I obtained PCA results for all samples.

https://www.jianshu.com/p/bdf1bc116127