Eigensoft smartpca PCA error: warning (mapfile): bad chrom: Segmentation fault
阿新 • Published: 2021-07-21
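Problem
I have long used smartpca from Eigensoft for PCA in population genetics. It has always run smoothly and the results have been reliable.
Today, however, it failed with the following error: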
$ ~/miniconda3/bin/smartpca -p smartpca.par
parameter file: smartpca.par
### THE INPUT PARAMETERS
##PARAMETER NAME: VALUE
genotypename: plink.ped
snpname: plink.pedsnp
indivname: plink.pedind
evecoutname: pca.vec
evaloutname: pca.val
numoutlieriter: 0
numchrom: 1000000
## smartpca version: 16000
norm used
warning (mapfile): bad chrom: 100 100:1816 0 1816
warning (mapfile): bad chrom: 101 101:1388 0 1388
warning (mapfile): bad chrom: 101 101:1922 0 1922
warning (mapfile): bad chrom: 102 102:1286 0 1286
warning (mapfile): bad chrom: 103 103:867 0 867
warning (mapfile): bad chrom: 104 104:149 0 149
warning (mapfile): bad chrom: 105 105:1532 0 1532
warning (mapfile): bad chrom: 106 106:1201 0 1201
warning (mapfile): bad chrom: 107 107:1113 0 1113
warning (mapfile): bad chrom: 108 108:255 0 255
Segmentation fault
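One possible cause is a chromosome number of 0: in smartpca, 0 means that the chromosome information is missing.
I checked the first column (chromosome number) of my map file. It starts at 1 and contains no 0. I have also run data whose chromosome names begin with chr or scaffold before, and that never raised this error.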
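The check described above can be sketched in Python. This is only an illustration, not part of the original workflow: the function name `summarize_chromosomes` is hypothetical, and it assumes a whitespace-delimited PLINK .map layout with the chromosome code in the first column. The sample lines are taken from the warnings in the log.

```python
from collections import Counter


def summarize_chromosomes(map_lines):
    """Count SNPs per chromosome code (column 1) in PLINK .map lines."""
    counts = Counter()
    for line in map_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[0]] += 1
    return counts


# Sample .map lines copied from the smartpca warnings above.
sample = [
    "100 100:1816 0 1816",
    "101 101:1388 0 1388",
    "101 101:1922 0 1922",
    "102 102:1286 0 1286",
]
counts = summarize_chromosomes(sample)
print("distinct chromosome codes:", len(counts))
print("SNPs with chromosome code 0:", counts.get("0", 0))
```

For a real file, pass an open file handle instead of the sample list. A very large number of distinct codes, or any SNP with code 0, is worth investigating before running smartpca.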
Solution
I found the cause in a Google group post:
I have got Smartpca within EIGENSOFT (6.0.1) to work without converting with convertf - it will take map/ped directly. I have madified the output map/ped that stacks outputs.
EIGENSOFT and PLINK don't with thousands of chromosomes/contigs well - so I would suggest removing that info from the map file - replace the first column with all '1' for example. I do have some chromosome info so I have chromosomes 1-37 for assigned loci and I used for '40' for unassigned loci. I dont think smartpca likes a zero in the frist column of the map file.
example map file: [https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map)
ped file - I have the phenotype (col 6) set to missing (-9) and smartpca complains about it - but it works.
example ped file: [https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped)
example parfile: [https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile)
-Ryan
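As the post shows, smartpca does not cope with thousands of scaffolds/contigs (my data turned out to contain more than 3,000 contigs), and for PCA the chromosome number does not affect the final result. The highly fragmented contigs can therefore all be given a single chromosome number: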
sed 's/contig[0-9]*/20/g' map.vcf
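The same idea can be sketched in Python for cases where the chromosome column is not named contigN, or where the SNP IDs also contain the contig name and should be left alone. This is a rough sketch under assumptions: the function name `collapse_chromosomes` is hypothetical, the input is assumed to be a whitespace-delimited PLINK .map file, and the code 20 is just the value used in the sed command above.

```python
def collapse_chromosomes(map_lines, chrom="20"):
    """Yield .map lines with column 1 replaced by a single chromosome code.

    Only the first column is rewritten; the SNP ID, genetic distance,
    and position columns are passed through unchanged.
    """
    for line in map_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        fields[0] = chrom
        yield "\t".join(fields)


# Hypothetical input lines with one contig name per SNP.
sample = [
    "contig1 contig1:1816 0 1816",
    "contig2 contig2:1388 0 1388",
]
for new_line in collapse_chromosomes(sample):
    print(new_line)
```

Unlike the sed pattern, which replaces every match on a line, this touches only the chromosome column, so SNP IDs stay unique. Write the output to a new file instead of overwriting the original map file.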
With that change, smartpca produced PCA results for all samples.