ADMIXTURE Supervised Clustering (Supervised analysis)

阿新 • Published: 2020-12-17

Overview
Hands-on

Overview

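ADMIXTURE estimates ancestry proportions by maximum likelihood (its default optimizer is block relaxation; an EM algorithm is also available) and is generally used to split samples into a specified number K of subpopulations. When the population structure of the material is unknown, you run cross-validation over a range of K values and take the K with the smallest CV error as the recommended number of subpopulations. If the population membership of some samples is known in advance (with 100% certainty), you can use the supervised mode instead: give those samples labels to improve the accuracy of the clustering.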
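The cross-validation procedure for choosing K can be sketched as follows. The loop uses ADMIXTURE's --cv flag as shown in its manual; the helper name best_k and the assumption that each log contains a line of the form "CV error (K=<k>): <error>" are illustrative assumptions, so check them against your own logs.

```shell
# Print the K with the smallest CV error found in the given ADMIXTURE logs.
# Assumes log lines of the form "CV error (K=<k>): <error>" (hypothetical helper).
best_k() {
  grep -h "CV error" "$@" |
    sed -E 's/.*\(K=([0-9]+)\): *([0-9.eE+-]+).*/\2 \1/' |
    sort -g |
    head -n 1 |
    awk '{ print $2 }'
}

# Cross-validate K = 1..5, only if admixture and the example data are present.
if command -v admixture >/dev/null 2>&1 && [ -f hapmap3.bed ]; then
  for K in 1 2 3 4 5; do
    admixture --cv hapmap3.bed "$K" | tee "log${K}.out"
  done
  echo "K with the lowest CV error: $(best_k log*.out)"
fi
```

The sed step turns each matching line into "error K" so that a plain numeric sort puts the smallest error first.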

At the time of writing, the current ADMIXTURE release is 1.3.0, and its manual was updated only recently.

To avoid mistranslating, here is the relevant passage from the official manual:

Estimating P and Q from the SNP matrix G, without any additional information, can be
viewed as an unsupervised learning problem. However it is not uncommon that some or
all of the individuals in our data sample will have known ancestries, allowing us to set
some rows in the matrix Q to known constants. This allows more accurate estimation of
the ancestries of the remaining individuals, and of the ancestral allele frequencies. Viewing
these reference individuals as training samples, the problem is transformed into a supervised
learning problem.

Supervised learning mode is enabled with the flag --supervised and requires an additional
file with a .pop suffix, specifying the ancestries of the reference individuals. It is assumed
that all reference samples have 100% ancestry from some ancestral population. Each line
of the .pop file corresponds to individual listed on the same line number in the .fam or
.ped file. If the individual is a population reference, the .pop file line should be a string
(beginning with an alphanumeric character) designating the population. If the individual
is of unknown ancestry, use “-” (or a blank line, or any non-alphanumeric character) to
indicate that the ancestry should be estimated.

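In other words, you need a population file with the .pop suffix that labels each individual with a string; individuals of unknown ancestry get "-". Creating this file on Windows is not recommended because Windows uses different line endings (CRLF instead of LF).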
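If the known labels live in a separate table, a .pop file in the correct row order can be built from the .fam file. A minimal sketch, assuming a hypothetical two-column, whitespace-separated file labels.txt (individual ID, population) and matching on the .fam individual ID in column 2; individuals missing from the table get "-". It also strips trailing carriage returns in case the table was edited on Windows.

```shell
# make_pop FAM LABELS OUT  (hypothetical helper)
# LABELS: "individual_ID population" per line; must be non-empty for the
# NR == FNR idiom below to work. Assumes individual IDs are unique.
# Writes one line per .fam individual, in .fam order.
make_pop() {
  awk '
    { sub(/\r$/, "") }                      # drop Windows carriage returns
    NR == FNR { pop[$1] = $2; next }        # first file: the label table
    { print (($2 in pop) ? pop[$2] : "-") } # second file: .fam, IID in column 2
  ' "$2" "$1" > "$3"
}

# Usage: make_pop hapmap3.fam labels.txt hapmap3.pop
```

awk writes plain LF line endings, so the result avoids the Windows line-ending problem regardless of where the label table came from.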

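How do you check the .pop file? The manual suggests running paste on the .fam and .pop files to see whether the numbers of individuals match (wc -l would be a simpler way to compare the counts).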
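wc -l only confirms that the counts match, whereas paste also lets you eyeball which label each individual received. A small check combining the count comparison with a test for Windows (CRLF) line endings might look like this; check_pop is a hypothetical helper, and since wc -l counts newline characters, a file missing its final newline is undercounted by one.

```shell
# check_pop FAM POP  (hypothetical helper): returns 0 if POP looks usable.
check_pop() {
  fam_file=$1
  pop_file=$2
  n_fam=$(wc -l < "$fam_file")
  n_pop=$(wc -l < "$pop_file")
  if [ "$n_fam" -ne "$n_pop" ]; then
    printf '%s\n' "line count mismatch: $fam_file=$n_fam, $pop_file=$n_pop" >&2
    return 1
  fi
  # Any carriage return means the file was saved with Windows line endings.
  if grep -q "$(printf '\r')" "$pop_file"; then
    printf '%s\n' "$pop_file has Windows (CRLF) line endings" >&2
    return 1
  fi
  return 0
}

# Count check, then a side-by-side look at the first individuals.
if [ -f hapmap3.fam ] && [ -f hapmap3.pop ]; then
  check_pop hapmap3.fam hapmap3.pop && paste hapmap3.fam hapmap3.pop | head
fi
```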

The catch is that the manual never actually explains how to run it, so I tried it myself and am recording the steps briefly here.

Hands-on

Download the example data from the official site:
http://dalexander.github.io/admixture/download.html

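After extracting, you get binary PLINK files (.bed, .bim, .fam) but no text .ped/.map pair, even though the manual's wording refers to .ped. ADMIXTURE can read the .bed file directly, but if you want a .ped file, plink can convert it: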

wget http://dalexander.github.io/admixture/hapmap3-files.tar.gz
tar -xvf hapmap3-files.tar.gz
plink --bfile hapmap3 --recode --out hapmap3 --noweb
wc -l hapmap3*

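Prepare a hapmap3.pop file (its prefix must match the PLINK files, and it must be in the same directory). Any tool works, R, awk, and so on; here I simply mock one up in R: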

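# Mock labels: six consecutive blocks of 54 rows (324 lines), labelled A, -, B, -, C, -
# ("-" = unknown, to be estimated); the line count must equal that of hapmap3.fam
dat = data.frame(V1 = rep(c("A","-","B","-","C","-"), each = 54))
write.table(dat, "hapmap3.pop", row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")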
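The same mock layout can be produced with awk instead of R, deriving the block size from the number of lines in the .fam file rather than hard-coding 54. mock_pop is a hypothetical helper, and the labels it writes are fake, useful only for trying out the workflow.

```shell
# mock_pop FAM OUT  (hypothetical helper)
# Splits the .fam rows into six near-equal consecutive blocks labelled
# A, -, B, -, C, - (mirroring the R example) and writes one label per row.
mock_pop() {
  fam_file=$1
  out_file=$2
  n=$(wc -l < "$fam_file")
  awk -v n="$n" '
    BEGIN { split("A - B - C -", lab, " ") }
    { blk = int((NR - 1) * 6 / n) + 1; print lab[blk] }
  ' "$fam_file" > "$out_file"
}

# Usage: mock_pop hapmap3.fam hapmap3.pop
```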

Then run admixture with --supervised added:

admixture hapmap3.ped 3 --supervised
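Assuming ADMIXTURE's usual <prefix>.<K>.Q / <prefix>.<K>.P output naming, running with and without --supervised writes files with the same names (here hapmap3.3.Q and hapmap3.3.P in the working directory), so the second run would overwrite the first. A sketch that keeps both; run_both and the unsup/ and sup/ directories are hypothetical names, and it uses the .bed input, which ADMIXTURE accepts as well as .ped.

```shell
# run_both PREFIX K  (hypothetical helper)
# Runs ADMIXTURE unsupervised and supervised, moving each run's .Q/.P
# into its own directory so neither result is overwritten.
run_both() {
  prefix=$1
  k=$2
  mkdir -p unsup sup || return 1
  # Unsupervised: every individual's ancestry is estimated.
  admixture "${prefix}.bed" "$k" || return 1
  mv "${prefix}.${k}.Q" "${prefix}.${k}.P" unsup/ || return 1
  # Supervised: expects ${prefix}.pop next to the input files.
  admixture "${prefix}.bed" "$k" --supervised || return 1
  mv "${prefix}.${k}.Q" "${prefix}.${k}.P" sup/
}

if command -v admixture >/dev/null 2>&1 && [ -f hapmap3.bed ] && [ -f hapmap3.pop ]; then
  run_both hapmap3 3
fi
```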

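Compare the results without and with --supervised. Without --supervised:

[Figure: ADMIXTURE result without --supervised]

With --supervised:

[Figure: ADMIXTURE result with --supervised]

The two differ considerably. How this affects downstream results is not explored here.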
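One rough way to quantify such a difference is to assign each individual to the column of its largest ancestry proportion in each .Q file and cross-tabulate the two assignments. Cluster numbering in an unsupervised run is arbitrary, so a cross-table is fairer than a direct column-by-column diff. dominant_cluster and crosstab are hypothetical helpers, and hard assignment discards the admixture proportions themselves, so treat this as a coarse summary only.

```shell
# Print, for each individual (line) in a .Q file, the 1-based column
# holding that individual's largest ancestry proportion.
dominant_cluster() {
  awk '{ best = 1
         for (i = 2; i <= NF; i++) if ($i > $best) best = i
         print best }' "$1"
}

# Count how often each (cluster in file 1, cluster in file 2) pair occurs.
crosstab() {
  tmp_a=$(mktemp)
  tmp_b=$(mktemp)
  dominant_cluster "$1" > "$tmp_a"
  dominant_cluster "$2" > "$tmp_b"
  paste -d ' ' "$tmp_a" "$tmp_b" | sort | uniq -c
  rm -f "$tmp_a" "$tmp_b"
}

# Usage (paths depend on where each run's .Q file was saved):
# crosstab unsupervised.3.Q supervised.3.Q
```

A clean one-to-one pattern in the counts means the two runs group individuals the same way apart from cluster numbering; counts spread across several pairs indicate real disagreement.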
