How to Use Google Cloud Platform | Hail | GWAS

By 阿新 · Published: 2018-05-14


References:

Hail

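Hail - Tutorial (Hail can also be installed on Windows; see "Setting up a Spark environment on Windows")

spark-2.2.0-bin-hadoop2.7 - the platform Hail depends on for parallel processing

Google Cloud Platform - the cloud platform

Broad's data cluster set-up tool

A simple wrapper around the Google Cloud SDK that makes common operations easier.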

cloudtools is a small collection of command line tools intended to make using Hail on clusters running in Google Cloud's Dataproc service simpler.

These tools are written in Python and mostly function as wrappers around the gcloud suite of command line tools included in the Google Cloud SDK.

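Google Cloud basics

Install gcloud

Log in: [GCloud] Connecting gcloud to Google Cloud Platform under a new Google account

Run Jupyter Notebook on Google Cloud Platform in just 15 minutes

Basic operations:

Create a project

Open the console and click the three-dot menu.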

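Create and delete virtual machines (Dataproc clusters)

gcloud dataproc clusters delete name

Upload and delete files (Google Cloud Storage)

gsutil cp local_file gs://bucket/path
gsutil rm gs://bucket/path

Read and write files in a program (here hc is a Hail 0.1 HailContext)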

f1 = hc.read("gs://somewhere")

So far we have only been using a single VM. To use many Google Cloud VMs in parallel, you need a distributed computing framework such as Spark; Hail is built on top of Spark.
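To make the idea concrete, below is a minimal stdlib-only Python sketch of the split / process / merge pattern that Spark, and therefore Hail, relies on: the data are cut into partitions, each partition is processed independently (on a real cluster, by different worker VMs), and the partial results are merged. The function names and the toy genotype data are hypothetical, and a thread pool merely stands in for Spark executors; this is not how Spark itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def make_partitions(items, n_partitions):
    """Split a list into at most n_partitions contiguous chunks (like Spark partitions)."""
    if not items:
        return []
    size = -(-len(items) // n_partitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partition_stats(partition):
    """Work done independently on one partition.

    Each genotype is an alternate-allele dosage (0, 1 or 2) or None for a missing call.
    Returns (number of alternate alleles, number of called genotypes).
    """
    called = [g for g in partition if g is not None]
    return sum(called), len(called)


def parallel_alt_allele_frequency(genotypes, n_partitions=4):
    partitions = make_partitions(genotypes, n_partitions)
    # "Map" step: each partition is processed on its own (threads stand in for Spark executors).
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_results = list(pool.map(partition_stats, partitions))
    # "Reduce" step: merge the small partial results into one answer.
    total_alt = sum(alt for alt, _ in partial_results)
    total_called = sum(called for _, called in partial_results)
    if total_called == 0:
        return float("nan")
    return total_alt / (2 * total_called)


genotypes = [0, 1, 2, None, 0, 0, 1, 1, 2, 0]  # toy dosages for one variant across 10 samples
print(parallel_alt_allele_frequency(genotypes))
```

Because each partition only sends back two small numbers, adding workers adds parallelism without moving much data, which is the property that lets this kind of computation scale across many VMs.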

Basic Hail usage

The cloudtools example snippet starts a cluster named "testcluster" with 1 master machine, 2 worker machines (the minimum/default), and 6 additional preemptible worker machines. Then, after the cluster has started (this can take a few minutes), it submits a Hail script to "testcluster". The steps below follow the same workflow, although step 1 requests 8 regular workers instead.

Spark basics

1. Run the wrapper locally to create the Google Cloud VMs

cluster start testcluster \
    --master-machine-type n1-highmem-8 \
    --worker-machine-type n1-standard-8 \
    --num-workers 8 \
    --version devel \
    --spark 2.2.0 \
    --zone asia-east1-a
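If you start clusters often, it can help to keep the settings in one script. The sketch below is hypothetical: cluster_start_args is a made-up helper that only rebuilds the cluster start invocation above as an argument list (the flag names are copied from that command; nothing else about cloudtools is assumed) and prints it.

```python
import shlex


def cluster_start_args(name, master_type="n1-highmem-8", worker_type="n1-standard-8",
                       num_workers=8, version="devel", spark="2.2.0", zone="asia-east1-a"):
    """Hypothetical helper: build the argv for the cloudtools `cluster start` command above."""
    return [
        "cluster", "start", name,
        "--master-machine-type", master_type,
        "--worker-machine-type", worker_type,
        "--num-workers", str(num_workers),
        "--version", version,
        "--spark", spark,
        "--zone", zone,
    ]


args = cluster_start_args("testcluster")
print(shlex.join(args))  # inspect the command before running it
# To actually run it (needs cloudtools installed and gcloud logged in):
# import subprocess; subprocess.run(args, check=True)
```

Printing first lets you double-check machine types and the zone before any billable VMs are created.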

2. Start a notebook

cluster connect testcluster notebook

3. Submit a script from your local machine to Google Cloud

cluster submit testcluster myhailscript.py

4. Log in to the cluster's master node on Google Cloud to install required software

gcloud compute ssh testcluster-m --zone asia-east1-a

5. Install scikit-learn

sudo su # to be root and install packages
/opt/conda/bin/conda install scikit-learn

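Case studies from papers

Genome-wide gene-environment analyses of depression and reported lifetime traumatic experiences in UK Biobank

If you can understand 80% of this paper, you will have the basics of genetics and statistics down; it is very hands-on.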

Depression is more frequently observed among individuals exposed to traumatic events. The relationship between trauma exposure and depression, including the role of genetic variation, is complex and poorly understood. The UK Biobank concurrently assessed depression and reported trauma exposure in 126,522 genotyped individuals of European ancestry. We compared the shared aetiology of depression and a range of phenotypes, contrasting individuals reporting trauma exposure with those who did not (final sample size range: 24,094-92,957). Depression was heritable in participants reporting trauma exposure and in unexposed individuals, and the genetic correlation between the groups was substantial and not significantly different from 1. Genetic correlations between depression and psychiatric traits were strong regardless of reported trauma exposure, whereas genetic correlations between depression and body mass index (and related phenotypes) were observed only in trauma exposed individuals. The narrower range of genetic correlations in trauma unexposed depression and the lack of correlation with BMI echoes earlier ideas of endogenous depression.

Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression

Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based in 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.

Some questions

What is Hail used for?

Example: gnomAD

The Neale Lab at the Broad Institute used Hail to perform QC and genome-wide association analysis of 2419 phenotypes across 10 million variants and 337,000 samples from the UK Biobank in 24 hours.

Hail’s functionality is exposed through Python and backed by distributed algorithms built on top of Apache Spark to efficiently analyze gigabyte-scale data on a laptop or terabyte-scale data on a cluster.

Hail is:
a library for analyzing structured tabular and matrix data
a collection of primitives for operating on data in parallel
a suite of functionality for processing genetic data
not an acronym

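To run the Hail tutorials locally (uncomment the first line to create the conda environment the first time):
# conda env create -n hail -f $HAIL_HOME/python/hail/environment.yml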
source activate hail
cd $HAIL_HOME/tutorials
jhail

Running a GWAS

1kg_annotations.txt (one row per sample, tab-separated):

Sample  Population      SuperPopulation isFemale        PurpleHair      CaffeineConsumption
HG00096 GBR     EUR     False   False   77.0
HG00097 GBR     EUR     True    True    67.0
HG00098 GBR     EUR     False   False   83.0
HG00099 GBR     EUR     True    False   64.0
HG00100 GBR     EUR     True    False   59.0
HG00101 GBR     EUR     False   True    77.0
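In Hail 0.2 this file is typically loaded with hl.import_table (with impute=True to guess column types) and joined onto the samples by the Sample column. To show what that step does without a cluster, here is a stdlib-only sketch that parses the same tab-separated layout, turns True/False into booleans and numeric columns into floats, and keys each record by Sample. The helper names and the type-guessing rules are my own assumptions, not Hail's actual type imputation.

```python
import csv
import io


def convert(value):
    """Guess a type for one field: 'True'/'False' -> bool, numbers -> float, else str."""
    if value in ("True", "False"):
        return value == "True"
    try:
        return float(value)
    except ValueError:
        return value


def read_annotations(handle):
    """Parse a tab-separated sample table into {sample_id: {field: value}}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row.pop("Sample")  # keep the ID as a string; it is the join key
        table[sample] = {name: convert(value) for name, value in row.items()}
    return table


example = io.StringIO(
    "Sample\tPopulation\tSuperPopulation\tisFemale\tPurpleHair\tCaffeineConsumption\n"
    "HG00096\tGBR\tEUR\tFalse\tFalse\t77.0\n"
    "HG00097\tGBR\tEUR\tTrue\tTrue\t67.0\n"
)
annotations = read_annotations(example)
print(annotations["HG00097"])
```

For the real file, open it with open("1kg_annotations.txt", newline="") and pass the handle to read_annotations.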

The 1kg.mt directory (a Hail MatrixTable):

.
├── _SUCCESS
├── cols
│   ├── _SUCCESS
│   ├── metadata.json.gz
│   └── rows
│       ├── metadata.json.gz
│       └── parts
│           └── part-0
├── entries
│   ├── _SUCCESS
│   ├── metadata.json.gz
│   └── rows
│       ├── metadata.json.gz
│       └── parts
│           ├── part-00-2-0-0-6886f608-afb6-1e68-684b-3c5920e7edd5
│           ├── part-01-2-1-0-3d30160f-dba0-16f4-e898-4e7c30148855
│           ├── part-02-2-2-0-1051da4b-6799-6074-7d32-9bd7fa9ed9af
├── globals
│   ├── _SUCCESS
│   ├── globals
│   │   ├── metadata.json.gz
│   │   └── parts
│   │       └── part-0
│   ├── metadata.json.gz
│   └── rows
│       ├── metadata.json.gz
│       └── parts
│           └── part-0
├── metadata.json.gz
├── references
└── rows
    ├── _SUCCESS
    ├── metadata.json.gz
    └── rows
        ├── metadata.json.gz
        └── parts
            ├── part-00-2-0-0-6886f608-afb6-1e68-684b-3c5920e7edd5
            ├── part-01-2-1-0-3d30160f-dba0-16f4-e898-4e7c30148855
            ├── part-02-2-2-0-1051da4b-6799-6074-7d32-9bd7fa9ed9af
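The four top-level folders correspond to the parts of a Hail MatrixTable: rows (one record per variant), cols (one record per sample), entries (one value per variant/sample pair, such as a genotype call), and globals (fields shared by the whole dataset). The matching part-* file names under rows and entries are partitions that can be read in parallel. The hypothetical in-memory class below is only a mental model of that layout, not how Hail stores data; its annotate_cols method imitates the idea behind Hail's method of the same name, joining a per-sample table onto the columns. All values are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMatrixTable:
    """Mental model of a MatrixTable: a variants x samples grid plus row, column and global fields."""
    rows: list      # one dict per variant (row fields)
    cols: list      # one dict per sample (column fields), keyed by "s"
    entries: list   # entries[i][j] = genotype dosage of variant i in sample j (None = missing)
    global_fields: dict = field(default_factory=dict)

    def shape(self):
        return len(self.rows), len(self.cols)

    def annotate_cols(self, table, name, key="s"):
        """Attach each sample's record from a {sample_id: record} table (None if absent)."""
        for col in self.cols:
            col[name] = table.get(col[key])
        return self


mt = ToyMatrixTable(
    rows=[{"locus": "1:1000", "alleles": ("A", "G")}, {"locus": "1:2000", "alleles": ("C", "T")}],
    cols=[{"s": "HG00096"}, {"s": "HG00097"}, {"s": "HG00098"}],
    entries=[[0, 1, 2], [1, None, 0]],
    global_fields={"description": "toy example"},
)
mt.annotate_cols({"HG00096": {"CaffeineConsumption": 77.0},
                  "HG00097": {"CaffeineConsumption": 67.0}}, name="pheno")
print(mt.shape())
print([col["pheno"] for col in mt.cols])
```

Keeping per-variant, per-sample and per-entry data in separate structures is what lets a sample-level join touch only the small cols table while the large entries grid stays where it is.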

Question: Run Jupyter Notebook on Google Cloud Platform in just 15 minutes

How GWAS works
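At its core, a GWAS tests every variant separately for association with a phenotype. For a quantitative trait such as CaffeineConsumption, the usual test is ordinary least squares of the phenotype on the genotype dosage (0, 1 or 2 copies of the alternate allele); in Hail 0.2 this is done for all variants at once by hl.linear_regression_rows, which also supports covariates. The stdlib-only sketch below fits y = a + b*x for a single variant without covariates and reports the effect size b, its standard error, the t statistic and a two-sided p-value. To avoid a SciPy dependency, the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. The phenotype values are the six CaffeineConsumption values from 1kg_annotations.txt above; the genotypes are made up.

```python
import math


def linreg_one_variant(dosages, phenotypes):
    """OLS of phenotype on alt-allele dosage for one variant, no covariates.

    Samples with a missing genotype or phenotype (None) are dropped.
    The p-value uses a normal approximation to the t distribution (large samples only).
    """
    pairs = [(x, y) for x, y in zip(dosages, phenotypes) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 samples with non-missing data")
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("monomorphic variant: genotype does not vary")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)
    beta = sxy / sxx
    intercept = y_mean - beta * x_mean
    sse = sum((y - intercept - beta * x) ** 2 for x, y in pairs)
    if sse == 0:
        raise ValueError("perfect fit: standard error is zero")
    se = math.sqrt(sse / (n - 2) / sxx)
    t_stat = beta / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation
    return {"n": n, "beta": beta, "se": se, "t_stat": t_stat, "p_value": p_value}


def gwas(genotype_matrix, phenotypes):
    """Run the single-variant test on every row of a variants x samples dosage matrix."""
    return [linreg_one_variant(row, phenotypes) for row in genotype_matrix]


caffeine = [77.0, 67.0, 83.0, 64.0, 59.0, 77.0]  # HG00096..HG00101 from 1kg_annotations.txt
genotypes = [
    [0, 1, 0, 2, 2, 1],     # toy variant 1
    [1, 1, None, 0, 1, 0],  # toy variant 2 (one missing call)
]
results = gwas(genotypes, caffeine)
for i, result in enumerate(results):
    print(i, result)
```

In a real analysis you would add covariates (for example sex and principal components of ancestry) and account for the number of tests; a common genome-wide significance threshold is p < 5e-8.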

GWAS analysis in clinical bioinformatics

Basic components of a GWAS analysis
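One basic step before testing is quality control, which the Neale Lab example above also mentions. Two of the simplest variant-level metrics are the call rate (fraction of samples with a non-missing genotype) and the minor allele frequency (MAF). The sketch below computes both from a variants x samples dosage matrix and keeps the variants that pass both thresholds; the default thresholds (call rate >= 0.95, MAF >= 0.01) are illustrative choices, not values from this post, and the data are made up.

```python
def variant_qc(row):
    """Return (call_rate, maf) for one variant; row holds dosages 0/1/2 or None if missing."""
    if not row:
        return 0.0, 0.0
    called = [g for g in row if g is not None]
    call_rate = len(called) / len(row)
    if not called:
        return call_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)  # frequency of the rarer allele
    return call_rate, maf


def filter_variants(matrix, min_call_rate=0.95, min_maf=0.01):
    """Indices of variants (rows) passing both QC thresholds."""
    keep = []
    for i, row in enumerate(matrix):
        call_rate, maf = variant_qc(row)
        if call_rate >= min_call_rate and maf >= min_maf:
            keep.append(i)
    return keep


matrix = [
    [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],        # common variant, fully called
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic: MAF = 0
    [0, 1, None, None, 1, 0, 2, 1, 0, 1],  # call rate 0.8
]
print(filter_variants(matrix))
```

Monomorphic variants are dropped here for the same reason the regression sketch refuses them: with no genotype variation there is nothing to test.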

To be continued...
