Lasso linear model example | Proliferation index | Estimating the proliferation index of single cells

阿新 • Published: 2018-04-04

Background: We developed a cell-cycle scoring approach that uses expression data to compute an index for every cell that scores the cell according to its expression of cell-cycle genes. In brief, our approach proceeded through four steps. (A) We reduced dimensionality of the dataset to the cell-cycle relevant genes. (B) In this subspace we performed, as a first approximation, a simple K-means clustering to separate non cycling from cycling cells and (C) we used this clustering as a reference to learn a function that takes the gene expression as the input and returns a cell-cycle score as an output. (D) We used this function to calculate a score for each single cell.

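The data are a per-cell gene expression matrix, and the goal is to compute a proliferation index for every cell from its gene expression, based on cell-cycle genes.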

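The obvious approach is a linear model: treat each cell-cycle gene as a variable, output a single number as the proliferation index, and then normalize it to the range 0 to 1.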
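As a minimal sketch of this naive idea, the snippet below computes a weighted sum over cell-cycle genes and min-max normalizes it to the range 0 to 1. The function name `linear_score` and the weights passed to it are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def linear_score(expr, weights, intercept=0.0):
    """Weighted sum of gene expression per cell, rescaled to [0, 1].

    expr    : array of shape (n_cells, n_genes)
    weights : array of shape (n_genes,), one coefficient per gene (placeholder values)
    """
    raw = expr @ weights + intercept
    span = raw.max() - raw.min()
    if span == 0:
        # All cells score the same; return zeros rather than divide by zero.
        return np.zeros_like(raw, dtype=float)
    return (raw - raw.min()) / span
```

The sketch makes the open question concrete: the result depends entirely on `weights`, which still have to come from somewhere.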

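The problem is how to determine the coefficient in front of each gene. Simply writing down an equation by hand is not feasible, so we need a supervised learning model. But where do the supervised data come from? Our data have no labels.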

Here is the method used in the paper:

The data for which we need a proliferation index have no labels, so we create the labels ourselves.

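With a simple K-means clustering we can pick out the group of highly proliferating cells, and use that clustering as the training set to build a supervised learning model.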

We then apply the fitted model back to our data to predict a proliferation index for every cell.
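The idea described above (cluster, use the clusters as labels, fit a model, predict) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: it uses two clusters for simplicity, assumes the cycling cluster is the one with the highest mean centered expression of the selected genes, and the function name `proliferation_index` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def proliferation_index(log_expr, n_clusters=2, alpha=0.01, seed=0):
    """log_expr: (n_cells, n_genes) log-transformed expression of cell-cycle genes."""
    # Center each gene across cells (the paper clusters on log-centered data).
    X = log_expr - log_expr.mean(axis=0)

    # Step 1: unsupervised first approximation of cycling vs. non-cycling cells.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)

    # Assumption: the cycling cluster has the highest mean cell-cycle expression.
    cluster_means = [X[km.labels_ == k].mean() for k in range(n_clusters)]
    cycling = int(np.argmax(cluster_means))

    # Step 2: the cluster membership becomes the 0/1 training label.
    y = (km.labels_ == cycling).astype(float)

    # Step 3: L1-regularized linear regression learns gene weights from the labels.
    model = Lasso(alpha=alpha).fit(X, y)

    # Step 4: the model's continuous prediction is the per-cell score.
    return model.predict(X), y, model
```

The returned score is continuous even though the training labels are binary, which is what allows cells to be ranked by proliferation rather than just assigned to a cluster.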

We started by selecting a wide selection of genes related to cell-cycle and proliferation. We used the PANTHER GO database and selected all the genes that were described by one of the following terms: DNA metabolic process, DNA replication, mitosis, regulation of cell cycle, cell cycle, cytokinesis, histone, DNA-directed DNA polymerase, DNA polymerase processivity factor, centromere DNA-binding protein. We restricted our features to those genes. Genes that were detected at less than 10 molecules in the dataset were removed. We calculated the pairwise correlation coefficient matrix, and selected the genes that were strongly correlated (99th percentile of the matrix) with at least 12 other genes. The genes passing the filters described above were used for clustering cells using K-means (Python scikit-learn implementation, on log-centered data, default parameters) with the rationale that the main axis of variation expected would span across dividing and non-dividing cells. Then a linear regression model with L1-norm regularization was fitted that used a learning function which took expression data of a cell and categorized into two classes, 1 when a cell belongs to the cycling cluster and 0 when it did not. Importantly, to avoid both overfitting the score on the first approximation clusters and also to obtain a more generalizable model, we used a strong regularization (5 times the one determined by cross-validation; alpha = 0.01).
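The gene filtering described in this passage could look roughly like the NumPy sketch below. Two readings are assumptions on my part: "detected at less than 10 molecules" is taken as the total count across all cells, and "99th percentile of the matrix" is taken as the 99th percentile of the off-diagonal correlation values. The function name `filter_genes` is hypothetical.

```python
import numpy as np

def filter_genes(counts, min_molecules=10, pct=99.0, min_partners=12):
    """Return column indices of genes that pass both filters.

    counts : array of shape (n_cells, n_genes) with raw molecule counts.
    """
    # Filter 1: drop genes with fewer than `min_molecules` in the whole dataset.
    detected = np.flatnonzero(counts.sum(axis=0) >= min_molecules)
    sub = counts[:, detected].astype(float)

    # Constant genes have undefined correlation, so drop them as well.
    varying = sub.std(axis=0) > 0
    detected, sub = detected[varying], sub[:, varying]

    # Filter 2: gene-by-gene Pearson correlation matrix.
    corr = np.corrcoef(sub, rowvar=False)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)

    # "Strongly correlated" = at or above the chosen percentile of off-diagonal values.
    threshold = np.percentile(corr[off_diag], pct)
    partners = ((corr >= threshold) & off_diag).sum(axis=1)

    # Keep genes that are strongly correlated with enough other genes.
    return detected[partners >= min_partners]
```

Genes that survive this filter tend to belong to a co-expressed module, which is the rationale for expecting the main axis of variation to separate dividing from non-dividing cells.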

This procedure was used for both the mouse and human embryonic dataset. The function learnt on the human embryonic dataset was also used to determine the proliferation index of the hPSCs.
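Reusing a learned function on a different dataset mostly comes down to lining up the gene columns. The sketch below is one way to do it, under assumptions the source does not state: new data are centered with the training-set gene means, and genes missing from the new dataset contribute nothing. The function name `score_new_cells` and all argument names are hypothetical.

```python
import numpy as np

def score_new_cells(new_log_expr, new_genes, train_genes, train_means, coef, intercept):
    """Apply a fitted linear model to cells from another dataset.

    new_log_expr : (n_cells, n_new_genes) log expression of the new cells
    new_genes    : gene names for the columns of new_log_expr
    train_genes  : gene names in the order used when the model was fitted
    train_means  : per-gene means used for centering at training time (assumed available)
    coef, intercept : parameters of the fitted linear model
    """
    column_of = {gene: j for j, gene in enumerate(new_genes)}
    X = np.zeros((new_log_expr.shape[0], len(train_genes)))
    for k, gene in enumerate(train_genes):
        if gene in column_of:
            # Center with the training mean so the feature matches the model's scale.
            X[:, k] = new_log_expr[:, column_of[gene]] - train_means[k]
        # Genes absent from the new data stay at 0, i.e. they add nothing to the score.
    return X @ np.asarray(coef) + intercept
```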

The paper's actual procedure is, of course, more careful:

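1. First, select cell-cycle-related genes from the PANTHER GO database;

2. Compute the pairwise correlations between genes and drop genes that stand alone, i.e. that are not strongly correlated with enough other genes;

3. Cluster the cells with K-means into three clusters to obtain the training data;

4. Fit a linear regression model with L1-norm regularization, using a deliberately strong regularization parameter to prevent overfitting.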
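The "strong regularization" in step 4 is described in the paper as 5 times the value determined by cross-validation. A minimal sketch of that idea with scikit-learn's `LassoCV` and `Lasso` follows. The function name `fit_strong_lasso` and the 5-fold setting are assumptions, and whether the original notebook chose alpha this way is not confirmed here.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

def fit_strong_lasso(X, y, factor=5.0, cv=5, seed=0):
    """Fit a Lasso whose alpha is `factor` times the cross-validated alpha."""
    # Let cross-validation pick the alpha that predicts the cluster labels best.
    cv_model = LassoCV(cv=cv, random_state=seed).fit(X, y)

    # Deliberately over-regularize, so the score does not simply
    # reproduce the first-approximation K-means labels.
    strong_alpha = factor * cv_model.alpha_
    model = Lasso(alpha=strong_alpha).fit(X, y)
    return model, cv_model
```

A larger alpha shrinks the coefficients further and typically leaves fewer genes with non-zero weight, trading fit to the cluster labels for a simpler, more general scoring function.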

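From a machine learning point of view, this method gives a rough proliferation index. It should not be far off, but it is probably not very precise either. It is still good enough for comparing proliferation between cells.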

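For a ground truth, you would need more rigorous experimental data, for example gene expression data from highly proliferating cells and from cells that do not proliferate at all.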

Code: ipynb-lamanno2016-proliferation.ipynb

The code is already well commented. I will summarize and analyze it later and extend it to other applications.

This kind of model is therefore fairly general.

For example, genes related to apoptosis and cellular senescence could be used to compute a senescence score for each cell.

The core problem is choosing a suitable gene list! For some indicators it is hard to pick one.
