
Machine Learning Notes (Washington University) - Clustering Specialization - Week Four


1. Probabilistic clustering model

  • Hard assignments (as in k-means) do not tell the full story; probabilistic models capture the uncertainty in each assignment
  • k-means considers only the cluster centers, so it does poorly on overlapping clusters, disparate cluster sizes, and differently shaped clusters
  • a probabilistic model can learn weights on dimensions
  • it can even learn cluster-specific weights on dimensions

2. Gaussian distribution

A 1-D Gaussian is fully specified by its mean $\mu$ and variance $\sigma^2$.

A 2-D Gaussian is fully specified by its mean vector $\mu$ and covariance matrix $\Sigma$.

The multivariate Gaussian density in $d$ dimensions is

$$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$

Thus our mixture-of-Gaussians model is defined by the parameters $\{\pi_k, \mu_k, \Sigma_k\}$ for each cluster $k = 1, \dots, K$.
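As a concrete illustration, here is a minimal Python sketch that evaluates the mixture density $p(x) = \sum_k \pi_k N(x \mid \mu_k, \Sigma_k)$ using NumPy and SciPy; the parameter values and the helper name `mixture_density` are made up for this example:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-D mixture with K = 2 components (values made up).
pis = np.array([0.6, 0.4])                          # mixture weights, sum to 1
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # component means
Sigmas = [np.eye(2), np.array([[1.0, 0.5],
                               [0.5, 1.0]])]        # component covariances

def mixture_density(x, pis, mus, Sigmas):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

print(mixture_density(np.array([1.0, 1.0]), pis, mus, Sigmas))
```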

3. EM (Expectation Maximization)

What if we knew the cluster parameters $\{\pi_k, \mu_k, \Sigma_k\}$?

Then we could compute the responsibilities:

$$r_{ik} = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x_i \mid \mu_j, \Sigma_j)}$$

$r_{ik}$ is the responsibility cluster $k$ takes for observation $i$.

$p$ is the probability of assignment to cluster $k$, given the model parameters and the observed value.

$\pi_k$ is the prior probability (mixture weight) of an observation being from cluster $k$.

$N$ is the Gaussian density.
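A minimal sketch of this E-step computation, assuming parameters are stored as in the snippet above (the helper name `e_step` is hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, Sigmas):
    """Responsibilities r[i, k] = pi_k * N(x_i | mu_k, Sigma_k), row-normalized."""
    n, K = X.shape[0], len(pis)
    r = np.zeros((n, K))
    for k in range(K):
        r[:, k] = pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
    r /= r.sum(axis=1, keepdims=True)  # each row now sums to 1 over clusters
    return r
```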

What if we instead knew the cluster soft assignments $r_{ik}$?

Then the maximum-likelihood updates are

$$\pi_k = \frac{N_k^{\text{soft}}}{n}, \qquad \mu_k = \frac{1}{N_k^{\text{soft}}} \sum_{i=1}^{n} r_{ik}\, x_i, \qquad \Sigma_k = \frac{1}{N_k^{\text{soft}}} \sum_{i=1}^{n} r_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\top}$$

where $N_k^{\text{soft}} = \sum_{i=1}^{n} r_{ik}$ is the effective number of observations assigned to cluster $k$, out of $n$ total.
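A corresponding sketch of these updates, where `r` is the $n \times K$ responsibility matrix from the E-step (the helper name `m_step` is again hypothetical):

```python
import numpy as np

def m_step(X, r):
    """Update {pi_k, mu_k, Sigma_k} from the (n x K) soft assignments r."""
    n, d = X.shape
    Nk = r.sum(axis=0)                    # effective cluster sizes N_k^soft
    pis = Nk / n                          # mixture weights
    mus = (r.T @ X) / Nk[:, None]         # responsibility-weighted means
    Sigmas = []
    for k in range(r.shape[1]):
        diff = X - mus[k]                 # (n, d) deviations from mu_k
        Sigmas.append((r[:, k, None] * diff).T @ diff / Nk[k])
    return pis, mus, Sigmas
```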

The procedure for the iterative algorithm:

1. initialize the parameters

2. estimate cluster responsibilities given the current parameter estimates (E-step)

3. maximize the likelihood given the soft assignments (M-step), then repeat steps 2 and 3 until convergence
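A minimal sketch of the full loop, reusing the hypothetical `e_step` and `m_step` helpers above; convergence is checked naively via the log-likelihood:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, Sigmas):
    dens = sum(pi * multivariate_normal.pdf(X, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))
    return np.log(dens).sum()

def em(X, pis, mus, Sigmas, max_iters=100, tol=1e-6):
    ll_old = -np.inf
    for _ in range(max_iters):
        r = e_step(X, pis, mus, Sigmas)          # E-step
        pis, mus, Sigmas = m_step(X, r)          # M-step
        ll = log_likelihood(X, pis, mus, Sigmas)
        if ll - ll_old < tol:                    # likelihood never decreases
            break
        ll_old = ll
    return pis, mus, Sigmas, r
```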

Notes:

EM is a coordinate-ascent algorithm

EM converges to a local mode

There are many ways to initialize the EM algorithm, and the choice matters for both the convergence rate and the quality of the local mode reached:

  • randomly choose k centroids
  • pick centers sequentially, as in k-means++
  • initialize from a k-means solution (see the sketch after this list)
  • grow the mixture model by splitting until k clusters are formed
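For instance, the k-means initialization might look like this sketch (assuming scikit-learn; `init_from_kmeans` is a hypothetical helper):

```python
import numpy as np
from sklearn.cluster import KMeans

def init_from_kmeans(X, K, seed=0):
    """Initialize GMM parameters from a hard k-means clustering."""
    km = KMeans(n_clusters=K, random_state=seed).fit(X)
    mus = km.cluster_centers_
    pis = np.bincount(km.labels_, minlength=K) / len(X)  # cluster fractions
    Sigmas = [np.cov(X[km.labels_ == k].T) + 1e-6 * np.eye(X.shape[1])
              for k in range(K)]                         # per-cluster covariances
    return pis, mus, Sigmas
```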

Preventing overfitting:

  • Do not let the variances go to zero; add a small amount to the diagonal of the covariance estimate, as shown below.
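In code, this is a one-line ridge on each covariance estimate (the value of `eps` is an arbitrary small constant, a tuning choice):

```python
import numpy as np

def regularize(Sigma, eps=1e-6):
    """Add a small ridge to the diagonal so no variance can collapse to zero."""
    return Sigma + eps * np.eye(Sigma.shape[0])

# e.g., inside the M-step: Sigmas.append(regularize(cov_k))
```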
