
Chapter 7: Nonlinear Featurization via K-Means Model Stacking

This chapter covers two topics: "manifold learning" and "model stacking".

manifold learning

manifold learning: unlike PCA, which performs linear dimensionality reduction, manifold learning is a family of nonlinear dimensionality reduction methods. It can "unroll" a curved, nonlinear feature space and thereby reduce its dimensionality. Manifold learning is mainly used for visualization.
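As a quick illustration (not from the original chapter), here is a minimal sketch of unrolling a nonlinear "Swiss roll" into two dimensions with scikit-learn's Isomap; the dataset and the choice of Isomap and its parameters are illustrative assumptions:

```python
# Minimal sketch: unroll a 3-D "Swiss roll" manifold into 2 dimensions.
# The dataset and the Isomap / n_neighbors choices are illustrative assumptions.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)  # 3-D nonlinear manifold
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)     # unrolled 2-D representation
print(embedding.shape)  # (1500, 2), suitable for a 2-D visualization
```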

model stacking

1. Model stacking steps
In this chapter, the base layer of the stack is a clustering model and the top layer is a logistic regression (a code sketch of the whole pipeline follows the steps below).
Step 1: split the data into train_data and test_data.
Step 2: split train_data into two parts, train_data_1 and train_data_2. Use train_data_1 to train the clustering model, then use the trained clustering model to assign each point in train_data_2 to a cluster.
Step 3: feed train_data_2 into the clustering model and use the model's prediction as a new feature for train_data_2, concatenating it with the original features to form the final features of train_data_2. The new feature can take two forms: 1) a one-hot cluster feature: if a data point belongs to cluster j, the new feature is a k-dimensional vector whose j-th entry is 1 and whose other entries are 0; 2) a dense feature formed from the inverse distances between the data point and each cluster centroid; if the number of clusters k is large, one can keep only the p nearest clusters and use their inverse distances as the new feature vector.
Step 4: use the final features of train_data_2 as the input of the top-layer logistic regression and train it.
Step 5: to predict the label of a test point, first feed it into the clustering model to form its final features, then feed those final features into the logistic regression to predict its label.
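A minimal, runnable sketch of these five steps in Python with scikit-learn. The toy two-moons dataset, k = 10 clusters, and all parameter values are illustrative assumptions, not taken from the book:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Step 1: split into train and test data (toy dataset, illustrative only)
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 2: split the training data again; fit the clustering model on the first half
X_train_1, X_train_2, _, y_train_2 = train_test_split(X_train, y_train, test_size=0.5, random_state=0)
k = 10
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train_1)

# Step 3: one-hot cluster assignment as the new feature, concatenated with the original features
# (scikit-learn >= 1.2 spelling; older versions use sparse=False instead of sparse_output=False)
encoder = OneHotEncoder(categories=[np.arange(k)], sparse_output=False)
cluster_ids = kmeans.predict(X_train_2).reshape(-1, 1)
X_train_2_final = np.hstack([X_train_2, encoder.fit_transform(cluster_ids)])
# Alternative dense feature from Step 3: inverse distances to each centroid
# inv_dist = 1.0 / (kmeans.transform(X_train_2) + 1e-8)

# Step 4: train the top-layer logistic regression on the final features
clf = LogisticRegression(max_iter=1000).fit(X_train_2_final, y_train_2)

# Step 5: featurize the test data with the same clustering model, then predict
test_ids = kmeans.predict(X_test).reshape(-1, 1)
X_test_final = np.hstack([X_test, encoder.transform(test_ids)])
print("test accuracy:", clf.score(X_test_final, y_test))
```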
Note that during the clustering fit above we do not care about finding the true number of clusters; we only need enough clusters to cover the data. (Unlike in the classic clustering setup, we are not concerned with discovering the "true" number of clusters; we only need to cover them.)

2. Key intuition for model stacking
Model stacking has become an increasingly popular technique in recent years. Nonlinear classifiers are expensive to train and maintain. The key intuition with stacking is to push the nonlinearities into the features and use a very simple, usually linear model as the last layer. The featurizer can be trained offline, which means that one can use expensive models that require more computation power or memory but generate useful features. The simple model at the top level can be quickly adapted to the changing distributions of online data. This is a great trade-off between accuracy and speed, and this strategy is often used in applications like targeted advertising that require fast adaptation to changing data distributions.

A point I found confusing:

If the data is distributed uniformly throughout the space, then picking the right k boils down to a sphere-packing problem. In d dimensions, one could fit roughly 1/r^d spheres of radius r. Each k-means cluster is a sphere, and the radius is the maximum error of representing points in that sphere with the centroid. So, if we are willing to tolerate a maximum approximation error of r per data point, then the number of clusters is O(1/r^d), where d is the dimension of the original feature space of the data.
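One way to restate the counting argument above (my own paraphrase, assuming the data occupies a region of roughly unit volume): the number of clusters needed is about the volume of the data region divided by the volume covered by one ball of radius r, which scales as

```latex
k \;\approx\; \frac{\text{volume of the data region}}{\text{volume of one ball of radius } r}
  \;\propto\; \frac{1}{r^{d}}
  \;=\; O\!\left(\frac{1}{r^{d}}\right)
```

so halving the tolerated error r multiplies the required number of clusters by roughly 2^d.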