[Python Examples, Lecture 17] The Mean Shift Clustering Algorithm

阿新 • Published: 2018-11-29

Machine Learning Training Camp: a free discussion space for machine learning enthusiasts (QQ group: 696721295)

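Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function. Its applications include cluster analysis and image processing.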

The mean shift algorithm

Mean shift is an iterative method for finding the maxima (modes) of a density function. Start from an initial estimate $x$ and choose a kernel function $K(x_i - x)$, typically a Gaussian kernel. The kernel determines the weights of the points near $x$, and these neighboring points are used to re-estimate the mean. The weighted mean of the density in the window around $x$ is

$$m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)}$$

where $N(x)$ is the neighborhood of $x$, that is, the set of sample points $x_i$ near $x$. The difference $m(x) - x$ is called the mean shift. Now update $x$ to $m(x)$ and repeat the estimation until $m(x)$ converges.
[Figure: schematic of the iteration process.]
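
The update $x \leftarrow m(x)$ can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation scikit-learn uses. The function names `gaussian_kernel` and `mean_shift_point`, the `bandwidth`, `tol`, and `max_iter` parameters, and the synthetic data are all assumptions made for this example. With a Gaussian kernel every sample receives a nonzero weight, so $N(x)$ is taken here to be the whole data set.

```python
import numpy as np


def gaussian_kernel(diff, bandwidth):
    """Gaussian weights K(x_i - x) for difference vectors stored one per row."""
    sq_dist = np.sum(diff ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_shift_point(x, points, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Repeat x <- m(x) until the mean shift m(x) - x is shorter than tol."""
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        weights = gaussian_kernel(points - x, bandwidth)  # K(x_i - x)
        m = weights @ points / weights.sum()              # weighted mean m(x)
        if np.linalg.norm(m - x) < tol:                   # shift is negligible
            return m
        x = m
    return x


# Hypothetical data: one Gaussian blob centered at (2, -1).
rng = np.random.default_rng(0)
points = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(500, 2))

# Start away from the blob and follow the shifts to the density maximum.
mode = mean_shift_point([0.0, 0.0], points, bandwidth=0.5)
print(mode)
```

Starting the same routine from many different points and collecting the distinct limits gives the candidate cluster centers used in the next section.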

Application to clustering

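Mean shift clustering aims to discover blobs in a smooth density of samples. It is a centroid-based algorithm: candidate centroids are shifted toward the mean of the nearby points, and the search stops once the change in the centroids is small. Because the surviving centroids correspond to the modes of the estimated density, the number of clusters is set automatically instead of being supplied in advance, which is a notable difference from k-means clustering. Once all centroids are determined, each centroid defines a cluster, and every sample is assigned to the cluster of its nearest centroid.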
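
The final assignment step can be sketched as follows. The helper name `assign_to_nearest` is hypothetical. The centroid coordinates and the four sample points are made up for illustration and are not the output of an actual mean shift run.

```python
import numpy as np


def assign_to_nearest(samples, centroids):
    """Return, for each sample, the index of its nearest centroid."""
    # Pairwise Euclidean distances, shape (n_samples, n_centroids).
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Illustrative centroids and samples (assumed values).
centroids = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
samples = np.array([[0.9, 1.2], [-1.1, -0.8], [0.7, -1.3], [1.0, 0.1]])

labels = assign_to_nearest(samples, centroids)
print(labels)  # [0 1 2 0]
```

The last sample lies between the first and third centroids and is labeled 0 because it is 0.9 away from (1, 1) and 1.1 away from (1, -1).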

A demo example

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6)

# #############################################################################
# Compute clustering with MeanShift

# Estimate the kernel bandwidth automatically with estimate_bandwidth
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)

print("number of estimated clusters : %d" % n_clusters_)

# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle

plt.figure(1)
plt.clf()

colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

number of estimated clusters : 3

[Figure: scatter plot of the samples colored by estimated cluster, with the cluster centers marked.]
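
The number of clusters found depends on the bandwidth, which the demo derives from the `quantile` argument of `estimate_bandwidth`. The sketch below refits the model for a few quantile values to show that dependence. The quantile values, the smaller sample size, and the fixed `random_state` are assumptions chosen to keep the example quick and repeatable, so its cluster counts need not match the run above.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Same blob layout as the demo, with fewer points and a fixed seed (assumed).
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=1000, centers=centers, cluster_std=0.6,
                  random_state=0)

bandwidths = []
for quantile in (0.1, 0.2, 0.5):
    # A larger quantile looks at more distant neighbors, giving a wider kernel.
    bandwidth = estimate_bandwidth(X, quantile=quantile, n_samples=500,
                                   random_state=0)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
    bandwidths.append(bandwidth)
    print("quantile=%.1f  bandwidth=%.3f  clusters=%d"
          % (quantile, bandwidth, len(ms.cluster_centers_)))
```

A wider kernel smooths the density estimate more, which tends to merge nearby modes, so the estimated number of clusters generally does not increase as the bandwidth grows.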

