Machine Learning: Clustering 5-3 (the DBSCAN Algorithm)
By 阿新 • Published 2022-03-15
Clustering ring-shaped data with DBSCAN
Main workflow:
- 1. Import packages
- 2. Generate and visualize the data
- 3. Cluster with DBSCAN and visualize
  - 3.1 Parameter combination 1
  - 3.2 Parameter combination 2
  - 3.3 Parameter combination 3
  - 3.4 Parameter combination 4
- 4. Cluster with K-Means and visualize
- 5. Print the adjusted Rand index
1. Import packages
In [2]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt
2. Generate and visualize the data
In [3]:
# Generate the data
from sklearn.datasets import make_circles
X, y = make_circles(n_samples=750, factor=0.3, noise=0.1, random_state=1)
In [4]:
# Visualize the data
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y)
Out[4]:
<matplotlib.collections.PathCollection at 0x1a6d5026b08>
In [5]:
X.shape
Out[5]:
(750, 2)
In [6]:
y.shape
Out[6]:
(750,)
In [7]:
y
Out[7]:
array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, ..., 1, 1, 0, 0, 1, 1, 0, 0, 0], dtype=int64)
3. Cluster with DBSCAN and visualize
3.1 Parameter combination 1
In [8]:
# Cluster with DBSCAN
from sklearn.cluster import DBSCAN
y_dbscan_pred = DBSCAN(eps=0.05, min_samples=10).fit_predict(X)
In [9]:
# Visualize the DBSCAN clustering result
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y_dbscan_pred)
Out[9]:
<matplotlib.collections.PathCollection at 0x1a6d6603048>
3.2 Parameter combination 2
In [10]:
# Cluster with DBSCAN
from sklearn.cluster import DBSCAN
y_dbscan_pred = DBSCAN(eps=0.15, min_samples=10).fit_predict(X)
In [11]:
# Visualize the DBSCAN clustering result
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y_dbscan_pred)
Out[11]:
<matplotlib.collections.PathCollection at 0x1a6d666a9c8>
3.3 Parameter combination 3
In [12]:
# Cluster with DBSCAN
from sklearn.cluster import DBSCAN
y_dbscan_pred = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
In [13]:
# Visualize the DBSCAN clustering result
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y_dbscan_pred)
Out[13]:
<matplotlib.collections.PathCollection at 0x1a6d66d1108>
3.4 Parameter combination 4
In [14]:
# Cluster with DBSCAN
from sklearn.cluster import DBSCAN
y_dbscan_pred = DBSCAN(eps=0.15, min_samples=50).fit_predict(X)
In [15]:
# Visualize the DBSCAN clustering result
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y_dbscan_pred)
Out[15]:
<matplotlib.collections.PathCollection at 0x1a6d6734388>
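The scatter plots above show the effect of each setting, but a compact way to compare the four parameter combinations is to count how many clusters DBSCAN finds and how many points it labels as noise (label -1) under each one. A minimal sketch, regenerating the same data as above:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Same ring-shaped data as above
X, y = make_circles(n_samples=750, factor=0.3, noise=0.1, random_state=1)

# The four (eps, min_samples) combinations tried above
combos = [(0.05, 10), (0.15, 10), (0.3, 10), (0.15, 50)]

for eps, min_samples in combos:
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 marks noise
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps}, min_samples={min_samples}: "
          f"{n_clusters} clusters, {n_noise} noise points")
```

Roughly, a too-small eps fragments the rings into many small clusters and marks many points as noise, while a too-large min_samples starves the sparser outer ring of core points; eps=0.15 with min_samples=10 recovers the two rings.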
4. Cluster with K-Means and visualize
In [16]:
# Cluster with K-Means
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, max_iter=300, random_state=0)
kmeans.fit(X)
y_kmeans_pred = kmeans.predict(X)
In [17]:
# Visualize the K-Means clustering result
plt.figure()
plt.scatter(X[:,0], X[:,1], c=y_kmeans_pred)
Out[17]:
<matplotlib.collections.PathCollection at 0x1a6dcbb6408>
5. Print the adjusted Rand index
In [18]:
# Cluster with DBSCAN (using the best parameter combination found above)
from sklearn.cluster import DBSCAN
y_dbscan_pred = DBSCAN(eps=0.15, min_samples=10).fit_predict(X)
Because the samples come with ground-truth labels, we can use the adjusted Rand index to measure clustering performance.
In [19]:
# Print the adjusted Rand indices
from sklearn import metrics
print("Adjusted Rand index for DBSCAN: %0.3f" % metrics.adjusted_rand_score(y, y_dbscan_pred))
print("Adjusted Rand index for K-Means: %0.3f" % metrics.adjusted_rand_score(y, y_kmeans_pred))
Adjusted Rand index for DBSCAN: 0.961
Adjusted Rand index for K-Means: -0.001
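The adjusted Rand index compares two partitions while ignoring the actual label values: identical partitions score 1.0 even when the cluster ids are swapped, while a labeling no better than chance scores around 0 or slightly below, which is exactly what K-Means gets here. A tiny illustration:

```python
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 1, 1]

# The same partition with swapped cluster ids is still a perfect match
print(adjusted_rand_score(truth, [1, 1, 0, 0]))  # 1.0

# A partition that cuts across both groups scores at or below chance level
print(adjusted_rand_score(truth, [0, 1, 0, 1]))
```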
As the printed adjusted Rand indices show, DBSCAN far outperforms K-Means on this ring-shaped dataset: K-Means assumes convex, roughly spherical clusters and splits the rings with a straight boundary, while density-based DBSCAN follows the shape of the data.
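The winning eps=0.15 above was found by trying combinations by hand. A common supplementary heuristic, not part of the original walkthrough, is the k-distance plot: sort every point's distance to its k-th nearest neighbor (with k equal to min_samples) and pick eps near the elbow of the resulting curve. A sketch on the same data:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.neighbors import NearestNeighbors

X, _ = make_circles(n_samples=750, factor=0.3, noise=0.1, random_state=1)

k = 10  # match min_samples
# kneighbors includes the query point itself (distance 0), so ask for k + 1
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)
kth_dist = np.sort(distances[:, -1])  # ascending distance to the k-th neighbor

# Plotting kth_dist and reading off the elbow gives an eps candidate;
# here we just report the median as a rough summary of the curve.
print(f"median k-distance: {kth_dist[len(kth_dist) // 2]:.3f}")
```

With matplotlib one would run `plt.plot(kth_dist)` and look for the sharp upward bend; points to the right of the bend are the ones an eps at that level would treat as noise.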