sklearn: Hashing feature transformation using totally random trees

阿新 • Published: 2018-12-07

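RandomTreesEmbedding provides a way to map data to a very high-dimensional, sparse representation, which may be beneficial for classification. The mapping is completely unsupervised and very efficient. This example visualizes the partitions given by several trees and shows how the transformation can also be used for non-linear dimensionality reduction or non-linear classification.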
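To make "high-dimensional, sparse" concrete, here is a minimal sketch that inspects the output of the transformer. It assumes the same dataset and hyperparameters as the full script in this post. Each tree contributes a one-hot encoding of the leaf a sample falls into, so every row should contain exactly `n_estimators` non-zero entries, and the number of columns is bounded by `n_estimators * 2 ** max_depth`.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

# Same synthetic data and settings as the main example
X, y = make_circles(factor=0.5, random_state=0, noise=0.05)
hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
X_transformed = hasher.fit_transform(X)

# The result is a SciPy sparse matrix: one column per leaf, per tree
n_samples, n_features = X_transformed.shape
nonzeros_per_row = (X_transformed.toarray() != 0).sum(axis=1)

print("sparse output:", sparse.issparse(X_transformed))
print("shape:", (n_samples, n_features))
print("non-zeros per row:", np.unique(nonzeros_per_row))
```

The exact number of columns depends on how many leaves each random tree ends up with, so it is printed here rather than stated.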

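Neighboring points often share the same leaf of a tree and therefore share large parts of their hashed representation. This makes it possible to separate two concentric circles simply from the leading components of the transformed data, computed here with truncated SVD.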
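The claim about neighbors sharing leaves can be checked directly. The sketch below is an illustration under the same settings as the main script, not part of the original example: it counts, for every pair of points, the number of trees in which both land in the same leaf, then compares each point's nearest neighbor with its farthest point.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding

X, y = make_circles(factor=0.5, random_state=0, noise=0.05)
hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
dense = hasher.fit_transform(X).toarray()

# shared[i, j] = number of trees in which samples i and j fall into the same leaf
shared = dense @ dense.T

# Pairwise Euclidean distances in the original 2-D space
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
farthest = dists.argmax(axis=1)
np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbor search
nearest = dists.argmin(axis=1)

idx = np.arange(len(X))
print("mean shared leaves, nearest neighbor:", shared[idx, nearest].mean())
print("mean shared leaves, farthest point:  ", shared[idx, farthest].mean())
```

On average, the nearest neighbor should share noticeably more leaves than the farthest point, which is what lets a linear projection of the hashed features pull the two circles apart.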

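In high-dimensional spaces, linear classifiers often achieve excellent accuracy. For sparse binary data, BernoulliNB is particularly well suited. The bottom row compares the decision boundary obtained by BernoulliNB in the transformed space with that of an ExtraTreesClassifier forest learned on the original data.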

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding, ExtraTreesClassifier
from sklearn.decomposition import TruncatedSVD
from sklearn.naive_bayes import BernoulliNB

# make a synthetic dataset
X, y = make_circles(factor=0.5, random_state=0, noise=0.05)

# use RandomTreesEmbedding to transform data
hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
X_transformed = hasher.fit_transform(X)

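# Visualize result after dimensionality reduction using truncated SVD
# (TruncatedSVD works directly on the sparse matrix; it is not centered PCA)
pca = TruncatedSVD(n_components=2)
X_reduced = pca.fit_transform(X_transformed)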

# Learn a Naive Bayes classifier on the transformed data
nb = BernoulliNB()
nb.fit(X_transformed, y)


# Learn an ExtraTreesClassifier for comparison
trees = ExtraTreesClassifier(max_depth=3, n_estimators=10, random_state=0)
trees.fit(X, y)


# scatter plot of original and reduced data
fig = plt.figure(figsize=(9, 8))

ax = plt.subplot(221)
ax.scatter(X[:, 0], X[:, 1], c=y, s=50)
ax.set_title("Original Data (2d)")
ax.set_xticks(())
ax.set_yticks(())

ax = plt.subplot(222)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, s=50)
ax.set_title("Truncated SVD reduction (2d) of transformed data (%dd)" %
             X_transformed.shape[1])
ax.set_xticks(())
ax.set_yticks(())

# Plot the decision in original space. For that, we will assign a color to each
# point in the mesh [x_min, x_max] x [y_min, y_max].
h = .01
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# transform grid using RandomTreesEmbedding
transformed_grid = hasher.transform(np.c_[xx.ravel(), yy.ravel()])
y_grid_pred = nb.predict_proba(transformed_grid)[:, 1]

ax = plt.subplot(223)
ax.set_title("Naive Bayes on Transformed data")
ax.pcolormesh(xx, yy, y_grid_pred.reshape(xx.shape))
ax.scatter(X[:, 0], X[:, 1], c=y, s=50)
ax.set_ylim(-1.4, 1.4)
ax.set_xlim(-1.4, 1.4)
ax.set_xticks(())
ax.set_yticks(())

# predict on the grid using ExtraTreesClassifier (no transformation involved)
y_grid_pred = trees.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

ax = plt.subplot(224)
ax.set_title("ExtraTrees predictions")
ax.pcolormesh(xx, yy, y_grid_pred.reshape(xx.shape))
ax.scatter(X[:, 0], X[:, 1], c=y, s=50)
ax.set_ylim(-1.4, 1.4)
ax.set_xlim(-1.4, 1.4)
ax.set_xticks(())
ax.set_yticks(())

plt.tight_layout()
plt.show()
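The plots compare the two models only visually. As a rough numerical complement, the following sketch estimates accuracy with 5-fold cross-validation. It is an addition to the original example: chaining the embedding and the classifier with `make_pipeline` and the choice of `cv=5` are assumptions made for illustration, and with only 100 samples the scores will be noisy.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import ExtraTreesClassifier, RandomTreesEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

X, y = make_circles(factor=0.5, random_state=0, noise=0.05)

# Unsupervised hashing followed by naive Bayes on the sparse binary features
embedded_nb = make_pipeline(
    RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3),
    BernoulliNB(),
)
# Supervised forest trained directly on the two raw coordinates
trees = ExtraTreesClassifier(max_depth=3, n_estimators=10, random_state=0)

nb_scores = cross_val_score(embedded_nb, X, y, cv=5)
tree_scores = cross_val_score(trees, X, y, cv=5)

print("BernoulliNB on embedding: %.3f +/- %.3f" % (nb_scores.mean(), nb_scores.std()))
print("ExtraTreesClassifier:     %.3f +/- %.3f" % (tree_scores.mean(), tree_scores.std()))
```

Wrapping the embedding in a pipeline means the random trees are refit on each training fold, so the held-out fold never influences the hashed features.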
