
Manifold learning of sklearn

Manifold learning

https://scikit-learn.org/stable/modules/manifold.html#locally-linear-embedding

Manifold learning is a non-linear dimensionality reduction approach. Its algorithms are based on the idea that the high dimensionality of many datasets is only artificial, not intrinsic.

PCA and similar techniques are linear dimensionality reduction methods; manifold learning is non-linear and is meant for non-linear structure.

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

High-dimensional data is hard to visualize, so it needs to be reduced to two or three dimensions.

The simplest approach is a random projection, which tends to discard the interesting structure in the data.

Linear methods such as PCA, ICA, and LDA can capture linear structure but miss non-linear structure.

Manifold learning can be seen as a generalization of PCA-like frameworks that is more sensitive to non-linear structure in the data.

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.
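As a baseline, such a random projection can be sketched with scikit-learn's GaussianRandomProjection. This is a minimal sketch, not part of the original page; the digits dataset and n_components=2 are chosen purely for illustration:

from sklearn.datasets import load_digits
from sklearn.random_projection import GaussianRandomProjection

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Project onto 2 random Gaussian directions: fast, but any interesting
# structure survives only by chance.
rp = GaussianRandomProjection(n_components=2, random_state=0)
X_rp = rp.fit_transform(X)
print(X_rp.shape)  # (1797, 2)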

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.
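For comparison, a linear method such as PCA picks the projection with maximum variance rather than a random one. A minimal sketch on the same digits data (the dataset choice is again illustrative):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Keep the 2 principal components with the largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(X_pca.shape)                    # (1797, 2)
print(pca.explained_variance_ratio_)  # variance captured by each axis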

Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

What is a manifold?

https://www.cnblogs.com/jiangxinyang/p/9314256.html

The manifold learning view: the data we observe is actually generated by a low-dimensional manifold mapped into a high-dimensional space. Because of constraints among the data's internal features, some high-dimensional data carries redundant dimensions; in fact, the data can be uniquely represented with far fewer dimensions. Intuitively, a manifold is like a d-dimensional space that has been twisted inside an m-dimensional space (with m > d).

Note that a manifold is not a shape but a space. For example, think of a piece of cloth: it can be viewed as a two-dimensional plane, i.e. a two-dimensional space. If we now twist it (in three-dimensional space), it becomes a manifold; of course, even untwisted it is still a manifold, since Euclidean space is a special case of a manifold.

Swiss Roll reduction with LLE: a locally linear embedding example

https://scikit-learn.org/stable/auto_examples/manifold/plot_swissroll.html#sphx-glr-auto-examples-manifold-plot-swissroll-py

An illustration of Swiss Roll reduction with locally linear embedding

Out:

Computing LLE embedding
Done. Reconstruction error: 1.26177e-07
# Author: Fabian Pedregosa -- <[email protected]>
# License: BSD 3 clause (C) INRIA 2011

print(__doc__)

import matplotlib.pyplot as plt

# This import registers the 3D projection and changes how the figure
# behaves; referencing it once keeps linters from flagging it as unused.
from mpl_toolkits.mplot3d import Axes3D
Axes3D  # noqa: F401

#----------------------------------------------------------------------
# Locally linear embedding of the swiss roll

from sklearn import manifold, datasets
X, color = datasets.make_swiss_roll(n_samples=1500)

print("Computing LLE embedding")
X_r, err = manifold.locally_linear_embedding(X, n_neighbors=12,
                                             n_components=2)
print("Done. Reconstruction error: %g" % err)

#----------------------------------------------------------------------
# Plot result

fig = plt.figure()

ax = fig.add_subplot(211, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)

ax.set_title("Original data")
ax = fig.add_subplot(212)
ax.scatter(X_r[:, 0], X_r[:, 1], c=color, cmap=plt.cm.Spectral)
plt.axis('tight')
plt.xticks([])
plt.yticks([])
plt.title('Projected data')
plt.show()
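The embedding above depends noticeably on n_neighbors (the example uses 12). A quick way to probe this, assuming the same make_swiss_roll data, is to compare reconstruction errors for a few settings; this loop is a sketch, not part of the original example:

from sklearn import manifold, datasets

X, color = datasets.make_swiss_roll(n_samples=1500, random_state=0)

# Too few neighbors fragments the manifold; too many connects points
# across the roll's folds, defeating the local linearity assumption.
for k in (5, 12, 30):
    X_r, err = manifold.locally_linear_embedding(X, n_neighbors=k,
                                                 n_components=2)
    print("n_neighbors=%d  reconstruction error: %g" % (k, err))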

LocallyLinearEmbedding

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html#sklearn.manifold.LocallyLinearEmbedding

LLE finds a lower-dimensional projection of the data that preserves distances within local neighborhoods.

It can be thought of as a series of local Principal Component Analyses that are then compared globally to find the best non-linear embedding.

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods.

It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.

Locally linear embedding can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding.

>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = LocallyLinearEmbedding(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)
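Since LocallyLinearEmbedding is a regular fitted estimator, it can also map previously unseen samples with transform, and its method parameter selects LLE variants ('standard', 'hessian', 'modified', 'ltsa'). A brief sketch continuing the digits example above; the choice of method='modified' is illustrative:

from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)

# Fit Modified LLE on the first 100 samples, as in the example above
embedding = LocallyLinearEmbedding(n_components=2, method='modified')
embedding.fit(X[:100])

# transform() places new samples into the already-learned embedding
X_new = embedding.transform(X[100:110])
print(X_new.shape)  # (10, 2)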