
Manifold learning of sklearn

Manifold learning

https://scikit-learn.org/stable/modules/manifold.html#locally-linear-embedding

Manifold learning is a non-linear dimensionality reduction approach. Its algorithms are based on the idea that the high dimensionality of many datasets is only artificial, not intrinsic.

PCA and similar techniques are linear dimensionality reduction methods; manifold learning is non-linear and is meant for non-linear structure.

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

High-dimensional data is hard to visualize, so it needs to be reduced to two or three dimensions.

The simplest approach is a random projection, which tends to discard the interesting structure in the data.

Linear methods such as PCA, ICA, and LDA can capture linear structure but miss non-linear structure.

Manifold learning can be seen as a generalization of PCA-like frameworks that is more sensitive to non-linear structure in the data.

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.
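As a baseline, such a random projection can be sketched with scikit-learn's GaussianRandomProjection. This is a minimal sketch, not part of the original page; the digits dataset and n_components=2 are chosen purely for illustration:

from sklearn.datasets import load_digits
from sklearn.random_projection import GaussianRandomProjection

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Project onto 2 random Gaussian directions: fast, but any interesting
# structure survives only by chance.
rp = GaussianRandomProjection(n_components=2, random_state=0)
X_rp = rp.fit_transform(X)
print(X_rp.shape)  # (1797, 2)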

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, Linear Discriminant Analysis, and others. These algorithms define specific rubrics to choose an “interesting” linear projection of the data. These methods can be powerful, but often miss important non-linear structure in the data.
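For comparison, a linear method such as PCA picks the projection with maximum variance rather than a random one. A minimal sketch on the same digits data (the dataset choice is again illustrative):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Keep the 2 principal components with the largest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(X_pca.shape)                    # (1797, 2)
print(pca.explained_variance_ratio_)  # variance captured by each axis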

Manifold Learning can be thought of as an attempt to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Though supervised variants exist, the typical manifold learning problem is unsupervised: it learns the high-dimensional structure of the data from the data itself, without the use of predetermined classifications.

What is a manifold?

https://www.cnblogs.com/jiangxinyang/p/9314256.html

The manifold learning view: the data we observe is actually generated by a low-dimensional manifold mapped into a high-dimensional space. Because of constraints among the data's internal features, some high-dimensional data carries redundant dimensions; in fact, the data can be uniquely represented with far fewer dimensions. Intuitively, a manifold is like a d-dimensional space that has been twisted inside an m-dimensional space (with m > d).

Note that a manifold is not a shape but a space. For example, think of a piece of cloth: it can be viewed as a two-dimensional plane, i.e. a two-dimensional space. If we now twist it (in three-dimensional space), it becomes a manifold; of course, even untwisted it is still a manifold, since Euclidean space is a special case of a manifold.

Swiss Roll reduction with LLE: a locally linear embedding example

https://scikit-learn.org/stable/auto_examples/manifold/plot_swissroll.html#sphx-glr-auto-examples-manifold-plot-swissroll-py

An illustration of Swiss Roll reduction with locally linear embedding

Out:

Computing LLE embedding
Done. Reconstruction error: 1.26177e-07
# Author: Fabian Pedregosa -- <[email protected]>
# License: BSD 3 clause (C) INRIA 2011

print(__doc__)

import matplotlib.pyplot as plt

# This import registers the 3D projection and changes how the figure
# behaves; referencing it once keeps linters from flagging it as unused.
from mpl_toolkits.mplot3d import Axes3D
Axes3D  # noqa: F401

#----------------------------------------------------------------------
# Locally linear embedding of the swiss roll

from sklearn import manifold, datasets
X, color = datasets.make_swiss_roll(n_samples=1500)

print("Computing LLE embedding")
X_r, err = manifold.locally_linear_embedding(X, n_neighbors=12,
                                             n_components=2)
print("Done. Reconstruction error: %g" % err)

#----------------------------------------------------------------------
# Plot result

fig = plt.figure()

ax = fig.add_subplot(211, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)

ax.set_title("Original data")
ax = fig.add_subplot(212)
ax.scatter(X_r[:, 0], X_r[:, 1], c=color, cmap=plt.cm.Spectral)
plt.axis('tight')
plt.xticks([])
plt.yticks([])
plt.title('Projected data')
plt.show()
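The embedding above depends noticeably on n_neighbors (the example uses 12). A quick way to probe this, assuming the same make_swiss_roll data, is to compare reconstruction errors for a few settings; this loop is a sketch, not part of the original example:

from sklearn import manifold, datasets

X, color = datasets.make_swiss_roll(n_samples=1500, random_state=0)

# Too few neighbors fragments the manifold; too many connects points
# across the roll's folds, defeating the local linearity assumption.
for k in (5, 12, 30):
    X_r, err = manifold.locally_linear_embedding(X, n_neighbors=k,
                                                 n_components=2)
    print("n_neighbors=%d  reconstruction error: %g" % (k, err))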

LocallyLinearEmbedding

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.LocallyLinearEmbedding.html#sklearn.manifold.LocallyLinearEmbedding

LLE finds a lower-dimensional projection of the data that preserves distances within local neighborhoods.

It can be thought of as a series of local Principal Component Analyses that are then compared globally to find the best non-linear embedding.

Locally linear embedding (LLE) seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods.

It can be thought of as a series of local Principal Component Analyses which are globally compared to find the best non-linear embedding.

Locally linear embedding can be performed with function locally_linear_embedding or its object-oriented counterpart LocallyLinearEmbedding.

>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = LocallyLinearEmbedding(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)
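Since LocallyLinearEmbedding is a regular fitted estimator, it can also map previously unseen samples with transform, and its method parameter selects LLE variants ('standard', 'hessian', 'modified', 'ltsa'). A brief sketch continuing the digits example above; the choice of method='modified' is illustrative:

from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)

# Fit Modified LLE on the first 100 samples, as in the example above
embedding = LocallyLinearEmbedding(n_components=2, method='modified')
embedding.fit(X[:100])

# transform() places new samples into the already-learned embedding
X_new = embedding.transform(X[100:110])
print(X_new.shape)  # (10, 2)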