An Introduction to the MATLAB Dimensionality Reduction Toolbox (drtoolbox)
1. About the MATLAB drtoolbox
The Matlab Toolbox for Dimensionality Reduction contains Matlab implementations of 38 techniques for dimensionality reduction and metric learning.
Official website:
This Matlab toolbox implements 32 techniques for dimensionality reduction. These techniques are all available through the COMPUTE_MAPPING function or through the GUI. The following techniques are available:
- Principal Component Analysis ('PCA')
- Linear Discriminant Analysis ('LDA')
- Multidimensional scaling ('MDS')
- Probabilistic PCA ('ProbPCA')
- Factor analysis ('FactorAnalysis')
- Sammon mapping ('Sammon')
- Isomap ('Isomap')
- Landmark Isomap ('LandmarkIsomap')
- Locally Linear Embedding ('LLE')
- Laplacian Eigenmaps ('Laplacian')
- Hessian LLE ('HessianLLE')
- Local Tangent Space Alignment ('LTSA')
- Diffusion maps ('DiffusionMaps')
- Kernel PCA ('KernelPCA')
- Generalized Discriminant Analysis ('KernelLDA')
- Stochastic Neighbor Embedding ('SNE')
- Symmetric Stochastic Neighbor Embedding ('SymSNE')
- t-Distributed Stochastic Neighbor Embedding ('tSNE')
- Neighborhood Preserving Embedding ('NPE')
- Linearity Preserving Projection ('LPP')
- Stochastic Proximity Embedding ('SPE')
- Linear Local Tangent Space Alignment ('LLTSA')
- Conformal Eigenmaps ('CCA', implemented as an extension of LLE)
- Maximum Variance Unfolding ('MVU', implemented as an extension of LLE)
- Landmark Maximum Variance Unfolding ('LandmarkMVU')
- Fast Maximum Variance Unfolding ('FastMVU')
- Locally Linear Coordination ('LLC')
- Manifold charting ('ManifoldChart')
- Coordinated Factor Analysis ('CFA')
- Gaussian Process Latent Variable Model ('GPLVM')
- Autoencoders using stack-of-RBMs pretraining ('AutoEncoderRBM')
- Autoencoders using evolutionary optimization ('AutoEncoderEA')
Furthermore, the toolbox contains 6 techniques for intrinsic dimensionality estimation. These techniques are available through the function INTRINSIC_DIM. The following techniques are available:
- Eigenvalue-based estimation ('EigValue')
- Maximum Likelihood Estimator ('MLE')
- Estimator based on correlation dimension ('CorrDim')
- Estimator based on nearest neighbor evaluation ('NearNb')
- Estimator based on packing numbers ('PackingNumbers')
- Estimator based on geodesic minimum spanning tree ('GMST')
In addition to these techniques, the toolbox contains functions for prewhitening of data (the function PREWHITEN), exact and estimated out-of-sample extension (the functions OUT_OF_SAMPLE and OUT_OF_SAMPLE_EST), and a function that generates toy datasets (the function GENERATE_DATA).
The graphical user interface of the toolbox is accessible through the DRGUI function.
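The out-of-sample functions let you reuse a learned mapping on points that were not part of the original data. A minimal sketch of how they fit together (the function names come from the list above, but the exact call signatures are assumptions based on common drtoolbox usage and may differ between versions):

```matlab
% Learn a PCA mapping on a toy dataset generated by the toolbox.
[X, labels] = generate_data('swiss', 1000);
[mappedX, mapping] = compute_mapping(X, 'PCA', 2);

% Exact out-of-sample extension: project new points Y with the
% stored mapping (assumed signature: out_of_sample(points, mapping)).
Y = generate_data('swiss', 100);
mappedY = out_of_sample(Y, mapping);

% For techniques without an exact extension, an estimate based on the
% original data and its embedding is available.
mappedY_est = out_of_sample_est(Y, X, mappedX);
```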
2. Installation
Extract the downloaded drtoolbox package into a directory of your choice, e.g. D:\MATLAB\R2012b\toolbox
Open the file 'D:\MATLAB\R2012b\toolbox\local\pathdef.m', add the toolbox path to it, and save.
Run the rehash toolboxcache command to finish loading the toolbox:
>> rehash toolboxcache
Test the installation:
>> what drtoolbox
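As an alternative to editing pathdef.m by hand, the path can also be added from the MATLAB prompt with standard path commands (the directory below is the install location used above; adjust it to your own):

```matlab
% Add the toolbox and all its subdirectories to the search path,
% persist the change, and refresh the toolbox cache.
addpath(genpath('D:\MATLAB\R2012b\toolbox\drtoolbox'));
savepath;
rehash toolboxcache
```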
3. Toolbox overview
The basic idea of dimensionality reduction is to map sample points from the input space into a low-dimensional space through a linear or nonlinear transformation, thereby obtaining a compact low-dimensional representation of the original dataset.
Basic categories of algorithms:
Linear vs. nonlinear
Linear dimensionality reduction means that the low-dimensional data obtained after the reduction preserves the linear relationships between the high-dimensional data points. Linear methods mainly include PCA, LDA, and LPP (LPP is essentially a linear version of Laplacian Eigenmaps). Nonlinear methods fall into two groups. One is kernel-based, such as KPCA, which is not discussed here. The other is what is usually called manifold learning: recovering a low-dimensional manifold structure from high-dimensional sampled data (assuming the data are sampled uniformly from a low-dimensional manifold embedded in a high-dimensional Euclidean space), i.e., finding the low-dimensional manifold in the high-dimensional space and the corresponding embedding mapping. Nonlinear manifold learning methods include Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU.
Overall, linear methods are fast to compute and have low complexity, but they handle complex data poorly.
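The difference is easy to see on a manifold dataset such as the Swiss roll: a linear method flattens the roll by projecting it, while a manifold method unrolls it. A sketch using the toolbox functions introduced above (two dimensions are chosen here purely for illustration):

```matlab
% Toy manifold dataset from the toolbox.
[X, labels] = generate_data('swiss', 2000);

% Linear: PCA projects onto the directions of maximum variance,
% so the rolled-up structure stays folded in the 2-D result.
mappedPCA = compute_mapping(X, 'PCA', 2);

% Nonlinear manifold learning: Isomap uses geodesic distances along
% the neighborhood graph, which unrolls the manifold.
mappedIso = compute_mapping(X, 'Isomap', 2);
```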
Supervised vs. unsupervised
The main difference between supervised and unsupervised learning is whether the data samples carry class information. Unsupervised dimensionality reduction methods aim to minimize the loss of information during the reduction, e.g. PCA, LPP, Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU; supervised methods aim to maximize the discriminative information between classes, e.g. LDA. In fact, for most unsupervised algorithms there is corresponding research on supervised or semi-supervised variants.
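For the supervised techniques the class labels must be supplied along with the data; in drtoolbox this is conventionally done by placing the labels in the first column of the data matrix (a convention assumed here from common usage of the toolbox; check the Readme of your version):

```matlab
% Two Gaussian classes in 3-D (synthetic example data).
X = [randn(100, 3); randn(100, 3) + 3];
labels = [ones(100, 1); 2 * ones(100, 1)];

% Unsupervised: only the data matrix is needed.
mappedPCA = compute_mapping(X, 'PCA', 2);

% Supervised: the labels ride along in the first column of the input.
% With two classes, LDA yields at most one discriminant dimension.
mappedLDA = compute_mapping([labels X], 'LDA', 1);
```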
Global vs. local
Local methods consider only local information of the sample set, i.e., the relationships between each data point and its neighboring points. LLE is the representative local method; others include Laplacian Eigenmaps, LPP, and LTSA.
Global methods consider not only the local geometric information of the samples but also global information of the sample set, i.e., the relationships between sample points and non-neighboring points. Global algorithms include PCA, LDA, Isomap, and MVU.
Because local methods ignore the relationships between samples that lie far apart on the data manifold, they cannot guarantee that samples far apart on the manifold also end up far apart in the low-dimensional representation.
4. Using the toolbox
The interface functions the toolbox exposes to the user are all located in the same directory as its Readme file, and mainly include the following files:
A usage example:
clc
clear
close all
% Generate a test dataset
[X, labels] = generate_data('helix', 2000);
figure
scatter3(X(:,1), X(:,2), X(:,3), 5, labels)
title('Original dataset')
drawnow
% Estimate the intrinsic dimensionality
no_dims = round(intrinsic_dim(X, 'MLE'));
disp(['MLE estimate of intrinsic dimensionality: ' num2str(no_dims)]);
% Dimensionality reduction with PCA
[mappedX, mapping] = compute_mapping(X, 'PCA', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels)
title('Result of PCA')
% Dimensionality reduction with Laplacian Eigenmaps
[mappedX, mapping] = compute_mapping(X, 'Laplacian', no_dims, 7);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Laplacian Eigenmaps')
drawnow
% Dimensionality reduction with Isomap
[mappedX, mapping] = compute_mapping(X, 'Isomap', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Isomap')
drawnow