An Introduction to the MATLAB Dimensionality Reduction Toolbox (drtoolbox)
1. About the MATLAB drtoolbox
The Matlab Toolbox for Dimensionality Reduction contains Matlab implementations of 38 techniques for dimensionality reduction and metric learning.
Official website:
This Matlab toolbox implements 32 techniques for dimensionality reduction. These techniques are all available through the COMPUTE_MAPPING function or through the GUI. The following techniques are available:
- Principal Component Analysis ('PCA')
- Linear Discriminant Analysis ('LDA')
- Multidimensional scaling ('MDS')
- Probabilistic PCA ('ProbPCA')
- Factor analysis ('FactorAnalysis')
- Sammon mapping ('Sammon')
- Isomap ('Isomap')
- Landmark Isomap ('LandmarkIsomap')
- Locally Linear Embedding ('LLE')
- Laplacian Eigenmaps ('Laplacian')
- Hessian LLE ('HessianLLE')
- Local Tangent Space Alignment ('LTSA')
- Diffusion maps ('DiffusionMaps')
- Kernel PCA ('KernelPCA')
- Generalized Discriminant Analysis ('KernelLDA')
- Stochastic Neighbor Embedding ('SNE')
- Symmetric Stochastic Neighbor Embedding ('SymSNE')
- t-Distributed Stochastic Neighbor Embedding ('tSNE')
- Neighborhood Preserving Embedding ('NPE')
- Linearity Preserving Projection ('LPP')
- Stochastic Proximity Embedding ('SPE')
- Linear Local Tangent Space Alignment ('LLTSA')
- Conformal Eigenmaps ('CCA', implemented as an extension of LLE)
- Maximum Variance Unfolding ('MVU', implemented as an extension of LLE)
- Landmark Maximum Variance Unfolding ('LandmarkMVU')
- Fast Maximum Variance Unfolding ('FastMVU')
- Locally Linear Coordination ('LLC')
- Manifold charting ('ManifoldChart')
- Coordinated Factor Analysis ('CFA')
- Gaussian Process Latent Variable Model ('GPLVM')
- Autoencoders using stack-of-RBMs pretraining ('AutoEncoderRBM')
- Autoencoders using evolutionary optimization ('AutoEncoderEA')
Furthermore, the toolbox contains 6 techniques for intrinsic dimensionality estimation. These techniques are available through the function INTRINSIC_DIM. The following techniques are available:
- Eigenvalue-based estimation ('EigValue')
- Maximum Likelihood Estimator ('MLE')
- Estimator based on correlation dimension ('CorrDim')
- Estimator based on nearest neighbor evaluation ('NearNb')
- Estimator based on packing numbers ('PackingNumbers')
- Estimator based on geodesic minimum spanning tree ('GMST')
In addition to these techniques, the toolbox contains functions for prewhitening of data (the function PREWHITEN), exact and estimated out-of-sample extension (the functions OUT_OF_SAMPLE and OUT_OF_SAMPLE_EST), and a function that generates toy datasets (the function GENERATE_DATA).
The graphical user interface of the toolbox is accessible through the DRGUI function.
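The out-of-sample functions let you reuse a learned mapping on points that were not part of the original data. A minimal sketch of how they fit together (the function names come from the list above, but the exact call signatures are assumptions based on common drtoolbox usage and may differ between versions):

```matlab
% Learn a PCA mapping on a toy dataset generated by the toolbox.
[X, labels] = generate_data('swiss', 1000);
[mappedX, mapping] = compute_mapping(X, 'PCA', 2);

% Exact out-of-sample extension: project new points Y with the
% stored mapping (assumed signature: out_of_sample(points, mapping)).
Y = generate_data('swiss', 100);
mappedY = out_of_sample(Y, mapping);

% For techniques without an exact extension, an estimate based on the
% original data and its embedding is available.
mappedY_est = out_of_sample_est(Y, X, mappedX);
```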
2. Installation
Extract the downloaded drtoolbox package into a directory of your choice, e.g. D:\MATLAB\R2012b\toolbox
Open the file 'D:\MATLAB\R2012b\toolbox\local\pathdef.m', add the toolbox path to it, and save.
Run the rehash toolboxcache command to finish loading the toolbox:
>> rehash toolboxcache
Test the installation:
>> what drtoolbox
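As an alternative to editing pathdef.m by hand, the path can also be added from the MATLAB prompt with standard path commands (the directory below is the install location used above; adjust it to your own):

```matlab
% Add the toolbox and all its subdirectories to the search path,
% persist the change, and refresh the toolbox cache.
addpath(genpath('D:\MATLAB\R2012b\toolbox\drtoolbox'));
savepath;
rehash toolboxcache
```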
3. Toolbox overview
The basic idea of dimensionality reduction is to map sample points from the input space into a low-dimensional space through a linear or nonlinear transformation, thereby obtaining a compact low-dimensional representation of the original dataset.
Basic categories of algorithms:
Linear vs. nonlinear
Linear dimensionality reduction means that the low-dimensional data obtained after the reduction preserves the linear relationships between the high-dimensional data points. Linear methods mainly include PCA, LDA, and LPP (LPP is essentially a linear version of Laplacian Eigenmaps). Nonlinear methods fall into two groups. One is kernel-based, such as KPCA, which is not discussed here. The other is what is usually called manifold learning: recovering a low-dimensional manifold structure from high-dimensional sampled data (assuming the data are sampled uniformly from a low-dimensional manifold embedded in a high-dimensional Euclidean space), i.e., finding the low-dimensional manifold in the high-dimensional space and the corresponding embedding mapping. Nonlinear manifold learning methods include Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU.
Overall, linear methods are fast to compute and have low complexity, but they handle complex data poorly.
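The difference is easy to see on a manifold dataset such as the Swiss roll: a linear method flattens the roll by projecting it, while a manifold method unrolls it. A sketch using the toolbox functions introduced above (two dimensions are chosen here purely for illustration):

```matlab
% Toy manifold dataset from the toolbox.
[X, labels] = generate_data('swiss', 2000);

% Linear: PCA projects onto the directions of maximum variance,
% so the rolled-up structure stays folded in the 2-D result.
mappedPCA = compute_mapping(X, 'PCA', 2);

% Nonlinear manifold learning: Isomap uses geodesic distances along
% the neighborhood graph, which unrolls the manifold.
mappedIso = compute_mapping(X, 'Isomap', 2);
```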
Supervised vs. unsupervised
The main difference between supervised and unsupervised learning is whether the data samples carry class information. Unsupervised dimensionality reduction methods aim to minimize the loss of information during the reduction, e.g. PCA, LPP, Isomap, LLE, Laplacian Eigenmaps, LTSA, and MVU; supervised methods aim to maximize the discriminative information between classes, e.g. LDA. In fact, for most unsupervised algorithms there is corresponding research on supervised or semi-supervised variants.
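For the supervised techniques the class labels must be supplied along with the data; in drtoolbox this is conventionally done by placing the labels in the first column of the data matrix (a convention assumed here from common usage of the toolbox; check the Readme of your version):

```matlab
% Two Gaussian classes in 3-D (synthetic example data).
X = [randn(100, 3); randn(100, 3) + 3];
labels = [ones(100, 1); 2 * ones(100, 1)];

% Unsupervised: only the data matrix is needed.
mappedPCA = compute_mapping(X, 'PCA', 2);

% Supervised: the labels ride along in the first column of the input.
% With two classes, LDA yields at most one discriminant dimension.
mappedLDA = compute_mapping([labels X], 'LDA', 1);
```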
Global vs. local
Local methods consider only local information of the sample set, i.e., the relationships between each data point and its neighboring points. LLE is the representative local method; others include Laplacian Eigenmaps, LPP, and LTSA.
Global methods consider not only the local geometric information of the samples but also global information of the sample set, i.e., the relationships between sample points and non-neighboring points. Global algorithms include PCA, LDA, Isomap, and MVU.
Because local methods ignore the relationships between samples that lie far apart on the data manifold, they cannot guarantee that samples far apart on the manifold also end up far apart in the low-dimensional representation.
4. Using the toolbox
The interface functions the toolbox exposes to the user are all located in the same directory as its Readme file, and mainly include the following files:
A usage example:
clc
clear
close all
% Generate a test dataset
[X, labels] = generate_data('helix', 2000);
figure
scatter3(X(:,1), X(:,2), X(:,3), 5, labels)
title('Original dataset')
drawnow
% Estimate the intrinsic dimensionality
no_dims = round(intrinsic_dim(X, 'MLE'));
disp(['MLE estimate of intrinsic dimensionality: ' num2str(no_dims)]);
% Dimensionality reduction with PCA
[mappedX, mapping] = compute_mapping(X, 'PCA', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels)
title('Result of PCA')
% Dimensionality reduction with Laplacian Eigenmaps
[mappedX, mapping] = compute_mapping(X, 'Laplacian', no_dims, 7);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Laplacian Eigenmaps')
drawnow
% Dimensionality reduction with Isomap
[mappedX, mapping] = compute_mapping(X, 'Isomap', no_dims);
figure
scatter(mappedX(:,1), mappedX(:,2), 5, labels(mapping.conn_comp))
title('Result of Isomap')
drawnow