PCA (Principal Component Analysis): Dimensionality Reduction, Reconstruction, and Face Recognition

阿新 • Published: 2019-02-17

This post uses MATLAB to apply PCA to dimensionality reduction, reconstruction, and face recognition.

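PCA as I see it:

When data has too many dimensions, processing it is slow and laborious, so a natural question is whether we can work with only a few dimensions and still get nearly the same result as with all of them. Ta-da: that is PCA. Put simply, an image may have 2000 feature dimensions, yet only about 100 directions (or even fewer) account for most of the variation that matters for the result. Strictly speaking, these directions are not 100 of the original pixels but 100 linear combinations of them, the principal components.

E.g., for an emperor, the senior grand secretary > the second > the third > the fourth >> other anonymous officials. So from the emperor's point of view, the cabinet as a whole might supply 60% of the useful governing proposals, the civil officials as a whole (cabinet included) 75%, the military officials 20%, and the common people 5%. In other words, although the common people are numerous, they contribute few proposals, so their suggestions can be selectively ignored. Going further, the military officials could be dropped next, and then the civil officials.

In short, we keep only the feature dimensions with the greatest influence and discard those with too little.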

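The PCA workflow is as follows:

1. Subtract the mean to center the data.

2. Compute the covariance matrix.

3. Compute its eigenvalues and eigenvectors, and keep the eigenvectors with the largest eigenvalues.

4. Project the training set onto the space spanned by those eigenvectors to reduce its dimensionality.

5. To reconstruct, multiply the reduced test data by the transpose of the eigenvector matrix, then add back the mean that was subtracted during centering.

6. Recognition: pick one photo of each person as the registered record, subtract the mean, multiply by the reduction matrix (the selected eigenvectors), and store the resulting coordinates. Then go through the image library and process each photo the same way. Each image is assigned to the person whose registered record has the smallest Euclidean distance to it in the reduced space.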
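
Steps 1 to 5 can be sketched on a tiny made-up matrix before moving to images. This is a minimal illustration, not the face code itself: the data `X`, the choice `k = 2`, and the names `Xc`, `Xr`, `kept`, and `recon_err` are assumptions introduced here.

```matlab
% Hypothetical toy data: 6 samples (rows) with 3 features (columns).
X = [2 0 1; 4 1 1; 6 1 2; 8 2 2; 10 2 3; 12 3 3];

mu = mean(X, 1);                    % step 1: per-feature mean
Xc = bsxfun(@minus, X, mu);         %         centered data
C  = (Xc' * Xc) / size(Xc, 1);      % step 2: covariance matrix (3 x 3)
[U, S, ~] = svd(C);                 % step 3: eigenvectors in the columns of U
E  = diag(S);                       %         eigenvalues, sorted in descending order

k  = 2;                             % number of components kept (arbitrary here)
Uk = U(:, 1:k);
Z  = Xc * Uk;                       % step 4: reduced representation (6 x 2)
Xr = bsxfun(@plus, Z * Uk', mu);    % step 5: reconstruction in the original space

kept      = sum(E(1:k)) / sum(E);   % fraction of the variance retained
recon_err = norm(X - Xr, 'fro');    % Frobenius norm of the reconstruction error
fprintf('variance kept: %.4f, reconstruction error: %.4f\n', kept, recon_err);
```

If all three components were kept, the reconstruction would be exact up to rounding. With `k = 2`, the squared Frobenius error equals the number of samples times the discarded eigenvalue.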

The MATLAB code is as follows:

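    %% Read the images
    clear; close all; clc
    %m = 1680; % number of samples
    file_path = 'C:\Users\zyfls\Desktop\ML\第五章資料降維\資料\AR\AR\'; % image folder
    img_path_list = dir(strcat(file_path, '*.bmp')); % all .bmp images in the folder
    img_num = length(img_path_list); % total number of images
    trainset = zeros(img_num - 10, 50 * 40); % one row per training image, each image is 50 x 40
    for i = 11:img_num % the first ten images are held out for testing; the rest form the training set
        image_name = img_path_list(i).name; % image file name
        img = double(imread(strcat(file_path, image_name)));
        trainset(i - 10, :) = img(:); % flatten the image into one row
    end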
      
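    %% Before training PCA, normalize the features
    mu = mean(trainset); % column-wise mean, i.e. the mean face
    trainset_norm = bsxfun(@minus, trainset, mu); % subtract the mean from every training image

    sigma = std(trainset_norm); % column-wise standard deviation
    sigma(sigma == 0) = 1; % guard against division by zero for constant pixels
    trainset_norm = bsxfun(@rdivide, trainset_norm, sigma); % element-wise division by sigma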
      
    %% Save the mean face mu so we can take a look at it
    imwrite(uint8(reshape(mu, 50, 40)), 'C:\Users\zyfls\Desktop\ML各種截圖\5\亂七八糟PCA\meanface.bmp');
    fprintf('mean face saved.\n');
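    %% Compute the reduction matrix
    X = bsxfun(@minus, trainset, mu); % mean-centered data; a covariance matrix needs centered input
    [m, n] = size(X);

    Cov = (1 / m) * (X' * X); % covariance matrix (n x n)
    % svd returns orthogonal matrices U and V and a diagonal matrix S of the same size as Cov,
    % with Cov = U*S*V'. For an m x n input, U is m x m and V is n x n.
    % The singular values on the diagonal of S are non-negative and sorted in descending order.
    % Cov is symmetric positive semi-definite, so the columns of U are its eigenvectors.
    [U, S, V] = svd(Cov);
    % svd also works on non-square matrices; the commented-out eig code below needs a square matrix
    E = diag(S);
    contribution = cumsum(E) ./ sum(E); % cumulative contribution rate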
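    % [U, D] = eig(Cov);       % eigenvector matrix U and eigenvalue matrix D of Cov, eigenvalues ascending
    % U = (rot90(U))';         % reverse the column order: eigenvectors from largest to smallest eigenvalue
    % D = rot90(rot90(D));     % reorder the eigenvalue matrix from largest to smallest
    % E = diag(D);             % turn the eigenvalue matrix into a vector
    % ratio = 0;               % cumulative contribution rate
    % for k = 1:n
    %     r = E(k) / sum(E);   % contribution rate of the k-th principal component
    %     ratio = ratio + r;   % cumulative contribution rate
    %     if (ratio >= 0.9)    % keep components until the cumulative rate reaches 90%
    %         break;
    %     end
    % end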
    fprintf('compute cov done.\n');
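    % In face applications the eigenvectors in U are called eigenfaces. Each one is a direction
    % of the reduced space, and U can be used to map features into that space.
    %% Show the eigenfaces: the first ten columns of U
    for i = 1:10
        ef = U(:, i);
        img = ef;
        minVal = min(img);
        img = img - minVal; % shift so the minimum is 0
        max_val = max(abs(img));
        img = img / max_val; % scale into [0, 1] so the double image can be saved
        img = reshape(img, 50, 40);
        imwrite(img, strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\亂七八糟PCA\','eigenface', int2str(i), '.bmp'));
    end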
      
    fprintf('eigen face saved, paused.\n');  
    pause;
      
    %% Dimensionality reduction
    k = 100; % reduce to 100 dimensions
    test = zeros(10, 50 * 40);
    file_path = 'C:\Users\zyfls\Desktop\ML\第五章資料降維\資料\AR\AR\'; % image folder
    img_path_list = dir(strcat(file_path, '*.bmp')); % all .bmp images in the folder
    for i = 1:10 % the first ten images form the test set
        image_name = img_path_list(i).name; % image file name
        img = imread(strcat(file_path, image_name));
        %img = imread(strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\', int2str(i), '.bmp'));
        img = double(img);
        test(i, :) = img(:);
    end

    % the test set must be centered with the training mean
    test = bsxfun(@minus, test, mu);

    % reduction
    Uk = U(:, 1:k); % the first k eigenvectors span the reduced space
    Z = test * Uk;
    fprintf('reduce done.\n');
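    %% Reconstruct the test set
    % For the test images we only subtracted the mean face,
    % so the reconstruction only needs the mean face added back.
    Xp = Z * Uk';
    Face_re = zeros(size(Xp));
    % save the reconstructed faces
    for i = 1:10
        face = Xp(i, :);
        %face = face .* sigma;
        face = face + mu;
        Face_re(i, :) = face;
        face = reshape(face, 50, 40);
        imwrite(uint8(face), strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\亂七八糟PCA\','reconstructionface', int2str(i), '.bmp'));
    end
    % test was mean-centered above, so add mu back before comparing it with Face_re
    e = Face_re - bsxfun(@plus, test, mu);
    recon_error = norm(e); % matrix 2-norm of the residual (the name error would shadow a built-in)

    % display the reconstruction error
    display(recon_error); % the original post reported 1.9061e+04 for norm(Face_re - test)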
    %% Reconstruct the training set
    % Project the mean-centered training images onto the same basis Uk as the test set,
    % reconstruct, and add the mean face back. The standardized trainset_norm is not used here
    % because the test images above are only mean-centered, not divided by sigma.
    train_centered = bsxfun(@minus, trainset, mu);
    trainset_re = train_centered * Uk; % reduction
    trainset_re = trainset_re * Uk'; % reconstruction
    for i = 11:25
        train = trainset_re(i, :);
        train = train + mu;
        train = reshape(train, 50, 40);
        imwrite(uint8(train), strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\亂七八糟PCA\', 'reconstruction',int2str(i), 'train.bmp'));
    end

The code above performs the dimensionality reduction and reconstruction.
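
Forming the full 2000 x 2000 covariance matrix is not the only route to the eigenvectors. As a hedged aside, the same principal directions can be read off an economy-size SVD of the centered data matrix itself; the sketch below checks this on a small made-up matrix. The data and the names `Xc`, `V`, `E_cov`, and `E_data` are illustrative assumptions, not part of the original code.

```matlab
% Hypothetical data: 5 samples (rows), 3 features (columns).
X  = [1 2 0; 3 1 4; 5 6 2; 7 5 8; 9 9 6];
m  = size(X, 1);
Xc = bsxfun(@minus, X, mean(X, 1));      % mean-centered data

% Route 1 (as in the post): SVD of the covariance matrix.
C = (Xc' * Xc) / m;
[U, S, ~] = svd(C);
E_cov = diag(S);                         % eigenvalues of C, descending

% Route 2: economy-size SVD of the centered data, Xc = W * D * V'.
[~, D, V] = svd(Xc, 'econ');
E_data = diag(D) .^ 2 / m;               % squared singular values / m

% Both routes give the same eigenvalues, and V diagonalizes C.
eig_gap  = max(abs(E_cov - E_data));
diag_gap = norm(V' * C * V - diag(E_data));
fprintf('eigenvalue gap: %g, diagonalization gap: %g\n', eig_gap, diag_gap);
```

The columns of `U` and `V` may differ in sign, which does not matter for projection followed by reconstruction.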

The resulting mean face (figure):

Eigenfaces (figure):

Reconstructed images (figure):

That completes the dimensionality reduction and reconstruction.
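
The code above fixes `k = 100`, while the commented-out `eig` variant stops at a 90% cumulative contribution rate. As a small sketch of that second rule, the snippet below picks the smallest `k` whose cumulative contribution reaches a threshold. The eigenvalues are made up for illustration, and `threshold` is a name introduced here.

```matlab
% Hypothetical eigenvalues, already sorted in descending order.
E = [8; 4; 2; 1; 0.5; 0.5];

contribution = cumsum(E) ./ sum(E);      % cumulative contribution rate per component count
threshold = 0.90;                        % keep at least 90% of the total variance

% Smallest number of components whose cumulative contribution reaches the threshold.
k = find(contribution >= threshold, 1, 'first');
fprintf('keep k = %d components (%.2f%% of the variance)\n', k, 100 * contribution(k));
```

In the face code, `E = diag(S)` would take the place of the made-up vector, and the resulting `k` would replace the hard-coded 100.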

Recognition:

The recognition code is as follows (it repeats the dimensionality-reduction steps above):

    %% Read the images
    clear; close all; clc
    %m = 1680; % number of samples
    file_path = 'C:\Users\zyfls\Desktop\ML\第五章資料降維\資料\AR\AR\'; % image folder
    img_path_list = dir(strcat(file_path, '*.bmp')); % all .bmp images in the folder
    img_num = length(img_path_list); % total number of images
    trainset = zeros(img_num - floor(img_num / 14), 50 * 40); % each image is 50 x 40
    j = 1;
    for i = 1:img_num % every image goes into the training set ...
        if (mod(i, 14) == 0) % ... except the last of each person's 14 photos, which is held out
            continue;
        end
        image_name = img_path_list(i).name; % image file name
        % name = image_name(1:3);
        % if strcmp(name, '001')
        img = imread(strcat(file_path, image_name));
        img = double(img);
        trainset(j, :) = img(:);
        j = j + 1;
        % end
    end
    %% Before training PCA, normalize the features
    mu = mean(trainset); % column-wise mean, i.e. the mean face
    trainset_norm = bsxfun(@minus, trainset, mu); % subtract the mean from every training image

    sigma = std(trainset_norm); % column-wise standard deviation
    sigma(sigma == 0) = 1; % guard against division by zero for constant pixels
    trainset_norm = bsxfun(@rdivide, trainset_norm, sigma); % element-wise division by sigma

    %% Save the mean face mu so we can take a look at it
    imwrite(uint8(reshape(mu, 50, 40)), 'C:\Users\zyfls\Desktop\ML各種截圖\5\Recognition\meanface.bmp');
    
    
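    %% Compute the reduction matrix
    X = bsxfun(@minus, trainset, mu); % mean-centered data; a covariance matrix needs centered input
    [m, n] = size(X);

    Cov = (1 / m) * (X' * X); % covariance matrix (n x n)
    % svd returns orthogonal matrices U and V and a diagonal matrix S of the same size as Cov,
    % with Cov = U*S*V'. The singular values on the diagonal of S are non-negative and sorted
    % in descending order. svd also works on non-square matrices.
    [U, S, V] = svd(Cov);

    E = diag(S);
    contribution = cumsum(E) ./ sum(E); % cumulative contribution rate
    fprintf('compute cov done.\n');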
   
        
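    % In face applications the eigenvectors in U are called eigenfaces. Each one is a direction
    % of the reduced space, and U can be used to map features into that space.
    %% Show the eigenfaces: the first ten columns of U
    for i = 1:10
        ef = U(:, i);
        img = ef;
        minVal = min(img);
        img = img - minVal; % shift so the minimum is 0
        max_val = max(abs(img));
        img = img / max_val; % scale into [0, 1] so the double image can be saved
        img = reshape(img, 50, 40);
        imwrite(img, strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\Recognition\','eigenface', int2str(i), '.bmp'));
    end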
     
    fprintf('eigen face saved, paused.\n');  
    pause;
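    %% Build the registered set
    j = 1;
    regis = zeros(120, 50 * 40); % 120 people (1680 images, 14 photos each)
    for i = 1:14:img_num % the first photo of each person is the registered record
        image_name = img_path_list(i).name; % image file name
        % name = image_name(1:3);
        % if strcmp(name, '001')
        img = imread(strcat(file_path, image_name));
        img = double(img);
        regis(j, :) = img(:);
        j = j + 1;
        % end
    end
    regis = bsxfun(@minus, regis, mu);
    Uk = U(:, 1:100); % the first 100 eigenvectors span the reduced space
    Zregis = regis * Uk; % coordinates of the registered records in the reduced space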
    
    %% Recognition
    k = 100; % reduce to 100 dimensions
    Uk = U(:, 1:k); % the first k eigenvectors span the reduced space
    file_path = 'C:\Users\zyfls\Desktop\ML\第五章資料降維\資料\AR\AR\'; % image folder
    img_path_list = dir(strcat(file_path, '*.bmp')); % all .bmp images in the folder
    success = 0;
    mdist = zeros(1, size(Zregis, 1));
    % Note: this loop also scores the training and registered images,
    % so suc_rate is not a held-out accuracy.
    for i = 1:img_num % compare every image with the registered records and classify it
        image_name = img_path_list(i).name; % image file name
        img = imread(strcat(file_path, image_name));
        %img = imread(strcat('C:\Users\zyfls\Desktop\ML各種截圖\5\', int2str(i), '.bmp'));
        img = double(img);
        Ztest = (img(:)' - mu) * Uk; % center the image and project it into the reduced space
        for j = 1:size(Zregis, 1)
            mdist(j) = norm(Ztest - Zregis(j, :)); % Euclidean distance to registered record j
        end
        [C, I] = min(mdist); % smallest distance and its index

        I = sprintf('%03d', I); % zero-pad the index to three digits, e.g. 7 -> '007'
        name = image_name(1:3); % the first three characters identify the person
        if strcmp(name, I)
            success = success + 1;
        end
    end
    suc_rate = success / img_num;

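PS: the file names in my image library start with a three-digit ID that identifies the person, which is how the code checks whether a classification is correct.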
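
The matching step can be tried in isolation with a few made-up 2-D feature vectors. In this sketch the registered coordinates `Zregis`, the query coordinates `Ztest`, and the file names are all hypothetical; only the idea (nearest registered record by Euclidean distance, checked against the three-digit file-name prefix) comes from the code above.

```matlab
% Hypothetical reduced features: row p is the registered record of person p.
Zregis = [0 0; 5 5; 10 0];

% Hypothetical query images: file names (three-digit person ID first) and reduced features.
names = {'002-07.bmp', '003-01.bmp', '001-13.bmp', '001-02.bmp'};
Ztest = [4.2 5.5; 9.0 1.0; 0.5 -0.3; 6.0 4.0];

success = 0;
for i = 1:size(Ztest, 1)
    diffs = bsxfun(@minus, Zregis, Ztest(i, :));
    dists = sqrt(sum(diffs .^ 2, 2));     % Euclidean distance to every registered record
    [~, idx] = min(dists);                % index of the nearest registered person
    predicted = sprintf('%03d', idx);     % zero-padded to match the file-name prefix
    if strcmp(names{i}(1:3), predicted)
        success = success + 1;
    end
end
suc_rate = success / size(Ztest, 1);
fprintf('recognized %d of %d queries\n', success, size(Ztest, 1));
```

The last query is deliberately placed closer to person 2 than to its true owner, person 1, so that the sketch shows a misclassification being counted.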

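That completes the whole PCA pipeline of dimensionality reduction, reconstruction, and recognition. The course project is finally done.

The next post will cover recognition with SR (sparse dictionary).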
