Precision-Recall (PR) Curve: MATLAB Implementation

阿新 • Published: 2019-01-01

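When hashing is used for retrieval, the precision-recall curve is a common way to evaluate performance quantitatively. The definitions of precision and recall were covered in detail in the post on information retrieval evaluation metrics; this post records how they are actually implemented.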

The MATLAB code generally used for the precision-recall curve is the following version:

function [recall, precision, rate] = recall_precision(Wtrue, Dhat)
%
% Input:
%    Wtrue = true neighbors [Ntest * Ndataset], can be a full matrix NxN
%    Dhat  = estimated distances
%
% Output:
%
%                  exp. # of good pairs inside hamming ball of radius <= (n-1)
%  precision(n) = --------------------------------------------------------------
%                  exp. # of total pairs inside hamming ball of radius <= (n-1)
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  recall(n) = --------------------------------------------------------------
%                          exp. # of total good pairs

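max_hamm = max(Dhat(:));          % largest distance; the semicolon suppresses console output
hamm_thresh = min(3,max_hamm);    % only used by the commented-out plotting code below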

[Ntest, Ntrain] = size(Wtrue);
total_good_pairs = sum(Wtrue(:));

% find pairs with similar codes
precision = zeros(max_hamm,1);
recall = zeros(max_hamm,1);
rate = zeros(max_hamm,1);

for n = 1:length(precision)
    j = (Dhat<=((n-1)+0.00001));

    % exp. # of good pairs inside hamming ball of radius <= (n-1)
    retrieved_good_pairs = sum(Wtrue(j));

    % exp. # of total pairs inside hamming ball of radius <= (n-1)
    retrieved_pairs = sum(j(:));

    precision(n) = retrieved_good_pairs/retrieved_pairs;
    recall(n)= retrieved_good_pairs/total_good_pairs;
    rate(n) = retrieved_pairs / (Ntest*Ntrain);
end

% The standard measures for IR are recall and precision. Assuming that:
%
%    * RET is the set of all items the system has retrieved for a specific inquiry;
%    * REL is the set of relevant items for a specific inquiry;
%    * RETREL is the set of the retrieved relevant items
%
% then precision and recall measures are obtained as follows:
%
%    precision = RETREL / RET
%    recall = RETREL / REL

% if nargout == 0 || nargin > 3
%     if isempty(fig);
%         fig = figure;
%     end
%     figure(fig)
%
%     subplot(311)
%     plot(0:hamm_thresh-1, precision(1:hamm_thresh), varargin{:})
%     hold on
%     xlabel('hamming radius')
%     ylabel('precision')
%
%     subplot(312)
%     plot(0:hamm_thresh-1, recall(1:hamm_thresh), varargin{:})
%     hold on
%     xlabel('hamming radius');
%     ylabel('recall');
%
%    subplot(313);
%     plot(recall, precision, varargin{:});
%     hold on;
%     axis([0 1 0 1]);
%     xlabel('recall');
%     ylabel('precision');
%
%     drawnow;
% end

The second version, evaluation, is as follows:

function [score, recall] = evaluation(Wtrue, Dhat, fig, varargin)
%
% Input:
%    Wtrue = true neighbors [Ntest * Ndataset], can be a full matrix NxN
%    Dhat  = estimated distances
%   The next inputs are optional:
%    fig = figure handle
%    options = just like in the plot command
%
% Output:
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  score(n) = --------------------------------------------------------------
%               exp. # of total pairs inside hamming ball of radius <= (n-1)
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  recall(n) = --------------------------------------------------------------
%                          exp. # of total good pairs

[Ntest, Ntrain] = size(Wtrue);
total_good_pairs = sum(Wtrue(:));

% find pairs with similar codes
score = zeros(20,1);
recall = zeros(size(score));   % preallocated as a column (the original grew it inside the loop)
for n = 1:length(score)
    j = find(Dhat<=((n-1)+0.00001));

    % exp. # of good pairs inside hamming ball of radius <= (n-1)
    retrieved_good_pairs = sum(Wtrue(j));

    % exp. # of total pairs inside hamming ball of radius <= (n-1)
    retrieved_pairs = length(j);

    score(n) = retrieved_good_pairs/retrieved_pairs;
    recall(n)= retrieved_good_pairs/total_good_pairs;
end

% The standard measures for IR are recall and precision. Assuming that:
%
%    * RET is the set of all items the system has retrieved for a specific inquiry;
%    * REL is the set of relevant items for a specific inquiry;
%    * RETREL is the set of the retrieved relevant items
%
% then precision and recall measures are obtained as follows:
%
%    precision = RETREL / RET
%    recall = RETREL / REL

if nargout == 0 || nargin > 3
    if nargin < 3 || isempty(fig)   % fig may be omitted entirely when nargout == 0
        fig = figure;
    end
    figure(fig)
    subplot(211)
    plot(0:length(score)-1, score, varargin{:})
    hold on
    xlabel('hamming radius')
    ylabel('percent correct (precision)')
    title('percentage of good neighbors inside the hamm ball')
    subplot(212)
    plot(recall, score, varargin{:})
    hold on
    axis([0 1 0 1])
    xlabel('recall')
    ylabel('percent correct (precision)')
    drawnow
end

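It is easy to see that score in the second function is the same quantity as precision in the first. Going back to 2008, the year Spectral Hashing (SH) was published, the SH code also plots a precision-recall curve, and it does so in the same way as the second version. The point of tracing this history is simply that, when plotting your own PR curves, it is safer to use the code provided by these experts; code you write from scratch is not guaranteed to be correct.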
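
As a quick sanity check of what these functions compute, the following minimal sketch repeats the loop of recall_precision on a toy input. The 2-by-3 matrices Wtrue and Dhat hold made-up values for illustration only; they are not data from the original post.

```matlab
% Toy ground truth (hypothetical values): 2 queries x 3 database items
Wtrue = logical([1 0 1; 0 1 0]);
% Toy Hamming distances for the same query/item pairs (hypothetical values)
Dhat = [0 2 1; 3 0 2];

max_hamm = max(Dhat(:));
total_good_pairs = sum(Wtrue(:));
precision = zeros(max_hamm, 1);
recall = zeros(max_hamm, 1);

for n = 1:max_hamm
    j = (Dhat <= (n-1) + 0.00001);          % pairs within Hamming radius n-1
    retrieved_good_pairs = sum(Wtrue(j));   % retrieved pairs that are true neighbors
    retrieved_pairs = sum(j(:));            % all retrieved pairs
    precision(n) = retrieved_good_pairs / retrieved_pairs;
    recall(n) = retrieved_good_pairs / total_good_pairs;
end

disp([precision recall])
```

For radii 0, 1 and 2 this gives precision 1, 1, 0.6 and recall 2/3, 1, 1. Note that the loop only covers radii 0 to max_hamm-1, so the largest radius is never evaluated.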

Next, let's go over the inputs needed to plot the precision-recall curve. The ground truth is the set of neighbors of each query sample in the original Euclidean space, so Wtrue can be computed as follows:

%center, then normalize data
X = X - ones(size(X,1),1)*mean(X);
for i = 1:size(X,1)
    X(i,:) = X(i,:) / norm(X(i,:));
end

rp = randperm(size(X,1));
trIdx = rp(1:trN);
testIdx = rp(trN+1:trN+testN);
Xtr = X(trIdx,:);
Xtst = X(testIdx,:);

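D_tst = distMat(Xtst,Xtr);     % distances from each test sample to each training sample
D_tr = distMat(Xtr);           % pairwise distances among the training samples
Dball = sort(D_tr,2);          % sort each row in ascending order
Dball = mean(Dball(:,50));     % average 50th-smallest distance per row: the neighborhood radius
WTT = D_tst < Dball;           % Wtrue: test/training pairs closer than that radius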
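
distMat is a helper that is not defined in this post. The sketch below assumes it returns pairwise Euclidean distances and swaps in a hand-written stand-in, pairDist, so the steps can be run on their own. The random data, trN, testN and pairDist are all hypothetical choices made for illustration.

```matlab
% Hypothetical toy data: 200 random points in 8 dimensions
X = randn(200, 8);
trN = 150;     % number of training (database) samples, assumed
testN = 50;    % number of test (query) samples, assumed

% Center the data, then scale every row to unit length
X = bsxfun(@minus, X, mean(X, 1));
X = bsxfun(@rdivide, X, sqrt(sum(X.^2, 2)));

rp = randperm(size(X, 1));
Xtr = X(rp(1:trN), :);
Xtst = X(rp(trN+1:trN+testN), :);

% Stand-in for distMat (assumed to return pairwise Euclidean distances)
pairDist = @(A, B) sqrt(max(bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2*(A*B'), 0));

D_tst = pairDist(Xtst, Xtr);
D_tr = pairDist(Xtr, Xtr);

% Average of the 50th-smallest distance in each training row
Dsorted = sort(D_tr, 2);
Dball = mean(Dsorted(:, 50));

% A training sample is a true neighbor of a query if it lies inside that radius
WTT = D_tst < Dball;
```

Because each row of D_tr contains the sample's zero distance to itself, column 50 of the sorted matrix corresponds to the 49th nearest other sample. The threshold of 50 is taken from the snippet above and would need adjusting for very small training sets.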

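In the snippet that computes WTT, the data is first centered and then normalized. The training samples and test (query) samples are then selected, and Wtrue is computed (the variable WTT). Dhat is the Hamming distance between the query samples and the database, which can be computed as follows: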

%get Hamming distance between queries and database
B1 = compactbit(H);
B2 = compactbit(H_query);
Dhamm = hammingDist(B2,B1);

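H holds the binary codes of the database items; compactbit packs them into a compact form stored as decimal integers. H_query likewise holds the codes of the query samples. Once all of these quantities have been computed, precision and recall follow directly, and plotting them gives the curve.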
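
compactbit and hammingDist are helper functions that are not defined in this post. If they are not at hand, the same Hamming distances can be computed directly on unpacked 0/1 codes, as in the following sketch. The toy matrices H and H_query hold made-up values, and this is a plain matrix-product alternative, not the packed implementation used above.

```matlab
% Toy binary codes (hypothetical): one row per item, one column per bit
H = [0 0 0 0; 1 1 0 0; 1 1 1 1];      % database codes
H_query = [1 0 0 0; 1 1 1 0];         % query codes

% Two bits differ when one code has a 1 where the other has a 0,
% so count both cases with two matrix products
Dhamm = H_query * (1 - H)' + (1 - H_query) * H';

disp(Dhamm)
```

Dhamm has one row per query and one column per database item, matching the [Ntest * Ndataset] layout documented for Wtrue, so the two matrices can be passed together to recall_precision and the result drawn with plot(recall, precision).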
