Precision-Recall (PR) Curve: MATLAB Implementation
阿新 • Published: 2019-01-01
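When hashing is used for retrieval, the precision-recall curve is commonly used to evaluate its performance quantitatively. The definitions of precision and recall are covered in detail in the post on information retrieval evaluation metrics; this post records how they are actually implemented.
The MATLAB code generally used for precision-recall curves is one of the following two versions: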
function [recall, precision, rate] = recall_precision(Wtrue, Dhat)
%
% Input:
%    Wtrue = true neighbors [Ntest * Ndataset], can be a full matrix NxN
%    Dhat  = estimated distances
%
% Output:
%
%                  exp. # of good pairs inside hamming ball of radius <= (n-1)
%  precision(n) = --------------------------------------------------------------
%                  exp. # of total pairs inside hamming ball of radius <= (n-1)
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  recall(n) = --------------------------------------------------------------
%                          exp. # of total good pairs

max_hamm = max(Dhat(:))
hamm_thresh = min(3,max_hamm);

[Ntest, Ntrain] = size(Wtrue);
total_good_pairs = sum(Wtrue(:));

% find pairs with similar codes
precision = zeros(max_hamm,1);
recall = zeros(max_hamm,1);
rate = zeros(max_hamm,1);

for n = 1:length(precision)
    j = (Dhat<=((n-1)+0.00001));

    %exp. # of good pairs that have exactly the same code
    retrieved_good_pairs = sum(Wtrue(j));

    % exp. # of total pairs that have exactly the same code
    retrieved_pairs = sum(j(:));

    precision(n) = retrieved_good_pairs/retrieved_pairs;
    recall(n)= retrieved_good_pairs/total_good_pairs;
    rate(n) = retrieved_pairs / (Ntest*Ntrain);
end

% The standard measures for IR are recall and precision. Assuming that:
%
%    * RET is the set of all items the system has retrieved for a specific inquiry;
%    * REL is the set of relevant items for a specific inquiry;
%    * RETREL is the set of the retrieved relevant items
%
% then precision and recall measures are obtained as follows:
%
%    precision = RETREL / RET
%    recall    = RETREL / REL

% if nargout == 0 || nargin > 3
%     if isempty(fig);
%         fig = figure;
%     end
%     figure(fig)
%
%     subplot(311)
%     plot(0:hamm_thresh-1, precision(1:hamm_thresh), varargin{:})
%     hold on
%     xlabel('hamming radius')
%     ylabel('precision')
%
%     subplot(312)
%     plot(0:hamm_thresh-1, recall(1:hamm_thresh), varargin{:})
%     hold on
%     xlabel('hamming radius');
%     ylabel('recall');
%
%     subplot(313);
%     plot(recall, precision, varargin{:});
%     hold on;
%     axis([0 1 0 1]);
%     xlabel('recall');
%     ylabel('precision');
%
%     drawnow;
% end
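To make the arithmetic of recall_precision concrete, here is a minimal sketch that runs the same loop inline on a tiny example. The values of Wtrue and Dhat are hypothetical and chosen only for illustration; the loop body mirrors the function above rather than calling it, so the snippet runs on its own.

```matlab
% Toy inputs (hypothetical values, for illustration only).
Wtrue = logical([1 0 1; 0 1 0]);   % 2 queries x 3 database items, true = true neighbor
Dhat  = [0 1 2; 2 0 3];            % Hamming distances for the same pairs

max_hamm = max(Dhat(:));           % 3 for this toy input
total_good_pairs = sum(Wtrue(:));  % 3 true-neighbor pairs in total

precision = zeros(max_hamm, 1);
recall    = zeros(max_hamm, 1);
rate      = zeros(max_hamm, 1);
for n = 1:max_hamm
    j = (Dhat <= ((n-1) + 0.00001));        % pairs within Hamming radius n-1
    retrieved_good_pairs = sum(Wtrue(j));   % retrieved pairs that are true neighbors
    retrieved_pairs = sum(j(:));            % all retrieved pairs
    precision(n) = retrieved_good_pairs / retrieved_pairs;
    recall(n)    = retrieved_good_pairs / total_good_pairs;
    rate(n)      = retrieved_pairs / numel(Wtrue);
end
disp([precision recall rate]);
```

For radii 0, 1 and 2 this gives precision 1, 2/3, 3/5 and recall 2/3, 2/3, 1. Note that the loop stops at radius max_hamm-1, so the pair at distance 3 is never retrieved; here it happens to be a non-neighbor, which is why recall still reaches 1.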
function [score, recall] = evaluation(Wtrue, Dhat, fig, varargin)
%
% Input:
%    Wtrue = true neighbors [Ntest * Ndataset], can be a full matrix NxN
%    Dhat  = estimated distances
%   The next inputs are optional:
%    fig = figure handle
%    options = just like in the plot command
%
% Output:
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  score(n) = --------------------------------------------------------------
%               exp. # of total pairs inside hamming ball of radius <= (n-1)
%
%               exp. # of good pairs inside hamming ball of radius <= (n-1)
%  recall(n) = --------------------------------------------------------------
%                          exp. # of total good pairs

[Ntest, Ntrain] = size(Wtrue);
total_good_pairs = sum(Wtrue(:));

% find pairs with similar codes
score = zeros(20,1);
for n = 1:length(score)
    j = find(Dhat<=((n-1)+0.00001));

    %exp. # of good pairs that have exactly the same code
    retrieved_good_pairs = sum(Wtrue(j));

    % exp. # of total pairs that have exactly the same code
    retrieved_pairs = length(j);

    score(n) = retrieved_good_pairs/retrieved_pairs;
    recall(n)= retrieved_good_pairs/total_good_pairs;
end

% The standard measures for IR are recall and precision. Assuming that:
%
%    * RET is the set of all items the system has retrieved for a specific inquiry;
%    * REL is the set of relevant items for a specific inquiry;
%    * RETREL is the set of the retrieved relevant items
%
% then precision and recall measures are obtained as follows:
%
%    precision = RETREL / RET
%    recall    = RETREL / REL

if nargout == 0 || nargin > 3
    if isempty(fig);
        fig = figure;
    end
    figure(fig)

    subplot(211)
    plot(0:length(score)-1, score, varargin{:})
    hold on
    xlabel('hamming radius')
    ylabel('percent correct (precision)')
    title('percentage of good neighbors inside the hamm ball')

    subplot(212)
    plot(recall, score, varargin{:})
    hold on
    axis([0 1 0 1])
    xlabel('recall')
    ylabel('percent correct (precision)')
    drawnow
end
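It is easy to see that score in the second function is the same quantity as precision in the first. Tracing back to 2008, the year Spectral Hashing (SH) was published, the SH code also plots a precision-recall curve, and it does so in the same way as the second function. The point of checking this history is simply that, when you plot PR curves yourself, it is safer to use the code provided by these experts; code you write from scratch is not necessarily correct.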
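As a quick check of that equivalence, the sketch below computes the value for a single radius both ways: with a logical mask and sum, as in recall_precision, and with find and length, as in evaluation. The inputs and the chosen radius are hypothetical toy values.

```matlab
% Toy inputs (hypothetical values, for illustration only).
Wtrue  = logical([1 0 1; 0 1 0]);
Dhat   = [0 1 2; 2 0 3];
radius = 1;

% recall_precision style: logical mask
mask = (Dhat <= (radius + 0.00001));
precision_mask = sum(Wtrue(mask)) / sum(mask(:));

% evaluation style: linear indices from find
idx = find(Dhat <= (radius + 0.00001));
score_find = sum(Wtrue(idx)) / length(idx);

fprintf('precision (mask) = %.4f, score (find) = %.4f\n', precision_mask, score_find);
```

Both select the same pairs, so the two values agree. The practical difference between the two functions is the range of radii: evaluation always evaluates 20 radii, while recall_precision uses max(Dhat(:)) of them.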
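Next, a few notes on the input arguments. The ground truth used for the precision-recall curve is the set of nearest neighbors of each query sample in the original Euclidean space, so Wtrue can be computed as follows: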
% center, then normalize data
X = X - ones(size(X,1),1)*mean(X);
for i = 1:size(X,1)
    X(i,:) = X(i,:) / norm(X(i,:));
end

rp = randperm(size(X,1));
trIdx = rp(1:trN);
testIdx = rp(trN+1:trN+testN);
Xtr = X(trIdx,:);
Xtst = X(testIdx,:);

D_tst = distMat(Xtst,Xtr);
D_tr = distMat(Xtr);
Dball = sort(D_tr,2);
Dball = mean(Dball(:,50));
WTT = D_tst < Dball;
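The ground-truth code depends on an external distMat helper and on trN and testN being defined elsewhere. The following is a self-contained sketch of the same idea under stated assumptions: the data is a hypothetical 6-point toy set, the train/query split is fixed instead of random, the pairwise Euclidean distance is computed inline (assumed to match what distMat returns), and the neighbor rank k is 2 instead of 50.

```matlab
% Toy data (hypothetical): 4 database points followed by 2 query points.
X = [1 0; 0 1; -1 0; 0 -1; 0.8 0.6; -0.8 -0.6];

% Center, then scale every row to unit length.
X = bsxfun(@minus, X, mean(X, 1));
X = bsxfun(@rdivide, X, sqrt(sum(X.^2, 2)));

Xtr  = X(1:4, :);   % fixed split for reproducibility (the original uses randperm)
Xtst = X(5:6, :);

% Pairwise Euclidean distances, written inline as a stand-in for distMat.
sq    = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
D_tst = sqrt(max(sq(Xtst, Xtr), 0));
D_tr  = sqrt(max(sq(Xtr,  Xtr), 0));

% Neighborhood radius: mean of column k of the sorted training distances.
% Column 1 is each point's distance to itself (0).
k = 2;
Dsorted = sort(D_tr, 2);
Dball   = mean(Dsorted(:, k));

% A database item is a true neighbor of a query if it lies inside that radius.
Wtrue = D_tst < Dball;
disp(Wtrue);
```

With this toy set each query ends up with the two database points closest to it as true neighbors. Using a single global radius, rather than a fixed number of neighbors per query, means the number of true neighbors can differ from query to query.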
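In the ground-truth code, the data is first centered and then normalized. Training samples and test samples (the queries) are then selected, and Wtrue is computed (the variable WTT in the code). Dhat is the Hamming distance between the query samples and the database, which can be computed as follows: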
%get Hamming distance between queries and database
B1 = compactbit(H);
B2 = compactbit(H_query);
Dhamm = hammingDist(B2,B1);
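H holds the codes of the database items; compactbit compresses them so that they are represented as decimal numbers. Likewise, H_query holds the codes of the query samples. Once all of the above has been computed, precision and recall follow directly, and you can simply plot them.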
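compactbit and hammingDist are external helpers that are not shown here. As a rough sketch of what the distance step produces, the snippet below computes Hamming distances directly from unpacked 0/1 codes and then derives precision and recall per radius. The codes and Wtrue are hypothetical toy values, and unlike recall_precision this sketch also evaluates the largest radius, max(Dhamm(:)).

```matlab
% Toy binary codes (hypothetical): one row per item, one column per bit.
H       = [0 0 0 0; 0 0 1 1; 1 1 1 1];   % 3 database codes, 4 bits each
H_query = [0 0 0 0; 1 1 1 0];            % 2 query codes
Wtrue   = logical([1 1 0; 0 0 1]);       % toy ground truth, queries x database

% Hamming distance = number of positions where the bits differ.
Dhamm = H_query * (1 - H)' + (1 - H_query) * H';

% Precision and recall for every radius 0..max distance.
radii     = 0:max(Dhamm(:));
precision = arrayfun(@(r) sum(Wtrue(Dhamm <= r)) / nnz(Dhamm <= r), radii);
recall    = arrayfun(@(r) sum(Wtrue(Dhamm <= r)) / nnz(Wtrue), radii);

disp(Dhamm);
disp([radii; precision; recall]);
```

Calling plot(recall, precision, '-o') followed by xlabel('recall') and ylabel('precision') then draws the curve. If no pair falls inside the smallest radius, the precision at that radius is 0/0 and shows up as NaN, which plot skips.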