Evaluating captions on the MSCOCO dataset (caption_evaluation)
II. Eval demos / code
The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes several commonly used metrics, including BLEU, METEOR, ROUGE-L, and CIDEr (the official writeup contains references and descriptions of each metric).
1. coco
Code linked from the official site (http://cocodataset.org/#download): https://github.com/cocodataset/cocoapi
The COCO evaluation code it contains is installed together with cocoapi itself.
However, this COCOeval only supports the keypoints and instances (segm/bbox) tasks; it cannot be used for captions.
demo.py
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np

annType = ['segm', 'bbox', 'keypoints']
annType = annType[1]  # specify type here
prefix = 'person_keypoints' if annType == 'keypoints' else 'instances'
print ('Running demo for *%s* results.' % (annType))

# initialize COCO ground truth api
dataDir = './coco'
dataType = 'val2014'
annFile = '%s/annotations/%s_%s.json' % (dataDir, prefix, dataType)
cocoGt = COCO(annFile)

# initialize COCO detections api
resFile = './coco/results/%s_%s_fakecap_results.json'
resFile = resFile % (prefix, dataType)
cocoDt = cocoGt.loadRes(resFile)

imgIds = sorted(cocoGt.getImgIds())
imgIds = imgIds[0:100]
imgId = imgIds[np.random.randint(100)]

# running evaluation
cocoEval = COCOeval(cocoGt, cocoDt, annType)
cocoEval.params.imgIds = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
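For reference, the resFile that cocoGt.loadRes expects for bbox results is a plain JSON list with one record per detection, following the results format documented at http://cocodataset.org/#format-results. A minimal sketch (the image_id, category_id, and scores below are invented for illustration):

```python
import json

# Each record describes one detection for one image.
fake_results = [
    {
        "image_id": 42,           # must exist in the ground-truth annotation file
        "category_id": 18,        # COCO category id
        "bbox": [258.15, 41.29, 348.26, 243.78],  # [x, y, width, height]
        "score": 0.93,            # detector confidence
    },
]

with open("fake_bbox_results.json", "w") as f:
    json.dump(fake_results, f)
```

cocoGt.loadRes("fake_bbox_results.json") would then accept a file like this, provided every image_id actually occurs in the ground-truth set.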
2. coco-caption
Code linked from the official site (http://cocodataset.org/#captions-eval): https://github.com/tylin/coco-caption
It contains COCO's dedicated caption evaluation code:
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap
dataDir = '.'
dataType = 'val2014'
algName = 'fakecap'
annFile = '%s/annotations/captions_%s.json' % (dataDir, dataType)
subtypes = ['results', 'evalImgs', 'eval']
[resFile, evalImgsFile, evalFile] = \
    ['%s/results/captions_%s_%s_%s.json' % (dataDir, dataType, algName, subtype)
     for subtype in subtypes]

coco = COCO(annFile)
cocoRes = coco.loadRes(resFile)

cocoEval = COCOEvalCap(coco, cocoRes)
cocoEval.evaluate()

# print output evaluation scores
for metric, score in cocoEval.eval.items():
    print '%s: %.3f' % (metric, score)

# demo how to use evalImgs to retrieve low score results
evals = [eva for eva in cocoEval.evalImgs if eva['CIDEr'] < 30]
print 'ground truth captions'
imgId = evals[0]['image_id']
annIds = coco.getAnnIds(imgIds=imgId)
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

print '\ngenerated caption (CIDEr score %0.1f)' % (evals[0]['CIDEr'])
annIds = cocoRes.getAnnIds(imgIds=imgId)
anns = cocoRes.loadAnns(annIds)
coco.showAnns(anns)

# img = coco.loadImgs(imgId)[0]
# I = io.imread('%s/images/%s/%s'%(dataDir,dataType,img['file_name']))
# plt.imshow(I)
# plt.axis('off')
# plt.show()

# plot score histogram
# ciderScores = [eva['CIDEr'] for eva in cocoEval.evalImgs]
# plt.hist(ciderScores)
# plt.title('Histogram of CIDEr Scores', fontsize=20)
# plt.xlabel('CIDEr score', fontsize=20)
# plt.ylabel('result counts', fontsize=20)
# plt.show()

# save evaluation results to ./results folder
import json
json.dump(cocoEval.evalImgs, open(evalImgsFile, 'w'))
json.dump(cocoEval.eval, open(evalFile, 'w'))
Notes:
1. If you cannot access the official site from behind the firewall, the same content is mirrored at the end of this post: http://blog.csdn.net/ccbrid/article/details/79368639
2. Reference format for the result file: https://github.com/tylin/coco-caption/blob/master/results/captions_val2014_fakecap_results.json
3. coco-caption is used directly as a package; no installation is needed.
4. Note that it requires Python 2.7. You can forcibly port it to Python 3 syntax, but many errors will appear.
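The caption results file mentioned in note 2 is simply a JSON list of {image_id, caption} pairs, one per image. A minimal sketch (the ids and captions below are invented for illustration):

```python
import json

# One entry per image: the id of the image and the generated caption.
results = [
    {"image_id": 42, "caption": "a black and white photo of a street"},
    {"image_id": 73, "caption": "a group of people standing on a beach"},
]

with open("captions_val2014_mymodel_results.json", "w") as f:
    json.dump(results, f)
```

coco.loadRes() would then accept a file like this, as long as every image_id exists in the annotation file passed to COCO().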
III. Evaluation metrics
eval {
    "BLEU_1"  : float,   # BLEU is commonly used for machine translation
    "BLEU_2"  : float,
    "BLEU_3"  : float,
    "BLEU_4"  : float,
    "METEOR"  : float,
    "ROUGE_L" : float,   # commonly used for summarization
    "CIDEr"   : float,
}
IV. Caveats
When using the 2014 image-captioning data, the test set (used for the competition) comes without reference captions; according to the official paper, this is to prevent overfitting.
So you can make your own split (train:val:test = 8:1:1 is common): from the original 82783 train and 40531 val images, split into train 82783 / val 20266 / test 20265.
V. Other references
http://blog.csdn.net/u014734886/article/details/78831884
http://blog.csdn.net/u014734886/article/details/78837961
Posts 1 and 2 above are from the same blog // although that author did not actually write a post about captioning.
3. Another related evaluation toolkit, which wraps cocoeval but has different input-format requirements: https://github.com/Maluuba/nlg-eval