In the YOLOv2 paper, the authors introduce Dimension Clusters. The purpose of this clustering is to find good priors for the anchors (called prior boxes below).

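What is a prior box? Put simply, the authors ran into a problem in YOLOv1: experiments showed that predicting two boxes was optimal, but how should the sizes of those two boxes be chosen? The network can learn to adjust box sizes on its own, but wouldn't it be better to hand it one or more candidate sizes in advance? So the authors decided to run k-means clustering on the training set bounding boxes to find the prior boxes (that is, the box dimensions).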

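Standard k-means uses Euclidean distance, but with that metric larger boxes generate more error than smaller boxes. What we want are priors that lead to higher IOU, and measuring with Euclidean distance can bias the result toward large boxes (a "large-box advantage"). So the authors use the following distance instead:

d(box, centroid) = 1 - IOU(box, centroid)

We want the distance to be as small as possible (that is, the IOU as large as possible), so 1 - IOU is used as the distance when assigning boxes to clusters.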
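A minimal sketch of k-means with this distance, in plain Python. The names `iou_wh`, `kmeans_iou` and `avg_iou`, the random initialisation, the mean-based centroid update and the toy boxes are all assumptions made for illustration, not the authors' code. Boxes are compared by width and height only, as if they shared the same top-left corner.

```python
import random

def iou_wh(box, centroid):
    """IOU of two (w, h) boxes, assuming both share the same top-left corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    union = w * h + cw * ch - inter
    return inter / union

def kmeans_iou(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using d = 1 - IOU as the distance (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # assumption: random boxes as initial centroids
    for _ in range(iters):
        # Assignment step: the nearest centroid is the one with the smallest 1 - IOU.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = min(range(k), key=lambda j: 1.0 - iou_wh(b, centroids[j]))
            clusters[best].append(b)
        # Update step: mean width and height per cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centroids.append((mean_w, mean_h))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids

def avg_iou(boxes, centroids):
    """Mean over all boxes of the IOU with their best-matching centroid."""
    return sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)

# Made-up (w, h) pairs, normalised to the image size, purely for demonstration.
boxes = [(0.10, 0.12), (0.12, 0.10), (0.11, 0.11),
         (0.50, 0.30), (0.55, 0.32), (0.52, 0.28),
         (0.20, 0.45), (0.22, 0.50), (0.18, 0.48)]
anchors = kmeans_iou(boxes, k=3)
print("anchors:", anchors)
print("average IOU:", avg_iou(boxes, anchors))
```

The average IOU printed at the end is a convenient way to compare different values of k or different distance metrics on the same set of boxes.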



I tested PaulChongPeng's implementation on the VOC2007+2012 training set (4/46/2018), with the following results:


Another poster shared a standard k-means approach (which is actually not correct here), since it uses Euclidean distance rather than the "IOU distance":

# I wrote up a couple quick scripts to help with this: gen_boxes.sh and cluster_boxes.py.
# They operate within your directory of some_image_name.txt label files.
# Usage example is shown below:

~/data/labels$ ls *.txt | head -5

~/data/labels$ head 00RaKqC3eqjWCHQMIPKaeNMsdivO83GL.txt
0 0.502333 0.549333 0.144667 0.137333
~/data/labels$ cat gen_boxes.sh
cat *.txt | cut -d' ' -f 4,5 | sed 's/\([^ ]*\) \(.*\)/\1,\2/g' > boxes.csv
~/data/labels$ bash gen_boxes.sh
~/data/labels$ cat cluster_boxes.py
from sklearn.cluster import KMeans
import numpy as np

data = np.genfromtxt('boxes.csv', delimiter=',')
print("Example of data:")
print(data[0:10])
print("")

kmeans = KMeans(n_clusters=5, random_state=0).fit(data)
print("Cluster centers:")
print(kmeans.cluster_centers_)
print("")

print("Scaled to [0, 13]:")
print(kmeans.cluster_centers_ * 13)
print("")

print("In Darknet config format:")
def coords(x):
    return "%f,%f" % (x[0], x[1])
print("anchors= %s" % " ".join([coords(center) for center in kmeans.cluster_centers_ * 13]))
~/data/labels$ python cluster_boxes.py
Example of data:
[[ 0.144667 0.137333]
 [ 0.135333 0.240667]
 [ 0.145 0.146667]
 [ 0.547 0.306667]
 [ 0.4 0.241667]
 [ 0.137 0.145 ]
 [ 0.643 0.356667]
 [ 0.147 0.086667]
 [ 0.123 0.112 ]
 [ 0.202 0.265 ]]

Cluster centers:
[[ 0.1377161 0.13268718]
 [ 0.28492789 0.18958423]
 [ 0.0663724 0.05359964]
 [ 0.48530697 0.496173 ]
 [ 0.18765588 0.27052479]]

Scaled to [0, 13]:
[[ 1.79030931 1.72493339]
 [ 3.70406262 2.46459499]
 [ 0.86284123 0.6967953 ]
 [ 6.30899063 6.450249 ]
 [ 2.43952643 3.51682231]]

# In Darknet config format:
# anchors= 1.790309,1.724933 3.704063,2.464595 0.862841,0.696795 6.308991,6.450249 2.439526,3.516822
# You can then copy that "anchors= ..." line in place of the existing one in your yolo-whatever.cfg file.
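To see why the Euclidean metric used by the script above behaves differently from the IOU distance, here is a small comparison in plain Python. The two box/centroid pairs are made-up values chosen so that the centroid is 20% larger than the box in both cases. The helper names `iou_wh` and `euclidean` are hypothetical.

```python
import math

def iou_wh(a, b):
    """IOU of two (w, h) boxes, assuming both share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def euclidean(a, b):
    """Plain Euclidean distance between two (w, h) pairs, as standard k-means uses."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Illustrative values only: each centroid is 1.2x its box in both dimensions.
small_box, small_centroid = (0.05, 0.05), (0.06, 0.06)
large_box, large_centroid = (0.50, 0.50), (0.60, 0.60)

small_euclid = euclidean(small_box, small_centroid)
large_euclid = euclidean(large_box, large_centroid)
small_iou_dist = 1.0 - iou_wh(small_box, small_centroid)
large_iou_dist = 1.0 - iou_wh(large_box, large_centroid)

print("Euclidean:", small_euclid, large_euclid)      # the large pair is penalised far more
print("1 - IOU:  ", small_iou_dist, large_iou_dist)  # both pairs are penalised equally
```

With the same relative mismatch, the Euclidean error of the large pair is about ten times that of the small pair, so the clustering spends its centroids on large boxes. The 1 - IOU distance depends only on the relative overlap and treats both pairs alike.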



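In YOLOv2, the author defines anchor sizes relative to the last feature map. The last feature map in YOLOv2 is 13x13, so anchor sizes fall in the range (0x0, 13x13]. An anchor of size 9x9 therefore corresponds to 288x288 pixels in the original image (9 x 32, because the network down-samples by 32).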

In YOLOv3, the author switched to defining anchors relative to the original image, so anchor sizes fall in the range (0x0, input_w x input_h].


So YOLOv2 I made some design choice errors, I made the anchor box size be relative to the feature size in the last layer. Since the network was down-sampling by 32. This means it was relative to 32 pixels so an anchor of 9x9 was actually 288px x 288px.

In YOLOv3 anchor sizes are actual pixel values. this simplifies a lot of stuff and was only a little bit harder to implement
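The relationship between the two conventions can be sketched as a simple unit conversion. The function names below are hypothetical, and the stride of 32 is taken from the quote above; this is only an illustration of the arithmetic, not code from either Darknet version.

```python
STRIDE = 32  # total down-sampling factor of the network, as stated in the quote above

def v2_anchor_to_pixels(anchor, stride=STRIDE):
    """Convert a YOLOv2-style anchor (feature-map cells) to pixels (YOLOv3-style)."""
    w, h = anchor
    return (w * stride, h * stride)

def pixels_to_v2_anchor(anchor_px, stride=STRIDE):
    """Convert a pixel-sized anchor back to feature-map cells."""
    w, h = anchor_px
    return (w / stride, h / stride)

print(v2_anchor_to_pixels((9, 9)))      # (288, 288), matching the example in the text
print(pixels_to_v2_anchor((288, 288)))  # (9.0, 9.0)
```

Anchors clustered on normalised widths and heights (values in [0, 1]) need the matching scale factor: multiply by the feature-map size for the YOLOv2 convention, or by the input width and height for the YOLOv3 convention.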


