Detailed Attribute Statistics of the CelebA Dataset
阿新 • Published: 2018-11-10
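CelebA is a dataset provided by The Chinese University of Hong Kong. It contains 202,599 face images covering 10,177 celebrity identities, and each image is annotated with 5 facial landmark coordinates and 40 attributes. It can be downloaded from the Large-scale CelebFaces Attributes (CelebA) Dataset page.
For the meaning of each attribute, see the link given at the end of this post. The code in this post counts, for each attribute, how many images are labeled positive and how many are labeled negative.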
    rootdir = "../"
    imgdir = rootdir + "Img/img_celeba"  # image directory (not used below)
    attributepath = rootdir + "Anno/list_attr_celeba.txt"


    def stats():
        with open(attributepath) as f:
            # Line 1: number of images; line 2: attribute names.
            numofimgs = int(f.readline())
            attrs = f.readline().split()
            # counts[j] = [positive count, negative count] for attribute j
            counts = [[0, 0] for _ in attrs]
            for _ in range(numofimgs):
                # Skip the leading file name; keep the attribute values.
                items = f.readline().split()[1:]
                for j in range(len(attrs)):
                    if items[j] == "1":
                        counts[j][0] += 1
                    else:
                        counts[j][1] += 1
        for name, (pos, neg) in zip(attrs, counts):
            print(name, pos, neg)


    if __name__ == "__main__":
        stats()
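The script above relies on the layout of list_attr_celeba.txt: the first line holds the image count, the second line the attribute names, and every following line a file name followed by one value per attribute (1 for positive, -1 for negative). As a minimal sketch, the same counting logic can be tried on a tiny in-memory sample. The sample rows and the helper name `count_attributes` are made up for illustration and are not part of the dataset or the original script.

```python
import io


def count_attributes(f):
    """Return (attribute names, [positive, negative] counts) for an open annotation file."""
    num_images = int(f.readline())
    attrs = f.readline().split()
    counts = [[0, 0] for _ in attrs]
    for _ in range(num_images):
        values = f.readline().split()[1:]  # drop the file name
        for j, value in enumerate(values):
            # Index 0 collects positives ("1"), index 1 everything else.
            counts[j][0 if value == "1" else 1] += 1
    return attrs, counts


# Hypothetical three-image sample in the same layout (not real CelebA rows).
SAMPLE = """3
Bald Male Smiling
a.jpg -1  1  1
b.jpg -1 -1  1
c.jpg  1  1 -1
"""

attrs, counts = count_attributes(io.StringIO(SAMPLE))
for name, (pos, neg) in zip(attrs, counts):
    print(name, pos, neg)
```

Passing an open file handle instead of a path keeps the function usable with both the real annotation file and small test inputs like this one.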
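The results are as follows (each row gives the attribute name, the number of positive images, and the number of negative images):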
    5_o_Clock_Shadow     22516  180083
    Arched_Eyebrows      54090  148509
    Attractive          103833   98766
    Bags_Under_Eyes      41446  161153
    Bald                  4547  198052
    Bangs                30709  171890
    Big_Lips             48785  153814
    Big_Nose             47516  155083
    Black_Hair           48472  154127
    Blond_Hair           29983  172616
    Blurry               10312  192287
    Brown_Hair           41572  161027
    Bushy_Eyebrows       28803  173796
    Chubby               11663  190936
    Double_Chin           9459  193140
    Eyeglasses           13193  189406
    Goatee               12716  189883
    Gray_Hair             8499  194100
    Heavy_Makeup         78390  124209
    High_Cheekbones      92189  110410
    Male                 84437  118162
    Mouth_Slightly_Open  97942  104657
    Mustache              8417  194182
    Narrow_Eyes          23329  179270
    No_Beard            169158   33441
    Oval_Face            57567  145032
    Pale_Skin             8701  193898
    Pointy_Nose          56210  146389
    Receding_Hairline    16163  186436
    Rosy_Cheeks          13315  189284
    Sideburns            11449  191150
    Smiling              97669  104930
    Straight_Hair        42222  160377
    Wavy_Hair            64744  137855
    Wearing_Earrings     38276  164323
    Wearing_Hat           9818  192781
    Wearing_Lipstick     95715  106884
    Wearing_Necklace     24913  177686
    Wearing_Necktie      14732  187867
    Young               156734   45865
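To make the skew in these counts easier to compare, the ratio of the larger class to the smaller one can be computed per attribute. The sketch below does this for a few rows copied from the results above; the helper name `imbalance_ratio` is chosen here for illustration only.

```python
# A few (positive, negative) counts copied from the results above.
counts = {
    "Bald": (4547, 198052),
    "Mustache": (8417, 194182),
    "Eyeglasses": (13193, 189406),
    "Male": (84437, 118162),
    "Smiling": (97669, 104930),
}


def imbalance_ratio(pos, neg):
    """Ratio of the larger class to the smaller one (1.0 means perfectly balanced)."""
    return max(pos, neg) / min(pos, neg)


ratios = {name: imbalance_ratio(pos, neg) for name, (pos, neg) in counts.items()}

# Print from most to least imbalanced.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {ratio:6.1f}:1")
```

For these rows, Bald comes out at more than 40:1, while Male and Smiling both stay below 1.5:1.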
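It is easy to see that some attributes are highly imbalanced, with ratios of 10:1 or more (Bald, for example, has 4,547 positive images against 198,052 negative ones). Gender is comparatively balanced, at 84,437 male to 118,162 female, so the Male attribute can be extracted and used as data for gender recognition.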
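Extracting gender labels amounts to reading the `Male` column for every image. The following is a minimal sketch, assuming the annotation layout used by the script earlier in this post (image count, attribute names, then one row per image) and mapping 1 to male and any other value to not male. The function name `load_gender_labels` and the sample rows are hypothetical.

```python
import io


def load_gender_labels(f):
    """Map each image file name to 1 (Male) or 0 (not Male)."""
    num_images = int(f.readline())
    attrs = f.readline().split()
    male_col = attrs.index("Male")  # position of the Male attribute
    labels = {}
    for _ in range(num_images):
        items = f.readline().split()
        filename, values = items[0], items[1:]
        labels[filename] = 1 if values[male_col] == "1" else 0
    return labels


# Hypothetical two-image sample; with the real data, pass
# open("../Anno/list_attr_celeba.txt") instead.
SAMPLE = """2
Bald Male
a.jpg -1  1
b.jpg -1 -1
"""

labels = load_gender_labels(io.StringIO(SAMPLE))
print(labels)
```

Looking up the column by name rather than by a fixed index keeps the sketch independent of the attribute order in the header line.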
References: