
Detailed Attribute Statistics for the CelebA Dataset

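CelebA is a face dataset provided by The Chinese University of Hong Kong. It contains 202,599 face images of 10,177 celebrity identities, and each image is annotated with the coordinates of 5 facial landmarks and with 40 binary attributes. It can be downloaded from the Large-scale CelebFaces Attributes (CelebA) Dataset page.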

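For the meaning of each attribute, see the link at the end of this post. The code here counts, for every attribute, how many images are labeled positive and how many negative.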
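As a reading aid, here is a minimal sketch of the file layout the script assumes: the first line holds the image count, the second line the attribute names, and each following line a file name followed by one `1` or `-1` per attribute. The three-image sample and the helper name `parse_attr_file` are made up for illustration and are not part of the dataset or of the original code.

```python
import io

# Hypothetical three-image excerpt in the layout of list_attr_celeba.txt.
# The attribute subset and the label values are invented for illustration.
SAMPLE = """3
Bald Male Smiling
000001.jpg -1 -1  1
000002.jpg -1  1  1
000003.jpg  1  1 -1
"""

def parse_attr_file(f):
    """Return (attribute names, {file name: list of +1/-1 ints})."""
    num_images = int(f.readline())       # line 1: number of images
    attr_names = f.readline().split()    # line 2: attribute names
    labels = {}
    for _ in range(num_images):
        fields = f.readline().split()
        # fields[0] is the file name, the rest are the labels.
        labels[fields[0]] = [int(v) for v in fields[1:]]
    return attr_names, labels

attr_names, labels = parse_attr_file(io.StringIO(SAMPLE))
print(attr_names)
print(labels["000002.jpg"])
```

Keeping every label in memory like this is fine for a small excerpt. The script in this post only keeps running counts, which is all the statistics need.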

# Paths are relative to the working directory; point rootdir at your CelebA copy.
rootdir = "../"
imgdir = rootdir + "Img/img_celeba"  # image directory (not used by stats())
attributepath = rootdir + "Anno/list_attr_celeba.txt"

def stats():
    with open(attributepath) as f:
        # Line 1: number of images; line 2: the attribute names.
        numofimgs = int(f.readline())
        attrs = f.readline().split()
        # counts[j] = [number of positive labels, number of negative labels]
        counts = [[0, 0] for _ in attrs]
        for _ in range(numofimgs):
            # Drop the leading file name; the remaining fields are the labels.
            items = f.readline().split()[1:]
            for j in range(len(attrs)):
                if items[j] == "1":
                    counts[j][0] += 1
                else:
                    counts[j][1] += 1
    # Print: attribute name, positive count, negative count.
    for name, (pos, neg) in zip(attrs, counts):
        print(name, pos, neg)

if __name__ == "__main__":
    stats()

The results are as follows (attribute name, number of positive images, number of negative images):

5_o_Clock_Shadow 22516 180083
Arched_Eyebrows 54090 148509
Attractive 103833 98766
Bags_Under_Eyes 41446 161153
Bald 4547 198052
Bangs 30709 171890
Big_Lips 48785 153814
Big_Nose 47516 155083
Black_Hair 48472 154127
Blond_Hair 29983 172616
Blurry 10312 192287
Brown_Hair 41572 161027
Bushy_Eyebrows 28803 173796
Chubby 11663 190936
Double_Chin 9459 193140
Eyeglasses 13193 189406
Goatee 12716 189883
Gray_Hair 8499 194100
Heavy_Makeup 78390 124209
High_Cheekbones 92189 110410
Male 84437 118162
Mouth_Slightly_Open 97942 104657
Mustache 8417 194182
Narrow_Eyes 23329 179270
No_Beard 169158 33441
Oval_Face 57567 145032
Pale_Skin 8701 193898
Pointy_Nose 56210 146389
Receding_Hairline 16163 186436
Rosy_Cheeks 13315 189284
Sideburns 11449 191150
Smiling 97669 104930
Straight_Hair 42222 160377
Wavy_Hair 64744 137855
Wearing_Earrings 38276 164323
Wearing_Hat 9818 192781
Wearing_Lipstick 95715 106884
Wearing_Necklace 24913 177686
Wearing_Necktie 14732 187867
Young 156734 45865
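
Raw counts are hard to compare at a glance. The following is a minimal sketch that turns a few rows of the listing above into a positive fraction and a majority-to-minority ratio. The helper name `imbalance` is introduced here for illustration, and only four rows are copied in to keep the example short.

```python
# A few (positive, negative) counts copied from the listing above.
counts = {
    "Bald": (4547, 198052),
    "Male": (84437, 118162),
    "No_Beard": (169158, 33441),
    "Smiling": (97669, 104930),
}

def imbalance(pos, neg):
    """Return (positive fraction, majority-to-minority ratio).

    Assumes both counts are non-zero, which holds for every row above.
    """
    total = pos + neg
    return pos / total, max(pos, neg) / min(pos, neg)

for name, (pos, neg) in counts.items():
    frac, ratio = imbalance(pos, neg)
    print(f"{name:10s} positive={frac:.1%} ratio={ratio:.1f}:1")
```

The ratio is taken as majority over minority so that attributes skewed toward positive (such as No_Beard) and those skewed toward negative (such as Bald) are measured on the same scale.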

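It is easy to see that some attributes are heavily imbalanced, with ratios of 10:1 or worse (Bald, for example, has 4,547 positive images against 198,052 negative). Male versus female is comparatively balanced at 84,437 : 118,162, so the Male attribute can be extracted and used as labels for gender recognition.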
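
One way to pull out the Male column as gender labels is sketched below, assuming the same file layout that the statistics script reads. The function name `gender_labels` and the 1/0 encoding (1 for Male labeled `1`, 0 otherwise) are choices made for this example, and the two-image sample is made up.

```python
def gender_labels(lines):
    """Yield (file name, label) pairs, with label 1 if Male is "1", else 0."""
    it = iter(lines)
    num_images = int(next(it))            # line 1: number of images
    attr_names = next(it).split()         # line 2: attribute names
    male_idx = attr_names.index("Male")   # column position of the Male attribute
    for _ in range(num_images):
        fields = next(it).split()
        # fields[0] is the file name; attribute values start at fields[1].
        yield fields[0], 1 if fields[1 + male_idx] == "1" else 0

# Hypothetical two-image sample in the attribute-file layout.
sample = [
    "2",
    "Bald Male",
    "000001.jpg -1 -1",
    "000002.jpg -1  1",
]
print(list(gender_labels(sample)))
# prints [('000001.jpg', 0), ('000002.jpg', 1)]
```

Because the function accepts any iterable of lines, an open file handle on `list_attr_celeba.txt` can be passed in directly instead of the sample list.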

Reference:

CelebA dataset: detailed introduction and attribute extraction source code