Machine Learning Homework 3 - Image Sentiment Classification: Kaggle Solution Report (PyTorch-based)

阿新 • Published: 2018-11-05

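Problem:

The training dataset we provide consists of roughly 28,000 images of 48x48 pixels, each with an expression label (note: every image belongs to exactly one expression). There are seven possible expressions (0: angry, 1: disgust, 2: fear, 3: happy, 4: sad, 5: surprise, 6: neutral (hard to assign to any of the first six)).

The testing data consists of roughly 7,000 images of 48x48 pixels. We hope you can use the training dataset to train a CNN model, predict the expression label of each image (again, one of 0~6), and save the predictions in a csv file.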

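Analysis:

1. Build your own dataset from train.csv

Because the feature of each image sits in a single cell, the first step is to use split() to turn the string of numbers into a 48*48 numeric array. Note that the values must be converted to float.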

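import numpy as np
import pandas as pd

data=pd.read_csv('data/train.csv')
idx=0  # row to inspect; any valid row index works
# column 1 holds the pixels as one space-separated string
image=data.iloc[idx,1].split()
image=list(map(float,image))
image=np.array(image).reshape(-1,48)  # 2304 values -> 48x48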
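
The parsing step can also be wrapped in a small helper that checks the pixel count before reshaping. This is only a sketch: the function name `parse_pixels` is made up for illustration, and the pixel string below is synthetic rather than taken from train.csv.

```python
import numpy as np

def parse_pixels(pixel_string, side=48):
    """Turn a space-separated pixel string into a (side, side) float32 array."""
    values = np.array(list(map(float, pixel_string.split())), dtype=np.float32)
    if values.size != side * side:
        raise ValueError(f"expected {side * side} pixels, got {values.size}")
    return values.reshape(side, side)

# Synthetic stand-in for one cell of the feature column (not real data).
fake_cell = " ".join(str(i % 256) for i in range(48 * 48))
image = parse_pixels(fake_cell)
print(image.shape, image.dtype)
```

Failing early on a wrong pixel count avoids a silent mis-shaped array, which reshape(-1,48) would otherwise accept for any multiple of 48.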

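With the image in hand, the next step is to store the label, an integer from 0 to 6.

lable=np.array(data.iloc[idx,0])  # column 0 holds the label

So the CNN's output layer must have 7 units (one per class), and the input layer has only 1 channel.

With the data prepared, we subclass Dataset to implement our own dataset class, so that PyTorch can batch, shuffle, and transform the data automatically and we don't have to write that code ourselves.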

import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class ISCDataset(Dataset):
    def __init__(self,csv_file,transform=None):
        self.data=pd.read_csv(csv_file)
        self.transform=transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # column 1: space-separated pixel string -> 48x48 float array
        image=self.data.iloc[idx,1].split()
        image=list(map(float,image))
        image=np.array(image).reshape(-1,48)
        # column 0: the label (the key keeps the original spelling 'lable')
        lable=np.array(self.data.iloc[idx,0])

        sample={'image':image,'lable':lable}
        if self.transform:
            sample = self.transform(sample)

        return sample
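
The dataset class relies on a transform to turn each sample into tensors, but the post does not show that transform. Below is a minimal sketch of what one could look like, assuming samples are dicts with 'image' and 'lable' keys. The names `SampleToTensor` and `FakeFaces` are hypothetical, and the data is random so the sketch runs without train.csv.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class SampleToTensor:
    """Hypothetical transform: convert a {'image', 'lable'} sample to tensors."""
    def __call__(self, sample):
        image = np.asarray(sample['image'], dtype=np.float32)
        # Add a leading channel axis: (48, 48) -> (1, 48, 48), as Conv2d expects.
        image = torch.from_numpy(image).unsqueeze(0)
        lable = torch.tensor(int(sample['lable']), dtype=torch.long)
        return {'image': image, 'lable': lable}

class FakeFaces(Dataset):
    """Synthetic stand-in for ISCDataset (random pixels and labels)."""
    def __init__(self, n=10, transform=None):
        rng = np.random.default_rng(0)
        self.images = rng.integers(0, 256, size=(n, 48, 48)).astype(np.float64)
        self.lables = rng.integers(0, 7, size=n)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = {'image': self.images[idx], 'lable': np.array(self.lables[idx])}
        return self.transform(sample) if self.transform else sample

loader = DataLoader(FakeFaces(transform=SampleToTensor()), batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch['image'].shape, batch['lable'].shape)
```

The DataLoader's default collation stacks the per-sample dicts key by key, so each batch is again a dict with a 4-D image tensor and a 1-D label tensor.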

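2. Build your own CNN

This part is a matter of taste: different people will choose different numbers of layers and different parameters. As long as the network structure runs without errors, the rest is hyperparameter tuning. One thing worth mentioning: at first I set lr fairly large, and the result was that every image got the same prediction. Only after printing the predictions at each step did I find that, after a few batches, the outputs were no longer ordinary numbers but nan or inf. This is evidently because the values grew too large (training diverged), so lr has to be set small enough. Also, because there are fairly many classes (7), a very low accuracy at the start is perfectly normal; let it run for more iterations and the accuracy keeps rising.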
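
One way to catch this kind of divergence early is to check the loss on every step instead of printing predictions by hand. The sketch below is an illustration, not the author's code: the toy linear model, the learning rate 1e-3, the helper name `train_step`, and the scaling of pixels to [0, 1] are all assumptions.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(48 * 48, 7)          # toy stand-in for the CNN
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # assumed value, tune as needed

def train_step(images, lables):
    """One update; raises if the loss is nan/inf so divergence is caught early."""
    logits = model(images)
    loss = loss_fn(logits, lables)
    if not torch.isfinite(loss):
        raise FloatingPointError("loss is not finite; try a smaller learning rate")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Synthetic batch with pixels already scaled to [0, 1] (an assumption, see above).
images = torch.rand(8, 48 * 48)
lables = torch.randint(0, 7, (8,))
losses = [train_step(images, lables) for _ in range(5)]
print(losses)
```

Raising on the first non-finite loss points directly at the step where training went wrong, which is easier to act on than a model that quietly predicts the same class for every image.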

import torch
import torch.nn.functional as F

class ConvNet(torch.nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1=torch.nn.Conv2d(1,6,3,1)    # 1x48x48 -> 6x46x46
        self.pooling1=torch.nn.MaxPool2d(2)    # -> 6x23x23
        self.conv2=torch.nn.Conv2d(6,1,3,1)    # -> 1x21x21
        self.fc1=torch.nn.Linear(21*21,100)
        self.fc2=torch.nn.Linear(100,10)
        self.fc3=torch.nn.Linear(10,10)
        self.fc4=torch.nn.Linear(10,7)         # 7 outputs, one per expression

    def forward(self, x):
        x=F.relu(self.conv1(x.float()))
        x=self.pooling1(x)
        x=F.relu(self.conv2(x))
        x=x.view(-1,1*21*21)                   # flatten for the linear layers
        x=F.relu(self.fc1(x))
        x=F.relu(self.fc2(x))
        x=F.relu(self.fc3(x))
        x=self.fc4(x)
        #x=F.softmax(x)  # left off: the network returns raw scores
        return x
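
The 21*21 input size of fc1 follows from the layer sizes above. A short worked check, using the standard output-size formula for convolution and pooling without dilation (the helper name `conv_out` is made up for illustration):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 48
side = conv_out(side, kernel=3)            # conv1: 48 -> 46
side = conv_out(side, kernel=2, stride=2)  # MaxPool2d(2): 46 -> 23
side = conv_out(side, kernel=3)            # conv2: 23 -> 21
print(side, side * side)                   # 21 441
```

Since conv2 has a single output channel, the flattened feature vector has 1*21*21 = 441 entries, which matches Linear(21*21,100).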

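Also: since the original data provides no labeled test set, part of the training set has to be held out as a validation set. Note that this is not done when building the dataset but while running the model: certain batches no longer update the parameters and serve as the validation set, and accuracy is measured on them. The advantage is that the validation samples come from a loader with shuffle=True, so they are random and the result is more trustworthy.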

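# net, loss_fn, optimizer and dataloader (shuffle=True) are assumed to be defined earlier
num_verificationData=0
num_right=0
for i_batch,item in enumerate(dataloader):
    if i_batch<=6000:
        # training batches: update the parameters
        prediction=net(item['image'])
        loss=loss_fn(prediction,item['lable'])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    else:
        # remaining batches: no update, only count correct predictions
        with torch.no_grad():
            num_verificationData+=len(item['lable'])
            veri_prediction=net(item['image'])
            veri_prediction=torch.argmax(veri_prediction,dim=1).numpy()

            for j in range(len(item['lable'])):
                if veri_prediction[j]==item['lable'].numpy()[j]:
                    num_right+=1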
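
Holding out batches by index works for a single pass. If the loop runs for several epochs with shuffle=True, the held-out batches may contain different samples each time. One alternative, which is not the method used in this post, is to fix the split once with random_split. The sketch below uses a synthetic dataset and a toy model, and the 20% validation share is an arbitrary assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the training set: 100 one-channel 48x48 images.
full = TensorDataset(torch.rand(100, 1, 48, 48), torch.randint(0, 7, (100,)))

# Hold out 20% once, with a fixed seed so the split is reproducible.
n_val = len(full) // 5
train_set, val_set = random_split(
    full, [len(full) - n_val, n_val], generator=torch.Generator().manual_seed(0)
)
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
val_loader = DataLoader(val_set, batch_size=4, shuffle=False)

def accuracy(net, loader):
    """Fraction of samples whose argmax prediction matches the label."""
    right, total = 0, 0
    with torch.no_grad():
        for images, lables in loader:
            right += (net(images).argmax(dim=1) == lables).sum().item()
            total += len(lables)
    return right / total

net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(48 * 48, 7))  # toy model
print(len(train_set), len(val_set), accuracy(net, val_loader))
```

Comparing whole batches with `==` and summing replaces the per-sample Python loop, and the two subsets never share a sample regardless of how many epochs are run.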

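3. Save the predictions for test.csv to sample.csv

Comparing the two csv files shows that the predictions for test.csv only need to be written to sample.csv in order (so shuffle must be False when loading test.csv), i.e. each prediction updates one cell of the csv.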

# load test.csv
test_dataset=ISCDataset('data/test.csv',transform=transforms.Compose([ToTensor()]))
testloader=DataLoader(test_dataset,batch_size=4,shuffle=False)
# load sample.csv
sample=pd.read_csv('data/sample.csv')
with torch.no_grad():
    for i_batch,item in enumerate(testloader):
        prediction=net(item['image'])
        prediction=torch.argmax(prediction,dim=1).numpy()
        for i in range(len(prediction)):
            # column 0 of test.csv (returned under 'lable') is used as the row index
            sample.iloc[item['lable'].numpy()[i],1]=prediction[i]
sample.to_csv('data/sample.csv',index=False)
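
Instead of updating cells of an existing file, the result can also be assembled as a fresh table and written in one go. This is only a sketch: the column names 'id' and 'label' and the five predictions are made up for illustration, and an in-memory buffer stands in for the output file.

```python
import io
import pandas as pd

# Hypothetical predictions for five test images, in test.csv order.
ids = [0, 1, 2, 3, 4]
predictions = [3, 6, 0, 4, 3]

submission = pd.DataFrame({'id': ids, 'label': predictions})
buffer = io.StringIO()          # stands in for a file path such as 'data/submission.csv'
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Passing index=False keeps pandas from adding its own row index as an extra column, just as in the to_csv call above.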

4. Full project code

The complete runnable code is on GitHub; feel free to fork, star, watch, or even open an issue.

https://github.com/zc-authorization/Image_Sentiment_Classification

Also: if you have questions, leave a comment below; I reply to every one I see.
