Predicting Apple Sourness and Sweetness with Python Machine Learning

阿新 • Published: 2021-06-21

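1. Background

It is often hard to tell which apple will be sour and which will be sweet. Some people prefer sweet apples and others prefer sour ones, yet choosing usually comes down to folk methods and guesswork, neither of which is scientific. This project therefore uses machine learning to predict whether an apple is sour or sweet.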

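2. Machine Learning Case Design

Dataset source:

The data comes from Kaggle, a platform where developers and data scientists can host machine learning competitions, host datasets, and write and share code. The platform has attracted some 800,000 data scientists, and that user base may well have been what drew Google to it.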

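Machine learning frameworks used:

Convolutional neural network (CNN): The CNN is one of the most representative network architectures in deep learning and has been very successful in image processing. Many successful models on the standard ImageNet benchmark are CNN-based.

Keras: Keras is a model-level library that provides high-level building blocks for developing deep learning models. It does not handle low-level operations such as tensor manipulation and differentiation itself; instead it relies on a specialized, highly optimized tensor library to do so. That tensor library serves as the Keras backend engine, for example TensorFlow.

TensorFlow: TensorFlow is an open-source machine learning framework that is fast, flexible and suited to large-scale production use, making it easy for developers and researchers to apply AI to a wide range of problems.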

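Technical difficulties:

Runs are easily interrupted during testing, so several attempts may be needed.

Import the required libraries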

import os
import random
import itertools
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import layers
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn import model_selection as ms
from sklearn.metrics import confusion_matrix
from tensorflow.keras.optimizers import Adamax
from tensorflow.keras.initializers import RandomNormal
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

Check the installed library versions

pip list
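As a complement to pip list, the versions of only the libraries used here can be printed from Python itself. The following is a minimal sketch based on the standard library's importlib.metadata; the helper name report_versions and the package list are assumptions made for illustration and are not part of the original post.

```python
from importlib import metadata

def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names as published on PyPI (assumed list for this post).
PACKAGES = ["numpy", "pandas", "seaborn", "tensorflow", "matplotlib", "scikit-learn"]

for name, version in report_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Recording these versions next to the results makes it easier to reproduce a run later, since Keras argument names have changed between releases.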

Reading the raw data

Define a function that reads a txt file and returns its contents as numeric data

def getInfo(samplePath):
    """Read a tab-separated text file and return its rows as lists of floats."""
    with open(samplePath, "r") as f:  # open the file
        datas = f.readlines()

    totalList = []  # one entry per line of the file
    for data in datas:
        list1 = data.split("\t")
        list2 = []
        for i in list1:
            list2.append(float(i))  # convert each field to float
        totalList.append(list2)  # store the parsed row
    return totalList  # return the list of rows
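To make the expected file layout concrete, the sketch below writes a tiny tab-separated file to a temporary directory and parses it in the same way as getInfo. The helper read_tab_floats and the sample values are made up for illustration; the real Feature_*.txt files are assumed to hold one sample per line with tab-separated numbers.

```python
import os
import tempfile

def read_tab_floats(path):
    """Parse a tab-separated text file into a list of float rows."""
    rows = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines instead of failing on float("")
                continue
            rows.append([float(v) for v in line.split("\t")])
    return rows

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "Feature_demo.txt")
    with open(sample_path, "w") as f:
        f.write("0.5\t1.25\t3.0\n2.0\t4.5\t6.0\n")
    demo_rows = read_tab_floats(sample_path)

print(demo_rows)  # [[0.5, 1.25, 3.0], [2.0, 4.5, 6.0]]
```

Unlike getInfo, this variant skips empty lines, which avoids a ValueError if a file ends with a blank line.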

Read the feature txt files and collect their contents into a list

# Paths to the 17 feature files, Feature_0.txt through Feature_16.txt
featurePaths = ['./res_data/Feature_{}.txt'.format(i) for i in range(17)]

# Parse every file; featureData[k] holds the rows of Feature_k.txt
featureData = [getInfo(path) for path in featurePaths]

# Concatenate the rows of Feature_1 .. Feature_16 sample by sample.
# Note: Feature_0 is loaded but not included in the combined row.
totalList = []
for i in range(len(featureData[1])):
    row = []
    for data in featureData[1:]:
        row += data[i]
    totalList.append(row)
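The loop above joins, for each sample, the feature vectors from the separate files into one long row. Below is a sketch of the same step with NumPy, assuming every file holds the same number of rows; the list blocks and its values are invented for the example.

```python
import numpy as np

# Three hypothetical feature blocks for the same 2 samples, with 2, 1 and 3 columns.
blocks = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[5.0], [6.0]]),
    np.array([[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]),
]

# Every block must describe the same samples, so the row counts have to match.
row_counts = {b.shape[0] for b in blocks}
if len(row_counts) != 1:
    raise ValueError(f"feature files disagree on sample count: {row_counts}")

# hstack concatenates column-wise: one row per sample, all features side by side.
combined = np.hstack(blocks)
print(combined.shape)  # (2, 6)
```

Checking the row counts first turns a silent misalignment between files into an explicit error.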

Read the label dataset (labels are already binarized: 0 = sour apple, 1 = sweet apple)

labelPath = './labels.txt'

with open(labelPath, "r") as labels:  # the file is closed automatically
    label = labels.readlines()

trueLabelList = []
for r in label:
    labelList = r.split("\t")
    trueLabelList.append(int(labelList[-1]))  # the label is the last field on each line

Check the number of samples in each class

label0 = np.sum(np.array(trueLabelList) == 0)  # number of sour apples
label1 = np.sum(np.array(trueLabelList) == 1)  # number of sweet apples

print(label0)
print(label1)
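Class balance affects how the accuracy figures later on should be read, so it can help to print proportions as well as raw counts. A small sketch with collections.Counter follows; demo_labels is a made-up label list, not the real data.

```python
from collections import Counter

demo_labels = [0, 1, 1, 0, 1, 1, 1, 0]  # hypothetical labels: 0 = sour, 1 = sweet

counts = Counter(demo_labels)
total = len(demo_labels)
for cls in sorted(counts):
    print(f"class {cls}: {counts[cls]} samples ({counts[cls] / total:.1%})")

# Accuracy obtained by always predicting the majority class: a baseline to beat.
majority_baseline = max(counts.values()) / total
```

A classifier whose accuracy is close to majority_baseline has learned little beyond the class frequencies.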

Shuffle and split the dataset

totalArr = np.array(totalList, dtype=float)  # convert the data to arrays
trueLabelArr = np.array(trueLabelList, dtype=int)

# Shuffle features and labels with the same random index order
indx = [i for i in range(len(totalArr))]
random.shuffle(indx)
totalArr = totalArr[indx]
trueLabelArr = trueLabelArr[indx]

input_features = preprocessing.StandardScaler().fit_transform(totalArr)  # standardize the data

train_x, test_x, train_y, test_y = ms.train_test_split(input_features, trueLabelArr, test_size=0.2)  # split into training and test sets
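The code above standardizes the full dataset before splitting, so the test rows contribute to the mean and standard deviation. One common alternative, sketched here with NumPy only, is to split first and compute the statistics from the training rows alone. The array X, the seed and the 80/20 split are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed so the shuffle is repeatable
X = rng.normal(loc=5.0, scale=2.0, size=(10, 3))  # hypothetical feature matrix
y = np.array([0, 1] * 5)                          # hypothetical labels

# Shuffle features and labels with the same permutation.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

# 80/20 split.
n_train = int(len(X) * 0.8)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Fit the scaling statistics on the training rows only ...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
std[std == 0] = 1.0                               # avoid dividing by zero for constant columns

# ... and apply the same transform to both parts.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

With scikit-learn the same idea is to call fit on the training part of a StandardScaler and transform on both parts.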

# Build the network model
model = tf.keras.Sequential()
model.add(layers.Dense(128, kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), input_dim=43620))

# model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation='relu', use_bias=True))

# model.add(layers.Dropout(0.3))
model.add(layers.Dense(512, activation='relu', use_bias=True))

# model.add(layers.Dropout(0.4))
model.add(layers.Dense(256, activation='relu', use_bias=True))

# model.add(layers.Dropout(0.3))
model.add(layers.Dense(128, activation='relu', use_bias=True))

# model.add(layers.Dropout(0.2))

# model.add(layers.Dense(128, activation='relu', use_bias=True))

# model.add(layers.Dropout(0.1))
model.add(layers.Dense(1, activation='sigmoid'))  # single sigmoid unit for binary classification

# learning_rate replaces the deprecated lr argument
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0001),
              loss='binary_crossentropy', metrics=['accuracy'])

model.summary()
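The parameter counts that model.summary() reports for Dense layers can be checked by hand: each layer has inputs × units weights plus one bias per unit. The following sketch works this out for the layer sizes used above; the helper name dense_params is introduced only for this example.

```python
# Layer widths from the model above: input dimension followed by each Dense layer.
layer_sizes = [43620, 128, 256, 512, 256, 128, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

pairs = list(zip(layer_sizes, layer_sizes[1:]))
per_layer = [dense_params(a, b) for a, b in pairs]
total_params = sum(per_layer)

for (a, b), p in zip(pairs, per_layer):
    print(f"{a:>6} -> {b:<4} {p:>10,} parameters")
print(f"total: {total_params:,}")  # total: 5,912,449
```

Almost all of the parameters sit in the first layer, a direct consequence of the 43,620-dimensional input.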

Model training

# Recompile with the Adam optimizer (this replaces the SGD setting above), then train
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_x, train_y, validation_split=0.1, epochs=100, batch_size=32, verbose=1)
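The loss passed to compile, binary cross-entropy, penalizes confident wrong predictions heavily. Below is a NumPy sketch of the formula, averaged over samples; the clipping constant eps and the sample values are assumptions for the example, and Keras's own implementation may differ in numerical details.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for predictions of exactly 0 or 1.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y_true = [1, 0, 1, 0]
good = binary_cross_entropy(y_true, [0.9, 0.1, 0.8, 0.2])  # correct and confident
bad = binary_cross_entropy(y_true, [0.1, 0.9, 0.2, 0.8])   # wrong and confident
print(good, bad)
```

The second call yields a much larger loss than the first, which is the signal the optimizer uses to push predictions toward the true labels.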

# Predict on the test set with the predict method
yPre = model.predict(test_x)

# Print the evaluation metrics of the BP neural network
print("BP neural network accuracy: {}".format(metrics.accuracy_score(test_y, yPre.round())))
print("BP neural network precision: {}".format(metrics.precision_score(test_y, yPre.round())))
print("BP neural network recall: {}".format(metrics.recall_score(test_y, yPre.round())))
print("BP neural network F1 score: {}".format(metrics.f1_score(test_y, yPre.round())))
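The four scores above all derive from the counts of true and false positives and negatives. A pure-Python sketch on made-up labels shows how each one is computed, treating class 1 (sweet) as the positive class; the two lists are invented for illustration.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # hypothetical rounded predictions

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sweet predicted sweet
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # sour predicted sour
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # sour predicted sweet
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sweet predicted sour

accuracy = (tp + tn) / len(pairs)                   # share of all predictions that are right
precision = tp / (tp + fp)                          # share of predicted sweet that is sweet
recall = tp / (tp + fn)                             # share of sweet apples that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, round(f1, 4))
```

Here the counts are tp=3, tn=2, fp=2 and fn=1, so precision (0.6) and recall (0.75) differ even though a single accuracy figure (0.625) hides that distinction.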

Print the confusion matrix

cMatrix = metrics.confusion_matrix(test_y, yPre.round(), normalize=None)

# Rows are the true classes, columns the predicted classes
pdMatrix = pd.DataFrame(cMatrix, index=["0", "1"], columns=["0", "1"])
plt.subplots(figsize=(8, 8))
sns.heatmap(pdMatrix, annot=True, cbar=False, fmt='g')
plt.ylabel("True Class")
plt.xlabel("Predicted Class")
plt.show()

Plot accuracy and loss for the training and validation sets

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

# Accuracy curves
plt.subplots(figsize=(8, 8))
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

# Loss curves, drawn in a second figure
plt.subplots(figsize=(8, 8))
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.plot(epochs, val_loss, 'r', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
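One way to read these curves is to look for the epoch at which validation loss bottoms out; if training loss keeps falling after that point, the model is likely overfitting. Here is a small sketch on a made-up history dictionary that uses the same keys as history.history; the numbers are invented.

```python
demo_history = {
    "loss":     [0.69, 0.52, 0.40, 0.31, 0.24, 0.18],
    "val_loss": [0.68, 0.55, 0.47, 0.45, 0.49, 0.56],
}

val_loss = demo_history["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)  # 0-based index
print(f"lowest validation loss {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")

# Gap between validation and training loss; a widening gap suggests overfitting.
gaps = [v - t for t, v in zip(demo_history["loss"], val_loss)]
```

In this invented run the validation loss is lowest at the fourth epoch, and the growing gap afterwards would argue for stopping training earlier or adding regularization such as the commented-out Dropout layers.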

ROC curve

# predict_proba is not available on recent Keras models; for a sigmoid output,
# predict already returns the probability of class 1
yPreProb = model.predict(test_x).ravel()
fpr, tpr, _ = metrics.roc_curve(test_y, yPreProb)
PBAUC = metrics.roc_auc_score(test_y, yPreProb)

plt.subplots(figsize=(5, 5))
plt.plot(fpr, tpr, color='r', label="Neural Network auc={:.3f}".format(PBAUC))
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc=4)
plt.show()
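The AUC value shown in the legend has a direct interpretation: it is the probability that a randomly chosen sweet apple receives a higher score than a randomly chosen sour one, with ties counted as one half. A pure-Python sketch of that pairwise definition follows; the function name pairwise_auc, the labels and the scores are invented for illustration.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(y_true, scores))  # 0.75
```

The double loop is quadratic in the number of samples, so it is only meant to explain the number; roc_auc_score computes the same quantity for binary labels far more efficiently.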
