Accord.NET_Naive Bayes Classifier

阿新 • Published: 2017-07-27


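This series aims to understand and use the machine learning algorithms in Accord.NET, so it focuses on the problem each algorithm addresses and on how the algorithm is used. The posts are therefore mostly code: we learn by reading the code and build up a rough picture of each algorithm. With that said, let's start on the Bayes classifier.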

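A naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. Put simply, it assumes an independent-feature model: given the class, whether a particular feature is present or absent is unrelated to any other feature.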
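As a sketch of the standard formulation (the symbols $c$ for a class and $x_1, \dots, x_n$ for the feature values are notation introduced here, not taken from Accord.NET), the independence assumption lets the posterior factor into one term per feature:

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}
         {\sum_{c'} P(c')\,\prod_{i=1}^{n} P(x_i \mid c')}
```

The classifier outputs the class $c$ with the largest posterior.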

TestCase1

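This is the well-known play-tennis experiment (Tom Mitchell (1998)). Four conditions are used to infer whether someone wants to play tennis. All of the condition variables are categorical: the possible values of each variable are unordered labels with no numeric relationship among them.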

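First the representation of the problem needs to be simplified. Accord.Statistics.Filters.Codification turns the problem into a codebook of integers: for example, Sunny becomes 0, Overcast becomes 1 and Rain becomes 2, and so on for the other columns. This yields the inputs and outputs used for training.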
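To make the idea of a codebook concrete, here is a minimal library-independent sketch in Python (not Accord.NET code). The helper names `build_codebook` and `encode` are hypothetical. Numbering symbols in order of first appearance is an assumption, chosen because it matches the Sunny = 0, Overcast = 1, Rain = 2 mapping described above.

```python
def build_codebook(rows, columns):
    """Map each distinct string in every column to an integer,
    numbered in order of first appearance (an assumption of this sketch)."""
    codebook = {name: {} for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            # setdefault only assigns a new code the first time a value is seen
            codebook[name].setdefault(value, len(codebook[name]))
    return codebook


def encode(codebook, columns, row):
    """Translate one row of strings into its integer symbols."""
    return [codebook[name][value] for name, value in zip(columns, row)]


columns = ["Outlook", "Temperature"]
rows = [("Sunny", "Hot"), ("Overcast", "Hot"), ("Rain", "Mild"), ("Rain", "Cool")]

codebook = build_codebook(rows, columns)
encoded = [encode(codebook, columns, row) for row in rows]

print(codebook["Outlook"])  # {'Sunny': 0, 'Overcast': 1, 'Rain': 2}
print(encoded)              # [[0, 0], [1, 0], [2, 1], [2, 2]]
```

The integers are only labels: their order carries no meaning, which is why the classifier has to treat each column as categorical.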

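Next we train a Bayes model to predict the last column, whether tennis is played. "Outlook", "Temperature", "Humidity" and "Wind" are the conditions, giving four inputs and one output. Because the inputs are categorical, the number of possible values of each variable should be specified when the Bayes model is created. If the training inputs already cover every value of every variable, as they do in this example, the model does not have to be created by hand: the algorithm's Learn function checks whether the model is null and, if it is, creates one from the inputs and outputs.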

Once we have the classifier, its Decide method computes the output for a given input.

Now for the code:

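// Namespaces the examples rely on: System.Data, Accord.Math,
// Accord.Statistics.Filters, Accord.MachineLearning.Bayes
public void ComputeTest()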
{
    #region doc_mitchell
    DataTable data = new DataTable("Mitchell's Tennis Example");

    data.Columns.Add("Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
    data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
    data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
    data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
    data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
    data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
    data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
    data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
    data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
    data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
    data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
    data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
    data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");
    #endregion

    #region doc_codebook
    // Create a codification codebook that converts
    // the string variables into discrete integer symbols
    Codification codebook = new Codification(data,
        "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis");

    // Extract the input-output pairs to use as the training set
    DataTable symbols = codebook.Apply(data);
    int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = symbols.ToArray<int>("PlayTennis");
    #endregion

    #region doc_learn
    // Create a new naive Bayes learning algorithm
    var learner = new NaiveBayesLearning();

    // Learn a naive Bayes model from the training set
    NaiveBayes nb = learner.Learn(inputs, outputs);
    #endregion


    #region doc_test
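    // Test one case: will this person play tennis on a day that is sunny, cool, humid and windy?
    // First encode the conditions as symbols through the codebook
    int[] instance = codebook.Translate("Sunny", "Cool", "High", "Strong");

    // Get the answer as a numeric output
    int c = nb.Decide(instance); // answer will be 0

    // Translate the numeric answer back into the actual "Yes"/"No" through the codebook
    string result = codebook.Translate("PlayTennis", c); // answer will be "No"

    // The probability of each possible answer can also be extracted
    double[] probs = nb.Probabilities(instance); // { 0.795, 0.205 }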
    #endregion

    Assert.AreEqual("No", result);
    Assert.AreEqual(0, c);
    Assert.AreEqual(0.795, probs[0], 1e-3);
    Assert.AreEqual(0.205, probs[1], 1e-3);
    Assert.AreEqual(1, probs.Sum(), 1e-10);
    Assert.IsFalse(double.IsNaN(probs[0]));
    Assert.AreEqual(2, probs.Length);
}
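The probabilities asserted in this test can be reproduced by plain counting. The following Python sketch is library-independent and is not Accord.NET's implementation. It assumes the class priors and per-feature likelihoods are simple relative frequencies from the table, with no smoothing. The function name `posteriors` is hypothetical.

```python
from collections import Counter

# Mitchell's table: (Outlook, Temperature, Humidity, Wind) -> PlayTennis
data = [
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Strong"), "No"),
]


def posteriors(data, query):
    """Return P(class | query) for every class under the naive Bayes assumption."""
    class_counts = Counter(label for _, label in data)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(data)  # prior P(c)
        for i, value in enumerate(query):
            matches = sum(1 for x, y in data if y == label and x[i] == value)
            score *= matches / n  # likelihood P(x_i | c) as a relative frequency
        scores[label] = score
    total = sum(scores.values())  # normalise so the posteriors sum to 1
    return {label: s / total for label, s in scores.items()}


probs = posteriors(data, ("Sunny", "Cool", "High", "Strong"))
print({label: round(p, 3) for label, p in probs.items()})  # {'No': 0.795, 'Yes': 0.205}
```

For "No" the unnormalised score is 5/14 × 3/5 × 1/5 × 4/5 × 3/5, and for "Yes" it is 9/14 × 2/9 × 3/9 × 3/9 × 3/9. Normalising the two gives the 0.795 and 0.205 checked by the assertions.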

TestCase2

The next example sets more specific learning options for the discrete model.

public void laplace_smoothing_missing_sample()
{
    #region doc_laplace
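    // The Laplace rule handles the case where some symbol of an input
    // never appears in the training set. In this example the second input
    // column should take the three values 0, 1 and 2,
    // but the samples only contain the values 1 and 2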

    int[][] inputs =
    {
        //      input       output
        new [] { 0, 1 }, //  0 
        new [] { 0, 2 }, //  0
        new [] { 0, 1 }, //  0
        new [] { 1, 2 }, //  1
        new [] { 0, 2 }, //  1
        new [] { 0, 2 }, //  1
        new [] { 1, 1 }, //  2
        new [] { 0, 1 }, //  2
        new [] { 1, 1 }, //  2
    };

    int[] outputs = // the corresponding classes
    {
        0, 0, 0, 1, 1, 1, 2, 2, 2, 
    };

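    // Since the training set does not cover every case we expect to see,
    // the data is not enough to determine the symbols, so the model must be specified explicitly:
    // the first input takes two possible values and the second takes three
    var bayes = new NaiveBayes(classes: 3, symbols: new[] { 2, 3 });

    // Pass the model to the learning algorithm when creating it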
    var learning = new NaiveBayesLearning()
    {
        Model = bayes
    };

    // Use the Laplace rule
    learning.Options.InnerOption.UseLaplaceRule = true;

    // Train the Bayes model
    learning.Learn(inputs, outputs);

    // Predict the class of a sample whose second input is 0, a value absent from the training set
    int answer = bayes.Decide(new int[] { 0, 0 });
    #endregion

    Assert.AreEqual(0, answer);

    double prob = bayes.Probability(new int[] { 0, 0 }, out answer);
    Assert.AreEqual(0, answer);
    //Assert.AreEqual(0.52173913043478259, prob, 1e-10);
    Assert.AreEqual(0.44444444444444453, prob, 1e-10);
    
    double error = new ZeroOneLoss(outputs)
    {
        Mean = true
    }.Loss(bayes.Decide(inputs));

    Assert.AreEqual(2 / 9.0, error);
}
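To see why the unseen symbol does not zero out the prediction, here is a library-independent Python sketch. It assumes the Laplace rule amounts to add-one smoothing: one pseudo-count per symbol in every per-class likelihood, with unsmoothed class priors. Under that assumption it reproduces the 0.444... probability and the 2/9 training error asserted in the test, but it is a sketch and not Accord.NET's implementation. The function names are hypothetical.

```python
from fractions import Fraction

inputs = [[0, 1], [0, 2], [0, 1], [1, 2], [0, 2], [0, 2], [1, 1], [0, 1], [1, 1]]
outputs = [0, 0, 0, 1, 1, 1, 2, 2, 2]
n_classes = 3
n_symbols = [2, 3]  # the first input has 2 possible values, the second has 3


def smoothed_likelihood(label, feature, value):
    """Estimate P(x_feature = value | label) with one pseudo-count per symbol."""
    rows = [x for x, y in zip(inputs, outputs) if y == label]
    count = sum(1 for x in rows if x[feature] == value)
    return Fraction(count + 1, len(rows) + n_symbols[feature])


def posterior(x):
    """Return the normalised class posteriors for one sample."""
    scores = []
    for label in range(n_classes):
        score = Fraction(outputs.count(label), len(outputs))  # prior, not smoothed
        for feature, value in enumerate(x):
            score *= smoothed_likelihood(label, feature, value)
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def decide(x):
    """Pick the class with the largest posterior."""
    p = posterior(x)
    return p.index(max(p))


print(posterior([0, 0])[0])  # 4/9
errors = sum(decide(x) != y for x, y in zip(inputs, outputs))
print(Fraction(errors, len(inputs)))  # 2/9
```

Without the pseudo-counts, P(second input = 0 | class) would be 0 for every class, and all three posteriors for the sample { 0, 0 } would be undefined (0/0).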

TestCase3

The next example builds a multi-class classifier: it takes integer inputs and creates a discrete Bayes model.

public void ComputeTest3()
{
    #region doc_multiclass
    // Classify the following data into three classes
    int[][] inputs =
    {
        //               input       output
        new int[] { 0, 1, 1, 0 }, //  0 
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 0, 0, 1, 0 }, //  0
        new int[] { 0, 1, 1, 0 }, //  0
        new int[] { 0, 1, 0, 0 }, //  0
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 0 }, //  1
        new int[] { 1, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 0, 0, 0, 1 }, //  1
        new int[] { 1, 1, 1, 1 }, //  2
        new int[] { 1, 0, 1, 1 }, //  2
        new int[] { 1, 1, 0, 1 }, //  2
        new int[] { 0, 1, 1, 1 }, //  2
        new int[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // the corresponding output classes
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

    // Create the learning algorithm
    var learner = new NaiveBayesLearning();

    // Train the model
    NaiveBayes nb = learner.Learn(inputs, outputs);

    // Test with the first sample
    int answer = nb.Decide(new int[] { 0, 1, 1, 0 }); // should be 0
    #endregion

    double error = new ZeroOneLoss(outputs).Loss(nb.Decide(inputs));
    Assert.AreEqual(0, error);

    for (int i = 0; i < inputs.Length; i++)
    {
        error = nb.Compute(inputs[i]);
        double expected = outputs[i];
        Assert.AreEqual(expected, error);
    }
}

TestCase4

The next example uses a Gaussian model and shows how to set more specific learning options.

public void learn_test()
{
    #region doc_learn
    // Classify the following inputs into three classes
    double[][] inputs =
    {
        //               input         output
        new double[] { 0, 1, 1, 0 }, //  0 
        new double[] { 0, 1, 0, 0 }, //  0
        new double[] { 0, 0, 1, 0 }, //  0
        new double[] { 0, 1, 1, 0 }, //  0
        new double[] { 0, 1, 0, 0 }, //  0
        new double[] { 1, 0, 0, 0 }, //  1
        new double[] { 1, 0, 0, 0 }, //  1
        new double[] { 1, 0, 0, 1 }, //  1
        new double[] { 0, 0, 0, 1 }, //  1
        new double[] { 0, 0, 0, 1 }, //  1
        new double[] { 1, 1, 1, 1 }, //  2
        new double[] { 1, 0, 1, 1 }, //  2
        new double[] { 1, 1, 0, 1 }, //  2
        new double[] { 0, 1, 1, 1 }, //  2
        new double[] { 1, 1, 1, 1 }, //  2
    };

    int[] outputs = // the corresponding output classes
    {
        0, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        2, 2, 2, 2, 2,
    };

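    // Create a learning algorithm for a naive Bayes model with Gaussian (normal) components
    var teacher = new NaiveBayesLearning<NormalDistribution>();

    // Set options for the component distributions
    teacher.Options.InnerOption = new NormalOptions
    {
        Regularization = 1e-5 // avoids zero variances
    };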

    // Train the model
    NaiveBayes<NormalDistribution> bayes = teacher.Learn(inputs, outputs);

    // Predict the outputs for the training inputs
    int[] predicted = bayes.Decide(inputs);

    // Estimate the model error, which should be 0
    double error = new ZeroOneLoss(outputs).Loss(predicted);

    // Predict the class of one particular input
    int answer = bayes.Decide(new double[] { 1, 0, 0, 1 }); // should be 1
    #endregion

    Assert.AreEqual(0, error);
    Assert.AreEqual(1, answer);
    Assert.IsTrue(predicted.IsEqual(outputs));
}
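The role of the regularization constant is easiest to see in a hand-written version. The Python sketch below is library-independent and makes two assumptions that may differ from Accord.NET: each feature is modelled per class by a normal distribution fitted with the population variance, and the constant 1e-5 is simply added to every variance. The function names are hypothetical.

```python
import math

inputs = [
    [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0], [0, 1, 0, 0],
    [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1],
    [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1],
]
outputs = [0] * 5 + [1] * 5 + [2] * 5
REGULARIZATION = 1e-5  # keeps every variance strictly positive


def fit(inputs, outputs):
    """Estimate a prior plus one (mean, variance) pair per feature for each class."""
    model = {}
    for label in sorted(set(outputs)):
        rows = [x for x, y in zip(inputs, outputs) if y == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, variance + REGULARIZATION))
        model[label] = (len(rows) / len(inputs), stats)
    return model


def log_score(model, label, x):
    """Log prior plus the sum of per-feature Gaussian log-densities."""
    prior, stats = model[label]
    total = math.log(prior)
    for value, (mean, variance) in zip(x, stats):
        total += -0.5 * math.log(2 * math.pi * variance)
        total -= (value - mean) ** 2 / (2 * variance)
    return total


def decide(model, x):
    """Pick the class with the highest log score."""
    return max(model, key=lambda label: log_score(model, label, x))


model = fit(inputs, outputs)
predicted = [decide(model, x) for x in inputs]
print(predicted == outputs)          # True
print(decide(model, [1, 0, 0, 1]))   # 1
```

Several features are constant within a class here (for instance the first feature is always 0 in class 0), so their variance is exactly 0 and the Gaussian density would divide by zero without the added constant. The scores are kept in log space because a mismatch on such a near-constant feature makes the plain density underflow to 0.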
