
Design and Run your First Experiment in Weka

Weka is the perfect platform for learning machine learning. It provides a graphical user interface for exploring and experimenting with machine learning algorithms on datasets, without you having to worry about the mathematics or the programming.

A powerful feature of Weka is the Weka Experimenter interface. Unlike the Weka Explorer that is for filtering data and trying out different algorithms, the Experimenter is for designing and running experiments. The experimental results it produces are robust and are good enough to be published (if you know what you are doing).

In this post you will discover the power of the Weka Experimenter. If you follow the step-by-step instructions, you will design and run your first machine learning experiment in under five minutes.

Figure: First Experiment. Photo by mhofstrand, some rights reserved.

1. Download and Install Weka

Visit the Weka Download page and locate a version of Weka suitable for your computer (Windows, Mac or Linux).

Weka requires Java. You may already have Java installed and if not, there are versions of Weka listed on the download page (for Windows) that include Java and will install it for you. I’m on a Mac myself, and like everything else on Mac, Weka just works out of the box.

If you are interested in machine learning, then I know you can figure out how to download and install software on your own computer.


2. Start Weka

Start Weka. This may involve finding it in your program launcher or double-clicking the weka.jar file. This will start the Weka GUI Chooser.

Figure: Weka GUI Chooser.

The Weka GUI Chooser lets you choose one of the Explorer, Experimenter, KnowledgeFlow and the Simple CLI (command line interface).

Click the “Experimenter” button to launch the Weka Experimenter.

The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze the results. It’s a powerful tool.

3. Design Experiment

Click the “New” button to create a new experiment configuration.

Figure: Weka Experimenter. Start a new experiment.

Test Options

The experimenter configures the test options for you with sensible defaults. The experiment is configured to use Cross Validation with 10 folds. It is a “Classification” type problem and each algorithm + dataset combination is run 10 times (iteration control).
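
To make the default test options concrete, here is a minimal sketch of what 10 repetitions of 10-fold cross-validation mean in terms of train/test splits. It uses plain Python rather than Weka, the function name `repeated_kfold_indices` and the seed are made up for illustration, and unlike Weka it does not stratify the folds by class.

```python
import random

def repeated_kfold_indices(n_instances, folds=10, repetitions=10, seed=1):
    """Yield (repetition, fold, train_idx, test_idx) for repeated k-fold CV.

    Illustrative only: a plain shuffle per repetition, no stratification.
    """
    for rep in range(repetitions):
        rng = random.Random(seed + rep)   # a different shuffle per repetition
        order = list(range(n_instances))
        rng.shuffle(order)
        for fold in range(folds):
            test_idx = order[fold::folds]             # every folds-th instance
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield rep, fold, train_idx, test_idx

splits = list(repeated_kfold_indices(150))
print(len(splits))                            # number of train/test splits
print(len(splits[0][2]), len(splits[0][3]))   # train and test sizes of one split
```

With 150 instances, each algorithm is therefore trained and tested on 100 different splits, each using 135 instances for training and 15 for testing.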

Iris flower Dataset

Let’s start out by selecting the dataset.

  1. In the “Datasets” section, click the “Add new…” button.
  2. Open the “data” directory and choose the “iris.arff” dataset.

The Iris flower dataset is a famous dataset from statistics and is heavily borrowed by researchers in machine learning. It contains 150 instances (rows) and 4 attributes (columns) and a class attribute for the species of iris flower (one of setosa, versicolor, virginica). You can read more about Iris flower dataset on Wikipedia.
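
If you are curious what an ARFF file looks like inside, the sketch below parses a tiny hand-typed excerpt in the ARFF style. The excerpt is abbreviated to three rows (one per species) and the attribute names are assumptions about how the bundled file is laid out, so check them against your own copy of iris.arff. The `parse_arff` helper is hypothetical and only handles this simple case.

```python
# Abbreviated, hand-typed excerpt in ARFF style (assumed layout, 3 of 150 rows).
ARFF_EXCERPT = """\
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

def parse_arff(text):
    """Parse a simple ARFF text into (attribute_names, rows)."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            names.append(line.split()[1])      # second token is the name
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append(line.split(","))       # one instance per line
    return names, rows

names, rows = parse_arff(ARFF_EXCERPT)
print(names[:-1])   # the 4 input attributes
print(names[-1])    # the class attribute
print(len(rows))    # instances in this excerpt
```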

Let’s choose 3 algorithms to run on our dataset.

ZeroR

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “ZeroR” under the “rules” selection.

ZeroR is the simplest algorithm we can run. It picks the class value that is the majority in the dataset and gives that for all predictions. Given that all three class values have an equal share (50 instances), it picks the first class value “setosa” and gives that as the answer for all predictions. Just off the top of our head, we know that the best result ZeroR can give is 33.33% (50/150). This is good to have as a baseline that we demand algorithms to outperform.
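
The baseline arithmetic is easy to check by hand. Below is a minimal sketch of the ZeroR idea in plain Python, not Weka's own implementation. The helper names are made up, and breaking ties by taking the first class encountered is an assumption chosen to match the behaviour described above.

```python
from collections import Counter

def zero_r_fit(labels):
    """Return the most frequent class; ties go to the class seen first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:            # first-seen order breaks ties
        if counts[label] == best:
            return label

def accuracy(predicted_label, labels):
    """Fraction of instances whose true class equals the single prediction."""
    return sum(1 for y in labels if y == predicted_label) / len(labels)

labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
majority = zero_r_fit(labels)
print(majority, round(100 * accuracy(majority, labels), 2))  # setosa 33.33
```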

OneR

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “OneR” under the “rules” selection.

OneR is our second simplest algorithm. It picks the one attribute that best predicts the class value and splits it up to get the best prediction accuracy it can. Like ZeroR, the algorithm is so simple that you could implement it by hand, and we would expect more sophisticated algorithms to outperform it.
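
To show how simple the idea is, here is a rough sketch of OneR for attributes that have already been binned into categories. The toy rows and bin names are invented for illustration, and the sketch skips the discretisation of numeric attributes that a real implementation would need for the iris measurements.

```python
from collections import Counter, defaultdict

def one_r_fit(rows, labels):
    """Return (attribute_index, rule) for the attribute with fewest training errors.

    rule maps each attribute value to the majority class among rows with that value.
    """
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[a]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for row, y in zip(rows, labels) if rule[row[a]] != y)
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]

# Hypothetical toy data: (petal width bin, sepal length bin) per flower.
rows = [("narrow", "short"), ("narrow", "long"),
        ("medium", "short"), ("medium", "long"),
        ("wide", "long"), ("wide", "short")]
labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

attribute, rule = one_r_fit(rows, labels)
print(attribute, rule)
```

In this toy data the first attribute separates the three classes perfectly, so it is the one the rule is built on.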

J48

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “J48” under the “trees” selection.

J48 is a decision tree algorithm. It is a Java implementation of the C4.8 algorithm (“J” for Java and 48 for C4.8). C4.8 is a minor extension of the famous C4.5 algorithm and is a very powerful prediction algorithm.

Figure: Weka Experimenter. Configure the experiment.

We are ready to run our experiment.

4. Run Experiment

Click the “Run” tab at the top of the screen.

This tab is the control panel for running the currently configured experiment.

Click the big “Start” button to start the experiment and watch the “Log” and “Status” sections to keep an eye on how it is doing.

Figure: Weka Experimenter. Run the experiment.

Given that the dataset is small and the algorithms are fast, the experiment should complete in seconds.

5. Review Results

Click the “Analyse” tab at the top of the screen.

This will open up the experiment results analysis panel.

Figure: Weka Experimenter. Load the experiment results.

Click the “Experiment” button in the “Source” section to load the results from the current experiment.

Algorithm Rank

The first thing we want to know is which algorithm was the best. We can do that by ranking the algorithms by the number of times a given algorithm beat the other algorithms.

  1. Click the “Select” button for the “Test base” and choose “Ranking”.
  2. Now click the “Perform test” button.

Figure: Weka Experimenter. Rank the algorithms in the experiment results.

The ranking table shows the number of statistically significant wins each algorithm has had against all other algorithms on the dataset. A win means an accuracy that is better than the accuracy of another algorithm, where the difference was statistically significant.

We can see that both J48 and OneR have one win each and that ZeroR has two losses. This is good: it means that OneR and J48 are both potential contenders, outperforming our baseline of ZeroR.
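
The ranking itself is just bookkeeping over pairwise comparisons. As a rough sketch (the function name and tuple layout are made up, and the input is simply the two significant wins over ZeroR described above), wins minus losses can be tallied like this:

```python
def rank_algorithms(algorithms, significant_wins):
    """Rank by (wins - losses), given a list of (winner, loser) pairs."""
    wins = {a: 0 for a in algorithms}
    losses = {a: 0 for a in algorithms}
    for winner, loser in significant_wins:
        wins[winner] += 1
        losses[loser] += 1
    table = [(wins[a] - losses[a], wins[a], losses[a], a) for a in algorithms]
    return sorted(table, key=lambda row: row[0], reverse=True)

ranking = rank_algorithms(
    ["ZeroR", "OneR", "J48"],
    [("OneR", "ZeroR"), ("J48", "ZeroR")],   # significant wins from this experiment
)
for diff, won, lost, name in ranking:
    print(f"{diff:>3} {won:>3} {lost:>3}  {name}")
```

OneR and J48 tie at the top with one win and no losses each, and ZeroR sits at the bottom with a score of -2.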

Algorithm Accuracy

Next we want to know what scores the algorithms achieved.

  1. Click the “Select” button for the “Test base” and choose the “ZeroR” algorithm in the list and click the “Select” button.
  2. Click the check-box next to “Show std. deviations”.
  3. Now click the “Perform test” button.

Figure: Weka Experimenter. Algorithm accuracy compared to ZeroR.

In the “Test output” we can see a table with the results for the 3 algorithms. Each algorithm was run 10 times on the dataset, and the accuracy reported is the mean of those 10 runs, with the standard deviation in brackets.
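
The “mean(std)” figures in the table are ordinary summary statistics. A minimal sketch of the calculation is shown below. The accuracy values are invented for illustration, the `summarize` helper is hypothetical, and using the sample standard deviation is an assumption about how the table is computed.

```python
import statistics

def summarize(accuracies):
    """Format accuracies as 'mean(std)' using the sample standard deviation."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)
    return f"{mean:.2f}({std:.2f})"

# Hypothetical accuracies (in percent) from repeated runs of one algorithm.
accuracies = [93.3, 86.7, 100.0, 93.3, 86.7, 93.3, 100.0, 86.7, 93.3, 93.3]
print(summarize(accuracies))
```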

We can see that both the OneR and J48 algorithms have a little “v” next to their results. This means that the difference in accuracy between each of these algorithms and ZeroR is statistically significant. We can also see that the accuracy of these algorithms is higher than that of ZeroR, so we can say that these two algorithms achieved a statistically significantly better result than the ZeroR baseline.

The score for J48 is higher than the score for OneR, so next we want to see if the difference between these two accuracy scores is significant.

  1. Click the “Select” button for the “Test base” and choose the “J48” algorithm in the list and click the “Select” button.
  2. Now click the “Perform test” button.

Figure: Weka Experimenter. Algorithm accuracy compared to J48.

We can see that ZeroR has a “*” next to its results, indicating that its results are statistically significantly different from those of J48. But we already knew this. We do not see a “*” next to the results for the OneR algorithm. This tells us that although the mean accuracies of J48 and OneR differ, the difference is not statistically significant.
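
For a feel of what sits behind the “v” and “*” markers, here is a rough sketch of a corrected paired t-test, the kind of test the Experimenter uses by default as far as I know. The scores are invented, the function name is made up, and the critical value 2.262 (two-sided, 5% level, 9 degrees of freedom) only applies to this 10-score toy example. Treat it as an illustration, not a reproduction of Weka's output.

```python
import math
import statistics

def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled t statistic for paired accuracy scores.

    test_train_ratio is n_test / n_train for each split (1/9 for 10-fold CV).
    The correction inflates the variance because the splits overlap.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean_d = statistics.mean(diffs)
    var_d = statistics.variance(diffs)
    if var_d == 0:
        return 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / math.sqrt((1.0 / k + test_train_ratio) * var_d)

# Hypothetical paired accuracies (percent) for two algorithms on the same splits.
scores_a = [94, 96, 92, 95, 93, 97, 94, 96, 95, 93]
scores_b = [93, 94, 93, 92, 94, 95, 93, 94, 96, 92]

t = corrected_paired_t(scores_a, scores_b, test_train_ratio=1 / 9)
critical = 2.262  # two-sided 5% critical value for 9 degrees of freedom
print(f"t = {t:.3f}:", "significant" if abs(t) > critical else "not significant")
```

In this toy example the first algorithm has the higher mean accuracy, but the t statistic stays below the critical value, which is the same situation as J48 versus OneR above.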

All things being equal, we would choose the OneR algorithm to make predictions on this problem because it is the simpler of the two algorithms.

If we wanted to report the results, we would say that the OneR algorithm achieved a classification accuracy of 92.53% (+/- 5.47%) which is statistically significantly better than ZeroR at 33.33% (+/- 5.47%).

Summary

You discovered how to configure a machine learning experiment with one dataset and three algorithms in Weka. You also learned about how to analyse the results from an experiment and the importance of statistical significance when interpreting results.

You now have the skill to design and run experiments with any algorithms provided by Weka on datasets of your choosing and meaningfully and confidently report results that you achieve.

