1. 程式人生 > >Machine Learning in a Year

Machine Learning in a Year

First intro: Hacker News and Udacity

My interest in ml stems back to 2014 when I started reading articles about it on Hacker News. I simply found the idea of teaching machines stuff by looking at data appealing. At the time I wasn’t even a professional developer, but a hobby coder who’d done a couple of small projects.

So I began watching the first few chapters of Udacity’s Supervised Learning course, while also reading all articles I came across on the subject.

This gave me a little bit of conceptual understanding, though no practical skills. I also didn’t finish it, as I rarely do with MOOC’s.

Failing Coursera’s ML Course

In January 2015 I joined the Founders and Coders (FAC) bootcamp in London in order to become a developer. A few weeks in, I wanted to learn how to actually code machine learning algorithms, so I started a study group with a few of my peers. Every Tuesday evening, we’d watch lectures from Coursera’s Machine Learning

course.

It’s a fantastic course, and I learned a hell of a lot. But it’s tough for a beginner. I had to watch the lectures over and over again before grasping the concepts. The Octave coding task are challenging as well, especially if you don’t know Octave. As a result of the difficulty, one by one fell off the study group as the weeks passed. Eventually, I fell off it myself as well.

In hindsight, I should have started with a course that either used ml libraries for the coding tasks — as opposed to building the algorithms from scratch — or at least used a programming language I knew.

Learning a new language while also trying to code ml algorithms is too hard for a newbie.

If I could go back in time, I’d choose Udacity’s Intro to Machine Learning, as it’s easier and uses Python and Scikit Learn. This way, we would have gotten our hands dirty as soon as possible, gained confidence, and had more fun.

Lesson learned:Start with something easy and practical rather than difficult and theoretical.

Machine Learning in a Week

One of the last things I did at FAC was the ml week stunt. My goal was to be able to apply machine learning to actual problems at the end of the week, which I managed to do.

Throughout the week I did the following:

  • got to know Scikit Learn
  • tried ml on a real world dataset
  • coded a linear regression algorithm from scratch (in Python)
  • did a tiny bit of nlp

It’s by far the steepest ml learning curve I’ve ever experienced. Go ahead and read the article if you want a more detailed overview.

Lesson learned: Setting off a week solely to immerse yourself into a new subject is extremely effective.

Failing neural networks

After I finished FAC in London and moved back to Norway, I tried to repeat the success from the ml week, but for neural networks instead.

This failed.

There were simply too many distractions to spend 10 hours of coding and learning every day. I had underestimated how important it was to be surrounded by peers at FAC.

Lesson learned: Find an thriving environment to surround yourself with when doing these kinds of learning stunts.

However, I got started with neural nets at least, and slowly started to grasp the concept. By July I managed to code my first net. It’s probably the crappiest implementation ever created, and I actually find it embarrassing to show off. But it did the trick; I proved to myself that I understood concepts like backpropagation and gradient descent.

In the second half of the year, my progression slowed down, as I started a new job. The most important takeaway from this period was the leap from non-vectorized to vectorized implementations of neural networks, which involved repeating linear algebra from university.

By the end of the year I wrote an article as a summary of how I learned this:

Testing out Kaggle Contests

During the christmas vacation of 2015, I got a motivational boost again and decided try out Kaggle. So I spent quite some time experimenting with various algorithms for their Homesite Quote Conversion, Otto Group Product Classification and Bike Sharing Demand contests.

Kaggle is a fun way to experiment with datasets and get feedback on your performance.

The main takeaway from this was the experience of iteratively improving the results by experimenting with the algorithms and the data.

I learned to trust my logic when doing machine learning.

If tweaking a parameter or engineering a new feature seems like a good idea logically, it’s quite likely that it actually will help.

Setting up a learning routine at work

Back at work in January 2016 I wanted to continue in the flow I’d gotten into during Christmas. So I asked my manager if I could spend some time learning stuff during my work hours as well, which he happily approved.

Having gotten a basic understanding of neural networks at this point, I wanted to move on to deep learning.

Udacity’s Deep Learning

My first attempt was Udacity’s Deep Learning course, which ended up as a big disappointment. The contents of the video lectures are good, but they are too short and shallow to me.

And the IPython Notebook assignments ended up being too frustrating, as I spent most of my time debugging code errors, which is the most effective way to kill motivation. So after doing that for a couple of sessions at work, I simply gave up.

To their defense, I’m a total noob when it comes to IPython Notebooks, so it might not be as bad for you as it was for me. So it might be that I simply wasn’t ready for the course.

Stanford — Deep Learning for NLP

Luckily, I then discovered Stanford’s CS224D and decided to give it a shot. It is a fantastic course. And though it’s difficult, I never end up debugging when doing the problem sets.

Secondly, they actually give you the solution code as well, which I often look at when I’m stuck, so that I can work my way backwards to understand the steps needed to reach a solution.

Though I’ve haven’t finished it yet, it has significantly boosted my knowledge in nlp and neural networks so far.

However it’s been tough. Really tough. At one point, I realized I needed help from someone better than me, so I came in touch with a Ph.D student who was willing to help me out for 40 USD per hour, both with the problem sets as well as the overall understanding. This has been critical for me in order to move on, as he has uncovered a lot of black holes in my knowledge.

Lesson learned: It’s possible to get a good machine learning teacher for around 50 USD per hour. If you can afford it, it’s definitely worth it.

In addition to this, Xeneta also hired a data scientist recently. He’s got a masters degree in math, so I often ask him for help when I’m stuck with various linear algebra an calculus tasks, or ml in general. So be sure to check out which resources you have internally in your company as well.

Boosting Sales at Xeneta

After doing all this, I finally felt ready to do a ml project at work. It basically involved training an algorithm to qualify sales leads by reading company descriptions, and has actually proven to be a big time saver for the sales guys using the tool.

Check out out article I wrote about it below or head over to GitHub to dive straight into the code.

Getting to this point has surely been a long journey. But also a fast one; when I started my machine learning in a week project, I certainly didn’t have any hopes of actually using it professionally within a year.

But it’s 100 percent possible. And if I can do it, so can anybody else.

相關推薦

Machine Learning in a Year

First intro: Hacker News and UdacityMy interest in ml stems back to 2014 when I started reading articles about it on Hacker News. I simply found the idea o

Machine Learning in a Box (Week 8): SAP HANA EML and TensorFlow Integration

Last time, we looked at how to use Jupyter notebooks to run kernels in Python and R and even SQL. I'll be honest, Jupyter is now my new "go to" tool. I eve

Machine Learning in a Week

Getting into machine learning (ml) can seem like an unachievable task from the outside.And it definitely can be, if you attack it from the wrong end.Howeve

Machine Learning In Netflix, A Visionary Video Streaming App Development Approach

Updating an application is significantly important for both software developers and users. Paying attention to the tips overviewed in this article can help

Machine Learning in Action-chapter2-k近鄰算法

turn fma 全部 pytho label -c log eps 數組 一.numpy()函數 1.shape[]讀取矩陣的長度 例: import numpy as np x = np.array([[1,2],[2,3],[3,4]]) print x

<Machine Learning in Action >之二 樸素貝葉斯 C#實現文章分類

options 直升機 water 飛機 math mes 視頻 write mod def trainNB0(trainMatrix,trainCategory): numTrainDocs = len(trainMatrix) numWords =

[Javascript] Classify text into categories with machine learning in Natural

bus easy ann etc hms scrip steps spam not In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression cl

[Javascript] Classify JSON text data with machine learning in Natural

comm about cnblogs ++ get ssi learn clas save In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regressi

斯坦福大學公開課機器學習: advice for applying machine learning - evaluatin a phpothesis(怎麽評估學習算法得到的假設以及如何防止過擬合或欠擬合)

class 中一 技術分享 cnblogs 訓練數據 是否 多個 期望 部分 怎樣評價我們的學習算法得到的假設以及如何防止過擬合和欠擬合的問題。 當我們確定學習算法的參數時,我們考慮的是選擇參數來使訓練誤差最小化。有人認為,得到一個很小的訓練誤差一定是一件好事。但其實,僅

機器學習實戰(Machine Learning in Action)學習筆記————02.k-鄰近演算法(KNN)

機器學習實戰(Machine Learning in Action)學習筆記————02.k-鄰近演算法(KNN)關鍵字:鄰近演算法(kNN: k Nearest Neighbors)、python、原始碼解析、測試作者:米倉山下時間:2018-10-21機器學習實戰(Machine Learning in

機器學習實戰(Machine Learning in Action)學習筆記————05.Logistic迴歸

機器學習實戰(Machine Learning in Action)學習筆記————05.Logistic迴歸關鍵字:Logistic迴歸、python、原始碼解析、測試作者:米倉山下時間:2018-10-26機器學習實戰(Machine Learning in Action,@author: Peter H

機器學習實戰(Machine Learning in Action)學習筆記————04.樸素貝葉斯分類(bayes)

機器學習實戰(Machine Learning in Action)學習筆記————04.樸素貝葉斯分類(bayes)關鍵字:樸素貝葉斯、python、原始碼解析作者:米倉山下時間:2018-10-25機器學習實戰(Machine Learning in Action,@author: Peter Harri

機器學習實戰(Machine Learning in Action)學習筆記————03.決策樹原理、原始碼解析及測試

機器學習實戰(Machine Learning in Action)學習筆記————03.決策樹原理、原始碼解析及測試關鍵字:決策樹、python、原始碼解析、測試作者:米倉山下時間:2018-10-24機器學習實戰(Machine Learning in Action,@author: Peter Harr

機器學習實戰(Machine Learning in Action)學習筆記————08.使用FPgrowth演算法來高效發現頻繁項集

機器學習實戰(Machine Learning in Action)學習筆記————08.使用FPgrowth演算法來高效發現頻繁項集關鍵字:FPgrowth、頻繁項集、條件FP樹、非監督學習作者:米倉山下時間:2018-11-3機器學習實戰(Machine Learning in Action,@autho

機器學習實戰(Machine Learning in Action)學習筆記————07.使用Apriori演算法進行關聯分析

機器學習實戰(Machine Learning in Action)學習筆記————07.使用Apriori演算法進行關聯分析關鍵字:Apriori、關聯規則挖掘、頻繁項集作者:米倉山下時間:2018-11-2機器學習實戰(Machine Learning in Action,@author: Peter H

機器學習實戰(Machine Learning in Action)學習筆記————06.k-均值聚類演算法(kMeans)學習筆記

機器學習實戰(Machine Learning in Action)學習筆記————06.k-均值聚類演算法(kMeans)學習筆記關鍵字:k-均值、kMeans、聚類、非監督學習作者:米倉山下時間:2018-11-3機器學習實戰(Machine Learning in Action,@author: Pet

A Gentle Introduction to Applied Machine Learning as a Search Problem (譯文)

​ A Gentle Introduction to Applied Machine Learning as a Search Problem 原文作者:Jason Brownlee 原文地址:https://machinelearningmastery.com/applied-m

Machine Learning in Action》| 第1章 k-近鄰演算法

準備:使用 Python 匯入資料 """ @函式說明: 建立資料集 """ def createDataSet(): # 四組二維特徵 group = np.array([[3,104],[2,100],[101,10],[99,5]])

Machine Learning in Action》| 第2章 決策樹

決策樹 調包 import numpy as np import matplotlib.pyplot as plt import operator from matplotlib.font_manager import FontProperties 3.1.決

機器學習實戰(Machine Learning in Action)學習筆記————10.奇異值分解(SVD)原理、基於協同過濾的推薦引擎、資料降維

關鍵字:SVD、奇異值分解、降維、基於協同過濾的推薦引擎作者:米倉山下時間:2018-11-3機器學習實戰(Machine Learning in Action,@author: Peter Harrington)原始碼下載地址:https://www.manning.com/books/machine-le