Bootstrapping Machine Learning: Book Review

Louis Dorard has released his book titled Bootstrapping Machine Learning. It’s a book that provides a gentle introduction to the field of machine learning targeted at developers and start-ups with a focus on prediction APIs.

I just finished reading this book and I want to share some of my thoughts. If you are interested, I have already reviewed the sample Louis provides on his webpage, which covers the first two chapters.

Bootstrapping Machine Learning

Overview

The book is broken down into eight chapters, as follows.

  1. Introduction
  2. Machine Learning and Artificial Intelligence
  3. Concepts
  4. Examples
  5. Applying Machine Learning to Your Domain
  6. Prediction APIs
  7. Case Study: Priority Inbox
  8. Wrap-up

Prediction APIs

Louis provides a taxonomy of prediction APIs in Chapter 3. I’m not clear whether it is his own taxonomy or a general breakdown that is used in the field, but I found it useful nevertheless. He classifies prediction APIs as follows:

  • Specialized APIs: these are APIs that solve a specific problem, such as sentiment analysis of tweets or face recognition in images.
  • Generic APIs: these are general machine learning APIs where you upload a dataset and the system returns predictions. Google Prediction API is an example of this.
  • Algorithm APIs: these are generic APIs that expose the details of the algorithms, such as their configuration parameters and the choice between algorithms. I think BigML (CART) is an example of this kind of API, but Louis classifies BigML as a generic API.

Louis motivates the need for hosted prediction APIs by suggesting that if you don’t have the time to figure out how machine learning algorithms work, you won’t have the time to figure out how to scale them. I like this point because it highlights the need for a developer or startup to focus on their core offering and to move quickly.

Problem Breakdown

Louis provides a number of example machine learning problems in Chapter 4 in the areas of business and applications. These provide a concrete motivation for the types of problems for which machine learning is suited and how to think about those problems. Louis provides a framework to think about machine learning problems in a structured way that I really like. In summary, that framework is:

  • Who: who does the example concern?
  • Description: what is the context, and what are we trying to do?
  • Question asked: how would you write the questions that the predictive model should give answers to in plain English?
  • Type of ML problem: classification or regression?
  • Input: what are we doing predictions on?
  • Features: which aspects of the inputs are we considering, and what kind of information do we have in their representation?
  • Output: what does the predictive model return?
  • Data collection: how are example input-output pairs obtained to train the predictive model?
  • How predictions are used: when are predictions being made, and what do we do once we have them?
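As a rough illustration of how a developer might put this framework to work, the checklist can be captured as a small record type. This is my own sketch, not code from the book: the class name, field names and the priority inbox values below are all hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass
class ProblemBreakdown:
    """One field per question in the framework (names are my own invention)."""
    who: str
    description: str
    question_asked: str
    ml_problem_type: str  # "classification" or "regression"
    model_input: str
    features: List[str]
    model_output: str
    data_collection: str
    how_predictions_are_used: str

    def __post_init__(self) -> None:
        # The framework only distinguishes these two problem types.
        if self.ml_problem_type not in ("classification", "regression"):
            raise ValueError(f"unknown problem type: {self.ml_problem_type!r}")

    def missing(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical values, loosely modelled on a priority inbox.
priority_inbox = ProblemBreakdown(
    who="An email user with a crowded inbox",
    description="Surface the messages that deserve attention first",
    question_asked="Is this incoming email important to this user?",
    ml_problem_type="classification",
    model_input="An incoming email",
    features=["sender", "subject length", "was a previous thread replied to"],
    model_output="important / not important",
    data_collection="Past emails labelled by whether the user replied",
    how_predictions_are_used="On arrival, to decide which folder to show it in",
)

# A half-finished draft: the empty fields show what still needs thought.
draft = replace(priority_inbox, features=[], model_output="")
print(draft.missing())
```

Writing the answers down this way makes gaps obvious before any data is uploaded to a prediction API.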

Apply Machine Learning

Chapter 5 focuses on the concerns of applying machine learning to your own domain. Louis guides you through topics such as data collection, feature engineering, preparing data, sanity checks, privacy and performance.

A strong point I liked in this chapter was the thought experiment Louis suggests when approaching a machine learning problem: imagine the system achieving perfect predictions. He uses this to suggest that you think about defining success criteria and performance measures and, most importantly, whether solving the problem can yield a return on the investment. He makes this point with a concrete example of customer churn.
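To make the thought experiment concrete, here is a back-of-the-envelope sketch of my own (not taken from the book). It assumes a perfect churn predictor, so a retention offer goes to exactly the customers who would leave. The function name, its parameters and every number are hypothetical.

```python
def perfect_prediction_value(
    n_customers: int,
    churn_rate: float,
    revenue_per_customer: float,
    offer_success_rate: float,
    offer_cost: float,
) -> float:
    """Upper bound on net gain if every churner is identified correctly.

    Assumes each predicted churner receives one retention offer and that
    a fixed fraction of those offers succeeds.
    """
    churners = n_customers * churn_rate
    revenue_saved = churners * offer_success_rate * revenue_per_customer
    total_offer_cost = churners * offer_cost
    return revenue_saved - total_offer_cost


# Hypothetical business: all figures are made up for illustration.
best_case = perfect_prediction_value(
    n_customers=10_000,
    churn_rate=0.05,
    revenue_per_customer=200.0,
    offer_success_rate=0.3,
    offer_cost=20.0,
)
project_cost = 15_000.0  # hypothetical cost of building the system

# If even the perfect-prediction case does not cover the project cost,
# a real (imperfect) model cannot either.
print(f"best case net gain: {best_case:.0f}")
print("worth pursuing" if best_case > project_cost else "not worth pursuing")
```

Because a real model will do worse than a perfect one, this figure is a ceiling: if the ceiling is unattractive, the project can be dropped before any data is collected.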

Priority Inbox Case Study

Louis follows this with Chapter 6, which summarizes the state of prediction APIs, touching on text and Natural Language Processing, Computer Vision, and examples of using BigML and the Google Prediction API. I didn’t realize that there were so many companies and such a variety of prediction APIs available at the moment. For example, Louis links to the post List of 40+ Machine Learning APIs.

Chapter 7 provides a worked case study on developing a priority inbox, leveraging both Google infrastructure and the BigML platform. The thing I liked the most about this example was that it was clear and concise. I don’t like examples with an overabundance of code, and this was the right mix for me.

Louis rounds out the book in Chapter 8 with a call to arms to adopt prediction APIs and Machine Learning as a Service (MLaaS) as a way of addressing the current (and expected to worsen) talent shortage in data science and machine learning. The resources at the back suggest books and courses for learning more about specific areas covered in the text.

Summary

I have been thinking deeply about commoditized machine learning. I think it is a market whose adoption will only grow. I think the benefits will come from finding the best ways to integrate it into businesses and offer it to them.

The book is clearly presented, with content that is focused and well suited to its audience. It is not maths heavy, nor is it bogged down with pages and pages of code examples. I really like the crisp presentation of the two APIs the book focuses on, Google Prediction API and BigML, and the worked example has just the right level of detail.

You could figure out how to use the APIs on your own, but the benefit of reading Louis’ book is that he motivates the problem solving and machine learning around the available APIs. I recommend this book to any developer or startup looking to start using machine learning quickly and effectively in their web application.
