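[Notes] Machine Learning Foundations 01: The Learning Problem
These notes were written in OneNote, then exported to Word and published here. I took them fairly casually and did not spend much time editing, so the key points do not always stand out and Chinese and English are mixed together. Please bear with me.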
I. Course Introduction
Approach: start from the foundations
story-like:
·When Can Machines Learn? (illustrative + technical)
·Why Can Machines Learn? (theoretical + illustrative)
·How Can Machines Learn? (technical + practical)
·How Can Machines Learn Better? (practical + theoretical)
II. What Is Machine Learning
1 What Is Learning
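What all learning has in common is that it starts from observation: hearing is one kind of observation, and so is seeing. Starting from these observations, the brain processes them and finally turns them into a useful skill. That is the process of learning.
Machine learning imitates this human learning process, with the learner changed from a person to a computer. What the computer observes (what we actively give it, or what it manages to collect by itself) is called data. The computer takes the data, processes it, and finally turns it into a skill that is useful to the computer.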
2 What Is Skill
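A skill is something that improves some performance measure. For example, after learning mathematics, one can calculate more accurately.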
3 Machine Learning
Machine learning starts from data, goes through computation on the computer, and ends with an improvement in some performance measure.
4 Why Use Machine Learning
Machine learning: an alternative route to building complicated systems
5 Key Essence of Machine Learning
When can machine learning be used? If a problem has the following three key properties, machine learning may be applicable.
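III. Applications of Machine Learning (skim)
Machine learning has applications throughout daily life: food, clothing, housing, transportation, education, and entertainment. Below are examples from a few of these areas; a quick look is enough.
Food, clothing, housing, transportation:
Education:
Entertainment: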
IV. Components of Machine Learning (key point!!!)
1 Formalizing the Learning Problem
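·Input: x ∈ X (customer information held by the bank)
·Output: y ∈ Y (whether to issue a card to the customer)
·Unknown function, i.e. the target function: f: X→Y (the ideal credit card approval formula)
·Data, i.e. training examples: D={(x1, y1), (x2, y2),…, (xN, yN)} (the bank's historical records)
·Hypothesis, i.e. the skill that hopefully improves performance: g: X→Y (the learned formula)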
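The components above can be written down as a small Python sketch. This is only an illustration: the field layout of a customer record, the +1/-1 label encoding, the three records in D, and the rule used for g are all made-up assumptions, not something from the course.

```python
from typing import Callable, List, Tuple

# x in X: hypothetical customer record
# (annual salary in NTD, debt in NTD, years in job)
Customer = Tuple[float, float, float]

# y in Y: assumed encoding, +1 = issue the card, -1 = do not issue
Label = int

# D = {(x1, y1), ..., (xN, yN)}: made-up historical records, illustration only
D: List[Tuple[Customer, Label]] = [
    ((1_000_000.0, 50_000.0, 5.0), +1),
    ((300_000.0, 200_000.0, 0.5), -1),
    ((900_000.0, 0.0, 3.0), +1),
]

# A hypothesis is any function X -> Y. The target f has the same type,
# but it is unknown, so it cannot be written here.
Hypothesis = Callable[[Customer], Label]


def g(x: Customer) -> Label:
    """Hypothetical learned formula: approve if annual salary > NTD 800,000."""
    salary, _debt, _years = x
    return +1 if salary > 800_000 else -1


N = len(D)                           # number of training examples
predictions = [g(x) for x, _ in D]   # apply g to every input in D
```

Note that f never appears in the code: only D (examples generated by f) and g (our guess at f) can actually be represented.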
2 Learning Flow
Simple learning flow:
Detailed learning flow:
Note two points in the figure above:
(1)target f unknown
(i.e. no programmable definition)
(2)hypothesis g hopefully ≈ f
but possibly different from f
(perfection 'impossible' when f unknown)
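Since f is unknown, the only way to compare g with f is through the examples in D. A minimal sketch of that check is below, assuming a made-up one-dimensional input (annual salary) and a +1/-1 label encoding; the function name in_sample_agreement is my own. High agreement on D does not prove g = f; it only says g matches f on the examples we happened to see.

```python
def in_sample_agreement(g, D):
    """Return the fraction of examples (x, y) in D for which g(x) == y."""
    if not D:
        raise ValueError("D must contain at least one example")
    matches = sum(1 for x, y in D if g(x) == y)
    return matches / len(D)


# Made-up data: x is annual salary in NTD, y is +1 (approve) or -1 (reject)
D = [(1_000_000, +1), (300_000, -1), (850_000, -1), (600_000, -1)]


def g(x):
    """Hypothetical hypothesis: approve if annual salary > NTD 800,000."""
    return +1 if x > 800_000 else -1


# g disagrees with the record (850_000, -1), so agreement is 3 out of 4
agreement = in_sample_agreement(g, D)
```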
3 The Learning Model
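The flow chart here differs from the one in 2. There are many possible hypothesis formulas; collected together they form the hypothesis set (symbol H), which contains both good and bad hypotheses. At this point, ML is defined more precisely as a learning algorithm (symbol A) which, based on the data it sees, picks the best hypothesis from the hypothesis set.
Notes: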
(1)assume g∈H={hk}, i.e. approving if
·h1: annual salary > NTD 800,000
·h2: debt > NTD 100,000 (really?)
·h3: years in job <= 2 (really?)
(2)hypothesis set H:
·can contain good or bad hypotheses
·up to A to pick the 'best' one as g
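The notes above can be sketched in Python as follows. H holds the three candidate rules h1, h2, h3 listed above, and A is a toy learning algorithm that simply returns the hypothesis with the fewest mistakes on D. The dictionary keys, the +1/-1 encoding, the records in D, and the "fewest errors" selection rule are all assumptions for illustration; real learning algorithms search H in much smarter ways.

```python
from typing import Callable, Dict, List, Tuple

Customer = Dict[str, float]             # hypothetical record layout
Hypothesis = Callable[[Customer], int]  # returns +1 (approve) or -1 (reject)


def h1(x: Customer) -> int:
    return +1 if x["annual_salary"] > 800_000 else -1


def h2(x: Customer) -> int:
    return +1 if x["debt"] > 100_000 else -1


def h3(x: Customer) -> int:
    return +1 if x["years_in_job"] <= 2 else -1


# Hypothesis set H: may contain good and bad hypotheses
H: Dict[str, Hypothesis] = {"h1": h1, "h2": h2, "h3": h3}


def count_errors(h: Hypothesis, D: List[Tuple[Customer, int]]) -> int:
    """Number of examples in D that h gets wrong."""
    return sum(1 for x, y in D if h(x) != y)


def A(D: List[Tuple[Customer, int]], H: Dict[str, Hypothesis]) -> Tuple[str, Hypothesis]:
    """Toy learning algorithm: pick the hypothesis in H with the fewest errors on D."""
    best_name = min(H, key=lambda name: count_errors(H[name], D))
    return best_name, H[best_name]


# Made-up historical records, illustration only
D = [
    ({"annual_salary": 1_000_000, "debt": 50_000, "years_in_job": 5}, +1),
    ({"annual_salary": 300_000, "debt": 200_000, "years_in_job": 0.5}, -1),
    ({"annual_salary": 900_000, "debt": 0, "years_in_job": 3}, +1),
]

best_name, g = A(D, H)  # on this toy D, h1 makes no mistakes and is chosen
```

On this toy data h2 and h3 get every record wrong, which matches the "(really?)" remarks: H is allowed to contain bad hypotheses, and it is A's job to avoid them.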
Model:
4 Practical Definition of Machine Learning
We can now give a more complete definition of machine learning:
V. Machine Learning and Other Fields
1 Machine Learning and Data Mining
Machine Learning:
use data to compute hypothesis g that approximates target f
Data Mining:
use (huge) data to find property that is interesting
·if 'interesting property' same as 'hypothesis that approximate target'
-- ML = DM (usually what KDDCup does)
·if 'interesting property' related to 'hypothesis that approximate target'
-- DM can help ML, and vice versa (often, but not always)
·traditional DM also focuses on efficient computation in large database
In general, it's difficult to distinguish ML and DM in reality
2 Machine Learning and Artificial Intelligence
Machine Learning:
use data to compute hypothesis g that approximates target f
Artificial Intelligence:
compute something that shows intelligent behavior
·g≈f is something that shows intelligent behavior
-- ML can realize AI, among other routes
·e.g. chess playing
traditional AI: game tree
ML for AI: 'learning from board data'
ML is one possible route to realize AI
3 Machine Learning and Statistics
Machine Learning:
use data to compute hypothesis g that approximates target f
Statistics:
use data to make inference about an unknown process
·g is an inference outcome; f is something unknown
-- statistics can be used to achieve ML
·traditional statistics also focuses on provable results with math assumptions,
and cares less about computation
statistics: many useful tools for ML
4 Summary