Bayes' Formula, Prior Probability, and Posterior Probability
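Prior probability:
The probability of a variable before a given piece of evidence is taken into account. In machine learning, this is the initial probability before training, when no training samples have been seen: P(w)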
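Posterior probability:
Once sample data is available, the probability of the variable is revised, and the revised probability is the posterior probability. For example, if g is the sample, the posterior probability is: P(w | g)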
Bayes' formula:
Formally, Bayes' formula computes the posterior probability from the prior probability and the likelihood function.
P(w | g) = P(w) P(g | w) / P(g)
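An example of computing Bayes' formula in R:
Prior probability: the machine is in one of two states, working (probability 0.9) or broken (probability 0.1).
Likelihood: in either state, each outcome is one of two kinds, good (g) or broken (b).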
        | good (g) | broken (b) |
working | 0.95     | 0.05       |
broken  | 0.7      | 0.3        |
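We are then given a sequence of outcomes, "g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b", and asked for the posterior probabilities,
that is, P(w | g), P(w | b), P(b | g), P(b | b). Note that b is used for two things here: before the bar it denotes the broken state, and after the bar it denotes a broken outcome.
For example, P(w | g) = P(w) P(g | w) / P(g)
Here the total probability is P(g) = P(g | w)P(w) + P(g | b)P(b), where w and b on the right-hand side are the working and broken states.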
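The single-observation posteriors listed above can be checked directly. The following is a minimal sketch in R, assuming the prior and likelihood values given in this example; the variable names evidence, joint, and posterior are chosen here for illustration and are not part of the original code.

```r
# Prior over machine states (values from the example)
prior <- c(working = 0.9, broken = 0.1)

# Likelihood matrix: rows are states, columns are observed outcomes
likelihood <- rbind(working = c(g = 0.95, b = 0.05),
                    broken  = c(g = 0.70, b = 0.30))

# Joint probability P(state, outcome) = P(state) * P(outcome | state).
# The prior vector is recycled down the columns, so each row is
# scaled by the prior of its state.
joint <- prior * likelihood

# Total probability of each outcome: sum of the joint over states
evidence <- colSums(joint)

# Posterior P(state | outcome): divide each column by its total
posterior <- sweep(joint, 2, evidence, "/")
print(round(posterior, 4))
```

With these numbers, P(g) = 0.855 + 0.07 = 0.925, so P(w | g) = 0.855 / 0.925, roughly 0.9243, and P(w | b) = 0.045 / 0.075 = 0.6. Each column of posterior sums to 1.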
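Below is the R code. It applies Bayes' rule one observation at a time, using each posterior as the prior for the next observation.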
########################################################
# Illustration of function bayes to illustrate
# sequential learning in Bayes' rule
########################################################

bayes <- function(prior, likelihood, data){
  # one row per step: row 1 holds the prior, row j+1 the posterior after datum j
  probs <- matrix(0, length(data) + 1, length(prior))
  dimnames(probs)[[1]] <- c("prior", data)
  dimnames(probs)[[2]] <- names(prior)
  probs[1, ] <- prior
  for(j in 1:length(data))
    probs[j+1, ] <- probs[j, ] * likelihood[, data[j]] /
      sum(probs[j, ] * likelihood[, data[j]])
  dimnames(probs)[[1]] <- paste(0:length(data), dimnames(probs)[[1]])
  data.frame(probs)
}

# quality control example
# machine is either working or broken with prior probs .9 and .1
prior <- c(working = .9, broken = .1)

# outcomes are good (g) or broken (b)
# likelihood matrix gives probs of each outcome for each model
like.working <- c(g=.95, b=.05)
like.broken <- c(g=.7, b=.3)
likelihood <- rbind(like.working, like.broken)

# sequence of data outcomes
data <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")

# function bayes computes the posteriors, one datum at a time
# inputs are the prior vector, likelihood matrix, and vector of data
posterior <- bayes(prior, likelihood, data)
posterior
Output:
         working  broken
0 prior   0.9000 0.10000
1 g       0.9243 0.07568
2 b       0.6706 0.32941
3 g       0.7342 0.26576
4 g       0.7894 0.21055
5 g       0.8358 0.16424
6 g       0.8735 0.12649
7 g       0.9036 0.09641
8 g       0.9271 0.07289
9 g       0.9452 0.05476
10 b      0.7421 0.25793
11 g      0.7961 0.20389
12 b      0.3942 0.60578
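The final row can be cross-checked with a single update over the whole sequence. The sketch below assumes that the observations are independent given the machine state, which is the same assumption the sequential update makes implicitly; the names seq_lik and batch_posterior are introduced here for illustration only.

```r
# Same prior, likelihood, and data as in the example
prior <- c(working = 0.9, broken = 0.1)
likelihood <- rbind(working = c(g = 0.95, b = 0.05),
                    broken  = c(g = 0.70, b = 0.30))
data <- c("g", "b", "g", "g", "g", "g", "g", "g", "g", "b", "g", "b")

# Likelihood of the full sequence under each state: the product of the
# per-observation likelihoods (independence given the state is assumed)
seq_lik <- apply(likelihood[, data], 1, prod)

# One-shot Bayes update over all twelve observations
batch_posterior <- prior * seq_lik / sum(prior * seq_lik)
print(round(batch_posterior, 4))
```

This gives roughly 0.3942 for working and 0.6058 for broken, in agreement with row 12 of the table. Because the result depends only on the product of the likelihoods, the order of the observations does not affect the final posterior, only the intermediate rows.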