Maximum-likelihood (ML) Estimation

Suppose that an experiment consists of n = 5 independent Bernoulli trials, each having probability of success p. Let X be the total number of successes in the trials, so that X ∼ Bin(5, p). If the outcome is X = 3, the likelihood is

$$
L(p;x) = \frac{n!}{x!\,(n-x)!}\,p^{x}(1-p)^{n-x} = \frac{5!}{3!\,(5-3)!}\,p^{3}(1-p)^{5-3} \propto p^{3}(1-p)^{2}
$$

where the constant at the beginning is ignored. A graph of $L(p;x) = p^{3}(1-p)^{2}$ over the unit interval p ∈ (0, 1) looks like this:

[Figure: plot of $L(p;x) = p^{3}(1-p)^{2}$ for p ∈ (0, 1)]

It’s interesting that this function reaches its maximum value at p = .6. An intelligent person would have said that if we observe 3 successes in 5 trials, a reasonable estimate of the long-run proportion of successes p would be 3/5 = .6.
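The location of the maximum can also be checked numerically. The following is a minimal Python sketch, assuming a grid of 999 evenly spaced values of p in (0, 1); the function name `likelihood` and the grid size are illustrative choices, not part of the course material.

```python
# Evaluate the likelihood kernel L(p) = p^3 (1 - p)^2 on a grid over (0, 1)
# and report the grid point where it is largest.

def likelihood(p, x=3, n=5):
    """Binomial likelihood kernel p^x (1 - p)^(n - x), leading constant omitted."""
    return p ** x * (1 - p) ** (n - x)

# 999 interior grid points: 0.001, 0.002, ..., 0.999 (assumed grid resolution)
grid = [i / 1000 for i in range(1, 1000)]

# Grid point with the largest likelihood value
p_best = max(grid, key=likelihood)

print(p_best)              # grid point where the kernel is largest
print(likelihood(p_best))  # value of the kernel at that point
```

A grid search only locates the maximum to within the grid spacing, which is enough here to confirm the value suggested by the graph.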

This example suggests that it may be reasonable to estimate an unknown parameter θ by the value for which the likelihood function L(θ; x) is largest. This approach is called maximum-likelihood (ML) estimation. We will denote the value of θ that maximizes the likelihood function by $\hat{\theta}$, read “theta hat.” $\hat{\theta}$ is called the maximum-likelihood estimate (MLE) of θ.

Finding MLE’s usually involves techniques of differential calculus. To maximize L(θ; x) with respect to θ:

first calculate the derivative of L(θ; x) with respect to θ,
set the derivative equal to zero, and
solve the resulting equation for θ.
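As a sketch, applying these three steps to the earlier binomial example, where $L(p;x) \propto p^{3}(1-p)^{2}$, gives:

$$
\begin{aligned}
\frac{d}{dp}\,p^{3}(1-p)^{2} &= 3p^{2}(1-p)^{2} - 2p^{3}(1-p) \\
&= p^{2}(1-p)\,(3 - 5p).
\end{aligned}
$$

Setting this equal to zero gives p = 0, p = 1, or p = 3/5. The likelihood is zero at p = 0 and p = 1 and positive at p = 3/5, so the maximum is at $\hat{p} = 3/5 = .6$.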

These computations can often be simplified by maximizing the loglikelihood function,

$$
l(\theta;x) = \log L(\theta;x),
$$

where “log” means natural log (logarithm to the base e). Because the natural log is an increasing function, maximizing the loglikelihood is the same as maximizing the likelihood. The loglikelihood often has a much simpler form than the likelihood and is usually easier to differentiate.
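For instance, in the binomial example with X = 3 and n = 5, the loglikelihood and its derivative can be worked out as follows (a sketch for 0 < p < 1, with the additive constant from the binomial coefficient written as "const"):

$$
\begin{aligned}
l(p;x) &= \text{const} + 3\log p + 2\log(1-p), \\
\frac{d}{dp}\,l(p;x) &= \frac{3}{p} - \frac{2}{1-p}.
\end{aligned}
$$

Setting the derivative equal to zero gives 3(1 − p) = 2p, so $\hat{p} = 3/5 = .6$, the same value obtained by maximizing the likelihood directly. Here the sum of logs is differentiated term by term, with no product rule needed.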

In Stat 504 you will not be asked to derive MLE’s by yourself. In most of the probability models that we will use later in the course (logistic regression, loglinear models, etc.) no explicit formulas for MLE’s are available, and we will have to rely on computer packages to calculate the MLE’s for us. For the simple probability models we have seen thus far, however, explicit formulas for MLE’s are available and are given next.
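When no explicit formula exists, MLE’s are typically found by iterative numerical methods. The following is a minimal Python sketch of one such method, Newton-Raphson, applied to the binomial loglikelihood, where the answer is already known and can be checked. The helper names (`score`, `score_derivative`, `newton_mle`), the starting value, and the tolerance are all assumptions made for illustration; this is not a description of how any particular package is implemented.

```python
# Newton-Raphson iteration for the binomial loglikelihood
# l(p) = x*log(p) + (n - x)*log(1 - p), additive constant omitted.
# Assumes 0 < x < n so that the maximum lies strictly inside (0, 1).

def score(p, x, n):
    """First derivative of the loglikelihood with respect to p."""
    return x / p - (n - x) / (1 - p)

def score_derivative(p, x, n):
    """Second derivative of the loglikelihood with respect to p."""
    return -x / p ** 2 - (n - x) / (1 - p) ** 2

def newton_mle(x, n, p_start=0.5, tol=1e-10, max_iter=50):
    """Hypothetical helper: repeat p <- p - l'(p) / l''(p) until the step is tiny."""
    p = p_start
    for _ in range(max_iter):
        step = score(p, x, n) / score_derivative(p, x, n)
        p = p - step
        if abs(step) < tol:
            break
    return p

p_hat = newton_mle(x=3, n=5)
print(p_hat)  # should agree with 3/5 up to floating-point error
```

In models such as logistic regression the same idea is applied to a vector of parameters, with the derivative replaced by a gradient and the second derivative by a matrix of second derivatives.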

ML for Bernoulli trials

If our experiment is a single Bernoulli trial and we observe X = 1 (success) then the likelihood function is L(p; x) = p. This function reaches its maximum at $\hat{p} = 1$. If we observe X = 0 (failure) then the likelihood is L(p; x) = 1 − p, which reaches its maximum at $\hat{p} = 0$.
