Maximum-likelihood (ML) Estimation

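Suppose that an experiment consists of $n = 5$ independent Bernoulli trials, each having probability of success $p$. Let $X$ be the total number of successes in the trials, so that $X \sim \mathrm{Bin}(5, p)$. If the outcome is $X = 3$, the likelihood is

$$
L(p;x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = \frac{5!}{3!\,2!}\, p^3 (1-p)^2 \propto p^3 (1-p)^2,
$$
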
where the constant at the beginning is ignored. A graph of $L(p;x) = p^3 (1-p)^2$ over the unit interval $p \in (0, 1)$ looks like this:

[Figure: graph of $L(p;x)$ against $p$ over the interval (0, 1)]

It’s interesting that this function reaches its maximum value at $p = .6$. An intelligent person would have said that if we observe 3 successes in 5 trials, a reasonable estimate of the long-run proportion of successes $p$ would be 3/5 = .6.
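
The location of the maximum can also be checked numerically. The following is a minimal sketch, assuming a simple grid search over (0, 1); the function name `likelihood` and the grid spacing are illustrative choices, not part of the original notes.

```python
# Evaluate the likelihood L(p; x) = p^3 (1 - p)^2 on a grid over (0, 1)
# and report the grid point where it is largest.

def likelihood(p, x=3, n=5):
    """Binomial likelihood for x successes in n trials, constant factor dropped."""
    return p**x * (1 - p)**(n - x)

# Candidate values 0.001, 0.002, ..., 0.999 (the spacing is an arbitrary choice).
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest likelihood value.
p_best = max(grid, key=likelihood)

print(p_best)  # 0.6
```

A grid search only locates the maximum up to the grid spacing; here the grid happens to contain the exact value 3/5.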

This example suggests that it may be reasonable to estimate an unknown parameter $\theta$ by the value for which the likelihood function $L(\theta; x)$ is largest. This approach is called maximum-likelihood (ML) estimation. We will denote the value of $\theta$ that maximizes the likelihood function by $\hat{\theta}$, read “theta hat.” $\hat{\theta}$ is called the maximum-likelihood estimate (MLE) of $\theta$.

Finding MLE’s usually involves techniques of differential calculus. To maximize $L(\theta; x)$ with respect to $\theta$:

  • first calculate the derivative of $L(\theta; x)$ with respect to $\theta$,
  • set the derivative equal to zero, and
  • solve the resulting equation for θ.
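
As an illustration, these three steps can be applied to the example likelihood $L(p;x) = p^3(1-p)^2$ for 3 successes in 5 trials. Differentiating with the product rule and factoring gives

$$
\frac{d}{dp} L(p;x) = 3p^2(1-p)^2 - 2p^3(1-p) = p^2 (1-p)(3 - 5p).
$$

Setting this equal to zero yields $p = 0$, $p = 1$, or $p = 3/5$. Because $L(p;x)$ is zero at $p = 0$ and $p = 1$ and positive in between, the maximum is at $p = 3/5 = .6$.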

These computations can often be simplified by maximizing the loglikelihood function,

$$
l(\theta; x) = \log L(\theta; x),
$$

where “log” means natural log (logarithm to the base e). Because the natural log is an increasing function, maximizing the loglikelihood is the same as maximizing the likelihood. The loglikelihood often has a much simpler form than the likelihood and is usually easier to differentiate.
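
To see the simplification in a concrete case, consider the binomial likelihood $L(p;x) \propto p^x (1-p)^{n-x}$ with $0 < x < n$, and write $l(p;x) = \log L(p;x)$ (the symbol $l$ is used here for illustration). Up to an additive constant,

$$
l(p;x) = x \log p + (n-x)\log(1-p), \qquad \frac{d}{dp}\, l(p;x) = \frac{x}{p} - \frac{n-x}{1-p}.
$$

Setting the derivative equal to zero gives $x(1-p) = (n-x)p$, so the maximum is at $p = x/n$. The second derivative, $-x/p^2 - (n-x)/(1-p)^2$, is negative, which confirms that this is a maximum. For $x = 3$ and $n = 5$ this gives $3/5 = .6$.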

In Stat 504 you will not be asked to derive MLE’s by yourself. In most of the probability models that we will use later in the course (logistic regression, loglinear models, etc.) no explicit formulas for MLE’s are available, and we will have to rely on computer packages to calculate the MLE’s for us. For the simple probability models we have seen thus far, however, explicit formulas for MLE’s are available and are given next.
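
To give a rough idea of what such numerical maximization can look like, here is a minimal sketch that applies Newton-Raphson iteration to the binomial loglikelihood $x \log p + (n-x)\log(1-p)$. Newton-Raphson is an assumption made for this example rather than a description of any particular package, and the function names (`score`, `score_derivative`, `newton_mle`) are hypothetical.

```python
# Newton-Raphson sketch for maximizing the binomial loglikelihood
#   l(p) = x*log(p) + (n - x)*log(1 - p),   assuming 0 < x < n.

def score(p, x, n):
    """First derivative of the loglikelihood with respect to p."""
    return x / p - (n - x) / (1 - p)

def score_derivative(p, x, n):
    """Second derivative of the loglikelihood with respect to p."""
    return -x / p**2 - (n - x) / (1 - p)**2

def newton_mle(x, n, p_start=0.5, tol=1e-10, max_iter=100):
    """Iterate p <- p - l'(p) / l''(p) until successive values agree within tol."""
    p = p_start
    for _ in range(max_iter):
        p_new = p - score(p, x, n) / score_derivative(p, x, n)
        # Keep the iterate strictly inside (0, 1) so the logs stay defined.
        p_new = min(max(p_new, 1e-8), 1 - 1e-8)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

p_hat = newton_mle(x=3, n=5)
print(round(p_hat, 6))  # 0.6
```

For this simple model the iteration only reproduces the closed-form answer $x/n$; the same idea matters in models where no explicit formula exists.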

ML for Bernoulli trials

If our experiment is a single Bernoulli trial and we observe $X = 1$ (success) then the likelihood function is $L(p;x) = p$. This function reaches its maximum at $\hat{p} = 1$. If we observe $X = 0$ (failure) then the likelihood is $L(p;x) = 1 - p$, which reaches its maximum at $\hat{p} = 0$.


