510

# Introduction to Likelihood-based methods May 03, 2013

## Transcript

1. Likelihood Methods
Corey Chivers
Department of Biology, McGill University

2. Script Format

The script is divided into sections:
##@ x.x @##

...some commands...

### -- ###

3. ##@ 0.1 @##

We will need to install a package:
##@ 0.1 @##
rm(list=ls()) # Housekeeping
install.packages('emdbook') # Bolker's data
library(emdbook)
### -- ###

4. The Likelihood Principle
L |x∝ P X=x|

All information contained in data x, with
respect to inference about the value of θ, is
contained in the likelihood function:

5. The Likelihood Principle
L.J. Savage R.A. Fisher

6. The Likelihood Function
L |x∝ P X=x|
Where θ is(are) our parameter(s) of interest
ex:
Attack rate
Fitness
Mean body mass
Mortality
etc...
L |x=f | x

7. The Ecologist's Quarter

Lands tails (caribou up) 60% of the time

8. The Ecologist's Quarter

1) Before flipping, what is the probability that I will flip tails,
given that I am flipping an ecologist's quarter (p(tail=0.6))?

2) What is the likelihood that I am flipping an ecologist's
quarter, given the flip(s) that I have observed?
Px |=0.6
L=0.6| x

9. L |x=∏
t=1
T
 ∏
h=1
H
1−
The Ecologist's Quarter
L=0.6| x=H T TH T
=∏
t=1
3
0.6∏
h=1
2
0.4
=0.03456

10. L |x=∏
t=1
T
 ∏
h=1
H
1−
The Ecologist's Quarter
L=0.6| x=H T TH T
=∏
t=1
3
0.6∏
h=1
2
0.4
=0.03456
But what does this
mean?
0.03456 ≠ P(θ) !!!!

11. ##@ 1.1-2 @##

@1.1 defines the likelihood function

@1.2 flips an ecologist's quarter n times, and plots
the likelihood function.

Vary the number of flips (n), re-run, and see what
happens.

Vary the true θ parameter and see what happens.

they can guess θ. How about n?

12. Likelihood So Far...

The likelihood function is just the product of the
individual likelihoods of each observation.

The likelihood function carries all information contained in
the data relevant to inference about parameters.

Likelihood is a REALATIVE measure, and gets smaller as
more data is observed.

The 'best fitting' parameter value is the one that
maximizes this function.

13. Maximum Likelihood Estimation
(MLE)

MLE is a process of fitting model parameters to
data. → “What are the parameter values which
are most likely to have generated my data?”

For many models, the maximum likelihood
value can not be found analytically.

In these cases, we use numerical optimization.
>?optim

14. Computers hate really small
numbers

As we have seen, as n gets
larger, the likelihood can
become very small.

Instead of maximizing the likelihood function
itself, we maximize the log likelihood.
log ∏ x=∑ log x

15. ##@ 1.3 @##
>m_eco_q<-optim(par=0.5,
fn=log_likelihood_ecol_quarter,
lower=0,upper=1, # range of params to search
method='Brent', # Use this method for 1D models
control=list("fnscale"=-1),
hessian=TRUE,
q_flips=flips)
Where the eff is
this peak?

16. ##@ 1.3 @##
\$par
 0.5
\$value
 -6.931472
\$counts
NA NA
\$convergence
 0
\$message
NULL
\$hessian
[,1]
[1,] -40.00032
Best fit parameter value
(aka the MLE)
Log likelihood at the MLE
If (\$convergence == 0)
user = :-)

17. ##@ 1.3 @##
>confint(m_eco_q)
2.5 % 97.5 %
par 1 0.1901037 0.8098963
WARNING: Assumes that the
L(θ|x) is normally distributed.
Usually safe with large n.

18. Formulating Likelihood Functions

Okay, that was easy, but how can I determine the
likelihood function for the way I think the world
works (aka my model)?

A Model is just a mathematical representation of a
process, often including both
Deterministic and,
Stochastic components.

The likelihood function comes straight from the
specification of the model.

##@ 2.1 @##
> data(ReedfrogFuncresp) # Load Frog predation data
> ReedfrogFuncresp
Initial Killed
1 5 1
2 5 2
3 10 5
4 10 6
5 15 10
6 15 9
7 20 7
8 20 10
...

Suppose tadpole predators have a Holling type-
II functional response:

Realized number eaten is binomial with
probability p:
p=
a
1ahN
k ~Binom p, N
Deterministic
Stochastic

##@ 2.2 @##

Function Name Arguments
Parameters Data
binomLL2 <- function(pars,data)
{
a = pars
h = pars
predprob = a/(1+a*h*data\$N)
sum(dbinom(data\$k,prob=predprob,
size=data\$N,
log=TRUE))
}

Parameters
binomLL2 <- function(pars,data)
{
a = pars
h = pars
predprob = a/(1+a*h*data\$N)
sum(dbinom(data\$k,prob=predprob,
size=data\$N,
log=TRUE))
}

Deterministic component
binomLL2 <- function(pars,data)
{
a = pars
h = pars
predprob = a/(1+a*h*data\$N)
sum(dbinom(data\$k,prob=predprob,
size=data\$N,
log=TRUE))
}

25. binomLL2 <- function(pars,data)
{
a = pars
h = pars
predprob = a/(1+a*h*data\$N)
sum(dbinom(data\$k,prob=predprob,
size=data\$N,
log=TRUE))
}
Stochastic Component

26. ##@ 2.4-6 @##

Compute the maximum likelihood values of the
model parameters. (@2.4)

Compute the approximate confidence interval
and plot the likelihood surface (@2.5)

Plot the fitted model prediction and (stochastic)
confidence intervals over the data. (@2.6)

27. ##@ 2.4 @##
>m2 #Output of MLE routine
\$par
 0.52593924 0.01660613
\$value
 -46.72136
\$counts
53 NA
\$convergence
 0
\$message
NULL
\$hessian
[,1] [,2]
[1,] -616.2803 7408.079
[2,] 7408.0792 -131327.243
Best fit parameter values
(aka the MLE)
Log likelihood at the MLE

28. ##@ 2.5 @##
> confint(m2,parnames = c("a","h"))
2.5 % 97.5 %
a 0.386789631 0.66508885
h 0.007073913 0.02613834
> domain<-confint(m2)
> surface_plot(binomLL2,domain=domain,data=frogs)

29. ##@ 2.6 @##

30. But what if this isn't the right model?

How would we know?

31. But what if this isn't the right model?

How would we know?

Since the model space is infinite, all we can do
is propose a suite of models, and allow them to
compete in the arena (octagon?) of data.
Model 1
Model 2

32. Extension: Model Selection

Until now, we have been concerned with finding
the best fitting parameter values, but what if we
have competing models?

These competing models may not have the
same number of parameters.
LM |, x=?

33. Extension: Model Selection

Akaike Information Criteria:

k=number of parameters
L=Maximum likelihood value

Better models have lower AIC values

Just like the likelihood itself, AIC is a RELATIVE
measure.
AIC=2k−2logL
Hirotugo Akaike

34. Akaike Weights

Step 1: Compute the MLE for each candidate
model

Step 2: Compute each model's difference, Δ
i
,
from the 'best' model (lowest AIC value)

Step 3: Compute w
i
w
i
=
exp−1/2
i

m=1
M
exp−1/2
m

35. ##@ 3.1-4 @##

Define the likelihood functions for two alternative models.
(@3.1)

Try to write down the mathematical formulation of each
model.

Compute the AIC for each of the three competing models.
(@3.2)

Compute the model weights for each of the three
competing models. (@3.3)

Which is the best fitting model?

How will these values vary if we were to add more
competing models to our set?

Plot the predictions of each model (@3.4)

36. w(m1) = 0.010
w(m2) = 0.988
w(m3) = 0.001
Akaike Weights:

37. ##@ 4.0 @##
Sandbox

38. Summary

The likelihood function is simply the product of the
individual likelihoods of each observation.

Likelihood is a REALATIVE measure, and gets
smaller as more data is observed.

The 'best fitting' parameter value is the one that
maximizes the likelihood function (MLE).

Numerical optimization can be used to find the
MLE.

Competing models can be compared using AIC,
which is based on each model's MLE.