Introduction to Likelihood-based methods

Likelihood Methods Corey Chivers Department of Biology, McGill University

Script Format • The script is divided into sections: ##@
x.x @## … ...some commands... … ### -- ###

##@ 0.1 @## • We will need to install a
package: ##@ 0.1 @## rm(list=ls()) # Housekeeping install.packages('emdbook') # Bolker's data library(emdbook) source('http://madere.biol.mcgill.ca/cchivers/ccMLE.R') ### -- ###

The Likelihood Principle L |x∝ P X=x| • All information
contained in data x, with respect to inference about the value of θ, is contained in the likelihood function:

The Likelihood Principle L.J. Savage R.A. Fisher

The Likelihood Function L |x∝ P X=x| Where θ is(are)
our parameter(s) of interest ex: Attack rate Fitness Mean body mass Mortality etc... L |x=f | x

The Ecologist's Quarter • Lands tails (caribou up) 60% of
the time

The Ecologist's Quarter • 1) Before flipping, what is the
probability that I will flip tails, given that I am flipping an ecologist's quarter (p(tail=0.6))? • 2) What is the likelihood that I am flipping an ecologist's quarter, given the flip(s) that I have observed? Px |=0.6 L=0.6| x

L |x=∏ t=1 T  ∏ h=1 H 1− The
Ecologist's Quarter L=0.6| x=H T TH T =∏ t=1 3 0.6∏ h=1 2 0.4 =0.03456

L |x=∏ t=1 T  ∏ h=1 H 1− The
Ecologist's Quarter L=0.6| x=H T TH T =∏ t=1 3 0.6∏ h=1 2 0.4 =0.03456 But what does this mean? 0.03456 ≠ P(θ) !!!!

##@ 1.1-2 @## • @1.1 defines the likelihood function •
@1.2 flips an ecologist's quarter n times, and plots the likelihood function. • Vary the number of flips (n), re-run, and see what happens. • Vary the true θ parameter and see what happens. • Show your neighbour your likelihood plot and see if they can guess θ. How about n?

Likelihood So Far... • The likelihood function is just the
product of the individual likelihoods of each observation. • The likelihood function carries all information contained in the data relevant to inference about parameters. • Likelihood is a REALATIVE measure, and gets smaller as more data is observed. • The 'best fitting' parameter value is the one that maximizes this function.

Maximum Likelihood Estimation (MLE) • MLE is a process of
fitting model parameters to data. → “What are the parameter values which are most likely to have generated my data?” • For many models, the maximum likelihood value can not be found analytically. • In these cases, we use numerical optimization. >?optim

Computers hate really small numbers • As we have seen,
as n gets larger, the likelihood can become very small. • Instead of maximizing the likelihood function itself, we maximize the log likelihood. log ∏ x=∑ log x

##@ 1.3 @## >m_eco_q<-optim(par=0.5, fn=log_likelihood_ecol_quarter, lower=0,upper=1, # range of params
to search method='Brent', # Use this method for 1D models control=list("fnscale"=-1), hessian=TRUE, q_flips=flips) Where the eff is this peak?

##@ 1.3 @## $par [1] 0.5 $value [1] -6.931472 $counts
function gradient NA NA $convergence [1] 0 $message NULL $hessian [,1] [1,] -40.00032 Best fit parameter value (aka the MLE) Log likelihood at the MLE If ($convergence == 0) user = :-)

##@ 1.3 @## >confint(m_eco_q) 2.5 % 97.5 % par 1
0.1901037 0.8098963 WARNING: Assumes that the L(θ|x) is normally distributed. Usually safe with large n.

Formulating Likelihood Functions • Okay, that was easy, but how
can I determine the likelihood function for the way I think the world works (aka my model)? • A Model is just a mathematical representation of a process, often including both Deterministic and, Stochastic components. • The likelihood function comes straight from the specification of the model.

Example: Tadpole Predation ##@ 2.1 @## > data(ReedfrogFuncresp) # Load
Frog predation data > ReedfrogFuncresp Initial Killed 1 5 1 2 5 2 3 10 5 4 10 6 5 15 10 6 15 9 7 20 7 8 20 10 ...

Example: Tadpole Predation • Suppose tadpole predators have a Holling
type- II functional response: • Realized number eaten is binomial with probability p: p= a 1ahN k ~Binom p, N Deterministic Stochastic

Example: Tadpole Predation ##@ 2.2 @##

Example: Tadpole Predation #@2.3 Function Name Arguments Parameters Data binomLL2
<- function(pars,data) { a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

Example: Tadpole Predation #@2.3 Parameters binomLL2 <- function(pars,data) { a
= pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

Example: Tadpole Predation #@2.3 Deterministic component binomLL2 <- function(pars,data) {
a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

binomLL2 <- function(pars,data) { a = pars[1] h = pars[2]
predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) } Example: Tadpole Predation #@2.3 Stochastic Component

##@ 2.4-6 @## • Compute the maximum likelihood values of
the model parameters. (@2.4) • Compute the approximate confidence interval and plot the likelihood surface (@2.5) • Plot the fitted model prediction and (stochastic) confidence intervals over the data. (@2.6)

##@ 2.4 @## >m2 #Output of MLE routine $par [1]
0.52593924 0.01660613 $value [1] -46.72136 $counts function gradient 53 NA $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] -616.2803 7408.079 [2,] 7408.0792 -131327.243 Best fit parameter values (aka the MLE) Log likelihood at the MLE

##@ 2.5 @## > confint(m2,parnames = c("a","h")) 2.5 % 97.5
% a 0.386789631 0.66508885 h 0.007073913 0.02613834 > domain<-confint(m2) > surface_plot(binomLL2,domain=domain,data=frogs)

##@ 2.6 @##

But what if this isn't the right model? • How
would we know?

But what if this isn't the right model? • How
would we know? • Since the model space is infinite, all we can do is propose a suite of models, and allow them to compete in the arena (octagon?) of data. Model 1 Model 2

Extension: Model Selection • Until now, we have been concerned
with finding the best fitting parameter values, but what if we have competing models? • These competing models may not have the same number of parameters. LM |, x=?

Extension: Model Selection • Akaike Information Criteria: • k=number of
parameters L=Maximum likelihood value • Better models have lower AIC values • Just like the likelihood itself, AIC is a RELATIVE measure. AIC=2k−2logL Hirotugo Akaike

Akaike Weights • Step 1: Compute the MLE for each
candidate model • Step 2: Compute each model's difference, Δ i , from the 'best' model (lowest AIC value) • Step 3: Compute w i w i = exp−1/2 i  ∑ m=1 M exp−1/2 m 

##@ 3.1-4 @## • Define the likelihood functions for two
alternative models. (@3.1) • Try to write down the mathematical formulation of each model. • Compute the AIC for each of the three competing models. (@3.2) • Compute the model weights for each of the three competing models. (@3.3) • Which is the best fitting model? • How will these values vary if we were to add more competing models to our set? • Plot the predictions of each model (@3.4)

w(m1) = 0.010 w(m2) = 0.988 w(m3) = 0.001 Akaike
Weights:

##@ 4.0 @## Sandbox

Summary • The likelihood function is simply the product of
the individual likelihoods of each observation. • Likelihood is a REALATIVE measure, and gets smaller as more data is observed. • The 'best fitting' parameter value is the one that maximizes the likelihood function (MLE). • Numerical optimization can be used to find the MLE. • Competing models can be compared using AIC, which is based on each model's MLE.

Introduction to Likelihood-based methods

Introduction to Likelihood-based methods

More Decks by Corey Chivers

Featured

Transcript