Introduction to Likelihood-based methods

Slide 1

Slide 1 text

Likelihood Methods Corey Chivers Department of Biology, McGill University

Slide 2

Slide 2 text

Script Format ● The script is divided into sections: ##@ x.x @## … ...some commands... … ### -- ###

Slide 3

Slide 3 text

##@ 0.1 @## ● We will need to install a package: ##@ 0.1 @## rm(list=ls()) # Housekeeping install.packages('emdbook') # Bolker's data library(emdbook) source('http://madere.biol.mcgill.ca/cchivers/ccMLE.R') ### -- ###

Slide 4

Slide 4 text

The Likelihood Principle L |x∝ P X=x| ● All information contained in data x, with respect to inference about the value of θ, is contained in the likelihood function:

Slide 5

Slide 5 text

The Likelihood Principle L.J. Savage R.A. Fisher

Slide 6

Slide 6 text

The Likelihood Function L |x∝ P X=x| Where θ is(are) our parameter(s) of interest ex: Attack rate Fitness Mean body mass Mortality etc... L |x=f | x

Slide 7

Slide 7 text

The Ecologist's Quarter ● Lands tails (caribou up) 60% of the time

Slide 8

Slide 8 text

The Ecologist's Quarter ● 1) Before flipping, what is the probability that I will flip tails, given that I am flipping an ecologist's quarter (p(tail=0.6))? ● 2) What is the likelihood that I am flipping an ecologist's quarter, given the flip(s) that I have observed? Px |=0.6 L=0.6| x

Slide 9

Slide 9 text

L |x=∏ t=1 T  ∏ h=1 H 1− The Ecologist's Quarter L=0.6| x=H T TH T =∏ t=1 3 0.6∏ h=1 2 0.4 =0.03456

Slide 10

Slide 10 text

L |x=∏ t=1 T  ∏ h=1 H 1− The Ecologist's Quarter L=0.6| x=H T TH T =∏ t=1 3 0.6∏ h=1 2 0.4 =0.03456 But what does this mean? 0.03456 ≠ P(θ) !!!!

Slide 11

Slide 11 text

##@ 1.1-2 @## ● @1.1 defines the likelihood function ● @1.2 flips an ecologist's quarter n times, and plots the likelihood function. ● Vary the number of flips (n), re-run, and see what happens. ● Vary the true θ parameter and see what happens. ● Show your neighbour your likelihood plot and see if they can guess θ. How about n?

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Likelihood So Far... ● The likelihood function is just the product of the individual likelihoods of each observation. ● The likelihood function carries all information contained in the data relevant to inference about parameters. ● Likelihood is a REALATIVE measure, and gets smaller as more data is observed. ● The 'best fitting' parameter value is the one that maximizes this function.

Slide 14

Slide 14 text

Maximum Likelihood Estimation (MLE) ● MLE is a process of fitting model parameters to data. → “What are the parameter values which are most likely to have generated my data?” ● For many models, the maximum likelihood value can not be found analytically. ● In these cases, we use numerical optimization. >?optim

Slide 15

Slide 15 text

Computers hate really small numbers ● As we have seen, as n gets larger, the likelihood can become very small. ● Instead of maximizing the likelihood function itself, we maximize the log likelihood. log ∏ x=∑ log x

Slide 16

Slide 16 text

##@ 1.3 @## >m_eco_q<-optim(par=0.5, fn=log_likelihood_ecol_quarter, lower=0,upper=1, # range of params to search method='Brent', # Use this method for 1D models control=list("fnscale"=-1), hessian=TRUE, q_flips=flips) Where the eff is this peak?

Slide 17

Slide 17 text

##@ 1.3 @## $par [1] 0.5 $value [1] -6.931472 $counts function gradient NA NA $convergence [1] 0 $message NULL $hessian [,1] [1,] -40.00032 Best fit parameter value (aka the MLE) Log likelihood at the MLE If ($convergence == 0) user = :-)

Slide 18

Slide 18 text

##@ 1.3 @## >confint(m_eco_q) 2.5 % 97.5 % par 1 0.1901037 0.8098963 WARNING: Assumes that the L(θ|x) is normally distributed. Usually safe with large n.

Slide 19

Slide 19 text

Formulating Likelihood Functions ● Okay, that was easy, but how can I determine the likelihood function for the way I think the world works (aka my model)? ● A Model is just a mathematical representation of a process, often including both Deterministic and, Stochastic components. ● The likelihood function comes straight from the specification of the model.

Slide 20

Slide 20 text

Example: Tadpole Predation ##@ 2.1 @## > data(ReedfrogFuncresp) # Load Frog predation data > ReedfrogFuncresp Initial Killed 1 5 1 2 5 2 3 10 5 4 10 6 5 15 10 6 15 9 7 20 7 8 20 10 ...

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Example: Tadpole Predation ● Suppose tadpole predators have a Holling type- II functional response: ● Realized number eaten is binomial with probability p: p= a 1ahN k ~Binom p, N Deterministic Stochastic

Slide 23

Slide 23 text

Example: Tadpole Predation ##@ 2.2 @##

Slide 24

Slide 24 text

Example: Tadpole Predation #@2.3 Function Name Arguments Parameters Data binomLL2 <- function(pars,data) { a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

Slide 25

Slide 25 text

Example: Tadpole Predation #@2.3 Parameters binomLL2 <- function(pars,data) { a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

Slide 26

Slide 26 text

Example: Tadpole Predation #@2.3 Deterministic component binomLL2 <- function(pars,data) { a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) }

Slide 27

Slide 27 text

binomLL2 <- function(pars,data) { a = pars[1] h = pars[2] predprob = a/(1+a*h*data$N) sum(dbinom(data$k,prob=predprob, size=data$N, log=TRUE)) } Example: Tadpole Predation #@2.3 Stochastic Component

Slide 28

Slide 28 text

##@ 2.4-6 @## ● Compute the maximum likelihood values of the model parameters. (@2.4) ● Compute the approximate confidence interval and plot the likelihood surface (@2.5) ● Plot the fitted model prediction and (stochastic) confidence intervals over the data. (@2.6)

Slide 29

Slide 29 text

##@ 2.4 @## >m2 #Output of MLE routine $par [1] 0.52593924 0.01660613 $value [1] -46.72136 $counts function gradient 53 NA $convergence [1] 0 $message NULL $hessian [,1] [,2] [1,] -616.2803 7408.079 [2,] 7408.0792 -131327.243 Best fit parameter values (aka the MLE) Log likelihood at the MLE

Slide 30

Slide 30 text

##@ 2.5 @## > confint(m2,parnames = c("a","h")) 2.5 % 97.5 % a 0.386789631 0.66508885 h 0.007073913 0.02613834 > domain<-confint(m2) > surface_plot(binomLL2,domain=domain,data=frogs)

Slide 31

Slide 31 text

##@ 2.6 @##

Slide 32

Slide 32 text

But what if this isn't the right model? ● How would we know?

Slide 33

Slide 33 text

But what if this isn't the right model? ● How would we know? ● Since the model space is infinite, all we can do is propose a suite of models, and allow them to compete in the arena (octagon?) of data. Model 1 Model 2

Slide 34

Slide 34 text

Extension: Model Selection ● Until now, we have been concerned with finding the best fitting parameter values, but what if we have competing models? ● These competing models may not have the same number of parameters. LM |, x=?

Slide 35

Slide 35 text

Extension: Model Selection ● Akaike Information Criteria: ● k=number of parameters L=Maximum likelihood value ● Better models have lower AIC values ● Just like the likelihood itself, AIC is a RELATIVE measure. AIC=2k−2logL Hirotugo Akaike

Slide 36

Slide 36 text

Akaike Weights ● Step 1: Compute the MLE for each candidate model ● Step 2: Compute each model's difference, Δ i , from the 'best' model (lowest AIC value) ● Step 3: Compute w i w i = exp−1/2 i  ∑ m=1 M exp−1/2 m 

Slide 37

Slide 37 text

##@ 3.1-4 @## ● Define the likelihood functions for two alternative models. (@3.1) ● Try to write down the mathematical formulation of each model. ● Compute the AIC for each of the three competing models. (@3.2) ● Compute the model weights for each of the three competing models. (@3.3) ● Which is the best fitting model? ● How will these values vary if we were to add more competing models to our set? ● Plot the predictions of each model (@3.4)

Slide 38

Slide 38 text

w(m1) = 0.010 w(m2) = 0.988 w(m3) = 0.001 Akaike Weights:

Slide 39

Slide 39 text

##@ 4.0 @## Sandbox

Slide 40

Slide 40 text

Summary ● The likelihood function is simply the product of the individual likelihoods of each observation. ● Likelihood is a REALATIVE measure, and gets smaller as more data is observed. ● The 'best fitting' parameter value is the one that maximizes the likelihood function (MLE). ● Numerical optimization can be used to find the MLE. ● Competing models can be compared using AIC, which is based on each model's MLE.