##@ 0.1 @## ● We will need to install a package: ##@ 0.1 @## rm(list=ls()) # Housekeeping install.packages('emdbook') # Bolker's data library(emdbook) source('http://madere.biol.mcgill.ca/cchivers/ccMLE.R') ### -- ###
The Likelihood Principle L |x∝ P X=x| ● All information contained in data x, with respect to inference about the value of θ, is contained in the likelihood function:
The Likelihood Function L |x∝ P X=x| Where θ is(are) our parameter(s) of interest ex: Attack rate Fitness Mean body mass Mortality etc... L |x=f | x
The Ecologist's Quarter ● 1) Before flipping, what is the probability that I will flip tails, given that I am flipping an ecologist's quarter (p(tail=0.6))? ● 2) What is the likelihood that I am flipping an ecologist's quarter, given the flip(s) that I have observed? Px |=0.6 L=0.6| x
L |x=∏ t=1 T ∏ h=1 H 1− The Ecologist's Quarter L=0.6| x=H T TH T =∏ t=1 3 0.6∏ h=1 2 0.4 =0.03456 But what does this mean? 0.03456 ≠ P(θ) !!!!
##@ 1.1-2 @## ● @1.1 defines the likelihood function ● @1.2 flips an ecologist's quarter n times, and plots the likelihood function. ● Vary the number of flips (n), re-run, and see what happens. ● Vary the true θ parameter and see what happens. ● Show your neighbour your likelihood plot and see if they can guess θ. How about n?
Likelihood So Far... ● The likelihood function is just the product of the individual likelihoods of each observation. ● The likelihood function carries all information contained in the data relevant to inference about parameters. ● Likelihood is a REALATIVE measure, and gets smaller as more data is observed. ● The 'best fitting' parameter value is the one that maximizes this function.
Maximum Likelihood Estimation (MLE) ● MLE is a process of fitting model parameters to data. → “What are the parameter values which are most likely to have generated my data?” ● For many models, the maximum likelihood value can not be found analytically. ● In these cases, we use numerical optimization. >?optim
Computers hate really small numbers ● As we have seen, as n gets larger, the likelihood can become very small. ● Instead of maximizing the likelihood function itself, we maximize the log likelihood. log ∏ x=∑ log x
##@ 1.3 @## >m_eco_q<-optim(par=0.5, fn=log_likelihood_ecol_quarter, lower=0,upper=1, # range of params to search method='Brent', # Use this method for 1D models control=list("fnscale"=-1), hessian=TRUE, q_flips=flips) Where the eff is this peak?
##@ 1.3 @## $par [1] 0.5 $value [1] -6.931472 $counts function gradient NA NA $convergence [1] 0 $message NULL $hessian [,1] [1,] -40.00032 Best fit parameter value (aka the MLE) Log likelihood at the MLE If ($convergence == 0) user = :-)
##@ 1.3 @## >confint(m_eco_q) 2.5 % 97.5 % par 1 0.1901037 0.8098963 WARNING: Assumes that the L(θ|x) is normally distributed. Usually safe with large n.
Formulating Likelihood Functions ● Okay, that was easy, but how can I determine the likelihood function for the way I think the world works (aka my model)? ● A Model is just a mathematical representation of a process, often including both Deterministic and, Stochastic components. ● The likelihood function comes straight from the specification of the model.
Example: Tadpole Predation ● Suppose tadpole predators have a Holling type- II functional response: ● Realized number eaten is binomial with probability p: p= a 1ahN k ~Binom p, N Deterministic Stochastic
##@ 2.4-6 @## ● Compute the maximum likelihood values of the model parameters. (@2.4) ● Compute the approximate confidence interval and plot the likelihood surface (@2.5) ● Plot the fitted model prediction and (stochastic) confidence intervals over the data. (@2.6)
But what if this isn't the right model? ● How would we know? ● Since the model space is infinite, all we can do is propose a suite of models, and allow them to compete in the arena (octagon?) of data. Model 1 Model 2
Extension: Model Selection ● Until now, we have been concerned with finding the best fitting parameter values, but what if we have competing models? ● These competing models may not have the same number of parameters. LM |, x=?
Extension: Model Selection ● Akaike Information Criteria: ● k=number of parameters L=Maximum likelihood value ● Better models have lower AIC values ● Just like the likelihood itself, AIC is a RELATIVE measure. AIC=2k−2logL Hirotugo Akaike
Akaike Weights ● Step 1: Compute the MLE for each candidate model ● Step 2: Compute each model's difference, Δ i , from the 'best' model (lowest AIC value) ● Step 3: Compute w i w i = exp−1/2 i ∑ m=1 M exp−1/2 m
##@ 3.1-4 @## ● Define the likelihood functions for two alternative models. (@3.1) ● Try to write down the mathematical formulation of each model. ● Compute the AIC for each of the three competing models. (@3.2) ● Compute the model weights for each of the three competing models. (@3.3) ● Which is the best fitting model? ● How will these values vary if we were to add more competing models to our set? ● Plot the predictions of each model (@3.4)
Summary ● The likelihood function is simply the product of the individual likelihoods of each observation. ● Likelihood is a REALATIVE measure, and gets smaller as more data is observed. ● The 'best fitting' parameter value is the one that maximizes the likelihood function (MLE). ● Numerical optimization can be used to find the MLE. ● Competing models can be compared using AIC, which is based on each model's MLE.