Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Introduction to Likelihood-based methods

Corey Chivers
May 03, 2013
510

Introduction to Likelihood-based methods

Corey Chivers

May 03, 2013
Tweet

Transcript

  1. Likelihood Methods
    Corey Chivers
    Department of Biology, McGill University

    View Slide

  2. Script Format

    The script is divided into sections:
    ##@ x.x @##

    ...some commands...

    ### -- ###

    View Slide

  3. ##@ 0.1 @##

    We will need to install a package:
    ##@ 0.1 @##
    rm(list=ls()) # Housekeeping
    install.packages('emdbook') # Bolker's data
    library(emdbook)
    source('http://madere.biol.mcgill.ca/cchivers/ccMLE.R')
    ### -- ###

    View Slide

  4. The Likelihood Principle
    L |x∝ P X=x|

    All information contained in data x, with
    respect to inference about the value of θ, is
    contained in the likelihood function:

    View Slide

  5. The Likelihood Principle
    L.J. Savage R.A. Fisher

    View Slide

  6. The Likelihood Function
    L |x∝ P X=x|
    Where θ is(are) our parameter(s) of interest
    ex:
    Attack rate
    Fitness
    Mean body mass
    Mortality
    etc...
    L |x=f | x

    View Slide

  7. The Ecologist's Quarter

    Lands tails (caribou up) 60% of the time

    View Slide

  8. The Ecologist's Quarter

    1) Before flipping, what is the probability that I will flip tails,
    given that I am flipping an ecologist's quarter (p(tail=0.6))?

    2) What is the likelihood that I am flipping an ecologist's
    quarter, given the flip(s) that I have observed?
    Px |=0.6
    L=0.6| x

    View Slide

  9. L |x=∏
    t=1
    T
     ∏
    h=1
    H
    1−
    The Ecologist's Quarter
    L=0.6| x=H T TH T
    =∏
    t=1
    3
    0.6∏
    h=1
    2
    0.4
    =0.03456

    View Slide

  10. L |x=∏
    t=1
    T
     ∏
    h=1
    H
    1−
    The Ecologist's Quarter
    L=0.6| x=H T TH T
    =∏
    t=1
    3
    0.6∏
    h=1
    2
    0.4
    =0.03456
    But what does this
    mean?
    0.03456 ≠ P(θ) !!!!

    View Slide

  11. ##@ 1.1-2 @##

    @1.1 defines the likelihood function

    @1.2 flips an ecologist's quarter n times, and plots
    the likelihood function.

    Vary the number of flips (n), re-run, and see what
    happens.

    Vary the true θ parameter and see what happens.

    Show your neighbour your likelihood plot and see if
    they can guess θ. How about n?

    View Slide

  12. View Slide

  13. Likelihood So Far...

    The likelihood function is just the product of the
    individual likelihoods of each observation.

    The likelihood function carries all information contained in
    the data relevant to inference about parameters.

    Likelihood is a REALATIVE measure, and gets smaller as
    more data is observed.

    The 'best fitting' parameter value is the one that
    maximizes this function.

    View Slide

  14. Maximum Likelihood Estimation
    (MLE)

    MLE is a process of fitting model parameters to
    data. → “What are the parameter values which
    are most likely to have generated my data?”

    For many models, the maximum likelihood
    value can not be found analytically.

    In these cases, we use numerical optimization.
    >?optim

    View Slide

  15. Computers hate really small
    numbers

    As we have seen, as n gets
    larger, the likelihood can
    become very small.

    Instead of maximizing the likelihood function
    itself, we maximize the log likelihood.
    log ∏ x=∑ log x

    View Slide

  16. ##@ 1.3 @##
    >m_eco_q<-optim(par=0.5,
    fn=log_likelihood_ecol_quarter,
    lower=0,upper=1, # range of params to search
    method='Brent', # Use this method for 1D models
    control=list("fnscale"=-1),
    hessian=TRUE,
    q_flips=flips)
    Where the eff is
    this peak?

    View Slide

  17. ##@ 1.3 @##
    $par
    [1] 0.5
    $value
    [1] -6.931472
    $counts
    function gradient
    NA NA
    $convergence
    [1] 0
    $message
    NULL
    $hessian
    [,1]
    [1,] -40.00032
    Best fit parameter value
    (aka the MLE)
    Log likelihood at the MLE
    If ($convergence == 0)
    user = :-)

    View Slide

  18. ##@ 1.3 @##
    >confint(m_eco_q)
    2.5 % 97.5 %
    par 1 0.1901037 0.8098963
    WARNING: Assumes that the
    L(θ|x) is normally distributed.
    Usually safe with large n.

    View Slide

  19. Formulating Likelihood Functions

    Okay, that was easy, but how can I determine the
    likelihood function for the way I think the world
    works (aka my model)?

    A Model is just a mathematical representation of a
    process, often including both
    Deterministic and,
    Stochastic components.

    The likelihood function comes straight from the
    specification of the model.

    View Slide

  20. Example: Tadpole Predation
    ##@ 2.1 @##
    > data(ReedfrogFuncresp) # Load Frog predation data
    > ReedfrogFuncresp
    Initial Killed
    1 5 1
    2 5 2
    3 10 5
    4 10 6
    5 15 10
    6 15 9
    7 20 7
    8 20 10
    ...

    View Slide

  21. View Slide

  22. Example: Tadpole Predation

    Suppose tadpole predators have a Holling type-
    II functional response:

    Realized number eaten is binomial with
    probability p:
    p=
    a
    1ahN
    k ~Binom p, N
    Deterministic
    Stochastic

    View Slide

  23. Example: Tadpole Predation
    ##@ 2.2 @##

    View Slide

  24. Example: Tadpole Predation #@2.3
    Function Name Arguments
    Parameters Data
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View Slide

  25. Example: Tadpole Predation #@2.3
    Parameters
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View Slide

  26. Example: Tadpole Predation #@2.3
    Deterministic component
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View Slide

  27. binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }
    Example: Tadpole Predation #@2.3
    Stochastic Component

    View Slide

  28. ##@ 2.4-6 @##

    Compute the maximum likelihood values of the
    model parameters. (@2.4)

    Compute the approximate confidence interval
    and plot the likelihood surface (@2.5)

    Plot the fitted model prediction and (stochastic)
    confidence intervals over the data. (@2.6)

    View Slide

  29. ##@ 2.4 @##
    >m2 #Output of MLE routine
    $par
    [1] 0.52593924 0.01660613
    $value
    [1] -46.72136
    $counts
    function gradient
    53 NA
    $convergence
    [1] 0
    $message
    NULL
    $hessian
    [,1] [,2]
    [1,] -616.2803 7408.079
    [2,] 7408.0792 -131327.243
    Best fit parameter values
    (aka the MLE)
    Log likelihood at the MLE

    View Slide

  30. ##@ 2.5 @##
    > confint(m2,parnames = c("a","h"))
    2.5 % 97.5 %
    a 0.386789631 0.66508885
    h 0.007073913 0.02613834
    > domain<-confint(m2)
    > surface_plot(binomLL2,domain=domain,data=frogs)

    View Slide

  31. ##@ 2.6 @##

    View Slide

  32. But what if this isn't the right model?

    How would we know?

    View Slide

  33. But what if this isn't the right model?

    How would we know?

    Since the model space is infinite, all we can do
    is propose a suite of models, and allow them to
    compete in the arena (octagon?) of data.
    Model 1
    Model 2

    View Slide

  34. Extension: Model Selection

    Until now, we have been concerned with finding
    the best fitting parameter values, but what if we
    have competing models?

    These competing models may not have the
    same number of parameters.
    LM |, x=?

    View Slide

  35. Extension: Model Selection

    Akaike Information Criteria:

    k=number of parameters
    L=Maximum likelihood value

    Better models have lower AIC values

    Just like the likelihood itself, AIC is a RELATIVE
    measure.
    AIC=2k−2logL
    Hirotugo Akaike

    View Slide

  36. Akaike Weights

    Step 1: Compute the MLE for each candidate
    model

    Step 2: Compute each model's difference, Δ
    i
    ,
    from the 'best' model (lowest AIC value)

    Step 3: Compute w
    i
    w
    i
    =
    exp−1/2
    i


    m=1
    M
    exp−1/2
    m

    View Slide

  37. ##@ 3.1-4 @##

    Define the likelihood functions for two alternative models.
    (@3.1)

    Try to write down the mathematical formulation of each
    model.

    Compute the AIC for each of the three competing models.
    (@3.2)

    Compute the model weights for each of the three
    competing models. (@3.3)

    Which is the best fitting model?

    How will these values vary if we were to add more
    competing models to our set?

    Plot the predictions of each model (@3.4)

    View Slide

  38. w(m1) = 0.010
    w(m2) = 0.988
    w(m3) = 0.001
    Akaike Weights:

    View Slide

  39. ##@ 4.0 @##
    Sandbox

    View Slide

  40. Summary

    The likelihood function is simply the product of the
    individual likelihoods of each observation.

    Likelihood is a REALATIVE measure, and gets
    smaller as more data is observed.

    The 'best fitting' parameter value is the one that
    maximizes the likelihood function (MLE).

    Numerical optimization can be used to find the
    MLE.

    Competing models can be compared using AIC,
    which is based on each model's MLE.

    View Slide