Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Introduction to Likelihood-based methods

Corey Chivers
May 03, 2013
550

Introduction to Likelihood-based methods

Corey Chivers

May 03, 2013
Tweet

Transcript

  1. Likelihood Methods
    Corey Chivers
    Department of Biology, McGill University

    View full-size slide

  2. Script Format

    The script is divided into sections:
    ##@ x.x @##

    ...some commands...

    ### -- ###

    View full-size slide

  3. ##@ 0.1 @##

    We will need to install a package:
    ##@ 0.1 @##
    rm(list=ls()) # Housekeeping
    install.packages('emdbook') # Bolker's data
    library(emdbook)
    source('http://madere.biol.mcgill.ca/cchivers/ccMLE.R')
    ### -- ###

    View full-size slide

  4. The Likelihood Principle
    L |x∝ P X=x|

    All information contained in data x, with
    respect to inference about the value of θ, is
    contained in the likelihood function:

    View full-size slide

  5. The Likelihood Principle
    L.J. Savage R.A. Fisher

    View full-size slide

  6. The Likelihood Function
    L |x∝ P X=x|
    Where θ is(are) our parameter(s) of interest
    ex:
    Attack rate
    Fitness
    Mean body mass
    Mortality
    etc...
    L |x=f | x

    View full-size slide

  7. The Ecologist's Quarter

    Lands tails (caribou up) 60% of the time

    View full-size slide

  8. The Ecologist's Quarter

    1) Before flipping, what is the probability that I will flip tails,
    given that I am flipping an ecologist's quarter (p(tail=0.6))?

    2) What is the likelihood that I am flipping an ecologist's
    quarter, given the flip(s) that I have observed?
    Px |=0.6
    L=0.6| x

    View full-size slide

  9. L |x=∏
    t=1
    T
     ∏
    h=1
    H
    1−
    The Ecologist's Quarter
    L=0.6| x=H T TH T
    =∏
    t=1
    3
    0.6∏
    h=1
    2
    0.4
    =0.03456

    View full-size slide

  10. L |x=∏
    t=1
    T
     ∏
    h=1
    H
    1−
    The Ecologist's Quarter
    L=0.6| x=H T TH T
    =∏
    t=1
    3
    0.6∏
    h=1
    2
    0.4
    =0.03456
    But what does this
    mean?
    0.03456 ≠ P(θ) !!!!

    View full-size slide

  11. ##@ 1.1-2 @##

    @1.1 defines the likelihood function

    @1.2 flips an ecologist's quarter n times, and plots
    the likelihood function.

    Vary the number of flips (n), re-run, and see what
    happens.

    Vary the true θ parameter and see what happens.

    Show your neighbour your likelihood plot and see if
    they can guess θ. How about n?

    View full-size slide

  12. Likelihood So Far...

    The likelihood function is just the product of the
    individual likelihoods of each observation.

    The likelihood function carries all information contained in
    the data relevant to inference about parameters.

    Likelihood is a REALATIVE measure, and gets smaller as
    more data is observed.

    The 'best fitting' parameter value is the one that
    maximizes this function.

    View full-size slide

  13. Maximum Likelihood Estimation
    (MLE)

    MLE is a process of fitting model parameters to
    data. → “What are the parameter values which
    are most likely to have generated my data?”

    For many models, the maximum likelihood
    value can not be found analytically.

    In these cases, we use numerical optimization.
    >?optim

    View full-size slide

  14. Computers hate really small
    numbers

    As we have seen, as n gets
    larger, the likelihood can
    become very small.

    Instead of maximizing the likelihood function
    itself, we maximize the log likelihood.
    log ∏ x=∑ log x

    View full-size slide

  15. ##@ 1.3 @##
    >m_eco_q<-optim(par=0.5,
    fn=log_likelihood_ecol_quarter,
    lower=0,upper=1, # range of params to search
    method='Brent', # Use this method for 1D models
    control=list("fnscale"=-1),
    hessian=TRUE,
    q_flips=flips)
    Where the eff is
    this peak?

    View full-size slide

  16. ##@ 1.3 @##
    $par
    [1] 0.5
    $value
    [1] -6.931472
    $counts
    function gradient
    NA NA
    $convergence
    [1] 0
    $message
    NULL
    $hessian
    [,1]
    [1,] -40.00032
    Best fit parameter value
    (aka the MLE)
    Log likelihood at the MLE
    If ($convergence == 0)
    user = :-)

    View full-size slide

  17. ##@ 1.3 @##
    >confint(m_eco_q)
    2.5 % 97.5 %
    par 1 0.1901037 0.8098963
    WARNING: Assumes that the
    L(θ|x) is normally distributed.
    Usually safe with large n.

    View full-size slide

  18. Formulating Likelihood Functions

    Okay, that was easy, but how can I determine the
    likelihood function for the way I think the world
    works (aka my model)?

    A Model is just a mathematical representation of a
    process, often including both
    Deterministic and,
    Stochastic components.

    The likelihood function comes straight from the
    specification of the model.

    View full-size slide

  19. Example: Tadpole Predation
    ##@ 2.1 @##
    > data(ReedfrogFuncresp) # Load Frog predation data
    > ReedfrogFuncresp
    Initial Killed
    1 5 1
    2 5 2
    3 10 5
    4 10 6
    5 15 10
    6 15 9
    7 20 7
    8 20 10
    ...

    View full-size slide

  20. Example: Tadpole Predation

    Suppose tadpole predators have a Holling type-
    II functional response:

    Realized number eaten is binomial with
    probability p:
    p=
    a
    1ahN
    k ~Binom p, N
    Deterministic
    Stochastic

    View full-size slide

  21. Example: Tadpole Predation
    ##@ 2.2 @##

    View full-size slide

  22. Example: Tadpole Predation #@2.3
    Function Name Arguments
    Parameters Data
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View full-size slide

  23. Example: Tadpole Predation #@2.3
    Parameters
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View full-size slide

  24. Example: Tadpole Predation #@2.3
    Deterministic component
    binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }

    View full-size slide

  25. binomLL2 <- function(pars,data)
    {
    a = pars[1]
    h = pars[2]
    predprob = a/(1+a*h*data$N)
    sum(dbinom(data$k,prob=predprob,
    size=data$N,
    log=TRUE))
    }
    Example: Tadpole Predation #@2.3
    Stochastic Component

    View full-size slide

  26. ##@ 2.4-6 @##

    Compute the maximum likelihood values of the
    model parameters. (@2.4)

    Compute the approximate confidence interval
    and plot the likelihood surface (@2.5)

    Plot the fitted model prediction and (stochastic)
    confidence intervals over the data. (@2.6)

    View full-size slide

  27. ##@ 2.4 @##
    >m2 #Output of MLE routine
    $par
    [1] 0.52593924 0.01660613
    $value
    [1] -46.72136
    $counts
    function gradient
    53 NA
    $convergence
    [1] 0
    $message
    NULL
    $hessian
    [,1] [,2]
    [1,] -616.2803 7408.079
    [2,] 7408.0792 -131327.243
    Best fit parameter values
    (aka the MLE)
    Log likelihood at the MLE

    View full-size slide

  28. ##@ 2.5 @##
    > confint(m2,parnames = c("a","h"))
    2.5 % 97.5 %
    a 0.386789631 0.66508885
    h 0.007073913 0.02613834
    > domain<-confint(m2)
    > surface_plot(binomLL2,domain=domain,data=frogs)

    View full-size slide

  29. But what if this isn't the right model?

    How would we know?

    View full-size slide

  30. But what if this isn't the right model?

    How would we know?

    Since the model space is infinite, all we can do
    is propose a suite of models, and allow them to
    compete in the arena (octagon?) of data.
    Model 1
    Model 2

    View full-size slide

  31. Extension: Model Selection

    Until now, we have been concerned with finding
    the best fitting parameter values, but what if we
    have competing models?

    These competing models may not have the
    same number of parameters.
    LM |, x=?

    View full-size slide

  32. Extension: Model Selection

    Akaike Information Criteria:

    k=number of parameters
    L=Maximum likelihood value

    Better models have lower AIC values

    Just like the likelihood itself, AIC is a RELATIVE
    measure.
    AIC=2k−2logL
    Hirotugo Akaike

    View full-size slide

  33. Akaike Weights

    Step 1: Compute the MLE for each candidate
    model

    Step 2: Compute each model's difference, Δ
    i
    ,
    from the 'best' model (lowest AIC value)

    Step 3: Compute w
    i
    w
    i
    =
    exp−1/2
    i


    m=1
    M
    exp−1/2
    m

    View full-size slide

  34. ##@ 3.1-4 @##

    Define the likelihood functions for two alternative models.
    (@3.1)

    Try to write down the mathematical formulation of each
    model.

    Compute the AIC for each of the three competing models.
    (@3.2)

    Compute the model weights for each of the three
    competing models. (@3.3)

    Which is the best fitting model?

    How will these values vary if we were to add more
    competing models to our set?

    Plot the predictions of each model (@3.4)

    View full-size slide

  35. w(m1) = 0.010
    w(m2) = 0.988
    w(m3) = 0.001
    Akaike Weights:

    View full-size slide

  36. ##@ 4.0 @##
    Sandbox

    View full-size slide

  37. Summary

    The likelihood function is simply the product of the
    individual likelihoods of each observation.

    Likelihood is a REALATIVE measure, and gets
    smaller as more data is observed.

    The 'best fitting' parameter value is the one that
    maximizes the likelihood function (MLE).

    Numerical optimization can be used to find the
    MLE.

    Competing models can be compared using AIC,
    which is based on each model's MLE.

    View full-size slide