Incorporating biological information into genomic prediction models

Andrea Rau
February 07, 2023

VistaMilk Artificial Intelligence in Agriculture Masterclass
February 8, 2023 (online)
https://www.vistamilk.ie/event/artificial-intelligence-in-agriculture-masterclass/

Transcript

  1. Incorporating biological information
    into genomic prediction models
    Fanny Mollandin, Pascal Croiseau, Andrea Rau
    VistaMilk
    Artificial Intelligence in Agriculture Masterclass @ Zoom
    February 8, 2023
    [email protected] Biological priors in genomic prediction models 1 / 21

  4. Introduction Context
    Genomic selection overview
    Objective: select the best animals for reproduction to obtain genetic
    improvement of the population on traits of interest
    Low- to high-density genotyping chips (10k-100k SNPs)
    → whole genome sequencing (10MM SNPs)
    Image: F. Mollandin

  7. Introduction Context
    Prediction models for genomic selection
    Goal: given a training set of data $(Y_i, X_i, Z_i)$ for $i = 1, \ldots, n$ individuals
    $Y_i$ = trait
    $X_i$ = vector of (usually genome-wide) genotypes
    $Z_i$ = vector of covariates (age, location, sex, ...)
    ... predict the unobserved trait $Y^\star$ of a future individual with
    corresponding $X^\star$ and $Z^\star$
    Introduced by Meuwissen et al. (2001)
    Successfully implemented in many plant/animal breeds for traits
    related to production, health, climate adaptation, ...
    Modest gains in predictions can have large economic impacts
    (reduced generation interval, reduced cost and labor for phenotyping)

  9. Introduction Context
    Challenges of genomic prediction models
    Non-random association between alleles at neighboring loci (linkage disequilibrium, LD)
    Polygenic nature of complex traits
    Many more SNPs (variables) than individuals (observations) ⇒ curse
    of dimensionality
    Including too many predictors in a model risks over-fitting, poor
    generalizability, and problems with model estimation
    ... but including only a small pre-identified subset of SNPs (e.g.,
    significant GWAS hits) usually leads to poor predictions
    → Balance computational/statistical feasibility and biologically realistic
    assumptions
    Can genomic prediction models be improved by better accounting for
    our knowledge about the function of certain regions of the genome?

  11. Introduction Functional annotations
    Context: H2020 GENE-SWitCH project
    The regulatory GENomE of Swine & Chicken: functional annotation during development
    High-quality richly annotated maps of pig and chicken genomes:
    Development: early/late organogenesis, new born/hatched, adult
    Sexes: {M,F} × 3 biological replicates
    Tissues: liver, skeletal muscle, small intestine, cerebellum, dorsal
    epidermis, lung, kidney
    Assays: RNA-seq, ATAC-seq, ChIP-seq, smRNA-seq, methylation, Hi-C
    But how?

  12. Introduction Models for genomic prediction
    First, back to basics: the linear model
    The workhorse of genomic prediction is the multiple linear regression
    model:
    Y = Zθ + Xβ + ε
    Y = n-vector of traits
    Z = n × m matrix of covariates
    θ = m-vector of covariate effect parameters
    X = n × p matrix of (suitably coded) genotypes
    β = p-vector of genetic effect parameters
    ε = n-vector of errors representing noise, assumed to be iid and
    (usually) normally distributed
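
To make the dimensions of the linear model concrete, here is a minimal sketch that simulates data from Y = Zθ + Xβ + ε and evaluates the linear predictor for one individual. The function names, the 0/1/2 genotype coding, and the effect-size scales are illustrative assumptions, not details from the talk.

```python
import random


def predict(z_row, x_row, theta, beta):
    """Linear predictor z'theta + x'beta for a single individual."""
    covariate_part = sum(z * t for z, t in zip(z_row, theta))
    genetic_part = sum(x * b for x, b in zip(x_row, beta))
    return covariate_part + genetic_part


def simulate_linear_model(n, p, m, seed=0):
    """Simulate Y = Z theta + X beta + eps (hypothetical helper)."""
    rng = random.Random(seed)
    # Genotypes coded as allele counts 0/1/2: one common coding, assumed here.
    X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    # Covariates: an intercept column plus m - 1 standard normal columns.
    Z = [[1.0] + [rng.gauss(0, 1) for _ in range(m - 1)] for _ in range(n)]
    theta = [rng.gauss(0, 1) for _ in range(m)]
    beta = [rng.gauss(0, 0.1) for _ in range(p)]  # small effects, assumed scale
    eps = [rng.gauss(0, 1) for _ in range(n)]     # iid normal errors
    Y = [predict(Z[i], X[i], theta, beta) + eps[i] for i in range(n)]
    return Y, X, Z, theta, beta
```

With p much larger than n, as in genomic data, β cannot be estimated by ordinary least squares, which is why the priors discussed on the following slides matter.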

  14. Introduction Models for genomic prediction
    Bayesian methods for genomic prediction
    Image: 10.1007/s10681-007-9516-1
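    likelihood × prior:
    $$\prod_{i=1}^{n} N\left(Y_i \,\middle|\, \mu + \sum_{j=1}^{p} X_{ij}\beta_j,\ \sigma^2\right) \times p(\sigma^2) \prod_{j=1}^{p} p(\beta_j \mid \Psi)$$
    $\sigma^2$ often assigned a $\chi^{-2}$ (inverse chi-squared) prior distribution
    Choice of prior for $\beta_j$ should ideally reflect a trait's genetic
    architecture (and be computationally feasible...)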
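
As a rough illustration of the likelihood × prior factorization, the sketch below evaluates the unnormalized log posterior for given parameter values. The function names and the idea of passing the priors as callables are assumptions made for this example; the talk does not prescribe an implementation.

```python
import math


def normal_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def log_posterior(Y, X, mu, beta, sigma2, log_prior_sigma2, log_prior_beta):
    """Unnormalized log posterior:
    sum_i log N(Y_i | mu + sum_j X_ij beta_j, sigma2)
    + log p(sigma2) + sum_j log p(beta_j).
    """
    loglik = 0.0
    for y_i, x_i in zip(Y, X):
        mean_i = mu + sum(x * b for x, b in zip(x_i, beta))
        loglik += normal_logpdf(y_i, mean_i, sigma2)
    log_prior = log_prior_sigma2(sigma2) + sum(log_prior_beta(b) for b in beta)
    return loglik + log_prior
```

Swapping in a different `log_prior_beta` is all it takes to move between the prior families compared on the next slide, which is why the choice of prior is the main modeling decision.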

  18. Introduction Models for genomic prediction
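    Which prior to use for $\beta_j$?
    Image: 10.1534/genetics.112.143313
    GBLUP: $\beta_j \sim N(0, \sigma^2_\beta)$
    BayesA: $\beta_j \sim N(0, \sigma^2_{\beta_j})$, $\sigma^2_{\beta_j} \sim \text{Inv-}\chi^2(\nu, S^2)$
    BayesB: $\beta_j \sim N(0, \sigma^2_{\beta_j})$, $\sigma^2_{\beta_j} \sim \pi\delta(0) + (1 - \pi)\,\text{Inv-}\chi^2(\nu, S^2)$, $\pi$ fixed
    BayesC: $\beta_j \sim \pi\delta(0) + (1 - \pi)\,N(0, \sigma^2_\beta)$, $\sigma^2_\beta \sim \text{Inv-}\chi^2(\nu, S^2)$, $\pi$ fixed
    BayesCπ: BayesC with $\pi \sim \text{Unif}(0, 1)$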
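
The priors listed above differ mainly in how much mass they put at exactly zero and in whether each SNP gets its own variance. A minimal sketch of drawing one effect from each prior follows; the function names and default hyperparameter values are illustrative assumptions, and for BayesC the common variance is passed in rather than drawn from its own prior.

```python
import random


def scaled_inv_chi2(rng, nu, s2):
    """Draw from a scaled inverse chi-squared(nu, s2): nu * s2 / chi2_nu."""
    chi2_draw = rng.gammavariate(nu / 2.0, 2.0)  # chi2_nu = Gamma(nu/2, scale 2)
    return nu * s2 / chi2_draw


def draw_effect(rng, model, pi=0.9, nu=4.0, s2=0.01, sigma2_beta=0.01):
    """Draw a single SNP effect from the named prior (default values assumed)."""
    if model == "GBLUP":
        # Common variance, no point mass at zero.
        return rng.gauss(0.0, sigma2_beta ** 0.5)
    if model == "BayesA":
        # SNP-specific variance drawn from the scaled inverse chi-squared.
        return rng.gauss(0.0, scaled_inv_chi2(rng, nu, s2) ** 0.5)
    if model == "BayesB":
        # With probability pi the variance is zero, so the effect is zero.
        if rng.random() < pi:
            return 0.0
        return rng.gauss(0.0, scaled_inv_chi2(rng, nu, s2) ** 0.5)
    if model == "BayesC":
        # With probability pi the effect is zero, else common-variance normal.
        if rng.random() < pi:
            return 0.0
        return rng.gauss(0.0, sigma2_beta ** 0.5)
    raise ValueError(f"unknown model: {model}")
```

Drawing a few thousand effects per model and comparing the share of exact zeros and the tail behavior gives a quick feel for the genetic architecture each prior implies.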

  19. Introduction Models for genomic prediction
    BayesR (Erbe et al., 2012)
    π ∼ Dirichlet(α), with α = (1, 1, 1, 1)
    Gibbs sampler for estimation
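
The transcript keeps only the Dirichlet prior from this slide; the mixture itself was in the figure. The sketch below assumes the four-component BayesR mixture as described by Erbe et al. (2012), with component variances of 0, 10^-4, 10^-3 and 10^-2 times the additive genetic variance. The function names and the `sigma2_g` argument are hypothetical, and this samples from the prior only; it is not the Gibbs sampler.

```python
import random

# Component variances as multiples of the genetic variance sigma2_g.
# Values follow Erbe et al. (2012); they are not shown in this transcript.
VARIANCE_SCALES = (0.0, 1e-4, 1e-3, 1e-2)


def draw_dirichlet(rng, alpha):
    """Draw mixing proportions pi ~ Dirichlet(alpha) via normalized gammas."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_bayesr_effect(rng, pi, sigma2_g):
    """Pick a mixture component with probabilities pi, then draw the effect."""
    k = rng.choices(range(len(pi)), weights=pi)[0]
    variance = VARIANCE_SCALES[k] * sigma2_g
    beta = rng.gauss(0.0, variance ** 0.5) if variance > 0 else 0.0
    return k, beta
```

For example, `draw_bayesr_effect(rng, draw_dirichlet(rng, (1, 1, 1, 1)), sigma2_g=1.0)` returns the component index together with the sampled effect.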

  20. Introduction Incorporating disjoint annotations
    Back to annotations: BayesRC (MacLeod et al., 2016)
    SNPs assigned to disjoint “annotations”, model is a factorized
    BayesR
    $\pi_c \sim \text{Dirichlet}(\alpha)$, with $\alpha = (1, 1, 1, 1)$
    Gibbs sampler for estimation
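
One way to read "factorized BayesR" is that each disjoint annotation category c gets its own mixing proportions, so an enriched category can put more mass on the larger-effect components. The sketch below samples from such a prior; the function name, the annotation labels, and the variance scales (taken from the usual BayesR specification, not from this transcript) are assumptions.

```python
import random

# Assumed BayesR component variances, as multiples of sigma2_g.
VARIANCE_SCALES = (0.0, 1e-4, 1e-3, 1e-2)


def draw_bayesrc_effects(snp_annotation, sigma2_g, alpha=(1, 1, 1, 1), seed=0):
    """Draw pi_c ~ Dirichlet(alpha) per annotation, then one effect per SNP.

    snp_annotation: one label per SNP; categories are disjoint, so each SNP
    carries exactly one label.
    """
    rng = random.Random(seed)
    pi_by_annotation = {}
    for c in sorted(set(snp_annotation)):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        pi_by_annotation[c] = [g / total for g in gammas]
    effects = []
    for c in snp_annotation:
        k = rng.choices(range(len(alpha)), weights=pi_by_annotation[c])[0]
        sd = (VARIANCE_SCALES[k] * sigma2_g) ** 0.5
        effects.append(rng.gauss(0.0, sd) if sd > 0 else 0.0)
    return pi_by_annotation, effects
```

With a single annotation category this reduces to the plain BayesR prior.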

  21. BayesRCO models Overview
    From BayesR to BayesRC ... and beyond

  26. BayesRCO models Model definition
    BayesRCO: BayesRC for Overlapping annotations
    Two hypotheses = two models!
    1. Multi-annotations represent added confidence → BayesRC+
    2. Multi-annotations represent uncertainty → BayesRCπ
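
The transcript does not include the model equations for these two variants, so the following is only one possible reading of the two hypotheses, sketched at the level of the prior: under "added confidence" a multi-annotated SNP accumulates one mixture draw per annotation it carries, while under "uncertainty" a single annotation is picked for the SNP before drawing its effect. All function names, arguments, and the variance scales are assumptions for illustration; the BayesRCO paper and repository are the reference for the actual models.

```python
import random


def draw_mixture_effect(rng, pi, sigma2_g, scales=(0.0, 1e-4, 1e-3, 1e-2)):
    """Draw one effect from a BayesR-style mixture (scales are assumed)."""
    k = rng.choices(range(len(pi)), weights=pi)[0]
    sd = (scales[k] * sigma2_g) ** 0.5
    return rng.gauss(0.0, sd) if sd > 0 else 0.0


def effect_added_confidence(rng, annotations, pi_by_annotation, sigma2_g):
    """'Added confidence' reading: sum one draw per annotation of the SNP."""
    return sum(
        draw_mixture_effect(rng, pi_by_annotation[c], sigma2_g)
        for c in annotations
    )


def effect_uncertain_annotation(rng, annotations, pi_by_annotation, sigma2_g,
                                weights=None):
    """'Uncertainty' reading: choose a single annotation, then draw from it.

    weights=None means every annotation of the SNP is equally likely.
    """
    chosen = rng.choices(list(annotations), weights=weights)[0]
    return chosen, draw_mixture_effect(rng, pi_by_annotation[chosen], sigma2_g)
```

For a SNP with a single annotation, both functions reduce to a single draw from that annotation's mixture.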

  27. Simulations Strategy
    Simulation strategy

  28. Simulations Results
    Evaluating impact of using annotations on validation data

  29. Simulations Results
    BayesRCπ assigns informative annotations to QTLs
    $h^2 = 0.5$, k = 1%, scenario A
    PAIP = posterior annotation inclusion probability (BayesRCπ output)

  30. Simulations Results
    BayesRC+ assigns more weight to multi-annotated variants
    $h^2 = 0.5$, k = 1%, scenario C

  31. Real data analysis Description
    Application in backcross population of growing pigs
    n = 1297 backcross pigs (3/4 Large-White, 1/4 Creole), genetically
    related sows sired with 10 boars
    Genotyped with Illumina Porcine 60k BeadChip array
    Sibling-structured 10-fold cross validation procedure
    Traits pre-corrected for age, sex, farm
    Focus on average daily weight gain (ADG) and backfat thickness
    (BFT) at 23 weeks
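
The slide does not spell out what "sibling-structured" means. The sketch below assumes it means that all members of a sibling family are kept in the same fold, so that no animal is predicted from its own siblings; the function name and the family-ID input are hypothetical.

```python
import random
from collections import defaultdict


def family_folds(family_ids, n_folds=10, seed=0):
    """Split individuals into folds while keeping each family together.

    family_ids: one family label per individual (assumed input format).
    Returns a list of n_folds lists of individual indices.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, family in enumerate(family_ids):
        members[family].append(idx)
    families = list(members)
    rng.shuffle(families)  # randomize the order among equally sized families
    # Greedy balancing: largest families first, each into the smallest fold.
    families.sort(key=lambda f: -len(members[f]))
    folds = [[] for _ in range(n_folds)]
    for family in families:
        smallest = min(folds, key=len)
        smallest.extend(members[family])
    return folds
```

Each fold then serves once as validation data while the model is trained on the remaining nine.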

  32. Real data analysis Results
    Correlation of predicted traits in pig validation data
    Annotations constructed using pigQTLdb for 11 trait sub-hierarchies
    Anatomy, behavioral, blood parameters, conformation, fatness, fatty acid
    content, feed conversion, growth, immune capacity, litter, reproductive
    organs
    Nearest up- and downstream neighboring markers also annotated
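
The results on this slide are reported as correlations in validation data. Assuming this refers to the Pearson correlation between predicted values and the observed (pre-corrected) traits within each validation fold, it can be computed as in this small sketch; the function name is illustrative.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson correlation between observed and predicted trait values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)
```

Averaging this value over the cross-validation folds gives a single accuracy figure per model and trait.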

  34. Real data analysis Results
    Interpreting pigQTLdb annotations with BayesRCπ

  35. Wrapping up...
    Conclusions: incorporating annotations with BayesRCO
    BayesRCO:
    → BayesRCπ can assign informative annotations to multi-annotated
    SNPs to account for uncertainty in prior knowledge
    → BayesRC+ upweights multi-annotated SNPs and is robust to various
    annotation scenarios
    Fairly modest improvements in prediction (∼1-2 points) observed
    when incorporating biological annotations
    Improved predictions and rankings of large QTLs in simulations,
    especially for highly informative annotations
    Slight improvement in predictions for some traits in real data
    Strategies for constructing annotation categories impact results

  38. Wrapping up...
    Take home messages
    Can genomic prediction models be improved by better accounting for
    our knowledge about the function of certain regions of the genome?
    Yes, sometimes.
    Models → BayesRCO for overlapping annotation categories,
    extensions in progress to handle quantitative annotations
    Genotyping data → Capitalizing on annotation maps likely requires
    WGS resolution
    Validation data → Greater potential gains when prediction is
    performed on genetically distant populations
    Traits → Heritability, genetic architecture, link with annotations, ...
    Annotations → Which molecular assays, in which tissues?

  39. Thank you!
    Mollandin et al. (2022) Accounting for overlapping annotations in genomic
    prediction models of complex traits, BMC Bioinformatics, 23:365.
    https://github.com/FAANG/BayesRCO
