Slide 1

Slide 1 text

Incorporating biological information into genomic prediction models
Fanny Mollandin, Pascal Croiseau, Andrea Rau
VistaMilk Artificial Intelligence in Agriculture Masterclass @ Zoom, February 8, 2023
[email protected] Biological priors in genomic prediction models 1 / 21

Slide 4

Slide 4 text

Introduction / Context
Genomic selection overview

Objective: select the best animals for reproduction to obtain genetic improvement of the population on traits of interest.
Low- to high-density genotyping chips (10k-100k SNPs) → whole-genome sequencing (10 million SNPs)
Image: F. Mollandin

Slide 7

Slide 7 text

Introduction / Context
Prediction models for genomic selection

Goal: given a training set of data $(Y_i, X_i, Z_i)$ for $i = 1, \ldots, n$ individuals, where
- $Y_i$ = trait
- $X_i$ = vector of (usually genome-wide) genotypes
- $Z_i$ = vector of covariates (age, location, sex, ...)
predict the unobserved trait $Y^\star$ of a future individual with corresponding $X^\star$ and $Z^\star$.

- Introduced by Meuwissen et al. (2001)
- Successfully implemented in many plant/animal breeds for traits related to production, health, climate adaptation, ...
- Modest gains in predictions can have large economic impacts (reduced generation interval, reduced cost and labor for phenotyping)

Slide 9

Slide 9 text

Introduction / Context
Challenges of genomic prediction models

- Non-random association between alleles at neighboring loci (linkage disequilibrium, LD)
- Polygenic nature of complex traits
- Many more SNPs (variables) than individuals (observations) ⇒ curse of dimensionality
- Including too many predictors in a model risks over-fitting, poor generalizability, and problems with model estimation
- ... but including only a small pre-identified subset of SNPs (e.g., significant GWAS hits) usually leads to poor predictions
→ Balance computational/statistical feasibility and biologically realistic assumptions

Can genomic prediction models be improved by better accounting for our knowledge about the function of certain regions of the genome?
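The "many more SNPs than individuals" problem can be seen in a few lines. The following is only a toy sketch on simulated data: the sizes `n` and `p`, the allele frequency, and the ridge penalty `lam` are arbitrary assumptions, not values from the talk. The trait is pure noise, yet unpenalized least squares reproduces the training data almost exactly, which is over-fitting in its simplest form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                                   # assumed toy sizes: p >> n
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded as allele counts 0/1/2
y = rng.normal(size=n)                           # pure-noise trait: nothing real to learn

# Minimum-norm least squares: with p > n the training data are fit (almost) exactly
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
max_train_error = np.max(np.abs(X @ beta_ls - y))

# Ridge regression in its dual form shrinks the effects; lam is an arbitrary penalty
lam = 10.0
beta_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print(max_train_error, np.linalg.norm(beta_ls), np.linalg.norm(beta_ridge))
```

Shrinkage of this kind is what the priors discussed later in the talk provide, each with a different assumption about how effect sizes are distributed.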

Slide 11

Slide 11 text

Introduction / Functional annotations
Context: H2020 GENE-SWitCH project

The regulatory GENomE of Swine & Chicken: functional annotation during development

High-quality, richly annotated maps of pig and chicken genomes:
- Development: early/late organogenesis, newborn/hatched, adult
- Sexes: {M, F} × 3 biological replicates
- Tissues: liver, skeletal muscle, small intestine, cerebellum, dorsal epidermis, lung, kidney
- Assays: RNA-seq, ATAC-seq, ChIP-seq, smRNA-seq, methylation, Hi-C

But how?

Slide 12

Slide 12 text

Introduction / Models for genomic prediction
First, back to basics: the linear model

The workhorse of genomic prediction is the multiple linear regression model:

$$Y = Z\theta + X\beta + \varepsilon$$

- $Y$ = $n$-vector of traits
- $Z$ = $n \times m$ matrix of covariates
- $\theta$ = $m$-vector of covariate effect parameters
- $X$ = $n \times p$ matrix of (suitably coded) genotypes
- $\beta$ = $p$-vector of genetic effect parameters
- $\varepsilon$ = $n$-vector of errors representing noise, assumed to be iid and (usually) normally distributed
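As a concrete illustration of the linear model on this slide, here is a minimal simulation sketch. Everything numeric is an assumption made for the example: the dimensions, the two covariates (an intercept and a 0/1 sex indicator), the 0/1/2 allele-count coding, and the effect sizes. It uses p < n so that ordinary least squares is well defined, which is not the typical genomic setting.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 200, 2, 20                                       # assumed toy dimensions (p < n)
Z = np.column_stack([np.ones(n), rng.integers(0, 2, n)])   # intercept + sex (hypothetical covariates)
X = rng.binomial(2, 0.4, size=(n, p)).astype(float)        # genotypes coded as allele counts 0/1/2
theta = np.array([10.0, 1.5])                              # assumed covariate effects
beta = rng.normal(0.0, 0.3, size=p)                        # assumed genetic effects
eps = rng.normal(0.0, 1.0, size=n)                         # iid normal errors
Y = Z @ theta + X @ beta + eps                             # Y = Z theta + X beta + eps

# Joint least-squares estimate of covariate and genetic effects
W = np.hstack([Z, X])
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
theta_hat, beta_hat = coef[:m], coef[m:]
```

With genome-wide data p is far larger than n, so this direct estimate is no longer available and some form of regularization or prior is needed.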

Slide 14

Slide 14 text

Introduction / Models for genomic prediction
Bayesian methods for genomic prediction

Image: 10.1007/s10681-007-9516-1

$$\underbrace{\prod_{i=1}^{n} N\Big(Y_i \,\Big|\, \mu + \sum_{j=1}^{p} X_{ij}\beta_j,\ \sigma^2\Big)}_{\text{likelihood}} \times \underbrace{p(\sigma^2) \prod_{j=1}^{p} p(\beta_j \mid \Psi)}_{\text{prior}}$$

- $\sigma^2$ is often assigned a $\chi^{-2}$ (inverse chi-square) prior distribution
- The choice of prior for $\beta_j$ should ideally reflect a trait's genetic architecture (and be computationally feasible...)
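The likelihood × prior factorization on this slide can be written as an unnormalized log posterior. The sketch below is illustrative only: it assumes a scaled inverse chi-square prior on σ² with hypothetical hyperparameters `nu` and `s2`, and it leaves the prior on each effect as a user-supplied function `log_prior_beta`, since that choice is exactly what distinguishes the models discussed next.

```python
import math
import numpy as np

def log_scaled_inv_chi2(x, nu, s2):
    """Log density of a scaled inverse chi-square(nu, s2) distribution at x > 0."""
    return (0.5 * nu * math.log(0.5 * nu * s2) - math.lgamma(0.5 * nu)
            - 0.5 * nu * s2 / x - (1.0 + 0.5 * nu) * math.log(x))

def log_posterior(y, X, mu, beta, sigma2, log_prior_beta, nu=4.0, s2=1.0):
    """Unnormalized log posterior: Gaussian log-likelihood plus log priors."""
    resid = y - mu - X @ beta
    loglik = (-0.5 * y.size * math.log(2.0 * math.pi * sigma2)
              - 0.5 * float(resid @ resid) / sigma2)
    logprior = log_scaled_inv_chi2(sigma2, nu, s2) + sum(log_prior_beta(b) for b in beta)
    return loglik + logprior
```

For example, passing `log_prior_beta=lambda b: -0.5 * b**2 / 0.01` would correspond, up to a constant, to a common Gaussian prior with variance 0.01 on every effect.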

Slide 18

Slide 18 text

Introduction / Models for genomic prediction
Which prior to use for $\beta_j$?

Image: 10.1534/genetics.112.143313

- GBLUP: $\beta_j \sim N(0, \sigma^2_\beta)$
- BayesA: $\beta_j \sim N(0, \sigma^2_{\beta_j})$, $\sigma^2_{\beta_j} \sim \text{Inv-}\chi^2(\nu, S^2)$
- BayesB: $\beta_j \sim N(0, \sigma^2_{\beta_j})$, $\sigma^2_{\beta_j} \sim \pi\,\delta(0) + (1 - \pi)\,\text{Inv-}\chi^2(\nu, S^2)$, $\pi$ fixed
- BayesC: $\beta_j \sim \pi\,\delta(0) + (1 - \pi)\,N(0, \sigma^2_\beta)$, $\sigma^2_\beta \sim \text{Inv-}\chi^2(\nu, S^2)$, $\pi$ fixed
- BayesCπ: BayesC with $\pi \sim \text{Unif}(0, 1)$
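To get a feel for how these priors differ, one can simply draw SNP effects from each of them. The sketch below does that under assumed hyperparameters (`sigma2_beta`, `nu`, `s2`, and `pi` are illustrative values, not taken from the talk); the function name `draw_effects` is hypothetical. A scaled inverse chi-square draw is obtained as ν·S² divided by a chi-square(ν) draw.

```python
import numpy as np

def scaled_inv_chi2(rng, nu, s2, size):
    # nu * s2 / chi2(nu) follows a scaled inverse chi-square(nu, s2) distribution
    return nu * s2 / rng.chisquare(nu, size=size)

def draw_effects(prior, p, rng, sigma2_beta=0.01, nu=4.0, s2=0.01, pi=0.9):
    """Draw p SNP effects from one of the priors listed on the slide."""
    if prior == "GBLUP":      # one common variance, no sparsity
        return rng.normal(0.0, np.sqrt(sigma2_beta), p)
    if prior == "BayesA":     # SNP-specific variances
        return rng.normal(0.0, np.sqrt(scaled_inv_chi2(rng, nu, s2, p)))
    if prior == "BayesB":     # SNP-specific variances, zero with probability pi
        var_j = scaled_inv_chi2(rng, nu, s2, p)
        nonzero = rng.random(p) >= pi
        return np.where(nonzero, rng.normal(0.0, np.sqrt(var_j)), 0.0)
    if prior == "BayesC":     # one common variance, effect zero with probability pi
        common_var = scaled_inv_chi2(rng, nu, s2, 1)[0]
        nonzero = rng.random(p) >= pi
        return np.where(nonzero, rng.normal(0.0, np.sqrt(common_var), p), 0.0)
    raise ValueError(f"unknown prior: {prior}")

rng = np.random.default_rng(2)
effects = {name: draw_effects(name, 10_000, rng)
           for name in ("GBLUP", "BayesA", "BayesB", "BayesC")}
```

Comparing the share of exact zeros and the tail heaviness across `effects` shows the contrast: GBLUP and BayesA keep every SNP in the model, while BayesB and BayesC set most effects to exactly zero.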

Slide 19

Slide 19 text

Introduction / Models for genomic prediction
BayesR (Erbe et al., 2012)

- $\pi \sim \text{Dirichlet}(\alpha)$, with $\alpha = (1, 1, 1, 1)$
- Gibbs sampler for estimation
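The slide text only states the Dirichlet prior, so the following sketch fills in the rest from the usual description of BayesR: each SNP effect comes from a four-component normal mixture with variances 0, 10⁻⁴, 10⁻³ and 10⁻² times the genetic variance. Those scales, the number of SNPs, and `sigma2_g` are assumptions made for this example. The last two lines show the conjugate Dirichlet update that a Gibbs sampler would use for π given the current class memberships.

```python
import numpy as np

rng = np.random.default_rng(3)
p, sigma2_g = 5000, 1.0                         # assumed number of SNPs and genetic variance
scales = np.array([0.0, 1e-4, 1e-3, 1e-2])      # assumed class variances, as fractions of sigma2_g
alpha = np.ones(4)                              # Dirichlet(1, 1, 1, 1) prior from the slide

pi = rng.dirichlet(alpha)                       # mixing proportions of the four classes
cls = rng.choice(4, size=p, p=pi)               # latent effect-size class of each SNP
beta = rng.standard_normal(p) * np.sqrt(scales[cls] * sigma2_g)

# Conjugate Gibbs update of pi given the current class memberships
counts = np.bincount(cls, minlength=4)
pi_new = rng.dirichlet(alpha + counts)
```

In a full sampler the class memberships and effects would themselves be updated from their conditional posteriors at every iteration; only the π step is shown here.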

Slide 20

Slide 20 text

Introduction / Incorporating disjoint annotations
Back to annotations: BayesRC (MacLeod et al., 2016)

- SNPs are assigned to disjoint "annotations"; the model is a factorized BayesR, with one vector of mixture proportions $\pi_c$ per annotation category $c$
- $\pi_c \sim \text{Dirichlet}(\alpha)$, with $\alpha = (1, 1, 1, 1)$
- Gibbs sampler for estimation
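With disjoint annotations, the Dirichlet update for the mixture proportions can be carried out separately within each category. The function below is a sketch of that single step, not the published implementation; the name `sample_pi_by_category` and the array layout (one class label and one category label per SNP) are assumptions made for illustration.

```python
import numpy as np

def sample_pi_by_category(cls, category, n_categories, rng, alpha=(1, 1, 1, 1)):
    """Draw one vector of mixture proportions per annotation category.

    cls:      integer class label of each SNP (0 .. len(alpha) - 1)
    category: integer annotation category of each SNP (disjoint categories)
    """
    alpha = np.asarray(alpha, dtype=float)
    pis = np.empty((n_categories, alpha.size))
    for c in range(n_categories):
        counts = np.bincount(cls[category == c], minlength=alpha.size)
        pis[c] = rng.dirichlet(alpha + counts)   # conjugate update within category c
    return pis
```

A category enriched for causal variants ends up with more mass on the non-zero classes, which is how the annotation information feeds back into the effect estimates.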

Slide 21

Slide 21 text

BayesRCO models / Overview
From BayesR to BayesRC ... and beyond

Slide 26

Slide 26 text

BayesRCO models / Model definition
BayesRCO: BayesRC for Overlapping annotations

Two hypotheses = two models!
1. Multi-annotations represent added confidence → BayesRC+
2. Multi-annotations represent uncertainty → BayesRCπ

Slide 27

Slide 27 text

Simulations / Strategy
Simulation strategy

Slide 28

Slide 28 text

Simulations / Results
Evaluating the impact of using annotations on validation data

Slide 29

Slide 29 text

Simulations / Results
BayesRCπ assigns informative annotations to QTLs

$h^2$ = 0.5, k = 1%, scenario A
PAIP = posterior annotation inclusion probability (BayesRCπ output)
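One natural way to estimate a PAIP from MCMC output is as the fraction of iterations in which a SNP was allocated to a given annotation. The sketch below assumes, hypothetically, that the sampler stores one sampled annotation index per SNP per iteration; the actual BayesRCO output format may differ.

```python
import numpy as np

def paip(assignments, n_annotations):
    """Estimate posterior annotation inclusion probabilities.

    assignments: (n_iter, n_snps) integer array; entry [t, j] is the annotation
                 sampled for SNP j at iteration t (assumed storage layout).
    Returns an (n_snps, n_annotations) array whose rows sum to 1.
    """
    assignments = np.asarray(assignments)
    out = np.zeros((assignments.shape[1], n_annotations))
    for a in range(n_annotations):
        out[:, a] = (assignments == a).mean(axis=0)
    return out

# Toy example: 4 iterations, 2 SNPs, 3 annotations
draws = np.array([[0, 1], [0, 2], [1, 2], [0, 2]])
probs = paip(draws, 3)
```

In the toy example the first SNP is allocated to annotation 0 in three of four iterations, so its estimated PAIP for that annotation is 0.75.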

Slide 30

Slide 30 text

Simulations / Results
BayesRC+ assigns more weight to multi-annotated variants

$h^2$ = 0.5, k = 1%, scenario C

Slide 31

Slide 31 text

Real data analysis / Description
Application in backcross population of growing pigs

- n = 1297 backcross pigs (3/4 Large-White, 1/4 Creole), born to genetically related sows and sired by 10 boars
- Genotyped with Illumina Porcine 60k BeadChip array
- Sibling-structured 10-fold cross-validation procedure
- Traits pre-corrected for age, sex, farm
- Focus on average daily weight gain (ADG) and backfat thickness (BFT) at 23 weeks
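The slide does not spell out how the sibling-structured folds were built. One possible reading, assumed here, is that all members of a sibling family are kept in the same fold so that no animal is predicted from its own siblings. The helper `family_folds` and its arguments are hypothetical.

```python
import numpy as np

def family_folds(family_ids, k=10, seed=0):
    """Assign each animal to one of k folds, keeping every family in a single fold.

    family_ids: array with one family label per animal (assumed input).
    """
    family_ids = np.asarray(family_ids)
    rng = np.random.default_rng(seed)
    families = rng.permutation(np.unique(family_ids))      # shuffle the families
    fold_of_family = {fam: i % k for i, fam in enumerate(families)}
    return np.array([fold_of_family[f] for f in family_ids])

# Toy example: 30 families of 4 siblings each
fam = np.repeat(np.arange(30), 4)
folds = family_folds(fam)
```

Each fold is then held out in turn as validation data while the model is trained on the other nine. If the study instead spread siblings across folds, the dictionary step would need to be replaced by a within-family assignment.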

Slide 32

Slide 32 text

Real data analysis / Results
Correlation of predicted traits in pig validation data

- Annotations constructed using pigQTLdb for 11 trait sub-hierarchies: anatomy, behavioral, blood parameters, conformation, fatness, fatty acid content, feed conversion, growth, immune capacity, litter, reproductive organs
- Nearest up- and downstream neighboring markers also annotated
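Extending an annotation to the nearest upstream and downstream markers can be sketched as a simple shift of a boolean flag vector. This assumes, for illustration, that markers on one chromosome are stored in positional order with one flag per marker; how pigQTLdb entries were mapped to chip markers in the first place is not described on the slide.

```python
import numpy as np

def extend_to_neighbors(annotated):
    """Also flag the markers immediately before and after each annotated marker.

    annotated: boolean flags for markers ordered by position on one chromosome.
    """
    annotated = np.asarray(annotated, dtype=bool)
    out = annotated.copy()
    out[:-1] |= annotated[1:]    # marker just upstream of an annotated marker
    out[1:] |= annotated[:-1]    # marker just downstream of an annotated marker
    return out

flags = extend_to_neighbors([False, False, True, False, False])
```

The function would be applied once per chromosome and per annotation category, so that flags never spill across chromosome boundaries.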

Slide 34

Slide 34 text

Real data analysis / Results
Interpreting pigQTLdb annotations with BayesRCπ

Slide 35

Slide 35 text

Wrapping up...
Conclusions: incorporating annotations with BayesRCO

BayesRCO:
→ BayesRCπ can assign informative annotations to multi-annotated SNPs to account for uncertainty in prior knowledge
→ BayesRC+ upweights multi-annotated SNPs and is robust to various annotation scenarios

- Fairly modest improvements in prediction (∼1-2 points) observed when incorporating biological annotations
- Improved predictions and rankings of large QTLs in simulations, especially for highly informative annotations
- Slight improvement in predictions for some traits in real data
- Strategies for constructing annotation categories impact results

Slide 38

Slide 38 text

Wrapping up...
Take home messages

Can genomic prediction models be improved by better accounting for our knowledge about the function of certain regions of the genome?

Yes, sometimes.

- Models → BayesRCO for overlapping annotation categories; extensions in progress to handle quantitative annotations
- Genotyping data → Capitalizing on annotation maps likely requires WGS resolution
- Validation data → Greater potential gains when prediction is performed on genetically distant populations
- Traits → Heritability, genetic architecture, link with annotations, ...
- Annotations → Which molecular assays, in which tissues?

Slide 39

Slide 39 text

Thank you!

Mollandin et al. (2022). Accounting for overlapping annotations in genomic prediction models of complex traits. BMC Bioinformatics, 23:65.
https://github.com/FAANG/BayesRCO