Nimbus ruby library.

Juanjo Bazán
October 12, 2011

Nimbus is a Ruby gem implementing the random forest algorithm for genomic selection contexts.
http://www.nimbusgem.org

Slides of the presentation from the EAAP 2011 conference in Stavanger, Norway.

Transcript

  1. An open source library to implement random forests in genomic contexts
    Oscar González-Recio Juanjo Bazán Selma Forni

  2. The Problem

  3. The Problem
    Massive amount of information from high throughput genotyping platforms.

  4. The Problem
    Massive amount of information from high throughput genotyping platforms.
    Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.

  5. The Problem
    Massive amount of information from high throughput genotyping platforms.
    Massive amount of information consumes the attention of its recipients.
    We need to allocate that attention efficiently.
    Need to extract knowledge from large, noisy, redundant, missing and fuzzy data.

  6. Why Random Forest?

  7. Why Random Forest?
    Using Machine Learning techniques we can extract hidden relationships that
    exist in these huge volumes of data and do not follow a particular parametric
    design.

  8. Why Random Forest?
    Using Machine Learning techniques we can extract hidden relationships that
    exist in these huge volumes of data and do not follow a particular parametric
    design.
    Random Forest has desirable statistical properties.

  9. Why Random Forest?
    Using Machine Learning techniques we can extract hidden relationships that
    exist in these huge volumes of data and do not follow a particular parametric
    design.
    Random Forest has desirable statistical properties.
    Random Forest scales well computationally.

  10. Why Random Forest?
    Using Machine Learning techniques we can extract hidden relationships that
    exist in these huge volumes of data and do not follow a particular parametric
    design.
    Random Forest has desirable statistical properties.
    Random Forest scales well computationally.
    Random Forest performs extremely well in a variety of possible complex
    domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).

  11. The Algorithm

  12. The Algorithm
    Ensemble methods:
    - Combination of different methods (usually simple models).
    - They have very good predictive ability because they exploit the additivity of the models' performances.
    Based on Classification And Regression Trees (CART).
    Uses Randomization and Bagging.
    Performs Feature Subset Selection.
    Convenient for classification problems.
    Fast computation.
    Simple interpretation of results for human minds.
    Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)

  13. The Algorithm
    Perform bootstrap on data: Ψ* = (y, X)
    Build a CART ( fi (y, X) = ht (x) ) using only mtry proportion of SNPs in each node.
    Repeat M times to reduce residuals by a factor of M.
    Average estimates c0 = μ ; ci = 1/M
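
The four steps on this slide can be sketched in plain Ruby. This is only an illustration under simplifying assumptions, not the Nimbus implementation: each "tree" is a single-split stump, mtry is given as a count of SNPs rather than a proportion, genotypes are assumed to be coded 0, 1, 2, and every method name (bootstrap_indices, fit_stump, train_forest, predict_forest) is hypothetical.

```ruby
# Sketch of bagging: bootstrap the data, fit a small tree on a random
# subset of SNPs, repeat M times, and average the M predictions.
# All names here are hypothetical and are not the Nimbus API.

def mean(values)
  values.sum.to_f / values.size
end

# Draw n row indices with replacement (the bootstrap sample).
def bootstrap_indices(n, rng)
  Array.new(n) { rng.rand(n) }
end

# Fit a one-split regression tree using only `mtry` randomly chosen SNPs.
def fit_stump(x, y, rows, mtry, rng)
  snps = (0...x.first.size).to_a.sample(mtry, random: rng)
  best = nil
  snps.each do |snp|
    [0, 1].each do |threshold| # assumes genotypes coded 0, 1, 2
      left, right = rows.partition { |i| x[i][snp] <= threshold }
      next if left.empty? || right.empty?

      left_mean = mean(left.map { |i| y[i] })
      right_mean = mean(right.map { |i| y[i] })
      sse = left.sum { |i| (y[i] - left_mean)**2 } +
            right.sum { |i| (y[i] - right_mean)**2 }
      if best.nil? || sse < best[:sse]
        best = { snp: snp, threshold: threshold,
                 left: left_mean, right: right_mean, sse: sse }
      end
    end
  end
  # No valid split: fall back to predicting the sample mean for everyone.
  best || { snp: 0, threshold: 2, left: mean(rows.map { |i| y[i] }),
            right: 0.0, sse: 0.0 }
end

def predict_stump(stump, genotypes)
  genotypes[stump[:snp]] <= stump[:threshold] ? stump[:left] : stump[:right]
end

# Repeat the bootstrap-and-fit step `trees` (M) times.
def train_forest(x, y, trees:, mtry:, seed: 1)
  rng = Random.new(seed)
  Array.new(trees) do
    fit_stump(x, y, bootstrap_indices(y.size, rng), mtry, rng)
  end
end

# Average the estimates of all trees, each weighted 1/M.
def predict_forest(forest, genotypes)
  mean(forest.map { |stump| predict_stump(stump, genotypes) })
end

# Toy data: the phenotype depends on the first SNP only.
x = [[0, 0], [0, 1], [2, 0], [2, 2], [0, 2], [2, 1]]
y = [1.0, 1.0, 5.0, 5.0, 1.0, 5.0]
forest = train_forest(x, y, trees: 50, mtry: 2, seed: 42)
puts predict_forest(forest, [0, 0])
puts predict_forest(forest, [2, 0])
```

A real forest would grow each tree recursively and redraw the SNP subset at every node; the stump keeps the sketch short while still showing bootstrap, feature subsetting and averaging.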

  14. The Algorithm
    Classification and regression trees:
    Let Ψ = (y, X) be a set of data, with
    y = vector of phenotypes (response variables)
    X = (x1, x2) = matrix of features
    y1   x11   x12
    …    …     …
    yi   xi1   xi2
    …    …     …
    yn   xn1   xn2
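
A classification tree over data of this shape (a phenotype vector y and a two-column feature matrix X) might be grown as sketched below. This is a generic CART-style illustration that assumes Gini impurity as the split criterion; the slide does not state which criterion is used, and the method names (gini, best_split, build_tree, classify) are hypothetical.

```ruby
# Minimal CART-style classification tree on y (labels) and X (features).
# Assumption: splits are chosen by weighted Gini impurity.

# Gini impurity of a list of class labels.
def gini(labels)
  return 0.0 if labels.empty?

  groups = labels.group_by(&:itself).values
  1.0 - groups.sum { |g| (g.size.to_f / labels.size)**2 }
end

# Try every feature and threshold; keep the split with the lowest impurity.
def best_split(x, y, rows)
  best = nil
  (0...x.first.size).each do |feature|
    values = rows.map { |i| x[i][feature] }.uniq.sort
    values[0...-1].each do |threshold| # skip the max so both sides are non-empty
      left, right = rows.partition { |i| x[i][feature] <= threshold }
      score = (left.size * gini(y.values_at(*left)) +
               right.size * gini(y.values_at(*right))) / rows.size
      if best.nil? || score < best[:score]
        best = { feature: feature, threshold: threshold,
                 left: left, right: right, score: score }
      end
    end
  end
  best
end

def majority(labels)
  labels.group_by(&:itself).max_by { |_, group| group.size }.first
end

# Recursively partition the rows until a node is pure or cannot be split.
def build_tree(x, y, rows = (0...y.size).to_a)
  labels = y.values_at(*rows)
  split = labels.uniq.size > 1 ? best_split(x, y, rows) : nil
  return { leaf: majority(labels) } if split.nil?

  { feature: split[:feature], threshold: split[:threshold],
    left: build_tree(x, y, split[:left]),
    right: build_tree(x, y, split[:right]) }
end

def classify(tree, features)
  return tree[:leaf] if tree.key?(:leaf)

  branch = features[tree[:feature]] <= tree[:threshold] ? :left : :right
  classify(tree[branch], features)
end

# Toy data in the (y, X) layout above: rows are individuals, columns x1, x2.
x = [[0, 1], [0, 2], [1, 0], [2, 0], [2, 2]]
y = %w[low low high high high]
tree = build_tree(x, y)
puts classify(tree, [0, 0])
puts classify(tree, [2, 1])
```

For a regression tree, the same recursion applies with the impurity replaced by the within-node sum of squared errors and the leaf value by the node mean.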

  15. The Algorithm

  16. Nimbus library

  17. Nimbus library
    Written in Ruby
    www.ruby-lang.org
    Open source programming language
    Syntax focused on simplicity
    Natural to read and easy to write

  18. Nimbus library
    How to install:
    > gem install nimbus
    Prerequisites:
    Ruby and RubyGems (Ruby's default package manager) installed on the system

  19. Nimbus library
    How to run:
    > nimbus
    Configuration:
    Via config.yml file

  20. Nimbus library
    config.yml file:

  21. Nimbus library
    Input files:
    training
    testing

  22. Nimbus library
    Features, use cases:
    Training of a prediction forest
    - Using a training set of individuals
    - Nimbus creates a reusable forest
    - Nimbus calculates SNP importances
    - Generalization errors are computed for every tree in the forest
    Genomic prediction of a testing sample
    - Using a custom/reused forest, specified via config.yml
    - Using a newly trained forest

  23. Outputs
    Random Forest file
    In standard YAML format
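
Because the forest is stored as standard YAML, it can be read back with any YAML parser. The sketch below round-trips a forest through Ruby's standard yaml library. The tree layout (the keys snp, threshold, left, right) is purely hypothetical and is not the actual Nimbus file format, which is not shown in this transcript.

```ruby
require 'yaml'

# Hypothetical in-memory forest: one hash per tree. This layout is an
# assumption for illustration only, not the Nimbus forest file format.
def forest_to_yaml(forest)
  forest.to_yaml
end

# safe_load is enough here because the structure uses only arrays,
# hashes with string keys, and numbers.
def forest_from_yaml(text)
  YAML.safe_load(text)
end

forest = [
  { 'snp' => 12, 'threshold' => 1, 'left' => 0.8, 'right' => 2.4 },
  { 'snp' => 3,  'threshold' => 0, 'left' => 1.1, 'right' => 1.9 }
]

text = forest_to_yaml(forest)
puts text
restored = forest_from_yaml(text)
puts restored.size
```

Writing the text with File.write and reading it back with File.read would persist the forest between runs, which is the idea behind reusing a trained forest for later predictions.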

  24. Outputs
    Predictions for the training sample

  25. Outputs
    Predictions for the testing sample

  26. Outputs
    SNP importances
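
One common way to obtain a per-SNP importance from a random forest is permutation importance: shuffle one SNP column and measure how much the prediction error grows. The sketch below illustrates that general idea only. The slides do not say how Nimbus computes its importances, and the names used here (mse, permutation_importance, the predict lambda) are hypothetical.

```ruby
# Permutation importance sketch: increase in mean squared error after
# shuffling one SNP column. Generic illustration, not the Nimbus method.

# Mean squared error of a predictor (any object responding to #call).
def mse(predict, x, y)
  x.each_with_index.sum { |row, i| (predict.call(row) - y[i])**2 } / y.size.to_f
end

def permutation_importance(predict, x, y, snp, rng)
  baseline = mse(predict, x, y)
  shuffled = x.map { |row| row[snp] }.shuffle(random: rng)
  permuted = x.each_with_index.map do |row, i|
    copy = row.dup
    copy[snp] = shuffled[i] # break the link between this SNP and y
    copy
  end
  mse(predict, permuted, y) - baseline
end

# Toy setup: the "model" uses only the first SNP, so shuffling the
# second SNP cannot change its predictions.
predict = ->(row) { row[0] * 2.0 }
x = [[0, 1], [1, 0], [2, 2], [0, 2], [1, 1], [2, 0]]
y = x.map { |row| row[0] * 2.0 }
rng = Random.new(7)

puts permutation_importance(predict, x, y, 0, rng)
puts permutation_importance(predict, x, y, 1, rng)
```

In a forest setting the predictor would be the trained forest and the error would typically be measured on individuals left out of each bootstrap sample.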

  27. More info:
    Nimbus website: www.nimbusgem.org
    Source code: www.github.com/xuanxu/nimbus
    Report bugs/request features: www.github.com/xuanxu/nimbus/issues

  28. Thank you!

  29. Questions?
