Anaconda for R users

Anaconda is a popular open-source Python distribution that includes more than 200 packages for scientific computing and data science. Recently, the Anaconda team released the “R Essentials” bundle with the IRKernel, which allows users to run R directly from a Jupyter notebook, and over 80 of the most used R packages for data science, including dplyr, shiny, ggplot2, tidyr, caret and nnet.

Anaconda includes Bokeh (http://bokeh.pydata.org/en/latest/), a visualization library that provides a flexible and powerful declarative framework for creating web-based plots. Bokeh renders plots using HTML canvas and provides many mechanisms for interactivity. Bokeh has interfaces in Python, Scala, Julia, and R; the R interface is included in the "R-Essentials" bundle as rbokeh.

In this talk we will present how to get "R-Essentials", use conda for package and environment management, run Jupyter notebooks with the IRKernel and build interactive visualizations with rbokeh (http://hafen.github.io/rbokeh).

Christine Doig

November 03, 2015

  1. My background…
    • Industrial Engineer, UPC, Barcelona
    • Master Thesis - Voltage Stability Analysis, Aachen, Germany
    • Process Engineer - Operations Research, P&G
    • Business Analyst/Consultant, “La Caixa”
    • Quantitative Techniques for Financial Markets, FME, UPC
    • Master in Data Mining and BI, FIB-UPC, Barcelona
    • Data Scientist, Continuum Analytics, Austin, Texas
    Matlab, C, SAP, SAS, Excel, VB, Matlab, SQL, R, Python, R
  2. …at Continuum Analytics
    • DARPA Memex - Human Trafficking
    • Python Advocate and Conference Speaker: PyCon Montreal, PyData Berlin, PyData Dallas, SciPy Austin, Europython Bilbao…
    • Blaze PM: http://blaze.pydata.org/
    • Python Trainings
    • Blogs:
      • "Conda for Data Science" (https://www.continuum.io/content/conda-data-science)
      • "Jupyter and Conda for R" (https://www.continuum.io/blog/developer/jupyter-and-conda-r)
  3. Continuum Analytics…
    …distributes Anaconda, an open source Python distribution that includes more than 300 packages for scientific computing and data science
    …provides enterprise-ready products for data scientists through the Anaconda Platform
    …delivers Python trainings and offers consulting services
    …supports the development of open source technology: conda, blaze, dask, bokeh, numba…
    …sponsors Python conferences: PyData, SciPy, PyCon, Europython, PySS…
    Learn more: https://www.continuum.io/
  4. Agenda
    • Anaconda and “R Essentials”
    • Conda: package and environment manager
    • Jupyter: collaborative notebooks
    • Bokeh: interactive data visualizations
  5. Conda
    • Package and environment manager
    • Language agnostic (Python, R, Java…)
    • Cross-platform (Windows, OS X, Linux)
    $ conda install python=2.7
    $ conda install pandas
    $ conda install -c r r
    $ conda install mongodb
  6. Conda vs Anaconda vs Miniconda vs R Essentials
    • Conda: package manager
    • Anaconda: Python + Conda + packages
    • Miniconda: Python + Conda
    • R Essentials: R + R packages
  7. Conda vs Pip
    • Conda: language agnostic; handles environments natively; installs binaries; general-purpose envs
    • Pip: Python packages; environments via virtualenv; compiles from source; Python envs
  8. Conda + pip
    $ conda install pip
    $ pip install foo
    Conda skeleton pypi
    $ conda skeleton pypi foo
    $ conda build foo/
  9. Why Conda?
    • Python with compiled, platform-dependent C, C++, or Fortran code
    • Seen this message too many times: “Storing debug log for failure in /.pip/pip.log”
    • Multi-language Data Science Projects
  10. Anaconda Cloud
    $ conda build conda.recipe/
    $ conda server upload my_foo_pkg
    $ conda install -c chdoig my_foo_pkg
  11. Mirror CRAN packages
    $ conda skeleton cran ldavis
    $ conda build r-ldavis/
    $ conda server upload my_r_pkg
    $ conda install -c chdoig my_r_pkg
  12. Conda environments
    environment.yml:
      name: myenv
      channels:
        - chdoig
        - r
        - foo
      dependencies:
        - python=2.7
        - r
        - r-ldavis
        - pandas
        - mongodb
        - spark=1.5
        - pip
        - pip:
          - flask-migrate
          - bar==1.4
    Create and activate:
      $ conda env create
      $ source activate myenv
    Freeze versions:
      $ conda env export -n myenv > freeze.yml
    Upload to anaconda.org:
      $ conda server upload my_foo_env.yml
      $ conda env create chdoig/my_foo_env.yml
  13. Conda auto env
    cdoig:~$ cd pygotham-topic-modeling/
    discarding /anaconda/bin from PATH
    prepending /anaconda/envs/pygotham-topic/bin to PATH
    (pygotham-topic)cdoig:~/pygotham-topic-modeling$
    https://github.com/chdoig/conda-auto-env
  14. R Essentials
    "R essentials" comes with IRKernel and over 80 of the most used R packages for data science like dplyr, shiny, ggplot2, tidyr, caret and nnet.
    $ conda install -c r r-essentials
    or
    $ conda config --add channels r
    $ conda install r-essentials
  15. R Essentials environment
    or, alternatively, create an environment to isolate your "R essentials" packages from others:
    $ conda create -n r-essentials -c r r-essentials
  16. Custom metapackage to share
    $ conda metapackage custom-r-bundle 0.1.0 --dependencies r-irkernel jupyter r-ggplot2 r-dplyr --summary "My custom R bundle"
  17. The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
    http://jupyter.org/
    https://try.jupyter.org/
  18. To start Jupyter notebooks, simply run the following command:
    $ jupyter notebook
    http://nbviewer.ipython.org/github/chdoig/conda-jupyter-irkernel/blob/master/Jupyter%20and%20conda%20for%20R.ipynb
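Before launching the notebook, it can help to confirm that the R kernel is registered with Jupyter. The sketch below is not from the slides: the helper name `has_r_kernel` is hypothetical, and it assumes the kernel is registered under IRkernel's default name `ir`, which may differ on your installation. It only reads the output of `jupyter kernelspec list` and prints a hint.

```shell
# has_r_kernel (hypothetical helper): reads `jupyter kernelspec list`
# output on stdin and succeeds if a kernel named "ir" is listed.
# Assumption: "ir" is the name the R kernel was registered under.
has_r_kernel() {
  grep -Eq '^[[:space:]]*ir[[:space:]]'
}

# Only query Jupyter if it is actually on PATH.
if command -v jupyter >/dev/null 2>&1; then
  if jupyter kernelspec list 2>/dev/null | has_r_kernel; then
    echo "R kernel found; run: jupyter notebook"
  else
    echo "R kernel not found; try: conda install -c r r-essentials"
  fi
else
  echo "jupyter not found on PATH"
fi
```

If the kernel is listed, R should appear as an option when creating a new notebook.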
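The environment file from the "Conda environments" slide can be sketched as a small script. This is a trimmed illustration, not the author's file: the package list is shortened, the private channels are left out, and the script only writes `environment.yml` and prints the next command rather than creating the environment.

```shell
# Write a minimal environment.yml (package selection is illustrative only).
cat > environment.yml <<'EOF'
name: myenv
channels:
  - r
dependencies:
  - python=2.7
  - r
  - pandas
  - pip
  - pip:
    - flask-migrate
EOF

# Only suggest the conda command if conda is installed on this machine.
if command -v conda >/dev/null 2>&1; then
  echo "Run: conda env create -f environment.yml"
else
  echo "conda not found; environment.yml written but not applied"
fi
```

Keeping the file under version control next to the project lets collaborators recreate the same mix of Python, R and pip-installed packages from a single definition.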