Luke Zappia
December 11, 2020

Interoperability between Bioconductor and Python for scRNA-seq analysis

Unlike traditional bulk RNA-seq analysis, which is dominated by Bioconductor, the packages for analysing single-cell RNA sequencing data are more fragmented. There are currently three key ecosystems: the Seurat package (https://satijalab.org/seurat/), available from CRAN; Bioconductor's SingleCellExperiment object (https://bioconductor.org/packages/SingleCellExperiment/); and the AnnData Python object (https://anndata.readthedocs.io/en/latest/index.html) used by the Scanpy package (https://scanpy.readthedocs.io/en/stable/). While each platform has strengths and weaknesses, most analysts are likely to use only one of them. In this talk, I discuss how interoperability between R and Python can allow us to take advantage of each platform's strengths and avoid unnecessary reimplementation of methods. I highlight the reticulate R package for interacting with Python (https://rstudio.github.io/reticulate/), the basilisk package for encapsulating Python environments (https://bioconductor.org/packages/basilisk/), my zellkonverter package for converting between AnnData and SingleCellExperiment objects (https://bioconductor.org/packages/zellkonverter/), and the velociraptor package (https://bioconductor.org/packages/velociraptor/) as an example of wrapping a Python package. I also briefly describe the methods for calculating RNA velocity in scVelo (http://scvelo.org/), the package wrapped by velociraptor, and the CellRank package (http://cellrank.org/) for estimating state transitions.


Transcript

  1. Why interoperability?

     1. Take advantage of strengths
     2. Make use of existing packages
     3. Avoid unnecessary reimplementation
  2. Disclaimer: most (almost all) of this is not my work

     The following slides list each package with its developers, GitHub handles and Python alternative.
  3. {reticulate}

     - Developers: Kevin Ushey (@kevinushey), J.J. Allaire (@jjallaire), Yuan Tang (@terrytangyuan), RStudio (rstudio.org)
     - Python alternative: rpy2
     - Install: install.packages("reticulate")
  4. {reticulate} in R

     ```r
     > library(reticulate)

     # Set Python environment
     > use_python("/path/to/my/python")
     # use_virtualenv("my_venv")
     # use_condaenv("my_conda_env")

     # Import Python libraries
     > pandas <- import("pandas")

     # Implicitly convert between R and Python
     > pandas$DataFrame(data = list("Col1" = 1:2, "Col2" = 3:4))
       Col1 Col2
     1    1    3
     2    2    4

     # Explicitly convert between R and Python
     > vec <- 1:4
     > vec
     [1] 1 2 3 4
     > py_list <- r_to_py(vec)
     > py_list
     [1, 2, 3, 4]
     > py_to_r(py_list)
     [1] 1 2 3 4
     ```
  5. {reticulate} in R Markdown

     ````markdown
     ```{r}
     # A normal R chunk
     vec <- 1:4
     vec
     ```
     [1] 1 2 3 4

     ```{python}
     # A native Python chunk
     ls = [5, 6, 7, 8]
     ls
     ```
     [5, 6, 7, 8]

     ```{r}
     # Access Python from R
     mean(py$ls)
     ```
     6.5

     ```{python}
     # Access R from Python
     sum(r.vec) / len(r.vec)
     ```
     2.5
     ````
  6. Conversion

     | R                      | Python            |
     |------------------------|-------------------|
     | Single-element vector  | Scalar            |
     | Multi-element vector   | List              |
     | List of multiple types | Tuple             |
     | Named list             | Dict              |
     | Matrix/Array           | NumPy array       |
     | data.frame             | Pandas DataFrame  |
     | Function               | Python function   |
     | NULL, TRUE, FALSE      | None, True, False |
  7. {basilisk}

     - Developer: Aaron Lun (@LTLA)
     - Install: BiocManager::install("basilisk")
     - Image from Ipipipourax via Wikimedia Commons (CC BY-SA 3.0): https://commons.wikimedia.org/wiki/File:Basilik_color%C3%A9.jpg
  8. {basilisk}

     ```r
     ## Developer
     # 1. Define an environment
     my_env <- basilisk::BasiliskEnvironment(
         envname = "my_env",
         pkgname = "myPkg",
         packages = c("pandas==1.1.2", ...)
     )

     # 2. Create a {reticulate} function
     my_py_fun <- function(...) {
         pandas <- reticulate::import("pandas")
         ...
         return(output)
     }

     # 3. Wrap the function in the environment
     my_r_fun <- function(...) {
         basilisk::basiliskRun(
             env = my_env,
             fun = my_py_fun,
             ...
         )
     }

     ## User
     library(myPkg)
     output <- my_r_fun(...)
     # Set up Python (Conda) environment (first time)...
     # Run my_py_fun() in the environment...
     # Return output
     ```
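  9. {zellkonverter} (diagram)

     - readH5AD(): .h5ad file → AnnData (inside a {basilisk} environment) → AnnData2SCE() → SingleCellExperiment
     - writeH5AD(): SingleCellExperiment → SCE2AnnData() → AnnData (inside a {basilisk} environment) → .h5ad file
     - AnnData2SCE() and SCE2AnnData() also convert directly between in-memory AnnData and SingleCellExperiment objects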
  10. {anndata} and {sceasy}

      - {anndata}: Robrecht Cannoodt (@rcannood); Python alternative: anndata; install with install.packages("anndata")
      - {sceasy}: Vladimir Kiselev (@wikiselev), Ni Huang (@nh3); install with remotes::install_github("sceasy")
  11. Summary

      - Interoperability between Bioconductor and Python is already possible
      - {zellkonverter} converts between SingleCellExperiment and AnnData objects
      - scVelo and CellRank for analysis of dynamic processes
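
The last two chunks on slide 5 compute simple means across the language boundary. The plain-Python sketch below reproduces that arithmetic. It assumes a hypothetical `SimpleNamespace` stand-in for the `r` object that reticulate provides inside Python chunks; in a real R Markdown document, reticulate supplies `r` itself. It also assumes the R integer vector `1:4` arrives as the Python list `[1, 2, 3, 4]`, matching the `r_to_py()` output on slide 4.

```python
from types import SimpleNamespace

# Hypothetical stand-in for reticulate's `r` object after the R chunk
# `vec <- 1:4`; the integer vector is represented as a Python list.
r = SimpleNamespace(vec=[1, 2, 3, 4])

# The native Python chunk
ls = [5, 6, 7, 8]

# What the R chunk `mean(py$ls)` computes
py_ls_mean = sum(ls) / len(ls)

# The Python chunk `sum(r.vec) / len(r.vec)`
r_vec_mean = sum(r.vec) / len(r.vec)

print(py_ls_mean, r_vec_mean)  # 6.5 2.5
```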
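
The table on slide 6 can be read as a set of conversion rules. The sketch below implements a few of them: the vector rows, the named-list row and the NULL/TRUE/FALSE row. The `RVector` and `RNamedList` classes are hypothetical stand-ins for R objects, and `None` plays the role of R's `NULL`. This is not reticulate's `r_to_py()`, which handles many more types and cases than the table shows.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RVector:
    """Hypothetical stand-in for an R atomic vector such as 1:4 or TRUE."""
    values: List[Any]


@dataclass
class RNamedList:
    """Hypothetical stand-in for an R named list such as list(a = 1)."""
    names: List[str]
    items: List[Any]


def r_to_py_sketch(x: Any) -> Any:
    """Apply a subset of the slide-6 rules; `None` plays the role of R's NULL."""
    if x is None:
        return None                      # NULL -> None
    if isinstance(x, RVector):
        if len(x.values) == 1:
            return x.values[0]           # single-element vector -> scalar (TRUE -> True)
        return list(x.values)            # multi-element vector -> list
    if isinstance(x, RNamedList):
        # named list -> dict, converting each element recursively
        return {name: r_to_py_sketch(item) for name, item in zip(x.names, x.items)}
    raise NotImplementedError(f"no rule sketched for {type(x).__name__}")


# Mirrors list("Col1" = 1:2, "Col2" = 3:4) from slide 4
cols = RNamedList(["Col1", "Col2"], [RVector([1, 2]), RVector([3, 4])])
print(r_to_py_sketch(cols))               # {'Col1': [1, 2], 'Col2': [3, 4]}
print(r_to_py_sketch(RVector([True])))    # True
```

Rows that the sketch does not model, such as matrices or data frames, deliberately raise `NotImplementedError`. This prevents the sketch from silently guessing.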
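
Slide 8 splits the work between a developer, who declares the environment once, and a user, whose first call sets up the Conda environment while later calls reuse it. The sketch below captures only that "set up on first use" behaviour. `LazyEnvironment`, `run` and the `setup` callback are hypothetical names, not basilisk's API. basilisk also runs the function against the environment's own Python, which this sketch does not attempt.

```python
from typing import Any, Callable, List


class LazyEnvironment:
    """Hypothetical sketch of the 'set up on first use' pattern.

    Not basilisk's API: `setup` stands in for whatever actually creates the
    environment (for basilisk, a Conda environment with pinned packages).
    """

    def __init__(self, name: str, packages: List[str],
                 setup: Callable[[str, List[str]], None]) -> None:
        self.name = name
        self.packages = packages
        self._setup = setup
        self._ready = False  # becomes True once the environment exists

    def run(self, fun: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if not self._ready:
            # Only the first call pays the (slow) setup cost
            self._setup(self.name, self.packages)
            self._ready = True
        # Every call then runs the function; unlike basilisk, this sketch
        # does not switch to a different Python interpreter
        return fun(*args, **kwargs)


# Developer: declare the environment once (nothing is installed yet)
setup_calls: List[str] = []
my_env = LazyEnvironment("my_env", ["pandas==1.1.2"],
                         setup=lambda name, pkgs: setup_calls.append(name))

# User: the first call triggers setup, later calls reuse the environment
first = my_env.run(sum, [1, 2, 3])
second = my_env.run(len, "abc")
print(first, second, setup_calls)  # 6 3 ['my_env']
```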
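
Slide 9 shows zellkonverter moving data between AnnData and SingleCellExperiment. Any such conversion has to handle orientation. AnnData stores cells as rows (`obs`) and genes as columns (`var`), whereas SingleCellExperiment stores genes as rows (`rowData`) and cells as columns (`colData`).

The sketch below shows that re-orientation using hypothetical dict-based stand-ins. These are not the real classes and not zellkonverter's implementation. The assay name `X` is an assumption of the sketch, and the real objects carry much more, such as layers, embeddings and metadata.

```python
from typing import Any, Dict, List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def anndata_to_sce_sketch(adata: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical dict-based AnnData -> SingleCellExperiment mapping."""
    return {
        "assays": {"X": transpose(adata["X"])},  # cells x genes -> genes x cells
        "colData": adata["obs"],                  # per-cell annotation
        "rowData": adata["var"],                  # per-gene annotation
    }


def sce_to_anndata_sketch(sce: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of the mapping above."""
    return {
        "X": transpose(sce["assays"]["X"]),       # genes x cells -> cells x genes
        "obs": sce["colData"],
        "var": sce["rowData"],
    }


# Two cells (rows) by three genes (columns), as AnnData stores them
adata = {
    "X": [[1, 0, 2], [0, 3, 0]],
    "obs": {"cell": ["c1", "c2"]},
    "var": {"gene": ["g1", "g2", "g3"]},
}
sce = anndata_to_sce_sketch(adata)
print(sce["assays"]["X"])  # [[1, 0], [0, 3], [2, 0]]
assert sce_to_anndata_sketch(sce) == adata  # round trip restores the original
```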