
Interoperability between Bioconductor and Python for scRNA-seq analysis

Luke Zappia
December 11, 2020


Unlike traditional bulk RNA-seq analysis, which is dominated by Bioconductor, packages for analysing single-cell RNA sequencing data are more fragmented. Currently, there are three key ecosystems: the Seurat package (https://satijalab.org/seurat/), available from CRAN; Bioconductor's SingleCellExperiment object (https://bioconductor.org/packages/SingleCellExperiment/); and the AnnData Python object (https://anndata.readthedocs.io/en/latest/index.html) used by the Scanpy package (https://scanpy.readthedocs.io/en/stable/). While these platforms each have strengths and weaknesses, most analysts are likely to use only one of them. In this talk, I discuss how interoperability between R and Python can allow us to take advantage of these platforms' strengths and avoid unnecessary reimplementation of methods. I highlight the reticulate R package for interacting with Python (https://rstudio.github.io/reticulate/), the basilisk package for encapsulating Python environments (https://bioconductor.org/packages/basilisk/), my zellkonverter package for converting between AnnData and SingleCellExperiment objects (https://bioconductor.org/packages/zellkonverter/), and the velociraptor package as an example of wrapping a Python package (https://bioconductor.org/packages/velociraptor/). The methods in the scVelo package (http://scvelo.org/), which velociraptor wraps, for calculating RNA velocity and in the CellRank package (http://cellrank.org/) for estimating state transitions will also be briefly described.



  1. Interoperability between Bioconductor and Python for scRNA-seq analysis Luke Zappia

  2. What is interoperability? “Ability to quickly and easily switch between languages/platforms as required”

  3. Why interoperability? 1. Take advantage of strengths 2. Make use of existing packages 3. Avoid unnecessary reimplementation
  4. Bulk RNA-seq analysis

  5. scRNA-seq analysis Seurat CRAN

  6. Ecosystems

  7. How?

  8. {reticulate}: R/Python interface; {basilisk}: Python environments; scRNA-seq objects; Velocity

  9. Disclaimer: Most (almost all) of this is not my work
    Package, Developer, @GitHub, Python Alternative

  10. {reticulate}
    Developers: Kevin Ushey (@kevinushey), J.J. Allaire (@jjallaire), Yuan Tang (@terrytangyuan)
    RStudio (rstudio.org)
    Python alternative: rpy2
    install.packages("reticulate")
  11. {reticulate} in R

        library(reticulate)

        # Set Python environment
        > use_python("/path/to/my/python")
        # use_virtualenv("my_venv")
        # use_condaenv("my_conda_env")

        # Import Python libraries
        > pandas <- import("pandas")

        # Implicitly convert between R and Python
        > pandas$DataFrame(data = list("Col1" = 1:2, "Col2" = 3:4))
          Col1 Col2
        1    1    3
        2    2    4

        # Explicitly convert between R and Python
        > vec <- 1:4
        > vec
        [1] 1 2 3 4
        > py_list <- r_to_py(vec)
        > py_list
        [1, 2, 3, 4]
        > py_to_r(py_list)
        [1] 1 2 3 4
  12. {reticulate} in R Markdown

    ```{r}
    # A normal R chunk
    vec <- 1:4
    vec
    ```
    [1] 1 2 3 4

    ```{python}
    # A native Python chunk
    ls = [5, 6, 7, 8]
    ls
    ```
    [5, 6, 7, 8]

    ```{r}
    # Access Python from R
    mean(py$ls)
    ```
    6.5

    ```{python}
    # Access R from Python
    sum(r.vec) / len(r.vec)
    ```
    2.5
  13. Conversion

        R                        Python
        Single-element vector    Scalar
        Multi-element vector     List
        List of multiple types   Tuple
        Named list               Dict
        Matrix/Array             NumPy array
        data.frame               Pandas DataFrame
        Function                 Python function
        NULL, TRUE, FALSE        None, True, False
  14. Limitations: Manage Python environment; Familiarity with Python syntax; Only supports common data structures
  15. {basilisk}
    Developer: Aaron Lun (@LTLA)
    BiocManager::install("basilisk")
    Image from Ipipipourax via Wikimedia Commons (CC BY-SA 3.0) https://commons.wikimedia.org/wiki/File:Basilik_color%C3%A9.jpg
  16. {basilisk}

    Developer

    1. Define an environment

        my_env <- basilisk::BasiliskEnvironment(
            envname = "my_env",
            pkgname = "myPkg",
            packages = c("pandas==1.1.2", ...)
        )

    2. Create a {reticulate} function

        my_py_fun <- function(...) {
            pandas <- import("pandas")
            ...
            return(output)
        }

    3. Wrap the function in the environment

        my_r_fun <- function(...) {
            output <- basilisk::basiliskRun(
                env = my_env,
                fun = my_py_fun,
                ...
            )
        }

    User

        library(myPkg)
        output <- my_r_fun(...)

        Set up Python (Conda) environment (first time)...
        Run my_py_fun() in the environment...
        Return output
  17. Advantages: User doesn’t require Python code; Automatic environment creation; Different environments/dependencies for each package
  18. {zellkonverter}
    Developers: Aaron Lun (@LTLA), Luke Zappia (@lazappi)
    Python alternative: anndata2ri
    BiocManager::install("zellkonverter")

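  19. Diagram of {zellkonverter} conversions:
    readH5AD(), run via {basilisk}: .h5ad file -> AnnData -> AnnData2SCE() -> SingleCellExperiment
    writeH5AD(), run via {basilisk}: SingleCellExperiment -> SCE2AnnData() -> AnnData -> .h5ad file
    AnnData2SCE(): AnnData -> SingleCellExperiment
    SCE2AnnData(): SingleCellExperiment -> AnnData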
  20. None
  21. {anndata}
    Developer: Robrecht Cannoodt (@rcannood)
    Python alternative: anndata
    install.packages("anndata")
    {sceasy}
    Developers: Vladimir Kiselev (@wikiselev), Ni Huang (@nh3)
    remotes::install_github("cellgeni/sceasy")
  22. {velociraptor}
    Developers: Kevin Rue-Albrecht (@kevinrue), Aaron Lun (@LTLA), Charlotte Soneson (@csoneson)
    Python alternative: scvelo
    BiocManager::install("velociraptor")
  23. Diagram of {velociraptor}. Labels: {basilisk}; SingleCellExperiment; scvelo(); AnnData (X); scv.tl.velocity(...); scv.tl.latent_time(...); ...; AnnData; AnnData2SCE()

  24. scVelo
    Developer: Volker Bergen (@VolkerBergen)
    pip install scvelo

  25. RNA velocity

  26. Dynamical RNA velocity

  27. None
  28. None
  29. CellRank
    Developer: Marius Lange (@Marius1311)
    pip install cellrank

  30. None
  31. None
  32. None
  33. Pancreas development

  34. Summary: Interoperability between Bioconductor and Python is already possible; {zellkonverter} converts between SingleCellExperiment and AnnData objects; scVelo and CellRank for analysis of dynamic processes
  35. Thanks!
    Luke Zappia: @_lazappi_, @lazappi, lazappi.id.au
    scvelo.org, cellrank.org
    Theis Lab: @fabian_theis, @ICBmunich, www.comp.bio
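
The conversion rules on slide 13 can be illustrated from the Python side. The sketch below is a toy model in plain Python, not {reticulate}'s actual implementation. The helpers `vector_to_python` and `named_list_to_python` are hypothetical names, an R vector is stood in for by a Python list, and an R named list by a list of (name, value) pairs. Only the scalar, list and dict rows of the table are covered.

```python
from typing import Any, Dict, List, Sequence, Tuple


def vector_to_python(values: Sequence[Any]) -> Any:
    """Toy model (hypothetical helper) of the first two table rows.

    A single-element vector becomes a scalar; any other vector becomes a list.
    """
    if len(values) == 1:
        return values[0]
    return list(values)


def named_list_to_python(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    """Toy model (hypothetical helper) of the "Named list -> Dict" row."""
    return dict(pairs)


# A single-element vector such as c(42) arrives as a scalar
scalar = vector_to_python([42])

# A multi-element vector such as 1:4 arrives as a list
py_list = vector_to_python([1, 2, 3, 4])

# A named list such as list("Col1" = 1:2, "Col2" = 3:4) arrives as a dict
py_dict = named_list_to_python([("Col1", [1, 2]), ("Col2", [3, 4])])

print(scalar, py_list, py_dict)
```

The dict case mirrors the `pandas$DataFrame(data = list(...))` call on slide 11, where the named list is handed to pandas as a dictionary of columns.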