Interoperability between Bioconductor and Python for scRNA-seq analysis

Luke Zappia
December 11, 2020

Unlike traditional bulk RNA-seq analysis, which is dominated by Bioconductor, the software for analysing single-cell RNA sequencing (scRNA-seq) data is more fragmented. Currently there are three key ecosystems: the Seurat package (https://satijalab.org/seurat/), available from CRAN; Bioconductor's SingleCellExperiment object (https://bioconductor.org/packages/SingleCellExperiment/); and the AnnData Python object (https://anndata.readthedocs.io/en/latest/index.html) used by the Scanpy package (https://scanpy.readthedocs.io/en/stable/). Each platform has its own strengths and weaknesses, but most analysts are likely to use only one of them.

In this talk, I discuss how interoperability between R and Python lets us take advantage of each platform's strengths and avoid unnecessary reimplementation of methods. I highlight the reticulate R package for interacting with Python (https://rstudio.github.io/reticulate/), the basilisk package for encapsulating Python environments (https://bioconductor.org/packages/basilisk/), my zellkonverter package for converting between AnnData and SingleCellExperiment objects (https://bioconductor.org/packages/zellkonverter/), and the velociraptor package (https://bioconductor.org/packages/velociraptor/) as an example of wrapping a Python package. I also briefly describe the methods in scVelo (http://scvelo.org/), the package wrapped by velociraptor, for calculating RNA velocity, and the CellRank package (http://cellrank.org/) for estimating state transitions.

Transcript

  1. Interoperability between
    Bioconductor and Python
    for scRNA-seq analysis
    Luke Zappia
    @_lazappi_

  2. What is interoperability?
    “Ability to quickly and easily switch
    between languages/platforms as
    required”

  3. Why interoperability?
    1. Take advantage of strengths
    2. Make use of existing packages
    3. Avoid unnecessary reimplementation

  4. Bulk RNA-seq analysis

  5. scRNA-seq analysis
    Seurat
    CRAN

  6. Ecosystems

  7. How?

  8. {reticulate}: R/Python interface
    {basilisk}: Python environments
    scRNA-seq objects
    Velocity analysis

  9. Disclaimer
    Most (almost all) of this is not my work
    Package Developer
    @GitHub
    Python Alternative

  10. {reticulate}
    Kevin Ushey
    @kevinushey
    J.J. Allaire
    @jjallaire
    Yuan Tang
    @terrytangyuan
    RStudio
    rstudio.org
    rpy2
    install.packages("reticulate")

  11. {reticulate} in R
    > library(reticulate)
    # Set Python environment
    > use_python("/path/to/my/python")
    # use_virtualenv("my_venv")
    # use_condaenv("my_conda_env")
    # Import Python libraries
    > pandas <- import("pandas")
    # Implicitly convert between R and Python
    > pandas$DataFrame(data = list("Col1" = 1:2, "Col2" = 3:4))
      Col1 Col2
    1    1    3
    2    2    4
    # Explicitly convert between R and Python
    > vec <- 1:4
    > vec
    [1] 1 2 3 4
    > py_list <- r_to_py(vec)
    > py_list
    [1, 2, 3, 4]
    > py_to_r(py_list)
    [1] 1 2 3 4

  12. {reticulate} in R Markdown
    ```{r}
    # A normal R chunk
    vec <- 1:4
    vec
    ```
    [1] 1 2 3 4
    ```{python}
    # A native Python chunk
    ls = [5, 6, 7, 8]
    ls
    ```
    [5, 6, 7, 8]
    ```{r}
    # Access Python from R
    mean(py$ls)
    ```
    [1] 6.5
    ```{python}
    # Access R from Python
    sum(r.vec) / len(r.vec)
    ```
    2.5

  13. Conversion
    R                         Python
    Single-element vector     Scalar
    Multi-element vector      List
    List of multiple types    Tuple
    Named list                Dict
    Matrix/Array              NumPy array
    data.frame                Pandas DataFrame
    Function                  Python function
    NULL, TRUE, FALSE         None, True, False

  14. Limitations
    Manage Python environment
    Familiarity with Python syntax
    Only supports common data structures

  15. {basilisk}
    Aaron Lun
    @LTLA
    Image from Ipipipourax via WikiMedia Commons (CC BY-SA 3.0)
    https://commons.wikimedia.org/wiki/File:Basilik_color%C3%A9.jpg
    BiocManager::install("basilisk")

  16. {basilisk}
    Developer
    1. Define an environment
    my_env <- basilisk::BasiliskEnvironment(
      envname = "my_env",
      pkgname = "myPkg",
      packages = c("pandas==1.1.2", ...)
    )
    2. Create a {reticulate} function
    my_py_fun <- function(...) {
      pandas <- import("pandas")
      ...
      return(output)
    }
    3. Wrap the function in the environment
    my_r_fun <- function(...) {
      output <- basilisk::basiliskRun(
        env = my_env,
        fun = my_py_fun,
        ...
      )
      output
    }
    User
    library(myPkg)
    output <- my_r_fun(...)
    1. Set up the Python (Conda) environment (first time)
    2. Run my_py_fun() in the environment
    3. Return output

  17. Advantages
    Users don't need to write any Python code
    Automatic environment creation
    Different environments/dependencies for each package

  18. {zellkonverter}
    Aaron Lun
    @LTLA
    Luke Zappia
    @lazappi
    anndata2ri
    BiocManager::install("zellkonverter")

  19. Reading: .h5ad file → readH5AD() → [{basilisk}: AnnData → AnnData2SCE()] → SingleCellExperiment
    Writing: SingleCellExperiment → writeH5AD() → [{basilisk}: SCE2AnnData() → AnnData] → .h5ad file
    In memory: AnnData2SCE() (AnnData → SingleCellExperiment) and SCE2AnnData() (SingleCellExperiment → AnnData)
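
A key detail behind these conversions is matrix orientation: AnnData stores cells as rows and genes as columns (observations × variables), while SingleCellExperiment stores genes as rows and cells as columns. The expression matrix is therefore transposed on the way across, and the annotations swap sides (AnnData `obs` corresponds to `colData`, `var` to `rowData`). The plain-Python sketch below illustrates only that orientation swap. `ToyAnnData`, `ToySCE`, `toy_anndata_to_sce()` and `toy_sce_to_anndata()` are hypothetical stand-ins, not the zellkonverter or anndata API, and the real converters handle many more slots.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAnnData:
    """Hypothetical AnnData-like container: X is cells x genes."""
    X: List[List[float]]
    obs_names: List[str]  # one name per cell (row of X)
    var_names: List[str]  # one name per gene (column of X)


@dataclass
class ToySCE:
    """Hypothetical SingleCellExperiment-like container: assay is genes x cells."""
    assay: List[List[float]]
    row_names: List[str]  # one name per gene (row of assay)
    col_names: List[str]  # one name per cell (column of assay)


def transpose(matrix: List[List[float]]) -> List[List[float]]:
    """Swap the rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def toy_anndata_to_sce(adata: ToyAnnData) -> ToySCE:
    # Cells become columns and genes become rows.
    return ToySCE(
        assay=transpose(adata.X),
        row_names=list(adata.var_names),
        col_names=list(adata.obs_names),
    )


def toy_sce_to_anndata(sce: ToySCE) -> ToyAnnData:
    # The inverse: genes become columns and cells become rows again.
    return ToyAnnData(
        X=transpose(sce.assay),
        obs_names=list(sce.col_names),
        var_names=list(sce.row_names),
    )


adata = ToyAnnData(
    X=[[1, 0, 3], [0, 2, 5]],  # 2 cells x 3 genes
    obs_names=["cell1", "cell2"],
    var_names=["geneA", "geneB", "geneC"],
)
sce = toy_anndata_to_sce(adata)
print(sce.assay)  # [[1, 0], [0, 2], [3, 5]] (3 genes x 2 cells)
```

Converting back with `toy_sce_to_anndata(sce)` returns an object equal to `adata`; a lossless round trip is the property worth checking whenever data moves between the two ecosystems.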

  20. [image-only slide]

  21. {anndata}
    Robrecht Cannoodt
    @rcannood
    {sceasy}
    Vladimir Kiselev
    @wikiselev
    Ni Huang
    @nh3
    install.packages("anndata")
    anndata
    remotes::install_github("sceasy")

  22. {velociraptor}
    Kevin Rue-Albrecht
    @kevinrue
    Aaron Lun
    @LTLA
    Charlotte Soneson
    @csoneson
    scvelo
    BiocManager::install("velociraptor")

  23. SingleCellExperiment → scvelo()
    Inside {basilisk}: AnnData → scv.tl.velocity(...), scv.tl.latent_time(...), ... → AnnData → AnnData2SCE()
    → SingleCellExperiment

  24. scVelo
    Volker Bergen
    @Volker Bergen
    pip install scvelo

  25. RNA velocity
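
As background for this slide: RNA velocity models each gene with a transcription rate α, splicing rate β and degradation rate γ, so that du/dt = α - βu for unspliced abundance u and ds/dt = βu - γs for spliced abundance s. In the steady-state model (used by velocyto and by scVelo's deterministic mode), cells at equilibrium satisfy u = (γ/β)s; with β scaled to 1, γ is estimated as a slope through the origin on the most extreme cells, and the velocity is v = u - γs. The stdlib-only sketch below assumes that simplification and, as a further simplification, picks extreme cells by spliced abundance alone. The function names are illustrative; this is not scVelo's implementation.

```python
from typing import List, Sequence


def fit_gamma(unspliced: Sequence[float], spliced: Sequence[float],
              quantile: float = 0.05) -> float:
    """Estimate gamma (with beta = 1) by least squares through the origin.

    Only cells in the lowest and highest `quantile` of spliced abundance are
    used, on the assumption that they are closest to steady state.
    """
    n = len(spliced)
    k = max(1, int(n * quantile))
    order = sorted(range(n), key=lambda i: spliced[i])
    extreme = set(order[:k]) | set(order[-k:])
    numerator = sum(unspliced[i] * spliced[i] for i in extreme)
    denominator = sum(spliced[i] ** 2 for i in extreme)
    return numerator / denominator


def velocity(unspliced: Sequence[float], spliced: Sequence[float],
             gamma: float) -> List[float]:
    """Steady-state velocity v = u - gamma * s for each cell."""
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Ten cells lying exactly on the steady-state line u = 0.5 * s.
spliced = [float(i) for i in range(10)]
unspliced = [0.5 * s for s in spliced]
gamma = fit_gamma(unspliced, spliced)
print(gamma)  # 0.5

# Two cells off the line: excess unspliced RNA (induction) and a deficit (repression).
print(velocity([3.0, 0.5], [2.0, 4.0], gamma))  # [2.0, -1.5]
```

A positive velocity means a cell has more unspliced RNA than the steady-state line predicts, so the gene is being switched on; a negative velocity means it is being switched off.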

  26. Dynamical RNA velocity
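
The dynamical model in scVelo drops the steady-state assumption and instead fits the full splicing kinetics of each gene (an induction and a repression phase) together with a latent time for each cell. The kinetics are du/dt = α - βu and ds/dt = βu - γs, with transcription rate α, splicing rate β and degradation rate γ. For constant rates over an interval τ starting from (u0, s0), they have the closed-form solution u(τ) = u0 e^(-βτ) + (α/β)(1 - e^(-βτ)) and s(τ) = s0 e^(-γτ) + (α/γ)(1 - e^(-γτ)) + (α - βu0)/(γ - β) · (e^(-γτ) - e^(-βτ)), valid for β ≠ γ. The sketch below only evaluates these formulas in plain Python; the function names are illustrative, and the parameter and latent-time fitting that scVelo performs is not shown.

```python
import math


def unspliced_at(tau: float, u0: float, alpha: float, beta: float) -> float:
    """u(tau) solving du/dt = alpha - beta * u with u(0) = u0."""
    decay = math.exp(-beta * tau)
    return u0 * decay + alpha / beta * (1.0 - decay)


def spliced_at(tau: float, u0: float, s0: float,
               alpha: float, beta: float, gamma: float) -> float:
    """s(tau) solving ds/dt = beta * u - gamma * s with s(0) = s0 (beta != gamma)."""
    decay_beta = math.exp(-beta * tau)
    decay_gamma = math.exp(-gamma * tau)
    return (s0 * decay_gamma
            + alpha / gamma * (1.0 - decay_gamma)
            + (alpha - beta * u0) / (gamma - beta) * (decay_gamma - decay_beta))


# Induction phase starting from zero: abundances approach the steady
# state u = alpha / beta and s = alpha / gamma.
alpha, beta, gamma = 2.0, 1.0, 0.5
for tau in (0.0, 1.0, 5.0, 50.0):
    u = unspliced_at(tau, 0.0, alpha, beta)
    s = spliced_at(tau, 0.0, 0.0, alpha, beta, gamma)
    print(f"tau={tau:5.1f}  u={u:.3f}  s={s:.3f}")
```

Plotting u against s along such a trajectory traces the curved phase portrait that the dynamical model fits, rather than the single straight line assumed by the steady-state model.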

  27. [image-only slide]

  28. [image-only slide]

  29. CellRank
    Marius Lange
    @Marius1311
    pip install cellrank
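
CellRank combines RNA velocity with expression similarity to build a cell-to-cell transition matrix, then treats cell fate as a Markov chain problem: once terminal states are made absorbing, a cell's fate probabilities are the probabilities that a random walk started from that cell is absorbed in each terminal state. The plain-Python sketch below computes only that last step, by fixed-point iteration on a small hand-written transition matrix. The function name and inputs are hypothetical; this is not CellRank's API or its (far more scalable) solver.

```python
from typing import Dict, List, Sequence


def absorption_probabilities(transitions: Sequence[Sequence[float]],
                             terminal: Sequence[int],
                             tol: float = 1e-12,
                             max_iter: int = 100_000) -> List[Dict[int, float]]:
    """For every cell, the probability of ending in each terminal state.

    `transitions` is a row-stochastic matrix (each row sums to 1). Terminal
    states are treated as absorbing, so their own rows are ignored.
    """
    n = len(transitions)
    terminal_set = set(terminal)
    # Terminal cells start (and stay) certain of their own fate; others start at 0.
    probs = [{t: (1.0 if i == t else 0.0) for t in terminal} for i in range(n)]
    for _ in range(max_iter):
        change = 0.0
        for i in range(n):
            if i in terminal_set:
                continue
            for t in terminal:
                # Absorption probability = weighted average over the next step.
                new = sum(transitions[i][k] * probs[k][t] for k in range(n))
                change = max(change, abs(new - probs[i][t]))
                probs[i][t] = new
        if change < tol:
            break
    return probs


# Four cells on a line; cells 0 and 3 are terminal states.
T = [
    [1.0, 0.0, 0.0, 0.0],  # cell 0: terminal
    [0.5, 0.0, 0.5, 0.0],  # cell 1
    [0.0, 0.5, 0.0, 0.5],  # cell 2
    [0.0, 0.0, 0.0, 1.0],  # cell 3: terminal
]
fates = absorption_probabilities(T, terminal=[0, 3])
for cell, p in enumerate(fates):
    print(cell, {t: round(v, 3) for t, v in p.items()})
```

Cell 1, one step from terminal state 0, ends there with probability 2/3, while cell 2 is the mirror image; every row of fate probabilities sums to 1 because each walk is eventually absorbed.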

  30. [image-only slide]

  31. [image-only slide]

  32. [image-only slide]

  33. Pancreas development

  34. Summary
    Interoperability between Bioconductor and Python is already possible
    {zellkonverter} converts between SingleCellExperiment and AnnData objects
    scVelo and CellRank for analysis of dynamic processes

  35. Thanks!
    Luke Zappia
    @_lazappi_
    @lazappi
    lazappi.id.au
    scvelo.org
    cellrank.org
    Theis Lab
    @fabian_theis
    @ICBmunich
    www.comp.bio
