
Jupyter and RethinkDB

Ryan Paul
August 26, 2015


Learn how to use RethinkDB with Jupyter, a rich platform for interactive programming built on top of the powerful IPython REPL. In this talk, I'll demonstrate how to perform ReQL queries in a Jupyter notebook, integrating with matplotlib and other libraries to generate data visualizations.


Transcript

  1. RethinkDB and Jupyter
    Interactive Data Science


  2. Ryan Paul
    RethinkDB
    Evangelist
    @segphault


  3. Introduction
    What is Jupyter?


  4. In the beginning,
    there was the
    REPL


  5. In the beginning,
    there was the
    REPL
    READ
    EVAL
    PRINT
    LOOP
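The four steps named on this slide can be sketched in a few lines of Python. This is only an illustration of the read/eval/print/loop idea, not how IPython or Jupyter is implemented; the `repl` function and its list-based input are assumptions made so the example runs without a terminal.

```python
def repl(lines, namespace=None):
    """Run a tiny read-eval-print loop over an iterable of source lines."""
    namespace = {} if namespace is None else namespace
    printed = []
    for line in lines:                      # READ: take the next line of input
        try:
            result = eval(line, namespace)  # EVAL: try it as an expression first
        except SyntaxError:
            exec(line, namespace)           # statements (e.g. assignments) have no value
            continue
        if result is not None:
            printed.append(repr(result))    # PRINT: keep the repr, as a REPL would show it
    return printed                          # LOOP ends when the input is exhausted


for text in repl(["x = 20", "x + 22", "'re' + 'pl'"]):
    print(text)
```

State persists between lines through the shared `namespace`, which is the property a notebook builds on when later cells use names defined in earlier ones.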


  6. Jupyter’s Origin
    The IPython REPL


  7. Jupyter’s Origin
    IPython Notebook


  8. What is Jupyter?
    • Rich interactive REPL with terminal and desktop frontends
    • Persistent REPL notebook that can evaluate code and save results


  9. What is Jupyter?
    • Language-agnostic platform abstracted out of IPython
    • IPython itself now provides Jupyter’s Python kernel
    • Wide range of other programming languages are supported


  10. Jupyter Notebook
    • Interactive literate programming environment that runs in browser
    • Combine code snippets and output with rich text content
    • Displays embedded content like visualizations


  11. RethinkDB
    Consume data in Jupyter


  12. What is RethinkDB?
    • Open source database for building
    realtime web applications
    • NoSQL database that stores schemaless
    JSON documents
    • Distributed database that is easy to scale
    • High availability database that is
    resilient against failure


  13. ReQL & Jupyter
    • ReQL is the RethinkDB query
    language
    • ReQL integrates with syntax of the
    underlying language
    • ReQL is expressive and provides
    useful tools for data manipulation


  14. Data Explorer


  15. Jupyter Notebook


  16. Jupyter Extensibility
    Magic functions & APIs


  17. Magic Functions
    • Add special commands to Jupyter that work on REPL and notebook
    • Typically prefixed with a % sign
    • Can programmatically transform user input


  18. Magic Functions
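    import json
    from IPython.core.magic import register_line_magic
    def to_pretty_json(value):
        # Assumed helper: the slide calls it but does not show its definition
        return json.dumps(value, indent=2, default=str)
    @register_line_magic
    def r(line):
        # Imported locally so eval() resolves `r` to the driver, not this magic
        import rethinkdb as r
        conn = r.connect()
        response = eval(line).run(conn)
        # Cursors are lazy; materialize them before printing
        if isinstance(response, r.net.DefaultCursor):
            response = list(response)
        print(to_pretty_json(response))
        conn.close()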


  19. Display Functions
    • Jupyter provides APIs for displaying rich content
    • Can embed images, HTML, JSON, and other kinds of content
    • Import from IPython.display


  20. Display Functions
    from IPython.display import Image, display
    display(Image("http://i.imgur.com/lswhE2n.jpg"))
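The same `display` call also accepts HTML content. As a rough sketch, the helper below turns a list of dictionaries into an HTML table and hands it to `IPython.display.HTML`. The `rows_to_html` function and the sample rows are hypothetical and not part of the talk. In a notebook the rows could come from a ReQL query instead.

```python
from html import escape


def rows_to_html(rows):
    """Build an HTML table string from a list of dicts (e.g. query results)."""
    columns = sorted({key for row in rows for key in row})
    header = "".join("<th>{}</th>".format(escape(str(c))) for c in columns)
    body = ""
    for row in rows:
        cells = "".join(
            "<td>{}</td>".format(escape(str(row.get(c, "")))) for c in columns
        )
        body += "<tr>{}</tr>".format(cells)
    return "<table><tr>{}</tr>{}</table>".format(header, body)


# Hypothetical sample rows, used here instead of a live database query.
sample = [{"place": "Tokyo", "mag": 4.7}, {"place": "Osaka", "mag": 3.1}]
markup = rows_to_html(sample)

try:
    from IPython.display import HTML, display
    display(HTML(markup))  # renders as a table inside a notebook
except ImportError:
    print(markup)          # fall back to plain text when IPython is absent
```

Escaping each cell keeps values that contain `<` or `&` from being interpreted as markup when the table is rendered.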


  21. Data Visualizations
    Graphing with matplotlib


  22. Using Matplotlib
    %matplotlib inline
    from matplotlib import pyplot
    import rethinkdb as r
    conn = r.connect()
    quakes = r.table("quake") \
        .filter(r.row["time"].month() == r.now().month()) \
        .group(r.row["time"].day()).count() \
        .ungroup().order_by(r.row["group"]) \
        .do([r.row["group"], r.row["reduction"]]).run(conn)
    conn.close()
    pyplot.bar(quakes[0], quakes[1])
    pyplot.show()
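To make the shape of `quakes` easier to follow, here is a pure-Python sketch of roughly what the ReQL chain above computes: keep documents from one month, count them per day, sort by day, and return two parallel lists that `pyplot.bar` can take as x and y values. The `count_by_day` function and the sample documents are assumptions for illustration. The real query runs inside the database.

```python
from collections import Counter
from datetime import datetime


def count_by_day(docs, month):
    """Return [days, counts] for documents whose `time` falls in `month`."""
    # filter(...) + group(day).count()
    counts = Counter(d["time"].day for d in docs if d["time"].month == month)
    days = sorted(counts)                          # ungroup().order_by("group")
    return [days, [counts[day] for day in days]]   # do([groups, reductions])


# Hypothetical sample documents standing in for rows of the "quake" table.
docs = [
    {"time": datetime(2015, 8, 3, 1, 0)},
    {"time": datetime(2015, 8, 3, 9, 30)},
    {"time": datetime(2015, 8, 1, 12, 0)},
    {"time": datetime(2015, 7, 31, 23, 0)},
]
quakes = count_by_day(docs, month=8)
print(quakes)  # [[1, 3], [1, 2]]
```

Here `quakes[0]` holds the days of the month and `quakes[1]` the matching counts, which is why the slide passes them to `pyplot.bar` in that order.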


  23. Using Matplotlib
    %matplotlib inline
    import mplleaflet
    from matplotlib import pyplot
    import rethinkdb as r
    conn = r.connect()
    near_tokyo = list(r.table("quake").get_intersecting(
            r.circle([139.69, 35.68], 200, unit="mi"), index="geometry")
        ["geometry"]
        .map(r.row.to_geojson()["coordinates"]).run(conn))
    conn.close()
    pyplot.plot([p[0] for p in near_tokyo],
                [p[1] for p in near_tokyo], 'rs')
    mplleaflet.display()
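For intuition about the circle query, the sketch below filters `[longitude, latitude]` points by great-circle distance from a center. It is only an approximation of the idea: it uses the haversine formula on a sphere, whereas the database performs the real geospatial intersection. The `distance_mi` and `near` functions and the sample coordinates are assumptions, not part of the talk.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles (spherical approximation)


def distance_mi(a, b):
    """Great-circle distance in miles between two [longitude, latitude] points."""
    lon1, lat1, lon2, lat2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * asin(sqrt(h))


def near(points, center, radius_mi):
    """Keep the points that lie within `radius_mi` of `center`."""
    return [p for p in points if distance_mi(p, center) <= radius_mi]


tokyo = [139.69, 35.68]      # same center as the slide, longitude first
# Hypothetical sample points (approximate city coordinates).
nagoya = [136.91, 35.18]
sapporo = [141.35, 43.06]
print(near([nagoya, sapporo], tokyo, 200))
```

Note the longitude-first ordering, which matches the GeoJSON coordinates the slide plots as x and y.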


  24. Additional Resources
    • RethinkDB website: http://rethinkdb.com
    • Jupyter: http://jupyter.org/
    • Matplotlib: http://matplotlib.org/
