Slide 1

Slide 1 text

RethinkDB and Jupyter
Interactive Data Science

Slide 2

Slide 2 text

Ryan Paul
RethinkDB Evangelist
@segphault

Slide 3

Slide 3 text

Introduction
What is Jupyter?

Slide 4

Slide 4 text

In the beginning, there was the REPL

Slide 5

Slide 5 text

In the beginning, there was the REPL
READ EVAL PRINT LOOP
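The four words on this slide map directly onto a few lines of code. The following is a minimal sketch of a read-eval-print loop, not how IPython or any real shell is implemented; the function name `repl` and its list-based input are assumptions made so the example can run without a terminal.

```python
import io
from contextlib import redirect_stdout


def repl(lines, namespace=None):
    """Run a toy read-eval-print loop over an iterable of source lines.

    Returns the text that would have been shown after each line.
    """
    namespace = {} if namespace is None else namespace
    shown = []
    for source in lines:                      # READ one line of input
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            try:
                # EVAL: try the line as an expression first
                result = eval(source, namespace)
            except SyntaxError:
                # Statements such as "x = 2" are not expressions
                exec(source, namespace)
                result = None
        text = buffer.getvalue()
        if result is not None:
            text += repr(result) + "\n"       # PRINT the value, if any
        shown.append(text)
    return shown                              # LOOP ends when input runs out
```

Calling `repl(["x = 2", "x * 21"])` keeps `x` alive between lines because both share one namespace, which is the property that makes a REPL feel interactive.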

Slide 6

Slide 6 text

Jupyter’s Origin
The IPython REPL

Slide 7

Slide 7 text

Jupyter’s Origin
IPython Notebook

Slide 8

Slide 8 text

What is Jupyter?
• Rich interactive REPL with terminal and desktop frontends
• Persistent REPL notebook that can evaluate code and save results

Slide 9

Slide 9 text

What is Jupyter?
• Language-agnostic platform abstracted out of IPython
• IPython itself now provides Jupyter’s Python kernel
• A wide range of other programming languages is supported

Slide 10

Slide 10 text

Jupyter Notebook
• Interactive literate programming environment that runs in the browser
• Combines code snippets and their output with rich text content
• Displays embedded content such as visualizations
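On disk, a notebook is a JSON document whose cells mix rich text, code, and saved output. The sketch below builds a tiny notebook by hand using the version 4 notebook format; the cell contents are invented for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A hand-written notebook: one markdown cell and one code cell with its
# saved result. Cell contents are made up for this example.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Earthquakes this month\n"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
}


def cell_types(nb):
    """List the type of every cell, in document order."""
    return [cell["cell_type"] for cell in nb["cells"]]


# This string is what would be written to an .ipynb file
serialized = json.dumps(notebook, indent=1)
```

Because the output is stored next to the code that produced it, a saved notebook can be read without re-running anything.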

Slide 11

Slide 11 text

RethinkDB
Consume data in Jupyter

Slide 12

Slide 12 text

What is RethinkDB?
• Open source database for building realtime web applications
• NoSQL database that stores schemaless JSON documents
• Distributed database that is easy to scale
• High availability database that is resilient against failure

Slide 13

Slide 13 text

ReQL & Jupyter
• ReQL is the RethinkDB query language
• ReQL integrates with the syntax of the host programming language
• ReQL is expressive and provides useful tools for data manipulation

Slide 14

Slide 14 text

Data Explorer

Slide 15

Slide 15 text

Jupyter Notebook

Slide 16

Slide 16 text

Jupyter Extensibility
Magic functions & APIs

Slide 17

Slide 17 text

Magic Functions
• Add special commands to Jupyter that work in both the REPL and the notebook
• Typically prefixed with a % sign
• Can programmatically transform user input

Slide 18

Slide 18 text

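Magic Functions

import json
from IPython.core.magic import register_line_magic

def to_pretty_json(value):
    # The slide does not define this helper; json.dumps is assumed here
    return json.dumps(value, indent=2, default=str)

@register_line_magic
def r(line):
    import rethinkdb as r
    conn = r.connect()
    # Evaluate the ReQL expression typed after %r, then run it
    response = eval(line).run(conn)
    # Cursors are lazy; turn them into a list so they can be printed
    if isinstance(response, r.net.DefaultCursor):
        response = list(response)
    print(to_pretty_json(response))
    conn.close()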

Slide 19

Slide 19 text

Display Functions
• Jupyter provides APIs for displaying rich content
• Can embed images, HTML, JSON, and other kinds of content
• Import from IPython.display

Slide 20

Slide 20 text

Display Functions

from IPython.display import Image, display

display(Image("http://i.imgur.com/lswhE2n.jpg"))

Slide 21

Slide 21 text

Data Visualizations
Graphing with matplotlib

Slide 22

Slide 22 text

Using Matplotlib

%matplotlib inline
from matplotlib import pyplot
import rethinkdb as r

conn = r.connect()
quakes = r.table("quake") \
    .filter(r.row["time"].month() == r.now().month()) \
    .group(r.row["time"].day()).count() \
    .ungroup().order_by(r.row["group"]) \
    .do([r.row["group"], r.row["reduction"]]).run(conn)
conn.close()

pyplot.bar(quakes[0], quakes[1])
pyplot.show()
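The query on this slide hands `pyplot.bar` a pair of parallel lists: the days of the month and the number of quakes on each day. As a rough illustration of that reshaping, here is the same group, count, sort, and split sequence in plain Python. The timestamps are invented sample data and the helper name `counts_by_day` is hypothetical; no database is involved.

```python
from collections import Counter
from datetime import datetime

# Invented sample timestamps standing in for the "time" field of each quake
quake_times = [
    datetime(2015, 6, 1, 4, 30),
    datetime(2015, 6, 3, 9, 0),
    datetime(2015, 6, 1, 18, 5),
    datetime(2015, 6, 2, 23, 59),
]


def counts_by_day(times):
    """Return [days, counts], with days in ascending order."""
    counts = Counter(t.day for t in times)    # group by day and count
    days = sorted(counts)                     # order by the group key
    return [days, [counts[d] for d in days]]  # split into two parallel lists


days, totals = counts_by_day(quake_times)
```

The two lists line up index by index, which is the shape a bar chart needs for its x positions and bar heights.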

Slide 23

Slide 23 text

Using Matplotlib

%matplotlib inline
import mplleaflet
from matplotlib import pyplot
import rethinkdb as r

conn = r.connect()
near_tokyo = list(r.table("quake").get_intersecting(
    r.circle([139.69, 35.68], 200, unit="mi"), index="geometry")
    ["geometry"] \
    .map(r.row.to_geojson()["coordinates"]).run(conn))
conn.close()

pyplot.plot([p[0] for p in near_tokyo],
            [p[1] for p in near_tokyo], 'rs')
mplleaflet.display()
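To get a feel for what a 200-mile circle around Tokyo selects, the filter can be approximated in plain Python with the haversine formula. This is a sketch under a spherical-Earth assumption, not a description of how RethinkDB evaluates `r.circle` or its geospatial index. The sample points are approximate city coordinates chosen for illustration, written in the same longitude, latitude order as the slide.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius; a spherical approximation


def distance_miles(a, b):
    """Great-circle distance between two (longitude, latitude) points."""
    lon1, lat1 = (math.radians(v) for v in a)
    lon2, lat2 = (math.radians(v) for v in b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


TOKYO = (139.69, 35.68)

# Approximate coordinates, for illustration only
YOKOHAMA = (139.64, 35.44)
NAGOYA = (136.91, 35.18)
OSAKA = (135.50, 34.69)
SAPPORO = (141.35, 43.06)

points = [YOKOHAMA, NAGOYA, OSAKA, SAPPORO]

# Keep only the points that fall inside the 200-mile radius
near = [p for p in points if distance_miles(TOKYO, p) <= 200]
```

With these sample points, Yokohama and Nagoya fall inside the radius while Osaka and Sapporo do not.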

Slide 24

Slide 24 text

Additional Resources
• RethinkDB website: http://rethinkdb.com
• Jupyter: http://jupyter.org/
• Matplotlib: http://matplotlib.org/