Slide 1

Slide 1 text

Caravel: an open source data visualization platform

Slide 2

Slide 2 text

Caravel
* explore your data!
* create interactive dashboards
* share discoveries

Slide 3

Slide 3 text

The Vision
* Any visualization
* Any database
* Microscopic amount of work

Slide 4

Slide 4 text

Maxime Beauchemin

Slide 5

Slide 5 text

* Data is too strategic to depend on vendors!
* Tableau doesn’t support Presto & Druid
* Tableau extracts don’t scale well
* Buying means lock-in and increasing costs
* We need deep integration with our stack
* We’re builders not buyers!

Slide 6

Slide 6 text

Live demo!

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Brains!
* Bogdan Kyryliuk
* Vera Liu
* Maxime Beauchemin
* Jeff Feng
* Alanna Scott

Slide 16

Slide 16 text

How do we use Python?

Slide 17

Slide 17 text

Flask App Builder
* Authentication / permission / role management
* CRUD!
* Babel (translation framework)
* Bootstrap / dynamic navbar / font-awesome

Slide 18

Slide 18 text

Flask [App Builder] vs Django
Pros
* Airbnb was already using Flask / SQLAlchemy
* FAB’s CRUD doesn’t require an Admin; all models ship with a `show` permission
* FAB has a lean codebase, so it’s easy to contribute to
Cons
* Few guarantees on quality / security / support
* Far fewer features to grow into than Django offers
* Not much of a community (yet)

Slide 19

Slide 19 text

SQLAlchemy

Slide 20

Slide 20 text

Pandas
* Crafting the right JSON for the visualization:
  * pivot_table
  * groupby
  * multi-sorts
  * …
* Time series transforms:
  * rolling functions
  * resampling
  * period shifts
  * period ratios
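A minimal sketch of the kind of Pandas work listed on this slide, not Caravel's actual code. The sample data, the column names (`ds`, `country`, `bookings`) and the `to_series` helper are all hypothetical; the `key` / `values` payload shape is only an assumption about what a charting library might expect.

```python
import json

import pandas as pd

# Hypothetical query result: one row per (day, country).
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-01-02",
        "2016-01-02", "2016-01-03", "2016-01-03",
    ]),
    "country": ["US", "FR", "US", "FR", "US", "FR"],
    "bookings": [10, 4, 12, 6, 18, 8],
})

# pivot_table: one row per day, one column per series.
pivoted = df.pivot_table(
    index="ds", columns="country", values="bookings", aggfunc="sum"
)

# groupby + multi-sort: totals per country, largest first, ties by name.
totals = (
    df.groupby("country", as_index=False)["bookings"].sum()
    .sort_values(["bookings", "country"], ascending=[False, True])
)

# Time series transforms.
rolling = pivoted.rolling(window=2).mean()   # rolling function
resampled = pivoted.resample("2D").sum()     # resampling to 2-day buckets
shifted = pivoted.shift(1)                   # period shift (previous day)
ratio = pivoted / shifted                    # period ratio (day over day)


def to_series(frame):
    """Turn a pivoted frame into a list of {key, values} series (hypothetical shape)."""
    return [
        {
            "key": str(name),
            "values": [
                {"x": ts.strftime("%Y-%m-%d"), "y": float(value)}
                for ts, value in column.items()
                if pd.notna(value)  # drop the leading NaN rows from rolling/shift
            ],
        }
        for name, column in frame.items()
    ]


payload = json.dumps(to_series(rolling))
```

Each transform returns a frame with the same layout as `pivoted`, so the same serialization helper can be reused whichever transform is applied.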

Slide 21

Slide 21 text

Package and distribute
* setuptools
* nose / coverage (tests)
* alembic (db migrations)
* requires.io (dependency tracking)
* coveralls.io (coverage reporting)
* landscape.io (code quality reporting)
* sphinx (documentation)
* PyPI (`pip install caravel`)

Slide 22

Slide 22 text

The Frontend Stack
* JavaScript frontend
* npm / ES6 / webpack / React
* d3.js!
* nvd3.org

Slide 23

Slide 23 text

Security
* Provided by Flask AppBuilder (Python web framework)
* Easily integrate with: OpenID, LDAP, REMOTE_USER, OAuth, or use the builtin database
* Ships with 3 roles:
  * Admin (all access)
  * Alpha (all access but cannot alter permissions)
  * Gamma (per-datasource / table access)
* Fine-grained controls to create new roles
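The three-role model can be illustrated with a small standalone sketch. This is not Flask AppBuilder's API: the `Role` class, its fields and the two helper functions are hypothetical names, chosen only to show how the roles on this slide differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role model, for illustration only."""
    name: str
    all_datasources: bool = False         # True: can see every datasource
    can_alter_permissions: bool = False   # True: can edit roles / permissions
    datasources: frozenset = frozenset()  # explicit per-datasource grants


# The three roles described on the slide.
ADMIN = Role("Admin", all_datasources=True, can_alter_permissions=True)
ALPHA = Role("Alpha", all_datasources=True)
GAMMA = Role("Gamma", datasources=frozenset({"bookings"}))

# A custom role built from the same fine-grained controls.
FINANCE = Role("Finance", datasources=frozenset({"payments"}))


def can_access(roles, datasource):
    """A user may read a datasource if any of their roles grants it."""
    return any(r.all_datasources or datasource in r.datasources for r in roles)


def can_alter_permissions(roles):
    """Only roles flagged as such may change permissions."""
    return any(r.can_alter_permissions for r in roles)


print(can_access([GAMMA], "bookings"))           # Gamma sees only what it was granted
print(can_access([GAMMA, FINANCE], "payments"))  # roles combine
print(can_alter_permissions([ALPHA]))            # Alpha cannot alter permissions
```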

Slide 24

Slide 24 text

A thin Semantic Layer
* Verbose names and long descriptions for columns and metrics
* Add calculated fields and metrics as SQL expressions
* Set how individual columns are exposed
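One way to picture such a layer is as a bit of metadata per column and metric that gets compiled into SQL. The sketch below is an assumption-laden illustration, not Caravel's model: `Column`, `Metric`, `build_query`, the `groupable` flag and the example table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical column definition in the semantic layer."""
    name: str
    verbose_name: str = ""
    description: str = ""
    expression: str = ""    # SQL for a calculated field; empty means a physical column
    groupable: bool = True  # one example of "how a column is exposed"

    def sql(self):
        return self.expression or self.name


@dataclass
class Metric:
    """Hypothetical metric: an aggregate SQL expression with a friendly name."""
    name: str
    expression: str
    verbose_name: str = ""
    description: str = ""


def build_query(table, groupby, metrics):
    """Compile column and metric definitions into a single SELECT statement."""
    for col in groupby:
        if not col.groupable:
            raise ValueError(f"column {col.name!r} is not exposed for grouping")
    select = [
        f"{c.expression} AS {c.name}" if c.expression else c.name
        for c in groupby
    ]
    select += [f"{m.expression} AS {m.name}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if groupby:
        sql += " GROUP BY " + ", ".join(c.sql() for c in groupby)
    return sql


country = Column("country", verbose_name="Country")
week = Column("week", verbose_name="Week", expression="DATE_TRUNC('week', ds)")
revenue = Metric("revenue", "SUM(price * nights)", verbose_name="Revenue")

print(build_query("bookings", [country, week], [revenue]))
```

Keeping the layer this thin means the database still does all the computation; the definitions only add names, descriptions and reusable expressions on top of the table.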

Slide 25

Slide 25 text

Druid.io Integration

Slide 26

Slide 26 text

[Architecture diagram. Components shown: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Hive Cluster (HDFS), Silver Hive Cluster (HDFS), Replication, Spark Cluster, Presto Cluster, Druid, S3, Airflow Scheduling, Airpal, Tableau, Caravel!]

Slide 27

Slide 27 text

Caching!
* Provided by flask-cache
* Backends: memcache, redis, filesystem, memory, …
* Cascading timeout configuration
* UI is upfront about staleness
* Allows force-refresh
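The caching behavior can be sketched without flask-cache at all. The code below is a standalone illustration under stated assumptions: the cascade order (most specific setting first, global default last), the `resolve_timeout` helper and the `SimpleCache` class are hypothetical and do not reflect flask-cache's API or Caravel's internals.

```python
import time

DEFAULT_TIMEOUT = 3600  # hypothetical global default, in seconds


def resolve_timeout(*levels, default=DEFAULT_TIMEOUT):
    """Cascade: return the first timeout that is set, most specific first."""
    for timeout in levels:
        if timeout is not None:
            return timeout
    return default


class SimpleCache:
    """In-memory cache that reports staleness and supports force-refresh."""

    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable so the behavior is easy to test

    def get_or_compute(self, key, compute, timeout, force=False):
        now = self._clock()
        entry = self._store.get(key)
        fresh = entry is not None and now - entry["cached_at"] < timeout
        if fresh and not force:
            # Serve from cache, and say how old the data is.
            return {
                "data": entry["data"],
                "is_cached": True,
                "age_seconds": now - entry["cached_at"],
            }
        # Miss, expired entry, or explicit force-refresh: recompute and store.
        data = compute()
        self._store[key] = {"data": data, "cached_at": now}
        return {"data": data, "is_cached": False, "age_seconds": 0}


# Example: a chart-level timeout is unset, so the datasource-level one wins.
timeout = resolve_timeout(None, 60, 600)
cache = SimpleCache()
result = cache.get_or_compute("chart-1", lambda: [1, 2, 3], timeout)
```

Returning `is_cached` and `age_seconds` alongside the data is what lets a UI be upfront about staleness and offer a refresh button.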

Slide 28

Slide 28 text

Open Source FTW!

Slide 29

Slide 29 text

What’s next?
* Grow a community!
* Ship “SQL Lab”
* Ship visualizations & controls as React.js components
* Reactify the whole app
* DSL for the semantic layer
* UX: smooth out common flows

Slide 30

Slide 30 text

github.com/airbnb/caravel

Slide 31

Slide 31 text

Q?