Slide 1

Slide 1 text

PHOTON
Architecting Immediacy: The Design of a High-Performance Portable Wrangling Engine
Joe Hellerstein and Seshadri Mahalingam

Slide 2

Slide 2 text

Outline
1. Wrangling and Immediacy
2. Technical Challenges
3. Architectural Context
4. JavaScript and its Discontents
5. Photon: A Clean Slate Design

Slide 3

Slide 3 text

Turn of the Century Data Transformation
- Schemas and annotations
- Box-and-arrow programming
- Batch execution

Slide 4

Slide 4 text

Research Roots: Potter’s Wheel, 2001
+ Real data, sampled on the fly
+ Menu-driven transforms
+ Immediate execution and feedback
[Raman & Hellerstein, VLDB 01]

Slide 5

Slide 5 text

Research Roots: Open Source Data Wrangler, 2011
- Browser-sized data sets
+ Predictive Transformation
+ Immediate feedback on multiple choices
[Kandel, Heer & Hellerstein, CHI 11]

Slide 6

Slide 6 text

2016: Data Wrangling and Immediacy
DEMO
Grown from research roots.
+ Deeper and broader Predictive Transformation
+ Sample-to-Scale with Intelligent Execution
+ Interactive Visual Profiling
Multi-faceted immediacy!

Slide 7

Slide 7 text

The Value of Immediacy
Immediate, step-by-step feedback during data transformation
- Understand options
- Confirm intent
- Assess progress

Slide 8

Slide 8 text

The Value of Immediacy
[Figure: elapsed-time axis, 0 to 10 seconds, marking the thresholds “Instantaneous”, “User flow”, and “User attention” (Miller ’68, Card ’91, Nielsen ’93)]
Batch processing is in conflict with these goals!

Slide 9

Slide 9 text

Without immediacy, what’s left? SUGGESTION PREVIEWS

Slide 10

Slide 10 text

Without immediacy, what’s left? SUGGESTION PREVIEWS

Slide 11

Slide 11 text

Without immediacy, what’s left? COLUMN PROFILES

Slide 12

Slide 12 text

Without immediacy, what’s left? COLUMN PROFILES

Slide 13

Slide 13 text

Without immediacy, what’s left? DATA QUALITY BARS

Slide 14

Slide 14 text

Without immediacy, what’s left? DATA QUALITY BARS

Slide 15

Slide 15 text

Without immediacy, what’s left? SUMMARY STATS

Slide 16

Slide 16 text

Without immediacy, what’s left? SUMMARY STATS

Slide 17

Slide 17 text

Without immediacy, what’s left? DATA SAMPLES

Slide 18

Slide 18 text

Without immediacy, what’s left? Programming. DATA SAMPLES

Slide 19

Slide 19 text

Without immediacy, what’s left? Programming. SCHEMA TRANSFORM SPEC

Slide 20

Slide 20 text

Without immediacy, what’s left? Programming. TRANSFORM SPEC GO! SCHEMA

Slide 21

Slide 21 text

Outline
1. Wrangling and Immediacy
2. Technical Challenges
3. Architectural Context
4. JavaScript and its Discontents
5. Photon: A Clean Slate Design

Slide 22

Slide 22 text

Three Technical Challenges
- Performance
- Scale
- The Dirty Details of Dirty Data

Slide 23

Slide 23 text

Technical Challenges
Performance: Stay in the Flow
[Figure: elapsed-time axis, 0 to 10 seconds, marking the thresholds “Instantaneous”, “User flow”, and “User attention” (Miller ’68, Card ’91, Nielsen ’93)]

Slide 24

Slide 24 text

Technical Challenges
Scale: Scaling the Unknown
- No precomputing
- No schema
- No indexes
- No summaries

Slide 25

Slide 25 text

Technical Challenges The Dirty Details of Dirty Data Expect the Unexpected

Slide 26

Slide 26 text

Technical Challenges
The Dirty Details of Dirty Data
1. Ambiguous Schemas
2. Heavy String Processing
3. Ambiguous Types
4. Noise & Exceptions
5. Limited Filtering
6. Rich Transforms

Slide 27

Slide 27 text

Ambiguous Schemas Technical Challenges

Slide 28

Slide 28 text

Heavy String Processing Technical Challenges

Slide 29

Slide 29 text

Ambiguous Types Technical Challenges

Slide 30

Slide 30 text

Noise & Exceptions Technical Challenges

Slide 31

Slide 31 text

Technical Challenges Limited Filtering

Slide 32

Slide 32 text

Rich Transforms Technical Challenges

Slide 33

Slide 33 text

This is the New World of Data Wrangling
- Stay in the Flow
- Scale the Unknown
- Expect the Unexpected

Slide 34

Slide 34 text

Outline
1. Wrangling and Immediacy
2. Technical Challenges
3. Architectural Context
4. JavaScript and its Discontents
5. Photon: A Clean Slate Design

Slide 35

Slide 35 text

Architecture: UX, DSL Compiler and Runtime DSL COMPILER

Slide 36

Slide 36 text

Scaling Up Via New Compiler Targets Server User-interface DSL COMPILER

Slide 37

Slide 37 text

Sample-to-Scale
Step 1: Extract Sample
[Diagram: User-interface, DSL Compiler, Server]

Slide 38

Slide 38 text

Sample-to-Scale
Step 2: Wrangle Sample
[Diagram: User-interface, DSL Compiler, Server]

Slide 39

Slide 39 text

Sample-to-Scale
Step 3: Execute final wrangling at scale
[Diagram: User-interface, DSL Compiler, Server]

Slide 40

Slide 40 text

Sample-to-Scale
Step 4: Return results, profiles, interesting rows
[Diagram: User-interface, DSL Compiler, Server]
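
The four Sample-to-Scale steps can be sketched in a few lines. This is only an illustration under stated assumptions, not Trifacta's implementation: the recipe format (a list of row-array transforms), the helpers `extractSample`, `applyRecipe`, and `executeAtScale`, and the every-k-th-row sampler are all hypothetical.

```javascript
// Hypothetical sketch of the Sample-to-Scale loop; all names and the
// recipe format are assumptions made for illustration.

// Step 1: extract a small sample (every k-th row, to keep the example deterministic).
function extractSample(rows, k) {
  return rows.filter((_, i) => i % k === 0);
}

// A "recipe" is a list of steps; each step maps an array of rows to a new array.
function applyRecipe(recipe, rows) {
  return recipe.reduce((acc, step) => step(acc), rows);
}

// Steps 3 and 4: run the same recipe over the full data and return
// the results together with a tiny profile.
function executeAtScale(recipe, rows) {
  const results = applyRecipe(recipe, rows);
  return { results, profile: { inputRows: rows.length, outputRows: results.length } };
}

const fullData = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  city: i % 3 === 0 ? ' sf ' : 'NYC',
}));
const sample = extractSample(fullData, 100);

// Step 2: wrangle the sample, accumulating recipe steps as you go.
const recipe = [
  rows => rows.map(r => ({ ...r, city: r.city.trim().toUpperCase() })),
  rows => rows.filter(r => r.city === 'SF'),
];
const preview = applyRecipe(recipe, sample);    // fast feedback on the sample
const final = executeAtScale(recipe, fullData); // same recipe, full data
```

The point of the design is that the recipe built against the sample is the same object that is later executed at scale, so the preview and the final run cannot drift apart.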

Slide 41

Slide 41 text

Architecture: Multiple Runtimes, Intelligent Execution Server User-interface DSL Compiler Intelligent Execution

Slide 42

Slide 42 text

Architecture: Spectrum of Execution Server User-interface DSL Compiler Intelligent Execution

Slide 43

Slide 43 text

Flexible Deployments
- Trifacta Wrangler App
- On Premises: DSL Compiler, Intelligent Execution
- Hosted: DSL Compiler, Intelligent Execution
- AWS: DSL Compiler, Intelligent Execution

Slide 44

Slide 44 text

Outline
1. Wrangling and Immediacy
2. Technical Challenges
3. Architectural Context
4. JavaScript and its Discontents
5. Photon: A Clean Slate Design

Slide 45

Slide 45 text

Why a JS Wrangling Engine?
- Stanford Wrangler team was deep in JS
- Continued to attract top JS talent
- JS runs in the browser …and on the desktop too
- JS is the way people-centric data apps are built today!
- Easy to prototype & add new features
- Dynamically typed: JS deals well with ambiguous types and structures found while wrangling

Slide 46

Slide 46 text

Why a JS Wrangling Engine?
- Stanford Wrangler team was deep in JS
- Continued to attract top JS talent
- JS runs in the browser …and on the desktop too
- JS is the way people-centric data apps are built today!
- Easy to prototype & add new features
- Dynamically typed: JS deals well with ambiguous types and structures found while wrangling
But does it perform? Does it scale?

Slide 47

Slide 47 text

Jeff Heer’s Datavore Prototype
An in-memory, column-oriented database
“Datavore can complete queries over million-element data tables at interactive (sub-100ms) rates.”

Slide 48

Slide 48 text

Hard to extend, performantly
- You will have to modify the guts of the engine to add new aggregate operators
- Add new logic to the inner loop of the query processor (for both dense and sparse queries)

Slide 49

Slide 49 text

Type ambiguity
- Inconsistencies with strongly typed execution engines
- Core engine slows down

    switch (typeof inputValue) {
      case 'string': return inputValue;
      case 'number': return inputValue + '';
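
The switch on the slide is cut off. A complete coercion helper of the same shape might look like the sketch below. Only the 'string' and 'number' cases come from the slide; the function name `toStringValue`, the 'boolean' and default cases, and the `columnToStrings` loop are assumptions added to make the fragment runnable and to show where the per-value cost arises.

```javascript
// Hypothetical completion of the slide's fragment. Only the 'string' and
// 'number' cases are from the slide; everything else is an assumption.
function toStringValue(inputValue) {
  switch (typeof inputValue) {
    case 'string': return inputValue;
    case 'number': return inputValue + '';
    case 'boolean': return inputValue ? 'true' : 'false';
    default: return inputValue == null ? '' : String(inputValue);
  }
}

// Every cell pays for the type check, even when a whole column has one type.
function columnToStrings(column) {
  const out = new Array(column.length);
  for (let i = 0; i < column.length; i++) out[i] = toStringValue(column[i]);
  return out;
}

const mixed = ['42', 42, 4.5, true, null];
const strings = columnToStrings(mixed);
```

Because the dispatch sits inside the innermost loop, it runs once per cell, which is one plausible reading of the slide's "core engine slows down".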

Slide 50

Slide 50 text

Code Generation is deceptively easy
- Function bodies can be analyzed: Function.prototype.toString()
- Functions can be generated: new Function(functionBody)
- Inlining function calls brought ~15-20% speed up
- Difficult to maintain & debug
Modest wins don’t justify the maintenance cost.
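
A minimal sketch of the inlining technique named on this slide follows. It is not the engine's actual generator: the helpers `arrowBody` and `compileFilterCount` are hypothetical, and the regular expression only handles single-parameter, single-expression arrow functions.

```javascript
// Sketch of inlining via code generation; helper names are hypothetical.

// Generic version: one function call per element.
function filterCount(values, predicate) {
  let n = 0;
  for (let i = 0; i < values.length; i++) if (predicate(values[i])) n++;
  return n;
}

// Analyze a function's source text. Supports only single-parameter,
// single-expression arrow functions such as `x => x > 10`.
function arrowBody(fn) {
  const src = Function.prototype.toString.call(fn);
  const m = /^\s*\(?\s*(\w+)\s*\)?\s*=>\s*([^{][\s\S]*)$/.exec(src);
  if (!m) throw new Error('unsupported function shape: ' + src);
  return { param: m[1], expr: m[2] };
}

// Generate a specialized loop with the predicate expression inlined.
function compileFilterCount(predicate) {
  const { param, expr } = arrowBody(predicate);
  return new Function('values', `
    let n = 0;
    for (let i = 0; i < values.length; i++) {
      const ${param} = values[i];
      if (${expr}) n++;
    }
    return n;`);
}

const isLarge = x => x > 10;
const data = [3, 11, 25, 7, 10, 42];
const generic = filterCount(data, isLarge);
const inlined = compileFilterCount(isLarge)(data);
```

The fragility is visible even at this size: functions created with `new Function` run in the global scope, so a predicate that closes over a local variable breaks once it is inlined, and errors in the generated source point at code nobody wrote by hand.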

Slide 51

Slide 51 text

JavaScript runtime: fast, but not easily tamed
- Difficult to exploit parallelism via multi-core processors
- Garbage-collected language pain points: value copies, object allocation and garbage collection
- Small inputs easily blow up memory usage

Slide 52

Slide 52 text

Escape from JS: A Study of Alternatives
Interaction via roundtrip to a server
[Diagram: User-interface, DSL Compiler, Intelligent Execution, Server]

Slide 53

Slide 53 text

A Clean Slate
- Portability a primary concern: Browser + Single-node Server
- Java? Not viable in the browser
- Go for performance? Maybe write our own C++/LLVM-based query engine!
- But … portability?

Slide 54

Slide 54 text

Surprise: What can’t a browser do these days?
Native code execution frameworks
- Chrome’s Portable Native Client (PNaCl)
- asm.js
- Upcoming: WebAssembly
Run compiled C and C++ code (LLVM)
- No JIT overhead
- Dense data representations
- No unpredictable garbage collection

Slide 55

Slide 55 text

Portable Native Client (PNaCl)
We went with Chrome’s PNaCl
- LLVM-based, cross-compilation style toolchain
- Plenty of ports for popular vendor libraries (webports)
https://developer.chrome.com/native-client

Slide 56

Slide 56 text

Outline
1. Wrangling and Immediacy
2. Technical Challenges
3. Architectural Context
4. JavaScript and its Discontents
5. Photon: A Clean Slate Design

Slide 57

Slide 57 text

Performance Requirements
- Low memory footprint: % of dataset size
- Immediate feedback: < 1 second response time

Slide 58

Slide 58 text

Engineering Process
Step 1: Start by establishing “Speed of Light”
Step 2: Then add features for functionality and extensibility
Step 3: Compare to speed of light; compromise judiciously.
Step 4: Iterate, back to step 1

Slide 59

Slide 59 text

Inspiration & Reading
Systems: Impala, HyPer, Tupleware, Spark
➔ Serializable & human-readable description of data flow
➔ Partitioned (sharded) data flow
➔ Strategies for complex transformations
➔ Data locality maximization

Slide 60

Slide 60 text

PHOTON Architecture
[Diagram: DSL Compiler, Intelligent Execution, Router, Cache, Engine, Execution Nodes, Scalar & Aggregate Functions]

Slide 61

Slide 61 text

In-memory data layout
Table Metadata
Row Metadata
Column 0: Column Metadata, Chunk 0, Chunk 1
Column 1: Column Metadata, Chunk 0, Chunk 1
Column 2: Column Metadata, Chunk 0, Chunk 1
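
The layout on this slide (a table of columns, each split into chunks, with metadata at every level) can be sketched as below. Photon's real structures are not shown in the deck, so every field name, the `float64` column type, and the tiny `CHUNK_SIZE` are assumptions chosen for illustration.

```javascript
// Hypothetical sketch of a chunked, column-oriented table.
// Field names, types, and CHUNK_SIZE are assumptions.
const CHUNK_SIZE = 4; // rows per chunk (tiny, for illustration)

function makeColumn(name, values) {
  const chunks = [];
  for (let start = 0; start < values.length; start += CHUNK_SIZE) {
    // Dense typed-array storage: no per-value object allocation.
    chunks.push(Float64Array.from(values.slice(start, start + CHUNK_SIZE)));
  }
  return { metadata: { name, type: 'float64' }, chunks };
}

function makeTable(columnsByName) {
  const names = Object.keys(columnsByName);
  const rowCount = columnsByName[names[0]].length;
  return {
    metadata: { columnCount: names.length, chunkSize: CHUNK_SIZE },
    rowMetadata: { rowCount },
    columns: names.map(n => makeColumn(n, columnsByName[n])),
  };
}

// Locate a cell: chunk index and offset both follow from the row number.
function getCell(table, columnIndex, row) {
  const chunk = table.columns[columnIndex].chunks[Math.floor(row / CHUNK_SIZE)];
  return chunk[row % CHUNK_SIZE];
}

const table = makeTable({
  price: [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
  qty: [1, 2, 3, 4, 5, 6],
  tax: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
});
```

With six rows and four rows per chunk, each of the three columns ends up with two chunks, matching the Chunk 0 / Chunk 1 picture on the slide.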

Slide 62

Slide 62 text

In-memory data layout
Row Batch: Row Metadata, Chunk 0, Chunk 0, Chunk 0
Row Batch: Row Metadata, Chunk 1, Chunk 1, Chunk 1
http://arrow.apache.org/
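
One way to read this slide is that a row batch groups the i-th chunk of every column, so a batch is a horizontal slice that can be processed on its own. The sketch below illustrates that reading; the `rowBatch` and `sumRows` helpers and the field names are hypothetical, not Photon's API.

```javascript
// Hypothetical sketch: a row batch groups the i-th chunk of every column.
const columns = [
  { name: 'a', chunks: [Int32Array.of(1, 2, 3), Int32Array.of(4, 5)] },
  { name: 'b', chunks: [Int32Array.of(10, 20, 30), Int32Array.of(40, 50)] },
  { name: 'c', chunks: [Int32Array.of(7, 8, 9), Int32Array.of(6, 5)] },
];

function rowBatch(columns, i) {
  const chunks = columns.map(c => c.chunks[i]);
  return { rowMetadata: { batchIndex: i, rowCount: chunks[0].length }, chunks };
}

// Process one batch independently of the others: sum a + b + c per row.
function sumRows(batch) {
  const out = new Int32Array(batch.rowMetadata.rowCount);
  for (const chunk of batch.chunks) {
    for (let r = 0; r < out.length; r++) out[r] += chunk[r];
  }
  return out;
}

const batch0 = rowBatch(columns, 0);
const batch1 = rowBatch(columns, 1);
const sums0 = Array.from(sumRows(batch0));
const sums1 = Array.from(sumRows(batch1));
```

The batch holds references to the existing chunks rather than copies, so forming a batch costs no data movement.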

Slide 63

Slide 63 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T0 T1

Slide 64

Slide 64 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 0 Row Batch 1 Row Batch 2 T0 T1

Slide 65

Slide 65 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 2 T1 Row Batch 1 T0 Row Batch 0

Slide 66

Slide 66 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 2 T1 Row Batch 1 T0 Row Batch 0

Slide 67

Slide 67 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 2 T1 Row Batch 1 T0 Row Batch 0

Slide 68

Slide 68 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 2 T1 Row Batch 1 T0 Row Batch 0

Slide 69

Slide 69 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer Row Batch 2 Row Batch 0 T0 T1 Row Batch 3 Row Batch 1

Slide 70

Slide 70 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T1 Row Batch 2 T0 Row Batch 3

Slide 71

Slide 71 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T1 Row Batch 2 T0 Row Batch 3

Slide 72

Slide 72 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T1 Row Batch 2 T0 Row Batch 3

Slide 73

Slide 73 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T1 Row Batch 2 T0 Row Batch 3

Slide 74

Slide 74 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T1 Row Batch 2 T0 Row Batch 3 Row Batch 3

Slide 75

Slide 75 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T0 T1 Row Batch 3

Slide 76

Slide 76 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T0 T1 Row Batch 3

Slide 77

Slide 77 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T0 T1 Row Batch 3

Slide 78

Slide 78 text

Execution Thread Pool Source Filter Map Agg Map Sink Producer Pipe Pipe Pipe Barrier Consumer T0 Row Batch 3 T1
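
The animation on the preceding slides (row batches 0 to 3 moving through Source, Filter, Map, Agg, Map, Sink on threads T0 and T1) can be approximated with a single-threaded simulation. This sketch makes several assumptions: real threads are replaced by two logical workers that take turns, the aggregation is treated as the barrier, and the concrete operators (keep even values, square, sum) are invented for the example.

```javascript
// Hypothetical, single-threaded simulation of the pipeline on the slides.
// Logical workers stand in for threads; treating Agg as the barrier is an assumption.

// Producer: the source hands out row batches one at a time.
function makeSource(batches) {
  let next = 0;
  return () => (next < batches.length ? batches[next++] : null);
}

// Pipes: stateless per-batch operators (invented for this example).
const filterEven = batch => batch.filter(v => v % 2 === 0);
const mapSquare = batch => batch.map(v => v * v);

function runPipeline(batches, workerCount) {
  const pull = makeSource(batches);
  // Each worker keeps its own partial aggregate, so no locking is needed.
  const partialSums = new Array(workerCount).fill(0);
  const batchesPerWorker = new Array(workerCount).fill(0);

  let active = true;
  while (active) {
    active = false;
    for (let w = 0; w < workerCount; w++) {
      const batch = pull();
      if (batch === null) continue;
      active = true;
      // A worker carries its batch through every pipe up to the barrier.
      const piped = mapSquare(filterEven(batch));
      for (const v of piped) partialSums[w] += v;
      batchesPerWorker[w]++;
    }
  }

  // Barrier: runs only after every batch has arrived; merge the partials.
  const total = partialSums.reduce((a, b) => a + b, 0);
  // Post-barrier map and sink (consumer).
  const sink = [total].map(t => ({ sumOfEvenSquares: t }));
  return { sink, batchesPerWorker };
}

const result = runPipeline([[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]], 2);
```

Keeping a batch on one worker through consecutive pipes avoids handing data between threads until the barrier, which is where the partial results have to meet anyway.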

Slide 79

Slide 79 text

Development Toolchain
- CMake: handles cross-compilation toolchain for PNaCl & Desktop
- Modern C++11/14: std::unique_ptr & std::shared_ptr simplify memory management; standard threading library; new container classes
- LLVM Clang libraries: ASAN (Address Sanitizer), TSAN (Thread Sanitizer); useful for catching large classes of bugs
- Google C++ libraries: googletest, googlemock, benchmark

Slide 80

Slide 80 text

Was it all worth it?

Slide 81

Slide 81 text

Experimental Setup
- 86 MB dataset, ~250k rows
- MacBook Pro (mid 2014)
Use Cases

Slide 82

Slide 82 text

Experimental Setup
COLUMN PROFILES
SUGGESTION PREVIEWS

Slide 83

Slide 83 text

Spark performance problematic for populating grid

Slide 84

Slide 84 text

Memory pressure a problem

Slide 85

Slide 85 text

Aggregation: Computing histograms
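
As a reminder of what this workload involves, here is a small equal-width histogram over one column. It is a generic sketch, not the benchmarked implementation: the `histogram` function, the bin count, and the choice to skip non-numeric cells are assumptions.

```javascript
// Hypothetical equal-width histogram over a numeric column. The bin count
// and the handling of non-numeric cells are assumptions.
function histogram(values, binCount) {
  const nums = values.filter(v => typeof v === 'number' && Number.isFinite(v));
  const counts = new Uint32Array(binCount);
  if (nums.length === 0) return { min: NaN, max: NaN, counts, skipped: values.length };

  let min = nums[0], max = nums[0];
  for (const v of nums) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const width = (max - min) / binCount || 1; // avoid a zero-width bin

  for (const v of nums) {
    // Clamp so that v === max falls in the last bin.
    const bin = Math.min(binCount - 1, Math.floor((v - min) / width));
    counts[bin]++;
  }
  return { min, max, counts, skipped: values.length - nums.length };
}

const h = histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 'n/a', null], 5);
```

Even this simple version makes two passes over the column (one for the range, one for the counts), which is why profiling every column of a table is a meaningful aggregation benchmark.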

Slide 86

Slide 86 text

Memory pressure is still an issue

Slide 87

Slide 87 text

Design Requirements
What is your desired workload?
- Low memory footprint: % of dataset size
- Immediate feedback: < 1 second response time

Slide 88

Slide 88 text

Lessons learned
➔ The right engine for the right job
➔ Photon is yet another payoff of (DSL + compiler + intelligent execution)
➔ Alongside Spark and Hadoop
➔ Photon specialized for UX: immediacy and scale
➔ Start by establishing speed-of-light
➔ Design to ensure it remains achievable as you grow
➔ Memory management with portability
➔ LLVM + toolchain form a powerful portability platform.
➔ Explicit memory management is critical for immediacy, even more so in-browser

Slide 89

Slide 89 text

- Part of Trifacta’s Intelligent Execution: alongside the distributed processing of Spark and Hadoop
- Portability: Browser, Desktop, Single-Node Server
- Lean usage of client memory: efficient and predictable
- Photon performance preserves user flow: no loss of context
- Immediate data wrangling: fluid UX with predictions, profiles, previews

Slide 90

Slide 90 text

Questions? @joe_hellerstein @seshness
