OmniSci
December 09, 2019

Speed Meets Scale: Interactively Analyzing & Visualizing Billions of Rows of Spatiotemporal Data

Data analytics is fundamentally changing: millions of rows are becoming billions of rows (either through more fine-grained collection or through mash-ups with relevant third-party data), and geospatial and time-series data types are becoming more commonplace. On top of these technical challenges, analysts are also being pushed to make more effective, data-driven decisions in real time. The shift last decade from legacy databases to in-memory databases helped, but the speed and scale of traditional solutions have not kept pace with these challenges, and the lagging user experience costs time and money for companies that are increasingly data rich but insight poor. In this talk we’ll look at a way to classify this new category of big, visual, interactive data, and at how OmniSci leverages the fastest hardware (fast memory, fast parallel processing) to run SQL queries hundreds of times faster than traditional tools. We will demonstrate a 10B-row dataset that can be queried in less than 300 milliseconds without any indexing or aggregation of the data.

Transcript

  1. Speed Meets Scale: Interactively Analyzing & Visualizing Billions of Rows of Spatiotemporal Data
    Minneanalytics | Minneapolis | December 9, 2019
  2. Technological Advantages
    • Exploit modern compilation techniques in analytic workflows
    • Efficiently use the modern memory hierarchy
    • Rethink analytic operations for modern hardware
  3. Points and Polygons: 1B Taxi Rides + 1M Buildings
    Public demo: https://omnisci.com/demos/taxis/
  4. Efficient use of the modern memory hierarchy: minimize unnecessary data movement and exploit spatial/temporal locality
    • GPU RAM (L1): 32GB to 256GB, 1-7 TB/sec. Hot data: speedup = 250x to 1750x over cold data
    • CPU RAM (L2): 32GB to 3TB, 140-560 GB/sec. Warm data: speedup = 35x to 140x over cold data
    • SSD or NVRAM storage (L3): 250GB to 20TB, 1-4 GB/sec. Cold data
    Diagram labels: COMPUTE LAYER, STORAGE LAYER, Data Lake/Data Warehouse/System Of Record
  5. Exploit modern compiler infrastructure for analytics: LLVM-based JIT compilation of both SQL queries and user-defined kernels
    Traditional analytics engines use a ‘Chain of Iterators’ model (VOLCANO)
    • Each operator in SQL treated as a separate function
    • Incurs significant overhead and prevents vectorization
    OmniSci compiles both queries and UDF kernels using LLVM
    • LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
    • Code can be generated to run query on CPU and GPU simultaneously
    • Queries and UDFs can run at speeds approaching hand-written functions
    • Also allows support of modern analytic frontends (Python, Julia, Swift) for greater productivity
  6. Self Discovery
    • omnisci.com/demos: Play with our live demos for yourself!
    • omnisci.cloud: Get an OmniSci instance in 60 seconds
    • omnisci.com/platform/downloads/: Download a 30-day trial of OmniSci
    • community.omnisci.com: Ask questions and share your experiences
    © OmniSci 2018
  7. About OmniSci
    • Used by 100+ global orgs
    • $92 million in funding
    • Open-source community
    • Top-tier venture backing
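
The speedup figures on the memory hierarchy slide (item 4 of the transcript) appear to be plain bandwidth ratios. The sketch below reproduces them under two assumptions that the slide does not state explicitly: that 1 TB/sec is counted as 1000 GB/sec, and that the "cold data" baseline is the upper storage figure of 4 GB/sec. The names `TIERS`, `speedup_range`, and `scan_seconds` are hypothetical helpers for this illustration only.

```python
# Bandwidth ranges in GB/sec, copied from the slide.
# Assumption: 1 TB/sec is treated as 1000 GB/sec.
TIERS = {
    "GPU RAM (L1)": (1000, 7000),
    "CPU RAM (L2)": (140, 560),
    "SSD or NVRAM storage (L3)": (1, 4),
}

# Assumption: speedups are measured against the fastest storage figure.
COLD_BASELINE_GB_PER_SEC = TIERS["SSD or NVRAM storage (L3)"][1]


def speedup_range(tier):
    """Return (low, high) bandwidth ratio of a tier over the cold baseline."""
    low, high = TIERS[tier]
    return (low / COLD_BASELINE_GB_PER_SEC, high / COLD_BASELINE_GB_PER_SEC)


def scan_seconds(gigabytes, bandwidth_gb_per_sec):
    """Idealized time to stream `gigabytes` once at the given bandwidth."""
    return gigabytes / bandwidth_gb_per_sec


if __name__ == "__main__":
    for tier in TIERS:
        low, high = speedup_range(tier)
        print(f"{tier}: {low:g}x to {high:g}x over cold data")
```

Under these assumptions the ratios come out to 250x to 1750x for GPU RAM and 35x to 140x for CPU RAM, matching the slide. `scan_seconds` is an idealized lower bound only; it ignores latency, compression, and compute cost.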
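
The contrast drawn on the compiler slide (item 5 of the transcript) between a chain of iterators and a compiled query can be sketched in a few lines. This is an illustration of the general idea only, not OmniSci's implementation: Python generators stand in for Volcano-style operators, and a hand-written loop stands in for the single fused function that a JIT compiler such as LLVM might emit. The query, `SELECT SUM(fare) FROM trips WHERE trip_distance > 2.0`, along with the table and column names, is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[float, float]  # hypothetical (trip_distance, fare) rows


# Volcano-style: each SQL operator is a separate function that pulls
# rows one at a time from its child operator.
def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def filter_distance(child: Iterator[Row], min_distance: float) -> Iterator[Row]:
    for row in child:
        if row[0] > min_distance:
            yield row


def project_fare(child: Iterator[Row]) -> Iterator[float]:
    for row in child:
        yield row[1]


def aggregate_sum(child: Iterator[float]) -> float:
    total = 0.0
    for value in child:
        total += value
    return total


def volcano_query(rows: Iterable[Row], min_distance: float) -> float:
    # Every row crosses three operator boundaries before it is summed.
    return aggregate_sum(project_fare(filter_distance(scan(rows), min_distance)))


# Fused: the whole query collapses into one loop with no per-row calls
# between operators, which is the shape a query compiler aims to generate.
def fused_query(rows: Iterable[Row], min_distance: float) -> float:
    total = 0.0
    for distance, fare in rows:
        if distance > min_distance:
            total += fare
    return total


if __name__ == "__main__":
    trips = [(1.0, 5.0), (3.0, 12.5), (2.5, 9.0), (0.5, 4.0)]
    print(volcano_query(trips, 2.0), fused_query(trips, 2.0))
```

Both functions return the same sum; the difference is in how much per-row call overhead sits between the scan and the aggregate. The fused form is also the one a compiler can more readily vectorize, which is the point the slide makes about the iterator model.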