Workshop: GPU-Accelerated Interactive Exploratory Big Data Analysis

GPU-Accelerated Interactive Exploratory Big Data Analysis
AI Tech Community and Learning 2018, Sunnyvale
Julian Hyde, Vraj Pandya & Veda Shankar, OmniSci | October 16th, 2018

Veda Shankar, Developer Advocate, OmniSci Community
Vraj Pandya, Developer, OmniSci
Julian Hyde, Architect, Looker
slides: https://speakerdeck.com/omnisci

© OmniSci 2018 Data Grows Faster Than CPU Processing
Data Growth: 40% per year
CPU Processing Power: 20% per year

The Fastest Software Designed for the Fastest Hardware HARNESS GPUs

OmniSci Innovations Powering Extreme Analytics: 3-Tier Memory Caching, Query Compilation, In-Situ Rendering

DEMO : OmniSci Immerse

OmniSci Deep Dive

Agenda
• GPU Programming
• Memory layout
• Extension Functions
Note: Commit hash for reference e744544eadfd5e7ebdcf6d2e324995e599ab4907

Philosophy of GPU programming CPU Thread GPU Threads

Philosophy of GPU programming: to program for a group of threads is to program for a single thread

A simple problem
    SELECT a FROM t WHERE a < 27;
[Figure: column a and the filtered Results]

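A simple solution
Serial (needs <cstdint>):
    int32_t arr[arr_size];
    int32_t res[arr_size];
    // Copy each matching value to the same position in res.
    for (size_t i = 0; i < arr_size; i++) {
      if (arr[i] < key) {
        res[i] = arr[i];
      }
    }
Multi-threaded (needs <thread> and <vector>):
    const size_t worker_count = std::thread::hardware_concurrency();
    std::vector<std::thread> workers;
    for (size_t worker_idx = 0; worker_idx < worker_count; ++worker_idx) {
      // Each worker handles indexes worker_idx, worker_idx + worker_count, ...
      workers.emplace_back([&arr, &res, worker_idx, worker_count, arr_size, key]() {
        for (size_t id = worker_idx; id < arr_size; id += worker_count) {
          if (arr[id] < key) {
            res[id] = arr[id];
          }
        }
      });
    }
    for (auto& worker : workers) {
      worker.join();  // assumed completion: the slide text ends at the opening brace
    }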

A simple solution for GPU
    __global__ void filter(int32_t* col, int32_t* res, int32_t key) {
      // One thread per element; assumes the launch covers exactly the column length.
      int32_t t_id = (blockIdx.x * NUMTHREAD) + threadIdx.x;
      if (col[t_id] < key) {
        res[t_id] = col[t_id];
      }
    }
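To make the "program for a single thread" idea concrete without a GPU, the kernel can be emulated on the CPU. The following is a minimal sketch, not OmniSci code: `kNumThread`, `filter_one_thread` and `filter_emulated` are hypothetical names, the block size of 4 is arbitrary, and the bounds check is an addition that the slide's kernel does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block size; the slide's NUMTHREAD is not defined in the excerpt.
constexpr std::size_t kNumThread = 4;

// The work of one GPU thread, written as if only that thread existed.
// The bounds check is an addition for grids larger than the column.
void filter_one_thread(const std::vector<int32_t>& col, std::vector<int32_t>& res,
                       int32_t key, std::size_t block_idx, std::size_t thread_idx) {
  assert(res.size() == col.size());
  const std::size_t t_id = block_idx * kNumThread + thread_idx;
  if (t_id < col.size() && col[t_id] < key) {
    res[t_id] = col[t_id];
  }
}

// Serial stand-in for a kernel launch: visit every (block, thread) pair once.
std::vector<int32_t> filter_emulated(const std::vector<int32_t>& col, int32_t key) {
  std::vector<int32_t> res(col.size(), 0);  // non-matching slots stay 0
  const std::size_t blocks = (col.size() + kNumThread - 1) / kNumThread;
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < kNumThread; ++t) {
      filter_one_thread(col, res, key, b, t);
    }
  }
  return res;
}
```

Calling `filter_emulated({1, 30, 5, 27, 26}, 27)` keeps 1, 5 and 26 in their original positions and leaves the other slots at 0, which mirrors how the kernel writes `res[t_id]` only for matching rows.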

Switching Gears!

Calcite RA to LLVM IR
• OmniSci generates LLVM IR using the RA nodes generated by Calcite
• OmniSci generates IR for every RA node.
• OmniSci currently supports 8 RA nodes: EnumerableTableScan, LogicalProject, LogicalFilter, LogicalAggregate, LogicalJoin, LogicalSort, LogicalValues and LogicalTableModify
• Use explain calcite to generate the RA nodes of a query
• Use explain to generate the LLVM IR of a query
Please read a note on how OmniSci uses Calcite here: https://www.omnisci.com/blog/fast-and-flexible-query-analysis-at-mapd-with-apache-calcite-2/

RA of the simple problem
    mapdql> explain calcite select a from t where a < 27;
    Explanation
    LogicalProject(a=[$0])
      LogicalFilter(condition=[<($0, 27)])
        EnumerableTableScan(table=[[mapd, t]])
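The plan reads bottom-up: scan the table, filter the rows, then project column $0. A minimal sketch of that evaluation order, assuming a toy in-memory table of int32 rows; `scan`, `filter`, `project` and `run_simple_query` are hypothetical helpers for illustration, not Calcite or OmniSci APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Row = std::vector<int32_t>;  // one row; $0 is element 0
using Table = std::vector<Row>;

// EnumerableTableScan: produce every row of the table.
Table scan(const Table& t) { return t; }

// LogicalFilter: keep the rows for which the condition holds.
Table filter(const Table& in, const std::function<bool(const Row&)>& cond) {
  Table out;
  for (const Row& r : in) {
    if (cond(r)) out.push_back(r);
  }
  return out;
}

// LogicalProject: keep only the listed column indexes, in order.
Table project(const Table& in, const std::vector<std::size_t>& cols) {
  Table out;
  for (const Row& r : in) {
    Row projected;
    for (std::size_t c : cols) {
      assert(c < r.size());
      projected.push_back(r[c]);
    }
    out.push_back(projected);
  }
  return out;
}

// LogicalProject(a=[$0]) <- LogicalFilter(<($0, 27)) <- EnumerableTableScan(t)
Table run_simple_query(const Table& t) {
  return project(filter(scan(t), [](const Row& r) { return r[0] < 27; }), {0});
}
```

This interprets the tree node by node; per the slides, OmniSci instead generates LLVM IR for the RA nodes.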

High level code generation

A simple OmniSci solution
    define i32 @row_func_hoisted_literals(i64* %group_by_buff, i64* %small_group_by_buff,
        i32* %crt_match, i32* %total_matched, i32* %old_total_matched, i64* %agg_init_val,
        i64 %pos, i64* %frag_row_off, i64* %num_rows_per_scan, i8* %literals, i8* %col_buf0,
        i64* %join_hash_tables, i32 %arg_literal_0) #21
    entry:
      %0 = call i64 @fixed_width_int_decode(i8* %col_buf0, i32 4, i64 %pos)
      %1 = trunc i64 %0 to i32
      %2 = call i8 @lt_int32_t_nullable_lhs(i32 %1, i32 %arg_literal_0, i64 -2147483648, i8 -128)
      %3 = icmp sgt i8 %2, 0
      %4 = and i1 true, %3
      br i1 %4, label %filter_true, label %filter_false
Note: lt_int32_t_nullable_lhs is a preprocessor-generated function. See QueryEngine/RuntimeFunctions.cpp
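The comparison call in the IR takes the decoded value, the literal, a null sentinel (-2147483648) and a null result (-128). A minimal sketch of what such a null-aware "less than" plausibly does, under the assumption that a left-hand side equal to the sentinel yields the null result; the function names below are hypothetical and the real definition lives in QueryEngine/RuntimeFunctions.cpp.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for lt_int32_t_nullable_lhs: returns 1 or 0 for a
// normal comparison, or null_bool_val when lhs is the null sentinel.
int8_t lt_int32_nullable_lhs_sketch(int32_t lhs, int32_t rhs,
                                    int64_t null_val, int8_t null_bool_val) {
  if (static_cast<int64_t>(lhs) != null_val) {
    return lhs < rhs ? 1 : 0;
  }
  return null_bool_val;
}

// Mirrors `icmp sgt i8 %2, 0`: only a strictly positive result passes,
// so both "false" (0) and "null" (-128) send the row to filter_false.
bool row_passes_filter(int32_t value, int32_t literal) {
  const int8_t r = lt_int32_nullable_lhs_sketch(
      value, literal,
      std::numeric_limits<int32_t>::min(),  // -2147483648, as in the IR
      std::numeric_limits<int8_t>::min());  // -128, as in the IR
  return r > 0;
}
```

With the literal 27, a value of 5 passes, 27 does not, and the sentinel value is treated as null and filtered out.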

Runtime Functions
OmniSci defines runtime functions that are generated and linked at runtime. These functions are very primitive in their implementations, e.g. agg_sum_int32_shared(i32*, i32), ExtractFromTime(i32, i64), DateAddNullable(i32, i64, i64, i32, i64).
Find the function declarations in QueryEngine/NativeCodegen.cpp, line #288.

Memory layout
• Cache what we need in an LRU manner on every level.
• Fragment: horizontal partition of a table
• Chunk: vertical partition of a fragment
• Use \memory_cpu and \memory_gpu to query the chunk placement in each locality
• Chunk Stats: min, max, is_null
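As an illustration of caching "in an LRU manner", here is a minimal least-recently-used cache sketch. It is a generic textbook structure (a list ordered by recency plus a hash index), not OmniSci's buffer manager; `LruCache`, the int key and the string payload are placeholders chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Returns the cached value or nullptr; a hit becomes most recently used.
  const std::string* get(int key) {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    items_.splice(items_.begin(), items_, it->second);
    return &it->second->second;
  }

  // Inserts or updates; evicts the least recently used entry when full.
  void put(int key, std::string value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  bool contains(int key) const { return index_.count(key) != 0; }

 private:
  using Entry = std::pair<int, std::string>;
  std::size_t capacity_;
  std::list<Entry> items_;  // front = most recently used
  std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

With capacity 2, inserting keys 1 and 2, reading key 1, then inserting key 3 evicts key 2, because key 1 was touched more recently.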

Data on disk
• The “data” directory contains four directories: mapd_data, mapd_catalog, mapd_logs, mapd_export
• Storage format is: mapd_data/table_<db_id>_<table_id>, e.g. mapd_data/table_0_4

Chunk Key
A chunk is the primary container of data and is identified by its chunk key: DB_ID, Table_ID, Frag_ID, Col_ID
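A chunk key can be pictured as a small tuple of integers. The sketch below follows the field order listed on the slide; the struct and helper names are hypothetical, and the actual type used in the OmniSci source may differ.

```cpp
#include <string>
#include <tuple>

// Hypothetical chunk key with the four fields named on the slide.
struct ChunkKey {
  int db_id;
  int table_id;
  int frag_id;
  int col_id;
};

// Lexicographic order, so all chunks of one table sort next to each other.
inline bool operator<(const ChunkKey& a, const ChunkKey& b) {
  return std::tie(a.db_id, a.table_id, a.frag_id, a.col_id) <
         std::tie(b.db_id, b.table_id, b.frag_id, b.col_id);
}

// True when both chunks belong to the same table of the same database.
inline bool same_table(const ChunkKey& a, const ChunkKey& b) {
  return a.db_id == b.db_id && a.table_id == b.table_id;
}

// Comma-separated form, e.g. "1,4,1,0".
inline std::string chunk_key_to_string(const ChunkKey& k) {
  return std::to_string(k.db_id) + "," + std::to_string(k.table_id) + "," +
         std::to_string(k.frag_id) + "," + std::to_string(k.col_id);
}
```

Ordering keys lexicographically makes "all chunks of table 4 in database 1" a contiguous range in a sorted container, which is one reason a tuple-like key is convenient.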

Memory Layout
    mapdql> select a from t where a < 27;
    a
    1
    2
    ….
    3
    7 rows returned.
    mapdql> \memory_gpu
    MapD Server Detailed GPU Memory Usage:
    Maximum Bytes for one page: 512 Bytes
    Maximum Bytes for node: 5473 MB
    Memory allocated: 2048 MB
    GPU[0] Slab Information:
    SLAB ST_PAGE NUM_PAGE TOUCH CHUNK_KEY
    0 0 1 0 USED 1,4,1,0,
    0 1 4194303 14 FREE
    ---------------------------------------------------------------
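The numbers in this output are consistent with each other if ST_PAGE is read as the starting page and NUM_PAGE as the page count (an interpretation, not something the slide states): one used page plus 4194303 free pages, at 512 bytes per page, add up to the 2048 MB reported as allocated. A small sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

constexpr std::uint64_t kPageBytes = 512;      // "Maximum Bytes for one page"
constexpr std::uint64_t kUsedPages = 1;        // USED row: NUM_PAGE = 1
constexpr std::uint64_t kFreePages = 4194303;  // FREE row: NUM_PAGE = 4194303

// Total bytes covered by a number of pages.
constexpr std::uint64_t slab_bytes(std::uint64_t pages, std::uint64_t page_bytes) {
  return pages * page_bytes;
}

// Bytes to megabytes, using 1 MB = 1024 * 1024 bytes.
constexpr std::uint64_t to_mb(std::uint64_t bytes) {
  return bytes / (1024 * 1024);
}

// (1 + 4194303) pages * 512 bytes = 2^31 bytes = 2048 MB.
static_assert(to_mb(slab_bytes(kUsedPages + kFreePages, kPageBytes)) == 2048,
              "slab pages should add up to the allocated 2048 MB");
```

Under the same reading, the single used page holds the chunk with key 1,4,1,0, which is the column touched by the query.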

Extension Function
• Non-standard functions for relational algebra that have a well-defined algorithm.
• Simple examples: sin, tan, likely, now, etc.
• Interesting examples: all Geo functions, e.g. ST_Contains, ST_Distance, etc.
• Extension functions can be filter or projection functions.
• You can easily add your own.

Internals of Extension functions
• All extension functions are compiled to an LLVM AST (Abstract Syntax Tree) at compile time. See QueryEngine/CMakeLists.txt #150, #160
• Calcite loads the available extensions from `ExtensionFunctions.ast`, adds them to its operator table and shares the list with the execution layer in JSON format. The execution layer builds an in-memory representation of that list so that getLLVMDeclarations() can use it when the LLVM IR codegen asks for it.
• getLLVMDeclarations() converts each JSON signature to its LLVM representation.
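The last step, turning a stored signature into an LLVM declaration, can be sketched as simple string building. This is an illustration only: `ExtSignature` and `to_llvm_declaration` are hypothetical, the types are assumed to be already spelled as LLVM type names, and the real getLLVMDeclarations() may work differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one extension-function signature.
struct ExtSignature {
  std::string name;
  std::string ret_type;                // already an LLVM type name, e.g. "double"
  std::vector<std::string> arg_types;  // likewise, e.g. {"double", "i32"}
};

// Builds an LLVM IR `declare` line for the signature.
inline std::string to_llvm_declaration(const ExtSignature& sig) {
  std::string out = "declare " + sig.ret_type + " @" + sig.name + "(";
  for (std::size_t i = 0; i < sig.arg_types.size(); ++i) {
    if (i != 0) out += ", ";
    out += sig.arg_types[i];
  }
  out += ")";
  return out;
}
```

For a signature named `distance_in_meters` with a `double` return type and four `double` arguments, this yields `declare double @distance_in_meters(double, double, double, double)`, which is the form the generated IR needs before it can call the function.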

Example Extension Function
    EXTENSION_NOINLINE
    double distance_in_meters(const double fromlon, const double fromlat,
                              const double tolon, const double tolat) {
      double latitudeArc = (fromlat - tolat) * 0.017453292519943295769236907684886;
      double longitudeArc = (fromlon - tolon) * 0.017453292519943295769236907684886;
      double latitudeH = sin(latitudeArc * 0.5);
      latitudeH *= latitudeH;
      double lontitudeH = sin(longitudeArc * 0.5);
      lontitudeH *= lontitudeH;
      double tmp = cos(fromlat * 0.017453292519943295769236907684886) *
                   cos(tolat * 0.017453292519943295769236907684886);
      return 6372797.560856 * (2.0 * asin(sqrt(latitudeH + tmp * lontitudeH)));
    }

You can add your own extension function
There is a well-defined way to add an extension function to the OmniSci core.

Add your own extension function
1. Add the function definition/implementation in ExtensionFunctions.hpp / ExtensionFunctionsGeo.hpp
2. Make Calcite aware of the function by adding the function information in MapDSqlOperatorTable.java
3. Add logic in RelationalAlgebraTranslator.cpp
See the diff file: https://gist.github.com/VrajPandya/152212536f6910178aa066d728acd133
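Step 1 might look like the sketch below. It covers only the function definition; the Calcite registration and translator changes from steps 2 and 3 are not shown. The function `fahrenheit_to_celsius_sketch` is a made-up example, and the macro definition here is only a stand-in so the snippet compiles on its own; in the OmniSci tree the marker comes from the project headers.

```cpp
// Stand-in for the project's marker so this sketch is self-contained.
#ifndef EXTENSION_NOINLINE
#define EXTENSION_NOINLINE extern "C"
#endif

// Hypothetical scalar extension function: plain arithmetic on primitive
// arguments, in the same style as distance_in_meters.
EXTENSION_NOINLINE
double fahrenheit_to_celsius_sketch(const double fahrenheit) {
  return (fahrenheit - 32.0) * 5.0 / 9.0;
}
```

Once registered, a function like this could be used as a projection or in a filter, in the same way as the built-in extension functions listed earlier.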

Lab: Cloud API Access with PyMapD
• Launch OmniSci Cloud Instance
• Generating API Keys
• Connecting to OmniSci
• Database Table Info
• Loading Data & Table Creation
• Creating Dashboard

OmniSci Geospatial Features
• Geospatial objects
  ◦ POINT, LINESTRING, POLYGON, MULTIPOLYGON
• Geospatial File Formats
  ◦ GeoJSON, ESRI Shapefile, KML and CSV/TSV with WKT
• Geospatial Functions
  ◦ Geometry Constructors
  ◦ Geometry Editors
  ◦ Geometry Accessors
  ◦ Spatial Relationships and Measurements
    ▪ ST_Distance, ST_Contains, ST_Within, ST_Area, ST_Perimeter, ST_Length

GPU Open Analytics Initiative (GOAI)
Creating common data frameworks to accelerate data science on GPUs

Four Ways to Get Started
• OPEN SOURCE: GitHub repo
• COMMUNITY: Website download
• OMNISCI CLOUD: OmniSci as a service
• ENTERPRISE: Contact sales
Ask questions and share your experiences @ https://community.omnisci.com

THANK YOU
Questions, comments, resumes (willing to split the referral fees)?
@vrajspandya @veda_shankar
