

OmniSci
August 28, 2018

GPU-Accelerated Interactive Exploratory Big Data Analysis with MapD

A hands-on workshop at the 6th Annual Global Big Data Conference in Santa Clara.



Transcript

  1. © MapD 2018 GPU-Accelerated Interactive Exploratory Big Data Analysis with MapD

    Global Big Data Conference, Santa Clara
    Veda Shankar & Wamsi Viswanath, MapD | August 28, 2018
  2. Veda Shankar, Sr Developer Advocate, MapD Community

    [email protected]
    Slides: https://speakerdeck.com/mapd

    Wamsi Viswanath, Data Scientist, MapD
    [email protected]
  3. AGENDA

    • Overview of MapD
    • Demo - MapD Immerse
    • Lab: Launch MapD Docker Image on Laptop or AWS
    • Lab: Google Analytics example and introduction to PyMapD
    • Lab: Machine Learning Pipeline with GPU Data Frame
    • MapD GeoSpatial Features
    • Lab: Rendering GeoSpatial Data
    • Q&A
  4. Exploring Google Analytics with MapD

    • Now that you have successfully launched MapD as a Docker image in the previous lab, let us ingest some real-world data into it.
    • In this lab, we will follow the steps detailed in the following blogs to ingest web analytics data into MapD and create dashboards that provide insights into the activities on the site.
      ◦ Exploring Session-Level Google Analytics Data - Part 1
      ◦ Exploring Session-Level Google Analytics Data - Part 2
    • You can find articles covering other use cases at the MapD Blog site.
  5. Advanced memory management

    Three-tier caching to GPU RAM for speed and to SSDs for persistent storage

    • GPU RAM (L1), hot data: 24GB to 256GB, 1000-6000 GB/sec. Speedup = 1500x to 5000x over cold data
    • CPU RAM (L2), warm data: 32GB to 3TB, 70-120 GB/sec. Speedup = 35x to 120x over cold data
    • SSD or NVRAM storage (L3), cold data: 250GB to 20TB, 1-2 GB/sec

    Diagram labels: COMPUTE LAYER, STORAGE LAYER, Data Lake/Data Warehouse/System Of Record
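As a rough check on the figures in this slide, the sketch below computes the ratio of the quoted bandwidth ranges between tiers. The dictionary keys and the function name are made up for illustration, and treating speedup as a plain bandwidth ratio is an assumption. Under that assumption the CPU RAM to SSD ratio reproduces the quoted 35x to 120x warm-data range; the quoted hot-data range (1500x to 5000x) does not fall out of the same simple ratio, so it presumably reflects something more than raw bandwidth.

```python
# Bandwidth ranges in GB/sec as quoted on the slide: (low, high).
# The key names are hypothetical labels chosen for this sketch.
TIERS = {
    "gpu_ram": (1000, 6000),
    "cpu_ram": (70, 120),
    "ssd": (1, 2),
}


def bandwidth_ratio_range(fast, slow):
    """Return the (smallest, largest) ratio of fast-tier to slow-tier bandwidth."""
    fast_lo, fast_hi = fast
    slow_lo, slow_hi = slow
    # Smallest ratio: slowest fast tier vs. fastest slow tier, and vice versa.
    return fast_lo / slow_hi, fast_hi / slow_lo


warm = bandwidth_ratio_range(TIERS["cpu_ram"], TIERS["ssd"])
hot = bandwidth_ratio_range(TIERS["gpu_ram"], TIERS["ssd"])

print("CPU RAM vs SSD:", warm)
print("GPU RAM vs SSD:", hot)
```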
  6. The GPU Open Analytics Initiative (GOAI)

    Seamless data interchange framework in GPU memory
  7. The GPU Open Analytics Initiative (GOAI)

    Creating common data frameworks to accelerate data science on GPUs

    • /mapd/pymapd
    • /gpuopenanalytics/pygdf
  8. GitHub ML Examples

    • We’ve published a few notebooks showing how to connect to a MapD database and use an ML algorithm to make predictions
    • https://github.com/mapd/mapd-ml-demo
  9. Geospatial Objects

    • POINT: A point described by two coordinates
    • POLYGON: A set of one or more rings (closed linestrings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings)
    • LINESTRING: A sequence of 2 or more points and the lines that connect them
    • MULTIPOLYGON: A set of one or more polygons
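The four object types above can be written out in Well Known Text (WKT), the notation the deck lists among its supported file formats. The following is a minimal sketch in plain Python; the helper names are hypothetical and are not part of MapD or pymapd. It only formats strings and enforces the two structural rules stated on the slide (a linestring needs at least 2 points, and polygon rings must be closed).

```python
def coords_to_wkt(points):
    """Format a sequence of (x, y) pairs as a parenthesised WKT coordinate list."""
    return "(" + ", ".join(f"{x} {y}" for x, y in points) + ")"


def point_wkt(x, y):
    """A point described by two coordinates."""
    return f"POINT ({x} {y})"


def linestring_wkt(points):
    """A sequence of 2 or more points and the lines that connect them."""
    if len(points) < 2:
        raise ValueError("a linestring needs at least 2 points")
    return "LINESTRING " + coords_to_wkt(points)


def polygon_body(rings):
    """Format rings: the first is the external ring, the rest are holes."""
    for ring in rings:
        if ring[0] != ring[-1]:
            raise ValueError("each ring must be closed (first point == last point)")
    return "(" + ", ".join(coords_to_wkt(ring) for ring in rings) + ")"


def polygon_wkt(rings):
    return "POLYGON " + polygon_body(rings)


def multipolygon_wkt(polygons):
    """A set of one or more polygons, each given as a list of rings."""
    return "MULTIPOLYGON (" + ", ".join(polygon_body(r) for r in polygons) + ")"


# Illustrative coordinates: a square with a square hole in it.
outer = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
print(point_wkt(1, 2))
print(linestring_wkt([(0, 0), (1, 1)]))
print(polygon_wkt([outer, hole]))
print(multipolygon_wkt([[outer, hole], [hole]]))
```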
  10. Supported Geo File Formats

    • GeoJSON: Uses the JavaScript Object Notation (JSON) open data standard for storing geographical features as key-value pairs.
    • ESRI Shapefile: Consists of a group of files (.shp, .shx, .dbf, etc.) that need to be stored in the same directory or be part of a zip file. The .shp file contains the feature geometry itself.
    • KML: Keyhole Markup Language is based on XML, using a tag-based structure with nested elements and attributes to store geographic data.
    • CSV/TSV with WKT: Well Known Text (WKT) is a text markup language for representing vector geometry objects on a map.
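To make two of these formats concrete, here is a small sketch that writes the same kind of data as a GeoJSON document and as a CSV with a WKT column, using only the Python standard library. The feature name, coordinates, and column names are invented for illustration, and nothing here is specific to how MapD imports files. Note that a WKT value containing commas has to be quoted in CSV, which the `csv` module does automatically.

```python
import csv
import io
import json

# GeoJSON: geometry and properties stored as key-value pairs.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-121.95, 37.35]},
    "properties": {"name": "example site"},  # made-up attribute
}
geojson_text = json.dumps({"type": "FeatureCollection", "features": [feature]})

# CSV with WKT: one ordinary column plus one geometry column holding WKT text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "geom"])  # hypothetical column names
writer.writerow(["example site", "POINT (-121.95 37.35)"])
writer.writerow(["example route", "LINESTRING (0 0, 1 1)"])  # comma forces quoting
csv_text = buffer.getvalue()

print(geojson_text)
print(csv_text)

# Reading the CSV back recovers the WKT strings unchanged.
rows = list(csv.reader(io.StringIO(csv_text)))
```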
  11. Geospatial Functions

    Spatial Relationship and Measurement Functions

    • ST_Distance: Returns shortest planar distance between geometries. Returns shortest geodesic distance between geographies (in meters, limited support).
    • ST_Contains: Returns true if first geometry contains the second one.
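For intuition about what these two functions compute, the sketch below implements the simplest cases in plain Python: planar distance between two points, and a point-in-polygon test against a single external ring using ray casting. This is an illustration of the concepts only, not MapD's implementation; the function names are hypothetical, holes and other geometry pairings are not handled, and points lying exactly on the boundary are not treated specially.

```python
import math


def planar_distance(p, q):
    """Straight-line distance between two (x, y) points in the plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def ring_contains_point(ring, point):
    """Ray-casting test: is `point` inside the polygon outlined by `ring`?

    `ring` is a list of (x, y) vertices; it may or may not repeat the first
    vertex at the end. A horizontal ray is cast to the right of the point,
    and each edge crossing flips the inside/outside state.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(planar_distance((0, 0), (3, 4)))
print(ring_contains_point(square, (2, 2)))
print(ring_contains_point(square, (5, 2)))
```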
  12. Categories of common MapD use cases

    • Operational Analytics
      ◦ Thwart Banking Fraud
      ◦ Scan for Cyber Threats
      ◦ Fine-tune Advertising
      ◦ Maintain the Utility Grid
    • Geospatial Analytics
      ◦ Monitor Networks
      ◦ Ready Logistics
      ◦ Forecast Micro-weather
    • Data Science
      ◦ Model Financial Markets
      ◦ Predict Maintenance
      ◦ Predict Staffing Levels
  13. Next Steps

    • community.mapd.com: Ask questions and share your experiences
    • mapd.com/cloud: Try 14-day free trial, no credit card needed
    • mapd.com/demos: Play with our demos
    • mapd.com/platform/download-community/: Get our free Community Edition and start playing