Supercharging Analytics with GPUs - Masood Krohy

Masood Krohy at the April 30, 2019 event of EDPP Montreal (montrealml.dev/data)

Title: Supercharging Analytics with GPUs: OmniSci/cuDF vs Postgres/Pandas/PDAL

Presentation/Demo video: check out PatternedScience's YouTube channel at https://www.youtube.com/channel/UCjbEIZlS2DA45Bswi5EXWRw

Summary: GPUs are known to significantly accelerate machine learning model training, especially with deep learning libraries like TensorFlow. But did you know that there are now solid options for accelerating data analytics workloads, BI tools and dashboards with GPUs as well? Join us for a presentation of performance benchmarks of GPU-based options and their CPU-based counterparts. We compare the performance of OmniSci Core DB (a GPU database) with that of Postgres DB (for data analytics) and PDAL (for LiDAR processing). On the in-memory side, we benchmark cuDF (NVIDIA's GPU DataFrame) against the widely popular Pandas DataFrame. We will share results and include some code walk-throughs and live benchmarking. Coming out of this technical talk, you will have insight into how GPUs can accelerate your data analytics and geospatial workloads.

Code on GitHub: https://github.com/patternedscience/GPU-Analytics-Perf-Tests

Bio: Masood Krohy is a Data Science Platform Architect/Advisor and most recently acted as the Chief Architect of UniAnalytica, an advanced data science platform with wide, out-of-the-box support for time-series and geospatial use cases. He has worked with several corporations in different industries in the past few years to design, implement and productionize Deep Learning and Big Data products. He holds a Ph.D. in computer engineering.

PatternedScience

April 30, 2019

Transcript

  1. Copyright © 2019, PatternedScience Inc.
    www.patterned.science
    Supercharging Analytics with GPUs
    A GPU-vs-CPU performance benchmark
    (OmniSci Core DB / cuDF GPU DataFrame) vs (Postgres DB / Pandas DataFrame / PDAL)
    Presenter
    Masood Krohy, Ph.D.
    Version 1.2
    Apr 30, 2019

  2. This is an independent benchmark carried out on the
    UniAnalytica platform by PatternedScience Inc.
    The technical talk and meetup are sponsored by
    OmniSci and PatternedScience.

  3. Presentation Layout
    ● Presenter Bio
    ● Data Specs & Summary of Results
    ● Code Walkthrough & Live Benchmarking
    ● Final Notes / Q&A

  4. Presenter Bio
    Masood Krohy
    Data Science Platform Architect & Advisor
    Timeline (2013, 2014, 2016, 2017, 2018, 2019):
    ● Ph.D. in Computer Engineering
      Analytical modeling of botnets. Validated by data collected in industry. 3 top publications.
    ● Senior Analyst, Rogers
      Managed the analytics reporting/statistical analyses of the national benchmarking program.
    ● Data Scientist, Intact
      First Data Scientist of the company. Led the Big Data mining project for the UBI program.
    ● Lead Data Scientist, CN
      Implemented an object-within-object detection system to detect cracks in railway equipment.
    ● Sr Data Science Advisor, B.Yond
      Implemented a pattern detection system for the stream of alarms coming from telecom devices.
    ● Chief Architect, UniAnalytica (advanced data science platform)
      Platform contains the open-source technologies benchmarked here, among many others.

  5. Processing Times (financial time-series)
    Mean and variance of a column (trading volume)
    In-memory DataFrames:
    ● cuDF (GPU): 8.72 ms
    ● Pandas (CPU): 673 ms
    On-disk Databases:
    ● OmniSci (GPU): 1.6 s
    ● Postgres (CPU): 3.05 s
    Bar chart; bar heights are approximate, numbers are exact.
    First-run result for OmniSci (2nd and subsequent
    runs are faster due to caching)
    Hardware Specs
    ● CPU node/worker: 8 vCPUs, ~60GB RAM
    ● GPU node/worker: V100 GPU, 8 vCPUs, ~60GB RAM
    ● CPU model in both workers: Intel Xeon CPU E5-2686 v4 @ 2.30GHz
    Data Specs
    ● Covering last 20 years for 63 ETF symbols
    ● Average per-minute price (available when traded)
    ● 50 million records
    ● 3.5 GB CSV file size
    ● 6 GB in-memory size (RAM, Pandas DF)
    ● 5 GB in-memory size (GPU memory, cuDF)
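
The Pandas side of a mean-and-variance measurement like this one can be sketched roughly as follows. This is a minimal illustration, not the benchmark code from the GitHub repository: the column name `volume`, the synthetic data and the `time_call` helper are assumptions made for this example, and the row count is kept far below the 50 million records used in the talk.

```python
import time

import numpy as np
import pandas as pd


def time_call(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Synthetic stand-in for the trading-volume column (hypothetical data).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"volume": rng.integers(0, 10_000, size=1_000_000)})

# Mean and sample variance (Pandas uses ddof=1 by default) in one timed call.
(mean, variance), elapsed = time_call(
    lambda: (df["volume"].mean(), df["volume"].var())
)
print(f"mean={mean:.2f} variance={variance:.2f} elapsed={elapsed * 1000:.2f} ms")
```

cuDF is designed to mirror the Pandas API, so a GPU version would typically build the DataFrame with cuDF and keep similar method calls. Which methods are available depends on the installed cuDF version.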

  6. Processing Times (financial time-series)
    Sorting a column (trading volume)
    In-memory DataFrames:
    ● cuDF (GPU): 374 ms
    ● Pandas (CPU): 15.1 s
    On-disk Databases:
    ● OmniSci (GPU): 604 ms
    ● Postgres (CPU): 11.4 s
    Bar chart; bar heights are approximate, numbers are exact.
    First-run result for OmniSci (2nd and subsequent
    runs are faster due to caching)
    Hardware Specs
    ● CPU node/worker: 8 vCPUs, ~60GB RAM
    ● GPU node/worker: V100 GPU, 8 vCPUs, ~60GB RAM
    ● CPU model in both workers: Intel Xeon CPU E5-2686 v4 @ 2.30GHz
    Data Specs
    ● Covering last 20 years for 63 ETF symbols
    ● Average per-minute price (available when traded)
    ● 50 million records
    ● 3.5 GB CSV file size
    ● 6 GB in-memory size (RAM, Pandas DF)
    ● 5 GB in-memory size (GPU memory, cuDF)
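
A column sort of this kind could be timed in Pandas along the following lines. This is only a sketch under stated assumptions: the `volume` column and the random data are hypothetical stand-ins for the real trading-volume column, and the table is much smaller than the one benchmarked.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the trading-volume column (hypothetical data).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"volume": rng.integers(0, 10_000, size=1_000_000)})

# Time a single ascending sort of the column.
start = time.perf_counter()
sorted_volume = df["volume"].sort_values()
elapsed = time.perf_counter() - start

print(f"sorted {len(sorted_volume)} rows in {elapsed * 1000:.1f} ms")
```

Timing a single call like this is noisy, so repeating the measurement and reporting the spread gives a fairer comparison.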

  7. Processing Times (financial time-series)
    Mixed analytics (math ops + sorting) [largest return in %]
    In-memory DataFrames:
    ● cuDF (GPU): 327 ms
    ● Pandas (CPU): 10.7 s
    On-disk Databases:
    ● OmniSci (GPU): 3.6 s
    ● Postgres (CPU): 13.2 s
    Bar chart; bar heights are approximate, numbers are exact.
    First-run result for OmniSci (2nd and subsequent
    runs are faster due to caching)
    Hardware Specs
    ● CPU node/worker: 8 vCPUs, ~60GB RAM
    ● GPU node/worker: V100 GPU, 8 vCPUs, ~60GB RAM
    ● CPU model in both workers: Intel Xeon CPU E5-2686 v4 @ 2.30GHz
    Data Specs
    ● Covering last 20 years for 63 ETF symbols
    ● Average per-minute price (available when traded)
    ● 50 million records
    ● 3.5 GB CSV file size
    ● 6 GB in-memory size (RAM, Pandas DF)
    ● 5 GB in-memory size (GPU memory, cuDF)
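
The slide does not spell out how the "largest return in %" was defined, so the following is one plausible reading rather than the benchmarked query: compute the percentage change of each price relative to the previous record of the same symbol (math ops), then sort to find the largest value. The column names `symbol`, `price` and `return_pct` and the tiny table are assumptions for this example.

```python
import pandas as pd

# Tiny hypothetical price table: two symbols, three per-minute prices each.
df = pd.DataFrame(
    {
        "symbol": ["A", "A", "A", "B", "B", "B"],
        "price": [100.0, 110.0, 99.0, 50.0, 75.0, 60.0],
    }
)

# Math ops: percentage return relative to the previous price of the same symbol.
previous = df.groupby("symbol")["price"].shift(1)
df["return_pct"] = (df["price"] / previous - 1.0) * 100.0

# Sorting: order by return, largest first. The NaN rows produced by each
# symbol's first record are placed last by default. Then take the top row.
largest = df.sort_values("return_pct", ascending=False).iloc[0]
print(largest["symbol"], largest["return_pct"])
```

Combining an element-wise computation with a sort exercises both arithmetic throughput and data movement, which is presumably why the slide labels it "mixed".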

  8. Code Walkthrough & Live Benchmarking

  9. Data: Montreal LiDAR aerial scan
    Benchmarking of an operation on a single tile
    Source: City of Montreal (http://donnees.ville.montreal.qc.ca/dataset/lidar-aerien-2015)

  10. Data: Montreal LiDAR aerial scan
    Sherbrooke St.
    Berri St.

  11. Data: Montreal LiDAR aerial scan
    Cropping a building out of a tile

  12. Processing Times (LiDAR)
    Cropping a building out of a tile
    On-disk database and filesystem:
    ● OmniSci (GPU): 268 ms
    ● PDAL (CPU): 17 s
    Bar chart; bar heights are approximate, numbers are exact.
    First-run result for OmniSci (2nd and subsequent
    runs are faster due to caching)
    Data Specs
    Montreal LiDAR aerial scan, stats of 1 tile:
    ● 18,306,827 points
    ● 82 MB LAZ
    ● 1.6 GB CSV
    ● 681 MB in the OmniSci DB
    Hardware Specs
    ● CPU node/worker: 8 vCPUs, ~60GB RAM
    ● GPU node/worker: V100 GPU, 8 vCPUs, ~60GB RAM
    ● CPU model in both workers: Intel Xeon CPU E5-2686 v4 @ 2.30GHz
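
In a SQL database, cropping a building out of a point-cloud tile can be expressed as a bounding-box filter on the point coordinates. The sketch below illustrates that idea only; it is not the query used in the talk. SQLite stands in for a SQL engine (it is not OmniSci), and the table name `lidar_points`, the coordinates and the rectangular footprint are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a SQL engine (not OmniSci).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lidar_points (x REAL, y REAL, z REAL)")

# A handful of hypothetical points; the real tile holds about 18 million.
points = [
    (10.0, 10.0, 5.0),
    (12.5, 11.0, 30.0),
    (13.0, 14.0, 28.0),
    (40.0, 40.0, 2.0),
    (12.0, 25.0, 1.0),
]
conn.executemany("INSERT INTO lidar_points VALUES (?, ?, ?)", points)

# Hypothetical building footprint, simplified to an axis-aligned rectangle.
x_min, x_max, y_min, y_max = 11.0, 15.0, 10.5, 15.0

# Keep only the points whose x and y fall inside the rectangle.
building = conn.execute(
    "SELECT x, y, z FROM lidar_points "
    "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
    "ORDER BY x",
    (x_min, x_max, y_min, y_max),
).fetchall()
print(building)
conn.close()
```

A real building footprint is usually a polygon rather than a rectangle, in which case the bounding-box filter serves as a cheap first pass before an exact point-in-polygon test.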

  13. Code Walkthrough & Live Benchmarking

  14. Final Notes
    Component versions evaluated
    ● CUDA 10.0
    ● Postgres 10-2
    ● Pandas 0.24
    ● OmniSci (MapD) Core DB v4.4.2
      More specifically, v20190123 (commit 568e77d)
      [just after v4.4.2 due to CUDA 10 support]
    ● cuDF (GPU DataFrame) 0.5.0
    ● PDAL 1.8.0
    Code and results are on GitHub
    https://github.com/patternedscience/GPU-Analytics-Perf-Tests
    We love feedback!
    To discuss technical results and give feedback, please email
    us at [email protected]
    Follow us for future results & announcements!
    https://www.linkedin.com/company/patterned-science/
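
Because benchmark numbers depend on component versions, it can help to record what is installed before re-running any tests. The sketch below is one way to do that for the Python packages only, using the standard library. The helper name `installed_versions` and the distribution names passed to it are assumptions for this example, and non-Python components such as CUDA, Postgres or OmniSci would need to be checked separately.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names below are assumptions; adjust them to your environment.
report = installed_versions(["pandas", "cudf"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```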

  15. Q&A
