OmniSci Converge Community Day

OmniSci
October 21, 2019

Community Day Keynote & Hackathon Kickoff
Speaker: Aaron Williams, VP of Global Community, OmniSci

Lightning Talk: IoT Data Integration with StreamSets for Analytics in OmniSci
Speaker: Pat Patterson, Director of Evangelism, StreamSets

Lightning Talk: Real-time Automatic Identification System (AIS) Data Analytics
Speaker: Umesh Gupta, PhD Student, NC State University

Lightning Talk: Leveraging Data and Analytics to Custom Fit Retail Logistics
Speaker: Madhav Sadhu, Senior IT Director, Tailored Brands

Lightning Talk: AI-Ready Building Blocks for Deep-learning Based Accelerated Workflow and Data Curation
Speaker: Jacci Cenci, Sr. Technical Marketing Engineer, NVIDIA

Lightning Talk: The Future of Energy Market Analytics with OmniSci
Speaker: Alan Lipe, Principal, i2enabled, Inc.

Transcript

  1. Welcome to Community Day

  2. Aaron Williams, VP, Global Community. @_arw_ #OmniSciConverge aaron@omnisci.com /in/aaronwilliams/ /williamsaaron speakerdeck.com/omnisci
  3. So we begin ...

  4. Community Day Schedule
    1:00 Keynote and Lightning Talks: NVIDIA, StreamSets, i2 Enabled, Tailored Brands, NC State
    2:00 Workshops: 1) OmniSci 101: Accelerating the Data Science Workflow (Boole) 2) Creating Custom Viz and Apps Using OmniSci (Lovelace)
    4:40 Leaders Panel: An Exclusive Peek into the Brains of OmniSci
    5:15 Shuttles to the Welcome Reception: Meet outside to return to the Hyatt
  5. Why Are We All Here?

  6. VAST: Volume, Agility, Spatiotemporal

  7. OmniSci for Good
    • Our goal is to accelerate the work good people are doing, using data, to make a difference in our communities
    • Program includes a free OmniSci license for non-profits and researchers
    • To propose a project, or get more info, email: community@omnisci.com
    • Check out the Flint Water team’s talk on Wednesday @ 1:30 pm in Boole
  8. Converge Preview

  9. Converge Data Challenges
    • Crowd-sourced mobile phone data from Tutela
    • Free OmniSci Cloud instances courtesy of AWS and OmniSci
    • 3 interesting data challenges: https://community.omnisci.com/converge-challenge-19
    • Experts from OmniSci and AWS are ready to help
    • Look for the Tech Team table in the exhibit hall, and people with this >> pin on their lanyard
    OmniSci Tech Team: Uyanga, Sean, Dennis, Israel, Mike
  10. Community Talks @ Converge: Tuesday
    1:00 Manipulating Space and Time Without Infinity Stones: Intro to Immerse -- in Lovelace. Dr. Michael Flaxman, Founder, Geodesign Technologies
    2:10 RAPIDS: The Platform Inside and Out -- in Lovelace. Josh Patterson, GM of Data Science, NVIDIA
    3:00 Real-time Automatic Identification System (AIS) Data Analytics -- in Lovelace. Umesh Gupta, PhD Student, NC State University
  11. Community Talks @ Converge: Wednesday
    12:00 DevOps for Data Integration -- in Boole. Pat Patterson, Director of Evangelism, StreamSets
    1:30 Flint Water Crisis: Data-Driven Solutions & Transparency -- in Boole. Jared Webb, Data Scientist, University of Michigan
    2:10 Mapping the Invisible: Finding Places to Build Dwelling Units in Los Angeles -- in Lovelace. Benjamin Pezzillo, CEO, Pactriglo
    3:00 The Galaxy in a Dashboard: Visualizing the Milky Way using OmniSci -- in Lovelace. Samantha Chappell, Data Scientist, UCLA
  12. Pat Patterson, Director of Evangelism, pat@streamsets.com, @metadaddy. IoT Data Integration with StreamSets for Analytics in OmniSci
  13. © StreamSets, Inc. All rights reserved. Use Case ?

  14. JDBC? JDBC

  15. Issue: “When inserting data, it is best to load data in batches rather than loading one row at a time (as you might with a streaming data source). The overhead for loading data is comparatively high for each transaction, regardless of the number of rows you insert.”
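A minimal Python sketch of the batching advice quoted on slide 15: buffer rows from a stream and hand them to the loader in groups rather than one at a time. The row layout and the `load_batch` callback are hypothetical stand-ins, not an OmniSci or StreamSets API; a real pipeline would replace the callback with its own bulk-load call.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical row layout for illustration: (device_id, timestamp, reading).
Row = Tuple[str, float, float]


def batched(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Group a (possibly unbounded) row stream into lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_all(rows: Iterable, batch_size: int, load_batch: Callable[[List], None]) -> int:
    """Call load_batch once per batch and return how many load calls were made."""
    calls = 0
    for batch in batched(rows, batch_size):
        load_batch(batch)  # one transaction per batch, not per row
        calls += 1
    return calls


if __name__ == "__main__":
    stream = ((f"sensor-{i % 4}", float(i), 20.0 + i) for i in range(10))
    calls = load_all(stream, 4, lambda batch: print(f"loading {len(batch)} rows"))
    print(f"{calls} load calls for 10 rows")
```

With 10 rows and a batch size of 4, the loader is invoked 3 times instead of 10, so the per-transaction overhead the quote describes is paid per batch rather than per row.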
  16. Apache Kafka
    Open source stream-processing platform
    High-throughput, low-latency
    Publish/subscribe via commit log
    Decouple producer from consumer
  17. KafkaImporter?

  18. The StreamSets DataOps Platform Data Lake
  19. A Swiss Army Knife for Data
  20. StreamSets Data Collector
    StreamSets Data Collector Edge for constrained environments
    StreamSets Data Collector for maximum connectivity
  21. Solution: StreamSets Data Collector Edge, StreamSets Data Collector, JDBC
  22. F1 Demo

  23. None
  24. thank you Pat Patterson Director of Evangelism pat@streamsets.com @metadaddy

  25. K. Umesh, PhD Student, Center for Geospatial Analytics, NC State University. Vessel AIS Data Analytics
  26. AIS Data
    Information: MMSI, navigation status, SOG & COG, position accuracy, lat-long, timestamp, vessel name, vessel type, source, destination
    Big: Each large vessel on the water reports its characteristics every six minutes; 100 million records every day
    Critical: To handle the speed of streaming big data. For real-time data analytics & action. For interactive visualization
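Taking the two figures on slide 26 at face value, a quick back-of-the-envelope calculation gives a sense of the stream rate. The implied vessel count is derived here under the assumption that every record comes from a vessel reporting exactly once every six minutes; it is not a number from the talk.

```python
# Figures from slide 26.
RECORDS_PER_DAY = 100_000_000
REPORT_INTERVAL_MINUTES = 6

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# One report every six minutes means 240 reports per vessel per day.
reports_per_vessel_per_day = MINUTES_PER_DAY // REPORT_INTERVAL_MINUTES

# Average ingest rate a pipeline would have to sustain.
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Assumption: every record comes from a vessel on the six-minute cadence.
implied_vessels = RECORDS_PER_DAY / reports_per_vessel_per_day

print(f"{reports_per_vessel_per_day} reports per vessel per day")
print(f"about {records_per_second:,.0f} records per second on average")
print(f"about {implied_vessels:,.0f} vessels implied")
```

On these assumptions the average works out to a little over a thousand records per second, which is the kind of sustained rate the "Critical" points on the slide refer to.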
  27. Why Now

  28. Research Questions with AIS Data

  29. Approach

  30. Spatial Relationship
    • Density by vessel type: Analysing trends in data
    • Time based: Studying vessel moving patterns
    • Heat analysis: Identifying busy areas
    • Planning: Knowing the situational route
    • Predictive analysis: Answering different pattern problems
    • Least-cost analysis: Optimization
  31. Future with HP Visual Spatial Analysis

  32. Short Demo

  33. None
  34. thank you

  35. Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems. Delivering the Supply Chain: How Data and Analytics Can Transform Logistics
  36. Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems
    • Managed multiple IT departments with primary focus on Data Engineering and Business Intelligence for the past 20 years.
    • Implemented multiple large data warehouse projects on MPP architecture and on Cloud and enabled companies for self-service analytics.
    • Primary focus at Tailored Brands is to enable technologies so that business decisions can be made faster with little dependency on IT
  37. Who & Why
    Who is Tailored Brands?
    Tailored Brands: Men’s Wearhouse, Jos A. Bank, Joseph Abboud, KNG, Moore's Clothing
    • Biggest Men’s specialty retailer
    • 1,500+ retail locations & Omni-channel
    • 11 Distribution Centers/Hubs
    • 6 National Tailoring Centers
    • 28 Company Owned Depots
    • Fleet of 225 delivery vehicles
    • 3,000 Personnel in Supply Chain Services
    WHY
    • Inconsistent reporting
    • Incomplete merchant visibility
    • DC visibility of inbound volumes
    • Focus – Speed to Customer
    • Inventory Control
  38. Visibility Market
    • Visibility/Traceability market is overcrowded
    • Half of the vendors at any supply chain conference provide visibility solutions
    • Yet, not a single solution will provide end to end visibility
  39. End to End Logistics Visibility
    PO Issued to vendor – ERP
    ASN Issued by vendor – TMS
    Shipment Info – TMS, UPSS
    Reached Yard – UPSS
    Yard to DC – Truck Carriers
    DC Yard Mgmt – TMS
    DC receiving – WMOS
    Sort to Stores – Sortation
    Ship to Stores – Agile
    Small Parcel – Local Fleet, Fedex
    Store Receiving – SIF
  40. Technology Stack: EDI FTP Data Extracts Fedex/UPS/Cheetah API BI Layer Storage Layer Data Replication Tool DB Data Replication MSTR and Tableau are used for reporting and dashboarding Data science Tools to create Labor forecasting models / Delivery estimation ETL/ELT Tool Emails
  41. None
  42. thank you

  43. AI-Ready Building Blocks for Deep-Learning Based Accelerated Workflow and Data Curation. Jacci Cenci, Senior Technical Marketing Engineer at NVIDIA
  44. DRIVE INFRASTRUCTURE AT NVIDIA: Developing AI for AV at Massive Scale
  45. THE VALUE OF AI INFRASTRUCTURE WITH DGX REFERENCE ARCHITECTURES Reference architectures from NVIDIA and leading storage partners SCALABLE PERFORMANCE Simplified, validated, converged infrastructure offers FASTER, SIMPLIFIED DEPLOYMENT TRUSTED EXPERTISE AND SUPPORT Available through select NPN partners as a turnkey solution DGX RA Solution Storage
  46. OBJECTIVE
    • DGX-1 “POD” = DGX-1 Performance Optimized Design
    • Work with Partners to understand the repeatable, GPU-intensive, and revenue-generating automotive AV use cases
    • Help partners develop prescriptive reference architectures that guide customers as they move from single node designs to large scale DGX POD deployments
    DGX POD RA Automotive Solutions DGX POD Config Tailored to Automotive Use Cases Data Ingest / Mining AI Training / Inference Replay Automotive Use Cases + Partner Solution + = AUTOMOTIVE HIGHER EDUCATION HEALTH CARE TELCOS FINANCIAL SERVICES Certification (Validated by RAPS:Lab) [+ ] DGX POD CUDA-X AI Applications DGX-1 POD SOLUTIONS VALUE BASED APPROACH (1)Compute + (2)Storage + (3)Network + (4)Validated Solutions
  47. USE CASES

  48. DESIGNING INFRASTRUCTURE THAT SCALES
    Insights gained from deep learning data centers
    Rack Design: • DL drives close to operational limits • Similarities to HPC best practices
    Networking: • IB or Ethernet based fabric • 100Gbps interconnect • High-bandwidth, ultra-low latency
    Storage: • Datasets range from 10k’s to millions objects • TBs to PBs of storage • High IOPS, throughput, low latency
    Facilities: • Assume higher watts per-rack • Higher FLOPS/watt = DC less floorspace required
    Software: • Scale requires “cluster-aware” software
    Example: • Autonomous vehicle = 1TB / hr • Training sets up to 500 PB • RN50: 113 days to train • Objective: 7 days • 6 simultaneous developers = 97 node cluster
    1500+ node cluster…lessons learned
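The slide does not say how the 97-node figure in its example was derived. One reading that reproduces it, sketched below in Python, assumes that the 113 days is a single-node training time for RN50 (presumably ResNet-50) and that training time shrinks linearly with node count. Both are assumptions for illustration, not statements from the talk.

```python
import math


def nodes_needed(baseline_days: float, target_days: float, developers: int) -> int:
    """Nodes required if training time scales down linearly with node count.

    baseline_days: assumed single-node training time
    target_days:   desired wall-clock training time per job
    developers:    number of jobs that must run at the same time
    """
    if baseline_days <= 0 or target_days <= 0 or developers < 1:
        raise ValueError("days must be positive and developers at least 1")
    speedup_per_job = baseline_days / target_days
    return math.ceil(speedup_per_job * developers)


# Figures from the slide: 113 days baseline, 7-day objective, 6 developers.
print(nodes_needed(113, 7, 6))  # 97
```

Each job needs roughly a 16x speedup (113 / 7), and six concurrent jobs at that speedup round up to 97 nodes. Real clusters rarely scale perfectly linearly, so this is a lower bound under the stated assumptions.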
  49. NVIDIA DRIVE End-to-End Platform for AV Development & Validation
    Ingest Curate Label Train Replay (DNN Validation) Simulate (System Validation) AV Development Develop Test Integrate DNN Development OTA OTA Collect Data On Road Test Validation
  50. NetApp ONTAP AI and OmniSci Solution Architecture Overview
    • OmniSciDB is the ideal SQL engine for IoT big data.
    • Specifically developed to harness the parallel processing power of GPUs, OmniSciDB is capable of unprecedented ingestion speeds and can query up to billions of rows in milliseconds.
    • With OmniSci and the ONTAP AI validated architecture, you get an infrastructure designed to enable reliable high performance from end to end.
    • The NetApp AFF A800 array is capable of feeding data to NVIDIA DGX-1 systems up to four times faster than competing solutions. Read throughput of up to 300GBps per all-flash cluster.
    OmniSciDB
  51. None
  52. thank you

  53. Analyzing Energy Market Data: Challenges and Opportunities. Alan Lipe, Principal at i2 Enabled, Inc.
  54. For over 20 years, Alan has been an executive in the Energy Analysis, Trading and Risk Management space. Alan has been engaged as a business and technical specialist for the past 15 years, focusing specifically on energy market analysis, trading systems, and analytics-based ERP integration. With the founding of i2 Enabled, Inc. in 2017, his focus is currently on building market analysis and visualization platforms for Fortune 500 corporate analytical needs and intelligence capabilities for global government policy needs.
    Email: alan.lipe@i2enabled.com
    LinkedIn: https://www.linkedin.com/in/alan-lipe-516417/
    Website: https://www.i2enabled.com/
  55. Challenge I: Global Changes in Energy Production and Consumption
    Crude Production Trends
    Natural Gas Production Trends
    Global Energy Consumption Needs
  56. Challenge II: Changes in Power Generation Mix
    Example: UK Power Generation Stack
  57. Challenge III: Increasing Data Volume and Dimensionality
    Source/Commodity: • Oil, Gas, Coal, Nuclear • Solar, Wind, Hydro, Geothermal • Fusion?
    Location: • Origin • Destination • Supply Chain
    Time Period: • History • Current • Futures
    * US Energy Infrastructure – Energy Information Administration (EIA)
  58. OmniSci Advantages for Meeting Energy Analytics Challenges
    • On-Demand queries for large data-sets
    • Near real-time analytical processing and visualization
    • SQL for consolidated App Dev and Business Analyst workflows (ex. Immerse/Vega capabilities + BI Tool accessibility)
    • GPU Based Machine Learning Optionality
  59. Initial i2 Use-Case for OmniSci Platform: US Natural Gas Meter Flow Data
    • ~ 40,000 Meters 5 times/day
    • ~ 200MM Rows for initial use-case
    • Immerse/Vega Tooling
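The approximate figures on slide 59 can be related with simple arithmetic. The Python sketch below assumes one row per meter reading and that all meters report for the whole period; the resulting time span is derived here and is not stated in the talk.

```python
# Approximate figures from slide 59 (both marked "~" on the slide).
METERS = 40_000
READINGS_PER_METER_PER_DAY = 5
TOTAL_ROWS = 200_000_000  # "200MM"

# Assumption: one row per meter reading.
rows_per_day = METERS * READINGS_PER_METER_PER_DAY

# Time span the initial data set would cover at that rate.
days_of_history = TOTAL_ROWS / rows_per_day
years_of_history = days_of_history / 365

print(f"{rows_per_day:,} rows per day")
print(f"{days_of_history:,.0f} days, about {years_of_history:.1f} years of history")
```

On these assumptions the initial use case corresponds to roughly a thousand days of meter flow data.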
  60. BI vs OmniSci Comparison
    BI Approach: • Postgres DB • Standard BI Tooling (Tibco Spotfire, Tableau) • Query Time – 1-4 Minutes
    OmniSci Approach: • Immerse and Vega Based UI • Basic GPU Server Instance • Query Time – Milliseconds
    * ~ 200MM Rows of Gas Meter Data
  61. None
  62. thank you

  63. Workshop Sessions 2:15 - 4:30 PM
    OmniSci 101: Accelerating the Data Science Workflow. Location: Boole Room
    Creating Custom Visualizations and Applications Using OmniSci. Location: Lovelace Room