
OmniSci Converge Community Day

OmniSci
October 21, 2019


Community Day Keynote & Hackathon Kickoff
Speaker: Aaron Williams, VP of Global Community, OmniSci

Lightning Talk: IoT Data Integration with StreamSets for Analytics in OmniSci
Speaker: Pat Patterson, Director of Evangelism, StreamSets

Lightning Talk: Real-time Automatic Identification System (AIS) Data Analytics
Speaker: Umesh Gupta, PhD Student, NC State University

Lightning Talk: Leveraging Data and Analytics to Custom Fit Retail Logistics
Speaker: Madhav Sadhu, Senior IT Director, Tailored Brands

Lightning Talk: AI-Ready Building Blocks for Deep-learning Based Accelerated Workflow and Data Curation
Speaker: Jacci Cenci, Sr. Technical Marketing Engineer, NVIDIA

Lightning Talk: The Future of Energy Market Analytics with OmniSci
Speaker: Alan Lipe, Principal, i2enabled, Inc.

Transcript

  1. Community Day Schedule
    1:00 Keynote and Lightning Talks: NVIDIA, StreamSets, i2 Enabled, Tailored Brands, NC State
    2:00 Workshops: 1) OmniSci 101: Accelerating the Data Science Workflow (Boole) 2) Creating Custom Viz and Apps Using OmniSci (Lovelace)
    4:40 Leaders Panel: An Exclusive Peek into the Brains of OmniSci
    5:15 Shuttles to the Welcome Reception: Meet outside to return to the Hyatt
  2. OmniSci for Good
    • Our goal is to accelerate the work good people are doing, using data, to make a difference in our communities
    • Program includes a free OmniSci license for non-profits and researchers
    • To propose a project, or get more info, email: [email protected]
    • Check out the Flint Water team’s talk on Wednesday @ 1:30 pm in Boole
  3. Converge Data Challenges
    • Crowd-sourced mobile phone data from Tutela
    • Free OmniSci Cloud instances courtesy of AWS and OmniSci
    • 3 interesting data challenges: https://community.omnisci.com/converge-challenge-19
    • Experts from OmniSci and AWS are ready to help
    • Look for the Tech Team table in the exhibit hall, and people with the pin shown on the slide on their lanyard
    OmniSci Tech Team: Uyanga, Sean, Dennis, Israel, Mike
  4. Community Talks @ Converge: Tuesday
    1:00 Manipulating Space and Time Without Infinity Stones: Intro to Immerse -- in Lovelace. Dr. Michael Flaxman, Founder, Geodesign Technologies
    2:10 RAPIDS: The Platform Inside and Out -- in Lovelace. Josh Patterson, GM of Data Science, NVIDIA
    3:00 Real-time Automatic Identification System (AIS) Data Analytics -- in Lovelace. Umesh Gupta, PhD Student, NC State University
  5. Community Talks @ Converge: Wednesday
    12:00 DevOps for Data Integration -- in Boole. Pat Patterson, Director of Evangelism, StreamSets
    1:30 Flint Water Crisis: Data-Driven Solutions & Transparency -- in Boole. Jared Webb, Data Scientist, University of Michigan
    2:10 Mapping the Invisible: Finding Places to Build Dwelling Units in Los Angeles -- in Lovelace. Benjamin Pezzillo, CEO, Pactriglo
    3:00 The Galaxy in a Dashboard: Visualizing the Milky Way using OmniSci -- in Lovelace. Samantha Chappell, Data Scientist, UCLA
  6. Issue
    “When inserting data, it is best to load data in batches rather than loading one row at a time (as you might with a streaming data source). The overhead for loading data is comparatively high for each transaction, regardless of the number of rows you insert.”
    © StreamSets, Inc. All rights reserved.
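The batching advice quoted above can be illustrated with a minimal Python sketch. It is not StreamSets or OmniSci code: the row layout, the `insert_batch` callback, and the default batch size are all hypothetical, and the callback stands in for whatever bulk-load call a real pipeline would use.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical row layout for illustration: (sensor_id, timestamp, value).
Row = Tuple[str, float, float]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Group a (possibly unbounded) stream of rows into lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(
    rows: Iterable[Row],
    insert_batch: Callable[[List[Row]], None],
    batch_size: int = 1000,
) -> int:
    """Send rows to insert_batch in groups and return the number of transactions used."""
    transactions = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)  # one transaction per batch instead of one per row
        transactions += 1
    return transactions
```

With 2,500 rows and a batch size of 1,000, the sketch issues three load calls instead of 2,500, which is the overhead saving the quote describes.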
  7. Apache Kafka
    • Open source stream-processing platform
    • High-throughput, low-latency
    • Publish/subscribe via commit log
    • Decouple producer from consumer
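To make "publish/subscribe via commit log" concrete, here is a toy in-memory model in Python. It is only a sketch of the idea and not Kafka's API: the `CommitLog` and `Consumer` classes and their methods are invented for illustration.

```python
from typing import Any, List


class CommitLog:
    """Append-only list of records; each record's position is its offset."""

    def __init__(self) -> None:
        self._records: List[Any] = []

    def append(self, record: Any) -> int:
        """Producer side: add a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so consumers never block the producer."""

    def __init__(self, log: CommitLog) -> None:
        self._log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> List[Any]:
        records = self._log.read(self.offset, max_records)
        self.offset += len(records)
        return records
```

Because the log keeps the records and each consumer keeps only a position, a slow or late consumer can catch up independently, which is the decoupling the slide refers to.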
  8. StreamSets Data Collector
    • StreamSets Data Collector Edge for constrained environments
    • StreamSets Data Collector for maximum connectivity
  9. Solution
    Diagram labels: StreamSets Data Collector Edge, StreamSets Data Collector, JDBC
  10. AIS Data
    Information: MMSI, Navigation status, SOG & COG, Position accuracy, Lat-Long, Timestamp, Vessel name, Vessel type, Source, Destination
    Big: Each large vessel on the water reports its characteristics every six minutes, 100 million records every day
    Critical: To handle the speed of streaming big data. For real-time data analytics & action. For interactive visualization
  11. Spatial Relationship
    • Density by vessel type: Analysing trends in data
    • Time based: Studying vessel moving patterns
    • Heat analysis: Identifying busy areas
    • Planning: Knowing the situational route
    • Predictive analysis: Answering different pattern problems
    • Least-cost analysis: Optimization
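One simple way to approach the "heat analysis" item is to bin vessel positions into a latitude/longitude grid and count reports per cell. The Python sketch below is an illustration under that assumption, not the speaker's method; the function names and the default cell size are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float = 0.1) -> Cell:
    """Map a position to the integer index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def busiest_cells(
    positions: Iterable[Tuple[float, float]],
    cell_deg: float = 0.1,
    top: int = 3,
) -> List[Tuple[Cell, int]]:
    """Count position reports per grid cell and return the most frequent cells."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in positions)
    return counts.most_common(top)
```

In a database-backed setup the same grouping would normally be expressed as a SQL aggregate over the position columns; the sketch only shows the binning logic.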
  12. Delivering the Supply Chain: How Data and Analytics Can Transform Logistics
    Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems
  13. Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems
    • Managed multiple IT departments with a primary focus on Data Engineering and Business Intelligence for the past 20 years.
    • Implemented multiple large data warehouse projects on MPP architecture and in the cloud, and enabled companies to adopt self-service analytics.
    • Primary focus at Tailored Brands is to enable technologies so that business decisions can be made faster with little dependency on IT.
  14. Who & Why
    Who is Tailored Brands?
    Tailored Brands: Men’s Wearhouse, Jos A. Bank, Joseph Abboud, KNG, Moore's Clothing
    • Biggest Men’s specialty retailer
    • 1,500+ retail locations & Omni-channel
    • 11 Distribution Centers/Hubs
    • 6 National Tailoring Centers
    • 28 Company Owned Depots
    • Fleet of 225 delivery vehicles
    • 3,000 Personnel in Supply Chain Services
    WHY
    • Inconsistent reporting
    • Incomplete merchant visibility
    • DC visibility of inbound volumes
    • Focus – Speed to Customer
    • Inventory Control
  15. Visibility Market
    • Visibility/Traceability market is overcrowded
    • Half of the vendors at any supply chain conference provide visibility solutions
    • Yet, not a single solution will provide end-to-end visibility
  16. End to End Logistics Visibility
    • PO Issued to vendor – ERP
    • ASN Issued by vendor – TMS
    • Shipment Info – TMS, UPSS
    • Reached Yard – UPSS
    • Yard to DC – Truck Carriers
    • DC Yard Mgmt – TMS
    • DC receiving – WMOS
    • Sort to Stores – Sortation
    • Ship to Stores – Agile
    • Small Parcel – Local Fleet, Fedex
    • Store Receiving – SIF
  17. Technology Stack
    Diagram labels: EDI; FTP; Data Extracts; Fedex/UPS/Cheetah API; Emails; ETL/ELT Tool; Data Replication Tool; DB Data Replication; Storage Layer; BI Layer
    • MSTR and Tableau are used for reporting and dashboarding
    • Data science tools to create labor forecasting models / delivery estimation
  18. AI-Ready Building Blocks for Deep-Learning Based Accelerated Workflow and Data Curation
    Jacci Cenci, Senior Technical Marketing Engineer at NVIDIA
  19. THE VALUE OF AI INFRASTRUCTURE WITH DGX REFERENCE ARCHITECTURES
    • SCALABLE PERFORMANCE: Reference architectures from NVIDIA and leading storage partners
    • FASTER, SIMPLIFIED DEPLOYMENT: Simplified, validated, converged infrastructure offers
    • TRUSTED EXPERTISE AND SUPPORT: Available through select NPN partners as a turnkey solution
    Diagram labels: DGX RA Solution; Storage
  20. DGX-1 POD SOLUTIONS: VALUE BASED APPROACH
    (1) Compute + (2) Storage + (3) Network + (4) Validated Solutions
    OBJECTIVE
    • DGX-1 “POD” = DGX-1 Performance Optimized Design
    • Work with partners to understand the repeatable, GPU-intensive, and revenue-generating automotive AV use cases
    • Help partners develop prescriptive reference architectures that guide customers as they move from single-node designs to large-scale DGX POD deployments
    Diagram labels: DGX POD RA Automotive Solutions; DGX POD Config Tailored to Automotive Use Cases; Data Ingest / Mining; AI Training / Inference; Replay; Automotive Use Cases + Partner Solution; AUTOMOTIVE; HIGHER EDUCATION; HEALTH CARE; TELCOS; FINANCIAL SERVICES; Certification (Validated by RAPS:Lab); DGX POD; CUDA-X AI Applications
  21. DESIGNING INFRASTRUCTURE THAT SCALES
    Insights gained from deep learning data centers: 1500+ node cluster…lessons learned
    Rack Design
    • DL drives close to operational limits
    • Similarities to HPC best practices
    Networking
    • IB or Ethernet based fabric
    • 100Gbps interconnect
    • High-bandwidth, ultra-low latency
    Storage
    • Datasets range from 10k’s to millions of objects
    • TBs to PBs of storage
    • High IOPS, throughput, low latency
    Facilities
    • Assume higher watts per-rack
    • Higher FLOPS/watt = DC less floorspace required
    Software
    • Scale requires “cluster-aware” software
    Example:
    • Autonomous vehicle = 1TB / hr
    • Training sets up to 500 PB
    • RN50: 113 days to train
    • Objective: 7 days
    • 6 simultaneous developers = 97 node cluster
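The slide does not show how the 97-node figure is derived. One reading that reproduces it, assuming the 113 days refers to a single node and that training time scales linearly with node count, is sketched below in Python; the function name and both assumptions are mine, not the speaker's.

```python
import math


def nodes_for_target(single_node_days: float, target_days: float, developers: int) -> int:
    """Estimate cluster size under an idealized linear-scaling assumption.

    Each developer needs single_node_days / target_days nodes to finish a
    training run within target_days; the total is rounded up to whole nodes.
    """
    nodes_per_developer = single_node_days / target_days
    return math.ceil(nodes_per_developer * developers)


# Figures from the slide: 113 days -> 7 days, 6 simultaneous developers.
cluster_size = nodes_for_target(113, 7, 6)
```

Here 113 / 7 is about 16.1 nodes per developer, and six developers need about 96.9 nodes, which rounds up to the 97 quoted on the slide. Real clusters scale less than linearly, so this should be read as a lower bound.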
  22. NVIDIA DRIVE End-to-End Platform for AV Development & Validation
    Diagram labels: Collect Data; Ingest; Curate; Label; Train; DNN Development; Replay (DNN Validation); Simulate (System Validation); AV Development (Develop, Test, Integrate); On Road Test; Validation; OTA
  23. NetApp ONTAP AI and OmniSci Solution Architecture Overview
    • OmniSciDB is the ideal SQL engine for IoT big data.
    • Specifically developed to harness the parallel processing power of GPUs, OmniSciDB is capable of unprecedented ingestion speeds and can query up to billions of rows in milliseconds.
    • With OmniSci and the ONTAP AI validated architecture, you get an infrastructure designed to enable reliable high performance from end to end.
    • The NetApp AFF A800 array is capable of feeding data to NVIDIA DGX-1 systems up to four times faster than competing solutions. Read throughput of up to 300GBps per all-flash cluster.
    Diagram label: OmniSciDB
  24. For over 20 years, Alan has been an executive in the Energy Analysis, Trading and Risk Management space. Alan has been engaged as a business and technical specialist for the past 15 years, focusing specifically on energy market analysis, trading systems, and analytics-based ERP integration. Since founding i2 Enabled, Inc. in 2017, his focus has been on building market analysis and visualization platforms for Fortune 500 corporate analytical needs and intelligence capabilities for global government policy needs.
    Email: [email protected]
    LinkedIn: https://www.linkedin.com/in/alan-lipe-516417/
    Website: https://www.i2enabled.com/
  25. Challenge I: Global Changes in Energy Production and Consumption
    Chart titles: Crude Production Trends; Natural Gas Production Trends; Global Energy Consumption Needs
  26. Challenge III: Increasing Data Volume and Dimensionality
    Source/Commodity
    • Oil, Gas, Coal, Nuclear
    • Solar, Wind, Hydro, Geothermal
    • Fusion?
    Location
    • Origin
    • Destination
    • Supply Chain
    Time Period
    • History
    • Current
    • Futures
    * US Energy Infrastructure – Energy Information Administration (EIA)
  27. OmniSci Advantages for Meeting Energy Analytics Challenges
    • On-demand queries for large data sets
    • Near real-time analytical processing and visualization
    • SQL for consolidated App Dev and Business Analyst workflows (ex. Immerse/Vega capabilities + BI Tool accessibility)
    • GPU Based Machine Learning Optionality
  28. Initial i2 Use-Case for OmniSci Platform
    US Natural Gas Meter Flow Data
    • ~ 40,000 Meters 5 times/day
    • ~ 200MM Rows for initial use-case
    • Immerse/Vega Tooling
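The two figures on this slide can be related with a little arithmetic. The Python sketch below assumes one row per meter reading, which the slide does not state, and uses the approximate values as given.

```python
# Approximate figures from the slide.
METERS = 40_000
READINGS_PER_DAY = 5
TOTAL_ROWS = 200_000_000  # "~200MM rows"

# Assumption (not stated on the slide): each meter reading is stored as one row.
rows_per_day = METERS * READINGS_PER_DAY   # rows added per day
days_of_history = TOTAL_ROWS / rows_per_day
years_of_history = days_of_history / 365
```

Under that assumption the data set grows by about 200,000 rows per day, so roughly 200 million rows corresponds to about 1,000 days, or a little under three years, of readings.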
  29. BI vs OmniSci Comparison*
    BI Approach
    • Postgres DB
    • Standard BI Tooling (Tibco Spotfire, Tableau)
    • Query Time – 1-4 Minutes
    OmniSci Approach
    • Immerse and Vega Based UI
    • Basic GPU Server Instance
    • Query Time – Milliseconds
    * ~ 200MM Rows of Gas Meter Data
  30. Workshop Sessions 2:15 - 4:30 PM
    • OmniSci 101: Accelerating the Data Science Workflow. Location: Boole Room
    • Creating Custom Visualizations and Applications Using OmniSci. Location: Lovelace Room