OmniSci Converge Community Day

Welcome to Community Day

Aaron Williams VP, Global Community @_arw_ #OmniSciConverge [email protected] /in/aaronwilliams/ /williamsaaron
speakerdeck.com/omnisci

So we begin ...

Community Day Schedule
1:00 Keynote and Lightning Talks: NVIDIA, StreamSets, i2 Enabled, Tailored Brands, NC State
2:00 Workshops
1) OmniSci 101: Accelerating the Data Science Workflow (Boole)
2) Creating Custom Viz and Apps Using OmniSci (Lovelace)
4:40 Leaders Panel: An Exclusive Peek into the Brains of OmniSci
5:15 Shuttles to the Welcome Reception: Meet outside to return to the Hyatt

Why Are We All Here?

VAST: Volume, Agility, Spatiotemporal

OmniSci for Good
• Our goal is to accelerate the work good people are doing, using data, to make a difference in our communities
• Program includes a free OmniSci license for non-profits and researchers
• To propose a project, or get more info, email: [email protected]
• Check out the Flint Water team’s talk on Wednesday @ 1:30 pm in Boole

Converge Preview

Converge Data Challenges
• Crowd-sourced mobile phone data from Tutela
• Free OmniSci Cloud instances courtesy of AWS and OmniSci
• 3 interesting data challenges: https://community.omnisci.com/converge-challenge-19
• Experts from OmniSci and AWS are ready to help
• Look for the Tech Team table in the exhibit hall, and people with this >> pin on their lanyard
OmniSci Tech Team: Uyanga, Sean, Dennis, Israel, Mike

Community Talks @ Converge: Tuesday
1:00 Manipulating Space and Time Without Infinity Stones: Intro to Immerse -- in Lovelace
Dr. Michael Flaxman, Founder, Geodesign Technologies
2:10 RAPIDS: The Platform Inside and Out -- in Lovelace
Josh Patterson, GM of Data Science, NVIDIA
3:00 Real-time Automatic Identification System (AIS) Data Analytics -- in Lovelace
Umesh Gupta, PhD Student, NC State University

Community Talks @ Converge: Wednesday
12:00 DevOps for Data Integration -- in Boole
Pat Patterson, Director of Evangelism, StreamSets
1:30 Flint Water Crisis: Data-Driven Solutions & Transparency -- in Boole
Jared Webb, Data Scientist, University of Michigan
2:10 Mapping the Invisible: Finding Places to Build Dwelling Units in Los Angeles -- in Lovelace
Benjamin Pezzillo, CEO, Pactriglo
3:00 The Galaxy in a Dashboard: Visualizing the Milky Way using OmniSci -- in Lovelace
Samantha Chappell, Data Scientist, UCLA

IoT Data Integration with StreamSets for Analytics in OmniSci
Pat Patterson, Director of Evangelism
[email protected] @metadaddy

Issue
“When inserting data, it is best to load data in batches rather than loading one row at a time (as you might with a streaming data source). The overhead for loading data is comparatively high for each transaction, regardless of the number of rows you insert.”
© StreamSets, Inc. All rights reserved.
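
The batching advice quoted above can be sketched as a small buffer that collects streamed rows and writes each full batch in a single transaction. This is only an illustration, not StreamSets or OmniSci code: it uses Python's built-in sqlite3 module as a stand-in database, and the `BatchLoader` class, the `readings` table, and the batch size of 3 are all hypothetical.

```python
import sqlite3


class BatchLoader:
    """Buffer incoming rows and insert them in batches (hypothetical helper)."""

    def __init__(self, conn, batch_size):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        self.transactions = 0  # number of batch inserts issued so far

    def add(self, row):
        # Collect the row; write only when a full batch has accumulated.
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One transaction covers the whole batch, so the per-transaction
        # overhead is paid once per batch instead of once per row.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO readings (sensor_id, value) VALUES (?, ?)",
                self.buffer,
            )
        self.transactions += 1
        self.buffer = []


# sqlite3 stands in for the target database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")

loader = BatchLoader(conn, batch_size=3)
for i in range(7):          # simulate a stream of 7 rows
    loader.add((i, i * 0.5))
loader.flush()              # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

With seven streamed rows and a batch size of three, the loader issues three transactions rather than seven. A real pipeline would typically also flush on a timer so that a slow stream does not leave rows waiting in the buffer.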

Apache Kafka
• Open source stream-processing platform
• High-throughput, low-latency
• Publish/subscribe via commit log
• Decouple producer from consumer
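
To make "publish/subscribe via commit log" concrete, here is a toy in-memory model of the idea. It is not the Kafka client API; the `CommitLog` class and the consumer names are hypothetical. Producers only append to the log, and each consumer keeps its own read offset, which is what decouples the two sides.

```python
class CommitLog:
    """Append-only log; each consumer tracks its own read offset."""

    def __init__(self):
        self.records = []   # the log itself, never rewritten
        self.offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        # Producers append and return the offset of the new record.
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        # Consumers read from their own offset, independent of each other.
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = CommitLog()
for reading in ("t=20.1", "t=20.4", "t=20.9"):
    log.publish(reading)

dashboard_first = log.poll("dashboard", max_records=2)  # reads 2 records
loader_all = log.poll("db-loader")                      # reads all 3
dashboard_rest = log.poll("dashboard")                  # resumes at offset 2
```

Because reading does not remove records, a slow consumer such as a batch database loader can lag behind a fast one without either affecting the producer.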

The StreamSets DataOps Platform
Data Lake

A Swiss Army Knife for Data

StreamSets Data Collector
• StreamSets Data Collector Edge for constrained environments
• StreamSets Data Collector for maximum connectivity

Solution
StreamSets Data Collector Edge, StreamSets Data Collector, JDBC

thank you Pat Patterson Director of Evangelism [email protected] @metadaddy

Vessel AIS Data Analytics
K. Umesh, PhD Student, Center for Geospatial Analytics, NC State University

AIS Data
Information
• MMSI, Navigation status, SOG & COG, Position accuracy, Lat-Long, Timestamp, Vessel name, Vessel type, Source, Destination
Big
• Each large vessel on the water reports its characteristics every six minutes: 100 million records every day
Critical
• To handle the speed of streaming big data
• For real-time data analytics & action
• For interactive visualization
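
As a rough illustration of the record shape and data rate described above, the sketch below lists the slide's fields as a Python dataclass and works out the average ingest rate. The field names, types, and units (knots, degrees) are assumptions made for the example; only the field list, the six-minute interval, and the 100 million records per day come from the slide.

```python
from dataclasses import dataclass, fields

SECONDS_PER_DAY = 24 * 60 * 60
RECORDS_PER_DAY = 100_000_000   # from the slide
REPORT_INTERVAL_MIN = 6         # from the slide


@dataclass
class AISReport:
    """One AIS report; names, types, and units are illustrative assumptions."""
    mmsi: int                # Maritime Mobile Service Identity
    nav_status: str          # navigation status
    sog_knots: float         # speed over ground (unit assumed)
    cog_degrees: float       # course over ground (unit assumed)
    position_accuracy: bool
    lat: float
    lon: float
    timestamp: str
    vessel_name: str
    vessel_type: str
    source: str
    destination: str


# Average ingest rate implied by 100 million records per day.
avg_records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Reports per vessel per day at one report every six minutes.
reports_per_vessel_per_day = 24 * 60 // REPORT_INTERVAL_MIN

field_count = len(fields(AISReport))
```

The average works out to a little over 1,150 records per second, and each vessel contributes 240 reports per day at the stated interval.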

Why Now

Research Questions with AIS Data

Approach

Spatial Relationship
• Density by vessel type: Analysing trends in data
• Time based: Studying vessel moving patterns
• Heat analysis: Identifying busy areas
• Planning: Knowing the situational route
• Predictive analysis: Answering different pattern problems
• Least-cost analysis: Optimization
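
The "density by vessel type" and "heat analysis" items above can be sketched as simple grid binning: snap each position to a grid cell and count reports per cell and vessel type. This is a minimal illustration and not the speaker's method; the function names, the 0.5-degree cell size, and the sample points are all made up for the example.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg):
    """Map a position to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def density_by_type(points, cell_deg=0.5):
    """Count reports per (grid cell, vessel type)."""
    counts = Counter()
    for lat, lon, vessel_type in points:
        counts[(grid_cell(lat, lon, cell_deg), vessel_type)] += 1
    return counts


# Made-up sample positions: (lat, lon, vessel type).
points = [
    (35.10, -75.20, "cargo"),
    (35.20, -75.40, "cargo"),
    (35.30, -75.10, "tanker"),
    (36.70, -74.90, "cargo"),
]

counts = density_by_type(points)
busiest = counts.most_common(1)[0]   # the cell/type pair with the most reports
```

On a GPU database the same binning would normally be expressed as a SQL GROUP BY over computed cell indices; the Python version only shows the logic.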

Future with HP Visual Spatial Analysis

Short Demo

thank you

Delivering the Supply Chain: How Data and Analytics Can Transform Logistics
Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems

Madhav Sadhu, Senior IT Director, Data Engineering, Supply Chain and Marketing Systems
• Managed multiple IT departments with a primary focus on Data Engineering and Business Intelligence for the past 20 years.
• Implemented multiple large data warehouse projects on MPP architecture and on Cloud, and enabled companies for self-service analytics.
• Primary focus at Tailored Brands is to enable technologies so that business decisions can be made faster with little dependency on IT.

Who & Why
Who is Tailored Brands?
Tailored Brands: Men’s Wearhouse, Jos A. Bank, Joseph Abboud, KNG, Moore's Clothing
• Biggest Men’s specialty retailer
• 1,500+ retail locations & Omni-channel
• 11 Distribution Centers/Hubs
• 6 National Tailoring Centers
• 28 Company Owned Depots
• Fleet of 225 delivery vehicles
• 3,000 Personnel in Supply Chain Services
WHY
• Inconsistent reporting
• Incomplete merchant visibility
• DC visibility of inbound volumes
• Focus – Speed to Customer
• Inventory Control

Visibility Market
• Visibility/Traceability market is overcrowded
• Half of the vendors at any supply chain conference provide visibility solutions
• Yet, not a single solution will provide end-to-end visibility

End to End Logistics Visibility
• PO Issued to vendor – ERP
• ASN Issued by vendor – TMS
• Shipment Info – TMS, UPSS
• Reached Yard – UPSS
• Yard to DC – Truck Carriers
• DC Yard Mgmt – TMS
• DC receiving – WMOS
• Sort to Stores – Sortation
• Ship to Stores – Agile
• Small Parcel – Local Fleet, Fedex
• Store Receiving – SIF

Technology Stack
• EDI
• FTP
• Data Extracts
• Fedex/UPS/Cheetah API
• Emails
• ETL/ELT Tool
• Data Replication Tool
• DB Data Replication
• Storage Layer
• BI Layer: MSTR and Tableau are used for reporting and dashboarding
• Data science tools to create labor forecasting models / delivery estimation

thank you

AI-Ready Building Blocks for Deep-Learning Based Accelerated Workflow and Data
Curation Jacci Cenci, Senior Technical Marketing Engineer at NVIDIA

DRIVE INFRASTRUCTURE AT NVIDIA Developing AI for AV at Massive
Scale

THE VALUE OF AI INFRASTRUCTURE WITH DGX REFERENCE ARCHITECTURES
• SCALABLE PERFORMANCE: Reference architectures from NVIDIA and leading storage partners
• FASTER, SIMPLIFIED DEPLOYMENT: Simplified, validated, converged infrastructure offers
• TRUSTED EXPERTISE AND SUPPORT: Available through select NPN partners as a turnkey solution
DGX RA Solution, Storage

OBJECTIVE
• DGX-1 “POD” = DGX-1 Performance Optimized Design
• Work with partners to understand the repeatable, GPU-intensive, and revenue-generating automotive AV use cases
• Help partners develop prescriptive reference architectures that guide customers moving from single-node designs to large-scale DGX POD deployments
DGX POD RA Automotive Solutions
• DGX POD Config Tailored to Automotive Use Cases: Data Ingest / Mining, AI Training / Inference, Replay
• Automotive Use Cases + Partner Solution + DGX POD
• Industries: AUTOMOTIVE, HIGHER EDUCATION, HEALTH CARE, TELCOS, FINANCIAL SERVICES
• Certification (Validated by RAPS:Lab)
• CUDA-X AI Applications
DGX-1 POD SOLUTIONS: VALUE BASED APPROACH
(1) Compute + (2) Storage + (3) Network + (4) Validated Solutions

USE CASES

DESIGNING INFRASTRUCTURE THAT SCALES
Insights gained from deep learning data centers
1500+ node cluster…lessons learned
Rack Design
• DL drives close to operational limits
• Similarities to HPC best practices
Networking
• IB or Ethernet based fabric
• 100Gbps interconnect
• High-bandwidth, ultra-low latency
Storage
• Datasets range from 10k’s to millions of objects
• TBs to PBs of storage
• High IOPS, throughput, low latency
Facilities
• Assume higher watts per rack
• Higher FLOPS/watt = less DC floorspace required
Software
• Scale requires “cluster-aware” software
Example:
• Autonomous vehicle = 1TB / hr
• Training sets up to 500 PB
• RN50: 113 days to train
• Objective: 7 days
• 6 simultaneous developers = 97 node cluster
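
The example figures on this slide appear to fit together with simple arithmetic. The sketch below is one reading of them and rests on two assumptions that the slide does not state: that the 113-day figure is single-node training time, and that training time shrinks linearly as nodes are added.

```python
import math

baseline_days = 113   # RN50 training time from the slide (assumed single node)
target_days = 7       # stated objective
developers = 6        # simultaneous developers from the slide

# Nodes each developer needs to hit the target, assuming linear scaling.
nodes_per_developer = baseline_days / target_days

# Total cluster size, rounded up to whole nodes.
cluster_nodes = math.ceil(nodes_per_developer * developers)
```

Under those assumptions each developer needs roughly 16 nodes, and six developers together need 97, which matches the cluster size on the slide. Real scaling is rarely perfectly linear, so this is best read as a lower bound.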

NVIDIA DRIVE End-to-End Platform for AV Development & Validation
Diagram labels: Ingest, Curate, Label, Train, Replay (DNN Validation), Simulate (System Validation), AV Development, Develop, Test, Integrate, DNN Development, OTA, Collect Data, On Road Test, Validation

NetApp ONTAP AI and OmniSci Solution Architecture Overview
• OmniSciDB is the ideal SQL engine for IoT big data.
• Specifically developed to harness the parallel processing power of GPUs, OmniSciDB is capable of unprecedented ingestion speeds and can query up to billions of rows in milliseconds.
• With OmniSci and the ONTAP AI validated architecture, you get an infrastructure designed to enable reliable high performance from end to end.
• The NetApp AFF A800 array is capable of feeding data to NVIDIA DGX-1 systems up to four times faster than competing solutions. Read throughput of up to 300GBps per all-flash cluster.

thank you

Analyzing Energy Market Data Challenges and Opportunities Alan Lipe, Principal
at i2 Enabled, Inc.

For over 20 years, Alan has been an executive in the Energy Analysis, Trading and Risk Management space. Alan has been engaged as a business and technical specialist for the past 15 years, focusing specifically on energy market analysis, trading systems, and analytics-based ERP integration. Since founding i2 Enabled, Inc. in 2017, his focus has been on building market analysis and visualization platforms for Fortune 500 corporate analytical needs and intelligence capabilities for global government policy needs.
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/alan-lipe-516417/
Website: https://www.i2enabled.com/

Challenge I: Global Changes in Energy Production and Consumption
• Crude Production Trends
• Natural Gas Production Trends
• Global Energy Consumption Needs

Challenge II: Changes in Power Generation Mix
Example: UK Power Generation Stack

Challenge III: Increasing Data Volume and Dimensionality
Source/Commodity
• Oil, Gas, Coal, Nuclear
• Solar, Wind, Hydro, Geothermal
• Fusion?
Location
• Origin
• Destination
• Supply Chain
Time Period
• History
• Current
• Futures
* US Energy Infrastructure – Energy Information Administration (EIA)

OmniSci Advantages for Meeting Energy Analytics Challenges
• On-Demand queries for large data-sets
• Near real-time analytical processing and visualization
• SQL for consolidated App Dev and Business Analyst workflows (ex. Immerse/Vega capabilities + BI Tool accessibility)
• GPU Based Machine Learning Optionality

Initial i2 Use-Case for OmniSci Platform
US Natural Gas Meter Flow Data
• ~ 40,000 Meters 5 times/day
• ~ 200MM Rows for initial use-case
• Immerse/Vega Tooling
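
The figures above imply a rough size for the history covered by the initial dataset. The calculation below assumes one row per meter reading, which the slide does not state explicitly, and uses the slide's approximate values as if they were exact.

```python
meters = 40_000                   # approximate, from the slide
readings_per_meter_per_day = 5    # from the slide
initial_rows = 200_000_000        # approximate, from the slide

# Rows added per day if each reading is stored as one row (assumption).
rows_per_day = meters * readings_per_meter_per_day

# How much history 200MM rows would represent at that rate.
days_of_history = initial_rows / rows_per_day
years_of_history = days_of_history / 365
```

At 200,000 rows per day, 200MM rows correspond to about 1,000 days, or a little under three years of meter flow data.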

BI vs OmniSci Comparison *
BI Approach
• Postgres DB
• Standard BI Tooling (Tibco Spotfire, Tableau)
• Query Time – 1-4 Minutes
OmniSci Approach
• Immerse and Vega Based UI
• Basic GPU Server Instance
• Query Time – Milliseconds
* ~ 200MM Rows of Gas Meter Data

thank you

Workshop Sessions 2:15 - 4:30 PM
• OmniSci 101: Accelerating the Data Science Workflow. Location: Boole Room
• Creating Custom Visualizations and Applications Using OmniSci. Location: Lovelace Room
