Slide 1

Slide 1 text

Presentation Unleashing Enterprise Potential: The Importance of Data Collection for Unveiling Opportunities and Driving Growth

Slide 2

Slide 2 text

© 2023 Cloudera, Inc. All rights reserved.

AGENDA (30 min)
1. Cloudera: The Hybrid Data Platform Company
2. Enabling Analytics at Scale: Streaming Data Lifecycle & Machine Learning for Generative AI value creation
3. Q&A

Slide 3

Slide 3 text

CLOUDERA - THE HYBRID DATA PLATFORM COMPANY
Manage and secure the data lifecycle in any cloud or datacenter

SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE

01 STREAMING & DATA FLOW: Collect
02 DATA ENGINEERING: Enrich
03 DATA WAREHOUSE: Report
04 OPERATIONAL DATABASE: Serve
05 MACHINE LEARNING & AI: Predict

Slide 4

Slide 4 text

WE MANAGED 1 BILLION EVENTS PER SECOND

Slide 5

Slide 5 text

ENABLING ANALYTICS AND INSIGHTS ANYWHERE
Driving enterprise business value

REAL-TIME STREAMING ENGINE | ANALYTICS & DATA WAREHOUSE | DATA SCIENCE / MACHINE LEARNING
CENTRALIZED DATA PLATFORM | STORAGE & PROCESSING | ANALYTICS & INSIGHTS
Stream Ingest | Ingest – Data at Rest | Deploy Models
BI Solutions | SQL | Predictive Analytics (Model Building, Model Training, Model Scoring) | Actions & Alerts | [SQL] Real-Time Apps

STREAMING DATA SOURCES: Clickstream, Market data, Machine logs, Social
ENTERPRISE DATA SOURCES: CRM, Customer history, Research, Compliance Data, Risk Data, Lending

Slide 6

Slide 6 text

Enterprise Services: Provisioning, Management and Monitoring | Unified Security | Edge-to-Enterprise Governance | Single Sign-On

• Edge Management (edge data collection, routing and monitoring): MiNiFi, Edge Flow Manager, NiFi Registry
• Flow Management (enterprise data ingestion, transformation and enrichment): Apache NiFi, NiFi Registry
• Stream Processing (real-time stream processing at IoT scale): Apache Kafka, Schema Registry
• Streaming Analytics (predictive analytics and real-time insights): Kafka Streams, Apache Flink / SSB, Spark Streaming
• Streams Management: Streams Messaging Manager, Streams Replication Manager

Slide 7

Slide 7 text

Apache Kafka
• Highly reliable distributed messaging system
• Decouples applications and enables many-to-many patterns
• Publish-subscribe semantics
• Horizontally scalable
• Efficient implementation that operates at speed with big data volumes
• Organized by topic to support several use cases

Many-To-Many, Publish-Subscribe: Source System (x3) -> Kafka -> Hadoop, Security Systems, Real-Time Monitoring
Point-To-Point, Request-Response: Source System (x3) -> Hadoop, Security Systems, Real-Time Monitoring
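The contrast between the two diagrams on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not the Kafka client API: the `InMemoryBroker` class, the topic name `events`, and the consumer names are assumptions made for illustration. It shows two things the slide claims: publishers and subscribers only know the topic, never each other, and each subscriber reads the full log at its own pace. The two helper functions count integrations, assuming every source must reach every sink.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a broker: one append-only log per topic (hypothetical)."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> ordered list of messages
        self.offsets = defaultdict(int)  # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # Producers append to a topic; they never address a consumer directly.
        self.logs[topic].append(message)

    def poll(self, consumer, topic):
        """Return messages this consumer has not seen yet and advance its offset."""
        start = self.offsets[(consumer, topic)]
        batch = self.logs[topic][start:]
        self.offsets[(consumer, topic)] = len(self.logs[topic])
        return batch


def point_to_point_links(sources, sinks):
    # Every source integrates with every sink directly.
    return sources * sinks


def pub_sub_links(sources, sinks):
    # Every system integrates once, with the broker.
    return sources + sinks


broker = InMemoryBroker()
for source in ("source-1", "source-2", "source-3"):
    broker.publish("events", f"{source}: login")

# Two independent subscribers each receive all three messages.
hadoop_batch = broker.poll("hadoop", "events")
monitoring_batch = broker.poll("real-time-monitoring", "events")
# A second poll by the same subscriber returns nothing new.
second_poll = broker.poll("hadoop", "events")
```

With three sources and three sinks, as drawn on the slide, the point-to-point layout needs 3 x 3 = 9 integrations while the publish-subscribe layout needs 3 + 3 = 6, and the gap widens as systems are added.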

Slide 8

Slide 8 text

KAFKA REPLICATION USE-CASES
• Disaster Recovery: In the event of a partial or complete datacenter disaster, provides failover/failback to a secondary cluster in a different region / DC.
• Geo-Locality: Active-active geo-localized deployments allow users to access a nearby data center to optimize their architecture for low latency and high performance.
• Data Movement / Deployment: Use Kafka to synchronize data between on-prem applications and cloud deployments.
• Centralized Analytics: Aggregate data from multiple Kafka clusters into one location for organization-wide analytics.
• Workload Isolation: Creation of different environments for the SDLC (Dev, Test, Prod) and clusters for specific use cases (ETL, ingestion, analytics, etc.).
• Legal / Compliance: Different data storage and security policies require clusters to be created in region, but data still needs to be shared.
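The centralized-analytics case can be illustrated with a small in-memory simulation. This is only a sketch: clusters are modelled as plain dictionaries, the cluster aliases `ams` and `lon` and the topic `orders` are invented, and the `<alias>.<topic>` naming is assumed to mirror the default replication policy of Kafka MirrorMaker 2. A real replication flow also tracks offsets so records are not copied twice; that is left out here.

```python
def replicate(source_alias, source_topics, target_topics):
    """Copy each source topic into the target under '<alias>.<topic>'."""
    for topic, records in source_topics.items():
        remote_name = f"{source_alias}.{topic}"
        target_topics.setdefault(remote_name, []).extend(records)


# Hypothetical regional clusters, each with its own local 'orders' topic.
amsterdam = {"orders": ["a1", "a2"]}
london = {"orders": ["l1"]}
central = {}

replicate("ams", amsterdam, central)
replicate("lon", london, central)

# Organization-wide view: read every replicated 'orders' topic in the central cluster.
all_orders = [
    record
    for name in sorted(central)
    if name.endswith(".orders")
    for record in central[name]
]
```

Keeping the source alias in the topic name preserves where each record came from, which also matters for the legal / compliance case where data is shared across regions but its origin must remain traceable.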

Slide 9

Slide 9 text

Example of Bounded and Unbounded join(s)
• Write Streaming Result to Kudu
• Join 2 Streaming User Event Topics
• Enrich Stream from Warehouse HR Table
• Enrich Stream from RT Mart Timesheet Table
• Filter & Transform
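The steps named on this slide can be sketched in plain Python to show how unbounded (stream-to-stream) and bounded (stream-to-table) joins differ. This is an illustrative model under stated assumptions, not the query shown in the talk: the event fields, the 60-second window, the sample rows, and the list standing in for the Kudu table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    ts: int       # event time in seconds
    action: str


def join_streams(left, right, window_s):
    """Unbounded join: pair events of the same user at most window_s apart."""
    for a in left:
        for b in right:
            if a.user_id == b.user_id and abs(a.ts - b.ts) <= window_s:
                yield {"user_id": a.user_id, "ts": max(a.ts, b.ts),
                       "actions": (a.action, b.action)}


def enrich(rows, table, fields):
    """Bounded join: left-join each row against a lookup table keyed by user_id."""
    for row in rows:
        match = table.get(row["user_id"], {})
        yield {**row, **{f: match.get(f) for f in fields}}


# Hypothetical sample data for the two user event topics and the two tables.
badge_events = [Event("u1", 100, "badge_in"), Event("u2", 105, "badge_in")]
login_events = [Event("u1", 103, "login"), Event("u2", 400, "login")]
hr_table = {"u1": {"department": "finance"}, "u2": {"department": "ops"}}
timesheet_table = {"u1": {"hours_logged": 6}}

joined = join_streams(badge_events, login_events, window_s=60)
with_hr = enrich(joined, hr_table, ["department"])
with_hours = enrich(with_hr, timesheet_table, ["hours_logged"])

# Filter & transform: keep rows with timesheet data, normalize the department.
result = [
    {**row, "department": row["department"].upper()}
    for row in with_hours
    if row["hours_logged"] is not None
]

sink = []  # stand-in for the Kudu table
sink.extend(result)
```

The stream-to-stream join needs a time window because neither input ever ends, whereas the table lookups can be answered immediately from the current contents of the bounded side.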

Slide 10

Slide 10 text

The Active Data Warehouse with Apache Kudu

CDF
IOT Devices | Applications | Metrics | Logs & Files
HDFS / Object Storage | Hot Storage | Cold Storage
SQL support
Real-Time Analytics | Alerting | Event Driven Applications | Dashboards

Slide 11

Slide 11 text

LLMs at Cloudera

Slide 12

Slide 12 text

Data Cycle of a Generative AI Model DA
• Scope: Define the use case
• Select: Choose an existing model
• Adapt and Align model: Prompt Engineering, Augmentation, Fine Tuning, Evaluate
• Application Integration: Integrate a model and build LLM-powered applications; Optimise and deploy your model for inference
• Data Collection and Preparation

Inspired by, Source Citation: Andrew Ng, DeepLearning.AI, Generative AI with LLMs course
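As a rough illustration of how the prompt engineering and augmentation steps fit together, the sketch below retrieves the most relevant document for a question and places it in the prompt before the question. Everything here is an assumption made for illustration: the word-overlap scoring stands in for a real retriever, the documents and prompt wording are invented, and no model is called.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by the number of words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Assemble an augmented prompt: instructions, retrieved context, question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical knowledge snippets collected and prepared beforehand.
documents = [
    "kafka organizes messages by topic",
    "kudu stores hot data for real-time analytics",
]
question = "how does kafka organize messages"
prompt = build_prompt(question, retrieve(question, documents))
```

The resulting `prompt` string is what would be sent to the selected model. The quality of the answer then depends on the collected data as much as on the model, which is why data collection and preparation appears as its own element of the cycle.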

Slide 13

Slide 13 text

HOW ARE LLMS DIFFERENT THAN ANYTHING BEFORE?
Simplicity, speed and scale, over all your data!

TECH: SPARK / MAPREDUCE
  PERSONAS/SKILLS: Programmers (complex coding)
  RESPONSE TIME: Hours
  DATA: PB scale, most data (slower processing; structured data; semi-structured data)

TECH: HIVE SQL / SQL
  PERSONAS/SKILLS: Analysts (semantic queries)
  RESPONSE TIME: Seconds to minutes
  DATA: TB scale with ETL, high-value data (structured data; the rest is ETL-ed out)

TECH: LLMs
  PERSONAS/SKILLS: Everyone (natural language)
  RESPONSE TIME: Milliseconds to seconds
  DATA: PB scale, all data (structured data; semi-structured data; unstructured data)

Slide 14

Slide 14 text

DEMO

Slide 15

Slide 15 text

CONTACTS

Dylienne Every - Engineer | SME ML & Cyber Security
✉ [email protected]

Rein de Jong - Regional Vice President Benelux
✉ [email protected]
📞 +31 (0) 653 86 57 01