Cloudera

Marketing OGZ
September 15, 2023

Transcript

  1. © 2023 Cloudera, Inc. All rights reserved. AGENDA (30 min)

    1. Cloudera: The Hybrid Data Platform Company
    2. Enabling Analytics at Scale: Streaming Data Lifecycle & Machine Learning for Generative AI value creation
    3. Q&A
  2. CLOUDERA - THE HYBRID DATA PLATFORM COMPANY

    Manage and secure the data lifecycle in any cloud or datacenter
    SECURITY | GOVERNANCE | CATALOG | METADATA | INTELLIGENCE
    • 01 Streaming & Data Flow: Collect
    • 02 Data Engineering: Enrich
    • 03 Data Warehouse: Report
    • 04 Operational Database: Serve
    • 05 Machine Learning & AI: Predict
  3. ENABLING ANALYTICS AND INSIGHTS ANYWHERE

    Driving enterprise business value
    • Streaming data sources: clickstream, market data, machine logs, social
    • Enterprise data sources: CRM, customer history, research, compliance data, risk data, lending
    • Platform: real-time streaming engine; analytics & data warehouse; data science / machine learning; centralized data platform (storage & processing)
    • Flows: stream ingest; ingest of data at rest; deploy models
    • Analytics & insights: BI solutions; SQL; predictive analytics (model building, model training, model scoring); actions & alerts; real-time apps
  4. Enterprise Services

    Provisioning, management and monitoring; unified security; edge-to-enterprise governance; single sign-on
    • Edge Management: edge data collection, routing and monitoring (MiNiFi, Edge Flow Manager, NiFi Registry)
    • Flow Management: enterprise data ingestion, transformation and enrichment (Apache NiFi, NiFi Registry)
    • Stream Processing: real-time stream processing at IoT scale (Apache Kafka, Schema Registry)
    • Streaming Analytics: predictive analytics and real-time insights (Kafka Streams, Apache Flink / SSB, Spark Streaming)
    • Streams Management: Streams Messaging Manager, Streams Replication Manager
  5. Apache Kafka

    • Highly reliable distributed messaging system
    • Decouples applications and enables many-to-many patterns
    • Publish-subscribe semantics
    • Horizontally scalable
    • Efficient implementation that operates at speed with big data volumes
    • Organized by topic to support several use cases
    [Diagram: source systems feeding Hadoop, security systems and real-time monitoring, shown once as point-to-point request-response links and once through Kafka as many-to-many publish-subscribe.]
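The decoupling described on this slide can be sketched with a toy in-memory model. This is not the Kafka client API: the `TopicLog` class, the topic name and the group names are all hypothetical, and the sketch only assumes that each topic is an append-only log and that every consumer group tracks its own read position.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)     # topic -> ordered records
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record; the producer knows nothing about its consumers."""
        self._logs[topic].append(record)

    def poll(self, group, topic):
        """Return the records this consumer group has not read yet."""
        start = self._offsets[(group, topic)]
        records = self._logs[topic][start:]
        self._offsets[(group, topic)] = len(self._logs[topic])
        return records


broker = TopicLog()
broker.publish("machine-logs", "disk full on host-1")
broker.publish("machine-logs", "login failure on host-2")

# Two independent consumer groups each receive every record on the topic.
monitoring = broker.poll("real-time-monitoring", "machine-logs")
security = broker.poll("security-systems", "machine-logs")

# A later poll returns only what was published since the group's last read.
broker.publish("machine-logs", "host-1 recovered")
monitoring_next = broker.poll("real-time-monitoring", "machine-logs")
```

Adding a third consumer group needs no change on the producer side, which is the many-to-many property the slide contrasts with point-to-point links.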
  6. KAFKA REPLICATION USE-CASES

    • Disaster Recovery: in the event of a partial or complete datacenter disaster, provide failover/failback to a secondary cluster in a different region / DC.
    • Geo-Locality: active-active geo-localized deployments allow users to access a nearby data center to optimize their architecture for low latency and high performance.
    • Data Movement / Deployment: use Kafka to synchronize data between on-prem applications and cloud deployments.
    • Centralized Analytics: aggregate data from multiple Kafka clusters into one location for organization-wide analytics.
    • Workload Isolation: create different environments for the SDLC (Dev, Test, Prod) and clusters for specific use cases (ETL, ingestion, analytics, etc.).
    • Legal / Compliance: different data storage and security policies require clusters to be created in-region, but data still needs to be shared.
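The centralized analytics use case can be illustrated with a toy replication loop. This is a sketch under stated assumptions, not the Streams Replication Manager or MirrorMaker API: the `Cluster` class, the `replicate` function, the cluster names and the `source.topic` naming of mirrored topics are all hypothetical choices made for the example.

```python
class Cluster:
    """Toy stand-in for a Kafka cluster: topic name -> append-only list of records."""

    def __init__(self, name):
        self.name = name
        self.topics = {}

    def append(self, topic, record):
        self.topics.setdefault(topic, []).append(record)


def replicate(source, target, topic, checkpoints):
    """Copy records not yet mirrored from source into a prefixed topic on target."""
    mirrored = f"{source.name}.{topic}"   # the prefix keeps the origin cluster visible
    start = checkpoints.get((source.name, topic), 0)
    records = source.topics.get(topic, [])
    for record in records[start:]:
        target.append(mirrored, record)
    checkpoints[(source.name, topic)] = len(records)
    return len(records) - start           # number of records copied in this pass


amsterdam, dublin, central = Cluster("ams"), Cluster("dub"), Cluster("central")
amsterdam.append("orders", {"id": 1})
dublin.append("orders", {"id": 2})

checkpoints = {}
copied_first = [replicate(c, central, "orders", checkpoints) for c in (amsterdam, dublin)]

# A second pass copies only the record produced since the last checkpoint.
amsterdam.append("orders", {"id": 3})
copied_second = replicate(amsterdam, central, "orders", checkpoints)
```

Keeping a checkpoint per source topic is what lets a replication pass resume where it stopped; the same idea applies in reverse for failover and failback in the disaster recovery case.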
  7. Example of Bounded and Unbounded join(s)

    • Join 2 Streaming User Event Topics
    • Enrich Stream from Warehouse HR Table
    • Enrich Stream from RT Mart Timesheet Table
    • Filter & Transform
    • Write Streaming Result to Kudu
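The steps named on this slide can be sketched in plain Python. The slide's actual query is not in the transcript, so everything here is an assumption made for illustration: the function names, the field names, the 60-second window, and the use of small in-memory lists and dicts in place of Kafka topics, warehouse tables and a Kudu sink.

```python
def interval_join(clicks, logins, window_s):
    """Stream-stream join: pair events for the same user within window_s seconds."""
    for click in clicks:
        for login in logins:
            same_user = click["user"] == login["user"]
            if same_user and abs(click["ts"] - login["ts"]) <= window_s:
                yield {**click, "device": login["device"]}


def enrich(events, table, fields):
    """Stream-table join: look up each event's user in a bounded table."""
    for event in events:
        row = table.get(event["user"])
        if row is not None:               # inner-join semantics: drop unmatched events
            yield {**event, **{f: row[f] for f in fields}}


# Two "unbounded" event streams, represented by short lists.
clicks = [{"user": "u1", "ts": 10, "page": "/home"},
          {"user": "u2", "ts": 12, "page": "/pay"}]
logins = [{"user": "u1", "ts": 8, "device": "laptop"},
          {"user": "u2", "ts": 500, "device": "phone"}]

# Two bounded lookup tables keyed by user.
hr_table = {"u1": {"dept": "sales"}, "u2": {"dept": "ops"}}
timesheet = {"u1": {"hours": 6}}

joined = interval_join(clicks, logins, window_s=60)
enriched = enrich(enrich(joined, hr_table, ["dept"]), timesheet, ["hours"])
sink = [e for e in enriched if e["hours"] > 0]   # filter, then "write" to a list sink
```

Only the `u1` click survives: the `u2` login falls outside the window, so that pair is never joined. The bounded tables are read by key lookup, while the streams are consumed record by record.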
  8. The Active Data Warehouse with Apache Kudu

    • Sources (via CDF): IoT devices, applications, metrics, logs & files
    • Storage: HDFS / object storage, hot storage, cold storage
    • SQL support
    • Consumers: real-time analytics, alerting, event-driven applications, dashboards
  9. Data Cycle of a Generative AI Model

    • Data Collection and Preparation
    • Scope: define the use case
    • Select: choose an existing model
    • Adapt and Align model: prompt engineering, augmentation, fine tuning, evaluate
    • Application Integration: optimise and deploy your model for inference; integrate a model and build LLM-powered applications
    Inspired by (source citation): Andrew Ng, DeepLearning.AI, Generative AI with LLMs course
  10. HOW ARE LLMS DIFFERENT THAN ANYTHING BEFORE?

    Simplicity, speed and scale, over all your data!
    • Spark / MapReduce. Personas/skills: programmers (complex coding). Response time: hours. Data: PB scale, most data (slower processing; structured and semi-structured data).
    • Hive SQL / SQL. Personas/skills: analysts (semantic queries). Response time: seconds to minutes. Data: TB scale with ETL, high-value data (structured data; the rest is ETL-ed out).
    • LLMs. Personas/skills: everyone (natural language). Response time: milliseconds to seconds. Data: PB scale, all data (structured, semi-structured and unstructured data).
  11. CONTACTS

    Dylienne Every, Engineer | SME ML & Cyber Security ✉ [email protected]
    Rein de Jong, Regional Vice President Benelux ✉ [email protected] 📞 +31 (0) 653 86 57 01