OML usage highlight: Live Demo of Oracle Stream Analytics with OML AutoML UI and OML Services plus Data Mesh

On this weekly Office Hours for Oracle Machine Learning on Autonomous Database, Hadi Javaherian, Senior AppDev and Integration Platform Specialist, explained the benefits of integrating Oracle Stream Analytics with Oracle Machine Learning Services. He showed at a high level how this relates to the concept of Data Mesh, and demonstrated how a user can create models using OML AutoML UI and deploy them in seconds to OML Services, where they become immediately available to Oracle Stream Analytics for real-time scoring.

The Oracle Machine Learning product family helps data scientists, analysts, developers, and IT achieve data science project goals faster while taking full advantage of the Oracle platform.

Oracle Machine Learning Notebooks offers an easy-to-use, interactive, multi-user, collaborative interface based on Apache Zeppelin notebook technology, and supports SQL, PL/SQL, Python and Markdown interpreters. It is available on all Autonomous Database versions and tiers, including the Always Free editions.

OML includes AutoML, which provides automated machine learning algorithm features for algorithm selection, feature selection and model tuning, in addition to a specialized AutoML UI exclusive to the Autonomous Database.

OML Services is also included in Autonomous Database, where you can deploy and manage native in-database OML models as well as ONNX ML models (for classification and regression) built using third-party engines, and can also invoke cognitive text analytics.
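To make the scoring step more concrete, here is a minimal sketch of how a client might prepare REST calls to a model deployed on OML Services. It only builds the requests and sends nothing over the network. The endpoint paths, the `inputRecords` payload shape, the host name, the deployment URI `maint_model`, and the feature name `TEMP` are all assumptions for illustration; check them against the OML Services REST documentation for your Autonomous Database instance.

```python
import json
from urllib import request

# Assumed endpoint paths; verify against your OML Services instance.
TOKEN_PATH = "/omlusers/api/oauth2/v1/token"
SCORE_PATH = "/omlmod/v1/deployment/{uri}/score"


def build_token_request(base_url, username, password):
    """Build (but do not send) a request for an access token."""
    body = {"grant_type": "password", "username": username, "password": password}
    return request.Request(
        base_url.rstrip("/") + TOKEN_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def build_score_request(base_url, deployment_uri, token, records):
    """Build (but do not send) a scoring request for one or more input records."""
    body = {"inputRecords": list(records)}  # assumed payload shape
    return request.Request(
        base_url.rstrip("/") + SCORE_PATH.format(uri=deployment_uri),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, deployment URI, token and feature name.
    req = build_score_request(
        "https://example.invalid", "maint_model", "TOKEN", [{"TEMP": 71.5}]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would perform the actual call. Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a real service.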

Marcos Arancibia

September 21, 2021


  1. OML usage highlight: Live Demo of Oracle Stream Analytics with

    OML AutoML UI and OML Services OML AskTOM Office Hours Hadi Javaherian Solution Engineer, App Dev & Integration Supported by Marcos Arancibia, Mark Hornick and Sherry LaMonica Product Management, Oracle Machine Learning Move the Algorithms; Not the Data! Copyright © 2021, Oracle and/or its affiliates. This Session will be Recorded
  2. Upcoming Sessions Live Demo of Oracle Stream Analytics with OML

    AutoML UI and OML Services Q&A Topics for today Copyright © 2021, Oracle and/or its affiliates 2
  3. Upcoming Sessions

    • November 9 2021 08:00 AM Pacific: OML Usage Highlight: Leveraging OML algorithms in Retail Science platform
    • November 2 2021 08:00 AM Pacific: Weekly Office Hours: OML on Autonomous Database - Ask & Learn
    • October 12 2021 08:00 AM Pacific: OML feature highlight: Time Series analysis with Oracle Machine Learning
    • October 5 2021 08:00 AM Pacific: OML4Py: Using third-party Python packages from Python, SQL and REST
    • September 28 2021 08:00 AM Pacific: Weekly Office Hours: OML on Autonomous Database - Ask & Learn (OML4Py updated Hands-on-Lab)
  4. GGSA and AutoML UI Low Code and No Code Data

    Mesh in Practice Hadi Javaherian Solution Engineer, App Dev & Integration Copyright © 2021 Oracle and/or its affiliates. September 2021
  5. The following is intended to outline our general product direction.

    It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events. Safe Harbor Copyright © 2021 Oracle and/or its affiliates.
  6. Agenda

    1. Data Product
    2. Data Mesh & Principles
    3. GoldenGate Stream Analytics
    4. Stream Analytics – AutoML UI Integration
    5. Demo
    6. Q/A
  7. Data Fabric | Stream Processing | Data Mesh Copyright ©

    2021, Oracle and/or its affiliates 7 Oracle is a Leader in this Space
  8. Copyright © 2021, Oracle and/or its affiliates 9 Business Needs

    User Needs Data Attributes Data Product Value-focused, Data Product Thinking Trusted, Polyglot Streams stream processing events Apps / IoT transactions any pipes ACID trusted data Attributes of a Trusted Data Mesh Decentralized, Multi-Cloud Mesh Upgrade legacy enterprise data architecture, monolithic integration tools and outmoded batch processes Enterprise Data Ledgers events transactions Apps / IoT ACID
  9. - Enterprise Data Mesh: Solutions, Use Cases and Case Studies

    10 Page Data Mesh Attribute DATA PRODUCTS Products of any kind, from raw commodities to items at your local store are produced as assets of value, intended to be consumed and with a specific ‘job to be done.’ Data products can take a variety of forms, depending on the business domain or problem to be solved, and may include: • Analytics – historic/real-time reports & dashboards • Data Sets – data collections in different shapes/formats • Models – domain objects, data models, ML features • Algorithms – ML models, scoring, business rules • Data Services & APIs – docs, payloads, topics, REST APIs… A data product is created for consumers, requiring tracking of additional attributes such as: • Stakeholder Map – who creates and consumes this product? • Packaging, Documentation – how is it consumed? • Purpose & Value – implicit/explicit value? depreciation? • Quality, Consistency – KPIs and SLAs of usage? • Provenance, Lifecycle & Governance – trust & explainability? Data Products Data Assets Business Data Digital Noise higher value & more controls
  10. A Decentralized Mesh Copyright © 2021, Oracle and/or its affiliates

    11 Edge Gateways Edge Multi-Cloud Enterprise Applications Stream Analytics Data Mesh Self-Service GUI filter λ dist. ingest load dist. ingest ingest capture dist. capture dist. capture capture replicat join load capture dist. capture Exadata Cloud@Customer SaaS Data Lake IaaS Analytics
  11. Data Mesh Use Case STREAMING DATA PIPELINES Data / Event

    Services SQL Access Notebooks /ML Marts Data Warehouse Data Visualization Raw Data Zone Curated Data Prepared Data Master Data DATA MESH

    Once ingested into the analytic data stores, there is usually a need for ‘data pipelines’ to prepare and transform the data across different data stages or data zones. This is a process of data refinement often needed for the downstream analytic data products. A Data Mesh can provide an independently governed data pipeline layer that works with the analytic data stores, providing the following core services:

    • Self-service data discovery and data preparation
    • Governance of data resources across domains
    • Data transformation into required data product formats (e.g., streaming ETL)
    • Data verification, by policy, to assure consistency

    These data pipelines should be able to work across different physical data stores (such as marts, warehouses, and lakes) or as a “pushdown data stream” within analytic data platforms that support streaming data, such as Apache Spark and other data lake house technologies.

    Figure 1: a data mesh can create, execute and govern streaming pipelines within a Data Lake
  12. GoldenGate Stream Analytics

    • Ingest Events: 100+ supported sources from OCI-GoldenGate, OCI-Streaming, Oracle Integration Cloud and Oracle IoT Cloud
    • Select Processing Patterns: Rich set of pre-built patterns will dramatically improve developer efficiency and time-to-value
    • Build Event Pipelines: Easily leverage geo-fencing, machine-learning, and other reference data within the stream
    • Serve Data Downstream: Data can be delivered out to Kafka, databases, or easily staged for external ETL jobs

    OCI Streaming • OCI Object storage • OCI Streaming Service • Oracle ATP • Oracle Autonomous DW • Oracle Database
  13. Tangible, Trusted Data Products: Copyright © 2021, Oracle and/or its

    affiliates 14 DB or App Events json avro AutoML Curated Change Streams No Code Data Pipelines Production ML Scoring / Predictions Real-time Dashboards / Alerting Time Series & Spatial Analytics Streaming Data Services
  14. AutoML UI Experiment: OML Simplified

    Pipeline from DB Table to ML Model:

    • Auto Algorithm Selection: Identify in-DB algorithm with better Model Quality; find best algorithm faster than exhaustive search
    • Adaptive Sampling: Identify right sample size for training data; adjust sample for unbalanced data
    • Auto Feature Selection: De-noise data; reduce features by identifying most predictive; improve accuracy and performance
    • Auto Model Tuning: Improves model accuracy; automated tuning of hyperparameters; avoid manual or exhaustive search techniques
  15. Real-time Data Feed Data Pre-Processing Oracle Machine Learning Service Post

    ML Scoring Processing Stages AutoML UI (Create Experiment) Stream Stream Stream OML Services Stream Processed Data Stream to Target GGSA and AutoML UI Integration Generated ML Model Database Deploy ML Model Initial Training Dataset Enriched Training Dataset Stream Send to Autonomous Target Scoring Pipeline Training Pipeline
  16. Data Mesh with Stream Analytics and AutoML UI Copyright ©

    2021, Oracle and/or its affiliates 17 Edge Object Storage Data Mesh Self-Service GUI Upload ingest load dist. ingest capture dist. capture replicat join load Data Lake Analytics Maintenance AutoML UI Enriched Data Scoring Pipeline Maintenance Record Machine Details Maintenance Record • Apache Druid • Apache HDFS • Apache Hive • Apache Ignite • Apache Kafka • AWS S3 • Azure Data Lake Gen2 • Block Storage • Confluent Kafka • Elasticsearch • MongoDB • OCI Object storage • OCI Streaming Service • Oracle ATP • Oracle Autonomous DW • Oracle Database Data Consumer Training Pipeline Events Maintenance Data Send Maintenance Today Data Ledger Training Pipeline
  17. - Enterprise Data Mesh: Solutions, Use Cases and Case Studies

    Case Study: SAILGP – STREAMING ANALYTICS

    People, Process and Methods: Data Product Focus ✓
    Technical Architecture Attributes: Distributed Architecture ✓; Event Driven Ledgers ✓; ACID Support: ingest to DW; Stream Oriented ✓; Analytic Data Focus ✓; Operational Data Focus ✓; Physical & Logical Mesh: physical data
    GoldenGate Use Case: stream analytics, real-time event correlation, ETL, analysis and ingest to DW

    SailGP runs one of the most exciting race venues in the world, with high tech and high-speed sail boats. Live race data and analytics are provided within milliseconds using data mesh tech. Distributed edge technology links race boat, support boat and race helicopter data into streaming pipelines. Telemetry data is streamed into nearby clouds for real-time ETL, analytics and ingest to cloud data warehouse. Data mesh tech uses GoldenGate and Kafka (Oracle Streaming). Stream analytics are used in real-time on race day to assist with support crews and broadcast networks.
  18. Data Mesh CASE STUDY CRITERIA

    [Comparison matrix; rating symbols lost in extraction. Columns: Data Mesh, Data Integration, Meta-Catalog, Microservices, Messaging, Data Lake, Distributed DW. Rows: Data Product Focus (People, Process and Methods); Distributed Architecture, Event Driven Ledgers, ACID Support, Stream Oriented, Analytic Data Focus, Operational Data Focus, Physical & Logical Mesh (Technical Architecture Attributes).]

    There is no single ‘perfect’ example of a Data Mesh. Other software development and data architecture patterns or technology categories exist, and there remains substantial overlap among the most common concepts like Data Fabrics, Microservices Service Mesh, and Data Lake Houses. For this document, we are considering Data Mesh as a type of Data Fabric. Case Studies should have ‘significant solution focus’ using technology with the following attributes:

    • Data Products – driving cultural and process changes that affect cross-organizational data domains, and institutionalize strong management practices around data assets
    • Distributed Architecture – decentralized, microservices-based software architecture patterns
    • Event Driven Ledgers – durable running log of events to drive cross-domain integrations
    • ACID Support – for polyglot streams, empowering correct and trusted data transactions
    • Stream-Oriented – data processing on ‘data in motion’ to drive solution outcomes
    • Analytic Data Focus – data pipelines or data products in the analytics domain (e.g., OLAP)
    • Operational Data Focus – solution focus on operational data outcomes (e.g., OLTP)
    • Physical & Logical Mesh – data is both physically and logically ‘meshed’ together
  19. Transaction Outbox for the Whole Enterprise Copyright © 2021 Oracle

    and/or its affiliates. DB2/z Replication of Real-time Ledger • Transactions • Events GoldenGate Stream Analytics ETL &ML Object Storage Relational Non- Relational Apps DBMS Cloud Big Data NoSQL Streams Data Products Source of Truth
  20. Trusted Data Mesh: Outcomes & Benefits Copyright © 2021, Oracle

    and/or its affiliates Agile, DataOps and CI/CD for Data • Mesh comes to data services Distributed (Big) Data Lake • Fast data from anywhere Polyglot Data Streams • Trusted, consistent data • All payloads and serializations Uninterrupted Continuity • >99.999% up-time SLAs Multi-Cloud Data Liquidity • Data capital is fast data LoB Powered Data Products • Modern self-service data fabric Edge, Location-based Data Services • Correlate IRL device events Predictive Analytics • Data monetization, new ‘data services’ for sale Benefits Data-Driven Business Transformation IT Modernization Reduce Costs for Data Operations Faster Business Innovation Cycles
  21. What Next?

    Ask Oracle for Tech Paper or a demo!

    • Oracle #1 in Data Fabric Strategy: https://blogs.oracle.com/dataintegration/oracle_forresterwave_datafabric_2020?xd_co_f=66bcf41f-e285-4ccc-a5b5-1c790cab0db0
    • GoldenGate YouTube | Data Mesh: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
    • Free Trial of GoldenGate Streaming: https://cloudmarketplace.oracle.com/marketplace/en_US/listing/70961838
    • Customer Success
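The scoring pipeline outlined in the GGSA and AutoML UI integration slide (data pre-processing, ML scoring, post-processing, send to target) can be illustrated with a small generator chain. This is only a conceptual sketch in plain Python, not how GoldenGate Stream Analytics is implemented or configured. The field names, the threshold, and `stub_model` are hypothetical stand-ins for a model deployed on OML Services.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence

Event = Dict[str, Any]


def pre_process(events: Iterable[Event], required: Sequence[str]) -> Iterator[Event]:
    """Verification by policy: keep only events that carry every required field."""
    for event in events:
        if all(field in event for field in required):
            yield event


def score(events: Iterable[Event], model: Callable[[Event], float]) -> Iterator[Event]:
    """Enrich each event with a prediction from the supplied model."""
    for event in events:
        yield {**event, "prediction": model(event)}


def post_process(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Flag events whose prediction reaches the alert threshold."""
    for event in events:
        yield {**event, "alert": event["prediction"] >= threshold}


def stub_model(event: Event) -> float:
    # Hypothetical stand-in for a deployed model: hot machines score 1.0.
    return 1.0 if event["temperature"] > 90.0 else 0.0


if __name__ == "__main__":
    feed = [
        {"machine_id": 1, "temperature": 95.0},
        {"machine_id": 2},  # dropped: missing temperature
        {"machine_id": 3, "temperature": 70.0},
    ]
    required = ["machine_id", "temperature"]
    for out in post_process(score(pre_process(feed, required), stub_model), 0.5):
        print(out)
```

Because each stage is a generator, events flow through one at a time, which mirrors the stream-oriented processing the slides describe. Swapping `stub_model` for a call to a real scoring service would leave the other stages unchanged.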