Data Ecosystem to Power AI

Nishan Subedi

November 18, 2021

Transcript

  1. Pattern 1: Service Oriented Architecture
     1. Applications manage their own state
     2. Easy to add more services
     3. No natural consistency in data handling mechanisms
     4. Difficult to control for semantic segmentation
  2. Pattern 2: 3NF databases for BI
     • Brittle replication of microservices
     • Application state tightly coupled with reporting
  3. Merging the 3 patterns:
     • Push accountability to source systems (applications)
       ◦ Inner state is published as events without breaking consistency
       ◦ Data / event correctness is baked into the request and part of every release
       ◦ We get accountability of data published by source systems for free!
     • Handle all downstream activities as consumption and processing of events
       ◦ Raw data for ML features
       ◦ Curated data for exploration & analytics
       ◦ Standard stable definitions for reporting
  4. Change in architecture:
     • Applications do not expose internal databases
     • Reporting and warehousing are based on data explicitly logged by applications
     • Real-time KPI dashboards are driven by logged data
     • Featurization uses the same data source as reporting and warehousing
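
To illustrate one logged event stream feeding both a reporting KPI and ML featurization, here is a small sketch. The event shape (`Event` with `user_id`, `kind`, `amount`, `ts`) and the `daily_revenue` and `user_features` functions are hypothetical; the point is only that both views are computed from identical records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str       # hypothetical event types: "page_view" or "purchase"
    amount: float   # purchase amount; 0.0 for other kinds
    ts: int         # event time, epoch seconds (UTC)


def daily_revenue(events):
    """Reporting/KPI view: total purchase amount per UTC day number."""
    totals = defaultdict(float)
    for e in events:
        if e.kind == "purchase":
            totals[e.ts // 86_400] += e.amount
    return dict(totals)


def user_features(events):
    """ML view: per-user features computed from the very same events."""
    feats = defaultdict(lambda: {"views": 0, "purchases": 0, "spend": 0.0})
    for e in events:
        f = feats[e.user_id]
        if e.kind == "page_view":
            f["views"] += 1
        elif e.kind == "purchase":
            f["purchases"] += 1
            f["spend"] += e.amount
    return dict(feats)
```

Since both views read the same records, total `spend` across users matches total daily revenue (up to floating-point rounding). That is the kind of reconciliation that becomes hard when features and reports come from separately replicated databases.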
  5. Characteristics of microservices built with reactive principles:
     1. Autonomy: publishes behavior through an API
     2. Event-driven: produces, consumes and reacts to events
        a. Events are reusable in multiple contexts
        b. Events should include enough state to describe the change
        c. Schemas are backwards compatible
     3. State ownership
        a. Services own their state exclusively, managing and persisting it themselves
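
An automated "schemas are backwards compatible" check might look like the following sketch. The dict-based schema format is invented for this example (it is not Avro, JSON Schema or Protobuf syntax), and the rules are a simplified version of Avro-style schema resolution: a reader on the new schema fills fields missing from old events with their defaults and ignores fields it no longer declares, so a new field without a default, or a changed field type, counts as a break. `backward_compatible` and `read_with` are hypothetical names.

```python
def backward_compatible(old: dict, new: dict) -> list:
    """List reasons a reader on `new` could not decode events written with `old`.

    Schemas map field name -> {"type": ..., optional "default": ...}.
    An empty list means the change is compatible under these simplified rules.
    """
    problems = []
    for name, spec in new.items():
        if name not in old:
            # Old events lack this field, so the reader needs a default to fill in.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            # No type promotion in this sketch: any type change is a break.
            problems.append(
                f"field '{name}' changed type {old[name]['type']} -> {spec['type']}"
            )
    # Fields that exist only in `old` are simply ignored by the new reader.
    return problems


def read_with(schema: dict, event: dict) -> dict:
    """Decode an event written under an older schema, filling in defaults."""
    return {name: event.get(name, spec.get("default")) for name, spec in schema.items()}
```

Running a check like this in CI against the previously released schema is one way to make event correctness part of every release.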
  6. Event-First Domain-Driven Design
     Events provide a record of what happens to each entity and when it happened. They represent facts about the domain and are our sources of truth.
     Creating the domain language: https://www.oreilly.com/library/view/reactive-microsystems/9781491994368/ch04.html
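
Treating events as facts and as the source of truth is the core idea of event sourcing: an entity's current state is derived by replaying its events in order rather than stored as the primary record. A minimal sketch, assuming a hypothetical bank-account domain; `AccountEvent`, `apply` and `current_state` are illustrative names, and ISO-8601 date strings are compared lexically for simplicity.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class AccountEvent:
    account_id: str
    kind: str    # hypothetical domain events: "Opened", "Deposited", "Withdrawn"
    amount: int
    at: str      # when it happened, as an ISO-8601 date string


def apply(state: dict, event: AccountEvent) -> dict:
    """Pure transition: previous state + one fact -> next state."""
    if event.kind == "Opened":
        return {"open": True, "balance": 0}
    if event.kind == "Deposited":
        return {**state, "balance": state["balance"] + event.amount}
    if event.kind == "Withdrawn":
        return {**state, "balance": state["balance"] - event.amount}
    raise ValueError(f"unknown event kind: {event.kind}")


def current_state(log, account_id: str, as_of: Optional[str] = None) -> dict:
    """Replay an account's events (log assumed in occurrence order), optionally up to `as_of`."""
    facts = [
        e for e in log
        if e.account_id == account_id and (as_of is None or e.at <= as_of)
    ]
    return reduce(apply, facts, {"open": False, "balance": 0})
```

Because every event records when it happened, the same replay also answers "what was the state as of a given date", which a table holding only the latest state cannot.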
  7. Batch vs. real-time distinctions
     • Dataflow & the lack of distinction between batch & real time
     • Tolerance to limitations decides batch / real-time use cases
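
The Dataflow model treats batch as a bounded special case of streaming: the same windowed computation runs over both, and the real-time variant must additionally decide when a window is complete. The toy sketch below is plain Python, not the Apache Beam or Google Cloud Dataflow API. It counts events per key in 60-second tumbling windows, and the streaming version closes windows using a simple watermark (latest event time seen minus `allowed_lateness`). `WINDOW`, `batch_counts`, `StreamingCounts` and `allowed_lateness` are hypothetical names chosen for this illustration.

```python
from collections import Counter

WINDOW = 60  # hypothetical tumbling-window size, in seconds


def window_start(ts: int) -> int:
    return ts - ts % WINDOW


def batch_counts(events):
    """Bounded input: count (window, key) pairs over the complete data set."""
    return dict(Counter((window_start(ts), key) for ts, key in events))


class StreamingCounts:
    """Unbounded input: the same counting, applied one event at a time.

    A window is finalized once the watermark (latest event time seen minus
    `allowed_lateness`) passes its end; events for finalized windows are dropped.
    """

    def __init__(self, allowed_lateness: int = 0):
        self.allowed_lateness = allowed_lateness
        self.max_ts = None
        self.pending = Counter()  # counts for windows still open
        self.emitted = {}         # finalized (window, key) -> count
        self.dropped = 0          # events that arrived too late

    def watermark(self) -> float:
        if self.max_ts is None:
            return float("-inf")
        return self.max_ts - self.allowed_lateness

    def add(self, ts: int, key: str) -> None:
        start = window_start(ts)
        if start + WINDOW <= self.watermark():
            self.dropped += 1  # its window was already finalized
            return
        self.pending[(start, key)] += 1
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        self._emit_closed()

    def _emit_closed(self) -> None:
        wm = self.watermark()
        for wk in [wk for wk in self.pending if wk[0] + WINDOW <= wm]:
            self.emitted[wk] = self.pending.pop(wk)

    def flush(self) -> dict:
        """End of input: finalize everything still open, as a batch job would."""
        self.emitted.update(self.pending)
        self.pending.clear()
        return self.emitted
```

With in-order input, or enough allowed lateness, the streaming result equals the batch result; with less lateness, windows are finalized sooner but late events are dropped. That latency-versus-completeness trade-off is one concrete reading of "tolerance to limitations" deciding between batch and real time.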
  8. Outcomes:
     • Standard domain model used consistently across the organization
     • Better accountability structure for data
     • Data quality efforts have wider impact
     • Insights from reports directly translatable to ML models
     • Applications preserve state and aren’t tightly coupled
     • The system as a whole is more robust