Slide 1

Slide 1 text

Data Ecosystem to Power AI
Nishan Subedi
Vice President, Algorithms & AI
Overstock.com

Slide 2

Slide 2 text

Overstock.com

Slide 3

Slide 3 text

Pattern 1: Service Oriented Architecture
1. Applications manage their own state
2. Easy to add more services
3. No natural consistency in data handling mechanisms
4. Difficult to control for semantic segmentation

Slide 4

Slide 4 text

Pattern 2: 3NF databases for BI
Brittle replication of microservices
Application state tightly coupled with reporting

Slide 5

Slide 5 text

Pattern 3: Machine Learning Production
https://www.tecton.ai/blog/what-is-a-feature-store/

Slide 6

Slide 6 text

Merging the 3 patterns:
● Push accountability to source systems (applications)
  ○ Inner state is published as events without breaking consistency
  ○ Data / event correctness is baked into the request and part of every release
  ○ We get accountability of data published by source systems for free!
● Handle all downstream activities as consumption and processing of events
  ○ Raw data for ML features
  ○ Curated data for exploration & analytics
  ○ Standard stable definitions for reporting
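A minimal sketch of the idea on this slide, assuming an in-memory list as a stand-in for a durable event log. The `OrderPlaced` event, the `EventLog` class, and the two consumers are hypothetical names chosen for illustration, not the system described in the talk. The source application validates the event when it is created, and every downstream use is just a subscriber to the same log.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event published by an order application."""
    order_id: str
    customer_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        # Correctness is enforced by the source system at publish time,
        # so consumers never see a malformed event.
        if not self.order_id or not self.customer_id:
            raise ValueError("order_id and customer_id are required")
        if self.amount_cents < 0:
            raise ValueError("amount_cents must be non-negative")


@dataclass
class EventLog:
    """In-memory stand-in for a durable event log (assumption)."""
    events: list = field(default_factory=list)
    subscribers: list = field(default_factory=list)

    def subscribe(self, handler: Callable[[OrderPlaced], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: OrderPlaced) -> None:
        self.events.append(event)
        for handler in self.subscribers:
            handler(event)


log = EventLog()

# Consumer 1: keep raw events for ML feature generation.
raw_for_ml: list = []
log.subscribe(raw_for_ml.append)

# Consumer 2: maintain a curated aggregate for analytics and reporting.
revenue_by_customer: dict = {}


def update_revenue(event: OrderPlaced) -> None:
    revenue_by_customer[event.customer_id] = (
        revenue_by_customer.get(event.customer_id, 0) + event.amount_cents
    )


log.subscribe(update_revenue)

log.publish(OrderPlaced("o-1", "c-1", 2500))
log.publish(OrderPlaced("o-2", "c-1", 1000))
```

Neither consumer reads the application's database; both see only what the application chose to publish.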

Slide 7

Slide 7 text

Change in architecture:
● Applications do not expose internal databases
● Reporting and warehousing are based on data explicitly logged by applications
● Real-time KPI dashboards off logged data
● Featurization uses the same data source as reporting and warehousing

Slide 8

Slide 8 text

Characteristics of Microservices built with Reactive Principles:
1. Autonomy: publishes behavior through an API
2. Event-driven: produces, consumes and reacts to events
   a. Events are reusable in multiple contexts
   b. Should include enough state to describe change
   c. Schemas are backwards compatible
3. State ownership
   a. Own their state exclusively by managing and persisting their own state
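As an illustration of points 2b and 2c, here is a small sketch of an event that carries enough state to describe the change and a parser that stays backwards compatible when a field is added. The `PriceChanged` event, its fields, and the `"USD"` default are assumptions made for this example only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceChanged:
    """Hypothetical event; a later schema version adds `currency`."""
    product_id: str
    old_price_cents: int    # prior value, so consumers need no lookup
    new_price_cents: int
    currency: str = "USD"   # new optional field; default keeps old payloads readable


def parse_price_changed(payload: str) -> PriceChanged:
    """Parse a JSON payload, ignoring fields this version does not know."""
    data = json.loads(payload)
    known = {
        key: value
        for key, value in data.items()
        if key in PriceChanged.__dataclass_fields__
    }
    return PriceChanged(**known)


# Payload written before `currency` existed.
v1 = '{"product_id": "p-9", "old_price_cents": 1999, "new_price_cents": 1499}'
# Payload from a newer producer, with an extra field this consumer ignores.
v2 = (
    '{"product_id": "p-9", "old_price_cents": 1499, '
    '"new_price_cents": 1299, "currency": "CAD", "reason": "sale"}'
)

old_event = parse_price_changed(v1)
new_event = parse_price_changed(v2)
```

Because new fields are optional with defaults and unknown fields are dropped, old and new producers can coexist with the same consumer.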

Slide 9

Slide 9 text

Event-First Domain Driven Design
Events provide a record of what happens to each entity and when it happened. They represent facts about the domain and are our sources of truth.
Creating the domain language: https://www.oreilly.com/library/view/reactive-microsystems/9781491994368/ch04.html
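One way to read "events are our sources of truth" is that the current state of an entity can be rebuilt by replaying its events in order. The sketch below assumes a hypothetical shopping cart with `ItemAdded` and `ItemRemoved` events; the names and timestamps are illustrative only.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    quantity: int
    occurred_at: str  # ISO 8601 timestamp: records when the fact happened


@dataclass(frozen=True)
class ItemRemoved:
    cart_id: str
    sku: str
    occurred_at: str


def apply(state: dict, event) -> dict:
    """Return the cart contents after applying one event to the prior state."""
    new_state = dict(state)
    if isinstance(event, ItemAdded):
        new_state[event.sku] = new_state.get(event.sku, 0) + event.quantity
    elif isinstance(event, ItemRemoved):
        new_state.pop(event.sku, None)
    return new_state


history = [
    ItemAdded("cart-1", "sku-a", 2, "2021-01-01T10:00:00Z"),
    ItemAdded("cart-1", "sku-b", 1, "2021-01-01T10:01:00Z"),
    ItemRemoved("cart-1", "sku-a", "2021-01-01T10:02:00Z"),
    ItemAdded("cart-1", "sku-b", 3, "2021-01-01T10:03:00Z"),
]

# Current state is a fold over the full history.
current_cart = reduce(apply, history, {})

# State at any earlier point is a fold over a prefix of the history.
cart_after_two_events = reduce(apply, history[:2], {})
```

The event list, not the derived dictionary, is the record that is kept; the dictionary can always be recomputed.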

Slide 10

Slide 10 text

[Reporting|Featurization|Alerting] as outputs of Event processing:
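A rough sketch of what this could look like, assuming events are plain dictionaries with hypothetical `type` and `user` keys. The three functions below are illustrative; each one is simply a different computation over the same event stream.

```python
from collections import Counter, defaultdict

# Hypothetical events, as logged by applications.
events = [
    {"type": "page_view", "user": "u1"},
    {"type": "checkout_failed", "user": "u1"},
    {"type": "page_view", "user": "u2"},
    {"type": "checkout_failed", "user": "u1"},
    {"type": "order_placed", "user": "u2"},
]


def report(stream) -> Counter:
    """Reporting: count of events per type."""
    return Counter(e["type"] for e in stream)


def featurize(stream) -> dict:
    """Featurization: per-user counts of each event type."""
    per_user = defaultdict(Counter)
    for e in stream:
        per_user[e["user"]][e["type"]] += 1
    return {user: dict(counts) for user, counts in per_user.items()}


def alert(stream, max_failures: int = 1) -> list:
    """Alerting: emit a message when checkout failures exceed a threshold."""
    failures = sum(1 for e in stream if e["type"] == "checkout_failed")
    if failures > max_failures:
        return [f"checkout_failed count {failures} exceeds {max_failures}"]
    return []


kpi = report(events)
features = featurize(events)
alerts = alert(events)
```

Since all three read the same events, a number seen in a report and the matching feature fed to a model come from the same definition.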

Slide 11

Slide 11 text

Batch vs real time distinctions
DataFlow & lack of distinction between batch & real time
● Tolerance to limitations decides batch / real time use cases
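To illustrate the lack of distinction, the sketch below applies one piece of processing logic both to a complete batch and to events arriving one at a time. It does not use any particular streaming framework; `count_by_sku` and the sample events are assumptions for the example.

```python
from collections import Counter
from typing import Iterable


def count_by_sku(events: Iterable[dict], state: Counter) -> Counter:
    """Update running counts per SKU; the same logic serves batch and streaming."""
    for e in events:
        state[e["sku"]] += 1
    return state


events = [{"sku": "a"}, {"sku": "b"}, {"sku": "a"}]

# Batch: process the whole bounded collection at once.
batch_result = count_by_sku(events, Counter())

# Real time: feed events one at a time and observe intermediate results.
stream_state = Counter()
snapshots = []
for e in events:
    count_by_sku([e], stream_state)
    snapshots.append(dict(stream_state))
```

Both paths end in the same result; the streaming path just exposes intermediate answers earlier, which is the trade-off a use case has to tolerate or not.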

Slide 12

Slide 12 text

Outcomes:
● Standard Domain model used consistently across organization
● Better accountability structure for data
● Data quality efforts have wider impact
● Insights from reports directly translatable to ML models
● Applications preserve state and aren’t tightly coupled
● System on the whole is more robust

Slide 13

Slide 13 text

Thank you!
We are hiring: overstock.com/careers