Navigating the Data Landscape: From Fundamentals to the Future

CONTENTS
WHAT IS DATA and its importance
DATA MODELS and query languages
DATA STORAGE and retrieval of data
DATA REPLICATION and data sharding

CONTENTS DATA PLATFORM DATA MESH and its use cases and
distributed systems DATA AS A PRODUCT and future of data and data sharding DATA PIPELINE

HI EVERYONE

ABOUT ME
My name is Mann and I am not a Data Scientist
Steve Mann
Delhi JUG Leader
Senior Software Engineer at Fynd

WHAT THIS TALK IS: around building data applications; around understanding forms of data
WHAT THIS TALK IS NOT: a course on big data; an overview of data tools

WHAT IS DATA

WHAT IS DATA
Data is a magical unicorn made up of ones and zeroes, roaming through cyberspace, collecting cat videos and conspiracy theories along the way.

WHY IS DATA IMPORTANT
Save Time
Make Informed Decisions
Personalisation and Customer Understanding
Risk Management
Foster Innovation
Transparency and Accountability

INTRODUCING Data. Fitness. You

BUILDING DATA INTENSIVE APPLICATIONS
Reliable
Scalable
Maintainable

DATA MODELS
RELATIONAL
DOCUMENT
GRAPH
TIME SERIES

RELATIONAL MODEL
User Table: user_id, name, email, phone
Workout Table: workout_id, exercise_id, date, duration, user_id
Exercise Table: exercise_id, name, difficulty_level
Fitness Plan Table: plan_id, name, exercises, price
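
The relational slide can be sketched with Python's built-in sqlite3 module. Table and column names come from the slide; the plural table name `users`, the column types, and the sample rows (including the user "Asha") are assumptions made for illustration, and the Fitness Plan table is left out for brevity.

```python
import sqlite3

# In-memory database. Column names follow the slide; the types and
# the sample rows below are assumptions, not part of the talk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT);
CREATE TABLE exercise (exercise_id INTEGER PRIMARY KEY, name TEXT, difficulty_level TEXT);
CREATE TABLE workout (
    workout_id INTEGER PRIMARY KEY,
    exercise_id INTEGER REFERENCES exercise(exercise_id),
    date TEXT,
    duration INTEGER,  -- minutes
    user_id INTEGER REFERENCES users(user_id)
);
""")
conn.execute("INSERT INTO users VALUES (101, 'Asha', 'asha@example.com', '555-0101')")
conn.execute("INSERT INTO exercise VALUES (201, 'Push-ups', 'Intermediate')")
conn.execute("INSERT INTO workout VALUES (1, 201, '2023-07-01', 45, 101)")

# A workout row only stores ids; joins bring the related rows back together.
rows = conn.execute("""
    SELECT u.name, e.name, w.duration
    FROM workout AS w
    JOIN users AS u ON u.user_id = w.user_id
    JOIN exercise AS e ON e.exercise_id = w.exercise_id
""").fetchall()
print(rows)
```

The point of the sketch is normalisation: each fact lives in one table, and relationships are expressed through shared ids rather than nested data.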

DOCUMENT MODEL
User Document
Workout Document
Fitness Plan Document
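
The slide shows only the document names, so the shape below is an assumption: a hypothetical User document that embeds its workouts instead of referencing them by id. Field names are borrowed from the relational slide; the values are made up.

```python
import json

# Hypothetical User document. Embedding workouts inside the user is one
# possible design, not necessarily the one shown in the talk.
user_document = {
    "user_id": 101,
    "name": "Asha",
    "workouts": [
        {
            "workout_id": 1,
            "date": "2023-07-01",
            "duration": 45,
            "exercise": {
                "exercise_id": 201,
                "name": "Push-ups",
                "difficulty_level": "Intermediate",
            },
        },
    ],
}

# Documents are typically stored and exchanged as JSON-like text.
encoded = json.dumps(user_document)
decoded = json.loads(encoded)

# Everything about the user arrives in one read, with no joins.
total_minutes = sum(w["duration"] for w in decoded["workouts"])
print(total_minutes)
```

The trade-off is locality versus duplication: one read fetches everything about a user, but the same exercise details may be repeated across many documents.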

GRAPH MODEL
Nodes: User 1, User 2, User 3, User 4, Workout Post 1, Workout Post 2
Edges: follows, creates, comments on
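
A graph like this can be held as a list of (source, relationship, target) triples. The node and relationship names are from the slide, but the transcript does not preserve which node connects to which, so every edge below is an assumption.

```python
from collections import defaultdict

# (source, relationship, target) triples. The endpoints are assumed;
# only the node names and relationship types come from the slide.
edges = [
    ("User 1", "creates", "Workout Post 1"),
    ("User 2", "creates", "Workout Post 2"),
    ("User 3", "follows", "User 1"),
    ("User 3", "comments on", "Workout Post 1"),
    ("User 4", "comments on", "Workout Post 1"),
    ("User 4", "comments on", "Workout Post 2"),
]

# Adjacency list: for each node, the relationships leaving it.
outgoing = defaultdict(list)
for source, relation, target in edges:
    outgoing[source].append((relation, target))

def commenters(post):
    """Return the users with a 'comments on' edge pointing at the post."""
    return sorted(s for s, r, t in edges if r == "comments on" and t == post)

print(commenters("Workout Post 1"))
```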

TIME SERIES MODEL
Workout Measurement
tags: workout_id=1, user_id=101, exercise_id=201, fields: date=2023-07-01, duration=45 minutes
tags: workout_id=2, user_id=102, exercise_id=202, fields: date=2023-07-02, duration=60 minutes
Exercise Measurement
tags: exercise_id=201, fields: name=Push-ups, difficulty_level=Intermediate
tags: exercise_id=202, fields: name=Squats, difficulty_level=Beginner

HOW TO QUERY YOUR DATA?
SELECT * FROM query_languages WHERE type = 'SQL';

QUERY STYLES
IMPERATIVE STYLE: YYDT
DECLARATIVE STYLE: SQL
MAP REDUCE: imperative or declarative?
AGGREGATION PIPELINE: the JavaScript SQL

IMPERATIVE STYLE
You Yourself Do It
YYDT - Pronounced as YUKK 🤢

DECLARATIVE STYLE
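
To contrast the two styles, here is the same question ("which users logged a workout of at least 45 minutes?") answered both ways. This is a minimal sketch; the sample data and the 45-minute threshold are assumptions.

```python
import sqlite3

# Hypothetical sample data.
workouts = [
    {"user_id": 101, "duration": 45},
    {"user_id": 102, "duration": 60},
    {"user_id": 101, "duration": 30},
]

# Imperative: we spell out how to walk the data, step by step.
long_imperative = []
for w in workouts:
    if w["duration"] >= 45:
        long_imperative.append(w["user_id"])

# Declarative: we state what we want and the engine decides how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workout (user_id INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO workout VALUES (:user_id, :duration)", workouts)
long_declarative = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM workout WHERE duration >= 45 ORDER BY rowid"
    )
]

print(long_imperative, long_declarative)
```

Because the SQL version says nothing about loops or ordering of operations, the database is free to use an index or run in parallel without the query changing.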

MAP REDUCE QUERYING
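
A single-process sketch of the map, shuffle, and reduce steps, totalling workout minutes per user. The function names `map_phase` and `reduce_phase` and the sample records are assumptions; a real framework would run these phases across many machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (user_id, duration) records.
workouts = [(101, 45), (102, 60), (101, 30)]

def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    user_id, duration = record
    yield user_id, duration

def reduce_phase(values):
    """Combine all values that share a key."""
    return reduce(lambda a, b: a + b, values)

# Shuffle: group the mapped values by key.
grouped = defaultdict(list)
for record in workouts:
    for key, value in map_phase(record):
        grouped[key].append(value)

totals = {key: reduce_phase(values) for key, values in grouped.items()}
print(totals)
```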

AGGREGATION PIPELINE
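
The idea of an aggregation pipeline is that documents flow through a list of stages, each taking the previous stage's output. The sketch below is plain Python and does not use any real database API; the stage helpers `match`, `group_sum`, and `sort_by` are hypothetical names.

```python
from collections import defaultdict

def match(predicate):
    """Stage: keep only documents that satisfy the predicate."""
    return lambda docs: [d for d in docs if predicate(d)]

def group_sum(key, field):
    """Stage: group by `key` and sum `field` into a 'total' field."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return [{key: k, "total": v} for k, v in totals.items()]
    return stage

def sort_by(field, descending=False):
    """Stage: order documents by one field."""
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=descending)

def run_pipeline(docs, stages):
    """Feed the output of each stage into the next."""
    for stage in stages:
        docs = stage(docs)
    return docs

# Hypothetical sample documents.
workouts = [
    {"user_id": 101, "duration": 45},
    {"user_id": 102, "duration": 60},
    {"user_id": 101, "duration": 30},
    {"user_id": 103, "duration": 10},
]

result = run_pipeline(workouts, [
    match(lambda d: d["duration"] >= 30),
    group_sum("user_id", "duration"),
    sort_by("total", descending=True),
])
print(result)
```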

WHERE TO STORE DATA?
DATABASES
IN-MEMORY
DATA WAREHOUSES
DATA LAKES
OBJECT STORAGE

DATA ENCODING
JSON
XML
AVRO
PARQUET
CSV
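
Two of the listed encodings, JSON and CSV, ship with Python's standard library, so a round trip is easy to sketch. The record is a hypothetical workout row; Avro and Parquet need third-party libraries and are not shown.

```python
import csv
import io
import json

# Hypothetical record to encode.
record = {"workout_id": 1, "user_id": 101, "duration": 45}

# JSON keeps field names and basic types (numbers stay numbers).
json_text = json.dumps(record)
from_json = json.loads(json_text)

# CSV keeps field names in a header row, but every value comes back as text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
from_csv = next(csv.DictReader(io.StringIO(buffer.getvalue())))

print(from_json)
print(from_csv)
```

The difference in what survives the round trip is one reason the choice of encoding matters when data crosses service boundaries.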

PASSING DATA BETWEEN SERVICES?
VIA DATABASE
VIA REST (OR SOAP)
VIA ASYNC MESSAGING

VIA DATABASE
Simple to implement
Needs to be backward compatible
Needs to be forward compatible

DATA OUTLIVES CODE
Data written by new code
Old version of code (missing description)
Description is lost
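
The failure on this slide can be reproduced in a few lines. In this sketch the new code writes a record with a `description` field; an old version of the code, which only knows `user_id` and `name`, updates the record. The function names and sample values are assumptions.

```python
import json

def old_code_update_name_lossy(raw, new_name):
    """Old code that decodes into the fixed shape it knows about.

    Any field it does not know (here: description) is silently dropped
    when the record is written back.
    """
    doc = json.loads(raw)
    known = {"user_id": doc["user_id"], "name": doc["name"]}
    known["name"] = new_name
    return json.dumps(known)

def old_code_update_name_safe(raw, new_name):
    """Old code that changes only what it understands and keeps the rest."""
    doc = json.loads(raw)
    doc["name"] = new_name
    return json.dumps(doc)

# Record written by the new version of the code.
written_by_new_code = json.dumps(
    {"user_id": 101, "name": "Asha", "description": "Runner"}
)

lossy = json.loads(old_code_update_name_lossy(written_by_new_code, "Asha K"))
safe = json.loads(old_code_update_name_safe(written_by_new_code, "Asha K"))
print(lossy)
print(safe)
```

Preserving unknown fields on write is what lets old and new versions of a service share the same database during a rolling upgrade.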

VIA REST
Easier to develop
Flexible (loose coupling)
Popular (community support)
MICRO-SERVICES!

Diagram: Client, RESTful API over HTTP, Server, Database
Who needs a nap when you have REST to keep you RESTed

VIA ASYNC MESSAGING
Event Driven Architecture
Fault Tolerant
Asynchronous Flow
MICRO-SERVICES!

Diagram: Server, Database, Message Broker, Clients, Messages
Huge shoutout to the clients
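
A minimal in-process sketch of asynchronous messaging: a standard-library queue stands in for the message broker, one thread plays the consumer, and the main thread plays the producer. The event shape and the `None` stop signal are assumptions; a real broker would sit between separate services.

```python
import queue
import threading

broker = queue.Queue()  # stands in for a message broker
received = []

def consumer():
    """Process messages as they arrive, until the stop signal."""
    while True:
        message = broker.get()
        if message is None:  # sentinel: no more messages
            break
        received.append(message)

worker = threading.Thread(target=consumer)
worker.start()

# The producer hands each event to the broker and moves on;
# it does not wait for the consumer to process it.
for event in (
    {"type": "workout_logged", "workout_id": 1},
    {"type": "workout_logged", "workout_id": 2},
):
    broker.put(event)

broker.put(None)
worker.join()
print(received)
```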

WE CAN HANDLE DATA!

BUT...

SHARDING
Split a single dataset into partitions or shards
All shards run on separate nodes
Leverage horizontal scaling
Increased throughput

SHARDING
User Table
user_id  name    age
1        Venkat  31
2        Josh    28
3        Ivar    30
4        Mala    23
Sharding: first shard
user_id  name    age
1        Venkat  31
2        Josh    28
Sharding: second shard
user_id  name    age
3        Ivar    30
4        Mala    23
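
The split on this slide puts user_id 1 and 2 on one shard and 3 and 4 on another, which is range-based sharding. The sketch below routes the slide's rows that way; the function names are hypothetical, and the modulo-based `hash_shard` is added only as a common alternative, not something shown in the talk.

```python
# Rows from the User Table on the slide.
users = [
    {"user_id": 1, "name": "Venkat", "age": 31},
    {"user_id": 2, "name": "Josh", "age": 28},
    {"user_id": 3, "name": "Ivar", "age": 30},
    {"user_id": 4, "name": "Mala", "age": 23},
]

def range_shard(user_id):
    """Range-based routing matching the slide: ids 1-2 -> shard 0, 3-4 -> shard 1."""
    return 0 if user_id <= 2 else 1

def hash_shard(user_id, shard_count=2):
    """Alternative (assumption): spread ids across shards by modulo."""
    return user_id % shard_count

shards = {0: [], 1: []}
for user in users:
    shards[range_shard(user["user_id"])].append(user)

for shard_id, rows in shards.items():
    print(shard_id, [row["name"] for row in rows])
```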

DATA REPLICATION
Increased throughput
Scalability
Fault tolerance

LEADERS AND FOLLOWERS
Diagram: User sends Read-Write Query to Leader; Leader sends Data Change over Replication Streams to two Followers
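
Leader-based replication can be sketched with two small classes: writes go to the leader, which forwards each data change to its followers. The class and method names are assumptions, and the replication here is synchronous and in-process for simplicity; real systems stream changes over the network and often asynchronously.

```python
class Follower:
    """Holds a read-only copy that is updated from the replication stream."""

    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value


class Leader:
    """Accepts writes and forwards every data change to its followers."""

    def __init__(self, followers):
        self.data = {}
        self.followers = followers

    def write(self, key, value):
        self.data[key] = value
        for follower in self.followers:  # synchronous, for simplicity
            follower.apply((key, value))


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.write("user:1", {"name": "Venkat", "age": 31})

# Reads can now be served by any follower.
print(followers[0].data["user:1"])
```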

TIME TO DIFFERENTIATE BETWEEN OLTP AND OLAP

DATA PIPELINES
Flexibility
Scalability
Separation of Concerns

EXTRACT-TRANSFORM-LOAD (ETL)
OLTP Systems: Workout DB, Nutrition DB, User DB, Wearable Sales DB
Per source: extract, Transform, load
OLAP System: Data Warehouse
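
A toy ETL run for one source, assuming the Workout DB hands out rows with a text duration such as "45 minutes". The row shape, the `workout_fact` table, and the in-memory SQLite database standing in for the data warehouse are all assumptions.

```python
import sqlite3

# Hypothetical rows as they might sit in the Workout DB (an OLTP system).
workout_db = [
    {"user_id": 101, "date": "2023-07-01", "duration": "45 minutes"},
    {"user_id": 102, "date": "2023-07-02", "duration": "60 minutes"},
    {"user_id": 101, "date": "2023-07-03", "duration": "30 minutes"},
]

def extract(source):
    """Extract: read the rows out of the source system."""
    return list(source)

def transform(rows):
    """Transform: clean '45 minutes' into the integer 45 before loading."""
    return [
        (r["user_id"], r["date"], int(r["duration"].split()[0])) for r in rows
    ]

def load(warehouse, rows):
    """Load: write the cleaned rows into the warehouse table."""
    warehouse.executemany("INSERT INTO workout_fact VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse
warehouse.execute(
    "CREATE TABLE workout_fact (user_id INTEGER, date TEXT, minutes INTEGER)"
)
load(warehouse, transform(extract(workout_db)))

# An analytical (OLAP-style) query over the loaded data.
totals = warehouse.execute(
    "SELECT user_id, SUM(minutes) FROM workout_fact "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

In an ELT variant the raw rows would be loaded first and the cleaning step would run inside the warehouse instead.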

EXTRACT-LOAD-TRANSFORM (ELT)
OLTP Systems: Workout DB, Nutrition DB, User DB, Wearable Sales DB
E & L: extract and load
OLAP System: Data Warehouse / Lake, Transform

DATA PLATFORM
Application Layer: Web Apps, Microservices, Enterprise Applications
Security Layer: Authentication, Authorisation, Logging, Alerting
Storage Layer: OLTP Systems and Databases
Ingestion Layer: APIs, ETL, ELT, Pub Sub, Data Streams
Analytics Layer: OLAP Systems, Data Warehouses, Data Lakes
Data Governance

DATA GOVERNANCE
Security
Access and Availability
Quality
Compliance

ISSUES WITH DISTRIBUTED SYSTEMS
System Faults and Partial Failures
Unreliable Networks
Unreliable System Clocks
Questionable Reality

FUTURE TRENDS

REVERSE ETL
OLAP System: Data Warehouse / Lake
Per target: extract, Transform, load
OLTP Systems: Workout DB, Nutrition DB, User DB, Wearable Sales DB

DATA MESH
Data Infrastructure as a Platform
Domain 1
Domain 2
Domain 3
Domain 4

MESH OR NOT TO MESH
More about process than architecture
Domain Centric Segregation
Federated Governance
Data as a product

DATA AS A PRODUCT
DaaP is a mindset
Treats data as a valuable asset
Data Ownership
Introduces data interfaces

FEEL FREE TO REACH OUT

THANK YOU!
