
Navigating the Data Landscape: From Fundamentals to the Future

This session unveils the power of data and its transformative impact on organisations. Join us as we explore key topics, including data-intensive application development, reliable and scalable architectures, diverse data models, query languages, storage and retrieval strategies, replication methods, data pipelines, and future data trends.

Discover why data is crucial for driving business success and innovation. Gain insights into building reliable and scalable applications capable of handling vast amounts of data. Explore different data models such as relational, document, and graph, and learn about their use cases. Delve into query languages and storage options, including in-memory databases, data warehouses, data lakes and lake houses.

Understand data flow models via databases, networks (REST, RPC), and async message passing (Kafka, RabbitMQ). Explore replication techniques, data partitioning strategies, and challenges with transactions in distributed systems. Uncover the significance of data pipelines, batch processing, stream processing, ETL, ELT, and reverse ETL. Look at the ideas behind data platforms and see how different technologies are combined to create strong data ecosystems.

Discover how to leverage your knowledge to create a decentralised and scalable data ecosystem: a data mesh. The session addresses issues related to distributed systems, consistency, consensus and data governance, shedding light on the challenges and considerations involved. Explore the idea of “data as a product” and the essential aspects of managing and deriving value from data assets. Finally, gain insights into emerging data trends and the future of data applications.

Join us in this session to unlock the potential of data, acquire valuable techniques, and prepare for the future of data-driven innovation.

Steve Mann

July 16, 2023

Transcript

  1. CONTENTS

    WHAT IS DATA and its importance; DATA MODELS and query languages; DATA STORAGE and retrieval of data; DATA REPLICATION and data sharding
  2. CONTENTS

    DATA PLATFORM and its use cases; DATA MESH and distributed systems; DATA AS A PRODUCT and future of data; DATA PIPELINE and data sharding
  3. ABOUT ME

    "My name is Mann, and I am not a Data Scientist." Steve Mann: Delhi JUG Leader, Senior Software Engineer at Fynd
  4. WHAT THIS TALK IS / IS NOT

    This talk is around building data applications and around understanding forms of data. It is not a course on big data, and not an overview of data tools.
  5. WHAT IS DATA

    Data is a magical unicorn made up of ones and zeroes, roaming through cyberspace, collecting cat videos and conspiracy theories along the way.
  6. WHY IS DATA IMPORTANT

    Save time; make informed decisions; personalisation and customer understanding; risk management; foster innovation; transparency and accountability
  7. RELATIONAL MODEL

    User Table: user_id, name, email, phone
    Workout Table: workout_id, user_id, exercise_id, date, duration
    Exercise Table: exercise_id, name, difficulty_level
    Fitness Plan Table: plan_id, name, exercises, price
  8. GRAPH MODEL

    Nodes: User 1, User 2, User 3, User 4, Workout Post 1, Workout Post 2
    Edges: "follows" (user to user), "creates" (user to workout post), "comments on" (user to workout post)
  9. TIME SERIES MODEL

    Workout Measurement
      tags: workout_id=1, user_id=101, exercise_id=201; fields: date=2023-07-01, duration=45 minutes
      tags: workout_id=2, user_id=102, exercise_id=202; fields: date=2023-07-02, duration=60 minutes
    Exercise Measurement
      tags: exercise_id=201; fields: name=Push-ups, difficulty_level=Intermediate
      tags: exercise_id=202; fields: name=Squats, difficulty_level=Beginner
  10. QUERY STYLES

    DECLARATIVE STYLE: SQL
    IMPERATIVE STYLE
    MAP REDUCE? Imperative or declarative?
    AGGREGATION PIPELINE: the JavaScript SQL
    YYDT
  11. Client, Server, Database: the client talks to the server through a RESTful API over HTTP

    Who needs a nap when you have REST to keep you RESTed?
  12. SHARDING

    Split a single dataset into partitions, or shards; all shards run on separate nodes; leverage horizontal scaling; increased throughput
  13. SHARDING

    User Table (user_id, name, age): 1 Venkat 31; 2 Josh 28; 3 Ivar 30; 4 Mala 23
    After sharding:
      Shard 1: 1 Venkat 31; 2 Josh 28
      Shard 2: 3 Ivar 30; 4 Mala 23
  14. EXTRACT-TRANSFORM-LOAD (ETL)

    Each OLTP system (Workout DB, Nutrition DB, User DB, Wearable Sales DB) is extracted, transformed, and then loaded into the Data Warehouse (OLAP System).
  15. EXTRACT-LOAD-TRANSFORM (ELT)

    Each OLTP system (Workout DB, Nutrition DB, User DB, Wearable Sales DB) is extracted and loaded as-is into the Data Warehouse / Lake (OLAP System), where the transform step then runs.
  16. DATA PLATFORM

    Application Layer: Web Apps, Microservices, Enterprise Applications
    Security Layer: Authentication, Authorisation, Logging, Alerting
    Storage Layer: OLTP Systems and Databases
    Ingestion Layer: APIs, ETL, ELT, Pub Sub, Data Streams
    Analytics Layer: OLAP Systems, Data Warehouses, Data Lakes
    Data Governance
  17. ISSUES WITH DISTRIBUTED SYSTEMS

    System faults and partial failures; unreliable networks; unreliable system clocks; questionable reality
  18. REVERSE ETL

    Data is extracted from the Data Warehouse / Lake (OLAP System), transformed, and loaded back into the OLTP systems (Workout DB, Nutrition DB, User DB, Wearable Sales DB).
  19. MESH OR NOT TO MESH

    More about process than architecture; domain-centric segregation; federated governance; data as a product
  20. DATA AS A PRODUCT

    DaaP is a mindset; treats data as a valuable asset; data ownership; introduces data interfaces
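The relational model on slide 7 can be sketched with Python's built-in `sqlite3` module. Table and column names follow the slide (the Fitness Plan table is left out for brevity); the column types, foreign keys and sample rows are illustrative assumptions, with IDs borrowed from slide 9 and names from slide 13.

```python
import sqlite3

# In-memory database; schema mirrors tables named on slide 7.
# Column types and foreign keys are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT,
    phone   TEXT
);
CREATE TABLE exercises (
    exercise_id      INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    difficulty_level TEXT
);
CREATE TABLE workouts (
    workout_id  INTEGER PRIMARY KEY,
    user_id     INTEGER REFERENCES users(user_id),
    exercise_id INTEGER REFERENCES exercises(exercise_id),
    date        TEXT,
    duration    INTEGER  -- minutes
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
                 [(101, "Venkat", "venkat@example.com", None),
                  (102, "Josh", "josh@example.com", None)])
conn.executemany("INSERT INTO exercises VALUES (?, ?, ?)",
                 [(201, "Push-ups", "Intermediate"),
                  (202, "Squats", "Beginner")])
conn.executemany("INSERT INTO workouts VALUES (?, ?, ?, ?, ?)",
                 [(1, 101, 201, "2023-07-01", 45),
                  (2, 102, 202, "2023-07-02", 60)])

# A join reassembles the normalised tables into one answer.
rows = conn.execute("""
    SELECT u.name, e.name, w.duration
    FROM workouts w
    JOIN users u     ON u.user_id = w.user_id
    JOIN exercises e ON e.exercise_id = w.exercise_id
    ORDER BY w.workout_id
""").fetchall()
print(rows)  # [('Venkat', 'Push-ups', 45), ('Josh', 'Squats', 60)]
```

Each fact is stored once, and queries recombine tables on shared keys; that is the trade the relational model makes.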
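The graph model on slide 8 stores entities as nodes and relationships as labelled edges. The exact wiring of the slide's edges cannot be recovered from the transcript, so the edges below are hypothetical. The sketch shows how a multi-hop question becomes a walk along edges rather than a chain of joins.

```python
from collections import defaultdict

# Hypothetical edges: the slide's node and edge labels, wired up for illustration.
EDGES = [
    ("User 2", "follows", "User 1"),
    ("User 1", "creates", "Workout Post 1"),
    ("User 4", "creates", "Workout Post 2"),
    ("User 2", "comments on", "Workout Post 1"),
    ("User 3", "comments on", "Workout Post 1"),
    ("User 3", "comments on", "Workout Post 2"),
]

# Index outgoing and incoming edges by label so traversals are direct lookups.
out_edges = defaultdict(lambda: defaultdict(set))
in_edges = defaultdict(lambda: defaultdict(set))
for src, label, dst in EDGES:
    out_edges[src][label].add(dst)
    in_edges[dst][label].add(src)

def commenters_on_followed_posts(user):
    """Who commented on posts created by the people `user` follows?"""
    result = set()
    for followed in out_edges[user]["follows"]:
        for post in out_edges[followed]["creates"]:
            result |= in_edges[post]["comments on"]
    return result

print(sorted(commenters_on_followed_posts("User 2")))  # ['User 2', 'User 3']
```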
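The time series model on slide 9 separates tags (indexed metadata used to filter and group) from fields (the measured values). This sketch uses the slide's two workout points; the `Point` class and `total_duration` helper are hypothetical, and real time-series databases such as InfluxDB provide their own storage and query languages. As on the slide, the date is kept as a field, although time-series databases usually store time as the point's timestamp.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Point:
    measurement: str
    tags: dict    # indexed metadata used to filter and group
    fields: dict  # the recorded values

# The two "Workout" points from slide 9.
points = [
    Point("workout", {"workout_id": "1", "user_id": "101", "exercise_id": "201"},
          {"date": date(2023, 7, 1), "duration_min": 45}),
    Point("workout", {"workout_id": "2", "user_id": "102", "exercise_id": "202"},
          {"date": date(2023, 7, 2), "duration_min": 60}),
]

def total_duration(points, start, end, **tag_filter):
    """Sum workout duration over a date range, optionally filtered by tags."""
    return sum(
        p.fields["duration_min"]
        for p in points
        if p.measurement == "workout"
        and start <= p.fields["date"] <= end
        and all(p.tags.get(k) == v for k, v in tag_filter.items())
    )

print(total_duration(points, date(2023, 7, 1), date(2023, 7, 31)))                 # 105
print(total_duration(points, date(2023, 7, 1), date(2023, 7, 31), user_id="101"))  # 45
```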
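Slide 10 contrasts query styles. Below, one question (total workout minutes per user, over hypothetical data) is answered three ways: an imperative loop that spells out how to compute the result, declarative SQL via `sqlite3` that states what is wanted, and a hand-rolled MapReduce-style map, group and reduce. The MapReduce part is a single-process illustration of the idea, not a distributed framework.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

# Hypothetical workouts: (user_id, duration in minutes).
workouts = [(101, 45), (102, 60), (101, 30)]

# Imperative: spell out *how* to compute total minutes per user.
def totals_imperative(rows):
    totals = {}
    for user_id, minutes in rows:
        totals[user_id] = totals.get(user_id, 0) + minutes
    return totals

# Declarative: state *what* you want; the engine chooses how.
def totals_sql(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE workouts (user_id INTEGER, duration INTEGER)")
    conn.executemany("INSERT INTO workouts VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user_id, SUM(duration) FROM workouts GROUP BY user_id")
    totals = dict(result.fetchall())
    conn.close()
    return totals

# MapReduce style: map emits key/value pairs, pairs are grouped by key
# (the "shuffle"), and reduce folds each group into one value.
def totals_map_reduce(rows):
    mapped = [(user_id, minutes) for user_id, minutes in rows]  # map
    groups = defaultdict(list)
    for key, value in mapped:                                    # shuffle
        groups[key].append(value)
    return {key: reduce(lambda a, b: a + b, values)              # reduce
            for key, values in groups.items()}

assert totals_imperative(workouts) == totals_sql(workouts) == totals_map_reduce(workouts)
print(totals_imperative(workouts))  # {101: 75, 102: 60}
```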
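Slide 11's client, server and database can be sketched with the standard library alone: `http.server` serves a hypothetical `GET /users/<id>` resource as JSON, a dict stands in for the database, and `urllib.request` plays the client. It is a toy for illustration, not a production server.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# A dict stands in for the database behind the server (hypothetical data).
DATABASE = {"101": {"user_id": 101, "name": "Venkat"}}

class UserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Resource-oriented URL: GET /users/<id>
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "users" and parts[1] in DATABASE:
            status, body = 200, DATABASE[parts[1]]
        else:
            status, body = 404, {"error": "not found"}
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):  # silence per-request logging
        pass

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: a plain HTTP GET that returns JSON.
url = f"http://127.0.0.1:{server.server_address[1]}/users/101"
with urlopen(url) as resp:
    user = json.loads(resp.read())
print(user)  # {'user_id': 101, 'name': 'Venkat'}

server.shutdown()
server.server_close()
```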
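Slide 12 describes splitting a dataset into shards that live on separate nodes. Below is a minimal sketch of one common strategy, hash-based sharding; the shard count and keys are hypothetical, and the deck does not say which strategy it has in mind.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical: one shard per node

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is randomised per process for strings, so a
    cryptographic digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Distribute 100 hypothetical user IDs across the shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in range(1, 101):
    shards[shard_for(str(user_id))].append(user_id)

for shard, keys in shards.items():
    print(shard, len(keys))
```

Because the shard is computed as hash modulo the shard count, changing the number of shards remaps most keys; consistent hashing is the usual way to limit that reshuffling.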
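The split on slide 13 (users 1 and 2 on one shard, users 3 and 4 on the other) is consistent with range-based sharding on `user_id`; that reading is an assumption. The sketch reproduces the split with a hypothetical boundary list and `bisect`.

```python
from bisect import bisect_right

# User table from slide 13: (user_id, name, age).
users = [(1, "Venkat", 31), (2, "Josh", 28), (3, "Ivar", 30), (4, "Mala", 23)]

# Hypothetical range boundary: user_id < 3 -> shard 0, user_id >= 3 -> shard 1.
BOUNDARIES = [3]

def shard_for(user_id: int) -> int:
    """Return the index of the key range that contains user_id."""
    return bisect_right(BOUNDARIES, user_id)

shards = [[] for _ in range(len(BOUNDARIES) + 1)]
for row in users:
    shards[shard_for(row[0])].append(row)

print(shards[0])  # [(1, 'Venkat', 31), (2, 'Josh', 28)]
print(shards[1])  # [(3, 'Ivar', 30), (4, 'Mala', 23)]
```

Range sharding keeps neighbouring keys together, which makes range scans cheap, but monotonically increasing IDs all land on the last shard and can turn it into a hot spot.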
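Slide 14's ETL flow transforms data before it reaches the warehouse. This sketch uses hypothetical rows from two of the slide's sources (Workout DB and User DB) and a Python list in place of the warehouse; the cleaning rules are illustrative.

```python
# Hypothetical rows from two of the slide's OLTP sources.
workout_db = [
    {"workout_id": 1, "user_id": 101, "duration": "45 minutes"},
    {"workout_id": 2, "user_id": 102, "duration": "60 minutes"},
]
user_db = [
    {"user_id": 101, "name": "venkat"},
    {"user_id": 102, "name": "josh"},
]

def extract():
    # In practice: queries against each operational database.
    return list(workout_db), list(user_db)

def transform(workouts, users):
    # Clean and reshape *before* loading: parse durations, normalise
    # names, and join into one analytics-friendly row per workout.
    names = {u["user_id"]: u["name"].title() for u in users}
    return [
        {"workout_id": w["workout_id"],
         "user_name": names[w["user_id"]],
         "duration_min": int(w["duration"].split()[0])}
        for w in workouts
    ]

warehouse = []  # stands in for the data warehouse table

def load(rows):
    warehouse.extend(rows)

load(transform(*extract()))
print(warehouse)
```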
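In ELT (slide 15) the raw data is loaded first and transformed afterwards inside the warehouse. Here an in-memory SQLite database stands in for the warehouse or lake, and the transform is plain SQL run where the data already lives; the table names and rows are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stands in for the warehouse / lake

# Extract + load: copy source rows as-is into a raw table (hypothetical data).
raw_workouts = [(1, 101, "45 minutes"), (2, 102, "60 minutes"), (3, 101, "30 minutes")]
warehouse.execute(
    "CREATE TABLE raw_workouts (workout_id INTEGER, user_id INTEGER, duration TEXT)")
warehouse.executemany("INSERT INTO raw_workouts VALUES (?, ?, ?)", raw_workouts)

# Transform: runs later, inside the warehouse, with the warehouse's own SQL engine.
# "45 minutes" -> 45 by taking the text before the first space.
warehouse.execute("""
    CREATE TABLE user_minutes AS
    SELECT user_id,
           SUM(CAST(substr(duration, 1, instr(duration, ' ') - 1) AS INTEGER)) AS total_min
    FROM raw_workouts
    GROUP BY user_id
""")
result = warehouse.execute("SELECT * FROM user_minutes ORDER BY user_id").fetchall()
print(result)  # [(101, 75), (102, 60)]
```

Keeping `raw_workouts` around means a new or corrected transform can be rerun later without going back to the source systems.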
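One consequence of the unreliable system clocks listed on slide 17: if replicas resolve conflicts with last-write-wins on wall-clock timestamps, a small clock skew can silently discard the write that actually happened last. The scenario and numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Write:
    node: str
    value: str
    wall_clock_ms: int  # timestamp taken from the writing node's own clock

def last_write_wins(writes):
    """Keep the write with the highest wall-clock timestamp."""
    return max(writes, key=lambda w: w.wall_clock_ms)

# Hypothetical scenario: node B's clock runs 50 ms behind node A's.
# Node A writes first; node B writes 10 ms later in real time.
SKEW_B_MS = -50
real_time_a = 1_000
real_time_b = real_time_a + 10

writes = [
    Write("A", "goal=30min", real_time_a),
    Write("B", "goal=45min", real_time_b + SKEW_B_MS),  # stamped 960
]

winner = last_write_wins(writes)
print(winner.value)  # goal=30min: B's later write is silently lost
```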
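Reverse ETL (slide 18) sends results computed in the warehouse back into operational systems where applications can use them. In this sketch, with hypothetical tables and a made-up 70-minute threshold, per-user totals from the warehouse become a `segment` field in the User DB.

```python
# Hypothetical warehouse table: per-user totals computed by analytics.
warehouse_user_minutes = [
    {"user_id": 101, "total_min": 75},
    {"user_id": 102, "total_min": 60},
]

# Hypothetical operational table in the User DB, keyed by user_id.
user_db = {
    101: {"name": "Venkat", "segment": None},
    102: {"name": "Josh", "segment": None},
}

def extract_from_warehouse():
    return list(warehouse_user_minutes)

def transform(rows, threshold=70):
    # Turn an analytical metric into a field the application can act on.
    return {r["user_id"]: ("active" if r["total_min"] >= threshold else "casual")
            for r in rows}

def load_into_oltp(segments):
    # Update existing users in the operational store; unknown IDs are skipped.
    for user_id, segment in segments.items():
        if user_id in user_db:
            user_db[user_id]["segment"] = segment

load_into_oltp(transform(extract_from_warehouse()))
print(user_db)
```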
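Slide 20 says data as a product introduces data interfaces and data ownership. One way to picture that is a small contract object: a named product, an owning team, a published schema and a read method that consumers call instead of querying raw tables. The `DataProduct` class and its fields are hypothetical, not a standard API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

@dataclass
class DataProduct:
    """Hypothetical data-product contract."""
    name: str
    owner: str    # the accountable domain team
    schema: dict  # published column name -> type name
    reader: Callable[[], Iterable[dict]] = field(repr=False)

    def read(self):
        # Check each row's columns against the published schema before serving it.
        for row in self.reader():
            if set(row) != set(self.schema):
                raise ValueError(f"{self.name}: row does not match schema: {row}")
            yield row

workouts_product = DataProduct(
    name="workouts.daily",
    owner="fitness-tracking-team",
    schema={"user_id": "int", "date": "str", "duration_min": "int"},
    reader=lambda: [{"user_id": 101, "date": "2023-07-01", "duration_min": 45}],
)

print(list(workouts_product.read()))
```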