
Distributed Systems - Database Internals

tiemma
April 25, 2025

Based on the book "Database Internals", this covers the distributed systems part of the talk.


Transcript

  1. Who am I? Emmanuel Bakare but people call me “Bakman”

    Senior DevOps Engineer at Twilio Fair amount of distributed systems experience (Replex, AWS)
  2. We will be discussing Distributed Systems Starting off easy: •

    Concurrency and parallelism • Shared state ◦ Inter Process Communication (IPC) ◦ Distributed Memory ▪ RDMA • Networking (reliability) ◦ Partitions and failures • Cascading failures ◦ Retry storms ◦ Chained workflows ▪ Sagas Rounding off hard: • Timing (realtime, logical and monotonic) • Synchronisation • Acknowledgements ◦ Message Ordering ◦ Delivery mechanisms • Processing ◦ Asynchronous ◦ Event Driven • Consensus ◦ CAP ◦ FLP Impossibility ◦ Failure Modes
  3. Notes ⛕ - Means there is a small detour to

    cover additional material. The material covered here will be relatively simplified, but the technical summary of the concepts will still be delivered. A lot of these concepts are a mix of theory and practice. I hope you have fun going through this.
  4. What is a distributed system? As found in the Part

    II chapter titled “Distributed Systems” of Database Internals “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable” - Leslie Lamport
  5. Breakdown of how distributed systems help teams Star Networks -

    Single - Point - Of - Failure Client Master 💀
  6. Breakdown of how distributed systems help teams Mesh Networks -

    Multiple - Point - Of - Redundancy Client Master 😎
  7. Breakdown of how distributed systems help teams Mesh Networks -

    Multiple - Point - Of - Redundancy Client Master 😎
  8. Failures happen with every system Distributed systems help with stability

    by finding smarter ways to have “redundant” paths, making it easier to separate individual systems while keeping the whole system working.
  9. Distributed systems are made of nodes Nodes Nodes are the

    individual components in the distributed system. Multiple nodes communicate with each other in the distributed system.
  10. Distributed systems are made of nodes Nodes Nodes are the

    individual components in the distributed system. Multiple nodes communicate with each other in the distributed system. These nodes may also be called replicas; the terminology varies.
  11. Again, Failures happen with every system Distributed systems help with

    stability by finding smarter ways to have “redundant” paths, making it easier to separate individual systems while keeping the whole system working.
  12. Again, Failures happen with every system Distributed systems help with

    stability by finding smarter ways to have “redundant” paths, making it easier to separate individual systems while keeping the whole system working. Despite helping improve stability, they can also increase complexity. Use these concepts sparingly.
  13. We will be discussing Distributed Systems This is not related

    to distributed algorithms; this is an overview of the more infrastructure-related and theoretical parts of it.
  14. We will be discussing Distributed Systems This is not related

    to distributed algorithms; this is an overview of the more infrastructure-related and theoretical parts of it. Algorithms handle the interactions between these systems at a lower level; this is very high level and we will not be doing any coding.
  15. We will be discussing Distributed Systems This is not related

    to distributed algorithms; this is an overview of the more infrastructure-related and theoretical parts of it. Algorithms handle the interactions between these systems at a lower level; this is very high level and we will not be doing any coding. This discussion is based on the Database Internals book by Alex Petrov.
  16. Concurrency and Parallelism These are both ways to break down

    tasks so they can run faster. It is simply “division of labour” across various workers.
  17. Concurrency and Parallelism These are both ways to break down

    tasks so they can run faster. It is simply “division of labour” across various workers. Having more workers does not always imply faster performance, and breaking a task into bits does not always make it more efficient.
  18. Concurrency and Parallelism Concurrency is the ability to run one

    or more independent tasks whose executions overlap in time. Concurrency can mean taking a process A and creating a chain of processes: B -> C -> D -> A, where B, C and D run independently to achieve A. Parallelism is the ability to run the same thing at the same time, multiple times. Parallelism means taking a process A and making it run X times, where each run is distinctly independent but performs the same operation. X -> A X -> A X -> A
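A minimal Go sketch of the two shapes above (not from the deck; the stage functions and values are made up): the chained goroutines mirror B -> C -> D -> A, and the fan-out mirrors running the same operation X times.

```go
package main

import (
	"fmt"
	"sync"
)

// Concurrency: A is achieved by composing independent stages B -> C -> D.
func concurrentChain(in int) int {
	b := make(chan int)
	c := make(chan int)
	d := make(chan int)

	go func() { b <- in + 1 }()        // stage B
	go func() { c <- (<-b) * 2 }()     // stage C, consumes B
	go func() { d <- (<-c) - 3 }()     // stage D, consumes C
	return <-d                         // A is the composed result
}

// Parallelism: the same operation A runs X times on independent inputs.
// Whether the runs truly execute simultaneously depends on available cores.
func parallelRuns(inputs []int, a func(int) int) []int {
	out := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1)
		go func(i, v int) { // each run is independent, doing the same work
			defer wg.Done()
			out[i] = a(v)
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println("concurrent chain:", concurrentChain(10))
	fmt.Println("parallel runs:", parallelRuns([]int{1, 2, 3}, func(v int) int { return v * v }))
}
```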
  19. Concurrency and Parallelism Concurrency will usually involve context switching across

    multiple tasks and is not limited by hardware. Concurrency can be context switched across hardware so there is not a “hard” reliance on the number of cores assigned. Parallelism implies each process is running on a single dedicated resource and will usually just stay active from start to finish. Parallelism requires actual physical hardware to run, so if you have X cores, you can reasonably run X versions of a task before it becomes slower.
  20. Intermittent break: Hyperthreads are not cores Hyperthreading is a hack

    in the physical setup of a core that uses an extra set of high speed registers to emulate context switching with such performance that it can be likened to an actual physical core.
  21. Intermittent break: Hyperthreads are not cores Hyperthreading is a hack

    in the physical setup of a core that uses an extra set of high speed registers to emulate context switching with such performance that it can be likened to an actual physical core. Hyperthreaded v Real Core Thread 1 Thread 2 (Register) Core Thread 1
  22. Intermittent break: Hyperthreads are not cores This means if you

    schedule two of the same tasks on a single core, they will really be fighting for resources on that core. This works for most things because cache access and other resources on a core are efficient, so it appears seamless.
  23. Intermittent break: Hyperthreads are not cores This means if you

    schedule two of the same tasks on a single core, they will really be fighting for resources on that core. This works for most things because cache access and other resources on a core are efficient, so it appears seamless. The operating system will usually just treat threads as cores to simplify scheduling tasks onto them, but there is always a performance hit for such emulation.
  24. Intermittent break: Hyperthreads are not cores This means if you

    schedule two of the same tasks on a single core, they will really be fighting for resources on that core. This works for most things because cache access and other resources on a core are efficient, so it appears seamless. The operating system will usually just treat threads as cores to simplify scheduling tasks onto them, but there is always a performance hit for such emulation. If you have latency-sensitive or contention-driven processes like databases that need performance, use physical cores rather than hyperthreaded ones.
  25. Concurrency v parallelism When considering parallel tasks, you can have

    concurrency in parallelism but you cannot have parallelism in concurrent tasks.
  26. Concurrency v parallelism When considering parallel tasks, you can have

    concurrency in parallelism but you cannot have parallelism in concurrent tasks. The reason is parallelism implies that the same process is executed at the same time in independent blobs.
  27. Concurrency v parallelism When considering parallel tasks, you can have

    concurrency in parallelism but you cannot have parallelism in concurrent tasks. The reason is parallelism implies that the same process is executed at the same time in independent blobs. Concurrency implies composition of independent single tasks working together so the timelines overlap.
  28. Concurrency v parallelism Basically Concurrent tasks can be run in

    parallel BUT Parallel tasks cannot be run concurrently
  29. Concurrency is not parallelism I recommend watching this video from

    Rob Pike at a later time for more details: https://go.dev/blog/waza-talk
  30. Shared State There are primarily two methods for sharing state

    in distributed systems You either have • Message passing (queues, signals)
  31. Shared State There are primarily two methods for sharing state

    in distributed systems You either have • Message passing (queues, signals) • Shared memory (databases, heaps)
  32. Shared State There are primarily two methods for sharing state

    in distributed systems You either have • Message passing (queues, signals) • Shared memory (databases, heaps) Unlike message passing, shared memory requires synchronization to allow for safe (serializable) updates. The concept of serializable consistency will be discussed later on.
  33. Shared State - Message Passing Message passing is more about

    using pipes. Yeah, same concept as a | b PUBLISHER PIPE [/dev/mqueue] SUBSCRIBER
  34. Shared State - Message Passing When you perform message passing,

    you ephemerally take some data from point A and pass it to point B through a pipe. That pipe is usually called a queue but you can do so through I/O redirection, etc.
  35. Shared State - Message Passing When you perform message passing,

    you ephemerally take some data from point A and pass it to point B through a pipe. That pipe is usually called a queue but you can do so through I/O redirection, etc. In Linux, this can be emulated with the /dev/mqueue filesystem, which provides kernel support for message passing primitives. See here: https://man7.org/linux/man-pages/man7/mq_overview.7.html
  36. Shared State - Message Passing When you perform message passing,

    you ephemerally take some data from point A and pass it to point B through a pipe. That pipe is usually called a queue but you can do so through I/O redirection, etc. In Linux, most queues will use the /dev/mqueue filesystem, which provides kernel support for message passing primitives. See here: https://man7.org/linux/man-pages/man7/mq_overview.7.html This can also be done in code using a basic queue data structure in user space.
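A minimal user-space sketch of the publisher -> pipe -> subscriber picture (this is not the POSIX mq_* API; a buffered Go channel stands in for the queue):

```go
package main

import "fmt"

func main() {
	// The "pipe": a bounded, in-process queue standing in for /dev/mqueue.
	queue := make(chan string, 8)

	// Publisher: pushes messages into the pipe and closes it when done.
	go func() {
		for i := 1; i <= 3; i++ {
			queue <- fmt.Sprintf("message %d", i)
		}
		close(queue)
	}()

	// Subscriber: drains the pipe; the data is ephemeral, read once and gone.
	for msg := range queue {
		fmt.Println("received:", msg)
	}
}
```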
  37. Shared State - Shared Memory Shared Memory is what we

    understand as dictionaries, tuples etc. Things that persist data. Shared memory stores are basically databases, filesystems and more trivially files on filesystems. In Linux, even a file can be a filesystem. SHM Writer Reader Writer
  38. Shared State - Shared Memory When you use shared memory,

    you are guaranteed that within the allocated block of memory, data persisted can be retrieved multiple times.
  39. Shared State - Shared Memory When you use shared memory,

    you are guaranteed that within the allocated block of memory, data persisted can be retrieved multiple times. In Linux, the /dev/shm filesystem allows you to perform allocations with kernel support for shared memory stores. See https://man7.org/linux/man-pages/man7/shm_overview.7.html
  40. Shared State - Shared Memory When you use shared memory,

    you are guaranteed that within the allocated block of memory, data persisted can be retrieved multiple times. In Linux, the /dev/shm filesystem allows you to perform allocations for shared memory stores. See https://man7.org/linux/man-pages/man7/shm_overview.7.html Memory allocators, arenas and various other primitives allow for allocation of shared memory blocks. Just like the message passing note, this can be implemented without this filesystem.
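As a small in-process stand-in for the writer/reader picture (illustrative only; a real shared memory segment would live under /dev/shm), a map guarded by a read-write lock shows the same guarantee that persisted data can be read back many times:

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a tiny shared-memory stand-in: writers and readers all see the
// same block of data, and persisted values can be read back many times.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

func (s *Store) Write(key, value string) {
	s.mu.Lock() // writers need exclusive access (synchronization, covered later)
	defer s.mu.Unlock()
	s.data[key] = value
}

func (s *Store) Read(key string) (string, bool) {
	s.mu.RLock() // many readers may read the same value concurrently
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	store := NewStore()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two writers, as in the SHM diagram
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.Write(fmt.Sprintf("writer-%d", i), "hello")
		}(i)
	}
	wg.Wait()
	if v, ok := store.Read("writer-0"); ok {
		fmt.Println("read back:", v) // the same value can be read again and again
	}
}
```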
  41. Intermittent break: Distributed Memory In addition to shared memory, you

    can also have distributed memory which, in addition to keeping data local, moves memory to a store outside the system where the work is running.
  42. Intermittent break: Distributed Memory In addition to shared memory, you

    can also have distributed memory which, in addition to keeping data local, moves memory to a store outside the system where the work is running. The interconnect that allows data transfer is usually some network (wired or wireless) and pages can be swapped between independent systems as required.
  43. Intermittent break: Distributed Memory A good example of distributed memory

    protocols is Remote DMA (RDMA), which allows direct access to memory pages while bypassing the CPU. You can find this in InfiniBand NICs and GPUs, for example. To cover GPUs, they have both local and remote memory regions that can be accessed over interconnects for fast processing and retrieval across many parallel executing cores.
  44. Networking In distributed systems, everything is separated physically in some

    sense. This means we need some way to communicate.
  45. Networking In distributed systems, everything is separated physically in some

    sense. This means we need some way to communicate. Networking allows different systems to pass information over a protocol that defines the standard for how that information is sent and received.
  46. Networking Client -> Server Communication Client Server Both the client

    and server understand the protocol so they can communicate with each other
  47. Networking Client -> Server Communication Client Server Both the client

    and server understand the protocol so they can communicate with each other For databases, we have protocols also. MySQL, MongoDB, Redis etc all implement a protocol for communication with their clients.
  48. Networking Reliability Sometimes, the network fails Client Server A client

    is isolated from the server due to networking issues. This can be a broken link or an intentional block (e.g. firewalls).
  49. Networking Reliability Sometimes, the network fails Client Server A client

    is isolated from the server due to networking issues. This can be a broken link or an intentional block (e.g. firewalls). We call these scenarios network partitions and they can be good or BAD.
  50. Networking Reliability Networks fail! - Biggest - Single - Point

    - Of - Failure Client Server 💀 Sometimes, the network can also be bad in general where nothing can communicate.
  51. Networking Reliability Networks - Multiple - Point - Of -

    Redundancy Client Server 😎 In designing resilient distributed systems, there are usually notions of redundancy where we apply multiple paths for communication.
  52. Cascading Failures Usually, when networks, storage or most other systems

    fail, the delicate balance of distributed systems means this can cause cascading failures.
  53. Cascading Failures Usually, when networks, storage or most other systems

    fail, the delicate balance of distributed systems means this can cause cascading failures. Cascading failures can occur for a number of reasons: for example, replication failures reach a limit due to network partitions and the nodes fail over as WAL build-up causes the allocated storage to fill up.
  54. Cascading Failures Usually, when networks, storage or most other systems

    fail, the delicate balance of distributed systems means this can cause cascading failures. Cascading failures can occur for a number of reasons: for example, replication failures reach a limit due to network partitions and the nodes fail over as WAL build-up causes the allocated storage to fill up. That’s a mouthful of things in succession, but it happens.
  55. Cascading Failures In designing distributed systems, cascading failures are part

    of the process and designing to handle them is more difficult than it might appear.
  56. Cascading Failures In designing distributed systems, cascading failures are part

    of the process and designing to handle them is more difficult than it might appear. The general goal is to try to fix them as they appear because there’s no universal fix.
  57. Cascading Failures In designing distributed systems, cascading failures are part

    of the process and designing to handle them is more difficult than it might appear. The general goal is to try to fix them as they appear because there’s no universal fix. In the words of Leslie Lamport: “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”
  58. Recovering from failure In recovering from failure, you can apply

    numerous techniques to do so. I cover only two for simplicity: you can use retries in the event of short blips and/or use transactions to roll back in the event of such failures so they can be retried later on.
  59. Retry Storms In distributed systems, there’s always more than one

    machine. Imagine if 1000 nodes retried (multiple) requests to a server during a momentary outage: the requests coming back would leave the entire system under significant load due to the now synchronised barrage of failed requests.
  60. Retry Storms In distributed systems, there’s always more than one

    machine. Imagine if 1000 nodes retried (multiple) requests to a server during a momentary outage: the requests coming back would leave the entire system under significant load due to the now synchronised barrage of failed requests. To avoid this, apply some random jitter and backoff in some form on startup, on retry, etc. to save yourself from the loop of incoming requests that ends up causing cascading failures downstream.
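A minimal sketch of retry with exponential backoff and random jitter (the function name and limits are illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// retryWithBackoff spaces retries out exponentially and adds random jitter so
// that thousands of clients recovering from the same outage do not all
// hammer the server at the same instant.
func retryWithBackoff(attempts int, base time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		backoff := base * time.Duration(1<<i)             // exponential: base, 2x, 4x, ...
		jitter := time.Duration(rand.Int63n(int64(base))) // random spread per client
		time.Sleep(backoff + jitter)
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("momentary outage") // fails twice, then recovers
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

The jitter spreads the retries of many clients over time, so a recovered server sees a trickle instead of a synchronised barrage.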
  61. Transactions “If I pay you, you owe me bitcoin” -

    Good transaction “If I don’t pay you, you owe me bitcoin” - Bad transaction Simple as that.
  62. Transactions Transactions allow us to chain workflows together so we can

    prevent actions that partially succeed. In distributed systems, transactions are performed through different means. You can do so through consensus algorithms like Raft, Paxos, 2+ Phase Commits and various others.
  63. Intermittent break: Sagas (contrary to belief, this is not a

    Star Trek reference) Sagas are a way to chain transactions across many distributed systems. When you have multiple transactions that occur across multiple isolated data sources, it is impossible to synchronize because each place the transaction goes has a different view of the world.
  64. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D A B C D 1 1 1
  65. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D ✅ A B C D 1 1 4
  66. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D ✅ ✅ A B C D 1 4 4
  67. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D ✅ ✅ ❌ The transaction fails on C->D A B C D 1 4 4
  68. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D 🔙 We roll back all the changes made thus far A B C D 1 4 4
  69. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D 🔙 🔙 We roll back all the changes made thus far A B C D 1 1 4
  70. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D 🔙 🔙 🔙 We roll back all the changes made thus far A B C D 1 1 1
  71. Intermittent break: Sagas We want to change the value under

    D from 1 to 4. This can only be done through A -> B -> C -> D We’re back to where we started. A B C D 1 1 1
  72. Intermittent break: Sagas Sagas help orchestrate a means to avoid

    a partial transaction across many services from being committed when one of them fails. If one system in the chain fails, all the previous changes are rolled back to prevent inconsistency.
  73. Intermittent break: Sagas Sagas help orchestrate a means to avoid

    a partial transaction across many services from being committed when one of them fails. If one system in the chain fails, all the previous changes are rolled back to prevent inconsistency. As you can imagine, it is quite difficult to achieve this pattern, but it is very useful in proper microservice designs for systems with independent transactions, e.g. finance.
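A minimal saga sketch, assuming each step exposes a forward action and a compensating rollback (the Step type and the values are illustrative, loosely mirroring the A -> B -> C -> D walkthrough above):

```go
package main

import (
	"errors"
	"fmt"
)

// Step pairs a forward action with a compensating action that undoes it.
type Step struct {
	Name       string
	Action     func() error
	Compensate func()
}

// runSaga executes steps in order; on the first failure it rolls back the
// steps that already succeeded, in reverse order, so no partial change is left.
func runSaga(steps []Step) error {
	done := []Step{}
	for _, s := range steps {
		if err := s.Action(); err != nil {
			for i := len(done) - 1; i >= 0; i-- {
				fmt.Println("rolling back", done[i].Name)
				done[i].Compensate()
			}
			return fmt.Errorf("saga aborted at %s: %w", s.Name, err)
		}
		done = append(done, s)
	}
	return nil
}

func main() {
	values := map[string]int{"A": 1, "B": 1, "C": 1, "D": 1}
	set := func(k string, v, old int) Step {
		return Step{
			Name:       k,
			Action:     func() error { values[k] = v; return nil },
			Compensate: func() { values[k] = old },
		}
	}
	steps := []Step{set("A", 4, 1), set("B", 4, 1), set("C", 4, 1),
		{Name: "D", Action: func() error { return errors.New("C->D failed") }}}
	fmt.Println(runSaga(steps)) // D fails, so A, B and C are rolled back
	fmt.Println(values)         // back to where we started: all 1s
}
```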
  74. Intermittent break: Sagas More details on sagas can be found

    here: https://microservices.io/patterns/data/saga.html
  75. Timing “It is impossible to know the position and momentum

    of an atomic particle at any given instant in time” - Heisenberg’s Uncertainty Principle So, it is also difficult to accurately state the time in distributed systems. Based on the frequency of the ticks used to count the time, we will always have some minuscule amount of drift due to the nature of particle physics and the delay in fetching the time to continue ticking with.
  76. Timing In distributed systems, we can tell the time using

    three different clocks These clocks are:
  77. Timing In distributed systems, we can tell the time using

    three different clocks These clocks are: • Realtime (based on world time and time zones)
  78. Timing In distributed systems, we can tell the time using

    three different clocks These clocks are: • Realtime (based on world time and time zones) • Monotonic (based on the difference of when we started counting)
  79. Timing In distributed systems, we can tell the time using

    three different clocks These clocks are: • Realtime (based on world time and time zones) • Monotonic (based on the difference of when we started counting) • Logical (based on an algorithm for timing events)
  80. Timing: Realtime Clocks This is the usual time that we

    get from our digital time keepers syncing with NTP servers that provide the time. However, this fails because the time lost in getting the current time means it is usually off by some minuscule fraction or more.
  81. Timing: Realtime Clocks This is the usual time that we

    get from our digital time keepers syncing with NTP servers that provide the time. However, this fails because the time lost in getting the current time means it is usually off by some minuscule fraction or more. This is not an issue unless you have lots of events happening at the same time in different parts of the world: the tiny difference in skew, and the fact that two events can carry the same global timestamp (a collision), mean this scheme can cause issues in consensus and the ordering of events across multiple transactions.
  82. Timing: Monotonic Clocks Monotonic clocks work based on some starting

    point, like the big bang (theory?) or, to be more realistic, the instant the system booted. In Linux, this time can be accessed through syscalls; documentation is available here: https://man7.org/linux/man-pages/man2/clock_getres.2.html
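In Go, for example, time.Now captures both a wall-clock and a monotonic reading, and time.Since measures elapsed time on the monotonic one, so NTP adjustments to the wall clock do not affect it (a small sketch):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Realtime (wall) clock: can jump backwards or forwards when NTP adjusts it.
	wall := time.Now()
	fmt.Println("wall clock:", wall)

	// Monotonic clock: only ever moves forward from some starting point
	// (in Go, time.Now also captures a monotonic reading under the hood).
	start := time.Now()
	time.Sleep(50 * time.Millisecond)
	elapsed := time.Since(start) // measured on the monotonic clock
	fmt.Println("elapsed:", elapsed)

	// Stripping the monotonic reading forces wall-clock arithmetic instead.
	fmt.Println("wall-only copy:", start.Round(0))
}
```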
  83. Timing: Logical Clocks This is based on algorithms that time

    the sequence of events so as to alleviate the issues with realtime clocks. An example of this is Lamport’s clock where, rather than a tick being defined by some frequency pinned to a quartz crystal, we tick a monotonically increasing counter for every event that happens.
  84. Timing: Logical Clocks An event A happens, X is incremented

    by 1 and assigned as the time Event A happened X X + 1 Event A
  85. Timing: Logical Clocks Another event B happens, X + 1

    from Event A is incremented by 1 and assigned X X + 1 Event A X + 2 Event B
  86. Timing: Logical Clocks Logical clocks are great because they have

    no skew and we can accurately process the events based on the counter tracking how many “events” have occurred since we started and assigning IDs of time based on that value.
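A minimal Lamport clock sketch (the LamportClock type is illustrative, not from the book): each local event ticks the counter, and a timestamp received from another node pushes the counter past it, so causally later events always get larger timestamps:

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock is a logical clock: a counter that ticks per event, not per second.
type LamportClock struct {
	mu      sync.Mutex
	counter uint64
}

// Tick records a local event and returns its logical timestamp.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counter++
	return c.counter
}

// Observe merges a timestamp received from another node: the local counter
// jumps past it, so anything we do next is ordered after the remote event.
func (c *LamportClock) Observe(remote uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if remote > c.counter {
		c.counter = remote
	}
	c.counter++
	return c.counter
}

func main() {
	var clock LamportClock
	a := clock.Tick()      // Event A happens: X + 1
	b := clock.Tick()      // Event B happens: X + 2
	c := clock.Observe(10) // a message stamped 10 arrives from another node
	fmt.Println(a, b, c)   // 1 2 11
}
```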
  87. Why is timing important for databases? When resolving conflicting entries

    across multiple nodes, the time is very important. The time tells a story of when something was created and what was created after it. This allows for what database enthusiasts call serializable consistency, which defines that if event A < event B in time, then event B will take precedence over event A. If we updated a row with event A, it would be overwritten by event B.
  88. Why is timing important for databases? When resolving conflicting entries

    across multiple nodes, the time is very important. The time tells a story of when something was created and what was created after it. This allows for what database enthusiasts call serializable consistency, which defines that if event A < event B in time, then event B will take precedence over event A. If we updated a row with event A, it would be overwritten by event B. If the timing is off, then serializable consistency fails. Very large databases cannot rely on realtime clocks so they have a mix of logical and realtime clocks.
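The precedence rule above is essentially last-write-wins conflict resolution; a tiny sketch, assuming each write carries a timestamp from one of the clocks just described (the Row type is illustrative):

```go
package main

import "fmt"

// Row keeps a value together with the logical timestamp of the event that wrote it.
type Row struct {
	Value     string
	Timestamp uint64
}

// Apply resolves a conflicting write: if event A < event B in time,
// event B takes precedence and overwrites A; stale writes are ignored.
func (r *Row) Apply(value string, ts uint64) {
	if ts > r.Timestamp {
		r.Value = value
		r.Timestamp = ts
	}
}

func main() {
	row := Row{}
	row.Apply("written by event A", 1) // event A
	row.Apply("written by event B", 2) // event B happened later, so it wins
	row.Apply("late replay of A", 1)   // arrives late, ignored
	fmt.Printf("%+v\n", row)
}
```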
  89. Timing Resources It is very important to have the time

    correct across database replicas, you can find some resources below on how this is accomplished across distributed systems: • Spanner: TrueTime and external consistency | Google Cloud • Consistency without Clocks: The Fauna Distributed Transaction Protocol • Facebook did tons of research just to fix timing: NTP: Building a more accurate time service at Facebook scale
  90. Synchronization Go to this link https://www.google.com/search?q=cha+cha+slide and tap the shiny

    microphone icon One step before the other, that’s all there is to it.
  91. Synchronization In earlier slides, we discussed shared memory systems that

    have readers and writers. When performing reads and writes, we have to ensure we do not perform writes at the same time, else we get collisions as discussed in the timing chapter. SHM Writer Reader Writer
  92. Synchronization Synchronization in distributed systems is very complex depending on

    the setup. We covered sagas where transactions can happen in one far away system. What happens when we have two events at the same time? What do we take, A1 or A2? A 1 B C D 1 1 1 A 2
  93. Synchronization For databases, this system of consistency ignoring all the

    timing and networking is based on who can go first and who’s next. It will usually involve entering a critical section, the only place where such updates are permitted.
  94. Synchronization This is achieved through the following: • Locks •

    Semaphores • Spinlocks • Conditions • Signals
  95. Synchronization This is achieved through the following: • Locks •

    Semaphores • Spinlocks • Conditions • Signals And a lot more primitives depending on the use case.
  96. Synchronization This is achieved through the following: • Locks •

    Semaphores • Spinlocks • Conditions • Signals And a lot more primitives depending on the use case. I heavily recommend reading Dijkstra’s original paper on this topic: Concurrent Programming, Mutual Exclusion (1965; Dijkstra)
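A small sketch of a critical section guarded by a plain mutex, one of the primitives listed above; without the lock, the two writers below would race on the shared balance:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex // guards the critical section
		balance int
		wg      sync.WaitGroup
	)

	deposit := func(amount int) {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			mu.Lock()         // enter the critical section
			balance += amount // only one writer may update at a time
			mu.Unlock()       // leave the critical section
		}
	}

	wg.Add(2)
	go deposit(1)
	go deposit(2)
	wg.Wait()
	fmt.Println("balance:", balance) // always 3000; without the mutex it may not be
}
```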
  97. Synchronization in Databases Databases are complex systems with varying critical

    sections that rely on multiple parameters. In replicated storage, the WAL (Write Ahead Log) is a localized transaction log containing sequential entries of transactions. In the wider spectrum of replication, the master replica will apply these replicated updates based on another set of factors; usually this only happens during a master failover. There are tons and tons of consensus requirements in database design which go beyond mutexes. However, most will resolve as long as the system stays operational.
  98. Acknowledgements “You, you, you” - How many times did I

    spell you? It is 3 times. However, I only intended to spell it once. Seems we have duplicates.
  99. Acknowledgements Acknowledgements in Distributed Systems aim to guarantee message ordering

    and delivery requirements. These are specific agreements which are made on messages delivered.
  100. Acknowledgements Acknowledgements in Distributed Systems aim to guarantee message ordering

    and delivery requirements. These are specific agreements which are made on messages delivered. Usually, you will find acknowledgements more in message passing systems than in shared memory stores like databases. However, shared memory systems also enforce deduplication and other forms of acknowledgements in their design.
  101. Acknowledgements You can have quite simply the following types of

    acknowledgements: • At least once delivery
  102. Acknowledgements You can have quite simply the following types of

    acknowledgements: • At least once delivery • Exactly once delivery
  103. Acknowledgements You can have quite simply the following types of

    acknowledgements: • At least once delivery • Exactly once delivery • At most once delivery
  104. At least once delivery In this system, the messages will

    be retried indefinitely until there is at least one message received on one end and an acknowledgement sent. It does not matter if the requests are duplicated, what matters is that at least one event of it exists.
  105. At least once delivery In this system, the messages will

    be retried indefinitely until there is at least one message received on one end and an acknowledgement sent. It does not matter if the requests are duplicated, what matters is that at least one event of it exists. This is useful for things like heartbeats where multiple events are good to signal that things are fine. No events implies some failure with the existing system.
  106. Exactly once delivery In this system, the messages will be

    retried indefinitely until one message is received on one end and an acknowledgement sent. This is needed for critical events where duplicates just cannot happen.
  107. Exactly once delivery In this system, the messages will be

    retried indefinitely until one message is received on one end and an acknowledgement sent. This is needed for critical events where duplicates just cannot happen. In cases like data transfer, we cannot have the same packet, for example, delivered and acknowledged twice or not acknowledged at all. That would imply data corruption and misordering of TCP sequences.
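In practice, exactly once behaviour is commonly approximated by retrying (at least once delivery) and having the receiver drop duplicates by message ID; a minimal consumer-side sketch (the message shape and IDs are illustrative):

```go
package main

import "fmt"

// Message carries a unique ID so the receiver can detect redelivery.
type Message struct {
	ID   string
	Body string
}

// Consumer processes each message ID at most once, so retried (at least once)
// deliveries end up looking like exactly-once processing.
type Consumer struct {
	seen map[string]bool
}

func (c *Consumer) Handle(m Message) {
	if c.seen[m.ID] {
		fmt.Println("duplicate, ignoring:", m.ID)
		return
	}
	c.seen[m.ID] = true
	fmt.Println("processing:", m.ID, m.Body)
}

func main() {
	c := &Consumer{seen: make(map[string]bool)}
	c.Handle(Message{ID: "tx-1", Body: "debit 10"})
	c.Handle(Message{ID: "tx-1", Body: "debit 10"}) // retried by the sender, dropped here
	c.Handle(Message{ID: "tx-2", Body: "credit 10"})
}
```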
  108. At most once delivery In this system, the system will

    send a message but there is no issue if it is never received or sent.
  109. At most once delivery In this system, the system will

    send a message but there is no issue if it is never received or sent. Systems using UDP packets for data transfer, like torrents, are a great example. Having at most once delivery implies that acknowledgements are not required for the system to be operational; these can also be looked at as lossy delivery acknowledgements.
  110. Acknowledgements in Databases In databases, you will usually have the

    exactly once delivery, although in practice it is usually approximated with at least once delivery plus deduplication. Databases are high integrity systems requiring strict acknowledgement of packets and of data storage and retrieval.
  111. Acknowledgements in Databases In databases, you will usually have the

    exactly once delivery, although in practice it is usually approximated with at least once delivery plus deduplication. Databases are high integrity systems requiring strict acknowledgement of packets and of data storage and retrieval. The WAL is implemented to keep data localised until it can be replicated, and because acknowledgements are strictly required, database protocols are built on TCP as a result.
  112. Processing Remember our discussion on concurrency and parallelism, we stated

    that it is possible to process things at the same time, speeding up the work. In a similar fashion, we can employ state sharing to apply those techniques in distributed systems.
  113. Processing - Pipelines To achieve the benefits of concurrency in

    addition to parallel execution, we model the stages of processing in chained workflows. These chained workflows are called pipelines and these pipelines may employ any number of transactions and rollbacks in processing information.
  114. Processing - Pipelines To achieve the benefits of concurrency in

    addition to parallel execution, we model the stages of processing in chained workflows. These chained workflows are called pipelines and these pipelines may employ any number of transactions and rollbacks in processing information. As a refresher: MQ - Message Queue (message queue) SHM - Shared Memory (database)
  115. Intermittent break: Message Passing Terminology Message passing implementations can have

    varying names, some might say: • Message Queue • Message Broker
  116. Intermittent break: Message Passing Terminology Message passing implementations can have

    varying names, some might say: • Message Queue • Message Broker • Service Bus
  117. Intermittent break: Message Passing Terminology Message passing implementations can have

    varying names, some might say: • Message Queue • Message Broker • Service Bus • Streams Processor
  118. Intermittent break: Message Passing Terminology Message passing implementations can have

    varying names, some might say: • Message Queue • Message Broker • Service Bus • Streams Processor And several others.
  119. Intermittent break: Message Passing Terminology Message passing implementations can have

    varying names, some might say: • Message Queue • Message Broker • Service Bus • Streams Processor And several others. There are differences but the basic idea is all of them implement message passing.
  120. Intermittent break: Message Passing Terminology Despite the terminology, you will

    always have a source (start of the message transfer) and a sink (end of the message transfer) in the process. The source and sink could also be termed:
  121. Intermittent break: Message Passing Terminology Despite the terminology, you will

    always have a source (start of the message transfer) and a sink (end of the message transfer) in the process. The source and sink could also be termed: • PUBlisher and SUBscriber (PUBSUB)
  122. Intermittent break: Message Passing Terminology Despite the terminology, you will

    always have a source (start of the message transfer) and a sink (end of the message transfer) in the process. The source and sink could also be termed: • PUBlisher and SUBscriber (PUBSUB) • Producer and Consumer
  123. Intermittent break: Message Passing Terminology Despite the terminology, you will

    always have a source (start of the message transfer) and a sink (end of the message transfer) in the process. The source and sink could also be termed: • PUBlisher and SUBscriber (PUBSUB) • Producer and Consumer And several others, it’s all still message passing.
  124. Processing - Pipelines A -> Sends the message through MQ

    1 to B A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  125. Processing - Pipelines B -> Writes to SHM A and

    MQ 2, persisted and async respectively A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  126. Processing - Pipelines C -> Processes each event from MQ

    2 and persists to SHM A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  127. Processing - Pipelines D -> Batch processes from SHM A

    and writes to another SHM B A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  128. Processing - Pipelines SHM B is the final path of

    our processing pipeline A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
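A compressed sketch of the pipeline above, with buffered channels standing in for MQ 1 and MQ 2 and mutex-guarded maps standing in for SHM A and SHM B (the event names are illustrative; the stage letters follow the diagram):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

func main() {
	mq1 := make(chan string, 8) // MQ 1
	mq2 := make(chan string, 8) // MQ 2
	var mu sync.Mutex
	shmA := map[string]string{} // SHM A
	shmB := map[string]string{} // SHM B

	// A -> sends messages through MQ 1 to B.
	go func() {
		for _, ev := range []string{"e1", "e2", "e3"} {
			mq1 <- ev
		}
		close(mq1)
	}()

	// B -> writes to SHM A (persisted) and MQ 2 (async) for each event.
	go func() {
		for ev := range mq1 {
			mu.Lock()
			shmA[ev] = "raw:" + ev
			mu.Unlock()
			mq2 <- ev
		}
		close(mq2)
	}()

	// C -> processes each event from MQ 2 and persists it (stream processing).
	done := make(chan struct{})
	go func() {
		for ev := range mq2 {
			mu.Lock()
			shmA[ev+"/processed"] = strings.ToUpper(ev)
			mu.Unlock()
		}
		close(done)
	}()
	<-done

	// D -> batch processes everything in SHM A and writes the result to SHM B.
	mu.Lock()
	shmB["batch"] = fmt.Sprintf("%d entries aggregated", len(shmA))
	mu.Unlock()

	fmt.Println("SHM B:", shmB)
}
```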
  129. Processing - Pipelines In distributed systems, we will always have

    pipelines that employ various forms of techniques, these pipelines can be:
  130. Processing - Pipelines In distributed systems, we will always have

    pipelines that employ various forms of techniques, these pipelines can be: • Event Driven / Stream Processing
  131. Processing - Pipelines In distributed systems, we will always have

    pipelines that employ various forms of techniques, these pipelines can be: • Event Driven / Stream Processing • Batch Processing
  132. Processing - Pipelines In distributed systems, we will always have

    pipelines that employ various forms of techniques, these pipelines can be: • Event Driven / Stream Processing • Batch Processing • Extract, Transform, Load (ETL)
  133. Processing - Event Driven / Stream Processing In this instance,

    we just take events or streams of data as they come and process them to where they need to go.
  134. Processing - Event Driven / Stream Processing In this instance,

    we just take events or streams of data as they come and process them to where they need to go. No persistence needed to make this work. Webhooks are a good example of this paradigm.
  135. Processing - Event Driven / Stream Processing A -> B

    and B -> C are Event Driven / Stream Processed A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  136. Processing - Batch Processing In batch processing, we require some

    persistent store of data to aggregate all the information over some interval. Batch processing is useful when you need to gather tons of data and make sense of it in a single outlook.
  137. Processing - Batch Processing In batch processing, we require some

    persistent store of data to aggregate all the information over some interval. Batch processing is useful when you need to gather tons of data and make sense of it in a single outlook. An example of this is payroll, where all your expenses over a month need to be added together so you can get a final ledger for that period.
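A toy batch job in the spirit of the payroll example: expenses accumulate over the month and a single pass reduces them into a ledger (all names and amounts are illustrative):

```go
package main

import "fmt"

// Expense is one event persisted during the month.
type Expense struct {
	Employee string
	Amount   int // in cents
}

// runPayrollBatch is the batch step: it reads everything gathered over the
// interval and reduces it into one ledger per employee.
func runPayrollBatch(expenses []Expense) map[string]int {
	ledger := make(map[string]int)
	for _, e := range expenses {
		ledger[e.Employee] += e.Amount
	}
	return ledger
}

func main() {
	month := []Expense{
		{"ada", 1200}, {"ada", 300}, {"grace", 700}, {"grace", 50},
	}
	fmt.Println(runPayrollBatch(month)) // map[ada:1500 grace:750]
}
```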
  138. Processing - Batch Processing SHM A -> D -> SHM

    B can be Batch Processed A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  139. Processing - Extract, Transform, Load (ETL) In this processing pattern,

    you are usually running a mix of batch and stream processing. You take data aggregated over a period of time (Extract) with various input and output streams performing operations on them in the process (Transform) only to save the final output in a warehouse for consumption (Load)
  140. Processing - Extract, Transform, Load (ETL) This entire pipeline is

    an ETL A B C Workers MQ 1 SHM A D SHM B Workers MQ 2
  141. Processing - OLAP and OLTP Usually in database processing systems,

    you will always find OLAP and OLTP mentioned.
  142. Processing - OLAP and OLTP Usually in database processing systems,

    you will always find OLAP and OLTP mentioned. OLAP - Online Analytical Processing
  143. Processing - OLAP and OLTP Usually in database processing systems,

    you will always find OLAP and OLTP mentioned. OLAP - Online Analytical Processing OLTP - Online Transaction Processing
  144. Processing - OLAP OLAP databases are very large and accumulate

    tons of information for acquiring trends and performing complex data analysis. These processing systems have very long storage requirements and integrate multiple sources of information. They favour large scans and aggregations over strict per-row transaction guarantees and require significant compute to run.
  145. Processing - OLAP OLAP databases are very large and accumulate

    tons of information for acquiring trends and performing complex data analysis. These processing systems have very long storage requirements and integrate multiple sources of information. They favour large scans and aggregations over strict per-row transaction guarantees and require significant compute to run. OLAP databases are usually the likes of Snowflake, Redshift and most other data warehousing solutions you will find.
  146. Processing - OLTP OLTPs are good for the stream/event driven

    and short term batch processing outcomes where data needs to be processed in a relatively short time and the data sources to aggregate from are minimal. OLTP databases handle many small, short-lived transactions with strong transactional guarantees, and the small working set per operation keeps processing fast.
  147. Processing - OLTP OLTPs are good for the stream/event driven

    and short term batch processing outcomes where data needs to be processed in a relatively short time and the data sources to aggregate from are minimal. OLTP databases handle many small, short-lived transactions with strong transactional guarantees, and the small working set per operation keeps processing fast. Examples are the classic relational databases on RDS, Aurora, Spanner and others.
  148. Processing - Conclusion In summary, several of the patterns noted

    here are used internally in the design of database internals. When building distributed systems around databases, similar concepts can be applied to their design.
  149. Consensus: CAP Unlike the head wear, this is more about

    systems. CAP refers to: • Consistency
  150. Consensus: CAP Unlike the head wear, this is more about

    systems. CAP refers to: • Consistency • Availability
  151. Consensus: CAP Unlike the head wear, this is more about

    systems. CAP refers to: • Consistency • Availability • Partition Tolerance
  152. Consensus: CAP Consistency means that knowledge of information in the

    distributed system is always the same at any given time.
  153. Consensus: CAP Consistency means that knowledge of information in the

    distributed system is always the same at any given time. Availability means that every request to read or update such information will always receive a response without error.
  154. Consensus: CAP Consistency means that knowledge of information in the

    distributed system is always the same at any given time. Availability means that every request to read or update such information will always receive a response without error. Partition Tolerance means that the system keeps operating even when nodes are cut off from each other or fail.
  155. Consensus: CAP Theorem The CAP theorem basically states that it

    is impossible to have Consistency, Availability and Partition Tolerance all at the same time in a distributed system.
  156. Consensus: CAP Theorem The CAP theorem basically states that it

    is impossible to have Consistency, Availability and Partition Tolerance all at the same time in a distributed system. Partition Tolerance is more aligned with the “distributed” aspect of systems as every node can independently perform actions.
  157. Consensus: CAP Theorem The CAP theorem basically states that it

    is impossible to have Consistency, Availability and Partition Tolerance all at the same time in a distributed system. Partition Tolerance is more aligned with the “distributed” aspect of systems as every node can independently perform actions; doing so in consensus means that we will either sacrifice availability of the system, since the now failed nodes will give errors.
  158. Consensus: CAP Theorem The CAP theorem basically states that it

    is impossible to have Consistency, Availability and Partition Tolerance all at the same time in a distributed system. Partition Tolerance is more aligned with the “distributed” aspect of systems as every node can independently perform actions; doing so in consensus means that we will either sacrifice availability of the system, since the now failed nodes will give errors, or consistency in the results, since failed nodes cannot get the most updated information.
  159. Consensus: CAP Theorem The CAP Theorem holds that there is

    no perfect distributed system that can operate in all three states; only two are possible at a time. Hence it is visualised as the CAP triangle.
  160. Consensus: FLP Impossibility This paper was written by Fischer, Lynch

    and Paterson (hence the name). Source: Impossibility of Distributed Consensus with One Faulty Process It draws on three properties of the consensus process:
  161. Consensus: FLP Impossibility This paper was written by Fischer, Lynch

    and Paterson (hence the name). Source: Impossibility of Distributed Consensus with One Faulty Process It draws on three properties of the consensus process: • Agreement
  162. Consensus: FLP Impossibility This paper was written by Fischer, Lynch

    and Paterson (hence the name). Source: Impossibility of Distributed Consensus with One Faulty Process It draws on three properties of the consensus process: • Agreement • Validity
  163. Consensus: FLP Impossibility This paper was written by Fischer, Lynch

    and Paterson (hence the name). Source: Impossibility of Distributed Consensus with One Faulty Process It draws on three properties of the consensus process: • Agreement • Validity • Termination
  164. Consensus: FLP Impossibility Agreement defines that a consensus protocol should

    ensure decisions are defined and agreed upon by active non-failing members of the network.
  165. Consensus: FLP Impossibility Agreement defines that a consensus protocol should

    ensure decisions are defined and agreed upon by active non-failing members of the network. Validity implies that the value proposed is made by members of the network participating in the voting process; it cannot be externally provided.
  166. Consensus: FLP Impossibility Agreement defines that a consensus protocol should

    ensure decisions are defined and agreed upon by active non-failing members of the network. Validity implies that the value proposed is approved by members of the network participating in the voting process; it cannot be externally provided or defaulted. Termination means every active non-faulty member eventually reaches a decision.
  167. Consensus: FLP Impossibility In the paper, we assume all processes

    are asynchronous implying there is no upper bound on processing time before a decision is made. The basis of the paper establishes that:
  168. Consensus: FLP Impossibility In the paper, we assume all processes

    are asynchronous implying there is no upper bound on processing time before a decision is made. The basis of the paper establishes that: 1. it is not possible to have a distributed system that can guarantee its current state is always up to date without failure
  169. Consensus: FLP Impossibility In the paper, we assume all processes

    are asynchronous implying there is no upper bound on processing time before a decision is made. The basis of the paper establishes that: 1. it is not possible to have a distributed system that can guarantee its current state is always up to date without failure 2. consensus cannot be said to occur within a bounded time; there will always be delays that violate a predefined interval for resolution
  170. Consensus: FLP Impossibility and CAP These two argue different perspectives

    of the same problem, that distributed systems are a tradeoff in consensus.
  171. Consensus: FLP Impossibility and CAP These two argue different perspectives

    of the same problem, that distributed systems are a tradeoff in consensus. FLP Impossibility makes it known that time in distributed systems cannot be bounded and the state of each interacting member is never perfectly known, which is why consensus will always take an unbounded amount of time due to uncooperative members.
  172. Consensus: FLP Impossibility and CAP These two argue different perspectives

    of the same problem, that distributed systems are a tradeoff in consensus. FLP Impossibility makes it known that time in distributed systems cannot be bounded and the state of each interacting member is never perfectly known, which is why consensus will always take an unbounded amount of time due to uncooperative members. CAP argues that a distributed system cannot be perfectly designed to have all members achieve consensus without ignoring failing members or the known state of the system.
  173. Conclusion Distributed systems incorporate asynchronous patterns of communication that make

    it impossible to reliably know the state of the network at any given time.
  174. Conclusion Distributed systems incorporate asynchronous patterns of communication that make

    it impossible to reliably know the state of the network at any given time. Workarounds for failures depend on the problem at hand, and there is extensive literature that takes these failures into consideration when designing solutions.
  175. Conclusion Distributed systems incorporate asynchronous patterns of communication that make

    it impossible to reliably know the state of the network at any given time. Workarounds for failures depend on the problem at hand, and there is extensive literature that takes these failures into consideration when designing solutions. In summary, there is no perfect distributed system.