Non-Abstract Large Systems Design

Yury Nino

July 31, 2021

Transcript

  1. YURY NIÑO ROA
    Site Reliability Engineer, Chaos Engineering Advocate, ADL Digital Labs
    www.sitereliabilityenginering.co . www.yurynino.com yury nino
  2. GLOSSARY OF SYSTEMS DESIGN
    Load Balancing, Data Partitioning, Proxies, Caching, Indexes, Redundancy, Replication, SQL vs NoSQL, Consistent Hashing, CAP Theorem, PACELC Theorem, Bloom, Quorum, Leader and Follower
  3. SYSTEMS DESIGN PATTERNS
    Consistent Core, Follower Reads, Generation Clock, Gossip Dissemination, HeartBeat, Hybrid Clock, Idempotent Receiver, State Watch, Quorum
    https://martinfowler.com/articles/patterns-of-distributed-systems/
  4. SYSTEMS DESIGN FALLACIES
    * The Network is Reliable
    * Latency is Zero
    * The Topology doesn’t change
    * Transport Cost Is Zero
    * Bandwidth is Infinite
    * The Network is Secure
  5. WHAT IS NALSD? NON-ABSTRACT LARGE SYSTEM DESIGN
    * An iterative style for designing and implementing systems.
    * The SRE ability to assess, design, and evaluate large systems.
    * Robust and scalable designs with low operational costs.
  6. NALSD IN DETAIL
    Google SREs are expected to be able to start resource planning with a basic whiteboard diagram of a system, think through the various scaling and failure domains, and focus their design into a concrete proposal for resources.
  7. WHY NALSD?
    Google has learned (the hard way) that the people designing distributed systems need to develop and continuously exercise the muscle of turning a design into concrete estimates of resources at multiple steps in the process.
  8. NALSD IN DETAIL
    Initial Requirements
    * Read & Understand
    * Required SLOs
    * Ask that you consider
    One Machine
    * Consider running our entire application on a single computer.
    Distributed System
    * Now we’ll need multiple machines; what’s the best design to join them?
    Design Process
    * Is it possible?
    * Can we do better?
    * Is it feasible?
    * Is it resilient?
  9. HOW TO BEGIN?
    ‘The numbers everyone should know’
    https://danrl.com/sre-flash-cards/SRE%20Flash%20Cards.pdf
    Flash cards:
    * Time: Main memory reference
    * Time: Round trip within same datacenter
    * Time: Read 1 MB sequentially from memory
    * Speed: Read sequentially from SSD
    * Power of ten? ns / us / ms
    From: https://cloud.google.com/blog/products/management-tools/sre-principles-and-flashcards-to-design-nalsd
  10. USE CASE
    • The Google AdWords service displays text advertisements on Google Web Search.
    • The click-through rate (CTR) metric tells advertisers how well their ads are performing.
    • CTR is the ratio of the number of times the ad is clicked to the number of times the ad is shown.
    AdWords Challenge: design a system capable of measuring and reporting an accurate CTR for every AdWords ad.
  11. INITIAL REQUIREMENTS
    Each advertiser may have multiple advertisements. Each ad is keyed by ad_id and is associated with a list of search terms selected by the advertiser.
    * How often did this search term trigger this ad to be shown?
    * How many times was the ad clicked by someone who saw it?
    * With this information, we can calculate the CTR.
    CTR: the number of clicks divided by the number of impressions.
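
The CTR definition above can be sketched in a few lines of Python. This is only an illustration, not the deck's implementation: the function name `ctr_by_ad_and_term` and the representation of impressions and clicks as lists of `(ad_id, search_term)` pairs are assumptions made for the example.

```python
from collections import Counter


def ctr_by_ad_and_term(impressions, clicks):
    """Return CTR per (ad_id, search_term) pair.

    `impressions` and `clicks` are iterables of (ad_id, search_term)
    tuples; this input shape is a hypothetical choice for the sketch.
    """
    shown = Counter(impressions)
    clicked = Counter(clicks)
    # CTR = number of clicks / number of impressions.
    # Pairs that were never shown are skipped, so no division by zero.
    return {key: clicked[key] / count for key, count in shown.items()}


impressions = [(1, "shoes")] * 4 + [(2, "hats")] * 2
clicks = [(1, "shoes")]
rates = ctr_by_ad_and_term(impressions, clicks)
```

Here ad 1 was shown four times for "shoes" and clicked once, so its CTR is 0.25; ad 2 was shown but never clicked, so its CTR is 0.0.
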
  12. INITIAL REQUIREMENTS
    • We know our advertisers care about two things:
      ◦ That the dashboard displays quickly!
      ◦ That the data is recent.
    Therefore, we will consider our requirements in terms of SLOs:
    • 99.9% of dashboard queries complete in < 1 second.
    • 99.9% of the time, the CTR data displayed is less than 5 minutes old.
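
One way to read these SLOs is as a fraction of "good" measurements compared against a target. The sketch below assumes we already have a list of measurements (query latencies in seconds, or data ages in seconds); the helper name `meets_slo` and the sample data are hypothetical.

```python
def meets_slo(samples, threshold, target=0.999):
    """Return True if at least `target` of the samples are below `threshold`."""
    if not samples:
        raise ValueError("need at least one sample")
    good = sum(1 for value in samples if value < threshold)
    return good / len(samples) >= target


# Latency SLO: 99.9% of dashboard queries complete in < 1 second.
latencies = [0.2] * 9995 + [1.5] * 5
latency_ok = meets_slo(latencies, threshold=1.0)

# Freshness SLO: 99.9% of the time the data is < 5 minutes (300 s) old.
ages = [60] * 990 + [600] * 10
freshness_ok = meets_slo(ages, threshold=300)
```

In this made-up data the latency SLO is met (99.95% of queries are fast) while the freshness SLO is missed (only 99% of readings are recent enough).
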
  13. ONE MACHINE
    For every web search query, we log:
    * TIME: the time the query occurred
    * QUERY_ID: a unique query identifier
    * AD_ID: the ad IDs of the AdWords advertisements shown for the search
    * SEARCH_TERM: the query content
  14. ONE MACHINE
    Calculations:
    * TIME: 64-bit integer, 8 bytes
    * QUERY_ID: 64-bit integer, 8 bytes
    * AD_ID: three 64-bit integers, 24 bytes
    * SEARCH_TERM: a long string, up to 500 bytes
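
As a quick check, the field sizes listed above can be summed to get an upper bound for one query log entry. The dictionary below simply restates the slide's numbers; the variable names are illustrative.

```python
# Field sizes in bytes, as listed on the slide.
FIELD_BYTES = {
    "time": 8,           # one 64-bit integer
    "query_id": 8,       # one 64-bit integer
    "ad_id": 3 * 8,      # three 64-bit integers
    "search_term": 500,  # a long string, upper bound
}

listed_bytes_per_entry = sum(FIELD_BYTES.values())
```

The listed fields add up to 540 bytes. The volume estimate on the following slide uses 2 × 10^3 bytes per entry; the slides do not say what the remaining space accounts for, so treat that figure as a deliberately generous round number.
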
  15. ONE MACHINE
    The volume of query logs generated in a 24-hour period:
    * (5 × 10^5 queries/sec) × (8.64 × 10^4 seconds/day) × (2 × 10^3 bytes) = 8.64 × 10^13 bytes/day = 86.4 TB/day
    A common 4 TB HDD sustains 200 input/output operations per second (IOPS):
    * (5 × 10^5 queries/sec) / (200 IOPS/disk) = 2.5 × 10^3 disks, or 2,500 disks
    Keeping about 100 TB of logs in RAM instead:
    * (100 TB) / (64 GB RAM/machine) = 1,563 machines
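
The estimates above are plain arithmetic and can be reproduced as follows. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and, as the slide's division implies, one disk I/O operation per logged query; the constant names are made up for the example.

```python
import math

QUERIES_PER_SEC = 5 * 10**5
SECONDS_PER_DAY = 86_400          # 8.64 x 10^4
BYTES_PER_ENTRY = 2 * 10**3
DISK_IOPS = 200                   # common 4 TB HDD, per the slide
RAM_PER_MACHINE = 64 * 10**9      # 64 GB, decimal units assumed
LOGS_IN_RAM = 100 * 10**12        # 100 TB, decimal units assumed

# Daily query log volume in bytes and in TB.
bytes_per_day = QUERIES_PER_SEC * SECONDS_PER_DAY * BYTES_PER_ENTRY
tb_per_day = bytes_per_day / 10**12

# Disks needed to absorb the write rate, one I/O per query (assumption).
disks_needed = math.ceil(QUERIES_PER_SEC / DISK_IOPS)

# Machines needed to hold 100 TB in RAM, rounded up.
machines_needed = math.ceil(LOGS_IN_RAM / RAM_PER_MACHINE)
```

Changing a single constant, for example the query rate, shows how quickly the disk and machine counts move.
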
  16. ASSESSMENT
    We cannot reasonably support our SLOs if one of these components fails. The one-machine design looks infeasible.
  17. EXPLORE ANOTHER IDEA
    * We can grab the accumulated query logs and click logs.
    * We can process and join the logs with a MapReduce.
    MapReduce will produce a data set organized by ad_id, with the number of clicks each search_term received. Unfortunately, this type of batch process can’t meet our SLO of joined log availability within 5 minutes of logs being received.
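
The shape of that join can be imitated in a single process. This is a sketch of the data flow only, not a real MapReduce job: the record layouts (dictionaries with `query_id`, `ad_id`, and `search_term` keys) and the function name `join_clicks` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def join_clicks(query_logs, click_logs):
    """Join click records to query records and count clicks.

    Returns {ad_id: Counter({search_term: clicks})}.
    """
    # "Map"-like step: index each query's search term by query_id.
    term_by_query = {q["query_id"]: q["search_term"] for q in query_logs}

    # "Reduce"-like step: group clicks by ad_id and count per search term.
    clicks_by_ad = defaultdict(Counter)
    for click in click_logs:
        term = term_by_query.get(click["query_id"])
        if term is None:
            continue  # click without a matching query record; skipped here
        clicks_by_ad[click["ad_id"]][term] += 1
    return clicks_by_ad


query_logs = [
    {"query_id": 1, "search_term": "shoes"},
    {"query_id": 2, "search_term": "hats"},
    {"query_id": 3, "search_term": "shoes"},
]
click_logs = [
    {"query_id": 1, "ad_id": 10},
    {"query_id": 3, "ad_id": 10},
    {"query_id": 2, "ad_id": 20},
    {"query_id": 99, "ad_id": 20},  # no matching query record
]
joined = join_clicks(query_logs, click_logs)
```

Because the whole input has to be collected before the join runs, the output is only as fresh as the last batch, which is the limitation the slide points out.
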
  18. DISTRIBUTED SYSTEM
    The amount of network throughput LogJoiner needs to process the logs:
    * (10^4 clicks/sec) × (2 × 10^3 bytes) = 2 × 10^7 bytes/sec = 20 MB/sec = 160 Mbps
    The storage needed for QueryMap:
    * 3 × (5 × 10^5 queries/sec) × (8.64 × 10^4 seconds/day) × (8 bytes + 8 bytes) ≈ 2 × 10^12 bytes/day = 2 TB/day
    The next step in scaling the design is to shard the inputs and outputs, that is, to divide the incoming query logs and click logs into multiple streams.
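
The two estimates and the sharding step can be sketched together. The arithmetic restates the slide's numbers with decimal units assumed. The sharding rule shown, `query_id` modulo the number of shards, is one simple possibility chosen for illustration; the slide does not specify the rule, and the names `shard_for` and `NUM_SHARDS` are hypothetical.

```python
CLICKS_PER_SEC = 10**4
BYTES_PER_CLICK_ENTRY = 2 * 10**3
QUERIES_PER_SEC = 5 * 10**5
SECONDS_PER_DAY = 86_400
ADS_PER_QUERY = 3
QUERYMAP_ENTRY_BYTES = 8 + 8

# LogJoiner network throughput.
click_bytes_per_sec = CLICKS_PER_SEC * BYTES_PER_CLICK_ENTRY  # 2 x 10^7
click_mbps = click_bytes_per_sec * 8 / 10**6                  # megabits/sec

# QueryMap storage per day.
querymap_bytes_per_day = (
    ADS_PER_QUERY * QUERIES_PER_SEC * SECONDS_PER_DAY * QUERYMAP_ENTRY_BYTES
)

NUM_SHARDS = 8  # arbitrary value for the example


def shard_for(query_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard so that all records for one query land together."""
    return query_id % num_shards


# A query record and its click record share a query_id, so they map
# to the same shard and can be joined there.
query_shard = shard_for(12345)
click_shard = shard_for(12345)
```

Keying both log streams on the same field means each shard can join its own slice without talking to the others.
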