Introduction to Non-Abstract Large Design Systems

Slide 1

Slide 1 text

Non-Abstract Large Design Systems

Slide 2

Slide 2 text

YURY NIÑO ROA Cloud Infrastructure Engineer Chaos Engineering Advocate @Google www.yurynino.com yurynino

Slide 3

Slide 3 text

AGENDA
* Introduction
* What is NALSD?
* Systems Design Fundamentals
* How NALSD?
* Use Case

Slide 4

Slide 4 text

REFERENCES

Slide 5

Slide 5 text

What do you look for in an SRE? Automation Curiosity

Slide 6

Slide 6 text

NALSD IN DETAIL Google SREs are expected to be able to start resource planning with a basic whiteboard diagram of a system, think through the various scaling and failure domains, and focus their design into a concrete proposal for resources.

Slide 7

Slide 7 text

WHAT IS NALSD?
Non-Abstract Large System Design
* An iterative style for designing and implementing systems.
* The SRE ability to assess, design, and evaluate large systems.
* Robust and scalable designs with low operational costs.

Slide 8

Slide 8 text

WHY NALSD?
Google has learned (the hard way) that the people designing distributed systems need to develop and continuously exercise the muscle of turning a design into concrete estimates of resources at multiple steps in the process.

Slide 9

Slide 9 text

NALSD is a critical skill for SREs. In NALSD, we consider how to design large systems for reliability, resilience, and efficiency. NALSD is not only used when building a new system, but also when systems need to be changed. Focus on building experience and judgment, not simply more algorithms.

Slide 10

Slide 10 text

We can’t talk about designing for reliability and SRE without touching on non-abstract, large system design. At Google, we found that addressing reliability issues during the design phase reduces future costs!

Slide 11

Slide 11 text

NALSD DESIGN PROCESS
Initial Requirements
* Read & Understand
* Required SLOs
* Ask that you consider
Basic Design Phase (One Machine)
Consider running our entire application on a single computer.
* Is it possible?
* Can we do better?
Scaled Design Phase (Distributed System)
Now we'll need multiple machines. What's the best design to join them?
* Is it feasible?
* Is it resilient?

Slide 12

Slide 12 text

BEFORE WE BEGIN
Load Balancing, Data Partitioning, Proxies, Caching, Indexes, Redundancy, Replication, SQL vs NoSQL, Consistent Hashing, CAP Theorem, PACELC Theorem, Bloom Filters, Quorum, Leader and Follower

Slide 13

Slide 13 text

SYSTEMS DESIGN PATTERNS
Consistent Core, Follower Reads, Generation Clock, Gossip Dissemination, HeartBeat, Hybrid Clock, Idempotent Receiver, State Watch, Quorum
https://martinfowler.com/articles/patterns-of-distributed-systems/

Slide 14

Slide 14 text

BEFORE WE BEGIN
'The numbers everyone should know'
https://danrl.com/sre-flash-cards/SRE%20Flash%20Cards.pdf
Sample flash cards:
* Time: Main memory reference
* Time: Round trip within same datacenter
* Time: Read 1 MB sequentially from memory
* Speed: Read sequentially from SSD
* Power of ten? ns / us / ms
From: https://cloud.google.com/blog/products/management-tools/sre-principles-and-flashcards-to-design-nalsd

Slide 15

Slide 15 text

USE CASE
● The Google AdWords service displays text advertisements on Google Web Search.
● The click-through rate (CTR) metric tells advertisers how well their ads are performing.
CTR = (# clicks on the ad) / (# times the ad is shown)
AdWords Challenge: Design a system capable of measuring and reporting an accurate CTR for every AdWords ad.
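
The CTR definition above can be written as a small function. This is only a sketch: the function name and the choice to return 0.0 for an ad that was never shown are assumptions made for illustration, not part of the slides.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions for one ad.

    Assumption: an ad with zero impressions reports a CTR of 0.0
    instead of raising a division error.
    """
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical example: an ad shown 1,000 times and clicked 20 times.
example_ctr = click_through_rate(clicks=20, impressions=1_000)
print(f"CTR = {example_ctr:.1%}")
```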

Slide 16

Slide 16 text

Slide 17

Slide 17 text

DESIGN PROCESS
Is it possible? If we didn’t have to worry about enough RAM, CPU, network bandwidth, and so on, what would we design to satisfy the requirements?
Can we do better? If the design solves the problem in O(N) time, can we solve it more quickly, say O(ln(N))?
CTR: the number of clicks divided by the number of impressions.

Slide 18

Slide 18 text

DESIGN PROCESS
In the next phase, we try to scale up our basic design.
Is it feasible? Is it possible to scale this design, given hardware constraints? What distributed design would satisfy the requirements?
Is it resilient? Can the design fail gracefully? What happens when this component fails? How does the system work when it fails?
Can we do better?

Slide 19

Slide 19 text

Slide 20

Slide 20 text

INITIAL REQUIREMENTS
Each advertiser may have multiple advertisements. Each ad is keyed by ad_id and is associated with a list of search terms selected by the advertiser.
* How often did this search term trigger this ad to be shown?
* How many times was the ad clicked by someone who saw it?
* With this information, we can calculate the CTR.

Slide 21

Slide 21 text

INITIAL REQUIREMENTS
● We know our advertisers care about two things:
○ That the dashboard displays quickly!
○ That the data is recent.
Therefore, we will consider our requirements in terms of SLOs:
● 99.9% of dashboard queries complete in < 1 second.
● 99.9% of the time, the CTR data displayed is less than 5 minutes old.

Slide 22

Slide 22 text

Slide 23

Slide 23 text

ONE MACHINE
For every web search query, we log:
* TIME: the time the query occurred
* QUERY_ID: a unique query identifier
* AD_ID: the ad IDs of the AdWords advertisements shown for the search
* SEARCH_TERM: the query content

Slide 24

Slide 24 text

ONE MACHINE
Calculations
* TIME: 64-bit integer, 8 bytes
* QUERY_ID: 64-bit integer, 8 bytes
* AD_ID: three 64-bit integers, 24 bytes
* SEARCH_TERM: a long string, up to 500 bytes

Slide 25

Slide 25 text

ONE MACHINE
We will round up to treat each query log entry as 2 KB. Click log volume should be considerably smaller than query log volume, because the average CTR is 2% (10,000 clicks / 500,000 queries). Remember that we chose big numbers to illustrate that these principles scale to arbitrarily large implementations.

Slide 26

Slide 26 text

ONE MACHINE
The volume of query logs generated in a 24-hour period:
* (5 × 10^5 queries/sec) × (8.64 × 10^4 seconds/day) × (2 × 10^3 bytes) = 86.4 TB/day ≈ 100 TB/day
A common 4 TB HDD sustains 200 input/output operations per second (IOPS):
* (5 × 10^5 queries/sec) / (200 IOPS/disk) = 2.5 × 10^3 disks, or 2,500 disks
Holding the logs in RAM instead:
* (100 TB) / (64 GB RAM/machine) = 1,563 machines
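
The back-of-the-envelope arithmetic on this slide can be reproduced in a few lines. This is a sketch using only the figures quoted above; the constant names are made up for illustration, and it assumes decimal units (1 TB = 10^12 bytes) and one disk operation per logged query, which is what the slide's division implies.

```python
import math

# Figures quoted on the slide (decimal units assumed).
QUERIES_PER_SEC = 5 * 10**5
SECONDS_PER_DAY = 86_400            # 8.64 x 10^4
BYTES_PER_LOG_ENTRY = 2 * 10**3     # each entry rounded up to 2 KB
HDD_IOPS = 200                      # sustained IOPS of a common 4 TB HDD
RAM_PER_MACHINE_BYTES = 64 * 10**9  # 64 GB
ROUNDED_DAILY_BYTES = 100 * 10**12  # 86.4 TB/day rounded up to 100 TB/day

# Query log volume per day, in terabytes.
daily_log_tb = QUERIES_PER_SEC * SECONDS_PER_DAY * BYTES_PER_LOG_ENTRY / 10**12

# Disks needed just to absorb the write rate (assumes one I/O per query).
disks_for_iops = math.ceil(QUERIES_PER_SEC / HDD_IOPS)

# Machines needed to hold one (rounded) day of logs in RAM.
machines_for_ram = math.ceil(ROUNDED_DAILY_BYTES / RAM_PER_MACHINE_BYTES)

print(f"{daily_log_tb:.1f} TB/day, {disks_for_iops} disks, {machines_for_ram} machines")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when a requirement changes, which is the point of iterating on the design.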

Slide 27

Slide 27 text

EVALUATION
We cannot reasonably support our SLOs if one of these components fails. The one-machine design looks infeasible.

Slide 28

Slide 28 text

Slide 29

Slide 29 text

DISTRIBUTED SYSTEM
* We can process and join the logs with MapReduce.
* We can grab the accumulated query logs and click logs.
MapReduce works as a batch processor: its inputs are a large data set, and it can use many machines to process that data via workers and produce a result.
EVALUATION
Unfortunately, this type of batch process can’t meet our SLO of joined log availability within 5 minutes of logs being received.
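
To make the batch join concrete, here is a single-process sketch of the map, shuffle, and reduce steps. It is not real MapReduce code: the function names are illustrative, and the record layouts (in particular a click log entry carrying time, query_id, and ad_id) are assumptions, since the slides only list the query log fields.

```python
from collections import defaultdict


def map_phase(query_logs, click_logs):
    """Emit (query_id, tagged record) pairs so both logs share a join key."""
    for query in query_logs:
        yield query["query_id"], ("query", query)
    for click in click_logs:
        yield click["query_id"], ("click", click)


def shuffle(pairs):
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count impressions and clicks per (ad_id, search_term)."""
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for records in groups.values():
        queries = [rec for kind, rec in records if kind == "query"]
        if not queries:
            continue  # click whose query is not in this batch
        query = queries[0]
        for ad_id in query["ad_ids"]:
            stats[(ad_id, query["search_term"])]["impressions"] += 1
        for kind, rec in records:
            if kind == "click":
                stats[(rec["ad_id"], query["search_term"])]["clicks"] += 1
    return dict(stats)


# Hypothetical sample batch.
query_logs = [
    {"time": 1, "query_id": 1, "ad_ids": [10, 11, 12], "search_term": "flowers"},
    {"time": 2, "query_id": 2, "ad_ids": [10, 13, 14], "search_term": "flowers"},
]
click_logs = [{"time": 3, "query_id": 1, "ad_id": 10}]

batch_stats = reduce_phase(shuffle(map_phase(query_logs, click_logs)))
print(batch_stats[(10, "flowers")])
```

The join only runs once a whole batch has been collected, which is why this approach struggles with a 5-minute freshness target.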

Slide 30

Slide 30 text

Slide 31

Slide 31 text

DISTRIBUTED SYSTEM
LogJoiner
What if we loop over the click logs and pull in the specific queries referenced? We’ll call this component the LogJoiner.
LogJoiner takes a continuous stream of data from the click logs, joins it with the data in QueryStore, and then stores that information, organized by ad_id.
Once the queries that were clicked on are stored and indexed by ad_id, we have half the data required to generate the CTR dashboard. We will call this the ClickMap, because it maps from ad_id to the clicks.
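
A minimal in-memory sketch of the LogJoiner idea follows. The class layout, method names, and record fields are assumptions for illustration: QueryStore is modelled as a plain dict keyed by query_id, and ClickMap as a dict from ad_id to a list of joined click records. A real system would use durable, distributed stores.

```python
from collections import defaultdict


class LogJoiner:
    """Joins each incoming click with its query and files it under ad_id."""

    def __init__(self, query_store):
        # Assumed shape: query_id -> query log record.
        self.query_store = query_store
        # ClickMap: ad_id -> list of joined click records.
        self.click_map = defaultdict(list)

    def process_click(self, click):
        """Join one click log entry; return False if its query is unknown."""
        query = self.query_store.get(click["query_id"])
        if query is None:
            # The query has not reached QueryStore yet; a real joiner
            # would need to retry or buffer the click.
            return False
        self.click_map[click["ad_id"]].append(
            {
                "time": click["time"],
                "query_id": click["query_id"],
                "search_term": query["search_term"],
            }
        )
        return True


# Hypothetical usage with one stored query and a two-click stream.
store = {1: {"time": 1, "query_id": 1, "ad_ids": [10, 11, 12], "search_term": "flowers"}}
joiner = LogJoiner(store)
results = [
    joiner.process_click({"time": 5, "query_id": 1, "ad_id": 10}),
    joiner.process_click({"time": 6, "query_id": 99, "ad_id": 10}),
]
print(results, dict(joiner.click_map))
```

Because clicks are handled one at a time as they arrive, the joined data can stay fresh without waiting for a batch window.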

Slide 32

Slide 32 text

ITERATE LogJoiner

Slide 33

Slide 33 text

DISTRIBUTED SYSTEM
The amount of network throughput LogJoiner needs to process the logs:
* (10^4 clicks/sec) × (2 × 10^3 bytes) = 2 × 10^7 bytes/sec = 20 MB/sec = 160 Mbps
The amount of storage needed for QueryMap:
* 3 × (5 × 10^5 queries/sec) × (8.64 × 10^4 seconds/day) × (8 bytes + 8 bytes) ≈ 2 × 10^12 bytes/day = 2 TB/day
The next step in scaling the design is to shard the inputs and outputs: to divide the incoming query logs and click logs into multiple streams.
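
The two estimates and the sharding step can be sketched together. The numbers come from the slide; the constant names, the 16-byte entry (taken from the slide's "8 bytes + 8 bytes"), and the rule of picking a shard as query_id modulo the shard count are assumptions made for this example rather than something the slides specify.

```python
# Figures quoted on the slide (decimal units assumed).
CLICKS_PER_SEC = 10**4
BYTES_PER_LOG_ENTRY = 2 * 10**3
QUERIES_PER_SEC = 5 * 10**5
SECONDS_PER_DAY = 86_400
ADS_PER_QUERY = 3
BYTES_PER_QUERYMAP_ENTRY = 8 + 8

# Network throughput LogJoiner needs for the click stream.
click_bytes_per_sec = CLICKS_PER_SEC * BYTES_PER_LOG_ENTRY   # 2 x 10^7
click_mbps = click_bytes_per_sec * 8 / 10**6                 # megabits/sec

# Daily volume for QueryMap.
query_map_bytes_per_day = (
    ADS_PER_QUERY * QUERIES_PER_SEC * SECONDS_PER_DAY * BYTES_PER_QUERYMAP_ENTRY
)


def shard_for(query_id: int, num_shards: int) -> int:
    """Assumed sharding rule: route a record by query_id modulo shard count.

    Using the same rule for query logs and click logs keeps a click and
    the query it refers to on the same shard.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return query_id % num_shards


print(click_mbps, query_map_bytes_per_day / 10**12, shard_for(12345, 8))
```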