Introduction to Non-Abstract Large Design Systems


Yury Niño Roa, Cloud Infrastructure Engineer and Chaos Engineering Advocate @ Google
www.yurynino.com yurynino

AGENDA
• Introduction
• What is NALSD?
• How NALSD?
• Systems Design Fundamentals
• Use Case

REFERENCES

What do you look for in an SRE? Automation. Curiosity.

NALSD IN DETAIL
Google SREs are expected to be able to start resource planning with a basic whiteboard diagram of a system, think through the various scaling and failure domains, and focus their design into a concrete proposal for resources.

WHAT IS NALSD?
Non-Abstract Large System Design (NALSD) is:
• An iterative style for designing and implementing systems.
• The SRE ability to assess, design, and evaluate large systems.
• A path to robust and scalable designs with low operational costs.

WHY NALSD?
Google has learned (the hard way) that the people designing distributed systems need to develop and continuously exercise the muscle of turning a whiteboard design into concrete estimates of resources at multiple steps in the process.

NALSD is a critical skill for SREs. In NALSD, we consider how to design large systems for reliability, resilience, and efficiency. NALSD is used not only when building a new system, but also when existing systems need to be changed. The focus is on building experience and judgment, not simply on learning more algorithms.

We can’t talk about designing for reliability and SRE without touching on non-abstract large system design. At Google, we found that addressing reliability issues during the design phase reduces future costs!

NALSD DESIGN PROCESS
Initial Requirements
* Read & understand the problem
* Identify the required SLOs
* Ask about anything you should consider
One Machine
Consider running our entire application on a single computer.
Basic Design Phase
* Is it possible?
* Can we do better?
Distributed System
Now we’ll need multiple machines. What’s the best design to join them?
Scaling Phase
* Is it feasible?
* Is it resilient?
* Can we do better?

BEFORE WE BEGIN
Load Balancing, Data Partitioning, Proxies, Caching, Indexes, Redundancy, Replication, SQL vs. NoSQL, Consistent Hashing, CAP Theorem, PACELC Theorem, Bloom Filters, Quorum, Leader and Follower

SYSTEMS DESIGN PATTERNS (https://martinfowler.com/articles/patterns-of-distributed-systems/)
Consistent Core, Follower Reads, Generation Clock, Gossip Dissemination, HeartBeat, Hybrid Clock, Idempotent Receiver, State Watch, Quorum

BEFORE WE BEGIN: ‘THE NUMBERS EVERYONE SHOULD KNOW’
SRE flash cards: https://danrl.com/sre-flash-cards/SRE%20Flash%20Cards.pdf
From: https://cloud.google.com/blog/products/management-tools/sre-principles-and-flashcards-to-design-nalsd
Each card asks you to estimate a number and its power of ten (ns / us / ms for times), for example:
• Time: main memory reference
• Time: round trip within the same datacenter
• Time: read 1 MB sequentially from memory
• Speed: read sequentially from SSD
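To practice the flash cards, the minimal sketch below buckets a latency into the unit a card answer would use. The latency values are an assumption: approximate figures commonly cited from Jeff Dean’s “Numbers Everyone Should Know”, not values taken from the linked cards, and real hardware varies.

```python
# Approximate latencies in nanoseconds, as commonly cited from Jeff Dean's
# "Numbers Everyone Should Know". Order-of-magnitude assumptions, not measurements.
LATENCY_NS = {
    "main memory reference": 100,
    "read 1 MB sequentially from memory": 250_000,
    "round trip within same datacenter": 500_000,
}


def power_of_ten_unit(ns: float) -> str:
    """Return the unit (ns / us / ms) a flash-card answer would use.

    Anything of 1 ms or more is reported in ms, since the cards stop there.
    """
    if ns < 1_000:
        return "ns"
    if ns < 1_000_000:
        return "us"
    return "ms"


for card, ns in LATENCY_NS.items():
    print(f"{card}: ~{ns:,} ns -> {power_of_ten_unit(ns)}")
```

The point of the exercise is the exponent, not the digits: knowing that a datacenter round trip costs thousands of memory references is what makes the later estimates tractable.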

USE CASE
• The Google AdWords service displays text advertisements on Google Web Search.
• The click-through rate (CTR) metric tells advertisers how well their ads are performing:
$\text{CTR} = \frac{\#\ \text{times the ad was clicked}}{\#\ \text{times the ad was shown}}$
AdWords Challenge: Design a system capable of measuring and reporting an accurate CTR for every AdWords ad.
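The CTR formula above translates directly into a small helper. This is a sketch; returning 0 for an ad that was never shown is an assumption, since the slides do not say how that case is reported.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: times the ad was clicked / times the ad was shown."""
    if impressions == 0:
        # Assumption: an ad that was never shown reports a CTR of 0.
        return 0.0
    return clicks / impressions


# The use case's averages: 10,000 clicks for every 500,000 queries.
print(ctr(10_000, 500_000))  # 0.02
```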

DESIGN PROCESS
Is it possible? If we didn’t have to worry about having enough RAM, CPU, network bandwidth, and so on, what would we design to satisfy the requirements?
Can we do better? If the design solves the problem in O(N) time, can we solve it more quickly, say in O(ln(N))?
CTR: the number of clicks divided by the number of impressions.
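As an illustration of “Can we do better?”, the sketch below contrasts rescanning the whole log on every dashboard query (O(N) per query) with paying O(N) once to build an index keyed by ad_id. The log contents are hypothetical. A hash map (Python’s `Counter`) gives average O(1) lookups; a sorted index searched with binary search would give the logarithmic bound the slide mentions.

```python
from collections import Counter

# Hypothetical impression log: one ad_id per time an ad was shown.
impressions_log = [101, 202, 101, 303, 101, 202]


def impressions_naive(log: list[int], ad_id: int) -> int:
    """O(N) per query: rescan the whole log every time the dashboard asks."""
    return sum(1 for shown in log if shown == ad_id)


# Better: pay O(N) once to build an index keyed by ad_id ...
impressions_by_ad = Counter(impressions_log)


def impressions_indexed(ad_id: int) -> int:
    """... then answer each dashboard query with an average O(1) lookup."""
    return impressions_by_ad[ad_id]  # Counter returns 0 for unseen ads


print(impressions_naive(impressions_log, 101), impressions_indexed(101))  # 3 3
```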

In the next phase, we try to scale up our basic design:
Is it feasible? Is it possible to scale this design, given constraints on hardware? What distributed design would satisfy the requirements?
Is it resilient? Can the design fail gracefully? What happens when this component fails? How does the system work when a component fails?
Can we do better?

INITIAL REQUIREMENTS
Each advertiser may have multiple advertisements. Each ad is keyed by ad_id and is associated with a list of search terms selected by the advertiser. For each ad and search term, we want to know:
* How often did this search term trigger this ad to be shown?
* How many times was the ad clicked by someone who saw it?
With this information, we can calculate the CTR.
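One way to hold the two counters per ad and search term is sketched below. The `TermStats` class and the `(ad_id, search_term)` key shape are hypothetical illustrations, not part of the original design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TermStats:
    """Hypothetical counters for one (ad_id, search_term) pair."""
    impressions: int = 0  # times this search term triggered this ad to be shown
    clicks: int = 0       # times the ad was clicked by someone who saw it

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


# Keyed by (ad_id, search_term); a missing key starts at zero counts.
stats: defaultdict[tuple[int, str], TermStats] = defaultdict(TermStats)

stats[(101, "running shoes")].impressions += 50
stats[(101, "running shoes")].clicks += 1
stats[(101, "trail shoes")].impressions += 10

for (ad_id, term), s in sorted(stats.items(), key=lambda kv: kv[0]):
    print(ad_id, term, f"{s.ctr:.2%}")
```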

• We know our advertisers care about two things:
◦ That the dashboard displays quickly!
◦ That the data is recent.
Therefore, we will consider our requirements in terms of SLOs:
• 99.9% of dashboard queries complete in < 1 second.
• 99.9% of the time, the CTR data displayed is less than 5 minutes old.
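The two SLOs can be checked against measurements with a few lines. This is a sketch under stated assumptions: the samples are hypothetical, and “99.9% of the time” for freshness is approximated by sampling the age of the displayed data at regular intervals.

```python
SLO_TARGET = 0.999           # 99.9%
LATENCY_THRESHOLD_S = 1.0    # dashboard queries complete in < 1 second
FRESHNESS_THRESHOLD_S = 300  # CTR data displayed is less than 5 minutes old


def fraction_below(values: list[float], threshold: float) -> float:
    """Fraction of samples strictly below the threshold (values must be non-empty)."""
    return sum(v < threshold for v in values) / len(values)


def meets_slo(values: list[float], threshold: float) -> bool:
    return fraction_below(values, threshold) >= SLO_TARGET


# Hypothetical: 1,000 dashboard queries, one of which took 2.5 s.
latencies = [0.2] * 999 + [2.5]
# Hypothetical: 1,000 periodic samples of data age, two of them stale.
data_ages = [60.0] * 998 + [400.0, 420.0]

print(meets_slo(latencies, LATENCY_THRESHOLD_S))    # True
print(meets_slo(data_ages, FRESHNESS_THRESHOLD_S))  # False
```

Both SLOs leave a budget of one bad event in a thousand, so a single extra slow query or stale sample is enough to break them.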

ONE MACHINE
For every web search query, we log:
• TIME: the time the query occurred
• QUERY_ID: a unique identifier
• AD_ID: the ad IDs of the AdWords advertisements shown for the search
• SEARCH_TERM: the query content

Calculations
• TIME: 64-bit integer, 8 bytes
• QUERY_ID: 64-bit integer, 8 bytes
• AD_ID: three 64-bit integers, 24 bytes
• SEARCH_TERM: a long string, up to 500 bytes
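The field sizes above add up to 540 bytes per entry. The sketch below checks that with Python’s `struct` module; the fixed-width, NUL-padded encoding of SEARCH_TERM is an assumption made for illustration (a real log format would likely be variable-length).

```python
import struct

# One query log entry with the field sizes listed above (little-endian, no padding):
#   TIME         q    -> 64-bit integer, 8 bytes
#   QUERY_ID     q    -> 64-bit integer, 8 bytes
#   AD_ID        3q   -> three 64-bit integers, 24 bytes
#   SEARCH_TERM  500s -> up to 500 bytes, NUL-padded to a fixed width
QUERY_LOG_ENTRY = struct.Struct("<qq3q500s")

print(QUERY_LOG_ENTRY.size)  # 540

entry = QUERY_LOG_ENTRY.pack(1_700_000_000, 42, 101, 202, 303, "running shoes".encode())
ts, query_id, *ad_ids, search_term = QUERY_LOG_ENTRY.unpack(entry)
print(query_id, ad_ids, search_term.rstrip(b"\0").decode())  # 42 [101, 202, 303] running shoes
```

Rounding 540 bytes up to 2 KB, as the next step does, keeps the arithmetic simple and leaves headroom for per-entry overhead.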

We will round up and treat each query log entry as 2 KB. Click log volume should be considerably smaller than query log volume, because the average CTR is 2% (10,000 clicks/sec out of 500,000 queries/sec). Remember that we chose big numbers to illustrate that these principles scale to arbitrarily large implementations.

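The volume of query logs generated in a 24-hour period:
* $(5 \times 10^{5}\ \text{queries/sec}) \times (8.64 \times 10^{4}\ \text{seconds/day}) \times (2 \times 10^{3}\ \text{bytes}) = 8.64 \times 10^{13}\ \text{bytes/day} = 86.4\ \text{TB/day} \approx 100\ \text{TB/day}$
A common 4 TB HDD sustains 200 input/output operations per second (IOPS):
* $(5 \times 10^{5}\ \text{queries/sec}) / (200\ \text{IOPS/disk}) = 2.5 \times 10^{3}\ \text{disks}$, or 2,500 disks
Holding a day of logs in RAM instead:
* $(100\ \text{TB}) / (64\ \text{GB RAM/machine}) \approx 1{,}563\ \text{machines}$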
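The back-of-the-envelope numbers above can be reproduced in a short script. The constants come from the slides; using decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) is an assumption that matches the slide arithmetic.

```python
QUERIES_PER_SEC = 500_000
CLICKS_PER_SEC = 10_000
SECONDS_PER_DAY = 86_400
ENTRY_BYTES = 2_000              # each query log entry rounded up to 2 KB
DISK_IOPS = 200                  # a common 4 TB HDD
RAM_PER_MACHINE = 64 * 10**9     # 64 GB
TB = 10**12


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division: you cannot buy half a disk or half a machine."""
    return -(-a // b)


query_log_bytes_per_day = QUERIES_PER_SEC * SECONDS_PER_DAY * ENTRY_BYTES
disks_for_iops = ceil_div(QUERIES_PER_SEC, DISK_IOPS)
machines_for_ram = ceil_div(100 * TB, RAM_PER_MACHINE)  # 86.4 TB/day rounded up to 100 TB

print(query_log_bytes_per_day / TB)      # 86.4
print(disks_for_iops)                    # 2500
print(machines_for_ram)                  # 1563
print(CLICKS_PER_SEC / QUERIES_PER_SEC)  # 0.02, the average CTR
```

Either way the answer is thousands of disks or over a thousand machines, which is why the one-machine design cannot hold.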

EVALUATION
We cannot reasonably support our SLOs if one of these components fails. The one-machine design looks infeasible.

DISTRIBUTED SYSTEM
* We can grab the accumulated query logs and click logs.
* We can process and join the logs with MapReduce.
MapReduce works as a batch processor: its input is a large data set, and it can use many machines (workers) to process that data and produce a result. Unfortunately, this type of batch process can’t meet our SLO of joined log availability within 5 minutes of logs being received.
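The sketch below simulates the MapReduce join in a single process: map tags each record with its source, shuffle groups by query_id, and reduce joins each click with its query to attribute it to a search term. The input logs are hypothetical, and a real MapReduce would run each phase across many workers.

```python
from collections import defaultdict

# Hypothetical batch inputs.
# Query log: (query_id, ad_ids shown, search_term). Click log: (query_id, clicked ad_id).
QUERY_LOG = [(1, [101, 202], "shoes"), (2, [101], "boots"), (3, [202, 303], "shoes")]
CLICK_LOG = [(1, 101), (3, 303)]


def map_phase(query_log, click_log):
    """Map: emit (query_id, tagged record) so the two logs can be joined on query_id."""
    for query_id, ad_ids, term in query_log:
        yield query_id, ("query", (ad_ids, term))
    for query_id, ad_id in click_log:
        yield query_id, ("click", ad_id)


def shuffle(pairs):
    """Shuffle: group all records that share a query_id."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: join each click with its query and count per (ad_id, search_term)."""
    impressions, clicks = defaultdict(int), defaultdict(int)
    for records in groups.values():
        queries = [payload for kind, payload in records if kind == "query"]
        if not queries:
            continue  # a click whose query is missing from this batch cannot be joined
        ad_ids, term = queries[0]
        for ad_id in ad_ids:
            impressions[(ad_id, term)] += 1
        for kind, payload in records:
            if kind == "click":
                clicks[(payload, term)] += 1
    return impressions, clicks


impressions, clicks = reduce_phase(shuffle(map_phase(QUERY_LOG, CLICK_LOG)))
print(clicks[(101, "shoes")], impressions[(101, "shoes")])  # 1 1
```

Note that the reduce step cannot start until the shuffle has seen the whole batch, which is exactly why this approach struggles with a 5-minute freshness target.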

What if we loop over the click logs and pull in the specific queries referenced? We’ll call this component the LogJoiner. LogJoiner takes a continuous stream of data from the click logs, joins it with the data in QueryStore, and then stores that information, organized by ad_id. Once the queries that were clicked on are stored and indexed by ad_id, we have half the data required to generate the CTR dashboard. We will call this the ClickMap, because it maps from ad_id to the clicks.
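A minimal in-memory sketch of the LogJoiner idea follows. QueryStore and ClickMap are plain dicts here, and the record types and the `unjoined` retry list are hypothetical choices for illustration; the deck does not specify how clicks that arrive before their query are handled.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Query(NamedTuple):
    query_id: int
    ad_ids: tuple[int, ...]
    search_term: str


class Click(NamedTuple):
    query_id: int
    ad_id: int


class LogJoiner:
    """Hypothetical in-memory LogJoiner; QueryStore and ClickMap are plain dicts."""

    def __init__(self) -> None:
        self.query_store: dict[int, Query] = {}  # QueryStore: query_id -> query
        # ClickMap: ad_id -> list of (query_id, search_term) for each click
        self.click_map: defaultdict[int, list[tuple[int, str]]] = defaultdict(list)
        self.unjoined: list[Click] = []  # clicks whose query has not arrived yet

    def store_query(self, query: Query) -> None:
        self.query_store[query.query_id] = query

    def consume(self, clicks: Iterable[Click]) -> None:
        """Join each click as it arrives, instead of waiting for a whole batch."""
        for click in clicks:
            query = self.query_store.get(click.query_id)
            if query is None:
                self.unjoined.append(click)  # keep it so it can be retried later
                continue
            self.click_map[click.ad_id].append((click.query_id, query.search_term))


joiner = LogJoiner()
joiner.store_query(Query(1, (101, 202), "running shoes"))
joiner.consume([Click(1, 101), Click(7, 303)])
print(dict(joiner.click_map))  # {101: [(1, 'running shoes')]}
print(joiner.unjoined)         # [Click(query_id=7, ad_id=303)]
```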

ITERATE LogJoiner

The amount of network throughput LogJoiner needs to process the logs:
* $(10^{4}\ \text{clicks/sec}) \times (2 \times 10^{3}\ \text{bytes}) = 2 \times 10^{7}\ \text{bytes/sec} = 20\ \text{MB/sec} = 160\ \text{Mbps}$
The storage QueryMap needs for the 3 ads shown per query:
* $3 \times (5 \times 10^{5}\ \text{queries/sec}) \times (8.64 \times 10^{4}\ \text{seconds/day}) \times (8\ \text{bytes} + 8\ \text{bytes}) \approx 2 \times 10^{12}\ \text{bytes/day} = 2\ \text{TB/day}$
The next step in scaling the design is to shard the inputs and outputs, dividing the incoming query logs and click logs into multiple streams.
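The sketch below reproduces the throughput and storage figures and adds a hypothetical shard-routing rule. Sharding by query_id is an assumption: it sends a query and all of its clicks to the same shard, so each LogJoiner shard can join locally. Plain modulo is the simplest choice; consistent hashing (listed earlier among the fundamentals) would make changing the shard count cheaper.

```python
CLICKS_PER_SEC = 10**4
QUERIES_PER_SEC = 5 * 10**5
SECONDS_PER_DAY = 86_400
ENTRY_BYTES = 2 * 10**3
ADS_PER_QUERY = 3

click_bytes_per_sec = CLICKS_PER_SEC * ENTRY_BYTES  # 2 x 10^7 bytes/sec
click_mbps = click_bytes_per_sec * 8 / 10**6         # bytes -> bits -> megabits
querymap_bytes_per_day = ADS_PER_QUERY * QUERIES_PER_SEC * SECONDS_PER_DAY * (8 + 8)


def shard_for(query_id: int, num_shards: int) -> int:
    """Hypothetical routing: a query and its clicks land on the same shard."""
    return query_id % num_shards


NUM_SHARDS = 100  # illustrative shard count
print(click_mbps)                          # 160.0
print(querymap_bytes_per_day / 10**12)     # 2.0736
print(click_mbps / NUM_SHARDS)             # 1.6 Mbps per shard if keys spread evenly
print(shard_for(123_456_789, NUM_SHARDS))  # 89
```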

THE FINAL DESIGN

https://sre.google/classroom/
