Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

The Facebook Cache Infrastructure
Scaling in-memory data stores
Yannick Gingras
PyCon Canada – 2013-08-11

Slide 3

Slide 3 text

What is a cache? Why do we cache?

Slide 4

Slide 4 text

Reasons for caching
▪ Almost 1000:1 read/write ratio
▪ Extreme data dependency

Slide 5

Slide 5 text

Memcache: a key-value store

Slide 6

Slide 6 text

Memcache
Memcache is deployed as a demand-filled look-aside key-value cache.
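A minimal sketch of the look-aside pattern, with plain dicts standing in for memcache and MySQL (both assumptions for illustration). The client, not the cache, reads the database on a miss and fills the cache on demand. On a write it updates the database and then deletes the cached key, which is the approach the Memcache paper under Further reading describes.

```python
class LookAsideCache:
    """Demand-filled look-aside cache; dicts stand in for memcache and MySQL."""

    def __init__(self, cache, db):
        self.cache = cache  # hypothetical memcache stand-in: may lack any key
        self.db = db        # hypothetical database stand-in: source of truth

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            # Miss: the client reads the database itself...
            value = self.db.get(key)
            if value is not None:
                # ...and fills the cache on demand.
                self.cache[key] = value
        return value

    def update(self, key, value):
        # Write the database first, then delete (not set) the cached copy so
        # the next reader repopulates it from the source of truth.
        self.db[key] = value
        self.cache.pop(key, None)


cache, db = {}, {"user:1": "Alice"}
client = LookAsideCache(cache, db)
client.get("user:1")             # miss: read from db, then cached
client.update("user:1", "Alicia")
client.get("user:1")             # miss again: refilled with "Alicia"
```

Deleting on write, instead of setting the new value, keeps the write path idempotent: repeating a delete is harmless.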

Slide 7

Slide 7 text

Memcache
Data is distributed across multiple servers using consistent hashing. Failed hosts are replaced with hot spares.
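A sketch of consistent hashing with hot-spare replacement. The slot indirection (`slot-N` names mapped to hosts), the virtual-node count and the MD5-based hash are assumptions chosen for illustration. Because ring positions belong to slots rather than to hosts, swapping a failed host for a spare moves only that host's keys and leaves every other key where it was.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    # 64-bit hash that is stable across processes (unlike built-in hash()).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing over logical slots; each slot is backed by a host."""

    def __init__(self, hosts, vnodes=64):
        # Hypothetical slot naming: slot-0, slot-1, ... one per initial host.
        self.slot_host = {f"slot-{i}": h for i, h in enumerate(hosts)}
        points = sorted(
            (stable_hash(f"{slot}#{v}"), slot)
            for slot in self.slot_host
            for v in range(vnodes)
        )
        self._points = [p for p, _ in points]
        self._slots = [s for _, s in points]

    def host_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around the end of the ring.
        idx = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self.slot_host[self._slots[idx]]

    def replace(self, failed: str, spare: str) -> None:
        # The spare takes over the failed host's slot; the ring is unchanged.
        for slot, host in self.slot_host.items():
            if host == failed:
                self.slot_host[slot] = spare
```

The alternative, removing the failed host from the ring, would scatter its keys across the surviving servers; the spare keeps the key-to-server mapping intact apart from the one swapped machine.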

Slide 8

Slide 8 text

Speed of light is slow
(Map figure annotated with a 140 ms latency. Note: not the actual data center presence map.)

Slide 9

Slide 9 text

Our Memcache is kept in sync via MySQL replication
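One way to read this slide, consistent with the Memcache paper under Further reading, is that cache invalidations ride along with the database's replication stream: a process next to each replica deletes the affected cache keys once the change has landed in that region. The sketch below assumes a hypothetical `CommittedTxn` record that carries the keys to invalidate; it is an illustration, not Facebook's actual daemon.

```python
from dataclasses import dataclass, field


@dataclass
class CommittedTxn:
    # Hypothetical shape of a replicated transaction: the SQL that ran plus
    # the cache keys the writer embedded for invalidation.
    sql: str
    invalidate_keys: list = field(default_factory=list)


def apply_replication_stream(stream, cache):
    """Tail a (hypothetical) replication stream and delete stale cache keys.

    Running next to the replica means the local cache is invalidated only
    after the replica has the new data. Returns how many cached entries
    were actually removed.
    """
    deleted = 0
    for txn in stream:
        # Applying txn.sql to the replica database is omitted here.
        for key in txn.invalidate_keys:
            if cache.pop(key, None) is not None:
                deleted += 1
    return deleted
```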

Slide 10

Slide 10 text

Memcache
Some numbers
▪ Thousands of servers
▪ >1G Ops/s
▪ >1T items
▪ 98.1% hit rate in “wildcard”
▪ ~90% hit rate in “regional”
▪ <50% hit rate in “pyk”

Slide 11

Slide 11 text

Python for Memcache scaling

Slide 12

Slide 12 text

Python for Memcache
mcconf
▪ Short deployment cycle
▪ Pool management: allocation, resizing
▪ Spare selection based on hardware requirements
▪ Template-based region bootstrapping
▪ Cluster maintenance and decommissioning
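The slide doesn't show mcconf's selection logic. As a hypothetical sketch, spare selection could filter an inventory by a pool's hardware requirements and prefer the smallest machine that fits, keeping larger boxes free for pools that need them. `Host`, its fields and `pick_spares` are illustrative names, not mcconf's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical inventory record; the fields are illustrative.
    name: str
    ram_gb: int
    nic_gbps: int
    rack: str


def pick_spares(spares, count, min_ram_gb, min_nic_gbps, avoid_racks=()):
    """Choose `count` spares that meet the pool's hardware requirements."""
    eligible = [
        h for h in spares
        if h.ram_gb >= min_ram_gb
        and h.nic_gbps >= min_nic_gbps
        and h.rack not in avoid_racks  # e.g. racks already used by the pool
    ]
    # Smallest adequate machine first; the name breaks ties deterministically.
    eligible.sort(key=lambda h: (h.ram_gb, h.nic_gbps, h.name))
    if len(eligible) < count:
        raise RuntimeError(f"only {len(eligible)} eligible spares, need {count}")
    return eligible[:count]
```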

Slide 13

Slide 13 text

Python for Memcache
Adaptive deployments: mcroll and mcpush
▪ Software upgrades
▪ Cold rolls / cache flushing
▪ Rates are adaptive, based on health metrics
▪ Global parallelism logic
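The slide says roll rates adapt to health metrics but doesn't give the policy. One common choice, assumed here rather than taken from mcroll or mcpush, is additive-increase/multiplicative-decrease on the batch size: grow slowly while the fleet looks healthy, halve on trouble. `restart` and `is_healthy` are hypothetical callbacks, and the size bounds are arbitrary.

```python
def next_batch_size(current, healthy, min_size=1, max_size=64):
    """Additive-increase / multiplicative-decrease on the roll batch size.

    `healthy` would summarize health metrics (hit rate, error rate, ...)
    observed after the previous batch; the bounds are assumptions.
    """
    if healthy:
        return min(current + 1, max_size)  # speed up gently
    return max(current // 2, min_size)     # back off quickly


def roll(hosts, restart, is_healthy, start_size=1):
    """Restart `hosts` in batches whose size adapts to observed health."""
    size, i, batches = start_size, 0, []
    while i < len(hosts):
        batch = hosts[i:i + size]
        for host in batch:
            restart(host)
        batches.append(batch)
        i += len(batch)
        size = next_batch_size(size, is_healthy())
    return batches
```

With a fleet that stays healthy, ten hosts are restarted in batches of 1, 2, 3 and 4.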

Slide 14

Slide 14 text

The social graph: nodes and edges

Slide 15

Slide 15 text

The graph data model

Slide 16

Slide 16 text

TAO: Associations and Objects

Slide 17

Slide 17 text

TAO
TAO is a two-level read-through, write-through cache. TAO is aware of graph semantics and supports structured queries.
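A toy model of the two cache levels: followers read through to a leader, the leader reads through to the database, and writes travel the same path down. Having the leader invalidate the sibling followers is an assumption added to keep them consistent. Graph-aware queries over objects and association lists are left out, so this shows only the caching tiers.

```python
class Database:
    """Stand-in for the storage layer; counts reads to show cache shielding."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value, origin=None):
        self.rows[key] = value


class Tier:
    """One cache level that reads and writes through to the level behind it."""

    def __init__(self, backend):
        self.backend = backend  # the leader tier, or the database
        self.data = {}
        self.children = []
        if isinstance(backend, Tier):
            backend.children.append(self)

    def get(self, key):
        if key not in self.data:
            # Read-through: the cache fetches on a miss (None is cached too).
            self.data[key] = self.backend.get(key)
        return self.data[key]

    def put(self, key, value, origin=None):
        # Write-through: forward the write, then update the local copy.
        self.backend.put(key, value, origin=self)
        self.data[key] = value
        # Assumed: drop stale copies held by the other followers.
        for child in self.children:
            if child is not origin:
                child.data.pop(key, None)
```

Repeated misses for the same key from different followers reach the database only once, because the leader answers them after the first fill.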

Slide 18

Slide 18 text

TAO
Some numbers
▪ Thousands of machines
▪ >1G Ops/s
▪ 97.5% hit rate on followers

Slide 19

Slide 19 text

Python for TAO

Slide 20

Slide 20 text

TAO – a fun story

Slide 21

Slide 21 text

TAO – a fun story

Slide 22

Slide 22 text

Python for TAO
Shard splitting / replication
▪ Extension of consistent hashing
▪ Based on client machine ID
▪ Wired into the invalidation pipeline
Shard placement
▪ Two-level load distribution
▪ Hash table of hot shards mapped to cold servers
▪ Falls back to consistent hashing if shards are not placed
▪ Candidate shards and destinations are identified by Python services
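A hypothetical sketch of the lookup rules listed above. A table of hot shards mapped to cold servers takes precedence. A split shard's replica is picked from the client machine ID, so each client sticks to one copy. Unplaced shards fall back to hashing. For brevity the fallback is a plain modulo hash standing in for consistent hashing, and all names are illustrative.

```python
import hashlib


def stable_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def default_server(shard: int, servers: list) -> str:
    # Stand-in for the consistent-hashing fallback (plain modulo for brevity).
    return servers[stable_hash(f"shard-{shard}") % len(servers)]


class ShardPlacement:
    def __init__(self, servers):
        self.servers = servers
        # Hot shard -> cold servers; in practice filled by offline tooling.
        self.placed = {}

    def place(self, shard, destinations):
        self.placed[shard] = list(destinations)

    def server_for(self, shard, client_id):
        replicas = self.placed.get(shard)
        if not replicas:
            return default_server(shard, self.servers)
        # Split shard: each client machine is pinned to one replica.
        return replicas[stable_hash(client_id) % len(replicas)]

    def invalidation_targets(self, shard):
        # A write must invalidate every copy of a split shard, not just one.
        return list(self.placed.get(shard) or [default_server(shard, self.servers)])
```

The last method is the reason splitting has to be wired into the invalidation pipeline: a replica that misses an invalidation would keep serving stale data to the clients pinned to it.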

Slide 23

Slide 23 text

TAO – another story
PHP Arrays
“An array in PHP is actually an ordered map. A map is a type that associates values to keys.”
– http://php.net/manual/en/language.types.array.php
Python dictionaries
“Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. […] It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary).”
– http://docs.python.org/2/tutorial/datastructures.html
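The two quotes contrast PHP's ordered map with the unordered dictionary described in the Python 2 tutorial. The slide leaves the story itself untold; as a generic illustration of the difference, `collections.OrderedDict` preserves insertion order in any Python version, while plain dicts have guaranteed it only since Python 3.7. Equality also differs between the two.

```python
from collections import OrderedDict

# Stand-in for a PHP array such as ['b' => 2, 'a' => 1]: an ordered map.
php_like = OrderedDict([("b", 2), ("a", 1)])
reordered = OrderedDict([("a", 1), ("b", 2)])

# Iteration follows insertion order, as a PHP foreach would.
print(list(php_like))                     # ['b', 'a']

# OrderedDict equality is order-sensitive...
print(php_like == reordered)              # False
# ...while plain dict equality ignores order.
print(dict(php_like) == dict(reordered))  # True
```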

Slide 24

Slide 24 text

More Python in the Cache infrastructure
▪ FBAR – auto-remediation engine
▪ Tupperware – job engine used for the invalidation pipeline
▪ Thrift – language-agnostic service layer, enables many Python clients http://thrift.apache.org/
▪ Dataswarm – Python frontend to our data warehouse
▪ fbdeploy – job supervisor and BitTorrent deployment

Slide 25

Slide 25 text

The Facebook Cache: wrapping up

Slide 26

Slide 26 text

Further reading
▪ Memcache public page: https://www.facebook.com/MemcacheAtFacebook
▪ Memcache paper: http://bit.ly/fb-memcache-paper
▪ TAO public note: http://bit.ly/tao-blog-post
▪ TAO paper: http://bit.ly/fb-tao-paper

Slide 27

Slide 27 text

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0

Slide 28

Slide 28 text

Memcache: advanced topics

Slide 29

Slide 29 text

Memcache
Thundering herd problem – leases
(Diagram: three web servers, Memcache, and the database.)
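A minimal single-process sketch of memcache leases as the Memcache paper describes them. A miss hands out a lease token. Other misses for the same key within the lease interval are told to wait and retry instead of all hitting the database. A set succeeds only with the current token, and a delete invalidates any outstanding lease. The 10-second default mirrors the interval reported in the paper; the class, its method names and return values are hypothetical.

```python
import itertools
import time


class LeasingCache:
    def __init__(self, lease_interval=10.0, clock=time.monotonic):
        self.data = {}
        self.leases = {}  # key -> (token, issued_at)
        self.lease_interval = lease_interval
        self.clock = clock
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return ("hit", value), ("lease", token) or ("wait", None)."""
        if key in self.data:
            return "hit", self.data[key]
        now = self.clock()
        lease = self.leases.get(key)
        if lease is not None and now - lease[1] < self.lease_interval:
            # Another client is already refilling this key: back off.
            return "wait", None
        token = next(self._tokens)
        self.leases[key] = (token, now)
        return "lease", token

    def set(self, key, value, token):
        lease = self.leases.get(key)
        if lease is None or lease[0] != token:
            return False  # stale or invalidated lease: drop the write
        del self.leases[key]
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        self.leases.pop(key, None)  # a pending fill may hold stale data
```

Only the lease holder queries the database; the others retry shortly and usually find the refilled value.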

Slide 30

Slide 30 text

Memcache
Read-after-write semantics – remote markers
(Diagram: web server, Memcache, master DB, replica DB.)
1. Set remote marker
2. Write to master
3. Delete from memcache
4. MySQL replication
5. Delete remote marker
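The five numbered steps can be modeled in a few lines. This is a hypothetical single-process sketch: dicts stand in for the master DB, the lagging local replica and memcache, and a set stands in for remote markers. While a marker is present, a cache miss is served from the master, because the local replica may not have the write yet.

```python
class Region:
    """Hypothetical replica region with a local cache and lagging replica."""

    def __init__(self, master, replica):
        self.cache = {}
        self.markers = set()    # stand-in for remote markers kept in memcache
        self.master = master    # master DB, possibly in another region
        self.replica = replica  # local replica DB, updated by replication

    def write(self, key, value):
        self.markers.add(key)       # 1. set remote marker
        self.master[key] = value    # 2. write to master
        self.cache.pop(key, None)   # 3. delete from memcache

    def on_replicated(self, key):
        self.replica[key] = self.master[key]  # 4. MySQL replication
        self.markers.discard(key)             # 5. delete remote marker

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        if key in self.markers:
            # Replica may be stale: pay the cross-region round trip.
            # This sketch does not fill the cache from this path.
            return self.master.get(key)
        value = self.replica.get(key)
        self.cache[key] = value
        return value
```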

Slide 31

Slide 31 text

Memcache
Aggregated deletes
▪ Reduce packet rate by 18x.
(Diagram: Aqueduct DB hosts, Memcache routers, and MC servers.)
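Aggregating deletes means sending many invalidations per message instead of one per key. A hypothetical sketch of the batching side follows; `send`, the destination names and the batch size are placeholders, and it makes no claim about how the 18x figure was achieved.

```python
from collections import defaultdict


class DeleteBatcher:
    """Coalesce cache deletes into one message per destination."""

    def __init__(self, send, max_batch=100):
        self.send = send  # hypothetical: send(destination, keys), one packet
        self.max_batch = max_batch
        self.pending = defaultdict(list)

    def delete(self, destination, key):
        batch = self.pending[destination]
        batch.append(key)
        if len(batch) >= self.max_batch:
            self._flush_one(destination)

    def _flush_one(self, destination):
        keys = self.pending.pop(destination, [])
        if keys:
            self.send(destination, keys)

    def flush(self):
        # Called periodically so small batches are not delayed indefinitely.
        for destination in list(self.pending):
            self._flush_one(destination)
```

With `max_batch=3`, eight deletes to two destinations leave as four messages instead of eight.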

Slide 32

Slide 32 text

TAO: advanced topics

Slide 33

Slide 33 text

Other TAO topics
▪ TACO
▪ CCW via failover
▪ Two-level cache provides read-after-write semantics
▪ Two-level cache shields against thundering herd
