Zen and the Art of Storage Services

Raghavendra Prabhu (RVP)
Zen: Pinterest’s Graph Storage Service

“Given how robust the messenger is on day one, it’s surprising to learn that Pinterest built the entire product in three months.”
- The Verge

What does it take to do this consistently? Many different things:
• Hire the best
• Culture
• Focus
• Infrastructure that doesn’t get in the way

Challenge for Infrastructure

Our approach
• Make it part of the team mission statement
• Design systems with ‘move fast’ in mind
• Separation of concerns: feature vs reliability

Persistent Storage
Even with a distributed database, the app needs to deal with:
• Schema design
• Fault tolerance
• Capacity management
• Performance tuning

Solution 1: UserMetaStore
Storage-as-a-Service: key-value Thrift API on top of HBase
Features:
• Key partitioning to balance load
• Master-slave clusters, semi-automatic failover
• Speculative execution
• Multi-tenancy with traffic isolation
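
The key partitioning bullet can be pictured as hashing each key to pick a partition, so that neighbouring keys do not pile onto one server. This is a minimal sketch of that general idea, not UserMetaStore's actual scheme; `NUM_PARTITIONS`, `partition_for` and `partitioned_key` are hypothetical names.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical partition count, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partitioned_key(key: str) -> str:
    """Prefix the key with its partition so sequential keys spread across ranges."""
    return f"{partition_for(key):02d}-{key}"
```

Because the prefix depends only on the key, a reader can recompute it and go straight to the right partition.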

Storage-as-a-service is a great step forward, but can we do better?

Example: Messages Data Model
[Diagram: User nodes linked to a Conversation by “Participates”; the Conversation linked to Message 1, Message 2, ..., Message N by “Contains”]

Realization
• These object models closely resemble a graph
• Objects are nodes, edges represent relationships
• Typical needs:
  • retrieve data for a node or edge
  • get all outgoing edges from a node
  • get all incoming edges from a node
  • count incoming or outgoing edges for a node

Enter Zen!
• Provides a graph data model instead of key-value
• Automatically creates necessary indexes
• Materializes counts for efficient querying
• Implemented on top of HBase, but can plug in other backends

What Zen is NOT
• NOT a full-fledged graph database
• NO advanced graph operations
• Basically an object-relationship data model on top of existing databases to simplify app development

Zen API
Nodes:
• addNode, removeNode, getNode
• Node id: globally unique 64-bit integer
Example node: ID 123; Prop 1 = Val 1; Prop 2 = Val 2

Zen API
Edges:
• addEdge, removeEdge, getEdge
• Edge Ref: (edgeType, fromId, toId)
• Score for ordering
Example edge: Edge Ref (120, 123, 4567); Prop 1 = Val 1; Prop 2 = Val 2

Zen API
Edge Queries:
• getEdges, countEdges, removeEdges

    struct EdgeQuery {
      1: required NodeId nodeId;
      2: required EdgeDirection direction;
      3: optional TypeId edgeType;
    }
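
To make the node, edge and edge-query calls concrete, here is a small in-memory sketch. Only the operation names and the `EdgeQuery` fields come from the slides; the Python signatures, the `InMemoryZen` class and the ascending score order are assumptions made for illustration, and nothing here reflects how the real service is implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class EdgeDirection(Enum):
    OUT = "out"
    IN = "in"


EdgeRef = Tuple[int, int, int]  # (edgeType, fromId, toId), as on the Edges slide


@dataclass
class EdgeQuery:
    node_id: int
    direction: EdgeDirection
    edge_type: Optional[int] = None  # None matches every edge type


@dataclass
class InMemoryZen:
    """Hypothetical stand-in for the service; everything lives in dicts."""
    nodes: Dict[int, dict] = field(default_factory=dict)
    edges: Dict[EdgeRef, Tuple[int, dict]] = field(default_factory=dict)  # ref -> (score, props)

    def add_node(self, node_id: int, props: dict) -> None:
        self.nodes[node_id] = dict(props)

    def get_node(self, node_id: int) -> Optional[dict]:
        return self.nodes.get(node_id)

    def add_edge(self, ref: EdgeRef, score: int, props: Optional[dict] = None) -> None:
        self.edges[ref] = (score, dict(props or {}))

    def get_edges(self, q: EdgeQuery) -> List[EdgeRef]:
        """Return matching edge refs ordered by ascending score."""
        def matches(ref: EdgeRef) -> bool:
            edge_type, from_id, to_id = ref
            anchor = from_id if q.direction is EdgeDirection.OUT else to_id
            return anchor == q.node_id and q.edge_type in (None, edge_type)

        hits = [ref for ref in self.edges if matches(ref)]
        return sorted(hits, key=lambda ref: self.edges[ref][0])

    def count_edges(self, q: EdgeQuery) -> int:
        return len(self.get_edges(q))
```

This version scans every edge on each query, which is exactly the cost the service avoids by keeping edge indexes and materialized counts.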

Zen API
Property Indexes
• Unique index
  • Ensures a property value is unique across all nodes of a type
• Non-unique index
  • Allows retrieval by property value
• Works for both nodes and edges
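
The difference between the two index kinds can be sketched in a few lines. This is an illustrative in-memory model, not the service's code; `PropertyIndexes`, `UniqueViolation` and the method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set, Tuple

IndexKey = Tuple[int, str, Hashable]  # (type id, property name, property value)


class UniqueViolation(Exception):
    """Raised when a unique property value is already owned by another object."""


class PropertyIndexes:
    def __init__(self) -> None:
        self.unique: Dict[IndexKey, int] = {}
        self.non_unique: Dict[IndexKey, Set[int]] = defaultdict(set)

    def put_unique(self, type_id: int, prop: str, value: Hashable, obj_id: int) -> None:
        # setdefault claims the value only if nobody owns it yet.
        owner = self.unique.setdefault((type_id, prop, value), obj_id)
        if owner != obj_id:
            raise UniqueViolation(f"{prop}={value!r} already belongs to {owner}")

    def put_non_unique(self, type_id: int, prop: str, value: Hashable, obj_id: int) -> None:
        self.non_unique[(type_id, prop, value)].add(obj_id)

    def find_unique(self, type_id: int, prop: str, value: Hashable) -> Optional[int]:
        return self.unique.get((type_id, prop, value))

    def find_non_unique(self, type_id: int, prop: str, value: Hashable) -> List[int]:
        return sorted(self.non_unique.get((type_id, prop, value), ()))
```

A unique index maps a value to at most one id and rejects a second owner; a non-unique index maps a value to every id that carries it.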

Zen API
Type System
• Declare node and edge types
• Specify type schema, e.g. unique and non-unique index properties
• Fully online: no deploy, no config
• Internally implemented on top of Zen itself!

Illustration: Messages on Zen
[Diagram: three nodes (Id:1234, Id:2345, Id:3456) connected by edges of Type: Participates and Type: Contains]
Nodes shown:
• Type: Conversation; Started: 12 Aug 2014 08:00; Header: “Great pin!”; Pin Id: 10001 [non-unique]
• Type: User; Name: “Ben Smith” [unique]; Status: Active
• Type: Message; Sent: 12 Aug 2014 08:00; Text: “Great pin!”

Zen: Current Usage
Products:
• smart feed, messages, network news, interest graph and other upcoming features
Numbers:
• ~10 clusters
• 100,000+ requests per second at peak
• Over 5 million HBase operations per second

Xun Liu
Internals and Production Learnings

Zen Backends
• HBase backend implemented in fall 2013
• Currently working on MySQL backend
• Other potential backends in future

HBase Data Model Overview
Data columns: col1, col2, col3, col4
row-key-1: val1, val2
row-key-2: val3, val4
row-key-3: val5

Zen - Property Data
Row key                  type   name        score   distance
12345 (node)             10     Ben Smith
12345-20-67890 (edge)                       1000    1 mile

Zen - Property Index Data
Row key                                    ID
<hash>-unique-10-name=ben smith            12345
<hash>-nonuniq-10-lastname=smith-12345
<hash>-nonuniq-10-lastname=smith-67890

Zen - Edge Score Index Data
Row keys:
12345-out-20-1000-67890
12345-out-20-1001-67891
12345-in-30-990-67892
12345-in-30-991-67893

Zen - Edge Count Data
Row key         Count
12345-out-20    2
12345-in-30     4
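
The edge score index and edge count layouts suggest how edge queries are answered: keys that begin with node id, direction and edge type sort together, so listing edges is a range scan, and counting is a single lookup. The sketch below models that with a sorted list of tuples. `EdgeIndex` and its methods are hypothetical names, and tuples are used so that scores sort numerically; a byte-ordered store would need a fixed-width score encoding, which the slides do not describe.

```python
import bisect
from collections import Counter
from typing import List, Tuple


class EdgeIndex:
    def __init__(self) -> None:
        # Sorted keys of the form (node_id, direction, edge_type, score, other_id).
        self.keys: List[Tuple[int, str, int, int, int]] = []
        # (node_id, direction, edge_type) -> materialized edge count.
        self.counts: Counter = Counter()

    def add_edge(self, edge_type: int, from_id: int, to_id: int, score: int) -> None:
        """Write one index entry and one count bump per direction."""
        for key in ((from_id, "out", edge_type, score, to_id),
                    (to_id, "in", edge_type, score, from_id)):
            bisect.insort(self.keys, key)
            self.counts[key[:3]] += 1

    def scan(self, node_id: int, direction: str, edge_type: int) -> List[Tuple[int, int]]:
        """Range scan: return (score, other_id) pairs in ascending score order."""
        prefix = (node_id, direction, edge_type)
        start = bisect.bisect_left(self.keys, prefix)
        result = []
        for key in self.keys[start:]:
            if key[:3] != prefix:
                break
            result.append((key[3], key[4]))
        return result

    def count(self, node_id: int, direction: str, edge_type: int) -> int:
        """Read the materialized count instead of scanning."""
        return self.counts[(node_id, direction, edge_type)]
```

The sketch does not guard against adding the same edge twice, which would inflate the count.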

New Features
Status - Soft Delete

New Features
Built-in Cache
[Diagram: before/after placement of the cache relative to Zen and the HBase client]

New Features
Namespace
[Diagram: Namespace 1 and Namespace 2, each with its own Node and Edge Index data]

New Features
• Online type schema change
• Optional reverse edge
• Optional edge count
• Retrieval of subset of properties
• Descending edge score

Performance Work
A demanding workload needs special tuning
• Inserting 1 million edges per second
• Excessive HLog (WAL) flushes

Performance Work
Batching
• Client-side batching: bulk edge insertion
• Zen server-side batching: buffer edits across clients and flush together
• Reduced HLog (WAL) flushes by orders of magnitude
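
Server-side batching can be sketched as a buffer that collects edits from many callers and hands them to the store in one call. This is a simplified, single-threaded illustration; `BatchingWriter`, `flush_fn` and `max_batch` are hypothetical names, and a real server would also need locking and a time-based flush.

```python
from typing import Callable, List


class BatchingWriter:
    def __init__(self, flush_fn: Callable[[List[str]], None], max_batch: int = 100) -> None:
        self.flush_fn = flush_fn      # called once per batch, e.g. one bulk write
        self.max_batch = max_batch
        self.buffer: List[str] = []
        self.flushes = 0              # number of batches handed to flush_fn

    def write(self, edit: str) -> None:
        """Buffer an edit; flush when the buffer reaches max_batch."""
        self.buffer.append(edit)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as a single batch."""
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        self.flush_fn(batch)
        self.flushes += 1
```

With `max_batch=100`, a thousand edits reach the store in ten calls instead of a thousand, which is the kind of reduction in flushes the slide refers to.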

Performance Work
Memory vs. Performance
• Bloom filter
  • reduces disk seeks
  • memory cost: 1 byte per row
• Block size
  • the smaller the block size, the better the random-access performance
  • memory cost: bigger index size

Performance Work
CPU vs. Data Size
• Encoding
  • FAST_DIFF: effective in reducing data size, CPU intensive
  • PREFIX: less effective in size reduction, less CPU intensive
• Compression
  • SNAPPY, LZO, GZ, etc.
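
The idea behind prefix-style key encoding is that sorted row keys share long prefixes, so each key can be stored as the length of the prefix it shares with the previous key plus the remaining suffix. The sketch below illustrates only that idea; it is not HBase's on-disk format, and `prefix_encode` and `prefix_decode` are hypothetical names.

```python
import os
from typing import List, Tuple


def prefix_encode(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Encode each key as (shared prefix length with previous key, suffix)."""
    encoded = []
    prev = ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys = []
    prev = ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

Applied to the two outgoing edge score keys shown earlier, the second key shrinks from 23 characters to a 7-character suffix, at the price of extra work to rebuild keys on read.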

Performance Work
Capability to tune the storage engine per special load
Zen production setup
• Dedicated Zen cluster
• Namespace in shared Zen cluster

Data Consistency
Add an edge
1. CAS create the edge row and properties
2. CAS create the unique index if any
3. Create non-unique index if any
4. Create edge score index for outgoing direction
5. Create edge score index for incoming direction
6. Increment edge count for outgoing direction
7. Increment edge count for incoming direction

Distributed transaction or not?

Data Consistency
Stay on top of data inconsistencies
• Manual rollback in Zen server
• Offline jobs (Dr Zen) to scan and fix inconsistencies
• Tools to debug and fix one-off inconsistency
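
Manual rollback for a multi-step write such as adding an edge can be pictured as pairing every step with an undo action and, on failure, undoing the completed steps in reverse order. This is a generic compensation sketch under that assumption, not the Zen server's code; `Step` and `run_with_rollback` are hypothetical names.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, do, undo)


def run_with_rollback(steps: List[Step]) -> None:
    """Run steps in order; if one fails, undo the finished ones in reverse and re-raise."""
    done: List[Step] = []
    try:
        for step in steps:
            _, do, _ = step
            do()
            done.append(step)
    except Exception:
        for _, _, undo in reversed(done):
            undo()
        raise
```

An undo can itself fail or the server can crash midway, which leaves partial writes behind; that is the kind of gap an offline scan-and-fix job is suited to close.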

Future Work
• Dr Zen (make it more efficient)
• Other backends: MySQL, etc.
• Distributed transactions
• Open source!
