
Zen and the Art of Storage Services

Hakka Labs
January 16, 2015

Transcript

  1. “Given how robust the messenger is on day one, it’s surprising to learn that Pinterest built the entire product in three months.” (The Verge)
  2. What does it take to do this consistently?
    Many different things:
    • Hire the best
    • Culture
    • Focus
    Infrastructure that doesn’t get in the way
  3. Our approach
    • Make it part of the team mission statement
    • Design systems with ‘move fast’ in mind
    • Separation of concerns: feature vs. reliability
  4. Persistent Storage
    Even with a distributed database, the app needs to deal with:
    • Schema design
    • Fault tolerance
    • Capacity management
    • Performance tuning
  5. Solution 1: UserMetaStore
    Storage-as-a-Service: key-value Thrift API on top of HBase
    Features:
    • Key partitioning to balance load
    • Master-slave clusters, semi-automatic failover
    • Speculative execution
    • Multi-tenancy with traffic isolation
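
The slide does not say how UserMetaStore partitions keys. As a minimal sketch of one common approach, the following prefixes each key with a bucket derived from a stable hash, so that lexicographically adjacent keys spread across regions. The bucket count, the key format, and both function names are assumptions made for illustration, not details from the talk.

```python
import hashlib

NUM_PARTITIONS = 16  # hypothetical bucket count, not from the talk


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a bucket with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partitioned_row_key(key: str) -> str:
    """Prefix the key with its bucket so sequential keys do not hit one region."""
    return f"{partition_for(key):02d}:{key}"
```

Reads must recompute the same prefix, which is why the hash has to be deterministic across processes and restarts.
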
  6. Realization
    • These object models closely resemble a graph
    • Objects are nodes, edges represent relationships
    • Typical needs:
      • retrieve data for a node or edge
      • get all outgoing edges from a node
      • get all incoming edges from a node
      • count incoming or outgoing edges for a node
  7. Enter Zen!
    • Provides a graph data model instead of key-value
    • Automatically creates necessary indexes
    • Materializes counts for efficient querying
    • Implemented on top of HBase, but can plug in other backends
  8. What Zen is NOT
    • NOT a full-fledged graph database
    • NO advanced graph operations
    • Basically an object-relationship data model on top of existing databases to simplify app development
  9. Zen API
    Nodes:
    • addNode, removeNode, getNode
    • Node id: globally unique 64-bit integer
    Example node: ID 123, with properties Prop 1 = Val 1 and Prop 2 = Val 2
  10. Zen API
    Edges:
    • addEdge, removeEdge, getEdge
    • Edge Ref: (edgeType, fromId, toId)
    • Score for ordering
    Example edge: Edge Ref (120, 123, 4567), with properties Prop 1 = Val 1 and Prop 2 = Val 2
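
To make the node and edge calls concrete, here is a minimal in-memory sketch. The method names mirror the slides (addNode, getNode, addEdge, and so on) in Python naming; the `InMemoryZen` class, its signatures, and the dict-based storage are assumptions for illustration and say nothing about how the real Thrift service is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

EdgeRef = Tuple[int, int, int]  # (edgeType, fromId, toId), as on the slide


@dataclass
class Node:
    node_id: int  # globally unique 64-bit integer
    props: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    ref: EdgeRef
    score: int = 0  # used for ordering
    props: Dict[str, str] = field(default_factory=dict)


class InMemoryZen:
    """Hypothetical stand-in for the Zen service, backed by plain dicts."""

    def __init__(self) -> None:
        self._nodes: Dict[int, Node] = {}
        self._edges: Dict[EdgeRef, Edge] = {}

    def add_node(self, node_id: int, props: Optional[Dict[str, str]] = None) -> Node:
        if not 0 <= node_id < 2**64:
            raise ValueError("node id must fit in 64 bits")
        node = Node(node_id, dict(props or {}))
        self._nodes[node_id] = node
        return node

    def get_node(self, node_id: int) -> Optional[Node]:
        return self._nodes.get(node_id)

    def remove_node(self, node_id: int) -> bool:
        return self._nodes.pop(node_id, None) is not None

    def add_edge(self, edge_type: int, from_id: int, to_id: int, score: int = 0,
                 props: Optional[Dict[str, str]] = None) -> Edge:
        edge = Edge((edge_type, from_id, to_id), score, dict(props or {}))
        self._edges[edge.ref] = edge
        return edge

    def get_edge(self, edge_type: int, from_id: int, to_id: int) -> Optional[Edge]:
        return self._edges.get((edge_type, from_id, to_id))

    def remove_edge(self, edge_type: int, from_id: int, to_id: int) -> bool:
        return self._edges.pop((edge_type, from_id, to_id), None) is not None
```

With this sketch, the example node and edge from the slides would be created with `zen.add_node(123, {"Prop 1": "Val 1"})` and `zen.add_edge(120, 123, 4567)`.
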
  11. Zen API
    Edge Queries:
    • getEdges, countEdges, removeEdges

    struct EdgeQuery {
      1: required NodeId nodeId;
      2: required EdgeDirection direction;
      3: optional TypeId edgeType;
    }
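
As a rough illustration of how an EdgeQuery could be evaluated, the sketch below filters a plain list of edges. The `EdgeQuery` fields follow the Thrift struct on the slide; the `EdgeDirection` member names, the tuple layout of an edge, and the ascending-score ordering are assumptions, since the slides name the type but not its values or the default sort order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class EdgeDirection(Enum):
    OUTGOING = 1  # assumed member names; the slide only names the type
    INCOMING = 2


@dataclass(frozen=True)
class EdgeQuery:
    node_id: int
    direction: EdgeDirection
    edge_type: Optional[int] = None  # optional, as in the Thrift struct


Edge = Tuple[int, int, int, int]  # (edge_type, from_id, to_id, score)


def get_edges(edges: List[Edge], query: EdgeQuery) -> List[Edge]:
    """Return edges touching query.node_id in the given direction, ordered by score."""
    def matches(edge: Edge) -> bool:
        edge_type, from_id, to_id, _score = edge
        anchor = from_id if query.direction is EdgeDirection.OUTGOING else to_id
        type_ok = query.edge_type is None or edge_type == query.edge_type
        return anchor == query.node_id and type_ok

    return sorted((e for e in edges if matches(e)), key=lambda e: e[3])


def count_edges(edges: List[Edge], query: EdgeQuery) -> int:
    """Count by scanning; Zen instead materializes counts so this is a cheap read."""
    return len(get_edges(edges, query))
```

The scan in `count_edges` is exactly the cost that materialized counts are meant to avoid.
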
  12. Zen API
    Property Indexes
    • Unique index
      • Ensures a property value is unique across all nodes of a type
    • Non-unique index
      • Allows retrieval by property value
    • Works for both nodes and edges
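
The difference between the two index kinds can be sketched with two dictionaries: a unique index maps a (type, property, value) triple to exactly one id and rejects a second owner, while a non-unique index maps it to a set of ids. The `PropertyIndexes` class and the exception name are hypothetical; the slides describe the behavior, not an implementation.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

IndexKey = Tuple[str, str, str]  # (type name, property name, property value)


class UniqueIndexViolation(Exception):
    """Raised when a second object claims a value held under a unique index."""


class PropertyIndexes:
    def __init__(self) -> None:
        self._unique: Dict[IndexKey, int] = {}
        self._non_unique: Dict[IndexKey, Set[int]] = defaultdict(set)

    def put_unique(self, type_name: str, prop: str, value: str, obj_id: int) -> None:
        key = (type_name, prop, value)
        owner = self._unique.setdefault(key, obj_id)  # create only if absent
        if owner != obj_id:
            raise UniqueIndexViolation(f"{key} already belongs to {owner}")

    def lookup_unique(self, type_name: str, prop: str, value: str) -> Optional[int]:
        return self._unique.get((type_name, prop, value))

    def put_non_unique(self, type_name: str, prop: str, value: str, obj_id: int) -> None:
        self._non_unique[(type_name, prop, value)].add(obj_id)

    def lookup_non_unique(self, type_name: str, prop: str, value: str) -> Set[int]:
        # .get avoids creating an empty entry for values never indexed
        return set(self._non_unique.get((type_name, prop, value), ()))
```
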
  13. Zen API
    Type System
    • Declare node and edge types
    • Specify type schema, e.g. unique and non-unique index properties
    • Fully online: no deploy, no config
    • Internally implemented on top of Zen itself!
  14. Illustration: Messages on Zen
    Nodes (Ids 1234, 2345 and 3456 in the diagram):
    • Type: Conversation; Started: 12 Aug 2014 08:00; Header: “Great pin!”; Pin Id: 10001 [non-unique]
    • Type: User; Name: “Ben Smith” [unique]; Status: Active
    • Type: Message; Sent: 12 Aug 2014 08:00; Text: “Great pin!”
    Edge types: Participates, Contains
  15. Zen: Current Usage
    Products:
    • smart feed, messages, network news, interest graph and other upcoming features
    Numbers:
    • ~10 clusters
    • 100,000+ requests per second at peak
    • Over 5 million HBase operations per second
  16. Zen Backends
    • HBase backend implemented in fall 2013
    • Currently working on a MySQL backend
    • Other potential backends in the future
  17. HBase Data Model Overview
    Columns: col1, col2, col3, col4
    • row-key-1: val1, val2
    • row-key-2: val3, val4
    • row-key-3: val5
    (Each row populates only some of the columns.)
  18. Zen - Property Data

    Row key               | type | name      | score | distance
    12345 (node)          | 10   | Ben Smith |       |
    12345-20-67890 (edge) |      |           | 1000  | 1 mile
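
The two example row keys suggest that node rows are keyed by node id and edge rows by three ids joined with hyphens. The sketch below assumes the edge key order is fromId, edgeType, toId, which is a guess from the single example "12345-20-67890"; real HBase row keys are byte arrays, and the actual encoding is not given in the slides. The function names are hypothetical.

```python
from typing import Tuple, Union

ParsedKey = Union[Tuple[str, int], Tuple[str, int, int, int]]


def node_row_key(node_id: int) -> str:
    """Row key for a node's property row, e.g. '12345'."""
    return str(node_id)


def edge_row_key(from_id: int, edge_type: int, to_id: int) -> str:
    """Row key for an edge's property row (assumed order: from, type, to)."""
    return f"{from_id}-{edge_type}-{to_id}"


def parse_row_key(key: str) -> ParsedKey:
    """Tell node rows from edge rows by the number of hyphen-separated parts."""
    parts = key.split("-")
    if len(parts) == 1:
        return ("node", int(parts[0]))
    if len(parts) == 3:
        from_id, edge_type, to_id = (int(p) for p in parts)
        return ("edge", from_id, edge_type, to_id)
    raise ValueError(f"unrecognized row key: {key!r}")
```

Starting an edge key with the source node id keeps a node's outgoing edges adjacent in HBase's sorted key space, which is one plausible reason for such a layout.
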
  19. New Features
    • Online type schema change
    • Optional reverse edge
    • Optional edge count
    • Retrieval of subset of properties
    • Descending edge score
  20. Performance Work
    Demanding workloads need special tuning
    • Inserting 1 million edges per second
    • Excessive HLog (WAL) flushes
  21. Performance Work
    Batching
    • Client-side batching: bulk edge insertion
    • Zen server-side batching: buffer edits across clients and flush together
    • Reduced HLog (WAL) flushes by orders of magnitude
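
Server-side batching can be pictured as a shared buffer that many client requests append to and that is written out in one call. This is a minimal sketch under that reading of the slide: `BatchingWriter`, `flush_fn`, and the size-based trigger are hypothetical, and a production version would presumably also flush on a timer so that edits do not wait indefinitely.

```python
import threading
from typing import Any, Callable, List


class BatchingWriter:
    """Buffers edits from any number of callers and flushes them together."""

    def __init__(self, flush_fn: Callable[[List[Any]], None], max_batch: int = 100) -> None:
        self._flush_fn = flush_fn    # e.g. one bulk write to the backing store
        self._max_batch = max_batch  # hypothetical size-based flush trigger
        self._buffer: List[Any] = []
        self._lock = threading.Lock()

    def write(self, edit: Any) -> None:
        with self._lock:
            self._buffer.append(edit)
            if len(self._buffer) < self._max_batch:
                return
            # Swap the buffer out under the lock, flush outside it.
            batch, self._buffer = self._buffer, []
        self._flush_fn(batch)

    def flush(self) -> None:
        """Write out whatever is buffered, e.g. on shutdown."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._flush_fn(batch)
```

Seven writes with `max_batch=3` result in two automatic flushes plus one explicit flush, so three calls to the store instead of seven.
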
  22. Performance Work
    Memory vs. Performance
    • Bloom filter
      • reduces disk seeks
      • memory cost: 1 byte per row
    • Block size
      • the smaller the block size, the better the random-access performance
      • memory cost: bigger index size
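
A Bloom filter saves disk seeks because a negative answer is definitive: if the filter says a row key is absent, the file does not need to be read at all. The sketch below is a generic textbook Bloom filter, not HBase's implementation; the bit count, hash count, and SHA-256-based hashing are arbitrary choices for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set-membership sketch: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent, so the disk read can be skipped."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

The memory cost on the slide corresponds to the size of `bits` relative to the number of keys added; more bits per key lowers the false-positive rate.
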
  23. Performance Work
    CPU vs. Data Size
    • Encoding
      • FAST_DIFF: effective in reducing data size, CPU intensive
      • PREFIX: less effective in size reduction, less CPU intensive
    • Compression
      • SNAPPY, LZO, GZ, etc.
  24. Performance Work
    Capability to tune the storage engine for each special workload
    Zen production setup
    • Dedicated Zen cluster
    • Namespace in shared Zen cluster
  25. Data Consistency
    Add an edge:
    1. CAS (compare-and-swap) create the edge row and properties
    2. CAS create the unique index if any
    3. Create non-unique index if any
    4. Create edge score index for outgoing direction
    5. Create edge score index for incoming direction
    6. Increment edge count for outgoing direction
    7. Increment edge count for incoming direction
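
The seven steps can be walked through against a toy key-value store. In this sketch `KV.cas_create` stands in for a check-and-put that succeeds only when the row is absent; the key formats, the `KV` class, and the rollback of step 1 when step 2 fails are assumptions for illustration. Without cross-row transactions, a crash between steps would leave partial state behind, which is the inconsistency the talk goes on to address.

```python
from typing import Any, Dict, Optional


class KV:
    """Toy single-process store; cas_create mimics an atomic create-if-absent."""

    def __init__(self) -> None:
        self.rows: Dict[str, Any] = {}

    def cas_create(self, key: str, value: Any) -> bool:
        if key in self.rows:
            return False
        self.rows[key] = value
        return True

    def put(self, key: str, value: Any) -> None:
        self.rows[key] = value

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)

    def increment(self, key: str, delta: int = 1) -> int:
        self.rows[key] = int(self.rows.get(key, 0)) + delta
        return self.rows[key]


def add_edge(kv: KV, edge_type: int, from_id: int, to_id: int, score: int,
             unique_key: Optional[str] = None,
             non_unique_key: Optional[str] = None) -> bool:
    edge_key = f"edge:{from_id}-{edge_type}-{to_id}"
    # 1. CAS create the edge row and properties
    if not kv.cas_create(edge_key, {"score": score}):
        return False  # edge already exists
    # 2. CAS create the unique index, undoing step 1 if the value is taken
    if unique_key is not None and not kv.cas_create(f"uidx:{unique_key}", edge_key):
        kv.delete(edge_key)
        return False
    # 3. Create the non-unique index
    if non_unique_key is not None:
        kv.put(f"idx:{non_unique_key}:{edge_key}", edge_key)
    # 4-5. Edge score indexes for the outgoing and incoming directions
    kv.put(f"out:{from_id}:{edge_type}:{score}:{to_id}", edge_key)
    kv.put(f"in:{to_id}:{edge_type}:{score}:{from_id}", edge_key)
    # 6-7. Materialized edge counts for both directions
    kv.increment(f"count:out:{from_id}:{edge_type}")
    kv.increment(f"count:in:{to_id}:{edge_type}")
    return True
```
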
  26. Data Consistency
    Stay on top of data inconsistencies
    • Manual rollback in Zen server
    • Offline jobs (Dr Zen) to scan and fix inconsistencies
    • Tools to debug and fix one-off inconsistencies
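
One kind of check an offline job like Dr Zen could run is to recompute edge counts from the edges themselves and compare them with the materialized counts. The slides do not describe what Dr Zen actually checks, so this is only a sketch of that single idea; the function names and the in-memory data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

EdgeRef = Tuple[int, int, int]   # (edge_type, from_id, to_id)
CountKey = Tuple[str, int, int]  # (direction, node_id, edge_type)


def expected_counts(edges: Iterable[EdgeRef]) -> Counter:
    """Recompute per-node, per-type counts from the edges that actually exist."""
    counts: Counter = Counter()
    for edge_type, from_id, to_id in edges:
        counts[("out", from_id, edge_type)] += 1
        counts[("in", to_id, edge_type)] += 1
    return counts


def find_count_mismatches(edges: Iterable[EdgeRef],
                          stored: Dict[CountKey, int]) -> Dict[CountKey, Tuple[int, int]]:
    """Return {key: (stored_value, expected_value)} for every disagreeing count."""
    expected = expected_counts(edges)
    mismatches = {}
    for key in set(expected) | set(stored):
        want, have = expected.get(key, 0), stored.get(key, 0)
        if want != have:
            mismatches[key] = (have, want)
    return mismatches


def repair(stored: Dict[CountKey, int],
           mismatches: Dict[CountKey, Tuple[int, int]]) -> None:
    """Overwrite stored counts with recomputed ones; drop counts that should be zero."""
    for key, (_have, want) in mismatches.items():
        if want:
            stored[key] = want
        else:
            stored.pop(key, None)
```
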
  27. Future Work
    • Dr Zen (make it more efficient)
    • Other backends: MySQL, etc.
    • Distributed transactions
    • Open source!