Polyglot Persistence: Riak + PostgreSQL

Polyglot Persistence: Riak + PostgreSQL • NYC PostgreSQL User Group • September 20, 2012

$ whoami • Tom Santero • @tsantero • [email protected]

My History with Tech • 1995: First Computer • 1997: I learned HTML • 1998: JavaScript • 1999: Linux, Bash, C • 2001: PHP + MySQL on Apache • 2009+: Automated FX trading

Down the path to Polyglot Persistence

My Ticker Plant Objectives: 1. Capture and Store Forex historical
tick data 2. Use this data for backtesting trading strategies 3. Store results of backtesting

My Ticker Plant Solution #1: MySQL • EUR/USD: _id INTEGER NOT NULL, bid VARCHAR(8) NOT NULL, ask VARCHAR(8) NOT NULL, quote VARCHAR(8) NOT NULL, timestamp DATETIME NOT NULL • 1. Modeled data 2. Went live 3. Lasted about 1 week

commence intensive googling

What I Quickly Discovered • I bit off more than I could chew • this was a widely discussed issue • relational databases are ill suited for analysis of time-series data • while I wasn’t paying attention, a renaissance in data storage tech emerged

In the meantime... Solution #2: flat files (directory tree: tick_data > currency pair (EUR_USD, EUR_JPY, GBP_USD, GBP_EUR, GBP_JPY) > year (2010, 2009, 2008, 2007, 2006) > month (12, 11, 10, 09, 08) > day (30, 29, 28, 27) > hourly file (2300.csv, 2200.csv, 2100.csv))
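
A minimal sketch of how a tick's timestamp could map onto the flat-file tree above. It assumes one CSV per hour, named after the hour it starts, with zero-padded month and day directories; the function name `tick_file` is hypothetical and not from the talk.

```python
from datetime import datetime
from pathlib import PurePosixPath

ROOT = PurePosixPath("tick_data")  # top-level directory shown on the slide


def tick_file(pair: str, ts: datetime) -> PurePosixPath:
    """Return the hourly CSV path for a tick (assumed layout: pair/year/month/day/HH00.csv)."""
    return ROOT / pair / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{ts:%H}00.csv"


print(tick_file("EUR_USD", datetime(2010, 12, 30, 23, 15)))
# tick_data/EUR_USD/2010/12/30/2300.csv
```

With this layout a backtest over one pair and one day only has to open the files in a single directory.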

The Data Storage Landscape • RavenDB • MongoDB • MySQL • OrientDB • Redis • Cassandra • Project Voldemort • Oracle • CouchDB • HBase • Neo4j • Riak • SQL Server • Memcached • BerkeleyDB • ad infinitum...

The Problem Is Choice

When Choosing a DB • What am I trying to
accomplish? • What might I want to change in the future? • Which is the best tool for the job? • What does XYZ data store do well and why?

RDBMS: PostgreSQL, Oracle, MySQL, SQL Server • Key/Value: Riak, Voldemort, Redis, Cabinet • Document: CouchDB, MongoDB, RavenDB • Column: Cassandra, HBase • Graph: Neo4j, OrientDB

USE ALL THE DATABASES!

Polyglot Persistence • Big tool bag with lots of shiny
new tools • Not every tool can do every job • You have to do a lot of research • You have to do a lot of prototyping • This may ruin your marriage

Riak Overview • distributed, key/value store • masterless, peer to
peer replication • written primarily in Erlang • open source (Apache 2.0) http://github.com/basho/riak

History of Riak • Internal project at Basho in 2007 • custom datastore for Basho’s SaaS • Basho pivoted and open sourced Riak • September 2011 Riak turned 1.0 • Currently run in production by 1000s • Basho sells commercial extensions to Riak

Design Goals • High-Availability • Low-Latency • Horizontal Scalability •
Fault-Tolerance • Ops Friendly • Predictability

A Closer Look

Data Model • store values against keys • keys are grouped into namespaces called buckets • basic operations: GET, PUT, DELETE • content-agnostic • accepts any datatype (JSON, XML, JPEG...) • Riak objects are stored on disk as binaries

Interfaces + Libraries • HTTP & Protocol Buffers • Erlang, Ruby, Java, Python, PHP, OCaml, C, Squeak, Haskell, Lisp, Go, .NET, etc.

Riak is Distributed (diagram: five peer nodes)

Dealing with the Network • replicas + consistent hashing • virtual nodes • request quorums • handoff and gossip protocols

(diagram: a client request reaches a coordinating node, which forwards it to N = 3 replicas: R1, R2, R3)

Quorum Requests • N - replication factor • R -
read quorum • W - write quorum • PR/PW - primary read/write • DR/DW - durable read/write
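
One way to reason about N, R and W is the usual overlap rule: if R + W > N, every read quorum shares at least one replica with every write quorum. The sketch below only checks that arithmetic. It assumes strict quorums and ignores fallbacks and the PR/PW and DR/DW variants; the function name is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any r replicas must share at least one member with any w replicas out of n."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

Lower R and W trade that overlap for latency and availability, which is the knob the quorum parameters expose per request.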

Consistent Hashing • 160-bit integer keyspace • divided into a fixed number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key (diagram: ring from 0 through 2^160/4 and 2^160/2, split into 32 partitions claimed by node 0, node 1, node 2, node 3; hash(“meetups/NYC-postgres”) with N=3)
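
The ring lookup can be sketched in a few lines. This is an illustration under stated assumptions, not Riak's code: the key is hashed with SHA-1 over the text "bucket/key" (Riak encodes the bucket/key pair differently), the four nodes claim the 32 partitions round-robin, and the names `key_hash` and `preference_list` are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 160                      # SHA-1 yields 160-bit integers
NUM_PARTITIONS = 32                       # as on the slide
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS
NODES = ["node 0", "node 1", "node 2", "node 3"]


def key_hash(bucket: str, key: str) -> int:
    # Assumption: hash the UTF-8 text "bucket/key".
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(bucket: str, key: str, n: int = 3) -> list:
    """Return the n (partition, owner) pairs that follow the key's position on the ring."""
    first = key_hash(bucket, key) // PARTITION_SIZE + 1
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    # Assumption: partition p is claimed by node p mod 4 (round-robin claim).
    return [(p, NODES[p % len(NODES)]) for p in partitions]


print(preference_list("meetups", "NYC-postgres"))
```

Because consecutive partitions belong to different nodes under a round-robin claim, the three replicas land on three distinct machines.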

Anatomy of a Request • client sends get(“meetups/NYC-postgres”) to Riak • a Get Handler (FSM) starts on the coordinating node • hash(“meetups/NYC-postgres”) == 10, 11, 12 • the coordinating node sends the get to partitions 10, 11, 12 on The Ring • with R=2, the replies are v1 and v2 • the replicas then hold v2 and v2 • get(“meetups/NYC-postgres”) returns v2
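
A rough sketch of the read path above, assuming the v1/v2 to v2/v2 step means the stale replica is repaired with the newer value. Versions are plain counters here for brevity, whereas Riak compares vector clocks; `Replica` and `quorum_get` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    partition: int
    version: int   # assumption: a simple counter instead of a vector clock
    value: str


def quorum_get(replicas, r):
    """Take the first r replies, return the newest value, and repair older copies."""
    replies = replicas[:r]                 # stand-in for "the first r replicas to answer"
    if len(replies) < r:
        raise RuntimeError("read quorum not met")
    newest = max(replies, key=lambda rep: rep.version)
    for rep in replicas:
        if rep.version < newest.version:   # overwrite any copy older than the winner
            rep.version, rep.value = newest.version, newest.value
    return newest.value


ring = [Replica(10, 1, "v1"), Replica(11, 2, "v2"), Replica(12, 2, "v2")]
print(quorum_get(ring, r=2))  # v2
```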

Partition Handoff • Ownership Handoff • administrative: adding/removing nodes • dynamic rebalancing of data • Hinted Handoff • failure scenarios: fallbacks

Disaster Scenario • node fails • requests for hash(“meetups/NYC-postgres”) go to fallback • node comes back • hinted handoff - data returns to recovered node • normal operations resume
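
The failure sequence can be mimicked with two in-memory nodes. This is a toy model under assumptions: a single primary and a single fallback, no replication factor, and hypothetical names (`Node`, `put`, `handoff`); it only shows the idea of a fallback holding data on behalf of a failed owner.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value this node owns
        self.hints = {}   # owner name -> {key: value} held on that owner's behalf


def put(key, value, primary, fallback):
    """Write to the primary, or park the write on the fallback while the primary is down."""
    if primary.up:
        primary.data[key] = value
    else:
        fallback.hints.setdefault(primary.name, {})[key] = value


def handoff(fallback, recovered):
    """Move hinted data back to the recovered node and drop it from the fallback."""
    recovered.data.update(fallback.hints.pop(recovered.name, {}))


a, b = Node("a"), Node("b")
a.up = False                                        # node fails
put("meetups/NYC-postgres", "v1", primary=a, fallback=b)
a.up = True                                         # node comes back
handoff(b, a)
print(a.data)  # {'meetups/NYC-postgres': 'v1'}
```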

Persistence + Queries

Append-Only Storage • pluggable backend architecture • Bitcask • LevelDB • Memory • all writes are appends to file • tradeoff: periodic, background compaction
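
To make the append/compaction tradeoff concrete, here is a toy append-only store. It is not the on-disk format of Bitcask or LevelDB: it assumes a single log held in a `BytesIO` buffer and an in-memory index from key to the latest offset, and the class name is hypothetical.

```python
import io


class AppendOnlyStore:
    """Toy append-only log with an in-memory index of the latest record per key."""

    def __init__(self):
        self.log = io.BytesIO()   # stand-in for a file opened in append mode
        self.index = {}           # key -> (offset, length) of the newest value

    def put(self, key: str, value: bytes) -> None:
        offset = self.log.seek(0, io.SEEK_END)   # writes always go to the end
        self.log.write(value)
        self.index[key] = (offset, len(value))   # older records become garbage

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)

    def compact(self) -> None:
        """Rewrite the log so it keeps only the live value of each key."""
        live = {key: self.get(key) for key in self.index}
        self.log, self.index = io.BytesIO(), {}
        for key, value in live.items():
            self.put(key, value)


store = AppendOnlyStore()
store.put("k", b"foo")
store.put("k", b"bar")
print(len(store.log.getvalue()), store.get("k"))  # 6 b'bar'
store.compact()
print(len(store.log.getvalue()), store.get("k"))  # 3 b'bar'
```

Overwrites never touch old bytes, so the log grows until compaction reclaims the dead records.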

Other Ways To Query • Map/Reduce • 2 phase distributed
computation • Riak Search • distributed, full-text search • Secondary Indexes (2i) • “tag” objects

Eventual Consistency

Issues with EC • Simultaneous Writes • client a: some_key
= ‘foo’ • client b: some_key = ‘bar’ • Simultaneous Reads + Writes • which value will the client read? • Semantic Resolution
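
Semantic resolution means the application decides how conflicting values combine. As one possible strategy, and purely as an assumption for illustration, the sketch below merges conflicting shopping-cart values by set union; `resolve_cart` is a hypothetical helper, and other data types would need other rules.

```python
def resolve_cart(siblings):
    """Merge conflicting cart values by taking the union of their items."""
    merged = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)   # sorted only to make the result deterministic


print(resolve_cart([["tv", "hdmi cable"], ["tv", "wall mount"]]))
# ['hdmi cable', 'tv', 'wall mount']
```

A union never loses an added item, but it can resurrect a removed one, which is the kind of tradeoff a resolution rule has to accept knowingly.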

When To Use Riak • When you have more data than reasonable for 1 physical machine • HA requirements • availability more important than consistency • latency requirements • data easily fits key/value model

Keys with Low Churn • profile data • session storage • logs • media • metadata storage

Riak + PostgreSQL

Dead Simple Example • e-commerce website on PostgreSQL • session data • inventory • shopping cart • millions of concurrent users

Requirements? • High-Availability / Low-Latency: session data, inventory • Strict Consistency: order processing, inventory control

Redesign • Riak: User Profiles, Inventory, Shopping Cart, Session Storage • PostgreSQL: Order History, Inventory Management, Order Processing

Session Storage • Enable “multi_backend” on Riak • store session data in bitcask • “sessions” bucket • set key expiry (example: 86400) • application knows your key
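
The expiry behaviour can be imitated in memory to show what the application sees. This sketch does not use a Riak client or Riak configuration; `SessionStore` is a hypothetical stand-in for the “sessions” bucket, with the 86400-second expiry from the slide and an injectable clock so the timeout can be exercised without waiting.

```python
import time

SESSION_TTL = 86400   # seconds, the example expiry from the slide


class SessionStore:
    """In-memory stand-in for a 'sessions' bucket whose keys expire."""

    def __init__(self, ttl=SESSION_TTL, clock=time.time):
        self.ttl, self.clock, self.items = ttl, clock, {}

    def put(self, session_id, data):
        self.items[session_id] = (self.clock(), data)

    def get(self, session_id):
        stored_at, data = self.items.get(session_id, (None, None))
        if stored_at is None or self.clock() - stored_at >= self.ttl:
            self.items.pop(session_id, None)   # missing or expired
            return None
        return data


sessions = SessionStore()
sessions.put("session-abc", {"cart": [101]})   # the application already knows this key
print(sessions.get("session-abc"))  # {'cart': [101]}
```

Since the application holds the session id, every lookup is a plain key fetch with no query involved.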

User Profiles • store in “customers” bucket • use LevelDB backend • can’t use predictable keys, so use 2i • share customer_id with PostgreSQL “customers” table

Inventory • JSON documents for product descriptions, reviews, etc. • enable Riak Search for full-text searching • store images directly in Riak • maintain an inventory count in PostgreSQL • use matching product_ids • every* hit to Riak = hit to Postgres • easy schema updates

Order Processing • reliable inventory control • no duplicate orders
• analytics

Inventory Lookup (Riak, then PostgreSQL) • Find Items: led televisions • Riak Search: Found 37 Results • Map/Red: return product_ids • SELECT [list of product_ids] FROM inventory... • results
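
The two-step lookup can be sketched end to end with stand-ins. Everything here is an assumption for illustration: `search_products` fakes the Riak Search plus Map/Reduce step with a dictionary, `sqlite3` stands in for PostgreSQL so the example runs anywhere, and the `in_stock` column is hypothetical (the slide only names the `inventory` table and `product_id`).

```python
import sqlite3


def search_products(query):
    # Hypothetical stand-in for "Riak Search, then Map/Reduce returns product_ids".
    catalog = {"led televisions": [101, 102, 103]}
    return catalog.get(query, [])


def stock_for(conn, product_ids):
    """Fetch inventory counts for the given ids with one parameterized query."""
    if not product_ids:
        return {}
    placeholders = ", ".join("?" for _ in product_ids)
    rows = conn.execute(
        "SELECT product_id, in_stock FROM inventory "
        f"WHERE product_id IN ({placeholders}) ORDER BY product_id",
        list(product_ids),
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")   # sqlite3 stands in for PostgreSQL here
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, in_stock INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [(101, 4), (102, 0), (104, 9)])
print(stock_for(conn, search_products("led televisions")))  # {101: 4, 102: 0}
```

The shared product_id is the only coupling between the two stores: Riak answers "which products match", PostgreSQL answers "how many are left".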

Other Examples • Major Insurance Company • initiative for customer
centric apps • unstructured data in Riak, else: PG • kiip - platform for in-game rewards • K/V data in Riak • rich queries in PostgreSQL • http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb

Further Reading • Eric Redmond @coderoshi • Jim R. Wilson @hexlib • Pramod J. Sadalage @pramodsadalage • Martin Fowler @martinfowler • Mathias Meyer @roidrage

Questions? @tsantero
