Slide 1

Slide 1 text

DECEMBER 9, 2011

Slide 2

Slide 2 text

SPEAKING TODAY: PAIRING MONGODB WITH NAND FLASH FOR REAL-TIME RESULTS
Greg Pendler, Production and Operations Manager
Dale Russell, Technical Consultant
Brian Knox, Service Delivery Data Architect

Slide 3

Slide 3 text

Kontera: Real-Time Analytics in a Flash, December 2011, Greg Pendler

Slide 4

Slide 4 text

CONTEXT. INTEREST. RESULTS.
Premium Sites, Premium Brands

Slide 5

Slide 5 text

WRITE BOTTLENECK FOR NEW PAGE ANALYSIS
1.  New page detected while a user is reading a content page
2.  MongoDB performs content matching and persists the matching recipe
3.  Appropriate context-based ads delivered quickly

Slide 6

Slide 6 text

THE IMPORTANCE OF FAST RESPONSE
•  Internet traffic spikes
•  Seconds can mean many lost click-through opportunities
[Chart: Traffic vs. time after posting, from 1 hour to 1 year]

Slide 7

Slide 7 text

HIGH PERFORMANCE CONTENT DELIVERY
Internet
Application Layer
Caching Layer (Memcache)
Database/Analysis Layer (MongoDB)
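The layered read path on this slide amounts to a read-through cache: the application checks the caching layer first and falls back to the database/analysis layer on a miss, repopulating the cache on the way out. In this minimal sketch, plain dicts stand in for Memcache and MongoDB, and the key and value names are illustrative, not from the deck.

```python
cache = {}                       # stand-in for the Memcache layer
database = {"page:1": "sports"}  # stand-in for the MongoDB layer

def get_context(page_id):
    """Return the analyzed context for a page, caching the DB result."""
    if page_id in cache:
        return cache[page_id]          # fast path: cache hit
    value = database.get(page_id)      # slow path: database/analysis layer
    if value is not None:
        cache[page_id] = value         # populate the cache for next time
    return value

print(get_context("page:1"))  # first call reads the database
print(get_context("page:1"))  # second call is served from the cache
```

With this layering, a cache-tier outage degrades reads to database speed rather than failing them, which is why the database layer's own latency still matters.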

Slide 8

Slide 8 text

INITIAL ATTEMPTS
Dell R710, 72GB memory, acting as iSCSI initiator
iSCSI target backed by 6 local 15K SAS drives

Slide 9

Slide 9 text

MISSION-CRITICAL DATA NEEDS ENTERPRISE-QUALITY SOLID STATE
Implemented a competing "enterprise" solid-state product before Fusion-io:
•  50% failure rate in 90 days
•  An SSD even died during the migration to Fusion-io
•  ioDrives have been running around 4 months without a hiccup

Slide 10

Slide 10 text

SPEEDING WRITES WITH FUSION-IO
•  Store the 360GB database on 640GB ioDrives (leaves room to grow)
•  3 slaves in different geographic locations for redundancy

Slide 11

Slide 11 text

KONTERA SYNAPSE ANALYTICS SYSTEM
Location 1: MongoDB master and MongoDB slave, each on a 640GB ioDrive
Location 2: two MongoDB slaves, each on a 640GB ioDrive
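A topology like the one on this slide could be brought up with the legacy master/slave replication flags that mongod offered in this era; the hostnames and data paths below are hypothetical, not from the deck.

```shell
# Location 1: master, with its data files on the 640GB ioDrive
mongod --master --dbpath /mnt/iodrive/data --port 27017

# Location 1: local slave replicating from the master
mongod --slave --source master-host:27017 --dbpath /mnt/iodrive/data --port 27018

# Location 2: two more slaves for geographic redundancy
mongod --slave --source master-host:27017 --dbpath /mnt/iodrive/data
```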

Slide 12

Slide 12 text

THE RESULTS
•  Queries that used to take hours now take seconds
•  No fear of memcached-layer interruption
  –  Fusion-powered MongoDB handled a reboot of the memcached layer flawlessly

Slide 13

Slide 13 text

Analytics + Attribution = Actionable Insights
MongoDB with Fusion-io: Go Vertical… December 2011, Brian Knox, Dale Russell

Slide 14

Slide 14 text

GETTING TO REAL TIME STREAMING
§  How do you ingest, store, and report?
§  With lots of data, if you wait to process after the fact, you are too late
§  Rsyslog + ZeroMQ + AK secret sauce
§  Convert syslog into an event message routing system
§  ZeroMQ takes events and turns them into objects for the object rating system
§  90% of decisions about what you want to do with data happen before you get to the database
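The "convert syslog into events" and "decide before the database" steps above might look like the following sketch. The tab-separated field layout and routing-by-type scheme are assumptions for illustration; the real AK message format is not shown in the deck.

```python
import json

def syslog_to_event(line):
    """Parse one syslog-style line into a routable event dict."""
    timestamp, host, event_type, payload = line.split("\t", 3)
    return {
        "ts": timestamp,
        "host": host,
        "type": event_type,       # used as the routing key downstream
        "data": json.loads(payload),
    }

def route(event, handlers):
    """Decide what to do with an event before it ever reaches a database."""
    handler = handlers.get(event["type"])
    return handler(event) if handler else None

event = syslog_to_event('2011-12-09T12:00:00\tedge01\tclick\t{"user": 42}')
print(route(event, {"click": lambda e: e["data"]["user"]}))  # prints 42
```

Routing on a parsed event type, rather than on raw log text, is what lets most filtering and dispatch decisions happen before storage.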

Slide 15

Slide 15 text

PUB-SUB ZEROMQ INTEGRATION
THIS IS COOL! Explore by tapping into the event stream.
§  New algorithm? Put ZeroMQ on the front and start listening.
§  Need a few minutes' worth of events as they come in? Connect and take what you need.
NO MORE going off to the log server, finding the logs, parsing them, breaking them up, etc.
TAP AND GO!
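The "tap and go" idea rests on ZeroMQ SUB sockets delivering every message whose topic matches a subscribed prefix. The in-process stand-in below mimics that prefix matching without a ZeroMQ dependency; the topic names and payloads are made up for illustration.

```python
class EventStream:
    """Toy publisher with ZeroMQ-style prefix subscriptions."""
    def __init__(self):
        self.subs = []                        # (prefix, callback) pairs

    def subscribe(self, prefix, callback):
        self.subs.append((prefix, callback))  # like SUB + SUBSCRIBE option

    def publish(self, topic, message):
        for prefix, callback in self.subs:
            if topic.startswith(prefix):      # ZeroMQ matches on topic prefix
                callback(topic, message)

stream = EventStream()
seen = []
# A new algorithm "taps in" by subscribing; no log files to fetch or parse.
stream.subscribe("click.", lambda t, m: seen.append(m))
stream.publish("click.us", {"user": 1})
stream.publish("view.us", {"user": 2})   # no matching prefix, ignored by this tap
print(seen)  # [{'user': 1}]
```

Because subscribers attach and detach independently, a consumer can take a few minutes of live events and disconnect without disturbing the producers.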

Slide 16

Slide 16 text

REAL TIME STREAMING ANALOGY

Slide 17

Slide 17 text

BASICS TO A BILLION: Searching 1 Billion Documents
1.  Events are generated at the edge by incoming HTTP requests
2.  Events are sent from the edge servers over Rsyslog
3.  Events are forwarded over Rsyslog to an event ventilator (another Rsyslog instance)
4.  The ventilator balances work across a cluster of "enrichment" processes
5.  The enrichment processes pull in additional information about the event from a MongoDB cluster running on Fusion-io
6.  The enriched events are sent to a "summarizer" (the AK secret sauce) that does all sorts of real-time aggregation and correlation
7.  The secret-sauce data structures generated by the summarizer are sent to a modified PostgreSQL database
8.  The front end can generate detailed reports on the fly (no need for batch) from the distilled data
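Steps 4–6 of the pipeline above can be sketched in miniature: a dict stands in for the MongoDB enrichment store on Fusion-io, and a counter stands in for the proprietary summarizer. The field and segment names here are hypothetical.

```python
from collections import Counter

# Stand-in for the MongoDB-on-Fusion-io enrichment store (step 5)
enrichment_store = {"u1": {"segment": "sports"}, "u2": {"segment": "news"}}

def enrich(event):
    """Attach out-of-band attributes pulled from the enrichment store."""
    extra = enrichment_store.get(event["user"], {})
    return {**event, **extra}

def summarize(events):
    """Real-time aggregation: count enriched events per segment (step 6)."""
    return Counter(e.get("segment", "unknown") for e in events)

raw = [{"user": "u1"}, {"user": "u2"}, {"user": "u1"}]
summary = summarize(enrich(e) for e in raw)
print(summary)  # Counter({'sports': 2, 'news': 1})
```

The key property the slide relies on is that step 5 sits on every event's path, so the enrichment store's lookup latency bounds the whole stream's throughput.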

Slide 18

Slide 18 text

ARCHITECTURE CHART
[Diagram: HTTP Servers, Event Router, Enrichers, Enrichment Store, Summarizer, Report Database, Report Servers]

Slide 19

Slide 19 text

MONGODB CLUSTER
[Diagram: 16-shard MongoDB cluster showing shard members S1M–S16M, S1S–S16S, and S1A–S16A, CONFIGDB nodes, LDR nodes, and the 02_OLTP and UI_OLTP databases]

Slide 20

Slide 20 text

WHY FUSION-IO?
Before Fusion-io:
§  The reporting infrastructure was a large PostgreSQL data warehouse: events came in, got shoved into the db, and reports were generated by a batch report system
§  Doing unique user stats across large numbers of dimensions could take hours
Why Fusion-io?
§  Fusion-io gives low latency when correlating in-band data with out-of-band data sources on the fly, which allows streaming data enrichment
§  Possibility to run streaming algorithms against the incoming pre-enriched data
§  Reporting databases can run on relatively cheap hardware with local disks
Example: a report that took 40 hours to run can now be run in a few seconds
§  Possibility to report this quickly off of Postgres on relatively commodity hardware, where before it required a large data warehouse on an expensive SAN
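The "unique user stats across large numbers of dimensions" workload mentioned above can be sketched with exact per-dimension sets. The dimension names below are hypothetical, and at billion-event scale a cardinality sketch such as HyperLogLog would replace the sets; this only illustrates the shape of the computation.

```python
from collections import defaultdict

uniques = defaultdict(set)   # (dimension, value) -> set of user ids

def record(event):
    """Fold one event into the per-dimension unique-user sets."""
    for dim in ("country", "campaign"):          # hypothetical dimensions
        if dim in event:
            uniques[(dim, event[dim])].add(event["user"])

for e in [{"user": 1, "country": "US", "campaign": "a"},
          {"user": 2, "country": "US"},
          {"user": 1, "country": "US"}]:         # duplicate user, counted once
    record(e)

print(len(uniques[("country", "US")]))   # 2 unique users
print(len(uniques[("campaign", "a")]))   # 1 unique user
```

Maintained incrementally as events stream in, counts like these are ready the moment a report is requested, which is what removes the hours-long batch pass.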

Slide 21

Slide 21 text

TEST HARDWARE AND PARAMETERS
§  HP DL580R07 CTO Chassis
§  HP X7542 DL580 G7 2P Fusion-io kit
§  HP 8GB (1x8GB) Dual Rank x4 PC3-10600 (DDR3-1333) Registered CAS-9 Memory Kit (total = 256GB)
§  HP 146GB 15K 6G 2.5 SAS DP HDD x 8
§  Local disk test: six 15K RPM drives in RAID-10
§  Fusion-io test: two 640GB ioDrive Duos
§  1 billion records with a long-integer primary key
§  4 shards
§  Randomly generate IDs to search
§  Find and pull back the records
§  32 processes
§  15-minute test time

Slide 22

Slide 22 text

LOCAL DISK LOOKUP: LESS THAN 1 MILLION
§  734,687 transactions
§  Average TPS: 816
§  Poor latency and response time
§  Not even 1 million transactions in the 15-minute test window
[Charts: Response Time; Transactions per Second]

Slide 23

Slide 23 text

FUSION-IO LOOKUP: 57 MILLION
§  57,307,403 transactions
§  Average TPS: 63,674
§  Near memory speed
§  Ramps to full rate in about 30 seconds
§  Fast
§  Stable and predictable
§  100x scale
[Charts: Response Time; Transactions per Second]

Slide 24

Slide 24 text

ACHIEVING REAL TIME RESULTS
§  78x average throughput increase using Fusion-io vs. traditional storage
§  Fusion-io enabled sub-millisecond search times on 1 billion documents
§  Fusion-io allows us to scale vertically before having to go horizontal
§  Fusion-io gives us predictable scalability with reliable response times
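The headline numbers from the two benchmark slides are internally consistent, as a quick back-of-the-envelope check shows: average TPS is total transactions over the 900-second window, and the 78x figure is the ratio of the two averages.

```python
TEST_SECONDS = 15 * 60                  # 15-minute test window = 900 s

disk_tps   = 734_687 / TEST_SECONDS     # local disk: ~816 TPS
fusion_tps = 57_307_403 / TEST_SECONDS  # Fusion-io: ~63,674 TPS as reported

print(round(disk_tps))                  # 816
print(round(fusion_tps / disk_tps))     # 78x throughput increase
```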

Slide 25

Slide 25 text

THANK YOU
CONTACT US
[email protected]  [email protected]  [email protected]