
Scaling iParadigms with PostgreSQL

Richard Yen
November 03, 2010

Case study reviewing iParadigms' (Turnitin's) growth to 3,000 queries per second, and the replication and hardware decisions made to meet increased demand. Presented at the PGWest Conference 2010.


Transcript

  2. What is iParadigms?

    - Small Internet company in downtown Oakland, established in 1996
    - Creators of Turnitin.com, the leading provider of academic integrity services to schools and universities worldwide
    - Creators of iThenticate.com, an up-and-coming service for publications and journals
  4. Some more stats

    - Over 3,000 queries per second across the entire Turnitin replication cluster (approx. 1,000 transactions per second)
    - Turnitin grew from 90GB to 110GB on disk since the beginning of 2010
  5. Pitfalls

    - Needed replication to sustain performance
      - Scalability
      - Load distribution
    - Lack of replication put system at risk as SPOF
      - No failover
      - No quick recovery
  6. Slony Features

    - Replication from master to multiple slaves
    - Lightweight and fast
    - Fairly robust
    - Supports complex replication architectures
  10. How Slony fits in

    - Provided load distribution
      - Replication asynchronous -- wrote our own Multiplexor to use slaves as read-only copies
    - Failover/Switchover
      - Provided recovery plan -- whew!
      - Enabled easy upgrades
    - Delayed replication
    - Log Shipping (will discuss in a bit)
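
The deck does not show how the Multiplexor works internally. As a rough illustration only, the sketch below assumes a router that sends statements beginning with SELECT to the read-only slaves in round-robin order and everything else to the master. The class name, the `papers` table, and the SELECT-prefix heuristic are all hypothetical; the host names are borrowed from the comparison slide purely as labels.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical sketch: route reads to slaves, writes to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through slaves round-robin; with no slaves, use the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        """Return the name of the host that should run this statement."""
        # Naive heuristic (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("tii-master", ["tii-slave1", "tii-slave2"])
print(router.route("SELECT id FROM papers WHERE id = 1"))  # tii-slave1
print(router.route("UPDATE papers SET score = 0"))         # tii-master
print(router.route("select count(*) from papers"))         # tii-slave2
```

Because Slony replication is asynchronous, reads routed this way can return slightly stale data, and a prefix check like this would misroute statements such as `SELECT ... FOR UPDATE`. A production router would need to handle both cases.
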
  12. Still Problems

    - Slony not very scalable
      - Cannot handle 10+ slave nodes
      - System begins to crumble
      - Network traffic becomes a problem
    - Each connection consumes DB resources
  13. Quick Fixes

    - Segregated services
      - Put Turnitin, TurnitinUK, and iThenticate on separate clusters
    - Upgraded hardware to keep the number of nodes to a minimum (instead of horizontal scaling)
  14. Further Improvements

    - Moved to Solid-State Disk
      - 90% of traffic is SELECT traffic
      - FusionIO drives tremendously improved SELECT performance on indexes
  15. SSD v. Spindle Comparison

    | Host | tii-master (150GB SSD) | tii-slave1 (300GB SSD) | tii-slave2 (225GB RAID1-0) |
    |---|---|---|---|
    | pgfouine queries/s | 3450 | 1753 | 1825 |
    | iostat tps | 155.9224605 | 189.1772031 | 413.4493879 |
    | iostat blk_w/s | 983.9365667 | 936.4881394 | 859.8786056 |
    | iostat blk_r/s | 2698.912213 | 2021.420332 | 8650.986441 |
    | # of 30sec samples | 2466 | 2466 | 2466 |
    | Specs | | | |
    | Storage Type | 2 FusionIO RAID0 | 2 FusionIO RAID0 | RAID-10 spindle |
    | RAM | 64GB | 64GB | 64GB |
    | CPU | 8x Xeon 3.00GHz | 16x Xeon 2.67GHz | 8x Xeon 3.00GHz |
    | Avg Load (15 min) | 4.3 | 1.92 | 3.34 |
    | Avg CPU iowait (%) | 0.495457861 | 0.23653706 | 3.013296881 |
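
One way to read the comparison is to divide each host's query rate by its I/O rate. The sketch below does that arithmetic with the slide's figures, rounded. The "queries per I/O transfer" ratio is our own illustrative metric, not one the presentation reports, and it ignores the fact that the master and the slaves serve different workloads.

```python
# Figures copied from the comparison slide, rounded for readability.
hosts = {
    "tii-master": {"queries_per_s": 3450, "iostat_tps": 155.9, "iowait_pct": 0.50},
    "tii-slave1": {"queries_per_s": 1753, "iostat_tps": 189.2, "iowait_pct": 0.24},
    "tii-slave2": {"queries_per_s": 1825, "iostat_tps": 413.4, "iowait_pct": 3.01},
}


def queries_per_io(stats):
    """Queries served per I/O transfer issued to the device (illustrative metric)."""
    return stats["queries_per_s"] / stats["iostat_tps"]


for name, stats in hosts.items():
    print(f"{name}: {queries_per_io(stats):.1f} queries per I/O transfer, "
          f"{stats['iowait_pct']:.2f}% iowait")
```

On these numbers the spindle-backed slave issues the most I/O per query and spends the most CPU time in iowait, which is consistent with the slide's argument for SSD.
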
  18. More Reasons for SSD

    - Lower power consumption
    - Increases rack space real estate
    - Short-stroking RAID drives provides almost equal performance at similar cost
    - Price of SSD is dropping!
    - Planning to move to Virident shortly
  21. Current Challenges

    - Connection capacity
      - Improving our Multiplexor
      - Integrating pgBouncer
    - Data set continues to grow
      - Need to find a way to partition/shard
      - Postgres stats help a lot in pruning the data set
    - Need a long-term capacity management plan
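
As a starting point for the capacity plan mentioned above, a simple linear projection can be sketched from figures already in the deck: growth from 90GB to 110GB since the beginning of 2010, and a 150GB SSD on tii-master. The sketch assumes roughly ten months elapsed by the time of the talk, that growth stays linear, and that the full 150GB is usable; none of these assumptions come from the presentation, and the function name is ours.

```python
def months_until_full(current_gb, capacity_gb, growth_gb_per_month):
    """Months of headroom left under a constant (linear) growth rate."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    return max(0.0, (capacity_gb - current_gb) / growth_gb_per_month)


# 90GB -> 110GB over an assumed ten months of 2010.
growth = (110 - 90) / 10
print(growth)                               # 2.0 GB per month
print(months_until_full(110, 150, growth))  # 20.0 months
```

A real plan would also have to budget for WAL, index growth, and table bloat, so this kind of estimate is best treated as an upper bound on headroom.
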
  22. Lessons Learned

    - XID wraparound -- make sure you have space in pg_xlog!
    - Table bloat -- don’t let autovacuum bite you (8.4 resolves this)
    - Slony failover not working -- found race condition
    - Idle transactions -- keep an eye on them
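
The wraparound warning is easier to appreciate with a little arithmetic. PostgreSQL transaction IDs are 32-bit, so a table's oldest unfrozen XID can be at most about 2^31 transactions old, and autovacuum forces a freeze pass once a table's age exceeds `autovacuum_freeze_max_age` (200 million by default). The sketch below converts those limits into days, assuming that the roughly 1,000 transactions per second quoted earlier all consume an XID. That is a pessimistic assumption, since read-only transactions normally do not, and the function name is ours.

```python
XID_HORIZON = 2**31                      # ~2.1 billion XIDs before wraparound
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # PostgreSQL default
SECONDS_PER_DAY = 86_400


def days_to_consume(xids, xids_per_second):
    """Days needed to burn through `xids` transaction IDs at a steady rate."""
    return xids / xids_per_second / SECONDS_PER_DAY


rate = 1_000  # assumed XID-consuming transactions per second
print(f"{days_to_consume(AUTOVACUUM_FREEZE_MAX_AGE, rate):.1f}")  # 2.3
print(f"{days_to_consume(XID_HORIZON, rate):.1f}")                # 24.9
```

Under that assumption, forced anti-wraparound vacuums would come due every couple of days, and the hard limit would be less than a month away if they never completed.
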
  23. Lessons Learned

    - RAID config -- use the right RAID level, and keep your pg_xlog on a separate partition
    - Slony log-shipping -- very versatile and useful
  24. Lessons Learned

    - Postgres is very robust, very scalable
    - Third-party software like Slony and PgBouncer is very reliable
    - Mailing list and IRC chat are very useful