Biggish Data with Rails and PostgreSQL

See the video here: http://www.starrhorne.com/2014/05/19/biggish-data-rails-postgres.html

Once your database hits a few hundred million rows, normal ActiveRecord conventions don't work so well.

...you find yourself in a new world. One where calling count() can bring your app to its knees. One where migrations can take all day.

This talk will show you how to work with big datasets in Rails and PostgreSQL. We'll show how normal conventions break down, and offer practical, real-world advice on maintaining performance, doing maintenance, and tuning Rails for optimal DB performance.


Starr Horne

April 23, 2014

Transcript

  1. Starr Horne Biggish Data w/ Rails & Postgresql

  2. @StarrHorne

  3. @StarrHorne BIG DATA (sorry)

  4. @StarrHorne Architecture …nope

  5. @StarrHorne How to keep an app working as your production dataset grows

  6. @StarrHorne Biggish Data: 10,000 … / 1 Terabyte database / 2 Gigabytes of new data every day / serve with a normal Rails app

  7. @StarrHorne Postgresql can handle it ("I'm pretty awesome like that")

  8. @StarrHorne YOUR APP CAN'T

  9. @StarrHorne $ rails console > Error.count()

  10. @StarrHorne $ rails console > user = User.find(123) > user.errors.page(100)

  11. @StarrHorne $ rails console > Error.delete_where("created_at < '1/1/2012'")
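
     A single-statement delete like that can hold locks and churn I/O for hours on a big table. A minimal
     rails-console sketch of the usual workaround, deleting in bounded batches (the batch size, and using
     find_in_batches in place of the slide's delete_where shorthand, are assumptions for illustration):

        $ rails console
        > Error.where("created_at < ?", Date.new(2012, 1, 1))
        >      .select(:id)
        >      .find_in_batches(batch_size: 10_000) do |batch|
        >     Error.where(id: batch.map(&:id)).delete_all  # each DELETE touches only 10k rows
        >   end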

  12. @STARRHORNE SUMMER 1978

  13. @STARRHORNE VT-100 TERMINAL

  14. @STARRHORNE Oracle 1.0 written in PDP-11 assembly language

  15. @STARRHORNE This is a PDP-11

  16. @STARRHORNE not a web stack

  17. @StarrHorne Computers get more awesome as time goes forward (Moore's Law)

  18. @StarrHorne Starr's Corollary to Moore's Law: as DB growth outpaces Moore's Law, you travel back in time. Literally.

  19. @StarrHorne GREAT SCOTT!

  20. @StarrHorne HARDWARE

  21. @StarrHorne Real Computers (like our ancestors used)

  22. @StarrHorne Lots of Disks (DB and OS on separate HDDs)

  23. @StarrHorne QUERY EFFICIENCY

  24. @StarrHorne db=# EXPLAIN SELECT msg FROM errors WHERE id = 1000;
      QUERY PLAN
      --------------------------------
      ὣς οἳ μὲν περὶ νηὸς ἐϋσσέλμοιο μάχοντο: Πάτροκλος δ' Ἀχιλῆϊ παρίστατο
      ποιμένι λαῶν δάκρυα θερμὰ χέων … (the "plan" on the slide continues as a
      passage of Homeric Greek)

  25. @StarrHorne db=# EXPLAIN (format yaml) SELECT msg FROM errors WHERE id = 1000;
      QUERY PLAN
      --------------------------------
      - Plan:
          Node Type: "Index Scan"
          Scan Direction: "Forward"
          Index Name: "errors_pkey"
          Relation Name: "errors"
          Alias: "errors"
          Startup Cost: 0.42
          Total Cost: 8.44
          Plan Rows: 1
          Plan Width: 4
          Index Cond: "(id = 1000)"
      (1 row)

  26. @StarrHorne Mo’ rows Mo’ problems

  27. @StarrHorne db=# EXPLAIN (format yaml) SELECT count(*) FROM notices;
      QUERY PLAN
      --------------------------------------
      - Plan:
          Node Type: "Aggregate"
          Total Cost: 49971.71
          Plan Rows: 1
          Plans:
            - Node Type: "Seq Scan"
              Relation Name: "notices"
              Total Cost: 48172.96
              Plan Rows: 719496

  28. @StarrHorne $ psql honeybadger
      psql (9.3.4)
      Type "help" for help.
      honeybadger=# select count(*) from errors;
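
     count(*) has to visit every live row, which is why it crawls at this scale. When an estimate is good
     enough, a common workaround (an addition here, not from the slides) is to read the planner's
     statistics instead of scanning the table:

        db=# SELECT reltuples::bigint AS estimated_rows
        db-#   FROM pg_class WHERE relname = 'errors';
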
  29. @StarrHorne Offsets & Pagination

  30. @StarrHorne db=# EXPLAIN ANALYZE SELECT id FROM errors OFFSET 500000 LIMIT 100;
      QUERY PLAN
      --------------------------------------
      - Plan:
          Node Type: "Limit"
          Actual Rows: 100
          Plans:
            - Node Type: "Seq Scan"
              Relation Name: "errors"
              Actual Rows: 500100

  31. @StarrHorne Use > and < Instead http://bit.ly/biggish-order

  32. @StarrHorne db=# EXPLAIN ANALYZE SELECT msg FROM errors WHERE id >= 500000 AND id < 500100;
      QUERY PLAN
      -----------------------------------------------------------
      - Plan:
          Node Type: "Bitmap Heap Scan"
          Relation Name: "errors"
          Actual Rows: 100
          Plans:
            - Node Type: "Bitmap Index Scan"
              Index Name: "errors_pkey"
              Total Cost: 5.42
              Actual Rows: 100
              Index Cond: "((id >= 500000) AND (id <= 500100))"
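
     The same keyset technique from the rails console: remember the last id you served and seek past it,
     rather than paging with OFFSET (a sketch; last_seen_id and the page size of 100 are assumptions):

        $ rails console
        > Error.where("id > ?", last_seen_id).order(:id).limit(100)
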
  33. @StarrHorne Sorting (is tricky)

  34. @StarrHorne db=# EXPLAIN ANALYZE SELECT id FROM errors ORDER BY id DESC LIMIT 10;
      QUERY PLAN
      --------------------------------------
      - Plan:
          Node Type: "Limit"
          Startup Cost: 0.42
          Total Cost: 25.63
          Plan Rows: 10
          Plans:
            - Node Type: "Index Only Scan"
              Scan Direction: "Backward"
              Index Name: "errors_pkey"
              Relation Name: "errors"
              Actual Rows: 10

  35. @StarrHorne db=# EXPLAIN SELECT id FROM errors ORDER BY token DESC LIMIT 10;
      QUERY PLAN
      ------------------------------------------
      - Plan:
          Node Type: "Limit"
          Total Cost: 63721.03
          Plan Rows: 10
          Plans:
            - Node Type: "Sort"
              Total Cost: 65519.75
              Plan Rows: 719496
              Sort Key:
                - "token"
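
     The sort is slow because there is no index on token to walk in order. One likely fix, using
     CONCURRENTLY so the build doesn't block writes on a big table (the index name is assumed):

        db=# CREATE INDEX CONCURRENTLY index_errors_on_token ON errors (token);
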
  36. @StarrHorne TIL: Develop against a real dataset. Don't assume, EXPLAIN.
      Limit the number of rows you touch.
  37. @StarrHorne

  38. @StarrHorne Lemmings! Cute pic; link for more info: http://bit.ly/biggish-lemmings
  39. @StarrHorne OS & DB TUNING

  40. @StarrHorne Increase Read-Ahead blockdev --setra 2048 /dev/sda

  41. @StarrHorne Use a modern filesystem http://bit.ly/biggish-fs

  42. @StarrHorne Tell PG about all that RAM http://bit.ly/biggish-pgtune
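
     These are the knobs pgtune adjusts; a rough sketch for a dedicated server with 32 GB of RAM
     (the values are illustrative starting points, not figures from the talk):

        # postgresql.conf
        shared_buffers = 8GB          # ~25% of RAM is a common starting point
        effective_cache_size = 24GB   # tells the planner how much the OS page cache can hold
        work_mem = 64MB               # per-sort/per-hash memory; multiplied across connections
        maintenance_work_mem = 1GB    # speeds up VACUUM and index builds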

  43. @StarrHorne Vacuum Regularly http://bit.ly/biggish-vacuum
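
     Autovacuum's default trigger (roughly 20% of the table changed) lets a huge table pile up tens of
     millions of dead rows before it runs. A hedged per-table override so hot tables get vacuumed far
     more often:

        db=# ALTER TABLE errors SET (autovacuum_vacuum_scale_factor = 0.01);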

  44. @StarrHorne VELOCITY

  45. @StarrHorne Too Many DB Connections http://bit.ly/biggish-pool
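
     Each Rails process keeps its own pool, so total connections are roughly processes × pool size,
     and Postgres suffers long before it hits max_connections. The pool is set in config/database.yml
     (the value here is illustrative):

        # config/database.yml
        production:
          adapter: postgresql
          pool: 5

     Past a certain scale the usual next step is a connection pooler such as PgBouncer in front of
     the database.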

  46. @StarrHorne Too Many Locks http://bit.ly/biggish-locks
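
     A quick way to see which queries are stuck behind a lock, using the standard catalogs (this uses
     the PostgreSQL 9.x "waiting" column; 9.6+ exposes wait_event instead):

        db=# SELECT pid, state, now() - query_start AS runtime, query
        db-#   FROM pg_stat_activity WHERE waiting;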

  47. @StarrHorne Intensive DB Queries http://bit.ly/biggish-replication
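
     The idea is to route heavy reporting reads to a streaming replica. Rails only grew native support
     for this much later (Rails 6); a hedged sketch of that API, with role and connection names assumed:

        # app/models/application_record.rb
        class ApplicationRecord < ActiveRecord::Base
          self.abstract_class = true
          connects_to database: { writing: :primary, reading: :replica }
        end

        # run an expensive query against the replica
        ActiveRecord::Base.connected_to(role: :reading) do
          Error.group(:created_at).count
        end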

  48. @StarrHorne Partitioning for deletion & archival http://bit.ly/biggish-partition
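
     At the time of this talk (PostgreSQL 9.3), partitioning meant table inheritance: each child table
     carries a CHECK constraint, and deleting or archiving old data becomes a near-instant metadata
     operation. A minimal sketch with assumed table names:

        db=# CREATE TABLE errors_2014_04 (
        db(#   CHECK (created_at >= '2014-04-01' AND created_at < '2014-05-01')
        db(# ) INHERITS (errors);
        db=# DROP TABLE errors_2014_01;  -- drops a whole month without a row-by-row DELETE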

  49. @StarrHorne Backups take forever http://bit.ly/biggish-wal-e
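
     wal-e ships a base backup plus a continuous stream of WAL segments to object storage, so you stop
     paying for one giant periodic dump. The usual wiring per the wal-e README (the data directory path
     and the envdir credentials directory are assumptions):

        # postgresql.conf
        archive_mode = on
        archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

        $ envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main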

  50. @StarrHorne This is a lot of stuff

  51. @StarrHorne But you can do it because you’re awesome

  52. @StarrHorne MEXICAN CANDY

  53. @StarrHorne All the links referenced in this talk: http://bit.ly/biggish-data
      Want more visibility into your errors in production? http://www.honeybadger.io
      No elephants were harmed in the making of this presentation.