Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Biggish Data with Rails and Postgresql

Biggish Data with Rails and Postgresql

See the video here: http://www.starrhorne.com/2014/05/19/biggish-data-rails-postgres.html

Once your database hits a few hundred million rows normal ActiveRecord conventions don't work so well.

...you find yourself in a new world. One where calling count() can bring your app to its knees. One where migrations can take all day.

This talk will show you how to work with big datasets in Rails and Postgresql. We'll show how normal conventions break down, and offer practical real-world advice on maintaining performance, doing maintenance, and tuning rails for optimal DB performance.

Starr Horne

April 23, 2014
Tweet

More Decks by Starr Horne

Other Decks in Programming

Transcript

  1. @StarrHorne t { Biggish  Data          

                                     10,000     1       Terabyte  Database 2     Gigabytes  new  data  every  day Serve  with  a  normal  rails  app
  2. @StarrHorne Starr’s Corollary to Moore’s Law As DB growth outpaces

    Moore’s law, you travel back in time. Literally.
  3. @StarrHorne QUERY PLAN -------------------------------- ὣς οἳ μὲν περὶ νηὸς ἐϋσσέλμοιο

    μάχοντο: Πάτροκλος δ' Ἀχιλῆϊ παρίστατο ποιμένι λαῶν δάκρυα θερμὰ χέων ὥς τε κρήνη μελάνυδρος, ἥ τε κατ' αἰγίλιπος πέτρης δνοφερὸν χέει ὕδωρ. τὸν δὲ ἰδὼν ᾤκτιρε ποδάρκης δῖος Ἀχιλλεύς, καί μιν φωνήσας ἔπεα πτερόεντα προσηύδα: τίπτε δεδάκρυσαι Πατρόκλεες, ἠύ̈τε κούρη νηπίη, ἥ θ' ἅμα μητρὶ θέουσ' ἀνελέσθαι ἀνώγει εἱανοῦ ἁπτομένη, καί τ' ἐσσυμένην κατερύκει, δακρυόεσσα δέ μιν ποτιδέρκεται, ὄφρ' ἀνέληται: τῇ ἴκελος Πάτροκλε τέρεν κατὰ δάκρυον εἴβεις. ἠέ τι Μυρμιδόνεσσι πιφαύσκεαι, ἢ ἐμοὶ αὐτῷ, ἦέ τιν' ἀγγελίην Φθίης ἐξέκλυες οἶος; ζώειν μὰν ἔτι φασὶ Μενοίτιον Ἄκτορος υἱόν, db=# EXPLAIN SELECT msg FROM errors WHERE id = 1000;
  4. @StarrHorne # EXPLAIN (format yaml) SELECT msg FROM errors WHERE

    id = 1000; QUERY PLAN -------------------------------- - Plan: + Node Type: "Index Scan" + Scan Direction: "Forward" + Index Name: "errors_pkey" + Relation Name: "errors" + Alias: "errors" + Startup Cost: 0.42 + Total Cost: 8.44 + Plan Rows: 1 + Plan Width: 4 + Index Cond: "(id = 1000)" (1 row)
  5. @StarrHorne db=# EXPLAIN (format yaml) SELECT count(*) FROM notices; QUERY

    PLAN -------------------------------------- - Plan: + Node Type: "Aggregate" + Total Cost: 49971.71 + Plan Rows: 1 + Plans: + - Node Type: "Seq Scan" + Relation Name: "notices" + Total Cost: 48172.96 + Plan Rows: 719496 +
  6. @StarrHorne $ psql honeybadger psql (9.3.4) Type "help" for help.

    ! honeybadger=# select count(*) from errors;
  7. @StarrHorne db=# EXPLAIN ANALYZE SELECT id FROM errors OFFSET 500000

    LIMIT 100; ! QUERY PLAN -------------------------------------- - Plan: + Node Type: "Limit" + Actual Rows: 100 + Plans: + - Node Type: "Seq Scan" + Relation Name: "errors" + Actual Rows: 500100 +
  8. @StarrHorne db=# EXPLAIN ANALYZE SELECT msg FROM errors WHERE id

    >= 500000 AND id < 500100; ! QUERY PLAN ----------------------------------------------------------- - Plan: + Node Type: "Bitmap Heap Scan" + Relation Name: "errors" + Actual Rows: 100 + Plans: + - Node Type: "Bitmap Index Scan" + Index Name: "errors_pkey" + Total Cost: 5.42 + Actual Rows: 100 + Index Cond: "((id >= 500000) AND (id <= 500100))"
  9. @StarrHorne db=# EXPLAIN ANALYZE SELECT id FROM errors ORDER BY

    id DESC LIMIT 10; QUERY PLAN -------------------------------------- - Plan: + Node Type: "Limit" + Startup Cost: 0.42 + Total Cost: 25.63 + Plan Rows: 10 + Plans: + - Node Type: "Index Only Scan"+ Scan Direction: "Backward" + Index Name: "errors_pkey" + Relation Name: "errors" + Actual Rows: 10 +
  10. @StarrHorne db=# EXPLAIN SELECT id FROM errors ORDER BY token

    DESC LIMIT 10; ! QUERY PLAN ------------------------------------------ - Plan: + Node Type: "Limit" + Total Cost: 63721.03 + Plan Rows: 10 + Plans: + - Node Type: "Sort" + Total Cost: 65519.75 + Plan Rows: 719496 + Sort Key: + - "token" +
  11. @StarrHorne http://bit.ly/biggish-data http://www.honeybadger.io All the links referenced in this talk

    Want more visibility into your errors in production? No elephants were harmed in the making of this presentation