Using Databases to pull your application's weight

This talk is about utilizing your database more, so that you can offload a lot of CPU-intensive tasks from Ruby and Rails to the database and make your application run faster.


Harisankar P S

December 02, 2016


  1. Using Database to pull your application’s weight | December 2nd, 2016
  2. { "name" => "Harisankar P S", "email" => "", "twitter" => "coderhs", "facebook" => "coderhs", "github" => "coderhs", "bio" => "I write ruby code for a living." }
  3. I work at a Ruby on Rails dev shop

  4. I am from the city of Kochi, in the state of Kerala, in India.
  5. നമസ്കാരം: how we greet in Malayalam. Fun fact: there are 22 official languages in India, 1,653 spoken languages, and over 50,000 dialects. I know three of them: English, Malayalam, and Tamil. I am from a huge and diverse country =)
  6. I brought some tastes from Kerala. Come meet me after my talk and I might share them with you.
  7. Let's start

  8. Our tech stack is like The Avengers: really awesome when they work together
  9. Ruby is Captain America

  10. Rails is Iron man

  11. The database is the Hulk

  12. The Hulk can smash anything

  13. But we make him carry our suitcase

  14. This talk is about how we can offload a couple of the jobs done by Rails to the database. You have a Hulk, so don't feel scared to USE it.
  15. Today we are going to talk about: • Query Planner • Indexing • Materialised Views • Generating JSON
  16. Query Planner

  17. Question • SQL syntax is all about what the results should be • What you want in your result: SELECT id, name • Or some aggregate information about the data: SELECT avg(price), max(price), min(price) • So where is the decision on how the data should be fetched made?
  18. Well, that's what the query planner is all about. It's the brain of a DB. We need to understand how the system works before we can improve its performance.
  19. • A query plan is created by the DB before the query you gave is executed • Each candidate plan has an estimated cost of running the query; the DB chooses the one with the least cost • The planner assumes the plan it picked is the ideal one
  20. But how can a DB handle all the scenarios?

  21. Truth is, it can't. A DB doesn't know all the scenarios it's put under; it's up to us to nudge and optimise it.
  22. So we need to see what the query planner sees. ActiveRecord has the .explain method to help us there
  23. Asset.where(asset_id: 1).explain
    User.where(id: 1).explain

  24. So we check the query plan, find where we are slowing down, and then fix it so the planner chooses the faster method.
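As a sketch of what this looks like, here is the kind of plan EXPLAIN (which .explain wraps) prints before and after adding an index. The table, column, and cost numbers are illustrative assumptions, not from the talk:

```sql
-- Hypothetical example: EXPLAIN shows the chosen plan and its estimated cost.
EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
--   Seq Scan on users  (cost=0.00..1693.00 rows=1 width=120)
--     Filter: (email = 'a@example.com'::text)

-- With an index on users(email), the planner picks the cheaper plan:
--   Index Scan using index_users_on_email on users  (cost=0.28..8.30 rows=1 width=120)
--     Index Cond: (email = 'a@example.com'::text)
```

The cost numbers are unitless estimates; what matters is that the planner compares them and picks the smallest.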
  25. Sounds simple, doesn't it? =)

  26. So let's see how we can do that.

  27. I have done all of these in production =), so you don't need to feel scared to run them.
  28. Indexing

  29. Okay.. let's not do that.

  30. • Indexes are a special lookup table that the database search engine can use to speed up data retrieval • An index is like a pointer to a particular row of a table, with the indexed fields kept in order • The database is smart: even if you have indexes, if it finds a sequential scan to cost less, it will go for that one
  31. We should index: • the primary key • foreign keys • all columns you will pass into a WHERE clause • the keys used to JOIN tables • date columns (if you are going to query them frequently, like rankings for a particular date) • the type column in an STI or polymorphic table • partial indexes for your scopes
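A rough sketch of the list above in raw SQL; the table and column names here are hypothetical, and in a Rails app these would normally be add_index calls in a migration:

```sql
-- Foreign key used in JOINs and WHERE clauses
-- (primary keys are indexed automatically; foreign keys are not):
CREATE INDEX index_sales_on_user_id ON sales (user_id);

-- A date column queried frequently, e.g. rankings for a particular day:
CREATE INDEX index_sales_on_invoice_date ON sales (invoice_date);

-- The type column of an STI / polymorphic table:
CREATE INDEX index_assets_on_type ON assets (type);

-- A partial index backing a scope such as where(paid: true);
-- it only stores rows matching the condition, so it stays small:
CREATE INDEX index_sales_on_invoice_date_paid
  ON sales (invoice_date) WHERE paid = true;
```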
  32. Do not index: • tables with a lot of writes • tables you know will remain small throughout their lifetime • columns whose values you will be manipulating a lot
  33. Materialised View

  34. Database views? Database views are like the views in our Rails apps. A Rails view (an HTML page) shows data from multiple models on a single page. Similarly, we can present data from multiple tables as a single table using the concept called views. Why would we do that? Because it makes life easier.
  35. Instead of running this every time you want the managers:
    SELECT id, name, email FROM companies WHERE role='manager';
  36. You can create a view:
    CREATE VIEW company_managers AS SELECT id, name, email FROM companies WHERE role='manager';
    And then simply do:
    SELECT * FROM company_managers;
  37. Note: • The schema of a view lives in the DB's memory • The result is not stored • A query against the view actually runs our underlying query to get the results • That is why views are called pseudo tables
  38. Materialised views are the next evolution of database views: we store the result in a table as well • They were first introduced by Oracle • They are now found in PostgreSQL, Microsoft SQL Server, IBM DB2, etc. • MySQL doesn't have them, but you can get them using open-source extensions
  39. How can we use it in Ruby? Thanks to ActiveRecord it's easy to access such pseudo tables.
  40. Create a migration to record the materialised view. We need a bit of SQL here:

    class CreateAllTimesSalesMatView < ActiveRecord::Migration
      def up
        execute <<-SQL
          CREATE MATERIALIZED VIEW all_time_sales_mat_view AS
            SELECT sum(amount) AS total_sale,
                   DATE_TRUNC('day', invoice_date) AS date_of_sale
            FROM sales
            GROUP BY DATE_TRUNC('day', invoice_date);
        SQL
      end

      def down
        execute('DROP MATERIALIZED VIEW IF EXISTS all_time_sales_mat_view')
      end
    end
  41. Create an ActiveRecord model. I place these views at the location app/models/views:

    class AllTimeSalesMatView < ActiveRecord::Base
      self.table_name = 'all_time_sales_mat_view'

      def readonly?
        true
      end

      def self.refresh
        ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view')
      end
    end
  42. Now we can do AllTimeSalesMatView.where(email: '')

  43. first, last and find • They don't work on your view, as they operate on your table's primary key and a view doesn't have one • If you want to use them, you need to declare one of the fields in your view as the primary key:

    class Model < ActiveRecord::Base
      self.primary_key = :id
    end
  44. Benchmark • I created a table with 1 million random sales, with random dates spread over a year (the dates were indexed as well)
  45. Take away • Faster data fetching • Captures commonly used joins & filters • Pushes data-intensive processing from Ruby to the database • Allows fast, live filtering of complex associations or calculated fields • We can index the various fields of the view
  46. Pain points • We need to write SQL • We will be using more RAM and storage • Requires Postgres 9.3 for materialised views • Requires Postgres 9.4 to refresh concurrently • Can't have live data • You can fix this by creating your own table and updating it with the latest information
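One detail worth noting alongside the concurrent-refresh requirement: PostgreSQL will only run REFRESH MATERIALIZED VIEW CONCURRENTLY against a view that has at least one unique index. The index name below is hypothetical:

```sql
-- Needed once before REFRESH MATERIALIZED VIEW CONCURRENTLY can be used;
-- the grouped date column is unique per row in this view:
CREATE UNIQUE INDEX index_all_time_sales_on_date
  ON all_time_sales_mat_view (date_of_sale);
```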
  47. JSON generation in DB

  48. • Websites with simple HTML and plain JavaScript-based AJAX are coming to an end • It's the era of modern JS frameworks • JSON is the glue that binds the frontend and the backend • So it's natural to find more and more DBs supporting the generation and storage of JSON
  49. To convert a single row to JSON we use the row_to_json() function in SQL:
    SELECT row_to_json(users) FROM users WHERE id = 1;
  50. {"id":1, "email":"", "encrypted_password":"iwillbecrazytodisplaythat", "reset_password_token":null, "reset_password_sent_at":null, "remember_created_at":"2016-11-06T08:39:47.983222", "sign_in_count":11, "current_sign_in_at":"2016-11-18T11:47:01.946542", "last_sign_in_at":"2016-11-16T20:46:31.110257", "current_sign_in_ip":"::1", "last_sign_in_ip":"::1", "created_at":"2016-11-06T08:38:46.193417", "updated_at":"2016-11-18T11:47:01.956152", …}

  51. But for more practical use we write queries like:
    SELECT row_to_json(results) FROM (SELECT id, email FROM users) AS results;
    → {"id":1,"email":""}
  52. A more complex one:
    SELECT row_to_json(result) FROM (
      SELECT id, email,
        (SELECT array_to_json(array_agg(row_to_json(user_projects)))
         FROM (SELECT id, name FROM projects
               WHERE projects.user_id = users.id
               ORDER BY created_at ASC) user_projects) AS projects
      FROM users WHERE id = 1
    ) result;
  53. { "id":1, "email":"", "projects":[{"id": 3, "name": "CSnipp"}] } We did data preloading as well: instead of needing to run another query separate from the first one, we got the data about the projects too.
  54. json_build_object • Added in PostgreSQL 9.4 to make JSON creation a bit simpler:
    SELECT json_build_object('foo',1,'bar',2);
    → {"foo": 1, "bar": 2}
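For comparison, here is the same object built app-side with Ruby's standard json library; the point of json_build_object is to let the database do this serialisation work instead of the Ruby process:

```ruby
require 'json'

# Build the same {"foo": 1, "bar": 2} object in Ruby.
# With json_build_object, this string would arrive ready-made from the DB.
payload = JSON.generate('foo' => 1, 'bar' => 2)
puts payload # => {"foo":1,"bar":2}
```

For one small hash the difference is negligible; it matters when serialising thousands of rows per request.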
  55. So where is Ruby?

  56. • For simple JSON creation you can use a gem called Surus
  57. Which lets you write code like:

    User.find_json 1
    User.find_json 1, columns: [:id, :name, :email]
    Post.find_json 1, include: :author
    User.find_json 1, include: {posts: {columns: [:id, :subject]}}
    User.all_json
    User.where(admin: true).all_json
    User.all_json(columns: [:id, :name, :email], include: {posts: {columns: [:id, :subject]}})
    Post.all_json(include: [:forum, :post])
  58. But for more complicated queries you might still end up

    writing SQL
  59. But if, like me, you want to keep as much stuff as possible in Ruby: create a materialised view for your complicated query, and then use the gem to generate the JSON =)
  60. Benchmarks • In our case we saw requests to a (.json) URL that used to take 2 seconds come down to <= 200ms • Some benchmarks I found online mention:
  61. • Simple query • More complicated query • Source:

  62. Synchronous Commit

  63. • PostgreSQL sacrifices speed for durability and reliability • PostgreSQL is known for its slow writes and fast reads • Writes are slow because it waits for confirmation that what we inserted has been recorded on the hard disk • You can disable this confirmation check to speed up your inserts if you are inserting a lot of rows every second
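At the SQL level, synchronous_commit is a plain setting that can be scoped to a single transaction with SET LOCAL, so only those writes skip the flush wait (the logs table here is hypothetical):

```sql
BEGIN;
SET LOCAL synchronous_commit TO OFF;  -- applies only to this transaction
INSERT INTO logs (message) VALUES ('raw event');
COMMIT;  -- returns without waiting for the WAL flush to disk
```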
  64. The Surus gem provides:

    User.transaction do
      User.synchronous_commit false
    end

  65. • The only issue now is that in case your DB crashes, it can't recover the lost data that was not yet saved to the hard disk • It won't corrupt the data, but you might lose some rows • Not to be used in cases where you want data integrity to be 100% • Use it where you don't mind losing some information, or where you can rebuild it from outside your DB, like logs or raw information
  66. Summarize

  67. • Index data so that we don't end up scanning the whole DB • Simplify the way you fetch data from the DB using views • Move complicated JSON generation to the database • Disable synchronous commit when you feel it won't cause a problem
  68. Conclusions • Know your tech stack • We should have control over all our moving parts • Try to bring out the best in your tech stack before you start throwing more money at it • SQL has been around for 40 years and it's planning to stay for a while longer =) • There is no golden rule: what worked for me might not work for your specific use case
  69. I blogged about this in detail. • caching-database-query/ • using-postgresql-instead-rails/ • postgresql/ • database-query-using-indexes/
  70. Questions? =)