Using Databases to pull your application weight

This talk is about utilizing your database more, so that you can offload a lot of CPU-intensive work from Ruby and Rails to the database and make your application run faster.

Harisankar P S

December 02, 2016

Transcript

  1. Using Database to pull your application’s weight
    http://2016.rubyconf.tw | December 2nd, 2016

  2. {
    "name"     => "Harisankar P S",
    "email"    => "[email protected]",
    "twitter"  => "coderhs",
    "facebook" => "coderhs",
    "github"   => "coderhs",
    "bio"      => "I write ruby code for a living."
    }

  3. I work at
    Ruby on Rails dev shop
    https://redpanthers.co

  4. I am from the city of Kochi
    In the State of Kerala
    In India
Namaskaram in Malayalam

  5. നമസ്കാരം
    How we greet in Malayalam
    Fun Fact: There are 22 official languages in India, 1,653 spoken
    languages, and over 50,000 dialects.
    I know three of them: English, Malayalam and Tamil.
    I am from a huge and diverse country =)

  6. I brought a taste of Kerala
    with me
    Come meet me after my talk and I might share it with you.

  7. Let’s Start

  8. Our Tech Stack
    is like
    The Avengers
    Really awesome when they work together

  9. Ruby
    is
    Captain America

  10. Rails
    is
    Iron man

  11. Database
    is
the Hulk

  12. The Hulk can smash
    anything

  13. But we make him carry our
    suitcase

  14. This talk is about how we can offload a couple
    of the jobs done by Rails to the database.
    If you have a Hulk, don’t be scared to
    USE it.

  15. Today we are going to talk about
    • Query Planner
    • Indexing
    • Materialised Views
    • Generating JSON

  16. Query Planner

  17. Question
    • SQL syntax is all about what the results should look like
    • What you want in your result - SELECT id, name
    • Or some information about the data - SELECT
    avg(price), max(price), min(price)
    So where is the decision on how the data should
    be fetched made?

  18. Well, that’s what the query
    planner is all about.
    It’s the brain of a DB.
    We need to understand how the system works
    before we can improve its performance.

  19. • A query plan is created by the DB before the query
    you gave is executed.
    • Each plan has a cost of running the query. The DB
    chooses the one with the least cost.
    • The query planner assumes the plan it picks is the ideal one.

  20. But how can a DB handle all the scenarios?

  21. Truth is, it can’t.
    A DB doesn’t know all the scenarios it will be put under;
    it’s up to us to nudge and optimise it.

  22. So we need to see what
    the query planner sees
    Active Record has the .explain method to help us there

  23. Asset.where(asset_id: 1).explain
    User.where(id: 1).explain
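    What the planner chose shows up in the output. A minimal illustrative sketch
    (the exact plan, costs and row counts depend entirely on your own schema,
    data and PostgreSQL version):

    puts User.where(id: 1).explain
    # EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."id" = 1
    #                              QUERY PLAN
    # ------------------------------------------------------------------
    #  Index Scan using users_pkey on users  (cost=0.29..8.30 rows=1 width=312)
    #    Index Cond: (id = 1)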

  24. So we check the query plan, find where we
    are slowing down, fix it, and
    make the planner choose the faster method.

  25. Sounds Simple
    Doesn't it =)

  26. So let’s see how we
    can do that.

  27. I have done all of this in
    production =), so you don’t
    need to feel scared to run it.

  28. Indexing

  29. Okay.. let’s not do that.

  30. • Indexes are special lookup tables that the
    database search engine can use to speed up data
    retrieval.
    • An index is like a pointer to a particular row of a
    table, with the indexed fields kept in order.
    The database is smart: even if you have indexes,
    if it finds a sequential scan to cost less,
    it will go for that one.

  31. We should Index
    • Index the primary key
    • Index foreign keys
    • Index all columns you will be passing into a WHERE clause
    • Index the keys used to JOIN tables
    • Index date columns (if you are going to query them frequently,
    like rankings for a particular date)
    • Index the type column in an STI or polymorphic association
    • Add partial indexes for scopes (see the migration sketch below)
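    A minimal migration sketch covering a few of these cases; the table and
    column names here (sales, user_id, invoice_date, state) are assumptions
    made up for illustration:

    class AddIndexesToSales < ActiveRecord::Migration
      def change
        # foreign key used in WHERE clauses and JOINs
        add_index :sales, :user_id
        # date column that is queried frequently
        add_index :sales, :invoice_date
        # partial index backing a scope such as `where(state: 'pending')`
        add_index :sales, :state, where: "state = 'pending'"
      end
    end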

  32. Do not Index
    • Do not index tables with a lot of write activity
    • Do not index tables you know will remain small
    throughout their lifetime
    • Do not index columns whose values you will be
    manipulating a lot.

  33. Materialised View

  34. Database views?
    Database views are like the views in our Rails app.
    A Rails view (an HTML page) shows data from
    multiple models on a single page.
    Similarly, we can show data from multiple tables
    as a single table using the concept called views.
    Why would we do that?
    Because it makes life easier

  35. Instead of doing this
    every time you want the managers
    SELECT id, name, email
    FROM companies
    WHERE role='manager'

  36. CREATE VIEW company_managers AS
    SELECT id, name, email
    FROM companies
    WHERE role='manager';
    You can create a view
And simply do
    SELECT * FROM company_managers;

  37. Note:
    • The schema of a view lives in the DB’s memory
    • The result is not stored in memory
    • It actually runs our query to get the results
    • Views are called pseudo tables

  38. Materialised views are the next evolution
    of database views.
    We store the result in a table as well.
    • They were first introduced by Oracle
    • They are now found in PostgreSQL, Microsoft SQL Server, IBM
    DB2, etc.
    • MySQL doesn’t have them, but you can get them using open
    source extensions.

  39. How can we use it in
    Ruby?
    Thanks to ActiveRecord it’s easy to access such pseudo
    tables

  40. Create a migration to record
    the Materialised view
    We need a bit of SQL here
    class CreateAllTimeSalesMatView < ActiveRecord::Migration
      def up
        execute <<-SQL
          CREATE MATERIALIZED VIEW all_time_sales_mat_view AS
            SELECT sum(amount) as total_sale,
                   DATE_TRUNC('day', invoice_date) as date_of_sale
            FROM sales
            GROUP BY DATE_TRUNC('day', invoice_date)
        SQL
      end

      def down
        execute("DROP MATERIALIZED VIEW IF EXISTS all_time_sales_mat_view")
      end
    end
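    One detail worth knowing: PostgreSQL will only run
    REFRESH MATERIALIZED VIEW CONCURRENTLY (used on the next slide) if the
    materialised view has at least one unique index, so a follow-up migration
    like this sketch is needed (the column works here because the view
    groups by day, so each date appears once):

    class AddUniqueIndexToAllTimeSalesMatView < ActiveRecord::Migration
      def change
        add_index :all_time_sales_mat_view, :date_of_sale, unique: true
      end
    end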

  41. Create Active Record model
    I place these views at the location
    app/models/views
    class AllTimeSalesMatView < ActiveRecord::Base
      self.table_name = 'all_time_sales_mat_view'

      def readonly?
        true
      end

      def self.refresh
        ActiveRecord::Base.connection.execute(
          'REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view'
        )
      end
    end
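    The data in the view only changes when you refresh it, so something has to
    call that method periodically. A minimal sketch as a rake task (the task
    name and the schedule are up to you):

    # lib/tasks/mat_views.rake
    namespace :mat_views do
      desc 'Refresh the all-time sales materialised view'
      task refresh: :environment do
        AllTimeSalesMatView.refresh
      end
    end

    # run it from cron, a scheduler gem or a background job:
    #   bundle exec rake mat_views:refresh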

  42. Now we can do
    AllTimeSalesMatView.select(:total_sale)
    AllTimeSalesMatView.where(date_of_sale: Date.today)

  43. First, Last and Find
    • They don’t work on your view as they operate on
    your table’s primary key, and a view doesn’t have one
    • If you want to use them, you need to set one of the
    fields in your view as the primary key
    class Model < ActiveRecord::Base
      self.primary_key = :id
    end

  44. Benchmark
    • I created a table with 1 million random sales and
    random dates within a year. (The dates were bookmarked
    as well)

  45. Take Away
    • Faster to fetch data.
    • Captures commonly used joins & filters.
    • Pushes data-intensive processing from Ruby to the
    database.
    • Allows fast and live filtering of complex associations
    or calculated fields.
    • We can index various fields in the materialised view.

  46. Pain Points
    • We need to write SQL
    • We will be using more RAM and storage
    • Requires PostgreSQL 9.3 for materialised views
    • Requires PostgreSQL 9.4 to refresh concurrently
    • Can’t have live data
    • You can fix this by creating your own table and
    updating it with the latest information

  47. JSON generation in
    DB

  48. • Websites with simple HTML and plain JavaScript-based
    AJAX are coming to an end
    • It’s the era of modern JS frameworks
    • JSON is the glue that binds the frontend and our
    backend
    • So it’s natural to find more and more DBs supporting
    the generation and storage of JSON.

  49. To convert a single row to
    JSON
    select row_to_json(users) from users where id = 1
    we use the
    row_to_json() function in SQL

{"id":1,"email":"[email protected]",
    "encrypted_password":"iwillbecrazytodisplaythat",
    "reset_password_token":null,"reset_password_sent_at":null,
    "remember_created_at":"2016-11-06T08:39:47.983222",
    "sign_in_count":
    11,"current_sign_in_at":"2016-11-18T11:47:01.946542",
    "last_sign_in_at":"2016-11-16T20:46:31.110257",
    "current_sign_in_ip":"::1","last_sign_in_ip":"::1",
    "created_at":"2016-11-06T08:38:46.193417",
    "updated_at":"2016-11-18T11:47:01.956152",
    "first_name":"Super","last_name":"Admin","role":3}

  51. But for more practical use
    we write queries like
    select row_to_json(results)
    from (
    select id, email from users
    ) as results
    {"id":1,"email":"[email protected]"}
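    If you want that JSON string in Ruby without any extra gem, one way is to
    run the query through the ActiveRecord connection (a sketch; select_value
    returns the single value produced by the first row):

    sql = <<-SQL
      select row_to_json(results)
      from (
        select id, email from users where id = 1
      ) as results
    SQL
    json = ActiveRecord::Base.connection.select_value(sql)
    # => '{"id":1,"email":"..."}'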

  52. A more complex one
    select row_to_json(result)
    from (
    select id, email,
    (
    select array_to_json(array_agg(row_to_json(user_projects)))
    from (
    select id, name
    from projects
    where user_id=users.id
    order by created_at asc
    ) user_projects
    ) as projects
    from users
    where id = 1
    ) result

  53. {
    "id":1,"email":"[email protected]",
    "projects":[{"id": 3, "name": "CSnipp"}]
    }
    We preloaded data as well: instead of having to run
    another query separate from the first one, we got
    the data about the projects too.

  54. json_build_object
    • Added in PostgreSQL 9.4 to make JSON creation a
    bit simpler
    select json_build_object('foo',1,'bar',2);
    {"foo": 1, "bar": 2}

  55. So where is Ruby?

  56. • For simple JSON creation you can use a gem
    called Surus
    • https://github.com/jackc/surus

  57. Which lets you write code
    like
    User.find_json 1
    User.find_json 1, columns: [:id, :name, :email]
    Post.find_json 1, include: :author
    User.find_json(user.id, include: {posts: {columns: [:id, :subject]}})
    User.all_json
    User.where(admin: true).all_json
    User.all_json(columns: [:id, :name, :email],
                  include: {posts: {columns: [:id, :subject]}})
    Post.all_json(include: [:forum, :post])

  58. But for more complicated
    queries you might still
    end up writing SQL

  59. But if, like me, you want to keep as much as
    possible in Ruby, then create a
    materialised view for your complicated query
    and use the gem to generate the JSON =)
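    A sketch of what that combination looks like, using the materialised view
    model from earlier and Surus’s all_json (assuming Surus treats the
    view-backed model like any other table; the columns are the ones defined
    in the view):

    AllTimeSalesMatView.all_json(columns: [:total_sale, :date_of_sale])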

  60. Benchmarks
    • In our case we saw requests to a .json URL which
    used to take 2 seconds come down to <= 200ms
    • Some benchmarks I found online mention:

  61. • Simple query
    • More complicated query
    • Source: https://github.com/JackC/json_api_bench

  62. Synchronous Commit

  63. • PostgreSQL sacrifices speed for durability and
    reliability
    • PostgreSQL is known for its slow writes and fast
    reads
    • Writes are slow because it waits for confirmation that what
    we inserted has been recorded on the hard disk.
    • You can disable this confirmation check to speed up
    your inserts if you are inserting a lot of rows every
    second

  64. The Surus gem provides:
    User.transaction do
      User.synchronous_commit false
      @user.save
    end
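    synchronous_commit is an ordinary PostgreSQL setting, so here is a sketch
    of doing the same thing without the gem (SET LOCAL only affects the
    current transaction):

    User.transaction do
      ActiveRecord::Base.connection.execute('SET LOCAL synchronous_commit TO OFF')
      @user.save
    end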

  65. • The only issue now is that, in case your DB crashes, it can’t
    recover the lost data that was not saved to the hard disk
    • It won’t corrupt the data, but you might lose some
    rows of your data
    • Not to be used in cases where you want data
    integrity to be 100%
    • Use it where you don’t mind losing some
    information or where you can rebuild it from outside
    your DB, like logs or raw information.

  66. Summarize

  67. • Index data so that we don’t end up scanning the
    whole DB
    • Simplify the way you fetch data from the DB using
    views
    • Move complicated JSON generation to the
database
    • Disable synchronous commit when you feel like it
    won’t cause a problem

  68. Conclusions
    • Know your tech stack
    • We should have control over all our moving parts
    • Try to bring out the best in your tech stack before
    you start throwing more money at it
    • SQL has been around for 40 years and it’s planning to
    stay for a while longer =)
    • There is no golden rule. What worked for me might not
    work for your specific use case.

  69. I blogged about this in detail.
    • http://blog.redpanthers.co/materialized-views-caching-database-query/
    • http://blog.redpanthers.co/create-json-response-using-postgresql-instead-rails/
    • http://blog.redpanthers.co/different-types-index-postgresql/
    • http://blog.redpanthers.co/optimising-postgresql-database-query-using-indexes/

  70. Questions? =)
