Using Databases to pull your application weight

Using Database to pull your application’s weight Harisankar P S,
Founder & CEO, Red Panthers

{ “name” => "Harisankar P S", “email” => ”[email protected]”, “twitter”
=> "coderhs", “facebook" => "coderhs", “github” => “coderhs”, “linkedin” => “coderhs”, }

One thing you notice about me is. I love Stickers!!
So if you have stickers give them to me.

Ruby on Rails dev shop https://redpanthers.co

I am from Kochi, Kerala, India.

Interesting fact about India: India has 22 ofﬁcial languages,1653 spoken
language and over 50,000 Dialects

So let me tell a story

about a novice developer

who wrote the ﬁrst line of code

for a web app that was meant to process a

Couple of 1000 rows of data, less than 10 users

but grew so BIG that it was process GB’s of
data every hour.

Me in 2014

This talk is about all the things that I learned
Which helped me Scale the application without having to spend a fortune in Hardware

Let me start with mentioning the tools I used.

Really awesome when they work together

Ruby is Captain America

Rails is Iron man

Database is the hulk

Hulk can smash

But we make him carry our suitcase

This talk is about how we can ofﬂoad couple of
the jobs done by Rails to Database. You have a HULK, don’t feel scared to USE it.

I have done all these in production , so you
don’t to feel scared to run this.

Today we are going to talk about • Query Planner
• Indexing • Attribute Preloading • Materialised Views • Generating JSON • Synchronous Commit

Query Planner

Database is a General Purpose Software

A database is not build for a single use case
or industry.

Then how does it handle all the scenarios?

Truth is, it doesn’t!

DB doesn’t know what all scenarios its put under, its
upto us to nudge and optimise it.

• SQL syntax is all about how the results should
be • What you want in your result - SELECT id, name • Or some information about data like - SELECT average(price), max(price), min(price) Where is the decision on how the data should be fetch made.

Well thats what Query planner is all about.

Its the Brain of your DB

We need to understand how the system work before we
can improve its performance

• A query plan is created by the DB before
the query you gave is executed. • Plan is the cost of running the query. The DB chooses the one with the least cost. • Query Plan assumes the plan it has is the ideal one

So we need to see what the query planner see
Active Record has .explain method to help us there

Asset.where(asset_id: 1).explain User.where(id: 1).explain

So we check the query plan ﬁnd where we are
slowing down and then ﬁx them and make the plan choose the faster method.

Sounds Simple Doesn't it

So lets see how we can do that.

Tip: We can make the query plan display in JSON,
YML & XML formats as well EXPLAIN (format YAML) select * from users

Indexing

Okay..Lets not do that.

Indexes are a special lookup table that the database search
engine can use to speed up data retrieval.

An Index is like a pointer to a particular row
of a table. Where all the ﬁelds in the table are ordered.

But you know something?

Databases are smart

Even if you have indexes if it ﬁnd the sequential
search to be cost less then it would go for that one.

Example: Lets say you have a column with a 10,000
rows, but the the content is either short, medium, long. The database don’t use index as it ﬁnds sequential is faster

We should Index • Index Primary key • Index Foreign
key • Index all columns you would be passing into where clause • Index the keys used to Join tables • Index the date column (if you are going to call it frequent, like rankings of a particular date) • Add partial index to scopes

Do not Index • Do not index tables with a
lot of read, write • Do not index tables you know that will remain small, all through out its life time • Do not index columns where you will be manipulating lot of its values.

Attribute Preloading

A good use of Postgres Array

Rails Way

select * from tasks select * from tags inner join
tasks_tags on tags.id = tasks_tags.tag_id where tasks_tags.task_id in (1,2,3,..) tasks = Task.ﬁnd(:all, :include => :tags) 2 Queries Object for each tasks Rails Code

Fast Postgres Arrays: tasks = Task.ﬁnd(:all, :select => "*, array(select
tags.name from tags inner join tasks_tags on (tags.id = tasks_tags.tag_id) where tasks_tasks.task_id=tasks.id) as tag_names") 1 SQL query Rails doesn't have to create objects >3x faster

Materialised View

Database views? Database views are like the view in our
rails. A rails view(an html page) shows data from multiple model in a single page Similarly we can show data from multiple table as a single table using the concept called views Why would we do that? Because it makes life easier

Instead of doing Every time you want the managers SELECT
id, name, email FROM companies where role=‘manager’

CREATE VIEW company_managers AS SELECT id, name, email FROM companies
WHERE role='manager'; You can create a view And simple do SELECT * FROM company_managers;

Note: • A schema of view lives in memory of
a DB • The result is not stored in memory • Its is actually running our query to get the results • They are called pseudo tables

Materialised views are the next evolution of Database views. We
store the result as well in a table • This was ﬁrst introduced by Oracle • But now found in PostgreSQL, MicrosoftSQL, IBM DB2, etc. • MySQL doesn’t have it you can create it using open source extensions.

How can we use it in Ruby? Thanks to ActiveRecord
its easy to access such pseudo tables

Create a migration to record the Materialised view We need
a bit of SQL here class CreateAllTimesSalesMatView < ActiveRecord::Migration def up execute <<-SQL CREATE MATERIALIZED VIEW all_time_sales_mat_view AS SELECT sum(amount) as total_sale, DATE_TRUNC('day', invoice_adte) as date_of_sale FROM sales GROUP BY DATE_TRUNC('day', invoice_adte) SQL end def down execute("DROP MATERIALIZED VIEW IF EXISTS all_time_sales_view") end end

Create Active Record model I place these views at the
location app/models/views class AllTimeSalesMatView < ActiveRecord::Base self.table_name = 'all_time_sales_mat_view' def readonly? true end def self.refresh ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view') end end

Now we can do AllTimeSalesMatView.select(:name) AllTimeSalesMatView.where(email: '[email protected]')

First, Last and Find • They don’t work in your
view as they operate on your tables primary key and a view doesn’t have it • If you want to use it then you need to one of the ﬁelds in your table as primary key class Model < ActiveRecord::Base self.primary_key = :id end

Benchmark • I created a table with 1 million random
sales and random dates in a year. (Dates where bookmarked as well)

Take Away • Faster to fetch data. • Capture commonly
used joins & filters. • Push data intensive processing from Ruby to Database. • Allow fast and live filtering of complex associations or calculation .fields. • We can index various fields in the table.

Pain Points • We will be using more RAM and
Storage • Requires Postgres 9.3 for MatView • Requires Postgres 9.4 to refresh concurrently • Can’t have Live data • You can ﬁx this by creating your own table and   updating it with the latest information

JSON generation in DB

• Websites with simple HTML and plain javascript based AJAX
is coming to an end • Its the era of new modern day JS frameworks • JSON is the glue that binds the fronted and our backend • So its natural to ﬁnd more and more DB supporting the generation and storage of JSON.

To convert a single row to JSON select row_to_json(users) from
users where id = 1 we use row_to_json() method in SQL

{ “id":1,"email":"[email protected]", "encrypted_password":"iwillbecrazytodisplaythat", "reset_password_token":null,"reset_password_sent_at":null, "remember_created_at":"2016-11-06T08:39:47.983222", "sign_in_count": 11,"current_sign_in_at":"2016-11-18T11:47:01.946542", "last_sign_in_at":"2016-11-16T20:46:31.110257", "current_sign_in_ip":"::1","last_sign_in_ip":"::1", "created_at":"2016-11-06T08:38:46.193417",
"updated_at":"2016-11-18T11:47:01.956152", “first_name":"Super","last_name":"Admin","role":3 }

But for more practical use we write queries like select
row_to_json(results) from ( select id, email from users ) as results {"id":1,"email":"[email protected]"}

A more complex one select row_to_json(result) from ( select id,
email, ( select array_to_json(array_agg(row_to_json(user_projects))) from ( select id, name from projects where user_id=users.id order by created_at asc ) user_projects ) as projects from users where id = 1 ) result

{ “id":1,"email":"[email protected]", "project":["id": 3, "name": “CSnipp"] } We did data
preloading as well, instead of having the need to run another query separate from the ﬁrst one. We got the data about projects as well.

json_build_object • Added in PostgreSQL 9.4 to make JSON creation
a bit more simpler select json_build_object('foo',1,'bar',2); {"foo": 1, "bar": 2}

So where is Ruby?

• For simple JSON creation you can use a gem
called Surus • https://github.com/jackc/surus

Which lets you write code like User.find_json 1 User.find_json 1,
columns: [:id, :name, :email] Post.find_json 1, include: :author User.find_json(user.id, include: {posts: {columns: [:id, :subject]}}) User.all_json User.where(admin: true).all_json User.all_json(columns: [:id, :name, :email], include: {posts: {columns: [:id, :subject]}}) Post.all_json(include: [:forum, :post])

But for more complicated queries you might still end up
writing SQL

But Like me if you want to keep as much
stuff as possible in Ruby then. Create a materialised view for your complicated query And then use the gem to generate JSON =)

Benchmarks • In our case we saw request to a
(.json) url which used to take 2 seconds, coming down to <= 200ms • Some benchmarks I found online mentions

• Simple query • More complicate Query • Source: https://github.com/JackC/json_api_bench

Synchronous Commit

• PostgreSQL sacrifices speed for durability and reliability • PostgreSQL
is known for its slow writes and faster readers • It has slow writes as it waits for confirmation that what we inserted has been recorded to the Hard Disk. • You can disable this confirmation check to speed up your inserts if you are inserting a lot of rows every second

User.transaction do User.synchronous_commit false @user.save end Surus Gem Provides

• Only issue now, is incase your DB crash it
can’t recover the lost data not saved to Hard Disk • It won’t corrupt the data, but you might loose some rows of your data • Not to be used in cases when you want data integrity to be 100% • Use it where you don’t mind loosing some information or where you can rebuild it from outside your DB. Like logs, or raw information.

Postgres Config that we use # How much memory we
have to cache the database, RAM_FOR_DATABASE * 3/4 effective_cache_size = <%= ram_for_database.to_i * 3/4 %>MB # Shared memory to hold data in RAM, RAM_FOR_DATABASE/4 shared_buffers = <%= ram_for_database.to_i / 3 %>MB # Work memory for queries (RAM_FOR_DATABASE/max_connections) ROUND DOWN 2^x work_mem = <%= 2**(Math.log(ram_for_database.to_i / expected_max_active_connections.to_i)/Math.log(2)).floor %>MB # Memory for vacuum, autovacuum, index creation, RAM/16 ROUND DOWN 2^x maintenance_work_mem = <%= 2**(Math.log(ram_for_database.to_i / 16)/Math.log(2)).floor %>MB # To ensure that we don't lose data, always fsync after commit synchronous_commit = on

# Size of WAL on disk, recommended setting: 16 checkpoint_segments
= 16 # WAL memory buffer wal_buffers = 8MB # Ensure autovacuum is always turned on autovacuum = on # Set the number of concurrent disk I/O operations that PostgreSQL # expects can be executed simultaneously. effective_io_concurrency = 4

Summarize

• Index data so that we don’t end up scanning
the whole DB • Use arrays for data preloading • Simplify the way you fetch data from the DB using views • Move complicated JSON generation to the Databases • Disable synchronous commit when you feel like it won’t cause a problem

Conclusions • Know your tech stack • We should have
control over all our moving parts • Try to bring about the best with your tech stack before you start throwing more money at it • SQL has been around for 40 years and its planning to say for a while longer =) • There is no golden rule. What worked for me might not work for your speciﬁc use case.

I blogged about this in detail. • http://blog.redpanthers.co/materialized-views- caching-database-query/ •
http://blog.redpanthers.co/create-json-response- using-postgresql-instead-rails/ • http://blog.redpanthers.co/different-types-index- postgresql/ • http://blog.redpanthers.co/optimising-postgresql- database-query-using-indexes/

Thank you ந"# നnി ਤuਹਾਡਾ ਧ'ਨਵਾਦ ಧನ#$ದ ధన ଥା# େଯାଉ
ہیرکش ধন বাদ આભાર धन्यवाद अनुगृिहतोऽिस्म धन्यवाद ତu#$ୁ ଧନ(ବା+ ينابرهم يج ناهوت धन्यवाद * India has 1653 spoken languages

Using Databases to pull your application weight

Using Databases to pull your application weight

More Decks by Harisankar P S

Other Decks in Technology

Featured

Transcript