Indexing is key:
• Index all columns you would be passing into a WHERE clause
• Index the keys used to JOIN tables
• Index the date column (if you are going to query it frequently, like rankings for a particular date)
• Add partial indexes for scopes
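As a sketch, here is what those indexes could look like in a Rails migration; the sales table and its columns are assumptions for illustration:

class AddIndexesToSales < ActiveRecord::Migration
  def change
    # Index columns that appear in WHERE clauses
    add_index :sales, :amount

    # Index the foreign key used in JOINs
    add_index :sales, :user_id

    # Index the date column queried frequently (e.g. daily rankings)
    add_index :sales, :invoice_date

    # Partial index matching a scope, e.g. scope :paid, -> { where(paid: true) }
    add_index :sales, :invoice_date, where: "paid = true",
              name: "index_sales_on_invoice_date_where_paid"
  end
end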
An analogy from Rails: a Rails view (an HTML page) shows data from multiple models on a single page. Similarly, we can show data from multiple tables as a single table using the database concept called views. Why would we do that? Because it makes life easier.
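For example, a minimal sketch of a view that exposes a join of users and sales (table and column names assumed) as a single queryable "table":

class CreateUserSalesView < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE VIEW user_sales_view AS
        SELECT users.email, sales.amount, sales.invoice_date
        FROM users
        JOIN sales ON sales.user_id = users.id
    SQL
  end

  def down
    execute("DROP VIEW IF EXISTS user_sales_view")
  end
end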
Materialized views store the result in a table as well.
• This was first introduced by Oracle
• But it's now found in PostgreSQL, Microsoft SQL Server, IBM DB2, etc.
• MySQL doesn't have it, but you can emulate it using open-source extensions.
A bit of SQL here:

class CreateAllTimesSalesMatView < ActiveRecord::Migration
  def up
    execute <<-SQL
      CREATE MATERIALIZED VIEW all_time_sales_mat_view AS
        SELECT sum(amount) AS total_sale,
               DATE_TRUNC('day', invoice_date) AS date_of_sale
        FROM sales
        GROUP BY DATE_TRUNC('day', invoice_date)
    SQL
  end

  def down
    execute("DROP MATERIALIZED VIEW IF EXISTS all_time_sales_mat_view")
  end
end
Some ActiveRecord methods won't work on a view, as they operate on your table's primary key and a view doesn't have one.
• If you want to use them, you need to set one of the fields in your view as the primary key:

class Model < ActiveRecord::Base
  self.primary_key = :id
end
• Takes up storage
• Requires Postgres 9.3 for materialized views
• Requires Postgres 9.4 to refresh them concurrently
• Can't have live data; the view only changes when you refresh it (see the sketch below)
• You can also fix this by creating your own table and updating it with the latest information
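Postgres can also re-run the view's query for you on demand. A minimal sketch wrapping that in a model backed by the earlier all_time_sales_mat_view (the model name is an assumption; CONCURRENTLY needs Postgres 9.4 and a unique index on the view):

class AllTimeSale < ActiveRecord::Base
  self.table_name = 'all_time_sales_mat_view'

  # Re-run the view's query and swap in the fresh result;
  # CONCURRENTLY avoids blocking reads while it runs
  def self.refresh
    connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY all_time_sales_mat_view')
  end
end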
The era of server-rendered HTML is coming to an end.
• It's the era of new, modern-day JS frameworks
• JSON is the glue that binds the frontend and our backend
• So it's natural to find more and more DBs supporting the generation and storage of JSON.
select row_to_json(result)
from (
  select id, email,
    (
      select array_to_json(array_agg(row_to_json(user_projects)))
      from (
        select id, name
        from projects
        where user_id = users.id
        order by created_at asc
      ) user_projects
    ) as projects
  from users
  where id = 1
) result
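A minimal sketch of calling such a query from Rails (same users table assumed): Postgres hands back a ready-made JSON string, so we skip instantiating ActiveRecord objects and serializing them in Ruby.

json = ActiveRecord::Base.connection.select_value(<<-SQL)
  select row_to_json(u)
  from (select id, email from users where id = 1) u
SQL
# json is a JSON string generated entirely inside Postgres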
Postgres is known for its slow writes and fast reads.
• It has slow writes because it waits for confirmation that what we inserted has been recorded to the hard disk.
• You can disable this confirmation check to speed up your inserts if you are inserting a lot of rows every second.
If the server crashes, you can't recover the lost data that wasn't saved to the hard disk.
• It won't corrupt the data, but you might lose some rows
• Not to be used in cases where you want data integrity to be 100%
• Use it where you don't mind losing some information or where you can rebuild it from outside your DB, like logs or raw information (see the sketch below).
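You don't have to flip it globally in postgresql.conf: Postgres lets you turn it off for a single transaction with SET LOCAL. A minimal sketch for bulk-inserting logs (RawLog and raw_logs are hypothetical names):

ActiveRecord::Base.transaction do
  # Applies only to this transaction; commits return before
  # the WAL record is flushed to disk
  ActiveRecord::Base.connection.execute("SET LOCAL synchronous_commit TO OFF")

  raw_logs.each { |attrs| RawLog.create!(attrs) }
end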
# Memory the OS will have to cache the database, RAM_FOR_DATABASE * 3/4
effective_cache_size = <%= ram_for_database.to_i * 3 / 4 %>MB

# Shared memory to hold data in RAM, RAM_FOR_DATABASE/4
shared_buffers = <%= ram_for_database.to_i / 4 %>MB

# Work memory for queries (RAM_FOR_DATABASE/max_connections) ROUND DOWN to 2^x
work_mem = <%= 2**(Math.log(ram_for_database.to_i / expected_max_active_connections.to_i) / Math.log(2)).floor %>MB

# Memory for vacuum, autovacuum, index creation, RAM/16 ROUND DOWN to 2^x
maintenance_work_mem = <%= 2**(Math.log(ram_for_database.to_i / 16) / Math.log(2)).floor %>MB

# To ensure that we don't lose data, always fsync after commit
synchronous_commit = on
# Number of WAL segments between checkpoints
checkpoint_segments = 16

# WAL memory buffer
wal_buffers = 8MB

# Ensure autovacuum is always turned on
autovacuum = on

# Set the number of concurrent disk I/O operations that PostgreSQL
# expects can be executed simultaneously
effective_io_concurrency = 4
• Index what you query, not the whole DB
• Use arrays for data preloading
• Simplify the way you fetch data from the DB using views
• Move complicated JSON generation to the database
• Disable synchronous commit when you feel like it won't cause a problem
We have control over all our moving parts.
• Try to bring out the best in your tech stack before you start throwing more money at it
• SQL has been around for 40 years, and it's planning to stay for a while longer =)
• There is no golden rule. What worked for me might not work for your specific use case.