Slide 1

Slide 1 text

PostgreSQL PostgreSQL as seen by Rubyists Rubyists Andrey Novikov, Evil Martians Kaigi on Rails 2022 22 October 2022

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

evilmartians.com 邪悪な火星人? イービルマーシャンズ!

Slide 4

Slide 4 text

Martian Open Source Yabeda: Ruby application instrumentation framework Lefthook: git hooks manager AnyCable: Polyglot replacement for ActionCable server PostCSS: A tool for transforming CSS with JavaScript Imgproxy: Fast and secure standalone server for resizing and converting remote images Logux: Client-server communication framework based on Optimistic UI, CRDT, and log Overmind: Process manager for Procfile-based applications and tmux Even more at evilmartians.com/oss

Slide 5

Slide 5 text

This talk is about… Subtle differences

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Example of subtle yet important difference Are they same? … Left: road sign conforming Vienna Convention on Road Signs and Signals Right: Japanese road sign per “Order on Road Sign, Road Line, and Road Surface Marking”

Slide 8

Slide 8 text

Example of subtle yet important difference Are they same? NO! Both require drivers to give a way… European sign doesn’t have a requirement to stop or even slow down!

Slide 9

Slide 9 text

Example of subtle yet important difference Are they same? YES! Stop signs around the world are mostly octagonal. Japanese signs have “STOP” word in English after 2017, but are still pretty rare.

Slide 10

Slide 10 text

And how it relates to Ruby and PostgreSQL?

Slide 11

Slide 11 text

And how it relates to Ruby and PostgreSQL? Application data Data in the database

Slide 12

Slide 12 text

Ruby vs PostgreSQL data types Are they same?

Slide 13

Slide 13 text

Integers Integer Variable length integer 2, 4, 8 bytes signed In ActiveModel there is validation for databases: Name Size Range smallint 2 -32768 to +32767 integer 4 -2147483648 to +2147483647 bigint 8 -9223372036854775808 to +9223372036854775807 See bignum.c in Ruby sources. See Numeric types docs 1.size # => 8 (bytes) (256**8 - 1).size # => 8 (bytes) (256**8).size # => 9 (bytes) (256**40 - 1).size # => 40 (bytes) Test.create(value: 2147483648) # ActiveModel::RangeError: 2147483648 is out of rang # for ActiveModel::Type::Integer with limit 4 bytes ` ` ` ` ` ` INSERT INTO "tests" ("value") VALUES (2147483648); -- ERROR: integer out of range

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Oh no, I have integer primary keys! What to do? 0. Don’t panic! 1. Use pghero or Heroku pg:diagnose to detect problematic primary keys. 2. Migrate to bigint or uuid if needed (use triggers, Luke!) 3. In case of emergency, remember that all integers are signed! You always have 2 more billions of values on the dark negative side! Example of pkey migration from int to bigint: engineering.silverfin.com/pg-zero-downtime-bigint-migration ` ` ` ` ALTER SEQUENCE tablename_id_seq MINVALUE -2147483647 RESTART WITH -2147483647; pghero

Slide 16

Slide 16 text

Floating point numbers Float 8 bytes (double-precision) real — 4 bytes double — 8 bytes See Ruby docs for Float More fun at 0.30000000000000004.com! IEEE 754 0.1 + 0.2 # => 0.30000000000000004 Float::MAX # => 1.7976931348623157e+308 Float::MAX + '1e+308'.to_f # => Infinity # BUT! Float::MAX + '0.1'.to_f # => 1.7976931348623157e+308 🤔 Float::MAX == (Float::MAX + '0.1'.to_f) # => true 🤯 Float::NAN == Float::NAN # => false SELECT 0.1::float + 0.2::float; -- 0.300000000000000 SELECT 0.1 + 0.2; -- 0.3 (but it is NOT float!) SELECT '1.7976931348623157e+308'::float + '1e+308':: --- ERROR: value out of range: overflow SELECT '+inf'::double precision; -- Infinity 🤷 SELECT '1.7976931348623157e+308'::float = ('1.7976931348623157e+308'::float + '0.1'::float); -- true ¯\_(ツ)_/¯ SELECT 'NaN'::float = 'NaN'::float; -- true 🤯 0.30000000000000004.com

Slide 17

Slide 17 text

Don’t use floats for calculating money! 🤑💥🤕 Never ever!

Slide 18

Slide 18 text

Arbitrary precision numbers BigDecimal Variable length numeric Variable length See Ruby docs for BigDecimal Use numeric to store money! BigDecimal("0.1") + BigDecimal("0.2") # => 0.3e0 BigDecimal("NaN") == BigDecimal("NaN") # => false BigDecimal("1.0") / BigDecimal("0.0") #=> Infinity # To match PostgreSQL behavior: BigDecimal.mode(BigDecimal::EXCEPTION_OVERFLOW, tru BigDecimal("1.0") / BigDecimal("0.0") # Computation results in 'Infinity' (FloatDomainEr BigDecimal("0.1") + 0.2.to_d == 0.30000000000000004 # true 🤔 SELECT 0.1 + 0.2; -- 0.3 which is decimal SELECT 'NaN'::decimal = 'NaN'::decimal; -- true SELECT '1.0'::decimal / '0.0'::decimal; -- ERROR: division by zero SELECT (0.1 + 0.2) = (0.1::float + 0.2::float); -- false ` `

Slide 19

Slide 19 text

But there is money type, isn’t it? BigDecimal Variable size money 8 byte fixed-precision number. ActiveRecord has to parse textual representation, see connection_adapters/postgresql/oid/money.rb Also see issue № 31457 for lots of pain. Both output and acceptable input format depends on session-level lc_monetary setting! Precision is defined by lc_monetary at database creation time and can’t be changed! # If the database locale setting isn't `en_US`: # Creation may fail: Product.create!(price: 100500.42) # ERROR: invalid input syntax for type money: "1005 # Or it can succeed, but won't be able to be parsed Product.last.price # => 0.0 -- on production: SELECT 100500.42::money; -- $100,500.42 -- on dev machine: SELECT 100500.42::money; -- ¥ 100,500 -- 🤯 But it should be dollars, and where are my cen ` ` ` `

Slide 20

Slide 20 text

Strings and texts, lyrics and prose String Variable size varchar, text variable size, max 1 GB Read the docs: String Read the docs: 8.3. Character Types "こんにちは地球人!".encoding # => # "\xe3\x2e\x2e".encoding # => # "\xe3\x2e\x2e".valid_encoding? # => false "これ\x00やばい!".valid_encoding? # => true SELECT 'こんにちは地球人!'; -- こんにちは地球人! SELECT E'\xe3\x2e\x2e'); -- ERROR: invalid byte sequence for encoding "UTF8 SELECT E'これ\x00やばい!'; -- ERROR: invalid byte sequence for encoding "UTF8

Slide 21

Slide 21 text

So many string types! 1. Don’t use char(n) It is always size of n and stores unnecessary spaces at right. Mostly for compatibility with older applications. 2. varchar and text are effectively the same, choose whatever you like. string in migrations is varchar in PostgreSQL. 💡 Did you know? SimpleForm gem will render multi-line HTML tag for text type and single-line for character varying. Convention over configuration! See SimpleForm gem README: github.com/heartcombo/simple_form ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `

Slide 22

Slide 22 text

Please, use utf8everywhere.org! utf8everywhere.org

Slide 23

Slide 23 text

Binary data String Variable size bytea Variable size, max 1 GB Memory and network traffic consumption: 📈 See Binary Data Types page in the docs. data = File.binread(“meme.png”) # => "\x89PNG\r\n\x1A…" data.encoding # => # data.bytesize # => 46534 Test.last.blob # => "\x89PNG\r\n\x1A…" Test.last.blob_before_type_cast.bytesize # => 46534 Test.last.blob_before_type_cast # => "\\x89504e470d0a1a0a" Test.last.blob_before_type_cast.bytesize # => 93070 SELECT '\x89504e470d0a1a0a…'::bytea; # Note hexadecimal format ↑

Slide 24

Slide 24 text

What if 1 GB isn’t enough? You can’t store more in a table column (hard limit) But you can store up 4 TB in large objects table! And there is a gem for that: active_storage-postgresql Beware performance implications of TOAST → More about it in PG docs: 70.2. TOAST Andrey Novikov @Envek · Follow — PostgreSQL, did you update all these megabyte- long JSON I gave you? — Yes — What did it cost? — Everything… all the IO on the server. Let’s talk about how PostgreSQL stores JSONB (yep, it will be a thread about TOAST) 10:47 PM · May 18, 2022 Read the full conversation on Twitter 359 Reply Copy link Read 4 replies

Slide 25

Slide 25 text

Dates Date date 4 bytes Internally stores number of days since year 4713 BC up to infinity. The Julian day number is in elapsed days since noon (Greenwich Mean Time) on January 1, 4713 BCE (in the Julian calendar). The day count is virtually the astronomical Julian day number. Internally stores number of days since year 4713 BC up to year 5874897 AD. In the Julian Date system, each day has a sequential number, starting from JD 0 (which is sometimes called the Julian Date). JD 0 corresponds to 1 January 4713 BC in the Julian calendar. See the docs for Date class. See B.7. Julian Dates in PostgreSQL docs.

Slide 26

Slide 26 text

Time and timezones Time AS::TimeWithZone Two UNIX timestamps inside and tzdata also timestamp timestamptz 8 bytes, microsecond precision Ruby on Rails uses UTC timezone internally. Use timestamp with time zone whenever possible! Time.now # => 2022-10-22 13:42:42 +0900 Time.current # => Sat, 22 Oct 2022 04:42:42 UTC +00: Time.current.time_zone # => # Sat, 22 Oct 2020 13:42:42 JST +09:00 CREATE TABLE tests (t1 timestamp, t2 timestamptz); SET SESSION timezone TO 'Etc/UTC'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Europe/Lisbon'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; SELECT * FROM tests; t1 | t2 ---------------------+------------------------- 2022-10-22 04:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 13:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 05:42:42 │ 2022-10-02 13:42:42+09 │ ` `

Slide 27

Slide 27 text

How to not mess up with timezones 1. Use timezone-aware methods Use Time.current and Date.current instead of Time.now and Date.today 2. Convert timestamps to user time zone 3. Don’t use dates in SQL, use timestamps More tips here: thoughtbot.com/blog/its-about-time-zones ` ` ` ` ` ` ` ` Time.use_zone(user.timezone) do # Do SQL queries, render views, … end # or Time.current.in_time_zone(user.timezone) - Posts.where(published_at: Date.today...Date.tomorrow) + Posts.where(published_at: Time.current.beginning_of_day..Time.current.end_of_day)

Slide 28

Slide 28 text

Time zones are hard 見る The Problem with Time & Timezones - Computerphile The Problem with Time & Timezones - Computerphile 共有 共有

Slide 29

Slide 29 text

Void and uncertainity NilClass NULL nil == nil # => true ¯\_(ツ)_/¯ SELECT NULL = NULL; -- NULL 🚨 SELECT NULL IS NULL; -- true SELECT NULL IS DISTINCT FROM NULL; -- false SELECT 'Ruby' = NULL; -- NULL 🚨 SELECT 'Ruby' IS NULL; -- false SELECT 'Ruby' IS DISTINCT FROM NULL; -- true

Slide 30

Slide 30 text

PostgreSQL-specific datatypes

Slide 31

Slide 31 text

JSON Hash, Array json, jsonb Variable length, up to 1GB Be careful with symbols as keys Define as_json method on your classes to serialize them to JSON automatically. Behavior of JSON.dump and to_json in Rails is different! JSON saves value as is (it is just a string) JSONB is effective but strict: no duplicate keys, doesn’t preserve whitespaces, etc… Inside: string (no null-bytes!), numeric, … { "foo" => "bar", foo: "baz" }.to_json # {"foo":"baz"} ` ` ` ` ` ` SELECT '{"foo": "bar","foo":"baz"}'::json; -- {"foo": "bar","foo":"baz"} SELECT '{"foo": "bar","foo":"baz"}'::jsonb; -- {"foo": "baz"}

Slide 32

Slide 32 text

JSON on steroids Use store_model gem to make powerful value objects from JSON fields. But don’t overuse! There is performance penalty for serialization and deserialization. store_model gem repo

Slide 33

Slide 33 text

Ranges Range intrange, tsrange, … intmultirange, … Additional methods in the facets gem. https://www.postgresql.org/docs/14/rangetypes.html 5..7 or 5...8 Time.current..1.day.from_now # endless or beginless ranges Time.current.. ..Date.yesterday nil.. or Range.new(nil, nil) # Beginning is always included if possible :-( Test.pluck("intrange(1, 5, '()')").first # 2...5 Test.pluck("tstzrange(now(), now() + '1 hour', '() # ArgumentError: The Ruby Range object does not # support excluding the beginning of a Range. SELECT int8range(5, 7, '[]'); -- [5,8] SELECT int8range(5, 8); -- [5,8) SELECT tstzrange(now(), now() + '1 day', '()'); -- ["2022-10-22 14:42:42+09","2022-10-23 14:42:42+09 -- endless or beginless ranges SELECT tstzrange(now(), NULL); SELECT tstzrange(NULL, NULL); -- PG 14: Multiranges and operators SELECT nummultirange(numrange(1, 20)) - nummultirange(numrange(4, 6)); -- {[1,4),[6,20)} -- and many more… (exclusion constraints!)

Slide 34

Slide 34 text

UUID String 36 bytes uuid 16 bytes See Rails docs for Digest::UUID See docs for pgcrypto and uuid-ossp extensions. Also take a look at upcoming UUIDv6, v7, and v8! datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04 # All-random UUIDv4 SecureRandom.uuid # => “40f15398-4b38-4e16-8b3c-ff16fc960d38” # Determined UUIDv5 (hash-based) Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "n # => "9b8edca0-90f2-5031-8e5d-3f708834696c" CREATE EXTENSION "pgcrypto"; SELECT gen_random_uuid(); -- 2cfff962-4a24-4ef3-b2f8-35351b18bf63 CREATE EXTENSION "uuid-ossp"; SELECT uuid_generate_v5(uuid_ns_dns(), 'name'); -- 9b8edca0-90f2-5031-8e5d-3f708834696c

Slide 35

Slide 35 text

IP addresses IPAddr inet, cidr 7 or 19 bytes both inet works with both host and network addresses. cidr works with network addresses only. See IPAddr docs. See Network address types and functions and operators. ip6 = IPAddr.new "3ffe:505:2::1" ip4 = IPAddr.new "192.168.2.0/24" IPAddr.new("192.168.2.0/24").mask(16) # SELECT '::1'::inet; SELECT '127.0.0.1/32'::inet; SELECT set_masklen(cidr '192.168.1.0/24', 16); -- 192.168.0.0/16 SELECT inet_merge(inet '192.168.1.5/24', inet '192. -- 192.168.0.0/22 ` ` ` `

Slide 36

Slide 36 text

Durations ActiveSupport::Duration interval 16 bytes Disclaimer: I added it to Rails in pull request № 16919. Supported out-of-the-box in Ruby on Rails 6.1+ Time.current + 1.year # => Thu, 18 Jun 2021 21:00:00 MSK +03:00 100500.weeks.iso8601 # => "P100500W" 1.month.to_i # => 2629746 (30.436875 days in seconds) SELECT now() + ‘1 year’; -- 2021-06-18 21:00:00+03 SELECT '100500 weeks'::interval; -- 703500 days SELECT EXTRACT(epoch FROM '1 month'::interval); -- 2592000 (30.0 days in seconds)

Slide 37

Slide 37 text

Enums String Custom enum types 4 bytes Values are human-readable in SQL On Rails < 7 you can use activerecord-postgres_enum gem See 8.7 Enumerated Types page in PostgreSQL docs # In migration (Rails 7+): create_enum :status, ["draft", "published", "archive change_table :posts do |t| t.enum :status, enum_type: "status", default: "dra end # In the application code: class Article < ApplicationRecord enum :status, { draft: "draft", published: "publis end Article.last.status #=> "draft" Article.last.draft? #=> true Article.last.published! # UPDATE articles SET status = 'published' WHERE id CREATE TYPE status AS ENUM ('draft', 'published', ' ALTER TABLE posts ADD COLUMN "status" status NOT NU INSERT INTO posts (status) VALUES ('published'); INSERT INTO posts (status) VALUES ('draft'); SELECT id, status FROM posts; id | status ----+------------ 2 | draft 1 | published

Slide 38

Slide 38 text

Inside ActiveRecord How datatypes are working under the hood

Slide 39

Slide 39 text

Example of composite type use case What if we want to: 1. Store products with prices in different currencies in one table 2. Work with price and currency as a whole There is a great money gem in Ruby, but how to do it in SQL? 3. And do some calculations without having to write complex SQL queries with joins. Before: After: User (string currency) has many Product (numeric price) User (string currency) # only as a setting! has many Product (true_money price (string+numeric))

Slide 40

Slide 40 text

Create custom datatype Declare composite datatype in the database: CREATE TYPE _true_money AS ( currency varchar, amount numeric ); -- type with constraints to allow: -- - either NULL value (no price, can be forbidden by NOT NULL) -- - or value with both currency and amount specified CREATE DOMAIN true_money AS _true_money CHECK ( value IS NULL AND value IS DISTINCT FROM (null, null)::_true_money OR ((value).currency IS NOT NULL AND (value).amount IS NOT NULL) );

Slide 41

Slide 41 text

Fun fact about composite datatypes Every table defines own datatype which can be used elsewhere But don’t use it in reality, please! (There are limitations) CREATE TABLE "inner" ( v1 integer, v2 text ); CREATE TABLE "outer" (v inner); INSERT INTO "outer" (v) VALUES ((42,'Hello world!')); SELECT * FROM "outer"; v ------------------- (42,"Hello world!")

Slide 42

Slide 42 text

Use composite datatype See 8.16. Composite Types in PostgreSQL docs for more advices and caveats. ALTER TABLE tests ADD COLUMN price true_money; INSERT INTO tests (price) VALUES (('JPY',10000.0)); INSERT INTO tests (price) VALUES ('("JPY",100.0)'); SELECT price FROM tests; -- (JPY,10000.0),(JPY,100.0) SELECT (price).currency, (price).amount FROM tests; currency | amount ----------+--------- JPY | 10000.0 JPY | 100.0

Slide 43

Slide 43 text

Declare it in ActiveRecord module ActiveRecord module ConnectionAdapters module PostgreSQL module OID class TrueMoney < Type::Value def type :true_money end # Here will be (de)serialization code end end end end end

Slide 44

Slide 44 text

Deserialization And "(USD,4.2)" becomes # in Ruby ✨ def deserialize(value) return nil if value.nil? currency, amount = value.match(/\A\("?(\w+)"?,(\d+(?:\.\d+)?)\)\z/).captures ::Money.from_amount(BigDecimal(amount), currency) end ` ` ` `

Slide 45

Slide 45 text

Casting user input Add ability to assign ready object to attribute: def cast(value) return nil if value.nil? case value when ::Money then value when String then deserialize(value) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end

Slide 46

Slide 46 text

Deserialization and input casting at once Replaces both deserialize and cast , also handles nil s. def cast_value(value) case value when ::Money then value when String currency, amount = value.match(/\A\("?(\w+)"?,(\d+(?:\.\d+)?)\)\z/).captures ::Money.from_amount(BigDecimal(amount), currency) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end end ` ` ` ` ` `

Slide 47

Slide 47 text

Serialization for the database Reuse available serialization methods for subtypes. def serialize(value) return nil if value.nil? # ActiveRecord will handle NULL for us amount_t = ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Decimal.new currency_t = ::ActiveModel::Type::String.new "(#{currency_t.serialize(value.currency.iso_code).inspect},#{amount_t.serialize(value.amount)})" end

Slide 48

Slide 48 text

Register datatype in ActiveRecord PostgreSQLAdapterWithTrueMoney = Module.new do def initialize_type_map(m = type_map) m.register_type "true_money" do |*_args, _sql_type| ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney.new end m.alias_type "_true_money", "true_money" super end end ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(PostgreSQLAdapterWithTrueMoney) ActiveRecord::Type.register( :true_money, ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney, adapter: :postgresql, )

Slide 49

Slide 49 text

Also add it for migrations… module SchemaStatementsWithTrueMoney def type_to_sql(type, limit: nil, precision: nil, scale: nil, array: nil, **) case type.to_s when 'true_money' then 'true_money' else super end end end ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsWithTrueMoney) module ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods def true_money(name, options = {}) column(name, :true_money, options) end end

Slide 50

Slide 50 text

Ready to use! rails g model Product title price:true_money rails db:migrate rails console Product.create!(title: "Something", price: Money.from_amount(100000, “USD”)) Product.last # => #

Slide 51

Slide 51 text

But it is not done yet! A lot of stuff has to be done to make a full-featured datatype in SQL… But then you can do a lot in SQL: After all, one might re-invent abandoned pg-currency CREATE FUNCTION true_money_add(a true_money, b true_ BEGIN IF (a).currency != (b).currency THEN RAISE EXCEPTION '% can not be added to % - cur END IF; RETURN ((a).currency, (a).amount + (b).amount); END; $$ IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE plp CREATE OPERATOR +(leftarg=true_money, rightarg=true CREATE FUNCTION true_money_sum(state true_money, va BEGIN IF value IS NULL AND state IS NULL THEN RETURN NULL; END IF; IF state IS NULL THEN RETURN value; END IF; RETURN state + value; END; $$ IMMUTABLE LANGUAGE plpgsql; CREATE AGGREGATE sum (true_money) (sfunc = true_mon SELECT (price).currency AS currency, sum(price) AS total FROM products GROUP BY currency;

Slide 52

Slide 52 text

Play with it yourself! https://gist.github.com/Envek/780b917e72a86c123776ee763b8dd986

Slide 53

Slide 53 text

Gems, gems, gems! Everything That Can Be Invented Has Been Invented

Slide 54

Slide 54 text

Gems for datatypes torque-postgresql — standard datatypes not (yet) supported by Rails. activerecord-postgis-adapter — all the power of PostGIS extension in Ruby. activerecord-postgres_enum — support enum in migrations and schema (before Rails 7)

Slide 55

Slide 55 text

Gems for other PostgreSQL features Because PostgreSQL is much more than datatypes. ActiveRecordExtended — functions for datatypes and DSL for queries fx — make schema.rb great again with triggers scenic — add support for views order_query — keyset-pagination for your models postgresql_cursor — get more data from the database efficiently And also martian pg_trunk gem to rule them all get fx , scenic , object dependency management and more within a single gem! ` ` ` ` structure.sql no more! Make schema.rb great again! ` ` ` ` pg_trunk gem

Slide 56

Slide 56 text

That’s it! Questions?

Slide 57

Slide 57 text

Thank you! @Envek @Envek @Envek @Envek github.com/Envek @evilmartians @evilmartians @evil-martians @evil.martians evilmartians.com Our awesome blog: evilmartians.com/chronicles! (Special thanks to @hachi8833 for translating ≈20 posts to Japanese!) See these slides at envek.github.io/kaigionrails-postgresql-as-seen-by-rubyists These slides