Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PostgreSQL as seen by Rubyists

PostgreSQL as seen by Rubyists

PostgreSQL has become the most popular RDBMS in the Ruby ecosystem in the last decade. It has a great set of built-in features, including a variety of versatile data types, both common and very specific.

But when we load data from the database to our application code, we're working with Ruby data types: classes from the standard library, Rails, or other gems. So while they can seem to be the same as their PostgreSQL counterparts, they are not absolutely identical, and sometimes that could lead to surprising behavior.

In this talk, I would like to explore the power of data types in PostgreSQL and Ruby and how to work with them properly to use both Ruby and PostgreSQL on 100% of their power!

Andrey Novikov

October 22, 2022
Tweet

More Decks by Andrey Novikov

Other Decks in Programming

Transcript

  1. Martian Open Source Yabeda: Ruby application instrumentation framework Lefthook: git

    hooks manager AnyCable: Polyglot replacement for ActionCable server PostCSS: A tool for transforming CSS with JavaScript Imgproxy: Fast and secure standalone server for resizing and converting remote images Logux: Client-server communication framework based on Optimistic UI, CRDT, and log Overmind: Process manager for Procfile-based applications and tmux Even more at evilmartians.com/oss
  2. Example of subtle yet important difference Are they same? …

    Left: road sign conforming Vienna Convention on Road Signs and Signals Right: Japanese road sign per “Order on Road Sign, Road Line, and Road Surface Marking”
  3. Example of subtle yet important difference Are they same? NO!

    Both require drivers to give a way… European sign doesn’t have a requirement to stop or even slow down!
  4. Example of subtle yet important difference Are they same? YES!

    Stop signs around the world are mostly octagonal. Japanese signs have “STOP” word in English after 2017, but are still pretty rare.
  5. Integers Integer Variable length integer 2, 4, 8 bytes signed

    In ActiveModel there is validation for databases: Name Size Range smallint 2 -32768 to +32767 integer 4 -2147483648 to +2147483647 bigint 8 -9223372036854775808 to +9223372036854775807 See bignum.c in Ruby sources. See Numeric types docs 1.size # => 8 (bytes) (256**8 - 1).size # => 8 (bytes) (256**8).size # => 9 (bytes) (256**40 - 1).size # => 40 (bytes) Test.create(value: 2147483648) # ActiveModel::RangeError: 2147483648 is out of rang # for ActiveModel::Type::Integer with limit 4 bytes ` ` ` ` ` ` INSERT INTO "tests" ("value") VALUES (2147483648); -- ERROR: integer out of range
  6. Oh no, I have integer primary keys! What to do?

    0. Don’t panic! 1. Use pghero or Heroku pg:diagnose to detect problematic primary keys. 2. Migrate to bigint or uuid if needed (use triggers, Luke!) 3. In case of emergency, remember that all integers are signed! You always have 2 more billions of values on the dark negative side! Example of pkey migration from int to bigint: engineering.silverfin.com/pg-zero-downtime-bigint-migration ` ` ` ` ALTER SEQUENCE tablename_id_seq MINVALUE -2147483647 RESTART WITH -2147483647; pghero
  7. Floating point numbers Float 8 bytes (double-precision) real — 4

    bytes double — 8 bytes See Ruby docs for Float More fun at 0.30000000000000004.com! IEEE 754 0.1 + 0.2 # => 0.30000000000000004 Float::MAX # => 1.7976931348623157e+308 Float::MAX + '1e+308'.to_f # => Infinity # BUT! Float::MAX + '0.1'.to_f # => 1.7976931348623157e+308 🤔 Float::MAX == (Float::MAX + '0.1'.to_f) # => true 🤯 Float::NAN == Float::NAN # => false SELECT 0.1::float + 0.2::float; -- 0.300000000000000 SELECT 0.1 + 0.2; -- 0.3 (but it is NOT float!) SELECT '1.7976931348623157e+308'::float + '1e+308':: --- ERROR: value out of range: overflow SELECT '+inf'::double precision; -- Infinity 🤷 SELECT '1.7976931348623157e+308'::float = ('1.7976931348623157e+308'::float + '0.1'::float); -- true ¯\_(ツ)_/¯ SELECT 'NaN'::float = 'NaN'::float; -- true 🤯 0.30000000000000004.com
  8. Arbitrary precision numbers BigDecimal Variable length numeric Variable length See

    Ruby docs for BigDecimal Use numeric to store money! BigDecimal("0.1") + BigDecimal("0.2") # => 0.3e0 BigDecimal("NaN") == BigDecimal("NaN") # => false BigDecimal("1.0") / BigDecimal("0.0") #=> Infinity # To match PostgreSQL behavior: BigDecimal.mode(BigDecimal::EXCEPTION_OVERFLOW, tru BigDecimal("1.0") / BigDecimal("0.0") # Computation results in 'Infinity' (FloatDomainEr BigDecimal("0.1") + 0.2.to_d == 0.30000000000000004 # true 🤔 SELECT 0.1 + 0.2; -- 0.3 which is decimal SELECT 'NaN'::decimal = 'NaN'::decimal; -- true SELECT '1.0'::decimal / '0.0'::decimal; -- ERROR: division by zero SELECT (0.1 + 0.2) = (0.1::float + 0.2::float); -- false ` `
  9. But there is money type, isn’t it? BigDecimal Variable size

    money 8 byte fixed-precision number. ActiveRecord has to parse textual representation, see connection_adapters/postgresql/oid/money.rb Also see issue № 31457 for lots of pain. Both output and acceptable input format depends on session-level lc_monetary setting! Precision is defined by lc_monetary at database creation time and can’t be changed! # If the database locale setting isn't `en_US`: # Creation may fail: Product.create!(price: 100500.42) # ERROR: invalid input syntax for type money: "1005 # Or it can succeed, but won't be able to be parsed Product.last.price # => 0.0 -- on production: SELECT 100500.42::money; -- $100,500.42 -- on dev machine: SELECT 100500.42::money; -- ¥ 100,500 -- 🤯 But it should be dollars, and where are my cen ` ` ` `
  10. Strings and texts, lyrics and prose String Variable size varchar,

    text variable size, max 1 GB Read the docs: String Read the docs: 8.3. Character Types "こんにちは地球人!".encoding # => #<Encoding:UTF-8> "\xe3\x2e\x2e".encoding # => #<Encoding:UTF-8> "\xe3\x2e\x2e".valid_encoding? # => false "これ\x00やばい!".valid_encoding? # => true SELECT 'こんにちは地球人!'; -- こんにちは地球人! SELECT E'\xe3\x2e\x2e'); -- ERROR: invalid byte sequence for encoding "UTF8 SELECT E'これ\x00やばい!'; -- ERROR: invalid byte sequence for encoding "UTF8
  11. So many string types! 1. Don’t use char(n) It is

    always size of n and stores unnecessary spaces at right. Mostly for compatibility with older applications. 2. varchar and text are effectively the same, choose whatever you like. string in migrations is varchar in PostgreSQL. 💡 Did you know? SimpleForm gem will render multi-line HTML <textarea> tag for text type and single-line <input type="text"> for character varying. Convention over configuration! See SimpleForm gem README: github.com/heartcombo/simple_form ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `
  12. Binary data String Variable size bytea Variable size, max 1

    GB Memory and network traffic consumption: 📈 See Binary Data Types page in the docs. data = File.binread(“meme.png”) # => "\x89PNG\r\n\x1A…" data.encoding # => #<Encoding:ASCII-8BIT> data.bytesize # => 46534 Test.last.blob # => "\x89PNG\r\n\x1A…" Test.last.blob_before_type_cast.bytesize # => 46534 Test.last.blob_before_type_cast # => "\\x89504e470d0a1a0a" Test.last.blob_before_type_cast.bytesize # => 93070 SELECT '\x89504e470d0a1a0a…'::bytea; # Note hexadecimal format ↑
  13. What if 1 GB isn’t enough? You can’t store more

    in a table column (hard limit) But you can store up 4 TB in large objects table! And there is a gem for that: active_storage-postgresql Beware performance implications of TOAST → More about it in PG docs: 70.2. TOAST Andrey Novikov @Envek · Follow — PostgreSQL, did you update all these megabyte- long JSON I gave you? — Yes — What did it cost? — Everything… all the IO on the server. Let’s talk about how PostgreSQL stores JSONB (yep, it will be a thread about TOAST) 10:47 PM · May 18, 2022 Read the full conversation on Twitter 359 Reply Copy link Read 4 replies
  14. Dates Date date 4 bytes Internally stores number of days

    since year 4713 BC up to infinity. The Julian day number is in elapsed days since noon (Greenwich Mean Time) on January 1, 4713 BCE (in the Julian calendar). The day count is virtually the astronomical Julian day number. Internally stores number of days since year 4713 BC up to year 5874897 AD. In the Julian Date system, each day has a sequential number, starting from JD 0 (which is sometimes called the Julian Date). JD 0 corresponds to 1 January 4713 BC in the Julian calendar. See the docs for Date class. See B.7. Julian Dates in PostgreSQL docs.
  15. Time and timezones Time AS::TimeWithZone Two UNIX timestamps inside and

    tzdata also timestamp timestamptz 8 bytes, microsecond precision Ruby on Rails uses UTC timezone internally. Use timestamp with time zone whenever possible! Time.now # => 2022-10-22 13:42:42 +0900 Time.current # => Sat, 22 Oct 2022 04:42:42 UTC +00: Time.current.time_zone # => #<ActiveSupport::TimeZone name="UTC", @tzinfo= Time.use_zone("Asia/Tokyo") { Time.current } # => Sat, 22 Oct 2020 13:42:42 JST +09:00 CREATE TABLE tests (t1 timestamp, t2 timestamptz); SET SESSION timezone TO 'Etc/UTC'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Europe/Lisbon'; INSERT INTO tests (t1, t2) VALUES (now(), now()); SET SESSION timezone TO 'Asia/Tokyo'; SELECT * FROM tests; t1 | t2 ---------------------+------------------------- 2022-10-22 04:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 13:42:42 │ 2022-10-02 13:42:42+09 │ 2022-10-22 05:42:42 │ 2022-10-02 13:42:42+09 │ ` `
  16. How to not mess up with timezones 1. Use timezone-aware

    methods Use Time.current and Date.current instead of Time.now and Date.today 2. Convert timestamps to user time zone 3. Don’t use dates in SQL, use timestamps More tips here: thoughtbot.com/blog/its-about-time-zones ` ` ` ` ` ` ` ` Time.use_zone(user.timezone) do # Do SQL queries, render views, … end # or Time.current.in_time_zone(user.timezone) - Posts.where(published_at: Date.today...Date.tomorrow) + Posts.where(published_at: Time.current.beginning_of_day..Time.current.end_of_day)
  17. Time zones are hard 見る The Problem with Time &

    Timezones - Computerphile The Problem with Time & Timezones - Computerphile 共有 共有
  18. Void and uncertainity NilClass NULL nil == nil # =>

    true ¯\_(ツ)_/¯ SELECT NULL = NULL; -- NULL 🚨 SELECT NULL IS NULL; -- true SELECT NULL IS DISTINCT FROM NULL; -- false SELECT 'Ruby' = NULL; -- NULL 🚨 SELECT 'Ruby' IS NULL; -- false SELECT 'Ruby' IS DISTINCT FROM NULL; -- true
  19. JSON Hash, Array json, jsonb Variable length, up to 1GB

    Be careful with symbols as keys Define as_json method on your classes to serialize them to JSON automatically. Behavior of JSON.dump and to_json in Rails is different! JSON saves value as is (it is just a string) JSONB is effective but strict: no duplicate keys, doesn’t preserve whitespaces, etc… Inside: string (no null-bytes!), numeric, … { "foo" => "bar", foo: "baz" }.to_json # {"foo":"baz"} ` ` ` ` ` ` SELECT '{"foo": "bar","foo":"baz"}'::json; -- {"foo": "bar","foo":"baz"} SELECT '{"foo": "bar","foo":"baz"}'::jsonb; -- {"foo": "baz"}
  20. JSON on steroids Use store_model gem to make powerful value

    objects from JSON fields. But don’t overuse! There is performance penalty for serialization and deserialization. store_model gem repo
  21. Ranges Range intrange, tsrange, … intmultirange, … Additional methods in

    the facets gem. https://www.postgresql.org/docs/14/rangetypes.html 5..7 or 5...8 Time.current..1.day.from_now # endless or beginless ranges Time.current.. ..Date.yesterday nil.. or Range.new(nil, nil) # Beginning is always included if possible :-( Test.pluck("intrange(1, 5, '()')").first # 2...5 Test.pluck("tstzrange(now(), now() + '1 hour', '() # ArgumentError: The Ruby Range object does not # support excluding the beginning of a Range. SELECT int8range(5, 7, '[]'); -- [5,8] SELECT int8range(5, 8); -- [5,8) SELECT tstzrange(now(), now() + '1 day', '()'); -- ["2022-10-22 14:42:42+09","2022-10-23 14:42:42+09 -- endless or beginless ranges SELECT tstzrange(now(), NULL); SELECT tstzrange(NULL, NULL); -- PG 14: Multiranges and operators SELECT nummultirange(numrange(1, 20)) - nummultirange(numrange(4, 6)); -- {[1,4),[6,20)} -- and many more… (exclusion constraints!)
  22. UUID String 36 bytes uuid 16 bytes See Rails docs

    for Digest::UUID See docs for pgcrypto and uuid-ossp extensions. Also take a look at upcoming UUIDv6, v7, and v8! datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04 # All-random UUIDv4 SecureRandom.uuid # => “40f15398-4b38-4e16-8b3c-ff16fc960d38” # Determined UUIDv5 (hash-based) Digest::UUID.uuid_v5(Digest::UUID::DNS_NAMESPACE, "n # => "9b8edca0-90f2-5031-8e5d-3f708834696c" CREATE EXTENSION "pgcrypto"; SELECT gen_random_uuid(); -- 2cfff962-4a24-4ef3-b2f8-35351b18bf63 CREATE EXTENSION "uuid-ossp"; SELECT uuid_generate_v5(uuid_ns_dns(), 'name'); -- 9b8edca0-90f2-5031-8e5d-3f708834696c
  23. IP addresses IPAddr inet, cidr 7 or 19 bytes both

    inet works with both host and network addresses. cidr works with network addresses only. See IPAddr docs. See Network address types and functions and operators. ip6 = IPAddr.new "3ffe:505:2::1" ip4 = IPAddr.new "192.168.2.0/24" IPAddr.new("192.168.2.0/24").mask(16) #<IPAddr: IPv4:192.168.0.0/255.255.0.0> SELECT '::1'::inet; SELECT '127.0.0.1/32'::inet; SELECT set_masklen(cidr '192.168.1.0/24', 16); -- 192.168.0.0/16 SELECT inet_merge(inet '192.168.1.5/24', inet '192. -- 192.168.0.0/22 ` ` ` `
  24. Durations ActiveSupport::Duration interval 16 bytes Disclaimer: I added it to

    Rails in pull request № 16919. Supported out-of-the-box in Ruby on Rails 6.1+ Time.current + 1.year # => Thu, 18 Jun 2021 21:00:00 MSK +03:00 100500.weeks.iso8601 # => "P100500W" 1.month.to_i # => 2629746 (30.436875 days in seconds) SELECT now() + ‘1 year’; -- 2021-06-18 21:00:00+03 SELECT '100500 weeks'::interval; -- 703500 days SELECT EXTRACT(epoch FROM '1 month'::interval); -- 2592000 (30.0 days in seconds)
  25. Enums String Custom enum types 4 bytes Values are human-readable

    in SQL On Rails < 7 you can use activerecord-postgres_enum gem See 8.7 Enumerated Types page in PostgreSQL docs # In migration (Rails 7+): create_enum :status, ["draft", "published", "archive change_table :posts do |t| t.enum :status, enum_type: "status", default: "dra end # In the application code: class Article < ApplicationRecord enum :status, { draft: "draft", published: "publis end Article.last.status #=> "draft" Article.last.draft? #=> true Article.last.published! # UPDATE articles SET status = 'published' WHERE id CREATE TYPE status AS ENUM ('draft', 'published', ' ALTER TABLE posts ADD COLUMN "status" status NOT NU INSERT INTO posts (status) VALUES ('published'); INSERT INTO posts (status) VALUES ('draft'); SELECT id, status FROM posts; id | status ----+------------ 2 | draft 1 | published
  26. Example of composite type use case What if we want

    to: 1. Store products with prices in different currencies in one table 2. Work with price and currency as a whole There is a great money gem in Ruby, but how to do it in SQL? 3. And do some calculations without having to write complex SQL queries with joins. Before: After: User (string currency) has many Product (numeric price) User (string currency) # only as a setting! has many Product (true_money price (string+numeric))
  27. Create custom datatype Declare composite datatype in the database: CREATE

    TYPE _true_money AS ( currency varchar, amount numeric ); -- type with constraints to allow: -- - either NULL value (no price, can be forbidden by NOT NULL) -- - or value with both currency and amount specified CREATE DOMAIN true_money AS _true_money CHECK ( value IS NULL AND value IS DISTINCT FROM (null, null)::_true_money OR ((value).currency IS NOT NULL AND (value).amount IS NOT NULL) );
  28. Fun fact about composite datatypes Every table defines own datatype

    which can be used elsewhere But don’t use it in reality, please! (There are limitations) CREATE TABLE "inner" ( v1 integer, v2 text ); CREATE TABLE "outer" (v inner); INSERT INTO "outer" (v) VALUES ((42,'Hello world!')); SELECT * FROM "outer"; v ------------------- (42,"Hello world!")
  29. Use composite datatype See 8.16. Composite Types in PostgreSQL docs

    for more advices and caveats. ALTER TABLE tests ADD COLUMN price true_money; INSERT INTO tests (price) VALUES (('JPY',10000.0)); INSERT INTO tests (price) VALUES ('("JPY",100.0)'); SELECT price FROM tests; -- (JPY,10000.0),(JPY,100.0) SELECT (price).currency, (price).amount FROM tests; currency | amount ----------+--------- JPY | 10000.0 JPY | 100.0
  30. Declare it in ActiveRecord module ActiveRecord module ConnectionAdapters module PostgreSQL

    module OID class TrueMoney < Type::Value def type :true_money end # Here will be (de)serialization code end end end end end
  31. Deserialization And "(USD,4.2)" becomes #<Money fractional:420 currency:USD> in Ruby ✨

    def deserialize(value) return nil if value.nil? currency, amount = value.match(/\A\("?(\w+)"?,(\d+(?:\.\d+)?)\)\z/).captures ::Money.from_amount(BigDecimal(amount), currency) end ` ` ` `
  32. Casting user input Add ability to assign ready object to

    attribute: def cast(value) return nil if value.nil? case value when ::Money then value when String then deserialize(value) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end
  33. Deserialization and input casting at once Replaces both deserialize and

    cast , also handles nil s. def cast_value(value) case value when ::Money then value when String currency, amount = value.match(/\A\("?(\w+)"?,(\d+(?:\.\d+)?)\)\z/).captures ::Money.from_amount(BigDecimal(amount), currency) else raise NotImplementedError, "Don't know how to cast #{value.class} #{value.inspect} into Money" end end end ` ` ` ` ` `
  34. Serialization for the database Reuse available serialization methods for subtypes.

    def serialize(value) return nil if value.nil? # ActiveRecord will handle NULL for us amount_t = ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::Decimal.new currency_t = ::ActiveModel::Type::String.new "(#{currency_t.serialize(value.currency.iso_code).inspect},#{amount_t.serialize(value.amount)})" end
  35. Register datatype in ActiveRecord PostgreSQLAdapterWithTrueMoney = Module.new do def initialize_type_map(m

    = type_map) m.register_type "true_money" do |*_args, _sql_type| ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney.new end m.alias_type "_true_money", "true_money" super end end ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(PostgreSQLAdapterWithTrueMoney) ActiveRecord::Type.register( :true_money, ::ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::OID::TrueMoney, adapter: :postgresql, )
  36. Also add it for migrations… module SchemaStatementsWithTrueMoney def type_to_sql(type, limit:

    nil, precision: nil, scale: nil, array: nil, **) case type.to_s when 'true_money' then 'true_money' else super end end end ActiveRecord::ConnectionAdapters::PostgreSQL::SchemaStatements.prepend(SchemaStatementsWithTrueMoney) module ActiveRecord::ConnectionAdapters::PostgreSQL::ColumnMethods def true_money(name, options = {}) column(name, :true_money, options) end end
  37. Ready to use! rails g model Product title price:true_money rails

    db:migrate rails console Product.create!(title: "Something", price: Money.from_amount(100000, “USD”)) Product.last # => #<Product id: 1, title: "Something", price: 100000.00 USD>
  38. But it is not done yet! A lot of stuff

    has to be done to make a full-featured datatype in SQL… But then you can do a lot in SQL: After all, one might re-invent abandoned pg-currency CREATE FUNCTION true_money_add(a true_money, b true_ BEGIN IF (a).currency != (b).currency THEN RAISE EXCEPTION '% can not be added to % - cur END IF; RETURN ((a).currency, (a).amount + (b).amount); END; $$ IMMUTABLE RETURNS NULL ON NULL INPUT LANGUAGE plp CREATE OPERATOR +(leftarg=true_money, rightarg=true CREATE FUNCTION true_money_sum(state true_money, va BEGIN IF value IS NULL AND state IS NULL THEN RETURN NULL; END IF; IF state IS NULL THEN RETURN value; END IF; RETURN state + value; END; $$ IMMUTABLE LANGUAGE plpgsql; CREATE AGGREGATE sum (true_money) (sfunc = true_mon SELECT (price).currency AS currency, sum(price) AS total FROM products GROUP BY currency;
  39. Gems for datatypes torque-postgresql — standard datatypes not (yet) supported

    by Rails. activerecord-postgis-adapter — all the power of PostGIS extension in Ruby. activerecord-postgres_enum — support enum in migrations and schema (before Rails 7)
  40. Gems for other PostgreSQL features Because PostgreSQL is much more

    than datatypes. ActiveRecordExtended — functions for datatypes and DSL for queries fx — make schema.rb great again with triggers scenic — add support for views order_query — keyset-pagination for your models postgresql_cursor — get more data from the database efficiently And also martian pg_trunk gem to rule them all get fx , scenic , object dependency management and more within a single gem! ` ` ` ` structure.sql no more! Make schema.rb great again! ` ` ` ` pg_trunk gem
  41. Thank you! @Envek @Envek @Envek @Envek github.com/Envek @evilmartians @evilmartians @evil-martians

    @evil.martians evilmartians.com Our awesome blog: evilmartians.com/chronicles! (Special thanks to @hachi8833 for translating ≈20 posts to Japanese!) See these slides at envek.github.io/kaigionrails-postgresql-as-seen-by-rubyists These slides