Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Elephant in the Room

The Elephant in the Room

A presentation on PostgreSQL for the AmsterdamX.pm at Booking.com HQ.

Brad Lhotsky

August 27, 2013
Tweet

More Decks by Brad Lhotsky

Other Decks in Technology

Transcript

  1. The Elephant in the Room PostgreSQL For Perl Programmers Brad

    Lhotsky http://twitter.com/reyjrar http://github.com/reyjrar
  2. NoSQL Redis Cassandra MongoDB CouchDB etc.. •Distributed •Free Form Data

    Store •Document Storage •No need for intensive normalization
  3. testdb=#; ERROR: column "risky_mortgages.risk_score" must appear in the GROUP BY

    clause or be used in an aggregate function LINE 1: SELECT grade, risk_score, risk_adjusted_value, ... ^ PostgreSQL
  4. mysql>; +-------+------------+---------------------+--------------------------+ | grade | risk_score | risk_adjusted_value | SUM(risk_adjusted_value)

    | +-------+------------+---------------------+--------------------------+ | A | 90 | 90000 | 182000 | | A+ | 95 | 95000 | 95000 | | B | 80 | 80000 | 409000 | | C- | 60 | 60000 | 169000 | +-------+------------+---------------------+--------------------------+ MySQL
  5. MySQL •Bought by Sun •Sun bought by Oracle •Oracle has

    a shining reputation with the Open Source Community, right? ! •What the Fork?!@$ •MariaDB •XtraDB •OurDelta •Drizzle
  6. SELECT * FROM network WHERE ip << inet ‘10.0.0.0/8’ SELECT

    * FROM documents WHERE json->>'priority' > 10 SELECT * FROM events WHERE event_ts > NOW() - interval '30 minutes'
  7. •Simple or complex •Best to keep simple, use triggers for

    complex •Validation at point of storage Features: Constraints
  8. CREATE TABLE positivity ( id SERIAL, happy_int INTEGER CHECK (

    happy_int > 0 ), entry_ts TIMESTAMP DEFAULT NOW() );
  9. INSERT INTO positivity ( happy_int ) VALUES ( -2 );

    ERROR: new row for relation "positivity" violates check constraint "positivity_happy_int_check"
  10. Features: Stored Procedures •PL/PgSQL •Oracle Compatible •Powerful and Transactional! •PL/Perl

    •Like mod_perl, shared interpreter •PL/Perl, no filesystem access •PL/PerlU, filesystem access •DBD::PgSPI •PL/R •Powerful Statistical Programming inside of Procedures
  11. CREATE FUNCTION lookup_dev_id(character varying) RETURNS integer AS $$ ! DECLARE

    in_ip inet := CAST($1 AS inet); out_dev_id INTEGER; ! BEGIN SELECT dev_id INTO out_dev_id FROM devices WHERE ip = in_ip; ! IF NOT FOUND THEN RAISE EXCEPTION 'Unable to locate IP: %', in_ip; END IF; ! RETURN out_dev_id; END$$ LANGUAGE plpgsql;
  12. •Use Stored Procedures to do things when something happens •Modify

    before or after change happens •Validation at the point of storage •DDL Changes can fire triggers Features: Triggers
  13. CREATE OR REPLACE FUNCTION email_signature() RETURNS trigger AS $$ !

    use MIME::Lite; ! my $subject = q{New Signature: } . qq{$_TD->{new}{facility}:$_TD->{new}{native_sig_id}}; ! my $msg = new MIME::Lite( From => ‘[email protected]’, To => ‘[email protected]’, Subject => $subject, Type => 'TEXT', ); $msg->data(<<"EOB"); Description: $_TD->{new}{description} EOB ! $msg->send; return; $$ LANGUAGE plperlu;
  14. CREATE VIEW v_admin_index_usage AS SELECT t.relname AS “table”, c.relname AS

    index_name, c.relpages, i.idx_scan, t.seq_scan FROM pg_class c JOIN pg_stat_user_indexes i ON c.relname = i.indexrelname JOIN pg_stat_user_tables t ON i.relname = t.relname;
  15. SELECT client.id, client.ip, SUM(queries) AS queries, SUM(nx) AS nx, SUM(answers)

    AS answers, SUM(errors) AS errors, COUNT(distinct day) AS days_active FROM client INNER JOIN client_stats ON client.id = client_stats.client_id GROUP BY client.id, client.ip HAVING COUNT(distinct day) > 2
  16. QUERY PLAN ! GroupAggregate (cost=33.25..41.30 rows=230 width=35) Filter: (count(DISTINCT client_stats.day)

    > 2) -> Sort (cost=33.25..33.82 rows=230 width=35) Sort Key: client.id, client.ip -> Hash Join (cost=15.77..24.23 rows=230 width=35) Hash Cond: (client_stats.client_id = client.id) -> Seq Scan on client_stats (cost=0.00..5.30 rows=230 width=28) -> Hash (cost=10.34..10.34 rows=434 width=15) -> Seq Scan on client (cost=0.00..10.34 rows=434 width=15) EXPLAIN - What You Think You'll Do
  17. GroupAggregate (cost=33.25..41.30 rows=230 width=35) (actual time=4.498..6.117 rows=8 loops=1) Filter: (count(DISTINCT

    client_stats.day) > 2) -> Sort (cost=33.25..33.82 rows=230 width=35) (actual time=4.417..4.925 rows=234 loops=1) Sort Key: client.id, client.ip Sort Method: quicksort Memory: 43kB -> Hash Join (cost=15.77..24.23 rows=230 width=35) (actual time=2.085..3.697 rows=234 loops=1) Hash Cond: (client_stats.client_id = client.id) -> Seq Scan on client_stats (cost=0.00..5.30 rows=230 width=28) (actual time=0.014..0.468 rows=234 loops=1) -> Hash (cost=10.34..10.34 rows=434 width=15) (actual time=2.049..2.049 rows=434 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on client (cost=0.00..10.34 rows=434 width=15) (actual time=0.011..0.995 rows=434 loops=1) Total runtime: 6.242 ms EXPLAIN ANALYZE- What You Did
  18. •Tons of PL/* Languages •PostGIS (GeoSpatial) •tsearch2 full text search

    •ltree for representing tree structure in SQL ! •PGXN by David Wheeler •Based on the CPAN Features: Extensibility
  19. SELECT * FROM employees; ! id | name | dept

    | gender | salary ----+---------+------------+--------+-------- 1 | bob | it | m | 45000 2 | alice | it | f | 42000 3 | charlie | it | m | 55000 4 | bill | it | m | 46000 5 | jill | it | f | 35000 6 | rob | accounting | m | 35000 7 | jane | accounting | f | 30000 8 | janice | accounting | f | 37000 9 | jack | accounting | m | 40000 (9 rows)
  20. name | dept | salary | salary_dept ---------+------------+--------+------------- rob |

    accounting | 35000 | 35500 jane | accounting | 30000 | 35500 janice | accounting | 37000 | 35500 jack | accounting | 40000 | 35500 bob | it | 45000 | 44600 jill | it | 35000 | 44600 alice | it | 42000 | 44600 charlie | it | 55000 | 44600 bill | it | 46000 | 44600 (9 rows) SELECT name, dept, salary, AVG(salary) OVER (partition by dept)::int AS salary_dept FROM employees;
  21. SELECT dept, gender, salary, AVG(salary) OVER (partition by dept)::int AS

    salary_dept, AVG(salary) OVER (partition by gender)::int AS salary_gender, AVG(salary) OVER (partition by dept, gender)::int AS salary_dept_gender FROM employees dept | gender | salary | salary_dept | salary_gender | salary_dept_gender ------------+--------+--------+-------------+---------------+-------------------- accounting | f | 30000 | 35500 | 36000 | 33500 accounting | f | 37000 | 35500 | 36000 | 33500 accounting | m | 35000 | 35500 | 44200 | 37500 accounting | m | 40000 | 35500 | 44200 | 37500 it | f | 35000 | 44600 | 36000 | 38500 it | f | 42000 | 44600 | 36000 | 38500 it | m | 46000 | 44600 | 44200 | 48667 it | m | 45000 | 44600 | 44200 | 48667 it | m | 55000 | 44600 | 44200 | 48667 (9 rows)
  22. Major Features: 9.0 •Hot Standby •Streaming Replication •Trigger Improvements •WHEN

    •Column-based triggers •Deferrable Constraints •Anonymous Functions •Named Parameter Calls •Exclusion Constraints
  23. CREATE TABLE reservations ( room_id INTEGER NOT NULL, guest_id INTEGER

    NOT NULL, during DATERANGE NOT NULL ) ALTER TABLE reservations ADD CONSTRAINT exclude_room_period EXCLUDE USING gist( room_id WITH =, during WITH && )
  24. Major Features: 9.1 •Synchronous Replication •PGXN Support •UNLOGGED tables •SE/Linux

    Integration •SQL/MED (Management of External Data) •MySQL, Oracle, Sybase, ODBC, JDBC, Informix •CouchDB, MongoDB, Redis, Neo4j •LDAP, Twitter, S3, Storm, WWW •ETC
  25. Major Features: 9.3 •Custom Background Workers •Native Materialized Views •LATERAL

    JOIN() •Native JSON Operators •Recursive View Support •Automatic Updatable Views
  26. Also noteworthy .. •hstore for Key/Value pairs like NoSQL •Custom

    Data Types •Composite Types •Partial Indices •Namespaces •Table Partitioning via Views/Triggers