Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Elephant in the Room

The Elephant in the Room

A presentation on PostgreSQL for the AmsterdamX.pm at Booking.com HQ.

8d96f5c273062cb617255e630fe0705c?s=128

Brad Lhotsky

August 27, 2013
Tweet

Transcript

  1. None
  2. The Elephant in the Room PostgreSQL For Perl Programmers Brad

    Lhotsky http://twitter.com/reyjrar http://github.com/reyjrar
  3. Given NoSQL and MySQL Why bother with PostgreSQL?

  4. NoSQL Redis Cassandra MongoDB CouchDB etc.. •Distributed •Free Form Data

    Store •Document Storage •No need for intensive normalization
  5. MySQL for structured data. It's Web Scale.

  6. Fact or Fiction MySQL is faster than PostgreSQL.

  7. None
  8. Surprisingly, using a technology without knowing how to use it

    yields poor performance
  9. Fact or Fiction MySQL is faster than PostgreSQL.

  10. MySQL & PostgreSQL SELECT grade, risk_score, risk_adjusted_value, SUM(risk_adjusted_value) FROM risky_mortgages

    GROUP BY grade or "Understanding the Mortgage Crisis of 2008"
  11. testdb=#; ERROR: column "risky_mortgages.risk_score" must appear in the GROUP BY

    clause or be used in an aggregate function LINE 1: SELECT grade, risk_score, risk_adjusted_value, ... ^ PostgreSQL
  12. mysql>; +-------+------------+---------------------+--------------------------+ | grade | risk_score | risk_adjusted_value | SUM(risk_adjusted_value)

    | +-------+------------+---------------------+--------------------------+ | A | 90 | 90000 | 182000 | | A+ | 95 | 95000 | 95000 | | B | 80 | 80000 | 409000 | | C- | 60 | 60000 | 169000 | +-------+------------+---------------------+--------------------------+ MySQL
  13. None
  14. MySQL makes SQL easy. That's why it's so successful!

  15. MySQL •Bought by Sun •Sun bought by Oracle •Oracle has

    a shining reputation with the Open Source Community, right? ! •What the Fork?!@$ •MariaDB •XtraDB •OurDelta •Drizzle
  16. PostgreSQL learned a lot from MySQL’s success.

  17. * with commercial support available! Open Source and BSD Licensed!

  18. PostgreSQL Features •Reliability •Excellent native types •Constraints •Stored Procedures •Triggers

    • Views •Explain / Analyze •Extensible!
  19. Features: Reliability •ACID •Atomicity •Consistency •Isolation •Durability ! •It’s turtles

    all the way down.
  20. •Numeric •Strings •Dates, Times, Intervals •Boolean •Array •Geometric •Networking! •JSON!

    Features: Native Data Types
  21. SELECT * FROM network WHERE ip << inet ‘10.0.0.0/8’ SELECT

    * FROM documents WHERE json->>'priority' > 10 SELECT * FROM events WHERE event_ts > NOW() - interval '30 minutes'
  22. •Simple or complex •Best to keep simple, use triggers for

    complex •Validation at point of storage Features: Constraints
  23. CREATE TABLE positivity ( id SERIAL, happy_int INTEGER CHECK (

    happy_int > 0 ), entry_ts TIMESTAMP DEFAULT NOW() );
  24. INSERT INTO positivity ( happy_int ) VALUES ( -2 );

    ERROR: new row for relation "positivity" violates check constraint "positivity_happy_int_check"
  25. Features: Stored Procedures •PL/PgSQL •Oracle Compatible •Powerful and Transactional! •PL/Perl

    •Like mod_perl, shared interpreter •PL/Perl, no filesystem access •PL/PerlU, filesystem access •DBD::PgSPI •PL/R •Powerful Statistical Programming inside of Procedures
  26. CREATE FUNCTION lookup_dev_id(character varying) RETURNS integer AS $$ ! DECLARE

    in_ip inet := CAST($1 AS inet); out_dev_id INTEGER; ! BEGIN SELECT dev_id INTO out_dev_id FROM devices WHERE ip = in_ip; ! IF NOT FOUND THEN RAISE EXCEPTION 'Unable to locate IP: %', in_ip; END IF; ! RETURN out_dev_id; END$$ LANGUAGE plpgsql;
  27. SELECT lookup_dev_ip(‘12.34.56.78’);

  28. •Use Stored Procedures to do things when something happens •Modify

    before or after change happens •Validation at the point of storage •DDL Changes can fire triggers Features: Triggers
  29. CREATE OR REPLACE FUNCTION email_signature() RETURNS trigger AS $$ !

    use MIME::Lite; ! my $subject = q{New Signature: } . qq{$_TD->{new}{facility}:$_TD->{new}{native_sig_id}}; ! my $msg = new MIME::Lite( From => ‘yourdb@example.com’, To => ‘me@example.com’, Subject => $subject, Type => 'TEXT', ); $msg->data(<<"EOB"); Description: $_TD->{new}{description} EOB ! $msg->send; return; $$ LANGUAGE plperlu;
  30. CREATE TRIGGER new_signature_email AFTER INSERT ON security_signatures FOR EACH ROW

    EXECUTE PROCEDURE email_signature();
  31. •Read only •Use for Reports or Pages •Writable via Triggers

    •Static or Dynamic Features: Views
  32. Features: pg_stats •Tracks Usage Data •Used by optimizer •Viewable by

    mortals
  33. CREATE VIEW v_admin_index_usage AS SELECT t.relname AS “table”, c.relname AS

    index_name, c.relpages, i.idx_scan, t.seq_scan FROM pg_class c JOIN pg_stat_user_indexes i ON c.relname = i.indexrelname JOIN pg_stat_user_tables t ON i.relname = t.relname;
  34. None
  35. SELECT client.id, client.ip, SUM(queries) AS queries, SUM(nx) AS nx, SUM(answers)

    AS answers, SUM(errors) AS errors, COUNT(distinct day) AS days_active FROM client INNER JOIN client_stats ON client.id = client_stats.client_id GROUP BY client.id, client.ip HAVING COUNT(distinct day) > 2
  36. QUERY PLAN ! GroupAggregate (cost=33.25..41.30 rows=230 width=35) Filter: (count(DISTINCT client_stats.day)

    > 2) -> Sort (cost=33.25..33.82 rows=230 width=35) Sort Key: client.id, client.ip -> Hash Join (cost=15.77..24.23 rows=230 width=35) Hash Cond: (client_stats.client_id = client.id) -> Seq Scan on client_stats (cost=0.00..5.30 rows=230 width=28) -> Hash (cost=10.34..10.34 rows=434 width=15) -> Seq Scan on client (cost=0.00..10.34 rows=434 width=15) EXPLAIN - What You Think You'll Do
  37. GroupAggregate (cost=33.25..41.30 rows=230 width=35) (actual time=4.498..6.117 rows=8 loops=1) Filter: (count(DISTINCT

    client_stats.day) > 2) -> Sort (cost=33.25..33.82 rows=230 width=35) (actual time=4.417..4.925 rows=234 loops=1) Sort Key: client.id, client.ip Sort Method: quicksort Memory: 43kB -> Hash Join (cost=15.77..24.23 rows=230 width=35) (actual time=2.085..3.697 rows=234 loops=1) Hash Cond: (client_stats.client_id = client.id) -> Seq Scan on client_stats (cost=0.00..5.30 rows=230 width=28) (actual time=0.014..0.468 rows=234 loops=1) -> Hash (cost=10.34..10.34 rows=434 width=15) (actual time=2.049..2.049 rows=434 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 20kB -> Seq Scan on client (cost=0.00..10.34 rows=434 width=15) (actual time=0.011..0.995 rows=434 loops=1) Total runtime: 6.242 ms EXPLAIN ANALYZE- What You Did
  38. •Tons of PL/* Languages •PostGIS (GeoSpatial) •tsearch2 full text search

    •ltree for representing tree structure in SQL ! •PGXN by David Wheeler •Based on the CPAN Features: Extensibility
  39. Major Features: 8.4 •Windowing Functions SELECT person, dept, salary, avg(salary)

    OVER( partition by dept ) FROM employees
  40. SELECT * FROM employees; ! id | name | dept

    | gender | salary ----+---------+------------+--------+-------- 1 | bob | it | m | 45000 2 | alice | it | f | 42000 3 | charlie | it | m | 55000 4 | bill | it | m | 46000 5 | jill | it | f | 35000 6 | rob | accounting | m | 35000 7 | jane | accounting | f | 30000 8 | janice | accounting | f | 37000 9 | jack | accounting | m | 40000 (9 rows)
  41. name | dept | salary | salary_dept ---------+------------+--------+------------- rob |

    accounting | 35000 | 35500 jane | accounting | 30000 | 35500 janice | accounting | 37000 | 35500 jack | accounting | 40000 | 35500 bob | it | 45000 | 44600 jill | it | 35000 | 44600 alice | it | 42000 | 44600 charlie | it | 55000 | 44600 bill | it | 46000 | 44600 (9 rows) SELECT name, dept, salary, AVG(salary) OVER (partition by dept)::int AS salary_dept FROM employees;
  42. SELECT dept, gender, salary, AVG(salary) OVER (partition by dept)::int AS

    salary_dept, AVG(salary) OVER (partition by gender)::int AS salary_gender, AVG(salary) OVER (partition by dept, gender)::int AS salary_dept_gender FROM employees dept | gender | salary | salary_dept | salary_gender | salary_dept_gender ------------+--------+--------+-------------+---------------+-------------------- accounting | f | 30000 | 35500 | 36000 | 33500 accounting | f | 37000 | 35500 | 36000 | 33500 accounting | m | 35000 | 35500 | 44200 | 37500 accounting | m | 40000 | 35500 | 44200 | 37500 it | f | 35000 | 44600 | 36000 | 38500 it | f | 42000 | 44600 | 36000 | 38500 it | m | 46000 | 44600 | 44200 | 48667 it | m | 45000 | 44600 | 44200 | 48667 it | m | 55000 | 44600 | 44200 | 48667 (9 rows)
  43. Major Features: 9.0 •Hot Standby •Streaming Replication •Trigger Improvements •WHEN

    •Column-based triggers •Deferrable Constraints •Anonymous Functions •Named Parameter Calls •Exclusion Constraints
  44. CREATE TABLE reservations ( room_id INTEGER NOT NULL, guest_id INTEGER

    NOT NULL, during DATERANGE NOT NULL ) ALTER TABLE reservations ADD CONSTRAINT exclude_room_period EXCLUDE USING gist( room_id WITH =, during WITH && )
  45. Major Features: 9.1 •Synchronous Replication •PGXN Support •UNLOGGED tables •SE/Linux

    Integration •SQL/MED (Management of External Data) •MySQL, Oracle, Sybase, ODBC, JDBC, Informix •CouchDB, MongoDB, Redis, Neo4j •LDAP, Twitter, S3, Storm, WWW •ETC
  46. Major Features: 9.2 •Index only scans •Native JSON datatype •Native

    Range types
  47. Major Features: 9.3 •Custom Background Workers •Native Materialized Views •LATERAL

    JOIN() •Native JSON Operators •Recursive View Support •Automatic Updatable Views
  48. Also noteworthy .. •hstore for Key/Value pairs like NoSQL •Custom

    Data Types •Composite Types •Partial Indices •Namespaces •Table Partitioning via Views/Triggers
  49. None