SQL, Scaling, and What's Unique About PostgreSQL | XLDB Conference | Ozgun Erdogan

SQL, Scaling, and What's Unique About PostgreSQL | XLDB Conference | Ozgun Erdogan

Relational databases are known to have tightly integrated components that make them hard to extend. This usually leads to new databases that are specialized for scale and certain workloads.

For its last three releases, PostgreSQL has been providing official APIs to extend the database *without forking it*. Users can read the documentation, implement their ideas according to the APIs, and then dynamically load in their shared object to override or extend any database submodule.

In this talk, we will demonstrate four example APIs that make PostgreSQL a distributed database that focuses on real-time workloads:

1. PostGIS turns Postgres into a spatial database. It adds support for geographic objects, allowing location queries to be run in SQL. 2. JSON and JSONB enable storing and querying semi-structured data. 3. HyperLogLog & TopN add approximation algorithms to Postgres. These sketch algorithms are used in distributed systems when real-time responses to queries matter more than exact results. 4. Citus transforms Postgres into a distributed database. Citus transparently shards and replicates data, performs distributed deadlock detection, and parallelizes queries.

In summary, PostgreSQL's extensible architecture puts it in a unique place for scaling out SQL and also for adapting to evolving hardware trends. It could just be that the monolithic SQL database is dying. If so, long live Postgres!


Citus Data

May 01, 2018


  1. 2.

    I love Postgres 2 Ozgun Erdogan CTO of Citus Data

    Distributed Systems Distributed Databases Formerly of Amazon Love drinking margaritas
  2. 3.


  3. 4.

    Our mission at Citus Data 4 Make it so that

    your business never has to worry about scaling their database again
  4. 5.

    Punch Line 1. What is unique about PostgreSQL? • The

    extension APIs 2. PostgreSQL extensions are a game changer for relational databases
  5. 6.

    Talk Outline 1. What is an extension? 2. Why are

    extensions a game changer for databases? 3. Postgres can’t do “this” • Semi-structured or unstructured data • Geospatial database • S3 or columnar storage for storage • Scale out 4. Conclusion
  6. 7.

    What is an Extension • An extension is a piece

    of software that adds functionality to Postgres. Each extension bundles related objects together. • Postgres 9.1 started providing official APIs to override or extend any database module’s behavior. • “CREATE EXTENSION citus;” dynamically loads these objects into Postgres’ address space.
  7. 8.

    What can you Extend in Postgres? • You can override,

    cooperate with, or extend any combination of the following database modules: • Type system and operators • User defined functions and aggregates • Storage system and indexes • Write ahead logging and replication • Transaction engine • Background worker processes • Query planner and query executor • Configuration and database metadata
  8. 9.

    Why are Extensions a game changer • Every decade brings

    new workloads for databases. • The last decade was about capturing more data, in more shapes and form. • Postgres has been forked by dozens of commercial databases for new workloads. When you fork, your database diverges from the community. • What if you could leverage the database ecosystem and grow with it?
  9. 10.

    Extending a relational database: Really? Extending a relational database is

    a relatively new idea. Over the years, we received questions on this new idea. 1. Forking vs extensions: Can you really extend any database module? 2. Building from scratch vs extensions: Postgres is a relational database from an old era. It can’t do “this”.
  10. 11.

    Relational databases can’t do “this” Postgres isn’t designed for “this”:

    1. Process semi-structured 2. Run geospatial workloads 3. Non-relational data storage 4. Scale out for large datasets
  11. 12.

    Postgres can’t do semi-structured data • NoSQL popularized the use

    of semi-structured data as an alternative to data models used in relational databases. In practice, each model has benefits. • Postgres has an extensible type system. It already supports semi-structured data types: 1. XML 2. Full-text search 3. Hstore: precursor to JSONB 4. JSON / JSONB
  12. 15.

    Postgres can do semi-structured data • PostgreSQL stores and processes

    semi-structured data just as efficiently as NoSQL databases. You also get rich features that come with a relational database. • http://goo.gl/NuoLgP (Mongo vs Postgres jsonb benchmarks) • If your semi-structured or unstructured data can’t be served by existing data types, you can always create your own type. You can even add operators, aggregate functions, or indexes.
  13. 16.

    Postgres can’t be a spatial database • A spatial database

    stores and queries data that represents objects defined in a geometric space. • Spatial databases represent geometric objects such as lines and polygons. Some databases handle complex structures such as 3D objects and topological coverages. from boundlessgeo.com
  14. 19.

    Postgres can become a spatial database • The PostGIS extension

    turns PostgreSQL into one of most popular geospatial databases in the world. • Thousands of companies use PostGIS for spatial workloads – from projects such as OpenStreetMap to start-ups like Hotel Tonight. • If you need more from your spatial database, you can easily extend Postgres. In fact, PostGIS comes with six other extensions for specific use cases.
  15. 20.

    Postgres can only do row storage • Postgres 9.1+ comes

    with foreign data wrapper APIs. With these APIs, you can add read from or write to any data source. • Postgres already has 106 wrappers. With these, you can run SQL commands on diverse data sources: 1. S3 (read-only) 2. MongoDB 3. Oracle 4. Cstore_fdw
  16. 21.

    CStore – Columnar storage • CStore is under development. For

    example, cstore doesn’t yet support Update / Delete commands. • Cstore’s primary benefit today is compression. People use it to reduce in- memory and storage footprint.
  17. 22.

    Block 1 Block 2 Block 3 Block 4 Block 5

    Block 6 Block 7 150K rows (configurable) 150K rows (configurable) 10K column values (configurable) per block ORC file format
  18. 24.

    Postgres can do more than row stores • Default storage

    engine for relational databases is row- oriented. But, Postgres can do way more than row stores. • You can extend Postgres to store data in a columnar format or interact with other databases – such as DynamoDB or Oracle. • Postgres provides extension apis to (1) scan foreign tables, (2) scan foreign joins, (3) update foreign tables, (4) lock rows, (5) sample data, (6) override planner and executor, and more.
  19. 25.

    Postgres doesn’t scale • “SQL doesn’t scale” answers a complex

    problem by making a simple statement. • SQL means different things to different people. Depending on the context, it could mean multi-tenant (B2B) databases, short read/writes, real-time analytics, or data warehousing. • Scaling each one of these workloads require extending the relational database in a different way.
  20. 26.

    Citus – Distributed database 1. Citus scales out PostgreSQL •

    Uses sharding and replication • Query engine parallelizes SQL queries across machines 2. Citus extends PostgreSQL • Uses Postgres extension APIs to cooperate with or extend all database modules 3. Available in 3 ways • Open source, enterprise software, and managed database as a service on AWS
  21. 28.

    Citus – Architecture diagram (simplified) Coordinator SELECT sum(…), count(…) FROM

    teams_1001 SELECT sum … FROM teams_1003 Worker node 1 Table metadata Table_1001 Table_1003 SELECT sum … FROM teams_1002 SELECT sum … FROM teams_1004 Worker node 2 Table_1002 Table_1004 Worker node N . . . . . . Each node Postgres with Citus installed 1 shard = 1 Postgres table SELECT avg(..) FROM teams;
  22. 29.

    Postgres can scale • “SQL doesn’t scale” is a simple

    statement to a complex problem. It’s easy to dismiss a complex problem by making a statement - that trivializes the problem. • SQL is hard, not impossible, to scale.
  23. 30.

    Summary • Postgres Extension APIs provide a unique way to

    build new databases. • Postgres can be extended to many different workloads 1. jsonb: Semi-structured data 2. PostGIS: Geospatial database 3. cstore_fdw: columnar storage (in works) 4. Citus: Scale out your database
  24. 31.

    Conclusion • Postgres 10 enables you to extend any database

    module’s behavior. This way, you can use functionality built into Postgres over decades. You can also grow with the rich ecosystem of tools and libraries. • Extensions are a game changer for databases. • The monolithic relational database could be dying. If so, long live Postgres!
  25. 32.

    © 2017 Citus Data. All right reserved. ozgun@citusdata.com @citusdata Ozgun

    Erdogan www.citusdata.com citusdata.com/ newsletter