$30 off During Our Annual Pro Sale. View Details »

SQL, Scaling, and What's Unique About PostgreSQL | XLDB Conference | Ozgun Erdogan

SQL, Scaling, and What's Unique About PostgreSQL | XLDB Conference | Ozgun Erdogan

Relational databases are known to have tightly integrated components that make them hard to extend. This usually leads to new databases that are specialized for scale and certain workloads.

For its last three releases, PostgreSQL has been providing official APIs to extend the database *without forking it*. Users can read the documentation, implement their ideas according to the APIs, and then dynamically load in their shared object to override or extend any database submodule.

In this talk, we will demonstrate four example APIs that make PostgreSQL a distributed database that focuses on real-time workloads:

1. PostGIS turns Postgres into a spatial database. It adds support for geographic objects, allowing location queries to be run in SQL. 2. JSON and JSONB enable storing and querying semi-structured data. 3. HyperLogLog & TopN add approximation algorithms to Postgres. These sketch algorithms are used in distributed systems when real-time responses to queries matter more than exact results. 4. Citus transforms Postgres into a distributed database. Citus transparently shards and replicates data, performs distributed deadlock detection, and parallelizes queries.

In summary, PostgreSQL's extensible architecture puts it in a unique place for scaling out SQL and also for adapting to evolving hardware trends. It could just be that the monolithic SQL database is dying. If so, long live Postgres!

Citus Data

May 01, 2018
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. SQL, Scaling, and What’s
    Unique About PostgreSQL
    Ozgun Erdogan
    Citus Data
    XLDB | May 2018

    View Slide

  2. I love Postgres
    2
    Ozgun Erdogan
    CTO of Citus Data
    Distributed Systems
    Distributed Databases
    Formerly of Amazon
    Love drinking margaritas

    View Slide

  3. 3

    View Slide

  4. Our mission at Citus Data
    4
    Make it so that your business
    never has to worry about
    scaling their database again

    View Slide

  5. Punch Line
    1. What is unique about PostgreSQL?
    • The extension APIs
    2. PostgreSQL extensions are a game
    changer for relational databases

    View Slide

  6. Talk Outline
    1. What is an extension?
    2. Why are extensions a game changer for databases?
    3. Postgres can’t do “this”
    • Semi-structured or unstructured data
    • Geospatial database
    • S3 or columnar storage for storage
    • Scale out
    4. Conclusion

    View Slide

  7. What is an Extension
    • An extension is a piece of software that adds
    functionality to Postgres. Each extension bundles
    related objects together.
    • Postgres 9.1 started providing official APIs to override
    or extend any database module’s behavior.
    • “CREATE EXTENSION citus;” dynamically loads these
    objects into Postgres’ address space.

    View Slide

  8. What can you Extend in Postgres?
    • You can override, cooperate with, or extend any
    combination of the following database modules:
    • Type system and operators
    • User defined functions and aggregates
    • Storage system and indexes
    • Write ahead logging and replication
    • Transaction engine
    • Background worker processes
    • Query planner and query executor
    • Configuration and database metadata

    View Slide

  9. Why are Extensions a game changer
    • Every decade brings new workloads for databases.
    • The last decade was about capturing more data, in
    more shapes and form.
    • Postgres has been forked by dozens of commercial
    databases for new workloads. When you fork, your
    database diverges from the community.
    • What if you could leverage the database ecosystem
    and grow with it?

    View Slide

  10. Extending a relational database: Really?
    Extending a relational database is a relatively new idea.
    Over the years, we received questions on this new idea.
    1. Forking vs extensions: Can you really extend any
    database module?
    2. Building from scratch vs extensions: Postgres is a
    relational database from an old era. It can’t do “this”.

    View Slide

  11. Relational databases can’t do “this”
    Postgres isn’t designed for “this”:
    1. Process semi-structured
    2. Run geospatial workloads
    3. Non-relational data storage
    4. Scale out for large datasets

    View Slide

  12. Postgres can’t do semi-structured data
    • NoSQL popularized the use of semi-structured data as
    an alternative to data models used in relational
    databases. In practice, each model has benefits.
    • Postgres has an extensible type system. It already
    supports semi-structured data types:
    1. XML
    2. Full-text search
    3. Hstore: precursor to JSONB
    4. JSON / JSONB

    View Slide

  13. JSONB data type – store and query
    from compose.com

    View Slide

  14. JSONB data type – aggregate and index

    View Slide

  15. Postgres can do semi-structured data
    • PostgreSQL stores and processes semi-structured data
    just as efficiently as NoSQL databases. You also get
    rich features that come with a relational database.
    • http://goo.gl/NuoLgP (Mongo vs Postgres jsonb benchmarks)
    • If your semi-structured or unstructured data can’t be
    served by existing data types, you can always create
    your own type. You can even add operators, aggregate
    functions, or indexes.

    View Slide

  16. Postgres can’t be a spatial database
    • A spatial database stores and
    queries data that represents
    objects defined in a geometric
    space.
    • Spatial databases represent
    geometric objects such as
    lines and polygons. Some
    databases handle complex
    structures such as 3D objects
    and topological coverages.
    from boundlessgeo.com

    View Slide

  17. PostGIS – Geographic objects

    View Slide

  18. PostGIS – Geospatial joins

    View Slide

  19. Postgres can become a spatial database
    • The PostGIS extension turns PostgreSQL into one of
    most popular geospatial databases in the world.
    • Thousands of companies use PostGIS for spatial
    workloads – from projects such as OpenStreetMap to
    start-ups like Hotel Tonight.
    • If you need more from your spatial database, you can
    easily extend Postgres. In fact, PostGIS comes with six
    other extensions for specific use cases.

    View Slide

  20. Postgres can only do row storage
    • Postgres 9.1+ comes with foreign data wrapper APIs.
    With these APIs, you can add read from or write to any
    data source.
    • Postgres already has 106 wrappers. With these, you
    can run SQL commands on diverse data sources:
    1. S3 (read-only)
    2. MongoDB
    3. Oracle
    4. Cstore_fdw

    View Slide

  21. CStore – Columnar storage
    • CStore is under
    development. For
    example, cstore
    doesn’t yet support
    Update / Delete
    commands.
    • Cstore’s primary
    benefit today is
    compression. People
    use it to reduce in-
    memory and storage
    footprint.

    View Slide

  22. Block 1
    Block 2
    Block 3
    Block 4
    Block 5
    Block 6
    Block 7
    150K rows
    (configurable)
    150K rows
    (configurable) 10K column values
    (configurable) per
    block
    ORC file format

    View Slide

  23. CStore – Data Load and Query

    View Slide

  24. Postgres can do more than row stores
    • Default storage engine for relational databases is row-
    oriented. But, Postgres can do way more than row stores.
    • You can extend Postgres to store data in a columnar
    format or interact with other databases – such as
    DynamoDB or Oracle.
    • Postgres provides extension apis to (1) scan foreign
    tables, (2) scan foreign joins, (3) update foreign tables, (4)
    lock rows, (5) sample data, (6) override planner and
    executor, and more.

    View Slide

  25. Postgres doesn’t scale
    • “SQL doesn’t scale” answers a complex problem by
    making a simple statement.
    • SQL means different things to different people.
    Depending on the context, it could mean multi-tenant
    (B2B) databases, short read/writes, real-time analytics,
    or data warehousing.
    • Scaling each one of these workloads require extending
    the relational database in a different way.

    View Slide

  26. Citus – Distributed database
    1. Citus scales out PostgreSQL
    • Uses sharding and replication
    • Query engine parallelizes SQL queries across machines
    2. Citus extends PostgreSQL
    • Uses Postgres extension APIs to cooperate with or extend all
    database modules
    3. Available in 3 ways
    • Open source, enterprise software, and managed database as a
    service on AWS

    View Slide

  27. Citus – Scaling out PostgreSQL

    View Slide

  28. Citus – Architecture diagram (simplified)
    Coordinator
    SELECT sum(…), count(…) FROM
    teams_1001
    SELECT sum … FROM teams_1003
    Worker node 1
    Table metadata
    Table_1001
    Table_1003
    SELECT sum … FROM teams_1002
    SELECT sum … FROM teams_1004
    Worker node 2
    Table_1002
    Table_1004
    Worker node N
    .
    .
    .
    .
    .
    .
    Each node Postgres with Citus installed
    1 shard = 1 Postgres table
    SELECT avg(..) FROM teams;

    View Slide

  29. Postgres can scale
    • “SQL doesn’t scale” is a
    simple statement to a
    complex problem. It’s easy
    to dismiss a complex
    problem by making a
    statement - that trivializes
    the problem.
    • SQL is hard, not
    impossible, to scale.

    View Slide

  30. Summary
    • Postgres Extension APIs provide a unique way to build
    new databases.
    • Postgres can be extended to many different workloads
    1. jsonb: Semi-structured data
    2. PostGIS: Geospatial database
    3. cstore_fdw: columnar storage (in works)
    4. Citus: Scale out your database

    View Slide

  31. Conclusion
    • Postgres 10 enables you to extend any database
    module’s behavior. This way, you can use functionality
    built into Postgres over decades. You can also grow
    with the rich ecosystem of tools and libraries.
    • Extensions are a game changer for databases.
    • The monolithic relational database could be dying. If
    so, long live Postgres!

    View Slide

  32. © 2017 Citus Data. All right reserved.
    [email protected]
    @citusdata
    Ozgun Erdogan
    www.citusdata.com
    citusdata.com/
    newsletter

    View Slide