Slide 1

Slide 1 text

Sorting Improvements in PostgreSQL 9.5 and 9.6 Peter Geoghegan — Heroku

Slide 2

Slide 2 text

PostgreSQL 9.5 • Sorting is important — CREATE INDEX, ORDER BY, DISTINCT(), CLUSTER, etc. • “Abbreviated keys” optimization added. • For text. • For numeric. • We sort an array of SortTuple structs. • Stores first field, and pointer to “tuple proper”. • e.g. For CREATE INDEX, tuple proper is IndexTuple.

Slide 3

Slide 3 text

PostgreSQL 9.5 • First field is stored inline is SortTuple, and may be pass-by- value (the value itself, for types like integer), or pass-by- reference, for arbitrary sized types like jsonb. • Before, text was always simply treated as pass-by- reference. Pointer chasing required for every comparison. • However, with new abbreviated keys, a special pass-by- value prefix is stored rather than a pointer. Usually sufficient to resolve comparison without pointer chasing. • Basically, abbreviated value is first 8 bytes of each string being sorted.

Slide 4

Slide 4 text

PostgreSQL 9.5 • Text also benefits from cheap memcmp() pre- check. Abbreviated key tie-breaker can use this for low-cardinality sets, and we still win big. • Tie-breaker must consult tuple proper. • For numeric, special encoding scheme usually makes comparisons simple integer comparisons.

Slide 5

Slide 5 text

PostgreSQL 9.6 • Recently, started looking at external sorting. • Revising tuplesort to use simple hybrid sort-merge strategy. • Runs are quicksorted. Limited remaining use of heap data structure. • Various other optimizations that target memory bandwidth/ latency bottleneck. • Further big improvements in performance. 2x - 3.25x, even with a small fraction of the memory required to do everything in memory.