Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How pgvector extends PostgreSQL

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.

How pgvector extends PostgreSQL

Avatar for Thiago Colares

Thiago Colares

November 11, 2025

More Decks by Thiago Colares

Other Decks in Research

Transcript

  1. How pgvector extends PostgreSQL Thiago Colares (thicolares.com) – Nov 11,

    2025 AtyimoLAB: Big Data Science Research, PGCOMP UFBA
  2. pgvector overview 3 ▣ Open-source vector similarity search for PG

    ▣ Find by characteristics, not exact matches ▣ Handle vectors alongside regular data ▣ Exact and approximate nearest neighbor search ▣ Various vector distance functions https://github.com/pgvector/pgvector
  3. PostgreSQL with pgvector 4 Get images, text, audio… Transform into

    embedding Save embedding in the database Embedding Model (OpenAI, Gemini, In-house, Open-source ones etc.) App Get query Search embedding in the database App 🏞📄 🎶 [0.5,0.3,...,0.1] [0.6,0.2,...,0.9] "flying whales" [0.5,0.3,...,0.1] [0.6,0.2,...,0.9] [0.4,0.3,...,0.1]
  4. SQL-visible type The vector data type Internal representation 5 CREATE

    TABLE document_embeddings ( id integer not null primary key, embedding vector(1536) not null ); typedef struct Vector { int32 vl_len_; int16 dim; int16 unused; float x[FLEXIBLE_ARRAY_MEMBER]; } Vector;
  5. Search with a distance function 6 https://github.com/pgvector/pgvector?tab=readme-ov-file#querying Search query example

    Supported distance functions <-> L2 distance (euclidian) <#> (negative) inner product <=> cosine distance <+> L1 distance <~> Hamming distance (bin. vectors) <%> Jaccard distance (bin. vectors) SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
  6. Minimal directory structure (C-based) 11 https://www.postgresql.org/docs/current/extend-pgxs.html. Rust: https://crates.io/crates/cargo-pgx myextension/ ├─

    Makefile ├─ myextension.control ├─ sql/ │ └─ myextension--1.0.sql ├─ src/ │ └─ myextension.c ← how to compile the C code ← metadata about the extension ← create functions, types... ← implementation of C functions
  7. ▣ The official build infrastructure for extensions ▣ Makefile definitions

    provided by PostgreSQL ▣ What to build / Where to install extension files ▣ PostgreSQL handles paths/config ▣ Else, hardcode set install paths: Build with PGXS (Extension System) 12 https://www.postgresql.org/docs/current/extend-pgxs.html install: cp my_ext.so /usr/lib/postgresql/16/lib/ cp my_ext.control /usr/share/postgresql/16/extension/
  8. Makefile with PGXS (key lines) 13 https://www.postgresql.org/docs/current/extend-pgxs.html EXTENSION = vector

    MODULE_big = vector PG_CONFIG ?= pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) include $(PGXS) # name; vector.control file; dest. prefix/share/vector: # build a single shared library named vector (e.g. vector.so): # get PostgreSQL’s inc. dirs, compiler flags, and inst. paths: # get the path to PostgreSQL's PGXS Makefile fragment: # loads the PGXS build system:
  9. Distribution ▣ PostgreSQL has no official registry (like PyPI, npm)

    ▣ PGNX (PostgreSQL Extension Network) ▣ Source with PGXS Makefile (Github etc.) ▣ Binary packages (apt-get, homebrew, pgxman…) 14 https://pgxn.org/ https://pgxman.com/
  10. Defining type 16 https://www.postgresql.org/docs/18/sql-createtype.html -- sql/vector.sql CREATE TYPE vector (

    INPUT = vector_in, -- text → internal OUTPUT = vector_out, -- internal → text TYPMOD_IN = vector_typmod_in, -- check params RECEIVE = vector_recv, -- bin → internal SEND = vector_send, -- internal → bin STORAGE = external );
  11. Defining operator 17 https://www.postgresql.org/docs/18/sql-createoperator.html -- sql/vector.sql CREATE OPERATOR <-> (

    LEFTARG = vector, -- left operand data type RIGHTARG = vector, -- right operand data type PROCEDURE = l2_distance, COMMUTATOR = '<->' );
  12. Defining functions (without index) 18 https://www.postgresql.org/docs/18/sql-createoperator.html -- sql/vector.sql CREATE FUNCTION

    l2_distance(vector, vector) RETURNS float8 AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE; // C implementation FUNCTION_PREFIX PG_FUNCTION_INFO_V1(l2_distance); Datum l2_distance(PG_FUNCTION_ARGS) { ...
  13. Defining functions (with index) 19 https://www.postgresql.org/docs/18/sql-createopclass.html -- sql/vector.sql CREATE OPERATOR

    CLASS vector_l2_ops FOR TYPE vector USING hnsw AS FUNCTION 1 vector_l2_squared_distance(vector, vector); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ...
  14. At extension installation time ▣ You run CREATE EXTENSION vector

    ▣ PosgtreSQL runs vector.sql ▣ CREATE TYPE vector… triggers INSERT INTO pg_type… ▣ CREATE FUNCTION… triggers INSERT INTO pg_proc… ▣ CREATE OPERATOR CLASS… triggers INSERT INTO pg_opclass… ▣ And so on… ▣ Metadata stored, but no C code executed yet! 21
  15. At index-creation time ▣ Records that this index uses hnsw

    + vector_l2_ops ▣ Stores metadata in pg_index and pg_class 22 CREATE INDEX idx ON table USING hnsw (embedding vector_l2_ops); ^ ^^^^^^^^^^^^^ | operator class column
  16. At query execution time (at runtime) ▣ ▣ Planner: Ah!

    There’s an HNSW index! ▣ Executor: Which operator class? → vector_l2_ops ▣ What is FUNCTION 1 of vector_l2_ops? → vector_l2_squared_distance □ Looks up pg_proc → probin="/path/vector.so", prosrc="..." etc. □ Loads the C function ▣ Perform HNSW and L2 calls the function during search 23 SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;
  17. pgvector 25 ▣ Open-source vector similarity search for Postgres ▣

    Handle vectors alongside regular data ▣ Exact and approximate nearest neighbor search ▣ Various vector distance functions ▣ Single-precision, half-precision, bin., sparse vectors ▣ PG features: ACID, JOINs, point-in-time recover…
  18. Building PostgreSQL extensions 26 ▣ Handful official Makefile boilerplate (PGXS)

    ▣ Define with types, functions, operators with SQL ▣ C, SQL, PL languages (limited), Rust (via pgx) Flow: ▣ Installation: store metadata ▣ Runtime: context → metadata → find/run implementation https://www.postgresql.org/docs/current/extend-pgxs.html https://www.postgresql.org/docs/18/extend.html https://docs.rs/pgx/latest/pgx/
  19. References 27 ▣ GitHub - pgvector/pgvector: Open-source vector similarity search

    for Postgres ▣ Exact vs Approximate Nearest Neighbors in Vector Databases ▣ https://github.com/thicolares/pgvector_experiment ▣ PostgreSQL: Documentation: 18: 36.18. Extension Building Infrastructure ▣ PostgreSQL: Documentation: 18: CREATE TYPE ▣ PostgreSQL: Documentation: 18: CREATE OPERATOR CLASS ▣ cargo-pgx - crates.io: Rust Package Registry ▣ Integrated Vector Database - Azure Cosmos DB | Microsoft Learn ▣ PGXN
  20. Thanks! Any questions? Thiago Colares (thicolares.com) – Nov 11, 2025

    AtyimoLAB: Big Data Science Research, PGCOMP UFBA 28