


Semantic Image Search in Ruby: Postgres, Redis, or LLM? A CO₂-Conscious Comparison

Postgres, Redis, or LLM? Ask any engineer, and you already know the answer: it depends.
On what, exactly? Usually, there are four things: speed, cost, complexity, and quality. This talk adds a fifth column to that table and shows how to fill it in.
Three real Ruby implementations of semantic image search are run on the same dataset and machine, and compared on grams of CO₂e per query alongside the usual four axes.
The deck includes the measurement method, the formula, the Ruby snippets, and the numbers — but no recommendation.
Just one column you probably didn't have before.

Rubycon.it 2026 — Michele Franzin, SeeSaw.


Michele Franzin

May 08, 2026


Transcript

  1. Michele Franzin. Three engines. Same data. Same machine. Latency. Cents. Carbon. Postgres, Redis, or LLM? Semantic image search in Ruby.
  2. It depends. It always depends. Postgres, Redis, or LLM? …depends on what, exactly? Speed · Cost · Complexity · Quality · gCO₂e.
  3. Embeddings. "sunset beach" → [0.12, -0.34, 0.91, 0.07, ..., -0.22] ↑ 768 numbers. Distance is a similarity score, not metres.
  4. Why approximate? Brute force: O(N), compare with all. Tree-based: breaks in high dimensions, splits get messy. HNSW (graph): sub-linear, the standard. Both pgvector and Redis use HNSW.
  5. Common offline phase. Images: 7,000+. Embedding: 768-D vector. Caption + tags: text metadata. SigLIP 2 · Qwen3.5-9B · manifest.json. Two AI models. One shared file. The starting point of every engine.
  6. Three phases: Prep, Load, Query. Prep (analyze images): once. Load (push data to the engine): every restart. Query (answer the user): every request. manifest.json: built once, upfront, never per request. We measure all three. Query repeats — so it weighs more. Prep and Load are real costs, not free.
  7. First, the query becomes an array… "summer beach party" → [SigLIP 2] → [0.12, -0.34, 0.76, 0.33, …, 0.91]: 768 numbers, in the same space as the images. The engine never sees your text — only its embedding.
  8. pgvector: fashion client, production. Redis HNSW: travel photography platform, production. LLM (Qwen3.5-9B): internal experiment, baseline. Real code. Real clients. Real choice.
  9. pgvector • ACID • Your DBA stays • HNSW + cosine, in SQL. Durable relational baseline.
  10. SQL search: the cosine distance operator; the HNSW index does the heavy lifting.
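A sketch of what that looks like from Ruby. The `<=>` operator is pgvector's cosine distance; the table and column names here (`images`, `embedding`, `caption`) are hypothetical stand-ins, not the deck's actual schema:

```ruby
# Build the pgvector k-NN query. `embedding <=> $1` is pgvector's
# cosine distance; with an HNSW index created via
#   CREATE INDEX ON images USING hnsw (embedding vector_cosine_ops);
# the ORDER BY is served by the index instead of a full scan.
def knn_sql(limit: 10)
  <<~SQL
    SELECT id, caption, embedding <=> $1 AS distance
    FROM images
    ORDER BY embedding <=> $1
    LIMIT #{Integer(limit)}
  SQL
end
```

At runtime you would execute it with something like `conn.exec_params(knn_sql, [vector_literal])` via the `pg` gem, passing the query embedding as the bound parameter.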
  11. Redis HNSW • Sub-millisecond • Same algorithm family as pgvector • FLOAT32 binary. Low-latency in-memory baseline.
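The "FLOAT32 binary" point is worth making concrete: Redis vector fields take raw little-endian 32-bit float bytes, not a text representation. A minimal sketch of the conversion in plain Ruby:

```ruby
# Redis vector fields store raw FLOAT32 bytes, not text.
# "e*" packs each Ruby Float as a little-endian 32-bit float,
# so a 768-D embedding becomes a 3072-byte blob.
def to_redis_blob(embedding)
  embedding.pack("e*")
end

def from_redis_blob(blob)
  blob.unpack("e*")
end
```

The blob is what you would store in the hash field that the HNSW index is declared over; the exact index declaration depends on your Redis client and schema.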
  12. LLM (Qwen3.5-9B) • Same model used in Prep — now used at runtime • No vector index · no Load • O(N) — every query reads everything • ⚠ Not production-ready without a retrieval first stage. Reasoning-heavy baseline.
  13. The prompt + rank pipeline: SYSTEM PROMPT → USER PROMPT → RANK RESULTS. temperature: 0.0 — same query, same answer, every run.
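One way the prompt + rank request might be assembled, assuming LM Studio's OpenAI-compatible chat endpoint; the model id and the prompt wording here are illustrative, not the deck's actual prompts:

```ruby
require "json"

# Hypothetical prompt + rank payload for a local OpenAI-compatible
# endpoint. temperature: 0.0 makes the ranking deterministic:
# same query, same answer, every run.
def rank_payload(query, captions)
  numbered = captions.each_with_index.map { |c, i| "#{i}. #{c}" }.join("\n")
  {
    model: "qwen3.5-9b",  # assumption: whatever id LM Studio exposes
    temperature: 0.0,     # deterministic ranking
    messages: [
      { role: "system",
        content: "Rank the captions by relevance to the query. Reply with indices only." },
      { role: "user",
        content: "Query: #{query}\nCaptions:\n#{numbered}" }
    ]
  }
end
```

The pure-LLM engine would POST this once per query, which is exactly why it reads everything and scales O(N).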
  14. What to do in real-life production queries. STAGE 1, retrieval: vector search (HNSW), O(log N) · fast · cheap → ~50 candidates. STAGE 2, LLM rerank: reasoning over text, O(K) · slow · smart → top-K. We kept the LLM pure to isolate one cost. In production, you'd hybridize.
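The two-stage shape can be sketched in a few lines; `retrieve` and `rerank` are stand-ins for the real engines (HNSW search and the LLM call), not code from the deck:

```ruby
# Two-stage hybrid: cheap vector retrieval narrows the field so the
# expensive reranker only ever sees ~50 candidates instead of N.
def hybrid_search(query, retrieve:, rerank:, candidates: 50, top_k: 10)
  shortlist = retrieve.call(query, candidates)  # STAGE 1: HNSW, O(log N)
  rerank.call(query, shortlist).first(top_k)    # STAGE 2: LLM, O(K)
end
```

The point of the structure is that the O(N) cost disappears: the LLM's latency and carbon now scale with the candidate count K, not the dataset size.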
  15. build_search_result(title:, rows:) — pgvector/search.rb · redis/search.rb · llm/search.rb. Same signature. Same return shape. Different worlds inside.
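A shared contract like that might look as follows; the deck only shows the signature, so the return struct and its field names are assumptions:

```ruby
# The shared contract every engine implements: same keyword
# signature, same return shape. Only the internals differ.
SearchResult = Struct.new(:title, :rows, keyword_init: true)

def build_search_result(title:, rows:)
  SearchResult.new(title: title, rows: rows)
end
```

Keeping the three `search.rb` files behind one signature is what makes the five-axis comparison fair: the benchmark harness can swap engines without changing anything else.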
  16. 3 engines. How do we compare them? No shared method — yet. Speed · Cost · Complexity · Quality · gCO₂e.
  17. How to measure carbon footprint for real.
    🏭 Manufacturing (5+ inputs): Σ(component_embodied_CO₂e × quantity) / lifetime + transport + EOL.
    ⚡ Operational energy (7+ inputs): Σ(P_CPU + P_GPU + P_RAM + P_storage + P_network) × h × util.
    🌬 Datacenter overhead (2 inputs): total_energy × PUE (cooling, UPS, lighting, losses).
    🔌 Grid carbon intensity (3 inputs): Σ(energy_t × gCO₂_per_kWh_t) over time, region-specific.
    ✂ Allocation (4 inputs): your_share = your_resources / total_resources (CPU, RAM, ...).
    Following the ADEME PCR for Datacenter and Cloud services. Requires datacenter sensors, manufacturing certificates, grid data. We have none of that.
  18. Our way: SCI-lite — Software Carbon Intensity, operational only. A public standard from the Green Software Foundation. Reproducible by anyone with a power meter.
    Rigorous way vs SCI-lite: all five layers vs operational only · datacenter access vs power readings on the metal · absolute claims vs relative claims · measurement-grade vs decision-grade.
  19. The recipe: gCO₂e = (Watts × seconds × 215.9) / 3,600,000.
    • 🟰 Same machine for all three → systematic errors cancel.
    • 📊 powermetrics reads the SoC power model → consistent, not precise.
    • 🇮🇹 215.9 gCO₂/kWh = Italian grid average → declared assumption.
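The formula is simple enough to carry around as a one-liner. Watts × seconds gives joules; dividing by 3,600,000 (joules per kWh) converts to kWh, and the declared 215.9 gCO₂/kWh Italian grid average turns that into grams:

```ruby
ITALIAN_GRID_G_PER_KWH = 215.9   # declared assumption from the deck
JOULES_PER_KWH = 3_600_000.0     # 1 kWh = 3.6 million watt-seconds

# Grams of CO2e for a workload drawing `watts` for `seconds`.
def g_co2e(watts:, seconds:)
  watts * seconds * ITALIAN_GRID_G_PER_KWH / JOULES_PER_KWH
end
```

Sanity check: 1,000 W sustained for one hour is exactly 1 kWh, so it should come back as 215.9 g.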
  20. Measuring in Ruby, in-house. CLOCK_MONOTONIC measures durations only: it never jumps backwards on a clock sync.
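In Ruby that clock is exposed through `Process.clock_gettime`; a minimal timing helper in the spirit of the slide (the helper name is ours, not the deck's):

```ruby
# CLOCK_MONOTONIC only measures durations: it never jumps backwards
# when NTP adjusts the wall clock, so it is the safe choice for
# per-query latency measurements.
def measure
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end
```

Using `Time.now` here instead would occasionally produce negative or wildly wrong durations whenever the system clock is stepped mid-benchmark.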
  21. Setup. HARDWARE: Mac mini M4 Pro · 32 GB · bare-metal · macOS native. ENERGY PROBE: /sbin/powermetrics, 5 samples per second. MODELS ON THE METAL: SigLIP 2 · MLX adapter; Qwen3.5-9B · LM Studio · T = 0.0. PROTOCOL: warm-up · 60 s idle baseline · N queries → energy / N.
  22. Honest disclosures. powermetrics ≠ wall-plug: SoC estimates, valid for relative comparison. Apple Neural Engine sometimes reports 0 mW under real load — logged. Training cost of SigLIP 2 / Qwen3.5-9B excluded: not published, not auditable. Runtime overhead (Docker, LM Studio, MLX via Python): a few percent, consistent across engines.
  23. Axis 1: gCO₂e per query (bar chart over p · r · l): 75.39 / 0.90 / 0.87 mgCO₂e per query; as ratios, ×866.5 / ×1.2 / ×1.0. 200-image dataset · 50 queries · sequential · steady state.
  24. Axis 1: gCO₂e per phase (bar chart over Prep phase · Load phase · Add extra image, mgCO₂e, log scale). Prep is shared — same model, same images, same cost. Load and Query are where the engines diverge.
  25. Axis 2: Latency (bar chart over p · r · l): 169.7 / 0.5 / 0.5 seconds.
  26. Axis 3: Cost, a proxy of latency (bar chart over p · r · l): 1,385 / 4 / 3 € per 1000 queries, at a 0.50 €/h fixed rate.
  27. Synthesis (scores 1–5) for pgvector (P), Redis (R), LLM (L):
    Latency: P 5 · R 5 · L 1
    Cost: P 5 · R 5 · L 1
    Quality: P 3/4 · R 3/4 · L 5
    Adoption simplicity: P 5 · R 3 · L 1
    gCO₂e: P 5 · R 5 · L 1
  28. Thanks. Today: Postgres, Redis, or LLM? It depends — but on five axes now, not four. Why: carbon is a real cost, worth a number. Tomorrow: just start asking.