Searching 23,000 Photos with Modern VLMs: From Text to Image

─PyCon JP Image Search─

PyCon US 2026 LT / May 15 2026
CMS Communications Inc.
CEO Manabu TERADA

Manabu TERADA

May 16, 2026

Transcript

  1. Searching 23,000 Photos with Modern VLMs: From Text to Image ─PyCon JP Image Search─
    CMS Communications Inc., CEO Manabu TERADA
    PyCon US 2026 LT / May 15 2026
    copyright © 2026 CMS Communications Inc. all rights reserved.
  2. Self Introduction
    寺田 学 (Manabu TERADA)
    • Self-proclaimed Vector Researcher
    • Podcast in Japanese: "terapyon channel" (https://podcast.terapyon.net)
    • Related Publications (Co-author/Supervision) in Japanese
      ◦ Pythonデータ分析 実践ハンドブック (Impress, September 2023)
      ◦ Pythonによるあたらしいデータ分析の教科書 第2版 (Shoeisha, October 2022)
      ◦ Python実践レシピ (Gijutsu-Hyoron, January 2022)
      ◦ スラスラわかるPython 第2版 (Shoeisha, November 2021)
      ◦ 機械学習図鑑 (Shoeisha, April 2019)
  3. Titles
    寺田 学 (Manabu TERADA)
    • CEO of CMS Communications Inc.
    • Python Asia Organization Founder and Board member
    • PSF Fellow
    • Former Board Member of PyCon JP Association
  4. Our Services
    Python technical assistance consulting services
  5. PyCon JP and Archive Photos
    • History of PyCon JP
    • Photo Archive
    • How We Use These Photos and the Motivation
  6. History of PyCon JP
    Since 2011
    • We have held 15 events from 2011 to 2025.
    • PyCon JP 2026 will be held Aug 21-23 in Hiroshima
      ◦ CfP opened
  7. Photo Archive
    • Photo Archive: All past photos are published under the CC-BY license on Flickr.
      ◦ https://www.flickr.com/photos/pyconjp/
    • Volume: Approximately 23,000 photos.
  8. How We Use These Photos and the Motivation
    • Using past photos for blogs, websites, and promotional brochures.
    • Serving as a reference for organizing team members when planning future events.
    • Can we …
      ◦ search for photos using natural language?
      ◦ find visually similar photos?
      ◦ identify the same person across different years using face recognition?
  9. Technical Solution
    • Models
      ◦ VLM (Vision Language Model)
      ◦ Face Recognition
    • The Implementation Workflow
  10. Models
    VLM (Vision Language Model)
    • Used Google SigLIP 2 for natural language and image-to-image search.
    Face Recognition
    • Used InsightFace (ArcFace) to enable high-accuracy person matching.
  11. The Implementation Workflow – Build
    • Downloaded all photos from Flickr to a local environment.
    • Generated SigLIP 2 embeddings for all 23,000 images.
    • Detected faces and generated face embeddings for roughly 70,000 detected faces.
    • Uploaded these vectors along with metadata to Firestore.
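
The build steps above boil down to producing one record per photo: a vector plus its metadata. The following is a minimal sketch of that shape only, not the repository's code. It assumes a hypothetical `stub_embed` function standing in for the real SigLIP 2 image encoder, and the record field names (`photo_id`, `year`, `url`, `embedding`) and example URLs are illustrative assumptions. Vectors are L2-normalized so that a later dot product equals cosine similarity.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def stub_embed(photo_id, dim=8):
    # Hypothetical stand-in for the SigLIP 2 image encoder: returns a
    # deterministic pseudo-random vector seeded by the photo ID.
    rng = random.Random(photo_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def build_record(photo_id, year, url):
    # Field names are illustrative assumptions, not the repository's schema.
    return {
        "photo_id": photo_id,
        "year": year,
        "url": url,
        "embedding": l2_normalize(stub_embed(photo_id)),
    }


# Two toy records; a real build would loop over every downloaded photo.
records = [
    build_record("photo-0001", 2011, "https://example.com/photo-0001.jpg"),
    build_record("photo-0002", 2025, "https://example.com/photo-0002.jpg"),
]
```

In a real pipeline each record would then be written to the document store; that upload step is omitted here because the slides do not describe the storage layout.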
  12. The Implementation Workflow – Search
    • Frontend: Built a React app that runs the embedding model locally (in-browser) to convert search queries or images into vectors.
    • Search: Performs an Approximate Nearest Neighbor (ANN) search of the query vector against the stored vectors.
      ◦ Note: I used some specialized techniques from my own research to keep this within the Firestore free tier. Explaining this would take 30 minutes, so I’ll skip the details today!
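
The talk skips the details of its ANN technique, so the sketch below does not attempt to reproduce it. Instead it shows the exact, brute-force cosine-similarity search that any ANN method approximates, which may help make the "query vector vs. stored vectors" step concrete. The record layout, the 3-dimensional toy vectors, and the function names are assumptions made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, records, k=3):
    # Exact search: score every stored vector, keep the k best.
    # An ANN index trades a little accuracy to avoid this full scan.
    scored = (
        (cosine_similarity(query, rec["embedding"]), rec["photo_id"])
        for rec in records
    )
    return heapq.nlargest(k, scored)


# Toy stored vectors (illustrative only).
stored = [
    {"photo_id": "a", "embedding": [1.0, 0.0, 0.0]},
    {"photo_id": "b", "embedding": [0.0, 1.0, 0.0]},
    {"photo_id": "c", "embedding": [0.9, 0.1, 0.0]},
]

# In the real app the query vector would come from the text or image encoder.
results = top_k([1.0, 0.0, 0.0], stored, k=2)
```

With these toy vectors the two best matches are `a` (identical direction) and then `c` (nearly the same direction). Because a text query and an image are embedded into the same space, the same function would serve both text-to-image and image-to-image search.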
  13. Hosting & Privacy
    Hosting
    • Firebase free tier infrastructure
    Privacy
    • For use by PyCon JP members only
      ◦ The "Scary" Reality: Although these photos are CC-BY and public, the face recognition accuracy was so high it felt a bit overwhelming, so we decided not to make the face search public.
  14. Code repository
    • Publicly available as a reference implementation
    • https://github.com/pyconjp/pyconjp-image-search