Searching 23,000 Photos with Modern VLMs: From Text to Image

copyright © 2026 CMS Communications Inc. all rights reserved.

PyCon JP Image Search
CMS Communications Inc. CEO Manabu TERADA
PyCon US 2026 LT / May 15 2026

Self Introduction
寺田学 (Manabu TERADA)
• Self-proclaimed Vector Researcher
• Podcast in Japanese: "terapyon channel" (https://podcast.terapyon.net)
• Related publications in Japanese (co-author/supervision)
  ◦ Pythonデータ分析実践ハンドブック (Python Data Analysis Practical Handbook; Impress, September 2023)
  ◦ Pythonによるあたらしいデータ分析の教科書 第2版 (The New Python Data Analysis Textbook, 2nd ed.; Shoeisha, October 2022)
  ◦ Python実践レシピ (Python Practical Recipes; Gijutsu-Hyoron, January 2022)
  ◦ スラスラわかるPython 第2版 (Python Made Easy, 2nd ed.; Shoeisha, November 2021)
  ◦ 機械学習図鑑 (The Illustrated Guide to Machine Learning; Shoeisha, April 2019)

Titles
寺田学 (Manabu TERADA)
• CEO of CMS Communications Inc.
• Python Asia Organization Founder and Board Member
• PSF Fellow
• Former Board Member of PyCon JP Association

Our Services
• Python technical assistance and consulting services

PyCon JP and Archive Photos
• History of PyCon JP
• Photo Archive
• How We Use These Photos and the Motivation

History of PyCon JP
Since 2011
• We have held 15 events from 2011 to 2025.
• PyCon JP 2026 will be held August 21-23 in Hiroshima.
  ◦ The CfP has opened.

Photo Archive
• Photo Archive: All past photos are published on Flickr under the CC-BY license.
  ◦ https://www.flickr.com/photos/pyconjp/
• Volume: Approximately 23,000 photos.

How We Use These Photos and the Motivation
• We use past photos for blogs, websites, and promotional brochures.
• They serve as a reference for organizing team members when planning future events.
• Can we …
  ◦ search for photos using natural language?
  ◦ find visually similar photos?
  ◦ identify the same person across different years using face recognition?

Technical Solution
• Models
  ◦ VLM (Vision Language Model)
  ◦ Face Recognition
• The Implementation Workflow

Models
VLM (Vision Language Model)
• Used Google SigLIP 2 for natural language and image-to-image search.
Face Recognition
• Used InsightFace (ArcFace) to enable high-accuracy person matching.
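Both models turn their input into a fixed-length vector, so matching comes down to comparing vectors. The sketch below illustrates that comparison with made-up 4-dimensional vectors standing in for real SigLIP 2 or ArcFace embeddings. The `same_person` helper and its 0.5 threshold are assumptions for illustration only, not values from the talk.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(face_a, face_b, threshold=0.5):
    """Hypothetical rule: call two face embeddings a match above a threshold."""
    return cosine_similarity(face_a, face_b) >= threshold

# Toy vectors standing in for real embeddings (assumed values, not model output).
face_2019 = [0.9, 0.1, 0.3, 0.2]
face_2024 = [0.8, 0.2, 0.4, 0.1]
stranger = [-0.2, 0.9, -0.5, 0.1]

print(same_person(face_2019, face_2024))
print(same_person(face_2019, stranger))
```

The same similarity function would apply to text-to-image search as well, because a VLM of this kind places text and images in one shared vector space.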

The Implementation Workflow – Build
• Downloaded all photos from Flickr to a local environment.
• Generated SigLIP 2 embeddings for all 23,000 images.
• Detected roughly 70,000 faces and generated an embedding for each.
• Uploaded these vectors, along with metadata, to Firestore.
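One way the build output could be organized before upload is sketched below. It is not the speaker's actual schema: the `VectorRecord` layout, the `year` metadata field, and the batch size are all hypothetical, and the toy 2-dimensional vectors stand in for real model output. The sketch shows one flat record per image and per detected face, with each vector scaled to unit length.

```python
import math
from dataclasses import dataclass, asdict

@dataclass
class VectorRecord:
    """Hypothetical document layout: one record per image or per detected face."""
    kind: str        # "image" or "face"
    photo_id: str    # which photo the vector came from
    year: int        # assumed metadata field: event year
    embedding: list  # unit-length vector

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_records(photo_id, year, image_vec, face_vecs):
    """Return one image record plus one record per detected face."""
    records = [VectorRecord("image", photo_id, year, l2_normalize(image_vec))]
    for face_vec in face_vecs:
        records.append(VectorRecord("face", photo_id, year, l2_normalize(face_vec)))
    return records

def chunked(items, size):
    """Yield successive fixed-size batches, e.g. for batched uploads."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Toy input: three photos, each with two detected faces (assumed values).
docs = []
for i in range(3):
    records = build_records(f"photo-{i}", 2023, [3.0, 4.0], [[0.0, 2.0], [1.0, 0.0]])
    docs.extend(asdict(r) for r in records)

batches = list(chunked(docs, size=4))
print(len(docs), len(batches))
```

Normalizing at build time means the search side only needs a dot product, which keeps the per-query work small.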

The Implementation Workflow – Search
• Frontend: Built a React app that runs the embedding model locally (in-browser) to convert search queries or images into vectors.
• Search: Performs an Approximate Nearest Neighbor (ANN) search between the query vector and the stored vectors.
  ◦ Note: I used some specialized techniques from my own research to keep this within the Firestore free tier. Explaining them would take 30 minutes, so I'll skip the details today!
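The talk does not describe its ANN techniques, so the sketch below shows only the exact, brute-force search that ANN methods approximate: score every stored vector against the query and keep the best k. The file names and 3-dimensional vectors are made up for illustration, and `top_k` is a hypothetical helper, not part of the speaker's code.

```python
import heapq

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, index, k=3):
    """Exact nearest-neighbour search: score every stored vector, keep the k best.

    `index` maps a photo id to a unit-length embedding.
    """
    scored = ((dot(query, vec), photo_id) for photo_id, vec in index.items())
    return [(photo_id, score) for score, photo_id in heapq.nlargest(k, scored)]

# Toy index with assumed unit-length vectors.
index = {
    "keynote.jpg": [1.0, 0.0, 0.0],
    "lunch.jpg": [0.0, 1.0, 0.0],
    "sprint.jpg": [0.6, 0.0, 0.8],
}
query = [0.8, 0.0, 0.6]  # stand-in for an embedded text query or query image

results = top_k(query, index, k=2)
print(results)
```

Brute force reads every stored vector on every query, which is the cost an ANN index is meant to avoid. On a metered document store, each extra read also counts against the quota.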

Hosting & Privacy
Hosting
• Firebase free-tier infrastructure
Privacy
• For use by PyCon JP members only
  ◦ The "Scary" Reality: Although these photos are CC-BY and public, the face recognition accuracy was so high that it felt a bit overwhelming, so we decided not to make the face search public.
