寺田 学 (Manabu TERADA) • CEO of CMS Communications Inc. • Founder and Board Member of Python Asia Organization • PSF Fellow • Former Board Member of PyCon JP Association
Archive • Photo Archive: All past photos are published under the CC-BY license on Flickr. ◦ https://www.flickr.com/photos/pyconjp/ • Volume: Approximately 23,000 photos.
How We Use These Photos, and the Motivation • Using past photos for blogs, websites, and promotional brochures. • Serving as a reference for organizing-team members when planning future events. • Can we … ◦ search for photos using natural language? ◦ find visually similar photos? ◦ identify the same person across different years using face recognition?
VLM (Vision Language Model) • Used Google SigLIP 2 for both natural-language (text-to-image) search and image-to-image search. Face Recognition • Used InsightFace (ArcFace) to enable high-accuracy person matching.
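Both kinds of search come down to comparing embedding vectors. SigLIP 2 maps text and images into a shared space, so a text query and a photo can be scored against each other directly; ArcFace-style face embeddings are typically compared the same way, with a cut-off deciding whether two faces belong to the same person. The sketch below assumes cosine similarity as the metric and uses toy 3-dimensional vectors in place of real model output; `match_faces`, the face-ID format, and the 0.5 threshold are hypothetical illustrations, not values from this project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of the same dimension."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Hypothetical cut-off: a real value would have to be tuned on labelled face pairs.
SAME_PERSON_THRESHOLD = 0.5


def match_faces(
    query: Sequence[float],
    faces: Dict[str, Sequence[float]],
    threshold: float = SAME_PERSON_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Return (face_id, score) for every stored face at or above the threshold, best first."""
    scored = [(face_id, cosine_similarity(query, emb)) for face_id, emb in faces.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors stand in for real face embeddings; IDs are hypothetical "<year>/<photo>#<face>".
faces = {
    "2019/0001#0": [1.0, 0.1, 0.0],
    "2023/0042#1": [0.9, 0.2, 0.1],
    "2023/0050#0": [0.0, 1.0, 0.0],
}
print(match_faces([1.0, 0.0, 0.0], faces))
```

The same ranking function works for SigLIP 2 search: score the query vector (from text or from an image) against every photo vector and sort, without a threshold.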
Implementation Workflow – Build • Downloaded all photos from Flickr to a local environment. • Generated SigLIP 2 embeddings for all 23,000 images. • Detected roughly 70,000 faces and generated an embedding for each one. • Uploaded these vectors, along with metadata, to Firestore.
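The build step produces, for each photo, one image embedding plus zero or more face embeddings, and those vectors travel to Firestore together with metadata. A minimal sketch of that packaging, assuming a per-photo record with the faces nested inside it: `build_records`, `PhotoRecord`, `FaceRecord` and their fields are hypothetical, the two stand-in functions replace the real SigLIP 2 and InsightFace calls, and the actual Firestore upload is omitted.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable, Iterable, List, Sequence, Tuple

# (bbox, embedding) pairs, as a face detector might return them (assumed shape).
Detection = Tuple[Sequence[float], Sequence[float]]


@dataclass
class FaceRecord:
    face_id: str            # hypothetical scheme: "<photo_id>#<index>"
    bbox: List[float]       # [x1, y1, x2, y2] in pixels
    embedding: List[float]  # face-recognition embedding


@dataclass
class PhotoRecord:
    photo_id: str
    year: int
    image_embedding: List[float]  # image embedding used for text and image search
    faces: List[FaceRecord] = field(default_factory=list)


def build_records(
    photos: Iterable[Tuple[str, int]],
    embed_image: Callable[[str], Sequence[float]],
    detect_faces: Callable[[str], List[Detection]],
) -> List[dict]:
    """Turn (photo_id, year) pairs into plain dicts ready for a document store."""
    records = []
    for photo_id, year in photos:
        faces = [
            FaceRecord(f"{photo_id}#{i}", list(bbox), list(emb))
            for i, (bbox, emb) in enumerate(detect_faces(photo_id))
        ]
        record = PhotoRecord(photo_id, year, list(embed_image(photo_id)), faces)
        records.append(asdict(record))  # asdict also converts the nested FaceRecords
    return records


# Stand-ins for the real models: a fixed vector, one face in photo "a", none in "b".
def fake_embed(photo_id: str) -> List[float]:
    return [0.1, 0.2, 0.3]


def fake_detect(photo_id: str) -> List[Detection]:
    return [([0.0, 0.0, 10.0, 10.0], [1.0, 0.0])] if photo_id == "a" else []


records = build_records([("a", 2019), ("b", 2023)], fake_embed, fake_detect)
print(sum(len(r["faces"]) for r in records))
```

Nesting faces inside the photo record keeps one document per photo; storing each face as its own document would instead keep face-level lookups independent of how many faces a photo has. Firestore caps a single document at 1 MiB, which matters when many vectors share one document. The slides do not say which layout the project used.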
Implementation Workflow – Search • Frontend: Built a React app that runs the embedding model locally (in-browser) to convert search queries or images into vectors. • Search: Performs an Approximate Nearest Neighbor (ANN) search between the query vector and stored vectors. ◦ Note: I used some specialized techniques from my own research to keep this within the Firestore free tier. Explaining this would take 30 minutes, so I’ll skip the details today!
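The slides deliberately leave out how ANN search was made to fit the Firestore free tier, so the following is not the author's method. It swaps in random-hyperplane locality-sensitive hashing (SimHash), a standard ANN technique for cosine similarity, as an in-memory sketch: each vector gets a short bit signature, a query only looks at matching or nearly matching buckets, and the few candidates found there are re-ranked exactly. `HyperplaneLSH`, `n_bits`, and the one-bit probing strategy are illustrative choices.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact cosine similarity, used to re-rank LSH candidates."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class HyperplaneLSH:
    """Approximate cosine nearest-neighbour index using random hyperplanes (SimHash)."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # Each signature bit records which side of one random hyperplane a vector lies on.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[int, List[str]] = defaultdict(list)
        self.vectors: Dict[str, List[float]] = {}

    def _signature(self, vec: Sequence[float]) -> int:
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        sig = 0
        for plane in self.planes:
            side = sum(p * v for p, v in zip(plane, vec)) >= 0.0
            sig = (sig << 1) | int(side)
        return sig

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        sig = self._signature(vec)  # validate before storing anything
        self.vectors[item_id] = list(vec)
        self.buckets[sig].append(item_id)

    def query(self, vec: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        sig = self._signature(vec)
        # Probe the query's own bucket plus every bucket one bit away, to improve recall.
        probes = [sig] + [sig ^ (1 << b) for b in range(len(self.planes))]
        candidates = {item for s in probes for item in self.buckets.get(s, [])}
        # Re-rank only the candidates with exact cosine similarity.
        scored = [(item, cosine(vec, self.vectors[item])) for item in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


rng = random.Random(42)
index = HyperplaneLSH(dim=16, n_bits=6)
for i in range(500):
    index.add(f"photo-{i}", [rng.gauss(0.0, 1.0) for _ in range(16)])
print(index.query(index.vectors["photo-7"], k=3))
```

Fewer bits mean larger buckets, which raises recall but increases re-ranking work; more bits do the opposite. Because a signature is a plain integer, a scheme like this could in principle be served by equality lookups in a document store, but whether the project does anything similar is not stated.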
Hosting & Privacy Hosting • Firebase free-tier infrastructure Privacy • Restricted to PyCon JP members only ◦ The "Scary" Reality: Although these photos are public under CC-BY, the face recognition was so accurate that it felt a bit overwhelming, so we decided not to make face search public.