
Loupe - a Search Engine for PHP and SQLite


Yanick Witschi

Why would anyone be crazy enough to implement full-text search in PHP? This talk is the story of that “stupid idea” - and how it turned into Loupe.

I’ll start with the real-world problem that sparked the project: when existing search solutions felt too heavy or too expensive for my use case. From there, we’ll walk through the decision to build a search engine from scratch - what went wrong, what worked, and what I’ve learnt along the way.

Step by step, I’ll show how Loupe was built incrementally: from a tiny index, to fast queries, to surprisingly powerful features. We’ll dive into the heart of typo-tolerant search and unpack the algorithmic tricks that make it very fast, even on large datasets.

Whether you love search, PHP, algorithms - or bad ideas that somehow work - this talk is for you.


June 02, 2026


Transcript

  1. Loupe - a search engine with just PHP and SQLite
    Neos Conference 2026
    Yanick Witschi @toflar
  2. Today’s story
    • 2022
    • Customer in tourism sector
    • New requirement, typical POI lists
    • Accommodations
    • Hiking trails
    • Restaurants
  3. Requirements
    • Platform requirements:
      • PHP and SQLite
    • DX requirements:
      • Simple API (aka not like Elasticsearch) -> Meilisearch as role model
    • Functional requirements:
      • Filter logic (categories, price, geographic distance)
      • Sorting alphabetically, by geographic distance and, of course, by relevance
      • Typo tolerance for typical letter mix-ups
  4. Combination - Parser
    • Lexer / Parser
    • doctrine/lexer
    • Longer query at the end of the day, but SQLite is fast! ✅
  5. Quick wins
    • Store the length of the term (11 for «grindelwald»)
      length >= 9 AND length <= 13 ✅
    • First letter must match
      term LIKE 'g%' AND loupe_levenshtein(…) 😕🤷
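The length prefilter can be tried out in a few lines. The following is a minimal sketch in Python rather than Loupe's PHP code: the `terms` table, its columns, and the user-defined function `demo_levenshtein(term, query)` are assumptions made for illustration, since the slide elides the arguments of `loupe_levenshtein(…)`.

```python
import sqlite3


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, replace)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or replace
            ))
        previous = current
    return previous[-1]


def find_candidates(conn, query, max_typos=2):
    """Filter by stored length first, then check the real edit distance."""
    sql = """
        SELECT term FROM terms
        WHERE length >= ? AND length <= ?
          AND demo_levenshtein(term, ?) <= ?
        ORDER BY term
    """
    n = len(query)
    rows = conn.execute(sql, (n - max_typos, n + max_typos, query, max_typos))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical function name; registered so that SQL can call the Python code.
conn.create_function("demo_levenshtein", 2, levenshtein)
conn.execute("CREATE TABLE terms (term TEXT PRIMARY KEY, length INTEGER)")
for term in ["grindelwald", "grindelwaldblick", "interlaken", "zermatt"]:
    conn.execute("INSERT INTO terms VALUES (?, ?)", (term, len(term)))

print(find_candidates(conn, "grindlewald"))
```

With two allowed typos, an 11-letter query only needs to consider terms of length 9 to 13, so «zermatt» and «grindelwaldblick» are excluded by the length condition alone. «interlaken» passes the length check and is only rejected by the distance check, which is why further candidate reduction is worthwhile.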
  6. Big players
    • Burkhard-Keller tree (BK-tree)
    • Finite State Transducer (FST)
    • n-gram index
    • Phonetic algorithms
    • Keyboard-distance models
    • Statistical spell correction
    • Neural rerankers
    ⚡🧠 RAM
  7. SSI - Cleverness
    • Do not replace levenshtein()!
    • but reduce candidates to a manageable size first!
    • tolerate false positives
    • Computer science students?
    • Same idea as a Bloom filter
  8. SSI - Cleverness
    We just pretend that our alphabet consists of only 4 letters 🤓💡
  9. Levenshtein Operations
    • Match -> possible paths: 1
    • Replace -> possible paths: max. 3
    • Insert -> possible paths: 1
    • Delete -> possible paths: 4
    • That’s a maximum of only 9 paths to follow for every state
  10. Alphabet - Unicode
    Letter  Unicode Codepoint  Formula: (Codepoint % 4) + 1  Label
    h       104                (104 % 4) + 1                 1
    a       97                 (97 % 4) + 1                  2
    u       117                (117 % 4) + 1                 2
    s       115                (115 % 4) + 1                 4
    m       109                (109 % 4) + 1                 2
    n       110                (110 % 4) + 1                 3
    d       100                (100 % 4) + 1                 1
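The mapping from the table can be reproduced with a one-line function. This is a small Python illustration of the formula shown on the slide; the helper names `label` and `reduce_term` are made up for this sketch.

```python
def label(char: str) -> int:
    """Map a character to one of four labels: (codepoint % 4) + 1."""
    return (ord(char) % 4) + 1


def reduce_term(term: str) -> str:
    """Rewrite a term in the reduced 4-letter alphabet."""
    return "".join(str(label(char)) for char in term)


for char in "hausmnd":
    print(char, ord(char), label(char))

print(reduce_term("haus"))  # 1224
print(reduce_term("hams"))  # 1224 as well: different terms can collide
```

Collisions such as «haus» and «hams» are the source of the false positives mentioned earlier: the reduced form can only say that two terms might be similar, never that they are.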
  11. Query
    • for every letter in the query «grindlewald»
    • or just the first, e.g., 7 letters (false positives are expected anyway 💡)
    • Calculate the reachable states for all 4 Levenshtein operations
    • { 1291913, 5167653 } 🚀😎
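To make the candidate lookup more tangible, here is a deliberately simplified Python sketch. It does not reproduce Loupe's state encoding (the integers shown on the slide) and does not truncate to a prefix. Instead, as an assumption for illustration, it uses the reduced term itself as the index key, enumerates every reduced string reachable within the allowed number of edits, and confirms the surviving candidates with a real Levenshtein check.

```python
from collections import defaultdict

ALPHABET = "1234"  # the reduced 4-letter alphabet


def reduce_term(term):
    """Rewrite a term using the labels (codepoint % 4) + 1."""
    return "".join(str((ord(char) % 4) + 1) for char in term)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def neighbours(word):
    """All reduced strings at most one edit away from `word`."""
    result = set()
    for i in range(len(word) + 1):
        for letter in ALPHABET:
            result.add(word[:i] + letter + word[i:])  # insert
        if i < len(word):
            result.add(word[:i] + word[i + 1:])  # delete
            for letter in ALPHABET:
                result.add(word[:i] + letter + word[i + 1:])  # replace
    return result


def reachable(word, max_typos):
    """All reduced strings within `max_typos` edits of `word`."""
    seen = {word}
    frontier = {word}
    for _ in range(max_typos):
        frontier = {n for w in frontier for n in neighbours(w)} - seen
        seen |= frontier
    return seen


def build_index(terms):
    """Group the original terms by their reduced form."""
    index = defaultdict(set)
    for term in terms:
        index[reduce_term(term)].add(term)
    return index


def search(index, query, max_typos=2):
    candidates = set()
    for key in reachable(reduce_term(query), max_typos):
        candidates |= index.get(key, set())
    # The candidate set may contain false positives; levenshtein() decides.
    return sorted(t for t in candidates if levenshtein(t, query) <= max_typos)


index = build_index(["grindelwald", "interlaken", "zermatt", "haus", "lies", "maus"])
print(search(index, "grindlewald"))
print(search(index, "haus", max_typos=1))
```

Because every character is mapped independently, the edit distance between two reduced terms can never exceed the distance between the originals, so this filter produces no false negatives. «lies» shares the reduced form 1224 with «haus» and therefore shows up as a candidate, but the final Levenshtein check discards it.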
  12. Performance
    • Test file «movies.json» of Meilisearch (16.2 MB)
    • 32k movies with title and teaser
    • 75k «terms» for Loupe
    • «loupe.db» gets about 230 MB
    • Searching for «Amakin Dkywalker»
    • including relevance and facets calculation
    • ~ 30 ms and < 10 MB RAM (< 20 ms coming soon™) 🚀😎
  13. Loupe can do a lot more
    • Damerau-Levenshtein («Grindlewald» vs. «Grindelwald» is just one typo)
    • Highlighting
    • Phrase search («"Neos Conference"»)
    • Excluding queries using «-»
    • Stemming and language detection using n-grams for tokenization
    • Relevance calculation (number of term matches, number of typos, proximity of matches, attribute weighting, exactness)
    • Facets (statistics regarding the current search result)
    💪😎
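The «Grindlewald» example can be checked with a short Python sketch. It implements the restricted Damerau-Levenshtein variant (optimal string alignment), which counts swapping two adjacent letters as a single edit. Which variant Loupe actually uses is not stated in the slides, so treat the choice here as an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # match or replace
            )
            # Two adjacent letters swapped: count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("grindlewald", "grindelwald"))  # 1
```

Plain Levenshtein needs two replacements for the same pair, so a limit of one typo would miss this very common kind of mistake.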
  14. Future - compound words
    • Work towards version 1.0.0
    • Loupe is a prefix search (as are others, e.g. Meilisearch)
    • Searching for «brush» will thus not match «toothbrush»
    • Splitting «toothbrush» into «tooth» and «brush»
    • Decomposition is hard, especially in German
    • Donaudampfschiffahrtsgesellschaftskapitän
      (yes, it’s real: https://de.wikipedia.org/wiki/Donaudampfschiffahrtsgesellschaftskapit%C3%A4n)
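To show why decomposition is harder than it sounds, here is a naive dictionary-based splitter in Python. It is purely illustrative and not how Loupe does or will do it; the function name, the tiny word lists, and the minimum part length are assumptions.

```python
def split_compound(word, dictionary, min_part=3):
    """Split `word` into known dictionary words, or return None."""
    if word in dictionary:
        return [word]
    for i in range(min_part, len(word) - min_part + 1):
        head = word[:i]
        if head in dictionary:
            rest = split_compound(word[i:], dictionary, min_part)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("toothbrush", {"tooth", "brush"}))
print(split_compound("gesellschaftskapitän", {"gesellschaft", "kapitän"}))
```

The English example splits cleanly into «tooth» and «brush». The German one returns `None`, because the linking «s» between «gesellschaft» and «kapitän» belongs to neither word, which is one of the reasons a plain dictionary lookup is not enough.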
  15. Contact
    • Always up for hire!
    • Fediverse: phpc.social/@toflar
    • X: @toflar
    • GitHub: toflar
    • E-Mail: [email protected]
    • Slack