

Building a semantic search experience using PHP and Meilisearch

Loulier Guillaume

November 15, 2024


Transcript

  1. Hellcome
    Guillaume Loulier / @Guikingone
    Break things @SensioLabs
    Work mainly with PHP and Rust, a bit of Zig (thanks to ZML)
    Publish a weekly newsletter about cloud computing, machine learning, PHP and more on Substack
  2. Summary
    01 - What do you mean?
    02 - Are computers capable of understanding?
    03 - What about a demo?
    04 - Any advice?
  3. Searching is hard
    Until recently, search experiences / engines were built on top of keyword search
    Keyword search is based on sentence occurrences, hits and a bit of luck
    What if a sentence contains the word we’re searching for many times? Is it relevant?
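
The question on this slide can be made concrete with a minimal Python sketch. It assumes the crudest possible keyword scoring, a raw count of query-term occurrences; real engines use more refined ranking, and `keyword_score` and the sample documents are hypothetical names invented for illustration.

```python
import re
from collections import Counter


def keyword_score(query: str, document: str) -> int:
    """Naive keyword score: how many times the query terms occur in the document."""
    terms = re.findall(r"\w+", query.lower())
    counts = Counter(re.findall(r"\w+", document.lower()))
    return sum(counts[term] for term in terms)


# Hypothetical documents: one useful, one that only repeats the keyword.
documents = [
    "A guide to training your dog to fetch a ball.",
    "dog dog dog dog dog",
]

# Rank by raw occurrences, highest score first.
ranked = sorted(documents, key=lambda d: keyword_score("dog", d), reverse=True)
print(ranked[0])
```

Under this scoring, the document that merely repeats the word outranks the one that actually talks about dogs, which is the relevance problem the slide points at.
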
  4. Everything has a meaning
    The human brain is wired to perform “logical thinking” via memories and patterns; computers are tied to binaries and mathematics
    Trying to search in a “logical way” via computers is a path to failure
    Could mathematics help us? How?
  5. Did you say maths?
    But, wait, what are words and sentences if not sequences of numbers in the alphabet?
    What if we could project words into a three-dimensional space and see them?
    What about images?
    If everything is a number, could we perform calculations on them and find patterns?
  6. There’s no such thing as AI
    Intelligence is not “general”, think of it as a set of specific intelligences
    What we call “AI” is just a buzzword for algorithms and workflows
    In the end, everything comes down to energy transformation, and this topic is quite complex (and for another day)
  7. Understanding the world
    The recent revolutions in machine learning are LLMs and transformers, a subset of neural networks
    The underlying concept is understanding and dealing with context
    LLMs are not intelligent: they excel at probabilities and tokenization and, in the end, they’re all biased
  8. Maths, vectors and friends
    Semantic search (aka vector search) is about proximity and similarity
    If we transform everything to numbers, we can easily perform operations on them
    The closer numbers are, the more similar the content seems to be, the more relevant it seems
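
A minimal sketch of "the closer numbers are, the more similar the content seems". It assumes cosine similarity as the proximity measure, which is one common choice; the slides do not say which metric is used. The two-dimensional vectors are made up for illustration, since real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for the example.
query = [1.0, 0.0]
near = [0.9, 0.1]   # points almost the same way as the query
far = [0.0, 1.0]    # orthogonal to the query

print(cosine_similarity(query, near))
print(cosine_similarity(query, far))
```

A vector search then amounts to sorting stored vectors by this score against the query vector and keeping the top results.
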
  9. Sadly
    Similarity doesn’t mean relevance or intent
    0.1 is close to 0.2 but also to 0; wait, what’s behind 0?
    Depending on the context, similarity can introduce bias
    Context is key, everything else is fog
    The more context you give, the better the results will be
    Similarity doesn’t mean that the meaning is correct
    “Hello world” is similar to “Hi there” but “Hello sir” is more relevant
  10. Context is meaningful
    The dog is running after the tennis ball in a kitchen
    The dog is running after the tennis ball in a kitchen
    The tennis ball is running after the dog in a kitchen
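
One way to read this slide: the two sentences use exactly the same words, so any representation that ignores word order cannot tell them apart. The sketch below assumes a plain bag-of-words count as that order-blind representation; `bag_of_words` is a hypothetical helper, not something shown in the talk.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count each word, discarding the order in which words appear."""
    return Counter(sentence.lower().split())


dog_chases_ball = bag_of_words("The dog is running after the tennis ball in a kitchen")
ball_chases_dog = bag_of_words("The tennis ball is running after the dog in a kitchen")

# Same words, same counts: the two meanings collapse into one representation.
print(dog_chases_ball == ball_chases_dog)  # True
```

Distinguishing the two requires a representation that takes the position and context of each word into account.
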
  11. Let’s vectorize it*
    The dog is running after the tennis ball in a kitchen
    0 -1 0 1 -1 0 1 1 1 0 -1 0 2 2 2 0 0 2 -1 0 2 2 3 0 0 2 3 3 0 1 0 -1 2 1 1 2 0 1 -1 3 2 1 0 1
    Encoder
    [-0.88440161, -0.00996133, 0.243678553, …]
    * Values are made up for the example
  12. Becoming speed
    Built for a specific use case at LVMH
    First iteration built in Go, rewritten in Rust; speaking of speed (< 50 ms)
    Open-source (MIT) and built in France
    PHP SDK, Symfony (Sylius?) bundle and many more
  13. Tease me
    Built-in vector store
    Can be used with Ollama, OpenAI, Anthropic APIs and more
    Unified entrypoint for keyword and vector search
    Fine-tuning available, multi / federated search compatible
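
As a rough illustration of the "unified entrypoint", the sketch below builds the JSON body of a Meilisearch search request that mixes keyword and vector search through the `hybrid` parameter. It assumes an embedder is already configured on the index; the embedder name `default`, the query text and the helper `build_hybrid_search_body` are hypothetical. Nothing is sent over the network: the body would be POSTed to the index's search route.

```python
import json


def build_hybrid_search_body(query: str, semantic_ratio: float, embedder: str) -> str:
    """Build the JSON body for a hybrid (keyword + vector) search request."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return json.dumps(
        {
            "q": query,
            "hybrid": {
                # 0.0 leans fully on keyword search, 1.0 fully on semantic search.
                "semanticRatio": semantic_ratio,
                # Hypothetical embedder name; it must match the index settings.
                "embedder": embedder,
            },
        }
    )


body = build_hybrid_search_body("dog chasing a ball", 0.5, "default")
print(body)
```

Tuning the ratio per use case is a cheap way to check whether semantic results actually improve on plain keyword results.
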
  14. Force of habit
    Built on top of symfony/demo
    Available via https://github.com/Guikingone/SyliusconSemantic
  15. Be contextful
    The more precise you are, the easier it will be to retrieve (seems obvious but…)
    Choosing the right model is hard, just experiment and/or fine-tune it
    Want to go further? What about RAG?
    At the end of the day, everything is tied to mathematics, don’t reinvent the wheel
  16. Oh, by the way…
    Think about user interactions rather than hits / results
    Semantic search is AN idea, not A promise
    Sometimes, a plain old keyword search is enough and that’s fine
    Experiment, fine-tune, improve, learn and monitor