Slide 1

Slide 1 text

Building a semantic search experience using PHP and Meilisearch

Slide 2

Slide 2 text

Hello there

Slide 3

Slide 3 text

Welcome
Guillaume Loulier / @Guikingone
Break things @SensioLabs
Work mainly with PHP and Rust, a bit of Zig (thanks to ZML)
Publish a weekly newsletter about cloud computing, machine learning, PHP and more on Substack

Slide 4

Slide 4 text

Summary
01 - What do you mean?
02 - Are computers capable of understanding?
03 - What about a demo?
04 - Any advice?

Slide 5

Slide 5 text

But really, what do you mean?

Slide 6

Slide 6 text

Ego time

Slide 7

Slide 7 text

Searching is hard
Until recently, search experiences and engines were built on top of keyword search
Keyword search is based on sentence occurrences, hits and a bit of luck
What if a sentence contains the word we’re searching for many times? Is it relevant?
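A minimal sketch (not from the slides) of that occurrence-based scoring, with a made-up helper and sample documents; it shows how a spammy document can outrank the one the user actually wants:

```php
<?php

// Naive keyword scoring (made-up example): relevance = number of occurrences.
function keywordScore(string $document, string $term): int
{
    return substr_count(mb_strtolower($document), mb_strtolower($term));
}

$documents = [
    'Ball ball ball ball ball, buy our balls today!',
    'The dog is running after the tennis ball in the kitchen.',
];

foreach ($documents as $document) {
    printf("score=%d | %s\n", keywordScore($document, 'ball'), $document);
}

// The spammy first document scores 6, the genuinely relevant second one scores 1:
// occurrence counts alone say nothing about meaning or intent.
```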

Slide 8

Slide 8 text

Everything has a meaning
The human brain is wired to perform “logical thinking” via memories and patterns; computers are tied to binary and mathematics
Trying to search in a “logical way” via computers is a path to failure
Could mathematics help us? How?

Slide 9

Slide 9 text

Did you say maths?
But wait, what are words and sentences if not sequences of numbers in the alphabet?
What if we could project words into a three-dimensional space and see them? What about images?
If everything is a number, could we perform calculations on them and find patterns?
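A tiny illustration (not in the deck) of the “words are already numbers” idea, mapping each letter of a word to its position in the alphabet:

```php
<?php

// A word is already a sequence of numbers if we map each letter
// to its position in the alphabet.
function toNumbers(string $word): array
{
    return array_map(
        static fn (string $letter): int => ord($letter) - ord('a') + 1,
        str_split(strtolower($word))
    );
}

print_r(toNumbers('dog')); // [4, 15, 7]
print_r(toNumbers('cat')); // [3, 1, 20]
```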

Slide 10

Slide 10 text

But wait, what about “AI”?

Slide 11

Slide 11 text

There’s no such thing as AI
Intelligence is not “general”; think of it as a set of specific intelligences
What we call “AI” is just a buzzword for algorithms and workflows
In the end, everything comes down to energy transformation, and that topic is quite complex (and for another day)

Slide 12

Slide 12 text

Understanding the world
The recent revolutions in machine learning are LLMs and transformers, a subset of neural networks
The concept behind them is understanding and dealing with context
LLMs are not intelligent: they excel at probabilities and tokenization, and in the end, they’re all biased

Slide 13

Slide 13 text

Similarity is about differences

Slide 14

Slide 14 text

Maths, vectors and friends
Semantic search (aka vector search) is about proximity and similarity
If we transform everything into numbers, we can easily perform operations on them
The closer the numbers are, the more similar the content seems to be, and the more relevant it seems
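A minimal sketch of that proximity idea, with made-up vectors: cosine similarity, one of the usual measures for vector search; the closer the score is to 1, the more similar the two contents are assumed to be:

```php
<?php

// Cosine similarity between two embedding vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Made-up vectors, only to show the shape of the computation.
$helloWorld = [0.12, 0.87, -0.33];
$hiThere    = [0.10, 0.80, -0.30];
$tennisBall = [-0.70, 0.05, 0.90];

var_dump(cosineSimilarity($helloWorld, $hiThere));    // close to 1.0
var_dump(cosineSimilarity($helloWorld, $tennisBall)); // much lower (negative here)
```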

Slide 15

Slide 15 text

Sadly
Similarity doesn’t mean relevancy, nor intention: 0.1 is close to 0.2 but also to 0; wait, what’s behind 0?
Depending on the context, similarity can introduce bias
Context is key, everything else is fog: the more context you give, the better the results will be
Similarity doesn’t mean that the meaning is correct: “Hello world” is similar to “Hi there”, but “Hello sir” is more relevant

Slide 16

Slide 16 text

Visualizing the invisible

Slide 17

Slide 17 text

Context is meaningful
The dog is running after the tennis ball in a kitchen
The tennis ball is running after the dog in a kitchen

Slide 18

Slide 18 text

Let’s vectorize it*
The dog is running after the tennis ball in a kitchen
[diagram: per-token grid of values] → Encoder → [-0.88440161, -0.00996133, 0.243678553, …]
* Values are made up for the example
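A hedged sketch of that encoder step, assuming an OpenAI API key and the text-embedding-3-small model; any other embedding provider (Ollama, etc.) works the same way conceptually:

```php
<?php

// Ask an embedding model to turn a sentence into a vector of floats.
$sentence = 'The dog is running after the tennis ball in a kitchen';

$ch = curl_init('https://api.openai.com/v1/embeddings');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Content-Type: application/json',
        'Authorization: Bearer '.getenv('OPENAI_API_KEY'),
    ],
    CURLOPT_POSTFIELDS => json_encode([
        'model' => 'text-embedding-3-small',
        'input' => $sentence,
    ]),
]);

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

// A list of floats, e.g. [-0.88440161, -0.00996133, 0.243678553, …]
$vector = $response['data'][0]['embedding'];
```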

Slide 19

Slide 19 text

Meilisearch

Slide 20

Slide 20 text

Becoming speed
Built for a specific use case at LVMH
First iteration built in Go, rewritten in Rust; speaking of speed: < 50ms
Open-source (MIT) and built in France
PHP SDK, Symfony (Sylius?) bundle and many more

Slide 21

Slide 21 text

Tease me
Built-in vector store
Can be used with Ollama, OpenAI, Anthropic APIs and more
Unified entrypoint for keyword and vector search
Fine-tuning available, multi / federated search compatible
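A hedged sketch with the official PHP SDK (meilisearch/meilisearch-php); the index name, embedder name and document template below are assumptions for illustration, not the demo’s actual code:

```php
<?php

require 'vendor/autoload.php';

use Meilisearch\Client;

$client = new Client('http://localhost:7700', 'aMasterKey');
$index = $client->index('products');

// Declare an embedder so Meilisearch can vectorize documents itself
// (here via OpenAI; Ollama and others are supported as well).
$index->updateSettings([
    'embedders' => [
        'default' => [
            'source' => 'openAi',
            'apiKey' => getenv('OPENAI_API_KEY'),
            'model' => 'text-embedding-3-small',
            'documentTemplate' => 'A product named {{doc.name}} described as {{doc.description}}',
        ],
    ],
]);

// Hybrid search: the same entrypoint serves keyword and vector search,
// semanticRatio balancing the two (1.0 = fully semantic).
$results = $index->search('something to keep my coffee warm', [
    'hybrid' => [
        'embedder' => 'default',
        'semanticRatio' => 0.8,
    ],
]);
```

Lowering semanticRatio to 0 falls back to plain keyword search, which makes it easy to compare both behaviours behind the same entrypoint.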

Slide 22

Slide 22 text

Show me

Slide 23

Slide 23 text

Force of habit
Built on top of symfony/demo
Available at https://github.com/Guikingone/SyliusconSemantic

Slide 24

Slide 24 text

I was there, a long time ago…

Slide 25

Slide 25 text

Be contextful
The more precise you are, the easier it will be to retrieve (seems obvious, but…)
Choosing the right model is hard: just experiment and/or fine-tune them
Want to go further? What about RAG?
At the end of the day, everything is tied to mathematics; don’t reinvent the wheel

Slide 26

Slide 26 text

Oh, by the way…
Think about user interactions rather than hits / results
Semantic search is AN idea, not A promise
Sometimes, a plain-old keyword search is enough and that’s fine
Experiment, fine-tune, improve, learn and monitor

Slide 27

Slide 27 text

Questions?