built on top of keyword search Keyword search is based on sentence occurrences, hits and a bit of luck What if a sentence contains many times the word we’re searching for? Is it relevant?
perform “logical thinking” via memories and patterns, computers are tied to binaries and mathematics Trying to search in a “logical way” via computers is a path to failure Could mathematics helps us? How?
sentences if not sequences of numbers in the alphabet? What if we could expose words into a three dimensional space and see them? What about images? If everything is a number, could we perform calculations around them and find patterns?
think of it as a subset of specific intelligences What we call “IA” is just a buzzword for algorithms and workflows At the end, everything comes down to energy transformation and this topic is quite complex (and for another day)
LLMs and transformers, a subset of neural networks The concept behind is understanding and dealing with context LLMs are not intelligent, they excel at probabilities and tokenization, at the end, they’re all biased
about proximity and similarity If we transform everything to numbers, we can easily perform operations on them The closer numbers are, the more similar the content seems to be, the more relevant it seems
to 0.2 but also to 0, wait, what’s behind 0? Depending on the context, similarity can introduce bias Context is key, everything else is fog The more context you gave, the better the results will be Similarity doesn’t mean that the meaning is correct “Hello world” is similar to “Hi there” but “Hello sir” is more relevant
First iteration built in Go, rewritten in Rust, speaking of speed (< 50ms) Open-source (MIT) and built in France PHP SDK, Symfony (Sylius?) bundle and many more
will be to retrieve (seems obvious but…) Choosing the right model is hard, just experiment and/or fine-tune them Want to go further? What about RAG? At the end of the day, everything is tied to mathematics, don’t reinvent the wheel
hits / results Semantic search is AN idea, not A promise Sometimes, a plain-old keyword search is enough and that’s fine Experiment, fine-tune, improve, learn and monitor