Image search - Speaker Deck

Slide 1

Slide 1 text

Image Search. Alex Salgado, Developer Advocate @ Elastic. @alexsalgadoprof, salgado, @alexsalgadoprof, /in/alex-salgado/

Slide 2

Slide 2 text

Alex Salgado, Senior Developer Advocate LATAM
● M.Sc. in Computer Science from UFF (Games)
● MBA, UFF
● PhD candidate, UFF: Robotics / Computer Vision
- 25+ years of experience in software development
- Held various positions at startups and at small and large companies such as Oracle, CSN, BRQ/IBM, and Chemtech/Siemens (9 years)
- 8 years as a university professor

Slide 3

Slide 3 text

Alex Salgado, Senior Developer Advocate LATAM

Slide 4

Slide 4 text

80% of the world's data is unstructured. Concerns around generative AI. Sources: KPMG Generative AI Survey; The Prompt: Generative AI survey | Google Cloud Blog

Slide 5

Slide 5 text

Three solutions powered by one stack
3 solutions: Enterprise Search, Security, Observability
Powered by the Elastic Stack: Kibana, Elasticsearch, Logstash, Beats, Agent
Deployed anywhere (SaaS, orchestration): Elastic Cloud, Elastic Cloud on Kubernetes, Elastic Cloud Enterprise

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Basic concepts of ML, generative AI and LLMs
ML / AI
- What? Algorithms programmed to learn the behavior of data and make predictions
- Use cases: anomaly detection, forecasting, image recognition, NLP
Generative AI
- What? Algorithms programmed to create new data
- Use cases: chatbots; text, image and music generators
Large Language Model
- What? Algorithms (deep learning) trained on large volumes of data and programmed to create new data
- Use cases: chatbots, text generators, translators, code generators, question-and-answer applications

Slide 8

Slide 8 text

Reference blog: https://www.elastic.co/search-labs/finding-your-puppy-with-image-search

Slide 9

Slide 9 text

Elasticsearch: You Know, for Search

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

What is vector similarity? Convert data into vector representations where distances represent similarity.
Text -> Natural Language Processing model -> embeddings
Image -> Convolutional Neural Network -> embeddings
Embeddings are feature vectors (a1, a2, …, an), for example: 0.0167327…, 0.3458967…, 0.0547893…, 0.0324981…, 0.0135497…, 0.0216549…
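One common way to turn "distance represents similarity" into a number is cosine similarity. The sketch below is illustrative only: the three 3-dimensional vectors and their names are hypothetical stand-ins for embeddings a model would produce, and the slide does not say which similarity measure the talk uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real models emit hundreds of dimensions.
photo_of_dog = [0.9, 0.1, 0.3]
text_puppy = [0.8, 0.2, 0.4]
text_invoice = [-0.5, 0.9, -0.2]

print(cosine_similarity(photo_of_dog, text_puppy))    # close to 1: similar
print(cosine_similarity(photo_of_dog, text_invoice))  # negative: dissimilar
```

Values near 1 mean the vectors point in almost the same direction; values near 0 or below mean the items have little in common.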

Slide 12

Slide 12 text

Queries are also vectorized

Slide 13

Slide 13 text

What is a Vector?

Slide 14

Slide 14 text

Embeddings represent your data. Example: 1-dimensional vector on a CARTOON / REALISTIC axis. Character vectors: [ -1 ], [ 1 ]

Slide 15

Slide 15 text

Multiple dimensions represent different data aspects. Axes: CARTOON / REALISTIC and HUMAN / MACHINE. Character vectors: [ -1, 1 ], [ 1, 0 ]

Slide 16

Slide 16 text

Similar data is grouped together. Axes: CARTOON / REALISTIC and HUMAN / MACHINE. Character vectors: [ -1.0, 1.0 ], [ 1.0, -0.1 ], [ -1.0, 0.8 ]

Slide 17

Slide 17 text

Vector search ranks objects by similarity (relevance) to the query. Axes: CARTOON / REALISTIC and HUMAN / MACHINE. The query is placed in the same space as the data, and results are ranked 1 to 5 by relevance.
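The ranking idea can be sketched with the three character vectors from slide 16. This is only an illustration under assumptions: the character names and the query vector are hypothetical, and Euclidean distance is chosen for simplicity rather than because the slides specify it.

```python
import math

# Vectors taken from slide 16; the names are hypothetical labels.
characters = {
    "character_a": [-1.0, 1.0],
    "character_b": [1.0, -0.1],
    "character_c": [-1.0, 0.8],
}


def rank_by_distance(query, vectors):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    scored = [(name, math.dist(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical query vector, embedded in the same 2-dimensional space.
query = [-1.0, 0.95]
ranking = rank_by_distance(query, characters)

for position, (name, distance) in enumerate(ranking, start=1):
    print(position, name, round(distance, 3))
```

The two characters whose vectors sit close together come out first, and the distant one comes last, which is the behavior the slide describes as ranking by relevance.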

Slide 18

Slide 18 text

Demo: https://www.elastic.co/search-labs/finding-your-puppy-with-image-search

Slide 19

Slide 19 text

Image Search Architecture: generate embeddings outside Elasticsearch
Input: your image to search
Application: DogService, Util, DogRepository, Dog Embedding
Elastic Platform: Elasticsearch index, kNN search, Inference API
Flows: insert embeddings; search query; search results
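When embeddings are generated outside Elasticsearch, the application needs an index with a `dense_vector` field to insert them into and a kNN request to search them. The sketch below only builds the request bodies as Python dictionaries and sends nothing. The field names `dog_name` and `image_embedding` and the dimension count 512 are assumptions for illustration, not details from the slides.

```python
import json


def build_index_mapping(vector_field, dims):
    """Index body with a dense_vector field that can be searched with kNN."""
    return {
        "mappings": {
            "properties": {
                "dog_name": {"type": "keyword"},  # hypothetical metadata field
                vector_field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_knn_search(vector_field, query_vector, k=5, num_candidates=50):
    """Search body that asks for the k nearest neighbours of query_vector."""
    return {
        "knn": {
            "field": vector_field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


# Assumed values: the dims must match whatever model produces the embeddings.
mapping = build_index_mapping("image_embedding", dims=512)
search = build_knn_search("image_embedding", query_vector=[0.0] * 512, k=3)

print(json.dumps(mapping, indent=2))
print(search["knn"]["k"], len(search["knn"]["query_vector"]))
```

In an application shaped like the diagram, a repository class would send the first body when creating the index and the second when searching, with the query vector coming from the same embedding model used at insert time.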

Slide 20

Slide 20 text

E-commerce case

Slide 21

Slide 21 text

Product Similarity Search Demo

Slide 22

Slide 22 text

Documents stored in Elasticsearch:
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …]
}
Diagram labels (numbered 1 to 3 on the slide):
- Transformer model
- Source data, indexed with POST /_doc
- Search-powered application, querying with GET /_search with kNN clause

Slide 23

Slide 23 text

Step 1: Setting up the machine learning model
- Select the appropriate model
- Load the model to the cluster
- Manage models
$ eland_import_hub_model --url https://cluster_URL --hub-model-id BERT-MiniLM-L6 --task-type text_embedding --start
Model shown on the slide: BERT-MiniLM-L6

Slide 24

Slide 24 text

Step 2: Data ingestion and embedding generation
Source data, sent with POST /_doc:
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton"
}
Standard field indexing for non-vector types; encoding via Inference Processor.
Resulting document:
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …]
}
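The "encoding via Inference Processor" step can be pictured as an ingest pipeline definition. The sketch below builds such a pipeline body in Python without contacting a cluster. The model id `my-model` and the helper name are illustrative; `text_field` is assumed to be the input name the deployed text-embedding model expects, which should be checked against the actual model configuration.

```python
import json


def build_embedding_pipeline(model_id, source_field, target_field):
    """Ingest pipeline body with one inference processor.

    The processor reads source_field from each incoming document and
    writes the model output under target_field.
    """
    return {
        "description": "Generate an embedding for each product description",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # Map the document field to the model's input field
                    # (assumed here to be "text_field").
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


pipeline = build_embedding_pipeline(
    model_id="my-model",          # illustrative model id
    source_field="description",   # field from the slide's source document
    target_field="desc_embedding",
)

print(json.dumps(pipeline, indent=2))
```

One caveat: an inference processor typically nests the vector under the target field (for example in a `predicted_value` sub-field) rather than storing a bare array as the slide's simplified document shows, so the vector field path used at query time may need to reflect that.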

Slide 25

Slide 25 text

Step 3: Issuing a vector query
a. Query is submitted to the search-powered application.
b. Query embedding is generated by the transformer model:
POST /_ml/trained_models/my-model/_infer
{
  "docs": [
    { "description": "summer clothes" }
  ]
}
c. Issue the query using the _search endpoint, with a kNN clause, using the previously generated embedding:
GET product-catalog/_search
{
  "query": {
    "match": {
      "description": {
        "query": "summer clothes",
        "boost": 0.9
      }
    }
  },
  "knn": {
    "field": "desc_embedding",
    "query_vector": [0.123, 0.244,...],
    "k": 5,
    "num_candidates": 50,
    "boost": 0.1,
    "filter": {
      "term": {
        "department": "women"
      }
    }
  },
  "size": 10
}
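The hybrid request on this slide combines a lexical `match` query with a `knn` clause, each with its own boost. A minimal sketch of how an application might assemble that body is shown below; the function name and the short example vector are hypothetical, and the vector field is spelled `desc_embedding` to match the documents on the earlier slides.

```python
import json


def build_hybrid_search(text, query_vector, department,
                        match_boost=0.9, knn_boost=0.1,
                        k=5, num_candidates=50, size=10):
    """Search body mixing a match query with a filtered kNN clause."""
    return {
        "query": {
            "match": {
                "description": {"query": text, "boost": match_boost}
            }
        },
        "knn": {
            "field": "desc_embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            "boost": knn_boost,
            # Restrict the vector search to one department.
            "filter": {"term": {"department": department}},
        },
        "size": size,
    }


# Hypothetical 2-value vector; a real one comes from the embedding model
# and has as many values as the dense_vector field has dimensions.
body = build_hybrid_search("summer clothes", [0.123, 0.244], "women")

print(json.dumps(body, indent=2))
```

With boosts of 0.9 and 0.1, the lexical score carries most of the weight and the vector score acts as a smaller adjustment; swapping the values would make the search lean on semantic similarity instead.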