Semantic vs keyword search as context for GPT

This talk compares semantic (vector) search with keyword search for one specific use case: providing context for ChatGPT.

Talk for Berlin Buzzwords 2023.

In blog form: https://xata.io/blog/keyword-vs-semantic-search-chatgp

Tudor Golubenco

June 20, 2023


Transcript

  1. Serverless data platform for PostgreSQL
    • Free-text search
    • Vector search
    • Rich types
    • Branches
    • Admin UI
    • SDKs

    Tudor Golubenco, CTO @ xata
  2. How does it work?
    • Run a text search against the documentation to find the content that is most relevant to the question asked by the user
    • Create a prompt for the ChatGPT API with that content
    • Stream back the result as is
  3. Creating the prompt

    With these rules: {rules}
    And this text: {context}
    Given the above text, answer the question: {question}
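The template above can be filled in with a few lines of Python. This is a minimal sketch, not the code used in the talk: `build_prompt` and its parameters are hypothetical names, and rendering the rules as one "- " line each is an assumption.

```python
# Template text taken from the slide; line breaks between the parts are assumed.
PROMPT_TEMPLATE = (
    "With these rules: {rules}\n"
    "And this text: {context}\n"
    "Given the above text, answer the question: {question}"
)


def build_prompt(rules: list[str], context: str, question: str) -> str:
    """Fill the slide's template. Hypothetical helper, not from the talk."""
    # Assumption: each rule goes on its own line, prefixed with "- ".
    rules_text = "\n" + "\n".join(f"- {rule}" for rule in rules)
    return PROMPT_TEMPLATE.format(
        rules=rules_text, context=context, question=question
    )


if __name__ == "__main__":
    print(
        build_prompt(
            ["Answer in German", "Format the answer in Markdown"],
            "Xata is a serverless data platform for PostgreSQL.",
            "What is Xata?",
        )
    )
```

Keeping the template as a single constant makes it easy to see exactly what is sent to the model and to change the rules without touching the search code.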
  4. Example rules
    • “Your name is DanGPT”
    • “Answer with the personality of a pirate”
    • “Answer in German”
    • “Format the answer in Markdown”
    • “If the answer to the question is not found in the provided context, do not answer it. It’s very important that you only answer questions from the provided context. Tell the user that you don’t have this information”
  5. Context - a search problem
    • It can only work if the correct context is found
    • The bigger the context, the better the chances
  6. Context sizes for the OpenAI models

    | Model | Max context size (tokens) | Max context (pages) | Input token price | Final cost per question |
    |---|---|---|---|---|
    | gpt-3.5-turbo | 4,096 | 6 pages | $0.0015 / 1K tokens | ~ $0.01 / question |
    | gpt-3.5-turbo-16k | 16,384 | 24 pages | $0.003 / 1K tokens | ~ $0.05 / question |
    | gpt-4 | 8,192 | 12 pages | $0.03 / 1K tokens | ~ $0.25 / question |
    | gpt-4-32k | 32,768 | 48 pages | $0.06 / 1K tokens | ~ $2 / question |
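The cost-per-question figures on the slide above appear consistent with filling the whole context window at the input price. The sketch below reproduces that arithmetic under this assumption; it ignores output tokens, and `max_input_cost` is a hypothetical helper name.

```python
# Values copied from the slide: (max context in tokens, USD per 1K input tokens).
MODELS = {
    "gpt-3.5-turbo": (4096, 0.0015),
    "gpt-3.5-turbo-16k": (16384, 0.003),
    "gpt-4": (8192, 0.03),
    "gpt-4-32k": (32768, 0.06),
}


def max_input_cost(model: str) -> float:
    """Cost in USD of one request whose input fills the entire context window."""
    max_tokens, price_per_1k = MODELS[model]
    return max_tokens / 1000 * price_per_1k


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name}: ${max_input_cost(name):.3f}")
```

Under this assumption the four models come out at roughly $0.006, $0.05, $0.25 and $1.97 per question, which lines up with the rounded figures on the slide.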
  7. Keyword search algorithm
    • Transform the question into keywords. We use ChatGPT for it, with this prompt: “Extract keywords for a search query from the text provided. Add synonyms for words where a more common one exists.”
    • Use the provided keywords to run a free-text search and pick the top results
    • Take the top 3 results
    • If the result is bigger than the context, use the search “highlights” to select parts of the pages
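The last two steps of the keyword approach (take the top 3 results, fall back to highlights when they do not fit) can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: `SearchResult` is a hypothetical shape for a search hit, sizes are counted in whitespace-separated words as a crude stand-in for tokens, and "the result" is read as the combined text of the top results. The keyword extraction and the search call itself are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical shape of one free-text search hit."""
    content: str                                         # full page text
    highlights: list[str] = field(default_factory=list)  # matched snippets


def size(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def keyword_context(results: list[SearchResult], budget: int, top_n: int = 3) -> str:
    """Build the context from ranked results (best match first)."""
    top = results[:top_n]
    full = "\n\n".join(r.content for r in top)
    if size(full) <= budget:
        return full
    # Too big for the context: keep only highlighted snippets,
    # stopping as soon as the next snippet would exceed the budget.
    parts: list[str] = []
    used = 0
    for result in top:
        for snippet in result.highlights:
            if used + size(snippet) > budget:
                return "\n\n".join(parts)
            parts.append(snippet)
            used += size(snippet)
    return "\n\n".join(parts)
```

A real implementation would count tokens with the model's tokenizer rather than words, since the context limit is expressed in tokens.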
  8. Vector search algorithm
    • Preparation: split up the docs into paragraphs, and compute embeddings for each
    • Compute embeddings for the question
    • Use cosine similarity search
    • Add up top results until the context is filled
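The ranking and context-filling steps of the vector approach can be sketched in plain Python. This is a minimal illustration, assuming embeddings have already been computed elsewhere and are passed in as lists of floats; `vector_context` is a hypothetical name, and the budget is counted in whitespace-separated words rather than real tokens.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_context(
    question_vec: list[float],
    paragraphs: list[tuple[str, list[float]]],
    budget: int,
) -> str:
    """Add the most similar paragraphs until the context budget is filled.

    paragraphs holds (text, embedding) pairs; budget is in words (assumption).
    """
    ranked = sorted(
        paragraphs,
        key=lambda p: cosine_similarity(question_vec, p[1]),
        reverse=True,
    )
    picked: list[str] = []
    used = 0
    for text, _embedding in ranked:
        n = len(text.split())
        if used + n > budget:
            break  # the next best paragraph no longer fits
        picked.append(text)
        used += n
    return "\n\n".join(picked)
```

A linear scan like this is only practical for small document sets; a database with a vector index would do the similarity search in production.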
  9. Accuracy

    | Question | Keyword search result | Vector search result | Winner |
    |---|---|---|---|
    | How do I install the Xata CLI? | ✅ | ✅ | Vector (more complete) |
    | How do you use Xata with Deno? | ✅ | ❌ | Keyword |
    | How can I import a CSV file with custom column types? | ✅ | ✅ | Keyword (more complete) |
    | How can I filter a table named Users by the email column? | ❌ | ✅ | Vector |
    | What is Xata? | ✅ | ✅ | Draw |

    Result: draw

    More details: https://xata.io/blog/keyword-vs-semantic-search-chatgpt
  10. Convenience
    ✋ With vector search you need to split your docs into paragraphs and calculate embeddings for them, then maintain them on updates
    ✋ Vector search doesn’t need synonym dictionaries, analyzers, etc.

    Result: draw
  11. Tunability
    ✅ Keyword search has the usual controls: boosters, column weights, relevancy functions
    🟧 Vector search has the cosine similarity, filters, negative matches

    Winner: keyword search