Preparing internal knowledge for AI

Lana
August 01, 2024

How to make content more suitable for large language models (LLMs) and retrieval-augmented generation (RAG) while keeping it understandable to people

Transcript

  1. Preparing internal knowledge for AI

    How to make content more suitable for LLMs and RAG while keeping it understandable to people
  2. Table of Contents

    - Short overview of RAG and LLMs
    - Intricacies of preparing content for AI
    - Takeaways
  3. Use headings as pointers

    Structure your information with the reader in mind, using headings and subheadings as “pointers” for both humans and artificial intelligence. Make sure these “pointers” are marked up semantically (for example, real heading elements such as <h2> rather than bold text). Make your “pointers” descriptive.
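As a minimal sketch of headings acting as pointers, the Python 3.9+ code below assumes the content is stored as Markdown and splits it into one chunk per section, keeping the full heading path with each chunk so a retriever (and a person) can tell where the text came from. `split_by_headings` and `Chunk` are hypothetical names, and the parser ignores edge cases such as `#` lines inside code fences.

```python
import re
from dataclasses import dataclass

# Markdown ATX headings: "# Title" through "###### Title".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    path: list[str]  # headings from the top level down to this section
    text: str        # the section body, without its heading line


def split_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per section, keeping the heading path."""
    chunks: list[Chunk] = []
    path: list[str] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(path=list(path), text=text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Billing\nIntro.\n## Refunds\nRefunds take 5 business days.\n"
for chunk in split_by_headings(doc):
    # The heading path travels with the text, e.g. "Billing > Refunds".
    print(" > ".join(chunk.path), "|", chunk.text)
```

Prefixing the joined path to the text before embedding it is one way to let a descriptive heading influence retrieval even when the section body never repeats the topic.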
  4. Logical hierarchy

    Breadcrumbs and other information, such as parent and even sibling articles, can be passed to the LLM as context.
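A possible way to pass the breadcrumbs, parent and sibling titles to the LLM is to prepend them to the page body before it goes into the prompt. The `Page` class, `build_context` and the `Location:`/`Parent:`/`Related pages:` labels below are illustrative assumptions, not a format any particular model requires.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    title: str
    body: str
    # repr/compare disabled so parent <-> child links do not recurse forever
    parent: Optional["Page"] = field(default=None, repr=False, compare=False)
    children: list["Page"] = field(default_factory=list)

    def add_child(self, child: "Page") -> "Page":
        child.parent = self
        self.children.append(child)
        return child


def breadcrumbs(page: Page) -> list[str]:
    """Titles from the root page down to `page`."""
    trail: list[str] = []
    node: Optional[Page] = page
    while node is not None:
        trail.append(node.title)
        node = node.parent
    return trail[::-1]


def build_context(page: Page) -> str:
    """Prefix the page body with its location, parent and sibling titles."""
    lines = ["Location: " + " > ".join(breadcrumbs(page))]
    if page.parent is not None:
        lines.append("Parent: " + page.parent.title)
        siblings = [p.title for p in page.parent.children if p is not page]
        if siblings:
            lines.append("Related pages: " + ", ".join(siblings))
    lines += ["", page.body]
    return "\n".join(lines)


root = Page("Help Center", "Start here.")
billing = root.add_child(Page("Billing", "Billing overview."))
refunds = billing.add_child(Page("Refunds", "Refunds take 5 business days."))
billing.add_child(Page("Invoices", "Invoices are sent monthly."))
print(build_context(refunds))
```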
  5. Last modified

    Keep your content concise, clear, and, most of all, interesting to your audience. Consider how they hear and understand what you are communicating.
  6. Self-contained information

    Provide information in self-contained chunks and use complete sentences. Every page is page one.
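One way to keep chunks self-contained is to split only on paragraph boundaries, so no chunk ends mid-sentence, and to repeat the page title in every chunk. The sketch below assumes paragraphs are separated by blank lines; `chunk_paragraphs` and the 800-character default are hypothetical choices, and a paragraph longer than the limit is kept whole rather than cut.

```python
def chunk_paragraphs(title: str, text: str, max_chars: int = 800) -> list[str]:
    """Pack whole paragraphs into chunks; never split a paragraph or sentence."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    groups: list[list[str]] = []
    current: list[str] = []
    size = 0  # counts paragraph characters only, not separators or the title
    for para in paragraphs:
        # Start a new chunk rather than cutting this paragraph in half.
        if current and size + len(para) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        groups.append(current)
    # Repeat the title in every chunk so each one can be read on its own.
    return [f"{title}\n\n" + "\n\n".join(group) for group in groups]
```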
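  7. Use pronouns wisely

    The pronoun “it” can be used within a paragraph, but not to refer back to the subject of the description at the start of the next paragraph. The two paragraphs may end up in different chunks, so repeat the subject's name instead.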
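Dangling pronouns can be caught with a rough check before publishing. This hypothetical helper flags paragraphs that open with “it”, “this” or “they”; the pronoun list is an assumption and the check is only a heuristic (it will also flag “This page…”, where “this” is a determiner), so flagged paragraphs still need a human look.

```python
import re

# Illustrative list of pronouns that, at the start of a paragraph, usually
# point back to something named in an earlier paragraph.
LEADING_PRONOUN = re.compile(r"^(it|this|they)\b", re.IGNORECASE)


def paragraphs_starting_with_pronoun(text: str) -> list[int]:
    """Return 0-based indexes of paragraphs that open with a pronoun."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [i for i, p in enumerate(paragraphs) if LEADING_PRONOUN.match(p)]
```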
  8. Consistency

    Consistent terminology eliminates ambiguity and confusion. Use glossary lists to give more context, avoid abbreviations, and explain custom terminology.
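To avoid bare abbreviations, a glossary can be used to spell each one out the first time it appears. The glossary entries and `expand_first_use` below are hypothetical; the function skips an abbreviation if its full form already appears anywhere in the text, so it does not double up an existing “single sign-on (SSO)”.

```python
import re

# Hypothetical glossary; in practice it would come from your own docs.
GLOSSARY = {
    "SSO": "single sign-on",
    "RAG": "retrieval-augmented generation",
}


def expand_first_use(text: str, glossary: dict[str, str]) -> str:
    """Spell out each known abbreviation the first time it appears."""
    for abbr, full in glossary.items():
        if full.lower() in text.lower():
            continue  # already spelled out somewhere; leave the text alone
        pattern = re.compile(rf"\b{re.escape(abbr)}\b")
        # A function replacement stops re.sub from treating backslashes in
        # `full` as escape sequences.
        text = pattern.sub(lambda _m: f"{full} ({abbr})", text, count=1)
    return text
```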
  9. Make sure media is understandable by AI

    Minimize images and videos that AI cannot recognize. Even with the advent of GPT-4 Vision, media content is still a grey area and can be misinterpreted; if you use it, support it with meta-information.
  10. Add metadata

    Alt text is the bare minimum: make it descriptive, or even generate it using another LLM. If you add a video, ideally provide a transcript.
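A simple audit can find images whose alt text is missing or too short to be descriptive. The sketch assumes Markdown image syntax (`![alt](src)`); `weak_alt_text` is a hypothetical helper, its four-word threshold is an arbitrary assumption, and it does not check HTML `<img>` tags.

```python
import re

# Markdown image syntax: ![alt text](url "optional title")
IMAGE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)[^)]*\)")


def weak_alt_text(markdown: str, min_words: int = 4) -> list[str]:
    """Return image sources whose alt text is missing or shorter than min_words."""
    return [
        match.group("src")
        for match in IMAGE.finditer(markdown)
        if len(match.group("alt").split()) < min_words
    ]
```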
  11. Spoiler: still painful

    Table data extraction is a multistage process. Explain in the leading paragraph what the table shows. Sometimes AI manages to parse the spatial relationships between cells, and sometimes it gets confused. Test how tables are perceived, then change the format or the relationships.
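When a table keeps confusing the model, one format change to test is rendering each row as a self-contained line that repeats the column names, after a leading sentence explaining what the table shows. `table_to_text` is a hypothetical helper, and whether this beats the original table for a given model is something to verify, in line with the advice to test how tables are perceived.

```python
def table_to_text(lead: str, headers: list[str], rows: list[list[str]]) -> str:
    """Render a table as a leading sentence plus one self-contained line per row.

    Each line repeats the column names, so a row still makes sense if it is
    retrieved without the rest of the table.
    """
    lines = [lead]
    for row in rows:
        cells = [f"{header}: {value}" for header, value in zip(headers, row)]
        lines.append(". ".join(cells) + ".")
    return "\n".join(lines)


print(table_to_text(
    "Monthly price and user limit for each plan.",
    ["Plan", "Price", "Max users"],
    [["Basic", "$10", "5"], ["Team", "$50", "50"]],
))
```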
  12. Some takeaways

    Many things that work for humans will work for AI (but not all). Check what's going on in the industry; it changes quickly.
  13. More materials to study

    - RAG simply explained: https://lucvandonkersgoed.com/2023/12/11/retrieval-augmented-generation-rag-simply-explained/
    - Table data extraction: https://iris.ai/blog/tech-deep-dive-extraction-of-table-data-and-why-it-s-difficult-extraction-part-1
    - Every page is page one: https://everypageispageone.com/