Slide 1

Slide 1 text

Preparing internal knowledge for AI
How to make content more suitable for LLMs and RAG while keeping it understandable to people

Slide 2

Slide 2 text

Table of Contents
03 Short overview of RAG and LLMs
05 Intricacies of preparing content for AI
21 Takeaways

Slide 3

Slide 3 text

Overview of the RAG process

Slide 4

Slide 4 text

Retrieve - Augment - Generate
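
The three steps can be sketched roughly as follows. This is a toy illustration, not a production pipeline: the word-overlap scoring stands in for a real retriever, the `generate` function is a hypothetical stub instead of an actual LLM call, and the sample chunks are invented.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, top_k=2):
    """Retrieve: rank chunks by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:top_k]

def augment(question, retrieved):
    """Augment: place the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Generate: hypothetical stub standing in for a real LLM call."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

# Invented sample content for illustration only.
chunks = [
    "Vacation requests are submitted in the HR portal.",
    "The office kitchen is cleaned every Friday.",
    "Vacation requests need manager approval within five days.",
]
question = "How do I submit vacation requests?"
retrieved = retrieve(question, chunks)
prompt = augment(question, retrieved)
print(generate(prompt))
```

The sketch shows why content quality matters: the LLM only sees the chunks that the retrieval step returns.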

Slide 5

Slide 5 text

Structure and hierarchy

Slide 6

Slide 6 text

Use headings as pointers
Structure your information with the reader in mind, using headings and subheadings as “pointers” for both humans and artificial intelligence.
Make sure these “pointers” are marked up semantically.
Make your “pointers” descriptive.
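
A minimal sketch of why semantic headings help: a chunker can only split on headings it can detect. The example assumes Markdown-style `#` headings as the semantic markup; the function name `split_by_headings` and the sample text are illustrative.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headings(text):
    """Split Markdown-style text into (heading, body) chunks."""
    chunks = []
    heading, body = "Untitled", []

    def flush():
        body_text = "\n".join(body).strip()
        if body_text:
            chunks.append((heading, body_text))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk that belongs to the previous heading
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks

doc = (
    "# Requesting vacation\n"
    "Submit the request in the HR portal.\n"
    "\n"
    "## Approval deadlines\n"
    "Managers approve within five days.\n"
)
for title, text in split_by_headings(doc):
    print(f"{title}: {text}")
```

A heading that is only bold text would not be detected here, and a vague heading such as "Details" would give the chunk a label that says little about its content.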

Slide 7

Slide 7 text

Avoid overlapping
Avoid overlapping and duplicate categories - follow the MECE principle (mutually exclusive, collectively exhaustive).
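
One way to check a category structure against MECE, sketched under the assumption that categories are available as a mapping from category name to article titles. The function name `check_mece` and the sample data are hypothetical.

```python
def check_mece(categories, all_articles):
    """Return articles in several categories and articles in none."""
    seen = {}
    for category, articles in categories.items():
        for article in articles:
            seen.setdefault(article, []).append(category)
    # Not mutually exclusive: the same article appears in several categories.
    overlapping = {a: cats for a, cats in seen.items() if len(cats) > 1}
    # Not collectively exhaustive: an article belongs to no category.
    uncovered = sorted(set(all_articles) - set(seen))
    return overlapping, uncovered

# Invented sample data for illustration only.
categories = {
    "Time off": ["Vacation policy", "Sick leave"],
    "HR policies": ["Vacation policy", "Code of conduct"],
}
all_articles = ["Vacation policy", "Sick leave", "Code of conduct", "VPN setup"]
overlapping, uncovered = check_mece(categories, all_articles)
print(overlapping)
print(uncovered)
```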

Slide 8

Slide 8 text

Logical hierarchy
Breadcrumbs and other information, such as parent and even sibling articles, can be used as context when passing content to the LLM.
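
A small sketch of how hierarchy information might be attached to a chunk before it is passed to the LLM. The function name `with_context`, the "Location" and "Related articles" labels, and the sample values are assumptions made for illustration.

```python
def with_context(chunk_text, breadcrumbs, siblings=()):
    """Prefix a chunk with its breadcrumb path and optional sibling titles."""
    lines = ["Location: " + " > ".join(breadcrumbs)]
    if siblings:
        lines.append("Related articles: " + ", ".join(siblings))
    lines.append("")  # blank line between the context header and the chunk
    lines.append(chunk_text)
    return "\n".join(lines)

enriched = with_context(
    "Submit the request at least two weeks in advance.",
    breadcrumbs=["HR", "Time off", "Vacation"],
    siblings=["Sick leave", "Parental leave"],
)
print(enriched)
```

Without the breadcrumb line, the chunk does not say which kind of request it describes.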

Slide 9

Slide 9 text

Content relevance

Slide 10

Slide 10 text

Last modified
Keep your content concise, clear, and, most of all, interesting to your audience. Consider how your readers hear and understand what you are communicating.

Slide 11

Slide 11 text

Providing context and organizing information

Slide 12

Slide 12 text

Self-contained information
Provide information in self-contained chunks and use complete sentences.
Every page is page one.

Slide 13

Slide 13 text

Use pronouns wisely
The pronoun “it” can be used within a paragraph, but the next paragraph should not use “it” to refer back to the same subject. Name the subject again instead.
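
A rough check for this rule could flag paragraphs that open with a pronoun, since such a paragraph loses its subject when it is retrieved on its own. The pronoun list beyond “it” is an assumption, and the function name and sample text are illustrative.

```python
import re

# Assumed list; the slide itself only mentions "it".
PRONOUNS = {"it", "this", "that", "they", "these", "those"}

def paragraphs_starting_with_pronoun(text):
    """Return (index, paragraph) pairs for paragraphs that open with a pronoun."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    flagged = []
    for index, paragraph in enumerate(paragraphs):
        first_word = re.match(r"[A-Za-z']+", paragraph)
        if first_word and first_word.group(0).lower() in PRONOUNS:
            flagged.append((index, paragraph))
    return flagged

sample = (
    "The HR portal stores vacation requests. It also shows the remaining days.\n"
    "\n"
    "It sends a reminder to the manager after three days."
)
flagged = paragraphs_starting_with_pronoun(sample)
print(flagged)
```

The “It” inside the first paragraph is not flagged, because the subject is still in the same chunk.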

Slide 14

Slide 14 text

Terms

Slide 15

Slide 15 text

Consistency
Consistency of terms is important to eliminate ambiguity and confusion.
Use glossary lists to give more context.
Avoid abbreviations.
Explain custom terminology.
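
A glossary can also be applied mechanically. The sketch below expands known abbreviations so that each chunk carries the full term; the glossary format (a plain dictionary) and the function name `expand_abbreviations` are assumptions.

```python
import re

GLOSSARY = {
    "RAG": "retrieval-augmented generation",
    "LLM": "large language model",
}

def expand_abbreviations(text, glossary):
    """Replace each known abbreviation with 'full term (ABBR)'."""
    if not glossary:
        return text
    pattern = r"\b(" + "|".join(re.escape(term) for term in glossary) + r")\b"

    def replace(match):
        term = match.group(0)
        return f"{glossary[term]} ({term})"

    return re.sub(pattern, replace, text)

expanded = expand_abbreviations("RAG passes retrieved chunks to the LLM.", GLOSSARY)
print(expanded)
```

The match is on whole words only, so a plural such as "LLMs" would be left untouched in this sketch.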

Slide 16

Slide 16 text

Media content

Slide 17

Slide 17 text

Make sure media is understandable by AI
Minimize images and videos that AI cannot recognize. Even with the advent of GPT-4 with vision, media content is still a grey area and can be misinterpreted. If you use media, support it with meta-information.

Slide 18

Slide 18 text

Add metadata
Alt text is the bare minimum - make it descriptive, or even generate it using another LLM.
If you add a video, ideally provide a transcript.
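
A simple audit can find images whose alt text is missing or too short to be descriptive. The sketch uses Python's standard `html.parser`; the three-word threshold and the class name `AltTextChecker` are arbitrary assumptions, and the HTML snippet is invented.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Collect the src of every <img> with missing or very short alt text."""

    def __init__(self, min_words=3):
        super().__init__()
        self.min_words = min_words  # assumed threshold for "descriptive"
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip()
        if len(alt.split()) < self.min_words:
            self.problems.append(attributes.get("src", "(no src)"))

html = (
    '<p><img src="chart.png" alt="chart">'
    '<img src="flow.png" alt="Flowchart of the vacation approval process">'
    '<img src="logo.png"></p>'
)
checker = AltTextChecker()
checker.feed(html)
print(checker.problems)
```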

Slide 19

Slide 19 text

Tables

Slide 20

Slide 20 text

Spoiler: still painful
Table data extraction is a multistage process.
Explain in the leading paragraph what the table shows.
Sometimes AI manages to parse spatial relationships between cells, and sometimes it gets confused.
Test how tables are perceived, then change the format or relationships.
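
One format change worth testing is to flatten each row into a sentence that repeats the column headers, so the relationship between cells no longer depends on layout. This is only one possible approach, assuming a simple table with a single header row; the function name and the sample data are hypothetical.

```python
def table_to_sentences(headers, rows):
    """Turn each table row into a 'Header: value; ...' sentence."""
    sentences = []
    for row in rows:
        pairs = [f"{header}: {value}" for header, value in zip(headers, row)]
        sentences.append("; ".join(pairs) + ".")
    return sentences

# Invented sample table for illustration only.
headers = ["Plan", "Storage", "Price per month"]
rows = [
    ["Basic", "10 GB", "$5"],
    ["Pro", "100 GB", "$15"],
]
sentences = table_to_sentences(headers, rows)
for sentence in sentences:
    print(sentence)
```

Tables with merged cells or multi-level headers would need a different treatment, which is part of why testing is recommended.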

Slide 21

Slide 21 text

Some takeaways
Many things that work for humans will work for AI (but not all).
Check what's going on in the industry - it changes quickly.

Slide 22

Slide 22 text

Thank you

Slide 23

Slide 23 text

More materials to study
RAG simply explained (https://lucvandonkersgoed.com/2023/12/11/retrieval-augmented-generation-rag-simply-explained/)
Table data extraction (https://iris.ai/blog/tech-deep-dive-extraction-of-table-data-and-why-it-s-difficult-extraction-part-1)
Every page is page one (https://everypageispageone.com/)
