Enhancing Comic Search with Vector Index.

Searching through a vast collection of comics can be challenging. We often rely on matching titles, words, descriptions, publication years, character names, and publishers. But what about categorizing comics by genre or other intriguing criteria? In this session, we’ll explore Vector Index, a powerful index now used with relational databases and graph databases. We’ll cover the basics of indexes, demystify Vector Index, and showcase its potential for more effective searches.

Koji Annoura

August 04, 2024

Transcript

  1. COSCUP 2024, Aug 4 2024 14:40-15:10 TR 514

    Enhancing Comic Search with Vector Index. Koji Annoura 案浦浩二
  2. Who am I

    ❖ I live in Fukuoka, Japan ❖ I love ❖ Apache Hop ❖ Data Orchestration App ❖ I am a founder ❖ Apache User Group Japan ❖ Neo4j Graph Database ❖ I am one of the founders ❖ Neo4j Users Group Tokyo ❖ Graph Community MVP
  3. Who am I

    ❖ I love ❖ Knowledge Graph ❖ Please join the online session ❖ November 7, 2024
  4. Agenda

    ❖ Talk (10min) ❖ Basic Search ❖ What is Vector Index? ❖ … ❖ Demonstration (20min) ❖ Conclusion Copyright © いらすとや (Irasutoya)
  5. Search

    ❖ Let's think about searching ❖ What kind of search would you like to do? ❖ Word ❖ Text ❖ Fuzzy (Susiro > Sushiro スシロー)
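
The fuzzy case on this slide (typing "Susiro" and still finding "Sushiro") can be sketched with Python's standard `difflib` module. This is only an illustration, not what the talk's demo used: the `CATALOG` list, the `fuzzy_lookup` and `similarity` helpers, and the 0.6 cutoff are assumptions made up for the example.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical catalog of names to search (not the talk's data set).
CATALOG = ["Sushiro", "Spy x Family", "One Piece", "Naruto"]


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(query: str, names: list[str], cutoff: float = 0.6) -> list[str]:
    """Return up to three names whose ratio to the query is at least `cutoff`, best first."""
    return get_close_matches(query, names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    # The misspelled query still finds the intended name.
    print(fuzzy_lookup("Susiro", CATALOG))
```

Fuzzy matching of this kind compares spelling only. It tolerates typos, but it cannot tell that two differently worded descriptions are about similar stories.
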
  6. Basic Search

    ❖ Title: 鬼滅 (Demon Slayer) ❖ Author ❖ Publisher ❖ Series ❖ Description ❖ ISBN ❖ Genre
  7. Basic Search

    ❖ 139 books ❖ Title ❖ 『鬼滅の刃』論 鬼舞辻無惨と十二鬼月のための哀悼碑 ❖ 鬼滅の刃塗絵帳 -翠- ❖ 鬼滅の刃はドグラ・マグラ 1
  8. Basic Search

    ❖ Title: SPY Family ❖ Author ❖ Publisher ❖ Series ❖ Description ❖ ISBN ❖ Genre
  9. Basic Search

    ❖ 58 books ❖ Title ❖ SPY×FAMILY 14 【ジャンプコミックス】 ❖ SPY×FAMILY セット 1-13巻 ❖ アーニャとはじめてのプログラミング 年長~小学校中学年
  10. Search

    ❖ Let's think about searching ❖ What kind of search would you like to do? ❖ Word ❖ Text ❖ Fuzzy (Susiro > Sushiro スシロー) ❖ How about ❖ Recommend, Similar
  11. How similar are these two sentences?

    ❖ Pokémon ❖ "Pokémon: The First Movie" ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous." ❖ One Piece ❖ "One Piece Film: Strong World" ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..."
  12. Vector Search

    ❖ I want to find similar items. ❖ How do I search for them? ❖ Compare to find similarities. ❖ Numbers are easy to compare. ❖ Documents are hard to compare.
  13. Numbers are easy to compare

    ❖ Pokémon ❖ "Pokémon: The First Movie" ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous." ❖ One Piece ❖ "One Piece Film: Strong World" ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..." [0.2, 0.6] [0.3, 0.5]
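
To make "numbers are easy to compare" concrete, here is a minimal sketch that compares the two example vectors from this slide. The slide does not say which measure is meant, so the choice of Euclidean distance and cosine similarity is an assumption; both are common ways to compare vectors, and the function names are made up for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1.0 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The two example vectors shown on the slide.
first = [0.2, 0.6]
second = [0.3, 0.5]

if __name__ == "__main__":
    print(round(euclidean_distance(first, second), 4))
    print(round(cosine_similarity(first, second), 4))
```

Once both texts are reduced to vectors, "how similar are they?" becomes a single arithmetic expression, which is the property a vector index relies on.
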
  14. Indexes

    | Name       | DB Type        | Indexes                                                                          |
    |------------|----------------|----------------------------------------------------------------------------------|
    | PostgreSQL | RDB            | B-Tree, Hash, GiST, SP-GiST, GIN, BRIN, Vector (pgvector)                        |
    | MySQL      | RDB            | BTREE, HASH, Vector (data type, MySQL 9.0)                                       |
    | Neo4j      | Graph          | Range, Point, Text, Full-text, Token lookup, Vector                              |
    | MongoDB    | Document store | Single Field, Compound, Multikey, Geospatial, Text, Hashed, Clustered, knnVector |
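  15. Example of 2D Array

    | Comic Book       | Genre (x) | Character Diversity (y) |
    |------------------|-----------|-------------------------|
    | One Piece        | 0.9       | 0.8                     |
    | Naruto           | 0.8       | 0.7                     |
    | Attack on Titan  | 0.7       | 0.9                     |
    | My Hero Academia | 0.8       | 0.9                     |
    | Death Note       | 0.6       | 0.6                     |
    | Spy x Family     | 0.7       | 0.8                     |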
  16. Example of 2D Array

    | Comic Book       | Genre (x) |
    |------------------|-----------|
    | One Piece        | 0.9       |
    | Naruto           | 0.8       |
    | Attack on Titan  | 0.7       |
    | My Hero Academia | 0.8       |
    | Death Note       | 0.6       |
    | Spy x Family     | 0.7       |
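  17. Example of 3D Array

    | Comic Book       | Genre (x) | Character Diversity (y) | Artistic Style (z) |
    |------------------|-----------|-------------------------|--------------------|
    | One Piece        | 0.9       | 0.8                     | 0.7                |
    | Naruto           | 0.8       | 0.7                     | 0.5                |
    | Attack on Titan  | 0.7       | 0.9                     | 0.8                |
    | My Hero Academia | 0.8       | 0.9                     | 0.7                |
    | Death Note       | 0.6       | 0.6                     | 0.9                |
    | Spy x Family     | 0.7       | 0.8                     | 0.8                |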
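
With the three scores from this slide, "find similar comics" becomes a nearest-neighbor lookup. The sketch below is a brute-force scan using Euclidean distance. The distance measure and the `most_similar` helper are assumptions for illustration; the slide only gives the numbers.

```python
import math

# (genre, character diversity, artistic style) scores from the slide's table.
COMICS = {
    "One Piece": (0.9, 0.8, 0.7),
    "Naruto": (0.8, 0.7, 0.5),
    "Attack on Titan": (0.7, 0.9, 0.8),
    "My Hero Academia": (0.8, 0.9, 0.7),
    "Death Note": (0.6, 0.6, 0.9),
    "Spy x Family": (0.7, 0.8, 0.8),
}


def most_similar(title: str, k: int = 2) -> list[str]:
    """Return the k comics closest to `title`, nearest first."""
    query = COMICS[title]
    scored = [
        (math.dist(query, vector), name)
        for name, vector in COMICS.items()
        if name != title
    ]
    return [name for _, name in sorted(scored)[:k]]


if __name__ == "__main__":
    print(most_similar("One Piece"))
```

Scanning every row is fine for six comics. A vector index exists so that the same kind of lookup stays fast over very large collections, typically by using approximate nearest-neighbor search instead of a full scan.
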
  18. How many dimensions should be used for natural language?

    ❖ Texts ❖ "Pokémon: The First Movie" ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous." ❖ Dimensions? ❖ 2 ❖ 3 ❖ 18 ❖ 256 ❖ 512 ❖ More
  19. High-dimensional vector

    ❖ Texts ❖ "Pokémon: The First Movie" ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous." ❖ Dimensions: 1536!
  20. Create embeddings w/OpenAI

    ❖ https://api.openai.com/v1/embeddings ❖ Request ❖ model=text-embedding-ada-002 ❖ released in December 2022 ❖ 1536 dimensions ❖ input="The food was delicious and …" ❖ Response ❖ "embedding": [ 0.0023064255, -0.009327292, … (1536 floats total for ada-002) ] Source: OpenAI, https://platform.openai.com/docs/api-reference/embeddings/create
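
The request on this slide can be sketched with Python's standard library only. The endpoint, the model name, and the `model`/`input` fields come from the slide. The helper names and the `OPENAI_API_KEY` environment variable are assumptions for this example, and the talk may well have used a client library instead.

```python
import json
import os
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"  # endpoint from the slide
MODEL = "text-embedding-ada-002"  # model from the slide (1536 dimensions)


def build_embedding_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one input text."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embedding(text: str, api_key: str) -> list[float]:
    """Send the request and return the embedding vector from the response."""
    with urllib.request.urlopen(build_embedding_request(text, api_key)) as response:
        body = json.load(response)
    return body["data"][0]["embedding"]


if __name__ == "__main__":
    # Assumption: the API key is supplied via an environment variable.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        vector = fetch_embedding("Pokémon: The First Movie", key)
        print(len(vector))
```

Keeping request construction separate from sending makes it possible to inspect the payload without spending API calls.
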
  21. Toy Story

    ❖ Request ❖ "Toy Story of Terror" ❖ "Woody and the gang are held up at a roadside motel and members of the group start to disappear from everywhere. Woody and set about getting to the bottom of the mystery." ❖ Response ❖ [-0.018501901999115944, -0.0370803102850914, 0.0004992868052795529, -0.023270828649401665, -0.0326429158449173, 0.0058878385461866856, 0.006426574196666479, -0.022837290540337563, 0.010883096605539322, -0.018858933821320534, -0.003943289630115032, 0.0018297884380444884, 0.0038349051028490067, 0.014128262177109718, 0.0130125368013978, -0.00586233614012599, 0.017354300245642662, 0.00009717762441141531, 0.010003268718719482, -0.022263487800955772, -0.0011205063201487064, 0.00917444471269846, -0.034759603440761566, -0.006815483793616295, … (1536 floats total for ada-002)]
  22. Pokémon

    ❖ Request ❖ "Pokémon: The First Movie" ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous." ❖ Response ❖ [-0.015344273298978806, -0.011856937780976295, 0.0010308625642210245, -0.0008621466695331037, -0.013923507183790207, 0.005770247429609299, 0.013639354147017002, -0.031101860105991364, 0.0009791983757168055, -0.016984611749649048, 0.013251871801912785, 0.02844115160405636, 0.01570592261850834, -0.023080989718437195, 0.009060611948370934, -0.006067316513508558, 0.02758869342505932, -0.01579633541405201, 0.01556384563446045, -0.010875318199396133, 0.0052310023456811905, 0.021376069635152817, 0.002912570256739855, 0.0010591164464130998, … (1536 floats total for ada-002)]
  23. One Piece Film: Strong World

    ❖ Request ❖ "One Piece Film: Strong World" ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..." ❖ Response ❖ [0.016981402412056923, -0.01806335523724556, 0.0027312743477523327, -0.016400840133428574, -0.013326507993042469, 0.00419916957616806, -0.017218904569745064, 0.008464311249554157, 0.003915486391633749, -0.0273127444088459, 0.01231712382286787, 0.010311550460755825, 0.023499514907598495, -0.01900016888976097, 0.006109082140028477, 0.0014489279128611088, 0.014448045752942562, -0.029898878186941147, -0.002935789991170168, -0.004215662833303213, 0.005871579982340336, -0.0025086160749197006, -0.013056019321084023, 0.003902291879057884, … (1536 floats total for ada-002)]
  24. If you like this movie, how would you search for the next one to watch?

    ❖ 1. Ask someone who knows a lot about movies ❖ 2. Ask ChatGPT ❖ 3. Look for recommendations online ❖ 4. Vector Search ❖ Movie: One Piece Film: Strong World (Source: TMDB, https://www.themoviedb.org/movie/41498)
  25. GQL

    ❖ Graph Query Language ❖ https://www.iso.org/standard/….html ❖ GQL ISO/IEC … ❖ General information ❖ Status: Published ❖ Publication date: … ❖ Stage: International Standard published ❖ Edition: … ❖ Number of pages: … ❖ Technical Committee: ISO/IEC JTC …/SC … Source: ISO, https://www.iso.org/standard/….html
  26. SQL/PGQ

    ❖ Property Graph Query ❖ https://www.iso.org/standard/….html ❖ SQL ISO/IEC … ❖ Part … ❖ General information ❖ Status: Published ❖ Publication date: … ❖ Stage: International Standard published ❖ Edition: … ❖ Number of pages: … ❖ Technical Committee: ISO/IEC JTC …/SC … Source: ISO, https://www.iso.org/standard/….html
  27. SQL/PGQ and GQL

    ❖ SQL/PGQ ❖ SQL … includes Property Graph Query ❖ Oracle Database …c ❖ GQL ❖ International Standard ❖ Published Apr … Source: openCypher, https://www.gqlstandards.org/home
  28. Conclusion

    ❖ Vector Search ❖ Useful ❖ Text and document search ❖ There are APIs that convert data into vectors