Vector Impact on Similarity Extraction: Movie Data Analysis

Vector generation methods and their dimensionality play a crucial role in determining the results of data similarity extraction.
In this presentation, I will explore how different vectorization techniques and dimensional configurations influence similarity search outcomes. To do this, I will compare Gemini, ChatGPT, and several other vector generation methods using movie data.

My analysis demonstrates that selecting appropriate vector generation methods significantly enhances the accuracy of data retrieval and information extraction.
This presentation is tailored to be accessible even for beginners who have no prior experience with vector indexes.

Koji Annoura

March 14, 2025

Transcript

  1. FOSSASIA Summit 2025, March 14 2025 17:00-17:20
    Vector Impact on Similarity Extraction: Movie Data Analysis
    Koji Annoura
    Source: Neo4j, https://neo4j.com/blog/genai/graphrag-manifesto
  2. Who am I
    ❖ Koji Annoura
    ❖ I live in Fukuoka, Japan
    ❖ I love
      ❖ Neo4j Graph Database
        ❖ Neo4j Users Group Tokyo
          ❖ I am one of the founders.
        ❖ Graph Community MVP
      ❖ Apache Hop
        ❖ Data Orchestration App
        ❖ Apache User Group Japan
          ❖ I am a founder
  3. How similar are these two sentences?
    ❖ Pokémon
      ❖ "Pokémon: The First Movie"
      ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous."
    ❖ One Piece
      ❖ "One Piece Film: Strong World"
      ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..."
  4. Numbers are easy to compare
    ❖ Pokémon
      ❖ "Pokémon: The First Movie"
      ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous."
    ❖ One Piece
      ❖ "One Piece Film: Strong World"
      ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..."
    [0.2, 0.6]   [0.3, 0.5]
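To make "numbers are easy to compare" concrete, here is a minimal sketch that compares the two example vectors from this slide with two common measures. The function names are illustrative, and the slide does not say which measure the talk uses or which vector belongs to which movie, so treat both as assumptions.

```python
import math

# The two 2D vectors shown on the slide. Which movie each one
# represents is not stated there, so the names here are neutral.
vec_a = [0.2, 0.6]
vec_b = [0.3, 0.5]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(f"cosine similarity:  {cosine_similarity(vec_a, vec_b):.4f}")
print(f"euclidean distance: {euclidean_distance(vec_a, vec_b):.4f}")
```

For these two vectors the cosine similarity is about 0.976 and the Euclidean distance is about 0.141, so both measures agree that the points are close.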
  5. Indexes

    Name         DB Type          Indexes
    PostgreSQL   RDB              B-Tree, Hash, GiST, SP-GiST, GIN, BRIN, Vector (pgvector)
    MySQL        RDB              BTREE, HASH, Vector (data type, MySQL 9.0)
    Neo4j        Graph            Range, Point, Text, Full-text, Token lookup, Vector
    MongoDB      Document store   Single Field, Compound, Multikey, Geospatial, Text, Hashed, Clustered, knnVector
  6. Example of 2D Array

    Comic Book         Genre (x)   Character Diversity (y)
    One Piece          0.9         0.8
    Naruto             0.8         0.7
    Attack on Titan    0.7         0.9
    My Hero Academia   0.8         0.9
    Death Note         0.6         0.6
    Spy x Family       0.7         0.8
  7. Example of 2D Array

    Comic Book         Genre (x)
    One Piece          0.9
    Naruto             0.8
    Attack on Titan    0.7
    My Hero Academia   0.8
    Death Note         0.6
    Spy x Family       0.7
  8. Example of 3D Array

    Comic Book         Genre (x)   Character Diversity (y)   Artistic Style (z)
    One Piece          0.9         0.8                       0.7
    Naruto             0.8         0.7                       0.5
    Attack on Titan    0.7         0.9                       0.8
    My Hero Academia   0.8         0.9                       0.7
    Death Note         0.6         0.6                       0.9
    Spy x Family       0.7         0.8                       0.8
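The three array slides suggest that the number of dimensions changes which items look similar. The sketch below ranks the other titles by distance from One Piece using only the first one, two, or three scores from the table. The `ranking` helper and the choice of Euclidean distance are assumptions made for illustration, not something the slides specify.

```python
import math

# Scores copied from the 3D table:
# (genre x, character diversity y, artistic style z)
COMICS = {
    "One Piece": (0.9, 0.8, 0.7),
    "Naruto": (0.8, 0.7, 0.5),
    "Attack on Titan": (0.7, 0.9, 0.8),
    "My Hero Academia": (0.8, 0.9, 0.7),
    "Death Note": (0.6, 0.6, 0.9),
    "Spy x Family": (0.7, 0.8, 0.8),
}


def ranking(title, dims):
    """Rank the other titles by Euclidean distance to `title`,
    using only the first `dims` scores of each vector.
    Distances are rounded so float noise does not hide exact ties."""
    query = COMICS[title][:dims]
    scored = [
        (other, round(math.dist(query, vec[:dims]), 4))
        for other, vec in COMICS.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1])


for dims in (1, 2, 3):
    print(f"{dims}D:", ranking("One Piece", dims)[:2])
```

With one or two scores, Naruto and My Hero Academia tie as the closest titles to One Piece. Adding the third score breaks the tie: My Hero Academia stays at about 0.1414, while Naruto moves out to about 0.2449.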
  9. How many dimensions should be used for natural language?
    ❖ Texts
      ❖ "Pokémon: The First Movie"
      ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous."
    ❖ Dimensions?
      ❖ 2
      ❖ 3
      ❖ 18
      ❖ 256
      ❖ 512
      ❖ More
  10. High-dimensional vector
    ❖ Texts
      ❖ "Pokémon: The First Movie"
      ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous."
    ❖ Dimensions !! 1536
  11. Create embeddings w/OpenAI
    ❖ https://api.openai.com/v1/embeddings
    ❖ Request
      ❖ model=text-embedding-ada-002
        ❖ released in December 2022
        ❖ 1536 dimensions
      ❖ input="The food was delicious and …
    ❖ Response
      ❖ "embedding": [ 0.0023064255, -0.009327292, .... (1536 floats total for ada-002)
    Source: OpenAI, https://platform.openai.com/docs/api-reference/embeddings/create
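The request on this slide can be sketched with only the Python standard library. The endpoint and model name come from the slide. The helper names, the `OPENAI_API_KEY` environment variable, and the choice of input text are assumptions; the slides do not show how the talk's own requests were sent.

```python
import json
import os
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"  # from the slide
MODEL = "text-embedding-ada-002"                         # from the slide


def build_request(text, api_key):
    """Build (but do not send) a POST request for one input text."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(text, api_key):
    """Send the request and return the embedding as a list of floats."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]


if __name__ == "__main__":
    # Assumption: the key is supplied via an environment variable.
    # Without a key, nothing is sent.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        vector = embed("Pokémon: The First Movie", key)
        print(len(vector))  # the slide states 1536 dimensions for ada-002
```

Keeping `build_request` separate from `embed` lets the payload be inspected without a network call or an API key.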
  12. Toy Story
    ❖ Request
      ❖ "Toy Story of Terror"
      ❖ "Woody and the gang are held up at a roadside motel and members of the group start to disappear from everywhere. Woody and set about getting to the bottom of the mystery."
    ❖ Response
      ❖ [-0.018501901999115944, -0.0370803102850914, 0.0004992868052795529, -0.023270828649401665, -0.0326429158449173, 0.0058878385461866856, 0.006426574196666479, -0.022837290540337563, 0.010883096605539322, -0.018858933821320534, -0.003943289630115032, 0.0018297884380444884, 0.0038349051028490067, 0.014128262177109718, 0.0130125368013978, -0.00586233614012599, 0.017354300245642662, 0.00009717762441141531, 0.010003268718719482, -0.022263487800955772, -0.0011205063201487064, 0.00917444471269846, -0.034759603440761566, -0.006815483793616295, … (1536 floats total for ada-002)]
  13. Pokémon
    ❖ Request
      ❖ "Pokémon: The First Movie"
      ❖ "Scientists genetically create a new Pokémon, Mewtwo, but the results are horrific and disastrous."
    ❖ Response
      ❖ [-0.015344273298978806, -0.011856937780976295, 0.0010308625642210245, -0.0008621466695331037, -0.013923507183790207, 0.005770247429609299, 0.013639354147017002, -0.031101860105991364, 0.0009791983757168055, -0.016984611749649048, 0.013251871801912785, 0.02844115160405636, 0.01570592261850834, -0.023080989718437195, 0.009060611948370934, -0.006067316513508558, 0.02758869342505932, -0.01579633541405201, 0.01556384563446045, -0.010875318199396133, 0.0052310023456811905, 0.021376069635152817, 0.002912570256739855, 0.0010591164464130998, … (1536 floats total for ada-002)]
  14. One Piece Film: Strong World
    ❖ Request
      ❖ "One Piece Film: Strong World"
      ❖ "Its a normal day on the thousand sunny instill Nami reads the news paper to the rest of the crew about the east blue being attacked Luff then decides that they will go back to the east blue..."
    ❖ Response
      ❖ [0.016981402412056923, -0.01806335523724556, 0.0027312743477523327, -0.016400840133428574, -0.013326507993042469, 0.00419916957616806, -0.017218904569745064, 0.008464311249554157, 0.003915486391633749, -0.0273127444088459, 0.01231712382286787, 0.010311550460755825, 0.023499514907598495, -0.01900016888976097, 0.006109082140028477, 0.0014489279128611088, 0.014448045752942562, -0.029898878186941147, -0.002935789991170168, -0.004215662833303213, 0.005871579982340336, -0.0025086160749197006, -0.013056019321084023, 0.003902291879057884, … (1536 floats total for ada-002)]
  15. If you like this movie, how would you search for the next one to watch?
    ❖ 1. Ask someone who knows a lot about movies
    ❖ 2. Ask ChatGPT
    ❖ 3. Look for recommendations online
    ❖ 4. Vector Search
    One Piece Film: Strong World (Source: TMDB, https://www.themoviedb.org/movie/41498)
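Option 4, vector search, can be sketched as a brute-force scan: score every candidate against the query vector and keep the best matches. The `top_k` helper is hypothetical, and cosine similarity is an assumed choice of measure. The vectors are only the first 4 of the 1536 components printed on slides 12 to 14, so the resulting order illustrates the mechanics and says nothing about how similar the movies really are.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k):
    """Return the k (name, score) pairs most similar to `query`,
    best first, by scanning every candidate (no index)."""
    scored = ((name, cosine(query, vec)) for name, vec in candidates.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Truncated to the first 4 of 1536 components shown on the slides.
one_piece = [0.016981402412056923, -0.01806335523724556,
             0.0027312743477523327, -0.016400840133428574]
candidates = {
    "Toy Story of Terror": [-0.018501901999115944, -0.0370803102850914,
                            0.0004992868052795529, -0.023270828649401665],
    "Pokémon: The First Movie": [-0.015344273298978806, -0.011856937780976295,
                                 0.0010308625642210245, -0.0008621466695331037],
}

for name, score in top_k(one_piece, candidates, k=2):
    print(f"{name}: {score:.3f}")
```

A full scan like this is fine for a handful of movies. The vector index types listed on slide 5 exist so that a database does not have to compare the query against every stored vector.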