
誰でもできる!OpenAI Embedding API を活用して高度なレコメンド機能を実現してみよう - A story about implementing an advanced recommendation function using the OpenAI Embedding API

sugoikondo 近藤 豊峰

September 18, 2024

Transcript

  1. A story about implementing an advanced recommendation function using the OpenAI Embedding API

    By Atsumine Kondo - @sugoikondo
  2. Atsumine Kondo (@sugoikondo)

    • Backend / Frontend / Infra
    • Scala, Kotlin, Python, etc.
    • Vue.js, Nuxt.js, Next.js, etc.
    • MoneyForward Inc., Group Management Solution dept., Product development div.
  3. What you will learn from this talk

    • You can learn what a text vector is
    • Once you understand that, you can build solutions such as:
    • Recommendation, outlier detection, classification problems, search, etc.
  4. We've implemented an account recommendation function in our application

    • We released an AI-based consolidated account recommendation feature in Aug. 2023
    • Using OpenAI's Embedding API
    • Patent applied for
    ref: https://corp.moneyforward.com/news/release/service/20230804-mf-press-1/
  5. We've implemented an account recommendation function in our application

    • For each account of an individual company, suggest the top three consolidated accounts of the parent company that are semantically closest to it.
    ref: https://corp.moneyforward.com/news/release/service/20230804-mf-press-1/
  6. We have had a lot of media coverage.

    • Because the topic attracted a lot of attention, many media outlets covered it.
    • https://cloud.watch.impress.co.jp/docs/news/1522209.html
    • https://it.impress.co.jp/articles/-/25192
    • https://officenomikata.jp/news/15534/
    • In total, about 8 articles...
  7. What is an account/consolidated account?

    • Accounts are names or headings used to record transactions of assets, etc.
    e.g. You paid 50,000 yen rent via direct debit.
    Debit: Rent expenses 50,000 / Credit: Ordinary deposit 50,000
    • Here, "Rent expenses" and "Ordinary deposit" are each an account.
  8. What is an account/consolidated account?

    • In consolidated accounting, there is a process of collecting the consolidated accounts and balances of the companies in the group and consolidating them into one account (the consolidated account) of the parent company.
    [Figure: accounts of Company A and Company B (Ordinary Deposit, Bank A) are consolidated into the Parent Company's Cash & Deposit account]
  9. How to achieve better account recommendations?

    • Many patterns cannot be absorbed by simple fuzzy search, edit distance, etc.
    • e.g. "XX bank" vs. "Ordinary deposit", "Cash and deposits", etc.
    • If there is an overseas subsidiary, names in languages other than Japanese, such as "XX Bank", may also be sent.
    • We need to recommend the consolidated account that is closest to each individual company's account, taking closeness in meaning into account.
  10. Barriers to achieving account recommendation

    • At the time, there was still little data available for training.
    • At most, there were a few hundred to a thousand account conversion records
    • Training a model may require tens or hundreds of thousands of records.
    • It is difficult to cover the maintenance costs of ML models and to secure people for them
    • Finding people who can respond when problems occur is difficult, and the cost of re-training models cannot be ignored.
  11. But before we talk about the Embedding API, let's first learn about text vectors / embedding representations.
  12. About Embedding / Word2Vec

    • A technology or method of converting text into numbers/vectors.
    'Cash and deposits' → [[-0.03455162],[-0.01306203],[ 0.01672893],…,[-0.00129271],[ 0.00694819],[-0.01055199]]
    • Sometimes called a vector, a distributed representation, or an embedding representation.
  13. About Embedding / Word2Vec

    ex: When "Cash" and "Liabilities" become [0.4, 0.8] and [-0.3, 0.9], respectively (the Japanese text on the same slide gives [0.6, 0.8] and [-0.3, 0.4])
    Because the text becomes an array of floating-point numbers, it can represent coordinates or a vector.
    [Figure: 2-D plot of the "Cash" and "Liabilities" vectors]
  14. What you can do once text is vectorized

    1. Calculate the similarity between vectors
    2. Perform numerical operations such as addition and subtraction on vectors
  15. What you can do once text is vectorized

    1. Calculate the similarity between vectors ← This time we mainly talk about this one.
    2. Perform numerical operations such as addition and subtraction on vectors
  16. 1. Can calculate similarity between vectors

    • By calculating the angle between two vectors, the similarity of their orientation can be calculated
    • Cosine similarity is generally used.
    • A positive value means positive correlation, a negative value means negative correlation
    cos('Cash', 'Bank A') = 0.85
    cos('Cash', 'Rent expenses') = 0.05
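The cosine similarity mentioned on this slide can be sketched in a few lines. This is a minimal illustration and not the presenter's code: the function name `cosine_similarity` and the 2-D toy vectors are assumptions made up for the example. Real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 2-D vectors, chosen by hand for illustration only.
cash = [0.6, 0.8]
bank_a = [0.8, 0.6]   # points in nearly the same direction as `cash`
rent = [-0.8, 0.6]    # perpendicular to `cash`

print(cosine_similarity(cash, bank_a))  # close to 1: similar orientation
print(cosine_similarity(cash, rent))    # close to 0: unrelated orientation
```

Because only the angle matters, two vectors of very different lengths can still score as highly similar.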
  17. 1. Can calculate similarity between vectors

    Since the similarity between texts can be determined, the following solutions become possible:
    • Recommendation (items with high similarity)
    • Outlier detection (items with low similarity)
    • Classification (group items that are close in similarity)
  19. 1. Can calculate similarity between vectors

    Let's also think about the Google search menu as an example.
    Ref: https://www.google.com/
  20. 1. Can calculate similarity between vectors

    Let's also think about the Google search menu as an example.
    Ref: https://www.google.com/
  21. 1. Can calculate similarity between vectors

    In fact, the order of the menu changes for each search term.
    Ref: https://www.google.com/
  22. 2. Can perform numerical operations such as addition and subtraction on vectors

    • Vectors are simply multidimensional arrays, so they can be added or subtracted (combined) if the number of dimensions matches.
    • By combining vectors, you get a single vector that still carries the meaning of each of the original vectors
    [Figure: IT + Orange + Fintech → MoneyForward]
  23. 2. Can perform numerical operations such as addition and subtraction on vectors

    So, something close to this can be implemented.
    [Figure: IT, Orange, Fintech]
    Ref: https://www.google.com/
  24. 2. Can perform numerical operations such as addition and subtraction on vectors

    Of course, we can also subtract them, so something similar to this can also be implemented (IT will hit in the previous example).
    [Figure: MoneyForward - Fintech - Orange]
    Ref: https://www.google.com/
  25. 2. Can perform numerical operations such as addition and subtraction on vectors

    Using vector composition, we can do this:
    1. Compute the vectors of target_words and add them together
    2. Then compare the resulting vector with the vector of each of the candidate_words
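The two steps on this slide could look roughly like the sketch below. The slide's own code is not part of the transcript, so everything here is an assumption: the helper names, the `toy_embeddings` table, and its hand-picked 3-D values stand in for vectors that would really come from an embedding model.

```python
import math


def add_vectors(vectors):
    """Element-wise sum of vectors that all share the same dimensionality."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all vectors must have the same number of dimensions")
    return [sum(components) for components in zip(*vectors)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_candidates(target_words, candidate_words, embeddings):
    """Step 1: add the target vectors. Step 2: compare with each candidate."""
    combined = add_vectors([embeddings[w] for w in target_words])
    scored = [(w, cosine_similarity(combined, embeddings[w])) for w in candidate_words]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-made vectors (axes: tech, finance, "orange-ness").
toy_embeddings = {
    "IT": [1.0, 0.0, 0.0],
    "Fintech": [0.6, 0.8, 0.0],
    "Orange": [0.0, 0.0, 1.0],
    "MoneyForward": [0.6, 0.5, 0.6],
    "Bank": [0.0, 1.0, 0.0],
    "Fruit": [0.0, 0.0, 1.0],
}

ranked = rank_candidates(
    target_words=["IT", "Orange", "Fintech"],
    candidate_words=["MoneyForward", "Bank", "Fruit"],
    embeddings=toy_embeddings,
)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

With these toy values the combined vector points closest to "MoneyForward", which mirrors the IT + Orange + Fintech example from the earlier slide. Subtraction works the same way with negated components.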
  26. 2. Can perform numerical operations such as addition and subtraction on vectors

    By the way, if I change 'Macbook' to 'Mac', the result looks like this
  27. 2. Can perform numerical operations such as addition and subtraction on vectors

    The same technique can be used to create character association games.
  28. About OpenAI's Embedding API

    • An API for obtaining the vector / distributed representation of text (described above)
    • ref: https://platform.openai.com/docs/guides/embeddings
    • Prices were revised in June 2023 to $0.0001 / 1K tokens, 75% off the previous price for the ada model.
  29. About OpenAI's Embedding API

    • In January 2024, an even less expensive model, text-embedding-3-small, became available at an additional 80% off
    • text-embedding-3-large, a model more accurate than the previous model (ada), also became available
    • Furthermore, reducing the number of dimensions is officially supported, which is expected to speed up vector calculations and reduce storage requirements.
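To make the dimension reduction concrete: to my understanding, the OpenAI embeddings guide describes a `dimensions` request parameter for the text-embedding-3 models, and notes that an embedding shortened by hand should be re-normalized. The sketch below shows only that manual variant. `shorten_embedding` is a hypothetical helper name, and the input vector is made up.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` components and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        return truncated  # nothing to rescale
    return [x / norm for x in truncated]


# Made-up 3-D vector; a real embedding would be far longer.
short = shorten_embedding([3.0, 4.0, 12.0], 2)
print(short)
```

Shorter vectors cost less to store and compare, at the price of some accuracy. How much accuracy is lost depends on the model and the task, so it is worth measuring on your own data.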
  30. About OpenAI's Embedding API

    • The Batch API is also available at an even lower cost.
    • Responses are slower (within 24 hours), but it is available at half price
    • Suitable for workloads with low real-time requirements, initial data construction, etc.
  31. About OpenAI's Embedding API

    • You can check rough token counts with the tool below.
    • https://platform.openai.com/tokenizer
    • You need to send about 1 to 50 million characters to spend $1, so the cost is not much of a concern.
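The cost claim can be checked with simple arithmetic on the prices quoted in the earlier slides. In this sketch the function names are assumptions, and the text-embedding-3-small price is derived from the slide's "additional 80% off" statement rather than taken from a price list, so check current pricing before relying on it.

```python
# Price quoted on the slides for the ada model (USD per 1K tokens).
ADA_PRICE_PER_1K = 0.0001
# Derived from the slide's "additional 80% off" claim (an assumption).
SMALL_PRICE_PER_1K = ADA_PRICE_PER_1K * (1 - 0.80)


def tokens_per_dollar(price_per_1k_tokens):
    """How many tokens one US dollar buys at the given price."""
    return 1000 / price_per_1k_tokens


def estimate_cost(token_count, price_per_1k_tokens):
    """Estimated cost in USD for embedding `token_count` tokens."""
    return token_count / 1000 * price_per_1k_tokens


print(f"ada:   {tokens_per_dollar(ADA_PRICE_PER_1K):,.0f} tokens per $1")
print(f"small: {tokens_per_dollar(SMALL_PRICE_PER_1K):,.0f} tokens per $1")
print(f"1M tokens on ada: ${estimate_cost(1_000_000, ADA_PRICE_PER_1K):.2f}")
```

That works out to roughly 10 million tokens per dollar on ada and roughly 50 million on text-embedding-3-small. The number of characters per token depends on the language and the tokenizer, which is why the slide points to the tokenizer page.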
  32. About OpenAI's Embedding API

    Accuracy in account recommendation also improved significantly compared to the pre-reviewed model
    [Table: pre-reviewed model vs. OpenAI text-embedding large model; columns: 1st-place accuracy, top-N accuracy, 1st-place accuracy without English, top-N accuracy without English; the values did not survive extraction]
  33. About OpenAI's Embedding API

    • Since the response is always the same when the same text is sent, a vector cache can be built to reduce requests.
    • The API's response can take several seconds (depending on the request), so a vector cache is also recommended to improve response time.
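A vector cache of the kind described here can be very small. The sketch below is an assumption-laden illustration: `EmbeddingCache` and `fake_embed` are hypothetical names, the in-memory dict stands in for the Redis store mentioned later in the deck, and `fake_embed` replaces the real API call so the example runs offline.

```python
import hashlib


class EmbeddingCache:
    """Cache embeddings by a hash of the input text to avoid repeat requests."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn   # function that actually calls the API
        self._store = {}            # stand-in for an external store such as Redis
        self.api_calls = 0          # counts cache misses, for illustration

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(text)
            self.api_calls += 1
        return self._store[key]


def fake_embed(text):
    # Hypothetical placeholder for a real embedding request.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(fake_embed)
first = cache.get("Cash and deposits")
second = cache.get("Cash and deposits")   # served from the cache
print(first == second, cache.api_calls)
```

Hashing the text keeps keys short and uniform. If you use several models or dimension settings, the key should include them, since the same text produces a different vector under each.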
  34. Notes on using Embedding API

    • The API has a general rate limit
    • It currently varies depending on the Tier
    • It depends on the number of requests, but with a cache mechanism you rarely hit it.
  35. 1. There are two kinds of rate limits

    • Apart from that, there is apparently some kind of rate limit that is applied at the system-wide level.
    • At its worst, we hit this limit about once every few requests.
  36. Dealing with rate limits

    Solution: Implement a retry mechanism
    • OpenAI also recommends Exponential Backoff, and provides sample code using several Python libraries.
    • Ref: https://platform.openai.com/docs/guides/rate-limits/retrying-with-exponential-backoff
    Although a retry may seem like a questionable way to deal with a rate limit, the number of occurrences has dropped to almost zero since we introduced it.
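A retry mechanism with exponential backoff can be sketched without any library. This is not the code from the OpenAI guide or from the presenter's system: `RateLimitError` here is a locally defined stand-in for whatever exception your client raises, and the retry count and delay values are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call `func`, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads out retries from concurrent clients.
            sleep(delay * random.uniform(0.5, 1.0))
```

A call would look like `call_with_backoff(lambda: embed(text))`, where `embed` is your own request function. The `sleep` parameter exists only so the waits can be replaced in tests.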
  37. Example 1: Recommendation between accounts

    • Use Redis as vector cache storage
    • This mechanism keeps costs down: we have vectorized roughly tens of millions of accounts in total so far, at almost no cost.
    • The Embedding API eliminates the need for expensive machines packed with GPUs and lots of CPU/memory
  38. Example 2: Free word search for receipts

    • Vectorize the text of the receipt's contents in advance
    • Extract the text with OCR, ChatGPT, etc.
    • Then store it in a vector DB, etc.
    [Figure: User → Upload (Receipt.pdf, jpg, etc.) → Vectorization ([[-0.03455162],[-0.01306203],…,[ 0.00694819],[-0.01055199]]) → Vector DB]
  39. Example 2: Free word search for receipts

    • Vectorize the search words entered by the user, and pick the closest ones from the values in the DB.
    [Figure: User enters "A receipt of 10,000 yen on December 1." → Vectorization ([[-0.03455162],[-0.01306203],…,[ 0.00694819],[-0.01055199]]) → Search in Vector DB → Receipt_Dec_1.pdf]
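The "pick the closest ones" step could be sketched as a brute-force top-k search. This is an illustration under stated assumptions, not how any particular vector DB works internally: `top_k`, the 3-D vectors, and every file name other than Receipt_Dec_1.pdf are invented for the example, and a real vector DB would normally use an index instead of scanning every entry.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vector, stored_vectors, k=3):
    """Return the k stored items most similar to the query, best first."""
    scored = (
        (cosine_similarity(query_vector, vector), name)
        for name, vector in stored_vectors.items()
    )
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


# Hypothetical pre-computed vectors for stored receipts.
stored = {
    "Receipt_Dec_1.pdf": [0.9, 0.1, 0.1],
    "Receipt_Nov_15.pdf": [0.1, 0.9, 0.1],
    "Invoice_Oct.pdf": [0.1, 0.1, 0.9],
}
# Hypothetical vector for the user's search words.
query = [0.8, 0.2, 0.1]

results = top_k(query, stored, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The same function covers the account recommendation case from earlier in the talk: with `k=3` it returns the three closest consolidated accounts for a given account vector.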
  40. Example 3: Find receipts that are similar to a certain receipt.

    • If Text to File can be implemented, of course File to File can also be implemented.
    [Figure: "I need a receipt similar to this one." → Text extraction ("Receipt, Dec. 1, …") → Vectorization ([[-0.03455162],[-0.01306203],…,[ 0.00694819],[-0.01055199]]) → Search in Vector DB]
  41. What becomes possible once you can generate text vectors

    Anything that can be converted to text can be used to implement recommendations, etc.
    Also, thanks to ChatGPT, the barrier to converting images and other data to text has been lowered.
  42. Summary

    • With OpenAI's Embedding API, even a team without an ML engineer was able to implement an AI solution easily and inexpensively.
    • The Embedding API can be used to implement various solutions such as recommendation, outlier detection, text classification, etc.
  43. We are looking for people who can grow with us.

    Scan this QR to visit our recruitment site