Slide 1

Slide 1 text

Anyone can do it! Let's build an advanced recommendation feature with the OpenAI Embedding API
By Atsumine Kondo - @sugoikondo

Slide 2

Slide 2 text

Atsumine Kondo (@sugoikondo)
● Backend / Frontend / Infra
● Scala, Kotlin, Python, etc.
● Vue.js, Nuxt.js, Next.js, etc.
MoneyForward, Inc., Group Management Solution Dept., Product Development Div.

Slide 3

Slide 3 text

What you can learn from this presentation

Slide 4

Slide 4 text

What you can learn, and what you will be able to do, from this presentation
● What a text vector is
● With that understanding, you can build recommendation, outlier detection, classification, search, and more

Slide 5

Slide 5 text

Introduction

Slide 6

Slide 6 text

Cloud Consolidated Accounting now has an account recommendation feature
● In August 2023, we released an AI-based consolidated account recommendation feature
● It is built on OpenAI's Embedding API
※ Patent pending
ref: https://corp.moneyforward.com/news/release/service/20230804-mf-press-1/

Slide 7

Slide 7 text

Cloud Consolidated Accounting now has an account recommendation feature
● For each account of an individual company, the feature suggests the top three parent-company consolidated accounts that are semantically closest.
ref: https://corp.moneyforward.com/news/release/service/20230804-mf-press-1/

Slide 8

Slide 8 text

We received wide media coverage
● Because the topic drew a lot of attention, many media outlets covered it.
● https://cloud.watch.impress.co.jp/docs/news/1522209.html
● https://it.impress.co.jp/articles/-/25192
● https://officenomikata.jp/news/15534/
● About 8 articles in total...

Slide 9

Slide 9 text

What is an account / consolidated account?
● An account is a name or heading used to record transactions involving assets and so on.
e.g. You paid 50,000 yen in rent by direct debit:
Debit: Rent expenses 50,000 | Credit: Ordinary deposit 50,000
● Here, "Rent expenses" and "Ordinary deposit" are each accounts.

Slide 10

Slide 10 text

What is an account / consolidated account?
● In consolidated accounting, there is a step that collects the accounts and balances of each company in the group and aggregates them into a single parent-company account (the consolidated account).
(Diagram: accounts of Company A and Company B, such as "Ordinary deposit" and "Bank A", are aggregated into the parent company's "Cash and deposits".)

Slide 11

Slide 11 text

Account names vary considerably from company to company.

Slide 12

Slide 12 text

How can we produce better account recommendations?
● Many patterns cannot be handled by simple fuzzy search, edit distance, and the like.
● e.g. "XX Bank", "Ordinary deposit", and "Cash and deposits"
● When there are overseas subsidiaries, some account names are not in Japanese at all, e.g. "XX Bank" written in English.
● We need to recommend the consolidated account closest to each individual company's account, taking semantic closeness into account.

Slide 13

Slide 13 text

And there are still many barriers to solving this...

Slide 14

Slide 14 text

Barriers to achieving account recommendation
● The feature had only just been released, so little training data had been collected.
● There were at most a few hundred to a thousand account mappings.
● Training a model would need at least tens to hundreds of thousands of examples.
● Securing the maintenance budget and staff for an ML model is difficult.
● It is hard to keep people available who can respond when problems occur, and the cost of retraining models cannot be ignored.

Slide 15

Slide 15 text

So, while researching, I came across...

Slide 16

Slide 16 text

OpenAI Embedding API

Slide 17

Slide 17 text

But before we talk about the Embedding API, let's first learn about text vectors / embedding representations.

Slide 18

Slide 18 text

About Embedding / Word2Vec

Slide 19

Slide 19 text

About Embedding / Word2Vec
● A technique or method for converting text into numbers / vectors.
● The result is also called a vector, a distributed representation, or an embedding.
"Cash and deposits" → [[-0.03455162],[-0.01306203], [ 0.01672893],…, [-0.00129271], [ 0.00694819],[-0.01055199]]

Slide 20

Slide 20 text

About Embedding / Word2Vec
ex: Suppose "Cash" and "Liabilities" become [0.6, 0.8] and [-0.3, 0.4], respectively.
Because each text becomes an array of floating-point numbers, it can be treated as a point (coordinates) or a vector.
(Figure: "Cash" and "Liabilities" drawn as vectors from the origin on a 2D plane.)

Slide 21

Slide 21 text

What do we gain by quantifying / vectorizing text?

Slide 22

Slide 22 text

What you can do once text is vectorized
1. Measure the similarity between vectors
2. Perform numerical operations, such as addition and subtraction, on vectors

Slide 23

Slide 23 text

What you can do once text is vectorized
1. Measure the similarity between vectors ← This talk focuses mainly on this one
2. Perform numerical operations, such as addition and subtraction, on vectors

Slide 24

Slide 24 text

1. You can measure the similarity between vectors

Slide 25

Slide 25 text

1. You can measure the similarity between vectors
● Computing the angle between two vectors gives the similarity of their directions.
● Cosine similarity is the usual measure.
● A positive value means positive correlation; a negative value means negative correlation.
cos('Cash', 'Bank A') = 0.85
cos('Cash', 'Rent expenses') = 0.05
(Figure: "Cash", "Bank A", and "Rent expenses" drawn as vectors.)
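A minimal sketch of cosine similarity in plain Python. The 2D inputs are hypothetical toy vectors; the 0.85 / 0.05 scores above come from real embeddings and are not reproduced here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real embeddings have hundreds or thousands of dimensions.
cash = [0.6, 0.8]
liabilities = [-0.3, 0.4]
print(round(cosine_similarity(cash, liabilities), 2))  # 0.28
```

The result ranges from -1 (opposite directions) to 1 (same direction), and it ignores vector length, which is why it is a common default for comparing embeddings.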

Slide 26

Slide 26 text

1. You can measure the similarity between vectors
Because we can measure similarity between texts, we can build solutions like these:
• Recommendation (items with high similarity)
• Outlier detection (items with low similarity)
• Classification (group items that are similar to each other)
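The recommendation case can be sketched as a brute-force top-k search, assuming the vectors are already available. The account names and 3D vectors below are hypothetical toy values; k=3 mirrors the top-three suggestions described earlier.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(query_vec, candidates, k=3):
    """Return the k (name, score) pairs most similar to query_vec.

    candidates: dict mapping a name (e.g. a consolidated account) to its vector.
    """
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy vectors; in practice these would come from the Embedding API.
consolidated_accounts = {
    "Cash and deposits": [0.9, 0.1, 0.0],
    "Rent expenses": [0.0, 0.2, 0.9],
    "Accounts payable": [0.1, 0.9, 0.1],
    "Short-term loans": [0.3, 0.8, 0.0],
}
bank_a = [0.8, 0.2, 0.1]  # hypothetical vector for a subsidiary account such as "Bank A"

for name, score in recommend(bank_a, consolidated_accounts):
    print(f"{name}: {score:.2f}")
```

Outlier detection is the same computation read the other way: flag items whose best similarity to any reference falls below a threshold.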

Slide 28

Slide 28 text

1. You can measure the similarity between vectors
Even recommendations that are nearly impossible with edit distance or LIKE search...

Slide 29

Slide 29 text

1. You can measure the similarity between vectors
...are possible with vector comparison.

Slide 30

Slide 30 text

1. You can measure the similarity between vectors
Let's also look at the menu in Google Search as an example.
Ref: https://www.google.com/

Slide 32

Slide 32 text

1. You can measure the similarity between vectors
In fact, the order of the menu changes depending on the search term.
Ref: https://www.google.com/

Slide 33

Slide 33 text

1. You can measure the similarity between vectors
So let's quickly try producing some recommendations.

Slide 34

Slide 34 text

1. You can measure the similarity between vectors
We get reasonably close results.

Slide 35

Slide 35 text

2. You can perform numerical operations, such as addition and subtraction, on vectors

Slide 36

Slide 36 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
● Vectors are simply multidimensional arrays, so they can be added or subtracted (combined) as long as their dimensions match.
● Combining vectors produces a single vector that keeps the meanings of the original vectors.
(Figure: IT + Orange + Fintech ≈ MoneyForward)
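A minimal sketch of element-wise addition and subtraction with a dimension check. The 3D vectors standing in for "IT", "Orange", and "Fintech" are hypothetical; real embeddings are much larger.

```python
def add_vectors(a: list[float], b: list[float]) -> list[float]:
    """Element-wise sum; only defined when both vectors have the same dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return [x + y for x, y in zip(a, b)]


def subtract_vectors(a: list[float], b: list[float]) -> list[float]:
    """Element-wise difference a - b; same dimension rule as add_vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return [x - y for x, y in zip(a, b)]


# Hypothetical toy vectors, one axis per concept to keep the arithmetic visible.
it, orange, fintech = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

combined = add_vectors(add_vectors(it, orange), fintech)
print(combined)  # [1.0, 1.0, 1.0]
print(subtract_vectors(subtract_vectors(combined, fintech), orange))  # [1.0, 0.0, 0.0]
```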

Slide 37

Slide 37 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
So, something close to this can be implemented: searching "IT Orange Fintech".
Ref: https://www.google.com/

Slide 38

Slide 38 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
Of course, subtraction works too, so something close to this can also be implemented (in the previous example, "IT" comes up): MoneyForward - Fintech - Orange
Ref: https://www.google.com/

Slide 39

Slide 39 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
Using vector composition, we can do this:
1. Compute the vectors of target_words and add them together
2. Compare the result with the vector of each word in candidate_words
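The two steps above can be sketched as follows. The embedding table and the target and candidate words are hypothetical toy values; a real implementation would fetch each vector from the Embedding API.

```python
import math

# Hypothetical toy embeddings (not real API output).
TOY_EMBEDDINGS = {
    "Apple": [0.9, 0.1, 0.0],
    "Laptop": [0.1, 0.9, 0.1],
    "Macbook": [0.6, 0.7, 0.1],
    "iPhone": [0.7, 0.2, 0.6],
    "Banana": [0.1, 0.0, 0.1],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_sum(vectors):
    """Add any number of same-dimension vectors element-wise."""
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(components) for components in zip(*vectors)]


def rank_candidates(target_words, candidate_words, embeddings):
    # Step 1: look up the target_words vectors and add them into one query vector.
    query = vector_sum([embeddings[w] for w in target_words])
    # Step 2: compare the combined vector with each candidate's vector.
    scored = [(w, cosine_similarity(query, embeddings[w])) for w in candidate_words]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word, score in rank_candidates(["Apple", "Laptop"], ["Banana", "iPhone", "Macbook"], TOY_EMBEDDINGS):
    print(f"{word}: {score:.3f}")
```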

Slide 40

Slide 40 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
The result looks like this.

Slide 41

Slide 41 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
By the way, changing 'Macbook' to 'Mac' gives this result.

Slide 42

Slide 42 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
The same technique can be used to build a character-guessing association game.

Slide 43

Slide 43 text

2. You can perform numerical operations, such as addition and subtraction, on vectors
And of course, it guesses correctly.

Slide 44

Slide 44 text

About OpenAI's Embedding API

Slide 45

Slide 45 text

About OpenAI's Embedding API
● An API that returns the vector / distributed representation of text described above
● ref: https://platform.openai.com/docs/guides/embeddings
● Prices were revised in June 2023: the ada model now costs $0.0001 / 1K tokens, 75% less than before.
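A minimal sketch of requesting embeddings with the official `openai` Python package (v1+), where `client.embeddings.create(model=..., input=...)` returns a response whose `data` items carry `index` and `embedding`. The default model and the stub client are assumptions; the stub is a hypothetical offline stand-in so the example runs without an API key.

```python
from types import SimpleNamespace


def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (list of floats) per input text, in input order.

    `client` is expected to behave like openai.OpenAI() from the openai package (v1+).
    """
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so the output order always matches the input order.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


# Real usage (requires the openai package and OPENAI_API_KEY):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["Cash and deposits", "Bank A"])


class _StubEmbeddings:
    """Hypothetical offline stand-in for client.embeddings; returns fake 2D vectors."""

    def create(self, model, input):
        data = [SimpleNamespace(index=i, embedding=[float(len(t)), 0.0]) for i, t in enumerate(input)]
        return SimpleNamespace(data=data)


stub_client = SimpleNamespace(embeddings=_StubEmbeddings())
vectors = embed_texts(stub_client, ["Cash and deposits", "Bank A"])
print(vectors)
```

Passing a list of texts in one request keeps the number of API calls (and rate-limit pressure) down compared with one call per text.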

Slide 46

Slide 46 text

About OpenAI's Embedding API
● In January 2024, an even cheaper model, text-embedding-3-small, became available at a further 80% off.
● text-embedding-3-large, which is more accurate than the previous model (ada), also became available.
● Reducing the number of dimensions is now officially supported, which can speed up vector calculations and cut storage requirements.
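OpenAI's embeddings guide describes two ways to reduce dimensions for the text-embedding-3 models: pass `dimensions=` to `client.embeddings.create`, or truncate an embedding yourself and re-normalize it to unit length. A sketch of the second approach, with a toy input vector:

```python
import math


def shorten_embedding(vec: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale them to unit length (L2 norm = 1).

    Per OpenAI's guide this is meaningful for text-embedding-3 models, which were
    trained so that leading dimensions carry most of the information.
    """
    if not 0 < dimensions <= len(vec):
        raise ValueError("dimensions must be between 1 and len(vec)")
    head = vec[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


print(shorten_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Re-normalizing matters because truncation changes the vector's length, and many similarity computations assume unit vectors.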

Slide 47

Slide 47 text

About OpenAI's Embedding API
● The Batch API offers an even lower price.
● Responses are slower (within 24 hours), but it costs half as much.
● It suits workloads without real-time requirements, such as building initial data.
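A sketch of preparing Batch API input, assuming the documented JSONL format where each line holds `custom_id`, `method`, `url`, and `body`. The `account-{i}` ID scheme is hypothetical, and the upload and batch-creation calls appear only as comments.

```python
import json


def build_embedding_batch(texts, model="text-embedding-3-small"):
    """Return JSONL text for the Batch API: one /v1/embeddings request per line."""
    lines = []
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"account-{i}",  # hypothetical ID scheme, used to match results back
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request, ensure_ascii=False))  # keep Japanese readable
    return "\n".join(lines) + "\n"


# Next steps with the openai package (not executed here):
#   batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
#   client.batches.create(input_file_id=batch_file.id, endpoint="/v1/embeddings",
#                         completion_window="24h")

print(build_embedding_batch(["現金及び預金", "Bank A"]), end="")
```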

Slide 48

Slide 48 text

About OpenAI's Embedding API
● To estimate token counts, see: https://platform.openai.com/tokenizer
● Spending $1 takes roughly 1 to 50 million characters of input, so cost is rarely a concern.
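The cost arithmetic can be sketched as cost = tokens / 1000 × price per 1K tokens. The prices are the ada price quoted above ($0.0001 / 1K tokens) and text-embedding-3-small's $0.00002 / 1K tokens (80% lower); check the current pricing page before relying on them.

```python
def embedding_cost_usd(tokens: int, price_per_1k_tokens: float) -> float:
    """Cost in US dollars for embedding `tokens` tokens."""
    return tokens / 1000 * price_per_1k_tokens


def tokens_per_dollar(price_per_1k_tokens: float) -> float:
    """How many tokens $1 buys at the given price."""
    return 1000 / price_per_1k_tokens


ADA_002_PRICE = 0.0001                 # $ per 1K tokens (June 2023 price)
TEXT_EMBEDDING_3_SMALL_PRICE = 0.00002  # $ per 1K tokens (80% below ada)

print(tokens_per_dollar(ADA_002_PRICE))  # roughly 10 million tokens
print(embedding_cost_usd(10_000_000, TEXT_EMBEDDING_3_SMALL_PRICE))  # roughly $0.20
```

How many characters that is depends on the tokenizer and language, which is why the tokenizer page above is the better guide for a specific dataset.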

Slide 49

Slide 49 text

About OpenAI's Embedding API
Account recommendation accuracy improved significantly compared with the model we had evaluated beforehand.
(Chart: the previously evaluated model vs. OpenAI's large text-embedding model, compared on top-1 accuracy, top-N accuracy, and the same two metrics excluding English.)

Slide 50

Slide 50 text

About OpenAI's Embedding API
● The API returns the same response for the same text, so caching vectors reduces the number of requests.
● An OpenAI response can take several seconds (depending on the request), so a vector cache also helps improve response time.
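A minimal in-memory sketch of the vector cache idea. `fake_embed` is a hypothetical stand-in for the real API call, and including the model name in the cache key is a design assumption, since vectors from different models are not comparable. A Redis-backed cache, as in Example 1, would follow the same pattern.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Wrap an embedding function with an in-memory cache keyed by (model, text)."""

    def __init__(self, embed_fn: Callable[[str], list[float]], model: str):
        self._embed_fn = embed_fn
        self._model = model
        self._cache: dict[str, list[float]] = {}
        self.api_calls = 0  # how many times the underlying function was actually called

    def _key(self, text: str) -> str:
        # Hash the text so keys stay short regardless of input length.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"emb:{self._model}:{digest}"

    def embed(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)
            self.api_calls += 1
        return self._cache[key]


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real Embedding API call."""
    return [float(len(text)), 1.0]


embedder = CachedEmbedder(fake_embed, model="text-embedding-3-small")
embedder.embed("Cash and deposits")
embedder.embed("Cash and deposits")  # served from the cache
print(embedder.api_calls)  # 1
```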

Slide 51

Slide 51 text

Notes and caveats when using the Embedding API
● The API has standard rate limits.
● They currently vary by usage tier.
● It depends on your request volume, but with a cache in place you will rarely hit them.

Slide 52

Slide 52 text

1. There are two kinds of rate limit
● Separately, there also appears to be a rate limit applied at the system-wide level.
● At its worst, this limit is hit about once every few requests.

Slide 53

Slide 53 text

How to handle rate limits
Solution: add a retry mechanism
● OpenAI also recommends exponential backoff and provides sample code using several Python libraries.
● Ref: https://platform.openai.com/docs/guides/rate-limits/retrying-with-exponential-backoff
This may seem like a crude way to deal with rate limits, but since we introduced it, rate limit errors have dropped to almost zero.
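A sketch of retrying with exponential backoff and random jitter, in the spirit of OpenAI's guide. `RateLimitError` here is a local stand-in (with the `openai` package you would catch `openai.RateLimitError`), and `sleep` is injectable so the example runs instantly.

```python
import random
import time


class RateLimitError(Exception):
    """Local stand-in for openai.RateLimitError so this sketch has no dependencies."""


def with_backoff(fn, *, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, wait with exponential backoff plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(delay * (1 + random.random()))  # jitter spreads out concurrent retries


# Usage: a hypothetical call that is rate-limited twice, then succeeds.
calls = {"count": 0}


def flaky_embedding_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits = []
result = with_backoff(flaky_embedding_call, sleep=waits.append)  # record delays instead of sleeping
print(result, len(waits))  # ok 2
```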

Slide 54

Slide 54 text

What you can achieve with text vectors

Slide 55

Slide 55 text

Example 1: Recommendation between accounts
• Redis is used to store the vector cache.
• This keeps costs down: tens of millions of accounts have been vectorized so far, at almost no cost.
• Thanks to the Embedding API, we don't need expensive machines packed with GPUs and lots of CPU/memory.
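When the cache lives in Redis, each vector has to be stored as a value. One option, sketched here as an assumption rather than the production design, is to pack it as little-endian float32 bytes with `struct` (4 bytes per dimension). The key naming in the comments is hypothetical.

```python
import struct


def vector_to_bytes(vec: list[float]) -> bytes:
    """Pack a vector as little-endian float32 values (4 bytes per dimension)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def bytes_to_vector(data: bytes) -> list[float]:
    """Inverse of vector_to_bytes. float32 keeps about 7 significant digits."""
    if len(data) % 4:
        raise ValueError("byte length must be a multiple of 4")
    return list(struct.unpack(f"<{len(data) // 4}f", data))


# Usage with redis-py (not executed here):
#   import redis
#   r = redis.Redis()
#   r.set("emb:text-embedding-3-small:<hash>", vector_to_bytes(vec))
#   vec = bytes_to_vector(r.get("emb:text-embedding-3-small:<hash>"))

packed = vector_to_bytes([0.5, -0.25, 1.0])
print(len(packed))              # 12
print(bytes_to_vector(packed))  # [0.5, -0.25, 1.0]
```

A 1,536-dimension vector takes 6,144 bytes this way, usually much less than storing it as JSON text.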

Slide 56

Slide 56 text

Example 2: Free-word search for receipts
• Vectorize the text of each receipt in advance (extract it via OCR, ChatGPT, etc.).
• Store the vectors in a vector DB or similar.
(Diagram: User → Upload (Receipt.pdf, jpg, etc.) → Vectorization → [[-0.03455162],[-0.01306203],…, [ 0.00694819],[-0.01055199]] → Vector DB)

Slide 57

Slide 57 text

Example 2: Free-word search for receipts
• Vectorize the search words entered by the user, then pick the closest entries in the DB.
(Diagram: User searches "A receipt for 10,000 yen on December 1" → Vectorization → [[-0.03455162],[-0.01306203],…, [ 0.00694819],[-0.01055199]] → Vector DB → Receipt_Dec_1.pdf)
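The search step can be sketched with a toy in-memory store that does brute-force cosine search. A real vector DB adds persistence and approximate nearest-neighbor indexes; the second file name and all vectors below are hypothetical.

```python
import math


class InMemoryVectorStore:
    """Toy stand-in for a vector DB: stores (id, unit vector) pairs, searches by brute force."""

    def __init__(self):
        self._items = {}

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def add(self, item_id, vec):
        self._items[item_id] = self._normalize(vec)

    def search(self, query_vec, k=1):
        q = self._normalize(query_vec)
        # On unit vectors, the dot product equals cosine similarity.
        scored = [(item_id, sum(a * b for a, b in zip(q, v))) for item_id, v in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("Receipt_Dec_1.pdf", [0.9, 0.1, 0.2])   # hypothetical receipt vectors
store.add("Receipt_Nov_15.pdf", [0.2, 0.9, 0.1])
query = [0.8, 0.2, 0.1]  # hypothetical vector for "A receipt for 10,000 yen on December 1"
print(store.search(query, k=1))
```

Normalizing once at insert time means each search only needs dot products, which is the same trick many vector databases use for cosine distance.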

Slide 58

Slide 58 text

Example 3: Find receipts similar to a given receipt
• If Text-to-File search works, File-to-File search can of course be implemented too.
(Diagram: "I need a receipt similar to this one." → Text extraction ("Receipt, Dec. 1, …") → Vectorization → [[-0.03455162],[-0.01306203],…, [ 0.00694819],[-0.01055199]] → Search the Vector DB)

Slide 59

Slide 59 text

In short...

Slide 60

Slide 60 text

What becomes possible once you can generate text vectors
Anything that can be converted to text can be used to implement recommendations and more.
And thanks to ChatGPT, converting images and other data to text has become much easier.

Slide 61

Slide 61 text

Summary

Slide 62

Slide 62 text

Summary
● With OpenAI's Embedding API, a team without ML engineers built an AI solution easily and cheaply.
● The Embedding API can be used to build many kinds of solutions, such as recommendation, outlier detection, and text classification.

Slide 63

Slide 63 text

WE’RE HIRING!!

Slide 64

Slide 64 text

Money Forward is looking for people who can grow together with us.
Scan this QR code to visit our recruitment site.
