
AIアプリ Dojo #2 Hugging Face Transformers入門


With generative AI drawing so much attention, engineers will have more and more opportunities to consider building AI applications. In this session we learn about Hugging Face Transformers, a library widely used for natural language understanding and natural language generation, walking from installation through simple runs.
It is not required, but if you install git and Python beforehand, you can try Hugging Face Transformers yourself while following along.

Akira Onishi (IBM)

June 28, 2023

Transcript

  1. Self-introduction (Property / Value)
     Name: Akira Onishi · Twitter / LinkedIn: oniak3 · Long career in the IT industry
     Current obsession: dieting · Hashtag: いいねぇ静岡生活 ("Shizuoka life is great")
     Motto: "Grass by the roadside stands back up even when trodden on"
     Go-to technique: reframing things positively in my head
     https://www.facebook.com/akiraonishi — or search Facebook for "Oniaku"
  2. Today's topics: local hardware (Windows / Linux / Mac), Python, PyTorch (+ CUDA),
     an AI inference app, Hugging Face Transformers, and models published on Hugging Face.
     Skipping the fine-grained theory, we will run AI inference using Python and
     Hugging Face Transformers. What you will experience in today's session:
     text summarization; named entity recognition (NER); text generation;
     classification and sentiment analysis of text; transcription from audio files;
     object detection in images; question answering; translation; and program
     source-code generation.
  3. Review: think of AI training and AI inference separately.
     AI training — creating and improving models: training data, deep learning
     (computation), backing from theoretical hypotheses, research, and validation;
     large-scale compute resources (HPC: High Performance Computing). The required
     investment is large: a huge data lakehouse, data scientists, and an HPC
     environment, with development and operations ideally assuming frequent updates.
     AI inference — computation using a model: a computer, OS, and runtime suited to
     inference; experiments and validation using the model. You can start on a single
     machine with the AI community's libraries, and feedback flows back into training.
     https://huggingface.co
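Slide 3's split between training (building the model) and inference (using it) can be sketched in a few lines of Python — a toy, hypothetical linear model, nothing to do with Transformers, just to make the two phases concrete:

```python
# Toy illustration: "AI training" produces a model; "AI inference" only
# evaluates it. Here the "model" is a single weight w fit to y = 2x.

def train(data, epochs=200, lr=0.05):
    """Training: create/improve the model (expensive, iterative)."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w                            # the trained "model"

def infer(w, x):
    """Inference: a single cheap computation using the trained model."""
    return w * x

model = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(infer(model, 5.0), 2))  # → 10.0
```

Training loops over the data many times; inference is one multiplication — which is why the two phases call for such different hardware investments.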
  4. Review: the AI inference runtime stack. https://pypi.org https://huggingface.co
     Computer: Windows / Linux OS, Python, PyTorch, etc.; CPU, memory, GPU and GPU
     memory, NVMe SSD, chipset, network interface, power supply unit; GPU driver and
     GPGPU compute libraries (https://www.nvidia.com/ja-jp/); a hardware abstraction
     layer (Hardware Abstraction Layer). On top of that: an existing AI model,
     existing libraries, and the inference code we implement ourselves — the
     computation that uses the AI model.
  5. 🤗 Transformers. https://huggingface.co
     AI inference: computation using a pretrained model, on a computer, OS, and
     runtime suited to inference.
     Customizing an AI model: tuning and additional training of models against AI
     models and datasets (plus additional data, for training and validation), on a
     computer, OS, and runtime suited to machine learning.

     from transformers import pipeline
     detector = pipeline(task="object-detection")
     preds = detector("<image URL>")

     trainer = Trainer(
         model=model,
         args=training_args,
         train_dataset=small_train_dataset,
         eval_dataset=small_eval_dataset,
         compute_metrics=compute_metrics,
     )
     trainer.train()

     (The fine-tuning side is planned for a future, more advanced session.)
  6. Reference: how the Transformer works. The paper "Attention Is All You Need" was
     published in June 2017; its base is a Transformer model for translation tasks.
     https://huggingface.co/learn/nlp-course/ja/chapter1/4
     Encoder: generates features. Decoder: generates the target sequence.
     Input (English) → predicted output (Japanese), with the output predicted so far
     fed back in.
     Rough takeaways: easier to parallelize than the "recurrent" neural networks
     (RNNs) with feedback loops used until then, which opened the way to much larger
     models. Since the paper, many language models have appeared, broadly classified
     into three families (GPT-like, BERT-like, BART/T5-like). The price of easy
     parallelization is higher memory consumption. (Conceptual diagram.)
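The "easier to parallelize" point comes from attention: every output position is computed from all input positions at once, with no sequential recurrence. A minimal pure-Python sketch of scaled dot-product attention — illustrative only; real Transformers use batched tensors and multiple heads:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, one row per query.
    Each output row mixes ALL value rows at once -- no step-by-step loop over
    time, which is what makes the computation easy to parallelize."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)          # weights over all positions, sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Query 0 aligns with key 0, so its output leans toward value row 0; query 1 leans toward value row 1 — a weighted blend rather than a hard lookup.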
  7. Looking back at AI App Dojo #1.
     Model used: rinna/japanese-gpt-neox-3.6b-instruction-sft
     https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft

     import torch
     import time
     from transformers import AutoTokenizer, AutoModelForCausalLM

     prompt_base = "ユーザー: {}<NL>システム: "

     start = time.perf_counter()
     tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-neox-3.6b-instruction-sft", use_fast=False)
     end = time.perf_counter()
     print("Tokenizer loaded:" + str(end - start))

     start = time.perf_counter()
     model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-3.6b-instruction-sft")
     # With 12-16 GB of GPU memory, float16 just barely fits:
     # model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-3.6b-instruction-sft", torch_dtype=torch.float16)
     end = time.perf_counter()
     print("CausalLM loaded:" + str(end - start))

     if torch.cuda.is_available():
         model = model.to("cuda")
         print("cuda is available")

     def inferencing(prompt):
         start = time.perf_counter()
         token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
         with torch.no_grad():
             output_ids = model.generate(
                 token_ids.to(model.device),
                 do_sample=True,
                 max_new_tokens=256,
                 temperature=0.9,
                 top_k=50,
                 repetition_penalty=1.0,
                 pad_token_id=tokenizer.pad_token_id,
                 bos_token_id=tokenizer.bos_token_id,
                 eos_token_id=tokenizer.eos_token_id
             )
         output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1):])
         output = output.replace("<NL>", "\n")
         end = time.perf_counter()
         print("Inferencing completed:" + str(end - start))
         return output

     def do_conversation():
         text = input("Neox-3.6b>")
         if text == "end":
             return False
         prompt = prompt_base.format(text)
         result = inferencing(prompt)
         print(result)
         return True

     while True:
         res = do_conversation()
         if res == False:
             break
  8. Encode → Model → Decode. Interaction with the AI uses "information humans can
     understand"; the computation inside machine learning is numeric. Text entered
     by a person or system → Encoder → numeric vector (e_1 … e_n) → Model (the one
     used for inference) → numeric vector (r_1 … r_n) → Decoder → generated text.
     https://huggingface.co/docs/transformers/main_classes/tokenizer
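The Encode → Model → Decode flow above can be sketched with a hypothetical toy character-level tokenizer. Real Transformers tokenizers use subword vocabularies, but the text → numeric IDs → text round trip is the same idea:

```python
class ToyTokenizer:
    """A hypothetical character-level tokenizer: encode() maps text to the
    numeric IDs that go into the model, decode() maps IDs back to text."""

    def __init__(self, corpus):
        # Build a vocabulary from every character seen in the corpus.
        self.vocab = {ch: i for i, ch in enumerate(sorted(set(corpus)))}
        self.inv = {i: ch for ch, i in self.vocab.items()}

    def encode(self, text):
        return [self.vocab[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.inv[i] for i in ids)

tok = ToyTokenizer("hello world")
ids = tok.encode("hello")
print(ids)               # → [3, 2, 4, 4, 5]
print(tok.decode(ids))   # → hello
```

The model itself only ever sees the numbers in the middle; the tokenizer owns both directions of the conversion.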
  9. Reference: tokenizing a string. Save it as t.py (never as token.py — explained
     on the next page).

     from transformers import AutoTokenizer
     tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
     encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
     print(encoding)

     Result:
     {'input_ids': […], 'token_type_ids': […], 'attention_mask': […]}
     (lists of token IDs, segment IDs, and 0/1 mask flags)
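The printed encoding has three parallel lists. As a sketch of what attention_mask means, here is how a padded batch could be built by hand — a hypothetical helper with toy IDs, not the library's own implementation:

```python
def pad_batch(sequences, pad_id=0):
    """Pad variable-length ID sequences to a common length. attention_mask
    is 1 for real tokens and 0 for padding, so the model can ignore pads."""
    width = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (width - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (width - len(s)) for s in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two toy sequences of different lengths:
batch = pad_batch([[101, 2057, 102], [101, 2057, 2024, 2200, 102]])
print(batch["attention_mask"])  # → [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

When you tokenize a single sentence, as on this slide, the mask is all 1s; padding only appears once sentences of different lengths share a batch.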
  10. Sharing a failure: do not create a token.py. Your Python environment seems broken!?

      macOS:
        File "/Users/oniak3/py/token.py", line …, in <module>
          from transformers import AutoTokenizer
        ImportError: cannot import name 'AutoTokenizer' from partially initialized
        module 'transformers' (most likely due to a circular import)
        (/Library/Frameworks/Python.framework/Versions/3.x/lib/python3.x/site-packages/transformers/__init__.py)

      Windows:
        File "D:\learn\transformers\samples\token.py", line …, in <module>
          from transformers import AutoTokenizer
        ImportError: cannot import name 'AutoTokenizer' from partially initialized
        module 'transformers' (most likely due to a circular import)
        (C:\Users\oniak\AppData\Local\Programs\Python\Python3xx\lib\site-packages\transformers\__init__.py)

      Caution: if Python code you wrote yourself is saved as token.py in the folder
      you run from, it is always picked up when other Python programs run, causing
      unexpected errors — it shadows Python's standard `token` module (its token
      constants appear to be overwritten). Searching for the displayed error message
      makes it hard to notice what is actually wrong, and because the error persists
      as long as token.py exists, reinstalling transformers, Python, or PyTorch does
      not fix it.
      https://docs.python.org/ja/3/library/token.html
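A quick way to diagnose this kind of shadowing is to ask Python where it would import a module from. This uses only the standard library; run it from the folder you work in — if `token` resolves to a file in your own folder instead of the standard library, that is exactly the problem described above:

```python
import importlib.util

# Where would "token" be imported from? If this prints a path inside your
# working folder rather than the standard library, a local token.py is
# shadowing Python's built-in token module.
spec = importlib.util.find_spec("token")
print(spec.origin)

# The same check works for any module you suspect is shadowed:
print(importlib.util.find_spec("json").origin)
```

The current directory sits at the front of `sys.path` when you run a script, which is why a local token.py always wins over the standard library copy.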
  11. Transformers: typical use cases — text summarization; named entity recognition
      (NER); text generation; classification and sentiment analysis of text;
      transcription from audio files; object detection in images; question
      answering; translation; program source-code generation.
      I collected simple code for each of these nine use cases, and verified it
      both on a GPU-equipped Windows machine and on macOS without GPU support.
  12. Text summarization — #textsum.py (quoting the sample code on the model's page)

      from transformers import pipeline
      seq2seq = pipeline("summarization", model="tsmatz/mt5_summarize_japanese")
      sample_text = "サッカーのワールドカップカタール大会、世界ランキング24位でグループEに属する日本は、23日の1次リーグ初戦において、世界11位で過去4回の優勝を誇るドイツと対戦しました。試合は前半、ドイツの一方的なペースではじまりましたが、後半、日本の森保監督は攻撃的な選手を積極的に動員して流れを変えました。結局、日本は前半に1点を奪われましたが、途中出場の堂安律選手と浅野拓磨選手が後半にゴールを決め、2対1で逆転勝ちしました。ゲームの流れをつかんだ森保采配が功を奏しました。"
      result = seq2seq(sample_text)
      print(result)

      Downloading (…)lve/main/config.json: 100%|█████| 867/867 [00:00<00:00, 2.00MB/s]
      Downloading pytorch_model.bin: 100%|███████| 1.20G/1.20G [00:15<00:00, 79.1MB/s]
      Downloading (…)okenizer_config.json: 100%|█████| 399/399 [00:00<00:00, 3.74MB/s]
      Downloading spiece.model: 100%|████████████| 4.31M/4.31M [00:00<00:00, 26.9MB/s]
      Downloading tokenizer.json: 100%|██████████| 16.3M/16.3M [00:00<00:00, 17.0MB/s]
      Downloading (…)cial_tokens_map.json: 100%|████| 74.0/74.0 [00:00<00:00, 612kB/s]
      Your max_length is set to 128, but your input_length is only 126. Since this is
      a summarization task, where outputs shorter than the input are typically
      wanted, you might consider decreasing max_length manually, e.g.
      summarizer('...', max_length=63)
      [{'summary_text': 'サッカーのワールドカップカタール大会は23日、1次リーグ初戦があり、世界ランキング24位でグループEに属する日本は、ドイツと対戦した。'}]

      Uses https://huggingface.co/tsmatz/mt5_summarize_japanese — model size about 1.2 GB.
  13. Named entity recognition — #ner.py (slightly modified from the sample code on
      the model's page); pip install pandas

      from transformers import pipeline
      import pandas as pd

      model_name = "tsmatz/xlm-roberta-ner-japanese"
      classifier = pipeline("token-classification", model=model_name)
      result = classifier("田中は4月の陽気の良い日に、鈴をつけて熊本県の阿蘇山に登った。熊本の米焼酎「白岳 しろ」を飲んだ。")
      df = pd.DataFrame(result)
      print(df)

         entity     score  index  word  start  end
      0     PER  0.999310      1     ▁      0    1
      1     PER  0.999407      2     田      0    1
      2     PER  0.999074      3     中      1    2
      3     LOC  0.998935     14   熊本     19   21
      4     LOC  0.997582     15     県     21   22
      5     LOC  0.998968     17     阿     23   24
      6     LOC  0.998960     18     蘇     24   25
      7     LOC  0.998147     19     山     25   26
      8     LOC  0.990043     24   熊本     31   33
      9     PRD  0.997916     30     白     38   39
      10    PRD  0.998629     31     岳     39   40
      11    PRD  0.998314     32     ▁     41   42
      12    PRD  0.997710     33     し     41   42
      13    PRD  0.998055     34     ろ     42   43

      https://huggingface.co/tsmatz/xlm-roberta-ner-japanese (the model download is
      large, so the first run takes a while)
  14. Text generation — #tg.py; pip install protobuf
      https://huggingface.co/rinna/japanese-gpt2-medium
      Uses AutoClass instead of pipeline.

      from transformers import AutoTokenizer, AutoModelForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False, padding_side='left')
      tokenizer.do_lower_case = True  # due to some bug of tokenizer config loading
      model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")

      data = "こんにちは、"
      input = tokenizer.encode(data, return_tensors="pt")
      output = model.generate(input, do_sample=True, max_length=300, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
      print(tokenizer.batch_decode(output))
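The generate() calls in these examples control randomness with do_sample, temperature, and top_k. A toy sketch of what those parameters do to the next-token distribution — a hypothetical four-token vocabulary with made-up logits, not the library's internals:

```python
import math
import random

def sample_next_token(logits, temperature=0.9, top_k=2):
    """Toy version of sampling with temperature and top-k, in the spirit of
    model.generate(do_sample=True, temperature=..., top_k=...)."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring token IDs.
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the kept tokens only.
    m = max(scaled[i] for i in kept)
    exps = {i: math.exp(scaled[i] - m) for i in kept}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}
    # Sample one token ID from the renormalized distribution.
    r = random.random()
    acc = 0.0
    for i, p in probs.items():
        acc += p
        if r <= acc:
            return i, probs
    return i, probs

random.seed(0)
token_id, probs = sample_next_token([2.0, 1.0, 0.1, -1.0], temperature=0.9, top_k=2)
print(token_id, {i: round(p, 3) for i, p in probs.items()})
```

With top_k=2, tokens 2 and 3 get zero probability; lowering the temperature would push even more mass onto token 0.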
  15. Sentiment analysis — # sentiment.py

      from transformers import pipeline

      model_name = "Mizuiro-sakura/luke-japanese-large-sentiment-analysis-wrime"
      classifier = pipeline("sentiment-analysis", model=model_name, tokenizer=model_name)
      data = [
          "昨日遅くまで仕事して大変だった。しかも褒められず悲しかった",
          "昨日遅くまで仕事して大変だったけど、褒められたので元気いっぱい",
          "カラオケで大声で声が枯れるまで熱唱した",
          "今朝から大雨が降ってどこにも外出できなくて最悪だった"]
      ret = classifier(data)
      emotions = ['うれしい', '悲しい', '期待', '怒り', '恐れ', '嫌悪', '信頼']
      for i, r in enumerate(ret):
          print(f"'{data[i]}' は {float(r['score']) * 100:.2f}%のスコアで `{emotions[int(r['label'][-1])]}` と判定されました")

      '昨日遅くまで仕事して大変だった。しかも褒められず悲しかった' は 97.88%のスコアで `悲しい` と判定されました
      '昨日遅くまで仕事して大変だったけど、褒められたので元気いっぱい' は 98.29%のスコアで `うれしい` と判定されました
      'カラオケで大声で声が枯れるまで熱唱した' は 98.48%のスコアで `うれしい` と判定されました
      '今朝から大雨が降ってどこにも外出できなくて最悪だった' は 97.69%のスコアで `悲しい` と判定されました

      Uses https://huggingface.co/Mizuiro-sakura/luke-japanese-large-sentiment-analysis-wrime
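The print loop on this slide decodes a pipeline label such as 'LABEL_1' into an emotion name with `int(r['label'][-1])`. A small sketch with fabricated results (the scores below are made up, not real model output) shows the trick in isolation:

```python
# Fabricated pipeline-style results, only to illustrate the label indexing.
emotions = ['うれしい', '悲しい', '期待', '怒り', '恐れ', '嫌悪', '信頼']
ret = [{'label': 'LABEL_1', 'score': 0.9788},
       {'label': 'LABEL_0', 'score': 0.9829}]

for r in ret:
    idx = int(r['label'][-1])          # "LABEL_1" -> 1
    print(f"{float(r['score']) * 100:.2f}% -> {emotions[idx]}")
```

Note that taking only the last character works for at most ten labels; for larger label sets, splitting on '_' (e.g. `int(r['label'].split('_')[-1])`) is the safer variant.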
  16. Transcription from audio — #trans.py (quoting sample code from the Transformers
      documentation). ffmpeg is required: https://ffmpeg.org/download.html

      from transformers import pipeline

      generator = pipeline(model="openai/whisper-large")
      text = generator(
          [
              "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
              "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac",
          ]
      )
      print(text)

      [{'text': ' I have a dream that one day this nation will rise up and live out
      the true meaning of its creed.'}, {'text': ' He hoped there would be stew for
      dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be
      ladled out in thick, peppered flour-fattened sauce.'}]

      https://huggingface.co/openai/whisper-large-v2 (about 6 GB, so the first
      download takes a while)

      oniak3@AkiranoiMac py % python3 stt.py
      Downloading (…)lve/main/config.json: 100%|█| 1.96k/1.96k [00:00<00:00, 5.31MB/s]
      Downloading pytorch_model.bin: 100%|███████| 6.17G/6.17G [05:06<00:00, 20.2MB/s]
      Downloading (…)neration_config.json: 100%|█| 3.51k/3.51k [00:00<00:00, 25.6MB/s]
      Downloading (…)okenizer_config.json: 100%|█████| 842/842 [00:00<00:00, 8.18MB/s]
      Downloading (…)olve/main/vocab.json: 100%|█| 1.04M/1.04M [00:00<00:00, 1.64MB/s]
      Downloading (…)/main/tokenizer.json: 100%|█| 2.20M/2.20M [00:00<00:00, 11.7MB/s]
      Downloading (…)olve/main/merges.txt: 100%|███| 494k/494k [00:00<00:00, 34.7MB/s]
      Downloading (…)main/normalizer.json: 100%|██| 52.7k/52.7k [00:00<00:00, 353kB/s]
      Downloading (…)in/added_tokens.json: 100%|█| 2.08k/2.08k [00:00<00:00, 18.7MB/s]
      Downloading (…)cial_tokens_map.json: 100%|█| 2.08k/2.08k [00:00<00:00, 10.2MB/s]
      Downloading (…)rocessor_config.json: 100%|███| 185k/185k [00:00<00:00, 59.9MB/s]

      Extra exercise: what happens if you point it at an audio file that contains a
      conversation in Japanese? Give it a try.
  17. Object detection in images — #obd.py (quoting code from the Transformers
      documentation); pip install timm
      https://huggingface.co/docs/transformers/tasks/object_detection

      import requests
      from PIL import Image
      from transformers import pipeline

      # Download an image with cute cats
      url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
      image_data = requests.get(url, stream=True).raw
      image = Image.open(image_data)

      # Allocate a pipeline for object detection
      object_detector = pipeline('object-detection')
      result = object_detector(image)
      print(result)

      [{'score': 0.9982201457023621, 'label': 'remote', 'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
       {'score': 0.9960021376609802, 'label': 'remote', 'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
       {'score': 0.9954745173454285, 'label': 'couch', 'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
       {'score': 0.9988006353378296, 'label': 'cat', 'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
       {'score': 0.9986783862113953, 'label': 'cat', 'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
  18. Question answering — #qanda.py (quoting the sample code on the model's page)

      from transformers import pipeline

      model_name = "tsmatz/roberta_qa_japanese"
      qa_pipeline = pipeline("question-answering", model=model_name, tokenizer=model_name)
      result = qa_pipeline(
          question = "決勝トーナメントで日本に勝ったのはどこでしたか。",
          context = "日本は予選リーグで強豪のドイツとスペインに勝って決勝トーナメントに進んだが、クロアチアと対戦して敗れた。",
          align_to_words = False,
      )
      print(result)

      oniak3@AkiranoiMac py % code qanda.py
      oniak3@AkiranoiMac py % python3 qanda.py
      Downloading (…)lve/main/config.json: 100%|█████| 731/731 [00:00<00:00, 1.71MB/s]
      Downloading pytorch_model.bin: 100%|█████████| 440M/440M [00:21<00:00, 20.7MB/s]
      Downloading (…)okenizer_config.json: 100%|█████| 540/540 [00:00<00:00, 2.86MB/s]
      Downloading spiece.model: 100%|██████████████| 806k/806k [00:00<00:00, 1.14MB/s]
      Downloading (…)/main/tokenizer.json: 100%|█| 2.41M/2.41M [00:00<00:00, 2.67MB/s]
      Downloading (…)cial_tokens_map.json: 100%|█████| 170/170 [00:00<00:00, 1.03MB/s]
      {'score': 0.4740956723690033, 'start': 38, 'end': 43, 'answer': 'クロアチア'}

      https://huggingface.co/tsmatz/roberta_qa_japanese (an extractive
      question-answering model: it extracts an answer contained in the given context)
  19. Translation, English to Japanese — #e2j.py
      # pip install sacremoses is required

      from transformers import pipeline

      model_name = "staka/fugumt-en-ja"
      translator = pipeline("translation", model=model_name)
      text = ["I have a pen. I have an Apple. How's your translation in Japanese?",
              "Watsonx is our upcoming enterprise-ready AI and data platform designed to multiply the impact of AI across your business. The platform comprises three powerful components: the watsonx.ai studio for new foundation models, generative AI and machine learning; the watsonx.data fit-for-purpose store for the flexibility of a data lake and the performance of a data warehouse; plus the watsonx.governance toolkit, to enable AI workflows that are built with responsibility, transparency and explainability."]
      ret = translator(text)
      print(ret)

      oniak3@AkiranoiMac py % python3 trans.py
      [{'translation_text': '私はペンを持っています。私はアップルを持っています。あなたの翻訳は日本語でどうですか。'},
       {'translation_text': 'Watsonxは、エンタープライズ対応のAIとデータプラットフォームで、ビジネス全体のAIの影響を掛け合わせるように設計されています。プラットフォームは、3つの強力なコンポーネントで構成されています。新しい基盤モデルのためのwatsonx.aiスタジオ、ジェネレーティブAIと機械学習、データレイクの柔軟性とデータウェアハウスのパフォーマンスのためのwatsonx.data fit-for-purposeストア、責任、透明性、説明可能性を備えたAIワークフローを実現するためのwatsonx.governanceツールキットです。'}]

      https://huggingface.co/staka/fugumt-en-ja (see also https://staka.jp/wordpress/?p=…)
      pip install sacremoses
  20. Source-code generation — #codeg.py (based on https://blog.salesforceairesearch.com/codegen/)

      from transformers import AutoTokenizer, AutoModelForCausalLM

      tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-1B")
      model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen2-1B", trust_remote_code=True, revision="main")
      text = "Solve the two sum problem with python."
      input_ids = tokenizer(text, return_tensors="pt").input_ids
      generated_ids = model.generate(input_ids, max_length=192)
      print(tokenizer.decode(generated_ids[0], skip_special_tokens=False)[len(text):])

      Generated output:

      import math
      def two_sum(nums, target):
          """
          :type nums: List[int]
          :type target: int
          :rtype: List[int]
          """
          nums.sort()
          for i in range(len(nums)):
              if nums[i] == target:
                  return [i, len(nums)]
          return []
      if __name__ == '__main__':
          print(two_sum([2, 7, 11, 15], 9))
      <|endoftext|><|python|>#

      https://huggingface.co/Salesforce/codegen2-1B (a multi-gigabyte download — be
      patient; the paper is linked there). For trial purposes you can run inference
      on a CPU, but it is not practical. Inference time by system: … seconds on a
      Windows Pro machine (AMD Ryzen …X, NVIDIA GeForce RTX …); … seconds on macOS
      Ventura (iMac, Intel Core i…, … GHz, …-core CPU).
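As an aside, the generated code above does not actually solve two-sum: it compares single elements to the target rather than sums of pairs. For comparison, a standard hash-map solution (my own sketch, not from the slide):

```python
def two_sum(nums, target):
    """Return indices [i, j] (i < j) with nums[i] + nums[j] == target,
    in one pass using a value -> index map; [] if no pair exists."""
    seen = {}
    for j, x in enumerate(nums):
        if target - x in seen:        # the needed partner was seen earlier
            return [seen[target - x], j]
        seen[x] = j
    return []

print(two_sum([2, 7, 11, 15], 9))  # → [0, 1]
```

Checking model-generated code against cases like this is a good habit: plausible-looking output is not the same as correct output.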
  21. The workshops, sessions, and materials were prepared by IBM or the session
      presenters and reflect their own views. They are provided for informational
      purposes only, are not intended as legal or other guidance or advice to any
      participant, and produce no such effect. While every effort has been made for
      completeness and accuracy, the information in these materials is provided
      "as is" with no warranty of any kind, express or implied. IBM assumes no
      responsibility for any damages arising from the use of, or otherwise in
      connection with, these or other materials. Nothing in these materials is
      intended to create, nor shall have the effect of creating, any warranty or
      representation from IBM or its suppliers or licensors, or of altering the
      terms of the applicable license agreement governing the use of IBM software.
      References to IBM products, programs, or services do not imply that they are
      available in all countries in which IBM operates. Product release dates and
      features mentioned may be changed at IBM's sole discretion based on market
      opportunity or other factors, and are not a commitment that future products or
      features will become available. Nothing in these materials states or implies
      that activities undertaken by participants will result in specific sales,
      revenue growth, or other results. Performance is based on measurements and
      projections using standard IBM benchmarks in a controlled environment. The
      actual throughput or performance a user experiences will vary depending on
      many factors, including the amount of multiprogramming in the user's job
      stream, the I/O configuration, the storage configuration, and the workload
      processed; no assurance is given that an individual user will achieve results
      similar to those stated here. All customer examples are presented as
      illustrations of how those customers used IBM products and the results they
      may have achieved; actual environmental costs and performance characteristics
      may vary by customer.
      IBM, the IBM logo, ibm.com, IBM Cloud, and IBM Cloud Paks are trademarks of
      International Business Machines Corporation, registered in many jurisdictions
      worldwide. Other product and service names may be trademarks of IBM or other
      companies. For a current list of IBM trademarks, see
      www.ibm.com/legal/copytrade.shtml.
      Microsoft, Windows, Windows Server, .NET Framework, .NET, and .NET Core are
      trademarks or registered trademarks of Microsoft Corporation. NVIDIA, the
      NVIDIA logo, and NVIDIA CUDA are trademarks or registered trademarks of NVIDIA
      Corporation. Hugging Face is a trademark of Hugging Face, Inc. (registration
      pending).
      The Hugging Face models used in these materials can be used under the license
      each model specifies. The AI-inference code shown here is sample code, not
      complete code; it was prepared for learning purposes, to give IT engineers
      more chances to experiment. When embedding an AI model in a real system, check
      the model's license terms, prepare an AI inference environment that meets your
      system requirements, add the necessary exception handling and other
      production-ready code, and debug and test it thoroughly.
      For technical problem-solving and feedback on Hugging Face Transformers, work
      with the open-source community through GitHub Issues and Pull Requests at
      https://github.com/huggingface/transformers.