
Machine translation on a local machine (ArgosTranslate/LibreTranslate)

* Kagoshima Linux Study Group 2022.12 (held online), Sunday 2022/12/18, 14:00 to 17:00
https://kagolug.connpass.com/event/268089/
* source
https://gitlab.com/matoken/kagolug-2022.12/-/blob/master/slide/slide.adoc

Kenichiro MATOHARA

December 18, 2022

Transcript

  1. Environment used for this talk

    CPU     Intel® Core™ i5-7300U CPU @ 2.60GHz
    RAM     8GB
    OS      Debian bookworm amd64 (testing)
    Python  Python 3.10.9
  2. Argos Translate

    Written in Python, MIT license
    Available as a Python library, CLI, GUI, API, and Web app
    Supported languages: Arabic, Azerbaijani, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Persian, Polish, Portuguese, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian
    Argos Open Tech
  3. Environment setup

    $ mkdir ./argostranslate
    $ cd ./argostranslate/
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip install argostranslate
    $ du -Hs .
    3449892 .
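The `du -Hs` figure above is a block count, not a byte count. As a rough sketch, assuming GNU `du` with its default 1024-byte blocks (true on a stock Debian system unless `BLOCK_SIZE` or `POSIXLY_CORRECT` is set), the size of the virtual environment can be converted to GiB like this. The helper name `du_blocks_to_gib` is made up for illustration.

```python
def du_blocks_to_gib(blocks: int, block_size: int = 1024) -> float:
    """Convert a du block count to GiB.

    block_size is an assumption: GNU du reports 1024-byte blocks by default.
    """
    return blocks * block_size / 1024**3


# Block count reported by `du -Hs .` after `pip install argostranslate`.
venv_gib = du_blocks_to_gib(3449892)
print(f"{venv_gib:.2f} GiB")  # prints "3.29 GiB"
```

So, under that assumption, the directory holding the virtual environment takes a little over 3 GiB.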
  4. Getting the model data

    $ argospm --help
    usage: argospm [-h] {update,search,install,list,remove} ...

    positional arguments:
      {update,search,install,list,remove}
                  Available commands.
        update    Downloads remote package index.
        search    Search package from remote index.
        install   Install package.
        list      List installed packages.
        remove    Remove installed package.

    options:
      -h, --help  show this help message and exit
    $ argospm list | wc -l
    53
    $ argospm list | grep ja
    translate-en_ja
    translate-ja_en
    $ argospm list | grep ja | xargs -n1 argospm install
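`grep ja` matches any package name that merely contains the letters "ja". A slightly stricter filter can be sketched in Python, assuming package names follow the `translate-<from>_<to>` pattern seen in the two lines of output above. That pattern and the helper name `packages_for_language` are assumptions made for illustration, not part of Argos Translate.

```python
import re

# Assumed naming scheme, inferred from "translate-en_ja" / "translate-ja_en".
PACKAGE_RE = re.compile(r"^translate-([a-z]+)_([a-z]+)$")


def packages_for_language(names, code):
    """Return package names whose source or target language equals `code`."""
    selected = []
    for name in names:
        name = name.strip()
        match = PACKAGE_RE.match(name)
        # match.groups() is (from_code, to_code); compare whole codes only.
        if match and code in match.groups():
            selected.append(name)
    return selected


sample = ["translate-ar_en", "translate-en_ja", "translate-ja_en"]
print(packages_for_language(sample, "ja"))
# prints ['translate-en_ja', 'translate-ja_en']
```

The lines could come from `argospm list` via `subprocess.run(["argospm", "list"], capture_output=True, text=True).stdout.splitlines()`.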
  5. Usage

    $ argos-translate --help
    usage: argos-translate [-h] [--from-lang FROM_LANG] [--to-lang TO_LANG] [TEXT]

    Open-source offline translation.

    positional arguments:
      TEXT        The text to translate. Read from standard input if missing.

    options:
      -h, --help  show this help message and exit
      --from-lang FROM_LANG, -f FROM_LANG
                  The code for the language to translate from (ISO 639-1)
      --to-lang TO_LANG, -t TO_LANG
                  The code for the language to translate to (ISO 639-1)
    $ time argos-translate --from-lang en --to-lang ja 'hello world.'
    こんにちは世界。

    real 0m2.124s
    user 0m1.731s
    sys 0m0.484s
    $ echo 'cat is awesome.' | argos-translate --from-lang en --to-lang ja
    猫は素晴らしいです。
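The same CLI can be driven from a script. The following is a minimal sketch that only uses the `--from-lang` and `--to-lang` options shown in the help text, and feeds the text on standard input (as in the `echo ... |` example) so that text starting with `-` is not mistaken for an option. The function names are hypothetical, and `argos-translate` is assumed to be on `PATH` with the language models already installed.

```python
import subprocess


def build_argos_command(from_lang, to_lang):
    """Build the argv for argos-translate; TEXT is omitted so stdin is read."""
    return ["argos-translate", "--from-lang", from_lang, "--to-lang", to_lang]


def translate_via_cli(text, from_lang, to_lang):
    """Run argos-translate once and return its output without the newline."""
    result = subprocess.run(
        build_argos_command(from_lang, to_lang),
        input=text,           # sent to the CLI on standard input
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

A call such as `translate_via_cli("cat is awesome.", "en", "ja")` starts a new process each time, so the roughly two-second startup measured above with `time` would be paid on every call.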
  6. Install (LibreTranslate)

    $ git clone https://github.com/LibreTranslate/LibreTranslate
    $ cd LibreTranslate
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip install -e .
    $ du -Hs .
    3518928 .
    $ libretranslate
    Updating language models
    Found 58 models
    Downloading Arabic → English (1.0) ...
    Downloading Azerbaijani → English (1.5) ...
    Downloading Catalan → English (1.7) ...
    :
    ^c
    $ du -Hs ~/.local/share/argos-translate/
    6137956 /home/matoken/.local/share/argos-translate/
    $ libretranslate --frontend-language-source en --frontend-language-target ja
    Running on http://127.0.0.1:5000
  7.

  8. LibreTranslate from the shell

    The ArgosTranslate CLI might be just as good for this?
    GitHub - argosopentech/LibreTranslate-sh: Unix bindings for LibreTranslate

    $ export LIBRETRANSLATE_URL="http://127.0.0.1:5000/"
    $ ./libretranslate translate en ja "hello cat."
    {"translatedText":"\u3053\u3093\u306b\u3061\u306f\u732b\u3002"}
    $ ./libretranslate translate en ja "hello cat." | jq .
    {
      "translatedText": "こんにちは猫。"
    }
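The shell binding above talks to the LibreTranslate HTTP API, and the `\uXXXX` escapes in its raw output are ordinary JSON string escapes. As a hedged sketch using only the Python standard library, the request and the decoding step might look like the following. It assumes the server accepts a JSON `POST` to `/translate` with `q`, `source`, `target` and `format` fields, and that the local instance does not require an API key. The function names are made up for illustration.

```python
import json
import urllib.request


def build_translate_request(base_url, text, source, target):
    """Build a POST request for the (assumed) /translate endpoint."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_translate_response(body):
    """Extract translatedText; json.loads decodes the \\uXXXX escapes."""
    return json.loads(body)["translatedText"]


def translate(base_url, text, source, target):
    """Send the request to a running LibreTranslate server."""
    request = build_translate_request(base_url, text, source, target)
    with urllib.request.urlopen(request) as response:
        return parse_translate_response(response.read())


# The raw response line from the shell session, decoded without jq.
raw = r'{"translatedText":"\u3053\u3093\u306b\u3061\u306f\u732b\u3002"}'
assert parse_translate_response(raw) == "こんにちは猫。"
```

With the server from the previous slide running, `translate("http://127.0.0.1:5000/", "hello cat.", "en", "ja")` would be the counterpart of the `./libretranslate translate en ja "hello cat."` call.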
  9. Translation comparison

    Example 1.
    When Elon Musk bought up Twitter Inc. as his personal toy to play with, a lot of people started to worry. By acquiring Twitter, the richest man on Earth took unchecked control of arguably one of the most powerful online platforms for political speech and public debate.
    Why everyone is on Mastodon? - European Digital Rights (EDRi)
  10. Google Translate

    Elon Musk が Twitter Inc. を自分のおもちゃとして買収したとき、多くの人が心配し始めました。地球上で最も裕福な人物は、Twitter を買収することで、政治的発言と公開討論のための最も強力なオンライン プラットフォームの 1 つを無防備に支配しました。
  11. Joke RFC

    Example 2. https://www.ietf.org/rfc/rfc1149.txt
    This memo describes an experimental method for the encapsulation of IP datagrams in avian carriers. This specification is primarily useful in Metropolitan Area Networks. This is an experimental, not recommended standard. Distribution of this memo is unlimited.