Offline Speech-to-Text (STT) Transcription with OpenAI's Whisper

Kagoshima Linux Study Group (鹿児島Linux勉強会) 2022.11 (held online)
https://kagolug.connpass.com/event/267761/
source
https://gitlab.com/matoken/kagolug-2022.11

Kenichiro MATOHARA

November 27, 2022

Transcript

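  1. Speech To Text (STT): recognizing speech and converting it into text data
    Several SaaS offerings exist, but they need a network connection (covered at Kagoshima Linux Study Group 2022.04: Speech To Text To Translation (Azure version)).
    OpenAI (an AI research lab) has published Whisper, an STT engine that runs locally at no cost, and made its model data freely available!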
  2. Testing on an Oracle Cloud Free Tier Ampere A1 Compute VM
    CPU: Ampere A1 Compute (aarch64) x 4
    RAM: 24 GB
    OS: Ubuntu 20.04.5 LTS
  3. Environment setup
    (1) Python 3.9.9 or later is required, so install 3.10 from a PPA:
    $ sudo add-apt-repository ppa:deadsnakes/ppa
    $ sudo apt install python3.10-minimal python3.10-venv
    $ python3.10 -m venv venv
    $ source venv/bin/activate
    (2) Install Whisper with pip:
    $ pip install git+https://github.com/openai/whisper.git
    (3) ffmpeg is also required:
    $ sudo apt install ffmpeg
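Before running the steps above, a quick preflight check can confirm the interpreter version and that ffmpeg is on PATH. This is a minimal sketch; the function names are hypothetical, and the 3.9.9 threshold is simply the one stated on the slide.

```python
import shutil
import sys

# Minimum version stated on the slide (Whisper was installed into a Python 3.10 venv).
MIN_PYTHON = (3, 9, 9)


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (defaults to the running interpreter) is >= `minimum`."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return tuple(version) >= minimum


def missing_tools(tools=("ffmpeg",)):
    """Return the names of required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

Run it with the venv's python3.10 to check the environment you will actually use.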
  4. Usage
    $ whisper
    usage: whisper [-h] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large}] [--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,f
    [--temperature TEMPERATURE] [--best_of BEST_OF] [--beam_size BEAM_SIZE] [--patie
    [--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT] [--fp16 FP16] [--tempe
    [--logprob_threshold LOGPROB_THRESHOLD] [--no_speech_threshold NO_SPEECH_THRESHO
    audio [audio ...]
    whisper: error: the following arguments are required: audio
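The whisper command is a thin wrapper; the same work can be driven from Python. This sketch assumes the whisper package installed above and uses whisper.load_model() and model.transcribe(), whose result holds a "segments" list with "start", "end" and "text" keys. The helper names and the CLI-like [MM:SS.mmm --> MM:SS.mmm] rendering are our own illustration. Passing fp16=False should avoid the "FP16 is not supported on CPU" warning seen in the runs below, since that warning is emitted only when FP16 is requested on a CPU.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.mmm, adding an HH: prefix only for an hour or more."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    prefix = f"{hours:02d}:" if hours else ""
    return f"{prefix}{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(segment: dict) -> str:
    """Render one segment in the same layout the whisper CLI prints."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} --> {end}] {segment['text'].strip()}"


def transcribe_file(path: str, model_name: str = "tiny", language: str = "ja") -> list:
    """Transcribe `path`; needs the whisper package and downloads the model on first use."""
    import whisper  # imported lazily so the formatting helpers work without Whisper installed

    model = whisper.load_model(model_name)
    # fp16=False: run in FP32 on CPU explicitly instead of falling back with a warning.
    result = model.transcribe(path, language=language, fp16=False)
    return [format_segment(seg) for seg in result["segments"]]


if __name__ == "__main__":
    # Demo with a segment copied from the large-model output; no model is loaded here.
    sample = {"start": 0.0, "end": 3.76, "text": "予報センターから地震の情報をお伝えしています。"}
    print(format_segment(sample))
```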
  5. Trying transcription of Japanese audio
    Getting sample data from YouTube:
    $ yt-dlp -F https://www.youtube.com/watch?v=GiglWCcVi5o | grep -i audio
    599 m4a audio only 2 | 224.57KiB 31k https | audio only mp4a.40.5 31k 22k ultr
    600 webm audio only 2 | 238.16KiB 33k https | audio only opus 33k 48k ultr
    139 m4a audio only 2 | 354.97KiB 49k https | audio only mp4a.40.5 49k 22k low,
    249 webm audio only 2 | 342.58KiB 47k https | audio only opus 47k 48k low,
    250 webm audio only 2 | 524.67KiB 72k https | audio only opus 72k 48k low,
    140 m4a audio only 2 | 939.76KiB 130k https | audio only mp4a.40.2 130k 44k medi
    251 webm audio only 2 | 1.02MiB 144k https | audio only opus 144k 48k medi
    $ yt-dlp -f 140 https://www.youtube.com/watch?v=GiglWCcVi5o
    $ ffprobe -i ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].m4a 2>&1 | grep In
    :
      Duration: 00:00:59.42, start: 0.000000, bitrate: 129 kb/s
      Stream #0:0[0x1](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s
    :
    $ yt-dlp --no-download --write-auto-subs --sub-langs ja https://www.youtube.com/watch?v=GiglWCc
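ffprobe reports the clip length as "Duration: 00:00:59.42". For comparing processing times later, it helps to turn that into seconds. A minimal sketch; the regex assumes the HH:MM:SS.xx form shown above, and the function name is hypothetical.

```python
import re

DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def ffprobe_duration_seconds(ffprobe_output: str) -> float:
    """Extract the first 'Duration: HH:MM:SS.xx' value from ffprobe's text output, in seconds."""
    match = DURATION_RE.search(ffprobe_output)
    if match is None:
        raise ValueError("no Duration field found")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

ffprobe prints this information on stderr, so the text can be captured with subprocess.run(["ffprobe", "-i", path], capture_output=True, text=True).stderr.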
  6. tiny
    $ time whisper ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm --language
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:04.000] 予報センターから自信の情報を伝えしています。
    [00:04.000 --> 00:10.000] レージ25%九州地方で最大シーンと500の自信が発生しました。
    [00:10.000 --> 00:14.000] 新源中はオースミハントと方向を起き、
    [00:14.000 --> 00:18.000] 自信の希望示すまぐに中度は5.8
    [00:18.000 --> 00:22.000] 新源の重はおよそ30kmとなっています。
    [00:22.000 --> 00:26.000] この自信によるつなびの心配はありません。
    [00:26.000 --> 00:29.000] この自信によるつなびの心配はありません。
    [00:29.000 --> 00:33.000] 自信の御着を観測したのは、
    [00:33.000 --> 00:36.000] 西断し、新度御着を観測したのは、
    [00:36.000 --> 00:39.000] 西断し、新度4を観測したのは、
    [00:39.000 --> 00:42.000] 高なベッチョー、シントミッチョー、
    [00:42.000 --> 00:44.000] 宮崎し、
    [00:44.000 --> 00:45.000] 崎しまし、
    [00:45.000 --> 00:49.000] 宮崎の上司、こばやししとなっています。
    [00:49.000 --> 00:52.000] そのほか九州エリア広い入れ、
    [00:52.000 --> 00:55.000] 新度3を観測しています。
    [00:55.000 --> 00:58.000] この自信によるつなびの心配はありません。
    [00:58.000 --> 01:00.000] ではありません。
    real 6m0.515s
    user 11m10.055s
    sys 1m36.165s
  7. base
    $ time whisper ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm --language
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:10.180] 余補センターから自身の情報をお伝えしています 冷時2分後ろ九州地方で最大進度5弱の自身
    [00:10.180 --> 00:18.260] 新原地は大済み半島東方大き 自身の規模を示すマグに中度は5.8
    [00:18.260 --> 00:25.800] 新原の重はおよそ30kmとなっています この自身による津波の心配はありません
    [00:25.800 --> 00:49.460] 新度5弱を観測したのは日南市 新度5弱を観測したのは日南市 新度4を観測したのは高なベ
    [00:49.460 --> 00:59.960] その他九州エリア広い配で新度3 新度にを観測しています この自身による津波の心配はあり
    real 8m36.310s
    user 16m39.758s
    sys 2m29.937s
  8. small
    $ time whisper ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm --language
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:03.840] 予報センターから地震の情報をお伝えしています。
    [00:03.840 --> 00:10.140] 0時2分ごろ九州地方で最大震度5弱の地震が発生しました。
    [00:10.140 --> 00:18.300] 震源地は大隅半島東方置き、地震の希望を示すマグニチュードは5.8。
    [00:18.300 --> 00:22.400] 震源の深さはおよそ30kmとなっています。
    [00:22.400 --> 00:25.800] この地震による津波の心配はありません。
    [00:25.800 --> 00:29.700] この地震による津波の心配はありません。
    [00:29.700 --> 00:36.640] 震度5弱を観測したのは日南市、震度5弱を観測したのは日南市、
    [00:36.640 --> 00:44.040] 震度4を観測したのは高鍋町、震都道町、宮崎市、
    [00:44.040 --> 00:49.400] 駆島市、宮子の城市、小林市となっています。
    [00:49.400 --> 00:55.800] その他、九州エリア広い灰で震度3、震度2を観測しています。
    [00:55.800 --> 00:59.800] この地震による津波の心配はありません。
    real 35m3.326s
    user 74m50.110s
    sys 8m49.869s
  9. medium
    $ time whisper ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm --language
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:10.240] 予報センターから地震の情報をお伝えしています 0時2分ごろ九州地方で最大震度5弱の地震
    [00:10.240 --> 00:18.320] 震源地は大隅半島東方起 地震の規模を示すマグニチュードは5.8
    [00:18.320 --> 00:25.920] 震源の深さはおよそ30kmとなっています この地震による津波の心配はありません
    [00:25.920 --> 00:33.360] 震度5弱を観測したのは日南市 震度5弱を観測したのは日南市
    [00:33.360 --> 00:45.840] 震度4を観測したのは高鍋町 新富町 宮崎市 福島市 都の城市 小林市となっています
    [00:45.840 --> 00:52.320] その他九州エリア広井配で震度3 震度2を観測しています
    [00:52.320 --> 01:00.880] この地震による津波の心配はありません
    real 84m22.932s
    user 191m7.199s
    sys 23m17.116s
  10. large
    $ time whisper ./大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm --language
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:03.760] 予報センターから地震の情報をお伝えしています。
    [00:03.760 --> 00:10.080] 0時2分ごろ九州地方で最大震度5弱の地震が発生しました。
    [00:10.080 --> 00:13.680] 震源地は大隅半島東方沖。
    [00:13.680 --> 00:18.240] 地震の規模を示すマグニ中度は5.8。
    [00:18.240 --> 00:22.320] 震源の深さはおよそ30キロメートルとなっています。
    [00:22.320 --> 00:25.760] この地震による津波の心配はありません。
    [00:25.760 --> 00:29.640] この地震による津波の心配はありません。
    [00:29.640 --> 00:33.240] 震度5弱を観測したのは日南市。
    [00:33.240 --> 00:36.520] 震度5弱を観測したのは日南市。
    [00:36.520 --> 00:43.920] 震度4を観測したのは高鍋町、新富町、宮崎市、
    [00:43.920 --> 00:49.280] 久島市、宮古の城市、小林市となっています。
    [00:49.280 --> 00:55.640] その他九州エリア広い範囲で震度3、震度2を観測しています。
    [00:55.640 --> 00:59.760] この地震による津波の心配はありません。
    real 210m5.865s
    user 435m17.317s
    sys 80m58.978s
  11. Transcribing English audio
    $ wget https://www3.nhk.or.jp/nhkworld/upld/medias/en/radio/news/20221010183000_english_1.mp3
    $ ffmpeg -i ./20221010183000_english_1.mp3 -map 0 -c copy -f segment -segment_time 60 -reset_ti
    $ ls -s1 20221010183000_english_1*
     476 20221010183000_english_1-00.mp3
     472 20221010183000_english_1-01.mp3
     476 20221010183000_english_1-02.mp3
     476 20221010183000_english_1-03.mp3
     456 20221010183000_english_1-04.mp3
    2332 20221010183000_english_1.mp3
    $ ffprobe ./20221010183000_english_1-00.mp3 2>&1 | grep Input -A10
    Input #0, mp3, from './20221010183000_english_1-00.mp3':
      Metadata:
        encoder : Lavf59.27.100
      Duration: 00:01:00.00, start: 0.025057, bitrate: 64 kb/s
      Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 64 kb/s
        Metadata:
          encoder : Lavc57.64
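The split step above (cut off on the slide) can be scripted. This sketch builds an ffmpeg segment-muxer command; -reset_timestamps 1 and the %02d output pattern are our assumptions, chosen to reproduce file names like 20221010183000_english_1-00.mp3, and are not necessarily the exact options used on the slide.

```python
import math
import subprocess
from pathlib import Path


def segment_command(src: str, seconds: int = 60) -> list:
    """Build an ffmpeg command copying `src` into chunks named <stem>-00<ext>, <stem>-01<ext>, ..."""
    path = Path(src)
    pattern = str(path.with_name(f"{path.stem}-%02d{path.suffix}"))
    return [
        "ffmpeg", "-i", str(path),
        "-map", "0", "-c", "copy",            # keep all streams, no re-encoding
        "-f", "segment", "-segment_time", str(seconds),
        "-reset_timestamps", "1",             # assumed: each chunk starts at t=0
        pattern,
    ]


def expected_chunks(duration_s: float, seconds: int = 60) -> int:
    """Number of chunks a clip of `duration_s` seconds should yield."""
    return math.ceil(duration_s / seconds)


def split(src: str, seconds: int = 60) -> None:
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(segment_command(src, seconds), check=True)
```

Passing the command as a list avoids shell quoting issues, which matters for file names containing spaces such as the YouTube download above.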
  12. tiny
    $ time whisper ./20221010183000_english_1-00.mp3 --language English --model tiny
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:06.500] This is Asian View from N.H.K. Roll in Japan.
    [00:06.500 --> 00:08.000] I'm here with you.
    [00:08.000 --> 00:12.000] Japanese Foreign Minister Hayashi Yoshimasa met his Malaysian counte
    [00:12.000 --> 00:15.000] Sifuiting Abdullah and Kuala Lumpur on Sunday.
    [00:15.000 --> 00:19.000] Hayashi conveyed his strong opposition to any attempt to unilaterall
    [00:19.000 --> 00:24.000] change the status quo in the east and south China seas by force.
    [00:24.000 --> 00:30.000] Malaysia is a strategic partner that shares our basic values and str
    [00:30.000 --> 00:35.000] I think we were able to have a very meaningful discussion on future
    [00:35.000 --> 00:41.000] Hayashi also explained the importance of maintaining and strengtheni
    [00:41.000 --> 00:44.000] and responding to economic coercion.
    [00:44.000 --> 00:49.000] On Russia's invasion of Ukraine, Hayashi sent the act goes against i
    [00:49.000 --> 00:51.000] and should not be condoned.
    [00:51.000 --> 00:57.000] Hayashi and Sifuiting confirmed that they will continue to coordinat
    [00:57.000 --> 01:13.000] Thai Prime Minister Pryu Chan Ocha has it.
    real 2m58.637s
    user 6m42.815s
    sys 0m46.548s
  13. base
    $ time whisper ./20221010183000_english_1-00.mp3 --language English --model base
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:06.800] This is Asian View from NHK Roll Japan.
    [00:06.800 --> 00:08.800] I'm Hiragokitadai.
    [00:08.800 --> 00:13.720] Japanese Foreign Minister Hayashi Yoshimasa met his Malaysian counte
    [00:13.720 --> 00:15.880] in Kuala Lumpur on Sunday.
    [00:15.880 --> 00:20.460] Hayashi conveyed his strong opposition to any attempt to unilaterall
    [00:20.460 --> 00:25.120] quo in the East and South China Seas by force.
    [00:25.120 --> 00:30.320] Malaysia is a strategic partner that shares our basic values and str
    [00:30.320 --> 00:35.400] I think we were able to have a very meaningful discussion on future
    [00:35.400 --> 00:40.400] Hayashi also explained the importance of maintaining and strengtheni
    [00:40.400 --> 00:43.760] order and responding to economic coercion.
    [00:43.760 --> 00:49.000] On Russia's invasion of Ukraine, Hayashi said the act goes against i
    [00:49.000 --> 00:51.000] should not be condoned.
    [00:51.000 --> 00:55.960] Hayashi and Saifudin confirmed that they will continue to coordinate
    [00:55.960 --> 00:57.840] the conflict.
    [00:57.840 --> 01:27.400] The Thai Prime Minister Priyut Chan Ocha has it.
    real 6m41.061s
    user 14m49.295s
    sys 1m46.802s
  14. small
    $ time whisper ./20221010183000_english_1-00.mp3 --language English --model small
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:13.720] Japanese Foreign Minister Hayashi Yoshimasa met his Malaysian counte
    [00:13.720 --> 00:15.880] in Kuala Lumpur on Sunday.
    [00:15.880 --> 00:20.440] Hayashi conveyed his strong opposition to any attempt to unilaterall
    [00:20.440 --> 00:24.320] quo in the East and South China Seas by force.
    [00:24.320 --> 00:30.280] Malaysia is a strategic partner that shares our basic values and str
    [00:30.280 --> 00:35.400] I think we were able to have a very meaningful discussion on future
    [00:35.400 --> 00:40.400] Hayashi also explained the importance of maintaining and strengtheni
    [00:40.400 --> 00:43.800] order and responding to economic coercion.
    [00:43.800 --> 00:48.840] On Russia's invasion of Ukraine, Hayashi said the act goes against i
    [00:48.840 --> 00:51.040] and should not be condoned.
    [00:51.040 --> 00:55.840] Hayashi and Saifudin confirmed that they will continue to coordinate
    [00:55.840 --> 01:00.000] to the conflict.
    real 27m32.630s
    user 56m35.610s
    sys 8m19.331s
  15. medium
    $ time whisper ./20221010183000_english_1-00.mp3 --language English --model medium
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:06.800] This is Asian View from NHK World Japan.
    [00:06.800 --> 00:08.600] I'm Hiroko Kitadai.
    [00:08.600 --> 00:13.720] Japanese Foreign Minister Hayashi Yoshimasa met his Malaysian counte
    [00:13.720 --> 00:15.800] in Kuala Lumpur on Sunday.
    [00:15.800 --> 00:20.440] Hayashi conveyed his strong opposition to any attempt to unilaterall
    [00:20.440 --> 00:25.560] quo in the East and South China Seas by force.
    [00:25.560 --> 00:30.360] Russia is a strategic partner that shares our basic values and strat
    [00:30.360 --> 00:35.400] I think we were able to have a very meaningful discussion on future
    [00:35.400 --> 00:40.440] Hayashi also explained the importance of maintaining and strengtheni
    [00:40.440 --> 00:43.840] order and responding to economic coercion.
    [00:43.840 --> 00:49.000] On Russia's invasion of Ukraine, Hayashi said the act goes against i
    [00:49.000 --> 00:51.000] should not be condoned.
    [00:51.000 --> 00:55.800] Hayashi and Saifuddin confirmed that they will continue to coordinat
    [00:55.800 --> 00:57.800] to the conflict.
    [00:57.800 --> 01:24.760] Thai Prime Minister Prayut Chan-o-cha has a
    real 79m3.566s
    user 178m34.357s
    sys 22m48.464s
  16. large
    $ time whisper ./20221010183000_english_1-00.mp3 --language English --model large
    /home/ubuntu/src/whisper/venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarnin
    warnings.warn("FP16 is not supported on CPU; using FP32 instead")
    [00:00.000 --> 00:08.000] This is Asian View from NHK World Japan. I'm Hiroko Kitadai.
    [00:08.000 --> 00:15.000] Japanese Foreign Minister Hayashi Yoshimasa met his Malaysian counte
    [00:15.000 --> 00:24.000] Hayashi conveyed his strong opposition to any attempt to unilaterall
    [00:24.000 --> 00:30.000] Malaysia is a strategic partner that shares our basic values and str
    [00:30.000 --> 00:35.000] I think we were able to have a very meaningful discussion on future
    [00:35.000 --> 00:43.000] Hayashi also explained the importance of maintaining and strengtheni
    [00:43.000 --> 00:50.000] On Russia's invasion of Ukraine, Hayashi said the act goes against i
    [00:50.000 --> 00:57.000] Hayashi and Saifuddin confirmed that they will continue to coordinat
    real 133m44.308s
    user 289m38.870s
    sys 49m56.902s
  17. Processing time comparison, ja vs. en (real time reported by time)
    model    ja        en
    tiny     6m        2m58s
    base     8m36s     6m41s
    small    35m3s     27m32s
    medium   84m22s    79m3s
    large    210m5s    133m44s
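To put the table in perspective, divide each wall-clock (real) time by the clip length: 59.42 s for the Japanese clip according to ffprobe, and 60 s for the English segment. A minimal sketch; the real values are copied from the slides, and the parser assumes bash's time output format (minutes are not rolled over into hours, e.g. 210m5.865s).

```python
import re


def parse_bash_time(value: str) -> float:
    """Convert bash `time` output such as '6m0.515s' or '133m44.308s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1) or 0) * 60 + float(match.group(2))


def real_time_factor(real: str, audio_seconds: float) -> float:
    """Processing time divided by audio length (1.0 means exactly real time)."""
    return parse_bash_time(real) / audio_seconds


# `real` times from the slides (Oracle Cloud Ampere A1, 4 cores, 24 GB RAM).
REAL_TIMES = {
    "tiny":   ("6m0.515s",   "2m58.637s"),
    "base":   ("8m36.310s",  "6m41.061s"),
    "small":  ("35m3.326s",  "27m32.630s"),
    "medium": ("84m22.932s", "79m3.566s"),
    "large":  ("210m5.865s", "133m44.308s"),
}
JA_SECONDS, EN_SECONDS = 59.42, 60.0

if __name__ == "__main__":
    for model, (ja, en) in REAL_TIMES.items():
        ja_rtf = real_time_factor(ja, JA_SECONDS)
        en_rtf = real_time_factor(en, EN_SECONDS)
        print(f"{model:>6}  ja x{ja_rtf:6.1f}  en x{en_rtf:6.1f}")
```

Even tiny needs about six times the clip length for Japanese on this VM, so none of these runs would keep up with live audio.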
  18. Transcription results and subtitle files
    $ ls -1 大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm.*
    '大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし [GiglWCcVi5o].webm.srt'
    '大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし [GiglWCcVi5o].webm.txt'
    '大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし [GiglWCcVi5o].webm.vtt'
    $ cat 大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし\ \[GiglWCcVi5o\].webm.txt
    予報センターから地震の情報をお伝えしています。
    0時2分ごろ九州地方で最大震度5弱の地震が発生しました。
    震源地は大隅半島東方沖。
    地震の規模を示すマグニ中度は5.8。
    震源の深さはおよそ30キロメートルとなっています。
    この地震による津波の心配はありません。
    この地震による津波の心配はありません。
    震度5弱を観測したのは日南市。
    震度5弱を観測したのは日南市。
    震度4を観測したのは高鍋町、新富町、宮崎市、
    久島市、宮古の城市、小林市となっています。
    その他九州エリア広い範囲で震度3、震度2を観測しています。
    この地震による津波の心配はありません。
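Whisper writes the .srt, .vtt and .txt files itself. To show what the .srt file contains, here is a minimal sketch that renders Whisper-style segments (dicts with "start", "end" and "text", as returned by model.transcribe()) into SubRip text. It is an illustration of the format, not Whisper's own writer.

```python
def format_srt_time(seconds: float) -> str:
    """SubRip timestamps use HH:MM:SS,mmm (note the comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Number each segment from 1 and separate cues with a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # First two segments of the large-model output above.
    print(segments_to_srt([
        {"start": 0.0, "end": 3.76, "text": "予報センターから地震の情報をお伝えしています。"},
        {"start": 3.76, "end": 10.08, "text": "0時2分ごろ九州地方で最大震度5弱の地震が発生しました。"},
    ]))
```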
  19. Speech to Text to Translate
    Whisper's translation output is English only. To translate into other languages on the command line, translate-shell, or trans, which can use みんなの自動翻訳@TexTra (Minna no Jido Honyaku), are also handy (both need a network connection).
    $ whisper './大隅半島東方沖で地震 宮崎県で震度5弱 津波の心配なし [GiglWCcVi5o].m4a' \
        --language Japanese --model small --task translate 2>/dev/null
    [00:00.000 --> 00:03.640] We have information about the earthquake from both centers.
    [00:03.640 --> 00:09.960] The largest earthquake in the world occurred at around 2 a.m.
    [00:09.960 --> 00:13.640] The earthquake was in Osumi-Hanto, Toho-Oki.
    [00:13.640 --> 00:18.040] The magnitude of the earthquake was 5.8.
    [00:18.040 --> 00:22.280] The height of the earthquake is approximately 30 km.
    [00:22.280 --> 00:25.800] There is no worry about the tsunami caused by this earthquake.
    [00:25.800 --> 00:29.600] There is no worry about the tsunami caused by this earthquake.
    [00:29.600 --> 00:33.280] The earthquake that affected the 5-degrees in Shinto is Nichi-Nan-sh
    [00:33.280 --> 00:36.360] The earthquake that affected the 5-degrees in Shinto is Nichi-Nan-sh
    [00:36.360 --> 00:49.240] The earthquake that affected the 4-degrees in Shinto is Takanabe-cho
    [00:49.240 --> 00:54.560] In addition, the earthquake in the Kyushu area and the Shinto-san in
    [00:54.560 --> 00:59.160] There is no worry about the tsunami caused by this earthquake.
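Scripting the translate run: a minimal sketch that assembles the same whisper invocation as above. It uses only flags that appear on the slides (--language, --model, --task); the model list is copied from the usage output, and --task accepts transcribe or translate, where translate always targets English. The function names are hypothetical.

```python
import subprocess

# Model names as listed in the whisper usage output on the slide.
MODELS = {"tiny.en", "tiny", "base.en", "base", "small.en", "small",
          "medium.en", "medium", "large"}
TASKS = {"transcribe", "translate"}  # "translate" always produces English


def whisper_command(audio: str, language: str = "Japanese",
                    model: str = "small", task: str = "translate") -> list:
    """Build the argv for the whisper CLI; raise ValueError on an unknown model or task."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return ["whisper", audio, "--language", language, "--model", model, "--task", task]


def run(audio: str, **kwargs) -> str:
    """Run whisper and return stdout; stderr is discarded, like 2>/dev/null on the slide."""
    result = subprocess.run(whisper_command(audio, **kwargs), check=True,
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True)
    return result.stdout
```

Because the argument list is passed directly, a file name with spaces and brackets needs no shell quoting.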
  20. Build whisper.cpp
    $ git clone https://github.com/ggerganov/whisper.cpp
    $ cd whisper.cpp
    $ make
    cc -I. -O3 -std=c11 -pthread -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
    g++ -I. -I./examples -O3 -std=c++11 -pthread -c whisper.cpp -o whisper.o
    g++ -I. -I./examples -O3 -std=c++11 -pthread examples/main/main.cpp ggml.o whisper.o -o main
    ./main -h

    usage: ./main [options] file0.wav file1.wav ...

    options:
      -h, --help                show this help message and exit
      -s SEED, --seed SEED      RNG seed (default: -1)
      -t N, --threads N         number of threads to use during computation (default: 4)
      -p N, --processors N      number of processors to use during computation (default: 1)
      -ot N, --offset-t N       time offset in milliseconds (default: 0)
      -on N, --offset-n N       segment index offset (default: 0)
      -d N, --duration N        duration of audio to process in milliseconds (default: 0)
      -mc N, --max-context N    maximum number of text context tokens to store (default: max)
      -ml N, --max-len N        maximum segment length in characters (default: 0)
      -wt N, --word-thold N     word timestamp probability threshold (default: 0.010000)
      -su, --speed-up           speed up audio by factor of 2 (faster processing, reduced accuracy
      -v, --verbose             verbose output
      --translate               translate from source language to english
      -otxt, --output-txt       output result in a text file
      -ovtt, --output-vtt       output result in a vtt file
      -osrt, --output-srt       output result in a srt file
      -owts, --output-words     output script for generating karaoke video
      -ps, --print_special      print special tokens
      -pc, --print_colors       print colors
      -nt, --no_timestamps      do not print timestamps
      -l LANG, --language LANG  spoken language (default: en)
      -m FNAME, --model FNAME   model path (default: models/ggml-base.en.bin)
      -f FNAME, --file FNAME    input WAV file path
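To call ./main from a script, a minimal sketch that builds its argument list using only options printed by ./main -h above. The defaults mirror the help text (models/ggml-base.en.bin, en, 4 threads); the function names are hypothetical.

```python
import subprocess


def whisper_cpp_command(wav: str, model: str = "models/ggml-base.en.bin",
                        language: str = "en", threads: int = 4,
                        translate: bool = False, srt: bool = False) -> list:
    """Build argv for whisper.cpp's ./main from options listed in `./main -h`."""
    cmd = ["./main", "-m", model, "-f", wav, "-l", language, "-t", str(threads)]
    if translate:
        cmd.append("--translate")  # translate from source language to English
    if srt:
        cmd.append("-osrt")        # also write the result as an .srt file
    return cmd


def run(wav: str, **kwargs) -> None:
    """Run from the whisper.cpp checkout so that ./main and models/ resolve."""
    subprocess.run(whisper_cpp_command(wav, **kwargs), check=True)
```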
  21. Sample
    $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
    whisper_model_load: loading model from models/ggml-base.en.bin
    whisper_model_load: n_vocab = 51864
    whisper_model_load: n_audio_ctx = 1500
    whisper_model_load: n_audio_state = 512
    whisper_model_load: n_audio_head = 8
    whisper_model_load: n_audio_layer = 6
    whisper_model_load: n_text_ctx = 448
    whisper_model_load: n_text_state = 512
    whisper_model_load: n_text_head = 8
    whisper_model_load: n_text_layer = 6
    whisper_model_load: n_mels = 80
    whisper_model_load: f16 = 1
    whisper_model_load: type = 2
    whisper_model_load: mem_required = 506.00 MB
    whisper_model_load: adding 1607 extra tokens
    whisper_model_load: ggml ctx size = 140.60 MB
    whisper_model_load: memory size = 22.83 MB
    whisper_model_load: model size = 140.54 MB

    system_info: n_threads = 4 / 4 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0

    main: processing samples/jfk.wav (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en

    [00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do

    whisper_print_timings: load time = 542.31 ms
    whisper_print_timings: mel time = 174.94 ms
    whisper_print_timings: sample time = 14.82 ms
    whisper_print_timings: encode time = 6282.06 ms / 1047.01 ms per layer
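For benchmarking, the whisper_print_timings lines can be collected from the log. A minimal sketch, assuming the "name = value ms" layout shown above; the function name is hypothetical.

```python
import re

# "whisper_print_timings: <name> = <value> ms"; any trailing "/ ... per layer" part is ignored.
TIMING_RE = re.compile(r"whisper_print_timings:\s*(\w[\w ]*?)\s*=\s*([\d.]+)\s*ms")


def parse_timings(log: str) -> dict:
    """Map each whisper_print_timings entry (e.g. 'encode time') to milliseconds."""
    return {name: float(value) for name, value in TIMING_RE.findall(log)}
```

In the sample run, the 6282.06 ms encode time divided by the 6 audio layers (n_audio_layer = 6) gives the reported 1047.01 ms per layer; encoding dominates the total.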
  22. Downloading models
    $ bash ./models/download-ggml-model.sh
    Usage: ./models/download-ggml-model.sh <model>
    Available models: tiny.en tiny base.en base small.en small medium.en medium large
    $ bash ./models/download-ggml-model.sh large
    $ bash ./models/download-ggml-model.sh medium
    :
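To see which models still need downloading, a minimal sketch. The models/ggml-<name>.bin naming is inferred from models/ggml-base.en.bin and models/ggml-tiny.bin used elsewhere in these slides, and the helper names are hypothetical.

```python
from pathlib import Path

# Model names printed by ./models/download-ggml-model.sh on the slide.
AVAILABLE = ["tiny.en", "tiny", "base.en", "base", "small.en", "small",
             "medium.en", "medium", "large"]


def model_path(name: str, models_dir: str = "models") -> Path:
    """Path where the ggml file for `name` is expected (assumed naming scheme)."""
    if name not in AVAILABLE:
        raise ValueError(f"unknown model {name!r}; choose from {AVAILABLE}")
    return Path(models_dir) / f"ggml-{name}.bin"


def missing_models(names, models_dir: str = "models") -> list:
    """Return the requested model names whose ggml file is not present yet."""
    return [n for n in names if not model_path(n, models_dir).is_file()]
```

Each name returned by missing_models() can then be passed to bash ./models/download-ggml-model.sh.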
  23. Whisper.cpp input format
    Whisper happily processed even video files, but Whisper.cpp appears to need WAV input, and 16 kHz WAV at that.
    error: failed to open 'input.webm' as WAV file
    ./main: WAV file 'input.wav' must be 16 kHz
    Conversion example:
    $ ffmpeg -i input.webm -ar 16000 out.wav
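Before handing a file to ./main, Python's standard wave module can check its format. The slide only establishes the 16 kHz requirement; the mono and 16-bit PCM checks in this sketch are our assumption, so drop them if your build accepts other layouts. The function names are hypothetical.

```python
import os
import tempfile
import wave


def wav_params(path: str) -> tuple:
    """Return (sample_rate_hz, channels, sample_width_bytes); raises wave.Error if not a PCM WAV."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def ready_for_whisper_cpp(path: str) -> bool:
    """True for 16 kHz WAV (per the slide); mono and 16-bit are assumed extra requirements."""
    try:
        rate, channels, width = wav_params(path)
    except (wave.Error, EOFError):
        return False  # e.g. a .webm file, which whisper.cpp also refuses
    return rate == 16000 and channels == 1 and width == 2


def write_silence(path: str, rate: int, seconds: float = 0.1) -> None:
    """Write a short silent 16-bit mono WAV, handy for trying the check above."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for rate in (16000, 44100):
            path = os.path.join(tmp, f"silence-{rate}.wav")
            write_silence(path, rate)
            print(rate, ready_for_whisper_cpp(path))
```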
  24. Speed comparison: Whisper vs. Whisper.cpp
    Table 1. One minute of audio on an Intel Core i5-7300U
    model    Whisper   Whisper.cpp
    large    29m38s    8m9s
    medium   12m46s    4m4s
    small    2m28s     1m6s
    base     47s       20s
    tiny     27s       19s
    Benchmark results → https://github.com/ggerganov/whisper.cpp/issues/89
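Reading Table 1 as speed-up factors (Whisper time divided by Whisper.cpp time) makes the gap easier to compare across models. A minimal sketch; the values are copied from Table 1 and the function names are hypothetical.

```python
import re

# Table 1: one minute of audio on an Intel Core i5-7300U (Whisper, Whisper.cpp).
TIMES = {
    "large":  ("29m38s", "8m9s"),
    "medium": ("12m46s", "4m4s"),
    "small":  ("2m28s",  "1m6s"),
    "base":   ("47s",    "20s"),
    "tiny":   ("27s",    "19s"),
}


def to_seconds(value: str) -> int:
    """Parse '29m38s' / '47s' style durations into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1) or 0) * 60 + int(match.group(2))


def speedup(model: str) -> float:
    """How many times faster Whisper.cpp was than Whisper for `model`."""
    whisper_time, cpp_time = TIMES[model]
    return to_seconds(whisper_time) / to_seconds(cpp_time)


if __name__ == "__main__":
    for model in TIMES:
        print(f"{model:>6}: x{speedup(model):.2f}")
```

The gain grows with model size (roughly 3.6x for large, 1.4x for tiny), which suggests fixed overheads matter more for the small models.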
  25. Streaming transcription with Whisper.cpp
    On an Intel Core i5-7300U, the base model cannot keep up and the message below is printed over and over. Even tiny sometimes seems unable to keep up.
    $ sudo apt-get install libsdl2-dev
    $ make stream
    $ ./stream --language ja -m models/ggml-tiny.bin
    main: WARNING: cannot process audio fast enough, dropping audio ...
  26. Transcribing English audio in real time while translating it into Japanese with translate-shell
    Handy when you set up a loopback device and feed it the audio of, for example, a video meeting in a language you don't understand. (pee, from moreutils, sends the stream to both cat and trans.)
    $ ./stream -t 4 --language en -m models/ggml-tiny.bin | pee cat "trans :ja -b"
    :
    Thanks for the main menu and the me use this for points for sure.
    メインメニューに感謝します。私はこれをポイントとして使用しています。
    the fights are hot and we hope that we will be able to provide all the amenities that we
    戦いは熱いので、私たちが提供できるすべてのアメニティを提供できることを願っています
    I don't know if I can take the phone to the phone I don't know if I can take the phone to the
    電話を電話に持っていけるかどうかわからない 電話を電話に持っていけるかどうかわからない
    I want to be a Nielsen, you all are even happy to be here. Yeah, be a Nielsen. Yeah, be a Niel
    私はニールセンになりたいです。皆さんもここにいられて幸せです。ええ、ニールセンになりましょう。ええ、ニールセンに
    ^C
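Raw captures of ./stream output contain terminal control sequences; the "[2K" fragments seen in such captures appear to be the ANSI "erase line" escape (ESC[2K) used to redraw the current line. To log the text or send it to a translator yourself instead of via pee, a small filter can strip them. A minimal sketch: the regex covers common CSI sequences only, calling trans :ja -b once per line is just one way to wire it up, and the function names are hypothetical.

```python
import re
import subprocess
import sys

# CSI escape sequences such as ESC[2K (erase line); not every terminal escape is covered.
CSI_RE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def clean_line(raw: str) -> str:
    """Remove CSI escapes and carriage returns, then trim surrounding whitespace."""
    return CSI_RE.sub("", raw).replace("\r", "").strip()


def translate_ja(text: str) -> str:
    """Translate one line with translate-shell (`trans :ja -b`); needs network access."""
    result = subprocess.run(["trans", ":ja", "-b", text],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def main() -> None:
    """Read ./stream output on stdin and print each cleaned line with its translation."""
    for raw in sys.stdin:
        line = clean_line(raw)
        if line:
            print(line)
            print(translate_ja(line), flush=True)
```

Save it as a script that calls main() at the end, then pipe into it, e.g. ./stream -t 4 --language en -m models/ggml-tiny.bin | python3 filter.py (the script name is only an example).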