Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
音声情報処理に便利な (Python) パッケージやソフトウェア
Search
Akira Tamamori
December 30, 2020
Research
3
780
音声情報処理に便利な (Python) パッケージやソフトウェア
Tokyo BISH Bashの資料から独立させたもの
Akira Tamamori
December 30, 2020
Tweet
Share
More Decks by Akira Tamamori
See All by Akira Tamamori
音声認識と音声合成の超入門
tam17aki
0
390
Tokyo BISH Bash #02 音声情報処理と音声変換技術入門
tam17aki
2
2k
[ICASSP2020音響音声読み会] State-Space Gaussian Process for Drift Estimation in Stochastic Differential Equations
tam17aki
0
480
Other Decks in Research
See All in Research
10-ot-generic-bio.pdf
gpeyre
0
140
サウナでのプロジェクションマッピングの可能性の検討 / EC71koizumi
yumulab
0
180
生成AIを用いたText to SQLの最前線
masatoto
1
2.4k
インタビューだけじゃない!ユーザーに共感しユーザーの目👀を手に入れるためのインプット
moco1013
0
260
ニフティのインナーソース導入事例 - InnerSource Commons #11
niftycorp
PRO
0
260
Accurate Method and Variable Tracking in Commit History
tsantalis
0
260
Alexander Mielke Hellinger--Kantorovich (a.k.a. Wasserstein-Fisher-Rao) Spaces and Gradient Flows
jjzhu
3
190
ランサーズエージェント_フリーランスエンジニアの年収・キャリアの実態調査2024
lancers_pr
0
110
LiDARセキュリティ最前線
kentaroy47
0
280
Generative Spoken Dialogue Language Modeling [対話論文読み会@電通大]
yuta0306
1
130
眠眠ガチャ:ガチャを活用した睡眠意欲向上アプリの開発 / EC71inui
yumulab
0
160
説明可能AI:代表的手法と最近の動向
yuyay
1
610
Featured
See All Featured
Robots, Beer and Maslow
schacon
PRO
155
7.9k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
352
28k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
14
1.5k
Being A Developer After 40
akosma
65
580k
Practical Orchestrator
shlominoach
183
9.7k
The Mythical Team-Month
searls
216
42k
Making the Leap to Tech Lead
cromwellryan
125
8.5k
Building a Modern Day E-commerce SEO Strategy
aleyda
20
6.4k
Building Flexible Design Systems
yeseniaperezcruz
320
37k
Scaling GitHub
holman
457
140k
The Art of Programming - Codeland 2020
erikaheidi
43
12k
Product Roadmaps are Hard
iamctodd
45
9.7k
Transcript
ԻใॲཧԻมʹ ศརͳ (Python) ύοέʔδ ιϑτΣΞͨͪ 1 ༻ײͳͲࢲݟΛؚΈ·͢
TPYQZTPY • ίϚϯυϥΠϯ͔ΒϑΥʔϚοτมͳͲΛ͓खܰʹ • ϑΥʔϚοτมʢwav to mp3 ͳͲʣ • ݁߹ϛοΫεɺτϦϛϯά
Մೳ • όονॲཧָʢγΣϧεΫϦϓτͳͲʣ • Pythonϥούʔ pysox͋Δ • Πϯετʔϧ • brew install sox ͳͲ • pip install sox ← pysox ͷΠϯετʔϧ͜Ε 2
MJCSPTBʢ͜ΕຊʹΦεεϝʣ • Ի/ԻָͷੳʹศརͳϞδϡʔϧ͕ଗͬͨύοέʔδ • ެࣜϚχϡΞϧɾνϡʔτϦΞϧͷॆ࣮ॿ͔Δ • ݸਓతʹΑ͘͏ػೳ • ܗදࣔɺεϖΫτϩάϥϜදࣔ •
Իಛྔநग़ʢରϝϧεϖΫτϩάϥϜʣ • Πϯετʔϧ pip install librosa • ެࣜϖʔδ https://librosa.org/librosa/index.html 3
1Z8PSME • Իͷੳ࠶߹Λߦ͏Ϙίʔμʔͷύοέʔδ • ԻΛʮ৭ɾͷߴ͞ɾͷ͔͢Εʯͷ֤ʹղ͠࠶߹ • C++൛ͷPythonϥούʔ • Իͷಛྔநग़ʹ͑ͯศར ⇒
PySPTKʢޙड़ʣΑΓ࣭ͷΑ͍εϖΫτϧแབྷ • Πϯετʔϧ pip install pyworld • ެࣜϖʔδ https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder 4
1Z"VEJP • ετϦʔϜԻ / ࠶ੜʹศརͳύοέʔδ • ϦΞϧλΠϜͷԻೖྗɾԻग़ྗʹ͑Δ • ϦΞϧλΠϜԻม with
PythonͳͲՄೳ • Πϯετʔϧ • pip install pyaudio ※ཁportaudio (e.g., brew install portaudio) 5
1Z"VEJPͱ1Z8PSMEͷΈ߹Θͤ • ؆қ൛ͷϘΠενΣϯδϟʔ • ؆қϘΠενΣϯδϟʔͷεΫϦϓτΛվྑɿPyQt5ͷεϥΠμʔʹΑΓ ϐονͱϑΥϧϚϯτΛϦΞϧλΠϜௐ͢ΔػೳΛՃʢฐϒϩάʣ • banibiku • Zoom৴͚ʹ̎࣍ݩΩϟϥʹͳΓ͖Δ͜ͱΛࢦͨ͠ϓϩδΣΫτ
• scripts/voice_converter.py ͕ྑ͍ײ͡ͷϘΠενΣϯδϟʔ → ฐϒϩάͷαϯϓϧεΫϦϓτͷόάϑΟοΫεؚ͕·ΕΔ 6 https://tam5917.hatenablog.com/entry/2019/04/30/213321 https://github.com/peisuke/babiniku
1Z415, • ԻใॲཧπʔϧΩοτSPTKͷPythonϥούʔ • SPTKࣗମLinuxίϚϯυ܈ • Իڹಛྔநग़ʹ͏ͷ͕ศར • Իੳ߹Ͱ͖Δ͕ɺ࣭ࣗମWORLDͷ΄͏্͕ •
Πϯετʔϧ pip install pysptk • ެࣜϖʔδ https://pysptk.readthedocs.io/en/latest/ 7
OONOLXJJ <OBOBNJO LBXBJJ> • DNNԻ߹ʹཱͭϞδϡʔϧΛूΊͨύοέʔδ • ͲͪΒ͔ͱ͍͏ͱݚڀ༻్ • લॲཧԻڹಛྔநग़ͷΫϥε͕Ұ௨Γଗ͍ͬͯΔ •
จͷ࠶ݱ࣮Λ͢Δͱ͖ͳͲʹେ͍ʹཱͭ • Πϯετʔϧ pip install nnmnkwii • ެࣜϖʔδ https://r9y9.github.io/nnmnkwii/stable/index.html 8
1ZEVC • Pydub • ܗฤूʹศརͳϞδϡʔϧΛूΊͨύοέʔδ • αϙʔτ͢ΔϑΝΠϧܗࣜ๛ʢwav, mp3, mp4, wma,
aac, ...ʣ • ػೳ Γग़͠ɺׂɺϛοΫεɺϑΣʔυΠϯΞτɺແԻૠೖɺͳͲͳͲ • Ұ෦ͷػೳ pysoxͷ΄͏͕ߴͱ͍͏ӟ?ʢະ֬ೝʣ • Πϯετʔϧ pip install pydub • ެࣜϖʔδ http://pydub.com/ 9
TQSPDLFU • ౷ܭత࣭มͷͨΊͷπʔϧΩοτ (not ύοέʔδ) • ͲͪΒ͔ͱ͍͏ͱݚڀ༻ ʢMITϥΠηϯεʣ • ݚڀͷʮϕʔεϥΠϯʯߏஙʹ࠷ద
• ެࣜϖʔδ https://github.com/k2kobayashi/sprocket • ղઆจ ʰ౷ܭత࣭มιϑτΣΞೖʱ https://www.jstage.jst.go.jp/article/isciesci/62/2/62_69/_article/-char/ja/ • νϡʔτϦΞϧ (εϥΠυ & notebook) https://github.com/kan-bayashi/INTERSPEECH19_TUTORIAL 10
"VEBDJUZ ೖΕ͓ͯ͘ͱ҆৺ • ϑϦʔͷܗฤूιϑτɺϚϧνϓϥοτϑΥʔϜ • ๛ͳαϯυΤϑΣΫτՃػೳ • ެࣜϖʔδ https://www.audacityteam.org/ 11
(16্ͰԻॲཧ͍ͨ͠Ϛϯʹ ͓͢͢Ίͷύοέʔδ 12 ͓·͚
UPSDIBVEJP • Pytorchެ͕ࣜαϙʔτ͍ͯ͠ΔԻॲཧܥϥΠϒϥϦ • PytorchܥͷਂֶशϞσϧͱͷ૬ੑ͕ྑ͍ʢͦΕͦ͏ʣ • ެࣜϖʔδ https://pytorch.org/audio/stable/index.html 13
UGTJHOBM • TensorFlowެ͕ࣜαϙʔτ͍ͯ͠ΔԻॲཧܥͷؔ܈ • TFܥͷਂֶशϞσϧͱͷ૬ੑ͕ྑ͍ʢͦΕͦ͏ʣ • FFT/iFFT, DCT, MDCT, STFTͳͲ
• ެࣜϖʔδ https://www.tensorflow.org/api_docs/python/tf/signal 14
UPSDIMJCSPTB • PytorchΛόοΫΤϯυʹͯ͠librosaΛGPU্Ͱಈ͔͢ • Πϯετʔϧ pip install torchlibrosa • ެࣜϖʔδ
https://github.com/qiuqiangkong/torchlibrosa 15
LBQSF • Kerasʢͱ͍͏͔TFʣΛόοΫΤϯυʹͯ͠Իॲཧ͢Δ • STFTiSTFTɺϝϧεϖΫτϩάϥϜͳͲ • CQTͳͲͳ͍ • ެࣜϖʔδ https://github.com/keunwoochoi/kapre
• Πϯετʔϧ pip install kapre 16
OO"VEJP • PytorchΛόοΫΤϯυʹͯ͠STFTͳͲΛGPU্Ͱಈ͔͢ • STFTɺٯSTFTɺCQTͳͲΑ͘͏ಛநग़ܥ͕ଗ͏ • ެࣜϖʔδ https://github.com/KinWaiCheuk/nnAudio 17
OO"VEJPʢ͖ͭͮʣ • ൺֱද 18