Introducing the Pain Points in My Homemade Home AI Agent Stack-chan Firmware

Stack-chan Birthday Party 2026, @74th (Atsushi Morimoto)

Happy Birthday to Stack-chans Around the World (Note: generated with ChatGPT)

About me Atsushi Morimoto X, GitHub: @74th https://74th.booth.pm/items/8119356

What Stack-chan Means to Me
A ready-to-use device for trying out voice AI agents.
Diagram labels: PC, Gemini, WebSocket, Python Server, Claude Agent SDK, LocalLLM
https://github.com/74th/websocket-control-stackchan/

Issue 1: It cannot hear my wife's wake word
The ESP-SR wake word detection library seems to have difficulty recognizing female voices.
To work around this, I run whisper.cpp, a local speech recognizer, on a PC and check the recognized speech every 0.5 seconds. If the speech contains “ハイスタックチャン” (“Hi Stack-chan”), it is treated as the wake word.
Prompt: “ハイスーチャン。ネエハイトチャン。ハイスタックチャン。ハイスズキクン。ハイフロントチャン。” (“Hi Su-chan. Hey Hai-to-chan. Hi Stack-chan. Hi Suzuki-kun. Hi Front-chan.”) From this, the detection result “ハイスタックチャン” (“Hi Stack-chan”) is extracted.
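
The transcript check described above could be sketched roughly as follows. This assumes the text has already been produced by whisper.cpp; the function names, the normalization steps, and the punctuation list are illustrative assumptions, not the author's actual code.

```python
import unicodedata

# The wake phrase comes from the slide; everything else is a hypothetical sketch.
WAKE_PHRASE = "ハイスタックチャン"
PUNCTUATION = "、。,.!?"


def normalize(text: str) -> str:
    """Fold width variants with NFKC, then drop whitespace and punctuation."""
    folded = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in folded if not ch.isspace() and ch not in PUNCTUATION
    )


def contains_wake_word(transcript: str, phrase: str = WAKE_PHRASE) -> bool:
    """Return True if the recognized text contains the wake phrase."""
    return normalize(phrase) in normalize(transcript)
```

A server loop would presumably call `contains_wake_word` on the most recent transcript every 0.5 seconds and start a conversation when it returns True.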

Issue 2: It suddenly starts speaking while we are watching TV
While we are watching TV dramas, it falsely detects the wake word and suddenly starts speaking. This issue is still unresolved.

Issue 3: It cannot tell when a human has finished speaking
At home, it is quiet when the TV is off, so I treat speech as finished once the volume stays below a certain threshold for a while. This works fairly well.
At demo events, the environment is full of noise, so I force the interaction to end after 30 seconds.
Ideally, I should use whisper.cpp's VAD (voice activity detection) to split speech into segments and detect when a segment has ended. However, I do not yet know whether this can be done in streaming mode.
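
The volume-threshold approach plus the hard time limit can be sketched as below. This is a minimal illustration under stated assumptions: 16-bit little-endian mono PCM input, and a hypothetical `EndOfSpeechDetector` class whose threshold and silence duration are made-up defaults. Only the 30-second cap comes from the slide.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EndOfSpeechDetector:
    """Ends an utterance after sustained quiet or after a hard time limit."""

    def __init__(self, threshold: float = 500.0,
                 silence_ms: int = 1000, max_ms: int = 30000):
        self.threshold = threshold    # assumed RMS level that counts as quiet
        self.silence_ms = silence_ms  # assumed quiet duration that ends speech
        self.max_ms = max_ms          # forced end, 30 seconds as on the slide
        self.elapsed_ms = 0
        self.quiet_ms = 0

    def feed(self, frame: bytes, frame_ms: int) -> bool:
        """Feed one audio frame; return True once speech should be ended."""
        self.elapsed_ms += frame_ms
        if rms(frame) < self.threshold:
            self.quiet_ms += frame_ms
        else:
            self.quiet_ms = 0  # any loud frame resets the silence timer
        return (self.quiet_ms >= self.silence_ms
                or self.elapsed_ms >= self.max_ms)
```

Durations are tracked in integer milliseconds to avoid floating-point drift when many short frames are accumulated. In a noisy venue the silence condition may never fire, which is why the time limit is kept as a fallback.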

Issue 4: The responses are too long for a voice assistant
Every time I operate the air conditioner, it says too much, for example: “Certainly. I have set the air conditioner to heating mode. Your room will soon become warm and comfortable.”
To shorten responses, I first instruct it: “Since you are a voice assistant, do not use Markdown and answer in about three sentences.”
I also tell it: “When you execute a home appliance control tool, respond briefly in about one sentence.”
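
One way the two instructions might be combined into a system prompt is sketched below. The strings are the English renderings from the slide (the originals are in Japanese); the function `build_system_prompt` and the idea of adding the tool rule only when appliance tools are registered are assumptions for illustration.

```python
# English renderings of the two instructions quoted on the slide.
BASE_INSTRUCTION = (
    "Since you are a voice assistant, do not use Markdown "
    "and answer in about three sentences."
)
TOOL_INSTRUCTION = (
    "When you execute a home appliance control tool, "
    "respond briefly in about one sentence."
)


def build_system_prompt(has_appliance_tools: bool) -> str:
    """Join the instructions; the tool rule is included only if tools exist."""
    parts = [BASE_INSTRUCTION]
    if has_appliance_tools:
        parts.append(TOOL_INSTRUCTION)
    return "\n".join(parts)
```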
