Build your own voice assistant

Dale Humby

techtarget.com

dowsingandreynolds.com

Photo by Moritz Kindler on Unsplash

denofgeek.com

Black Mirror, White Christmas

home-assistant.io: Open source home automation that puts local control and privacy first. Powered by a worldwide community of tinkerers and DIY enthusiasts. Perfect to run on a Raspberry Pi or a local server.

Why build your own voice assistant?
• Local
• Offline
• Highly customisable
• Experiment with new ideas
• Doesn't have to be cost effective
• Learn

Rhasspy is an open source, fully offline set of voice assistant services for many human languages.
rhasspy.readthedocs.io
github.com/rhasspy/rhasspy

Each Rhasspy service can be swapped out to customise and improve the voice assistant.
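To make the idea of swappable services concrete, here is a minimal sketch (not Rhasspy's code): every stage sits behind the same tiny function interface, so one implementation can be replaced without touching the others. The stage names and toy implementations are hypothetical; in Rhasspy itself the services are separate programs that exchange messages over MQTT.

```javascript
// Minimal sketch of a voice pipeline whose stages can be swapped independently.
// Each stage is a plain function; the stage names are hypothetical.
function makePipeline({ speechToText, recogniseIntent, handleIntent }) {
  // Audio in, reply text out
  return (audio) => handleIntent(recogniseIntent(speechToText(audio)));
}

// Toy stages: "audio" is faked as a string so the sketch runs anywhere
const fakeSpeechToText = (audio) => audio.toLowerCase();
const keywordRecogniser = (text) =>
  text.includes("time") ? { name: "GetTime" } : { name: "Unknown" };
const replyHandler = (intent) =>
  intent.name === "GetTime" ? "The time is 11 42" : "Sorry?";

const pipeline = makePipeline({
  speechToText: fakeSpeechToText,
  recogniseIntent: keywordRecogniser,
  handleIntent: replyHandler,
});

// Swap in a stricter recogniser; the other two stages are reused unchanged
const exactRecogniser = (text) =>
  text === "what time is it" ? { name: "GetTime" } : { name: "Unknown" };
const strictPipeline = makePipeline({
  speechToText: fakeSpeechToText,
  recogniseIntent: exactRecogniser,
  handleIntent: replyHandler,
});

console.log(pipeline("Tell me the TIME")); // The time is 11 42
console.log(strictPipeline("Tell me the TIME")); // Sorry?
```

Only the recogniser changed between the two pipelines, which is the property that lets you experiment with one service at a time.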

tenor.com/en-GB/view/this-is-fine-gif-24177057

Hardware
• Raspberry Pi 4B (a Pi 3 should also work)
• A good USB-C power supply (micro-USB for a Pi 3)
• Micro-SD card for Raspberry Pi OS
• HDMI cable and USB keyboard
• Jabra Speak 410 or 510 USB speakerphone (but any popular mic+speaker or headset should be fine)
… or run it as a virtual machine on your laptop

Install the Rhasspy services (hub.docker.com/r/rhasspy/rhasspy). Service entry for docker-compose.yml:

# Goes under the top-level services: key of docker-compose.yml
rhasspy:
  image: rhasspy/rhasspy
  container_name: rhasspy
  restart: unless-stopped
  volumes:
    - "$HOME/rhasspy/:/profiles"
    - "/etc/localtime:/etc/localtime:ro"
  ports:
    - "12101:12101"
  devices:
    - "/dev/snd:/dev/snd"
  command: --user-profiles /profiles --profile en

"Hey Google… what's the weather today?"

Wake word


Speech to text

Mozilla DeepSpeech
• Open source
• Offline, on-device speech-to-text
• Runs in real time on a Raspberry Pi 4

Voice Activity Detector (VAD)
• Classifies audio as voiced or unvoiced
• Silence detection
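A voice activity detector can be illustrated with the simplest possible classifier: compare each short frame's energy (RMS) with a threshold, then call the utterance finished after a run of silent frames. This is a hypothetical sketch of the idea, not Rhasspy's detector; real VADs are considerably more robust to background noise, and the frame size, threshold and silence length here are assumptions.

```javascript
// Energy-based voice activity detection: a deliberately simple stand-in for a
// trained VAD. Samples are numbers in [-1, 1].
function frameRms(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Label each whole frame as voiced (true) or unvoiced (false).
// 160 samples = 10 ms at 16 kHz; both defaults are assumptions to tune.
function classifyFrames(samples, frameSize = 160, threshold = 0.02) {
  const labels = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    labels.push(frameRms(samples.slice(i, i + frameSize)) > threshold);
  }
  return labels;
}

// Silence detection: once speech has been heard, `minSilentFrames` unvoiced
// frames in a row end the utterance. Returns the index of the first frame of
// that trailing silence, or -1 if the utterance has not ended yet.
function findUtteranceEnd(labels, minSilentFrames = 3) {
  let heardSpeech = false;
  let silent = 0;
  for (let i = 0; i < labels.length; i++) {
    if (labels[i]) {
      heardSpeech = true;
      silent = 0;
    } else if (heardSpeech && ++silent >= minSilentFrames) {
      return i - minSilentFrames + 1;
    }
  }
  return -1;
}
```

Silence before any speech is ignored, so the assistant keeps listening after the wake word until the user has actually said something.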

Intent recognition

Raw utterance: "Hey Mycroft… Turn on the light"

Intents grammar:
[LightState]
turn (on | off){state} [the] light

Structured data (intent and slots):
{
  "text": "turn on the light",
  "intent": { "name": "LightState" },
  "slots": { "state": "on" }
}
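The step from raw utterance to structured data can be sketched by expanding each template into every sentence it allows and looking the normalised utterance up in the result. This handles only a tiny, assumed subset of the syntax (literal words, [optional] words and (a | b){tag} alternatives); Rhasspy's own recogniser is far more capable, so treat this as an illustration of the principle. The function names are hypothetical.

```javascript
// Expand a template written in a tiny subset of Rhasspy's sentence syntax:
//   word            literal word
//   [word]          optional word (or words)
//   (a | b){tag}    alternatives; the chosen one is stored in slot "tag"
function expandTemplate(template) {
  const tokens = template.match(/\[[^\]]+\]|\([^)]*\)(?:\{\w+\})?|\S+/g);
  let partial = [{ words: [], slots: {} }];
  for (const token of tokens) {
    let choices; // each choice: { words, slot } where slot is [name, value] or null
    if (token.startsWith("[")) {
      const words = token.slice(1, -1).trim().split(/\s+/);
      choices = [{ words, slot: null }, { words: [], slot: null }];
    } else if (token.startsWith("(")) {
      const [, body, tag] = token.match(/^\(([^)]*)\)(?:\{(\w+)\})?$/);
      choices = body.split("|").map((alt) => {
        const words = alt.trim().split(/\s+/);
        return { words, slot: tag ? [tag, words.join(" ")] : null };
      });
    } else {
      choices = [{ words: [token], slot: null }];
    }
    // Cartesian product: extend every partial sentence with every choice
    partial = partial.flatMap((p) =>
      choices.map((c) => ({
        words: [...p.words, ...c.words],
        slots: c.slot ? { ...p.slots, [c.slot[0]]: c.slot[1] } : p.slots,
      }))
    );
  }
  return partial.map((p) => ({ text: p.words.join(" "), slots: p.slots }));
}

// Build a recogniser that looks up normalised text among all expanded sentences
function buildRecogniser(intents) {
  const table = new Map();
  for (const [name, templates] of Object.entries(intents)) {
    for (const template of templates) {
      for (const { text, slots } of expandTemplate(template)) {
        table.set(text, { intent: { name }, slots });
      }
    }
  }
  return (utterance) => {
    // Lower-case, drop punctuation (so "what's" becomes "whats"), squeeze spaces
    const text = utterance.toLowerCase().replace(/[^a-z0-9 ]/g, "").replace(/\s+/g, " ").trim();
    const match = table.get(text);
    return match ? { text, ...match } : null;
  };
}

const recognise = buildRecogniser({ LightState: ["turn (on | off){state} [the] light"] });
console.log(recognise("Turn on the light."));
```

The output has the same shape as the structured data above: the normalised text, the intent name and the slots.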

{
  "wakeword_id": "hey-mycroft-2",
  "raw_text": "turn on the light",
  "intent": {
    "name": "LightState",
    "confidence": 1
  },
  "slots": { "state": "on" }
}

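An intent handler usually needs only a few fields from that event. A minimal sketch that picks them out and treats low-confidence recognitions as "not understood"; the 0.8 threshold and the output field names are assumptions, not Rhasspy settings.

```javascript
// Extract the useful fields from a recognised-intent event and drop weak matches.
// MIN_CONFIDENCE is a hypothetical tuning value, not a Rhasspy setting.
const MIN_CONFIDENCE = 0.8;

function readIntentEvent(input) {
  const event = typeof input === "string" ? JSON.parse(input) : input;
  const name = event.intent?.name;
  const confidence = event.intent?.confidence ?? 0;
  if (!name || confidence < MIN_CONFIDENCE) {
    return null; // treat as "not understood"
  }
  return {
    name,
    confidence,
    slots: event.slots || {},
    text: event.raw_text,
    wakeWord: event.wakeword_id,
  };
}
```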
Intents grammar (sentences.ini):

[GetTime]
what time is it
tell me the time
whats the time

[GetWeather]
whats the weather (today | tomorrow){day}
whats the weather on ($rhasspy/days){day}

[LightState]
states = (on | off)
turn <states>{state} [the] light
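Reading sentences.ini into per-intent templates is mostly line handling: [Intent] headers start a section, "name = body" lines define a rule, and <name> references are replaced by the rule's body. A hypothetical sketch, assuming rules are only referenced within their own section; slot lists such as $rhasspy/days are passed through untouched.

```javascript
// Parse a small sentences.ini-style file into { IntentName: [templates] }.
// Supports [Intent] headers, one template per line, "#" comments and local
// rules ("name = body", referenced as <name>).
function parseSentences(ini) {
  const intents = {};
  let current = null;
  let rules = {};
  for (const raw of ini.split("\n")) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue; // skip blanks and comments
    const header = line.match(/^\[(\w+)\]$/);
    if (header) {
      current = header[1];
      intents[current] = [];
      rules = {}; // assumed: rules are local to their intent section
      continue;
    }
    if (current === null) continue; // ignore lines before the first header
    const rule = line.match(/^(\w+)\s*=\s*(.+)$/);
    if (rule) {
      rules[rule[1]] = rule[2];
      continue;
    }
    // Inline rule references like <states> with the rule body
    intents[current].push(line.replace(/<(\w+)>/g, (m, name) => rules[name] ?? m));
  }
  return intents;
}
```

The LightState section then yields the same template used earlier: turn (on | off){state} [the] light.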

Text to speech

Install Mycroft AI's Mimic 3 (hub.docker.com/r/mycroftai/mimic3). Service entry for docker-compose.yml:

mimic3:
  image: mycroftai/mimic3
  container_name: mimic3
  restart: unless-stopped
  ports:
    - "59125:59125"
  volumes:
    - "$HOME/mimic3:/home/mimic3/.local/share/mycroft/mimic3"

Intent handling

Raw utterance: "Hey Mycroft… What's the weather on Tuesday?"

Intents grammar:
[GetWeather]
whats the weather (today | tomorrow){day}
whats the weather on ($rhasspy/days){day}

Structured data (intent and slots):
{
  "text": "whats the weather on tuesday",
  "intent": { "name": "GetWeather" },
  "slots": { "day": "tuesday" }
}
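Intent handling boils down to dispatching on the intent name and passing the slots to a handler. In the talk this is done with Node-RED; as a plain JavaScript sketch, with placeholder handler bodies and hypothetical names:

```javascript
// Route a recognised intent (Rhasspy-style JSON) to a handler returning text to speak.
// The handler bodies are placeholders; real ones might read a clock,
// call a weather service or switch a light through Home Assistant.
const handlers = new Map([
  ["GetTime", () => "The time is 11 42"],
  ["GetWeather", ({ day }) => `Looking up the weather for ${day}`],
  ["LightState", ({ state }) => `Turning the light ${state}`],
]);

function handleIntent(event) {
  const handler = handlers.get(event.intent && event.intent.name);
  if (!handler) return "Sorry, I didn't understand that";
  return handler(event.slots || {});
}
```

A Map keeps unknown intent names from accidentally matching inherited object properties.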

Install Node-RED (hub.docker.com/r/nodered/node-red). Service entry for docker-compose.yml:

node-red:
  image: nodered/node-red:latest
  restart: unless-stopped
  ports:
    - "1880:1880"
  environment:
    - TZ=Europe/Stockholm
  volumes:
    - "$HOME/node-red:/data"
    - "/etc/localtime:/etc/localtime:ro"

Hey Mycroft… What's the time?

Function node:
msg.payload = "The time is 11 42"
return msg

HTTP request node: POST msg.payload to http://rhasspy:12101/api/text-to-speech
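Outside Node-RED, the same step is an ordinary HTTP POST whose body is the text to speak. A sketch assuming Node.js 18 or later (for the global fetch) and the rhasspy hostname and port 12101 from the compose file; the function names are hypothetical.

```javascript
// Build the HTTP request that asks Rhasspy to speak some text.
// Assumes Rhasspy is reachable as "rhasspy" on port 12101, as in the compose file.
function textToSpeechRequest(text, baseUrl = "http://rhasspy:12101") {
  return {
    url: `${baseUrl}/api/text-to-speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "text/plain" },
      body: text,
    },
  };
}

// Send it (Node.js 18+ has a global fetch); resolves once Rhasspy has answered
async function speak(text, baseUrl) {
  const { url, init } = textToSpeechRequest(text, baseUrl);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`Rhasspy TTS failed: HTTP ${response.status}`);
}
```

Keeping the request construction separate from the network call makes it easy to check without a running Rhasspy.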

// Node-RED function node: format the current time for speech
const dateTime = new Date()
const options = {
  hour: 'numeric',
  minute: 'numeric',
  hour12: true, // hour12 takes a boolean
  timeZone: 'Europe/Stockholm'
}
const formatted = new Intl.DateTimeFormat('en-GB', options).format(dateTime)
msg.payload = `The time is ${formatted}`
return msg

Hey Mycroft… What's the weather tomorrow?
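Before a weather service can be queried, the day slot ("today", "tomorrow" or a weekday name, as in the grammar) has to become a concrete date. A hypothetical helper, assuming that a weekday means its next occurrence and that today's own weekday means today:

```javascript
// Turn the "day" slot into a local date (midnight), e.g. for a weather
// forecast lookup (not shown). Weekday names follow Date#getDay order.
const WEEKDAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

function resolveDay(day, now = new Date()) {
  const date = new Date(now.getFullYear(), now.getMonth(), now.getDate()); // local midnight
  const value = day.toLowerCase();
  if (value === "today") return date;
  if (value === "tomorrow") {
    date.setDate(date.getDate() + 1);
    return date;
  }
  const target = WEEKDAYS.indexOf(value);
  if (target === -1) throw new Error(`Unknown day: ${day}`);
  // Days until the next occurrence of that weekday (0 if it is today)
  const ahead = (target - date.getDay() + 7) % 7;
  date.setDate(date.getDate() + ahead);
  return date;
}
```

setDate rolls over month and year boundaries, so "tomorrow" on 31 December still works.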

Photo by Max van den Oetelaar on Unsplash

The Future…?

Hardware

Satellites and base
• Satellite: low power
• Base: high power

Supposed to be 300 kr. Now… 1,000+ kr

ESP32-WROOM, ESP32-S3 (https://esp32s3.com/tinys3.html)

Wake word

openWakeWord: github.com/dscripka/openWakeWord
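Wake word models such as openWakeWord produce a score for each chunk of audio; deciding when to wake up is then a threshold on that score. A hypothetical sketch of that decision step, with a cool-down so that one "Hey Mycroft" does not trigger twice; the threshold and cool-down length are assumptions to tune per model and microphone.

```javascript
// Turn a stream of per-frame wake word scores (0..1) into activations.
function makeWakeDetector({ threshold = 0.5, cooldownFrames = 20 } = {}) {
  let cooldown = 0;
  // Returns true only on the frame where the wake word is considered detected
  return (score) => {
    if (cooldown > 0) {
      cooldown--; // still inside the previous detection; ignore
      return false;
    }
    if (score >= threshold) {
      cooldown = cooldownFrames; // skip the rest of this utterance
      return true;
    }
    return false;
  };
}

const detect = makeWakeDetector({ threshold: 0.5, cooldownFrames: 2 });
console.log([0.1, 0.7, 0.9, 0.8, 0.2, 0.6].map((s) => detect(s)));
```

A raised threshold trades missed wake-ups for fewer false activations; the cool-down only suppresses repeated triggers from the same phrase.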

Whisper: "… approaches human level robustness and accuracy on English speech recognition."
openai.com/research/whisper
github.com/ggerganov/whisper.cpp
github.com/guillaumekln/faster-whisper

Large Language Model

Fulfilment with GPT-3, ChatGPT … GPT-4
ChatGPT in an iOS Shortcut: World's Smartest HomeKit Voice Assistant
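One way to use a large language model for fulfilment is as a fallback when the intents grammar finds no match. The sketch below only builds the request body, in the shape used by OpenAI's Chat Completions API (a model name plus a list of messages with role and content); the model name, system prompt and function names are placeholders, and sending the request with an API key is left out.

```javascript
// Hypothetical fallback: when the grammar finds no intent, ask a chat model.
// The request body follows the Chat Completions shape (model + messages);
// the model name and system prompt are placeholders.
function buildChatRequest(utterance, history = []) {
  return {
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: "You are a home voice assistant. Reply in one or two short spoken sentences.",
      },
      ...history, // earlier { role, content } turns, for multi-turn conversation
      { role: "user", content: utterance },
    ],
  };
}

// Use the grammar-based handler when an intent matched, otherwise the model
function fulfil(intent, utterance, handleIntent) {
  if (intent) return { speak: handleIntent(intent) };
  return { askModel: buildChatRequest(utterance) }; // caller POSTs this to the API
}
```

Keeping the grammar path first means common commands stay fast and offline; only unmatched questions leave the house.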

Riffusion (riffusion.com); Stable Diffusion / DALL·E (stablediffusionweb.com)

How far can we push Deep Neural Networks for automation?

GPT-4: "The quick brown fox jumps over the lazy dogs"
Stable Diffusion: "Astronaut riding a horse"
Riffusion: "Lo-Fi jazz"
Events > Actions: "Dale walks into kitchen"
- Kitchen light on
- Under counter light on
- Kettle on
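A prototype of Events > Actions could describe the event to a language model, ask for one action per line, and parse the reply into commands. A sketch of the parsing half only, assuming (hypothetically) that the model answers with lines such as "- Kitchen light on"; mapping device names to real Home Assistant entities is not shown.

```javascript
// Parse lines such as "- Kitchen light on" into { device, state } actions.
// Lines that don't end in "on" or "off" are ignored rather than guessed at.
function parseActions(reply) {
  const actions = [];
  for (const line of reply.split("\n")) {
    const match = line.trim().match(/^[-*•]?\s*(.+?)\s+(on|off)$/i);
    if (match) actions.push({ device: match[1], state: match[2].toLowerCase() });
  }
  return actions;
}

console.log(parseActions("Sure:\n- Kitchen light on\n- Under counter light on\n- Kettle on"));
```

Ignoring anything that does not fit the expected pattern is a deliberately conservative choice: a chatty reply never switches on a device by accident.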

Thank you!
dhum.by/github
home-assistant.io
rhasspy.readthedocs.io

TTS: Sonantic, …
• Multi-turn conversation
• More general knowledge
