Build your own voice assistant

When you say “Hey Siri, play Spotify” or “Alexa, turn on my kitchen lights”, have you ever wondered how voice assistants work?

In this talk I’ll show you how to build your own local, offline voice assistant using open source software – all runnable on a Raspberry Pi. In the process you’ll learn about wake words, speech-to-text and text-to-speech, intent matching and handling, and we’ll write a few simple commands to get the weather, and start playing music on Spotify.

No prior knowledge of Docker, Raspberry Pis or voice assistants is needed, just curiosity about how they work.

Dale Humby

March 22, 2023

Transcript

  1. home-assistant.io: Open source home automation that puts local control and privacy first. Powered by a worldwide community of tinkerers and DIY enthusiasts. Perfect to run on a Raspberry Pi or a local server.
  2. Why build your own voice assistant?

    • Local
    • Offline
    • Highly customisable
    • Experiment with new ideas
    • Doesn't have to be cost effective
    • Learn
  3. Rhasspy is an open source, fully offline set of voice assistant services for many human languages.

    rhasspy.readthedocs.io
    github.com/rhasspy/rhasspy
  4. Hardware

    • Raspberry Pi 4B (Pi 3 should work)
    • A good micro-USB power supply
    • Micro-SD card, for Raspberry Pi OS
    • HDMI cable and USB keyboard
    • Jabra 410 or 510 USB speaker (but any popular mic+speaker or headset should be fine)

    … or as a Virtual Machine on your laptop
  5. Install Rhasspy (hub.docker.com/r/rhasspy/rhasspy)

    docker-compose.yml:

        services:
          rhasspy:
            image: rhasspy/rhasspy
            container_name: rhasspy
            restart: unless-stopped
            volumes:
              - "$HOME/rhasspy/:/profiles"
              - "/etc/localtime:/etc/localtime:ro"
            ports:
              - "12101:12101"
            devices:
              - "/dev/snd:/dev/snd"
            command: --user-profiles /profiles --profile en
  6. Hey Google… what's the weather today?

    Mozilla Deep Speech
    • Open source
    • offline, on-device
    • speech-to-text
    • real time
    • Raspberry Pi 4

    Voice Activity Detector (VAD)
    • Classifies audio
    • voiced or unvoiced
    • Silence detection
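The voiced/unvoiced classification and silence detection on the slide above can be illustrated with a very simple energy threshold. This is only a sketch: it is not the detector Rhasspy ships with, and the function names, frame size and threshold are all hypothetical values chosen for the example.

```javascript
// Energy-based voice activity detection: a simplified stand-in for a real VAD.

// Root-mean-square energy of one frame of samples in the range [-1, 1].
function rms(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Label each full frame as voiced (true) or unvoiced (false).
// `threshold` is an assumed value that would need tuning per microphone.
function classifyFrames(samples, frameSize, threshold = 0.02) {
  const labels = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    labels.push(rms(samples.slice(i, i + frameSize)) > threshold);
  }
  return labels;
}

// The command is treated as finished once `silenceFrames` consecutive
// unvoiced frames follow at least one voiced frame.
// Returns the index of the frame where that happens, or -1 if it never does.
function findEndOfSpeech(labels, silenceFrames = 3) {
  let heardSpeech = false;
  let quiet = 0;
  for (let i = 0; i < labels.length; i++) {
    if (labels[i]) {
      heardSpeech = true;
      quiet = 0;
    } else if (heardSpeech) {
      quiet += 1;
      if (quiet >= silenceFrames) return i;
    }
  }
  return -1;
}

// Synthetic audio: 2 silent frames, 3 loud frames, 4 silent frames.
const frameSize = 160;
const silence = new Array(frameSize).fill(0);
const speech = Array.from({ length: frameSize }, (_, n) => 0.5 * Math.sin(n / 5));
const audio = [].concat(silence, silence, speech, speech, speech,
                        silence, silence, silence, silence);

const labels = classifyFrames(audio, frameSize);
const endFrame = findEndOfSpeech(labels);
console.log(labels, endFrame);
```

A production detector would typically look at more than raw energy, but the control flow (classify frames, then wait for a run of silence) is the part this sketch is meant to show.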
  7. Raw utterance: Hey Mycroft… Turn on the light

    Intents grammar:

        [LightState]
        turn (on | off){state} [the] light
  8. Raw utterance: Hey Mycroft… Turn on the light

    Intents grammar:

        [LightState]
        turn (on | off){state} [the] light

    Structured data (intent and slots):

        {
          "text": "turn on the light",
          "intent": { "name": "LightState" },
          "slots": { "state": "on" }
        }
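Once an utterance has been turned into structured data, handling it amounts to looking up the intent name and passing the slots to some code. A minimal sketch, assuming a hypothetical `handlers` table and reply strings that are not part of the talk:

```javascript
// Hypothetical handlers keyed by intent name. Each one receives the slots
// and returns the sentence the assistant should speak.
const handlers = new Map([
  ['LightState', (slots) => `Turning the light ${slots.state}`],
  ['GetTime', () => 'Let me check the time'],
]);

// Look up the handler for the recognised intent and run it.
// Unknown or missing intents fall back to an apology.
function handleIntent(event) {
  const name = event.intent ? event.intent.name : undefined;
  const handler = handlers.get(name);
  if (!handler) return "Sorry, I don't know how to do that";
  return handler(event.slots || {});
}

// The structured data from the slide above.
const event = JSON.parse(
  '{ "text": "turn on the light", "intent": { "name": "LightState" }, "slots": { "state": "on" } }'
);
const reply = handleIntent(event);
console.log(reply);
```

Keeping the handlers in a lookup table means a new command only needs a new grammar entry and one new handler.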
  9.  {
        "wakeword_id": "hey-mycroft-2",
        "raw_text": "turn on the light",
        "intent": { "name": "LightState", "confidence": 1 },
        "slots": { "state": "on" }
      }
  10. Intents grammar (sentences.ini):

        [GetTime]
        what time is it
        tell me the time
        whats the time

        [GetWeather]
        whats the weather (today | tomorrow){day}
        whats the weather on ($rhasspy/days){day}

        [LightState]
        states = (on | off)
        turn <states>{state} [the] light
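To get a feel for what a grammar line such as `turn (on | off){state} [the] light` describes, the sketch below expands a template into every sentence it allows and records the tagged slot value for each. It is a toy, not Rhasspy's implementation: it only understands `(a | b)` alternatives, `{tag}` names and `[optional]` words, and ignores rule references like `<states>` and slot lists like `$rhasspy/days`. All function names are made up for the example.

```javascript
// Split a template into parts, each with a list of choices and an optional tag.
function parseTemplate(template) {
  const parts = [];
  const tokenPattern = /\(([^)]*)\)(?:\{(\w+)\})?|\[([^\]]*)\]|([^\s()\[\]]+)/g;
  for (const m of template.matchAll(tokenPattern)) {
    if (m[1] !== undefined) {
      // (a | b){tag}: one of several alternatives, optionally tagged.
      parts.push({ choices: m[1].split('|').map((c) => c.trim()), tag: m[2] });
    } else if (m[3] !== undefined) {
      // [word]: the word may be present or absent.
      parts.push({ choices: ['', m[3].trim()] });
    } else {
      // Plain literal word.
      parts.push({ choices: [m[4]] });
    }
  }
  return parts;
}

// Build every sentence the template allows, with the slots each one implies.
function expand(template) {
  let results = [{ words: [], slots: {} }];
  for (const part of parseTemplate(template)) {
    const next = [];
    for (const r of results) {
      for (const choice of part.choices) {
        next.push({
          words: choice ? [...r.words, choice] : r.words,
          slots: part.tag ? { ...r.slots, [part.tag]: choice } : r.slots,
        });
      }
    }
    results = next;
  }
  return results.map((r) => ({ text: r.words.join(' '), slots: r.slots }));
}

// Find the first intent with a sentence that exactly matches the text.
function recognise(text, intents) {
  for (const [name, templates] of Object.entries(intents)) {
    for (const template of templates) {
      const hit = expand(template).find((s) => s.text === text);
      if (hit) return { text, intent: { name }, slots: hit.slots };
    }
  }
  return null;
}

const intents = {
  GetTime: ['what time is it', 'tell me the time', 'whats the time'],
  LightState: ['turn (on | off){state} [the] light'],
};
const result = recognise('turn on the light', intents);
console.log(JSON.stringify(result));
```

Exact matching against an expanded sentence list is the simplest possible approach; it fails on any wording the grammar did not anticipate.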
  11. Install Mycroft AI's mimic3 (hub.docker.com/r/mycroftai/mimic3)

    docker-compose.yml:

        mimic3:
          image: mycroftai/mimic3
          container_name: mimic3
          restart: unless-stopped
          ports:
            - "59125:59125"
          volumes:
            - "$HOME/mimic3:/home/mimic3/.local/share/mycroft/mimic3"
  12. Raw utterance: Hey Mycroft… What's the weather on Tuesday?

    Intents grammar:

        [GetWeather]
        whats the weather (today | tomorrow){day}
        whats the weather on ($rhasspy/days){day}

    Structured data (intent and slots):

        {
          "text": "whats the weather on tuesday",
          "intent": { "name": "GetWeather" },
          "slots": { "day": "tuesday" }
        }
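Before a weather lookup can happen, the `day` slot ("today", "tomorrow" or a weekday name) has to be turned into something a forecast service understands, such as a number of days ahead. A small sketch of that step; the function name is hypothetical, and treating a weekday that matches today as "today" rather than "next week" is an assumption.

```javascript
const WEEKDAYS = ['sunday', 'monday', 'tuesday', 'wednesday',
                  'thursday', 'friday', 'saturday'];

// Number of days from `now` until the day named in the slot.
// Returns null when the slot value is not recognised.
function daysAhead(day, now = new Date()) {
  if (day === 'today') return 0;
  if (day === 'tomorrow') return 1;
  const target = WEEKDAYS.indexOf(day);
  if (target === -1) return null;
  // Wrap around the week so the result is always between 0 and 6.
  return (target - now.getDay() + 7) % 7;
}

// Wednesday 22 March 2023, in the local time zone.
const wednesday = new Date(2023, 2, 22);
console.log(daysAhead('tuesday', wednesday));
```

With the offset in hand, the handler can pick the matching entry from whatever forecast data it fetches.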
  13. Install Node-RED (hub.docker.com/r/nodered/node-red)

    docker-compose.yml:

        node-red:
          image: nodered/node-red:latest
          restart: unless-stopped
          ports:
            - "1880:1880"
          environment:
            - TZ=Europe/Stockholm
          volumes:
            - "$HOME/node-red:/data"
            - "/etc/localtime:/etc/localtime:ro"
  14. Hey Mycroft… What's the time?

        msg.payload = "The time is 11 42"
        return msg

    return msg.payload

    POST http://rhasspy:12101/api/text-to-speech
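Outside Node-RED, the same text-to-speech call can be made from any script. The sketch below builds the POST request shown on the slide above and sends it with `fetch` (built into Node.js 18 and later). The helper names are hypothetical, and sending the sentence as a plain-text body is an assumption about how the endpoint expects its input.

```javascript
// Build the HTTP request that asks Rhasspy to speak `text`.
// `baseUrl` defaults to the address used on the slide.
function buildSpeakRequest(text, baseUrl = 'http://rhasspy:12101') {
  return {
    url: `${baseUrl}/api/text-to-speech`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'text/plain' }, // assumed content type
      body: text,
    },
  };
}

// Send the request; throws if Rhasspy answers with an error status.
async function speak(text) {
  const { url, options } = buildSpeakRequest(text);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Rhasspy returned HTTP ${response.status}`);
  }
}

const request = buildSpeakRequest('The time is 11 42');
console.log(request.url);
```

Calling `speak('The time is 11 42')` only works where the `rhasspy` host name resolves, for example from another container on the same Docker network.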
  15. Hey Mycroft… What's the time?

        const dateTime = new Date()
        const options = {
          hour: 'numeric',
          minute: 'numeric',
          hour12: true,
          timeZone: 'Europe/Stockholm'
        }
        const formatted = Intl.DateTimeFormat('en-GB', options).format(dateTime)
        msg.payload = `The time is ${formatted}`
        return msg
  16. ASR

  17. Whisper: … approaches human level robustness and accuracy on English speech recognition.

    openai.com/research/whisper
    github.com/ggerganov/whisper.cpp
    github.com/guillaumekln/faster-whisper
  18. Fulfilment with GPT-3, ChatGPT … GPT-4

    ChatGPT in an iOS Shortcut: World's Smartest HomeKit Voice Assistant
  19. How far can we push Deep Neural Networks for automation?

    • GPT-4: The quick brown fox jumps over the lazy dogs
  20. How far can we push Deep Neural Networks for automation?

    • GPT-4: The quick brown fox jumps over the lazy dogs
    • Stable Diffusion: Astronaut riding a horse
  21. How far can we push Deep Neural Networks for automation?

    • GPT-4: The quick brown fox jumps over the lazy dogs
    • Riffusion: Lo-Fi jazz
  22. How far can we push Deep Neural Networks for automation?

    Events > Actions
    Dale walks into kitchen

    • GPT-4: The quick brown fox jumps over the lazy dogs
    • Riffusion: Lo-Fi jazz
  23. How far can we push Deep Neural Networks for automation?

    Events > Actions
    Dale walks into kitchen
    - Kitchen light on
    - Under counter light on
    - Kettle on

    • GPT-4: The quick brown fox jumps over the lazy dogs
    • Riffusion: Lo-Fi jazz