Build your own voice assistant

When you say “Hey Siri, play Spotify” or “Alexa, turn on my kitchen lights”, have you ever wondered how they work?

In this talk I’ll show you how to build your own local, offline voice assistant using open source software – all runnable on a Raspberry Pi. In the process you’ll learn about wake words, speech-to-text and text-to-speech, intent matching and handling, and we’ll write a few simple commands to get the weather, and start playing music on Spotify.

No prior knowledge of Docker, Raspberry Pis or voice assistants is needed, just curiosity about how they work.

Dale Humby

March 22, 2023

Transcript

  1. Build your own voice assistant
    Dale Humby
    [email protected]

  2. techtarget.com

  3. dowsingandreynolds.com

  4. Photo by Moritz Kindler on Unsplash

  5. denofgeek.com

  6. Black Mirror, White Christmas

  7. home-assistant.io
    Open source home automation that
    puts local control and privacy first.
    Powered by a worldwide community of
    tinkerers and DIY enthusiasts. Perfect to
    run on a Raspberry Pi or a local server.

  8. Why build your own voice assistant?
    ● Local
    ● Offline
    ● Highly customisable
    ● Experiment with new ideas
    ● Doesn't have to be cost effective
    ● Learn

  9. Rhasspy is an open source, fully offline set of voice
    assistant services for many human languages
    rhasspy.readthedocs.io
    github.com/rhasspy/rhasspy

  10. Each Rhasspy service can be swapped out
    to customise and improve the voice assistant

  11. tenor.com/en-GB/view/this-is-fine-gif-24177057

  12. Hardware
    ● Raspberry Pi 4B (Pi 3 should work)
    ● A good power supply (USB-C for the Pi 4B, micro-USB for the Pi 3)
    ● Micro-SD card, for Raspberry Pi OS
    ● HDMI cable and USB keyboard
    ● Jabra 410 or 510 USB speaker (but any popular mic+speaker or headset should be fine)
    … or as a Virtual Machine on your laptop

  13. Install Rhasspy
    hub.docker.com/r/rhasspy/rhasspy
    docker-compose.yml
    services:
      rhasspy:
        image: rhasspy/rhasspy
        container_name: rhasspy
        restart: unless-stopped
        volumes:
          - "$HOME/rhasspy/:/profiles"
          - "/etc/localtime:/etc/localtime:ro"
        ports:
          - "12101:12101"
        devices:
          - "/dev/snd:/dev/snd"
        command: --user-profiles /profiles --profile en

  14. "Hey Google… what's the
    weather today?"

  20. Speech to text

  21. Hey Google… what's the weather today?
    Mozilla DeepSpeech
    ● Open source
    ● Offline, on-device
    ● Speech-to-text
    ● Real time
    ● Raspberry Pi 4
    Voice Activity Detector (VAD)
    ● Classifies audio as voiced or unvoiced
    ● Silence detection
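
To make the VAD idea concrete, here is a minimal energy-based sketch. It is not how Rhasspy or DeepSpeech detect speech; the function names, frame size and threshold are all assumptions chosen for illustration. Each frame is labelled voiced or unvoiced by its RMS amplitude, and the utterance is treated as finished once enough unvoiced frames follow some speech.

```javascript
// Hypothetical energy-based voice activity detector (illustration only).
// samples: array of floats in [-1, 1]; frameSize: samples per frame.
function frameEnergy(frame) {
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length); // root-mean-square amplitude
}

// Label each complete frame as "voiced" or "unvoiced".
// The threshold is an assumed value; real systems tune or learn it.
function classifyFrames(samples, frameSize = 160, threshold = 0.05) {
  const labels = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    const frame = samples.slice(i, i + frameSize);
    labels.push(frameEnergy(frame) >= threshold ? "voiced" : "unvoiced");
  }
  return labels;
}

// Silence detection: return the index of the frame at which
// `maxSilentFrames` unvoiced frames have followed speech, or -1.
function endOfSpeechIndex(labels, maxSilentFrames = 3) {
  let heardVoice = false;
  let silentRun = 0;
  for (let i = 0; i < labels.length; i++) {
    if (labels[i] === "voiced") {
      heardVoice = true;
      silentRun = 0;
    } else if (heardVoice && ++silentRun >= maxSilentFrames) {
      return i;
    }
  }
  return -1;
}
```

Leading silence is ignored until the first voiced frame, so the detector only stops recording after the speaker has actually said something.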

  22. Intent
    recognition

  23. Raw utterance
    Hey Mycroft… Turn on the light

  24. Raw utterance
    Hey Mycroft… Turn on the light
    Intents grammar
    [LightState]
    turn (on | off){state} [the] light

  25. Raw utterance
    Hey Mycroft… Turn on the light
    Intents grammar
    [LightState]
    turn (on | off){state} [the] light
    Structured data
    - Intent
    - Slots
    {
      "text": "turn on the light",
      "intent": {
        "name": "LightState"
      },
      "slots": {
        "state": "on"
      }
    }
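
A rough sketch of how an utterance could be turned into that structured data. This is not Rhasspy's recogniser: the LightState grammar line is translated by hand into a regular expression with a named group for the slot, and `intents`, `normalise` and `recognise` are names invented for this example.

```javascript
// Hand-written equivalent of the grammar line:
//   turn (on | off){state} [the] light
// The named group plays the role of the {state} slot.
const intents = [
  { name: "LightState", pattern: /^turn (?<state>on|off) (?:the )?light$/ },
];

// Lower-case the text and strip punctuation so "Turn on the light!" still matches.
function normalise(utterance) {
  return utterance
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Return an object shaped like the structured data on the slide.
// An empty intent name means nothing matched.
function recognise(utterance) {
  const text = normalise(utterance);
  for (const { name, pattern } of intents) {
    const match = pattern.exec(text);
    if (match) {
      return { text, intent: { name }, slots: { ...match.groups } };
    }
  }
  return { text, intent: { name: "" }, slots: {} };
}
```

Calling `recognise("Turn on the light")` yields the text, the `LightState` intent and a `state` slot of `on`, the same shape as the JSON shown on the slide.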

  26. {
      "wakeword_id": "hey-mycroft-2",
      "raw_text": "turn on the light",
      "intent": {
        "name": "LightState",
        "confidence": 1
      },
      "slots": {
        "state": "on"
      }
    }

  27. Intents grammar
    sentences.ini
    [GetTime]
    what time is it
    tell me the time
    whats the time
    [GetWeather]
    whats the weather (today | tomorrow){day}
    whats the weather on ($rhasspy/days){day}
    [LightState]
    states = (on | off)
    turn <states>{state} [the] light

  28. Text to speech

  29. Install Mycroft AI's mimic3
    hub.docker.com/r/mycroftai/mimic3
    docker-compose.yml
      mimic3:
        image: mycroftai/mimic3
        container_name: mimic3
        restart: unless-stopped
        ports:
          - "59125:59125"
        volumes:
          - "$HOME/mimic3:/home/mimic3/.local/share/mycroft/mimic3"

  30. Intent
    handling

  31. Raw utterance
    Hey Mycroft… What's the weather on Tuesday?
    Intents grammar
    [GetWeather]
    whats the weather (today | tomorrow){day}
    whats the weather on ($rhasspy/days){day}
    Structured data
    - Intent
    - Slots
    {
      "text": "whats the weather on tuesday",
      "intent": {
        "name": "GetWeather"
      },
      "slots": {
        "day": "tuesday"
      }
    }
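
Intent handling then comes down to picking a handler by intent name and passing it the slots. The sketch below assumes a hard-coded `forecasts` table in place of a real weather service; `handlers`, `dayPhrase` and `handleIntent` are hypothetical names, not part of Rhasspy or Node-RED.

```javascript
// Hypothetical forecast data; a real handler would query a weather service.
const forecasts = { today: "sunny", tomorrow: "rainy", tuesday: "cloudy" };

// "today" and "tomorrow" read naturally on their own; weekdays need "on".
function dayPhrase(day) {
  return day === "today" || day === "tomorrow" ? day : `on ${day}`;
}

// One handler per intent name; each receives the slots and returns a sentence
// that can be sent to text to speech.
const handlers = {
  GetWeather: ({ day }) => {
    const forecast = forecasts[day];
    return forecast
      ? `The weather ${dayPhrase(day)} will be ${forecast}`
      : `Sorry, I have no forecast for ${day}`;
  },
};

// Dispatch on the intent name from the structured data.
function handleIntent(result) {
  const handler = handlers[result.intent.name];
  return handler ? handler(result.slots) : "Sorry, I did not understand that";
}
```

Adding a new command means adding one entry to `handlers`; the dispatch code stays the same.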

  32. Install Node-RED
    hub.docker.com/r/nodered/node-red
    docker-compose.yml
      node-red:
        image: nodered/node-red:latest
        restart: unless-stopped
        ports:
          - "1880:1880"
        environment:
          - TZ=Europe/Stockholm
        volumes:
          - "$HOME/node-red:/data"
          - "/etc/localtime:/etc/localtime:ro"

  33. Hey Mycroft… What's the time?
    msg.payload = "The time is 11 42"
    return msg
    return msg.payload
    POST http://rhasspy:12101/api/text-to-speech
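
Outside Node-RED, the same text-to-speech call might look like the sketch below. It assumes Rhasspy is reachable at the hostname and port shown on the slide and that the endpoint accepts the sentence as a plain-text body; `buildSpeakRequest` and `speak` are names made up for this example.

```javascript
// Assumes Rhasspy is reachable under this name, as on the slide.
const RHASSPY_URL = "http://rhasspy:12101";

// Build the request separately so it can be inspected without a network call.
function buildSpeakRequest(text) {
  return {
    url: `${RHASSPY_URL}/api/text-to-speech`,
    options: {
      method: "POST",
      headers: { "Content-Type": "text/plain" },
      body: text,
    },
  };
}

// Send the sentence to Rhasspy. `fetchImpl` defaults to the global fetch
// (Node 18+ or a browser) and can be replaced with a stub in tests.
async function speak(text, fetchImpl = fetch) {
  const { url, options } = buildSpeakRequest(text);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Text-to-speech failed: ${response.status}`);
  }
}
```

For example, `speak("The time is 11 42")` would post that sentence for Rhasspy to read out.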

  34. Hey Mycroft… What's the time?
    const dateTime = new Date()
    const options = {
      hour: 'numeric',
      minute: 'numeric',
      hour12: true,
      timeZone: 'Europe/Stockholm'
    }
    const formatted = new Intl.DateTimeFormat('en-GB', options).format(dateTime)
    msg.payload = `The time is ${formatted}`
    return msg

  35. Hey Mycroft… What's the weather tomorrow?

  36. Photo by Max van den Oetelaar on Unsplash

  37. The Future…?

  38. Satellites
    Base

  39. Low power
    Satellite
    High power
    Base

  40. Supposed to be 300kr
    Now… 1,000+ kr

  41. ESP32-wroom ESP32-S3
    https://esp32s3.com/tinys3.html

  42. Open Wake Word
    github.com/dscripka/openWakeWord

  43. Whisper … approaches
    human level robustness
    and accuracy on English
    speech recognition.
    openai.com/research/whisper
    github.com/ggerganov/whisper.cpp
    github.com/guillaumekln/faster-whisper

  44. Large Language Model

  45. Fulfilment with
    GPT-3
    ChatGPT
    … GPT-4
    ChatGPT in an iOS Shortcut —
    Worlds Smartest HomeKit Voice
    Assistant

  46. [riffusion]
    riffusion.com
    Stable Diffusion / DALL·E
    stablediffusionweb.com

  47. How far can we push
    Deep Neural Networks for automation?

  48. How far can we push
    Deep Neural Networks for automation?
    GPT-4
    The quick brown fox jumps over the lazy dogs

  49. How far can we push
    Deep Neural Networks for automation?
    GPT-4
    The quick brown fox jumps over the lazy dogs
    Stable Diffusion
    Astronaut riding a horse

  50. How far can we push
    Deep Neural Networks for automation?
    GPT-4
    The quick brown fox jumps over the lazy dogs
    Riffusion
    Lo-Fi jazz

  51. How far can we push
    Deep Neural Networks for automation?
    Events > Actions
    Dale walks into kitchen
    GPT-4
    The quick brown fox jumps over the lazy dogs
    Riffusion
    Lo-Fi jazz

  52. How far can we push
    Deep Neural Networks for automation?
    Events > Actions
    Dale walks into kitchen
    - Kitchen light on
    - Under counter light on
    - Kettle on
    GPT-4
    The quick brown fox jumps over the lazy dogs
    Riffusion
    Lo-Fi jazz
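
For comparison with the neural-network idea, the Events > Actions example can be written as a plain rule table. This is only a hand-coded baseline, not anything generated by a model; the event and action identifiers below are invented for the sketch.

```javascript
// Hypothetical rule table: event name -> ordered list of actions.
const rules = new Map([
  [
    "dale_enters_kitchen",
    ["kitchen_light_on", "under_counter_light_on", "kettle_on"],
  ],
]);

// Look up the actions for an event; unknown events trigger nothing.
function actionsFor(event) {
  return rules.get(event) ?? [];
}
```

Every new situation needs another hand-written entry, which is the limitation the slide's question about deep neural networks is probing.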

  53. How far can we push
    Deep Neural Networks for automation?

  61. Thank you!
    [email protected]
    dhum.by/github
    home-assistant.io
    rhasspy.readthedocs.io

  62. tts
    sonantic, …
    ● multi-turn conversation
    ● more general knowledge
