Build your own voice assistant

When you say, “Hey Siri, play Spotify”, or “Alexa, turn on my kitchen lights”, have you ever wondered how they work?

In this talk I’ll show you how to build your own local, offline voice assistant using open source software – all runnable on a Raspberry Pi. In the process you’ll learn about wake words, speech-to-text and text-to-speech, intent matching and handling, and we’ll write a few simple commands to get the weather, and start playing music on Spotify.

No prior knowledge of Docker, Raspberry Pis or voice assistants is needed, just curiosity about how they work.

Dale Humby

March 22, 2023

Transcript

  1. Build your own voice
    assistant
    Dale Humby
    [email protected]

  3. techtarget.com

  4. dowsingandreynolds.com

  5. Photo by Moritz Kindler on Unsplash

  6. denofgeek.com

  7. Black Mirror, White Christmas

  8. home-assistant.io
    Open source home automation that
    puts local control and privacy first.
    Powered by a worldwide community of
    tinkerers and DIY enthusiasts. Perfect to
    run on a Raspberry Pi or a local server.

  13. Why build your own voice assistant?
    ● Local
    ● Offline
    ● Highly customisable
    ● Experiment with new ideas
    ● Doesn't have to be cost effective
    ● Learn

  14. Rhasspy is an open source, fully offline set of voice
    assistant services for many human languages
    rhasspy.readthedocs.io
    github.com/rhasspy/rhasspy

  15. Each Rhasspy service can be swapped out to customise and improve the voice assistant

  16. tenor.com/en-GB/view/this-is-fine-gif-24177057

  17. Hardware
    ● Raspberry Pi 4B (Pi 3 should work)
    ● A good USB-C power supply (micro-USB for a Pi 3)
    ● Micro-SD card, for Raspberry Pi OS
    ● HDMI cable and USB keyboard
    ● Jabra 410 or 510 USB speaker (but any popular mic+speaker or headset should be fine)
    … or as a Virtual Machine on your laptop

  18. Install Rhasspy
    hub.docker.com/r/rhasspy/rhasspy

    docker-compose.yml:

        services:
          rhasspy:
            image: rhasspy/rhasspy
            container_name: rhasspy
            restart: unless-stopped
            volumes:
              - "$HOME/rhasspy/:/profiles"
              - "/etc/localtime:/etc/localtime:ro"
            ports:
              - "12101:12101"
            devices:
              - "/dev/snd:/dev/snd"
            command: --user-profiles /profiles --profile en

  21. "Hey Google… what's the weather today?"

  22. Wake word

  23. Hey Google… what's the weather today?

  29. Speech to text

  30. Hey Google… what's the weather today?
    Mozilla DeepSpeech
    ● Open source
    ● Offline, on-device
    ● Speech-to-text
    ● Real time
    ● Raspberry Pi 4
    Voice Activity Detector (VAD)
    ● Classifies audio as voiced or unvoiced
    ● Silence detection
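
The two VAD bullets (classify audio as voiced or unvoiced, then detect silence) can be illustrated with a very small energy-based detector. This is only a sketch under stated assumptions: 16 kHz mono samples as floats in [-1, 1], 30 ms frames, and a hand-picked RMS threshold. The function names are hypothetical, and this is not the detector Rhasspy or DeepSpeech actually ship.

```javascript
// Minimal energy-based voice activity detector (illustrative only).
// Assumptions: 16 kHz mono audio, samples in [-1, 1], 480-sample (30 ms) frames.

// Root-mean-square energy of one frame.
function rms(frame) {
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// Label each complete frame as voiced (true) or unvoiced (false).
function classifyFrames(samples, frameSize = 480, threshold = 0.02) {
  const labels = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    labels.push(rms(samples.slice(i, i + frameSize)) > threshold);
  }
  return labels;
}

// Silence detection: once speech has been heard, return the index of the
// first frame of a run of `silenceFrames` unvoiced frames, or -1 if the
// speaker has not (yet) stopped.
function findEndOfSpeech(labels, silenceFrames = 10) {
  let heardSpeech = false;
  let silentRun = 0;
  for (let i = 0; i < labels.length; i++) {
    if (labels[i]) {
      heardSpeech = true;
      silentRun = 0;
    } else if (heardSpeech) {
      silentRun += 1;
      if (silentRun >= silenceFrames) return i - silenceFrames + 1;
    }
  }
  return -1;
}
```

In a pipeline like the one in this talk, the audio between the wake word and the end-of-speech frame would be what gets passed to speech-to-text.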

  32. Intent recognition

  34. Raw utterance: Hey Mycroft… Turn on the light

  35. Raw utterance: Hey Mycroft… Turn on the light
    Intents grammar:
        [LightState]
        turn (on | off){state} [the] light

  36. Raw utterance: Hey Mycroft… Turn on the light
    Intents grammar:
        [LightState]
        turn (on | off){state} [the] light
    Structured data:
    - Intent
    - Slots
        {
          "text": "turn on the light",
          "intent": {
            "name": "LightState"
          },
          "slots": {
            "state": "on"
          }
        }

  37.
        {
          "wakeword_id": "hey-mycroft-2",
          "raw_text": "turn on the light",
          "intent": {
            "name": "LightState",
            "confidence": 1
          },
          "slots": {
            "state": "on"
          }
        }
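
To make the step from raw utterance to structured data concrete, here is a rough sketch that turns the LightState template from the slides into a regular expression and produces the same intent and slots shape. It is an assumption-laden toy, not how Rhasspy implements recognition: `compileTemplate` and `recognise` are hypothetical names, and only tagged alternatives such as `(on | off){state}` and optional words such as `[the]` are handled.

```javascript
// Toy intent matcher for a small subset of the template syntax on the slides.
// Not Rhasspy's implementation; words are assumed to contain no regex metacharacters.

function compileTemplate(template) {
  const source = template
    // "(on | off){state}" becomes a named capture group: (?<state>on|off)
    .replace(/\(([^)]*)\)\{(\w+)\}/g, (_, alternatives, slot) =>
      `(?<${slot}>${alternatives.split("|").map((a) => a.trim()).join("|")})`)
    // "[the] " becomes an optional word together with its trailing space
    .replace(/\[([^\]]*)\]\s*/g, (_, word) => `(?:${word.trim()} )?`);
  return new RegExp(`^${source}$`);
}

// Intent name and template taken from the slides.
const intents = [
  { name: "LightState", pattern: compileTemplate("turn (on | off){state} [the] light") },
];

// Return { text, intent, slots } for the first matching intent, or null.
function recognise(utterance) {
  const text = utterance.trim().toLowerCase();
  for (const { name, pattern } of intents) {
    const match = pattern.exec(text);
    if (match) {
      return { text, intent: { name }, slots: { ...match.groups } };
    }
  }
  return null;
}

console.log(JSON.stringify(recognise("Turn on the light")));
```

The wake word ("Hey Mycroft…") is assumed to have been stripped before this step, as in the JSON above where the text starts at "turn".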

  38. Intents grammar (sentences.ini):

        [GetTime]
        what time is it
        tell me the time
        whats the time

        [GetWeather]
        whats the weather (today | tomorrow){day}
        whats the weather on ($rhasspy/days){day}

        [LightState]
        states = (on | off)
        turn <states>{state} [the] light

  39. Text to speech

  41. Install Mycroft AI's mimic3
    hub.docker.com/r/mycroftai/mimic3

    docker-compose.yml (service entry):

        mimic3:
          image: mycroftai/mimic3
          container_name: mimic3
          restart: unless-stopped
          ports:
            - "59125:59125"
          volumes:
            - "$HOME/mimic3:/home/mimic3/.local/share/mycroft/mimic3"

  44. Intent handling

  45. Raw utterance: Hey Mycroft… What's the weather on Tuesday?
    Intents grammar:
        [GetWeather]
        whats the weather (today | tomorrow){day}
        whats the weather on ($rhasspy/days){day}
    Structured data:
    - Intent
    - Slots
        {
          "text": "whats the weather on tuesday",
          "intent": {
            "name": "GetWeather"
          },
          "slots": {
            "day": "tuesday"
          }
        }
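
Intent handling then amounts to looking at the intent name and reading the slots. A minimal sketch of that dispatch follows, assuming the event has the JSON shape shown above. The handler table and the reply strings are hypothetical placeholders: a real GetWeather handler would call a weather service, which is left out here.

```javascript
// Dispatch a recognised intent to a handler that builds the spoken reply.
// Handlers and replies are placeholders for illustration only.
const handlers = new Map([
  ["GetWeather", (slots) => `Looking up the weather for ${slots.day}`],
  ["LightState", (slots) => `Turning the light ${slots.state}`],
]);

function handleIntent(event) {
  const handler = handlers.get(event.intent.name);
  if (!handler) return "Sorry, I don't know how to do that";
  return handler(event.slots ?? {});
}

// The structured data from the slide.
const reply = handleIntent({
  text: "whats the weather on tuesday",
  intent: { name: "GetWeather" },
  slots: { day: "tuesday" },
});
console.log(reply);
```

Using a lookup table rather than a long if/else chain keeps each command independent, which suits the "write a few simple commands" approach of the talk.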

  47. Install Node-RED
    hub.docker.com/r/nodered/node-red

    docker-compose.yml (service entry):

        node-red:
          image: nodered/node-red:latest
          restart: unless-stopped
          ports:
            - "1880:1880"
          environment:
            - TZ=Europe/Stockholm
          volumes:
            - "$HOME/node-red:/data"
            - "/etc/localtime:/etc/localtime:ro"

  51. Hey Mycroft… What's the time?
        msg.payload = "The time is 11 42"
        return msg
    return msg.payload
    POST http://rhasspy:12101/api/text-to-speech
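
Outside Node-RED, the same POST could be made from plain JavaScript. The sketch below assumes the host name and endpoint from the slide, that the sentence to speak is sent as the plain-text request body, and a runtime with a global `fetch` (Node 18+). `buildTtsRequest` and `speak` are hypothetical helper names, and the `Content-Type` header is an assumption.

```javascript
// Send a sentence to Rhasspy's text-to-speech endpoint (URL from the slide).
const RHASSPY_URL = "http://rhasspy:12101";

// Build the request separately so it can be inspected without a network call.
function buildTtsRequest(text, baseUrl = RHASSPY_URL) {
  return {
    url: `${baseUrl}/api/text-to-speech`,
    options: {
      method: "POST",
      headers: { "Content-Type": "text/plain" }, // assumption: plain-text body
      body: text,
    },
  };
}

// fetchImpl is injectable so the function can be exercised with a stub.
async function speak(text, fetchImpl = fetch) {
  const { url, options } = buildTtsRequest(text);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Rhasspy returned HTTP ${response.status}`);
  }
}
```

Calling `speak("The time is 11 42")` from a container on the same Docker network would then be roughly equivalent to the flow on the slide.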

  52. Hey Mycroft… What's the time?
        // Node-RED function node: format the current time as a spoken reply
        const dateTime = new Date()
        const options = {
          hour: 'numeric',
          minute: 'numeric',
          hour12: true, // boolean rather than the string 'true'
          timeZone: 'Europe/Stockholm'
        }
        const formatted = new Intl.DateTimeFormat('en-GB', options).format(dateTime)
        msg.payload = `The time is ${formatted}`
        return msg

  53. Hey Mycroft… What's the weather tomorrow?

  68. Photo by Max van den Oetelaar on Unsplash

  69. The Future…?

  70. Hardware

  71. Satellites
    Base

  72. Low power: Satellite
    High power: Base

  73. Supposed to be 300kr
    Now… 1,000+ kr

  74. ESP32-wroom ESP32-S3
    https://esp32s3.com/tinys3.html

  75. Wake word

  77. openWakeWord
    github.com/dscripka/openWakeWord

  78. ASR

  79. Whisper … approaches human level robustness and accuracy on English speech recognition.
    openai.com/research/whisper
    github.com/ggerganov/whisper.cpp
    github.com/guillaumekln/faster-whisper

  80. Large Language Model

  81. Fulfilment with
    GPT-3
    ChatGPT
    … GPT-4
    ChatGPT in an iOS Shortcut —
    Worlds Smartest HomeKit Voice
    Assistant

  82. [riffusion]
    riffusion.com
    Stable Diffusion / DALL·E
    stablediffusionweb.com

  83. How far can we push Deep Neural Networks for automation?

  84. How far can we push Deep Neural Networks for automation?
    ● GPT-4: The quick brown fox jumps over the lazy dogs

  85. How far can we push Deep Neural Networks for automation?
    ● GPT-4: The quick brown fox jumps over the lazy dogs
    ● Stable Diffusion: Astronaut riding a horse

  86. How far can we push Deep Neural Networks for automation?
    ● GPT-4: The quick brown fox jumps over the lazy dogs
    ● Riffusion: Lo-Fi jazz

  87. How far can we push Deep Neural Networks for automation?
    ● GPT-4: The quick brown fox jumps over the lazy dogs
    ● Riffusion: Lo-Fi jazz
    ● Events > Actions: Dale walks into kitchen

  88. How far can we push Deep Neural Networks for automation?
    ● GPT-4: The quick brown fox jumps over the lazy dogs
    ● Riffusion: Lo-Fi jazz
    ● Events > Actions: Dale walks into kitchen
        - Kitchen light on
        - Under counter light on
        - Kettle on
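
For contrast with the neural-network idea, the "Events > Actions" example can be written today as a hand-maintained rule table. The sketch below is that baseline, not a learned model: the rule shape and field names (`person`, `entersRoom`) are hypothetical, and only the example rule itself comes from the slide.

```javascript
// Hand-written event -> actions rules (a baseline, not a neural network).
// Field names are hypothetical; the single rule is the example from the slide.
const rules = [
  {
    when: { person: "Dale", entersRoom: "kitchen" },
    actions: ["Kitchen light on", "Under counter light on", "Kettle on"],
  },
];

// Return the actions of every rule whose conditions all match the event.
function actionsFor(event) {
  return rules
    .filter((rule) =>
      Object.entries(rule.when).every(([key, value]) => event[key] === value))
    .flatMap((rule) => rule.actions);
}

console.log(actionsFor({ person: "Dale", entersRoom: "kitchen" }));
```

The open question on the slide is whether a model could produce the right-hand side of such rules from observed events instead of someone writing them by hand.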

  98. Thank you!
    [email protected]
    dhum.by/github
    home-assistant.io
    rhasspy.readthedocs.io

  99. tts
    sonantic, …
    ● multi-turn conversation
    ● more general knowledge
