Idiolect: A Reconfigurable Voice Coding Assistant

Idiolect is an open source IDE plugin for voice coding and a novel approach to building bots that lets users define custom commands on the fly. Unlike traditional chatbots, it does not pretend to be an omniscient virtual assistant; it is a reconfigurable voice programming system that empowers users to create their own commands and actions dynamically, without rebuilding or restarting the application. We offer an experience report that describes the tool, illustrates some example use cases, and reflects on several lessons learned during its development.

Breandan Considine

May 24, 2023

Transcript

  1. Idiolect: A Reconfigurable
    Voice Coding Assistant
    Breandan Considine
    Nicholas Albion
    Xujie Si
    BotSE ’23 Workshop

  2. Voice coding?
    • Voice is a natural interface for expressing user intent
    • Rubber duck debugging aids human program repair
    • Vocal commands reduce keyboard shortcut clutter
    • Complex actions can be easier to express using voice
    • Natural language and vernacular programming tools
    • Assistance for coders with visuomotor impairments

  3. Speech recognition over the last decade
    • Seven years ago, two colleagues and I entered a hackathon
    • IDEA: let’s build a voice user interface for the IntelliJ Platform
    • We used a library called CMUSphinx for speech recognition
    • Language model consisting of phonemes
    • Required defining context-free grammar
    • Word error rate was still very high

  4. Speech recognition enters a new era
    • In 2017, voice recognition improving, still no open models
    • A developer from Australia called Nicholas joins the project
    • Nicholas says, “Let’s do another hackathon using AWS Lex!”
    • Speech recognition is now MUCH better
    • Lower word error rate, no configuration
    • Today: FOSS ASR is here and 100% usable

  5. Anthropomorphic versus pragmatic chatbots
    • Anthropomorphic chatbots are a UX design trope
    • May be a convenient metaphor, but an imprecise one
    • Chatbots are a new kind of programming interface
    • Designers should adopt a more pragmatic stance
    • Dialogue vs. autocomplete? Neither.
    • More akin to programming tools

  6. Chatbot Affordance Mismatch
    • Impedance mismatch: system capabilities and user intent.
    • System features have poor discoverability: what can I do?
    • Even if capabilities are known, frustration arises when I
    cannot express intent in a way the system understands.
    • Smarter intent recognition is one approach to resolve this
    • But maybe a simpler way is to give the user full access to
    modify the system. Treat users more like programmers.
    • This familiarizes users with the system’s capabilities, and
    many users even enjoy configuring the system.

  7. Reconfigurability alleviates mismatch
    • Reduces discoverability & intent recognition mismatch
    • Needs evolve and users may introduce new phrases
    • Presumably, users are already proficient programmers
    • DIY instead of asking chatbot devs for new features
    • Provide a scripting sandbox for the user to manage
    • Similar to PL with some guardrails to make it “nice”

  8. Too much configurability can be detrimental
    • Models should be able to read and modify their behavior
    • Fully self-modifying code a footgun: too much flexibility
    • User interface should be relatively consistent to cultivate
    procedural recall by using familiar cues, i.e., priming
    • Ordinary lexicon okay, but inexpressive and inflexible
    • Users should have as much freedom as possible to
    express their intent, but the set of “affordances” or
    available actions should be relatively stable
    • [Context-free] idiolects are a sweet spot
    $ rm -rf /

  9. Idiolects, dialects, sociolects, technolects
    • Programming languages are “technolects”
    • Embedded domain specific languages (eDSLs)
    • What is the equivalent for spoken programs?
    • Nascent research on vernacular programming
    • How could we specify the instructions in a
    more natural, conversational manner?
    • e.g., “Whenever I say, X, you do Y…”
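
The "Whenever I say X, you do Y" idea can be illustrated with a small sketch of a command table that is rebound while the program keeps running. This is not Idiolect's actual API: `CommandRegistry`, `whenever_i_say`, `forget`, and `hear` are hypothetical names chosen for illustration, and the actions simply return strings instead of driving an IDE.

```python
from typing import Callable, Dict, Optional


class CommandRegistry:
    """Hypothetical phrase-to-action table that can be changed at runtime."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so spacing and case do not matter.
        return " ".join(phrase.lower().split())

    def whenever_i_say(self, phrase: str, action: Callable[[], str]) -> None:
        """Bind (or rebind) a phrase to an action; no restart is involved."""
        self._actions[self._normalize(phrase)] = action

    def forget(self, phrase: str) -> None:
        """Remove a binding if it exists."""
        self._actions.pop(self._normalize(phrase), None)

    def hear(self, utterance: str) -> Optional[str]:
        """Run the bound action, or return None when nothing is bound."""
        action = self._actions.get(self._normalize(utterance))
        return action() if action is not None else None


registry = CommandRegistry()
registry.whenever_i_say("open settings menu", lambda: "settings opened")

before = registry.hear("show IDE settings")   # phrase not bound yet
registry.whenever_i_say("show IDE settings", lambda: "settings opened")
after = registry.hear("Show IDE  settings")   # same phrase, now bound
```

Here the user's own phrasing becomes a first-class command the moment it is registered, which is the kind of reconfigurability the slides argue for.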

  10. Voice coding in the early years: 2016-2022
    • 15,000+ total downloads
    • Initial launch successful
    • Incompatible June 2017
    • Popular in China and US
    • Many GitHub tickets filed
    • Authors depart JetBrains

  11. Voice coding in the present: 2023+
    • New UI and ASR engine
    • Rereleased as “Idiolect”
    • Early analytics promising
    • Only supports English
    • No SEO, organic traffic
    • Contributors welcome!

  12. Voice recognition loop (current)
    [State diagram: states Waiting and Listening; transition labels Speech, Silence, Noise, Silence]

  13. Voice recognition loop (current)
    [Same state diagram]

  14. Voice recognition loop (current)
    [Same state diagram]
    “Open settings page” ❌

  15. Voice recognition loop (current)
    [Same state diagram]
    “Open settings page” ❌

  16. Voice recognition loop (current)
    [Same state diagram]
    “Open settings page” ❌
    “Show IDE settings” ❌

  17. Voice recognition loop (current)
    [Same state diagram]
    “Open settings menu” ✅
    “Open settings page” ❌
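
One way to read the current loop is as a two-state machine that buffers a whole utterance and only then checks it against the known phrases. The sketch below assumes that speech moves the loop from Waiting to Listening, that silence while Listening ends the utterance, and that noise is ignored; the slides show only the state and transition labels, so these transitions, the `run_loop` helper, and the single-entry `KNOWN_COMMANDS` set are assumptions made for illustration.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    WAITING = "waiting"
    LISTENING = "listening"


# Hypothetical command set; only exact phrases are recognized.
KNOWN_COMMANDS = {"open settings menu"}


def run_loop(events: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, bool]]:
    """Feed (kind, text) events through the loop; report each finished utterance.

    kind is "speech", "noise", or "silence"; text is only used for speech.
    """
    state = State.WAITING
    words: List[str] = []
    results: List[Tuple[str, bool]] = []
    for kind, text in events:
        if kind == "speech":
            state = State.LISTENING
            words.extend((text or "").lower().split())
        elif kind == "silence" and state is State.LISTENING:
            # The utterance is matched only once it is complete.
            utterance = " ".join(words)
            results.append((utterance, utterance in KNOWN_COMMANDS))
            words = []
            state = State.WAITING
        # Noise, and silence while waiting, leave the state unchanged.
    return results


events = [
    ("silence", None),
    ("speech", "Open settings page"), ("silence", None),
    ("noise", None),
    ("speech", "Show IDE settings"), ("silence", None),
    ("speech", "Open settings"), ("noise", None),
    ("speech", "menu"), ("silence", None),
]
results = run_loop(events)
```

Under these assumptions the first two attempts fail and only the exact phrase succeeds, mirroring the ❌ and ✅ marks on the slides: the user gets no feedback until the whole phrase has been spoken.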

  18. Voice recognition loop (future)
    [State diagram: states Waiting and Listening; transition labels Speech, Silence, Noise, Silence]

  19. Voice recognition loop (future)
    [Same state diagram]
    “open”

  20. Voice recognition loop (future)
    [Same state diagram]
    “open”
    Open file menu
    Open settings menu
    Open run configuration

  21. Voice recognition loop (future)
    [Same state diagram]
    “open”
    “the”
    Open file menu
    Open settings menu
    Open run configuration

  22. Voice recognition loop (future)
    [Same state diagram]
    “open”
    Open file menu
    Open settings menu
    Open run configuration

  23. Voice recognition loop (future)
    [Same state diagram]
    “open”
    “settings”
    Open settings menu
    Open settings file

  24. Voice recognition loop (future)
    [Same state diagram]
    “open”
    “settings”
    “menu”
    Open settings menu ✅
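
The future loop appears to narrow a list of candidate commands word by word, discarding a word such as “the” that matches no command. A minimal sketch of that reading follows. The command list is taken from the phrases shown on the slides, while `candidates`, `listen`, and the rule of dropping any word that would leave no candidates are assumptions, not the plugin's confirmed behavior.

```python
from typing import List, Tuple

# Hypothetical command set, built from the phrases shown on the slides.
COMMANDS = [
    "open file menu",
    "open settings menu",
    "open settings file",
    "open run configuration",
]


def candidates(prefix: List[str]) -> List[str]:
    """Return every command whose leading words equal the words heard so far."""
    return [c for c in COMMANDS if c.split()[: len(prefix)] == prefix]


def listen(words: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Consume words one at a time; return accepted words and candidate history."""
    accepted: List[str] = []
    history: List[List[str]] = []
    for word in words:
        trial = accepted + [word.lower()]
        matches = candidates(trial)
        if matches:
            accepted = trial  # the word narrows the candidate set
        else:
            # Assumed policy: drop the word and keep the previous candidates.
            matches = candidates(accepted)
        history.append(matches)
    return accepted, history


accepted, history = listen(["open", "the", "settings", "menu"])
```

Each entry in `history` is what a completion popup could display after the corresponding word, so the user sees the available commands while still speaking instead of guessing a full phrase up front.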

  25. What we learned building Idiolect
    • Machines not to be feared or idolized, but above all understood
    • Applied machine learning requires “mechanical sympathy”
    “You don’t have to be an engineer to be a racing driver, but
    you do have to have mechanical sympathy.” - Jackie Stewart
    • Don’t be afraid to hold hands with the AI (metaphorically)
    • Treat your users as intelligent collaborators
    • Align the application with AI capabilities
    • Be curious, resourceful and pragmatic!

  26. What coding feels like today
    What coding could be like

  27. Acknowledgements
    • Jin Guo
    • Alexey Kudinkin
    • Yaroslav Lepenkin
    • Hadi Hariri
    • Alejandro Salinas Medina
