
Idiolect: A Reconfigurable Voice Coding Assistant

Idiolect is an open-source IDE plugin for voice coding and a novel approach to building bots that allows users to define custom commands on the fly. Unlike traditional chatbots, it does not pretend to be an omniscient virtual assistant; rather, it is a reconfigurable voice programming system that empowers users to create their own commands and actions dynamically, without rebuilding or restarting the application. We offer an experience report describing the tool itself, illustrate some example use cases, and reflect on several lessons learned during the tool’s development.

Breandan Considine

May 24, 2023


Transcript

  1. Voice coding? • Voice is a natural interface for expressing user intent • Rubber duck debugging aids human program repair • Vocal commands reduce keyboard shortcut clutter • Complex actions can be easier to express using voice • Natural language and vernacular programming tools • Assistance for coders with visuomotor impairments
  2. Speech recognition over the last decade • Seven years ago, two colleagues and I entered a hackathon • IDEA: let’s build a voice user interface for the IntelliJ Platform • We used a library called CMUSphinx for speech recognition • Language model consisting of phonemes • Required defining a context-free grammar • Word error rate was still very high
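Recognizers of that era typically required a hand-written grammar in a format like JSGF (the JSpeech Grammar Format, which CMUSphinx supports). A minimal sketch of such a grammar; the command vocabulary here is illustrative, not the grammar the hackathon project actually used:

```
#JSGF V1.0;
grammar commands;

// Every utterance must match this top-level rule:
// one action word followed by one target word.
public <command> = <action> <target>;

<action> = open | close | show | run;
<target> = settings | terminal | project | editor;
```

Anything outside this grammar is simply unrecognizable, which is why the word error rate depended so heavily on how well the grammar anticipated what users would say.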
  3. Speech recognition enters a new era • In 2017, voice recognition was improving, but there were still no open models • A developer from Australia called Nicholas joins the project • Nicholas says, “Let’s do another hackathon using AWS Lex!” • Speech recognition is now MUCH better • Lower word error rate, no configuration • Today: FOSS ASR is here and 100% usable
  4. Anthropomorphic versus pragmatic chatbots • Anthropomorphic chatbots are a UX design trope • May be a convenient metaphor, but an imprecise one • Chatbots are a new kind of programming interface • Designers should adopt a more pragmatic stance • Dialogue vs. autocomplete? Neither. • More akin to programming tools
  5. Chatbot Affordance Mismatch • Impedance mismatch between system capabilities and user intent • System features have poor discoverability: what can I do? • Even if capabilities are known, frustration arises when I cannot express intent in a way the system understands • Smarter intent recognition is one approach to resolving this • But maybe a simpler way is to give the user full access to modify the system. Treat users more like programmers. • This familiarizes users with the system’s capabilities, and many users even enjoy configuring the system.
  6. Reconfigurability alleviates mismatch • Reduces discoverability & intent recognition mismatch • Needs evolve and users may introduce new phrases • Presumably, users are already proficient programmers • DIY instead of asking chatbot devs for new features • Provide a scripting sandbox for the user to manage • Similar to a PL, with some guardrails to make it “nice”
  7. Too much configurability can be detrimental • Models should be able to read and modify their behavior • Fully self-modifying code is a footgun: too much flexibility • User interface should be relatively consistent to cultivate procedural recall by using familiar cues, i.e., priming • Ordinary lexicon is okay, but inexpressive and inflexible • Users should have as much freedom as possible to express their intent, but the set of “affordances” or available actions should be relatively stable • [Context-free] idiolects are a sweet spot • $ rm -rf /
  8. Idiolects, dialects, sociolects, technolects • Programming languages are “technolects” • Embedded domain-specific languages (eDSLs) • What is the equivalent for spoken programs? • Nascent research on vernacular programming • How could we specify the instructions in a more natural, conversational manner? • e.g., “Whenever I say X, you do Y…”
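The “Whenever I say X, you do Y” idea amounts to a phrase-to-action registry that users can extend at runtime. A minimal Java sketch of that pattern; the class and method names are illustrative, not Idiolect’s actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal phrase-to-action registry: new voice commands are bound at
// runtime, without rebuilding or restarting the application.
public class CommandRegistry {
    private final Map<String, Runnable> bindings = new LinkedHashMap<>();

    // "Whenever I say <phrase>, you do <action>"
    public void whenISay(String phrase, Runnable action) {
        bindings.put(phrase.toLowerCase(), action);
    }

    // Dispatch a recognized utterance; returns false if nothing is bound.
    public boolean onUtterance(String utterance) {
        Runnable action = bindings.get(utterance.toLowerCase());
        if (action == null) return false;
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandRegistry registry = new CommandRegistry();
        registry.whenISay("open settings menu",
                () -> System.out.println("opening settings"));
        System.out.println(registry.onUtterance("Open Settings Menu")); // true
        System.out.println(registry.onUtterance("open settings page")); // false
    }
}
```

In an IDE plugin the `Runnable` would wrap an editor action or a user-supplied script, which is where the scripting-sandbox and guardrail questions from the earlier slides come in.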
  9. Voice coding in the early years: 2016-2022 • 15,000+ total downloads • Initial launch successful • Incompatible June 2017 • Popular in China and US • Many GitHub tickets filed • Authors depart JetBrains
  10. Voice coding in the present: 2023+ • New UI and ASR engine • Rereleased as “Idiolect” • Early analytics promising • Only supports English • No SEO, organic traffic • Contributors welcome!
  11. Voice recognition loop (current) • [State diagram: Waiting ⇄ Listening, with transitions on speech, silence, and noise] • “Open settings page” ❌ • “Show IDE settings” ❌
  12. Voice recognition loop (current) • [State diagram as on slide 11] • “Open settings menu” ✅ • “Open settings page” ❌ • …
  13. Voice recognition loop (future) • [State diagram as on slide 11] • “open” → Open file menu, Open settings menu, …, Open run configuration
  14. Voice recognition loop (future) • [State diagram as on slide 11] • “open” “the” → Open file menu, Open settings menu, …, Open run configuration ❌
  15. Voice recognition loop (future) • [State diagram as on slide 11] • “open” → Open file menu, Open settings menu, …, Open run configuration
  16. Voice recognition loop (future) • [State diagram as on slide 11] • “open” “settings” → Open settings menu, Open settings file
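The “future” loop above narrows the candidate commands as each word is recognized, rejecting an utterance as soon as no command matches the spoken prefix (e.g. “open” “the” matches nothing). A minimal Java sketch of that incremental prefix filter, using the command names from the slides; the class is illustrative, not Idiolect’s implementation:

```java
import java.util.List;
import java.util.stream.Collectors;

// Incrementally filter candidate commands by their spoken word prefix:
// each recognized word narrows the candidate set, and an empty result
// means the utterance can no longer match any command.
public class PrefixFilter {
    private final List<String> commands;
    private String prefix = "";

    public PrefixFilter(List<String> commands) {
        this.commands = commands;
    }

    // Feed one recognized word; returns the surviving candidates.
    public List<String> hear(String word) {
        prefix = prefix.isEmpty() ? word : prefix + " " + word;
        return commands.stream()
                .filter(c -> c.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cmds = List.of(
                "open file menu", "open settings menu",
                "open settings file", "open run configuration");
        PrefixFilter f = new PrefixFilter(cmds);
        System.out.println(f.hear("open").size()); // 4: all commands survive
        System.out.println(f.hear("settings"));    // [open settings menu, open settings file]
        // Hearing "the" after "open" (slide 14) would instead yield an
        // empty list, so the loop can reject the utterance early.
    }
}
```

A real recognizer would also fold in phonetic similarity rather than exact string prefixes, but the narrowing behavior sketched here is what the slide 13-16 diagrams depict.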
  17. What we learned building Idiolect • Machines are not to be feared or idolized, but above all understood • Applied machine learning requires “mechanical sympathy” • “You don’t have to be an engineer to be a racing driver, but you do have to have mechanical sympathy.” (Jackie Stewart) • Don’t be afraid to hold hands with the AI (metaphorically) • Treat your users as intelligent collaborators • Align the application with AI capabilities • Be curious, resourceful and pragmatic!