
Idiolect: A Reconfigurable Voice Coding Assistant

Idiolect is an open-source IDE plugin for voice coding and a novel approach to building bots that allows users to define custom commands on the fly. Unlike traditional chatbots, it does not pretend to be an omniscient virtual assistant; rather, it is a reconfigurable voice programming system that empowers users to create their own commands and actions dynamically, without rebuilding or restarting the application. We offer an experience report describing the tool itself, illustrate some example use cases, and reflect on several lessons learned during the tool’s development.

Breandan Considine

May 24, 2023


Transcript

  1. Voice coding? • Voice is a natural interface for expressing user intent • Rubber duck debugging aids human program repair • Vocal commands reduce keyboard shortcut clutter • Complex actions can be easier to express using voice • Natural language and vernacular programming tools • Assistance for coders with visuomotor impairments
  2. Speech recognition over the last decade • Seven years ago, two colleagues and I entered a hackathon • IDEA: let’s build a voice user interface for the IntelliJ Platform • We used a library called CMUSphinx for speech recognition • Language model consisting of phonemes • Required defining a context-free grammar • Word error rate was still very high
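Recognizers of that era typically required a hand-written grammar in a format like JSGF (the JSpeech Grammar Format, which CMUSphinx supports). A minimal sketch of such a grammar; the command vocabulary here is illustrative, not the grammar the hackathon project actually used:

```
#JSGF V1.0;
grammar commands;

// Every utterance must match this top-level rule:
// one action word followed by one target word.
public <command> = <action> <target>;

<action> = open | close | show | run;
<target> = settings | terminal | project | editor;
```

Anything outside this grammar is simply unrecognizable, which is why the word error rate depended so heavily on how well the grammar anticipated what users would say.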
  3. Speech recognition enters a new era • In 2017, voice recognition was improving, but there were still no open models • A developer from Australia called Nicholas joins the project • Nicholas says, “Let’s do another hackathon using AWS Lex!” • Speech recognition is now MUCH better • Lower word error rate, no configuration • Today: FOSS ASR is here and 100% usable
  4. Anthropomorphic versus pragmatic chatbots • Anthropomorphic chatbots are a UX design trope • May be a convenient metaphor, but an imprecise one • Chatbots are a new kind of programming interface • Designers should adopt a more pragmatic stance • Dialogue vs. autocomplete? Neither. • More akin to programming tools
  5. Chatbot Affordance Mismatch • Impedance mismatch between system capabilities and user intent • System features have poor discoverability: what can I do? • Even if capabilities are known, frustration arises when I cannot express intent in a way the system understands • Smarter intent recognition is one approach to resolving this • But maybe a simpler way is to give the user full access to modify the system. Treat users more like programmers. • This familiarizes users with the system’s capabilities, and many users even enjoy configuring the system.
  6. Reconfigurability alleviates mismatch • Reduces discoverability & intent recognition mismatch • Needs evolve and users may introduce new phrases • Presumably, users are already proficient programmers • DIY instead of asking chatbot devs for new features • Provide a scripting sandbox for the user to manage • Similar to a PL, with some guardrails to make it “nice”
  7. Too much configurability can be detrimental • Models should be able to read and modify their behavior • Fully self-modifying code is a footgun: too much flexibility • User interface should be relatively consistent to cultivate procedural recall by using familiar cues, i.e., priming • Ordinary lexicon is okay, but inexpressive and inflexible • Users should have as much freedom as possible to express their intent, but the set of “affordances” or available actions should be relatively stable • [Context-free] idiolects are a sweet spot • $ rm -rf /
  8. Idiolects, dialects, sociolects, technolects • Programming languages are “technolects” • Embedded domain-specific languages (eDSLs) • What is the equivalent for spoken programs? • Nascent research on vernacular programming • How could we specify the instructions in a more natural, conversational manner? • e.g., “Whenever I say X, you do Y…”
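The “Whenever I say X, you do Y” idea amounts to a phrase-to-action registry that users can extend at runtime. A minimal Java sketch of that pattern; the class and method names are illustrative, not Idiolect’s actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal phrase-to-action registry: new voice commands are bound at
// runtime, without rebuilding or restarting the application.
public class CommandRegistry {
    private final Map<String, Runnable> bindings = new LinkedHashMap<>();

    // "Whenever I say <phrase>, you do <action>"
    public void whenISay(String phrase, Runnable action) {
        bindings.put(phrase.toLowerCase(), action);
    }

    // Dispatch a recognized utterance; returns false if nothing is bound.
    public boolean onUtterance(String utterance) {
        Runnable action = bindings.get(utterance.toLowerCase());
        if (action == null) return false;
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandRegistry registry = new CommandRegistry();
        registry.whenISay("open settings menu",
                () -> System.out.println("opening settings"));
        System.out.println(registry.onUtterance("Open Settings Menu")); // true
        System.out.println(registry.onUtterance("open settings page")); // false
    }
}
```

In an IDE plugin the `Runnable` would wrap an editor action or a user-supplied script, which is where the scripting-sandbox and guardrail questions from the earlier slides come in.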
  9. Voice coding in the early years: 2016-2022 • 15,000+ total downloads • Initial launch successful • Incompatible June 2017 • Popular in China and US • Many GitHub tickets filed • Authors depart JetBrains
  10. Voice coding in the present: 2023+ • New UI and ASR engine • Rereleased as “Idiolect” • Early analytics promising • Only supports English • No SEO, organic traffic • Contributors welcome!
  11. Voice recognition loop (current) • [State diagram: Waiting ⇄ Listening, with transitions on speech, silence, and noise] • “Open settings page” ❌ • “Show IDE settings” ❌
  12. Voice recognition loop (current) • [State diagram as on slide 11] • “Open settings menu” ✅ • “Open settings page” ❌ • …
  13. Voice recognition loop (future) • [State diagram as on slide 11] • “open” → Open file menu, Open settings menu, …, Open run configuration
  14. Voice recognition loop (future) • [State diagram as on slide 11] • “open” “the” → Open file menu, Open settings menu, …, Open run configuration ❌
  15. Voice recognition loop (future) • [State diagram as on slide 11] • “open” → Open file menu, Open settings menu, …, Open run configuration
  16. Voice recognition loop (future) • [State diagram as on slide 11] • “open” “settings” → Open settings menu, Open settings file
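The “future” loop above narrows the candidate commands as each word is recognized, rejecting an utterance as soon as no command matches the spoken prefix (e.g. “open” “the” matches nothing). A minimal Java sketch of that incremental prefix filter, using the command names from the slides; the class is illustrative, not Idiolect’s implementation:

```java
import java.util.List;
import java.util.stream.Collectors;

// Incrementally filter candidate commands by their spoken word prefix:
// each recognized word narrows the candidate set, and an empty result
// means the utterance can no longer match any command.
public class PrefixFilter {
    private final List<String> commands;
    private String prefix = "";

    public PrefixFilter(List<String> commands) {
        this.commands = commands;
    }

    // Feed one recognized word; returns the surviving candidates.
    public List<String> hear(String word) {
        prefix = prefix.isEmpty() ? word : prefix + " " + word;
        return commands.stream()
                .filter(c -> c.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cmds = List.of(
                "open file menu", "open settings menu",
                "open settings file", "open run configuration");
        PrefixFilter f = new PrefixFilter(cmds);
        System.out.println(f.hear("open").size()); // 4: all commands survive
        System.out.println(f.hear("settings"));    // [open settings menu, open settings file]
        // Hearing "the" after "open" (slide 14) would instead yield an
        // empty list, so the loop can reject the utterance early.
    }
}
```

A real recognizer would also fold in phonetic similarity rather than exact string prefixes, but the narrowing behavior sketched here is what the slide 13-16 diagrams depict.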
  17. What we learned building Idiolect • Machines are not to be feared or idolized, but above all understood • Applied machine learning requires “mechanical sympathy” • “You don’t have to be an engineer to be a racing driver, but you do have to have mechanical sympathy.” (Jackie Stewart) • Don’t be afraid to hold hands with the AI (metaphorically) • Treat your users as intelligent collaborators • Align the application with AI capabilities • Be curious, resourceful and pragmatic!