The Speech Accessibility Project: Diversifying Speech Recognition

Join us for a transformative session presented by the leaders driving UIUC’s (University of Illinois Urbana- Champaign) Speech Accessibility Project. Explore how diverse data is revolutionizing AI’s ability to understand varied speech patterns. Gain insights into the crucial role of inclusive datasets in training machine learning models for more accurate and inclusive speech recognition.

3Play Media

May 20, 2024

Transcript

  1. Access is a human right
     • “Everyone has the right of equal access to public service in his country.” - Universal Declaration of Human Rights, Article 21
     • “Everyone has the right to work.” - Universal Declaration of Human Rights, Article 23
     • “Higher education shall be equally accessible to all on the basis of merit.” - Universal Declaration of Human Rights, Article 26
  2. A brief, biased history of speech technology for people with motor disorders
     • 1990: Dragon Dictate allows control of a PC using only speech, “and found acceptance among the disabled” (Maher, 2023)
     • 1993: “Speech input for dysarthric users” (Hwa-Ping Chang, JASA 94:1782)
     • 1985-2018: Stephen Hawking uses the “Perfect Paul” speech synthesizer
     • 2018: “A phenomenological look at the life hacking-enabled practices of individuals with mobility and dexterity impairments,” Jerry Robinson
  3. Published speech recognition error rates on public large-vocabulary corpora, 2008-2023
     [Chart: ASR Word Error Rates (%), 2008-2023; non-dysarthric vs. dysarthric word error rate, log scale]
  4. Why does ASR struggle so much with dysarthria?
     • By 2023, word error rates for dysarthric speech had dropped to 18%.
     • Error rates for non-dysarthric speech had dropped to 1.4%.
     • Dysarthric speech is harder, even for human listeners.
     • …but there is another reason that dysarthric ASR lags non-dysarthric ASR…
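
The figures above are word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of the standard computation, with invented example sentences rather than project data:

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein distance
# divided by the number of reference words. Example strings are invented.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion, insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25, i.e. 25% WER
```
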
  5. Sizes of public large-vocabulary speech recognition corpora, 2008-2023
     [Chart: ASR Data Size (# Hours), 2008-2023; non-dysarthric vs. dysarthric hours, log scale]
     [Chart: ASR Data Size (# Speakers), 2008-2023; non-dysarthric vs. dysarthric speakers, log scale]
  6. The Speech Accessibility Project
     • Speech technology is now good enough to be useful for people speaking General American English without disabilities.
     • To make it useful for people with disabilities, we need about 1000 hours of transcribed speech (~1.2 million sentences).
     • The Speech Accessibility Project seeks to collect, curate, and distribute such a corpus.
  7. Recruitment and status
     • Recruitment strategy: 5 etiologies, 400 people each
       • Parkinson’s, ALS, Down Syndrome, Stroke, Cerebral Palsy
       • Other etiologies will be screened if they volunteer
     • Consent process and mentoring
       • Potential participants (746 as of 2023/10/10) meet a speech-language therapist online, who decides whether their speech is sufficiently affected to meet the needs of the project (283 as of 2023/10/10)
     • Preliminary results
       • Word error rates per talker vary considerably
       • Fine-tune to talkers with disability ⟹ error rate drops for talkers with disability (by about a factor of two; see the sketch below)
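
The last bullet refers to adapting a pretrained recognizer on recordings from talkers with disabilities. The project's actual models and training recipe are not described in this deck; purely as an illustration of the idea, here is a minimal per-talker fine-tuning sketch using the open-source Hugging Face `Wav2Vec2ForCTC` model. The checkpoint name and `talker_dataset` are illustrative assumptions, not project artifacts.

```python
# A minimal sketch (not the project's actual recipe) of fine-tuning a pretrained
# CTC ASR model on one talker's paired (waveform, transcript) data.
import torch
from torch.utils.data import DataLoader
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Freeze the convolutional feature encoder; adapt only the transformer and CTC head,
# a common choice when only a small amount of speaker-specific data is available.
model.freeze_feature_encoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    # batch: list of (waveform: 1-D float array sampled at 16 kHz, transcript: str)
    audio = [a for a, _ in batch]
    text = [t.upper() for _, t in batch]  # this checkpoint's vocabulary is upper-case
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
    labels = processor.tokenizer(text, return_tensors="pt", padding=True).input_ids
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return inputs.input_values, labels

# `talker_dataset` is a hypothetical dataset of one talker's recordings.
loader = DataLoader(talker_dataset, batch_size=4, collate_fn=collate, shuffle=True)

model.train()
for epoch in range(3):
    for input_values, labels in loader:
        loss = model(input_values=input_values, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In this kind of setup, the per-talker WER before and after fine-tuning would be compared with a metric like the `wer` function sketched earlier.
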