The Speech Accessibility Project: Diversifying Speech Recognition

Join us for a transformative session presented by the leaders driving UIUC’s (University of Illinois Urbana- Champaign) Speech Accessibility Project. Explore how diverse data is revolutionizing AI’s ability to understand varied speech patterns. Gain insights into the crucial role of inclusive datasets in training machine learning models for more accurate and inclusive speech recognition.

3Play Media

May 20, 2024

Transcript

  1. Access is a human right
     • “Everyone has the right of equal access to public service in his country.” - Universal Declaration of Human Rights, Article 21
     • “Everyone has the right to work.” - Universal Declaration of Human Rights, Article 23
     • “Higher education shall be equally accessible to all on the basis of merit.” - Universal Declaration of Human Rights, Article 26
  2. A brief, biased history of speech technology for people with motor disorders
     • 1990: Dragon Dictate allows control of a PC using only speech, “and found acceptance among the disabled” (Maher, 2023)
     • 1993: “Speech input for dysarthric users” (Hwa-Ping Chang, JASA 94:1782)
     • 1985-2018: Stephen Hawking uses the “Perfect Paul” speech synthesizer
     • 2018: “A phenomenological look at the life hacking-enabled practices of individuals with mobility and dexterity impairments,” Jerry Robinson
  3. Published speech recognition error rates on public large-vocabulary corpora, 2008-2023
     [Chart: ASR Word Error Rates (%), 2008-2023; non-dysarthric vs. dysarthric word error rate, log scale]
  4. Why does ASR struggle so much with dysarthria?
     • By 2023, word error rates for dysarthric speech had dropped to 18%.
     • Error rates for non-dysarthric speech had dropped to 1.4%.
     • Dysarthric speech is harder, even for human listeners.
     • …but there is another reason that dysarthric ASR lags non-dysarthric ASR…
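
The figures above are word error rates (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of the standard computation, with invented example sentences rather than project data:

```python
# Minimal word error rate (WER) sketch: word-level Levenshtein distance
# divided by the number of reference words. Example strings are invented.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)  # deletion, insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25, i.e. 25% WER
```
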
  5. Sizes of public large-vocabulary speech recognition corpora, 2008-2023
     [Chart: ASR Data Size (# Hours), 2008-2023; non-dysarthric vs. dysarthric hours, log scale]
     [Chart: ASR Data Size (# Speakers), 2008-2023; non-dysarthric vs. dysarthric speakers, log scale]
  6. The Speech Accessibility Project
     • Speech technology is now good enough to be useful for people speaking General American English without disabilities.
     • To make it useful for people with disabilities, we need about 1000 hours of transcribed speech (~1.2 million sentences).
     • The Speech Accessibility Project seeks to collect, curate, and distribute such a corpus.
  7. Recruitment and status
     • Recruitment strategy: 5 etiologies, 400 people each
       • Parkinson’s, ALS, Down Syndrome, Stroke, Cerebral Palsy
       • Other etiologies will be screened if they volunteer
     • Consent process and mentoring
       • Potential participants (746 as of 2023/10/10) meet a speech-language therapist online, who decides whether their speech is sufficiently affected to meet the needs of the project (283 as of 2023/10/10)
     • Preliminary results
       • Word error rates per talker vary considerably
       • Fine-tune to talkers with disability ⟹ error rate drops for talkers with disability (by about a factor of two; see the sketch below)
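
The last bullet refers to adapting a pretrained recognizer on recordings from talkers with disabilities. The project's actual models and training recipe are not described in this deck; purely as an illustration of the idea, here is a minimal per-talker fine-tuning sketch using the open-source Hugging Face `Wav2Vec2ForCTC` model. The checkpoint name and `talker_dataset` are illustrative assumptions, not project artifacts.

```python
# A minimal sketch (not the project's actual recipe) of fine-tuning a pretrained
# CTC ASR model on one talker's paired (waveform, transcript) data.
import torch
from torch.utils.data import DataLoader
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Freeze the convolutional feature encoder; adapt only the transformer and CTC head,
# a common choice when only a small amount of speaker-specific data is available.
model.freeze_feature_encoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    # batch: list of (waveform: 1-D float array sampled at 16 kHz, transcript: str)
    audio = [a for a, _ in batch]
    text = [t.upper() for _, t in batch]  # this checkpoint's vocabulary is upper-case
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
    labels = processor.tokenizer(text, return_tensors="pt", padding=True).input_ids
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return inputs.input_values, labels

# `talker_dataset` is a hypothetical dataset of one talker's recordings.
loader = DataLoader(talker_dataset, batch_size=4, collate_fn=collate, shuffle=True)

model.train()
for epoch in range(3):
    for input_values, labels in loader:
        loss = model(input_values=input_values, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In this kind of setup, the per-talker WER before and after fine-tuning would be compared with a metric like the `wer` function sketched earlier.
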