A data flywheel for biblical machine translation

Chris Lim
September 14, 2021

Why do Christian communities need to curate their own data sets for translation? How can we empower them to do it at scale? In this talk, Chris Lim shares his experience creating http://spf.io, a platform that makes churches accessible in any language using AI. Hear how http://spf.io is creating a data flywheel to accelerate Bible translation and power applications of biblically literate AI.

Talk: https://youtu.be/AztY79i_iy8

Transcript

  1. български, Türkçe, हिन्दी, አማርኛ, Português, Français, ລາວ, فارسی, Kiswahili, Tiếng Việt, తెలుగు, বাংলা, Español, עברית

    spf.io: a platform to make your church accessible in any language. Audio live streaming, real-time captioning, automatic translation and more. Learn more at www.spf.io
  2. Potential and expressed need in languages with no translation programs and no Scripture

    • Americas: 120
    • Europe: 60
    • Asia: 836
    • Pacific: 401
    • Africa: 597

    Data from: https://www.wycliffe.net/wp-content/uploads/2020/10/Tall-Infographic_2020_EN.pdf
  3. Tenets

    • Keep humans in the loop
    • Give people power over their data
    • Make people’s data as useful as possible
  4. Automatic translation drift due to changing training data

    Slide example: the Indonesian words "ibadah" and "sholat".

    Source: https://aiandfaith.org/empowering-faith-communities-with-ai-translation-tools/
  5. Projected corpus size over time

    [Chart: aligned sentence pairs (0 to 160,000) by month, January to November]

    This graph illustrates the estimated number of aligned sentence pairs generated by ten churches translating weekly sermons throughout 2021 into a low-resource language. It assumes a sermon is three hundred sentences and that the churches are translating into the same language. After a year, a 156,000-sentence dataset would be created, which is sufficient to begin training machine translation models.

    Read more about “community-sourced translation”: https://www.spf.io/wp-content/uploads/2019/01/Multilingual-Accessibility-for-the-21st-Century-Church.pdf
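The 156,000 figure on the corpus-size slide can be reproduced from the caption's stated assumptions: ten churches, one sermon per week, three hundred sentences per sermon. The sketch below is only an illustration of that arithmetic; the function name `projected_pairs` and the choice of 52 weeks in a year are assumptions, not something taken from the talk.

```python
def projected_pairs(churches=10, sentences_per_sermon=300, weeks=52):
    """Return the cumulative number of aligned sentence pairs after each week.

    Assumes every church translates exactly one sermon per week into the
    same target language, and that each translated sentence yields one
    aligned pair (hypothetical simplifications for illustration).
    """
    pairs_per_week = churches * sentences_per_sermon
    return [pairs_per_week * week for week in range(1, weeks + 1)]


totals = projected_pairs()
print(totals[0])   # pairs after the first week: 3000
print(totals[-1])  # pairs after 52 weeks: 156000
```

Changing any one input shows how sensitive the projection is: halving the number of participating churches, for example, halves the year-end corpus.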
  6. Data, Utility, Usage

    Utility: increasing what people can do with their data and the use cases they can apply it to
  7. Data, Utility, Usage

    Data opportunities:
    • Data sharing / open data
    • Public datasets, baseline models, APIs
    • Existing scripts, slides, documents, recordings, video, calls
    • New data types
    • Content creation
    • Tapping into existing streams of content and data
  8. Data, Utility, Usage

    Utility opportunities:
    • New algorithms
    • More integrations
    • App localization / low code / no code
    • More interactions (chat bots, hybrid modes, etc.)
    • More modalities
    • More humans in the loop
    • More data creation/curation tools
    • Custom ASR / MT for more domains
  9. Data, Utility, Usage

    Usage opportunities:
    • New distribution
    • Use it for your content
    • Use it for your meetings
    • Use it for your events
    • Accelerate your projects
    • What APIs would you want?
    • Deeper partnerships?
  10. Data, Utility, Usage

    Connect with me: [email protected]
    • spf.io mailing list: eepurl.com/c8ktI5
    • TheoTech podcast: anchor.fm/theotech
    • TheoTech forum: cdx.theotech.org/
    • Project Pentecost: projectpentecost.com
    • Twitter: @meritandgrace