A data flywheel for biblical machine translation

Chris Lim
September 14, 2021

Why do Christian communities need to curate their own data sets for translation? How can we empower them to do it at scale? In this talk, Chris Lim shares his experience creating http://spf.io, a platform that makes churches accessible in any language using AI. Hear how http://spf.io is creating a data flywheel to accelerate Bible translation and power applications of biblically literate AI.

Talk: https://youtu.be/AztY79i_iy8

Transcript

  2. български
    Türkçe
    हिन्दी
    አማርኛ
    Português
    Français
    Русский
    ภาษาไทย
    ລາວ
    فارسی
    Kiswahili
    Tiếng Việt
    తెలుగు
    বাংলা
    Bahasa Indonesia
    Español
    עברית
    a platform to make your church accessible in any language
    audio live streaming, real time captioning, automatic translation and more
    learn more at www.spf.io

  3. Potential and expressed need in languages with no translation programs and no Scripture
    Americas: 120
    Europe: 60
    Asia: 836
    Pacific: 401
    Africa: 597
    Data from: https://www.wycliffe.net/wp-content/uploads/2020/10/Tall-Infographic_2020_EN.pdf

  5. Tenets
    • Keep humans in the loop

    • Give people power over their data

    • Make people’s data as useful as possible

  6. Chris Lim | 9/14/21
    A data flywheel for biblical machine translation
    Data
    Utility
    Usage

  7. Why do Christian communities need to curate their own data sets for translation?

  8. Automatic translation drift due to changing training data
    ibadah sholat
    Source: https://aiandfaith.org/empowering-faith-communities-with-ai-translation-tools/

  9. How can we empower Christian communities to curate and apply their own data at scale?

  10. A data flywheel for biblical machine translation
    Data
    Utility
    Usage

  13. spf.io integrates with platforms like:

  15. Data
    Utility
    Usage
    Live Speech
    Scripts / Documents
    Slides / Images
    Video
    Audio

  16. Projected corpus size over time
    [Chart: aligned sentence pairs (y-axis, 0 to 160,000) by month (x-axis ticks, January to November)]
    This graph illustrates the estimated number of aligned sentence pairs generated by ten churches translating weekly sermons throughout 2021 into a low resource language. It assumes a sermon is three hundred sentences and that the churches are translating into the same language. After a year, a 156,000 sentence dataset would be created, which is sufficient to begin training machine translation models.
    Read more about “community-sourced translation”: https://www.spf.io/wp-content/uploads/2019/01/Multilingual-Accessibility-for-the-21st-Century-Church.pdf
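
The arithmetic behind the projection on slide 16 can be reproduced with a short sketch. The slide states ten churches, three hundred sentences per sermon, and weekly sermons. The function name `projected_pairs`, its parameter names, the assumption of one sermon per church per week over 52 weeks, and the assumption that every sentence yields one aligned pair are illustrative choices made here (they are consistent with the slide's 156,000 total but are not taken from the talk).

```python
def projected_pairs(
    weeks: int,
    churches: int = 10,               # from the slide: ten churches
    sentences_per_sermon: int = 300,  # from the slide: three hundred sentences
    sermons_per_week: int = 1,        # assumption: one sermon per church per week
) -> int:
    """Estimate cumulative aligned sentence pairs after a number of weeks.

    Assumes every sermon sentence yields exactly one aligned pair and that
    all churches translate into the same target language.
    """
    if weeks < 0:
        raise ValueError("weeks must be non-negative")
    return churches * sermons_per_week * sentences_per_sermon * weeks


# Quarterly checkpoints across a 52-week year (an assumed calendar).
for weeks in (13, 26, 39, 52):
    print(f"after {weeks:2d} weeks: {projected_pairs(weeks):,} aligned sentence pairs")
```

Under these assumptions the growth is linear, so doubling the number of participating churches would halve the time needed to reach the same corpus size.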

  17. Utility:
    Increasing what people can do with their data and the use cases they can apply it to

  18. Translation Portal

  19. Translation Portal
    Multilingual Zoom Calls

  20. Translation Portal
    Multilingual Zoom Calls
    Hybrid Events

  21. Data opportunities:

    • Data sharing / Open data

    • Public datasets, baseline models, APIs

    • Existing scripts, slides, documents, recordings, video, calls

    • New data types

    • Content creation

    • Tapping into existing streams of content and data

  22. Utility opportunities:

    • New algorithms

    • More integrations

    • App localization / low code / no code

    • More interactions (chat bots, hybrid modes, etc.)

    • More modalities

    • More humans in the loop

    • More data creation/curation tools

    • Custom ASR / MT for more domains

  23. Usage opportunities:

    • New distribution

    • Use it for your content

    • Use it for your meetings

    • Use it for your events

    • Accelerate your projects

    • What APIs would you want?

    • Deeper partnerships?

  24. Connect with me:
    [email protected]
    spf.io mailing list: eepurl.com/c8ktI5
    TheoTech podcast: anchor.fm/theotech
    TheoTech forum: cdx.theotech.org/
    Project Pentecost: projectpentecost.com
    Twitter: @meritandgrace
