Sound Change Mechanization

Tiago Tresoldi
February 18, 2020

Transcript

  1. Sound Change Mechanization
    Some work-in-progress notes
    Tiago Tresoldi
    DLCE department meeting, MPI-SHH, Jena,
    18/02/2020

  2. What I will talk about
    History of sound change mechanization
    Formalized notation
    Two different tasks
    The AlteruPhono package
    Current status and plans
    What it would (hopefully) allow

  3. Caveats
    Segments as units composed of bundles of
    (distinctive) features
    Not (necessarily) phonemes, but a “useful
    descriptive fiction”
    Sound changes alone don’t explain all of history

  4. Background
    Sound change mechanization proposed in the 1950s
    (Gleason 1959), with first attempts in the 1960s and
    1970s (Kay 1964, Hewson 1977)

  5. Diachrony replaced by synchrony
    Informed edit distance (Damerau, 1975)
    Phonological distance (Wieling & Nerbonne,
    2015)
    Likelihood of correspondence (Bouchard-Côté et
    al., 2013)
    Sequence alignment (List, 2014)
    Ancestral State Reconstruction (Jäger & List,
    2018)

  6. (Milky Way from Pakistan’s Karakoram Range, Anne

  7. (Image: bitacoradegalileo.com)

  8. Synchronic methods: a good and preferable solution for tasks such as
    cognate detection
    Observation before inference
    But algorithms connecting French là and Hawaiian
    laila (“there”), while missing German da and
    English there, unsettle linguists (List et al., 2017)

  9. “All /e/ become /i/ when preceded by a
    consonant”
    /tade/ turns into /tadi/
    /pepe/ turns into /pipi/
    /emu/ stays as /emu/
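
As a rough illustration of the rule above, the following sketch applies "e > i when preceded by a consonant" with a regular expression. The consonant inventory and the function name are assumptions made for this example, and each character is treated as one segment; it is not how AlteruPhono or any of the earlier systems implement rules.

```python
import re

# Hypothetical toy consonant inventory; a real system would draw on feature data.
CONSONANTS = "pbtdkgmnszlr"


def raise_e_after_consonant(word: str) -> str:
    """Apply e > i / C _ to a word written with one character per segment."""
    # The lookbehind checks the preceding consonant without consuming it.
    return re.sub(f"(?<=[{CONSONANTS}])e", "i", word)


if __name__ == "__main__":
    for word in ("tade", "pepe", "emu"):
        print(f"/{word}/ -> /{raise_e_after_consonant(word)}/")
```

The word-initial /e/ of /emu/ has no preceding consonant, so the lookbehind fails and the form is left unchanged.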

  10. XKCD 1831, “Here to help”

  11. Problem #1: Formal notation
    The pattern of A > B / C (“A turns into B in
    context C”) is but a blueprint:
    Footnotes and comparative prose
    Conventions (∅, #, etc.)
    Ad hoc solutions (shifts, alternatives, etc.)
    Implicitness (e.g. coronal plosives)
    IPA issues (and “e” is not necessarily /e/!)

  12. Proto-Omotic to North Omotic
    e → i / #l_{P,C[+voiced]}
    e(:) → i(:) / #C[+sibilant]_{d,n,r}
    {u,a,i} → ∅ / _% (when stressed)
    Classical Arabic to Hadhrami Arabic
    dˤ q → ðˤ ɡ

  13. Problem #2: Typological research
    Insufficient empirical probabilities
    Database of sound changes
    Growing body of supporting research (Blevins
    2004, Kümmel 2007, Bybee 2015, Hruschka et
    al. 2015)
    Case of Index Diachronica

  14. Task #1: Forwards Reconstruction
    Smith (1969), PIE → Russian
    /aŋgʷʰi/ (“worm, snake”) → уж /uʂ/ (“adder”)
    Mignani (1971), P.-Romance → Franco-Provençal
    Burton-Hunter (1976), Latin → Old French
    Eastlack (1977), Latin → Spanish
    Bátori (1982), Proto-Uralic → Finnish/Hungarian

  15. Hartman (2003), a de facto programming language
    Generative phonology
    Remarkably powerful
    Notation very different from the usual

  17. $1: A_Coloring «/aw/, /aj/ > /ow/, /ej/»
    A: +syll (*) +low (*) -cons (*+1) -syll (*+1) +high (*+1)
    1: -low (*)
    2: back (*) = back (*+1)
    3: round (*) = round (*+1)
    END: A_Coloring

  18. Task #2: Backwards Reconstruction
    Hewson (1977) on Algonquian
    on average, 21 potential proto-forms handled per
    lexeme
    Lowe & Mazaudon (1994), Oakes (2000), Kondrak
    et al. (2007)

  19. Even a rule as simple as b > p, applied backwards to /papa/,
    yields four alternatives
    /baba/, /bapa/, /paba/, /papa/
    Rosenfelder’s SCA² on Portuguese “distrito” (cf.
    Sims-Williams, 2018)
    distrito
    districtus distriptus (dozen others) diiistericto divivistriviviptus
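
The b > p case on this slide can be reproduced with a few lines of Python. This is only a sketch under simple assumptions (an unconditioned rule, one character per segment, a hypothetical function name); it is not the algorithm of SCA² or AlteruPhono. Every observed /p/ is treated as ambiguous between an original /p/ and a changed /b/.

```python
from itertools import product


def backward_unconditioned(word: str, source: str, target: str) -> list[str]:
    """List proto-form candidates for an unconditioned rule source > target."""
    # Each occurrence of the target may be original or the result of the rule.
    options = [(target, source) if seg == target else (seg,) for seg in word]
    return sorted("".join(choice) for choice in product(*options))


if __name__ == "__main__":
    print(backward_unconditioned("papa", "b", "p"))
```

The number of candidates doubles with each ambiguous segment, which is one way to see why longer forms and longer rule chains lead to the explosion of reconstructions illustrated with "distrito".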

  20. AlteruPhono
    Both a Python library and a web tool
    Also intended for use without installation
    Formalization of notation (database, CLTS)
    PEG grammar
    Forwards and backwards direction
    https://github.com/tresoldi/alteruphono

  21. Currently
    Python library usable by programmers
    800 test rules (“stress tests”), (B)IPA features
    Real data (on-going)
    Proto-Algonquian to Shawnee (48 rules)
    Conversion of Hartman’s LS (1,800 forms)
    Toy dataset of PIE to RP English (10 words)
    */kʷetwṓr/ ➞ /fɔː/ (“four”)
    */h₂ḱowsyónom/ ➞ /hiə/ (“hear”)

  23. Example #1, Simple rule
    Rule: a ➞ e / _ #
    Forward: papa ➞ pape
    Backward: pape ➞ *papa, *pape

  24. Example #2, Sound classes
    Rule: b ➞ β / V _ V
    Forward: ibaba ➞ iβaβa
    Backward: iβaβa ➞ *ibaba, *iβaba, *ibaβa, *iβaβa
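
A minimal sketch of Example #2 in both directions follows. The vowel set standing in for the class V and the function names are assumptions for illustration, and each character counts as one segment; the code does not use or reproduce the AlteruPhono API.

```python
from itertools import product

VOWELS = set("aeiou")  # hypothetical membership of the sound class V


def between_vowels(word: str, i: int) -> bool:
    """True if position i has a vowel on both sides (the context V _ V)."""
    return 0 < i < len(word) - 1 and word[i - 1] in VOWELS and word[i + 1] in VOWELS


def forward(word: str) -> str:
    """Apply b > β / V _ V to every matching position at once."""
    return "".join(
        "β" if seg == "b" and between_vowels(word, i) else seg
        for i, seg in enumerate(word)
    )


def backward(word: str) -> list[str]:
    """List candidates: each intervocalic β may be original or come from b."""
    options = [
        ("β", "b") if seg == "β" and between_vowels(word, i) else (seg,)
        for i, seg in enumerate(word)
    ]
    return ["".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    print(forward("ibaba"))
    print(backward("iβaβa"))
```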

  25. Example #3, Back-references
    Rule: k V ➞ @2 / # _
    Forward: kira ➞ ira
    Backward: ira ➞ *kira, *ira
    Forward: koke ➞ oke
    Backward: oke ➞ *koke, *oke

  26. Example #4, Back-references with changes and alternatives
    Rule: p|k ➞ @1[+voiced] / V _ V
    Forward: apakak ➞ abagak
    Backward: abagak ➞ *abagak, *apagak, *abakak, *apakak
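
Example #4 changes a feature of the matched segment rather than naming the output sound. The sketch below models that with tiny feature bundles, in the spirit of treating segments as bundles of features. The feature table, the vowel set and all function names are assumptions made for this illustration (they are not BIPA data or AlteruPhono code).

```python
from itertools import product

VOWELS = set("aeiou")  # hypothetical membership of the sound class V

# Toy feature bundles, assumed for this example only.
FEATURES = {
    "p": {"place": "bilabial", "manner": "plosive", "voiced": False},
    "b": {"place": "bilabial", "manner": "plosive", "voiced": True},
    "k": {"place": "velar", "manner": "plosive", "voiced": False},
    "g": {"place": "velar", "manner": "plosive", "voiced": True},
}


def with_voicing(segment: str, voiced: bool) -> str:
    """Return the grapheme whose bundle equals the segment's with voicing replaced."""
    target = {**FEATURES[segment], "voiced": voiced}
    for grapheme, bundle in FEATURES.items():
        if bundle == target:
            return grapheme
    raise ValueError(f"no segment matches {target}")


def between_vowels(word: str, i: int) -> bool:
    return 0 < i < len(word) - 1 and word[i - 1] in VOWELS and word[i + 1] in VOWELS


def forward(word: str) -> str:
    """Apply p|k > @1[+voiced] / V _ V to every matching position at once."""
    return "".join(
        with_voicing(seg, True) if seg in ("p", "k") and between_vowels(word, i) else seg
        for i, seg in enumerate(word)
    )


def backward(word: str) -> list[str]:
    """List candidates: each intervocalic b/g may be original or come from p/k."""
    options = [
        (seg, with_voicing(seg, False))
        if seg in ("b", "g") and between_vowels(word, i)
        else (seg,)
        for i, seg in enumerate(word)
    ]
    return ["".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    print(forward("apakak"))
    print(backward("abagak"))
```

The final /k/ of the word is not between vowels, so it is left untouched in both directions.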

  27. Example #5, sets and mappings, alternatives
    Rule: {a,e,u} ➞ {e,i,o} / r _ | _ r
    Forward: areru ➞ eriro
    Backward: eriro ➞ *eriro, *ariro, *erero, *eriru,
    *arero, *ariru, *ereru, *areru
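
The set mapping with two alternative contexts in Example #5 can be sketched as a dictionary plus a context test. The names below are hypothetical and each character is treated as one segment; this is an illustration of the idea, not the AlteruPhono implementation.

```python
from itertools import product

# Positional mapping {a,e,u} > {e,i,o}, and its inverse for the backward direction.
MAPPING = {"a": "e", "e": "i", "u": "o"}
REVERSE = {post: ante for ante, post in MAPPING.items()}


def next_to_r(word: str, i: int) -> bool:
    """True if position i matches either context: r _ or _ r."""
    after_r = i > 0 and word[i - 1] == "r"
    before_r = i + 1 < len(word) and word[i + 1] == "r"
    return after_r or before_r


def forward(word: str) -> str:
    """Apply the mapping to every vowel in context, all positions at once."""
    return "".join(
        MAPPING[seg] if seg in MAPPING and next_to_r(word, i) else seg
        for i, seg in enumerate(word)
    )


def backward(word: str) -> list[str]:
    """List candidates: each e/i/o in context may be original or mapped from a/e/u."""
    options = [
        (seg, REVERSE[seg]) if seg in REVERSE and next_to_r(word, i) else (seg,)
        for i, seg in enumerate(word)
    ]
    return ["".join(choice) for choice in product(*options)]


if __name__ == "__main__":
    print(forward("areru"))
    print(backward("eriro"))
```

This sketch enumerates candidates without re-applying the rule forwards to each of them. Under a strict reading of the rule, such a check would prune candidates like *eriro, whose initial /e/ stands before /r/ and would itself be raised.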

  28. Multitiers
    Approach to tiers as extensions to alignments and
    features (List & Chacon, 2015; Tresoldi et al.,
    2018)
    “initial /t/ becomes /n/ if there is a nasal
    consonant anywhere in the word”
    tata ➞ tata
    taɲa ➞ naɲa
    tatatataɲatata ➞ natatataɲatata

  29. Tier  Seg-1  Seg-2  Seg-3  Seg-4
    sound   t      a      t      a
    sound   t      a      ɲ      a

  30. Tier           Seg-1  Seg-2  Seg-3  Seg-4
    sound            t      a      t      a
    nasal_in_word    False  False  False  False
    sound            t      a      ɲ      a
    nasal_in_word    True   True   True   True
    t[nasal_in_word] > n / # _
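
The tier idea above can be sketched by attaching a word-level `nasal_in_word` value to every segment, so that a rule only needs to inspect the position it rewrites. The nasal inventory, the dictionary layout and the function names are assumptions for this illustration and do not reflect the data structures of the multi-tier work cited on the previous slide.

```python
NASALS = set("mnɲŋ")  # hypothetical set of nasal consonants


def add_tiers(word: str) -> list[dict]:
    """Build one column per segment with a sound tier and a nasal_in_word tier."""
    has_nasal = any(seg in NASALS for seg in word)
    return [{"sound": seg, "nasal_in_word": has_nasal} for seg in word]


def apply_rule(word: str) -> str:
    """Apply t[nasal_in_word] > n / # _ using only the first column's tiers."""
    columns = add_tiers(word)
    if columns and columns[0]["sound"] == "t" and columns[0]["nasal_in_word"]:
        columns[0]["sound"] = "n"
    return "".join(column["sound"] for column in columns)


if __name__ == "__main__":
    for word in ("tata", "taɲa", "tatatataɲatata"):
        print(word, "➞", apply_rule(word))
```

Because the long-distance condition is precomputed as a tier, the rule itself stays local: it looks at a single column, however far away the nasal is.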

  31. Roadmap
    Get usable library and tool
    Consolidate notation
    Double implementation, JSON
    Feature system agnostic (BIPA default)
    Write paper for review and feedback
    Bootstrapping for other projects on hold
    inference of sound changes from cognates
    catalog
    attenuate homoplasy from sounds as states

  32. References
    BÁTORI, I., 1982. “Computersimulation in der linguistischen Forschung (Maschinelle Verifizierung der rekonstruierten Lautformen anhand uralischen Materials)”, Ural-altaische Jahrbücher, Neue Folge, 2. 1–18.
    BLEVINS, J., 2004. Evolutionary phonology: The emergence of sound patterns. Cambridge University Press.
    BOUCHARD-CÔTÉ, A.; HALL, D.; GRIFFITHS, T. L. & KLEIN, D., 2013. “Automated reconstruction of ancient languages using probabilistic models of sound change”, Proceedings of the National Academy of Sciences 110(11). 4224–4229.
    BURTON–HUNTER, S., 1976. “Romance etymology: A computerized model”, Computers and the Humanities 10. 217–220.
    BYBEE, J., 2015. “Articulatory processing and frequency of sound change”, The Oxford Handbook of Historical Phonology, 467–484.
    DAMERAU, F. J., 1975. “Mechanization of cognate recognition in comparative linguistics”, Linguistics: An International Journal 13(148). 5–29.
    EASTLACK, C. L., 1977. “Iberochange: A program to simulate systematic sound change in Ibero-Romance”, Computers and the Humanities 11. 81–88.
    GLEASON, H. A. Jr., 1959. “Counting and calculating for historical reconstruction”, Anthropological Linguistics 1(2). 22–32.

  33. HARTMAN, L., 2003. “Phono (Version 4.0): Software for modeling regular historical sound change”, in Leonel Ruiz Miyares, Celia E. Alvarez Moreno & María Rosa Alvarez Silva (eds.), Actas: VIII Simposio Internacional de Comunicación Social, Santiago de Cuba, 20–24 de enero del 2003, Volume I. Santiago de Cuba: Centro de Lingüística Aplicada, Ministerio Ciencia, Santiago de Cuba. 606–609.
    HEWSON, J., 1977. “Reconstructing prehistoric languages on the computer: The triumph of the electronic neogrammarian”, in A. Zampolli & N. Calzolari (eds.), Proceedings of the 5th Conference on Computational Linguistics, Pisa 1973, Volume I. Florence: Olschki. 263–273.
    HRUSCHKA, D. J.; BRANFORD, S.; SMITH, E. D.; WILKINS, J.; MEADE, A.; PAGEL, M. & BHATTACHARYA, T., 2015. “Detecting Regular Sound Changes in Linguistics as Events of Concerted Evolution”, Current Biology 25. 1–9.
    JÄGER, G. & LIST, J.-M., 2018. “Using ancestral state reconstruction methods for onomasiological reconstruction in multilingual word lists”, Language Dynamics and Change 8.1. 22–54.
    KÜMMEL, M. J., 2007. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Reichert, Wiesbaden.
    LIST, J.-M., 2014. Sequence comparison in historical linguistics. Düsseldorf University Press: Düsseldorf.
    LIST, J.-M. & CHACON, T., 2015. “Towards a Cross-Linguistic Database for Historical Phonology? A proposal for a machine readable modeling of phonetic context”, Leiden.
    LIST, J.-M.; GREENHILL, S. J. & GRAY, R., 2017. “The potential of automatic word comparison for historical linguistics”, PLOS ONE 12.1. 1–18.

  34. LOWE, J. B. & MAZAUDON, M., 1994. “The reconstruction engine: A computer implementation of the comparative method”, Computational Linguistics 20(3). 381–417.
    MIGNANI, R., 1971. “Review of Durham & Rogers 1969”, Computers and the Humanities 5(3). 191.
    OAKES, M., 2000. “Computer estimation of vocabulary in a protolanguage from word lists in four daughter languages”, Journal of Quantitative Linguistics 7(3). 233–243.
    KAY, M., 1964. The logic of cognate recognition in historical linguistics. Memorandum RM–4224–PR, prepared for United States Air Force Project Rand. Santa Monica, CA: The Rand Corporation.
    KONDRAK, G.; BECK, D. & DILTS, P., 2007. “Creating a comparative dictionary of Totonac–Tepehua”, in John Nerbonne, T. Mark Ellison & Grzegorz Kondrak (eds.), Computing and historical phonology: Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology. Prague: Association for Computational Linguistics. 134–141.
    SIMS‐WILLIAMS, P., 2018. “Mechanising historical phonology”, Transactions of the Philological Society 116(3). 555–573.
    SMITH, R. N., 1969. “A computer simulation of phonological change”, ITL: Tijdschrift voor Toegepaste Linguistiek 5. 82–91.
    TRESOLDI, T.; ANDERSON, C. & LIST, J.-M., 2018. “Modelling sound change with the help of multi-tiered sequence representations”, Poznań Linguistic Meeting, 2018-10-15.
    WIELING, M. & NERBONNE, J., 2015. “Advances in dialectometry”, Annual Review of Linguistics 1. 243–264.

  35. Thank you!