New Computational Approaches in Historical Linguistics

Presentation in Portuguese used on the first day of the workshop "New Computational Approaches in Historical Linguistics", part of the ABRALIN50 conference in Maceió (Brazil).

The basic techniques of historical linguistics, collectively known as the “comparative method”, have not changed significantly over the last centuries: intensive comparison of linguistic material, identification of regular correspondences, and reconstruction of the history of languages and their families. Its limits include an essentially manual execution (even when aided by computers), limited collaborative networks (especially for data sharing), and the fact that some crucial tasks, such as the identification of cognates, sometimes rely on non-formalized knowledge. Computational proposals to overcome these limits have been advanced for more than 50 years, with limited acceptance by the community (see the justifiably critical reception of M. Swadesh's glottochronology and of the proposals on Nostratic). During the last decade, however, we have witnessed a widespread and progressive acceptance of a new category of methods, influenced by Bayesian statistics and practices from biology, in response to challenges posed by certain language families (such as Sino-Tibetan or sign languages) and by the demands of transdisciplinarity and open access.

Starting from a proposal to complement the traditional approach, in which results must always be interpretable and whose purpose is to assist rather than replace researchers, this workshop will begin by reviewing the comparative method and presenting the principles of the new approaches, illustrating their limits and the criticisms already raised. Participants will learn to perform phonetic alignments, identify patterns of sound correspondence, detect cognates, construct phylogenies, and investigate total and partial colexifications. Emphasis will be placed on the appropriate preparation of linguistic databases for collaboration and processing. Although participants do not need to reproduce the demonstrations, all computer interaction can be reproduced individually in an open web interface (http://edictor.digling.org/).

The workshop is offered with the aim of disseminating the new methods and arousing new interest in historical linguistics. This workshop will be taught in Portuguese.
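
To give a rough idea of what "phonetic alignment" means in practice, the sketch below aligns two segmented words with the classic Needleman-Wunsch algorithm. It is only an illustration under simplifying assumptions: segments are compared by identity with arbitrary scores (`match`, `mismatch`, `gap` are made-up defaults), whereas dedicated tools typically score segments by phonetic similarity. The function name `align` and the simplified transcriptions are assumptions introduced here, not material from the workshop.

```python
def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Globally align two lists of segments (Needleman-Wunsch).

    Returns the two gapped sequences and the alignment score.
    Scoring is by segment identity only, a deliberate simplification.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two segments
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


# Simplified, illustrative transcriptions of German "Nacht" and English "night".
NACHT = ["n", "a", "x", "t"]
NIGHT = ["n", "aɪ", "t"]

aligned_a, aligned_b, total = align(NACHT, NIGHT)
print("\t".join(aligned_a))
print("\t".join(aligned_b))
print("score:", total)
```

Several alignments can share the best score; the traceback above simply returns the first one it finds, which is one reason manual inspection of the results remains important.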

Tiago Tresoldi

May 02, 2019

Transcript

  1. New computational approaches
    in historical linguistics
    (day 1)
    Tiago Tresoldi
    Computer-Assisted Language Comparison ERC Group
    (CALC)
    Max-Planck-Institut für Menschheitsgeschichte
    (MPI-SHH / Jena, Germany)
    Maceió, 2019-05-02/03/04

  2. About this workshop

    Objectives

    Disseminate and begin training in the new methods and in
    linguistic data management

    Start scientific collaborations, especially on the
    native languages of South America

    Format

    First day: lecture with discussion

    Second and third days: hands-on

  3. Historical linguistics
    By Minna Sundberg

  4. NeighbourNet for the Dene–Yeniseian languages (Sicoli and Holton, 2014)

  5. Radial tree for the world's languages, based on ASJP (Jäger and Wichmann, 2014)

  6. Bouckaert et al. (2012)

  7. Historical linguistics

    Scientific study of language change and evolution
    over time

    Includes and relates to, among others,

    comparative linguistics

    dialectology and sociology

    phonology and psycholinguistics

    philology and philosophy of language

    Often equated with the comparative method
    or even with Indo-European studies

    “proto-form fetish”

  8. “Quantitative turn”

    Quantitative evidence has been used since the
    beginnings of the discipline

    The first properly statistical studies
    were published by Sapir (1916), Kroeber and Chretien
    (1937) and Ross (1950)

    Computational methods begin with the
    contested approaches of lexicostatistics and
    glottochronology in the 1950s

    Morris Swadesh and his lists

    Joseph Greenberg

    Sergei Starostin and the Moscow School

  9. Cladistics and Phylogenetics - I

    CPHL project (Computational Phylogenetics in
    Historical Linguistics) at Rice University in the early
    1990s, led by Donald Ringe

    In the mid-1990s, Ringe forms another
    group at the University of Pennsylvania

    Media success with the work of Gray &
    Atkinson (2003), published in Nature

    Strong reaction from traditional historical linguistics

    However, phylogenetic analyses have been published with
    increasing frequency

  12. Data - I

    Methods come and go, (good) data is forever

    “FAIR” data (Wilkinson 2016)

    Findable

    Accessible

    Interoperable

    Reusable

    Basically, a relational data model,
    stored in a plain-text format, with external
    reference catalogues

    Compliance with these principles needs to be enforced
    automatically
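
The idea of a relational model kept in plain text, linked to external catalogues and checked automatically, can be sketched as follows. This is a minimal illustration, not the workshop's actual tooling: the table layout, the column names (`Language_ID`, `Glottocode`, and so on) and the function names are assumptions made for the example, and the identifiers are shown only for illustration and should be verified against the catalogues themselves. The third form deliberately points to a language missing from the language table so that the check has something to report.

```python
import csv
import io
import re

# Two small relational tables stored as plain text (CSV).
# Column names and values are illustrative assumptions.
LANGUAGES_CSV = """ID,Name,Glottocode
por,Portuguese,port1283
spa,Spanish,stan1288
"""

FORMS_CSV = """ID,Language_ID,Concept,Form
1,por,HAND,mão
2,spa,HAND,mano
3,ita,HAND,mano
"""

# Glottocodes consist of four alphanumeric characters followed by four digits.
GLOTTOCODE = re.compile(r"^[a-z0-9]{4}[0-9]{4}$")


def read_table(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(languages, forms):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    known_languages = {row["ID"] for row in languages}
    for row in languages:
        if not GLOTTOCODE.match(row["Glottocode"]):
            problems.append(
                f"language {row['ID']}: malformed Glottocode {row['Glottocode']!r}"
            )
    for row in forms:
        # Every form must point to a row in the language table (foreign key).
        if row["Language_ID"] not in known_languages:
            problems.append(
                f"form {row['ID']}: unknown Language_ID {row['Language_ID']!r}"
            )
        if not row["Form"].strip():
            problems.append(f"form {row['ID']}: empty Form")
    return problems


for problem in validate(read_table(LANGUAGES_CSV), read_table(FORMS_CSV)):
    print(problem)
```

A check of this kind only tests the format of an identifier and the links between tables. Confirming that an identifier really exists would require comparing it against a copy of the reference catalogue.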

  13. Data - II

  14. WALS
    http://wals.info

  15. Glottolog
    http://glottolog.org

  16. Concepticon - I
    https://concepticon.clld.org

  17. Concepticon - II

  18. CLICS (I)

  19. CLICS (II)
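
CLICS is a database of cross-linguistic colexifications, that is, cases in which a language uses the same word form for two different concepts. The sketch below shows one simple way to find total colexifications (identical forms) in a small wordlist; partial colexifications, where only part of a form is shared, are not covered. The tuple layout, the function name `colexifications` and the romanized forms are assumptions made for this illustration and do not reflect how CLICS itself stores or computes its data.

```python
from collections import defaultdict
from itertools import combinations

# (language, concept, form) triples; forms are plain romanized words,
# chosen only to illustrate the idea.
WORDLIST = [
    ("Spanish", "FINGER", "dedo"),
    ("Spanish", "TOE", "dedo"),
    ("Spanish", "HAND", "mano"),
    ("Russian", "HAND", "ruka"),
    ("Russian", "ARM", "ruka"),
    ("English", "HAND", "hand"),
    ("English", "ARM", "arm"),
    ("English", "FINGER", "finger"),
    ("English", "TOE", "toe"),
]


def colexifications(wordlist):
    """Map each colexified concept pair to the set of languages showing it."""
    # Group the concepts expressed by the same form within one language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts sharing a form counts as one colexification.
    pairs = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for first, second in combinations(sorted(concepts), 2):
            pairs[(first, second)].add(language)
    return dict(pairs)


for (first, second), languages in sorted(colexifications(WORDLIST).items()):
    print(first, second, sorted(languages))
```

Counting in how many languages, and in how many unrelated families, a given pair recurs is what turns such a list into a network of related concepts.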

  20. Cross-linguistic databases

    Lexibank (in preparation)

    Sound Comparisons (in preparation)

    IE-CoR (in preparation)

    Diachronic Atlas of Comparative Linguistics (DiACL)

  21. Lexibank

  22. Edictor
    http://edictor.digling.org

  23. For tomorrow

    Explore, have fun with, and critique the databases
    presented

    We will cover alignments, cognate detection and basic aspects
    of phylogenetics

    Study the two databases for this workshop (Indo-European
    and Tucanoan)

    Think about how to organize your own data

    Open a CodeOcean account and run our
    “capsule” (optional)

  24. References and Essential Bibliography

    Anttila, Raimo. Historical and Comparative Linguistics. 2nd edition. Philadelphia: John
    Benjamins, 1989.

    Bassetto, Bruno Fregni. Elementos de filologia românica. São Paulo: EDUSP, 2001.

    Beekes, Robert S. P. Comparative Indo-European Linguistics. Amsterdam: John Benjamins,
    1995.

    Bowern, Claire; Evans, Bethwyn. The Routledge Handbook of Historical Linguistics. London:
    Routledge, 2014.

    Campbell, Lyle. Historical Linguistics: An Introduction. 3rd edition. Cambridge, Massachusetts:
    The MIT Press, 2013.

    Hoenigswald, Henry M. Language Change and Linguistic Reconstruction. Chicago: University of
    Chicago Press, 1960.

    List, Johann-Mattis; Walworth, Mary; Greenhill, Simon; Tresoldi, Tiago; Forkel, Robert. “Sequence
    comparison in computational historical linguistics”. Journal of Language Evolution 3.2
    (2018): 130-144.

    Trask, Robert L. (Ed.) Dictionary of Historical and Comparative Linguistics. Chicago: Fitzroy
    Dearborn, 2001.
