Increasing the comparability of linguistic data

Talk held at the Division of Linguistic and Multilingual Studies (Nanyang Technological University, Singapore)

Johann-Mattis List

February 24, 2017

Transcript

  1. Increasing the Comparability of Linguistic Data
    Johann-Mattis List
    Department of Linguistic and Cultural Evolution
    Max Planck Institute for the Science of Human History
    Jena
    2017/02/24
    1 / 37

  2. Prolog
    Prolog
    2 / 37

  3. Prolog
    Juggling
    3 / 37

  4. Prolog
    Juggling
    3 / 37

  5. Prolog
    Juggling
    3 / 37

  6. Prolog
    Moral of the Story
    Restricting our perspective by modeling and formalizing
    the phenomena we are dealing with may actually open our
    eyes to details we had disregarded before.
    4 / 37

  7. Problems
    Problems
    5 / 37

  8. Problems
    General Data Problem in Linguistics
    6 / 37

  9. Problems
    General Data Problem in Linguistics
    Linguists face very complex problems in their research,
    but they tend to overemphasize that complexity. As a
    result, they refuse to handle even the things that could
    be handled easily. Instead of “Yes, we can!”, linguists
    tend to say “Can we really?”
    6 / 37

  10. Problems
    General Data Problem in Linguistics
    → application of methods
    → representation of results
    → replication of analyses
    6 / 37

  11. Problems Application
    Application
    7 / 37

  12. Problems Application
    Application
    7 / 37

  13. Problems Representation
    Representation
    8 / 37

  14. Problems Representation
    Representation
    Frucht, ferner fruchten, befruchten, Befruchtung,
    fruchtbar, fruchtig
    Frucht f. ‘der Fortpflanzung der eigenen Art dienendes
    Produkt einer Pflanze’, auch ‘ungeborenes Lebewesen’,
    übertragen ‘Ertrag’, ahd. fruht (9. Jh.), mhd. vruht,
    asächs. fruht, mnd. mnl. nl. vrucht beruhen auf einer
    frühen Entlehnung von gleichbed. lat. frūctus,
    abgeleitet vom Verb lat. fruī (frūctus sum) ‘genießen,
    Nutzen ziehen’ (verwandt mit brauchen, s. d.). Das
    Deminutiv Früchtchen hat die spezielle Bedeutung
    [...]
    German "Frucht" in Pfeifer (1993, also at http://dwds.de)
    8 / 37

  15. Problems Representation
    Representation
    Frucht, ferner fruchten, befruchten, Befruchtung,
    fruchtbar, fruchtig
    Frucht f. ‘der Fortpflanzung der eigenen Art dienendes
    Produkt einer Pflanze’, auch ‘ungeborenes Lebewesen’,
    übertragen ‘Ertrag’, ahd. fruht (9. Jh.), mhd. vruht,
    asächs. fruht, mnd. mnl. nl. vrucht beruhen auf einer
    frühen Entlehnung von gleichbed. lat. frūctus,
    abgeleitet vom Verb lat. fruī (frūctus sum) ‘genießen,
    Nutzen ziehen’ (verwandt mit brauchen, s. d.). Das
    Deminutiv Früchtchen hat die spezielle Bedeutung
    [...]
    German "Frucht" in Pfeifer (1993, also at http://dwds.de)
    8 / 37

  16. Problems Representation
    Representation
    Frucht, ferner fruchten, befruchten, Befruchtung,
    fruchtbar, fruchtig
    Frucht f. ‘der Fortpflanzung der eigenen Art dienendes
    Produkt einer Pflanze’, auch ‘ungeborenes Lebewesen’,
    übertragen ‘Ertrag’, ahd. fruht (9. Jh.), mhd. vruht,
    asächs. fruht, mnd. mnl. nl. vrucht beruhen auf einer
    frühen Entlehnung von gleichbed. lat. frūctus,
    abgeleitet vom Verb lat. fruī (frūctus sum) ‘genießen,
    Nutzen ziehen’ (verwandt mit brauchen, s. d.). Das
    Deminutiv Früchtchen hat die spezielle Bedeutung
    [...]
    [Diagram: etymological network; edges are labeled "inherited from", "borrowed from", or "derived from"]
    PIE *bhreu̯Hg̑- “to use”
    PIE *bhruHg̑-ié- “to use” (present tense)
    PGM *ƀrūkan- “to use”
    OHG brūhhan “to use”
    G brauchen “to use”
    G Brauch “custom”
    OHG fruht “profit, fruit”
    G frugal “modest (food)”
    Fr fruit “profit, fruit”
    Fr frugal “modest (food)”
    Lt fruor, fruī “I enjoy”
    Lt frūctus “profit”
    Lt frux “fruit, grain”
    Lt frugalis “bring profit”
    Adapted from an illustration by Hans Geisler (University of Düsseldorf)
    German "Frucht" in Pfeifer (1993, also at http://dwds.de)
    8 / 37

  17. Problems Representation
    Representation
    Entry for PIE *kʷetware in Tower of Babel (http://starling.rinet.ru)
    8 / 37

  18. Problems Representation
    Representation
    Insufficiencies of Data Representation
    - data in “textual form” (impossible to search efficiently)
    - no standardized phonetic representations
    - no standardized glosses for meanings
    - no standardized names or abbreviations for languages and dialects
    - no standardized representation of sound correspondences
    - no standardized assignment of cognate sets and borrowings
    - ...
    9 / 37

  19. Problems Replication
    Replication
    10 / 37

  20. Problems Replication
    Replication
    Gloss        Blust          Pawley         Distance
    “day”        *qaco          *qaco          0
    “to spit”    *qanusi        *qanusi        0
    “person”     *taumataq      *tamwata       3
    “to vomit”   *mumutaq       *mumuta        1
    “name”       *ŋajan         *qajan         1
    “snake”      *mwata         *mwata         0
    “man”        *mwa ruqane    *taumwaqane    5
    “four”       *pani          *pat           2
    “one”        *sakai         *tasa          3
    ...          ...            ...            ...
    Disagreement between experts on PO reconstructions (Bouchard-Côté et al. 2014)
    10 / 37
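The Distance column in the table above appears to be consistent with a plain character-level edit distance between the two reconstructions. The sketch below is an assumption about how such a column could be computed, not necessarily the procedure used in the cited study; the function name `edit_distance` is illustrative, the asterisks are dropped, and the space in *mwa ruqane is removed.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimal number of insertions,
    deletions and substitutions that turn sequence a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Reconstructions from the table (Blust, Pawley), asterisks removed.
pairs = {
    "person": ("taumataq", "tamwata"),
    "to vomit": ("mumutaq", "mumuta"),
    "name": ("ŋajan", "qajan"),
    "man": ("mwaruqane", "taumwaqane"),  # space in the Blust form dropped
    "four": ("pani", "pat"),
    "one": ("sakai", "tasa"),
}
distances = {gloss: edit_distance(x, y) for gloss, (x, y) in pairs.items()}
for gloss, value in distances.items():
    print(gloss, value)
```

Counting characters rather than sounds is a simplification: a digraph such as "mw" is treated as two symbols, which happens not to matter for these rows.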

  21. Problems Replication
    Replication
    Reproducibility Problems in Historical Linguistics
    Scholars disagree on many points in historical linguistics, be
    it the number of laryngeals, the position of Baltic and Slavic,
    or whether a given word was borrowed or not.
    We know well that no two etymological dictionaries for the
    same language or language family are completely identical.
    Unfortunately, we lack a rigorous check of the degree to
    which experts actually agree or disagree in their judgments.
    We also lack evaluation methods that would help us show
    to what degree a given hypothesis (a reconstruction, a family
    tree, or an etymology) corresponds with our linguistic data.
    10 / 37

  22. Increasing Comparability
    Increasing Comparability
    11 / 37

  23. Increasing Comparability Formats
    Cross-Linguistic Data Formats
    12 / 37

  24. Increasing Comparability Formats
    Cross-Linguistic Data Formats
    Key Aspects of CLDF
    - use CSV (comma-separated values) as the basic format for tabular data
    - use JSON (a key-value data format) for metadata
    - define how standard columns of the data (languages/doculects, concepts, transcriptions, grammatical features, etc.) should be treated
    - provide an API that checks the consistency of datasets
    - provide sample datasets that illustrate the data format
    - provide applications which handle CLDF (for example, in automatic analyses)
    13 / 37
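The combination of a CSV table, JSON metadata and a consistency check can be illustrated with the Python standard library alone. This is a minimal sketch under stated assumptions: the metadata key `required_columns`, the function `check_dataset` and the specific checks are hypothetical and do not reproduce the CLDF specification or its API.

```python
import csv
import io
import json

# Hypothetical metadata; the key name "required_columns" is an assumption.
METADATA = json.dumps({
    "title": "toy word list",
    "required_columns": ["ID", "DOCULECT", "CONCEPT", "TRANSCRIPTION"],
})

TABLE = """ID,DOCULECT,CONCEPT,TRANSCRIPTION
1,German,Woldemort,valdəmar
2,English,Woldemort,wɔldəmɔrt
"""


def check_dataset(table_text, metadata_text):
    """Return a list of problems found in the table (empty if none)."""
    required = json.loads(metadata_text)["required_columns"]
    reader = csv.DictReader(io.StringIO(table_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems  # row-level checks need all required columns
    seen_ids = set()
    for row in reader:
        row_id = row.get("ID")
        if row_id in seen_ids:
            problems.append(f"duplicate ID: {row_id}")
        seen_ids.add(row_id)
        for name in required:
            if not row[name]:  # empty string or missing cell
                problems.append(f"empty {name} in row {row_id}")
    return problems


print(check_dataset(TABLE, METADATA))
```

Keeping the expectations in the metadata rather than in the code means the same checker can be reused for datasets with different column sets.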

  25. Increasing Comparability Standards
    Standards for Word Lists and Lexical Data
    14 / 37

  26. Increasing Comparability Standards
    Standards for Word Lists and Lexical Data
    14 / 37

  27. Increasing Comparability Standards
    Word Lists and Lexical Data
    Word Lists in CLDF
    - define each row as a word
    - indicate the language in which this word is spoken in one column
    - indicate the meaning in another column
    - provide information on the form in additional columns
    15 / 37
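One consequence of the one-word-per-row layout is that regrouping the data is a simple loop. The sketch below assumes a comma-separated rendering of the toy rows shown on the following slide; the function name `forms_by_concept` is illustrative and not part of any CLDF tool.

```python
import csv
import io
from collections import defaultdict

WORDLIST = """ID,DOCULECT,CONCEPT,TRANSCRIPTION
1,German,Woldemort,valdəmar
2,English,Woldemort,wɔldəmɔrt
3,Chinese,Woldemort,fu⁵¹ti⁵¹mɔ³⁵
4,Russian,Woldemort,vladimir
10,German,Harry,haralt
11,English,Harry,hæri
12,Russian,Harry,gali
"""


def forms_by_concept(text):
    """Regroup word rows as {concept: {doculect: [forms]}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        # A doculect may have several words per concept, so forms are lists.
        table[row["CONCEPT"]].setdefault(row["DOCULECT"], []).append(row["TRANSCRIPTION"])
    return dict(table)


for concept, words in forms_by_concept(WORDLIST).items():
    print(concept, words)
```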

  28. Increasing Comparability Standards
    Standards for Word Lists and Lexical Data
    ID    DOCULECT   CONCEPT     TRANSCRIPTION   ...
    1     German     Woldemort   valdəmar        ...
    2     English    Woldemort   wɔldəmɔrt       ...
    3     Chinese    Woldemort   fu⁵¹ti⁵¹mɔ³⁵    ...
    4     Russian    Woldemort   vladimir        ...
    ...   ...        ...         ...             ...
    10    German     Harry       haralt          ...
    11    English    Harry       hæri            ...
    12    Russian    Harry       gali            ...
    ...   ...        ...         ...             ...
    16 / 37

  29. Increasing Comparability Applications
    Applications
    The CLDF Python API (Forkel et al. 2016) simplifies the
    handling and testing of datasets in CLDF format. With the
    help of the API and its extensions, scholars can test whether
    their data conforms to the format. With additional software,
    most of which is already available, one can easily compute
    statistics from the data. Last but not least, tools which handle
    CLDF data can be used for automatic analysis (LingPy, List
    and Forkel 2016, http://lingpy.org) or for manual curation
    (EDICTOR, List 2017, http://edictor.digling.org).
    17 / 37

  30. Increasing Comparability Applications
    Applications
    18 / 37

  31. Examples
    Examples
    19 / 37

  32. Examples Concepticon
    Concepticon
    Concept List # Items Concept Label Concept ID
    Allen (2007) 500 animal oil; 动物油(脂肪) GREASE (CONCEPTICON-ID: 323)
    Gregersen (1976) 217 fat-grease*fat-grease GREASE (CONCEPTICON-ID: 323)
    Heggarty (2005) 150 fat (grease); grasa GREASE (CONCEPTICON-ID: 323)
    Swadesh (1955) 100 fat (grease) GREASE (CONCEPTICON-ID: 323)
    Alpher and Nash (1999) 151 fat, grease GREASE (CONCEPTICON-ID: 323)
    Hale (1961) 100 fat, grease GREASE (CONCEPTICON-ID: 323)
    OGrady and Klokeid (1969) 100 fat, grease GREASE (CONCEPTICON-ID: 323)
    Blust (2008) 210 fat/grease GREASE (CONCEPTICON-ID: 323)
    Matisoff (1978) 200 fat/grease GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) 218 fat/grease GREASE (CONCEPTICON-ID: 323)
    Dunn et al. (2012) 207 fat GREASE (CONCEPTICON-ID: 323)
    Swadesh (1950) 215 fat GREASE (CONCEPTICON-ID: 323)
    Zgraggen (1980) 380 fat GREASE (CONCEPTICON-ID: 323)
    Jachontov (1991) 100 fat n. GREASE (CONCEPTICON-ID: 323)
    Wiktionary (2003) 207 fat (noun) GREASE (CONCEPTICON-ID: 323)
    Starostin (1991) 110 fat n.; жир GREASE (CONCEPTICON-ID: 323)
    TeilDautrey et al. (2008) 430 fat, oil GREASE (CONCEPTICON-ID: 323)
    Swadesh (1952) 200 fat (organic substance) GREASE (CONCEPTICON-ID: 323)
    Shiro (1973) 200 grease (fat) GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) 100 grease; graisse; Fett; grasa GREASE (CONCEPTICON-ID: 323)
    Wang (2006) 200 pig oil; 猪油 GREASE (CONCEPTICON-ID: 323)
    Haspelmath and Tadmor (2009) 1460 the grease or fat GREASE (CONCEPTICON-ID: 323)
    20 / 37

  33. Examples Concepticon
    Concepticon
    Concept List # Items Concept Label Concept ID
    Allen (2007) 500 animal oil; 动物油(脂肪) GREASE (CONCEPTICON-ID: 323)
    Gregersen (1976) 217 fat-grease*fat-grease GREASE (CONCEPTICON-ID: 323)
    Heggarty (2005) 150 fat (grease); grasa GREASE (CONCEPTICON-ID: 323)
    Swadesh (1955) 100 fat (grease) GREASE (CONCEPTICON-ID: 323)
    Alpher and Nash (1999) 151 fat, grease GREASE (CONCEPTICON-ID: 323)
    Hale (1961) 100 fat, grease GREASE (CONCEPTICON-ID: 323)
    OGrady and Klokeid (1969) 100 fat, grease GREASE (CONCEPTICON-ID: 323)
    Blust (2008) 210 fat/grease GREASE (CONCEPTICON-ID: 323)
    Matisoff (1978) 200 fat/grease GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) 218 fat/grease GREASE (CONCEPTICON-ID: 323)
    Dunn et al. (2012) 207 fat GREASE (CONCEPTICON-ID: 323)
    Swadesh (1950) 215 fat GREASE (CONCEPTICON-ID: 323)
    Zgraggen (1980) 380 fat GREASE (CONCEPTICON-ID: 323)
    Jachontov (1991) 100 fat n. GREASE (CONCEPTICON-ID: 323)
    Wiktionary (2003) 207 fat (noun) GREASE (CONCEPTICON-ID: 323)
    Starostin (1991) 110 fat n.; жир GREASE (CONCEPTICON-ID: 323)
    TeilDautrey et al. (2008) 430 fat, oil GREASE (CONCEPTICON-ID: 323)
    Swadesh (1952) 200 fat (organic substance) GREASE (CONCEPTICON-ID: 323)
    Shiro (1973) 200 grease (fat) GREASE (CONCEPTICON-ID: 323)
    Samarin (1969) 100 grease; graisse; Fett; grasa GREASE (CONCEPTICON-ID: 323)
    Wang (2006) 200 pig oil; 猪油 GREASE (CONCEPTICON-ID: 323)
    Haspelmath and Tadmor (2009) 1460 the grease or fat GREASE (CONCEPTICON-ID: 323)
    Concept labels for “GREASE” in 22 different concept lists (see List et al. 2015,
    online at http://concepticon.clld.org)
    20 / 37

  34. Examples Concepticon
    Concepticon
    Concept labels for “GREASE” in 22 different concept lists (see List et al. 2015,
    online at http://concepticon.clld.org)
    Concept List # Items Concept Label Concept ID
    Allen (2007) 500 animal oil; 动物油(脂肪) GREASE (CONCEPTICON-ID:323)
    Gregersen (1976) 217 fat-grease*fat-grease GREASE (CONCEPTICON-ID:323)
    Heggarty (2005) 150 fat (grease); grasa GREASE (CONCEPTICON-ID:323)
    Swadesh (1955) 100 fat (grease) GREASE (CONCEPTICON-ID:323)
    Alpher and Nash (1999) 151 fat, grease GREASE (CONCEPTICON-ID:323)
    Hale (1961) 100 fat, grease GREASE (CONCEPTICON-ID:323)
    OGrady and Klokeid (1969) 100 fat, grease GREASE (CONCEPTICON-ID:323)
    Blust (2008) 210 fat/grease GREASE (CONCEPTICON-ID:323)
    Matisoff (1978) 200 fat/grease GREASE (CONCEPTICON-ID:323)
    Samarin (1969) 218 fat/grease GREASE (CONCEPTICON-ID:323)
    Dunn et al. (2012) 207 fat GREASE (CONCEPTICON-ID:323)
    Swadesh (1950) 215 fat GREASE (CONCEPTICON-ID:323)
    Zgraggen (1980) 380 fat GREASE (CONCEPTICON-ID:323)
    Jachontov (1991) 100 fat n. GREASE (CONCEPTICON-ID:323)
    Wiktionary (2003) 207 fat (noun) GREASE (CONCEPTICON-ID:323)
    Starostin (1991) 110 fat n.; жир GREASE (CONCEPTICON-ID:323)
    TeilDautrey et al. (2008) 430 fat, oil GREASE (CONCEPTICON-ID:323)
    Swadesh (1952) 200 fat (organic substance) GREASE (CONCEPTICON-ID:323)
    Shiro (1973) 200 grease (fat) GREASE (CONCEPTICON-ID:323)
    Samarin (1969) 100 grease; graisse; Fett; grasa GREASE (CONCEPTICON-ID:323)
    Wang (2006) 200 pig oil; 猪油 GREASE (CONCEPTICON-ID:323)
    Haspelmath and Tadmor (2009) 1460 the grease or fat GREASE (CONCEPTICON-ID:323)
    20 / 37

  35. Examples Concepticon
    Concepticon
    Concept labels for “GREASE” in 22 different concept lists (see List et al. 2015,
    online at http://concepticon.clld.org)
    Concept List # Items Concept Label Concept ID
    Allen (2007) 500 animal oil; 动物油(脂肪) GREASE (CONCEPTICON-ID:323)
    Gregersen (1976) 217 fat-grease*fat-grease GREASE (CONCEPTICON-ID:323)
    Heggarty (2005) 150 fat (grease); grasa GREASE (CONCEPTICON-ID:323)
    Swadesh (1955) 100 fat (grease) GREASE (CONCEPTICON-ID:323)
    Alpher and Nash (1999) 151 fat, grease GREASE (CONCEPTICON-ID:323)
    Hale (1961) 100 fat, grease GREASE (CONCEPTICON-ID:323)
    OGrady and Klokeid (1969) 100 fat, grease GREASE (CONCEPTICON-ID:323)
    Blust (2008) 210 fat/grease GREASE (CONCEPTICON-ID:323)
    Matisoff (1978) 200 fat/grease GREASE (CONCEPTICON-ID:323)
    Samarin (1969) 218 fat/grease GREASE (CONCEPTICON-ID:323)
    Dunn et al. (2012) 207 fat GREASE (CONCEPTICON-ID:323)
    Swadesh (1950) 215 fat GREASE (CONCEPTICON-ID:323)
    Zgraggen (1980) 380 fat GREASE (CONCEPTICON-ID:323)
    Jachontov (1991) 100 fat n. GREASE (CONCEPTICON-ID:323)
    Wiktionary (2003) 207 fat (noun) GREASE (CONCEPTICON-ID:323)
    Starostin (1991) 110 fat n.; жир GREASE (CONCEPTICON-ID:323)
    TeilDautrey et al. (2008) 430 fat, oil GREASE (CONCEPTICON-ID:323)
    Swadesh (1952) 200 fat (organic substance) GREASE (CONCEPTICON-ID:323)
    Shiro (1973) 200 grease (fat) GREASE (CONCEPTICON-ID:323)
    Samarin (1969) 100 grease; graisse; Fett; grasa GREASE (CONCEPTICON-ID:323)
    Wang (2006) 200 pig oil; 猪油 GREASE (CONCEPTICON-ID:323)
    Haspelmath and Tadmor (2009) 1460 the grease or fat GREASE (CONCEPTICON-ID:323)
    20 / 37

  36. Examples Concepticon
    Concepticon
    Concepticon (List et al. 2016)
    - link concept labels in published concept lists (questionnaires) to concept sets
    - link concept sets to metadata
    - define relations between concept sets
    - never link one concept in a given list to more than one concept set (guarantees consistency)
    - provide an API to check the consistency of the data and to query the data
    - provide a web interface to browse through the data
    21 / 37
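The rule that a concept in a given list is never linked to more than one concept set lends itself to an automatic check. The following is a minimal sketch of such a check, assuming links are available as simple triples; it is not the Concepticon API, and the function name `inconsistent_links` is made up for illustration.

```python
from collections import defaultdict

# (concept list, concept label, concept set); rows taken from the GREASE table.
LINKS = [
    ("Swadesh (1955)", "fat (grease)", "GREASE"),
    ("Blust (2008)", "fat/grease", "GREASE"),
    ("Wang (2006)", "pig oil; 猪油", "GREASE"),
]


def inconsistent_links(links):
    """Return {(list, label): [concept sets]} for every label that is
    linked to more than one concept set."""
    targets = defaultdict(set)
    for concept_list, label, concept_set in links:
        targets[(concept_list, label)].add(concept_set)
    return {key: sorted(sets) for key, sets in targets.items() if len(sets) > 1}


print(inconsistent_links(LINKS))
```

An empty result means the links respect the rule; any entry in the returned dictionary names a label that would need manual resolution.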

  37. Examples Concepticon
    Concepticon
    [Diagram: concept labels from different concept lists (STONE, EGG, FOOT; THE STONE, THE EGG, THE LEG; STONE (FRUIT), EGG (CHICKEN), FOOT/LEG) linked to the concept sets STONE, EGG, LEG, FOOT]
    http://concepticon.clld.org
    22 / 37

  38. Examples Concepticon
    Concepticon
    [Diagram of the Concepticon data model; node labels: CONCEPT SET, CONCEPT, CONCEPT LIST, CONCEPT LABEL, COMPILER, SOURCE, NOTE]
    22 / 37

  39. Examples Concepticon
    Concepticon
    http://concepticon.clld.org
    23 / 37

  40. Examples CLICS
    CLICS
    Semantic change plays a crucial role in language change.
    Although most linguists assume that it proceeds according
    to certain general patterns, we currently lack the empirical
    basis to pursue the question in depth. Normally, semantic
    change proceeds by cumulation and reduction.
    24 / 37

  41. Examples CLICS
    CLICS
    Stage            Form      Segments    Meaning            Phase
    Proto-Germanic   *kuppa-   k u pː a    “vessel”           MONOSEMY PHASE
    Pre-German       *kop      k ɔ p       “head”, “vessel”   POLYSEMY PHASE
    German           Kopf      k ɔ p͡f      “head”             MONOSEMY PHASE
    CUMULATION leads from the first stage to the second, REDUCTION from the second to the third.
    24 / 37

  42. Examples CLICS
    CLICS
    “cup”
    CONTEST
    TROPHY
    [kʌp] CUP
    English polysemy structure for cup
    24 / 37

  43. Examples CLICS
    CLICS
    “head, cup”
    CUP
    HEAD
    [kɔp] TOP
    Dutch polysemy structure for kop
    24 / 37

  44. Examples CLICS
    CLICS
    “head”
    HEAD
    TOP
    [kɔp͡f] CHIEF
    German polysemy structure for Kopf
    24 / 37

  45. Examples CLICS
    CLICS
    Key     Concept        Russian     German        ...
    1.1     world          mir, svet   Welt          ...
    1.21    earth, land    zemlja      Erde, Land    ...
    1.212   ground, soil   počva       Erde, Boden   ...
    1.420   tree           derevo      Baum          ...
    1.430   wood           derevo      Wald          ...
    ...     ...            ...         ...           ...
    25 / 37
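In the table above, Russian derevo covers both "tree" and "wood", and German Erde covers both "earth, land" and "ground, soil". Finding such cases automatically can be sketched as follows, assuming that two concepts count as colexified when a language lists an identical form for both; the function name and data layout are illustrative and do not reproduce the CLICS implementation.

```python
from collections import defaultdict
from itertools import combinations

# (concept, language, forms), taken from the table rows above.
ROWS = [
    ("world", "Russian", ["mir", "svet"]),
    ("earth, land", "Russian", ["zemlja"]),
    ("ground, soil", "Russian", ["počva"]),
    ("tree", "Russian", ["derevo"]),
    ("wood", "Russian", ["derevo"]),
    ("world", "German", ["Welt"]),
    ("earth, land", "German", ["Erde", "Land"]),
    ("ground, soil", "German", ["Erde", "Boden"]),
    ("tree", "German", ["Baum"]),
    ("wood", "German", ["Wald"]),
]


def colexifications(rows):
    """Map each pair of concepts to the (language, form) pairs that express both."""
    concepts_by_form = defaultdict(set)
    for concept, language, forms in rows:
        for form in forms:
            concepts_by_form[(language, form)].add(concept)
    links = defaultdict(list)
    for (language, form), concepts in sorted(concepts_by_form.items()):
        for pair in combinations(sorted(concepts), 2):
            links[pair].append((language, form))
    return dict(links)


for pair, evidence in colexifications(ROWS).items():
    print(pair, evidence)
```

Each key of the result corresponds to a link between two concepts, and the length of its list is the number of language-internal attestations of that link.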

  46. Examples CLICS
    CLICS
    CLICS: Crosslinguistic Colexifications
    - 221 Languages
    - 64 language families
    - 1280 concepts
    - 301,498 words
    - 45,667 polysemies (colexifications)
    - 16,239 different links between concepts
    - http://clics.lingpy.org
    25 / 37

  47. Examples CLICS
    CLICS
    [Figure: network of the CLICS colexification data; nodes are labeled with numeric IDs]
    25 / 37

  48. Examples CLICS
    CLICS
    Concept "money" is part of a cluster with the central concept "fishscale" with a total of 10 nodes. Hover over
    forms for each link. Click on the forms to check their sources. Click HERE to export the current network.
    silver
    leather
    fishscale
    bark
    coin
    fur
    snail
    skin, hide
    money
    shell
    49 links for "silver" and "money":
    Language Family Form
    1. Ignaciano Arawakan ne
    2. Aymara, Central Aymaran ḳulʸḳi
    3. Tsafiki Barbacoan kaˈla
    4. Seselwa Creole French Creole larzan
    5. Miao, White Hmong-Mien nyiaj
    6. Breton Indo-European arhant
    7. French Indo-European argent
    8. Gaelic, Irish Indo-European airgead
    9. Welsh Indo-European arian
    10. Cofán Isolate koriΦĩʔdi
    25 / 37

  49. Examples CLICS
    CLICS
    Concept "wheel" is part of a cluster with the central concept "leg" with a total of 11 nodes. Hover over the e
    each link. Click on the forms to check their sources. Click HERE to export the current network.
    sphere, ball
    round
    footprint
    foot
    calf of leg
    circle
    thigh
    wheel
    leg
    hip
    buttocks
    6 links for "foot" and "wheel":
    Language Family Form
    1. Cofán Isolate c̷ɨʔtʰe
    2. Puinave Isolate sim
    3. Yaminahua Panoan taɨ
    4. Wayampi Tupi pɨ
    5. Pumé Unclassified taɔ
    6. Ninam Yanomam mãhuk
    25 / 37

  50. Examples CLICS
    CLICS
    http://clics.lingpy.org
    26 / 37

  51. Examples LingPy and EDICTOR
    LingPy and EDICTOR
    LingPy
    http://lingpy.org
    EDICTOR
    http://tsv.lingpy.org
    27 / 37

  52. Examples LingPy and EDICTOR
    LingPy
    LingPy (List and Forkel 2016)
    - Python library for quantitative tasks in historical linguistics
    - automatic phonetic alignment
    - automatic cognate detection
    - automatic handling of segmentation
    - automatic search for colexifications
    28 / 37
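To give an idea of what an automatic alignment of two words involves, here is a minimal sketch of global pairwise alignment (Needleman-Wunsch) over lists of segments. It uses a plain match/mismatch score, which is an assumption made for brevity; it is not LingPy's algorithm and does not use the LingPy API. The gap symbol "-" and all parameter names are illustrative.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment lists and return them as two
    equally long lists in which "-" marks a gap."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing the i-th segment of a with the j-th of b.
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two segments
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


row_a, row_b = align(list("mumutaq"), list("mumuta"))
print(" ".join(row_a))
print(" ".join(row_b))
```

Working on lists rather than strings lets multi-character segments such as pː or p͡f be treated as single units, provided the words have been segmented beforehand.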

  53. Examples LingPy and EDICTOR
    LingPy
    http://lingpy.org
    29 / 37

  54. Examples LingPy and EDICTOR
    EDICTOR
    EDICTOR (List 2017)
    web-based tool for data curation and analysis
    alignment editor
    cognate set editor
    morpheme structure annotation
    30 / 37

  55. Examples LingPy and EDICTOR
    EDICTOR
    http://edictor.digling.org
    31 / 37

  56. Examples Cross-Linguistic Phonetic Alphabet
    Cross-Linguistic Phonetic Alphabet
    The use of the standards recommended by the IPA varies
    widely in linguistics. Experts on language families often
    have their own traditions, humans inevitably make errors
    when transcribing data, technical confusion arises from
    the use of lookalike symbols which do not share the same
    code point, and scholars interpret the IPA differently.
    Furthermore, the IPA does not offer recommendations for all
    aspects of transcription: morphological annotation, for
    example, is not included and varies greatly among scholars.
    32 / 37
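The lookalike problem can be made concrete with Python's `unicodedata` module. The sketch below assumes a deliberately tiny, hypothetical table of ASCII characters that are often typed in place of their IPA counterparts; it is not the CLPA inventory, and the function names are illustrative.

```python
import unicodedata

# Hypothetical lookalike table: typed character -> intended IPA symbol.
LOOKALIKES = {
    "g": "\u0261",  # LATIN SMALL LETTER G vs. LATIN SMALL LETTER SCRIPT G
    ":": "\u02d0",  # COLON vs. MODIFIER LETTER TRIANGULAR COLON
    "'": "\u02c8",  # APOSTROPHE vs. MODIFIER LETTER VERTICAL LINE
}


def report_lookalikes(transcription):
    """List (position, found, suggested, Unicode name of found) for
    every character that has an entry in the lookalike table."""
    return [
        (i, char, LOOKALIKES[char], unicodedata.name(char))
        for i, char in enumerate(transcription)
        if char in LOOKALIKES
    ]


def normalize(transcription):
    """Replace every known lookalike by the intended symbol."""
    return "".join(LOOKALIKES.get(char, char) for char in transcription)


for entry in report_lookalikes("ga:"):
    print(entry)
print([hex(ord(char)) for char in normalize("ga:")])
```

The two strings look almost identical on screen but compare as unequal, which is why searches and automatic comparisons silently fail on unnormalized data.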

  57. Examples Cross-Linguistic Phonetic Alphabet
    Cross-Linguistic Phonetic Alphabet
    CLPA (List and Forkel, in prep.)
    - define standards for phonetic representation
    - provide metadata for standardized sounds (feature matrices, etc.)
    - provide an API that allows users to query the data and check the consistency of transcriptions with regard to CLPA
    - provide solutions for scholars to convert their data to CLPA
    - develop standards for phonotactic and morphological annotation
    33 / 37

  58. Examples Cross-Linguistic Phonetic Alphabet
    Cross-Linguistic Phonetic Alphabet
    http://glottobank.org/clpa/clpa.html
    34 / 37

  59. Outlook
    Outlook
    Outlook
    35 / 37

  60. Outlook
    36 / 37

  61. Outlook
    36 / 37

  62. Outlook
    36 / 37

  63. Outlook
    Thank you for listening!
    37 / 37
