Better models with Saussure: Simulating lexical evolution with semantic shifts

Talk by Gereon Kaiping and Johann-Mattis List, presented at the conference "Phylogenetic methods in historical linguistics" (2017/03/27-30, Eberhard-Karls-Universität, Tübingen)

Johann-Mattis List

March 30, 2017

Transcript

  1. Better Models With Saussure
    Simulating Lexical Evolution with Semantic Shifts
    Gereon Kaiping¹, Mattis List²
    ¹Leiden University Centre for Linguistics
    ²MPI for the Science of Human History, Jena
    2017-03-30

  2. Context and Motivation
    Our Model
    What we can and can’t do
    Closing Remarks
    References
    1 Context and Motivation
    2 Our Model
    3 What we can and can’t do
    4 Closing Remarks
    Gereon Kaiping, Mattis List Better Models With Saussure

  3. Context
    Phylogenetic reconstruction has enjoyed great popularity of late.
    Language trees are used not only for the genetic subgrouping of
    language families, but also to address
    general linguistic questions (typological universals, ...)
    general anthropological/historical questions (Urheimat, ...)
    Phylogenetic reconstruction was the driving force behind the recent
    quantitative turn in historical linguistics and has been
    accepted by most scholars in the field.

  5. Context
    The Reconstruction Dilemma
    Historical linguistics deals with past states and events,
    investigating research objects and processes which are not
    directly observable → even falsification is tricky.
    Scholars tend to argue in terms of the likelihood of scenarios,
    but we cannot check our inferences against controlled
    experiments.

  6. Context
    Despite their popularity, methods for phylogenetic
    reconstruction are rarely tested, either against gold standards
    (which do not really exist; the closest we have are the phylogenies
    in databases like Glottolog [3]) or against the results of
    simulation studies.
    In the rare cases where phylogenetic methods have been
    tested with the help of simulation studies [5, 2, 7, 6, 1], the tests were
    based on very simplistic models of lexical change that assume
    independent gain/loss of words or replacement of items.

  7. The usual assumptions
    [Figure; label: t]

  8. The usual assumptions
    [Figure; labels: loss, gain, t]

  9. The usual assumptions
    [Figure; labels: replacement, meaning, t]

  10. Lexicostatistical Word List Data
    Concept   ID  German    ID  English  ID  Italian  ID  French
    HAND      1   Hand      1   hand     2   mano     2   main
    BLOOD     3   Blut      3   blood    4   sangue   4   sang
    HEAD      5   Kopf      6   head     7   testa    7   tête
    TOOTH     8   Zahn      8   tooth    8   dente    8   dent
    TO SLEEP  9   schlafen  9   sleep    10  dormire  10  dormir
    TO SAY    11  sagen     11  say      12  dire     12  dire
    …             …             …            …            …

  12. Gain-Loss Coding
    Concept   Proto-Form     German  English  Italian  French
    HAND      PGM *xanda-
    HAND      LAT mānus
    BLOOD     PGM *blođa-
    BLOOD     LAT sanguis
    HEAD      PGM *kuppa-
    HEAD      PGM *xawbda-
    HEAD      LAT tēsta
    TOOTH     PIE *h₃dont-
    TO SLEEP  PGM *slēpan-
    TO SLEEP  LAT dormīre
    TO SAY    PGM *sagjan-
    TO SAY    LAT dīcere
    …         …              …       …        …        …
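
The step from cognate-coded word lists to gain-loss characters can be sketched in Python. This is a minimal illustration that assumes the cognate IDs from the word list slide; the names `WORDLIST`, `LANGUAGES` and `gain_loss_matrix` are hypothetical and not taken from the talk. Each cognate class becomes one binary character (1 = a reflex is present in the language, 0 = absent), and the concept labels are discarded.

```python
# Cognate IDs as given on the word list slide: concept -> {language: cognate ID}.
WORDLIST = {
    "HAND": {"German": 1, "English": 1, "Italian": 2, "French": 2},
    "BLOOD": {"German": 3, "English": 3, "Italian": 4, "French": 4},
    "HEAD": {"German": 5, "English": 6, "Italian": 7, "French": 7},
    "TOOTH": {"German": 8, "English": 8, "Italian": 8, "French": 8},
    "TO SLEEP": {"German": 9, "English": 9, "Italian": 10, "French": 10},
    "TO SAY": {"German": 11, "English": 11, "Italian": 12, "French": 12},
}
LANGUAGES = ["German", "English", "Italian", "French"]


def gain_loss_matrix(wordlist, languages):
    """Return {cognate ID: [0/1 per language]}, ignoring the concept labels."""
    matrix = {}
    for cognates in wordlist.values():
        for language, cognate_id in cognates.items():
            row = matrix.setdefault(cognate_id, [0] * len(languages))
            row[languages.index(language)] = 1
    return matrix


matrix = gain_loss_matrix(WORDLIST, LANGUAGES)
for cognate_id in sorted(matrix):
    print(cognate_id, matrix[cognate_id])
```

Under this coding, the link between a word and its meaning is no longer visible to the phylogenetic method, which is one reason the talk calls the approach linguistically unsatisfying.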

  13. Gain-Loss Coding
    Proto-Form     German  English  Italian  French
    PGM *xanda-
    LAT mānus
    PGM *blođa-
    LAT sanguis
    PGM *kuppa-
    PGM *xawbda-
    LAT tēsta
    PIE *h₃dont-
    PGM *slēpan-
    LAT dormīre
    PGM *sagjan-
    LAT dīcere
    …              …       …        …        …

  14. Motivation
    gain/loss and replacement approaches are not satisfying
    linguistically
    phylogenetic reconstruction is important but insufficiently
    tested
    gold standard (controlled) datasets are not available
    we barely understand the processes underlying lexical change
    → by working on more realistic simulations, we can learn a lot
    about the processes of lexical change and also help to
    evaluate the accuracy of phylogenetic approaches

  15. Saussure’s model of the linguistic sign
    [Figure; labels: arbre, form, "meaning"]

  16. Saussure’s model of the linguistic sign
    [Figure; label: arbre]

  17. Saussure’s model of the linguistic sign
    [Figure; labels: arbre, arbre]

  19. Saussure’s model of the linguistic sign: Dynamics
    [Figure; labels: arbre, bois, forêt]

  22. Preliminaries
    [Figure; labels: A, A']

  24. Preliminaries
    [Figure; labels: B, B']

  26. Excursus: Semantic Network
    [Figure¹; node labels: post, pole; staff, walking stick; doorpost, jamb; tree stump; mast; club; firewood; root; tree trunk; woods, forest; banana tree; tree; wood]
    ¹ http://clics.lingpy.org/browse.php?gloss=wood

  27. Excursus: Semantic Network
    CLICS [4]
    database of synchronic lexical associations (“colexifications”)
    currently 221 language varieties
    1280 concepts
    uses network approaches to partition the data into semantic
    fields
    web application at http://clics.lingpy.org allows for
    quick browsing of the semantic networks
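
How colexifications turn into a weighted semantic network can be sketched as follows. This is a minimal illustration on toy data, not the CLICS pipeline: the lexicon excerpts and the function name `colexification_network` are assumptions made for the example. Two concepts are linked when one word form in a language expresses both, and the edge weight counts the languages in which that happens.

```python
from collections import Counter
from itertools import combinations

# Toy lexicon excerpts (illustrative, not CLICS data):
# language -> {word form: set of concepts the form expresses}.
LEXICONS = {
    "French": {
        "arbre": {"tree"},
        "bois": {"wood", "woods, forest"},
        "forêt": {"woods, forest"},
    },
    "Russian": {
        "derevo": {"tree", "wood"},
        "les": {"woods, forest"},
    },
}


def colexification_network(lexicons):
    """Count, per concept pair, the languages with a word expressing both."""
    edges = Counter()
    for forms in lexicons.values():
        pairs_in_language = set()
        for concepts in forms.values():
            pairs_in_language.update(combinations(sorted(concepts), 2))
        edges.update(pairs_in_language)  # each language counts at most once
    return edges


edges = colexification_network(LEXICONS)
for (concept_a, concept_b), weight in sorted(edges.items()):
    print(concept_a, "<->", concept_b, "weight", weight)
```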

  28. Excursus: Semantic Network
    [Figure: network graph with numerically labelled nodes]

  29. Design Goals
    A more realistic model of lexical evolution
    is based on a bipartite graph structure of word forms and
    word meanings
    builds on a dynamic representation of reference potentials
    instead of Saussure’s inseparable dichotomy of the linguistic
    sign
    feeds on (ideally weighted and directed) networks of semantic
    associations to account for the fact that semantic shift and
    lexical replacement follow certain preference laws
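
One way to represent the bipartite structure named in the design goals is a weighted mapping from word forms to meanings. The sketch below is an assumption about how such a structure could look, not the authors' implementation; the class `Lexicon`, its methods, and the example weights are hypothetical.

```python
from collections import defaultdict


class Lexicon:
    """Weighted bipartite graph linking word forms to meanings."""

    def __init__(self):
        # word -> {meaning: weight}; only positive weights are stored.
        self._edges = defaultdict(dict)

    def set_weight(self, word, meaning, weight):
        """Set the weight of a word-meaning edge; weight 0 removes the edge."""
        if weight > 0:
            self._edges[word][meaning] = weight
            return
        self._edges[word].pop(meaning, None)
        if not self._edges[word]:
            del self._edges[word]  # a word without any meaning is lost

    def weight(self, word, meaning):
        """Return the edge weight, or 0 if the word does not express the meaning."""
        return self._edges.get(word, {}).get(meaning, 0)

    def words(self):
        return sorted(self._edges)

    def words_for(self, meaning):
        return sorted(w for w, ms in self._edges.items() if meaning in ms)


lexicon = Lexicon()
lexicon.set_weight("arbre", "tree", 6)
lexicon.set_weight("bois", "wood", 5)
lexicon.set_weight("bois", "woods, forest", 2)
lexicon.set_weight("forêt", "woods, forest", 1)
print(lexicon.words_for("woods, forest"))

# Dropping the last meaning of a word removes the word from the lexicon.
lexicon.set_weight("forêt", "woods, forest", 0)
print(lexicon.words())
```

Storing weights rather than plain links lets a word have a graded reference potential across several meanings, which a one-to-one pairing of form and meaning cannot express.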

  33. Questions of Implementation
    How should the model drive the change of edge weights in the
    bipartite graph?
    How should we choose the underlying semantic network?
    How can we tell whether the model has any chance of realism?
    How can we select realistic parameters for the model?

  37. Framework
    [Figure; labels: A, B, C, D, E]

  38. Framework
    [Figure; labels: A, B, C, D, E; intention, purpose; woods, forest; tree; wood; post, pole]

  39. Framework
    [Figure; labels: A, B, C, D, E; intention, purpose; woods, forest; tree; wood; post, pole; [1], [2], [3]; numeric labels 6, 1, 1, 5, 2]

  41. Evolution
    Inspiration: Discrimination Game and Guessing Game
    Choose two random concepts $c_i$ ($P \propto \deg^2$)
    Score each word $w$ for each $c_i$:
    $w_i = \mathrm{wt}(w, c_i) + 0.1 \sum_{c \text{ neighbor of } c_i} \mathrm{wt}(w, c)$
    Increase $\mathrm{wt}(w, c_i)$ where $w_{\neg i} = 0$ and $w_i$ max;
    or create a new word meaning $c_i$ with wt 1.
    Decrease $\mathrm{wt}(w, c_i)$ where $0 < w_{\neg i} < w_i$ max;
    or a random connection ($\propto \mathrm{wt}$)
    Decrease wt of a random connection ($\propto \mathrm{wt}$)
    [Figure: the graph before and after a step; labels: intention, purpose; woods, forest; tree; wood; post, pole; [1], [2], [3]; numeric labels 6, 1, 1, 5, 2 before and 7, 0, 1, 5, 2, 1 after]
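
The two unambiguous ingredients of the step on the Evolution slide, the degree-weighted choice of concepts and the word score, can be sketched in Python. This is a partial illustration under stated assumptions, not the authors' code: the toy `NETWORK` and `WEIGHTS`, the function names, and the decision to draw two distinct concepts are all hypothetical. The weight updates themselves are left out because the slide does not fully specify them.

```python
import random

# Hypothetical toy semantic network: concept -> set of neighbouring concepts.
NETWORK = {
    "tree": {"wood", "woods, forest"},
    "wood": {"tree", "woods, forest", "post, pole"},
    "woods, forest": {"tree", "wood"},
    "post, pole": {"wood"},
}

# Hypothetical word-meaning weights: word -> {concept: weight}.
WEIGHTS = {
    "w1": {"tree": 6, "wood": 1},
    "w2": {"wood": 5, "woods, forest": 2},
    "w3": {"post, pole": 1},
}


def sample_two_concepts(network, rng):
    """Draw two distinct concepts with probability proportional to degree squared."""
    concepts = sorted(network)
    weights = [len(network[c]) ** 2 for c in concepts]
    first = rng.choices(concepts, weights=weights, k=1)[0]
    rest = [c for c in concepts if c != first]
    rest_weights = [len(network[c]) ** 2 for c in rest]
    second = rng.choices(rest, weights=rest_weights, k=1)[0]
    return first, second


def score(weights, network, word, concept):
    """w_i = wt(w, c_i) + 0.1 * sum of wt(w, c) over the neighbours c of c_i."""
    own = weights[word].get(concept, 0)
    spill = sum(weights[word].get(c, 0) for c in network[concept])
    return own + 0.1 * spill


rng = random.Random(0)
c1, c2 = sample_two_concepts(NETWORK, rng)
for word in sorted(WEIGHTS):
    print(word, score(WEIGHTS, NETWORK, word, c1), score(WEIGHTS, NETWORK, word, c2))
```

The neighbour term means a word can score above zero for a concept it does not express directly, as long as it expresses a semantically adjacent one.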

  42. Long-Term Behaviour
    Proposition
    Behaviour (vocabulary size, polysemy, synonymy) should stabilize
    over long time scales at reasonable values.
    Test: Run the simulation along a branch with 2 000 000 time
    steps, using CLICS (see above) as the semantic network.

  43. Long-Term Behaviour
    Proposition
    Behaviour (vocabulary size, polysemy, synonymy) should stabilize
    over long time scales at reasonable values.
    [Figure: two panels over time steps t (logarithmic axis, 10^0 to 10^6): "Vocabulary size" (axis ticks 800 to 1300) and "Average Polysemy/Synonymity" (axis ticks 0 to 5; series: Polysemy, Synonymity)]
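
The three quantities tracked in the long-term test can be computed from a word-meaning graph as sketched below. The definitions are assumptions, since the slides do not spell them out: polysemy is taken as the mean number of meanings per word and synonymy as the mean number of words per expressed meaning. The toy `LEXICON` and the function names are hypothetical.

```python
# Hypothetical snapshot of a simulated lexicon: word -> {meaning: weight}.
LEXICON = {
    "w1": {"tree": 6, "wood": 1},
    "w2": {"wood": 5, "woods, forest": 2},
    "w3": {"post, pole": 1},
}


def vocabulary_size(lexicon):
    """Number of words that express at least one meaning."""
    return sum(1 for meanings in lexicon.values() if meanings)


def average_polysemy(lexicon):
    """Mean number of meanings per word (assumed definition)."""
    counts = [len(meanings) for meanings in lexicon.values() if meanings]
    return sum(counts) / len(counts)


def average_synonymy(lexicon):
    """Mean number of words per expressed meaning (assumed definition)."""
    words_per_meaning = {}
    for word, meanings in lexicon.items():
        for meaning in meanings:
            words_per_meaning.setdefault(meaning, set()).add(word)
    return sum(len(ws) for ws in words_per_meaning.values()) / len(words_per_meaning)


print(vocabulary_size(LEXICON), average_polysemy(LEXICON), average_synonymy(LEXICON))
```

Recording these values at intervals during a run would yield curves of the kind the slide plots against time steps.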

  44. Calibration
    Proposition
    The Years:Replacement-Steps scaling parameter has an optimum.
    Test: Run the simulation along a known dated tree (Chinese
    dialects from the Cíhuì) and compare with cross-semantically
    cognate-coded data. Compare the pairwise shared cognate proportion
    between real and simulated data.
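
The comparison in the calibration test can be sketched as follows. The slide does not define the shared cognate proportion, so this example assumes one plausible reading: for each language pair, the share of cognate classes found in either language that occur in both. The "real" sets reuse the cognate IDs from the word list slide, while the simulated sets, the function names, and the mean absolute difference as a fit measure are hypothetical.

```python
from itertools import combinations

# Cognate classes per language, taken from the word list slide.
REAL = {
    "German": {1, 3, 5, 8, 9, 11},
    "English": {1, 3, 6, 8, 9, 11},
    "Italian": {2, 4, 7, 8, 10, 12},
    "French": {2, 4, 7, 8, 10, 12},
}

# Hypothetical output of one simulation run on the same tree.
SIMULATED = {
    "German": {1, 2, 3, 4},
    "English": {1, 2, 3, 5},
    "Italian": {1, 6, 7, 8},
    "French": {1, 6, 7, 9},
}


def shared_cognate_proportion(a, b):
    """Share of cognate classes in either language that occur in both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_proportions(cognates):
    """Map each language pair (sorted by name) to its shared proportion."""
    return {
        (l1, l2): shared_cognate_proportion(cognates[l1], cognates[l2])
        for l1, l2 in combinations(sorted(cognates), 2)
    }


def mean_absolute_difference(real, simulated):
    """Average gap between real and simulated proportions over shared pairs."""
    pairs = real.keys() & simulated.keys()
    return sum(abs(real[p] - simulated[p]) for p in pairs) / len(pairs)


real = pairwise_proportions(REAL)
simulated = pairwise_proportions(SIMULATED)
print(mean_absolute_difference(real, simulated))
```

Repeating this for several values of the years-to-steps scaling parameter and picking the value with the smallest difference is one way to look for the optimum the proposition refers to.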

  45. Excursus: Cíhuì
    Collection of Chinese dialect data, compiled in the late 1950s and
    published in 1964 (Běijīng University 1964)
    Contains lexical data, as the short title suggests (cíhuì means
    “lexical inventory”, or Wortschatz in German)
    Based on a questionnaire consisting of 905 concepts (daily life
    and basic vocabulary)
    Offers data for 18 dialect varieties, including varieties from
    each of the seven largest dialect groups of Chinese (Mǐn,
    Cantonese, Mandarin, Hakka, Wú, Xiāng, and Gàn)
    Data was prepared during List’s research project (2015-2016):
    it was digitized, and partial cognate coding was extracted
    automatically, based on annotations in the original source

  50. Excursus: Cíhuì
    [Figure not reproduced. Its labels name the 18 dialect varieties:
    Xian, Jinan, Beijing, Shenyang, Chengdu, Kunming, Hefei, Yangzhou,
    Changsha, Nanchang, Wenzhou, Suzhou, Meixian, Guangzhou, Yangjiang,
    Fuzhou, Chaozhou, Xiamen. One further label reads 100.0.]

  51. Calibration
    Proposition
    The Years:Replacement-Steps scaling parameter has an optimum
    Values around 1.5 look best, and even reasonable, given
    the ad-hoc nature of the tree

  53. Semantic Shift
    Proposition
    The model shows reasonable amounts of semantic shift
    Test: visually compare the distribution of meaning classes per
    cognate class for simulated and real Cíhuì data
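    The comparison named in the test can be sketched in a few lines. This is
    a minimal illustration, not the code behind the slides: it assumes that
    "meaning classes per cognate class" means the number of distinct concepts
    attested for each cognate class, and that the data comes as
    (language, concept, cognate class) triples. The function names and the toy
    rows are hypothetical and are not taken from Cíhuì or from simuling.

```python
from collections import Counter, defaultdict


def meanings_per_cognate_class(rows):
    """Histogram: number of distinct concepts -> number of cognate classes.

    rows is an iterable of (language, concept, cognate_class) triples
    (an assumed input layout, chosen for illustration only).
    """
    concepts_by_class = defaultdict(set)
    for _language, concept, cognate_class in rows:
        concepts_by_class[cognate_class].add(concept)
    return Counter(len(concepts) for concepts in concepts_by_class.values())


def as_proportions(histogram):
    """Normalise a histogram so data sets of different size are comparable."""
    total = sum(histogram.values())
    return {k: v / total for k, v in sorted(histogram.items())}


# Hypothetical toy rows standing in for real and simulated word lists.
toy_real = [
    ("Beijing", "sun", "A"),
    ("Beijing", "day", "A"),
    ("Suzhou", "sun", "A"),
    ("Suzhou", "moon", "B"),
    ("Xiamen", "moon", "C"),
]
toy_simulated = [
    ("L1", "c1", "X"),
    ("L2", "c2", "X"),
    ("L1", "c3", "Y"),
    ("L2", "c3", "Y"),
]

real_dist = as_proportions(meanings_per_cognate_class(toy_real))
sim_dist = as_proportions(meanings_per_cognate_class(toy_simulated))

# Print both distributions side by side for a visual comparison.
for n_meanings in sorted(set(real_dist) | set(sim_dist)):
    print(n_meanings, real_dist.get(n_meanings, 0.0), sim_dist.get(n_meanings, 0.0))
```

    Plotting the two normalised histograms next to each other would give the
    kind of visual comparison the slide describes.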

  54. Semantic Shift
    Proposition
    The model shows reasonable amounts of semantic shift
    Note: Simulation results are not filtered to exclude synonyms

  55. Missing Bits
    More comparable data for better calibration
    A better model of the semantics, and calibration of that model
    (frequent pathways, etc.)
    Support for language contact (borrowings)
    Partial cognate support/Compositionality/Derivations
      Needs a substantial change in the vocabulary representation, and
      some serious quantitative data
      Might help with language contact
    Population-level modeling, so that rate variation and punctuated
    evolution can emerge

  56. Conclusions
    We have a realistic model of semantic shift. With more dated,
    cross-semantic-cognate-coded trees we can calibrate it more
    confidently.
    We (and you, it’s open source [1]!) can already use it to run and
    compare different tree building methods.
    [1] http://github.com/Anaphory/simuling

  57. Thanks
    Thank you for listening

  58. Parameter Robustness: Concept Selection Weight
    [Figure not reproduced. Two panels, each over the concept selection
    weights degree_squared, one, preferential, and degree.
    Panel 1: vocabulary size (axis from 0 to 1400).
    Panel 2: average polysemy and synonymity (axis from 1 to 8).]

  59. Parameter Robustness: Neighbor factor
    [Figure not reproduced. Two panels, each over the neighbor factor
    (axis from 0.0 to 1.0).
    Panel 1: vocabulary size (axis from 0 to 1400).
    Panel 2: average polysemy and synonymity (axis from 1 to 6).]
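    The two quantities plotted in the robustness figures can be computed
    from a vocabulary as sketched here. The definitions are assumptions made
    for illustration: polysemy is taken as the number of meanings per word,
    synonymity as the number of words per meaning, and a vocabulary as a list
    of (word, meaning) pairs. The simulation itself may define them
    differently, and the toy vocabulary is hypothetical.

```python
from collections import defaultdict


def average_polysemy_and_synonymity(pairs):
    """Return (mean meanings per word, mean words per meaning).

    pairs is an iterable of (word, meaning) tuples; this layout is an
    assumption for illustration. Raises ValueError on an empty vocabulary.
    """
    meanings_of = defaultdict(set)  # word -> set of its meanings
    words_for = defaultdict(set)    # meaning -> set of words expressing it
    for word, meaning in pairs:
        meanings_of[word].add(meaning)
        words_for[meaning].add(word)
    if not meanings_of:
        raise ValueError("empty vocabulary")
    polysemy = sum(len(m) for m in meanings_of.values()) / len(meanings_of)
    synonymity = sum(len(w) for w in words_for.values()) / len(words_for)
    return polysemy, synonymity


# Hypothetical toy vocabulary: w1 has three meanings, "sun" has two words.
toy_vocabulary = [
    ("w1", "sun"),
    ("w1", "day"),
    ("w1", "light"),
    ("w2", "sun"),
    ("w3", "moon"),
]

polysemy, synonymity = average_polysemy_and_synonymity(toy_vocabulary)
print(polysemy, synonymity)
```

    Running this for vocabularies simulated under different parameter values
    would yield one point per value in plots like those on these slides.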

  60. Sources and Further Reading I
    Quentin Atkinson et al. “From Words to Dates: Water into
    Wine, Mathemagic or Phylogenetic Inference?” In:
    Transactions of the Philological Society 103.2 (Aug. 1, 2005),
    pp. 193–219. issn: 1467-968X. doi:
    10.1111/j.1467-968X.2005.00151.x. url:
    http://onlinelibrary.wiley.com/doi/10.1111/j.1467-968X.2005.00151.x/abstract
    (visited on 03/23/2017).

  61. Sources and Further Reading II
    François Barbançon et al. “An Experimental Study
    Comparing Linguistic Phylogenetic Reconstruction Methods”.
    In: Diachronica 30.2 (2013), pp. 143–170. doi:
    10.1075/dia.30.2.01bar. url:
    http://www.ingentaconnect.com/content/jbp/dia/2013/00000030/00000002/art00001
    (visited on 10/15/2016).
    Harald Hammarström et al. Glottolog. Version 2.5. Leipzig, 2015.
    url: http://glottolog.org.

  62. Sources and Further Reading III
    J.-M. List et al., eds. CLICS: Database of Cross-Linguistic
    Colexifications. Marburg: Forschungszentrum Deutscher
    Sprachatlas, 2014. Archived at:
    http://www.webcitation.org/6ccEMrZYM. url:
    http://clics.lingpy.org.
    Andrew D. M. Smith. “Models of Language Evolution and
    Change”. In: Wiley Interdisciplinary Reviews: Cognitive
    Science 5.3 (May 1, 2014). WOS:000334511800004,
    pp. 281–293. issn: 1939-5078. doi: 10.1002/wcs.1285.
    url: http://onlinelibrary.wiley.com/doi/10.1002/wcs.1285/abstract.

  63. Sources and Further Reading IV
    S.A. Starostin. “Computer-Based Simulation of the
    Glottochronological Process”. In: [Works on Linguistics].
    Moscow: , 2007, pp. 854–862.
    Tandy Warnow et al. “A Stochastic Model of Language
    Evolution That Incorporates Homoplasy and Borrowing”. In:
    (). url:
    http://statistics.berkeley.edu/sites/default/files/tech-reports/673.pdf
    (visited on 03/22/2017).
