Slide 1

Slide 1 text

CLICS 2.0
Towards an Improved Handling of Cross-Linguistic Colexification Patterns
Johann-Mattis List
Research Group “Computer-Assisted Language Comparison”
Department of Linguistic and Cultural Evolution
Max-Planck Institute for the Science of Human History, Jena, Germany
2017/07/03
1 / 33

Slide 2

Slide 2 text

A long, long time ago... 2 / 33

Slide 8

Slide 8 text

Predecessors: People and Ideas
- Haspelmath (2003): The geometry of grammatical meaning.
- François (2008): Semantic maps and the typology of colexification.
- Cysouw (2010): Drawing networks from recurrent polysemies.
- Steiner, Stadler, and Cysouw (2011): A pipeline for computational historical linguistics.
- Urban (2011): Asymmetries in overt marking and directionality in semantic change.
3 / 33

Slide 11

Slide 11 text

Predecessors: Data
- Intercontinental Dictionary Series (IDS, Key and Comrie 2016): 1310 concepts translated into about 360 languages; an earlier version offered ca. 200 languages.
- World Loanword Database (WOLD, Haspelmath and Tadmor 2009): 1430 concepts translated into 41 languages (some overlap with IDS).
4 / 33

Slide 13

Slide 13 text

Predecessors: Techniques
- Steiner, Stadler, and Cysouw (2011) propose modelling similarities between concepts by constructing a matrix from parts of the IDS data that shows how often individual languages colexify certain concepts.
- Cysouw (2010) shows how to use polysemy data to draw networks.
5 / 33
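
Note: the matrix idea can be illustrated with a minimal sketch. It assumes a word list of (language, concept, form) triples and treats two concepts as colexified in a language when they share an identical form. The toy data, the language names, and the function names are invented for illustration; this is not the procedure of Steiner, Stadler, and Cysouw (2011) itself.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy word list: (language, concept, form). Not real IDS data.
WORDLIST = [
    ("lang_a", "ARM", "kala"),
    ("lang_a", "HAND", "kala"),
    ("lang_a", "TREE", "doru"),
    ("lang_b", "ARM", "te"),
    ("lang_b", "HAND", "te"),
    ("lang_b", "TREE", "ki"),
    ("lang_b", "WOOD", "ki"),
    ("lang_c", "TREE", "bo"),
    ("lang_c", "WOOD", "bo"),
    ("lang_c", "HAND", "mi"),
]


def colexification_counts(wordlist):
    """Count, per concept pair, in how many languages one form expresses both."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            languages_by_pair[(a, b)].add(language)
    return {pair: len(langs) for pair, langs in languages_by_pair.items()}


def as_matrix(counts, concepts):
    """Symmetric concept-by-concept matrix of colexification counts."""
    return [
        [0 if a == b else counts.get(tuple(sorted((a, b))), 0) for b in concepts]
        for a in concepts
    ]


counts = colexification_counts(WORDLIST)
concepts = sorted({concept for _, concept, _ in WORDLIST})
matrix = as_matrix(counts, concepts)
for concept, row in zip(concepts, matrix):
    print(concept, row)
```

Each cell of the matrix can then serve as an edge weight when the concepts are drawn as a network.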

Slide 16

Slide 16 text

Initial Ideas
- List, Terhalle, and Urban (2013) build on the ideas of Cysouw (2010) and Steiner, Stadler, and Cysouw (2011), using IDS data for polysemy studies and network techniques to study the data.
- In contrast to earlier approaches, they use community detection (Girvan and Newman 2002) to analyse the network further and to partition the concepts into communities that make intuitive sense and are reminiscent of naturally derived semantic fields.
6 / 33
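
Note: a minimal sketch of the divisive idea behind Girvan and Newman (2002), assuming an unweighted, undirected network: edges with the highest shortest-path betweenness are removed until the graph falls into more components. The toy edges and the function names are invented for illustration, only the first split is computed, and nothing here claims to reproduce the settings used by List, Terhalle, and Urban (2013).

```python
from collections import deque

# Hypothetical colexification links between concepts (toy data).
EDGES = [
    ("TREE", "WOOD"), ("WOOD", "FOREST"), ("TREE", "FOREST"),
    ("FIRE", "FIREWOOD"), ("FIREWOOD", "BURN"), ("FIRE", "BURN"),
    ("WOOD", "FIREWOOD"),  # the only link between the two groups
]


def components(adj):
    """Connected components of a graph given as {node: set(neighbours)}."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(comp)
    return result


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (breadth-first accumulation)."""
    score = {}
    for source in adj:
        dist, sigma, preds, order = {source: 0}, {source: 1.0}, {source: []}, []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb], sigma[nb], preds[nb] = dist[node] + 1, 0.0, []
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        delta = {node: 0.0 for node in order}
        for node in reversed(order):
            for p in preds[node]:
                share = sigma[p] / sigma[node] * (1.0 + delta[node])
                edge = tuple(sorted((p, node)))
                score[edge] = score.get(edge, 0.0) + share
                delta[p] += share
    return score  # every node pair is counted twice, which keeps the ranking


def girvan_newman_split(edges):
    """Remove the most central edges until the number of components grows."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:
            break  # no edges left to remove
        a, b = max(sorted(score), key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


communities = girvan_newman_split(EDGES)
print([sorted(c) for c in communities])
```

In the toy network the single link between WOOD and FIREWOOD carries all shortest paths between the two groups, so it is removed first and the two triangles remain as communities.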

Slide 21

Slide 21 text

Further Ideas
- Mayer, List, Terhalle, and Urban (2014) present an interactive way to visualize cross-linguistic colexification data.
- List, Mayer, Terhalle, and Urban (2014) publish the database and the web application online under the name CLICS (Database of Cross-Linguistic Colexifications).
- In contrast to earlier attempts, they enlarge the data by merging IDS, WOLD, and additional datasets that they collected themselves, arriving at 220 languages in total.
- They also improve the community detection procedure by using Infomap (Rosvall and Bergstrom 2008), an advanced algorithm based on random walks in complex networks.
7 / 33

Slide 22

Slide 22 text

CLICS 1.0 8 / 33

Slide 27

Slide 27 text

Data
- IDS (Key and Comrie, 2007 version): of 233 language varieties, 178 are included in CLICS.
- WOLD (Haspelmath and Tadmor 2009): of 41 languages in WOLD, 33 are included in CLICS.
- Logos Dictionary (Logos Group): of dictionaries for more than 60 different languages, 4 languages were manually extracted and included in CLICS.
- Språkbanken project (University of Gothenburg): of 8 word lists for SEA languages, 6 are included in CLICS.
9 / 33

Slide 31

Slide 31 text

Methods
Problems
- (A) The data cannot be displayed in full; its complexity needs to be reduced.
- (B) The data is noisy and needs to be corrected.
Solutions
- (A) Show communities instead of all the data, and offer a subgraph view that cuts out the nearest neighbors of one concept to compensate for the data loss in the community view.
- (B) Filter by language families and weight the concept links by frequency of occurrence, following Dellert’s (2014) suggestion. This cuts most of the links that result from homophony and leaves the links that are due to polysemy.
10 / 33
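
Note: solutions (A) and (B) can be sketched as follows, under stated assumptions. Each record is one attested colexification with the family of its language; a link is weighted by the number of distinct families that attest it, and links below a threshold are dropped. The records, the threshold of two families, and the function names are assumptions for illustration and are not taken from the CLICS code base or from Dellert (2014).

```python
from collections import defaultdict

# Hypothetical records: (language, family, concept_a, concept_b).
COLEXIFICATIONS = [
    ("lang_1", "family_x", "ARM", "HAND"),
    ("lang_2", "family_x", "ARM", "HAND"),
    ("lang_3", "family_y", "ARM", "HAND"),
    ("lang_4", "family_z", "ARM", "HAND"),
    ("lang_1", "family_x", "ARM", "POOR"),    # accidental homophony, one family only
    ("lang_3", "family_y", "HAND", "FINGER"),
    ("lang_5", "family_z", "HAND", "FINGER"),
]


def weight_by_families(records):
    """Weight each concept link by the number of distinct families attesting it."""
    families = defaultdict(set)
    for _language, family, a, b in records:
        families[tuple(sorted((a, b)))].add(family)
    return {pair: len(fams) for pair, fams in families.items()}


def filter_links(weights, min_families=2):
    """Keep only links attested in at least `min_families` families."""
    return {pair: w for pair, w in weights.items() if w >= min_families}


def neighbourhood(weights, concept):
    """Subgraph view: all links that touch the chosen concept."""
    return {pair: w for pair, w in weights.items() if concept in pair}


weights = weight_by_families(COLEXIFICATIONS)
links = filter_links(weights, min_families=2)
arm_view = neighbourhood(links, "ARM")
print(links)
print(arm_view)
```

Counting families rather than languages keeps a large family from inflating a link, and the ARM/POOR pair, attested in a single family, is filtered out as probable homophony.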

Slide 35

Slide 35 text

Interface
- The interface is written in JavaScript for the visualizations and PHP for querying the data.
- The interactive component of the network browser was specifically designed for CLICS and builds on the D3 framework by Bostock et al. (2011).
- The underlying network with the inferred communities can be downloaded from the website, and the complete code used to create the website is available at http://github.com/clics/clics.
11 / 33

Slide 36

Slide 36 text

CLICS 1.0 Interface DEMO 12 / 33

Slide 37

Slide 37 text

CLICS 2.0 13 / 33

Slide 43

Slide 43 text

Motivation
Problems in CLICS 1.0
- difficult to curate (error correction, linking data, adding data)
- difficult to collaborate (the CLICS team is separated, and everybody is extremely busy with work other than CLICS)
- difficult to communicate (not all users understand how we arrived at the data, and they often think that it is we who messed the datasets up, although we only take the data to produce something new out of it)
- difficult to expand (new datasets cannot be added without a true guiding principle)
- difficult to catch up (we know much better now how to curate datasets, but we did not know this when preparing CLICS initially)
14 / 33

Slide 51

Slide 51 text

Ideas
- use the state of the art of available data
- separate data from display (CLICS 2.0 does not host data, but simply uses it)
- assemble data with the help of the Concepticon (List, Forkel, and Cysouw 2016)
- assemble information on languages exclusively from Glottolog (Hammarström et al. 2017)
- curate the code and the polysemy data with the help of a transparent API
- release the data regularly, in release cycles of about one per year (following the practice of Glottolog and other CLLD projects)
- normalize the data that is analysed by CLICS
15 / 33

Slide 55

Slide 55 text

Excursus: Concepticon
Concept List | # Items | Concept Label | Concept ID
Allen (2007) | 500 | animal oil; 动物油(脂肪) | GREASE (CONCEPTICON-ID: 323)
Gregersen (1976) | 217 | fat-grease*fat-grease | GREASE (CONCEPTICON-ID: 323)
Heggarty (2005) | 150 | fat (grease); grasa | GREASE (CONCEPTICON-ID: 323)
Swadesh (1955) | 100 | fat (grease) | GREASE (CONCEPTICON-ID: 323)
Alpher and Nash (1999) | 151 | fat, grease | GREASE (CONCEPTICON-ID: 323)
Hale (1961) | 100 | fat, grease | GREASE (CONCEPTICON-ID: 323)
O'Grady and Klokeid (1969) | 100 | fat, grease | GREASE (CONCEPTICON-ID: 323)
Blust (2008) | 210 | fat/grease | GREASE (CONCEPTICON-ID: 323)
Matisoff (1978) | 200 | fat/grease | GREASE (CONCEPTICON-ID: 323)
Samarin (1969) | 218 | fat/grease | GREASE (CONCEPTICON-ID: 323)
Dunn et al. (2012) | 207 | fat | GREASE (CONCEPTICON-ID: 323)
Swadesh (1950) | 215 | fat | GREASE (CONCEPTICON-ID: 323)
Zgraggen (1980) | 380 | fat | GREASE (CONCEPTICON-ID: 323)
Jachontov (1991) | 100 | fat n. | GREASE (CONCEPTICON-ID: 323)
Wiktionary (2003) | 207 | fat (noun) | GREASE (CONCEPTICON-ID: 323)
Starostin (1991) | 110 | fat n.; жир | GREASE (CONCEPTICON-ID: 323)
Teil-Dautrey et al. (2008) | 430 | fat, oil | GREASE (CONCEPTICON-ID: 323)
Swadesh (1952) | 200 | fat (organic substance) | GREASE (CONCEPTICON-ID: 323)
Shiro (1973) | 200 | grease (fat) | GREASE (CONCEPTICON-ID: 323)
Samarin (1969) | 100 | grease; graisse; Fett; grasa | GREASE (CONCEPTICON-ID: 323)
Wang (2006) | 200 | pig oil; 猪油 | GREASE (CONCEPTICON-ID: 323)
Haspelmath and Tadmor (2009) | 1460 | the grease or fat | GREASE (CONCEPTICON-ID: 323)
16 / 33

Slide 56

Slide 56 text

Excursus: Concepticon
Concepticon (List et al. 2016)
- link concept labels in published concept lists (questionnaires) to concept sets
- link concept sets to metadata
- define relations between concept sets
- never link one concept in a given list to more than one concept set (guarantees consistency)
- provide an API to check the consistency of the data and to query the data
- provide a web interface to browse through the data
17 / 33
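
Note: the rule that a concept in a given list is never linked to more than one concept set lends itself to an automatic check. The sketch below assumes the links are available as (concept list, label, concept set) rows; the list identifiers, the rows, and the function name are invented for illustration and do not reproduce the actual Concepticon API.

```python
from collections import defaultdict

# Hypothetical mapping rows: (concept_list, label_in_list, concept_set).
MAPPINGS = [
    ("list_one", "fat (grease)", "GREASE"),
    ("list_two", "fat/grease", "GREASE"),
    ("list_three", "fat, oil", "GREASE"),
    ("list_three", "fat, oil", "OIL"),  # deliberately violates the rule
]


def find_conflicts(mappings):
    """Return every (list, label) pair that is linked to more than one concept set."""
    sets_by_label = defaultdict(set)
    for concept_list, label, concept_set in mappings:
        sets_by_label[(concept_list, label)].add(concept_set)
    return {
        key: sorted(concept_sets)
        for key, concept_sets in sets_by_label.items()
        if len(concept_sets) > 1
    }


conflicts = find_conflicts(MAPPINGS)
print(conflicts)
```

Different labels from different lists may share a concept set, as in the GREASE table; only a single label pointing to several sets counts as an inconsistency.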

Slide 57

Slide 57 text

Concepticon
[Figure: diagram with the labels STONE, EGG, FOOT, THE STONE, THE EGG, THE LEG, STONE (FRUIT), EGG (CHICKEN), FOOT/LEG, STONE, EGG, LEG, FOOT]
http://concepticon.clld.org
18 / 33

Slide 58

Slide 58 text

Concepticon
[Figure: diagram with the labels CONCEPT SET, CONCEPT, CONCEPT LIST, CONCEPT LABEL, COMPILER, SOURCE, NOTE]
18 / 33

Slide 59

Slide 59 text

CLICS 2.0 Excursus http://concepticon.clld.org 19 / 33

Slide 60

Slide 60 text

Excursus: Data
Dataset | Editors | Languages | Concepts
IDS | Key and Comrie (2016) | 367 | 1310
WOLD | Haspelmath and Tadmor (2008) | 41 | 1430
BaiDial* | Allen (2007) | 8 | 500
HuberReed | Huber and Reed (1992) | 71 | 374
Kraft1981 | Kraft (1981) | 68 | 434
BantuBVD* | Teil-Dautrey (2008) | 10 | 430
Tryon1983* | Tryon (1983) | 111 | 324
Madang* | Zgraggen (1980) | 100 | 380
Cihui* | Beijing Daxue (1964) | 17 | 905
TBL* | Huang (1992) | 50 | 1800
NorthEuraLex | Dellert and Jäger (2017) | 106 | 1000
Datasets marked with an asterisk are currently in preparation and will most likely be released within this year.
20 / 33

Slide 63

Slide 63 text

Excursus: Data
- By linking these datasets to the Concepticon (which we have already done for most of them), we can easily combine them into a bigger dataset that serves as the basic data for CLICS 2.0.
- Given problems with concept overlap in the datasets, we can offer users different selections, including datasets with many concepts but fewer languages and datasets with many languages but fewer concepts.
21 / 33
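
Note: a minimal sketch of how datasets with different glosses might be combined once each gloss is linked to a concept set, and how a selection could then be made by coverage. The datasets, glosses, forms, and function names are invented, and selecting concepts by the number of datasets that attest them is only one assumed reading of the "different selections" mentioned on the slide.

```python
from collections import defaultdict

# Hypothetical links from dataset-specific glosses to concept sets.
CONCEPT_SETS = {
    ("dataset_a", "the hand"): "HAND",
    ("dataset_a", "the grease or fat"): "GREASE",
    ("dataset_b", "hand"): "HAND",
    ("dataset_b", "fat (grease)"): "GREASE",
    ("dataset_c", "hand"): "HAND",
}

# Hypothetical word list rows: (dataset, language, gloss, form).
ROWS = [
    ("dataset_a", "lang_1", "the hand", "ka"),
    ("dataset_a", "lang_1", "the grease or fat", "so"),
    ("dataset_b", "lang_2", "hand", "mi"),
    ("dataset_b", "lang_2", "fat (grease)", "po"),
    ("dataset_c", "lang_3", "hand", "tu"),
    ("dataset_c", "lang_3", "unmapped gloss", "xx"),
]


def merge(rows, concept_sets):
    """Combine rows from all datasets under their shared concept sets."""
    merged = defaultdict(dict)   # concept set -> {language: form}
    sources = defaultdict(set)   # concept set -> datasets attesting it
    for dataset, language, gloss, form in rows:
        concept = concept_sets.get((dataset, gloss))
        if concept is None:
            continue  # glosses without a concept set cannot be compared
        merged[concept][language] = form  # simplification: one form per language
        sources[concept].add(dataset)
    return dict(merged), dict(sources)


def select(merged, sources, min_datasets):
    """Keep the concepts that are attested in at least `min_datasets` datasets."""
    return {c: forms for c, forms in merged.items() if len(sources[c]) >= min_datasets}


merged, sources = merge(ROWS, CONCEPT_SETS)
broad = select(merged, sources, min_datasets=2)
narrow = select(merged, sources, min_datasets=3)
print(sorted(broad), sorted(narrow))
```

A higher threshold leaves fewer concepts but more languages per concept, which mirrors the trade-off between concept coverage and language coverage.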

Slide 65

Slide 65 text

Excursus: Data
Subset | Datasets | Concepts | Languages
High-Low | >= 2 | >= 1000 | >= 300
Mid-Mid | >= 5 | >= 500 | >= 600
Low-High | >= 10 | >= 250 | >= 1000
In effect, this means that CLICS 2.0 can immediately offer different views on the data, which allow scholars to investigate very broad patterns of semantic associations as well as fine-grained patterns with lower attestation.
22 / 33

Slide 70

Slide 70 text

Excursus: Software API
- With the Python API that we are currently preparing for CLICS 2.0, users will be able to run their own network analyses on their own data. Since all data is shipped with CLICS, users can also use the data we selected for CLICS 2.0.
- We will try to offer cookbooks accompanying the software API to help users use it efficiently.
- By shifting to the CLLD framework, scholars can also create their own CLICS websites, since the source code for creating the interactive networks will be shipped transparently with the data.
- Spring schools and further events carried out at the MPI-SHH as part of my ERC project on Computer-Assisted Language Comparison will cover, among other things, introductory tutorials on all the software APIs that are shipped with the different tools and datasets developed at our department.
23 / 33

Slide 78

Slide 78 text

Features
- drastic increase in data
- drastic increase in transparency
- drastic increase in replicability
- regular floating releases which feature new data
- strict and clear-cut collaboration guidelines
- new methods (see demo on next slide)
- rigid policy towards open data (since we profit heavily from all of our colleagues who publish their data!)
24 / 33

Slide 79

Slide 79 text

CLICS 2.0 Features Features: Coverage 25 / 33

Slide 82

Slide 82 text

New Methods
- Following Urban (2011), we are currently testing an automated variant of partial colexifications, which can help us to direct our networks and shed light on compositional aspects of semantic associations.
- With improved insight into graph theory and the available algorithms, we can now enhance the analysis of the networks. Articulation points, for example, reveal key players in a network that connect different communities.
26 / 33
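
Note: articulation points are nodes whose removal disconnects a network. The sketch below finds them with the standard depth-first search that tracks discovery times and low links. The concept labels are borrowed from this talk, but the edges are invented for illustration and are not CLICS data; the function name is likewise an assumption.

```python
# Hypothetical links between concepts (toy data, not taken from CLICS).
EDGES = [
    ("BEE", "HONEY"), ("BEE", "BEEHIVE"), ("BEEHIVE", "HONEY"),
    ("HONEY", "SWEET"),
    ("SWEET", "SUGAR"), ("SWEET", "FRAGRANT"), ("SUGAR", "FRAGRANT"),
]


def articulation_points(edges):
    """Return the nodes whose removal would disconnect their component."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    disc, low, result = {}, {}, set()
    counter = [0]

    def visit(node, parent):
        disc[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for nb in adj[node]:
            if nb not in disc:
                children += 1
                visit(nb, node)
                low[node] = min(low[node], low[nb])
                # A non-root node cuts the graph if a child cannot reach above it.
                if parent is not None and low[nb] >= disc[node]:
                    result.add(node)
            elif nb != parent:
                low[node] = min(low[node], disc[nb])
        # The root cuts the graph only if it has several search-tree children.
        if parent is None and children > 1:
            result.add(node)

    for node in sorted(adj):
        if node not in disc:
            visit(node, None)
    return result


key_players = articulation_points(EDGES)
print(sorted(key_players))
```

In the toy network HONEY and SWEET are the only connection between the bee-related and the taste-related concepts, so both are reported. The recursive search is fine for small networks; very large ones would need an iterative variant.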

Slide 83

Slide 83 text

New Methods
[Figure: network with the nodes WASP, BEEHIVE, WINE, ALCOHOL (FERMENTED DRINK), BEER, DRINK, MEAD, BEVERAGE, HONEY, BEESWAX, SUGAR, FRAGRANT, STINKING, BEE, SWEET, SMELL (STINK), FEEL, SUGAR CANE, SNIFF, SMELL (PERCEIVE)]
27 / 33

Slide 84

Slide 84 text

New Methods
[Figure: network with the nodes CORNER, SHORE, COAST, FRINGE, LAST (FINAL), END (OF TIME), FOR A LONG TIME, FAR, LENGTH, DEEP, LONG, BOUNDARY, SIDE, BESIDE, END (OF SPACE), HIGH, UP, TOP, HEAVEN, TALL, ABOVE, SKY, NEAR, EDGE, BORDER]
28 / 33

Slide 85

Slide 85 text

CLICS 2.0 Features CLICS 2.0 DEMO 29 / 33

Slide 90

Slide 90 text

Schedule
- We are working hard on assembling more data and building the new API as well as the web interface, but currently only a few people work on CLICS or in its periphery.
- We hope to publish CLICS 2.0 very late this year or, in the worst case, in early 2018.
- But we would argue that it is better to publish the next version a bit later than to publish a version that we need to update immediately after its first release.
- If we can learn one thing from CLICS 1.0, it is that we need to keep the code and the data in a state in which we can easily curate them. We hope to achieve this with CLICS 2.0.
30 / 33

Slide 91

Slide 91 text

Outlook 31 / 33

Slide 94

Slide 94 text

- It is still a rather long way from CLICS 1.0 to CLICS 2.0.
- But we hope that we are on the right track by now, and that we won’t disappoint those who came to like the Cross-Linguistic Colexification Database.
- CLICS 2.0 won’t be perfect, but it will be entertaining and hopefully very interesting for our colleagues working on historical linguistics and lexical typology.
32 / 33

Slide 95

Slide 95 text

Thanks for your attention! 33 / 33