“Computer-Assisted Language Comparison”, Department of Linguistic and Cultural Evolution, Max-Planck Institute for the Science of Human History, Jena, Germany, 2019-05-17 1 / 62
grammar was so simple that the sporadic references in previous paragraphs have essentially described it. The prime importance of sound symbolism for the people of nature should be noted again before we further detail that the vowel “E” was felt as indicating the “yin” element, passivity, femininity etc. [...] (Papakitsos and Kenanidis 2018: 8) 4 / 62
the literature and a large scale crowdsourcing experiment, we estimate that an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families. (Brysbaert et al. 2016: 1) 5 / 62
mathematician David Hilbert in 1900 (Hilbert 1902); at least 10 problems have been solved by now; some 7 problems have solutions accepted by some scientists 6 / 62
of problems for linguistics in a talk in 2014; Russell D. Gray further promoted the idea in a series of talks, where he emphasized we should ask more Hilb/pert questions in the field of diversity linguistics 7 / 62
- ? “small” problems in comparison to big picture questions asked by Hilpert and Gray; problems identified by myself; can be solved by some workflow or algorithm; they help us to advance our research by forcing us to formalize our work and to make clear what data we actually want to use 8 / 62
total; initial basic division into problems of inference, simulation, statistics, and typology; problems will be discussed on a monthly basis throughout 2019; first three problems were already discussed in February, March, and April; updated division of problems: modeling, inference, analysis (MIA) 11 / 62
segmentation (blog in February 2019); 2. automatic sound law induction (blog in March 2019); 3. automatic borrowing detection (blog in April 2019); 4. automatic phonological reconstruction (blog planned for May 2019) 13 / 62
inference problems deal with something we want to find in linguistic data. Their common objective is to identify past and present processes and states of which we – due to our models – think that they have occurred or existed once, or still occur and exist. 14 / 62
with our knowledge about processes and how we account for the processes in a formal or mathematical way. Proof of language relatedness is a specific case, maybe not completely fitting into this category, but its key objective is to model chance resemblances, which is why it is basically also a modeling task and not a task of inference. 16 / 62
with the bigger picture of the processes, and with the question if we can derive tendencies, rates, or frequencies from linguistic data. In order to achieve this, we need to infer the processes first, and this is the reason why these problems are listed last in this overview. 18 / 62
In linguistics, there is as yet no proper term for words that themselves serve as the basis for many other words, that is, words that are frequently reused in derivations in the lexicon of a language. Following biology, where we find similar phenomena in protein domains (Basu et al. 2008, List et al. 2016), we could, however, speak of promiscuous concepts, to which German «schlagen» should then definitely belong as well. (List 2018: Von Wortfamilien und promiskuitiven Wörtern) 21 / 62
steadily increasing; our qualitative methods reach their practical limits; we need to take computational methods into account; but computational methods are not very accurate and may yield wrong results 23 / 62
ERC Starting Grant (2017-2022); Host: MPI-SHH (Jena); Current team: 2 post-docs, 2 doctoral students, and myself; Objectives go beyond historical linguistics and Sino-Tibetan (but they are our starting point); http://calc.digling.org 25 / 62
Black Boxes: many problems of inference and analysis are nowadays addressed by employing machine learning methods in the broad sense; among the most popular techniques are Bayesian inference and neural networks; using different techniques for problem solving is less accepted, and often receives surprised reactions, specifically among scholars with little training in historical linguistics and linguistic typology 26 / 62
Machine Learning For over 20 years, the IEEE has organised a biennial workshop covering the latest developments in automatic speech recognition and attended by the leading researchers in the field. Many of these meetings have been of high significance; for example, it was at the 1985 workshop entitled ‘Frontiers of Speech Recognition’ that Fred Jelinek uttered the now immortal phrase “Every time we fire a phonetician/linguist, the performance of our system goes up”. By 1995 the series had become known as ‘ASRU’ - the IEEE workshop on Automatic Speech Recognition and Understanding. (Moore 2005: 1) 27 / 62
Naive Machine Learning Techniques: Problems may have a “normal” solution, so there is no need to bother with an approximate one. Standard techniques may not be apt for the task at hand, i.e., Bayesian inference and neural networks are not good at induction, but induction is needed in many tasks. If the criteria upon which the results are based are hidden in a black box, they lack epistemological interest: as scientists we want to know what’s going on, not that a machine can do the same as we can. 28 / 62
Design One of the promises of deep learning is that it vastly simplifies the feature-engineering process by allowing the model designer to specify a small set of core, basic, or “natural” features, and letting the trainable neural network architecture combine them into more meaningful higher-level features, or representations. However, one still needs to specify a suitable set of core features, and tie them to a suitable architecture. (Goldberg 2017: 18) 29 / 62
core class of your problem (modeling, inference, analysis); look at existing qualitative solutions; formalize the problem in a way that allows one to test it (specify data and techniques for evaluation); do not hesitate to define sub-problems, given that qualitative solutions are often holistic; search for inspiration in neighboring disciplines (graph theory, computer science, evolutionary biology) by looking for similar processes that could be addressed in an analogous or similar way; accept a qualitative or semi-automatic solution for inference processes, but make sure that the results are annotated in a machine-readable way; insist on transparent output (no black boxes) to allow for an immediate review of results by experts 30 / 62
list of less than 1000 words in phonetic transcription, readily segmented into sounds, with concepts mapped to common concept lists (e.g., Concepticon), identify the morpheme boundaries in the data. 32 / 62
algorithms build on n-grams (recurring symbol sequences of arbitrary length); assuming that n-grams representing meaning-building units should be distributed more frequently across the lexicon of a language, they assemble n-gram statistics from the data; with Morfessor, a popular family of algorithms is available in the form of a stable library (Creutz and Lagus 2005, Virpioja et al. 2013) 33 / 62
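The n-gram idea can be illustrated with a minimal sketch. It only counts in how many words each n-gram recurs and is not Morfessor's actual model; the toy word list and the function name `ngram_counts` are assumptions made for illustration.

```python
from collections import Counter

def ngram_counts(words, max_n=4):
    """Count, for every n-gram up to length max_n, in how many words it occurs."""
    counts = Counter()
    for segments in words:
        seen = set()
        for n in range(1, max_n + 1):
            for i in range(len(segments) - n + 1):
                seen.add(tuple(segments[i:i + n]))
        counts.update(seen)  # each word contributes at most once per n-gram
    return counts

# Hypothetical toy word list: every word is a list of sound segments.
words = [
    ["h", "a", "n", "t"],
    ["h", "a", "n", "t", "ʃ", "u"],
    ["ʃ", "u"],
    ["f", "u", "s"],
]

counts = ngram_counts(words)
# n-grams longer than one sound that recur across words are morpheme candidates
recurring = {ng: c for ng, c in counts.items() if len(ng) > 1 and c > 1}
for ngram, count in sorted(recurring.items(), key=lambda x: -len(x[0])):
    print(" ".join(ngram), count)
```

Note that the counts alone cannot tell whether a recurring sequence is a morpheme or a frequent accident of the sound system, which is exactly where semantics and phonotactics would have to come in.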
ambiguous, they are not only based on the form, but also on semantics; even speakers may at times no longer understand the original morphology of their language (folk etymology, etc.); morphological judgments are thus based on different viewpoints (historical perspective involving more than one language, speaker intuition, descriptive grammar) 35 / 62
take semantics into account (e.g., Spanish hermano “brother” vs. hermana “sister”); humans know that morphological structure varies across languages (compare SEA languages vs. Indo-European languages); humans try to infer phonotactic rules; humans make use of cross-linguistic evidence 36 / 62
information (make use of new resources such as CLICS, Concepticon); employ phonotactic information (make use of the prosody models in LingPy); employ cross-linguistic information (use LingPy’s sequence comparison techniques); give up the idea of a universal morpheme segmentation algorithm (rather proceed from linguistic areas); invest time to create datasets for testing and training 37 / 62
in the phylogeny, explain them by invoking borrowings (MLN approach, Nelson-Sathi et al. 2011, List et al. 2014); similar words among unrelated languages (Mennecier et al. 2016); tree reconciliation methods (Willems et al. 2016); borrowability statistics (Sergey Yakhontov, as reported by Starostin 1990, Chén 1996, McMahon et al. 2005) 39 / 62
the phylogeny tend to overestimate the amount of borrowing, since there are multiple reasons for conflicts in phylogenies, not only borrowing (Morrison 2011); sequence comparison on unrelated languages seems solid, but one needs to be careful with chance resemblances based on onomatopoetic words etc. (mama, papa, etc., Jakobson 1960, Blasi et al. 2016); tree reconciliation methods are unrealistic if word trees are derived from simple edit distances; sublist approaches may be useful, but they require extensive accounts of known borrowings, which we usually lack 40 / 62
presupposes the exclusion of alternative reasons (inheritance, natural patterns, chance); no unified procedure for the identification of borrowings in the classical discipline; borrowing detection is much more based on multiple types of evidence (“consilience”, “cumulative evidence”) than other tasks in historical linguistics 41 / 62
idea: search for conflicts (List, under review); search for phylogenetic conflicts (English mountain, French montagne); search for trait-related conflicts (German Damm, English dam); check for areal proximity (as a pre-condition); use borrowability arguments in cases of doubt, or as heuristics 42 / 62
data in phonetic transcription and consistent definition of meanings to allow for search of similar words among unrelated languages; test methods for automatic correspondence pattern recognition and search for trait-related conflicts (List 2019); work on cross-linguistic datasets of known borrowed words to increase our knowledge of borrowability 43 / 62
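The search for similar words among unrelated languages can be sketched as follows. This is only an illustration under stated assumptions: the word list is invented, and the similarity ratio from Python's `difflib` stands in for proper phonetic sequence comparison. It is not the procedure of any of the cited studies.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy data: (language, family, concept, segmented form).
wordlist = [
    ("LangA", "Fam1", "mountain", ["m", "o", "n", "t", "a", "n"]),
    ("LangB", "Fam2", "mountain", ["m", "u", "n", "t", "a", "n"]),
    ("LangC", "Fam2", "mountain", ["b", "e", "r", "k"]),
    ("LangA", "Fam1", "water", ["a", "k", "w", "a"]),
    ("LangB", "Fam2", "water", ["v", "o", "d", "a"]),
]

def borrowing_candidates(wordlist, threshold=0.7):
    """Return word pairs with the same meaning from different families
    whose segment sequences are similar enough to deserve expert review."""
    hits = []
    for (l1, f1, c1, s1), (l2, f2, c2, s2) in combinations(wordlist, 2):
        if c1 != c2 or f1 == f2:
            continue  # only compare same-meaning words of unrelated languages
        sim = SequenceMatcher(None, s1, s2, autojunk=False).ratio()
        if sim >= threshold:
            hits.append((c1, l1, l2, round(sim, 2)))
    return hits

candidates = borrowing_candidates(wordlist)
print(candidates)
```

The output is a list of candidates, not of borrowings: chance resemblances of the mama/papa type would pass the same filter, so areal proximity and expert judgment are still needed.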
Given a list of words in an ancestral language and their reflexes in a descendant language, identify the sound laws by which the ancestor can be converted into the descendant. *p > *pf / #_ 44 / 62
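To make the notation concrete, here is a minimal sketch of how a single sound law such as *p > *pf / #_ can be applied to a segmented word. The function `apply_sound_law` and its parameters are hypothetical; the sketch covers only immediately adjacent context and shows the forward direction (applying a law), not the induction task itself.

```python
def apply_sound_law(segments, source, target, left=None, right=None):
    """Rewrite `source` as `target` wherever the neighbouring sounds match.

    "#" stands for a word boundary; None means "any context".
    """
    padded = ["#"] + list(segments) + ["#"]
    out = []
    for i in range(1, len(padded) - 1):
        seg = padded[i]
        if (seg == source
                and left in (None, padded[i - 1])
                and right in (None, padded[i + 1])):
            out.append(target)
        else:
            out.append(seg)
    return out

# *p > *pf / #_ : only the word-initial p changes
result = apply_sound_law(["p", "a", "p", "a"], "p", "pf", left="#")
print(result)  # ['pf', 'a', 'p', 'a']
```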
Solutions: simulation studies (black boxes, see e.g., Ciobanu and Dinu 2018) for word prediction; manual tools to model sound change when providing sound laws (PHONO, Hartmann 2003); correspondence-pattern based word prediction (List 2019, Bodt and List under review) 45 / 62
induction of rules as a problem usually not addressed in machine learning solutions; problem of handling context of arbitrary distance to target sound; problem of handling “abstract” context (suprasegmentals); problem of handling systemic aspects of sound change (where sound change is modeled in features) 47 / 62
Multi-tiered sequence modeling (List 2014, List and Chacon 2015, WIP): by modeling all different possible conditioning contexts, we make sure that we can find the context that conditions a sound change; by selecting those which actually do condition a sound change, using computational tools, we can identify and propose potential environments of varying degrees of abstractness; we still need, however, to reflect on how to handle systemic aspects of sound change 48 / 62
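The multi-tier idea can be sketched in a few lines. This is a simplified illustration, not the work-in-progress implementation referred to above: it assumes that ancestor and descendant words are already aligned one-to-one (so "pf" counts as one segment), it uses only three invented tiers, and all names are hypothetical. A tier counts as conditioning if each of its values predicts exactly one reflex.

```python
from collections import defaultdict

def tiers(segments):
    """Yield one dict of context tiers per position in the word."""
    padded = ["#"] + list(segments) + ["#"]
    for i in range(1, len(padded) - 1):
        yield {
            "none": "",                   # baseline: unconditioned change
            "previous": padded[i - 1],    # preceding sound or word boundary
            "following": padded[i + 1],   # following sound or word boundary
        }

def conditioning_tiers(pairs, sound):
    """Return the tiers whose values fully predict the reflex of `sound`."""
    table = defaultdict(lambda: defaultdict(set))
    for ancestor, descendant in pairs:
        for src, tgt, ctx in zip(ancestor, descendant, tiers(ancestor)):
            if src == sound:
                for name, value in ctx.items():
                    table[name][value].add(tgt)
    return [name for name, values in table.items()
            if all(len(reflexes) == 1 for reflexes in values.values())]

# Hypothetical aligned word pairs (ancestor, descendant).
pairs = [
    (["p", "a", "t"], ["pf", "a", "t"]),
    (["a", "p", "a"], ["a", "p", "a"]),
    (["p", "u"], ["pf", "u"]),
    (["u", "p", "i"], ["u", "p", "i"]),
]

found = conditioning_tiers(pairs, "p")
print(found)  # ['previous']
```

In this toy data, only the tier of the preceding sound separates the two reflexes of p, which corresponds to the word-initial environment. More abstract tiers (stress, tone, syllable position) could be added to the dictionary in the same way.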
set of alignments of strict cognate morphemes across a set of related languages, as well as the typical correspondence patterns by which the sounds in the languages correspond to each other, try to infer the hypothetical pronunciation of each morpheme in the proto-language. 49 / 62
et al. (2013) use a framework that makes use of probabilistic string transducers. If the family tree of the languages is known, and cognate sets are defined as such, the method produces proto-form suggestions. 50 / 62
by Bouchard-Côté was only tested on Austronesian, and is not available, so it cannot be tested further without re-implementing it from scratch; the scores reported are good (error rates between 0.25 and 0.12), but Austronesian is not a challenging candidate for reconstruction; the method cannot reconstruct sounds that are not found in the data, although this can well happen in language change; the evaluation uses edit distance, but reconstructions are better compared for structural differences than for substantial ones (List 2018) 51 / 62
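For readers unfamiliar with the evaluation measure, the following sketch shows an edit distance between a proposed and a reference reconstruction. Normalizing by the length of the longer sequence is an assumption made here for illustration; the exact normalization in the evaluation discussed above may differ, and the example forms are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of sound segments."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

# Hypothetical reference and proposed reconstruction of one proto-form.
reference = ["p", "a", "t", "e", "r"]
proposed = ["f", "a", "t", "e", "r"]
error = normalized_distance(reference, proposed)
print(error)  # 0.2
```

The sketch also shows the weakness named above: a system that writes f wherever the reference writes p is penalized in every such word, although the two systems make exactly the same distinctions.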
disagree with respect to the question of how reconstruction should be best carried out, i.e., if it should be abstract or realistic (so-called abstractionalist-realist debate, Lass 2017, Jakobson 1958); no measures to account for the predictive quality of a given reconstruction system exist; reconstructing what was not, as in the case of laryngeals in Indo-European (Saussure 1879), does not have a counterpart in biology (or is simply ignored in biology) 52 / 62
of sound correspondence patterns (Anttila 1972, Meillet 1903); use of external evidence where possible; use of internal reconstruction where possible 53 / 62
start from semi-automatic reconstructions (especially, since we can now compute sound correspondence patterns from alignment data, see List 2019), by having experts inspect correspondence patterns, and assigning one proto-form to each of them; if we manage to implement our system for sound law induction with multi-tiered sequence representations, we could evaluate the overall plausibility of existing and proposed reconstruction systems automatically; we need more data for testing and training; we need to work on measures to compare different reconstruction systems (along the lines proposed in List 2018, by measuring structural differences) 54 / 62
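The semi-automatic workflow (compute correspondence patterns, let experts assign one proto-sound to each, derive proto-forms) can be sketched as below. The sketch treats every alignment column as a pattern and ignores gaps and missing data, so it is far simpler than the correspondence pattern method of List (2019); the languages, alignments, and expert assignments are invented.

```python
from collections import Counter

LANGUAGES = ["LangA", "LangB"]  # fixed language order for all patterns

# Hypothetical alignments: one row per language, one column per aligned sound.
alignments = {
    "hand": {"LangA": ["h", "a", "n", "t"], "LangB": ["h", "a", "n", "d"]},
    "tooth": {"LangA": ["t", "a", "n"], "LangB": ["d", "a", "n"]},
}

def patterns(alignment):
    """Turn the columns of one alignment into correspondence patterns."""
    return list(zip(*(alignment[lang] for lang in LANGUAGES)))

def count_patterns(alignments):
    """Count how often each correspondence pattern recurs in the data."""
    return Counter(p for aln in alignments.values() for p in patterns(aln))

def reconstruct(alignment, proto_sounds):
    """Look up the expert-assigned proto-sound for every column."""
    return [proto_sounds.get(p, "?") for p in patterns(alignment)]

pattern_counts = count_patterns(alignments)

# Expert step (hypothetical): one proto-sound per correspondence pattern.
proto_sounds = {("h", "h"): "h", ("a", "a"): "a",
                ("n", "n"): "n", ("t", "d"): "d"}

for concept, alignment in alignments.items():
    print(concept, "*" + "".join(reconstruct(alignment, proto_sounds)))
```

Because the expert decisions are stored as a plain mapping, they remain transparent and machine-readable: changing one assignment updates all proto-forms in which the pattern occurs.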
gold standards, training data, and baselines; lack of good evaluation measures; simulation methods may help to produce more evaluation data; interfaces for data annotation are crucial to produce more high-quality data 55 / 62
data comparable; allow for a better integration of software and data; can also guarantee that data is available in both human- and machine-readable form; First attempts: Cross-Linguistic Data Formats initiative, Forkel et al. (2018), Scientific Data, https://cldf.clld.org 56 / 62
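What "human- and machine-readable" means in practice can be shown with a small CSV table read by a standard-library parser. The column names below (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Segments`) are assumed to follow the conventions of CLDF form tables and should be checked against the specification; the rows are invented toy data, and a real CLDF dataset additionally ships with a JSON metadata file that is not modeled here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical form table in a CLDF-like layout (assumed column names).
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form,Segments
1,LangA,hand,hant,h a n t
2,LangB,hand,hand,h a n d
3,LangA,tooth,tan,t a n
"""

def read_forms(text):
    """Parse the table and split the space-separated segments."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Segments"] = row["Segments"].split()
        rows.append(row)
    return rows

forms = read_forms(FORMS_CSV)

# Group forms by concept, as needed for any cross-linguistic comparison.
by_concept = defaultdict(list)
for row in forms:
    by_concept[row["Parameter_ID"]].append((row["Language_ID"], row["Form"]))

print(dict(by_concept))
```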
of data; guarantee that data is human- and machine-readable; allow for qualitative and quantitative research at the same time; First attempts: Etymological Dictionary Editor (EDICTOR), List (2017): Proc. of the EACL. System Demonstrations. https://edictor.digling.org 57 / 62
[Figure: EDICTOR (Etymological DICTionary ediTor), http://edictor.digling.org, List (2017). Wordlist view with entries for “moon” in Běijīng, Guǎngzhōu, Měixiàn, and Fúzhōu. Conversion and segmentation: yuE_5_1liaN_1 → yɛ⁵¹liɑŋ¹ → y ɛ ⁵¹ l i ɑ ŋ ¹; highlighting of unrecognized phonetic symbols. Functions: annotate data, analyze data, edit alignments.] 57 / 62
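The conversion and segmentation step shown on the slide (yuE_5_1liaN_1 to yɛ⁵¹liɑŋ¹ to y ɛ ⁵¹ l i ɑ ŋ ¹) can be imitated with a small lookup table. The table below is reverse-engineered from this single example and is purely hypothetical; it is not EDICTOR's actual conversion scheme, and the function names are invented.

```python
import re

# Hypothetical ASCII-to-IPA table, inferred from the one example on the slide.
CONVERSION = {"yu": "y", "E": "ɛ", "a": "ɑ", "N": "ŋ", "_1": "¹", "_5": "⁵"}

# Longest keys first, so that "yu" wins over a single character.
TOKEN = re.compile(
    "|".join(sorted(map(re.escape, CONVERSION), key=len, reverse=True)) + "|."
)

def convert(raw):
    """Replace every known ASCII token by its IPA counterpart."""
    return "".join(CONVERSION.get(tok, tok) for tok in TOKEN.findall(raw))

def segment(ipa):
    """Split into sounds, keeping a run of tone digits as one segment."""
    return re.findall(r"[⁰¹²³⁴⁵]+|.", ipa)

ipa = convert("yuE_5_1liaN_1")
sounds = segment(ipa)
print(ipa)               # yɛ⁵¹liɑŋ¹
print(" ".join(sounds))  # y ɛ ⁵¹ l i ɑ ŋ ¹
```

Symbols that are missing from the table pass through unchanged, which is where a tool could highlight them as unrecognized.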
[Figure: Cross-Linguistic Colexifications (CLICS), http://clics.clld.org, List et al. (2018). Largest connected component in CLICS², with concepts such as HAND, ARM, MOON, and MONTH as nodes; clusters inferred with the Infomap Community Detection algorithm, List et al. (u. rev.).] 60 / 62
[Figure, repeated: largest connected component in CLICS² (http://clics.clld.org, List et al. 2018), clusters inferred with the Infomap Community Detection algorithm, List et al. (u. rev.); here with the concepts of one cluster listed separately: TONGUE, TELL, ANNOUNCE, TALK, ADMIT, CHAT (WITH SOMEBODY), SAY, WORD, ANSWER, LANGUAGE, VOICE, SOUND OR NOISE, NOISE, PREACH, SPEECH, TONE (MUSIC), EXPLAIN, CONVERSATION, CONVEY (A MESSAGE), SPEAK.] 60 / 62
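How a colexification network of this kind is built can be sketched with toy data. Two concepts are linked when one word form expresses both in the same language, and the edge weight is the number of languages showing the pattern. The entries below are invented, and plain connected components are used as a simple stand-in for the Infomap community detection named in the figure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (language, form, concept).
entries = [
    ("LangA", "ruka", "HAND"), ("LangA", "ruka", "ARM"),
    ("LangB", "te", "HAND"), ("LangB", "te", "ARM"),
    ("LangB", "mun", "MOON"), ("LangB", "mun", "MONTH"),
    ("LangC", "ay", "MOON"), ("LangC", "ay", "MONTH"),
    ("LangC", "kol", "HAND"),
]

def colexification_network(entries):
    """Map each concept pair to the number of languages colexifying it."""
    by_form = defaultdict(set)
    for lang, form, concept in entries:
        by_form[lang, form].add(concept)
    langs_per_edge = defaultdict(set)
    for (lang, _), concepts in by_form.items():
        for a, b in combinations(sorted(concepts), 2):
            langs_per_edge[a, b].add(lang)
    return {edge: len(langs) for edge, langs in langs_per_edge.items()}

def connected_components(edges):
    """Group concepts that are linked directly or indirectly."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(neighbours[node] - comp)
        seen |= comp
        components.append(comp)
    return components

network = colexification_network(entries)
components = connected_components(network)
print(network)
print(components)
```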
ask completely stupid questions, but we should always work on questioning our key assumptions about language, language evolution, and how we study its synchronic and diachronic structures. Formulating open problems for our field is a first step towards their solution. Searching for open problems in our field that may have been overlooked so far is a first step to a deeper understanding of our research and our research subject. 61 / 62