Mozilla Festival (MozFest) 2017 - How Can Linguistics Help a Healthy Internet?

Marc Alexander, Fraser Dallachy, and Jennifer Smith

Marc Alexander

October 29, 2017

Transcript

  1. “Thirty years ago when this research started it was considered impossible to process texts of several million words in length. Twenty years ago it was considered marginally possible but lunatic. Ten years ago it was considered quite possible but still lunatic. Today it is very popular.” (John Sinclair, 1991)
  2. ‣ General corpus
    ‣ Monitor corpus
    ‣ Specialised corpus
    ‣ Learner corpus
    ‣ Bi-/multilingual parallel corpus
    ‣ Bi-/multilingual comparable corpus
    ‣ Diachronic/synchronic corpora
  3. Corpora, with period covered and size (m = million words, bn = billion words):
    ‣ British National Corpus (BNC), late 20th century: 100m
    ‣ Brown Corpus (Brown), 1963–64: 1m
    ‣ Corpus of Contemporary American English (COCA), 1990–: 450m
    ‣ Corpus of Early English Correspondence (CEEC), 1403–1800: 5.1m
    ‣ Corpus of Historical American English (COHA), 1810–2009: 400m
    ‣ Corpus of Modern Scottish Writing (CMSW), 1700–1945: 5.5m
    ‣ Dictionary of Old English Corpus (DOEC): 3m
    ‣ Global Web-Based English Corpus (GloWbE): 1.9bn
    ‣ Google Books Corpora: 200bn
    ‣ Hansard Corpus (Hansard), 1803–2005: 1.6bn
    ‣ Helsinki Corpus (HC), c.730–1710: 1.5m
    ‣ International Corpus of English (ICE): 1m for each region (inc. ICE-GB)
    ‣ Michigan Corpus of Academic Spoken English (MICASE): 1.8m
    ‣ Newcastle Electronic Corpus of Tyneside English (NECTE), 1969–1994
    ‣ Old Bailey Corpus (OBC), 1720–1913: 13.9m
    ‣ Scottish Corpus of Texts & Speech (SCOTS): 4m
    ‣ Time Corpus (TIME), 1923–2006: 100m
    ‣ Zurich English Newspaper Corpus (ZEN), 1661–1791: 1.6m
  4. Bart: You're up to something, aren't you?
    Homer: No! I'm just going out to commit certain deeds.
  5. acwelan OE • asteorfan OE • aswindan OE • becwelan OE • belifan OE • (ge)cringan OE • deagan OE • (ge)deorfan OE • (ge)dreosan OE • gefeallan OE • feorh agiefan OE • feorh gesellan OE • feorh losian OE • geferan OE • feran forþ OE • forfaran OE • forlætan OE • forsiþian OE • forswealtan OE • forþ(ge)feran OE • forþ gelæded beon OE • forþ(ge)leoran OE • gæst ofgiefan OE • gegan OE • gast onsendan OE • glidan OE • hweorfan OE • leoran OE • lif geendian OE • linnan ealdre OE • (ge)losian OE • oþcwelan OE • sawlian OE • tostencan OE • towitan OE • unætnessa gebidan OE • gewitan of lice OE • i-wite < gewitan OE–c1205 • of worlden iwiten < gewitan of weorulde OE–c1205 • forswelt < forsweltan OE–a1225 • adeaden < adeadian OE–c1230 • quele < cwelan OE–a1250 • forworth < forweorþan OE–a1300 • aswelt < asweltan OE–c1300 • forthfare < forþ(ge)faran OE–c1350 • fare < gefaran OE–1377 • forfere < forferan OE–a1400 • dead < deadian OE–c1425 • wite < witan OE–c1480 • wend (forth, hence, etc.) < wendan (heonan, etc.) OE–1567 • starve < steorfan OE–a1657 • end < geendian OE–1858 • give up the ghost < gast agiefan OE– • swelt < sweltan OE–c1300 + c1375– dial. • die c1135– • wend to/into (heaven, hell, bliss) c1200–c1480 • let (one's) life c1200–1577/87 • to-swelt c1205 • to-worth c1205 • wend of life c1250 • yield (up) the ghost (soul, breath, life) c1290–1627 + 1844– arch.
• take the way of death 1297 • go/depart (out) of this world 1297–c1588 • flee a1300 • leave one's breath a1300 • take (one's) fine a1300 • die up a1300–1563/87 • spill a1300–1592 • pass (hence) a1300– • shed (one's own) blood a1300– • tine/leave/lose the sweat c1320–1513 • leese one's life-dawes c1325 • part hence c1325 • do (one's) fine c1330 • miscarry a1340–1749 • trance 1340–a1500 • flit 1340–1619 • determine c1374 + 1607 • pass away c1375 + 1806– • disperish 1382(3) • be gathered to one's fathers/people 1382– • shut one's life 1390 • go 1390– • fine a1400 • part of this life a1400 • sye hethen/of life a1400–a1400/50 • expire a1400– • tine a1400–c1475 + 1570– Scots • seek out of life c1400 • pass the ghost c1400–1621 • leave one's life c1400–1635 • go west c1400– • have the death c1435 • decease 1439– • ungo c1450 • expire the soul c1450–1715/20 • take the death c1470 • espire 1483 • pay the debt of nature 1494– • vade 1495–a1678 • depart 1501– • decease this world 1515 • trepass 1523(2) • galp up the ghost 1529 • vade away 1530–1625 • trepass this life a1533 • end one's days a1533– • die the death 1535– • change one's life 1546 • jet 1546 • depart to God 1548 • play topple up tail 1573 • peak over the perch 1575–1633 • inlaik c1575–1785 Scots • finish 1578 + 1611 • ghost a1586 + 1689 • pitch over the perch 1587 • relent 1587 • unbreathe 1589 • pick over the perch 1591 • transpass 1592 • breathe one's last (breath) 1593– • lose one's breath 1596 • walk the way of nature 1597 • depart this life 1597– • part 1599– • go off 1605– • go the way of all flesh 1609– • go away 1611 • make a die (of it) 1611–1883 slang • fail 1613 • drop 1654 slang & colloq. • pay nature's due 1657 • kick up a1658 + 1813 • cross Jordan 1684– • march off 1693/4 • pike off 1697 • bite the ground 1697–1813 • die off 1697– • go out 1697– • drop off 1699– slang & colloq. 
• tip off a1700–1735 slang • knock off a1704 slang • bite the sand 1718 • vent one's soul 1718 • sink 1718–1804 • launch into eternity 1720–1812 • demise 1727–1783 rare • tip (over) the perch 1737–1808 slang • bite the dust 1750– • slip one's cable 1751– Nautical • turf it 1763 slang • move off 1764 colloq. • join the majority 1764– • pop off/off the hooks 1764– slang • pack off 1766 + 1914– • fall 1780– • kick the bucket 1785– • hop the perch 1791–1822 • hop (off) 1797– • pass on 1804/20– • exit 1806– • croak 1812– slang • go to glory 1814– colloq. • go home 1816 dial. • sough away 1816– Scots • slip one's breath/wind a1819–1896 slang & colloq. • stiffen 1820 • buy 1825– slang • drop short 1826 slang & colloq. • go over to the majority 1837 • fall a sacrifice to 1839– • drop/slip etc. off the hooks 1840– slang • succumb 1849– • cash/pass/send in one's checks 1857– colloq., chiefly US • walk 1858 slang • turn one's toes up 1860 • go/be up the flume 1865– US slang • snuff out 1865– slang & colloq. • hand in one's checks 1870 US colloq. • peg out 1870– • pass in one's cheques 1872– slang • go bung 1882 Austral. & NZ slang + 1885 Austral. & NZ slang • pass over to the majority 1883 • go/pass to one's reward 1883– iron., orig. US • step out 1884–1903 US slang • get one's/the call 1884– dial. & Literary • snuff it 1885– slang • perch 1886 slang • end up 1886– slang • go up the flume 1888 • knock over 1892 slang & colloq. • pass out 1899– colloq. • pass over 1909– • silver cord is loosed 1911– • pip (out) 1913 + 1920 • cop it 1915– slang • snuff 1916 slang & colloq. • stop one 1916– colloq., orig. Military • conk (out) 1918– colloq. • kick off 1921– slang, orig. US • shuffle off 1922– colloq. • pack up 1925 Dict. • step off 1926 slang • take the ferry 1928 • off it 1930 slang • cross over 1930 euphem. + 1935 euphem. • meet one's maker 1933– • have had it 1952– • crease it 1959 slang, orig. US • zonk 1968 • kiss off 1970
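The date notation in this list can be read mechanically. Below is a minimal Python sketch, assuming the usual dictionary conventions: `a` = ante ("before"), `c` = circa, a range left open with a trailing dash = still in use, and `+` = a later, separate run of attestations. Treating "OE" as the year 1150 is an assumption made only so entries can be sorted, and the function names are hypothetical.

```python
from __future__ import annotations

import re

DASH = "\u2013"   # the en dash used in ranges such as "1785–"
OE_YEAR = 1150    # assumed stand-in year for "OE" (Old English); for sorting only


def approx_year(token: str) -> int:
    """Turn a date token such as "OE", "a1300", "c1386" or "1500/20" into a year."""
    token = token.strip()
    if token.startswith("OE"):
        return OE_YEAR
    match = re.search(r"\d{3,4}", token)  # ignores a/c prefixes and "/20", "(2)" suffixes
    if match is None:
        raise ValueError(f"unrecognised date token: {token!r}")
    return int(match.group())


def parse_dates(dates: str) -> tuple[int, bool]:
    """Return (approximate first attestation year, still current?) for a date string."""
    segments = [segment.strip() for segment in dates.split("+")]
    first_token = segments[0].split(DASH)[0]
    # An open-ended range ("1785–", "1764– slang") has nothing after the dash
    # except whitespace or a usage label.
    still_current = re.search(DASH + r"(\s|$)", segments[-1]) is not None
    return approx_year(first_token), still_current


entries = {
    "give up the ghost": "OE–",
    "pass away": "c1375 + 1806–",
    "peak over the perch": "1575–1633",
    "kick the bucket": "1785–",
}
current = sorted(word for word, dates in entries.items() if parse_dates(dates)[1])
print(current)  # ['give up the ghost', 'kick the bucket', 'pass away']
```

Only the last `+` segment decides whether a word is still current, so "pass away" (c1375, then again from 1806) counts as current, while "peak over the perch" (1575–1633) does not.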
  6. darling < deorling OE– • culver a1225–1491 • belamy a1225–1689 • dear a1225– • sweetheart c1290– • sweet c1300– • heart c1305– • honey c1350– • (my) love c1369– • cinnamon c1386 • honeycomb c1386 + 1552 • (my) dove c1386– • mulling c1450–a1529 • daisy c1485–a1605 • turtle a1500–1865 • powsowdy/powsoddy 1500/20 Scots • suckler 1500/20 Scots • honey-sop 1500/20–1606 • butting a1528 • whiting a1529 • fool c1530–a1586 • beautiful 1535 + 1819 • turtle-dove 1535–1856 • bully 1538–1754 • lamb a1553–1820 • coz 1559–1849 • sweet-love a1560 • (my) ding-ding 1564–1602 • pug 1566–1611 • golpol 1568 • sparling 1570 • soul 1581– • mopsy 1582–1706 • bulkin 1583 • chuck 1588– • wanton 1589–1616 • joy 1590–1875 + 1876 dial. • duck 1590– • ladybird 1592–1656 + 1858 • sweetikin/sweetiekins 1596 + 1974– • muss 1598 + 1598(2) • honeysuckle 1598–1638 • sweetkin 1599 • pinkeny/pinkany 1599–1622 • bawcock 1599(2) colloq. + 1862 colloq. • sparrow c1600 • sucket 1605 • tickling 1605 • wanton 1605–1812 • nutting 1606 • bagpudding 1608 • flitter-mouse 1610 • (my) ding-dong a1611 • dainty 1611 • bun c1614 • fub(b)/fub(b)s 1614–1721 • bulch c1622 • duckling 1629–1716 • bulchin 1633–1725 • sweetling 1648–1903 • frisco a1652 • bunting 1665 • deary/dearie 1681– • cocky 1687 + 1789 Scots • nykin 1693 • pinkaninny 1696 • nug a1700 • chucky 1727–1840 • lovey 1731– • puss 1753– • dovie 1769– • pretty 1773– • sweetie 1778– colloq., orig. US • ducky 1819– • lovey-dovey 1819– • toy 1822 • treat 1825 colloq. • machree 1829– • alanna 1839– Irish • (my) cabbage 1840– • acushla 1842– Irish • pet 1849– • pebble 1851– chiefly Austral. • old thing 1864– colloq. • macushla 1887– Irish • bach 1889– • prawn 1895 • so-and-so/soandso 1897– • luv 1898– colloq. • hon 1906– colloq. • honey-bun 1911– colloq. • honey-bunch 1911– colloq. • lover 1911– colloq., orig. & chiefly US • snookums 1919– • treasure 1920– • pussums 1924– colloq. • honey chile 1926– chiefly US colloq. • sugar 1930– colloq. • ducks 1936– • pumpkin(s) 1942– US colloq. • honey-baby 1948– colloq.
• lamb chop 1962– • luvvy/luvvie 1968– colloq.
  7. BIOLOGY → IMAGINATION (sense, part of speech and dates for each word)
    ‣ fertility: "quality/capacity of procreation" (noun, 1490–) → "productivity of inventive/creative faculty" (noun, 1666–)
    ‣ conceive: "conceive" (transitive verb, a1300–) → "imagine/visualize" (transitive verb, c1340–)
    ‣ father: "beget" (transitive verb, 1483–) → "contrive/devise/invent" (transitive verb, 1548–)
    ‣ pregnancy: "pregnancy/gestation" (noun, a1529–) → "productivity of inventive/creative faculty" (noun, 1550–1833)
  8. ‣ Difficult to say something definitive about a word if searching for it misses instances of misspelling or historical variation, or includes the wrong meaning of the word
    ‣ Improvement of automatic processing of language
    ‣ Can we find repeated grammatical structures in sentences / around certain words?
    ‣ Can we say how often a word was used and in what circumstances across a range of time?
    ‣ Can we spot when someone is talking about, say, having lunch when they don't use the word lunch?
    ‣ Can we identify whether the appearance of a particular word in a sentence (say, cat) makes it very likely that another (say, kitten) will be used too?
    In what ways are these useful?
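The first problem, a search that misses misspellings and historical variants or picks up the wrong word, can be illustrated with approximate string matching. This minimal sketch uses Levenshtein edit distance, one common starting point but not necessarily what the presenters use; the function names and sample tokens are made up for illustration.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def candidate_variants(target: str, tokens: list[str], max_distance: int = 1) -> list[str]:
    """Return the distinct tokens within max_distance edits of target."""
    return sorted({t for t in tokens if edit_distance(t.lower(), target.lower()) <= max_distance})


tokens = ["house", "hous", "howse", "houses", "horse", "mouse", "household"]
print(candidate_variants("house", tokens))
# ['horse', 'hous', 'house', 'houses', 'howse', 'mouse']
```

The match picks up variant-looking spellings such as "hous" and "howse" that an exact search would miss, but it also pulls in "horse" and "mouse". That is the other half of the problem on the slide, so candidates like these still need checking before they are counted.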
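Finding repeated grammatical structures is often approached by counting short sequences (n-grams) of part-of-speech tags. This sketch assumes the sentences are already tagged. The simplified tags here are assigned by hand and are hypothetical, not the output of any particular tagger. Windows are counted within each sentence so that no n-gram spans a sentence boundary.

```python
from __future__ import annotations

from collections import Counter


def tag_ngrams(tagged_sentence: list[tuple[str, str]], n: int = 3) -> list[tuple[str, ...]]:
    """Slide a window of n tags across one tagged sentence."""
    tags = [tag for _word, tag in tagged_sentence]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


sentences = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("barked", "VERB"), ("loudly", "ADV")],
    [("she", "PRON"), ("saw", "VERB"), ("the", "DET"), ("kitten", "NOUN")],
]
counts = Counter(gram for sentence in sentences for gram in tag_ngrams(sentence))
print(counts.most_common(1))  # [(('DET', 'NOUN', 'VERB'), 2)]
```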
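Asking how often a word was used across time usually means comparing counts from sub-corpora of different sizes, so raw hits are normalised, commonly to a rate per million words. In this minimal sketch the periods, hit counts and sizes are invented purely to show the arithmetic. They are not taken from any corpus named in this deck.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Normalised frequency: occurrences per million words of running text."""
    return hits * 1_000_000 / corpus_words


# Hypothetical sub-corpora: period -> (hits for the search term, total words)
periods = {
    "1800-1849": (120, 2_000_000),
    "1850-1899": (300, 10_000_000),
}
for period, (hits, size) in periods.items():
    print(period, per_million(hits, size))
# 1800-1849 60.0
# 1850-1899 30.0
```

The raw count rises from 120 to 300, but the normalised rate halves because the later sub-corpus is five times larger.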
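Whether "cat" makes "kitten" more likely can be estimated by comparing P(kitten) with P(kitten | cat), or with pointwise mutual information (PMI), log2 of P(cat, kitten) / (P(cat) P(kitten)). PMI is positive when two words co-occur more often than chance would predict. This minimal sketch counts co-occurrence within whole sentences. The toy sentences are invented, and the sentence-level window is an assumption, since corpus tools often use a span of a few words instead.

```python
from __future__ import annotations

import math
import re


def cooccurrence(sentences: list[str], w1: str, w2: str) -> tuple[float, float, float]:
    """Return P(w2), P(w2 | w1) and PMI(w1, w2), counting per sentence."""
    bags = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(bags)
    n1 = sum(w1 in bag for bag in bags)                  # sentences containing w1
    n2 = sum(w2 in bag for bag in bags)                  # sentences containing w2
    n12 = sum(w1 in bag and w2 in bag for bag in bags)   # sentences containing both
    p_w2 = n2 / n
    p_w2_given_w1 = n12 / n1 if n1 else 0.0
    pmi = math.log2(n12 * n / (n1 * n2)) if n12 else float("-inf")
    return p_w2, p_w2_given_w1, pmi


sentences = [
    "The cat had a kitten.",
    "A kitten chased the cat.",
    "The dog slept.",
    "We had soup for lunch.",
    "The cat slept.",
    "The cat purred.",
    "The dog barked.",
    "It rained.",
]
print(cooccurrence(sentences, "cat", "kitten"))  # (0.25, 0.5, 1.0)
```

Here "kitten" appears in a quarter of all sentences but in half of the sentences containing "cat". A PMI of 1.0 means the pair co-occurs twice as often as independence would predict.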
  9. Language Changes
    ‣ And I said ‘What are you doing?’
    ‣ And I was like ‘What are you doing?’
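The shift from "I said" to "I was like" is the kind of change that can be tracked by counting quotative frames in transcripts. This rough sketch uses a regular expression. The pronoun list and the set of quotative forms are assumptions, and a pattern this simple will also catch non-quotative uses such as "he was like his father", so the hits would need manual checking.

```python
import re
from collections import Counter

# Hypothetical pattern: a subject pronoun followed by a quotative form.
QUOTATIVE = re.compile(
    r"\b(?:I|you|he|she|we|they)\s+(said|was like|were like|went)\b",
    re.IGNORECASE,
)


def count_quotatives(text: str) -> Counter:
    """Count each quotative form found in a transcript."""
    return Counter(match.group(1).lower() for match in QUOTATIVE.finditer(text))


sample = "And I said 'What are you doing?' And I was like 'What are you doing?'"
print(count_quotatives(sample))  # Counter({'said': 1, 'was like': 1})
```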
  10. The Atlas
    ‣ 120+ locations across Scotland
    ‣ two participants in the 18–25 age group, two in the 65+ group
    ‣ judgment questionnaire
    ‣ recorded conversations, fully text-to-sound aligned
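"Fully text-to-sound aligned" means each transcribed word is linked to its position in the recording. The Atlas's actual file format is not described in these slides, so this sketch assumes a simple list of (start, end, word) records in seconds. The class name, function name and timings are all hypothetical. It finds the word being spoken at a given moment by binary search over the start times.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedToken:
    start: float  # seconds from the start of the recording (hypothetical format)
    end: float
    text: str


def token_at(tokens: list[AlignedToken], t: float) -> AlignedToken | None:
    """Return the token spoken at time t, or None during a pause.

    Assumes tokens are sorted by start time and do not overlap.
    """
    starts = [token.start for token in tokens]
    i = bisect_right(starts, t) - 1  # last token starting at or before t
    if i >= 0 and t < tokens[i].end:
        return tokens[i]
    return None


tokens = [
    AlignedToken(0.00, 0.21, "we"),
    AlignedToken(0.21, 0.55, "speak"),
    AlignedToken(0.55, 0.80, "half"),
    AlignedToken(0.95, 1.10, "and"),
    AlignedToken(1.10, 1.40, "half"),
]
print(token_at(tokens, 0.30).text)  # speak
print(token_at(tokens, 0.90))       # None (a pause)
```

For many lookups against the same recording, the `starts` list would be built once and reused rather than rebuilt on every call.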
  11. Airdrie, central belt
    DJ: I think they've got to be really friendly that way but in tenements it seemed to be much easier for people to walk in and out of their house you know no no problem
    MJ: You got a bit of meat for soup and it would be passed round all the doors for to get everybody make a pot of soup out of one bit of meat that's what happened in our place it happened
    DJ: Oh aye these things happened aye that's right
  12. Fraserburgh, North East
    KB: I canna- I canna– we dinna speak in Doric really, we speak half and half
    DB: s- we just speak
    KB: divnt we? we speak half and half, half Doric half English, half fittever
    INT: fa kens, fa kens
    DB: (clk) I ken I ken
    KB: if I k-- mhm I daa ken my quine, I w- I would say it they'll say fitte- fitna granny's that you had?
    DB: oh well, it doesna matter eh