Big Data

Marc Alexander
December 11, 2014
Presented at the CMH New Generations Medical Humanities Event, 2014

Transcript

  1. Big Data
    CMH New Generations, University of Glasgow MHRC, Digital Medical Humanities Workshop
    Marc Alexander, University of Glasgow
  2. Big Data
    • Datasets at a scale which require rethinking how we process information for research
    • “In research terms, big data comprises information resources which are so large that they exceed the capacity of commonly used software and other tools, so that users have perforce to develop new approaches and methodologies to analyse them.” (AHRC)
    • Identified with the sciences – but the problem is at heart a humanistic one
    • And scale and rethinking information has long been the rationale behind the digital humanities
  3. “A new conjunction of scientist, curator, humanist, and artist is what the digital humanities must strive to achieve. It is the only way of ensuring that we do not lose our souls in a world of data.”
    Prescott, Andrew. 2012. An Electric Current of the Imagination: What the Digital Humanities Are and What They Might Become. Journal of Digital Humanities 1(2).
  4. Big Data in the Humanities
    • As data increases, so does interference (“noise”)
    • Text is the biggest challenge of big data in any field
    • Big data in the sciences is often highly ordered; in the humanities it is rarely so
    • Real language in use is chaotic, contradictory, complex, and counterintuitive (as are people)
  5. “Words, words. They’re all we have to go on.”
    Stoppard, Tom. 1967. Rosencrantz and Guildenstern are Dead.
  6. 01.03.01.05.02|03.05 n Health and disease :: Disorders of cattle/horse/sheep :: disorders of cattle/sheep :: other disorders
        strike (1933–)
    01.03.03.04.14|02 vt Make healthy :: Practise physiotherapy :: rub/stroke with hands
        strike (1400 + 1611 + 1886 dial.)
    01.02.04.04.03|07 vt Come by death :: Kill by specific method :: by poisoning
        strike (1592–1621)
    01.06.10.08|01 vi Plant :: Be a root :: grow (as root)
        strike (1682–)
    01.05.17.05.02|04 n Animals :: Suborder Ophidia (snakes) :: act of darting at prey
        strike (1879–)
    01.10.03.03.02.03|12.06 vt Burn/consume by fire :: kindle/set alight :: produce (fire/spark) by striking
        strike (c1450– also fig.)
    01.10.09.03.03|01 vi Dye :: sink in
        strike (c1790–)
    01.13.06.01|04 vt Time :: Clock :: strike
        strike (1417–)
    03.11.04.03|05 vi Carry on an occupation/work :: Participate in labour relations :: strike
        strike (1768–)
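The category codes on this slide can be handled as structured data. The sketch below is a minimal illustration, assuming that the dotted digits before the "|" name a main category, the digits after it name a subcategory, and the trailing letters are a part-of-speech label. The names `HTCode`, `parse_ht_code` and `shared_depth` are hypothetical and are not part of any Historical Thesaurus tooling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HTCode:
    main: Tuple[int, ...]  # main category levels, e.g. (1, 3, 1, 5, 2)
    sub: Tuple[int, ...]   # subcategory levels after "|", empty if absent
    pos: str               # part-of-speech label as printed, e.g. "n", "vt"


def parse_ht_code(text: str) -> HTCode:
    """Parse a string such as '01.03.01.05.02|03.05 n' (assumed layout)."""
    code, pos = text.split()
    main_part, _, sub_part = code.partition("|")
    main = tuple(int(p) for p in main_part.split("."))
    sub = tuple(int(p) for p in sub_part.split(".")) if sub_part else ()
    return HTCode(main, sub, pos)


def shared_depth(a: HTCode, b: HTCode) -> int:
    """Count how many leading main-category levels two codes have in common."""
    depth = 0
    for x, y in zip(a.main, b.main):
        if x != y:
            break
        depth += 1
    return depth


cattle = parse_ht_code("01.03.01.05.02|03.05 n")
physio = parse_ht_code("01.03.03.04.14|02 vt")
labour = parse_ht_code("03.11.04.03|05 vi")

print(shared_depth(cattle, physio))  # both begin 01.03, so 2 levels are shared
print(shared_depth(cattle, labour))  # 01 against 03, so no levels are shared
```

Comparing shared prefixes in this way gives a rough measure of how far apart two senses of strike sit in the hierarchy.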
  7. 62% of English word forms refer to more than one meaning
    Of the 793,742 entries in the Historical Thesaurus of English there are 370,011 non-Old-English word forms, of which:
    67 have more than 100 possible meanings
    464 have more than 50 possible meanings
    2,580 have more than 20 possible meanings
    7,554 have more than 10 possible meanings
    111,127 have more than 1 possible meaning
    258,883 have just 1 possible meaning
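As a rough worked example, the counts on this slide can be converted into shares of the 370,011 non-Old-English word forms. The sketch assumes the "more than N" rows are cumulative thresholds, so each row includes the rows above it. The shares are per distinct word form; the headline 62% is not derived here, because the slide does not state the basis on which it was calculated.

```python
# Counts copied from the slide; the dictionary keys are the "more than N" thresholds.
total_forms = 370_011
more_than = {100: 67, 50: 464, 20: 2_580, 10: 7_554, 1: 111_127}
exactly_one = 258_883


def share(count: int, total: int = total_forms) -> float:
    """Return count as a percentage of the total number of word forms."""
    return 100 * count / total


for threshold, count in more_than.items():
    print(f"more than {threshold:>3} meanings: {count:>7,} forms ({share(count):.2f}%)")

print(f"exactly 1 meaning:      {exactly_one:>7,} forms ({share(exactly_one):.2f}%)")

# The two bottom rows should partition the word forms if the rows are cumulative.
print("sum of the last two rows:", more_than[1] + exactly_one)
```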
  8. There are 27,230 words in the Historical Thesaurus category Health and Disease, contained inside 6,595 concepts (4.3 words for every concept historically; English overall has 3.5).
    3,307 medical word forms refer to more than one meaning (144 have five or more)
  9. 01.03.01.04.08|04.01.07 n
    dolg (OE), þeorwenn (OE), wenbyl (OE), boil (OE–), kyle (1340–1579), botch (1377– dial.), anthrax (1398–), beal (c1400 + 1632 + 1783), carbuncle (1530–), froncle (1543 + 1547), knub (1570– dial.), bubukle (1599), nail (1600–1685), big (1601 + 1646), ouch (1612), bolwaie (1628), coal (1671), furuncle (1676–), Natal sore (1852–), gurry-sore (1897–)
  10. • Corpus of Historical American English: 464 results for physic
    • “The village doctor, with fate and physic in his eye, enters this abode of wretchedness, to insult the victim, whom he means to kill; hurries over some habitual queries, without waiting for a reply, and rushes to the door, leaving his patient to sink into the grave” (North American Review, July 1834: 135-167)
    • “our ministers are wicked and deceitful hypocrites; our physics poisoners and murderers; our lawyers all liars and knaves; our rich men all debauchees and oppressors” (Atheism in New-England, New England Magazine, December 1834: 500-508)
    • Annoying, but manageable.
  11. Inflections
    • And then there are inflections of each word (cancer to the left appears in 47 different combinations and inflections and parts of speech in the Hansard Corpus, for example)
  12. Slippery Meanings
    • Even when you have a precise monosemous word (e.g. pneumonia), there are issues of metaphor, metonymy, blending, and other forms of meaning extension:
    • “Whenever England caught an economic cold, Scotland got pneumonia.” (Hansard, House of Commons, 5 February 1976, vol 904 c1542).
  13. Semantic Annotation System [diagram; only the labels survive in this transcript]
    Input: raw text. Output: annotated text.
    Components: VARD; CLAWS; USAS; HT sense tagger; HT sense disambiguator
    Resources: NLP lexicon resources; USAS; spelling training model
    [HT-related resources]: Historical Thesaurus; Higher-level HT categories; Linked HT categories; Highly polysemous words; Z-category words; Polyseme density list
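The general shape of such a system, raw text in and sense-annotated text out, can be sketched as a chain of small stages. This is a toy illustration only. The order of the stages is an assumption, since the transcript does not preserve how the diagram's boxes connect, and part-of-speech tagging is left out entirely. The spelling table, the cue words and every function name are invented for the example and do not reproduce VARD, CLAWS, USAS or the HT sense tagger. The two category codes are the ones given for strike earlier in the deck.

```python
from typing import Dict, List

Token = Dict[str, object]

# Hypothetical resources, invented for this sketch.
SPELLING_VARIANTS = {"stryke": "strike", "clocke": "clock"}
CANDIDATE_SENSES = {
    "strike": {
        "01.13.06.01|04": {"clock", "hour"},     # Time :: Clock :: strike
        "03.11.04.03|05": {"workers", "union"},  # labour relations :: strike
    }
}


def normalise_spelling(words: List[str]) -> List[Token]:
    """Map variant spellings to a modern form (the role of a spelling normaliser)."""
    return [{"form": w, "norm": SPELLING_VARIANTS.get(w.lower(), w.lower())}
            for w in words]


def attach_candidates(tokens: List[Token]) -> List[Token]:
    """Look up every candidate sense for each normalised word."""
    for token in tokens:
        token["candidates"] = CANDIDATE_SENSES.get(token["norm"], {})
    return tokens


def disambiguate(tokens: List[Token]) -> List[Token]:
    """Pick the candidate whose cue words overlap most with the sentence."""
    context = {token["norm"] for token in tokens}
    for token in tokens:
        best, best_score = None, 0
        for code, cues in token["candidates"].items():
            score = len(cues & context)
            if score > best_score:
                best, best_score = code, score
        token["sense"] = best
    return tokens


def annotate(text: str) -> List[Token]:
    return disambiguate(attach_candidates(normalise_spelling(text.split())))


annotated = annotate("the clocke will stryke the hour")
print([(t["form"], t["sense"]) for t in annotated if t["sense"]])
```

Keeping each stage as a separate function means a stage can be swapped or retrained without touching the others.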
  14. As in the sciences, some of this research will come to nothing; a percentage of it will change how we account for particular works of literature; some may change how we understand the sweep of literary history. But there are no monsters, or fascists, under any of these beds. None of these questions is going to endanger the ways that literature spurs all of us to think. [...] The machine-driven projects of distant reading will humbly supplement – usually by just a little, and perhaps one day by a great deal – what we know about literature, just as historical data and biographical data have done all along.
    Selisker, Scott. 2012. The Digital Inhumanities? Los Angeles Review of Books, 5 November 2012. https://lareviewofbooks.org/essay/in-defense-of-data-responses-to-stephen-marches-literature-is-not-data
  15. Resources Used
    • BYU Corpora: http://corpus.byu.edu
    • SAMUELS Tagger: http://www.glasgow.ac.uk/samuels
    • Sketch Engine: https://the.sketchengine.co.uk/
    • Historical Thesaurus of English: http://www.glasgow.ac.uk/thesaurus
    • You may wish to consider…
    • Apache Hadoop: http://hadoop.apache.org/
    • JMP (http://www.jmp.com/) or Tableau (http://www.tableausoftware.com/) for desktop visualisation
    • Open Corpus Workbench: http://cwb.sourceforge.net/
    • You may also be interested in Moretti’s distant reading (Moretti, Franco. 2005. Graphs, Maps, Trees. London: Verso.)
  16. “a magnificent achievement of quite extraordinary value. It is perhaps the single most significant tool ever devised for investigating semantic, social, and intellectual history” Randolph Quirk
    03.01 Society/the community
    03.01.01 Kinship/relationship
    03.01.02 Study of society
    03.01.03 Society in relation to customs/values/beliefs
    03.01.04 Social communication/relations
    03.07.00.13 Conformity
    03.07.00.14 Non-conformity
    03.07.00.15 Apostasy
    03.07.00.16 Sectarianism
    03.07.00.17 Catholicity
    [Image: sample page from the Historical Thesaurus, section 03.01 Society/the community, with dated kinship entries under headings including grandmother, great-grandmother, step-grandmother, half-brother, bastard brother, stepbrother, twin-brother, younger brother and foster-brother. The headwords and the accompanying description of the Thesaurus are truncated or illegible in this transcript.]
  17. [Chart: vertical axis 0 to 400; horizontal axis by period, OE and 1100 to 2000]
  18. [Chart: vertical axis 0 to 400; horizontal axis by period, OE and 1100 to 2000]
  19. Categories for Comparison
    Base; Base with intensifier; Change; Change - Dye; Change - Pigment; Object; Variant; Variant with colour term; Variant with intensifier; Variant with ‘-ish’
  20. [Chart comparing colour terms by category]
    Colours: (Average), White, Black, Red, Green, Yellow, Blue, Brown, Grey, Orange, Purple, Pink
    Categories: Change; Change - Dye; Change - Pigment; Base Terms; Base which Intensifies; Object; Variant; Variant with Colour Word; Variant which Intensifies; Variant with ‘-ish’
  21. 03.01.03.02 Civilization
    [Chart: vertical axis 5 to 25; horizontal axis 1350 to 1950]
    Series: n: Lack of civilization; aj: Uncivilized; aj: Pertaining to civilization; av: In uncivilized manner; n: Civilization; vt: Render uncivilized; vt: Make civilized
  22. Uncivilized | Wild
    wild a1300–
    wildern a1300
    fremd c1374
    Chaucer, Troylus & Crysede (c1374): Al this world is blynd In this matere, bothe fremed and tame.
    bestial c1400–
    Mandeville’s Voyages (c1400): Thei weren but bestyalle folk, and diden no thing but kepten Bestes.
  23. Uncivilized | Wild
    savage c1420/30–
    Dryden, The Conquest of Granada (1672): I am as free as Nature first made man, 'Ere the base Laws of Servitude began, When wild in woods the noble Savage ran.
    warrigal 1855–(1890)
    Australian Old Bush Songs (1855): I'm a warragle fellow that long hath dwelt In the wild interior, nor hath felt, Nor heard, nor seen the pleasures of town.
  24. Uncivilized | Rough/Crude
    rude 1483–
    raw 1577–
    Harrison, England, in Holinshed, Chronicles (1587): Men, being as then but raw and void of ciiuilitie.
    ruvid 1632
    Lithgow, The totall discourse of the rare adventures and painefull peregrinations of long nineteen yeares travayles (1632): The ruvid Cittizens, being Turkes, Moores, Iewes, … and Nostranes.
  25. Uncivilized | Barbar
    barbaric 1490-1533; a1837
    The sense-development in ancient times was (with the Greeks) ‘foreign, non-Hellenic,’ later ‘outlandish, rude, brutal’; (with the Romans) ‘not Latin nor Greek,’ then ‘pertaining to those outside the Roman empire’; hence ‘uncivilized, uncultured,’ and later ‘non-Christian,’ whence ‘Saracen, heathen’; and generally ‘savage, rude, savagely cruel, inhuman’. [J.A. H. Murray, Etymology for barbarous, A New English Dictionary on Historical Principles, Fa.3, 1887]
  26. Uncivilized | Barbar
    barbaric 1490-1533; a1837
    Aikin, General Biography (1799): At length, he came forth in all the splendor of his imperial dignity to give them an amicable welcome, and the Spanish historians employ the loftiest terms in describing the barbaric grandeur of his appearance.
    barbar 1535-a1726
    barbarous 1538-
    barbarious 1570-1762
    barbarian 1591-
    semi-barbarous 1798-
    semi-barbaric 1864
  27. Uncivilized | Civilness
    incivil 1586
    uncivilized 1607–
    incivilized 1647
    Cowley, Welcome (The Mistress) (1647):
        Either by savages possest,
        Or wild and uninhabited?
        What joy couldst take, or what repose,
        In countries so unciviliz'd as those?
  28. Uncivilized | Civilness
    inhumane a1680
    Butler, Remains (a1680): There's nothing so absurd, or vain, Or barbarous, or inhumane, But if it lay the least Pretence To Piety and Godliness… Does sacred instantly commence.
    irreclaimed 1814
    pre-civilized 1953–
  29. Uncivilized | The Other
    Scythical 1559-1602
    Herring, Anatomyes of the true physition and counterfeit mounte-banke (1602): Such Schythicall… torturing and massacring of Men.
    negerous 1609
  30. Uncivilized | The Other
    mountainous 1613-1851
    Mainwaring and Oldmixton, in Ellis, Swift vs. Mainwaring (1711): England… bounded on the North by a poor mountainous People call'd Scots.
    tramontane 1739-1832
  31. Uncivilized | The Other
    jungle 1908–
    jungli 1920–
    Chambers’ Journal (Jan 1927): Already he ceases to be jungli*.
    Note: Wild and boorish, a clodhopper or uneducated peasant.
  32. Sydney Smith, Letter to Francis Jeffrey (Mar 1814): When shall I see Scotland again? Never shall I forget the happy days I passed there amidst odious smells, barbarous sounds, bad suppers, excellent hearts, and most enlightened and cultivated understandings.