Developing a Historical Thesaurus Semantic Tagger

Marc Alexander
September 06, 2014

Presented at the Digital Humanities Congress 2014

Authors (asterisk indicates presenting authors):
* Scott Piao, Lancaster University
* Fraser Dallachy, University of Glasgow
Alistair Baron, Lancaster University
Paul Rayson, Lancaster University
* Marc Alexander, University of Glasgow

Transcript

  1. Developing a Historical Thesaurus Semantic Tagger

    Scott Piao†, Fraser Dallachy*, Alistair Baron†, Paul Rayson†, and Marc Alexander*
    † Lancaster University, UK; * University of Glasgow, UK
    @samuelsproject
  2. Dr Marc Alexander, University of Glasgow

    Jean Anderson, University of Glasgow
    Professor Dawn Archer, University of Central Lancashire
    Dr Alistair Baron, Lancaster University
    Professor Jonathan Hope, University of Strathclyde
    Professor Lesley Jeffries, University of Huddersfield
    Professor Christian Kay, University of Glasgow
    Dr Paul Rayson, Lancaster University
    Dr Brian Walker, University of Huddersfield
    Brian Aitken, University of Glasgow
    Dr Fraser Dallachy, University of Glasgow
    Dr Scott Piao, Lancaster University
    Professor Mark Davies, Brigham Young University
    Professor Anthony Johnson, Åbo Akademi University
    Ilkka Juuso, University of Oulu
    Professor Tapio Seppänen, University of Oulu
    Also Oxford University Press and, through a linked project, the University of Wisconsin-Madison and the Folger Shakespeare Library.
  3. Words, words. They’re all we have to go on. Tom

    Stoppard (1967), Rosencrantz and Guildenstern are Dead.
  4. [Slide text partly hidden behind a screenshot in the transcript; legible fragments: “The His… of the O…”, “the l… in the w…”]

    [Screenshot of Historical Thesaurus kinship entries, labelled “03 Society”. Word-internal spaces and hyphens were lost in extraction. The entries before 02.01 (great-grandfather, eldmother, grandmother, beldame, grandmamma, grandma, wholebrother) are only partly legible. The legible entries are:]
    02.01 collectively: rihtgebroþru (OE)
    03 half-brother: halfbrother c1330–
    03.01 by same father: fæderenbroþor (OE); brotherconsanguinean 1880
    03.02 by same mother: wombbrother 1647–a1661
    04 bastard brother: hornungbroþor (OE)
    05 stepbrother: stepbrother 1440–; step 1933 (colloq.)
    06 twin-brother: twinbrother 1598–
    07 younger brother: cadet 1610–; brotherkin
    The Historical Thesaurus of English, version 4.2. 2014. Glasgow: University of Glasgow. http://historicalthesaurus.arts.gla.ac.uk.
  5. Disambiguation

    01.02.02.03.05.02.07|04 n Health and disease Medicinal potion/draught :: medicated wine wine 1652–
    01.02.05.13.09.02.02|06.14 n Plants Tree/plant producing edible berries :: grape-vine wine 1340/70–1632
    01.02.09.02.02.02 n Food and drink Wine wine OE–
    01.02.09.02.02.02.12 n Food and drink Non-grape and home-made wines wine 1398–
    01.02.09.02.14.01|08 vt Make wine store wine/stock cellar wine c1645
    01.02.09.02.19|05 vt Provide/serve (with) drink supply with specific drink wine 1862–
    01.02.09.02.20.01|18 n Food and drink Drinking vessel :: glass wine 1848–
    01.02.09.02.21|10.09 vi Drink drink intoxicating liquor :: drink wine wine (and dine) 1829–
    01.04.09.07.03|02.02 n Colour Red/redness :: shades of red :: deep red/crimson wine 1895–
    01.04.09.07.03|07 aj Pertaining to colour Red :: deep red/crimson wine 1950–
    01.05.05.12.01.03.02|02.03 n Action/operation Patronage :: patron :: as lord/protector wine OE
    01.05.05.15.01.05|17.06.03 n Action/operation Care/protection :: protector :: one who looks after a lord wine OE
    02.02.22.12|01 n Love One who loves/a lover :: male lover wine OE
    02.02.22.15|14 n Love Friendliness :: friend wine OE–1481
    02.02.22.15|14.19 n Love Friendliness :: friend :: friendly/gracious lord wine OE
    03.04.09.02.01.02.02|09 n Being subject to authority Attendant :: confidential servant/companion wine OE
    03.07.05.15.07.02 n Artefacts Wine wine OE–
    03.11.02.05.01|11 n Social event Party :: drinking-party wine 1857–
  6. Disambiguation: Semantic Context Distance [shown over the same list of HT senses of “wine” as on slide 5]
  7. Disambiguation: Semantic Context Distance [shown over the same list of HT senses of “wine” as on slide 5, followed by the codes 01.02.09.02.02.02, 01.04.09.07.03|02.02, 02.02.22.15|14 and 03.11.02.05.01|11]
  8. Disambiguation: Time Filtering [shown over the same list of HT senses of “wine” as on slide 5]
  9. Disambiguation: Time Filtering (Present Day) [shown over the same list of HT senses of “wine” as on slide 5]
  10. Disambiguation: Time Filtering (1400) [shown over the same list of HT senses of “wine” as on slide 5]
  11. Disambiguation: Polyseme Density [shown over the same list of HT senses of “wine” as on slide 5]
  12. Disambiguation: Polyseme Density [shown over the same list of HT senses of “wine” as on slide 5]
  13. Disambiguation: Human Scale Distance [shown over the same list of HT senses of “wine” as on slide 5, followed by a column of values]

    hsd: 2.5 1 0 1 0.5 0.5 0.5 1 1 0.5 2 1.5 0.5 0.5 1 2.5 0 0.5
  14. Disambiguation: Other Methods

    • Highly polysemous words: set (345 meanings), run (302), strike (256), fall (206), cast (187), round (179), turn (174), point (169), slip (165), pass (160), shoot (159), take (158)...
      – e.g. take (199 results): likely to be 02.07.13 (‘move a thing from an initial place into one’s possession’)
    • Template rules
      – e.g. sneeze is usually 01.02.02.01.04.18.14|02 vi ‘have respiratory spasm’, but in a structure sneeze NP PP (e.g. ‘sneeze the napkin off the table’) it is tagged as 01.02.02.01.04.18.14|03 vt ‘eject/cast by sneezing’
    • Topic identification
    • Collocation/machine learning (OED)
  15. • EEBO-TCP Corpus (40,000 Early Modern English texts; almost all the books and pamphlets published in English before 1700)

    • Hansard Corpus (2.3 billion words; approximately every word uttered in Parliament over the past two hundred years)
  16. • Hansard and Parliament from Above: a bird's-eye view of parliamentary concerns over the past two centuries (University of Glasgow)

    • Is There a Baron in the Commons? The representation of trade unions and their members in Hansard across the past hundred years (University of Huddersfield)
    • Delineating Aggression Across Genres (1473-1700): the speech acts of aggression to explore the nuances of shifting meanings in EEBO-TCP (University of Central Lancashire)
  17. Development

    • The HTST has been developed by incorporating the Historical Thesaurus of English and Lancaster UCREL text annotation tools.
    • Aims to annotate lexical units with correct HT sense categories.
    • Capable of annotating multi-layer information that can be explored for various purposes.
    • Suitable for historical text processing, with a historical-spelling-variant component.
  18. Main Resources

    • The Glasgow Historical Thesaurus of English
    • A set of sub-lexicons and tag mapping lists:
      – Providing default senses for highly ambiguous lexical words;
      – Dealing with functional and closed-class words;
      – Mapping full HT codes to more abstract, broader thematic-level sense category codes.
    • Lancaster UCREL lexical resources:
      – USAS semantic lexicons;
      – VARD spelling normalisation models.
  19. Main Components

    • CLAWS POS tagger – One of the most accurate part-of-speech taggers (http://ucrel.lancs.ac.uk/claws).
    • USAS semantic annotator – A generic full-text semantic tagger for coarse-grained senses (http://ucrel.lancs.ac.uk/usas).
    • VARD – Spelling variant detector and normaliser (http://ucrel.lancs.ac.uk/vard).
    • HT-based semantic annotator (http://phlox.lancs.ac.uk/ucrel/semtagger/english).
  20. Architecture

    [Diagram labels]
    • Input: raw text. Output: annotated text.
    • Semantic Annotation System: VARD; CLAWS; USAS; HT sense tagger; HT sense disambiguator; spelling train model.
    • NLP lexicon resources: USAS.
    • HT-related resources: Historical Thesaurus; higher-level HT categories; linked HT categories; highly polysemous words; Z-category words; polyseme density list.
  21. Context Based Disambiguation

    • Pre-process the text using VARD, CLAWS and the USAS tagger.
    • Currently, a context-distance-based method is mainly used to disambiguate words and multiword expressions (MWEs) that have multiple HT categories:
      – Filter candidates by POS.
      – For each candidate category i, extract all possible parent categories and collect their headings (simple definitions), including the current heading. The words in the headings form a feature set HW_i = {h_1, h_2, …, h_m}.
      – Collect up to five content words from each side of the key word/MWE. Together with the target word/MWE w_t, they form a context feature set CW = {w_t, w_1, w_2, …, w_n}.
      – Measure the Jaccard distance between CW and each HW_i, and select the candidate categories (up to three) closest to the context.
    • If the previous steps fail:
      – Check the core HT categories of the key word in a manually compiled list.
      – If not found, check for default HT categories in the polyseme density list.
      – If not found, select random HT categories.
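The context-distance step on slide 21 can be sketched roughly as follows. This is a minimal illustration, not the HTST implementation: the function names `jaccard_distance` and `rank_candidates`, the toy context words, and the lower-casing of features are all assumptions. The heading words are taken from the “wine” senses shown on slide 5.

```python
def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| (taken as 1.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def rank_candidates(context_words, candidate_headings, top_n=3):
    """Return up to top_n candidate codes, closest to the context first.

    context_words      -- the set CW: target word plus nearby content words
    candidate_headings -- dict mapping a candidate code to its heading words HW_i
    """
    cw = {w.lower() for w in context_words}
    scored = []
    for code, heading_words in candidate_headings.items():
        hw = {w.lower() for w in heading_words}
        scored.append((jaccard_distance(cw, hw), code))
    scored.sort()  # smallest distance first; ties broken by code
    return [code for _, code in scored[:top_n]]


# Hypothetical context for the target word "wine" (not from the slides).
context = ["wine", "drank", "glass", "red", "food", "drink"]

# Heading words for three of the "wine" senses listed on slide 5.
candidates = {
    "01.02.09.02.02.02": ["food", "and", "drink", "wine"],
    "01.04.09.07.03|02.02": ["colour", "red", "redness", "shades", "of", "deep", "crimson"],
    "02.02.22.15|14": ["love", "friendliness", "friend"],
}

ranked = rank_candidates(context, candidates)
print(ranked)
```

In this toy case the “Food and drink” sense shares three words with the context, so it ranks ahead of the colour sense (one shared word) and the “friend” sense (none).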
  22. Time Filtering

    • Filter out word senses whose recorded usage falls outside a given time window in the HT dataset.
    • HTST allows users to set upper and lower time boundaries (in years) to increase the relevance of the HT word senses to the given time.
      – E.g. if a text was published in 1800, the time filter can ignore word senses that first appear after that date.
    • Particularly useful for tagging historical data.
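A time filter of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the HTST code: the `Sense` record, the `in_window` function, the placeholder year used for “OE”, and the simplification of “1340/70” to 1340 are all hypothetical. The date ranges come from the “wine” senses on slide 5.

```python
from dataclasses import dataclass
from typing import Optional

OE = 700  # assumed placeholder year standing in for "Old English" attestations


@dataclass(frozen=True)
class Sense:
    code: str
    first: int            # first attested year
    last: Optional[int]   # last attested year, or None if still current


def in_window(sense: Sense, lower: Optional[int] = None,
              upper: Optional[int] = None) -> bool:
    """Keep a sense only if its attested span overlaps [lower, upper]."""
    if upper is not None and sense.first > upper:
        return False  # sense first appears after the upper boundary
    if lower is not None and sense.last is not None and sense.last < lower:
        return False  # sense died out before the lower boundary
    return True


senses = [
    Sense("01.02.02.03.05.02.07|04", 1652, None),     # medicated wine, 1652–
    Sense("01.02.05.13.09.02.02|06.14", 1340, 1632),  # grape-vine, 1340/70–1632
    Sense("01.02.09.02.02.02", OE, None),             # Wine, OE–
    Sense("01.04.09.07.03|02.02", 1895, None),        # deep red/crimson, 1895–
    Sense("02.02.22.15|14", OE, 1481),                # friend, OE–1481
]

# Text published in 1800: drop senses first attested later than that.
kept_1800 = [s.code for s in senses if in_window(s, upper=1800)]

# Narrower window 1700-1800: also drop senses obsolete before 1700.
kept_window = [s.code for s in senses if in_window(s, lower=1700, upper=1800)]
print(kept_1800)
print(kept_window)
```

With only an upper boundary of 1800, the colour sense (1895–) is removed. Adding a lower boundary of 1700 also removes the grape-vine and “friend” senses, which were obsolete by then.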
  23. Human Scale

    • The highly fine-grained full HT semantic classification (225,131 categories) is not very human-friendly.
    • So the HTST maps the full HT sense codes to broader, more “human-scale” thematic-level categories (4,028 categories, designed by Kay et al.). E.g.
        03.09.02.02.02.05 (Cart/carriage/wagon)
        03.09.02.02.02.05.01 (Parts of cart/carriage)
        03.09.02.02.02.06 (Other non-self-propelled vehicles)
        => X12b02e (Cart/carriage/wagon)
  24. Output

    • Six layers of annotation for input text. E.g. for the input word “Children”:
      1. Lemma – “child”
      2. Part-of-speech – “NN2”
      3. USAS semantic tag – “S2mf/T3- S4mf”
      4. Multiword expression flag – “0”
      5. HT sense code – “01.02.08.04.04”
      6. Thematic level sense code – “B23c04”
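For downstream processing, the six layers could be held in a simple record such as the one below. This is only a sketch: the `TokenAnnotation` type and its field names are hypothetical, and it makes no claim about the tool's actual output file format. Treating the USAS field as space-separated candidate tags is also an assumption.

```python
from typing import NamedTuple


class TokenAnnotation(NamedTuple):
    """One token with the six annotation layers listed on slide 24."""
    lemma: str
    pos: str
    usas: str           # USAS semantic tag string, kept exactly as given
    mwe_flag: str       # multiword expression flag
    ht_code: str        # full HT sense code
    thematic_code: str  # thematic-level sense code


# The "Children" example from the slide.
children = TokenAnnotation(
    lemma="child",
    pos="NN2",
    usas="S2mf/T3- S4mf",
    mwe_flag="0",
    ht_code="01.02.08.04.04",
    thematic_code="B23c04",
)

# Assumption: whitespace separates alternative USAS tags.
usas_candidates = children.usas.split()
print(children.ht_code, children.thematic_code, usas_candidates)
```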
  25. Access

    • Web demo site: http://phlox.lancs.ac.uk/ucrel/semtagger/english
      – Constrained access for quick trial.
    • GUI tool that is convenient for processing multiple texts.
      – Please contact us if you want a copy.
    • Access via Web server client
      – Please contact us if you are interested.
    • (Soon) Access via the WMatrix website (http://ucrel.lancs.ac.uk/wmatrix).
  26. Evaluation (1)

    • Ten texts were selected from different genres (e.g. spoken and written).
    • Publication dates span from 1820 to 2014.
    • Each text contains about 1,000 words.
    • Evaluated for both HT sense codes and thematic sense codes.
    • Examined the impact of the time filter.
    • Evaluation criterion: if the top three candidate tags suggested by the system contain the correct tag(s), the annotation is considered correct.
      – In our evaluation, 81.18% and 12.74% of the correct tags were the first and second candidate tags respectively.
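The top-three criterion can be expressed as a small scoring function. The sketch below is illustrative only: `is_correct`, `precision_at_k` and the single-letter tags are hypothetical stand-ins, not the evaluation scripts or data used for the reported figures.

```python
def is_correct(candidates, gold, k=3):
    """True if any of the first k candidate tags is among the correct tags."""
    return any(tag in gold for tag in candidates[:k])


def precision_at_k(items, k=3):
    """Percentage of tokens whose top-k candidates contain a correct tag.

    items -- list of (ranked candidate tags, set of correct tags) pairs
    """
    if not items:
        return 0.0
    hits = sum(is_correct(cands, gold, k) for cands, gold in items)
    return 100.0 * hits / len(items)


# Hypothetical tokens with made-up tags "A".."D".
items = [
    (["A", "B", "C"], {"A"}),       # correct tag ranked first
    (["B", "A", "C"], {"A"}),       # correct tag ranked second
    (["B", "C", "D", "A"], {"A"}),  # correct tag ranked fourth: a miss at k=3
    (["C"], {"C", "D"}),            # more than one acceptable tag
]

print(precision_at_k(items))       # top-three criterion, as on the slide
print(precision_at_k(items, k=1))  # stricter first-candidate-only variant
```

Setting `k=1` gives the stricter first-candidate-only score, which corresponds to the split between first and second candidate tags reported on the slide.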
  27. Evaluation (2)

    Precision (%) by category level:

    Document    Text type            Pub year  HT full cat.  HT main cat.  Thematic cat.
    Biography   Written biography    1904      70.50         72.93         73.41
    Convers     Spoken conversation  2014      69.31         71.72         73.77
    Email       Email messages       2001      67.60         69.79         70.89
    Fiction1    Fiction              1852      70.27         73.82         75.32
    Fiction2    Fiction              1915      73.31         75.10         75.60
    Hans1820    Hansard Speech       1820      76.73         79.45         80.33
    Hans2001    Hansard Speech       2001      74.61         77.32         77.42
    History     Written history      1845      73.83         77.73         77.98
    Journalism  Written Periodical   1998      66.67         70.45         71.74
    NewsCol     Newspaper opinion    2010      70.69         73.43         74.91
    Total                                      71.56         74.44         75.36
  28. Evaluation (3)

    • Numerous errors were caused by the highly fine-grained sense classification.
    • The thematic level sense categories help to group the senses into a practically manageable level and improve the tool performance.
    • HTST generally performs better on formal texts.
    • HTST struggles with informal text, e.g. email.
  29. Evaluation (4) – Time Filtering

    Document    Pub year  Without time filtering  Low time filter 1800  Low time filter 1900  Time window 1800-1900
    Biography   1904      73.41                   74.00                 74.59                 74.10
    Convers     2014      73.77                   74.40                 75.02                 74.31
    Email       2001      70.89                   71.98                 72.47                 71.37
    Fiction1    1852      75.32                   77.36                 77.04                 78.00
    Fiction2    1915      75.60                   76.69                 77.09                 76.49
    Hans1820    1820      80.33                   82.66                 83.05                 82.76
    Hans2001    2001      77.42                   79.65                 81.10                 79.65
    History     1845      77.98                   79.71                 80.73                 79.89
    Journalism  1998      71.74                   72.34                 73.23                 71.24
    NewsCol     2010      74.91                   76.62                 77.42                 76.40
    Total                 75.36                   76.77                 77.44                 76.70
  30. Evaluation (4) – Time Filtering

    [Chart: one group per document (Biography, Convers, Email, Fiction1, Fiction2, Hans1820, Hans2001, History, Journalism, NewsCol); series: no filter, low filter 1800, low filter 1900, time window 1800-1900; value axis from 64 to 84.]
  31. Evaluation (5)

    • Time filtering affects the performance of HTST.
    • A low time boundary closer to the publication time appears to improve the performance.
    • A narrow time window appears to have a negative impact.
    • Appropriate time filtering can potentially improve the performance significantly.
  32. Future Development

    • More methods and lexical resources will be explored for improving HTST.
      – HT sense distance/relations in contexts.
      – Statistical training model based on sense definitions and example sentences from OED data.
    • Integrate HTST into the WMatrix corpus processing and retrieval system.