ZeuScansion: a Tool for Scansion of English Poetry

Manex Agirrezabal
July 15, 2013

Transcript

  1. ZeuScansion: a tool for scansion of English poetry

    Manex Agirrezabal (1), Mans Hulden (2), Bertol Arrieta (1), Aitzol Astigarraga (1)
    (1) Euskal Herriko Unibertsitatea / University of the Basque Country (UPV/EHU)
    (2) University of Helsinki
    July 15th, 2013. 11th Conference on Finite-State Methods and Natural Language Processing, St. Andrews, Scotland
    https://zeuscansion.googlecode.com
  2. First of all, you should know that...

    I’m not a native English speaker.
    I’m not an expert in English poetry.
  3. Outline

    1 Scansion: what is it? (Metrical patterns)
    2 Scansion: why do we need it?
    3 Related work
    4 Scansion: why is it difficult?
    5 ZeuScansion: the output
    6 ZeuScansion: technical details (Groves’ rules, closest word finder, global analysis)
    7 Evaluation
    8 Discussion & future directions
  6. Scansion: what is it?

    Rhythm: Loudness, pitch, duration, syntax, frequency of polysyllabic words... contribute to the rhythm of the language.
    "...If the scansion of a line meant all the phonetic facts, no two lines would scan the same way..." (C.S. Lewis, en.wikipedia.org)
    Meter: The elements that influence the meter are very limited. In the case of English poetry, meter is described as a sequence of feet (a grouping of syllables usually containing one stressed syllable). (en.wikipedia.org)
  7. Scansion: what is it?

    Scansion: Scansion is the act of determining and graphically representing the metrical character of a line of verse. (en.wikipedia.org)
  8. Scansion: what is it? Metrical patterns

    ZeuScansion knows the most common metrical patterns in English poetry (’ = stressed syllable, - = unstressed syllable).
    Disyllabic feet:
      - -     pyrrhus
      - ’     iamb
      ’ -     trochee
      ’ ’     spondee
    Trisyllabic feet:
      - - -   tribrach
      ’ - -   dactyl
      - ’ -   amphibrach
      - - ’   anapest
      - ’ ’   bacchius
      ’ ’ -   antibacchius
      ’ - ’   cretic
      ’ ’ ’   molossus
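
The foot inventory on this slide can be written down as a small lookup table. The sketch below is only an illustration: the name `name_foot` and the use of ASCII `'` for a stressed syllable are assumptions made here, not part of ZeuScansion.

```python
# Foot inventory from the slide: "'" marks a stressed syllable, "-" an unstressed one.
FEET = {
    "--": "pyrrhus",
    "-'": "iamb",
    "'-": "trochee",
    "''": "spondee",
    "---": "tribrach",
    "'--": "dactyl",
    "-'-": "amphibrach",
    "--'": "anapest",
    "-''": "bacchius",
    "''-": "antibacchius",
    "'-'": "cretic",
    "'''": "molossus",
}


def name_foot(pattern: str) -> str:
    """Return the foot name for a 2- or 3-syllable stress pattern, or 'unknown'."""
    return FEET.get(pattern.replace(" ", ""), "unknown")


print(name_foot("- '"))    # iamb
print(name_foot("' - -"))  # dactyl
```

Spaces are stripped before the lookup so that patterns can be pasted in the spaced form used on the slides.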
  10. Scansion: what is it? Metrical patterns

    IAMB
      [to be] [or not] [to be] ...
      - ’ - ’ - ’
    TROCHEE
      [Tyger], [Tyger], [burning] bright
      ’ - ’ - ’ -
      [In the] [forests] [of the] night
      ’ - ’ - ’ -
    DACTYL
      [Woman much] [missed, how you] [call to me], [call to me] ...
      ’ - - ’ - - ’ - - ’ - -
  17. Scansion: why do we need it?

    My PhD work (IN PROGRESS!!)
    Poetry generation system
    Poetry analysis system
    Rhymes, vocabulary, metaphors... and also meter!
  21. Related work

    Scandroid (Hartman, 2005): Iambic and anapestic feet only.
    AnalysePoems (Plamondon, 2006): Identifies patterns, does not impose them. Rhymes also checked.
    Calliope (McAleese, 2007): Scandroid improvement using linguistic theories.
    (Greene et al., 2010): Statistical methods in the analysis. It uses weighted finite-state transducers (WFSTs) for stress assignment.
  26. Scansion: why is it difficult?

    Input: Woman much missed how you call to me call to me
    Output: [’ - -][’ - -][’ - -][’ - -]
    Problem no. 1: We can’t just use a dictionary
      Woman much missed how you call to me call to me
      ’ - ’ ‘ ’ ’ ’ - ’ ’ - ’
    Problem no. 2: Divide stresses into feet
      Woman much missed how you call to me call to me
      [’ - -][’ - -][’ - -][’ - -]
    Problem no. 3: There are some unknown words
      Wanna brewsky? brewsky -> brisky
  27. Scansion: why is it difficult?

    Problem no. 1: Groves’ rules (POS-tag based), implemented using FSTs
    Problem no. 2: Global analysis
    Problem no. 3: FST-based closest word finder
  29. ZeuScansion: the output

    I dwell in possibility (Emily Dickinson)
     1  I dwell in Possibility               - / - \-/--
     2  A fairer House than Prose            - \- / - /
     3  More numerous of Windows             / /-- - \-
     4  Superior for Doors                   -/-- - \

     6  Of Chambers as the Cedars            - \- - - \-
     7  Impregnable of Eye                   -/-- - /
     8  And for an Everlasting Roof          - - - \-/- /
     9  The Gambrels of the Sky              - \- - - /

    11  Of Visitors the fairest              - \-- - \-
    12  For Occupation This                  - \-/- -
    13  The spreading wide my narrow Hands   - \- / - /\ \
    14  To gather Paradise                   - /- /-\
  31. ZeuScansion: technical details

    [Pipeline diagram] English poetry text -> Tokenizer -> POS-tagger -> Groves’ rules (1st step, 2nd step) -> Cleanup -> rhythmi-metrical scansion and metrical information. Decision node: "Are the words in the dictionary?" (Y / N); on N, the closest word finder is used.
  33. ZeuScansion: technical details

    Tokenizer and POS-tagger
    I dwell in possibility
    I dwell in possibility ...
    I+PRP dwell+VBP in+IN possibility+NN ...
  34. ZeuScansion: technical details

    Pronunciation lexicon
    NETtalk (Sejnowski and Rosenberg, 1987)
    CMU Pronouncing Dictionary (Weide, 1998)
    I+’+PRP dwell+’+VBP in+’+IN possibility+‘ ’ +NN ...
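
As a rough illustration of how lexical stress can be read off a CMU-style entry: the dictionary marks each vowel phoneme with a digit (1 primary, 2 secondary, 0 unstressed). The sketch below converts such a pronunciation into stress marks. The function name `stress_pattern`, the ASCII marks and the example pronunciations are assumptions for this example, not ZeuScansion's own lexicon format.

```python
import re

# Assumed mark scheme, following the slides: ' primary, ` secondary, - unstressed.
MARKS = {"1": "'", "2": "`", "0": "-"}


def stress_pattern(arpabet: str) -> str:
    """Turn an ARPAbet pronunciation (stress digits on vowels) into stress marks."""
    digits = re.findall(r"[A-Z]+([012])", arpabet)
    return "".join(MARKS[d] for d in digits)


# Illustrative pronunciations written in CMU dictionary style.
print(stress_pattern("P AA2 S AH0 B IH1 L AH0 T IY0"))  # `-'--
print(stress_pattern("D W EH1 L"))                       # '
```

Consonant phonemes carry no digit, so only syllable nuclei contribute to the pattern.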
  38. ZeuScansion: technical details

    Groves’ rules
    - Rules of thumb to provide a reasonable stress assignment for scansion
    - Needs access to POS information
    1. Stress the primarily stressed syllable in content words (1)
    2. Stress the secondarily stressed syllable of polysyllabic content words and the most strongly stressed syllable of polysyllabic words.
    (1) such as nouns, verbs, adjectives and adverbs
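
The two steps above can be sketched in a few lines. This is a minimal illustration, not the FST implementation described in the talk: it assumes Penn Treebank style tags (content words are taken to be tags starting with NN, VB, JJ or RB), lexical stress strings using `'` for primary, `` ` `` for secondary and `-` for unstressed, and hypothetical function names.

```python
# Assumption: content words are identified by Penn Treebank tag prefixes.
CONTENT_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_content(pos: str) -> bool:
    return pos.startswith(CONTENT_PREFIXES)


def first_step(lexical: str, pos: str) -> str:
    """Step 1: keep only the primary stress, and only in content words."""
    if not is_content(pos):
        return "-" * len(lexical)
    return "".join("'" if s == "'" else "-" for s in lexical)


def second_step(lexical: str, pos: str, marked: str) -> str:
    """Step 2: in polysyllabic words, add secondary stress (content words)
    and the strongest stress (any polysyllabic word)."""
    marks = list(marked)
    if len(lexical) > 1:
        for i, s in enumerate(lexical):
            if s == "`" and is_content(pos):
                marks[i] = "`"
            elif s == "'":
                marks[i] = "'"
    return "".join(marks)


# (word, POS tag, lexical stress) triples for "I dwell in possibility".
tokens = [("I", "PRP", "'"), ("dwell", "VBP", "'"),
          ("in", "IN", "'"), ("possibility", "NN", "`-'--")]
for word, pos, lexical in tokens:
    step1 = first_step(lexical, pos)
    print(word, step1, second_step(lexical, pos, step1))
```

Monosyllabic function words such as "I" and "in" stay unstressed after both steps, which is why a plain dictionary lookup is not enough for scansion.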
  39. ZeuScansion: technical details

    Groves’ rules
    TOKENIZE    | POS-tagger     | 1st step             | 2nd step             | CleanUp
    ------------|----------------|----------------------|----------------------|--------
    I           | I+PRP          | I+-+PRP              | I+-+PRP              | -
    dwell       | dwell+VBP      | dwell+’+VBP          | dwell+’+VBP          | ’
    in          | in+IN          | in+-+IN              | in+-+IN              | -
    possibility | possibility+NN | possibility+--’--+NN | possibility+‘-’--+NN | ‘-’--
  43. ZeuScansion: technical details

    Closest word finder
    FST-based system that finds the most similarly spelled word in the dictionary. We need this because English spelling is not phonemic.
    "We chumped and chawed the buttered toast" (Phantasmagoria and other poems, Lewis Carroll)
    chumped and chawed are not in the dictionary. We must find a similarly pronounced word.
  48. ZeuScansion: technical details

    Closest word finder: devised word change rules
    1. At the end of the word, higher cost (Word splitter)
    2. We only allow a maximum of 2 character changes.
    3. Character change order:
       1. 1 vowel
       2. 1 consonant
       3. 2 vowels
       4. 1 vowel and 1 consonant
       5. 2 consonants
    Word splitter: chumped: chum|ped; chawed: cha|wed
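
Rules 2 and 3 can be approximated with a weighted edit distance. The sketch below is not the FST-based finder from the talk: it is a plain dynamic-programming stand-in over a toy dictionary, the weights 3 (vowel change) and 4 (consonant change) are assumptions chosen only so that the totals follow the order listed above, and rule 1 (higher cost near the end of the word) is not modeled.

```python
VOWELS = set("aeiou")
VOWEL_COST, CONSONANT_COST = 3, 4  # assumed weights: 3 < 4 < 6 < 7 < 8
MAX_COST = 2 * CONSONANT_COST      # two consonant changes; any three changes cost >= 9


def char_cost(ch: str) -> int:
    """Cost of inserting or deleting one character."""
    return VOWEL_COST if ch in VOWELS else CONSONANT_COST


def sub_cost(a: str, b: str) -> int:
    """Cost of replacing a with b; only vowel-for-vowel counts as a vowel change."""
    if a == b:
        return 0
    return VOWEL_COST if a in VOWELS and b in VOWELS else CONSONANT_COST


def weighted_distance(src: str, dst: str) -> int:
    """Levenshtein distance with vowel/consonant-dependent costs."""
    prev = [0]
    for ch in dst:
        prev.append(prev[-1] + char_cost(ch))
    for a in src:
        cur = [prev[0] + char_cost(a)]
        for j, b in enumerate(dst, start=1):
            cur.append(min(prev[j] + char_cost(a),         # delete a
                           cur[j - 1] + char_cost(b),      # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute or match
        prev = cur
    return prev[-1]


def closest_word(word: str, dictionary: list[str]):
    """Return the cheapest dictionary word within two changes, else None."""
    best = min(dictionary, key=lambda w: (weighted_distance(word, w), w))
    return best if weighted_distance(word, best) <= MAX_COST else None


toy_dictionary = ["humped", "chewed", "clawed", "buttered", "toast"]
print(closest_word("chumped", toy_dictionary))  # humped
print(closest_word("chawed", toy_dictionary))   # chewed
```

With this toy dictionary, "chawed" maps to "chewed" (one vowel change) rather than "clawed" (one consonant change), matching the preference order in rule 3.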
  50. ZeuScansion: technical details

    Closest word finder
      c h u m p e d | c h a w e d
      | | | | | | | | | | | | | |
      - h u m p e d | c h e w e d
    The similarly pronounced words presented by the system are humped and chewed.
    TOKENIZE | POS-tagger  | 1st step       | 2nd step       | CleanUp
    ---------|-------------|----------------|----------------|--------
    we       | we+PRP      | we+-+PRP       | we+-+PRP       | -
    chumped  | chumped+VBD | humped+‘+VBD   | humped+‘+VBD   | ‘
    and      | and+CC      | and+-+CC       | and+-+CC       | -
    chawed   | chawed+VBD  | chewed+‘+VBD   | chewed+‘+VBD   | ‘
    the      | the+DT      | the+-+DT       | the+-+DT       | -
    buttered | buttered+JJ | buttered+‘-+JJ | buttered+‘-+JJ | ‘-
    toast    | toast+NN    | toast+’+NN     | toast+’+NN     | ’
  51. ZeuScansion: technical details

    Global analysis
    The syllable stresses are marked.
    The system tries to identify the predominant meter of the poem by finding plausible feet.
  52. ZeuScansion: technical details

    Global analysis
    The song of Hiawatha, Henry Wadsworth Longfellow:
      By the shores of Gitche Gumee,
      By the shining Big Sea Water,
      Stood the wigwam of Nokomis,
      Daughter of the Moon, Nokomis.
      Dark behind it rose the forest,
      Rose the black and gloomy pine trees,
      Rose the firs with cones upon them;
      Bright before it beat the water,
      Beat the clear and sunny water,
      Beat the shining Big Sea Water.
    Stress marks shown for these lines: - - ‘ - ‘ ? - - ‘ - ’ ’ ’ - ’ - ’ ‘ - ‘ - ’ - - - ’ ‘ - ’ - ‘ - ’ - ’ - ’ - ’ - ‘ - ’ ‘ ’ - ‘ - ‘ - ’ - ’ - ’ - ’ - ’ - ’ - ’ - ‘ - ’ - ’ - ‘ - ’ ’ ’ -
  54. ZeuScansion: technical details

    Global analysis
    Predominant stress structure per line: ’-’-’-’-
    Different ways of splitting it up:
      4 trochees:     [’ -][’ -][’ -][’ -]
      2 amphibrachs:  ’ [- ’ -] ’ [- ’ -]
      2 cretics:      [’ - ’] - [’ - ’] -
      3 iambs:        ’ [- ’][- ’][- ’] -
    Score per foot:
      NAME       | FOOT  | SCORE | MATCHES
      -----------|-------|-------|--------
      trochee    | (’-)  | 4     | 4
      amphibrach | (-’-) | 3     | 2
      cretic     | (’-’) | 3     | 2
      iamb       | (-’)  | 3     | 3
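
The match counts in the table can be reproduced by counting non-overlapping occurrences of each foot in the predominant line pattern. The sketch below does that and, as an assumption made here, ranks feet by the number of syllables they cover (matches times foot length); the formula behind the SCORE column is not given on the slide and is not reproduced. Function names are hypothetical.

```python
from collections import Counter

# Candidate feet from the slide; "'" is a stressed syllable, "-" an unstressed one.
CANDIDATE_FEET = {"trochee": "'-", "amphibrach": "-'-", "cretic": "'-'", "iamb": "-'"}


def predominant_pattern(line_patterns: list[str]) -> str:
    """Most frequent stress pattern among the scanned lines."""
    return Counter(line_patterns).most_common(1)[0][0]


def rank_feet(pattern: str) -> list[tuple[str, int, int]]:
    """Return (name, matches, syllables covered), best coverage first."""
    rows = []
    for name, foot in CANDIDATE_FEET.items():
        matches = pattern.count(foot)  # str.count counts non-overlapping matches
        rows.append((name, matches, matches * len(foot)))
    return sorted(rows, key=lambda row: (-row[2], row[0]))


lines = ["'-'-'-'-", "'-'-'-'-", "--'-'-'-", "'-'-'-'-"]
pattern = predominant_pattern(lines)
for name, matches, covered in rank_feet(pattern):
    print(f"{name:<10} matches={matches} syllables_covered={covered}")
```

On the eight-syllable pattern from the slide, only the trochee covers every syllable, so it comes out first under this ranking.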
  56. Evaluation

    For the evaluation we used the “For Better for Verse” poetry corpus. http://prosody.lib.virginia.edu/
  61. Evaluation

    For Better for Verse corpus:
    55 different poems
    759 poetry lines scanned by experts (2)
    200 of them were correctly scanned by ZeuScansion (26.35 %)
    7,076 syllables
    6,002 of them were correctly scanned by ZeuScansion (84.82 %)
    (2) Some lines have several alternative scansions
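
The two percentages follow directly from the counts on the slide. A short check, with a hypothetical helper name:

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of correctly scanned items, rounded to two decimals."""
    return round(100 * correct / total, 2)


print(accuracy(200, 759))    # per-line accuracy: 26.35
print(accuracy(6002, 7076))  # per-syllable accuracy: 84.82
```

The gap between the two figures reflects how strict the per-line measure is: a single wrong syllable makes the whole line count as incorrect.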
  64. Discussion & future directions

    Basic system for scansion of English poetry
    Phonetically closest word finder
    The evaluation results are promising
    FUTURE WORK:
    Do statistical inference about the global metrical pattern of the poem
    Improve closest word finder performance
    Replace HMM POS-tagger with a deterministic FST-based tagger (e.g. Brill’s tagger)
  65. Acknowledgments

    Eskerrik asko! Thanks!
    Herbert Tucker and the Scholars’ Lab, from the University of Virginia.
    All the people involved in the “For Better for Verse” project.