NLP for poetry generation and analysis

In this talk I will go through several projects that we have carried out in computational poetry. As will be seen, poetry and Natural Language Processing can be combined in many ways, ranging from poetry writing assistants to computational models that sing poems. The final part of the talk focuses on models that perform automatic rhythmic analysis of poetry, and we will discuss possible future directions for unsupervised analysis.

Manex Agirrezabal

January 15, 2020

Transcript

  1. NLP for poetry generation and analysis
    Manex Agirrezabal, Adjunkt
    Centre fra Sprogteknologi / Centre for Language Technology
    Nordisk Studier og sprogvidenskab / Nordic Studies and Linguistics
    Københavns Universitet / University of Copenhagen
  2. Outline
    • Me and myself
    • The BAD tool
    • Automatic Generation of poetry
    • Automatic Analysis of poetry
      • Basque
      • English
        • Supervised
        • Unsupervised
  4. About myself
    • Computer Engineering at the University of the Basque Country (UPV/EHU) (2006-2011)
    • M.Sc. and PhD in Analysis and Processing of Language at the same university (2011-2012, 2013-2017)
    • PostDoc at the University of Copenhagen (2017-2019)
    • Assistant Professor at the University of Copenhagen
  6. The BAD tool
    • Despite its name, it’s not a bad tool
    • Bertsotarako Arbel Digitala
    • Syllable count / Rhyme search engine
    • Metrical structures / Melodies
    • Synonym search engine
    • Social media
  7. The BAD tool (sing it!)
    • Side project
    • Verse singing module
      • First prototype using Festival TTS (singing module)
        • Syllabify verse
        • Assign note from melody to each syllable
      • Improved version
        • Collaboration with Speech Synthesis group Aholab
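The two prototype steps listed above (syllabify the verse, then assign one melody note to each syllable) can be sketched roughly as follows. This is only an illustration and not the Festival-based module: the hyphen-marked syllabification, the `syllabify` and `assign_notes` helpers, and the note values are all hypothetical.

```python
from typing import List, Tuple

Note = Tuple[str, float]  # (pitch name, duration in beats); hypothetical format


def syllabify(marked_verse: str) -> List[str]:
    """Split a verse whose syllable boundaries are already marked with hyphens.

    Assumption: the boundaries were marked beforehand; a real system
    needs language-specific syllabification rules here.
    """
    syllables: List[str] = []
    for word in marked_verse.split():
        syllables.extend(part for part in word.split("-") if part)
    return syllables


def assign_notes(syllables: List[str], melody: List[Note]) -> List[Tuple[str, str, float]]:
    """Pair the i-th syllable with the i-th note of the melody."""
    if len(syllables) != len(melody):
        raise ValueError(
            f"verse has {len(syllables)} syllables but melody has {len(melody)} notes"
        )
    return [(syl, pitch, dur) for syl, (pitch, dur) in zip(syllables, melody)]


# Hypothetical five-note melody and a hand-syllabified verse fragment.
melody = [("C4", 1.0), ("D4", 1.0), ("E4", 0.5), ("E4", 0.5), ("C4", 2.0)]
sung = assign_notes(syllabify("ber-tso be-rri-ak"), melody)
```

Raising an error on a length mismatch reflects the idea that a verse only fits a melody when its syllable count matches the metrical structure.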
  9. Generation
    • We had four approaches to generate poetry:
      • Classic poem retrieval
      • Combination of independent lines that match rhyme
      • Semantic search
      • Random combination of words
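As a rough illustration of the second approach (combining independent lines that match in rhyme), the sketch below groups lines by a crude rhyme key and pairs up the lines that share it. Using the last two letters of the final word as the rhyme key is an assumption made for brevity; the helper names are hypothetical and input lines are assumed to be non-empty.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def rhyme_key(line: str, n: int = 2) -> str:
    """Crude rhyme approximation: last n letters of the final word."""
    last_word = line.lower().rstrip(".,;:!?").split()[-1]
    return last_word[-n:]


def rhyming_pairs(lines: List[str], n: int = 2) -> List[Tuple[str, str]]:
    """Return every pair of lines whose rhyme keys are equal."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        groups[rhyme_key(line, n)].append(line)
    pairs: List[Tuple[str, str]] = []
    for group in groups.values():
        pairs.extend(combinations(group, 2))
    return pairs


lines = [
    "One, two! One, two! And through and through",
    "The vorpal blade went snicker-snack!",
    "He left it dead, and with its head",
    "He went galumphing back.",
]
pairs = rhyming_pairs(lines)
```

A spelling-based key misses rhymes that are spelled differently, so a realistic system would compare pronunciations or use language-specific rhyme rules instead.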
  10. Public performance
    • Collaboration of three research groups
      • Natural Language Processing group
      • Robotics and autonomous systems group
      • Speech synthesis group
    • Link to video
    • ~1,000 listeners
  11. Analysis and Generation
    • If we want to generate poetry
      • We need to know what poetry is like
    • Analysis <-> Generation
      • Generation of poetry
      • Automatic analysis of poetry
    • We tried to build a poetry generation system following Reiter et al. (2000)
  12. Experiment with WordNet
    • Get independent verse lines
    • Get most common POS tag patterns
    • Modify any word to match the POS tag*
    • Modify any noun or adjective (NorA)*
    • Modify NorA with semantically related words*
    * Inflection is kept in all cases
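A minimal sketch of two of the steps above: finding the most common POS tag pattern among verse lines, and replacing nouns or adjectives with related words. The lines are assumed to be POS-tagged already, and the `related` dictionary is a hypothetical stand-in for a WordNet lookup; keeping the inflection, as the slide requires, is not handled here.

```python
from collections import Counter
from typing import Dict, List, Tuple

TaggedLine = List[Tuple[str, str]]  # (word, POS tag) pairs


def most_common_pattern(tagged_lines: List[TaggedLine]) -> Tuple[Tuple[str, ...], int]:
    """Return the most frequent sequence of POS tags and its count."""
    counts = Counter(tuple(tag for _, tag in line) for line in tagged_lines)
    return counts.most_common(1)[0]


def substitute(line: TaggedLine, related: Dict[str, str],
               targets: Tuple[str, ...] = ("NN", "JJ")) -> TaggedLine:
    """Swap nouns and adjectives for a related word when one is known."""
    return [(related.get(word, word) if tag in targets else word, tag)
            for word, tag in line]


tagged = [
    [("the", "DT"), ("vorpal", "JJ"), ("blade", "NN")],
    [("a", "DT"), ("magic", "JJ"), ("arrow", "NN")],
    [("he", "PRP"), ("went", "VBD"), ("back", "RB")],
]
pattern, count = most_common_pattern(tagged)
new_line = substitute(tagged[0], {"blade": "sword"})  # hypothetical related word
```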
  14. Automatic analysis of poetry
    • Two branches of analysis
      • Basque
        • Based on Jurafsky and Kao (2012)
        • 1986-2009 -> 2013
        • Published in Basque
      • English
        • ZeuScansion: rule-based scansion of poetry
        • Automatic Scansion of Poetry
  16. Scansion of poetry
    One, two! One, two! And through and through
    The vorpal blade went snicker-snack!
    He left it dead, and with its head
    He went galumphing back.
    (Jabberwocky, Lewis Carroll)
  17-22. Scansion of poetry
    [One, two!] [One, two!] [And through] [and through]
    [The vor][pal blade] [went snick][er-snack!]
    [He left] [it dead,] [and with] [its head]
    [He went] [galum][phing back.]
    (Jabberwocky, Lewis Carroll)
    Annotations: unstressed, stressed, deh-DUM, Foot, Rhyme
    Scansion involves marking all this information, but in this work we mainly focus on the stress sequences.
  23. Scansion of poetry: Usages
    • Poetry generation
    • Authorship attribution
    • Cataloging poems according to the meter
    • Learn how to correctly recite a poem
  24. Tradition of English poetry
    • Accentual-syllabic poetry
      • Syllables
      • Stresses

    Iambic meter [x /]
      Come live with me and be my love
      And we will all the pleasures prove,
      That valleys, groves, hills and fields,
      Woods, or steepy mountain yields.

    Anapestic meter [x x /]
      and I don't like to brag, but I'm telling you Liz
      that speaking of cooks I'm the best that there is
      why only last Tuesday when mother was out
      I really cooked something worth talking about

    Trochaic meter [/ x]
      Can it be the sun descending
      O'er the level plain of water?
      Or the Red Swan floating, flying,
      Wounded by the magic arrow,

    Dactylic meter [/ x x]
      Woman much missed, how you call to me, call to me
      Saying that now you are not as you were
      When you had changed from the one who was all to me,
      But as at first, when our day was fair.
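The four foot patterns above can be turned into a small check that tells which meter, if any, a stress sequence follows exactly. This is a sketch under a strong assumption: the line is perfectly regular, whereas real verse often deviates (missing final syllables, substituted feet). The `classify_meter` name and the string encoding ("x" unstressed, "/" stressed) are illustrative choices.

```python
from typing import Optional, Tuple

# Foot patterns as given on the slide: "x" = unstressed, "/" = stressed.
FEET = {
    "iambic": "x/",
    "anapestic": "xx/",
    "trochaic": "/x",
    "dactylic": "/xx",
}


def classify_meter(stresses: str) -> Optional[Tuple[str, int]]:
    """Return (meter name, number of feet) if the line is a pure repetition
    of one foot, otherwise None."""
    for name, foot in FEET.items():
        n_feet, remainder = divmod(len(stresses), len(foot))
        if n_feet > 0 and remainder == 0 and stresses == foot * n_feet:
            return name, n_feet
    return None


# Idealized scansion of "Come live with me and be my love".
result = classify_meter("x/x/x/x/")
```

A line that returns `None` is not necessarily unmetrical; it only fails this strict test, which is one reason scansion is treated as a prediction problem later in the talk.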
  25. English poetry corpus
    • 79 poems from For Better For Verse (4B4V) (Tucker, 2011)
    • Provided by the Scholar's Lab at the University of Virginia
    • Interactive website to train people on the scansion of traditional poetry

    English corpus
      No. syllables            10988
      No. distinct syllables    2283
      No. words                 8802
      No. distinct words        2422
      No. lines                 1093
  26. Tradition of Spanish poetry
    • Accentual-syllabic poetry
      • Syllables
      • Stresses

    Cual suele la luna tras lóbrega nube
    con franjas de plata bordarla_en redor,
    y luego si_el viento la_agita, la sube
    disuelta_a los aires en blanco vapor:
    ...
    (El estudiante de Salamanca, José de Espronceda)
  27. Spanish poetry corpus
    • 137 sonnets from the Spanish Golden Age (Navarro-Colorado et al., 2015, 2016)

    Spanish corpus
      No. syllables            24524
      No. distinct syllables    1041
      No. words                13566
      No. distinct words        3633
      No. lines                 1898
  28. ZeuScansion
    • Rule-based system
    • Two main pieces of information
      • Lexical stress
      • Part-of-Speech tag
    • Stress assignment
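To illustrate how lexical stress and Part-of-Speech tags can be combined for stress assignment, here is a toy sketch. It is not ZeuScansion's actual rule set: the mini-lexicon, the list of function-word tags, and the single rule (monosyllabic function words become unstressed) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: one symbol per syllable,
# "/" = stressed, "x" = unstressed.
LEXICAL_STRESS: Dict[str, str] = {
    "he": "/", "left": "/", "it": "/", "dead": "/",
    "vorpal": "/x", "galumphing": "x/x",
}

# Penn Treebank style tags treated as function words (an assumption).
FUNCTION_TAGS = {"DT", "IN", "CC", "PRP", "PRP$", "TO"}


def assign_stress(tagged_words: List[Tuple[str, str]]) -> str:
    """Look up each word's lexical stress, then destress monosyllabic
    function words. Unknown words are marked with "?"."""
    pattern = []
    for word, tag in tagged_words:
        stress = LEXICAL_STRESS.get(word.lower())
        if stress is None:
            stress = "?"
        elif tag in FUNCTION_TAGS and len(stress) == 1:
            stress = "x"
        pattern.append(stress)
    return "".join(pattern)


line = [("He", "PRP"), ("left", "VBD"), ("it", "PRP"), ("dead", "JJ")]
stresses = assign_stress(line)
```

The "?" marker shows where a dictionary-based approach runs out: words missing from the lexicon need a separate stress-guessing step.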
  30. Supervised learning for scansion
    • Greedy prediction
      • Naïve Bayes
      • Averaged Perceptron
      • Linear Support Vector Machines
    • Structured predictors
      • Hidden Markov Models (HMM)
      • Conditional Random Fields (CRF)
    • Neural Network models
      • Encoder-Decoder
      • Bi-LSTM+CRF
  31. Why Bi-LSTM+CRF?
    • Bi-LSTM+CRF (Lample et al., 2016)
      • Gets information from input characters and words with Bi-LSTMs
      • The information goes through a CRF layer to model the output dependencies
    • Advantages:
      • Words' character sequence
      • Interaction between words
      • Conditional dependencies between outputs
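The value of modelling dependencies between outputs can be seen in a small decoding example. The sketch below runs Viterbi decoding over per-syllable scores plus tag-to-tag transition scores, which is the standard way to decode a linear-chain CRF layer. All numbers are invented for illustration; in the real model the per-syllable scores would come from the Bi-LSTM and the transitions would be learned.

```python
from typing import Dict, List, Tuple

TAGS = ["x", "/"]  # unstressed, stressed


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the tag sequence with the highest total score
    (sum of emission scores and transition scores)."""
    best = [{t: emissions[0][t] for t in tags}]
    backpointers: List[Dict[str, str]] = []
    for emission in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, t)])
            scores[t] = best[-1][prev] + transitions[(prev, t)] + emission[t]
            pointers[t] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Invented scores: transitions reward alternating stress.
transitions = {("x", "/"): 1.0, ("/", "x"): 1.0, ("x", "x"): -1.0, ("/", "/"): -1.0}
emissions = [{"x": 1.0, "/": 0.0}, {"x": 0.2, "/": 0.0}, {"x": 1.0, "/": 0.0}]

greedy = [max(TAGS, key=e.get) for e in emissions]  # ignores neighbours
structured = viterbi(emissions, transitions, TAGS)
```

With these scores the greedy choice labels every syllable as unstressed, while the structured decoder prefers the alternating sequence because the transition scores outweigh the weak local preference on the second syllable.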
  32. Results
    Model                 #FTs  English                         Spanish
                                Per syllable (%)  Per line (%)  Per syllable (%)  Per line (%)
    Perceptron            10    85.04             28.79         74.39              0.44
    Perceptron            64    89.12             40.86         91.49             35.71
    HMM                   -     90.39             48.51         92.32             45.08
    CRF                   10    89.32             47.28         84.89             18.61
    CRF                   64    90.94             51.22         92.87             55.44
    Bi-LSTM+CRF (W2SP)    -     89.39             44.29         98.95             90.84
    Bi-LSTM+CRF (S2S)     -     91.26             55.28         95.13             63.68
    Bi-LSTM+CRF (S2S+WB)  -     92.96             61.39         98.74             88.82
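The two metrics in the table can be computed as in the sketch below. It assumes that per-syllable accuracy is the share of syllables with the correct stress label and that per-line accuracy is the share of lines in which every syllable is correct; the function name and the toy data are illustrative.

```python
from typing import List, Tuple


def scansion_accuracy(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Return (per-syllable accuracy, per-line accuracy) in percent.

    Each line is a string with one stress symbol per syllable.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    total_syllables = correct_syllables = correct_lines = 0
    for g, p in zip(gold, predicted):
        if len(g) != len(p):
            raise ValueError("each line needs one predicted label per syllable")
        total_syllables += len(g)
        correct_syllables += sum(a == b for a, b in zip(g, p))
        correct_lines += g == p
    return (100.0 * correct_syllables / total_syllables,
            100.0 * correct_lines / len(gold))


gold = ["x/x/x/x/", "x/x/x/"]
predicted = ["x/x/x/x/", "x/xxx/"]
per_syllable, per_line = scansion_accuracy(gold, predicted)
```

In this toy case a single wrong syllable leaves per-syllable accuracy near 93% but drops per-line accuracy to 50%, which is the same kind of gap visible between the two columns of the results.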
  34-35. Unsupervised learning
    We did several experiments:
    1. Simple cross-lingual experiment (best result 71.65%)
    2. Clustering algorithms with 64 feature templates (results below 55%)
      1. K-Means
      2. Expectation-Maximization
    3. Hidden Markov Models

                       Per syllable (%)  Per line (%)
    HMM (4 states)     66.28              7.29
    HMM (8 states)     74.65              9.91
    HMM (16 states)    76.51             12.53
    HMM (32 states)    74.03              8.07
  36. References
    The BAD Tool:
    - Agirrezabal, M., Alegria, I., Arrieta, B., Hulden, M. (2012) "BAD: An assistant tool for making verses in Basque", 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
    - Agirrezabal, M., Alegria, I., Arrieta, B., Hulden, M. (2012) "Finite-state technology in a verse-making tool", Finite-State Methods for Natural Language Processing
    - Agirrezabal, M., Alegria, I., Hulden, M. (2012) "Using foma for language-based games", Proceedings of the First Workshop on Games and NLP
    BertsoBOT project:
    - Astigarraga, A., Jauregi, E., Lazkano, E., Agirrezabal, M. (2014) "Textual Coherence in a Verse-Maker Robot", Human-Computer Systems Interaction: Backgrounds and Applications 3, Springer International Publishing
    - Osinalde, M., Astigarraga, A., Rodriguez, I., Agirrezabal, M. (2013) "Towards Basque Oral Poetry Analysis: A Machine Learning Approach", Recent Advances in Natural Language Processing, Hissar, Bulgaria
    - Astigarraga, A., Agirrezabal, M., Lazkano, E., Jauregi, E., Sierra, B. (2013) "Bertsobot: the first minstrel robot", The 6th International Conference on Human System Interaction (HSI)
  37. References
    Analysis:
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2014) "1986-2013 arteko Bertsolari Txapelketa Nagusien analisi estatistikoa", Euskal Herriko Bertsozale elkartea, Artikulua bertsozale elkartean
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2013) "Bota bertsoa, eta guk aztertuko dugu: Azken urteetako Bertsolari Txapelketa Nagusien analisia", Elhuyar zientzia eta teknika, pp. 46-49, Artikulua Elhuyarren
    Generation:
    - Agirrezabal, M., Gonzalez-Dios, I., Lopez-Gazpio, I. (2015) "Euskararen Sorkuntza Automatikoa: lehen urratsak", Ikergazte
    - Agirrezabal, M., Arrieta, B., Hulden, M., Astigarraga, A. (2013) "POS-tag based poetry generation with WordNet", Proceedings of the 14th European Workshop on Natural Language Generation, ACL 2013
  38. References
    ZeuScansion:
    - Agirrezabal, M., Astigarraga, A., Arrieta, B., Hulden, M. (2016) "ZeuScansion: a tool for scansion of English poetry", Journal of Language Modelling, 4(1), 3-28
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2013) "ZeuScansion: a tool for scansion of English poetry", Finite State Methods and Natural Language Processing Conference, 18-24
    - Agirrezabal, M., Heinz, J., Hulden, M., Arrieta, B. (2014) "Assigning stress to out-of-vocabulary words: three approaches", International Conference on Artificial Intelligence
    Supervised Learning for Scansion:
    - Agirrezabal, M., Alegria, I., Hulden, M. (2017) "Poesiaren eskantsio automatikoa: bi hizkuntzen azterketa", Ikergazte
    - Agirrezabal, M., Alegria, I., Hulden, M. (2017) "A comparison of feature-based and neural scansion of poetry", Recent Advances in Natural Language Processing, Varna, Bulgaria
    - Agirrezabal, M., Alegria, I., Hulden, M. (2016) "Machine Learning for the Metrical Analysis of English Poetry", International Conference on Computational Linguistics (COLING 2016), 772-781
  39. THANK YOU!!!
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you