NLP for poetry generation and analysis

In this talk I will go through several projects that we have carried out on computational poetry. As will be seen, a wide variety of work connects poetry and Natural Language Processing, ranging from poetry-writing assistants to computational models that sing poems. Towards the end, the emphasis will be on models that perform automatic rhythmic analysis of poetry, and we will discuss possible future directions for unsupervised analysis.

Manex Agirrezabal

January 15, 2020

Transcript

  1. NLP for poetry generation and analysis
    Manex Agirrezabal, Adjunkt (Assistant Professor)
    Center for Sprogteknologi / Centre for Language Technology
    Nordiske Studier og Sprogvidenskab / Nordic Studies and Linguistics
    Københavns Universitet / University of Copenhagen
  2. Outline
    • Me and myself
    • The BAD tool
    • Automatic Generation of poetry
    • Automatic Analysis of poetry
      • Basque
      • English
        • Supervised
        • Unsupervised
  4. About myself
    • Computer Engineering at the University of the Basque Country (UPV/EHU) (2006-2011)
    • M.Sc. and PhD in Analysis and Processing of Language at the same university (2011-2012, 2013-2017)
    • PostDoc at the University of Copenhagen (2017-2019)
    • Assistant Professor at the University of Copenhagen
  5. About myself: research interests
    • Computational morphology and phonology
    • Finite-state methods
    • Poetry
    • Computational creativity
  7. The BAD tool
    • Despite its name, it's not a bad tool
    • BAD stands for Bertsotarako Arbel Digitala (Basque for "digital blackboard for verses")
    • Syllable count / Rhyme search engine
    • Metrical structures / Melodies
    • Synonym search engine
    • Social media
  8. The BAD tool

  9. The BAD tool (sing it!)
    • Side project
    • Verse singing module
    • First prototype using Festival TTS (singing module)
      • Syllabify verse
      • Assign note from melody to each syllable
    • Improved version
      • Collaboration with the speech synthesis group Aholab
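
The two prototype steps on this slide (syllabify the verse, then assign one melody note to each syllable) can be sketched as follows. This is only an illustration: the verse is assumed to be syllabified already, and the syllables, the note names and the `assign_notes` helper are hypothetical, not the actual BAD or Festival code.

```python
from typing import List, Tuple


def assign_notes(syllables: List[str], melody: List[str]) -> List[Tuple[str, str]]:
    """Pair each syllable with one melody note, in order.

    Assumption: the melody has exactly one note per syllable.
    """
    if len(syllables) != len(melody):
        raise ValueError("melody and verse must have the same number of units")
    return list(zip(syllables, melody))


# Hypothetical pre-syllabified verse line and placeholder note names.
syllables = ["ber", "tso", "be", "rri", "ak"]
melody = ["C4", "D4", "E4", "D4", "C4"]

pairs = assign_notes(syllables, melody)
for syllable, note in pairs:
    print(f"{syllable}\t{note}")
```

A singing synthesizer would then receive each syllable together with its note (and a duration); how that is encoded depends on the TTS system used.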
  11. Generation
    • We had four approaches to generate poetry:
      • Classic poem retrieval
      • Combination of independent lines that match rhyme
      • Semantic search
      • Random combination of words
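
As an illustration of the second approach (combining independent lines that match in rhyme), here is a minimal sketch. The rhyme test is a crude assumption (last two letters of the last word), the candidate lines are invented, and all function names are hypothetical; a real system would use a proper rhyme search engine.

```python
from collections import defaultdict
from typing import Dict, List


def rhyme_key(line: str, n: int = 2) -> str:
    """Crude rhyme key: the last n letters of the last word (an assumption)."""
    cleaned = "".join(c for c in line.lower() if c.isalpha() or c.isspace())
    words = cleaned.split()
    return words[-1][-n:] if words else ""


def group_by_rhyme(lines: List[str]) -> Dict[str, List[str]]:
    """Group candidate lines by their rhyme key, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        groups[rhyme_key(line)].append(line)
    return dict(groups)


def compose(lines: List[str], n_lines: int) -> List[str]:
    """Return the first n_lines lines sharing a rhyme key, or [] if none exist."""
    for group in group_by_rhyme(lines).values():
        if len(group) >= n_lines:
            return group[:n_lines]
    return []


# Invented candidate lines, standing in for a corpus of independent verse lines.
candidate_lines = [
    "the moon is bright",
    "we walk at night",
    "the sea is calm",
    "a candle light",
]
print(compose(candidate_lines, 3))
```

The sketch ignores meter and meaning entirely, which is why approaches such as semantic search are listed separately.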
  12. Public performance
    • Collaboration of three research groups
      • Natural Language Processing group
      • Robotics and autonomous systems group
      • Speech synthesis group
    • Link to video
    • ~1,000 listeners
  13. Analysis and Generation
    • If we want to generate poetry, we need to know what poetry looks like
    • Analysis <-> Generation
      • Generation of poetry
      • Automatic analysis of poetry
    • We tried to build a poetry generation system following Reiter et al. (2000)
  14. Experiment with WordNet
    • Get independent verse lines
    • Get most common POS tag patterns
    • Modify any word to match the POS tag*
    • Modify any noun or adjective (NorA)*
    • Modify NorA with semantically related words*
    * Inflection is kept in all cases
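
The substitution step can be sketched as below, under loud assumptions: the line is already POS-tagged, the replacement lexicon is a hand-made toy dictionary rather than WordNet, and `substitute` is a hypothetical helper. Keeping fine-grained tags (for example NN vs. NNS) is one simple way to preserve inflection, since a word is only replaced by a candidate with the same tag.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Toy tagged verse line (Penn Treebank-style tags, assumed to be given).
TAGGED_LINE: List[Tuple[str, str]] = [
    ("the", "DT"), ("vorpal", "JJ"), ("blade", "NN"), ("went", "VBD"),
]

# Toy replacement lexicon indexed by POS tag; a stand-in for WordNet lookups.
LEXICON: Dict[str, List[str]] = {
    "JJ": ["silent", "golden"],
    "NN": ["river", "mountain"],
}


def substitute(
    tagged: Sequence[Tuple[str, str]],
    lexicon: Dict[str, List[str]],
    targets: Sequence[str] = ("NN", "NNS", "JJ"),
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace nouns and adjectives with same-tag candidates; keep other words."""
    if rng is None:
        rng = random.Random(0)
    output = []
    for word, tag in tagged:
        candidates = lexicon.get(tag, [])
        if tag in targets and candidates:
            output.append(rng.choice(candidates))
        else:
            output.append(word)
    return output


print(" ".join(substitute(TAGGED_LINE, LEXICON)))
```

Restricting the candidates to semantically related words, as in the last variant on the slide, would amount to filtering `candidates` before sampling.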
  15. Experiment with WordNet

  17. Automatic analysis of poetry
    • Two branches of analysis
      • Basque
        • Based on Jurafsky and Kao (2012)
        • 1986-2009 -> 2013
        • Published in Basque
      • English
        • ZeuScansion: rule-based scansion of poetry
        • Automatic Scansion of Poetry
  19. Scansion of poetry
    One, two! One, two! And through and through
    The vorpal blade went snicker-snack!
    He left it dead, and with its head
    He went galumphing back.
    (Jabberwocky, Lewis Carroll)
  25. Scansion of poetry
    [One, two!] [One, two!] [And through] [and through]
    [The vor][pal blade] [went snick][er-snack!]
    [He left] [it dead,] [and with] [its head]
    [He went] [galum][phing back.]
    (Jabberwocky, Lewis Carroll)
    Annotations on the slide: unstressed and stressed syllables (deh-DUM), foot, rhyme.
    Scansion involves marking all this information, but in this work we mainly focus on the stress sequences.
  26. Scansion of poetry: uses
    • Poetry generation
    • Authorship attribution
    • Cataloging poems according to the meter
    • Learning how to correctly recite a poem
  27. Tradition of English poetry
    • Accentual-syllabic poetry
      • Syllables
      • Stresses

    Iambic meter [x /]
      Come live with me and be my love
      And we will all the pleasures prove,
      That valleys, groves, hills and fields,
      Woods, or steepy mountain yields.

    Anapestic meter [x x /]
      and I don't like to brag, but I'm telling you Liz
      that speaking of cooks I'm the best that there is
      why only last Tuesday when mother was out
      I really cooked something worth talking about

    Trochaic meter [/ x]
      Can it be the sun descending
      O'er the level plain of water?
      Or the Red Swan floating, flying,
      Wounded by the magic arrow,

    Dactylic meter [/ x x]
      Woman much missed, how you call to me, call to me
      Saying that now you are not as you were
      When you had changed from the one who was all to me,
      But as at first, when our day was fair.
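
To make the four foot patterns concrete, the sketch below names the meter of a line given its stress sequence, written with `x` for unstressed and `/` for stressed syllables. It is a deliberately strict toy (the hypothetical `classify_meter` only accepts perfectly regular lines); real verse contains substitutions and truncated feet, which is what makes automatic scansion hard.

```python
from typing import Optional

# Foot patterns as listed on the slide: x = unstressed, / = stressed.
FEET = {
    "iambic": "x/",
    "trochaic": "/x",
    "anapestic": "xx/",
    "dactylic": "/xx",
}


def classify_meter(stresses: str) -> Optional[str]:
    """Return the meter whose foot, repeated, spells the whole pattern, else None."""
    if not stresses:
        return None
    for name, foot in FEET.items():
        repeats, remainder = divmod(len(stresses), len(foot))
        if remainder == 0 and foot * repeats == stresses:
            return name
    return None


print(classify_meter("x/x/x/x/"))   # four iambs
print(classify_meter("/xx/xx"))     # two dactyls
print(classify_meter("x//x"))       # irregular line
```

A more forgiving version would score each meter by the number of matching positions instead of requiring an exact match.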
  28. English poetry corpus
    • 79 poems from For Better For Verse (4B4V) (Tucker, 2011)
    • Provided by the Scholars' Lab at the University of Virginia
    • Interactive website to train people on the scansion of traditional poetry

    English corpus
      No. syllables            10988
      No. distinct syllables    2283
      No. words                 8802
      No. distinct words        2422
      No. lines                 1093
  29. Tradition of Spanish poetry
    • Accentual-syllabic poetry
      • Syllables
      • Stresses

    Cual suele la luna tras lóbrega nube
    con franjas de plata bordarla_en redor,
    y luego si_el viento la_agita, la sube
    disuelta_a los aires en blanco vapor:
    ...
    (El estudiante de Salamanca, José de Espronceda)
  30. Spanish poetry corpus
    • 137 sonnets from the Spanish Golden Age (Navarro-Colorado et al., 2015, 2016)

    Spanish corpus
      No. syllables            24524
      No. distinct syllables    1041
      No. words                13566
      No. distinct words        3633
      No. lines                 1898
  31. ZeuScansion
    • Rule-based system
    • Two main pieces of information
      • Lexical stress
      • Part-of-Speech tag
    • Stress assignment
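
A rough sketch of how lexical stress and POS tags could be combined in a rule-based scanner is given below. It is not the ZeuScansion rule set: the stress dictionary is a toy, the tag list is an assumption, and the single rule used here (monosyllabic function words are left unstressed) is only one plausible example of such a rule.

```python
from typing import Dict, List, Set, Tuple

# Toy lexical stress dictionary (one symbol per syllable: / stressed, x unstressed).
# A real system would look these up in a pronunciation dictionary.
LEXICAL_STRESS: Dict[str, str] = {
    "the": "/",
    "vorpal": "/x",
    "blade": "/",
}

# Assumed set of function-word POS tags (Penn Treebank style).
FUNCTION_TAGS: Set[str] = {"DT", "IN", "CC", "PRP", "TO"}


def scan(tagged_line: List[Tuple[str, str]]) -> str:
    """Return the stress sequence of a POS-tagged line.

    Illustrative rule: a monosyllabic function word is marked unstressed,
    every other word keeps its dictionary stress pattern.
    """
    pattern = []
    for word, tag in tagged_line:
        stress = LEXICAL_STRESS[word.lower()]  # KeyError for unknown words
        if tag in FUNCTION_TAGS and len(stress) == 1:
            stress = "x"
        pattern.append(stress)
    return "".join(pattern)


print(scan([("The", "DT"), ("vorpal", "JJ"), ("blade", "NN")]))
```

Words missing from the dictionary raise an error in this sketch; handling out-of-vocabulary words is a separate problem.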
  33. Supervised learning for scansion
    • Greedy prediction
      • Naïve Bayes
      • Averaged Perceptron
      • Linear Support Vector Machines
    • Structured predictors
      • Hidden Markov Models (HMM)
      • Conditional Random Fields (CRF)
    • Neural Network models
      • Encoder-Decoder
      • Bi-LSTM+CRF
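
The difference between greedy and structured prediction is that a structured predictor chooses the whole stress sequence of a line jointly. As an illustration, here is a minimal Viterbi decoder for an HMM whose hidden states are the stress labels and whose observations are syllables. All probabilities are invented toy values, not parameters estimated from any corpus, and unseen syllables fall back to an assumed uniform emission.

```python
import math
from typing import Dict, List

STATES = ["x", "/"]  # unstressed, stressed

# Invented toy parameters (assumptions for illustration only).
START: Dict[str, float] = {"x": 0.6, "/": 0.4}
TRANS: Dict[str, Dict[str, float]] = {
    "x": {"x": 0.3, "/": 0.7},
    "/": {"x": 0.8, "/": 0.2},
}
EMIT: Dict[str, Dict[str, float]] = {
    "the": {"x": 0.9, "/": 0.1},
    "blade": {"x": 0.2, "/": 0.8},
}
DEFAULT_EMIT = 0.5  # used for syllables without an entry in EMIT


def emit(state: str, syllable: str) -> float:
    return EMIT.get(syllable, {}).get(state, DEFAULT_EMIT)


def viterbi(syllables: List[str]) -> List[str]:
    """Return the most probable stress sequence for the syllables (log space)."""
    if not syllables:
        return []
    scores = [{s: math.log(START[s]) + math.log(emit(s, syllables[0])) for s in STATES}]
    backpointers = []
    for syllable in syllables[1:]:
        prev = scores[-1]
        column, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            column[s] = prev[best] + math.log(TRANS[best][s]) + math.log(emit(s, syllable))
            pointers[s] = best
        scores.append(column)
        backpointers.append(pointers)
    # Follow the backpointers from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print("".join(viterbi(["the", "vor", "pal", "blade"])))
```

A greedy classifier would instead label each syllable from its own features, without the transition scores that tie neighbouring decisions together.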
  34. Why Bi-LSTM+CRF?
    • Bi-LSTM+CRF (Lample et al., 2016)
      • Gets information from input characters and words with Bi-LSTMs
      • The information goes through a CRF layer to model the output dependencies
    • Advantages:
      • Words' character sequence
      • Interaction between words
      • Conditional dependencies between outputs
  35. Results
    Accuracy (%) per syllable and per line; #FTs = number of feature templates.

    Model                 #FTs  English                 Spanish
                                Per syllable  Per line  Per syllable  Per line
    Perceptron            10    85.04         28.79     74.39          0.44
    Perceptron            64    89.12         40.86     91.49         35.71
    HMM                   -     90.39         48.51     92.32         45.08
    CRF                   10    89.32         47.28     84.89         18.61
    CRF                   64    90.94         51.22     92.87         55.44
    Bi-LSTM+CRF (W2SP)    -     89.39         44.29     98.95         90.84
    Bi-LSTM+CRF (S2S)     -     91.26         55.28     95.13         63.68
    Bi-LSTM+CRF (S2S+WB)  -     92.96         61.39     98.74         88.82
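
The two metrics in the table can be computed as in the sketch below, assuming (as the column names suggest) that per-syllable accuracy counts individual stress labels and per-line accuracy counts only lines in which every syllable is correct. The `accuracies` helper and the example data are hypothetical.

```python
from typing import List, Tuple


def accuracies(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Return (per-syllable accuracy, per-line accuracy) as percentages.

    Each element of gold/predicted is the stress string of one line,
    with one symbol per syllable.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    total_syllables = correct_syllables = correct_lines = 0
    for g, p in zip(gold, predicted):
        if len(g) != len(p):
            raise ValueError("each predicted line must match the gold length")
        total_syllables += len(g)
        correct_syllables += sum(a == b for a, b in zip(g, p))
        correct_lines += int(g == p)
    return (100.0 * correct_syllables / total_syllables,
            100.0 * correct_lines / len(gold))


gold = ["x/x/x/x/", "x/x/x/"]
predicted = ["x/x/x/x/", "x/xxx/"]  # one wrong syllable in the second line
per_syllable, per_line = accuracies(gold, predicted)
print(f"per syllable: {per_syllable:.2f}%  per line: {per_line:.2f}%")
```

A single wrong syllable makes the whole line count as wrong, which is why the per-line figures are so much lower than the per-syllable ones.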
  38. Unsupervised learning
    We did several experiments:
    1. Simple cross-lingual experiment (best result 71.65%)
    2. Clustering algorithms with 64 feature templates (results below 55%)
       1. K-Means
       2. Expectation-Maximization
    3. Hidden Markov Models

    Model            Per syllable (%)  Per line (%)
    HMM (4 states)   66.28              7.29
    HMM (8 states)   74.65              9.91
    HMM (16 states)  76.51             12.53
    HMM (32 states)  74.03              8.07
  39. Unsupervised learning: future work
    • Recurrence Quantification Analysis
    • Assumption: all poems have a sense of rhythm
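
The slide only names Recurrence Quantification Analysis as a direction, so the following is a generic sketch of the idea applied to a symbolic sequence, not a description of planned work. It builds a recurrence matrix (which positions carry the same symbol) and measures how often the sequence repeats itself at a given lag; both function names are hypothetical, and the input is assumed to be some per-syllable symbol sequence.

```python
from typing import List


def recurrence_matrix(seq: str) -> List[List[int]]:
    """R[i][j] = 1 if positions i and j hold the same symbol, else 0."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def lag_recurrence(seq: str, lag: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + lag].

    This is the density of one diagonal of the recurrence matrix;
    a high value at lag k suggests a rhythm with period k.
    """
    if not 1 <= lag < len(seq):
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    matches = sum(seq[i] == seq[i + lag] for i in range(len(seq) - lag))
    return matches / (len(seq) - lag)


pattern = "x/x/x/x/"  # a perfectly regular alternating sequence
for lag in (1, 2, 3):
    print(lag, lag_recurrence(pattern, lag))
```

For the alternating pattern, recurrence is total at lag 2 and absent at lag 1, which is the kind of regularity the rhythm assumption on the slide points to.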
  40. References
    The BAD tool:
    - Agirrezabal, M., Alegria, I., Arrieta, B., Hulden, M. (2012) "BAD: An assistant tool for making verses in Basque", 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
    - Agirrezabal, M., Alegria, I., Arrieta, B., Hulden, M. (2012) "Finite-state technology in a verse-making tool", Finite-State Methods for Natural Language Processing
    - Agirrezabal, M., Alegria, I., Hulden, M. (2012) "Using foma for language-based games", Proceedings of the First Workshop on Games and NLP
    BertsoBOT project:
    - Astigarraga, A., Jauregi, E., Lazkano, E., Agirrezabal, M. (2014) "Textual Coherence in a Verse-Maker Robot", Human-Computer Systems Interaction: Backgrounds and Applications 3, Springer International Publishing
    - Osinalde, M., Astigarraga, A., Rodriguez, I., Agirrezabal, M. (2013) "Towards Basque Oral Poetry Analysis: A Machine Learning Approach", Recent Advances in Natural Language Processing, Hissar, Bulgaria
    - Astigarraga, A., Agirrezabal, M., Lazkano, E., Jauregi, E., Sierra, B. (2013) "Bertsobot: the first minstrel robot", The 6th International Conference on Human System Interaction (HSI)
  41. References
    Analysis:
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2014) "1986-2013 arteko Bertsolari Txapelketa Nagusien analisi estatistikoa", Euskal Herriko Bertsozale Elkartea (article at Bertsozale Elkartea)
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2013) "Bota bertsoa, eta guk aztertuko dugu: Azken urteetako Bertsolari Txapelketa Nagusien analisia", Elhuyar zientzia eta teknika, pp. 46-49 (article at Elhuyar)
    Generation:
    - Agirrezabal, M., Gonzalez-Dios, I., Lopez-Gazpio, I. (2015) "Euskararen Sorkuntza Automatikoa: lehen urratsak", Ikergazte
    - Agirrezabal, M., Arrieta, B., Hulden, M., Astigarraga, A. (2013) "POS-tag based poetry generation with WordNet", Proceedings of the 14th European Workshop on Natural Language Generation, ACL 2013
  42. References
    ZeuScansion:
    - Agirrezabal, M., Astigarraga, A., Arrieta, B., Hulden, M. (2016) "ZeuScansion: a tool for scansion of English poetry", Journal of Language Modelling, 4(1), 3-28
    - Agirrezabal, M., Arrieta, B., Astigarraga, A., Hulden, M. (2013) "ZeuScansion: a tool for scansion of English poetry", Finite State Methods and Natural Language Processing Conference, 18-24
    - Agirrezabal, M., Heinz, J., Hulden, M., Arrieta, B. (2014) "Assigning stress to out-of-vocabulary words: three approaches", International Conference on Artificial Intelligence
    Supervised learning for scansion:
    - Agirrezabal, M., Alegria, I., Hulden, M. (2017) "Poesiaren eskantsio automatikoa: bi hizkuntzen azterketa", Ikergazte
    - Agirrezabal, M., Alegria, I., Hulden, M. (2017) "A comparison of feature-based and neural scansion of poetry", Recent Advances in Natural Language Processing, Varna, Bulgaria
    - Agirrezabal, M., Alegria, I., Hulden, M. (2016) "Machine Learning for the Metrical Analysis of English Poetry", International Conference on Computational Linguistics (COLING 2016), 772-781
  43. THANK YOU!!!
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you
    Thank you, thank you, thank you, thank you