Natural language processing (NLP) is one of the oldest fields in computer science, dating back at least to the 1950s. In this talk I'll present a crash course in simple NLP, focusing on tools for document-level summarization and understanding. Specifically, I'll go through TF•IDF and topic modelling. We'll use these techniques to make sense of the language people use on the web when describing beer. I'll introduce a dataset containing some 3 million paragraph-length reviews of 120,000 beers.
We'll use this data to create a concise description for any commercially available beer. These descriptions will draw out the differences between the techniques at an intuitive level. We will then look at ways to quantify the distance between documents and use them to show how similar different beers are. By the end of this talk, the audience should understand enough to use document-level NLP and know what the sweaty horse blanket thing is all about.
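To give a flavour of the two ideas above, here is a minimal sketch of TF•IDF weighting and a cosine similarity between documents, using only the Python standard library. It is not the code from the talk: the function names (`tf_idf`, `top_terms`, `cosine_similarity`) and the three toy "reviews" are made up for illustration, and the weighting shown (relative term frequency times log inverse document frequency) is just one common variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / documents containing the term).
    """
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k=3):
    """The k highest-weighted terms: a crude 'concise description'."""
    return sorted(vector, key=vector.get, reverse=True)[:k]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy reviews, not taken from the real dataset.
names = ["stout", "porter", "lambic"]
reviews = [
    "roasty coffee chocolate dark beer".split(),
    "roasty chocolate smooth dark beer".split(),
    "funky sour barnyard horse blanket beer".split(),
]

vectors = dict(zip(names, tf_idf(reviews)))
stout_vs_porter = cosine_similarity(vectors["stout"], vectors["porter"])
stout_vs_lambic = cosine_similarity(vectors["stout"], vectors["lambic"])
print(top_terms(vectors["lambic"]))
print(stout_vs_porter, stout_vs_lambic)
```

In this toy setup the word "beer" appears in every review, so its weight is zero and it contributes nothing to the similarity; the stout and porter end up closer to each other than either is to the lambic because they share less common words.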
Deck as given at Devs Love Bacon 2014: http://devslovebacon.com/conferences/bacon-2014/talks/sweaty-horse-blanket-processing-the-natural-language-of-beer