Keyword Extraction with Ruby

t6d
October 30, 2011

Transcript

  1. Hi, I'm… Konstantin Tennhard, Ruby Developer at flinc. Ruby enthusiast, bartender, computer science student, photographer, mountain bike addict, computer linguist.
  2. The Occupy Wall Street movement began in Zuccotti Park on a glorious mid-September Saturday and, so far, many of its larger marches have taken place in the warmth of New York's Indian summer. But winter has been looming, and on Saturday, just a couple days before Halloween, the protesters got a preview of what they're in for.
  9. The same passage again, shown with the legend: Named Entity, Adjective, Noun.
  10. What do we want? Well, how about extracting nouns and adjectives that cooccur in word windows of a certain size? Wouldn't that be something.
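As a rough illustration of what "cooccur in word windows of a certain size" could mean in code, here is a minimal sketch. It assumes the text has already been tokenized and reduced to nouns and adjectives; the method name `cooccurrences`, the `window_size` parameter, and the sample tokens are hypothetical and not taken from the talk.

```ruby
# Count how often two distinct words appear within the same word window.
# `tokens` is assumed to contain only the nouns and adjectives of the text,
# in their original order (tagging and filtering are not shown here).
def cooccurrences(tokens, window_size = 2)
  counts = Hash.new(0)
  tokens.each_with_index do |word, i|
    # Pair the word with the words that follow it inside the same window.
    followers = tokens[(i + 1)...(i + window_size)].to_a
    followers.each do |other|
      next if word == other
      # Sort the pair so that [a, b] and [b, a] share one counter.
      counts[[word, other].sort] += 1
    end
  end
  counts
end

# Hypothetical sample input.
tokens = %w[occupy wall street movement zuccotti park]
cooccurrences(tokens, 3).each do |(a, b), n|
  puts "#{a} -- #{b}: #{n}"
end
```

Each counted pair can serve as an undirected edge of a graph, with the count (or a score derived from it) as the edge weight.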
  11. But ... before you can do the fancy stuff, you need to do a couple of other things first!
  12. The algorithm is known as TextRank and was published by Rada Mihalcea and Paul Tarau in 2004 in their paper “TextRank: Bringing Order into Texts”: http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Mihalcea.pdf
  13. Cooccurrence Graph. Nodes (stemmed words): Street, Wall, movement, Zuccotti, Park, gloriou, mid-Septemb, Saturdai, larger, mani, march, place, warmth, New, york, Indian, summer, winter, coupl, dai, Halloween, protest, preview.
  14. Weighted PageRank

    $$WS(V_i) = (1 - d) + d \cdot \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$

    We will use Lexicographer's Pointwise Mutual Information as the weighting function.
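The weighted PageRank formula can be sketched as a simple fixed-iteration loop. This is only an illustration under stated assumptions, not the code from the talk: the method name `weighted_pagerank`, the graph representation (a hash of hashes with both directions stored for undirected edges), the fixed iteration count, and the sample weights are all hypothetical. The damping factor d = 0.85 is the value suggested in the TextRank paper. The edge weights are plain numbers here; in the talk they would come from the weighting function.

```ruby
# graph: { node => { neighbour => weight } }
# For an undirected graph, every edge is stored in both directions.
def weighted_pagerank(graph, damping: 0.85, iterations: 30)
  scores = {}
  graph.each_key { |node| scores[node] = 1.0 }

  # Sum of outgoing edge weights per node: the denominator in the formula.
  out_weight = graph.transform_values { |edges| edges.values.sum.to_f }

  iterations.times do
    new_scores = {}
    graph.each_key do |vi|
      # Sum over all nodes vj that have an edge pointing to vi.
      incoming = graph.sum do |vj, edges|
        w = edges[vi]
        if w && out_weight[vj] > 0
          w / out_weight[vj] * scores[vj]
        else
          0.0
        end
      end
      new_scores[vi] = (1 - damping) + damping * incoming
    end
    scores = new_scores
  end
  scores
end

# Hypothetical sample graph with made-up weights.
graph = {
  "wall"     => { "street" => 2.0 },
  "street"   => { "wall" => 2.0, "movement" => 1.0 },
  "movement" => { "street" => 1.0 }
}
ranked = weighted_pagerank(graph).sort_by { |_, score| -score }
ranked.each { |word, score| puts format("%-10s %.3f", word, score) }
```

A fixed number of iterations keeps the sketch short; stopping once the scores change by less than a small threshold would be the more usual choice.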
  15. Code

    • Simple interface
    • Poor performance - this is just an academic example!

        text  = "..." # your text
        count = 5     # number of words to extract
        KeywordExtractor.extract_most_important_words(text, count)