Do Angry People Have Poor Grammar? An Exploration of Language Processing and Statistics in Python

This talk is about two things: natural language processing (NLP) and statistical dependence. We will work through a data science workflow using various Python scientific computing tools to better understand the behavior of commenters on Reddit. To do this we'll go through an introduction to sentiment analysis in Python (mostly using NLTK) and a swift explanation of the statistics of variable dependence.

We'll couple these freshly learned methods with an excellent dataset for this domain: every public Reddit comment. We'll talk a bit about handling and preprocessing data of this size and character. Then we'll compile scores for both sentiment and spelling/grammar. In the end we may just discover whether angry comments are also grammatically poor comments. And the audience will walk away with a few more tools in their scientific computing toolbelt.

Deck as presented at PyData Amsterdam 2016

Ben Fields

March 12, 2016

Transcript

  1. Do Angry People
    Have Poor
    Grammar?
    Ben Fields

  2. An Exploration of
    Language Processing
    and Statistics in Python
    Ben Fields

  3. intro and motivation

  4. @alsothings - Do Angry People Have Poor Grammar?
    Have you ever noticed on
    social media

  5. @alsothings - Do Angry People Have Poor Grammar?
    that the ‘loudest’

  6. @alsothings - Do Angry People Have Poor Grammar?
    seem to not construct the
    best sentences?

  7. @alsothings - Do Angry People Have Poor Grammar?
    me too.

  8. @alsothings - Do Angry People Have Poor Grammar?
    But I am a skeptic

  9. @alsothings - Do Angry People Have Poor Grammar?
    http://dx.doi.org/10.1037/1089-2680.2.2.175

  10. @alsothings - Do Angry People Have Poor Grammar?
    So let's do some analysis

  11. @alsothings - Do Angry People Have Poor Grammar?
    1. Find pile of comments
    2. Measure style and
    sentiment
    3. ????
    4. Profit

  12. @alsothings - Do Angry People Have Poor Grammar?
    1. Find pile of comments
    2. Measure style and
    sentiment
    3. Statistical dependence?
    4. Profit

  13. @alsothings - Do Angry People Have Poor Grammar?
    1. Find pile of comments
    2. Measure style and
    sentiment
    3. Statistical dependence?
    4. more twitter followers

  14. 1. Find pile of comments

  15. @alsothings - Do Angry People Have Poor Grammar?
    all of the reddit
    comments!

  16. @alsothings - Do Angry People Have Poor Grammar?
    https://www.reddit.com/r/datasets/comments/3bxlg7/
    i_have_every_publicly_available_reddit_comment

  17. @alsothings - Do Angry People Have Poor Grammar?
    1.7 billion reddit
    comments!

  18. @alsothings - Do Angry People Have Poor Grammar?
    59 Million reddit
    comments!
    https://mega.nz/#!ysBWXRqK!yPXLr25PgJi184pbJU3GtnqUY4wG7YvuPpxJjEmnb9A

  19. @alsothings - Do Angry People Have Poor Grammar?
    (actually ~1% sample of
    that: 390k comments)
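
The deck does not show how the ~1% sample was drawn. A minimal sketch of one way to do it, assuming the monthly dump is newline-delimited JSON with one comment per line and a `body` field (as the filtering code on the next slide suggests). The function name, the sampling rate and the seed are illustrative, not taken from the talk:

```python
import json
import random


def sample_comments(lines, rate=0.01, seed=0):
    """Yield roughly `rate` of the comments from an iterable of JSON lines.

    Each line is kept independently with probability `rate`, so the dump
    never has to fit in memory. `rate` and `seed` are illustrative defaults.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    for line in lines:
        if rng.random() < rate:
            yield json.loads(line)


# Tiny in-memory stand-in for a dump file. With a real file you could pass
# an open text handle (for example from bz2.open(path, "rt")) instead.
fake_dump = [json.dumps({"body": "comment %d" % i}) for i in range(10000)]
sample = list(sample_comments(fake_dump))
```

Sampling line by line gives a sample whose size is only approximately 1% of the input, which is usually fine at this scale.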

  20. @alsothings - Do Angry People Have Poor Grammar?
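    from string import punctuation  # assumed source of `punctuation`

    from nltk import tokenize

    for comment in pile_of_comment:
        # count word tokens, ignoring punctuation-only tokens
        num_tokens = len([t for t in tokenize.word_tokenize(comment['body'])
                          if t not in punctuation])
        if num_tokens < 5:
            continue  # drop comments shorter than five words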

  21. @alsothings - Do Angry People Have Poor Grammar?
    319K comments from
    January 2015

  22. 2. Measure style and
    sentiment

  23. @alsothings - Do Angry People Have Poor Grammar?
    grammar?

  24. @alsothings - Do Angry People Have Poor Grammar?
    Context-Free Grammar!
    http://www.nltk.org/book/ch08.html

  25. @alsothings - Do Angry People Have Poor Grammar?
    Great for analysing
    complex and ambiguous
    sentence structure
    http://www.nltk.org/book/ch08.html

  26. @alsothings - Do Angry People Have Poor Grammar?
    http://www.imdb.com/title/tt0020640/

  27. @alsothings - Do Angry People Have Poor Grammar?
    “One morning, I shot an
    elephant in my pyjamas.
    How he got in my
    pyjamas, I don't know.”
    http://www.nltk.org/book/ch08.html

  28. @alsothings - Do Angry People Have Poor Grammar?
    “I shot an elephant in my
    pyjamas.”
    http://www.nltk.org/book/ch08.html

  29. @alsothings - Do Angry People Have Poor Grammar?
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pyjamas'
    V -> 'shot'
    P -> 'in'
    http://www.nltk.org/book/ch08.html

  30. @alsothings - Do Angry People Have Poor Grammar?
    import nltk

    # from_last_slide: the grammar rules from the previous slide, as a string
    groucho_grammar = nltk.CFG.fromstring(from_last_slide)
    sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pyjamas']
    parser = nltk.ChartParser(groucho_grammar)
    for tree in parser.parse(sent):
        print(tree)
    http://www.nltk.org/book/ch08.html

  31. @alsothings - Do Angry People Have Poor Grammar?
    (S
      (NP I)
      (VP
        (VP (V shot) (NP (Det an) (N elephant)))
        (PP (P in) (NP (Det my) (N pyjamas)))))
    (S
      (NP I)
      (VP
        (V shot)
        (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pyjamas))))))
    http://www.nltk.org/book/ch08.html

  32. @alsothings - Do Angry People Have Poor Grammar?
    http://www.nltk.org/book/ch08.html
    [Figure: the two parse trees for "I shot an elephant in my pyjamas", drawn side by side. In one the PP "in my pyjamas" attaches to the VP; in the other it attaches to the NP "an elephant".]

  33. @alsothings - Do Angry People Have Poor Grammar?
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP | 'I'
    VP -> V NP | VP PP
    Det -> 'an' | 'my'
    N -> 'elephant' | 'pyjamas'
    V -> 'shot'
    P -> 'in'
    http://www.nltk.org/book/ch08.html

  34. @alsothings - Do Angry People Have Poor Grammar?
    Extended ad nauseam:
    a definition of English

  35. @alsothings - Do Angry People Have Poor Grammar?
    Extended ad nauseam:
    a static simplification of
    English

  36. @alsothings - Do Angry People Have Poor Grammar?
    pCFG

  37. @alsothings - Do Angry People Have Poor Grammar?
    probabilistic CFG
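
A probabilistic CFG attaches a probability to every rule, and the probability of a parse is the product of the probabilities of the rules it uses. The stdlib-only sketch below scores the two pyjamas parses from the earlier slides this way. The rule probabilities and the tuple-based tree representation are invented for illustration and do not come from the talk or from NLTK:

```python
# Rule probabilities are invented for illustration; for each left-hand side
# they sum to 1, as a PCFG requires.
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("Det", "N", "PP")): 0.25,
    ("NP", ("I",)): 0.25,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("Det", ("an",)): 0.5,
    ("Det", ("my",)): 0.5,
    ("N", ("elephant",)): 0.5,
    ("N", ("pyjamas",)): 0.5,
    ("V", ("shot",)): 1.0,
    ("P", ("in",)): 1.0,
}


def tree_probability(tree):
    """Probability of a parse: the product of the probabilities of its rules.

    A tree is a tuple (label, child, ...); a child is a subtree or a word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROBS[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob


my_pyjamas = ("NP", ("Det", "my"), ("N", "pyjamas"))
in_pyjamas = ("PP", ("P", "in"), my_pyjamas)

# Reading 1: the PP attaches to the verb phrase (the shooter wore the pyjamas).
vp_attach = ("S", ("NP", "I"),
             ("VP",
              ("VP", ("V", "shot"), ("NP", ("Det", "an"), ("N", "elephant"))),
              in_pyjamas))

# Reading 2: the PP attaches to the noun phrase (the elephant wore them).
np_attach = ("S", ("NP", "I"),
             ("VP", ("V", "shot"),
              ("NP", ("Det", "an"), ("N", "elephant"), in_pyjamas)))

p_vp = tree_probability(vp_attach)
p_np = tree_probability(np_attach)
```

A probabilistic parser prefers whichever tree has the larger product. The ranking depends entirely on the rule probabilities, which in practice would be estimated from parsed text rather than written by hand.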

  38. @alsothings - Do Angry People Have Poor Grammar?
    “The main problem is that there is
    no common agreement on what are
    grammatically correct (English)
    sentences; nor has anyone yet been
    able to offer a grammar precise
    enough to propose as definitive.”
    http://dl.acm.org/citation.cfm?id=1882777

  39. @alsothings - Do Angry People Have Poor Grammar?
    Style checking!

  40. @alsothings - Do Angry People Have Poor Grammar?
    lint for prose

  41. @alsothings - Do Angry People Have Poor Grammar?
    proselint.com

  42. @alsothings - Do Angry People Have Poor Grammar?
    proselint.com/write

  43. @alsothings - Do Angry People Have Poor Grammar?
    https://github.com/amperser/proselint/
    {
      "max_errors": 1000,
      "checks": {
        "butterick.symbols" : true,
        "carlin.filth" : true,
        "consistency.spacing" : true,
        "consistency.spelling" : true,
        "garner.airlinese" : true,

        "inc.corporate_speak" : true,
        "leonard.exclamation" : true,
        "leonard.hell" : true,

        "write_good.weasel_words" : true,
        "wsj.athletes" : true
      }
    }

  44. @alsothings - Do Angry People Have Poor Grammar?
    https://github.com/amperser/proselint/
    (from checks/leonard/exclamation.py)
    @memoize
    def check_repeated_exclamations(text):
        """Check the text."""
        err = "leonard.exclamation.multiple"
        msg = u"Stop yelling. Keep your exclamation points under control."
        regex = r"[^A-Z]\b((\s[A-Z]+){3,})"
        return existence_check(
            text, [regex], err, msg, require_padding=False,
            ignore_case=False, max_errors=1, dotall=True)
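
A check of this kind boils down to a regular expression plus a message. The stdlib-only sketch below shows the same idea in miniature. The regex, the rule and the `find_lints` helper are written for this illustration and are not proselint's code or API:

```python
import re

# Hypothetical rule: two or more exclamation marks in a row.
REPEATED_EXCLAMATIONS = re.compile(r"!{2,}")


def find_lints(text, pattern=REPEATED_EXCLAMATIONS,
               message="Keep your exclamation points under control."):
    """Return one (start, end, message) tuple per match of `pattern`."""
    return [(m.start(), m.end(), message) for m in pattern.finditer(text)]


lints = find_lints("This is fine. This is NOT fine!!! Really!!")
```

Counting the tuples returned for a comment gives a per-comment lint count, which is the kind of number summarized on the next slide.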

  45. @alsothings - Do Angry People Have Poor Grammar?
                  mean    std dev  min  25%  50%  75%     max
    lints         0.4298  1.0861   0    0    0    1       400
    normed lints  0.0216  0.0497   0    0    0    0.0178  1.2
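
The deck does not define "normed lints". One plausible reading, assumed here, is the lint count divided by the number of tokens in the comment. The sketch below computes that ratio and the same summary columns as the table using only the standard library. The helper names and the sample data are made up:

```python
import statistics


def normed_lints(lint_count, num_tokens):
    """Lints per token; assumes num_tokens > 0 (short comments were dropped)."""
    return lint_count / num_tokens


def summarize(values):
    """Mean, sample standard deviation, min, quartiles and max of `values`."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std dev": statistics.stdev(values),
        "min": min(values),
        "25%": q1, "50%": q2, "75%": q3,
        "max": max(values),
    }


# Made-up (lint count, token count) pairs, not data from the talk.
comments = [(0, 12), (0, 30), (1, 20), (0, 8), (3, 25)]
summary = summarize([normed_lints(lints, tokens) for lints, tokens in comments])
```

A median of zero, as in the table, simply means that at least half of the comments triggered no check at all.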

  46. sentiment analysis
    http://www.nltk.org/howto/sentiment.html

  47. @alsothings - Do Angry People Have Poor Grammar?

  48. @alsothings - Do Angry People Have Poor Grammar?
    lul or luuuuuuuuuuuulz?

  49. @alsothings - Do Angry People Have Poor Grammar?
    VADER (Valence Aware
    Dictionary for
    sEntiment Reasoning)
    http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

  50. @alsothings - Do Angry People Have Poor Grammar?
    Test Condition               Example Text
    Baseline                     Yay. Another good phone interview.
    Punctuation1                 Yay! Another good phone interview!
    Punctuation1 + Degree Mod.   Yay! Another extremely good phone interview!
    Punctuation2                 Yay!! Another good phone interview!!
    Capitalization               YAY. Another GOOD phone interview.
    Punct1 + Cap.                YAY! Another GOOD phone interview!
    Punct2 + Cap.                YAY!! Another GOOD phone interview!!
    Punct3 + Cap.                YAY!!! Another GOOD phone interview!!!
    Punct3 + Cap. + Degree Mod.  YAY!!! Another EXTREMELY GOOD phone interview!!!
    Table 2: Example of baseline text with eight test conditions comprised of grammatical and syntactical variations.
    http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
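
The test conditions in this table vary punctuation, capitalization and degree modifiers while the words stay the same. The toy scorer below shows how a lexicon-based approach can react to those cues. It is a sketch for illustration only: the lexicon, the bonus sizes and the function name are invented, and it is not VADER or NLTK's implementation (NLTK ships the real one as `nltk.sentiment.vader.SentimentIntensityAnalyzer`):

```python
import math
import re

# Toy lexicon and bonus sizes: illustrative values only, not VADER's.
LEXICON = {"yay": 2.4, "good": 1.9, "awful": -2.0}
BOOSTERS = {"extremely", "very"}
BOOSTER_BONUS = 0.3
CAPS_BONUS = 0.7
EXCLAIM_BONUS = 0.3
MAX_EXCLAIMS = 4  # stop rewarding exclamation marks after this many


def toy_intensity(text):
    """Sum word valences, pushing each away from zero for emphasis cues."""
    tokens = re.findall(r"[A-Za-z']+", text)
    score = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token.lower())
        if valence is None:
            continue
        emphasis = 0.0
        if len(token) > 1 and token.isupper():
            emphasis += CAPS_BONUS  # "GOOD" counts for more than "good"
        if i > 0 and tokens[i - 1].lower() in BOOSTERS:
            emphasis += BOOSTER_BONUS  # "extremely good"
        score += valence + math.copysign(emphasis, valence)
    if score:
        exclaims = min(text.count("!"), MAX_EXCLAIMS)
        score += math.copysign(exclaims * EXCLAIM_BONUS, score)
    return score


baseline = toy_intensity("Yay. Another good phone interview.")
punctuated = toy_intensity("Yay! Another good phone interview!")
shouted = toy_intensity("YAY!!! Another EXTREMELY GOOD phone interview!!!")
```

With these made-up weights the three example sentences score in increasing order of intensity, which is the behavior the table is designed to test.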

  51. @alsothings - Do Angry People Have Poor Grammar?
              mean      std dev   min  25%       50%       75%       max
    positive  0.14381   0.15702   0    0         0.10800   0.22900   1
    neutral   0.776917  0.175849  0    0.667000  0.789000  0.915000  1
    negative  0.079243  0.079243  0    0         0         0.128000  1

  52. 3. Statistical dependence?

  53. @alsothings - Do Angry People Have Poor Grammar?
    y = f(x)

  54. @alsothings - Do Angry People Have Poor Grammar?
    regression!
    http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

  55. @alsothings - Do Angry People Have Poor Grammar?

  56. @alsothings - Do Angry People Have Poor Grammar?
    http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
    S = \sum_{i=1}^{n} r_i^{2}
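
Assuming the sum on this slide is the usual least-squares objective, with each residual being the gap between an observed y and the fitted line, regression picks the line that makes S as small as possible. A stdlib-only sketch for a straight-line fit; the function name and the data points are made up:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, S)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual r_i: observed y minus the value predicted by the fitted line
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, sum(r ** 2 for r in residuals)


# Made-up points lying close to y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
intercept, slope, s = fit_line(xs, ys)
```

Any other intercept and slope would give a larger S on the same points, which is what makes this the least-squares line.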

  57. @alsothings - Do Angry People Have Poor Grammar?

  58. @alsothings - Do Angry People Have Poor Grammar?
    correlation testing

  59. @alsothings - Do Angry People Have Poor Grammar?
    Pearson’s correlation
    negative to   correlation  p
    lints         0.0311       4.535E-249
    normed lints  0.0600       2.4228E-68
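
The deck does not say which routine produced these numbers. As a sketch of what is being measured, the code below computes Pearson's r by hand and estimates a p-value with a permutation test, which is a swapped-in technique chosen here because it needs only the standard library. The function names and the data are made up:

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never 0


# Made-up scores, not data from the talk.
negativity = [0.0, 0.1, 0.0, 0.3, 0.2, 0.0, 0.5, 0.1]
lints = [0, 1, 0, 2, 1, 0, 3, 1]
r = pearson_r(negativity, lints)
p = permutation_p_value(negativity, lints)
```

The p-value and the correlation answer different questions. With hundreds of thousands of comments even a tiny correlation can be statistically distinguishable from zero, yet squaring the larger value in the table, 0.0600, gives about 0.0036, so a linear relationship with negativity would account for well under 1% of the variance in normed lints.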

  60. @alsothings - Do Angry People Have Poor Grammar?
    Nope.

  61. @alsothings - Do Angry People Have Poor Grammar?
    NOPE.

  62. conclusions!

  63. @alsothings - Do Angry People Have Poor Grammar?
    People on reddit are
    generally reasonable,

  64. @alsothings - Do Angry People Have Poor Grammar?
    but when they aren’t, the
    language isn’t any
    stylistically worse

  65. @alsothings - Do Angry People Have Poor Grammar?
    at least last January.

  66. Let’s have some
    questions
    @alsothings
