Mo'Mentum


Updated slides including comments/suggestions up to 02/07/2017

Peter Winslow

February 07, 2017

Transcript

  1. Is my petition good enough? Many resources dedicated to sharing user-generated petitions! Not much help for writing them though...
  2. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum
  3. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum: probability of success.
  4. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum: probability of success; time scale to reach signature goal.
  5. Mo’Mentum under the hood: Data Source. Change.org sitemap; petition URLs; petition IDs; Change.org API; petition text and metadata.
  6. Mo’Mentum under the hood: Data Source. Change.org sitemap; petition URLs; petition IDs; Change.org API; petition text and metadata. Over ~40,000 petitions.
  7. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ...
  8. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ... Metadata: timestamps, signature count, success/failure.
  9. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ... Metadata: timestamps, signature count. Targets: success/failure; signature count / (t_final - t_initial).
  10. Model Performance. Predict signature accumulation rate. Gradient Boosting Regressor: least-squares loss function. Train-test-evaluation split with 5-fold CV.
  11. About me. Peter Winslow, The Professional: PhD + 1 postdoc in theoretical High Energy Physics and Cosmology. Specific interests: origin of matter in the Universe. Peter Winslow, The New Father! Kiana Winslow, born Nov. 29th 2016.
  12. Backup Slides. Algorithms: Classification. Random Forest Classifier (Scikit-Learn). Predict success/failure of petition. 18 features after selection. Reasons for choosing: • Lots of complexity yet resistant to overfitting. Challenges: • Class imbalance in the data. Validation: train-test-evaluation split with 5-fold CV.
  13. Backup Slides. Algorithms: Regression. GradientBoostingRegressor (Scikit-Learn). Predict signature accumulation rate. 17 features after selection. Reasons for choosing: • Many features, highly non-linear, can return predicted “quantiles”. Challenges: • The right evaluation metric? Validation: train-test-evaluation split with 5-fold CV.
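
The regression target on slide 9, read here as signature count divided by (t_final - t_initial), and the "time scale to reach signature goal" from slide 4 can be illustrated with a minimal Python sketch. The function names, the per-day unit, and the example numbers are assumptions made for illustration; the slides do not specify them. Dividing the goal by a constant rate is only one simple way to turn a predicted rate into a time scale, not necessarily the approach used in Mo’Mentum.

```python
from datetime import datetime


def accumulation_rate(signature_count, t_initial, t_final):
    """Average signatures per day between two timestamps.

    Assumption: the rate is expressed per day; the slides give no unit.
    """
    elapsed_days = (t_final - t_initial).total_seconds() / 86400.0
    if elapsed_days <= 0:
        raise ValueError("t_final must be later than t_initial")
    return signature_count / elapsed_days


def days_to_goal(signature_goal, rate_per_day):
    """Days needed to reach a signature goal at a constant rate.

    Assumption: the rate stays constant over time.
    """
    if rate_per_day <= 0:
        raise ValueError("rate_per_day must be positive")
    return signature_goal / rate_per_day


# Hypothetical petition: 500 signatures collected over 10 days.
created = datetime(2017, 1, 1)
closed = datetime(2017, 1, 11)

rate = accumulation_rate(500, created, closed)
days = days_to_goal(1000, rate)

print(f"rate: {rate:.1f} signatures/day")
print(f"days to reach 1000 signatures: {days:.1f}")
```

In a pipeline like the one the slides describe, the rate computed from the metadata would serve as the training target, and the model's predicted rate would take the place of `rate` in the last step.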