
Mo'Mentum

Updated slides including comments/suggestions up to 02/07/2017

Peter Winslow

February 07, 2017

Transcript

  1. Is my petition good enough? Many resources dedicated to sharing user-generated petitions! Not much help for writing them though...
  2. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum
  3. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum: probability of success.
  4. Many resources dedicated to sharing user-generated petitions! Not much help for writing them though... Just what I was looking for! Mo’Mentum: probability of success; time scale to reach signature goal.
  5. Mo’Mentum under the hood: Data Source. Change.org sitemap; petition URLs; petition IDs; petition text and metadata; Change.org API.
  6. Mo’Mentum under the hood: Data Source. Change.org sitemap; petition URLs; petition IDs; petition text and metadata; Change.org API. Over ~40,000 petitions.
  7. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ...
  8. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ... Metadata: timestamps, signature count, success/failure.
  9. Mo’Mentum under the hood: Feature Engineering. Text data: stopwords, lemmatization. Features: sentiment, POS tagging, word/sentence counts, ... Metadata: timestamps, signature count, success/failure. Targets: Signature Count / (t_final - t_initial).
  10. Model Performance. Predict signature accumulation rate. Gradient Boosting Regressor: least-squares loss function. Train-test-evaluation split with 5-fold CV.
  11. About me. Peter Winslow, The Professional: PhD + 1 postdoc in theoretical High Energy Physics and Cosmology. Specific interests: origin of matter in the Universe. Peter Winslow, The New Father! Kiana Winslow, born Nov. 29th 2016.
  12. Backup Slides. Algorithms: Classification. Random Forest Classifier (Scikit-Learn). Predict success/failure of petition. 18 features after selection. Reasons for choosing: • Lots of complexity yet resistant to overfitting. Challenges: • Class imbalance in the data. Validation: train-test-evaluation split with 5-fold CV.
  13. Backup Slides. Algorithms: Regression. GradientBoostingRegressor (Scikit-Learn). Predict signature accumulation rate. 17 features after selection. Reasons for choosing: • Many features, highly non-linear, can return predicted “quantiles”. Challenges: • The right evaluation metric? Validation: train-test-evaluation split with 5-fold CV.
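
The regression target and the validation scheme named on the slides can be sketched in plain Python. This is a minimal illustration and not the author's code: the function names, the 60/20/20 proportions, the reading of "train-test-evaluation" as three disjoint subsets, and the sample petition numbers are all assumptions made for the example.

```python
import random
from datetime import datetime


def accumulation_rate(signature_count, t_initial, t_final):
    """Signatures per day: signature count / (t_final - t_initial)."""
    elapsed_days = (t_final - t_initial).total_seconds() / 86400.0
    if elapsed_days <= 0:
        raise ValueError("t_final must be later than t_initial")
    return signature_count / elapsed_days


def three_way_split(n_samples, test_frac=0.2, eval_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into train, test, and evaluation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_eval = round(n_samples * eval_frac)
    n_test = round(n_samples * test_frac)
    eval_idx = indices[:n_eval]
    test_idx = indices[n_eval:n_eval + n_test]
    train_idx = indices[n_eval + n_test:]
    return train_idx, test_idx, eval_idx


def k_fold(indices, k=5):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


# Hypothetical petition: 1,200 signatures collected over 30 days.
rate = accumulation_rate(1200, datetime(2016, 11, 1), datetime(2016, 12, 1))

# Hypothetical dataset of 100 petitions; 5-fold CV runs on the training part only.
train_idx, test_idx, eval_idx = three_way_split(100)
folds = list(k_fold(train_idx, k=5))
```

In this sketch the evaluation subset is never touched during cross-validation, so it can serve as a final check once model settings are fixed. With Scikit-Learn, the fold loop would typically be replaced by its built-in cross-validation utilities.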