
Final Mo'Mentum Slides

Peter Winslow
February 20, 2017


Transcript

  1. Mo'Mentum. "Just what I was looking for!" Many resources for sharing petitions! Little help for writing petitions...
  2. Mo'Mentum. "Just what I was looking for!" Many resources for sharing petitions! Little help for writing petitions...
    My Petition
  3. Mo'Mentum. "Just what I was looking for!" Many resources for sharing petitions! Little help for writing petitions...
    My Petition → probability of success; time scale to reach signature goal
  4. Data Collection: over ~40,000 petitions
    Change.org sitemap → petition URLs → petition IDs → Change.org API → petition text and metadata
  5. Feature Engineering
    Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    Metadata; targets: success/failure, signature accumulation rate
  6. Feature Engineering
    Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    Metadata; targets: success/failure, signature accumulation rate
    Random Forest Classifier → success/failure
  7. Feature Engineering
    Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    Metadata; targets: success/failure, signature accumulation rate
    Random Forest Classifier → success/failure
    Gradient Boosting Regressor → signature accumulation rate
  8. Peter Winslow
    The Professional: PhD + 1 postdoc in theoretical high energy physics and cosmology (origin of matter in the Universe)
    New father! Kiana Winslow, born Nov. 29, 2016
  9. Algorithms: Classification (Backup Slides)
    Model: RandomForestClassifier (scikit-learn), predicting whether a petition succeeds or fails
    Reasons for choosing: • Can model a lot of complexity yet is resistant to overfitting
    Challenges: • Class imbalance in the data
    Validation: train/test/evaluation split with 5-fold CV
  10. Algorithms: Regression (Backup Slides)
    Model: GradientBoostingRegressor (scikit-learn), predicting the signature accumulation rate
    Reasons for choosing: • Many features, highly non-linear relationships, and it can return predicted "quantiles"
    Challenges: • What is the right evaluation metric?
    Validation: train/test/evaluation split with 5-fold CV
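Slide 4 names the Change.org sitemap as the starting point for collecting petition URLs. Below is a minimal sketch of that first step. It assumes the sitemap follows the standard sitemaps.org `<urlset>` format. The `/p/` path filter and the example.org URLs are hypothetical placeholders. The later steps (URL → ID → Change.org API) are not sketched, because the talk does not document them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def petition_urls(sitemap_xml: str, path_marker: str = "/p/") -> list[str]:
    """Return every <loc> URL in a <urlset> sitemap that contains path_marker.

    path_marker is a hypothetical filter for petition pages; adjust it to the
    URL pattern the real sitemap actually uses.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if path_marker in url:
            urls.append(url)
    return urls


# Hypothetical two-entry sitemap for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.org/p/save-the-park</loc></url>
  <url><loc>https://www.example.org/about</loc></url>
</urlset>"""

print(petition_urls(SAMPLE_SITEMAP))
```

Large sites usually split their sitemaps behind a `<sitemapindex>`. A real crawler would first walk that index, then apply the same `<loc>` extraction to each child sitemap.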
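The text features on slide 5 include word/sentence counts computed after stopword removal. The count-based part can be sketched with the standard library alone. The stopword set below is a tiny hypothetical stand-in. Lemmatization, sentiment, and POS tagging would need an NLP library such as NLTK or spaCy, so they are left out of this sketch.

```python
import re

# Tiny illustrative stopword list (hypothetical); a real pipeline would use a
# fuller list, e.g. from NLTK or spaCy.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "in", "is", "we", "our"}


def text_count_features(text: str) -> dict[str, float]:
    """Simple count features for one petition text."""
    # Treat runs of ., ! or ? as sentence boundaries and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    content_words = [w for w in words if w not in STOPWORDS]
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "n_content_words": len(content_words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "stopword_fraction": 1 - len(content_words) / len(words) if words else 0.0,
    }


print(text_count_features("Save the park! We need a new playground for our kids."))
```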
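Slide 9 lists class imbalance as the main challenge for the classifier. A common remedy is to weight each class inversely to its frequency. In scikit-learn's "balanced" mode, the weight is n_samples / (n_classes × count of that class). The talk does not say which remedy was used. This is only a standard-library sketch of that weighting formula.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 8 failed petitions (0) and 2 successful ones (1).
print(balanced_class_weights([0] * 8 + [1] * 2))
```

With these weights, the rare successful petitions count four times as much as failures during training. In scikit-learn, passing `class_weight="balanced"` to `RandomForestClassifier` applies the same formula automatically.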
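Slides 9 and 10 both validate with 5-fold cross-validation. When classes are imbalanced, each fold should preserve the class ratio. Stratification is an assumption here; the slides only say "5-fold CV". The sketch below covers just the CV splitting, which would run on the training portion after an evaluation set has been held out. That holdout step is not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep roughly the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return [sorted(fold) for fold in folds]


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, validation in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation


labels = [0] * 40 + [1] * 10  # hypothetical 4:1 class imbalance
for train, validation in cv_splits(labels):
    print(len(train), len(validation), sum(labels[i] for i in validation))
```

scikit-learn provides the same behaviour as `StratifiedKFold(n_splits=5, shuffle=True, random_state=0)`.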
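Slide 10 leaves "the right evaluation metric?" open, while noting that the regressor can return predicted quantiles. One natural candidate for quantile predictions is the pinball (quantile) loss. For a target quantile α, each sample costs α·(y − q) when the prediction q is at or below the true value y, and (1 − α)·(q − y) otherwise. This is offered as one option, not the metric the talk settled on.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss of predictions y_pred for the alpha-quantile."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


# Hypothetical observed vs. predicted 90th-percentile accumulation rates.
print(pinball_loss([10, 20], [12, 15], alpha=0.9))
```

At α = 0.5 the loss equals half the mean absolute error. `GradientBoostingRegressor(loss="quantile", alpha=0.9)` fits this loss in scikit-learn, and `sklearn.metrics.mean_pinball_loss` computes the same score.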
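Slide 3 promises a "time scale to reach signature goal", and slide 7 predicts a signature accumulation rate. The talk does not say how the two are connected. One simple link divides the remaining signatures by the predicted rate. This assumes the rate is measured in signatures per day and stays constant; both the units and the constant-rate assumption are hypothetical.

```python
def days_to_goal(current_signatures, goal, rate_per_day):
    """Days needed to reach goal if signatures keep arriving at a constant rate."""
    remaining = goal - current_signatures
    if remaining <= 0:
        return 0.0
    if rate_per_day <= 0:
        return float("inf")  # no progress: the goal is never reached
    return remaining / rate_per_day


# Hypothetical petition at 1,200 of 5,000 signatures, with low/median/high rates.
for rate in (100, 150, 250):
    print(rate, round(days_to_goal(1200, 5000, rate), 1))
```

Feeding in a low and a high predicted quantile of the rate, rather than a single estimate, turns the answer into a range. The slower rate gives the longer time.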