Final Mo'Mentum Slides

Peter Winslow

February 20, 2017

Transcript

  1. "Just what I was looking for!" Mo'Mentum
    • Many resources for sharing petitions!
    • Little help for writing petitions...
  2. "Just what I was looking for!" Mo'Mentum
    • My Petition
    • Many resources for sharing petitions!
    • Little help for writing petitions...
  3. "Just what I was looking for!" Mo'Mentum
    • Many resources for sharing petitions!
    • Little help for writing petitions...
    • My Petition → probability of success and time scale to reach the signature goal
  4. Data Collection: over 40,000 petitions
    • Change.org sitemap → petition URLs → petition IDs
    • Change.org API → petition text and metadata
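The slide names the Change.org sitemap as the source of petition URLs but does not show how it was read. A minimal sketch, assuming the sitemap follows the standard sitemaps.org XML format and that petition pages live under a `/p/` path (both assumptions, as are the function name and the example URLs), could pull candidate petition URLs out of a sitemap document like this:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def petition_urls_from_sitemap(xml_text):
    """Return every <url><loc> whose path starts with /p/.

    ASSUMPTION: petition pages are served under https://www.change.org/p/<slug>.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlparse(url).path.startswith("/p/"):
            urls.append(url)
    return urls


# Hypothetical sitemap fragment for illustration only
example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.change.org/p/save-the-local-library</loc></url>
  <url><loc>https://www.change.org/about</loc></url>
</urlset>"""

print(petition_urls_from_sitemap(example))
```

A real crawl would also have to follow sitemap index files and then fetch each petition's ID, text, and metadata from the Change.org API; that step is not sketched because the slide gives no API details.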
  5. Feature Engineering
    • Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    • Metadata → features
    • Metadata → success/failure and signature accumulation rate (the prediction targets)
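The slide lists count-style text features and a signature accumulation rate without defining them. The sketch below is one possible reading under stated assumptions: the tiny stopword list, the regex tokenizer and sentence splitter, and the definition of accumulation rate as signatures per day are all hypothetical choices, not necessarily the author's. Lemmatization and sentiment are left out; a library such as NLTK (for example its `WordNetLemmatizer`) would normally handle those.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# fuller list (for example NLTK's) and add lemmatization.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "we", "our", "is", "in"}


def text_features(text):
    """Simple count features from petition text (illustrative only)."""
    # Naive sentence split on runs of . ! ? followed by whitespace or end of text
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    content_words = [w for w in words if w not in STOPWORDS]
    return {
        "n_sentences": len(sentences),
        "n_words": len(words),
        "n_content_words": len(content_words),
        "words_per_sentence": len(words) / max(len(sentences), 1),
    }


def signature_rate(signatures, days_active):
    """Signatures per day: ASSUMED definition of the accumulation rate."""
    if days_active <= 0:
        raise ValueError("days_active must be positive")
    return signatures / days_active


print(text_features("Save our library! It serves the whole town."))
print(signature_rate(1200, 30))  # 40.0
```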
  6. Feature Engineering → Random Forest Classifier
    • Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    • Metadata → success/failure and signature accumulation rate
    • Features + success/failure → Random Forest Classifier
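  7. Feature Engineering → Random Forest Classifier and Gradient Boosting Regressor
    • Text data → stopwords, lemmatization → sentiment, POS tagging, word/sentence counts, ... → features
    • Metadata → success/failure and signature accumulation rate
    • Features + success/failure → Random Forest Classifier
    • Features + signature accumulation rate → Gradient Boosting Regressor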
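Slide 3 promises a time scale to reach the signature goal, and the regressor predicts a signature accumulation rate. Assuming the predicted rate stays constant (a simplification) and using a hypothetical helper `days_to_goal`, converting one into the other is simple arithmetic:

```python
import math


def days_to_goal(current, goal, rate_per_day):
    """Days needed to go from `current` to `goal` signatures.

    ASSUMPTION: signatures keep arriving at a constant `rate_per_day`,
    which real petitions rarely do.
    """
    if current >= goal:
        return 0.0  # goal already reached
    if rate_per_day <= 0:
        return math.inf  # goal is never reached at a non-positive rate
    return (goal - current) / rate_per_day


# Hypothetical petition: 350 of 1,000 signatures, predicted 25 signatures/day
print(days_to_goal(350, 1000, 25.0))  # 26.0
```

Feeding in a lower and an upper predicted quantile of the rate, rather than a single value, would turn a range of rates into a range of days.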
  8. Peter Winslow: The Professional
    • PhD + 1 postdoc in theoretical high energy physics and cosmology
    • Research: origin of matter in the Universe
    • New father! Kiana Winslow, born Nov. 29th, 2016
  9. Algorithms: Classification (backup slide)
    Random Forest Classifier (scikit-learn): predicts the success/failure of a petition
    Reasons for choosing:
    • Captures complex structure in the data, yet is resistant to overfitting
    Challenges:
    • Class imbalance in the data
    Validation: train/test/evaluation split with 5-fold CV
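The slide cites class imbalance as the main challenge and validates with a held-out evaluation set plus 5-fold cross-validation. The sketch below illustrates two common remedies in plain Python: "balanced" class weights, and stratified fold assignment so that every fold keeps the overall success/failure ratio. The function names and the 10%/90% example labels are hypothetical; the deck does not say which remedy, if any, was used.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is the formula scikit-learn documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def stratified_folds(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds that preserve class ratios.

    Indices of each class are shuffled, then dealt out round-robin, with the
    dealing position carried over between classes so fold sizes stay even.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(n_splits)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % n_splits].append(idx)
            position += 1
    return folds


# Hypothetical imbalanced labels: 10 successes (1) and 90 failures (0)
labels = [1] * 10 + [0] * 90
print(balanced_class_weights(labels))  # successes weighted 9x more than failures
for k, fold in enumerate(stratified_folds(labels)):
    print(k, len(fold), sum(labels[i] for i in fold))  # 20 samples, 2 successes each
```

In scikit-learn the same ideas are available as `RandomForestClassifier(class_weight="balanced")` and `StratifiedKFold(n_splits=5, shuffle=True)`; the evaluation set would be split off first (for example with `train_test_split(..., stratify=y)`) and cross-validation run on the remainder.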
  10. Algorithms: Regression (backup slide)
    GradientBoostingRegressor (scikit-learn): predicts the signature accumulation rate
    Reasons for choosing:
    • Many features, highly non-linear relationships, and it can return predicted "quantiles"
    Challenges:
    • What is the right evaluation metric?
    Validation: train/test/evaluation split with 5-fold CV
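The slide notes that `GradientBoostingRegressor` can predict quantiles (in scikit-learn this is `loss="quantile"`, with the quantile level set by `alpha`) and asks what the right evaluation metric is. One natural candidate for quantile predictions, not necessarily the one the author chose, is the pinball (quantile) loss, sketched here in plain Python; the function name and example values are hypothetical.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball (quantile) loss at quantile level alpha, 0 < alpha < 1.

    Under-predictions (y_true > y_pred) cost alpha per unit; over-predictions
    cost (1 - alpha) per unit. The loss is minimized by the alpha-quantile.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for true, pred in zip(y_true, y_pred):
        diff = true - pred
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


# Hypothetical signatures-per-day values and 0.9-quantile predictions
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(pinball_loss(observed, predicted, alpha=0.9))
```

scikit-learn also provides this metric as `sklearn.metrics.mean_pinball_loss`. Because the loss is minimized by the true quantile, it scores a 0.9-quantile model on what it was trained to estimate, whereas squared error targets the mean.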