Machine learning: boldly going where Twitter's APIs don't

MetaBeerTalks given by James Kember, Engineer, at MetaBroadcast on March 12th, 2014

MetaBroadcast

March 12, 2014

Transcript

  1. Twitter
     • Lots of data!
     • Limited resources (180 calls per 15 mins) on /statuses/show/:id
     • We can get more resources by using our users' API allowances, but that still isn't enough
     • The API doesn't give us everything we want :( (favourites)
  2. Favourites
     • The streaming API doesn't give favourite counts :(
     • Polling individual tweets does
     • We just want the totals per tweet
     • So we need to pick the most active tweets to poll
     • Most tweets have no favourites on them
  3. Metric gathering
     • We need metrics!
     • Sometimes you aren’t sure what the metrics are (limited resources)
     • Random sampling for a baseline
  4. 1st Attempt
     • Hand-crafted
     • Requires good domain knowledge
     • Difficult to get all the rules in place (required quite a few tweaks based on looking at results)
  5. Machine Learning
     • Works well in a big data domain with random elements (Twitter...)
     • Possible to learn trends that you would not think of, given an appropriate data set
     • Neural network + backpropagation
     • After picking the dataset and the right algorithm it does all the work for you!
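
To put the rate limit from slide 1 in perspective, here is a rough back-of-the-envelope sketch. The 180 calls per 15 minutes figure comes from the slide; the assumption that each call looks up a single tweet, the number of user allowances and the backlog size are illustrative values, not numbers from the talk.

```python
CALLS_PER_WINDOW = 180   # from the slide: 180 calls ...
WINDOW_MINUTES = 15      # ... per 15-minute window


def calls_per_day(tokens: int = 1) -> int:
    """Upper bound on single-tweet lookups per day across `tokens` allowances."""
    windows_per_day = (24 * 60) // WINDOW_MINUTES
    return CALLS_PER_WINDOW * windows_per_day * tokens


def coverage(tweets_to_track: int, tokens: int = 1) -> float:
    """Fraction of tracked tweets that could be polled once per day."""
    return min(1.0, calls_per_day(tokens) / tweets_to_track)


# Hypothetical numbers: 50 user allowances, one million tweets to watch.
print(calls_per_day())             # one allowance
print(calls_per_day(50))           # fifty allowances
print(coverage(1_000_000, 50))     # share of the backlog reachable per day
```

Even with many borrowed allowances the budget covers only part of a large backlog once a day, which is why choosing which tweets to poll matters.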
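
The "neural network + backpropagation" idea from slide 5 can be sketched in a few lines of plain Python. This is a minimal illustration, not the network used in the talk: the two input features, the synthetic labelling rule, the layer size and the learning rate are all assumptions made up for the example.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """n_in inputs -> `hidden` sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_in: int, hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Output error for sigmoid + cross-entropy loss: dL/dz = y - target.
        d_out = y - target
        # Backpropagate the error through the hidden layer (chain rule),
        # using the output weights as they were before this update.
        d_hidden = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * d_out * hj
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def mean_loss(net, data):
    eps = 1e-12
    total = 0.0
    for x, t in data:
        y = min(max(net.forward(x)[1], eps), 1.0 - eps)
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total / len(data)


def accuracy(net, data):
    return sum((net.forward(x)[1] > 0.5) == (t == 1) for x, t in data) / len(data)


def make_synthetic(n, rng):
    # ASSUMPTION: two made-up, normalised features per tweet and an invented
    # rule for "gained a favourite". Real features would come from tweet data.
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        data.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return data


rng = random.Random(0)
data = make_synthetic(200, rng)
net = TinyNet(n_in=2, hidden=4, rng=rng)
loss_before = mean_loss(net, data)
for _ in range(200):              # epochs of plain stochastic gradient descent
    rng.shuffle(data)
    for x, t in data:
        net.train_step(x, t, lr=0.5)
loss_after = mean_loss(net, data)
train_accuracy = accuracy(net, data)
print(loss_before, loss_after, train_accuracy)
```

In a setting like the one described in the talk, the trained output could be used to rank tweets and spend the limited polling budget on those most likely to have favourites; that use is a suggestion here, not something the slides spell out.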