Stanford ACM Tech Talk - Agile Data and Machine Learning

Ryan Weald
June 02, 2013

Talk I gave at a Stanford ACM tech talk in May 2013.

The abstract of the talk was:
How Sharethrough uses the power of Scala, Hadoop, and agile development to build data-driven ad products. Young engineers and data scientists often get overwhelmed by, and caught up in, the latest algorithm. We'll discuss how the tools and concepts of software engineering are just as important as picking the right algorithm when building data products. In a startup environment, the ability to rapidly iterate on underlying algorithms while shipping product is an exciting challenge, especially given the size of modern data sets.

Transcript

  1. @rweald Outline: 1) The problem; 2) Understanding the business requirements; 3) 3 keys to moving fast when your data is big; 4) Things that make you slower actually make you faster; 5) Architecture we used to stay lean; 6) Q & A. Sunday, June 2, 13
  2. What this Talk is Not • What algorithms you should use • Bleeding edge machine learning • Something that is going to be on your final
  3. "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Brian W. Kernighan and P. J. Plauger in The Elements of Programming Style.
  4. Architectural Abstractions [diagram; box labels: Ad Server; Hive For Ad Hoc Reporting; Raw Input Data (x2); Normalized Session Data; Domain Data; Ad Serving Models & Reporting (x3); Aggregated Reporting Data; Content Models; User Based Models; Customer Facing App]
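
The layering named on the Architectural Abstractions slide (raw input data, normalized session data, domain data, aggregated reporting data) can be sketched as a chain of small, independently testable functions. This is a minimal illustration in Python, not Sharethrough's implementation: the ordering of the layers, the event fields, the 30-minute session gap, and every function name are assumptions made for the example. The system described in the talk was built on Scala and Hadoop.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical normalized session: one user's events with no long gaps."""
    user_id: str
    events: tuple  # sorted (timestamp_seconds, event_type, ad_id) triples


def normalize_sessions(raw_events, gap_seconds=1800):
    """Raw input -> normalized sessions.

    raw_events is an iterable of (user_id, timestamp_seconds, event_type, ad_id).
    A new session starts when two consecutive events from the same user are
    more than gap_seconds apart (the 30-minute default is an assumption).
    """
    by_user = defaultdict(list)
    for user_id, ts, event_type, ad_id in raw_events:
        by_user[user_id].append((ts, event_type, ad_id))

    sessions = []
    for user_id, events in sorted(by_user.items()):
        events.sort()
        current = [events[0]]
        for event in events[1:]:
            if event[0] - current[-1][0] > gap_seconds:
                sessions.append(Session(user_id, tuple(current)))
                current = [event]
            else:
                current.append(event)
        sessions.append(Session(user_id, tuple(current)))
    return sessions


def to_domain_rows(sessions):
    """Normalized sessions -> domain rows: one row per (session, ad)."""
    rows = []
    for session in sessions:
        counts = defaultdict(lambda: {"impression": 0, "click": 0})
        for _, event_type, ad_id in session.events:
            if event_type in ("impression", "click"):
                counts[ad_id][event_type] += 1
        for ad_id, c in sorted(counts.items()):
            rows.append({
                "ad_id": ad_id,
                "user_id": session.user_id,
                "impressions": c["impression"],
                "clicks": c["click"],
            })
    return rows


def aggregate_report(domain_rows):
    """Domain rows -> aggregated reporting data: per-ad totals and CTR."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in domain_rows:
        totals[row["ad_id"]]["impressions"] += row["impressions"]
        totals[row["ad_id"]]["clicks"] += row["clicks"]
    return {
        ad_id: {**t, "ctr": t["clicks"] / t["impressions"] if t["impressions"] else 0.0}
        for ad_id, t in totals.items()
    }


# Made-up sample events, purely for illustration.
raw = [
    ("u1", 0, "impression", "ad1"),
    ("u1", 10, "click", "ad1"),
    ("u1", 4000, "impression", "ad1"),
    ("u2", 5, "impression", "ad1"),
]
sessions = normalize_sessions(raw)
report = aggregate_report(to_domain_rows(sessions))
```

Because each stage takes plain data in and returns plain data out, any one of them can be unit tested or rewritten without touching the others, which is one way to keep iterating on models while the rest of the pipeline stays stable.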