Slide 1

Slide 1 text

Agile Data and Machine Learning
Ryan Weald (@rweald)
http://rweald.github.io
Sunday, June 2, 13

Slide 2

Slide 2 text

Who is this guy?

Slide 3

Slide 3 text

Me *
* Courtesy of Big Data Borat

Slide 4

Slide 4 text

Data Scientist @Sharethrough
Native advertising platform

Slide 5

Slide 5 text

Outline
1) The problem
2) Understanding the business requirements
3) 3 keys to moving fast when your data is big
4) Things that make you slower actually make you faster
5) Architecture we used to stay lean
6) Q & A

Slide 6

Slide 6 text

What this Talk is Not
• What algorithms you should use
• Bleeding edge machine learning
• Something that is going to be on your final

Slide 7

Slide 7 text

The Problem

Slide 8

Slide 8 text

How to ensure you don’t throw away 3 months of work

Slide 9

Slide 9 text


Slide 10

Slide 10 text

Understand the Business Requirements

Slide 11

Slide 11 text

How Good Does your algorithm need to be?

Slide 12

Slide 12 text

Move Fast!!

Slide 13

Slide 13 text

3 keys to moving fast with a large data set

Slide 14

Slide 14 text

1) Create a smaller sample of your data
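
The slide does not say how the sample is drawn. A minimal sketch of one option, assuming records carry a `session_id` field (a hypothetical name): hash the key and keep a fixed fraction of keys. This makes the sample repeatable and keeps every record of a chosen session together.

```python
import hashlib


def in_sample(key: str, rate: float) -> bool:
    """Return True if `key` falls in the sampled fraction `rate` (0.0 to 1.0)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def sample_records(records, rate, key_field="session_id"):
    """Keep only records whose key hashes into the sample.

    `key_field` is a hypothetical field name used for illustration.
    """
    return [r for r in records if in_sample(r[key_field], rate)]
```

Because the decision depends only on the key, rerunning the job on new data selects the same sessions, which helps when comparing results between iterations.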

Slide 15

Slide 15 text


Slide 16

Slide 16 text

2) Normalize and compress
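
The slide gives no detail on what is normalized or which compression is used. As one possible reading, the sketch below reduces each raw event to a small, consistent record and stores the result as gzip-compressed JSON lines. The field names (`session_id`, `event`, `timestamp`) are assumptions made for illustration.

```python
import gzip
import json


def normalize(raw: dict) -> dict:
    """Keep only the fields we need, in a consistent shape (hypothetical schema)."""
    return {
        "session_id": str(raw["session_id"]).strip(),
        "event": str(raw["event"]).strip().lower(),
        "timestamp": int(raw["timestamp"]),
    }


def compress(records) -> bytes:
    """Serialize records as JSON lines and gzip the result."""
    lines = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(lines.encode("utf-8"))


def decompress(blob: bytes) -> list:
    """Inverse of `compress`: gunzip and parse one JSON record per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Dropping unused fields before compressing is usually where most of the size reduction comes from; the compression step then works on smaller, more regular input.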

Slide 17

Slide 17 text

3) Utilize powerful open source tools

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Most Importantly

Slide 20

Slide 20 text

Running in production is hard!

Slide 21

Slide 21 text

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
Brian W. Kernighan and P. J. Plauger, The Elements of Programming Style

Slide 22

Slide 22 text

Who Understands your Algorithm?

Slide 23

Slide 23 text

Monitoring is Key

Slide 24

Slide 24 text

Monitoring is Key
WTF?

Slide 25

Slide 25 text

What you think makes you slower actually makes you faster

Slide 26

Slide 26 text

Write Tests!!!!
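
To make this advice concrete, here is a minimal sketch of a unit test around a small data transformation. The function `click_through_rate` and its handling of zero impressions are hypothetical, chosen only for illustration; the talk does not show this code.

```python
import unittest


def click_through_rate(clicks: int, impressions: int) -> float:
    """Hypothetical metric: clicks / impressions, or 0.0 with no impressions."""
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


class ClickThroughRateTest(unittest.TestCase):
    def test_typical_case(self):
        self.assertEqual(click_through_rate(1, 4), 0.25)

    def test_no_impressions(self):
        # Edge case that is easy to miss until it fails in production.
        self.assertEqual(click_through_rate(0, 0), 0.0)

    def test_negative_counts_rejected(self):
        with self.assertRaises(ValueError):
            click_through_rate(-1, 10)
```

Saved as a module, the tests can be run with `python -m unittest`. Small tests like these pin down edge cases before the code meets real traffic.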

Slide 27

Slide 27 text

When People Don’t Write Tests
(╯°□°)╯︵ ┻━┻

Slide 28

Slide 28 text

Abstractions are key

Slide 29

Slide 29 text

Architectural Abstractions
[Architecture diagram. Component labels: Ad Server; Hive For Ad Hoc Reporting; Raw Input Data; Normalized Session Data; Domain Data; Ad Serving Models & Reporting; Aggregated Reporting Data; Content Models; User Based Models; Customer Facing App]

Slide 30

Slide 30 text

Functional Programming Abstractions

Slide 31

Slide 31 text

Functional Programming Abstractions
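
The slides do not say which abstractions are meant. One common reading, assumed here, is to express an aggregation as a map step followed by an associative merge, so the same code runs unchanged on a small sample or across many partitions. The function names and the `event` field are hypothetical.

```python
from collections import Counter
from functools import reduce


def to_counts(event: dict) -> Counter:
    """Map step: turn one event into a partial count."""
    return Counter({event["event"]: 1})


def merge(a: Counter, b: Counter) -> Counter:
    """Associative merge: combining partial counts in any grouping gives the same total."""
    return a + b


def aggregate(events) -> Counter:
    """Fold all partial counts together, starting from the empty Counter."""
    return reduce(merge, map(to_counts, events), Counter())
```

Because `merge` is associative and has an empty value, partial results computed on separate chunks of data can be combined later without changing the answer.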

Slide 32

Slide 32 text

We’re Hiring
http://bit.ly/str-engineering

Slide 33

Slide 33 text

Thanks!
Twitter: @rweald
Email: [email protected]