proprietary and confidential information and may be subject to a non-disclosure agreement. If you have received this in error, please notify the sender immediately.

1. Analytics / Big Data intro
2. Data Science 101 workshop
   • Data cleaning
   • Variable selection
   • Parametric modeling
   • Non parametric modeling
3. Discussion
…

(Figure axes: Observations, Attributes)

“Short and fat” … e.g.
• 10k customer quotes and conversion with hundreds of demographic attributes
• 100k insurance claims with thousands of vehicle data points
• High information content

“Tall and skinny” … e.g.
• billions of tweets with user and content only
• web log data with IP and controller action only
• Petabytes and more
(Figure axes: Observations, Attributes)

• How do we apply machine learning and statistics in distributed, streaming systems?
• How do we translate complicated models into realtime analytics?
• How do we get basic insights at scale?
• Which attributes matter the most?

(Figure labels: Data science (today’s focus); Software engineering)
The sequence of analytical activities has not fundamentally changed …

(Diagram labels: … learning; Data reduction; Enriched data; Raw starting data; Hardware; Better decisions; Track and learn; capacity; results; sources; approach)
batch & realtime

One way cut

    # A single split on one attribute
    if mycustomer.relationship.start_with?('married')
      render 'expensive_products'
    else
      render 'discount'
    end

Parametric model

    # Sum one coefficient per attribute/value pair
    income = 0
    mycustomer.each do |k, v|
      income += coefficients[k][v]
    end

Non parametric model

    # Nested splits, as in a decision tree
    if mycustomer.relationship.start_with?('married')
      if mycustomer.capital.gain < 7000
        render 'a'
      else
        render 'b'
      end
    else # ...
      render 'discount'
    end
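The three approaches above can be tried side by side in plain Ruby. This is a minimal sketch, not the workshop's code: the customer hash, the `COEFFICIENTS` table, the intercept, and the method names `one_way_cut`, `parametric_score` and `tree_decision` are all hypothetical, and the numbers are made up rather than fitted. The branch elided with `# ...` on the slide is left as a plain 'discount'.

```ruby
# Hypothetical customer record; field names and values are illustrative only.
customer = { relationship: "married", education: "bachelors", capital_gain: 5_000 }

# One way cut: a single split on one attribute.
def one_way_cut(customer)
  customer[:relationship].start_with?("married") ? "expensive_products" : "discount"
end

# Parametric model: an intercept plus one coefficient per (attribute, value) pair.
# The coefficients below are invented for illustration, not estimated from data.
COEFFICIENTS = {
  relationship: { "married" => 20_000, "single" => 5_000 },
  education:    { "bachelors" => 15_000, "high_school" => 3_000 }
}.freeze

def parametric_score(customer, intercept = 10_000)
  # Attributes without a coefficient (e.g. capital_gain) contribute 0.
  customer.sum(intercept) do |attribute, value|
    COEFFICIENTS.dig(attribute, value) || 0
  end
end

# Non parametric model: nested splits, as in a small decision tree.
def tree_decision(customer)
  if customer[:relationship].start_with?("married")
    customer[:capital_gain] < 7_000 ? "a" : "b"
  else
    "discount"
  end
end

puts one_way_cut(customer)      # expensive_products
puts parametric_score(customer) # 45000
puts tree_decision(customer)    # a
```

The one way cut and the tree return a decision directly, while the parametric model returns a number that still needs a threshold before it can drive what gets rendered.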
The goal posts are shifting with innovation at every level

(Diagram labels: Machine learning; Data reduction; Enriched data; Raw starting data; Hardware; Better decisions)

Then                      Now
Siloed by channel         Cross channel & industry optimization
Niche providers           Tools & capabilities
Pivot tables              Hosted BI platforms
Linear regression         Non parametric
SQL scripts               Hive/Pig/NoSQL
Enrich only ‘as needed’   Gather everything
CRM data                  Every touchpoint
Dedicated capacity        AWS/Private clouds
Ok. So what?

or new entrant? E.g. banks, telcos, retailers, B2C startups
• What will it mean for service providers? E.g. technology conglomerates, consulting firms, B2B startups
• What will be outsourced / insourced?
• Does Hong Kong have the potential to be a Big Data hub?

www.demystdata.com