AFM 201705

lanzani

May 15, 2017

Transcript

  1. WHO AM I • Imported from Italy ca. 2006 • Master & PhD in Theoretical Physics in Leiden (2006-2012) • Consultant Software Quality @ KPMG (2012-2013) • Data Whisperer @ GoDataDriven (2013-2016) • Chief Science Officer @ GoDataDriven (2016-…)
  2. ABOUT GODATADRIVEN • Founded in 2013, part of the Xebia group (600 FTE) • 30 machine learning engineers, data scientists, data engineers • Commercial focus on Dutch top 200 or category leaders • Based in Amsterdam
  3. BEST MODEL • Which one would you choose here? • It’s about making a tradeoff • Making this tradeoff is the most important job of the PO • A 100% correct answer might not exist!
  4. ULTIMATELY • It’s about creating value from data • Using Machine Learning, Advanced Analytics, and visualization
  5. BEYOND THE DATA WAREHOUSE • [Diagram labels: Traditional operational data sources; EDW; Data consumer; Web app; Dashboard / Reporting; Traditional Business app]
  6. DATA PLATFORM • [Diagram labels: Machine Learning; Data pipelines; Scale; Data consumer; Web app; Dashboard / Reporting; Traditional Business app; API; External API; Logs; Chat/transcripts; Scraping; Unstructured data; Traditional operational data sources]
  7. JOB MARKET 2016 • The biggest differentiator I've seen is to be able to participate in actually building production-quality systems vs. being proficient enough in R or Python to hack together a prototype on a very small dataset • Supply of the second group keeps growing while demand is flat or shrinking, especially as executives get burned by “data scientists” who don't know how to help them build things of value
  8. JOB MARKET 2017 • I am seeing the same things happening • We (GoDataDriven) are definitely only interested in people who can build production-quality systems (those who are already there, or who are getting there) • Many of our clients are in the same position
  9. HIRING • Companies that are not engineering driven often have trouble hiring good technical people • The “IQ” test is not really representative of applied data science • At GoDataDriven we do an “at home, at your convenience” assessment • Real dataset, real business question, real product
  10. KAGGLE CURSE • Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are. • In reality it's not the best model that we implement, but the one that combines quality and practicality. • Netflix competition
  11. ORG STRUCTURE • One Team to rule them all, One Team to find them, One Team to bring them all and in the data bind them
  12. CENTRAL TEAM • Highly skilled • They only do the difficult parts • Easier to attract people with ever-changing challenges • Easier to cross-pollinate knowledge • Formalized and efficient process • Backlog is out of the hands of the business • Prioritization is unclear • Maintenance and continuous improvement are done by different people
  13. MULTIPLE TEAMS • Backlog in the hands of the business • Maintenance and improvement are (mostly) done by the same people • More difficult to have many good teams • Risk of reinventing the wheel everywhere • Sharing knowledge is not as easy • Lots of different standards • Professionalization is difficult, as the business has different priorities
  14. DATA CONTRACTS • [Diagram labels: Data Platform; BU (x3); Data Lake; Data Products; Business (x3); IT (x3); App (x3); external data collection; internal data collection]
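
The tradeoff from slides 3 and 10 (quality versus practicality) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not from the talk: the Candidate class, the pick_model helper, the quality_weight parameter, the linear weighting itself, and all the scores are made-up assumptions, chosen only to show how the "best" model changes once practicality counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical model candidate; both scores are made-up illustrations."""
    name: str
    quality: float       # e.g. a validation metric scaled to [0, 1]
    practicality: float  # e.g. ease of deployment and maintenance, in [0, 1]


def pick_model(candidates, quality_weight=0.5):
    """Return the candidate with the highest weighted quality/practicality score.

    The linear weighting is an assumption of this sketch, not a method
    described in the slides.
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")

    def score(candidate):
        return (quality_weight * candidate.quality
                + (1.0 - quality_weight) * candidate.practicality)

    return max(candidates, key=score)


# Made-up candidates: the most accurate one is also the hardest to run.
candidates = [
    Candidate("large ensemble", quality=0.95, practicality=0.20),
    Candidate("gradient boosting", quality=0.92, practicality=0.70),
    Candidate("logistic regression", quality=0.85, practicality=0.95),
]

kaggle_choice = pick_model(candidates, quality_weight=1.0)    # quality only
balanced_choice = pick_model(candidates, quality_weight=0.5)  # equal weights

print(kaggle_choice.name)    # large ensemble
print(balanced_choice.name)  # logistic regression
```

With quality_weight=1.0 the sketch picks the model with the highest raw quality (the Kaggle-like choice); at 0.5 the simpler, easier-to-run model wins. Choosing that weight is the judgment call the slides assign to the PO.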