PyConSE 2015 Opening Keynote "Data Science Delivered"

Opening keynote for PyCon Sweden 2015 discussing how to start, develop and deploy a successful data product using Python

ianozsvald

May 13, 2015

Transcript

  1. Data Science Deployed: Turning raw data into valuable services. Ian Ozsvald, @IanOzsvald, ModelInsight.io
  2. Ian.Ozsvald@ModelInsight.io @IanOzsvald PyConSE May 2015 Who Am I? • “Industrial Data Science” for 15 years • O'Reilly Author • Teacher at PyCons
  3. PyDataLondon Meetups

  4. I want to encourage you to... • Mix “data people” and “engineers” to deliver high-value products so we can... • Go faster than humans • Be more accurate than humans • Be consistent and reproducible • I want you to become a data scientist Attrib: http://www.xara.com/news/april07/tutorial2.asp
  5. Who is a Data Scientist? http://datascopeanalytics.com/what-we-think/2014/02/05/what-is-a-data-scientist
  6. Why 'now'? http://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users

  7. Why is it valuable? • “Massively customised service” • Data Moats are hard to copy
  8. Why is it valuable?

  9. “A day in my life” • “How can I turn our data into business value?” • Thinking about our data quality and the transformations that improve it • How can I better predict or classify something that's valuable? • Deploying, testing, documenting
  10. Starting your first project • Need: High value & easy problem • Share insight, augment data, automate a process or predict the future • Deliver value at the end of day 1, day 2, week 1, week 2, month 1 etc • Tutorials on my blog (IanOzsvald.com)
  11. Example of “insight” Data via: https://twitter.com/echen/status/594353863374737409 http://ianozsvald.com/2015/05/03/talkpay-tweet-salary-visualisation/
  12. Example of “insight”

  13. Example of “insight”

  14. Extracting data from binary files • Copy/pasting PDF/PNG data is laborious • How can we scale it? • textract - unified interface • Apache's Tika (maybe) better • Specialised tools e.g. Sovren • Think on pipelines of transforms
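
The "pipelines of transforms" idea from slide 14 can be sketched in a few lines of standard-library Python. This is only an illustration, not the speaker's code: it assumes the raw text has already been pulled out of a PDF or image by an extractor such as textract, and the transform names (`join_hyphenated`, `collapse_whitespace`, `lowercase`) and the sample string are made up for the example.

```python
import re
from functools import reduce


def join_hyphenated(text):
    # Re-join words that were split across a line break, e.g. "Recogni-\ntion".
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def collapse_whitespace(text):
    # Replace any run of whitespace (including newlines) with a single space.
    return re.sub(r"\s+", " ", text).strip()


def lowercase(text):
    return text.lower()


def run_pipeline(text, transforms):
    # Apply each transform in order, feeding its output to the next one.
    return reduce(lambda value, transform: transform(value), transforms, text)


# The order matters: hyphenation is repaired before newlines are collapsed.
PIPELINE = [join_hyphenated, collapse_whitespace, lowercase]

# Hypothetical extractor output, invented for this sketch.
raw = "Invoice  No. 42\nOptical Character Recogni-\ntion   output"
cleaned = run_pipeline(raw, PIPELINE)
print(cleaned)
```

Keeping each step as a small function makes it easy to test steps in isolation and to swap the extractor without touching the cleaning stages.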
  15. Optical Character Recognition

  16. Optical Character Recognition

  17. Augmenting data • Identifying people, places, brands, sentiment • “i love my apple phone” • Context-sensitive (e.g. movies vs products) • Accurately count mentions & sentiment
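
To make the "apple" example on slide 17 concrete, here is a deliberately naive sketch of context-sensitive mention counting. The cue-word sets and the function name `classify_apple_mention` are assumptions chosen for illustration; a production system would likely use a trained entity-disambiguation model rather than hand-written cue lists.

```python
import re
from collections import Counter

# Hypothetical cue words, chosen only for this illustration.
BRAND_CUES = {"phone", "iphone", "mac", "laptop", "store"}
FRUIT_CUES = {"pie", "juice", "eat", "ate", "tree"}


def classify_apple_mention(text):
    # Return "brand", "fruit" or "unknown" for texts mentioning "apple",
    # and None when the word does not appear at all.
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "apple" not in words:
        return None
    if words & BRAND_CUES:
        return "brand"
    if words & FRUIT_CUES:
        return "fruit"
    return "unknown"


messages = [
    "i love my apple phone",
    "I ate an apple pie",
    "apple?",
    "nothing to see here",
]

# Count only the messages that actually mention "apple".
counts = Counter(
    label for label in map(classify_apple_mention, messages) if label is not None
)
print(counts)
```

The "unknown" bucket is worth reporting separately: it shows how many mentions the rules could not place, which is a rough measure of how far the counts can be trusted.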
  18. Augmenting images

  19. Predicting the unknown • Forecasting the future or filling the gaps • Demand prediction, life expectancy, price estimation
  20. Predicting the unknown

  21. Gaussian Process price estimates

  22. Classification • “Is it X or is it something else?” • Spam, malware, lead identification, text disambiguation, fraud classification • Many examples online, lots of tutorials
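
As a minimal sketch of the "is it X or is it something else?" question from slide 22, the following is a tiny multinomial naive Bayes spam classifier written with the standard library only. The training sentences and labels are invented for the example, and naive Bayes is a choice made here for illustration, not something the slides specify; in practice a library classifier trained on far more data would replace it.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    # examples is a list of (text, label) pairs.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocabulary


def predict(model, text):
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(n / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data, far too small for real use.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(TRAINING)
print(predict(model, "free prize money"))
print(predict(model, "monday team meeting"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps unseen words from zeroing out a class.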
  23. Digit classification

  24. More problems we can solve • Text topic detection • Duplicate detection • Data cleaning • Copyright violation (DMCA) • Speech recognition for call centre automation
  25. Tooling • IDE: Spyder (PyCharm) • Notebooks great for tutorials & demos, not as an IDE
  26. First project: outline • Iterate on: • Visualise • Seaborn/Bokeh • Create milestones • KISS! • Think+hypothesise+test • Communicate results • IPython Notebook • (Engineer a solution)
  27. Don't Kill It! • Your data is missing, it is poor and it lies • Missing data kills projects! • Log everything! • Make data quality reports • R&D != Engineering • Discovery-based • Iterative • Success and failure equally useful
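
One way to read "make data quality reports" on slide 27 is a per-column count of missing values. The sketch below uses only the standard library. The sample CSV, the column names and the set of strings treated as "missing" are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample data; empty strings and "NA" are treated as missing.
SAMPLE_CSV = """customer_id,age,city
1,34,London
2,,Stockholm
3,NA,
4,51,Malmo
"""

# Assumed spellings of "no value"; adjust to match the real data source.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def quality_report(csv_text):
    # Return, for each column, the row count and how many values are missing.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames:
        missing = sum(
            1
            for row in rows
            if (row[column] or "").strip().lower() in MISSING_MARKERS
        )
        report[column] = {
            "rows": len(rows),
            "missing": missing,
            "missing_fraction": missing / len(rows) if rows else 0.0,
        }
    return report


report = quality_report(SAMPLE_CSV)
for column, stats in report.items():
    print(f"{column}: {stats['missing']}/{stats['rows']} missing")
```

Running a report like this on every new data delivery, and logging the result, gives an early warning when a source starts dropping fields.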
  28. Internal deployment • Scripts to drive report • CSVs/Reports • Database updates • IPython Notebook (not secure though!) • Bokeh
  29. Deploying live systems • Spyre (locked-down) • Microservices • Flask is my go-to tool • Swagger docs • (git pull / fabric / provisioned machines) • Docker + Amazon ECS
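
Slide 29 names Flask as the go-to tool for microservices. A minimal sketch of what such a service might look like follows. It assumes Flask 1.0 or later is installed; the `/score` route, the `score_text` placeholder and the JSON field names are hypothetical and stand in for a real trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def score_text(text):
    # Placeholder "model": a real service would load a trained model at
    # start-up and call it here.
    return {
        "length": len(text),
        "contains_digit": any(ch.isdigit() for ch in text),
    }


@app.route("/score", methods=["POST"])
def score():
    # Accept a JSON body such as {"text": "..."}; reject anything else.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text")
    if not isinstance(text, str):
        return jsonify({"error": "expected a JSON body with a string 'text' field"}), 400
    return jsonify(score_text(text))
```

For local experiments the app can be started with `app.run()` or the `flask` command line; Flask's built-in test client (`app.test_client()`) lets the endpoint be exercised without opening a network port.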
  30. flask-restful-swagger

  31. Avoid Big Data if possible... • Don't be in a rush - 5000 lines of good data will beat a pile of Bad Big Data • 244GB RAM EC2+many Xeons $2.80/hr • Scaling options: • ElasticSearch + Jython/Java • Azure/Amazon ML • Apache Spark # if you have HDFS already
  32. Frågor? (Questions?) • We have a crazy-good selection of tools! • Don't worry about imposter syndrome - your business knowledge has a lot of value • We need data science patterns - what's your story? • Ask me how you can get started (I respond well to beer) • ianozsvald.com