From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - Mar 12, 2016

From Data Science to Production - deploy, scale, enjoy!
Sergii Khomenko, Data Scientist, @lc0d3r
PyData Amsterdam - March 12, 2016

Sergii Khomenko
Data scientist at Stylight, one of the biggest fashion communities. Data analysis and visualisation hobbyist who works on problems not only during working hours but also in his free time, for fun and for personal data visualisations. Originally from a computer engineering background. Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchconf 2015, FOSDEM 2016.

Fellow DevOps
Quentin Nerden, Milos Radovanovic, Patrick Roelke

Stylight – Make Style Happen
• Profitable Leads: Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.
• Inspiration: Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.
• Branding & Reach: Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.
• Shopping: Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.
Core Target Group: Stylight helps aspiring women between 18 and 35 to evolve their style through shoppable inspiration.

Stylight – acting on a global scale

Experienced & Ambitious Team
Innovative cross-functional organisation with flat hierarchy builds a unique team spirit.
• +200 employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits

Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.

Agenda
• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture

The Early Days of Startups

Problem definition:
• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people

Software engineering
built circa 2015-16

Our stack

You are most likely doing it already
• Version control
• Cover code with tests
• nosetests, pytest, unittest2
  - start small with doc tests
  - try out TDD: rednose, nose-watch
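As a minimal sketch of "start small with doc tests": the function below carries its own examples in the docstring, and the standard-library doctest module checks them. The function name conversion_rate and its inputs are hypothetical and not taken from the talk.

```python
import doctest


def conversion_rate(clicks, visits):
    """Return clicks per visit as a fraction between 0 and 1.

    The name and the metric are hypothetical, chosen only for illustration.

    >>> conversion_rate(5, 20)
    0.25
    >>> conversion_rate(0, 0)
    0.0
    """
    if visits == 0:
        # Avoid ZeroDivisionError when there was no traffic at all.
        return 0.0
    return clicks / visits


def run_doctests():
    """Run the docstring examples of conversion_rate and return the summary."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False tells the finder not to look up the defining module.
    tests = finder.find(
        conversion_rate,
        "conversion_rate",
        module=False,
        globs={"conversion_rate": conversion_rate},
    )
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False)
```

When the function lives in a file, running `python -m doctest yourfile.py` executes the same docstring examples without any helper code.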

You are most likely doing it already
• Cover code with tests
• yes, even your R application could have tests
  - testthat
  - devtools
• Code reviews
• Pair programming

Some of the mentioned problems
• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people

image from http://udaypal.com/

Some of the mentioned problems
• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people

How it could help:
• Every technology has its own container - just docker run
• Every package with version defined in Dockerfile
  - have a base image for more advanced cases
• New people - just docker run

r-base/Dockerfile

lc0/docker-shiny-server

Known issues
• Images could be really huge
• Try to skip anything you do not need
• Alpine Linux as a base image
• 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
• python, scala, java, elixir, etc.

Known issues
16 MB vs 232 MB

Some of the mentioned problems
• Hard to roll out
• Hard to maintain production dependencies

AWS ECR

CircleCI deployments

Immutable infrastructure
Infrastructure as Code

Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.

CloudFormation

cloudtools/troposphere
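troposphere generates CloudFormation templates from Python code. To show the idea without depending on its API, the sketch below builds a comparable template as a plain dict and serialises it with the standard library. The resource name AppServer, the placeholder AMI IDs and the default instance type are assumptions for illustration only.

```python
import json


def build_template(image_id, instance_type="t2.micro"):
    """Return a minimal CloudFormation template (as a dict) for one EC2 instance.

    The logical name "AppServer" and the defaults are hypothetical.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical single-instance stack for illustration",
        "Resources": {
            "AppServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


def render(template):
    """Serialise the template to JSON with a stable key order for diffs."""
    return json.dumps(template, indent=2, sort_keys=True)


# A new application revision means a new image, hence a new template;
# the old template is not edited in place. Both AMI IDs are placeholders.
old_revision = build_template("ami-00000000")
new_revision = build_template("ami-11111111")
```

Because the template is produced by code, it can live in version control and be reviewed like any other change, which is the point of the "Infrastructure as Code" slide.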

Terraform

Terraform
Kubernetes and Docker {Swarm, Compose}

Serverless architecture

Possibilities
• all Lambdas in one place with version control
• integration tests with real events
• proper CI/CD setup

CircleCI deployments

Cloud functions

Use-case of outlier detection

custom unification pipeline
• Departments
• Business Intelligence
• internal processes
• variety of event types and structures

Outlier detection to Slack
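The slides do not say which detection method was used. As one possible sketch, the code below flags outliers with the modified z-score (based on the median absolute deviation) and builds the JSON body that a Slack incoming webhook accepts. The metric name, the threshold of 3.5 and the message wording are assumptions.

```python
import json
import statistics


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds threshold.

    This is an assumed technique, not necessarily the one used in the talk.
    """
    median = statistics.median(values)
    deviations = [abs(v - median) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # No spread around the median, so the score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


def slack_payload(metric, outliers):
    """Build the JSON body for a Slack incoming webhook message."""
    lines = ["{}: value {} at position {}".format(metric, v, i) for i, v in outliers]
    text = "Outliers detected\n" + "\n".join(lines)
    return json.dumps({"text": text})
```

Delivering the message would then be an HTTP POST of this body to the webhook URL configured for the channel; that step is left out here so the sketch runs without network access.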

www.stylight.com @lc0d3r

Related links
1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components

Related links
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda
