Reproducible Data Science with Docker by Richard Ackon

Slide 1

Slide 1 text

Reproducible Data Science with Docker By: Richard Ackon @esquire_gh

Slide 2

Slide 2 text

Who Am I?
● Machine Learning Engineer, Kudobuzz
● Co-organizer, Accra Artificial Intelligence Meetup
● Writer for Analytics Vidhya, Divo.com

Slide 3

Slide 3 text

Overview
● What is reproducible data science?
● Why is it important?
● Where do we need reproducibility?
● How do we achieve reproducibility?
● Demo
● Conclusion

Slide 4

Slide 4 text

What is Reproducible Data Science?
The ability to reproduce the results of a data science experiment by running the same code on the same data in the same environment.
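As a rough illustration of "same data, same code, same result", the sketch below runs a toy experiment twice with a fixed random seed and checks that both runs agree. The function name `run_experiment`, the sample data, and the seed value are assumptions made for this example; they are not taken from the talk or its demo.

```python
import random
import statistics


def run_experiment(data, seed):
    """Toy experiment: bootstrap estimate of the mean using a seeded RNG."""
    rng = random.Random(seed)  # local generator, so global random state cannot leak in
    bootstrap_means = []
    for _ in range(200):
        # Resample the data with replacement and record the sample mean.
        sample = [rng.choice(data) for _ in data]
        bootstrap_means.append(statistics.fmean(sample))
    return statistics.fmean(bootstrap_means)


# Hypothetical data set, used only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

first = run_experiment(data, seed=42)
second = run_experiment(data, seed=42)

# Same data, same code, same seed: the two runs produce identical results.
print(first == second)
```

Fixing the seed only covers the code side. The result can still drift if the data or the installed library versions change between runs.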

Slide 5

Slide 5 text

Why is it important?
“Non-reproducible single occurrences are of no significance to science.” - Karl Popper
● Proof of phenomenon
● Facilitates peer review
● Basis for decision making

Slide 6

Slide 6 text

Where do we need it?
● Data
● Environment
● Code
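One possible way to track all three areas is to record a small manifest next to each experiment: a hash of the data, a hash of the code, and a description of the environment. The sketch below is an assumption about how this could look, not the approach used in the talk. `build_manifest`, the sample bytes, and the manifest keys are hypothetical names.

```python
import hashlib
import json
import platform


def sha256_of(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(data_bytes: bytes, code_text: str) -> dict:
    """Capture data, code, and environment fingerprints for one experiment."""
    return {
        "data_sha256": sha256_of(data_bytes),
        "code_sha256": sha256_of(code_text.encode("utf-8")),
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
    }


# Hypothetical stand-ins for a data file and a training script.
data_bytes = b"id,score\n1,0.5\n2,0.7\n"
code_text = "print('train model')\n"

manifest = build_manifest(data_bytes, code_text)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

If two runs produce different manifests, at least one of the data, the code, or the environment has changed, which narrows down where a mismatch in results comes from.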

Slide 7

Slide 7 text

So, how do we achieve reproducibility?

Slide 8

Slide 8 text

Common Data Science Workflow

Slide 9

Slide 9 text

Common Reproducibility Errors

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Docker
● Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.
● Containers allow you to package an application with everything it needs to run, such as libraries and other dependencies.
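To make the idea of packaging an application with its dependencies more concrete, here is a minimal sketch that renders the text of a Dockerfile for a Python project. The base image tag `python:3.11-slim`, the file names `requirements.txt` and `train.py`, and the helper `render_dockerfile` are all assumptions chosen for illustration; the talk's own demo may be set up differently.

```python
def render_dockerfile(base_image, requirements_file, entry_script):
    """Return Dockerfile text that packages a Python script with its dependencies."""
    lines = [
        # Pin the base image so every build starts from the same OS and Python.
        f"FROM {base_image}",
        "WORKDIR /app",
        # Copy and install the dependency list first so Docker can cache this layer.
        f"COPY {requirements_file} .",
        f"RUN pip install --no-cache-dir -r {requirements_file}",
        # Copy the rest of the project (code and, if small enough, data).
        "COPY . .",
        f'CMD ["python", "{entry_script}"]',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical project layout, used only for this example.
dockerfile_text = render_dockerfile("python:3.11-slim", "requirements.txt", "train.py")
print(dockerfile_text)
```

Saving the printed text as a file named `Dockerfile` in the project folder would let you build an image with `docker build -t my-experiment .` and run it with `docker run my-experiment`, where `my-experiment` is a placeholder image name. Pinning exact package versions in `requirements.txt` matters here, since an unpinned install can pull in different versions on a later build.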