
Reproducible Data Science with Docker by Richard Ackon

Pycon ZA
October 12, 2018


Collaboration is a major part of doing Data Science. Data Scientists are always sharing their work with colleagues, whether to continue the next step of the Data Science process or for review. A problem commonly encountered along the way is "it works on my machine".

Docker is a tool for packaging and running applications, together with all their dependencies, in an isolated environment.
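As an illustration of that packaging idea, a minimal Dockerfile for a notebook-based analysis might look like the sketch below (the base image tag, file names, and port are illustrative assumptions, not taken from the talk):

```dockerfile
# Pin a versioned base image so everyone builds the same environment
FROM python:3.7-slim

WORKDIR /app

# Install pinned dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the notebooks and data into the image
COPY . .

# Expose Jupyter's default port and start the notebook server
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

A colleague can then rebuild the exact same environment with `docker build` and run it with `docker run`, regardless of what happens to be installed on their own machine.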

In this talk, I'll use Python to analyse some data in Jupyter notebooks and show how Docker can be used to ensure reproducibility of that analysis in a different environment.

This talk will cover:

The basics of the data science workflow
The basics of Docker
A demonstration of sharing and reproducing data analysis work in a Jupyter notebook.


Transcript

  1. Who Am I? • Machine Learning Engineer, Kudobuzz • Co-organizer, Accra Artificial Intelligence Meetup • Writer for Analytics Vidhya, Divo.com
  2. Overview • Reproducible Data Science? • Why is it important? • Where do we need reproducibility? • How do we achieve reproducibility? • Demo • Conclusion
  3. What is Reproducible Data Science? The ability to replicate the same results for a data science experiment using the same data and code running in the same environment.
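Part of "same results from the same data and code" is controlling randomness in the code itself; Docker handles the environment half, but seeding is still up to the analysis. A minimal sketch (the experiment and its function name are illustrative, not from the talk):

```python
import random

def run_experiment(seed=42):
    """Toy 'experiment': shuffle a dataset and take a sample.

    With the seed fixed, the same code and data always yield the
    same result -- the code-level half of reproducibility
    (Docker provides the environment-level half).
    """
    rng = random.Random(seed)  # isolated, explicitly seeded RNG
    data = list(range(10))
    rng.shuffle(data)
    return data[:3]

# Two runs with the same seed produce identical results
assert run_experiment(seed=0) == run_experiment(seed=0)
```

Using a dedicated `random.Random(seed)` instance rather than the global RNG keeps the experiment deterministic even if other code also draws random numbers.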
  4. Why is it important? "Non-reproducible single occurrences are of no significance to science." - Karl Popper • Proof of phenomenon • Facilitates peer review • Basis for decision making
  5. Docker • Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. • Containers allow you to package an application with everything it needs to run, such as libraries and other dependencies.
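In practice, sharing a containerised analysis comes down to a few commands (these assume Docker is installed; the image tag "analysis" and the registry name are illustrative, not from the talk):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t analysis .

# Run it, mapping Jupyter's port to the host
docker run -p 8888:8888 analysis

# Optionally publish it so colleagues can pull the identical environment
docker tag analysis yourname/analysis
docker push yourname/analysis
```

A colleague then runs `docker pull yourname/analysis` followed by the same `docker run`, sidestepping the "it works on my machine" problem entirely.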