Automate Your Boilerplate

Chris Musselle
Senior Data Science Consultant
Twitter: @chrismusselle
Email: [email protected]

Introduction
• Data Scientist / Developer / DevOps
• Resident Python Specialist
• I work hard to be lazy!

Data Science Projects
No standard path to get from A to B
Common issues:
• Consistency
• Reproducibility
• Reusability

Consistency Helps
• Using standard layouts
• Analysis is not a package
• Useful to automate this boilerplate setup

example-project
├── analysis
│   ├── 01-Explore.ipynb
│   └── source.py
├── data
│   ├── interim
│   ├── prepared
│   └── raw
├── output
├── Makefile
├── README.md
└── environment.yml
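As a point of comparison before any templating tool is involved, the layout above can be scaffolded with a few lines of standard-library Python. This is only a sketch: the function name `create_project` is made up for illustration, and the files are created empty as placeholders (an empty .ipynb is not yet a valid notebook).

```python
from pathlib import Path

# Directories and placeholder files taken from the example layout.
DIRECTORIES = ["analysis", "data/interim", "data/prepared", "data/raw", "output"]
FILES = [
    "analysis/01-Explore.ipynb",
    "analysis/source.py",
    "Makefile",
    "README.md",
    "environment.yml",
]


def create_project(root):
    """Create the example layout under `root` and return the created paths."""
    root = Path(root)
    created = []
    for directory in DIRECTORIES:
        path = root / directory
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    for name in FILES:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # empty placeholder file
        created.append(path)
    return created
```

Calling `create_project("example-project")` builds the tree in the current directory. A hand-rolled script like this cannot prompt for variables or fill in file contents, which is the gap a templating tool addresses.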

Cookiecutter
• Uses the Jinja2 templating engine
• Allows pre/post hooks (.py or .sh)
• Language agnostic
• Many existing templates
  – https://drivendata.github.io/cookiecutter-data-science/
  – https://github.com/audreyr/cookiecutter
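Cookiecutter looks for hook scripts named pre_gen_project and post_gen_project in the template's hooks directory, and a hook that exits with a non-zero status aborts generation. A minimal sketch of the kind of check a pre-generation hook might perform follows; the variable name `project_name` and the naming rule are assumptions chosen for illustration, not something the slides prescribe.

```python
import re

# In a real hooks/pre_gen_project.py the value would come from the rendered
# template variable, for example:
#     project_name = "{{ cookiecutter.project_name }}"
# `project_name` is an assumed variable from a hypothetical cookiecutter.json.

# Illustrative rule: lowercase start, then lowercase letters, digits, "_" or "-".
VALID_NAME = re.compile(r"[a-z][a-z0-9_-]*")


def is_valid_project_name(name):
    """Return True if `name` satisfies the illustrative naming rule."""
    return VALID_NAME.fullmatch(name) is not None


# Inside the hook, a failed check would typically print a message and call
# sys.exit(1) so that cookiecutter stops before generating any files.
```

Keeping the check in a plain function makes it testable outside the hook mechanism.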

• Variable tags in file/directory names and content:
  {{cookiecutter.my_variable_name}}
• Replaced with variables defined in a JSON file (cookiecutter.json):
  { "my_variable_name": "default value" }
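To make the substitution concrete, here is a deliberately simplified stand-in written with the standard library only. Cookiecutter itself renders templates with Jinja2, which supports far more than plain variable tags; the `render` helper below is hypothetical and handles only the `{{cookiecutter.name}}` form.

```python
import json
import re

# Matches {{cookiecutter.name}} with optional spaces inside the braces.
TAG = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text, context):
    """Replace each cookiecutter tag in `text` with its value from `context`.

    Raises KeyError if a tag names a variable missing from `context`.
    """
    return TAG.sub(lambda match: str(context[match.group(1)]), text)


# Defaults as they would appear in the JSON file.
defaults = json.loads('{ "my_variable_name": "default value" }')

# User answers override the defaults before rendering.
context = {**defaults, "my_variable_name": "demo"}
rendered_path = render("{{cookiecutter.my_variable_name}}/README.md", context)
```

The same function applies to file contents and to path names, which is why a tag can appear in a directory name as well as inside a file.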

Usage
Installation
  pip install cookiecutter
Create new project with
  cookiecutter path/to/template
Or
  cookiecutter gh:path/to/github/repo

Project Templating
Pro
• Faster project startup time
• Fewer distractions, more focus on the issue
• More consistent layouts
• Collaborators will like you more!
Con
• Time and effort to create/debug etc.

So What's the Problem?

Data Science Projects
• Different sizes and mixtures of exploration and construction
• Different needs at different times in a project's lifetime

Good First Step, but…
May later want to add:
• Python package
• Tests
• Documentation
• Docker file
• CI build
• conda-recipe
• Logging
• Command line API
• …

Issue
• Templates generated only once
  – Need to know what you want upfront
• Many project needs = many templates?
  – Maintenance issues, more updating/debugging
  – Many repeated sections
• Or one large monolithic template?
  – Long survey, no multi-select
  – Unused files are confusing/distracting
  – Harder to onboard new team members

What if
We start out with the bare minimum and add in extra components as needed?
• Pros:
  – Greater flexibility
  – Easier comprehension
  – Multiple checkpoints
• Cons:
  – Time in upfront design
  – Longer dev/debug times
  – Much more complexity

Would need
1. A way to render each template layer
2. A way to handle dependencies between components
3. A way to store state information for the project
4. A tool to set up/add templates as needed
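Requirement 3 could be met with something as small as a JSON file in the project root that records which layers have been applied and which variable values were used. The sketch below assumes that design; the file name `.project_state.json` and both function names are hypothetical, and the slides do not say how state is actually stored.

```python
import json
from pathlib import Path

STATE_FILE = ".project_state.json"  # hypothetical name, chosen for this sketch


def load_state(project_dir):
    """Return the saved state, or an empty state for a fresh project."""
    path = Path(project_dir) / STATE_FILE
    if path.exists():
        return json.loads(path.read_text())
    return {"layers": [], "variables": {}}


def record_layer(project_dir, layer, variables):
    """Record that `layer` was applied and remember the variables it used."""
    state = load_state(project_dir)
    if layer not in state["layers"]:
        state["layers"].append(layer)  # keep the order layers were applied in
    state["variables"].update(variables)
    (Path(project_dir) / STATE_FILE).write_text(json.dumps(state, indent=2))
    return state
```

Remembering the variables means a later layer can reuse earlier answers instead of prompting for them again.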

Projects as Layered Components
Keep it as simple as possible
• Minimise changes
• Use independent files
• Favor appending text

Masonry
Installation
  pip install masonry
Create new project
  mason init path/to/masonry-templates
Add a new layer
  mason add template_layer

Masonry Template Collection
Cookiecutter template per directory

path/to/masonry-templates
│
├── package
│   ├── {{cookiecutter.project_name}}
│   ├── hooks
│   └── cookiecutter.json
│
├── pytest
│   ├── {{cookiecutter.project_name}}
│   ├── hooks
│   └── cookiecutter.json
│
└── metadata.json

metadata.json:
{
  "default": "package",
  "dependencies": {
    "pytest": ["package"]
  }
}
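One way to read the "dependencies" mapping is that each key names a layer and each value lists the layers that must already be present. Under that assumption, working out what to apply for a requested layer is a small depth-first walk. The function below is a sketch of that idea, not Masonry's actual implementation, and `layers_to_apply` is a made-up name.

```python
def layers_to_apply(metadata, target, applied=()):
    """Return the layers needed for `target`, dependencies first.

    Layers listed in `applied` are assumed to be in the project already
    and are skipped.
    """
    dependencies = metadata.get("dependencies", {})
    ordered = []
    visiting = set()  # layers on the current path, used to detect cycles

    def visit(layer):
        if layer in applied or layer in ordered:
            return
        if layer in visiting:
            raise ValueError(f"circular dependency involving {layer!r}")
        visiting.add(layer)
        for dependency in dependencies.get(layer, []):
            visit(dependency)
        visiting.discard(layer)
        ordered.append(layer)

    visit(target)
    return ordered


metadata = {"default": "package", "dependencies": {"pytest": ["package"]}}
fresh_project = layers_to_apply(metadata, "pytest")
existing_project = layers_to_apply(metadata, "pytest", applied=["package"])
```

With the metadata shown, `fresh_project` is `["package", "pytest"]` and `existing_project` is `["pytest"]`, since the package layer is already in place.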

Time for a Demo!

Masonry Features
• Cookiecutter
  – pre and post hooks supported
  – variables remembered and reused
• Interactively select project and templates
• Easy prefixing/postfixing of text and .py files
• Git commits after each added layer
• Automated check on all the template layers

Summary
• Consistency in DS project layout helps
• Cookiecutter: create custom templates and reduce startup times
• Multiple needs throughout a project:
  – Consider a layered templating approach
  – Upfront design really pays off

Discussion
References
  https://github.com/audreyr/cookiecutter
  https://github.com/MrKriss/masonry
Slides
  https://speakerdeck.com/mrkriss/automate-your-boilerplate
Chris Musselle
Senior Data Science Consultant
Twitter: @chrismusselle
Email: [email protected]

Roadmap
• Aim to support all cookiecutter features
• Improve UI feedback
• Improve logging for hook scripts and error feedback
• pytest-plugin?
• Include utilities for common post hook tasks, e.g. run subprocess
