lot of code
• Increasingly, this is software rather than scripts
• Academics
  ◦ Teaching
  ◦ Writing papers
  ◦ Implementations
• Tool Builders
  ◦ Maintaining popular open source tools
• Industry
  ◦ Deploying models that drive business decisions
  ◦ Maintaining code
  ◦ Knowledge transfer
  ◦ Engendering trust
  ◦ Useful and widely felt contributions
  ◦ Be a good collaborator or colleague
• Academics
  ◦ Trust through reproducibility
• Tool Building
  ◦ Trust through working, easy to use, and documented code
• Industry
  ◦ Trust through demonstrated ability to deliver
projects
  ◦ Most often code
  ◦ Also works for collaborative paper writing
• How does this make you impactful?
  ◦ Backup
  ◦ Collaboration
  ◦ Distribution and integration
  ◦ Legacy
  ◦ GitHub as a resume
• What tools are available?
  ◦ Git
  ◦ GitHub
exactly what it is supposed to do
• Unit tests should test the smallest possible unit
• How does this make you impactful?
  ◦ Documentation (not a substitute for real documentation!)
  ◦ Improves code quality through modularity
  ◦ Refactor code quickly
  ◦ Maintainable code base
  ◦ Reduces the risk of bugs and even retractions
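As a minimal sketch of "test the smallest possible unit", the example below checks a single function with py.test-style test functions. The helper `standardize` and its behavior are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
import math


def standardize(values):
    """Hypothetical helper: rescale values to zero mean and unit variance."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        raise ValueError("values must not all be equal")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


# Each test checks one property of one function, so a failure
# points directly at the behavior that broke.
def test_standardize_has_zero_mean():
    result = standardize([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(sum(result) / len(result), 0.0, abs_tol=1e-9)


def test_standardize_has_unit_variance():
    result = standardize([1.0, 2.0, 3.0, 4.0])
    variance = sum(r ** 2 for r in result) / len(result)
    assert math.isclose(variance, 1.0, rel_tol=1e-9)


def test_standardize_rejects_empty_input():
    try:
        standardize([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Saved in a file whose name starts with `test_`, these functions should be collected and run by `py.test`. The test names also act as a short description of what the function promises, which is the "documentation" benefit listed above.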
Many advocate writing a test first
• When you discover a bug
• How does this make you impactful?
  ◦ Makes you think about code design
  ◦ Easy for you and others to change and improve your code
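One way to read "write a test first when you discover a bug" is as a regression test: reproduce the bug in a failing test, then change the code until the test passes. The sketch below assumes a hypothetical `mean` function that originally crashed on empty input; the function and the bug are invented for illustration.

```python
def mean(values):
    """Hypothetical function for which a bug was reported: mean([]) crashed."""
    # Fix added only after the regression test below had been written and
    # seen to fail. The assumed original version divided by len(values)
    # without checking for an empty sequence.
    if len(values) == 0:
        raise ValueError("mean() of an empty sequence is undefined")
    return sum(values) / len(values)


def test_mean_of_empty_sequence_raises_value_error():
    # Regression test: written first, to reproduce the reported bug.
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


def test_mean_of_known_values():
    # Guards the existing behavior while the fix is made.
    assert mean([1, 2, 3]) == 2
```

Keeping the regression test in the suite means the same bug cannot silently return when someone later changes the function.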
your “master” branch
  ◦ Coupled with unit tests
    ▪ Always have code that works
    ▪ Always have a paper that compiles
• How does this make you impactful?
  ◦ Trust
  ◦ Makes iteration fast
• An example using Travis CI

      - sudo apt-get -qq update
      - sudo apt-get install -y texlive-latex-base  # Ubuntu package that provides pdflatex
    install:
      - pip install numpy scipy scikit-learn  # no -r: that flag expects a requirements file
    script: py.test your-project && make pdf
Resource isolation and limits for groups of processes
• Reliably run software from one computational environment in another
• What problems can containers solve?
  ◦ Your colleague has a different setup than you
  ◦ Your development environment looks different than your test environment
  ◦ You need to limit the computational resources of an application
• Running groups of containers with tools like Kubernetes
  ◦ Spark
  ◦ dask.distributed
• How does this make you impactful?
  ◦ Makes CI easier
  ◦ Makes distribution of your code easier
  ◦ Easy to try things out
Environment is defined by a Dockerfile

    FROM ubuntu:14.04
    MAINTAINER "Open Source Dev Team [email protected]"
    RUN apt-get update && \
        apt-get install -y git gcc r-base
    CMD ["R"]

• Large user community around Docker Hub

    $ docker run -d -p 8000:8000 --name jupyter jupyter/datascience-notebook
    Unable to find image 'jupyter/datascience-notebook:latest' locally
    latest: Pulling from jupyter/datascience-notebook
    ...
  ◦ The Pragmatic Programmer
  ◦ The Senior Software Engineer
• Tools
  ◦ The Git Book
  ◦ The Docker Book
• Testing
  ◦ xUnit Test Patterns
• General Programming
  ◦ Structure and Interpretation of Computer Programs (aka The Wizard Book)