1. Makes you a more productive scientist
• Science is only going to get more computational
• Robots can do PCR
• Robots cannot do analysis and interpretation (yet)
2. Makes you a better scientist
• There is a reproducibility crisis
• Open code + open data = easily reproducible results
3. Makes you marketable
• A “hard” skill that clearly transfers to other fields outside academia
• If you are in science and you interpret data, you are a data scientist
• Acquire basic literacy in scientific computing
• Become familiar with data science concepts and software tools
• Identify projects that will let you build these skills and build a resume
in Scientific Training
BEST Data Science club
Why are we offering this course?
• Unmet need for training in our programs, as discussed at Emory Academic Learning Community on Data Science: https://docs.google.com/document/d/1rLCljOhOUVTqH30Nn5Dzu4ugnSSmlt3_npnDjpjfluo/edit?usp=sharing
• Grad students who want more training face a prerequisite pipeline: https://github.com/SarTaku/Emory-Data-Science-Classes
• Member of Sober lab in Biology
• Contributor to / developer of open source software for science
• Open science advocate
• Future: applied machine learning + artificial intelligence
• the shell is the layer around the kernel of the operating system
• bash is a common shell on Unix-like systems such as macOS and Linux
• Command Prompt is the traditional shell on Windows
• both are “command line” i.e. not a “graphical user interface” (GUI)
• makes it easier to automate
• makes it easier for you to use many scientific software libraries
• often the only way to interact with computers on a network / in the cloud / on a server that process big data
Its name is written as a path
Slashes in the path tell you: the directory to the right of the slash “lives inside” the directory to the left of the slash
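To make the "lives inside" reading of a path concrete, here is a minimal bash sketch. The path `/home/alice/projects/analysis` is hypothetical and does not need to exist; `basename` and `dirname` only split the text of the path at its last slash.

```shell
# Hypothetical path, used only for illustration (it does not need to exist).
path="/home/alice/projects/analysis"

# The part to the right of the last slash is the directory's own name ...
name=$(basename "$path")

# ... and everything to the left of that slash is the directory it lives inside.
parent=$(dirname "$path")

echo "$name lives inside $parent"
```

Running `dirname` again on `$parent` would walk one more level up, which is the same idea applied to the next slash to the left.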
back to the directory I came from?
bash & command prompt:
$ cd ..
to go to the parent directory, e.g.
C:\Users\dnicho4\Documents>cd ..
C:\Users\dnicho4>
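A small bash sketch of the same move, run in a throwaway directory so nothing in your own files is touched. The directory names `Documents` and `notes` are made up for the example; `mktemp -d` creates a fresh temporary directory to hold them.

```shell
# Build a small throwaway directory tree (names are hypothetical).
top=$(mktemp -d)
mkdir -p "$top/Documents/notes"

# Start in the innermost directory.
cd "$top/Documents/notes"

# ".." always means "the parent of where I am now".
cd ..
here=$(basename "$PWD")
echo "$here"

# A second "cd .." climbs one more level, to the temporary top directory.
cd ..
```

After the first `cd ..` the shell is in `Documents`; each further `cd ..` removes one more directory from the right-hand end of the path.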
“Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later”
-- https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
From the Git book https://git-scm.com/book/en/v2
Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/
1. to track your own writing / analysis code / data
2. to collaborate with others on code / paper
3. to contribute to open source software, such as scientific libraries
Each of these will have a slightly different workflow
hosting service”
• free, accessible by anyone on the web
• git with extra features for control + collaboration
  • access control: repository owner, collaborators, etc.
  • forking: copy repo to your profile
  • pull requests, merge
  • issues: bug tracking + feature requests
  • task management
hosts many of the major open source libraries
• including for scientific computing + data science
• your way to share your science: reproducibility
• your online data science resume: marketability
• social network for programmers
does not track individual files by the changes you make to them
Instead it creates “a stream of snapshots” of all the files
From the Git book https://git-scm.com/book/en/v2
Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/
of files in a git repo:
1. untracked
2. modified
3. staged
4. unmodified (committed)
Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        textfile.txt

nothing added to commit but untracked files present (use "git add" to track)
is now “unmodified”
textfile.txt
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   textfile.txt

no changes added to commit (use "git add" and/or "git commit -a")
it is modified
textfile.txt
of our repo
textfile.txt
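The walk through the file states can be replayed in a scratch repository. This is a minimal sketch, not the exact session from the slides: the temporary directory, the commit message, and the placeholder identity passed with `-c user.name=... -c user.email=...` are all assumptions made so the example runs anywhere. It uses `git status --short`, which prints a two-character code per file instead of the long messages shown earlier.

```shell
# Work in a throwaway repository so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# 1. A brand-new file is untracked: short status shows "??".
echo "first line" > textfile.txt
untracked=$(git status --short)

# 2. "git add" stages it: short status shows "A " (added to the index).
git add textfile.txt
staged=$(git status --short)

# 3. After a commit the file is unmodified, so status prints nothing.
#    The name and email here are placeholders, set for this one command only.
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "add textfile"
committed=$(git status --short)

# 4. Editing the file makes it modified: short status shows " M".
echo "second line" >> textfile.txt
modified=$(git status --short)

printf '%s\n' "$untracked" "$staged" "[$committed]" "$modified"
```

Running `git add textfile.txt` again at the end would move the file back to staged, which is the loop the four states describe.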
$ git add YourfirstnameYourlastname.md
$ git commit -m "add contact info for
$ git push -u origin add-contact-info
“push”: send your commits to the remote copy of the repo, named “origin”
“-u”: short version of “--set-upstream”
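To see what `-u` records without touching a real hosted repository, the sketch below uses a local bare repository as a stand-in for “origin”. Everything except the branch name `add-contact-info` is hypothetical: the temporary paths, the file `contact.md`, the commit message, and the placeholder identity are assumptions for illustration only.

```shell
# A local bare repo plays the role of the hosted remote in this sketch.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"

# Create the branch and commit a placeholder file on it.
git checkout -q -b add-contact-info
echo "placeholder contact info" > contact.md
git add contact.md
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "add contact info"

# Push the branch and, with -u, remember origin/add-contact-info as its upstream.
git push -q -u origin add-contact-info

# Ask git which remote branch the current branch now tracks.
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")
echo "$upstream"
```

Once the upstream is set, a plain `git push` or `git pull` on this branch no longer needs the remote and branch names spelled out.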