Atlanta BEST short course week 1

David Nicholson
September 28, 2017

slideshow for week 1 of Atlanta BEST short course in scientific computing and data science

Transcript

  1. OUTLINE Introduction  Why learn scientific computing & data science

    skills? • What will you get out of the course? • What is BEST? • Who am I? Shell Version control systems • git + Github
  2. INTRODUCTION Why learn sci. comp. & data science skills? 1.

    Makes you a more productive scientist • Science is only going to get more computational • Robots can do PCR • Robots cannot do analysis and interpretation (yet)
  3. INTRODUCTION Why learn sci. comp. & data science skills? 1.

    Makes you a more productive scientist 2. Makes you a better scientist • There is a reproducibility crisis • Open code + open data → easily reproducible results
  4. INTRODUCTION Why learn sci. comp. & data science skills? 1.

    Makes you a more productive scientist 2. Makes you a better scientist 3. Makes you marketable • A “hard” skill that clearly transfers to other fields outside academia • If you are in science and you interpret data, you are a data scientist
  5. INTRODUCTION What will you get out of this course? 

    Acquire basic literacy in scientific computing • Become familiar with data science concepts and software tools • Identify projects that will let you build these skills and build a resume
  6. INTRODUCTION What will you get out of this course? In

    class: - what you need to know to be able to do the exercises - what you won’t be able to learn by reading random websites
  7. INTRODUCTION What is the Atlanta BEST program?  Broadening Experiences

    in Scientific Training • BEST Data Science club Why are we offering this course? Unmet need for training in our programs, as discussed at Emory Academic Learning Community on Data Science: https://docs.google.com/document/d/1rLCljOhOUVTqH30Nn5Dzu4ugnSSmlt3_npnDjpjfluo/edit?usp=sharing Grad students who want more training face a prerequisite pipeline: https://github.com/SarTaku/Emory-Data-Science-Classes
  8. INTRODUCTION Who am I?  GDBBS Neuroscience program graduate 

    Member of Sober lab in Biology  Contributor to / developer of open source software for science  Open science advocate  Future: applied machine learning + artificial intelligence
  9. SHELL AKA: bash, command prompt What is it?  the

    shell is the layer around the kernel of the operating system
  10. SHELL AKA: bash, command prompt What is it?  the

    shell is the layer around the kernel of the operating system • bash is a widely used shell on Unix-like systems such as macOS and Linux
  11. SHELL AKA: bash, command prompt What is it?  the

    shell is the layer around the kernel of the operating system • bash is a widely used shell on Unix-like systems such as macOS and Linux • command prompt is the name for the shell on Windows
  12. SHELL AKA: bash, command prompt What is it?  the

    shell is the layer around the kernel of the operating system • bash is a widely used shell on Unix-like systems such as macOS and Linux • command prompt is the name for the shell on Windows • both are “command line” interfaces, i.e. not a “graphical user interface” (GUI)
  13. SHELL Why should you learn how to use it? 

    i.e., why learn a bunch of cryptic commands you have to type?
  14. SHELL Why should you learn how to use it? 

    makes it easier to automate  makes it easier for you to use many scientific software libraries  often the only way to interact with computers on a network / in the cloud / on a server that process big data
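A small illustration of the "easier to automate" point, as a sketch only: the file names and the `backup` directory are made up for the example, and it runs in a throwaway directory so none of your real files are touched.

```shell
# Work in a temporary directory (assumption: mktemp is available, as on macOS/Linux)
cd "$(mktemp -d)"

# Hypothetical data files, created empty just for the demonstration
touch mouse1.csv mouse2.csv mouse3.csv
mkdir backup

# One loop copies every .csv file, no matter how many there are
for f in *.csv; do
    # ${f%.csv} is the file name with the .csv ending removed
    cp "$f" "backup/${f%.csv}_copy.csv"
done

ls backup
```

Doing the same thing by hand in a GUI means one copy-and-rename per file; the loop stays three lines long whether there are 3 files or 3000.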
  15. SHELL What do you do with it?  Work with

    the file system  Run programs, scripts
  16. SHELL Navigating the file system  you do this with

    File Explorer on modern operating systems that provide GUI environments
  17. SHELL Navigating the file system  Here’s what it’s like

    to be in the same location in the terminal
  18. SHELL Directories  Notice we start in our home directory

     Its name is written as a path  Slashes in the path tell you:  the directory to the right of the slash “lives inside” the directory to the left of the slash
  19. SHELL Directories  How do I see what what’s in

    this directory?  bash:  $ ls
  20. SHELL Directories  How do I see what what’s in

    this directory?  bash:  $ ls  command prompt:  > dir
  21. SHELL Navigating the file system  How do I move

    to another directory? • bash & command prompt • $ cd dirname • e.g. • C:\Users\dnicho4>cd Documents • C:\Users\dnicho4\Documents>
  22. SHELL Navigating the file system  How do I go

    back to the directory I came from? • bash & command prompt • $ cd .. • to go to the parent directory • e.g. • C:\Users\dnicho4\Documents>cd .. • C:\Users\dnicho4>
  23. SHELL Directories  How do I make a new directory?

     bash & command prompt:  $ mkdir dirname
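The navigation commands from the slides above can be tried together in one short bash session. This is a sketch: the directory names `project`, `data`, and `scripts` are invented for the example, and `pwd` (print the current path) is a standard command not covered in the slides. On the Windows command prompt, use `dir` in place of `ls`.

```shell
# Start in a throwaway directory so your real files are not touched
cd "$(mktemp -d)"

mkdir project          # make a new directory
cd project             # move into it
mkdir data scripts     # mkdir accepts several names at once
ls                     # list what is in the current directory
cd data                # go one level deeper
pwd                    # print the full path; it ends in project/data
cd ..                  # go back up to the parent directory
basename "$(pwd)"      # show only the last part of the path
```

Reading the path printed by `pwd` from left to right follows the "lives inside" rule: `data` lives inside `project`, which lives inside the temporary directory.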
  24. VERSION CONTROL SYSTEMS $ ls /david/dissertation FINAL.doc FINAL_rev.2.doc FINAL_rev.6.COMMENTS.doc FINAL_rev.8.comments5.CORRECTIONS.doc

    FINAL_rev.18.comments7.corrections9.MORE.30.doc FINAL_rev.22.comments49.corrections.10.#@$%whydidicometogradschool????.doc
  25. VERSION CONTROL SYSTEMS What are they?  “system[s] that [record]

    changes to a file or set of files over time so that you can recall specific versions later” -- https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/
  26. VERSION CONTROL SYSTEMS Why would you use them? • Backup

    and Restore • Synchronization • Short-term undo • Long-term undo • Track Changes • Track Ownership • Sandboxing • Branching and merging https://betterexplained.com/articles/a-visual-guide-to-version-control/
  27. VERSION CONTROL SYSTEMS When would you use them? 1. to

    track your own writing / analysis code / data 2. to collaborate with others on code / paper 3. to contribute to open source software, such as scientific libraries
  28. VERSION CONTROL SYSTEMS When would you use them? 1. to

    track your own writing / analysis code / data 2. to collaborate with others on code / paper 3. to contribute to open source software, such as scientific libraries Each of these will have a slightly different workflow
  29. VERSION CONTROL SYSTEMS: GIT Why use git? 1. branching, cheap

    https://www.atlassian.com/git/tutorials/why-git
  30. VERSION CONTROL SYSTEMS: GIT Why use git? 1. branching, cheap

    2. distributed https://www.atlassian.com/git/tutorials/why-git
  31. VERSION CONTROL SYSTEMS: GIT Why use git? 1. branching, cheap

    2. distributed 3. it’s what Github uses
  32. VERSION CONTROL SYSTEMS: GIT What is Github?  “Git repository

    hosting service”  free, accessible by anyone on the web  git with extra features for control + collaboration • access control: repository owner, collaborators, etc. • forking – copy repo to your profile • pull requests, merge • issues -- bug tracking + feature requests • task management
  33. VERSION CONTROL SYSTEMS: GIT What is Github?  Also: •

    hosts all major open source libraries • including for scientific computing + data science • your way to share your science – reproducibility • your on-line data science resume – marketable(ity) • social network for programmers
  34. VERSION CONTROL SYSTEMS: GIT How does git work?  git

    does not track individual files by the changes you make to them  Instead it creates “a stream of snapshots” of all the files From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/
  35. VERSION CONTROL SYSTEMS: GIT How does git work?  States

    of files in a git repo: 1. untracked 2. modified 3. staged 4. unmodified (committed) From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/
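The four states can be watched one after another with `git status --short`, which prints a two-character code in front of each file name. This is a sketch in a throwaway repository: the file contents and the name and email passed to `git commit` are placeholders.

```shell
# Fresh, empty repository in a temporary directory
cd "$(mktemp -d)"
git init --quiet

echo "first draft" > textfile.txt
git status --short     # "?? textfile.txt"  -> untracked

git add textfile.txt
git status --short     # "A  textfile.txt"  -> staged

# -c sets the author for this one command only (placeholder identity)
git -c user.name="Your Name" -c user.email="you@example.com" \
    commit --quiet -m "initial commit"
git status --short     # prints nothing     -> unmodified (committed)

echo "second draft" > textfile.txt
git status --short     # " M textfile.txt"  -> modified
```

The left column of the code describes what is staged and the right column describes the working directory, which is why a staged file shows `A ` and a changed but unstaged file shows ` M`.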
  36. VERSION CONTROL SYSTEMS: GIT workflows today: 1. track your own

    work  so that all the crap I just said starts to make sense
  37. VERSION CONTROL SYSTEMS: GIT workflows today: 1. track your own

    work 2. collaborate with others  experience stuff you won’t get through an on-line tutorial  help us show interest in this class at Emory
  38. VERSION CONTROL SYSTEMS: GIT workflows today: 1. track your own

    work 2. collaborate with others last session: 3. contribute to open source projects
  39. VERSION CONTROL SYSTEMS: GIT workflow 1: tracking changes first time

    git set up: $ git config --global user.name "John Doe" $ git config --global user.email [email protected]
  40. VERSION CONTROL SYSTEMS: GIT workflow 1: tracking changes working with

    git all commands will take the form of: $ git verb --flag arguments for example $ git init --help launching browser to display html
  41. VERSION CONTROL SYSTEMS: GIT workflow 1: tracking changes $ cd

    ~/Documents $ mkdir first_repo $ cd first_repo  > cd C:\Users\you\Documents
  42. VERSION CONTROL SYSTEMS: GIT $ git status # On branch

    master # # Initial commit # nothing to commit (create/copy files and use "git add" to track)
  43. VERSION CONTROL SYSTEMS: GIT $ git status On branch master

    Initial commit Untracked files: (use "git add <file>..." to include in what will be committed) textfile.txt nothing added to commit but untracked files present (use "git add" to track)
  44. VERSION CONTROL SYSTEMS: GIT so currently our file is untracked

    From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/ textfile.txt
  45. VERSION CONTROL SYSTEMS: GIT $ git add textfile.txt $ git

    status On branch master Initial commit Changes to be committed: (use "git rm --cached <file>..." to unstage) new file: textfile.txt
  46. VERSION CONTROL SYSTEMS: GIT we have now staged our file

    From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/ textfile.txt
  47. VERSION CONTROL SYSTEMS: GIT $ git commit -m "initial commit"

    [master (root-commit) 12a8927] initial commit 1 file changed, 1 insertion(+) create mode 100644 textfile.txt
  48. VERSION CONTROL SYSTEMS: GIT $ git status On branch master

    nothing to commit, working directory clean
  49. VERSION CONTROL SYSTEMS: GIT we make a commit, the file

    is now “unmodified” From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/ textfile.txt
  50. VERSION CONTROL SYSTEMS: GIT $ echo "I want this in

    my textfile instead" > textfile.txt
  51. VERSION CONTROL SYSTEMS: GIT $ git status On branch master

    Changes not staged for commit: (use "git add <file>..." to update what will be committed) (use "git checkout -- <file>..." to discard changes in working directory) modified: textfile.txt no changes added to commit (use "git add" and/or "git commit -a")
  52. VERSION CONTROL SYSTEMS: GIT we changed the file, git sees

    it is modified From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/ textfile.txt
  53. VERSION CONTROL SYSTEMS: GIT $ git add * $ git

    commit -m "change text in textfile.txt" [master cb94cce] change text in textfile.txt 1 file changed, 1 insertion(+), 1 deletion(-)
  54. VERSION CONTROL SYSTEMS: GIT we now taken a second “snapshot”

    of our repo From the Git book https://git-scm.com/book/en/v2 Used under license CC BY-NC-SA 3.0 https://creativecommons.org/licenses/by-nc-sa/3.0/ textfile.txt
  55. VERSION CONTROL SYSTEMS: GIT $ git log commit cb94cce17b50e54817ac9a953ae6105dd6f46bf8 Author:

    nickledave <[email protected]> Date: Thu Sep 28 15:03:28 2017 -0400 change text in textfile.txt commit 12a8927267eb0937a4d21a5d0d5e14ef0efccaac Author: nickledave <[email protected]> Date: Thu Sep 28 14:56:27 2017 -0400 initial commit
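Once there are two snapshots, the point of version control ("recall specific versions later") can be tried directly. This sketch rebuilds the two commits from workflow 1 in a throwaway repository. The first file content, "some text", is a placeholder, and so are the name and email. `git show` and `git diff` are standard git commands not covered in the slides.

```shell
cd "$(mktemp -d)"
git init --quiet
# Set the author for this repository only (no --global), placeholder values
git config user.name "Your Name"
git config user.email "you@example.com"

echo "some text" > textfile.txt
git add textfile.txt
git commit --quiet -m "initial commit"

echo "I want this in my textfile instead" > textfile.txt
git add textfile.txt
git commit --quiet -m "change text in textfile.txt"

git log --oneline               # one line per commit, newest first
git show HEAD~1:textfile.txt    # the file as it was one commit ago
git diff HEAD~1 HEAD            # what changed between the two snapshots
```

`HEAD` is git's name for the most recent commit on the current branch and `HEAD~1` is the one before it, so the old text is still recoverable even though the file on disk was overwritten.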
  56. VERSION CONTROL SYSTEMS: GIT workflow 2: collaborating through Github clone

    your fork of the repo $ git clone https://github.com/yourusername/atlanta-best-contacts cloning into repo...
  57. VERSION CONTROL SYSTEMS: GIT workflow 2: collaborating through Github make

    a new branch to work on $ git checkout -b add-my-info
  58. VERSION CONTROL SYSTEMS: GIT workflow 2: collaborating through Github edit

    one of the files Save as: YourfirstnameYourlastname.md
  59. VERSION CONTROL SYSTEMS: GIT workflow 2: collaborating through Github $

    git add YourfirstnameYourlastname.md $ git commit -m "add contact info for $ git push -u origin add-my-info “push” – to the remote copy of the repo, named “origin” “-u” – short version of “--set-upstream”
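The clone, branch, commit, and push steps of workflow 2 can be rehearsed without a network connection. In this sketch a bare repository on the local disk (`remote.git`, a made-up name) stands in for your fork on Github. The directory name `contacts`, the file contents, and the name and email are also placeholders.

```shell
cd "$(mktemp -d)"

# Stand-in for the fork on Github: an empty "bare" repository on disk
git init --quiet --bare remote.git

# Clone it; git names the place it cloned from "origin"
git clone --quiet remote.git contacts
cd contacts
git config user.name "Your Name"
git config user.email "you@example.com"

# Make a new branch to work on, then add and commit a file
git checkout --quiet -b add-my-info
echo "Your Name, you@example.com" > YourfirstnameYourlastname.md
git add YourfirstnameYourlastname.md
git commit --quiet -m "add contact info"

# Push the branch to origin and remember it as the upstream (-u)
git push --quiet -u origin add-my-info

# Ask git which remote branch the local branch now follows
git rev-parse --abbrev-ref 'add-my-info@{upstream}'
```

After `-u` has set the upstream once, a plain `git push` on this branch is enough. With a real fork, the only difference is that the `git clone` line takes the https address of the fork.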