• Human body habitats
– skin (n=3)
– dorsal tongue (n=3)
– gut (human feces) (n=5)
• Environmental samples
– soil (n=3)
– freshwater lake (n=2)
– freshwater creek (n=3)
– ocean (n=3)
– marine sediment (n=3)
• Mock communities
– 67 bacterial strains pooled at even abundance (n=3)
• Replication: 5′ reads and 3′ reads analyzed independently
produces the same series of internal states and results in the same output.
• Non-deterministic algorithm: a given input may produce different internal states and/or result in a different output.
– In bioinformatics, these are commonly probabilistic algorithms.
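A minimal Python sketch of the distinction. `gc_content` is deterministic. `subsample` is a hypothetical helper standing in for a probabilistic step such as random subsampling of reads, and it is non-deterministic unless its random number generator is seeded. With a fixed seed, the generator passes through the same internal states and returns the same output.

```python
import random


def gc_content(seq):
    """Deterministic: the same input always yields the same output."""
    return sum(base in "GC" for base in seq) / len(seq)


def subsample(reads, depth, seed=None):
    """Hypothetical probabilistic step: draw `depth` reads without replacement.

    With seed=None the generator is seeded from the OS/clock, so repeated
    runs can differ. A fixed seed reproduces the same draws.
    """
    rng = random.Random(seed)  # private generator; does not touch global state
    return rng.sample(reads, depth)


reads = [f"read{i}" for i in range(100)]

# Deterministic function: repeated calls always agree.
print(gc_content("ACGTGC") == gc_content("ACGTGC"))  # True

# Unseeded probabilistic runs may differ from each other.
print(subsample(reads, 10) == subsample(reads, 10))  # usually False

# Same seed -> same internal random states -> same output.
print(subsample(reads, 10, seed=42) == subsample(reads, 10, seed=42))  # True
```

To rerun a probabilistic analysis exactly, record the seed along with the software version. Python's `random` documentation only guarantees that `random()` repeats across releases for a given seed, so methods such as `sample` may not.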
allowed
• Tends to be what biologists are interested in, as it's nearly impossible to truly replicate an experiment (e.g., microevolution) and it often would be too expensive (e.g., replicating the Human Microbiome Project)
• More interesting than replicability: is your conclusion robust?
– Modifications to files can be made and tracked (i.e., you know who made them and when)
– You can revert to previous revisions (roll back changes or access something that was previously deleted)
– The repository can be made public so others can view your history, access different versions, or collaborate with you.
a resume builder. Showcases your development and communication skills.
• Scientific integrity: providing others access to your source (including old versions) allows them to reproduce your analysis.
• Others can contribute.
• Less experienced developers can learn from your code.
QIIME*)
• Only lead developers have push access.
• All devs have pull access (even though it's not technically necessary).
• All pushes to master go in as pull requests, including pushes from the lead developers.
• Code reviews are performed using GitHub pull requests.
• Discussion of new code should happen on the page associated with the pull request, not by email.
• All feature requests and bug reports should go through the GitHub issue tracker.
* https://github.com/qiime/qiime
As a development group, we're fairly new to GitHub, so these strategies may change with time.
operating system running within a "host" operating system
• A software implementation of a computer that operates like a physical computer.
• A developer can create a virtual machine image that contains their tools pre-installed. Users can then instantiate that image to work with those tools.
Browse this page: http://en.wikipedia.org/wiki/Virtual_machine
• Ideally supplements your lab notebook (for successful runs):
– Version information
– Exact command that was run
– Any 'subcommands' that were run
– Details on input files (path, md5)
– System configuration details
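A minimal Python sketch of collecting the run details listed above into a single record. It assumes you supply the tool version and subcommands yourself. The function names and field names (`provenance_record`, `tool_version`, `subcommands`) are hypothetical and are not taken from any particular tool's log format. The md5 of each input file is computed with the standard-library `hashlib`.

```python
import hashlib
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone


def md5sum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def provenance_record(input_paths, subcommands=(), tool_version="unknown"):
    """Hypothetical log entry covering version, command, inputs, and system."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_version": tool_version,            # supplied by the caller
        "command": " ".join(sys.argv),           # exact command that was run
        "subcommands": list(subcommands),        # any steps it invoked
        "inputs": [{"path": p, "md5": md5sum(p)} for p in input_paths],
        "system": {
            "python": sys.version,
            "platform": platform.platform(),
            "machine": platform.machine(),
        },
    }


if __name__ == "__main__":
    # Demo on a throwaway input file.
    with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
        tmp.write(b">seq1\nACGT\n")
    record = provenance_record(
        [tmp.name],
        subcommands=["hypothetical_filter --min-length 100"],
        tool_version="0.1.0",
    )
    print(json.dumps(record, indent=2))
    os.remove(tmp.name)
```

Writing each record as JSON next to the results keeps the log both machine-readable and easy to paste into a lab notebook.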
function: deterministic function that takes some input and returns a fixed-size string
– changing the input should change the return value
From Wikipedia:
• it is easy (but not necessarily quick) to compute the hash value for any given message
• it is infeasible to generate a message that has a given hash
• it is infeasible to modify a message without changing the hash
• it is infeasible to find two different messages with the same hash
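A short Python illustration of these properties using MD5, the hash suggested above for recording input files. It uses only the standard-library `hashlib`. Repeating the input reproduces the digest, a one-base change produces a different digest, and every digest is 32 hex characters regardless of input length. Practical MD5 collisions have been known since 2004, so MD5 no longer satisfies the last property. It is still adequate for detecting accidental changes to data files. `hashlib.sha256` is a drop-in alternative where tampering is a concern.

```python
import hashlib


def md5_hex(message: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()


a = md5_hex("ACGTACGT")
b = md5_hex("ACGTACGT")
c = md5_hex("ACGTACGA")                  # one base changed
long_digest = md5_hex("ACGT" * 10_000)   # much longer input

print(a == b)                  # True: deterministic
print(a == c)                  # False: small change, different hash
print(len(a), len(long_digest))  # 32 32: fixed-size output
```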
United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.