Slide 1

Slide 1 text

2017 Summer Fellowship Progress Report Barry Grant http://thegrantlab.org/bggn213

Slide 2

Slide 2 text

One month of summer support was requested for the development of a new bioinformatics laboratory course for biologists.

Slide 3

Slide 3 text

This included the development of: 1. Specific learning outcomes for BGGN-213 and BIMM-143 in collaboration with invested faculty. 2. Open learning resources, including public websites, web-apps and video screencasts supporting these outcomes.

Slide 4

Slide 4 text

This included the development of: 1. Specific learning outcomes for BGGN-213 and BIMM-143 in collaboration with invested faculty. 2. Open learning resources, including public websites, web-apps and video screencasts supporting these outcomes.

Slide 5

Slide 5 text

Side Note: I joined UCSD on July 6th and I received a ‘loaner’ office chair and desk from BKM on Sept 13th (last Wednesday).

Slide 6

Slide 6 text

This included the development of: 1. Specific learning outcomes for BGGN-213 and BIMM-143 in collaboration with invested faculty. 2. Open learning resources, including public websites, web-apps and video screencasts supporting these outcomes.

Slide 7

Slide 7 text

Developed a set of 20 specific learning goals for BGGN-213, of which 16 will be shared with BIMM-143. • The major difference is that we will delve into more advanced UNIX and R programming supporting “big-data” analysis in BGGN-213. • Successfully applied for a computing grant from NSF/XSEDE (Extreme Science and Engineering Discovery Environment). This provides us with 17,800 service units (equivalent to $3,007) of cloud-based supercomputing resources for Fall 2017. • We plan to use AWS for BIMM-143, utilizing the Amazon/UCSD agreed allocation of $50 per student.

Slide 8

Slide 8 text

http://thegrantlab.org/bggn213/

Slide 9

Slide 9 text

http://thegrantlab.org/bggn213/

Slide 10

Slide 10 text

http://thegrantlab.org/ucsd/ What essential concepts and skills should students attain from this course? To Update!

Slide 11

Slide 11 text

At the end of this course students will: • Understand the increasing necessity for computation in modern life sciences research. • Be able to use and evaluate online bioinformatics resources and analysis tools to solve problems in the biological sciences. • Be able to use the UNIX command line and the R environment to analyze bioinformatics data at scale. • Be familiar with the research objectives of the bioinformatics related sub-disciplines of Genome informatics, Transcriptomics and Structural informatics.

Slide 12

Slide 12 text

In short, students will develop a solid foundational knowledge of bioinformatics and be able to evaluate new biomolecular and genomic information using existing bioinformatic tools and resources.

Slide 13

Slide 13 text

Specific Learning Goals…. What I want students to know by course end!

Slide 14

Slide 14 text

Course Structure http://thegrantlab.org/ucsd/ Derived from specific learning goals To Update!

Slide 15

Slide 15 text

Course Structure http://thegrantlab.org/ucsd/ Derived from specific learning goals To Update!

Slide 16

Slide 16 text

Class Details Goals, Class material, Screencasts & Homework

Slide 17

Slide 17 text

Homework Goals, Class material, Screencasts & Homework

Slide 18

Slide 18 text

Homework Goals, Class material, Screencasts & Homework

Slide 19

Slide 19 text

Homework Goals, Class material, Screencasts & Homework

Slide 20

Slide 20 text

Homework Goals, Class material, Screencasts & Homework Homework is due before the next week’s class!

Slide 21

Slide 21 text

BGGN-213 Learning Goals…. Advanced UNIX and R based learning goals

Slide 22

Slide 22 text

BGGN-213 Learning Goals…. Delve deeper into “real-world” bioinformatics

Slide 23

Slide 23 text

These support a major learning objective At the end of this course students will: • Understand the increasing necessity for computation in modern life sciences research. • Be able to use and evaluate online bioinformatics resources and analysis tools to solve problems in the biological sciences. • Be able to use the UNIX command line and the R environment to analyze bioinformatics data at scale. • Be familiar with the research objectives of the bioinformatics related sub-disciplines of Genome informatics, Transcriptomics and Structural informatics.

Slide 24

Slide 24 text

How do we actually do Bioinformatics? Pre-packaged tools and databases • Many online • Most are free to use • Time-consuming methods require downloading… Advanced tool application & development • Mostly in a UNIX environment • Knowledge of programming languages frequently required (e.g. R, Python, Perl, C, Java, Fortran) • May require specialized high performance computing…

Slide 25

Slide 25 text

How do we actually do Bioinformatics? Pre-packaged tools and databases • Many online • Most are free to use • Time-consuming methods require downloading… Advanced tool application & development • Mostly in a UNIX environment • Knowledge of programming languages frequently required (e.g. R, Python, Perl, C, Java, Fortran) • May require specialized high performance computing… ?

Slide 26

Slide 26 text

NSF Extreme Science and Engineering Discovery Environment (XSEDE)

Slide 27

Slide 27 text

XSEDE Proposal for Jetstream Resources

XSEDE Educational Allocation Resource Justification

Title: Teaching Bioinformatics to Biologists at UC San Diego
PI: Dr. Barry J Grant ([email protected])
University of California, San Diego
9/21/2017

Overview: XSEDE Jetstream resources are requested to support the teaching of a new bioinformatics graduate course for biologists at UC San Diego. This course, BGGN-213 ("Foundations of Bioinformatics"), provides a hands-on introduction to the computer-based analysis of genomic and biomolecular data. Major topics include: Genome informatics, Structural informatics, Transcriptomics, UNIX for bioinformatics, and Bioinformatics data analysis with R. Full course details are available at: <https://bioboot.github.io/bggn213_f17/>

Critical Need: Modern biomedical research is generating ever increasing quantities of complex biological data. As the rate of this data generation continues to outpace the rate at which biologists are able to analyze these data, there is a critical need for new bioinformatics training to help the next generation of biologists drive the collection and analysis of this “big data revolution” in the biosciences.

Why XSEDE? The Division of Biological Sciences at UC San Diego has no suitable UNIX compute server to use for this course. Limiting students to their own laptops or departmental desktop Windows machines will severely limit the scope and utility of this course. Access to XSEDE Jetstream resources will enable students to learn and gain proficiency in modern bioinformatics workflows and best practices for reproducible research on today's large genomic and biomolecular datasets.

Resources Requested: Between 24 and 30 students will require an estimated maximum of 17,800 Service Units. Students will use these resources from week 5 of the course onward (10/12/17 to 12/12/17). A maximum of 32 Virtual Machines and associated public IP addresses (for SSH access) will be required along with 2TB of storage space total (students will download and store several eukaryotic genomes along with several small molecule and protein structure datasets).

BIOGRAPHICAL SKETCH: BARRY J. GRANT

A. Professional Preparation
• Queen’s University of Belfast, UK, Biochemistry, B.Sc. (1999)
• University of York, UK, Bioinformatics, M.Res. (2000)
• University of York, UK, Chemistry, Ph.D. (2005)
• University of California, San Diego, Biophysics, Postdoc (2005-2009)

B. Appointments
• Assistant Professor, Division of Biological Sciences (2017-present), University of California, San Diego, CA.
• Assistant Professor, Department of Computational Medicine & Bioinformatics (2011-2017), University of Michigan, Ann Arbor, MI.
• Bioinformatics Specialist (Senior Scientist), Howard Hughes Medical Institute (2009-2011), University of California, San Diego, CA.
• Bioinformatics Scientist, deCODE Genetics Inc., Reykjavik, Iceland (2000).

C. Publications
Note. Complete bibliography and full-text options available from: http://thegrantlab.org/publications/

Publications closely related to project
• Yao XQ, Skjaerven L, Grant BJ. Rapid characterization of allosteric networks with ensemble normal mode analysis. J Phys Chem B. 2016. DOI: 10.1021/acs.jpcb.6b01991.
• Yao XQ, Malik RU, Griggs NW, Skjaerven L, Traynor JR, Sivaramakrishnan S, Grant BJ. Dynamic coupling and allosteric networks in the alpha subunit of heterotrimeric G proteins. J Biol Chem. 2016;291(9):4742-53. PMCID: 4813496.
• Scarabelli G, Soppina V, Yao XQ, Atherton J, Moores CA, Verhey KJ, Grant BJ. Mapping the processivity determinants of the kinesin-3 motor domain. Biophys J. 2015;109(8):1537-40. PMCID: 4624112.
• Scarabelli G, Grant BJ. Mapping the structural and dynamical features of kinesin motor domains. PLoS Comput Biol. 2013;9(11):e1003329. PMCID: 3820509.
• Grant BJ, Gheorghe DM, Zheng W, Alonso M, Huber G, Dlugosz M, McCammon JA, Cross RA. Electrostatically biased binding of kinesin to microtubules. PLoS Biol. 2011;9(11):e1001207. PMCID: 3226556.

Other significant publications
• Skjaerven L, Jariwala S, Yao XQ, Grant BJ. Online interactive analysis of protein structure ensembles with Bio3D-web. Bioinformatics. 2016; (in press).
• Skjaerven L, Yao XQ, Scarabelli G, Grant BJ. Integrating protein structural dynamics and evolutionary analysis with Bio3D. BMC Bioinformatics. 2014;15:399. PMCID: 4279791.
• Scarabelli G, Grant BJ. Kinesin-5 allosteric inhibitors uncouple the dynamics of nucleotide, microtubule, and neck-linker binding sites. Biophys J. 2014;107(9):2204-13. PMCID: 4223232.
• Yao XQ, Grant BJ. Domain-opening and dynamic coupling in the alpha-subunit of heterotrimeric G proteins. Biophys J. 2013;105(2):L08-10. PMCID: 3714883.
• Grant BJ, Rodrigues AP, ElSawy KM, McCammon JA, Caves LS. Bio3D: An R package for the comparative analysis of protein structures. Bioinformatics. 2006;22(21):2695-6.

D. Selected Synergistic Activities
• Excellence in Basic Science Teaching Award, Computational Medicine and Bioinformatics, University of Michigan (2013).

Awarded 17,800 SUs (equivalent to $3,007)
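
To put the allocation figures above in per-student terms, here is a rough back-of-the-envelope sketch. It uses only numbers stated in the proposal (17,800 SUs, $3,007, and 24 to 30 students). The variable names are illustrative, and an even split across students is an assumption, not something the proposal specifies.

```shell
# Figures taken from the proposal text
total_su=17800      # awarded Service Units
dollars=3007        # stated dollar equivalent of the award
students_min=24     # smallest expected class size
students_max=30     # largest expected class size

# Dollar value of one Service Unit (award value / award size)
usd_per_su=$(awk -v d="$dollars" -v s="$total_su" 'BEGIN { printf "%.3f", d / s }')

# SUs per student, ASSUMING an even split (rounded to the nearest SU)
su_per_student_low=$(awk -v s="$total_su" -v n="$students_max" 'BEGIN { printf "%.0f", s / n }')
su_per_student_high=$(awk -v s="$total_su" -v n="$students_min" 'BEGIN { printf "%.0f", s / n }')

echo "USD per SU:      $usd_per_su"
echo "SUs per student: $su_per_student_low to $su_per_student_high"
```

Under the even-split assumption, each student would have roughly 590 to 740 SUs for the part of the course that uses Jetstream.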

Slide 28

Slide 28 text

What is Jetstream? • A new cloud computing environment based at Indiana University and the Texas Advanced Computing Center (TACC) providing on-demand access to interactive computing and data analysis resources.

Slide 29

Slide 29 text

Jetstream tutorials Developed user-friendly labs for Jetstream basics

Slide 30

Slide 30 text

Jetstream tutorials Developed user-friendly labs for Jetstream basics

Slide 31

Slide 31 text

Jetstream tutorials Developed user-friendly labs for Jetstream basics

Slide 32

Slide 32 text

Basics   File Control         Viewing & Editing Files   Misc. useful   Power commands   Process related
ls       mv                   less                      chmod          grep             top
cd       cp                   head                      echo           find             ps
pwd      mkdir                tail                      wc             sed              kill
man      rm                   nano                      curl           uniq             Ctrl-c
ssh      | (pipe)             touch                     source         git              Ctrl-z
         > (write to file)                              cat            R                bg
         < (read from file)                             tmux           python           fg
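
As a minimal sketch of how several of the commands listed on this slide combine in practice, the example below builds a tiny FASTA-style file and queries it with pipes and redirection. The file names and sequences are made up for illustration and are not taken from the course labs. `mktemp`, `printf` and `tr` are not on the slide and are used here only as helpers.

```shell
# Work in a scratch directory so no real files are touched (hypothetical demo)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# '>' writes command output to a file: here, three made-up sequence records
printf '>seq1\nATGCCGTA\n>seq2\nATGAAATT\n>seq3\nGGGCCCTA\n' > demo.fa

ls                    # list the new file
head -n 2 demo.fa     # view the first record

# '|' pipes one command's output into the next:
# grep keeps the header lines, wc -l counts them
n_seqs=$(grep '^>' demo.fa | wc -l | tr -d ' ')
echo "sequences: $n_seqs"

# grep -v drops the headers, then grep -c counts sequences starting with ATG
n_atg=$(grep -v '^>' demo.fa | grep -c '^ATG')
echo "starting with ATG: $n_atg"

# '<' reads a file as standard input; sed rewrites each header on the fly
sed 's/^>seq/>sample_/' < demo.fa > renamed.fa
first_header=$(head -n 1 renamed.fa)
echo "first renamed header: $first_header"
```

The scratch directory can be removed afterwards with `rm -r "$workdir"` once you have changed out of it.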

Slide 33

Slide 33 text

Jetstream tutorials R & RStudio running remotely on Jetstream :-)

Slide 34

Slide 34 text

This included the development of: 1. Specific learning outcomes for BGGN-213 and BIMM-143 in collaboration with invested faculty. 2. Open learning resources, including public websites, video screencasts and web-apps supporting these outcomes.

Slide 35

Slide 35 text

Pre-class Screencast Videos Addressing variability in student background knowledge • Laptop (with webcam and microphone), • Blue screen/Green screen, • ScreenFlow software, • Lots of patience.

Slide 36

Slide 36 text

Pre-class Screencast Videos Addressing variability in student background knowledge

Slide 37

Slide 37 text

Partnering with QUBES • Currently a group of 7 faculty from around the US interested in developing video tutorials for computational genomics education. • Basically a mentoring network started by Hong Qin (Associate Professor of Computer Science and Biology, University of Tennessee-Chattanooga), who has developed hundreds of YouTube educational videos. • Thus far we have had three virtual meetings and have one NSF grant proposal in embryonic form.

Slide 38

Slide 38 text

Prototype Web Apps

Slide 39

Slide 39 text

Summary • Consulted with invested faculty to develop a set of 20 specific learning goals for BGGN-213, of which 16 will be shared and adapted for BIMM-143. • Obtained NSF/XSEDE (Extreme Science and Engineering Discovery Environment) cloud-based computing grant to support BGGN-213. • Published http://thegrantlab.org/bggn213/ with all open online bioinformatics teaching materials and joined NIBLSE (Network for Integrating Bioinformatics into Life Science Education). • Developed an initial set of video screencasts for BGGN-213 and joined the QUBES project for developing video tutorials for computational genomics education. • Developed local web server infrastructure and prototyped interactive web apps for teaching.