Slide 1

Slide 1 text

Beyond-Desktop Computing and Your Research
Lauren Michael | Center for High Throughput Computing | UW-Madison

Slide 2

Slide 2 text

Beyond-Desktop Computing and Your Research
Lauren Michael, Research Computing Facilitator
Center for High Throughput Computing
RDS Brown Bag Series, 7 Sep 2016

Slide 3

Slide 3 text

Overview ¡ CHTC: UW-Madison’s Research Computing Center ¡ Why large-scale computing? ¡ Types of large-scale computing ¡  Research impacts ¡ Resources for large-scale computing ¡ CHTC’s Research Computing Facilitators

Slide 4

Slide 4 text

Overview ¡ CHTC: UW-Madison’s Research Computing Center ¡ Why large-scale computing? ¡ Types of large-scale computing ¡ Research impacts ¡ Resources for large-scale computing ¡ CHTC’s Research Computing Facilitators

Slide 5

Slide 5 text

CHTC Services
Center for High Throughput Computing, est. 2006
• Large-scale, campus-shared computing systems
  • high-throughput computing (HTC), high-performance computing (HPC), and high-memory systems
  • all standard services provided free-of-charge
  • hardware buy-in options
  • support and training for using our systems
  • proposal assistance
  • chtc.cs.wisc.edu

Slide 6

Slide 6 text

Researchers who use the CHTC are located all over campus (red buildings)
http://chtc.cs.wisc.edu

Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
Million Hours Served    132             265             325
Research Projects       148             188             204
Departments             56              61              62

Slide 7

Slide 7 text

Researchers who use the CHTC are located all over campus (red buildings)
http://chtc.cs.wisc.edu
[Figure: % of Compute Hours]

Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
Million Hours Served    132             265             325
Research Projects       148             188             204
Departments             56              61              62

Slide 8

Slide 8 text

Overview ¡ CHTC: UW-Madison’s Research Computing Center ¡ Why large-scale computing? ¡ Types of large-scale computing ¡ Research impacts ¡ Resources for large-scale computing ¡ CHTC’s Research Computing Facilitators

Slide 9

Slide 9 text

What is large-scale computing?

Slide 10

Slide 10 text

What is large-scale computing? larger than ‘desktop’

Slide 11

Slide 11 text

What is large-scale computing? larger than ‘desktop’ (in memory, data, processors)

Slide 12

Slide 12 text

What a lot of computing looks like:
[Diagram axis: time]

Slide 13

Slide 13 text

What a lot of computing looks like:
running on 1 computer (1 processor)
[Diagram axis: time]

Slide 14

Slide 14 text

So how do you speed things up?
[Diagram axis: time]

Slide 15

Slide 15 text

Break up the work! Use more processors! “parallelize”
[Diagram axes: time, n processors]

Slide 16

Slide 16 text

Break up the work! Use more processors! “parallelize”
High throughput computing (HTC) (and for “big data”)
[Diagram axes: time, n processors]

Slide 17

Slide 17 text

Break up the work! Use more processors! “parallelize”
High performance computing (HPC) (good for single, long simulations)
[Diagram axes: time, n processors]

Slide 18

Slide 18 text

Types (and Examples) of Large-Scale Computing
• High-throughput computing (HTC)
  • many independent tasks OR a large task that can be broken into independent, smaller tasks
  • examples: parameter sweeps, image(s) analysis, text analysis, many simulations
• High-performance computing (HPC)
  • simulations (or other computational calculations) with steps that can be split up into smaller sub-tasks
  • requires software that manages the internal splitting
• High-memory computing
  • unsplittable computation requiring extreme memory (100s of GB)
  • examples: genome assembly, metagenomics, some math models
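To make the HTC case concrete, here is a minimal sketch of a parameter sweep broken into independent tasks. The `simulate` function and its `rate` and `size` parameters are hypothetical stand-ins, not anything from the talk, and the thread pool only illustrates that the tasks do not depend on each other; on a real HTC system each task would typically run as its own job on a separate machine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(rate, size):
    """Hypothetical stand-in for one independent task in a parameter sweep."""
    return rate * size


def run_sweep(rates, sizes, workers=4):
    """Run every (rate, size) combination as its own task.

    No task needs another task's result, so the tasks can run in any
    order and on any number of workers. That independence is what makes
    a workload a fit for HTC.
    """
    tasks = list(product(rates, sizes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda task: simulate(*task), tasks))
    return dict(zip(tasks, results))


sweep = run_sweep(rates=[0.1, 0.5], sizes=[10, 20, 30])
print(len(sweep), "independent tasks completed")
```

An HPC workload differs in that its sub-tasks must exchange results while running, so it could not be written as a simple map over independent inputs like this.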

Slide 19

Slide 19 text

When to move beyond the desktop?
In the research data lifecycle …
[Figure: Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival]

Slide 20

Slide 20 text

When to move beyond the desktop?
In the research data lifecycle …
[Figure: Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival]

Slide 21

Slide 21 text

When to move beyond the desktop?
• If you’re outgrowing your current resources
In the research data lifecycle …
[Figure: Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival]

Slide 22

Slide 22 text

When to move beyond the desktop?
• If you’re outgrowing your current resources
• If there is a possibility that you’re not thinking as ‘big’ as you could be
In the research data lifecycle …
[Figure: Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival]

Slide 23

Slide 23 text

Explaining Post-Katrina Home Rebuilding
Economics professor Jesse Gregory performs HTC optimization of a model to predict the most important factors determining household rebuilding after Katrina.
Most important rebuilding factors:
- relative funding available to household if rebuilt
- rebuild status of neighboring households
http://www.opensciencegrid.org/using-high-throughput-computing-to-evaluate-post-katrina-rebuilding-grants/
Jesse’s projects in the last year: 4.5 million hours, 1.5 million OSG hours
[Figure axes: Fraction of Neighbors Rebuilt; (Repair Cost / Replacement Cost). Region labels: “more funds qualified for”, “less funds qualified for”]

Slide 24

Slide 24 text

Inference of genome-scale transcriptional regulatory networks
Transcriptional regulatory networks specify which genes must be expressed, when, where and how much.
Sushmita Roy’s group uses CHTC computing resources to reconstruct regulatory networks using statistical machine-learning methods.
2.6 million HTC hours in the last year
[Figure labels: Method; Genome-scale network]
One Example: ~6,700 genes, ~2,500 regulators, ~350 arrays.
15 genes per job, 100 bootstraps, 3 scenarios, ~30 CPU min per job:
447*100*3*0.5hrs = ~67,000 hours = ~7.5 years
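The job-count arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation using the approximate figures from the slide; the variable names are illustrative. The product comes to roughly 67,000 CPU hours, which is on the order of seven and a half years on a single core.

```python
import math

genes = 6700           # ~6,700 genes (approximate, from the slide)
genes_per_job = 15
bootstraps = 100
scenarios = 3
hours_per_job = 0.5    # ~30 CPU minutes per job

# 6,700 genes in batches of 15 gives 447 batches (the last one partial).
gene_batches = math.ceil(genes / genes_per_job)
total_jobs = gene_batches * bootstraps * scenarios
total_hours = total_jobs * hours_per_job
years_on_one_core = total_hours / (24 * 365)

print(f"{gene_batches} batches, {total_jobs} jobs, "
      f"{total_hours:.0f} CPU hours, {years_on_one_core:.1f} years on one core")
```

Because every one of these jobs is independent, the same work spread across thousands of cores finishes in hours or days rather than years.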

Slide 25

Slide 25 text

Modeling Brain Connectivity
[Figure labels: Brain Regions and Connectivity; EEG data collection; Perception; Imagination]
Algorithm from professor Barry Van Veen predicts neural connectivity from EEG data.
Used by numerous groups on campus in clinical projects examining:
• short-term memory
• imagination vs perception
• sleep versus waking states
• and more …
Per subject, per condition, per time point: dozens × 20,000 Monte Carlo iterations
J.Y. Chang, et al., Front. Hum. Neurosci., vol. 6, no. 317, November 2012.
http://www.engr.wisc.edu/ece/faculty/vanveen_barry.html
15 million CPU hours in 2015, 6 million OSG hours

Slide 26

Slide 26 text

Overview ¡ CHTC: UW-Madison’s Research Computing Center ¡ Why large-scale computing? ¡ Types of large-scale computing ¡ Research impacts ¡ Resources for large-scale computing ¡ CHTC’s Research Computing Facilitators

Slide 27

Slide 27 text

Resources for Large-Scale Computing
• Serving all of campus
  • Center for High Throughput Computing (CHTC)
• Specialized Services
  • In campus units (e.g. Biochemistry, WID, etc.)
  • Serving broader communities (e.g. Social Sciences Computing Cooperative, Computer-Aided Engineering, etc.)
• Off-Campus
  • XSEDE – NSF-funded HPC clusters (by proposal)
  • National Center for Genome Analysis Support (NCGAS)
  • many others
contact chtc@cs.wisc.edu for more details

Slide 28

Slide 28 text

chtc.cs.wisc.edu Get Access to CHTC

Slide 29

Slide 29 text

Make it easy for researchers to find the right people.
“Facilitators”
- consultants/liaisons for research computing
- identify with the researcher perspective

Slide 30

Slide 30 text

CHTC Services
Center for High Throughput Computing, est. 2006
• Large-scale, campus-shared computing systems
  • high-throughput computing (HTC), high-performance computing (HPC), and high-memory systems
  • all standard services provided free-of-charge
  • hardware buy-in options
  • support and training for using our systems
  • proposal assistance
  • chtc.cs.wisc.edu

Slide 31

Slide 31 text

CHTC-Accessible HTC Computing:
CHTC execute servers (16,000 CPU cores)
[Diagram labels: S, E]

Slide 32

Slide 32 text

CHTC-Accessible Computing:
[Diagram labels: S, E; input files, program, submit files]

Slide 33

Slide 33 text

CHTC-Accessible Computing:
[Diagram labels: S, E; HTCondor; input files, program, submit files]

Slide 34

Slide 34 text

CHTC-Accessible Computing:
[Diagram labels: S, E; HTCondor; input files, program, submit files; output]
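As a rough illustration of the pieces named in these diagrams (input files, a program, and a submit file handed to HTCondor, with output coming back), the sketch below generates the text of an HTCondor submit file for a batch of independent jobs. The file names `analyze.sh`, `data.csv`, and `params.txt`, the resource requests, and the helper function itself are hypothetical examples, not settings recommended in the talk.

```python
def make_submit_description(program, input_files, n_jobs):
    """Return the text of an HTCondor submit file for n_jobs independent jobs.

    Each job receives its own index through the $(Process) macro, so the
    program can pick a different slice of the work per job.
    """
    lines = [
        f"executable = {program}",
        "arguments = $(Process)",
        f"transfer_input_files = {', '.join(input_files)}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = job_$(Process).out",
        "error = job_$(Process).err",
        "log = jobs.log",
        # Resource requests below are placeholder values for illustration.
        "request_cpus = 1",
        "request_memory = 1GB",
        "request_disk = 1GB",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


submit_text = make_submit_description(
    "analyze.sh", ["data.csv", "params.txt"], n_jobs=100
)
print(submit_text)
```

Saved to a file, text like this would be passed to `condor_submit` on a submit server; the appropriate settings for a given CHTC project may well differ.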

Slide 35

Slide 35 text

CHTC-Accessible HTC Computing:
[Diagram labels: S; CHTC]

Slide 36

Slide 36 text

CHTC-Accessible HTC Computing:
[Diagram labels: S; CHTC; UW Grid]

Slide 37

Slide 37 text

CHTC-Accessible HTC Computing:
[Diagram labels: S; CHTC; UW Grid; Open Science Grid]

Slide 38

Slide 38 text

Researchers who use the CHTC are located all over campus (red buildings)
http://chtc.cs.wisc.edu
Individual researchers: 30 years of computing per day

Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
Million Hours Served    132             265             325
Research Projects       148             188             204
Departments             56              61              62

Slide 39

Slide 39 text

Overview ¡ CHTC: UW-Madison’s Research Computing Center ¡ Why large-scale computing? ¡ Types of large-scale computing ¡ Research impacts ¡ Resources for large-scale computing ¡ CHTC’s Research Computing Facilitators

Slide 40

Slide 40 text

Make it easy for researchers to find the right people.
“Facilitators”
- consultants/liaisons for research computing
- identify with the user perspective

Slide 41

Slide 41 text

Research Computing Facilitators
[Diagram labels: Scholarship Experience; Communication and Leadership Skills; Technical Skills]

Slide 42

Slide 42 text

Facilitator Support at CHTC
• Initial Meetings with researchers new to CHTC
  • chtc.cs.wisc.edu > “Get Started”
• One-on-One Training
• Office Hours
  • Tue/Thur, 3-4:30pm; Wed, 9:30-11:30am
• Talks (like this one!) and Courses

Slide 43

Slide 43 text

Data and Computing Workshops
• Data Carpentry
  • from spreadsheets to data visualization and basic programming
• Software Carpentry
  • best programming practices for reproducible and automated research
• Each offered 3 times per year at UW-Madison
  • January (next), May/June, August
• Join the Advanced Computing Initiative (ACI) mailing list (aci.wisc.edu) to learn about future opportunities

Slide 44

Slide 44 text

Facilitator Impact
[Figure: Compute Hours Delivered by CHTC]
Facilitators hired: Jan 2013, Nov 2014

Slide 45

Slide 45 text

CONTACT US
Go to: chtc.cs.wisc.edu “How To” > “Get Started”
chtc@cs.wisc.edu