Beyond-Desktop Computing and Your Research

Research Data Services
September 29, 2016

Presentation given as part of the RDS Holz Brown Bag series, September 2016.

Transcript

  1. Beyond-Desktop Computing and Your Research

    Lauren Michael, Research Computing Facilitator, Center for High Throughput Computing
    RDS Brown Bag Series, 7 Sep 2016
  2. Overview

    - CHTC: UW-Madison’s Research Computing Center
    - Why large-scale computing?
    - Types of large-scale computing
    - Research impacts
    - Resources for large-scale computing
    - CHTC’s Research Computing Facilitators
  4. Center for High Throughput Computing, est. 2006

    CHTC Services
    - Large-scale, campus-shared computing systems
      - high-throughput computing (HTC), high-performance computing (HPC), and high-memory systems
      - all standard services provided free-of-charge
      - hardware buy-in options
      - support and training for using our systems
      - proposal assistance
      - chtc.cs.wisc.edu
  5. Researchers who use the CHTC are located all over campus (red buildings)

    http://chtc.cs.wisc.edu

    Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
    Million Hours Served    132             265             325
    Research Projects       148             188             204
    Departments             56              61              62
  6. Researchers who use the CHTC are located all over campus (red buildings)

    http://chtc.cs.wisc.edu

    Figure label: % of Compute Hours

    Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
    Million Hours Served    132             265             325
    Research Projects       148             188             204
    Departments             56              61              62
  8. High throughput computing (HTC) (and for “big data”)

    Break up the work! Use more processors! (“parallelize”)
    Figure axes: time, n processors
  9. High performance computing (HPC) (good for single, long simulations)

    Break up the work! Use more processors! (“parallelize”)
    Figure axes: time, n processors
  10. Types (and Examples) of Large-Scale Computing

    - High-throughput computing (HTC)
      - many independent tasks OR a large task that can be broken into independent, smaller tasks
      - examples: parameter sweeps, image(s) analysis, text analysis, many simulations
    - High-performance computing (HPC)
      - simulations (or other computational calculations) with steps that can be split up into smaller sub-tasks
      - requires software that manages the internal splitting
    - High-memory computing
      - unsplittable computation requiring extreme memory (100s of GB)
      - examples: genome assembly, metagenomics, some math models
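
The HTC idea on slide 10 (one large task broken into independent, smaller tasks) can be illustrated with a parameter sweep. This is a minimal sketch, not CHTC's tooling: the grid values and the `build_sweep` and `simulate` names are hypothetical, and the tasks run in a plain loop rather than on a cluster.

```python
from itertools import product


def build_sweep(grid):
    """Expand a dict of parameter lists into one dict per independent task."""
    names = sorted(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def simulate(task):
    """Hypothetical stand-in for one independent unit of work."""
    return task["rate"] * task["size"]


# Hypothetical parameter grid: 3 rates x 2 sizes = 6 independent tasks.
grid = {"rate": [0.1, 0.2, 0.3], "size": [10, 100]}
tasks = build_sweep(grid)

# No task needs another task's result, so the tasks could run in any order,
# on one processor or on many. Here they simply run one after another.
results = [simulate(task) for task in tasks]
print(len(tasks), "independent tasks")
```

Because each task carries everything it needs in its own dictionary, the same list could be handed to any scheduler that runs one job per task.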
  11. When to move beyond the desktop?

    In the research data lifecycle …
    Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival
  13. When to move beyond the desktop?

    - If you’re outgrowing your current resources

    In the research data lifecycle …
    Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival
  14. When to move beyond the desktop?

    - If you’re outgrowing your current resources
    - If there is a possibility that you’re not thinking as ‘big’ as you could be

    In the research data lifecycle …
    Project Planning, Raw Data Acquisition, Pre-processing, Data Analysis & Inference, Visualization, Publishing & Archival
  15. Explaining Post-Katrina Home Rebuilding

    Economics professor Jesse Gregory performs HTC optimization of a model to predict the most important factors determining household rebuilding after Katrina.

    Most important rebuilding factors:
    - relative funding available to household if rebuilt
    - rebuild status of neighboring households

    http://www.opensciencegrid.org/using-high-throughput-computing-to-evaluate-post-katrina-rebuilding-grants/

    Jesse’s projects in the last year: 4.5 million hours, 1.5 million OSG hours

    Figure labels: Fraction of Neighbors Rebuilt; (Repair Cost / Replacement Cost); more funds qualified for; less funds qualified for
  16. Inference of genome-scale transcriptional regulatory networks

    Transcriptional regulatory networks specify which genes must be expressed, when, where and how much. Sushmita Roy’s group uses CHTC computing resources to reconstruct regulatory networks using statistical machine-learning methods.

    2.6 million HTC hours in the last year

    Figure labels: Method; Genome-scale network

    One example: ~6,700 genes, ~2,500 regulators, ~350 arrays. 15 genes per job, 100 bootstraps, 3 scenarios, ~30 CPU min per job:
    447 * 100 * 3 * 0.5 hrs = ~67,000 hours = ~7.5 years
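
The job-count estimate on slide 16 can be re-derived from the numbers it lists. The sketch below assumes that 447 comes from rounding 6,700 / 15 up to whole jobs and that a year is 365 days; the variable names are illustrative, not from the talk.

```python
import math

genes = 6700           # ~6,700 genes (from the slide)
genes_per_job = 15     # 15 genes per job
bootstraps = 100
scenarios = 3
hours_per_job = 0.5    # ~30 CPU minutes per job

# Assumption: a partial batch of genes still needs its own job, so round up.
gene_batches = math.ceil(genes / genes_per_job)

total_jobs = gene_batches * bootstraps * scenarios
cpu_hours = total_jobs * hours_per_job

# Assumption: one year of continuous computing on one processor is 24 * 365 hours.
cpu_years = cpu_hours / (24 * 365)

print(gene_batches, total_jobs, cpu_hours, round(cpu_years, 2))
# prints: 447 134100 67050.0 7.65
```

Under these assumptions the workload is about 67,000 CPU hours, which matches the slide's figure of several years on a single processor.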
  17. Modeling Brain Connectivity

    Algorithm from professor Barry Van Veen predicts neural connectivity from EEG data.

    Used by numerous groups on-campus. Clinical projects examining:
    - short-term memory
    - imagination vs perception
    - sleep versus waking states
    - and more …

    Per subject, per condition, per time point: dozens × 20,000 Monte Carlo iterations

    15 million CPU hours in 2015, 6 million OSG hours

    J.Y. Chang, et al, Front. Hum. Neurosci., vol. 6, no. 317, November 2012.
    http://www.engr.wisc.edu/ece/faculty/vanveen_barry.html

    Figure labels: Brain Regions and Connectivity; EEG data collection; Perception; Imagination
  19. Resources for Large-Scale Computing

    - Serving all of campus
      - Center for High Throughput Computing (CHTC)
    - Specialized Services
      - In campus units (e.g. Biochemistry, WID, etc.)
      - Serving broader communities (e.g. Social Sciences Computing Cooperative, Computer-Aided Engineering, etc.)
    - Off-Campus
      - XSEDE: NSF-funded HPC clusters (by proposal)
      - National Center for Genome Analysis Support (NCGAS)
      - many others

    contact [email protected] for more details
  20. Make it easy for researchers to find the right people.

    “Facilitators”
    - consultants/liaisons for research computing
    - identify with the researcher perspective
  22. Researchers who use the CHTC are located all over campus (red buildings)

    http://chtc.cs.wisc.edu

    Individual researchers: 30 years of computing per day

    Quick Facts             Jul’13-Jun’14   Jul’14-Jun’15   Jul’15-Jun’16
    Million Hours Served    132             265             325
    Research Projects       148             188             204
    Departments             56              61              62
  24. Make it easy for researchers to find the right people.

    “Facilitators”
    - consultants/liaisons for research computing
    - identify with the user perspective
  25. Facilitator Support at CHTC

    - Initial Meetings with researchers new to CHTC
      - chtc.cs.wisc.edu > “Get Started”
    - One-on-One Training
    - Office Hours
      - Tue/Thur, 3-4:30pm; Wed, 9:30-11:30am
    - Talks (like this one!) and Courses
  26. Data and Computing Workshops

    - Data Carpentry
      - from spreadsheets to data visualization and basic programming
    - Software Carpentry
      - best programming practices for reproducible and automated research
    - Each offered 3 times per year at UW-Madison
      - January (next), May/June, August
    - Join the Advanced Computing Initiative (ACI) mailing list (aci.wisc.edu) to learn about future opportunities