Beyond-Desktop Computing and Your Research

Research Data Services
September 29, 2016

Presentation given as part of the RDS Holz Brown Bag series, September 2016.

Transcript

  1. Beyond-Desktop Computing and Your Research
     Lauren Michael | Center for High Throughput Computing | UW-Madison
  2. Beyond-Desktop Computing and Your Research
     Lauren Michael, Research Computing Facilitator
     Center for High Throughput Computing
     RDS Brown Bag Series, 7 Sep 2016
  3. Overview
     - CHTC: UW-Madison’s Research Computing Center
     - Why large-scale computing?
     - Types of large-scale computing
     - Research impacts
     - Resources for large-scale computing
     - CHTC’s Research Computing Facilitators
  5. CHTC Services
     Center for High Throughput Computing, est. 2006
     - Large-scale, campus-shared computing systems
       - high-throughput computing (HTC), high-performance computing (HPC), and high-memory systems
       - all standard services provided free-of-charge
       - hardware buy-in options
       - support and training for using our systems
       - proposal assistance
       - chtc.cs.wisc.edu
  6. Researchers who use the CHTC are located all over campus (red buildings)
     http://chtc.cs.wisc.edu

     Quick Facts            Jul’13-Jun’14  Jul’14-Jun’15  Jul’15-Jun’16
     Million Hours Served   132            265            325
     Research Projects      148            188            204
     Departments            56             61             62

  7. Researchers who use the CHTC are located all over campus (red buildings)
     http://chtc.cs.wisc.edu
     [Figure: % of Compute Hours]

     Quick Facts            Jul’13-Jun’14  Jul’14-Jun’15  Jul’15-Jun’16
     Million Hours Served   132            265            325
     Research Projects      148            188            204
     Departments            56             61             62

  9. What is large-scale computing?

  10. What is large-scale computing? larger than ‘desktop’

  11. What is large-scale computing? larger than ‘desktop’ (in memory, data, processors)
  12. What a lot of computing looks like: [figure; axis: time]

  13. What a lot of computing looks like: running on 1 computer (1 processor) [figure; axis: time]
  14. So how do you speed things up? [figure; axis: time]

  15. Break up the work! Use more processors! “parallelize” [figure; axes: time, n processors]
  16. High throughput computing (HTC) (and for “big data”): Break up the work! Use more processors! “parallelize” [figure; axes: time, n processors]
  17. High performance computing (HPC) (good for single, long simulations): Break up the work! Use more processors! “parallelize” [figure; axes: time, n processors]
  18. Types (and Examples) of Large-Scale Computing
      - High-throughput computing (HTC)
        - many independent tasks OR a large task that can be broken into independent, smaller tasks
        - examples: parameter sweeps, image(s) analysis, text analysis, many simulations
      - High-performance computing (HPC)
        - simulations (or other computational calculations) with steps that can be split up into smaller sub-tasks
        - requires software that manages the internal splitting
      - High-memory computing
        - unsplittable computation requiring extreme memory (100s of GB)
        - examples: genome assembly, metagenomics, some math models
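
As a rough illustration of the HTC pattern described on this slide, the sketch below lists a parameter sweep as independent tasks and deals them out into separate jobs. The function names, the parameter grid, and the placeholder computation are all hypothetical and not taken from the talk; on an HTC system each job would run on its own processor rather than in a local loop.

```python
from itertools import product


def make_tasks(rates, sizes):
    """List every (rate, size) combination; each one is an independent task."""
    return list(product(rates, sizes))


def split_into_jobs(tasks, n_jobs):
    """Deal tasks round-robin into n_jobs lists, one list per job."""
    return [tasks[i::n_jobs] for i in range(n_jobs)]


def run_job(job):
    """Hypothetical stand-in for the real per-task computation."""
    return {(rate, size): rate * size for rate, size in job}


if __name__ == "__main__":
    tasks = make_tasks([1, 2, 3], [10, 20])
    jobs = split_into_jobs(tasks, n_jobs=3)
    results = {}
    for job in jobs:  # on an HTC system these jobs would run at the same time
        results.update(run_job(job))
    print(len(tasks), "tasks in", len(jobs), "jobs")
```

Because no task depends on another, the jobs can finish in any order and the results are simply merged at the end.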
  19. When to move beyond the desktop?
      In the research data lifecycle …
      Project Planning; Raw Data Acquisition; Pre-processing; Data Analysis & Inference; Visualization; Publishing & Archival
  20. When to move beyond the desktop?
      In the research data lifecycle …
      Project Planning; Raw Data Acquisition; Pre-processing; Data Analysis & Inference; Visualization; Publishing & Archival
  21. When to move beyond the desktop?
      - If you’re outgrowing your current resources
      In the research data lifecycle …
      Project Planning; Raw Data Acquisition; Pre-processing; Data Analysis & Inference; Visualization; Publishing & Archival
  22. When to move beyond the desktop?
      - If you’re outgrowing your current resources
      - If there is a possibility that you’re not thinking as ‘big’ as you could be
      In the research data lifecycle …
      Project Planning; Raw Data Acquisition; Pre-processing; Data Analysis & Inference; Visualization; Publishing & Archival
  23. Explaining Post-Katrina Home Rebuilding
      Economics professor Jesse Gregory performs HTC optimization of a model to predict the most important factors determining household rebuilding after Katrina.
      Most important rebuilding factors:
      - relative funding available to household if rebuilt
      - rebuild status of neighboring households
      http://www.opensciencegrid.org/using-high-throughput-computing-to-evaluate-post-katrina-rebuilding-grants/
      Jesse’s projects in the last year: 4.5 million hours, 1.5 million OSG hours
      [Figure labels: Fraction of Neighbors Rebuilt; (Repair Cost / Replacement Cost); more funds qualified for; less funds qualified for]
  24. Inference of genome-scale transcriptional regulatory networks
      Transcriptional regulatory networks specify which genes must be expressed, when, where and how much. Sushmita Roy’s group uses CHTC computing resources to reconstruct regulatory networks using statistical machine-learning methods.
      2.6 million HTC hours in the last year
      One example: ~6,700 genes, ~2,500 regulators, ~350 arrays. 15 genes per job, 100 bootstraps, 3 scenarios, ~30 CPU min per job: 447 × 100 × 3 × 0.5 hrs = ~67,000 hours = ~7.5 years
      [Figure panels: Method; Genome-scale network]
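
The estimate on this slide can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation using the slide's figures; the function name and the 8,760-hour year are assumptions made for illustration. With these inputs the product comes to about 67,000 CPU hours, or roughly 7.7 years on a single core, close to the slide's "~7.5 years".

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760; assumption: leap years ignored


def estimate_cpu_hours(n_genes, genes_per_job, bootstraps, scenarios, hours_per_job):
    """Return (total jobs, total CPU hours) for the workload."""
    jobs_per_run = math.ceil(n_genes / genes_per_job)  # 6,700 / 15 rounds up to 447
    total_jobs = jobs_per_run * bootstraps * scenarios
    return total_jobs, total_jobs * hours_per_job


if __name__ == "__main__":
    jobs, hours = estimate_cpu_hours(6700, 15, 100, 3, 0.5)
    years = hours / HOURS_PER_YEAR
    print(f"{jobs:,} jobs, {hours:,.0f} CPU hours, {years:.1f} years on one core")
```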
  25. Modeling Brain Connectivity
      Algorithm from professor Barry Van Veen predicts neural connectivity from EEG data.
      Used by numerous groups on-campus. Clinical projects examining:
      • short-term memory
      • imagination vs perception
      • sleep versus waking states
      • and more …
      Per subject, per condition, per time point: dozens X 20,000 Monte Carlo iterations
      J.Y. Chang, et al, Front. Hum. Neurosci., vol. 6, no. 317, November 2012.
      http://www.engr.wisc.edu/ece/faculty/vanveen_barry.html
      15 million CPU hours in 2015, 6 million OSG hours
      [Figure labels: Brain Regions and Connectivity; EEG data collection; Perception; Imagination]
  27. Resources for Large-Scale Computing
      - Serving all of campus
        - Center for High Throughput Computing (CHTC)
      - Specialized Services
        - In campus units (e.g. Biochemistry, WID, etc.)
        - Serving broader communities (e.g. Social Sciences Computing Cooperative, Computer-Aided Engineering, etc.)
      - Off-Campus
        - XSEDE – NSF-funded HPC clusters (by proposal)
        - National Center for Genome Analysis Support (NCGAS)
        - many others
      Contact chtc@cs.wisc.edu for more details
  28. Get Access to CHTC: chtc.cs.wisc.edu

  29. Make it easy for researchers to find the right people.
      “Facilitators”
      - consultants/liaisons for research computing
      - identify with the researcher perspective
  31. CHTC-Accessible HTC Computing: CHTC execute servers (16,000 CPU cores) [diagram labels: S, E]

  32. CHTC-Accessible Computing: input files, program, submit files [diagram labels: S, E]

  33. CHTC-Accessible Computing: HTCondor; input files, program, submit files [diagram labels: S, E]

  34. CHTC-Accessible Computing: HTCondor; input files, program, submit files; output [diagram labels: S, E]
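
To make the "input files, program, submit files" workflow a little more concrete, here is a minimal sketch that builds the text of an HTCondor submit file for a batch of independent jobs. The file names (`analyze.sh`, `data.csv`, `sweep.log`) and the resource requests are hypothetical placeholders and not CHTC recommendations; the submit commands themselves (`executable`, `arguments`, `transfer_input_files`, `output`, `error`, `log`, `request_cpus`, `request_memory`, `request_disk`, `queue`) are standard HTCondor ones.

```python
def make_submit_description(program, input_files, n_jobs):
    """Build HTCondor submit-file text for n_jobs independent jobs."""
    lines = [
        f"executable = {program}",
        "arguments = $(Process)",  # each job receives its own index: 0, 1, 2, ...
        f"transfer_input_files = {', '.join(input_files)}",
        "output = job_$(Process).out",  # per-job standard output
        "error = job_$(Process).err",  # per-job standard error
        "log = sweep.log",  # hypothetical shared log file name
        "request_cpus = 1",
        "request_memory = 1GB",  # placeholder resource requests
        "request_disk = 1GB",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical program and input file names, for illustration only.
    print(make_submit_description("analyze.sh", ["data.csv"], n_jobs=10))
```

Saved to a file, text like this would be handed to `condor_submit`, and HTCondor would then match each queued job to an available execute server.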
  35. CHTC-Accessible HTC Computing: CHTC [diagram label: S]

  36. CHTC-Accessible HTC Computing: UW Grid, CHTC [diagram label: S]

  37. CHTC-Accessible HTC Computing: Open Science Grid, UW Grid, CHTC [diagram label: S]

  38. Researchers who use the CHTC are located all over campus (red buildings)
      http://chtc.cs.wisc.edu
      Individual researchers: 30 years of computing per day

      Quick Facts            Jul’13-Jun’14  Jul’14-Jun’15  Jul’15-Jun’16
      Million Hours Served   132            265            325
      Research Projects      148            188            204
      Departments            56             61             62

  40. Make it easy for researchers to find the right people.
      “Facilitators”
      - consultants/liaisons for research computing
      - identify with the user perspective
  41. Research Computing Facilitators: Scholarship Experience, Communication and Leadership Skills, Technical Skills
  42. Facilitator Support at CHTC
      - Initial Meetings with researchers new to CHTC
        - chtc.cs.wisc.edu > “Get Started”
      - One-on-One Training
      - Office Hours
        - Tue/Thur, 3-4:30pm, Wed, 9:30-11:30am
      - Talks (like this one!) and Courses
  43. Data and Computing Workshops
      - Data Carpentry
        - from spreadsheets to data visualization and basic programming
      - Software Carpentry
        - best programming practices for reproducible and automated research
      - Each offered 3 times per year at UW-Madison
        - January (next), May/June, August
      - Join the Advanced Computing Initiative (ACI) mailing list (aci.wisc.edu) to learn about future opportunities
  44. Facilitator Impact
      [Figure: Compute Hours Delivered by CHTC]
      Facilitators hired: Jan 2013, Nov 2014
  45. CONTACT US
      Go to: chtc.cs.wisc.edu “How To” > “Get Started”
      chtc@cs.wisc.edu
      [figure; axes: time, n processors]