
Tim Cornwell, CSIRO, Australia

"Legacy Code: who needs it?"

Multicore World

July 18, 2012

Transcript

  1. Legacy code - who needs it?

    • Tim Cornwell, ASKAP Computing - Project Lead
    • Ben Humphreys, ASKAP Computing - Project Engineer
  2. The Square Kilometre Array

    • 2020 era radio telescope
    • Very large collecting area (km²)
    • Very large field of view
    • Wide frequency range (70 MHz - 25 GHz)
    • Large physical extent (3000+ km)
    • International project
      • Telescope sited in Australia or South Africa
      • Headquarters in UK
    • Multiple pathfinders and precursors now being built around the world
  3. Site selection

    • Two possible sites
      • Australia + New Zealand
      • South Africa + eight other African countries
    • Site selection process under way
    • Report from Site Selection Committee sent to SKA Board in Feb 2012
    • Feb 19: Board met with head of SSAC
    • Report + commentary sent to SKA members
    • Apr 3/4: Members and Board meet to start negotiation process
  4. Australian SKA Pathfinder = 1% SKA

    • Wide field of view telescope (30 sq degrees)
    • Sited at the Murchison Radio Observatory, Western Australia
    • Observes between 0.7 and 1.8 GHz
    • 36 antennas, 12 m diameter
    • 188 element phased array feed on each antenna
    • Started construction July 2006
      • 6 antenna prototype 2012
      • Full system 2014
    • Scientific capabilities
      • Survey HI emission from 1.7 million galaxies up to z ~ 0.3
      • Deep continuum survey of entire sky to ~ 10 µJy
      • Polarimetry over entire sky
  5. Phased Array Feeds

    • Key technology development in ASKAP
    • Increases survey speed by ~ order of magnitude
    • Necessary for wide-field SKA
    • Highest area of risk
    • 188 receiver elements
      • Typical radio telescope has two
    • Data rate ~ 1.9 Tbit/s from each antenna
    • Form partially overlapping beams on the sky by summing neighboring elements of the array feed
  6. Hotan, Chippendale, Reynolds, O'Sullivan, Hay et al., CSIRO; Hay, IJMOT 5, 6, 2010 & ICEAA 2010

    • Elevation scans of Virgo-A across PAF elements, PTF to 64 metre, 64 MHz
  7. Comparison of imaging speed of ATCA and ASKAP

    • 231 hours observing with ATCA
    • 2 hours observing with ASKAP
    • Survey entire sky to very sensitive HI limits in ~ 1 year
    • Survey entire sky every day for transient sources in ~ 3 hours
  8. SST2 (run9)

    • 30", 8 hour synthesis
    • SKADS model
    • Peak = 2.6 Jy
    • Edge effects due to rolloff in sensitivity
    • Data set ~ 1.1 TB
    • ~ 1800 CPU-hours
    • ~ 190 GB memory
  9. ASKAP data flow

    [Data flow diagram (T. Cornwell, July 9 2010): thirty-six antennas at the Murchison Radio-astronomical Observatory → filterbanks → beamformers → correlator → MRO-Perth link → central processor at the Pawsey High Performance Centre for SKA → operations data archive / ASKAP Science Data Archive Facility → ASKAP science products to astronomers via Virtual Observatory protocols. Data rates: 1.9 Tb/s PAF filterbank samples per antenna, 0.6 Tb/s beamformed filterbank samples, 2.5 GB/s and 10 GB/s further downstream. Compute rates range from 18 and 27 Tflop/s at the telescope to 340 Tflop/s and 0.5 - 1 Pflop/s at the central processor.]

    • From observing to archive with no human decision making
      • Calibrate automatically
      • Image automatically
      • Form science oriented catalogues automatically
  10. Pawsey High Performance Computing Centre for SKA Science, Perth, Western Australia

    • A$80M, funded by Australian Federal government
    • 8800 core machine now in operation
      • HP cluster in a box at Murdoch University: EPIC
      • ~ 88 on Top 500
    • ASKAP used EPIC as early adopters
      • Now in regular use - 8 Mhours by mid 2012
    • Petascale system by 2013
      • 25% for radio astronomy
  11. Radio telescope imaging

    • Spatial coherence of the electric field (visibility) is the Fourier transform of the sky brightness
    • Measured for many values of the Fourier components u,v
    • Invert the Fourier relationship to get an image of the sky brightness
    • Typical problems
      • Incomplete u,v sampling
      • Calibration
      • Wide field of view: no longer a Fourier transform

    V_{A'B} = \langle E_{A'} E_B^* \rangle_t = e^{-2\pi j w} \int I(l,m)\, e^{-2\pi j (ul+vm)}\, dl\, dm

    [Geometry sketch: antennas A, B and A'; baseline coordinates u, v, w]
  12. Iterative imaging

    • Iterate to find a model of the sky that fits the data (sketched below)
    • Multiple transforms between data and image space

    [Figure (A.L. Varbanescu et al., Fig. 2): diagram of the typical deconvolution process in which a model is iteratively refined by multiple passes; the shaded blocks (gridding and degridding) are both …]
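    A minimal sketch (not ASKAP code) of the major-cycle structure described above; the callables degrid, grid and deconvolve are hypothetical placeholders supplied by the caller:

        import numpy as np

        def major_cycle(vis, uvw, degrid, grid, deconvolve, n_iter=5, image_shape=(4096, 4096)):
            """Iterate to find a sky model that fits the visibility data."""
            model = np.zeros(image_shape)
            for _ in range(n_iter):
                predicted = degrid(model, uvw)              # model -> data space (accurate forward step)
                residual = vis - predicted                  # residual visibilities
                dirty = grid(residual, uvw, image_shape)    # data -> image space (approximate reverse step)
                model += deconvolve(dirty)                  # incremental model update (clean-style step)
            return model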
  13. Small field of view gridding

    • Use FFT to Fourier transform the data to image space
    • Put data onto a regular grid using an anti-aliasing filter (see the sketch below)

    [Gridding diagram (CALIM '11): visibilities convolved onto a grid; convolution function ~100x100, grid ~4096x4096]
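    A minimal numpy sketch of convolutional gridding, assuming a simple separable Gaussian as a stand-in for the anti-aliasing filter (production gridders typically use a prolate spheroidal function); the names and parameters are illustrative, and edge handling is omitted:

        import numpy as np

        def grid_visibilities(vis, u, v, n=4096, cell=1.0, support=3, alpha=1.0):
            """Spread each visibility onto nearby grid cells through a small kernel,
            then FFT the grid to get a (dirty) image."""
            grid = np.zeros((n, n), dtype=complex)
            offsets = np.arange(-support, support + 1)
            for V, uu, vv in zip(vis, u, v):
                iu = int(round(uu / cell)) + n // 2          # nearest grid cell in u
                iv = int(round(vv / cell)) + n // 2          # nearest grid cell in v
                du = uu / cell - (iu - n // 2)               # fractional offsets
                dv = vv / cell - (iv - n // 2)
                ku = np.exp(-alpha * (offsets - du) ** 2)    # separable anti-aliasing kernel
                kv = np.exp(-alpha * (offsets - dv) ** 2)
                grid[iv + offsets[:, None], iu + offsets[None, :]] += V * kv[:, None] * ku[None, :]
            dirty = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid))).real
            return grid, dirty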
  14. Challenges for ASKAP Computing: Flop/s vs memory sub-system performance

    • Our imaging algorithms are more data intensive than computationally intensive. Typical operation (see the worked example below):
      • Load spectral sample (α)
      • Load convolution (x)
      • Load grid point (y)
      • Compute y <- α * x + y
      • Store grid point (y)
    • The performance penalty for such an instruction mix is usually mitigated by CPU features:
      • Memory latency is hidden by pre-fetching and caching
      • Memory bandwidth demands are reduced by caching
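    A back-of-envelope illustration (assuming double-precision complex values and no cache reuse) of why this update is memory-bound rather than flop-bound:

        # Arithmetic intensity of the gridding update y <- alpha*x + y
        flops_per_update = 6 + 2                 # complex multiply (6 flops) + complex add (2 flops)
        bytes_per_update = 3 * 16 + 16           # load alpha, x, y (16 B each) + store y
        intensity = flops_per_update / bytes_per_update
        print(f"arithmetic intensity ~ {intensity:.3f} flop/byte")   # ~0.125 flop/byte

    Caching and pre-fetching reduce the traffic actually seen by main memory, as noted above, but the intrinsic intensity of the operation stays well below one flop per byte.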
  15. Challenges for ASKAP Computing: Flop/s vs memory sub-system performance

    • Locality optimizations are hard because of…
      • large memory requirements of the images and convolution function + quasi-random access pattern
      • high input data rate and potential inability to buffer and reorder input data
  16. Wide field imaging

    • Visibility on plane AB is the Fourier transform of the sky
    • Antenna A' receives radiation by Fresnel diffraction from the AB plane
    • Visibility between A' and B is a Fresnel/Fourier transform
      • Not invertible
    • Use an iterative algorithm
      • Predict forward with high accuracy
      • Reverse calculation is approximate
      • Apply prior information, e.g. sparsity

    V_{A'B} = e^{-2\pi j w} \int I(l,m)\, e^{-2\pi j (ul+vm)}\, dl\, dm

    V_{A'B} = \int \frac{I(l,m)}{\sqrt{1-l^2-m^2}}\, e^{-2\pi j \left(ul+vm+w\sqrt{1-l^2-m^2}\right)}\, dl\, dm   (Fresnel diffraction)

    [Geometry sketch: antennas A, B and A'; baseline coordinates u, v, w]
  17. W-projection gridding

    • (u,v) are not exact grid points
    • Oversampling
    • Choose the most appropriate conv. matrix for grid point (int(u), int(v)); which matrix depends on frac(u), frac(v) and w (see the sketch below)

    [W-projection gridding diagram (CALIM '11): visibilities, convolution function, grid]
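    A minimal sketch of the kernel lookup described above, assuming the convolution functions have been precomputed on an oversampled grid of (frac(u), frac(v)) offsets and a set of w-planes; the cache layout, names and oversampling factor are illustrative:

        import numpy as np

        def select_kernel(kernel_cache, u, v, w, w_step, oversample=8):
            """Pick the precomputed convolution matrix for one visibility.
            kernel_cache shape: (n_wplanes, oversample, oversample, support, support)."""
            iu, iv = int(np.floor(u)), int(np.floor(v))      # integer grid cell
            fu = int((u - iu) * oversample)                  # fractional u -> oversampled index
            fv = int((v - iv) * oversample)                  # fractional v -> oversampled index
            iw = min(int(round(abs(w) / w_step)), kernel_cache.shape[0] - 1)   # w-plane index
            return (iu, iv), kernel_cache[iw, fv, fu]        # grid location + conv. matrix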
  18. Data locality

    • Per baseline: (u,v,w) changes slowly
    • Grid locality

    [Placement/movement diagram (CALIM '11): visibilities, convolution function and grid shown against time and frequency]
  19. Wide field imaging

    • Quadratic phase term added to the Fourier transform
    • Convolution in data space ≡ multiplication in image space (see the sketch below)
    • Slices in data space

    V(u,v,w) = \int \left[ \frac{I(l,m)\, e^{j 2\pi w \left(\sqrt{1-l^2-m^2}-1\right)}}{\sqrt{1-l^2-m^2}} \right] e^{j 2\pi (ul+vm)}\, dl\, dm
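    A minimal numpy sketch of applying the w-dependent term as a multiplication in image space (the image-plane alternative referred to above, as in the AProjectWStack approach mentioned later); the function name and field-of-view parameter are illustrative:

        import numpy as np

        def apply_w_phase_screen(image, w, fov_radians=0.05):
            """Multiply an image by exp(j*2*pi*w*(sqrt(1-l^2-m^2)-1)), then FFT to the uv plane."""
            n = image.shape[0]
            l = np.linspace(-fov_radians / 2, fov_radians / 2, n)   # direction cosines
            L, M = np.meshgrid(l, l)
            r2 = np.clip(1.0 - L**2 - M**2, 0.0, None)
            screen = np.exp(2j * np.pi * w * (np.sqrt(r2) - 1.0))
            return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image * screen)))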
  20. John Romein's GPU-based gridder

    • CUDA rather than OpenCL
    • 6 - 10 x faster than other gridders
    • 37 x faster than dual CPU
    • Based on a clever distribution of threads

    [From "An Unintuitive Approach" (CALIM '11): one thread monitors all X; at any time the convolution matrix covers only one X]
  21. Changes in imaging during scaling work

    • AWProject (2007) - convolution
      • W projection + A projection (for the primary beam)
      • Too much CPU
      • Too much memory for the convolution function
    • AProjectWStack (2008) - convolution/multiplication
      • Apply W term in image space
      • Much less CPU
      • Too much memory for the w stack
    • AWProject + trimmed convolution function (2009) - convolution
      • Only apply and keep the non-zero part of the convolution function
      • Still too much memory for the convolution function
    • AWProject + trimmed convolution function + multiple snapshot planes (2011) - convolution + slices
      • Fit and remove a w = au + bv plane every 30 - 60 min (see the sketch below)
      • Small memory for the convolution function
    • Serialise normal equations piece-by-piece for MPI (2011)
      • Cuts down a short bump in memory use
    • No current algorithm will scale as-is to full-field longer baselines (ASKAP 6 km)
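    A minimal sketch of the snapshot-plane fit mentioned above: over each 30 - 60 minute interval, fit a best-fit plane w ≈ a·u + b·v by least squares and remove it, so only the residual w needs to be handled by the convolution function (names are illustrative):

        import numpy as np

        def fit_w_plane(u, v, w):
            """Least-squares fit of w ~= a*u + b*v over one snapshot interval;
            returns (a, b) and the residual w after removing the fitted plane."""
            A = np.column_stack([u, v])
            (a, b), *_ = np.linalg.lstsq(A, w, rcond=None)
            return (a, b), w - (a * u + b * v)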
  22. Data Processing: Scaling ASKAP processing to ~10,000 cores
  23. Multi-scale multi-frequency synthesis

    • MFS necessary to correct for source changes over the ASKAP bandwidth
    • MFClean in MIRIAD
    • Multi-scale multi-frequency synthesis algorithm
      • Urvashi Rau PhD (CASS/NMT/NRAO)
      • Rau and Cornwell, A&A
      • Parallel version in ASKAPSoft, currently memory hungry
    • Also testing a Compressive Sampling algorithm
  24. Problems for SKA scale development

    • Basic gridding operation has very low arithmetic intensity
      • Becomes less and less acceptable with time
    • ASKAPsoft is currently multi-process per node, will have to convert to multi-thread per core
      • Just about OK as-is for ASKAP 2 km
    • Relative memory will shrink
      • Continual pressure to develop memory-lean algorithms
    • Algorithms will become ever more complex
      • Autotuning may be needed
    • Single-threaded and/or object-oriented libraries (e.g. casacore), now barely acceptable, will have to be rewritten
    • I/O models may need to be revisited
      • MPI parallel I/O
      • Lustre, GPFS
      • HDF5?
  25. The cost of concurrency

    • Rejected first set of legacy code in 2006
      • Had much of the required functionality
      • Poorly structured for HPC
    • Developed new code targeted for HPC
      • Expect to get to O(10000) cores via MPI (barely)
    • Developed multiple algorithms
      • Control space very complex
      • Need to wring out code and algorithm complexity soon
    • For SKA, need to go to 100 million concurrency
      • Code, algorithms all must change
    • Conclusion: if scaling far enough, everything must change
  26. Contact Us

    Thank you

    CSIRO Astronomy and Space Science
    Tim Cornwell, ASKAP Computing Project Engineer
    Phone: 02 9372 4261
    Email: [email protected]
    Web: http://www.atnf.csiro.au/projects/askap

    CSIRO
    Phone: 1300 363 400 or +61 3 9545 2176
    Email: [email protected]
    Web: www.csiro.au