Tim Cornwell, CSIRO, Australia

"Legacy Code: who needs it?"

Multicore World

July 18, 2012

Transcript

  1. Legacy code - who needs it?

    Tim Cornwell, ASKAP Computing - Project Lead
    Ben Humphreys, ASKAP Computing - Project Engineer
  2. The Square Kilometre Array

    • 2020 era radio telescope
    • Very large collecting area (km²)
    • Very large field of view
    • Wide frequency range (70 MHz - 25 GHz)
    • Large physical extent (3000+ km)
    • International project
    • Telescope sited in Australia or South Africa
    • Headquarters in UK
    • Multiple pathfinders and precursors now being built around the world
  3. Site selection

    • Two possible sites
      • Australia + New Zealand
      • South Africa + eight other African countries
    • Site selection process under way
    • Report from Site Selection Committee sent to SKA Board in Feb 2012
    • Feb 19: Board met with head of SSAC
    • Report + commentary sent to SKA members
    • Apr 3/4: Members and Board meet to start negotiation process
  4. Australian SKA Pathfinder = 1% SKA

    • Wide field of view telescope (30 sq degrees)
    • Sited at the Murchison Radio Observatory, Western Australia
    • Observes between 0.7 and 1.8 GHz
    • 36 antennas, 12 m diameter
    • 188 element phased array feed on each antenna
    • Started construction July 2006
    • 6 antenna prototype 2012
    • Full system 2014
    • Scientific capabilities
      • Survey HI emission from 1.7 million galaxies up to z ~ 0.3
      • Deep continuum survey of entire sky ~ 10 uJy
      • Polarimetry over entire sky
  5. [Image-only slide]
  6. Phased Array Feeds

    • Key technology development in ASKAP
    • Increases survey speed by ~ order of magnitude
    • Necessary for wide-field SKA
    • Highest area of risk
    • 188 receiver elements
      • Typical radio telescope has two
    • Data rate ~1.9 Tbit/s from each antenna
    • Form partially overlapping beams on the sky by summing neighboring elements of the array feed
  7. [Image-only slide]

  8. Elevation scans of Virgo A across PAF elements, PTF to 64-metre, 64 MHz

    Hotan, Chippendale, Reynolds, O’Sullivan, Hay et al., CSIRO; Hay, IJMOT 5, 6, 2010 & ICEAA 2010
  9. Comparison of imaging speed of ATCA and ASKAP

    • 231 hours observing with ATCA
    • 2 hours observing with ASKAP
    • Survey entire sky to very sensitive HI limits in ~ 1 year
    • Survey entire sky every day for transient sources in ~ 3 hours
  10. SST2 (run9)

    • 30" 8 hour synthesis
    • SKADS model
    • Peak = 2.6 Jy
    • Edge effects due to rolloff in sensitivity
    • Data set ~ 1.1 TB
    • ~ 1800 CPU-hours
    • ~ 190 GB memory
  11. SST2 (run9) zoomed

  12. SST2 (run9) zoomed
  13. ASKAP data flow

    [Diagram: thirty-six antennas at the Murchison Radio-astronomical Observatory produce PAF filterbank samples at 1.9 Tb/s per antenna; filterbanks (18 Tflop/s) and beamformers reduce this to beamformed filterbank samples at 0.6 Tb/s; the correlator output (2.5 GB/s) crosses the MRO-Perth link to the central processor (0.5 - 1 Pflop/s) at the Pawsey High Performance Centre for SKA; further stages are labelled 27 and 340 Tflop/s; products flow to the operations data archive and the ASKAP Science Data Archive Facility, reaching astronomers via Virtual Observatory protocols at 10 GB/s. T. Cornwell, July 9 2010. Rough rate arithmetic below.]

    • From observing to archive with no human decision making
    • Calibrate automatically
    • Image automatically
    • Form science oriented catalogues automatically
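    To give a feel for the reduction implied by these rates, a rough back-of-the-envelope sketch in C++; the per-antenna 1.9 Tbit/s and the 2.5 GB/s into the central processor come from the slides, but how each rate attaches to each stage is my reading of the flattened diagram.

      // Rough data-rate arithmetic for the ASKAP flow described above.
      // Figures are from the slides; their attachment to stages is an assumption.
      #include <cstdio>

      int main() {
          const double antennas             = 36.0;
          const double per_antenna_tbit_s   = 1.9;   // PAF filterbank samples per antenna
          const double to_processor_gbyte_s = 2.5;   // assumed correlator output into the central processor

          const double aggregate_tbit_s = antennas * per_antenna_tbit_s;          // ~68 Tbit/s off the dishes
          const double to_processor_tbit_s = to_processor_gbyte_s * 8e9 / 1e12;   // GB/s -> Tbit/s

          std::printf("aggregate off antennas: %.1f Tbit/s\n", aggregate_tbit_s);
          std::printf("into central processor: %.3f Tbit/s\n", to_processor_tbit_s);
          std::printf("reduction through beamforming + correlation: ~%.0fx\n",
                      aggregate_tbit_s / to_processor_tbit_s);
          return 0;
      }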
  14. Pawsey High Performance Computing Centre for SKA Science, Perth, Western Australia

    • A$80M, funded by Australian Federal government
    • 8800 core machine now in operation
    • HP cluster in a box at Murdoch University: EPIC
    • ~ 88 on Top 500
    • ASKAP used EPIC as early adopters
    • Now regular use - 8 Mhour mid 2012
    • Petascale system by 2013
    • 25% for radio astronomy
  15. Radio telescope imaging

    • Spatial coherence of the electric field (visibility) is the Fourier transform of the sky brightness (see the sketch below)
    • Measured for many values of the Fourier components u,v
    • Invert the Fourier relationship to get an image of the sky brightness
    • Typical problems
      • Incomplete u,v sampling
      • Calibration
      • Wide field of view: no longer a Fourier transform

    V_{A'B} = \langle E_{A'} E_B^* \rangle_t = e^{-2\pi j w} \int I(l,m)\, e^{-2\pi j (ul + vm)}\, dl\, dm

    [Diagram: antennas A, A', B and the baseline coordinates u, v, w]
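    As a concrete (and deliberately slow) illustration of the Fourier relationship on this slide, the sketch below forms a dirty image by direct evaluation of the transform for a few made-up visibility samples; all values are illustrative, and a real imager uses gridding plus an FFT rather than this brute-force loop.

      // Minimal direct-Fourier-transform imaging sketch (illustrative only).
      // I(l,m) is approximated as Re( sum_k V_k exp(+2*pi*j*(u_k*l + v_k*m)) ).
      #include <cmath>
      #include <complex>
      #include <cstdio>
      #include <vector>

      struct Vis { double u, v; std::complex<double> value; };

      int main() {
          const double pi = 3.141592653589793;
          // A few made-up visibility samples (u,v in wavelengths).
          std::vector<Vis> vis = { {100.0, 0.0, {1.0, 0.0}},
                                   {0.0, 150.0, {0.5, 0.2}},
                                   {70.0, 70.0, {0.3, -0.1}} };

          const int n = 64;                 // image is n x n pixels
          const double cell = 1.0 / 4096.0; // pixel size in direction cosines (l,m)

          std::vector<double> image(n * n, 0.0);
          for (int iy = 0; iy < n; ++iy) {
              for (int ix = 0; ix < n; ++ix) {
                  const double l = (ix - n / 2) * cell;
                  const double m = (iy - n / 2) * cell;
                  std::complex<double> sum = 0.0;
                  for (const Vis& s : vis) {
                      const double phase = 2.0 * pi * (s.u * l + s.v * m);
                      sum += s.value * std::complex<double>(std::cos(phase), std::sin(phase));
                  }
                  image[iy * n + ix] = sum.real() / vis.size();
              }
          }
          std::printf("centre pixel of dirty image: %f\n", image[(n / 2) * n + n / 2]);
          return 0;
      }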
  16. Iterative imaging

    • Iterate to find a model of the sky that fits the data (see the toy loop below)
    • Multiple transforms between data and image space

    [Figure from A.L. Varbanescu et al., Fig. 2: a diagram of the typical deconvolution process in which a model is iteratively refined by multiple passes; the shaded blocks are gridding and degridding]
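    A minimal one-dimensional sketch of the loop the figure describes: predict what the current model implies, subtract to form a residual, and move a fraction of the residual peak into the model (a CLEAN-style minor cycle). The PSF, gain and threshold are invented and stand in for the real gridding/degridding transforms.

      // Toy 1-D deconvolution loop illustrating the iterate-to-fit idea.
      #include <cmath>
      #include <cstdio>
      #include <vector>

      // "Dirty" response of a point source: a crude PSF with sidelobes.
      static double psf(int dx) {
          if (dx == 0) return 1.0;
          return 0.3 * std::cos(0.5 * dx) / std::fabs(static_cast<double>(dx));
      }

      int main() {
          const int n = 64;
          std::vector<double> dirty(n, 0.0), model(n, 0.0);

          // Fake "observed" dirty image: two point sources convolved with the PSF.
          for (int i = 0; i < n; ++i)
              dirty[i] = 2.0 * psf(i - 20) + 1.0 * psf(i - 45);

          const double gain = 0.1;
          for (int iter = 0; iter < 200; ++iter) {
              // Residual = dirty image minus the model convolved with the PSF.
              std::vector<double> residual(n);
              for (int i = 0; i < n; ++i) {
                  double pred = 0.0;
                  for (int j = 0; j < n; ++j) pred += model[j] * psf(i - j);
                  residual[i] = dirty[i] - pred;
              }
              // Minor cycle: move a fraction of the peak residual into the model.
              int peak = 0;
              for (int i = 1; i < n; ++i)
                  if (std::fabs(residual[i]) > std::fabs(residual[peak])) peak = i;
              if (std::fabs(residual[peak]) < 1e-3) break;
              model[peak] += gain * residual[peak];
          }
          std::printf("model[20]=%.2f model[45]=%.2f\n", model[20], model[45]);
          return 0;
      }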
  17. Small field of view gridding

    • Use the FFT to Fourier transform the data to image space
    • Put the data onto a regular grid using an anti-aliasing filter (sketched below)

    [Figure (CALIM '11): gridding - visibilities convolved with a kernel (~100x100) onto a uv grid (~4096x4096)]
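    A compact sketch of the gridding step: each visibility is multiplied onto a small anti-aliasing kernel and accumulated into a regular uv grid, which an FFT then transforms to the image plane. The Gaussian taper, grid size and cell size are placeholders; a production gridder would use a prolate-spheroidal taper and an oversampled kernel lookup.

      // Simple convolutional gridding sketch: accumulate one visibility onto a
      // regular grid through a small separable anti-aliasing kernel.
      #include <cmath>
      #include <complex>
      #include <cstdio>
      #include <vector>

      int main() {
          const int N = 512;               // demo grid (the slide's grid is ~4096 x 4096)
          const int support = 3;           // kernel half-width in grid cells
          const double cell = 10.0;        // uv cell size in wavelengths (illustrative)

          std::vector<std::complex<double>> grid(static_cast<size_t>(N) * N, {0.0, 0.0});

          // Placeholder anti-aliasing taper: a Gaussian; real code uses a prolate spheroidal.
          auto taper = [](double du) { return std::exp(-du * du); };

          // One example visibility (u,v in wavelengths).
          const double u = 1234.5, v = -987.6;
          const std::complex<double> vis(0.7, 0.3);

          const int iu = static_cast<int>(std::lround(u / cell)) + N / 2;
          const int iv = static_cast<int>(std::lround(v / cell)) + N / 2;

          for (int dy = -support; dy <= support; ++dy) {
              for (int dx = -support; dx <= support; ++dx) {
                  const double wt = taper(dx) * taper(dy);   // separable kernel weight
                  grid[static_cast<size_t>(iv + dy) * N + (iu + dx)] += wt * vis;
              }
          }
          std::printf("grid value at (%d,%d): %.3f\n", iu, iv,
                      grid[static_cast<size_t>(iv) * N + iu].real());
          return 0;
      }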
  18. Challenges for ASKAP Computing: Flop/s vs memory sub-system performance

    • Our imaging algorithms are more data intensive than computationally intensive. Typical operation (see the worked estimate below):
      • Load spectral sample (α)
      • Load convolution (x)
      • Load grid point (y)
      • Compute y <- α * x + y
      • Store grid point (y)
    • The performance penalty for such an instruction mix is usually mitigated by CPU features:
      • Memory latency is hidden by pre-fetching and caching
      • Memory bandwidth demands are reduced by caching
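    The bullets above amount to an arithmetic-intensity argument. The sketch below writes out that inner operation and counts flops against bytes moved, assuming 16-byte complex doubles and no cache reuse (the byte accounting is illustrative).

      // The gridding inner operation from the slide, with a flops-per-byte estimate.
      #include <complex>
      #include <cstdio>

      int main() {
          using cd = std::complex<double>;
          cd alpha(0.7, 0.1);   // spectral sample (loaded from memory)
          cd x(0.2, -0.3);      // convolution function value (loaded)
          cd y(1.0, 0.5);       // grid point (loaded, then stored)

          y += alpha * x;       // complex multiply-add: ~8 flops

          // Traffic per operation, assuming 16-byte complex doubles and no cache reuse:
          //   load alpha (16 B) + load x (16 B) + load y (16 B) + store y (16 B) = 64 B
          const double flops = 8.0, bytes = 64.0;
          std::printf("arithmetic intensity ~ %.3f flop/byte (y = %.2f%+.2fi)\n",
                      flops / bytes, y.real(), y.imag());
          return 0;
      }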
  19. Challenges for ASKAP Computing: Flop/s vs memory sub-system performance

    • Locality optimizations are hard because of…
      • large memory requirements of the images and convolution function + quasi-random access pattern
      • high input data rate and potential inability to buffer and reorder input data
  20. Wide field imaging

    • Visibility on the plane AB is the Fourier transform of the sky
    • Antenna A' receives radiation by Fresnel diffraction from the AB plane
    • Visibility between A' and B is a Fresnel/Fourier transform (small-field limit worked out below)
      • Not invertible
    • Use an iterative algorithm
      • Predict forward with high accuracy
      • Reverse calculation is approximate
      • Apply prior information, e.g. sparsity

    V_{A'B} = e^{-2\pi j w} \int I(l,m)\, e^{-2\pi j (ul + vm)}\, dl\, dm

    V_{A'B} = \int \frac{I(l,m)}{\sqrt{1 - l^2 - m^2}}\, e^{-2\pi j \left( ul + vm + w\sqrt{1 - l^2 - m^2} \right)}\, dl\, dm

    [Diagram: Fresnel diffraction geometry with antennas A, A', B and coordinates u, v, w]
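    A short standard derivation (not on the slide) of how the second form relates to the first and reduces to the plain 2-D Fourier transform for small fields:

      % Standard identity linking the two forms above (not from the slide):
      w\sqrt{1 - l^2 - m^2} \;=\; w + w\left(\sqrt{1 - l^2 - m^2} - 1\right)
                            \;\approx\; w - \tfrac{w}{2}\left(l^2 + m^2\right)
      % The constant w term gives the e^{-2\pi j w} factor in front of the first
      % integral; when w(l^2 + m^2)/2 << 1 across the field of view the remaining
      % term is negligible and the relation reduces to the 2-D Fourier transform
      % of slide 15.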
  21. Comparison of 2D FT, faceting, and w projection

  22. Comparison of 2D FT, faceting, and w projection

  23. Comparison of 2D FT, faceting, and w projection

  24. Comparison of 2D FT, faceting, and w projection
  25. W projection gridding

    [Figure (CALIM '11): W-projection gridding - visibilities, convolution function, grid]

    • (u,v) are not exact grid points
    • Oversampling
    • Choose the most appropriate convolution matrix: placement at (int(u), int(v)), kernel choice depends on frac(u), frac(v), w (see the selection sketch below)
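    A sketch of the kernel-selection logic described here: since (u,v) rarely lands exactly on a grid point, the convolution function is oversampled and the fractional parts of u and v, together with w, pick which pre-computed kernel to apply. The oversampling factor, number of w planes and w range below are placeholders.

      // Choosing an oversampled W-projection kernel from frac(u), frac(v) and w.
      #include <cmath>
      #include <cstdio>

      int main() {
          const int oversample = 8;     // sub-cell positions per axis (illustrative)
          const int nWPlanes   = 33;    // number of pre-computed w planes (illustrative)
          const double wMax    = 10000; // largest |w| in wavelengths (illustrative)
          const double cell    = 10.0;  // uv cell size in wavelengths

          // Example visibility coordinates.
          const double u = 1234.5, v = -987.6, w = 2500.0;

          // Integer grid cell and sub-cell (fractional) offset for u and v.
          const double uCell = u / cell, vCell = v / cell;
          const int iu = static_cast<int>(std::floor(uCell));
          const int iv = static_cast<int>(std::floor(vCell));
          const int fracU = static_cast<int>(std::floor((uCell - iu) * oversample));
          const int fracV = static_cast<int>(std::floor((vCell - iv) * oversample));

          // w plane index: nearest pre-computed plane for this w.
          const int wPlane = static_cast<int>(std::lround((w / wMax) * (nWPlanes - 1) / 2.0))
                             + (nWPlanes - 1) / 2;

          // A gridder would now apply kernel[wPlane][fracV][fracU] around grid cell (iu, iv).
          std::printf("grid cell (%d,%d), sub-cell (%d,%d), w plane %d\n",
                      iu, iv, fracU, fracV, wPlane);
          return 0;
      }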
  26. Data locality

    [Figure (CALIM '11): placement and movement of visibilities, convolution function and grid over time and frequency]

    • Per baseline:
      • (u,v,w) changes slowly
      • grid locality (see the ordering sketch below)
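    One way to exploit the locality noted here, sketched under assumed data layouts: order the visibilities by baseline and then by time, so consecutive samples from a baseline (whose (u,v,w) drift slowly) touch the same neighbourhood of the grid and reuse the same convolution kernel.

      // Ordering visibilities by (baseline, time) to improve grid locality.
      #include <algorithm>
      #include <cstdio>
      #include <vector>

      struct Vis {
          int baseline;          // antenna-pair index
          double time;           // timestamp
          double u, v, w;        // coordinates in wavelengths
      };

      int main() {
          std::vector<Vis> vis = {
              {7, 2.0,  500.1, -20.3, 90.0},
              {3, 1.0, 1200.0,  40.0, 10.0},
              {7, 1.0,  499.8, -20.1, 89.5},
              {3, 2.0, 1200.4,  40.2, 10.1},
          };

          // Sort so that each baseline's samples are contiguous and time-ordered;
          // the gridder then revisits nearly the same grid cells on consecutive samples.
          std::sort(vis.begin(), vis.end(), [](const Vis& a, const Vis& b) {
              return a.baseline != b.baseline ? a.baseline < b.baseline : a.time < b.time;
          });

          for (const Vis& s : vis)
              std::printf("baseline %d  t=%.1f  (u,v)=(%.1f,%.1f)\n",
                          s.baseline, s.time, s.u, s.v);
          return 0;
      }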
  27. Wide field imaging

    • Quadratic phase term added to the Fourier transform (see the phase-screen sketch below)
    • Convolution in data space ↔ multiplication in image space
    • Slices in data space

    V(u,v,w) = \int \left[ \frac{I(l,m)\, e^{j 2\pi w \left( \sqrt{1 - l^2 - m^2} - 1 \right)}}{\sqrt{1 - l^2 - m^2}} \right] e^{j 2\pi (ul + vm)}\, dl\, dm
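    A small sketch of the quadratic phase term itself: the image-plane screen e^{j2πw(√(1-l²-m²)-1)}/√(1-l²-m²) evaluated over a grid of direction cosines. Fourier-transforming this screen (times the anti-aliasing taper) would give the data-space convolution kernel for that w; the FFT step is omitted, and the grid size, field of view and w value are placeholders.

      // Image-plane w phase screen whose Fourier transform is the W-projection kernel.
      #include <cmath>
      #include <complex>
      #include <cstdio>
      #include <vector>

      int main() {
          const double pi = 3.141592653589793;
          const int n = 128;                 // screen is n x n (illustrative)
          const double fov = 0.05;           // half-width of field in direction cosines
          const double w = 2500.0;           // w in wavelengths (illustrative)

          std::vector<std::complex<double>> screen(static_cast<size_t>(n) * n);
          for (int iy = 0; iy < n; ++iy) {
              for (int ix = 0; ix < n; ++ix) {
                  const double l = (2.0 * ix / (n - 1) - 1.0) * fov;
                  const double m = (2.0 * iy / (n - 1) - 1.0) * fov;
                  const double r = std::sqrt(1.0 - l * l - m * m);
                  const double phase = 2.0 * pi * w * (r - 1.0);
                  screen[static_cast<size_t>(iy) * n + ix] =
                      std::complex<double>(std::cos(phase), std::sin(phase)) / r;
              }
          }
          // An FFT of 'screen' (not shown) yields the w-dependent gridding kernel.
          std::printf("corner phase term: (%.3f, %.3f)\n",
                      screen[0].real(), screen[0].imag());
          return 0;
      }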
  28. John Romein’s GPU-based gridder

    • CUDA rather than OpenCL
    • 6 - 10 x faster than other gridders
    • 37 x faster than dual CPU
    • Based on a clever distribution of threads (illustrated below)

    [Figure (CALIM '11), "An Unintuitive Approach": one thread monitors all X; at any time the convolution matrix covers exactly one X]
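    A heavily simplified serial illustration of the thread-distribution idea hinted at here (my paraphrase, not Romein's CUDA code): a worker owns grid points spaced a kernel support apart, so the kernel footprint of the current visibility covers exactly one of them; contributions are accumulated in a register and flushed to the grid only when that target point changes.

      // Serial illustration of accumulate-in-register, flush-on-change gridding.
      #include <complex>
      #include <cstdio>
      #include <vector>

      int main() {
          using cd = std::complex<double>;
          const int N = 256;                       // grid size
          std::vector<cd> grid(static_cast<size_t>(N) * N, {0.0, 0.0});

          // Consecutive samples from one baseline: grid position changes slowly.
          struct Sample { int iu, iv; cd vis; };
          std::vector<Sample> samples = {
              {100, 60, {1.0, 0.0}}, {100, 60, {0.9, 0.1}},
              {100, 60, {0.8, 0.2}}, {101, 60, {0.7, 0.3}},
          };

          // This "worker" owns one offset within the kernel footprint; for
          // simplicity its target is simply (iu, iv).
          cd acc(0.0, 0.0);
          int curU = samples.front().iu, curV = samples.front().iv;
          const double kernelWeight = 1.0;          // placeholder convolution value

          for (const Sample& s : samples) {
              if (s.iu != curU || s.iv != curV) {   // target moved: flush accumulator
                  grid[static_cast<size_t>(curV) * N + curU] += acc;
                  acc = cd(0.0, 0.0);
                  curU = s.iu; curV = s.iv;
              }
              acc += kernelWeight * s.vis;          // keep accumulating in a register
          }
          grid[static_cast<size_t>(curV) * N + curU] += acc;  // final flush

          std::printf("grid(100,60)=%.1f%+.1fi  grid(101,60)=%.1f%+.1fi\n",
                      grid[60 * N + 100].real(), grid[60 * N + 100].imag(),
                      grid[60 * N + 101].real(), grid[60 * N + 101].imag());
          return 0;
      }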
  29. Changes in imaging during scaling work

    • AWProject (2007) - convolution
      • W projection + A projection (for primary beam)
      • Too much CPU
      • Too much memory for convolution function
    • AProjectWStack (2008) - convolution/multiplication
      • Apply W term in image space
      • Much less CPU
      • Too much memory for w stack
    • AWProject + trimmed convolution function (2009) - convolution
      • Only apply and keep the non-zero part of the convolution function
      • Still too much memory for convolution function
    • AWProject + trimmed convolution function + multiple snapshot planes (2011) - convolution + slices
      • Fit and remove a w = au + bv plane every 30 - 60 min (see the fitting sketch below)
      • Small memory for convolution function
    • Serialise normal equations piece-by-piece for MPI (2011)
      • Cuts down short bump in memory use
    • No current algorithm will scale as-is to full-field longer baselines (ASKAP 6 km)
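    The snapshot step above fits and removes a plane w ≈ a·u + b·v from the sampled (u,v,w) points. Below is a minimal least-squares sketch of that fit; the sample coordinates are invented, and the real code must also re-project the image between snapshots.

      // Least-squares fit of the best-fitting plane w = a*u + b*v to (u,v,w) samples,
      // as used when splitting an observation into snapshots.
      #include <cstdio>
      #include <vector>

      struct UVW { double u, v, w; };

      int main() {
          std::vector<UVW> pts = {
              {100, 10, 32}, {200, 30, 67}, {-150, 40, -37}, {50, -80, -1}, {300, 5, 92},
          };

          // Normal equations for minimising sum (w - a*u - b*v)^2:
          //   [Suu Suv][a]   [Suw]
          //   [Suv Svv][b] = [Svw]
          double Suu = 0, Suv = 0, Svv = 0, Suw = 0, Svw = 0;
          for (const UVW& p : pts) {
              Suu += p.u * p.u;  Suv += p.u * p.v;  Svv += p.v * p.v;
              Suw += p.u * p.w;  Svw += p.v * p.w;
          }
          const double det = Suu * Svv - Suv * Suv;
          const double a = (Svv * Suw - Suv * Svw) / det;
          const double b = (Suu * Svw - Suv * Suw) / det;

          std::printf("fitted plane: w = %.3f*u + %.3f*v\n", a, b);
          // Residual w after removing the plane is what the imager still has to handle.
          for (const UVW& p : pts)
              std::printf("  residual w = %.2f\n", p.w - a * p.u - b * p.v);
          return 0;
      }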
  30. Data Processing: Scaling ASKAP Processing to ~10,000 cores
  31. Multi-scale Multi-frequency Synthesis

    • MFS necessary to correct for source changes over the ASKAP bandwidth
    • MFClean in MIRIAD
    • Multi-scale multi-frequency synthesis algorithm
      • Urvashi Rau PhD (CASS/NMT/NRAO)
      • Rau and Cornwell, A&A
    • Parallel version in ASKAPsoft, currently memory hungry
    • Also testing a Compressive Sampling algorithm
  32. Problems for SKA scale development

    • Basic gridding operation has very low arithmetic intensity
      • Becomes less and less acceptable with time
    • ASKAPsoft is currently multi-process per node, will have to convert to multi-thread per core
      • Just about OK as-is for ASKAP 2 km
    • Relative memory will shrink
      • Continual pressure to develop memory-lean algorithms
    • Algorithms will become ever more complex
      • Autotuning may be needed
    • Single-threaded and/or object-oriented libraries (e.g. casacore), now barely acceptable, will have to be rewritten
    • I/O models may need to be revisited
      • MPI parallel I/O
      • Lustre, GPFS
      • HDF5?
  33. The cost of concurrency

    • Rejected first set of legacy code in 2006
      • Had much of the required functionality
      • Poorly structured for HPC
    • Developed new code targeted at HPC
      • Expect to get to O(10000) cores via MPI (barely)
    • Developed multiple algorithms
      • Control space very complex
      • Need to wring out code and algorithm complexity soon
    • For SKA, need to go to 100 million-way concurrency
      • Code and algorithms all must change
    • Conclusion: if scaling far enough, everything must change
  34. Contact Us

    Phone: 1300 363 400 or +61 3 9545 2176
    Email: enquiries@csiro.au
    Web: www.csiro.au

    Thank you

    CSIRO Astronomy and Space Science
    Tim Cornwell, ASKAP Computing Project Engineer
    Phone: 02 9372 4261
    Email: tim.cornwell@csiro.au
    Web: http://www.atnf.csiro.au/projects/askap