• Very large collecting area (~km²)
• Very large field of view
• Wide frequency range (70 MHz - 25 GHz)
• Large physical extent (3000+ km)
• International project
• Telescope sited in Australia or South Africa
• Headquarters in the UK
• Multiple pathfinders and precursors now being built around the world
• Australia + New Zealand
• South Africa + eight other African countries
• Site selection process under way
• Report from the Site Selection Committee sent to the SKA Board in Feb 2012
• Feb 19: Board met with the head of the SSAC
• Report + commentary sent to SKA members
• Apr 3/4: Members and Board meet to start the negotiation process
• Wide field-of-view telescope (30 square degrees)
• Sited at the Murchison Radio Observatory, Western Australia
• Observes between 0.7 and 1.8 GHz
• 36 antennas, 12 m diameter
• 188-element phased array feed on each antenna
• Started construction July 2006
• 6-antenna prototype 2012
• Full system 2014
• Scientific capabilities
  • Survey HI emission from 1.7 million galaxies up to z ~ 0.3
  • Deep continuum survey of the entire sky to ~10 µJy
  • Polarimetry over the entire sky
• Increases survey speed by ~1 order of magnitude
• Necessary for a wide-field SKA
• Highest area of risk
• 188 receiver elements (a typical radio telescope has two)
• Data rate ~1.9 Tbit/s from each antenna
• Form partially overlapping beams on the sky by summing neighbouring elements of the array feed (see the sketch below)
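A minimal sketch (not from the slides) of the underlying weighted-sum beamforming: one beam is a complex-weighted sum over the PAF element voltages, and neighbouring beams would reuse overlapping subsets of elements with different weights. The weights and voltage data below are random placeholders; real ASKAP beamforming runs in dedicated hardware.

    import numpy as np

    n_elements = 188          # PAF elements per antenna
    n_samples = 1024          # voltage samples per element (toy number)
    rng = np.random.default_rng(0)
    # Placeholder complex voltages from each PAF element
    voltages = rng.standard_normal((n_elements, n_samples)) \
             + 1j * rng.standard_normal((n_elements, n_samples))
    # Complex weights select one beam direction; other beams use different weights
    # over (partially overlapping) subsets of the same elements
    weights = np.exp(2j * np.pi * rng.random(n_elements))
    beam = weights.conj() @ voltages   # one beamformed voltage stream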
[Image comparison: […] hours observing with ATCA vs 2 hours observing with ASKAP]
• Survey the entire sky to very sensitive HI limits in ~1 year
• Survey the entire sky for transient sources in ~3 hours, i.e. every day
[…] synthesis
• SKADS model
• Peak = 2.6 Jy
• Edge effects due to rolloff in sensitivity
• Data set ~1.1 TB
• ~1800 CPU-hours
• ~190 GB memory
ASKAP data flow
• From observing to archive with no human decision making
• Calibrate automatically
• Image automatically
• Form science-oriented catalogues automatically
[Diagram (T. Cornwell, July 9 2010; IESP SKA October 2010): data flow from the Murchison Radioastronomical Observatory over the 10 GB/s MRO-Perth link to the Pawsey High Performance Centre for SKA. PAF filterbank samples and beamformed filterbank samples flow at 1.9 Tb/s and 0.6 Tb/s through processing stages of 18 Tflop/s, 27 Tflop/s, 340 Tflop/s and 0.5-1 Pflop/s, into an operations data archive and the ASKAP Science Data Archive Facility, which serves ASKAP science products to astronomers and the Virtual Observatory via VO protocols.]
Pawsey High Performance Computing Centre for SKA Science, Perth, Western Australia
• A$80M, funded by the Australian Federal government
• 8800-core machine now in operation
• HP cluster-in-a-box at Murdoch University: EPIC
• ~#88 on the Top 500
• ASKAP used EPIC as early adopters
• Now in regular use - 8 Mhours by mid-2012
• Petascale system by 2013
• 25% for radio astronomy
• The correlated electric field (the visibility) is the Fourier transform of the sky brightness
• Measured for many values of the Fourier components (u, v)
• Invert the Fourier relationship to get an image of the sky brightness
• Typical problems
  • Incomplete (u, v) sampling
  • Calibration
  • Wide field of view: no longer a Fourier transform

V_{A'B} = \langle E_{A'} E_B^* \rangle_t = e^{-2\pi j w} \int I(l,m) \, e^{-2\pi j (u l + v m)} \, dl \, dm

[Diagram: antennas A, A', B and baseline coordinates (u, v, w)]
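A minimal sketch (not from the slides) of inverting the Fourier relationship directly: for a handful of simulated (u, v) samples and a unit point source at the phase centre, sum the corresponding fringes to form a dirty image, ignoring the w term and calibration. All values are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    n_vis, n_pix = 500, 64
    u = rng.uniform(-200, 200, n_vis)        # baseline coordinates in wavelengths
    v = rng.uniform(-200, 200, n_vis)
    l = np.linspace(-0.05, 0.05, n_pix)      # direction cosines
    L, M = np.meshgrid(l, l)
    # A unit point source at the phase centre gives V = 1 for every sample
    vis = np.ones(n_vis, dtype=complex)
    # Dirty image: sum of one fringe per visibility sample; incomplete (u, v)
    # sampling shows up as sidelobes around the source
    dirty = np.zeros((n_pix, n_pix))
    for k in range(n_vis):
        dirty += np.real(vis[k] * np.exp(2j * np.pi * (u[k] * L + v[k] * M)))
    dirty /= n_vis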
Iterative imaging
• […] data
• Multiple transforms between data and image space
[Figure, after Varbanescu et al. (their Fig. 2): a diagram of the typical deconvolution process in which a model is iteratively refined by multiple passes; the shaded blocks are gridding and degridding.]
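A minimal 1-D sketch (not from the slides) of this loop structure: predict model visibilities (the forward/degridding step), subtract them from the data, image the residuals (the reverse/gridding step), and update the model with a toy CLEAN step. Fully sampled toy data; this shows the shape of the iteration only, not ASKAPsoft's algorithm.

    import numpy as np

    n_pix = 256
    true_sky = np.zeros(n_pix)
    true_sky[[60, 130]] = [1.0, 0.5]              # two point sources
    data_vis = np.fft.fft(true_sky)               # "observed" visibilities (fully sampled toy case)

    model = np.zeros(n_pix)
    gain, n_major = 0.3, 20
    for _ in range(n_major):
        model_vis = np.fft.fft(model)             # forward (predict / degrid) step
        residual_vis = data_vis - model_vis       # subtract model from data
        residual_img = np.real(np.fft.ifft(residual_vis))   # reverse (grid / image) step
        peak = np.argmax(np.abs(residual_img))    # toy CLEAN: move a fraction of the
        model[peak] += gain * residual_img[peak]  # peak residual into the model
    # 'model' now approximates 'true_sky'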
Gridding
• Use an FFT to Fourier transform the data to image space
• Put the data onto a regular grid using an anti-aliasing filter
[Diagram (CALIM '11, July 25-29, 2011): visibilities convolved with a convolution function (~100x100) onto a grid (~4096x4096)]
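A minimal sketch (not from the slides) of convolutional gridding followed by an FFT. The kernel below is a small placeholder taper rather than a real anti-aliasing (e.g. prolate spheroidal) function, and there is no oversampling or w dependence; only the structure of the operation is shown.

    import numpy as np

    n_grid, support = 256, 3                       # grid size and kernel half-width
    grid = np.zeros((n_grid, n_grid), dtype=complex)
    # Placeholder separable taper with 2*support+1 taps per axis
    kernel_1d = np.array([0.05, 0.25, 0.6, 1.0, 0.6, 0.25, 0.05])
    kernel = np.outer(kernel_1d, kernel_1d)

    rng = np.random.default_rng(2)
    u = rng.uniform(support, n_grid - support - 1, 1000)   # uv coordinates in grid cells
    v = rng.uniform(support, n_grid - support - 1, 1000)
    vis = np.ones(1000, dtype=complex)                     # toy visibilities

    for uu, vv, s in zip(u, v, vis):
        iu, iv = int(round(uu)), int(round(vv))            # nearest grid cell (no oversampling here)
        grid[iv - support: iv + support + 1,
             iu - support: iu + support + 1] += s * kernel  # smear sample with the kernel

    dirty = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid)))  # image via FFT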
Our imaging algorithms are more data intensive than computationally intensive.
• Typical operation:
  • Load spectral sample (α)
  • Load convolution value (x)
  • Load grid point (y)
  • Compute y <- α * x + y
  • Store grid point (y)
• The performance penalty for such an instruction mix is usually mitigated by CPU features:
  • Memory latency is hidden by prefetching and caching
  • Memory bandwidth demands are reduced by caching
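A back-of-the-envelope estimate (not from the slides) of why this update is memory-bound, assuming single-precision complex values and counting a complex multiply-add as 8 floating-point operations; exact numbers depend on the implementation, but the ratio lands far below 1 flop/byte.

    # Arithmetic intensity of the gridding update y <- a*x + y
    bytes_moved = 8 * 3 + 8      # load sample, kernel value, grid point; store grid point (8 B each)
    flops = 8                    # complex multiply (6 flops) + complex add (2 flops)
    print(flops / bytes_moved)   # 0.25 flop/byte -> strongly memory-bandwidth bound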
Locality optimizations are hard because of:
• the large memory requirements of the images and convolution function, plus a quasi-random access pattern
• the high input data rate and the potential inability to buffer and reorder input data
Wide field imaging
• Antenna A' receives radiation by Fresnel diffraction from the AB plane
• The visibility between A' and B is a Fresnel/Fourier transform
• Not invertible
• Use an iterative algorithm
  • Predict forward with high accuracy
  • The reverse calculation is approximate
  • Apply prior information, e.g. sparsity

V_{A'B} = e^{-2\pi j w} \int I(l,m) \, e^{-2\pi j (u l + v m)} \, dl \, dm

V_{A'B} = \int \frac{I(l,m)}{\sqrt{1-l^2-m^2}} \, e^{-2\pi j \left( u l + v m + w \sqrt{1-l^2-m^2} \right)} \, dl \, dm   (Fresnel diffraction)

[Diagram: antennas A, A', B and baseline coordinates (u, v, w)]
W-Projection Gridding (CALIM '11)
• (u, v) are not exact grid points
• Oversample the convolution function
• Choose the most appropriate convolution matrix for grid cell (int(u), int(v)); the choice depends on frac(u), frac(v) and w
[Diagram: visibilities, convolution matrix, grid]
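A minimal sketch (not from the slides) of that kernel lookup: the convolution function is tabulated on an oversampled raster, and frac(u), frac(v) together with the w plane select which tabulated matrix is applied at grid cell (int(u), int(v)). Array shapes and kernel contents below are placeholders.

    import numpy as np

    oversample, support, n_wplanes = 8, 3, 16
    size = 2 * support + 1
    # kernels[w_plane, ov_v, ov_u] is one (size x size) convolution matrix (placeholder contents)
    kernels = np.ones((n_wplanes, oversample, oversample, size, size), dtype=complex)

    def grid_one(grid, u, v, w_plane, sample):
        iu, iv = int(np.floor(u)), int(np.floor(v))    # target grid cell: (int(u), int(v))
        ou = int((u - iu) * oversample)                # sub-cell offsets frac(u), frac(v)
        ov = int((v - iv) * oversample)                # pick the tabulated kernel
        k = kernels[w_plane, ov, ou]
        grid[iv - support: iv + support + 1,
             iu - support: iu + support + 1] += sample * k

    grid = np.zeros((128, 128), dtype=complex)
    grid_one(grid, 40.37, 55.81, 3, 1.0 + 0j)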
Wide field imaging
• The w term is a convolution in data space, or a multiplication in image space

V(u,v,w) = \int \frac{I(l,m) \, e^{j 2\pi w \left( \sqrt{1-l^2-m^2} - 1 \right)}}{\sqrt{1-l^2-m^2}} \, e^{j 2\pi (u l + v m)} \, dl \, dm

• Slices in data space
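A minimal sketch (not from the slides) of those two faces of the w term: the same factor can be applied as a multiplication by a phase screen in the image plane, or, after an FFT, as a convolution kernel in the (u, v) plane (w projection). The field size and w value below are illustrative.

    import numpy as np

    n, fov = 256, 0.05                              # pixels and half-width in direction cosines
    l = np.linspace(-fov, fov, n)
    L, M = np.meshgrid(l, l)
    r2 = L**2 + M**2
    w = 1000.0                                      # baseline w in wavelengths
    # Multiply the image by this phase screen ...
    phase_screen = np.exp(2j * np.pi * w * (np.sqrt(1 - r2) - 1))
    # ... or, equivalently, convolve the uv data with its Fourier transform
    w_kernel = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(phase_screen)))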
• 6-10x faster than other gridders
• 37x faster than a dual CPU
• Based on a clever distribution of threads

An Unintuitive Approach (CALIM '11)
• One thread monitors all X (grid points)
• At any time, the convolution matrix covers one X
[Diagram: visibilities, convolution matrix, grid]
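A serial sketch of how I read this approach (an assumption on my part; the real gridder is a GPU kernel described in the CALIM '11 talk): each conceptual thread owns one offset within the convolution footprint, accumulates contributions in a register, and writes to the grid only when the grid point it covers changes, which saves memory traffic when successive visibilities fall in nearby cells.

    import numpy as np

    support = 3
    grid = np.zeros((128, 128), dtype=complex)
    kernel = np.ones((2 * support + 1, 2 * support + 1), dtype=complex)  # placeholder kernel

    def grid_offset(samples, dv, du):
        """Accumulate the (dv, du) kernel offset for a stream of (u, v, vis) samples."""
        acc, owned = 0.0 + 0j, None
        for u, v, vis in samples:
            cell = (int(v) + dv, int(u) + du)        # grid point this offset touches
            if cell != owned:                        # footprint moved: flush accumulator
                if owned is not None:
                    grid[owned] += acc
                acc, owned = 0.0 + 0j, cell
            acc += vis * kernel[dv + support, du + support]
        if owned is not None:
            grid[owned] += acc

    samples = [(40.2, 50.1, 1.0), (40.3, 50.2, 1.0), (60.7, 70.9, 1.0)]
    for dv in range(-support, support + 1):          # on a GPU, each (dv, du) would be a thread
        for du in range(-support, support + 1):
            grid_offset(samples, dv, du)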
• AWProject (2007): w projection + A projection (for the primary beam) (convolution)
  • Too much CPU
  • Too much memory for the convolution function
• AProjectWStack (2008): apply the w term in image space (convolution/multiplication)
  • Much less CPU
  • Too much memory for the w stack
• AWProject + trimmed convolution function (2009) (convolution)
  • Only apply and keep the non-zero part of the convolution function
  • Still too much memory for the convolution function
• AWProject + trimmed convolution function + multiple snapshot planes (2011) (convolution + slices)
  • Fit and remove a w = au + bv plane every 30-60 min (see the sketch below)
  • Small memory for the convolution function
• Serialise the normal equations piece-by-piece for MPI (2011)
  • Cuts down a short bump in memory use
• No current algorithm will scale as-is to full-field, longer baselines (ASKAP 6 km)
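A minimal sketch (not from the slides) of that snapshot plane fit: over a short interval, fit the best plane w ≈ au + bv through the sampled (u, v, w) points by least squares and subtract it, so only small residual w values remain to be handled by the convolution function. The coordinates below are simulated.

    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.uniform(-5000, 5000, 200)
    v = rng.uniform(-5000, 5000, 200)
    w = 0.3 * u - 0.1 * v + rng.normal(0, 5, 200)   # nearly coplanar snapshot
    A = np.column_stack([u, v])
    (a, b), *_ = np.linalg.lstsq(A, w, rcond=None)   # fit w = a*u + b*v
    w_residual = w - (a * u + b * v)                 # small residual w term left to grid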
• Correct for source changes over the ASKAP bandwidth
• MFClean in MIRIAD
• Multi-scale multi-frequency synthesis algorithm (see the sketch below)
  • Urvashi Rau PhD (CASS/NMT/NRAO)
  • Rau and Cornwell, A&A
• Parallel version in ASKAPsoft, currently memory hungry
• Also testing a Compressive Sampling algorithm
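A minimal per-pixel sketch of the multi-frequency idea, not the Rau & Cornwell algorithm itself: model the spectrum as I(nu) = I0 * (nu/nu0)**alpha, expand to first order in (nu - nu0)/nu0, and fit the Taylor coefficients across channels. All values are illustrative.

    import numpy as np

    nu0 = 1.4e9
    nu = np.linspace(0.7e9, 1.8e9, 32)               # ASKAP-like band
    I0_true, alpha_true = 2.0, -0.7
    spectrum = I0_true * (nu / nu0) ** alpha_true     # noiseless toy spectrum

    x = (nu - nu0) / nu0
    # First-order model: I(nu) ~ t0 + t1 * x, with t0 = I0 and t1 = I0 * alpha
    A = np.column_stack([np.ones_like(x), x])
    (t0, t1), *_ = np.linalg.lstsq(A, spectrum, rcond=None)
    I0_fit, alpha_fit = t0, t1 / t0                   # recover flux and spectral index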
• The basic gridding operation has very low arithmetic intensity
  • Becomes less and less acceptable with time
• ASKAPsoft is currently multi-process per node; it will have to convert to multi-threaded, one thread per core
  • Just about OK as-is for ASKAP at 2 km
• Memory relative to compute will shrink
  • Continual pressure to develop memory-lean algorithms
• Algorithms will become ever more complex
  • Autotuning may be needed
• Single-threaded and/or object-oriented libraries (e.g. casacore), now barely acceptable, will have to be rewritten
• I/O models may need to be revisited
  • MPI parallel I/O
  • Lustre, GPFS
  • HDF5?
• Existing code in 2006
  • Had much of the required functionality
  • Poorly structured for HPC
• Developed new code targeted at HPC
  • Expect to get to O(10,000) cores via MPI (barely)
• Developed multiple algorithms
  • The control space is very complex
  • Need to wring out code and algorithm complexity soon
• For the SKA, need to go to ~100 million-way concurrency
  • Code and algorithms all must change
• Conclusion: if scaling far enough, everything must change