Astronomy in the Petascale Data Era: storage, transfer and visualization

General introduction
• Big Data: a broad expression used in different contexts
  1. Lots of small to medium-sized files (a few 100 MB each, 1000s stored)
     • e.g. many sky surveys (SDSS, SAMI)
  2. A few very large files
     • e.g. Millennium simulation (64 files, each ~300 GB)
  3. Lots of very large files
     • e.g. LOFAR, MeerKAT, ASKAP, SKA

• Data growth has reached epidemic proportions
• New challenges arise:
  • Visualisation and analysis of large data samples
  • Transfer over networks with limited bandwidth
  • Storage in data storage facilities

• The probability of occurrence is linked to the amount of information
• Given a domain V containing n values
• If a value v ∈ V occurs f(v) times
• The amount of information it carries is given by its self-information, $I(v) = -\log p(v)$, where $p(v)$ is the relative frequency of v (in bits for a base-2 logarithm)

• The more frequently a word occurs, the less information it carries (and vice versa)
• Averaging the information over all words in a source V of n words gives Shannon's entropy (closely linked to Boltzmann's entropy and the second law of thermodynamics)
• This is the minimum size needed to represent the source perfectly
[Image source: supplychain247.com]
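
The link between frequency and information can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the toy word list are hypothetical, and the logarithm is taken to base 2 so that results are in bits.

```python
import math
from collections import Counter


def self_information(count, total):
    """Information, in bits, carried by a value seen `count` times out of `total`."""
    return -math.log2(count / total)


def shannon_entropy(words):
    """Average information per word, in bits, over a sequence of words."""
    counts = Counter(words)
    total = len(words)
    return sum((c / total) * self_information(c, total) for c in counts.values())


# Hypothetical source: "a" is frequent, "c" and "d" are rare.
source = ["a", "a", "a", "a", "b", "b", "c", "d"]

frequent_bits = self_information(4, len(source))  # "a" occurs 4 times out of 8
rare_bits = self_information(1, len(source))      # "c" occurs once out of 8
entropy_bits = shannon_entropy(source)            # average bits per word
```

Here the rare word carries more bits than the frequent one, and `entropy_bits * len(source)` is a lower bound on the number of bits a lossless code needs for this source under this simple frequency model.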

• Which data format or model is best for Big Data (FITS, HDF5, ASDF, JPEG2000)?
  e.g. Kitaeff et al., 2012; Kitaeff et al., 2014; Natusch, 2014; Mink et al., 2014; Peters & Kitaeff, 2014; Price et al., 2014; Vohl et al., 2015.

• JPEG2000 in astronomy?
  • ISO/IEC 15444 standard
  • Based on the Discrete Wavelet Transform
  • Well suited to quantization and compression
  • Supports progressive transmission, decoding of only part of the data, and regions of interest (ROI)
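
To see why a wavelet transform pairs well with quantization, here is a minimal sketch using one level of the Haar transform. This is a deliberate simplification: JPEG2000 itself uses the 5/3 and 9/7 wavelets, not Haar, and the signal values, function names and quantization step below are all hypothetical.

```python
def haar_forward(signal):
    """One level of the (unnormalised) Haar transform of an even-length signal."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]  # coarse, half-resolution version
    details = [(a - b) / 2 for a, b in pairs]   # what is needed to restore the pairs
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the signal from averages and details."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal


def quantize(values, step):
    """Uniform quantization: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


# Hypothetical 1D cut through an image: smooth background with one bright feature.
signal = [10.0, 10.5, 11.0, 10.5, 50.0, 52.0, 11.0, 10.0]

averages, details = haar_forward(signal)
lossless = haar_inverse(averages, details)

# Coarsely quantizing the small detail coefficients turns them all into zeros,
# which an entropy coder can then store very cheaply.
quantized_details = quantize(details, step=2.0)
lossy = haar_inverse(averages, quantized_details)
max_error = max(abs(a - b) for a, b in zip(signal, lossy))
```

The `averages` alone already form a half-resolution preview of the signal, which is the idea behind progressive transmission: send the coarse version first, then refine it with details.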

1. Storage and transfer
• Observational data (Peters & Kitaeff, 2014)
  • Can deliver high compression ratios without corrupting the data
  • e.g. at ~90:1:
    • Sources with low integrated flux (less than 800 mJy km/s) identified with Duchamp
    • Lossy compression can denoise
[Image: Duchamp results rendered with GraphTIVA (Hassan, Fluke & Barnes, 2010)]

• Observational data
  • Noisy by nature (i.e. instrumental noise)
• Theoretical data?
  • For numerical simulation data, lossy compression will tend to introduce additional noise
  • True?

• GERLUMPH: GPU-Enabled, High-Resolution, cosmological MicroLensing parameter survey
  • Currently ≈ 70,000 maps
  • http://gerlumph.swin.edu.au
  • (G. Vernardos, C. J. Fluke, N. Bate, D. Crotton, and myself)
[Figure: parameter space coverage; axes: convergence and external shear]

[Figure: original map, and maps convolved with small, medium and large quasar profiles. Vohl, Fluke & Vernardos (2015)]

• Near-lossless compression
  • Set a threshold on the difference between the original and decompressed data
  • RMSE <= 0.01
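
The thresholding idea can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: plain uniform quantization stands in for the JPEG2000 encoder, and the pixel values, candidate steps and function names are hypothetical. Only the RMSE <= 0.01 criterion comes from the slide.

```python
import math


def rmse(original, restored):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((o - r) ** 2 for o, r in zip(original, restored))
    return math.sqrt(squared / len(original))


def quantize(values, step):
    """Stand-in for a lossy round trip: snap values to multiples of `step`."""
    return [round(v / step) * step for v in values]


def coarsest_step(values, candidate_steps, max_rmse=0.01):
    """Largest candidate step whose round trip keeps RMSE <= max_rmse, else None."""
    best = None
    for step in sorted(candidate_steps):
        if rmse(values, quantize(values, step)) <= max_rmse:
            best = step
    return best


# Hypothetical pixel values and candidate quantization steps.
pixels = [1.234, 0.871, 2.455, 1.902, 0.333]
steps = [0.001, 0.01, 0.02, 0.05, 0.1, 0.5]

chosen = coarsest_step(pixels, steps)
```

Every candidate is checked rather than stopping at the first failure, because the RMSE of a quantized signal is not guaranteed to grow monotonically with the step size.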

[Figure: compression results for original maps and for convolved maps (small, medium, large). Vohl, Fluke & Vernardos (2015)]
• Original maps: at best 10:1
• Convolved maps: at best ~325,812:1!

[Figure: convolved map (medium) and compressed map (17,271:1). Vohl, Fluke & Vernardos (2015)]

[Figure: original map and compressed map (17,271:1). Vohl, Fluke & Vernardos (2015)]
• Suggests that a high level of compression is possible for convolved magnification maps, with minimal or no impact on their future scientific use

• Could shrink 2.5 petabytes to less than 1 terabyte (Vohl, Fluke & Vernardos, 2015)
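
As a rough check on the storage claim, the arithmetic can be written out. The sketch assumes decimal units (1 PB = 10^15 bytes) and, purely for illustration, a single compression ratio applied uniformly to the whole archive; in practice the ratio varies from map to map. The helper names are hypothetical.

```python
PB = 10**15  # bytes, decimal petabyte (assumption)
TB = 10**12  # bytes, decimal terabyte (assumption)


def compressed_size(size_bytes, ratio):
    """Size after compression at `ratio`:1."""
    return size_bytes / ratio


def ratio_needed(size_bytes, target_bytes):
    """Average compression ratio needed to fit `size_bytes` into `target_bytes`."""
    return size_bytes / target_bytes


archive = 2.5 * PB

# Average ratio required to bring the archive under 1 TB.
min_ratio = ratio_needed(archive, 1 * TB)

# Archive size, in TB, if every map compressed at the 17,271:1 shown for one map.
size_at_17271_tb = compressed_size(archive, 17271) / TB
```

Under these assumptions an average ratio of 2,500:1 is enough to reach 1 TB, well below the 17,271:1 example.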

• Current and upcoming large-scale extragalactic surveys
  • Large numbers of spectral cubes, each with a large number of sources
  • Much of the analysis will be done with automated pipelines
  • Humans are still a big part of the discovery process (!)
  • Visualization

• Classical desktop-based visualization methodology is inefficient for large-scale spectral-cube surveys
  • e.g. APERTIF radio survey of the northern sky (Röttgering et al. 2011)
  • 20,000 spectral cubes, each containing ~100 sources

• Next-generation immersive 3D environments
  • Designed for large amounts of data in a collaborative setup
  • e.g. Monash CAVE2

• There is currently no out-of-the-box solution for spectral-cube data sets
• We developed a visualization system

• Deeper, Wider, Faster campaign: search for optical counterparts of FRBs
  Jeff Cooke, Tyler Pritchard, Igor Andreoni, Emily Petroff, Sarah Burke-Spolaor, Christopher Flynn, Michael Murphy, Evan Keane, Manisha Caleb, Stephi Bernard, Bernard Meade, Christopher Fluke, and myself!
[Figure: DECam @ CTIO and Parkes beams overlaid]
• DECam
  • 62 science CCDs
  • 520 megapixels in total
  • Images 3 square degrees (2.2 degree wide field) at 0.263 arcsecond/pixel resolution
• Each FITS file: ~1 GB
• Shipping from Chile to Australia: ~20 min/file
• With JPEG2000 at ~30:1: ~1 min/file
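
The transfer figures can be sanity-checked with a short calculation. This sketch assumes the link delivers the same effective bandwidth for compressed and uncompressed files and ignores compression and decompression time; the variable and function names are hypothetical, while the sizes, times and ratio are the approximate values quoted on the slide.

```python
def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to move `size_bytes` over a link with the given effective bandwidth."""
    return size_bytes / bandwidth_bytes_per_s


file_bytes = 1e9             # ~1 GB per FITS file
observed_seconds = 20 * 60   # ~20 min per file, Chile to Australia

# Effective bandwidth implied by the uncompressed transfer (bytes per second).
bandwidth = file_bytes / observed_seconds

# Same link, file compressed at ~30:1.
compressed_bytes = file_bytes / 30
compressed_seconds = transfer_seconds(compressed_bytes, bandwidth)
```

The result is about 40 seconds of pure transfer per file, the same order as the ~1 min/file quoted; the remainder may be compression overhead, although that is an assumption rather than something stated in the talk.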

• Deeper, Wider, Faster campaign: search for optical counterparts of FRBs
[Image: OziPortal @ University of Melbourne. Credit: Bernard Meade]