Dany Vohl - .Astronomy7 - Astronomy in the Petascale Data Era: storage, transfer and visualization

Dany Vohl
November 04, 2015

Slides from my invited talk presented at .Astronomy 7 (Sydney, NSW, Australia, 2015)

Transcript

  1. Astronomy in the Petascale Data Era: storage, transfer and visualization

    Dany Vohl, Centre for Astrophysics & Supercomputing
  2. Dany Vohl | | Astronomy in the Petascale Data Era:

    storage, transfer and visualization

    General introduction: Petascale Astronomy Era
    • Big Data: a broad expression used in different contexts
      1. Lots of small to medium sized files (a few 100 MB each, 1000s stored)
         • e.g. many sky surveys (SDSS, SAMI)
      2. A few very large files
         • e.g. Millennium simulation (64 files, each ~300 GB)
      3. Lots of very large files
         • LOFAR, MeerKAT, ASKAP, SKA
  3. General introduction: Petascale Astronomy Era

    • Data growth has reached epidemic proportions
    • New challenges arise:
      • Visualisation and analysis of large data samples
      • Transfer over a network with limited bandwidth
      • Storage in data storage facilities
  4. 1. Storage and transfer: Data Compression
  5. Data Compression

    What is Data Compression? LOL
  6. Data Compression

    • Probability of occurrence is linked to amount of information
    • Given a domain V containing n values
    • If a value v ∈ V occurs f(v) times
    • The amount of information it carries is given by (equation shown on the slide)
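The quantity this slide leads up to is presumably the standard self-information. A sketch, assuming that n counts every occurrence in the source so that the probability of v is p(v) = f(v)/n (this reading of n is an assumption, not something the transcript states):

```latex
p(v) = \frac{f(v)}{n}, \qquad
I(v) = -\log_2 p(v) = \log_2 \frac{n}{f(v)} \quad \text{(bits)}
```

With this definition a value seen in half of all occurrences carries 1 bit, and one seen once in 1024 occurrences carries 10 bits.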
  7. Data Compression

    • The more frequently a word occurs, the less information it carries (and vice versa)
    • Averaging the information over all words in a source V of n words gives Shannon’s entropy (closely linked to Boltzmann's entropy in thermodynamics)
    • Minimum size needed to represent a source perfectly (i.e. losslessly)

    Image source: supplychain247.com
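The relation between word frequency, information and entropy can be sketched in a few lines of Python. This is an illustration only: the function names and the toy source are made up for the example, and the lower bound it computes holds for a simple model in which words are coded independently of one another.

```python
import math
from collections import Counter


def self_information(count: int, total: int) -> float:
    """Information, in bits, carried by a word seen `count` times out of `total`."""
    return -math.log2(count / total)


def shannon_entropy(words: list[str]) -> float:
    """Average information per word, in bits (Shannon entropy of the source)."""
    total = len(words)
    counts = Counter(words)
    return sum((c / total) * self_information(c, total) for c in counts.values())


# Toy source (hypothetical): "a" is frequent, "c" and "d" are rare.
source = "a a a a b b c d".split()

entropy = shannon_entropy(source)   # 1.75 bits per word
min_bits = entropy * len(source)    # 14 bits: lower bound for this 8-word source
```

Here the frequent word "a" carries 1 bit per occurrence while the rare words "c" and "d" carry 3 bits each, which is why a good lossless code gives short codewords to frequent words.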
  8. Data Compression

    Data Compression: two categories
  9. Data Compression

    Data Compression: two categories
  10. Data Compression

    • Which data format/model is best for Big Data (FITS, HDF5, ASDF, JPEG2000)?

    e.g. Kitaeff et al., 2012; Kitaeff et al., 2014; Natusch, 2014; Mink et al., 2014; Peters & Kitaeff, 2014; Price et al., 2014; Vohl et al., 2015.
  11. Data Compression

    • Which data format/model is best for Big Data (FITS, HDF5, ASDF, JPEG2000)?

    e.g. Kitaeff et al., 2012; Kitaeff et al., 2014; Natusch, 2014; Mink et al., 2014; Peters & Kitaeff, 2014; Price et al., 2014; Vohl et al., 2015.
  12. Data Compression

    • JPEG2000 in astronomy?
      • ISO/IEC 15444 standard
      • Based on the Discrete Wavelet Transform
      • Well suited for quantization and compression
      • Progressive transmission, decoding of part of the data, region of interest (ROI)
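Why a wavelet transform suits quantization can be shown with a small NumPy sketch. Note the substitution: JPEG2000 itself uses the 5/3 and 9/7 wavelets on 2D tiles over several decomposition levels, followed by entropy coding; the example below uses a single level of the much simpler Haar wavelet on a 1D signal, purely for illustration. All function names and the sample values are hypothetical.

```python
import numpy as np


def haar_forward(signal: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One level of the orthonormal Haar transform on an even-length 1D signal."""
    pairs = signal.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)   # smooth part
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)   # local differences
    return approx, detail


def haar_inverse(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    """Invert haar_forward."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantization: snap each coefficient to a multiple of `step`."""
    return np.round(coeffs / step) * step


# Hypothetical row of pixel values: flat background with one bright feature.
signal = np.array([10.0, 10.2, 9.9, 10.1, 50.0, 50.3, 10.0, 9.8])

approx, detail = haar_forward(signal)
detail_q = quantize(detail, step=1.0)            # small details collapse to zero
restored = haar_inverse(approx, detail_q)

zeroed = int(np.count_nonzero(detail_q == 0))    # coefficients that cost almost nothing to store
max_error = float(np.max(np.abs(signal - restored)))
```

In this toy case every detail coefficient quantizes to zero, so only the four approximation coefficients need to be stored, while the bright feature is still reconstructed to within a fraction of a unit.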
  13. Data Compression: JPEG2000

    • Observational data (Peters & Kitaeff, 2014)
      • Can deliver a high compression ratio without corrupting the data
      • e.g. at ~90:1
        • low integrated flux sources (less than 800 mJy km/s) identified with Duchamp
        • lossy compression can denoise

    Image: Duchamp results rendered with GraphTIVA (Hassan, Fluke & Barnes, 2010)
  14. Data Compression: JPEG2000

    • Observational data
      • noisy by nature (i.e. instrumental noise)
    • Theoretical data?
      • For numerical simulation data, lossy compression will tend to introduce additional noise
      • True?
  15. Data Compression: JPEG2000

    GPU-Enabled, High-Resolution, cosmological MicroLensing parameter survey (GERLUMPH)
    • Currently ≈ 70,000 maps
    • http://gerlumph.swin.edu.au

    Figure: parameter space coverage (axes: convergence, external shear)

    (G. Vernardos, C. J. Fluke, N. Bate, D. Croton, and myself)
  16. Data Compression: JPEG2000
  17. Data Compression: JPEG2000

    Figure panels: original map; map convolved with small quasar profile; map convolved with medium quasar profile; map convolved with large quasar profile. Vohl, Fluke & Vernardos (2015)
  18. Data Compression: JPEG2000

    Figure panels: original; map convolved with profile (three panels). Labels: integers, floats. Vohl, Fluke & Vernardos (2015)
  19. Data Compression: JPEG2000

    • Near-lossless compression
      • Set a threshold on the difference between the original and the decompressed data
      • RMSE <= 0.01
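The near-lossless criterion can be written as a small check. This is a minimal sketch, assuming the RMSE is taken over all pixels between the original map and the map recovered after compression; the function names and sample arrays are hypothetical, and only the 0.01 threshold comes from the slide.

```python
import numpy as np


def rmse(original: np.ndarray, decompressed: np.ndarray) -> float:
    """Root-mean-square error between two arrays of the same shape."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(decompressed, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def is_near_lossless(original: np.ndarray, decompressed: np.ndarray,
                     threshold: float = 0.01) -> bool:
    """True when the round trip through the codec stays within the RMSE threshold."""
    return rmse(original, decompressed) <= threshold


# Hypothetical 2x2 map and the result of a compress/decompress round trip.
original = np.array([[1.00, 2.00], [3.00, 4.00]])
decompressed = np.array([[1.01, 1.99], [3.00, 4.01]])

accepted = is_near_lossless(original, decompressed)          # RMSE is about 0.0087
rejected = is_near_lossless(original, original + 0.02)       # RMSE is 0.02
```

In practice such a check would be run for increasing compression ratios, keeping the highest ratio that still passes.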
  20. Data Compression: JPEG2000

    • Original maps: at best 10:1
    • Convolved maps: at best ~325,812:1!

    Figure panels: original maps; convolved maps (small); convolved maps (medium); convolved maps (large). Vohl, Fluke & Vernardos (2015)
  21. Data Compression: JPEG2000

    Figure panels: convolved map (medium); compressed map (17,271:1). Vohl, Fluke & Vernardos (2015)
  22. Data Compression: JPEG2000

    Figure panels: compressed map (17,271:1); convolved map (medium). Vohl, Fluke & Vernardos (2015)
  23. Data Compression: JPEG2000

    Figure panels: compressed map (17,271:1); convolved map (medium). Vohl, Fluke & Vernardos (2015)
  24. Data Compression: JPEG2000

    Figure panels: original map; compressed map (17,271:1)

    Suggests that a high level of compression is possible for convolved magnification maps, with minimal or no impact on their future scientific use. Vohl, Fluke & Vernardos (2015)
  25. Data Compression: JPEG2000

    Figure panels: original map; compressed map (17,271:1)

    Could shrink 2.5 petabytes to less than 1 terabyte. Vohl, Fluke & Vernardos (2015)
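The storage claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes) and, for the second figure, that a single ratio applies uniformly to the whole archive; neither assumption is stated on the slides, and the 17,271:1 ratio is simply the example shown for one medium-convolved map.

```python
PETABYTE = 10**15
TERABYTE = 10**12
GIGABYTE = 10**9

archive_bytes = 2.5 * PETABYTE     # archive size quoted on the slide
target_bytes = 1 * TERABYTE        # "less than 1 terabyte"

# Average compression ratio needed to reach the target.
required_ratio = archive_bytes / target_bytes        # 2500.0, i.e. 2,500:1

# Illustration only: what the example ratio would give if applied everywhere.
example_ratio = 17_271
compressed_bytes = archive_bytes / example_ratio
compressed_gb = compressed_bytes / GIGABYTE          # roughly 145 GB
```

Any average ratio above 2,500:1 meets the target, so the claim does not depend on every map reaching the best-case ratios.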
  26. 2. Visualization and Analysis: Large-scale spectral-cube surveys
  27. Large-scale spectral-cube surveys

    • Current and upcoming large-scale extragalactic surveys
      • large number of spectral cubes, each with a large number of sources
    • Much of the analysis will be done with automated pipelines
    • Humans are still a big part of the discovery process (!)
    • Visualization
  28. Large-scale spectral-cube surveys

    • Classical desktop-based visualization methodology is inefficient for visualizing large-scale spectral-cube surveys
      • e.g. APERTIF radio survey of the northern sky (Röttgering et al. 2011)
      • 20,000 spectral cubes, each containing ~100 sources
  29. Large-scale spectral-cube surveys

    • Next generation immersive 3D environments
      • Designed for large amounts of data in a collaborative setup
      • e.g. Monash CAVE2
  30. Large-scale spectral-cube surveys

    • There is currently no out-of-the-box solution for spectral-cube data sets
    • We developed a visualization system
  31. Large-scale spectral-cube surveys
  32. Let’s wrap up!

    Compression and large display wall
  33. Let’s wrap up!

    • Deeper, Wider, Faster campaign: search for optical counterparts of FRBs

    Jeff Cooke, Tyler Pritchard, Igor Andreoni, Emily Petroff, Sarah Burke-Spolaor, Christopher Flynn, Michael Murphy, Evan Keane, Manisha Caleb, Stephi Bernard, Bernard Meade, Christopher Fluke, and myself!

    Image: DECam @ CTIO and Parkes’ beams overlaid

    • DECam
      • 62 science CCDs
      • 520 megapixels in total
      • images 3 square degrees (2.2 degree wide field) at 0.263 arcsecond/pixel resolution
  34. Let’s wrap up!

    • Deeper, Wider, Faster campaign: search for optical counterparts of FRBs

    Jeff Cooke, Tyler Pritchard, Igor Andreoni, Emily Petroff, Sarah Burke-Spolaor, Christopher Flynn, Michael Murphy, Evan Keane, Manisha Caleb, Stephi Bernard, Bernard Meade, Christopher Fluke, and myself!

    Image: DECam @ CTIO and Parkes’ beams overlaid

    • DECam
      • 62 science CCDs
      • 520 megapixels in total
      • images 3 square degrees (2.2 degree wide field) at 0.263 arcsecond/pixel resolution

    • Each FITS file: ~1 GB
    • Shipping from Chile to Australia: ~20 min/file
    • With JPEG2000 at ~30:1: ~1 min/file
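The transfer figures can be sanity-checked with simple arithmetic. A rough sketch, assuming 1 GB = 1000 MB and that the effective Chile-to-Australia throughput stays the same for compressed files; the variable names are illustrative and compression/decompression time is not modelled.

```python
file_mb = 1000.0            # ~1 GB FITS file
transfer_minutes = 20.0     # quoted time to ship one uncompressed file
compression_ratio = 30.0    # JPEG2000 at ~30:1

# Effective throughput implied by the uncompressed transfer.
throughput_mb_per_s = file_mb / (transfer_minutes * 60.0)    # about 0.83 MB/s

# Size and transfer time of the compressed file at the same throughput.
compressed_mb = file_mb / compression_ratio                  # about 33 MB
compressed_transfer_s = compressed_mb / throughput_mb_per_s  # about 40 s
```

The result of roughly 40 seconds is consistent with the quoted ~1 min/file; the remainder may be codec time or other overheads, which this sketch leaves out.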
  35. Large-scale spectral-cube surveys

    • Deeper, Wider, Faster campaign: search for optical counterparts of FRBs

    Jeff Cooke, Tyler Pritchard, Igor Andreoni, Emily Petroff, Sarah Burke-Spolaor, Christopher Flynn, Michael Murphy, Evan Keane, Manisha Caleb, Stephi Bernard, Bernard Meade, Christopher Fluke, and myself!

    Image credit: Bernard Meade, OziPortal @ University of Melbourne