Sharded Light-curve Database

Bart Scheers
LOFAR Transients Key Project Meeting, Meudon, December 2011

transientskp

June 23, 2012

Transcript

  1. Sharded Light-curve Database
    Bart Scheers
    Astronomical Institute Anton Pannekoek, University of Amsterdam
    Centre for Mathematics & Informatics (CWI)
    LOFAR TKP Meeting, Meudon, 2011-12-14
  2. LOFAR Catalogue (of Light Curves)
    ▸ List of all sources detected at least once by LOFAR
    ▸ Multiple observations per source
      ▹ Light curves
      ▹ Adding time domain → dynamic catalogue
    ▸ Keep track of 'meta-data'
      ▹ image properties (noise)
      ▹ observation characteristics
    ▸ Make available for data mining/discovery
      ▹ Scalable
    LOFAR TKP Meeting – 2011-12-14 Bart Scheers
  3. What do we expect?
    ▸ Full operation: 50 – 100 TB/yr
    ▸ Peaks: 10,000 sources per second
    ▸ Distinct sources: ~10⁷ – 10⁸
      ▹ which are revisited many, many, many times
    ▸ These numbers call for bulk-processing
      ▹ maintain statistical representations of data
      ▹ spread data over multiple nodes
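
The slides do not say which key is used to spread the data over nodes. As a minimal sketch, assuming the sky is cut into declination strips that are dealt out round-robin, a source could be mapped to a node as follows. The names `zone_of` and `shard_of`, the strip height and the node count are all hypothetical.

```python
import math


def zone_of(decl_deg, zone_height_deg=1.0):
    """Index of the declination strip containing decl_deg (range -90..+90)."""
    return int(math.floor((decl_deg + 90.0) / zone_height_deg))


def shard_of(decl_deg, n_nodes, zone_height_deg=1.0):
    """Node index in [0, n_nodes) that would store a source at this declination.

    Assumption: strips are assigned to nodes round-robin, so all sources in
    one strip live on the same node.
    """
    return zone_of(decl_deg, zone_height_deg) % n_nodes


# A source at declination +52.9 deg with 16 nodes and 1-degree strips.
node = shard_of(52.9, n_nodes=16)
```

With a key like this, repeated detections of one source always reach the same node, so its running statistics can be updated locally. Sources close to a strip border would still need their neighbouring strip checked.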
  4. Source Association
    Dynamic (updated after every image)
    Static (but updated after every db instantiation)
  5. TKP Data(base) flow
    Light-curve Database, Long-term Archive, 50 – 100 TB/yr
    TRAP Database, during observations, ≲ 500 GB
  6. Source Association
    (1) distance on sky
    (2) dimensionless distance
    (3) likelihood ratio
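
The formulas behind these measures are not in the transcript. The sketch below shows one common way to compute the first two: a great-circle separation via the haversine formula, and a positional offset scaled by the combined 1σ position errors. The function names are hypothetical, all angles are assumed to be in degrees, and the RA errors are assumed to be on-sky errors. The likelihood ratio (3) is not covered here.

```python
import math


def angular_separation_deg(ra1, decl1, ra2, decl2):
    """Great-circle distance in degrees (haversine formula); inputs in degrees."""
    ra1, decl1, ra2, decl2 = map(math.radians, (ra1, decl1, ra2, decl2))
    h = (math.sin((decl2 - decl1) / 2) ** 2
         + math.cos(decl1) * math.cos(decl2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def dimensionless_distance(ra1, decl1, ra_err1, decl_err1,
                           ra2, decl2, ra_err2, decl_err2):
    """Positional offset in units of the combined 1-sigma errors.

    Small-angle approximation; does not handle the RA wrap at 0/360 degrees.
    """
    d_ra = (ra1 - ra2) * math.cos(math.radians((decl1 + decl2) / 2))
    d_decl = decl1 - decl2
    return math.sqrt(d_ra ** 2 / (ra_err1 ** 2 + ra_err2 ** 2)
                     + d_decl ** 2 / (decl_err1 ** 2 + decl_err2 ** 2))
```

Because the second measure is expressed in units of the measurement errors, a single cut-off value can be applied to sources with very different position accuracies.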
  8. Source Variability
    (1) absolute flux change
    (2) weighted flux change
    Maintain 6 properties
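
The transcript does not list the six properties. One plausible reading is that they are running aggregates per source, which allow both variability measures to be updated after every image without re-reading the light curve. The sketch below assumes the six values are the number of data points and five running averages. It also assumes the two measures are the relative flux scatter (called `v_nu` here) and the reduced χ² around the weighted mean flux (called `eta_nu` here). All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningLightCurve:
    """Six running aggregates per source (an assumed set, see lead-in)."""
    n: int = 0                # number of flux measurements
    avg_f: float = 0.0        # mean of f
    avg_f_sq: float = 0.0     # mean of f^2
    avg_w: float = 0.0        # mean of w, with w = 1 / err^2
    avg_wf: float = 0.0       # mean of w * f
    avg_wf_sq: float = 0.0    # mean of w * f^2

    def add(self, flux, flux_err):
        """Fold one new measurement into the running averages."""
        w = 1.0 / flux_err ** 2
        self.n += 1
        k = 1.0 / self.n
        self.avg_f += k * (flux - self.avg_f)
        self.avg_f_sq += k * (flux ** 2 - self.avg_f_sq)
        self.avg_w += k * (w - self.avg_w)
        self.avg_wf += k * (w * flux - self.avg_wf)
        self.avg_wf_sq += k * (w * flux ** 2 - self.avg_wf_sq)

    def v_nu(self):
        """Sample standard deviation of the flux divided by its mean."""
        if self.n < 2:
            return 0.0
        var = self.n / (self.n - 1) * (self.avg_f_sq - self.avg_f ** 2)
        return math.sqrt(max(var, 0.0)) / self.avg_f

    def eta_nu(self):
        """Reduced chi-square of the fluxes around their weighted mean."""
        if self.n < 2:
            return 0.0
        return (self.n / (self.n - 1)
                * (self.avg_wf_sq - self.avg_wf ** 2 / self.avg_w))
```

Each new detection then costs a constant-time update, whatever the length of the light curve.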
  9. Association & Variability Probabilities
    ▸ Rayleigh distribution
    ▸ η_ν behaves as chi square probability
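
A minimal sketch of how these two distributions turn the measures into probabilities. It assumes that the dimensionless distance r of a true association follows a unit Rayleigh distribution, and that (N − 1)·η_ν follows a χ² distribution with N − 1 degrees of freedom for a non-variable source. The function names are hypothetical. Only the standard library is used, so the χ² tail is built from its textbook recurrence.

```python
import math


def rayleigh_sf(r):
    """P(R > r) for a unit Rayleigh distribution."""
    return math.exp(-r * r / 2.0)


def chi2_sf(x, dof):
    """P(X > x) for a chi-square distribution with integer dof >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h^a e^(-h) / Gamma(a + 1) with h = x / 2,
    started from Q(1/2, h) = erfc(sqrt(h)) or Q(1, h) = exp(-h).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < dof / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def variability_probability(eta_nu, n_points):
    """Probability that a scatter this large is not due to noise alone."""
    dof = n_points - 1
    return 1.0 - chi2_sf(dof * eta_nu, dof)
```

For example, `rayleigh_sf(r)` gives the fraction of genuine counterparts that would be missed if associations were cut off at dimensionless distance `r`.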
  10. Global Sky Model
    ▸ Get all VLSS sources within the field of view
    ▸ Find the counterpart (if any) in the WENSS and NVSS catalogues
    ▸ Fit spectral index, curvature and higher-order curvature terms
    ▸ Create source-list file
    ▸ Wanted: No VLSS in FoV, use WENSS as base
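
The fitting step can be illustrated with a first-order fit only: a straight line in log-log space through the catalogue fluxes, leaving out the curvature terms. This is a sketch, not the model's actual code. It assumes the convention S ∝ ν^α, survey frequencies of 74 MHz (VLSS), 325 MHz (WENSS) and 1400 MHz (NVSS), and a reference frequency of 60 MHz. The function names are hypothetical.

```python
import math


def fit_spectral_index(freqs_hz, fluxes_jy, ref_freq_hz=60e6):
    """Least-squares fit of log10(S) = log10(S0) + alpha * log10(nu / nu0).

    Returns (alpha, S0), where S0 is the flux at the reference frequency.
    """
    xs = [math.log10(f / ref_freq_hz) for f in freqs_hz]
    ys = [math.log10(s) for s in fluxes_jy]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, 10 ** (my - alpha * mx)


def expected_flux(freq_hz, alpha, s0_jy, ref_freq_hz=60e6):
    """Flux predicted by the fitted power law at freq_hz."""
    return s0_jy * (freq_hz / ref_freq_hz) ** alpha


# Made-up catalogue fluxes (Jy) for one source in VLSS, WENSS and NVSS.
survey_freqs = [74e6, 325e6, 1400e6]
survey_fluxes = [8.0, 2.9, 1.0]
alpha, s0 = fit_spectral_index(survey_freqs, survey_fluxes)
flux_at_150mhz = expected_flux(150e6, alpha, s0)
```

With only two catalogue matches the fit reduces to the two-point spectral index. With three, a curvature term could be added by fitting a parabola instead of a line.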
  11. Global Sky Model
    gsm.expected_fluxes_in_fov(conn, ra_c, decl_c, fov_radius, assoc_theta, 'bbs.skymodel.test', storespectraplots=True)
  12. We want to mine more...
    ▸ Detecting trends
      ▹ n sequential data points mσ above average
    ▸ Systematic structure of light curve
      ▹ Ratio of the mean square successive difference to the sample variance
    ▸ FTs, cross- & auto-correlations; all work with (varying) window sizes → SciQL
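
The first two measures can be sketched in plain Python for a single, evenly sampled light curve. The function names are hypothetical. The average and σ are taken over the whole light curve, which is an assumption. The second function is the ratio usually known as the von Neumann ratio.

```python
import statistics


def runs_above(fluxes, n, m):
    """Start index of every run of at least n consecutive points that lie
    more than m standard deviations above the mean of the light curve."""
    threshold = statistics.fmean(fluxes) + m * statistics.stdev(fluxes)
    starts, run = [], 0
    for i, flux in enumerate(fluxes):
        run = run + 1 if flux > threshold else 0
        if run == n:  # report each run once, when it reaches length n
            starts.append(i - n + 1)
    return starts


def von_neumann_ratio(fluxes):
    """Mean square successive difference divided by the sample variance.

    Close to 2 for uncorrelated noise, well below 2 for smooth trends.
    """
    n = len(fluxes)
    mssd = sum((b - a) ** 2 for a, b in zip(fluxes, fluxes[1:])) / (n - 1)
    return mssd / statistics.variance(fluxes)


light_curve = [1, 1, 1, 1, 1, 1, 5, 5, 5, 1]
flare_starts = runs_above(light_curve, n=3, m=1)
```

Both functions need the whole light curve in memory, which is why the slide points to window-based array queries for running them in bulk.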
  13. SciQL – Cross-Correlation Example
    ▸ Extend SQL2003 → SciQL
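
The SciQL query shown on this slide did not survive in the transcript. To show what such a query would compute, here is a sketch in Python rather than SciQL. It is a normalised cross-correlation of two evenly sampled light curves at an integer lag, plus a search for the lag with the strongest correlation. The function names are hypothetical.

```python
import math
import statistics


def cross_correlation(x, y, lag):
    """Normalised correlation of x[t] with y[t + lag] over their overlap."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))
    return num / den if den else 0.0


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x, y, lag))


# The same flare seen in two bands, two time steps later in the second one.
band_a = [0, 0, 1, 3, 1, 0, 0, 0]
band_b = [0, 0, 0, 0, 1, 3, 1, 0]
delay = best_lag(band_a, band_b, max_lag=3)
```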
  14. SciLens Platform (original)
    ▸ Computational top tier (1 node)
    ▸ High-end tier (16 nodes)
    ▸ Cloud-oriented tier (64 nodes)
    ▸ Energy-conservative tier (256 nodes)
  15. SciLens Platform (current)
    ▸ Top and high-end tier not built yet
    ▸ Cloud-oriented tier
      ▹ 144 Rocks, single quad cores, 16 GB, 0.5 or 1 TB SSD, 1 × 2 TB, InfiniBand (40 Gb/s)
    ▸ Bottom tier
      ▹ 144 Pebbles, AMD Bobcat, 8 GB, 5 × 2 TB, Ethernet (1 Gb/s)
  19. Conclusions
    ▸ Statistical representation of full LOFAR catalogue relaxes source association
    ▸ Sharded database reduces replication
    ▸ Together with SciLens the infrastructure is scalable
    ▸ SciQL extends data mining opportunities