Slide 1

Slide 1 text

Sharded Light-curve Database
Bart Scheers
Astronomical Institute Anton Pannekoek, University of Amsterdam
Centre for Mathematics & Informatics (CWI)
LOFAR TKP Meeting, Meudon, 2011-12-14

Slide 2

Slide 2 text

1 LOFAR TKP Meeting – 2011-12-14 Bart Scheers
LOFAR Catalogue (of Light Curves)
▸ List of all sources detected at least once by LOFAR
▸ Multiple observations per source
  ▹ Light curves
  ▹ Adding time domain → dynamic catalogue
▸ Keep track of 'meta-data'
  ▹ image properties (noise)
  ▹ observation characteristics
▸ Make available for data mining/discovery
  ▹ Scalable

Slide 3

Slide 3 text

What do we expect?
▸ Full operation: 50 – 100 TB/yr
▸ Peaks: 10,000 sources per second
▸ Distinct sources: ~10^7 – 10^8
  ▹ which are revisited many, many, many times
▸ These numbers call for bulk-processing
  ▹ maintain statistical representations of data
  ▹ spread data over multiple nodes
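The slide asks for data to be spread over multiple nodes but does not say how sources are assigned to a node. A minimal sketch of one possible scheme, assuming sources are sharded by declination zone: the zone height, the number of shards, and the names `zone_of` and `shard_of` are all hypothetical and not taken from the talk.

```python
import math

ZONE_HEIGHT_DEG = 1.0   # assumed zone height in degrees (hypothetical)
N_SHARDS = 16           # assumed number of database nodes (hypothetical)


def zone_of(decl_deg: float) -> int:
    """Integer declination zone: floor(declination / zone height)."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must be within [-90, 90] degrees")
    return math.floor(decl_deg / ZONE_HEIGHT_DEG)


def shard_of(decl_deg: float) -> int:
    """Map a source to a shard by distributing zones round-robin over nodes."""
    # Shift so that the southernmost zone gets index 0.
    zone_index = zone_of(decl_deg) + math.ceil(90.0 / ZONE_HEIGHT_DEG)
    return zone_index % N_SHARDS
```

With such a scheme every measurement of the same source lands on the same node. A cone search near a zone boundary would still have to consult the neighbouring zone, which may live on a different shard.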

Slide 4

Slide 4 text

Source Association

Slide 5

Slide 5 text

Source Association
Dynamic (updated after every image)
Static (but updated after every db instantiation)

Slide 6

Slide 6 text

TKP Data(base) flow

Slide 7

Slide 7 text

TKP Data(base) flow
Light-curve Database: long-term archive, 50 – 100 TB/yr
TRAP Database: during observations, ≲ 500 GB
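To put the archive volume into perspective, the yearly figure can be converted into an average sustained write rate. This is plain arithmetic on the numbers quoted on the slide, assuming decimal units (1 TB = 10^12 bytes) and a year of 365.25 days; the function name is only illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def mean_rate_mb_per_s(tb_per_year: float) -> float:
    """Average sustained write rate in MB/s for a yearly volume in TB."""
    return tb_per_year * 1e12 / SECONDS_PER_YEAR / 1e6


low = mean_rate_mb_per_s(50)
high = mean_rate_mb_per_s(100)
print(f"{low:.2f} to {high:.2f} MB/s")
```

The average comes out at only a few MB/s, so the harder requirement is the peak load rather than the mean volume.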

Slide 9

Slide 9 text

Source Association
(1) distance on sky
(2) dimensionless distance
(3) likelihood ratio
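The first two association measures can be sketched as follows. This is an illustration under assumptions, not the TKP implementation: the function names are hypothetical, all inputs are in degrees, the position errors are taken to be 1-sigma on-sky errors, and the dimensionless distance uses the common form in which each coordinate offset is scaled by the combined positional error. The likelihood ratio also needs a background source density, which the slide does not specify, so it is left out.

```python
import math


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def dimensionless_distance(ra1, dec1, ra_err1, dec_err1,
                           ra2, dec2, ra_err2, dec_err2):
    """Position offsets scaled by the combined 1-sigma errors."""
    mean_dec = math.radians((dec1 + dec2) / 2)
    # Wrap the RA difference into [-180, 180) and project it onto the sky.
    d_ra = ((ra1 - ra2 + 180.0) % 360.0 - 180.0) * math.cos(mean_dec)
    d_dec = dec1 - dec2
    return math.sqrt(d_ra ** 2 / (ra_err1 ** 2 + ra_err2 ** 2)
                     + d_dec ** 2 / (dec_err1 ** 2 + dec_err2 ** 2))
```

Because the second measure is expressed in units of the position uncertainty, a single cut-off value can be applied to bright, well-located sources and faint, poorly located ones alike.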

Slide 11

Slide 11 text

Source Variability
(1) absolute flux change
(2) weighted flux change
Maintain 6 properties
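The slide does not list the six maintained properties. A minimal sketch, assuming they are the number of measurements plus five running means (flux, flux squared, weight, weight times flux, weight times flux squared, with weight w = 1/σ²), shows how both variability measures can be updated after every image without rereading the light curve. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    n: int = 0
    avg_i: float = 0.0        # mean flux
    avg_i_sq: float = 0.0     # mean squared flux
    avg_w: float = 0.0        # mean weight, w = 1 / sigma^2
    avg_w_i: float = 0.0      # mean of w * flux
    avg_w_i_sq: float = 0.0   # mean of w * flux^2

    def add(self, flux: float, flux_err: float) -> None:
        """Fold one new measurement into the running means."""
        w = 1.0 / flux_err ** 2
        n = self.n + 1
        for name, x in (("avg_i", flux), ("avg_i_sq", flux ** 2),
                        ("avg_w", w), ("avg_w_i", w * flux),
                        ("avg_w_i_sq", w * flux ** 2)):
            old = getattr(self, name)
            setattr(self, name, old + (x - old) / n)
        self.n = n

    def v(self) -> float:
        """Unweighted measure: sample standard deviation over mean flux."""
        if self.n < 2:
            return 0.0
        var = self.n / (self.n - 1) * (self.avg_i_sq - self.avg_i ** 2)
        return math.sqrt(max(var, 0.0)) / self.avg_i

    def eta(self) -> float:
        """Weighted measure: reduced chi-square about the weighted mean."""
        if self.n < 2:
            return 0.0
        return (self.n / (self.n - 1)
                * (self.avg_w_i_sq - self.avg_w_i ** 2 / self.avg_w))
```

Storing only these aggregates keeps the per-source state constant in size, however many times a source is revisited.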

Slide 12

Slide 12 text

Association & Variability Probabilities
▸ Rayleigh distribution
▸ η_ν behaves as a chi-square probability
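Both statements can be turned into numbers with the standard library alone. The sketch below assumes that the dimensionless distance of genuine counterparts follows a unit-scale Rayleigh distribution, and that η_ν is a reduced chi-square with N − 1 degrees of freedom. Both readings are assumptions about the slide, and the function names are illustrative.

```python
import math


def rayleigh_sf(r: float) -> float:
    """P(R > r) for a unit-scale Rayleigh distribution."""
    return math.exp(-r * r / 2.0)


def rayleigh_cutoff(miss_fraction: float) -> float:
    """Radius beyond which a fraction miss_fraction of true matches falls."""
    return math.sqrt(-2.0 * math.log(miss_fraction))


def chi2_sf(x: float, dof: int, terms: int = 500) -> float:
    """P(X > x) for a chi-square variable with dof degrees of freedom.

    Uses the power series of the regularised lower incomplete gamma
    function P(dof/2, x/2); adequate for moderate x.
    """
    if x <= 0.0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def variability_p_value(eta: float, n_points: int) -> float:
    """Chance that a constant source yields a reduced chi-square >= eta."""
    dof = n_points - 1
    return chi2_sf(eta * dof, dof)
```

Under these assumptions, `rayleigh_cutoff(1e-3)` gives the association radius that misses one true counterpart in a thousand, and a small `variability_p_value` flags a source as unlikely to be constant.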

Slide 13

Slide 13 text

WSRT Data

Slide 16

Slide 16 text

Global Sky Model

Slide 17

Slide 17 text

Global Sky Model
▸ Get all VLSS sources within the field of view
▸ Find the counterpart (if any) in the WENSS and NVSS catalogues
▸ Fit spectral index, curvature and higher-order curvature terms
▸ Create source-list file
▸ Wanted: if there is no VLSS source in the FoV, use WENSS as base
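The fitting step can be illustrated with the first-order term only. The sketch below fits a straight line in log-log space, using the convention S ∝ ν^α, to fluxes at the nominal survey frequencies (VLSS 74 MHz, WENSS 325 MHz, NVSS 1400 MHz). It is not the GSM code: the function names are hypothetical, measurement errors are ignored, and the curvature terms mentioned on the slide would need a higher-order polynomial in log ν.

```python
import math


def fit_spectral_index(freqs_hz, fluxes_jy):
    """Least-squares line: log10(S) = log_s0 + alpha * log10(nu)."""
    if len(freqs_hz) != len(fluxes_jy) or len(freqs_hz) < 2:
        raise ValueError("need at least two (frequency, flux) pairs")
    xs = [math.log10(f) for f in freqs_hz]
    ys = [math.log10(s) for s in fluxes_jy]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log_s0 = y_mean - alpha * x_mean
    return alpha, log_s0


def predict_flux(freq_hz, alpha, log_s0):
    """Expected flux at freq_hz from the fitted power law."""
    return 10 ** (log_s0 + alpha * math.log10(freq_hz))


# Synthetic source (made-up values): 10 Jy at 74 MHz, alpha = -0.7.
survey_freqs = [74e6, 325e6, 1400e6]
survey_fluxes = [10.0 * (f / 74e6) ** -0.7 for f in survey_freqs]
alpha, log_s0 = fit_spectral_index(survey_freqs, survey_fluxes)
flux_150 = predict_flux(150e6, alpha, log_s0)
```

The fitted power law is what allows a flux to be predicted at an observing frequency that none of the three catalogues covers.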

Slide 18

Slide 18 text

Global Sky Model
gsm.expected_fluxes_in_fov(conn, ra_c, decl_c, fov_radius, assoc_theta, 'bbs.skymodel.test', storespectraplots=True)

Slide 20

Slide 20 text

We want to mine more...
▸ Detecting trends
  ▹ n sequential data points mσ above average
▸ Systematic structure of light curve
  ▹ Ratio of the mean square successive difference to the sample variance
▸ FTs, cross- & auto-correlations; all work with (varying) window sizes → SciQL
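The first two mining operations are simple enough to sketch directly. The code below is an illustration in Python, not SciQL, and the function names are hypothetical. It assumes that "average" and σ are the mean and sample standard deviation of the whole light curve, so the candidate points themselves are included in both.

```python
import statistics


def has_run_above(fluxes, n, m):
    """True if at least n consecutive points lie more than m sigma above the mean."""
    mean = statistics.mean(fluxes)
    sigma = statistics.stdev(fluxes)   # sample standard deviation
    run = 0
    for flux in fluxes:
        run = run + 1 if flux > mean + m * sigma else 0
        if run >= n:
            return True
    return False


def von_neumann_ratio(fluxes):
    """Mean square successive difference divided by the sample variance."""
    if len(fluxes) < 2:
        raise ValueError("need at least two data points")
    mssd = sum((b - a) ** 2
               for a, b in zip(fluxes, fluxes[1:])) / (len(fluxes) - 1)
    return mssd / statistics.variance(fluxes)
```

For uncorrelated noise the ratio is expected to be close to 2. A smooth trend pulls it towards 0, while rapid alternation pushes it above 2. A perfectly constant series has zero variance and raises a division error here.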

Slide 21

Slide 21 text

SciQL – Cross-Correlation Example
▸ Extend SQL:2003 → SciQL
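The SciQL query itself is not part of this transcript. As a rough illustration of the quantity such a query would compute, here is a normalised cross-correlation of two equally sampled light curves at a given lag. It is written in Python rather than SciQL, and the function names are hypothetical.

```python
import math


def cross_correlation(x, y, lag):
    """Normalised cross-correlation of x[t] with y[t + lag] over their overlap."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    if len(xs) < 2:
        raise ValueError("overlap too short for this lag")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))
    return num / den if den else 0.0


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: cross_correlation(x, y, k))
```

The overlap between the two shifted series acts as the window, so its size changes with the lag. This is the kind of array-style access with varying window sizes that the previous slide points to SciQL for.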

Slide 22

Slide 22 text

SciLens Platform (original)
▸ Computational top tier (1 node)
▸ High-end tier (16 nodes)
▸ Cloud-oriented tier (64 nodes)
▸ Energy-conservative tier (256 nodes)

Slide 23

Slide 23 text

SciLens Platform (current)
▸ Top and high-end tiers not built yet
▸ Cloud-oriented tier
  ▹ 144 Rocks: single quad-core, 16 GB, 0.5 or 1 TB SSD, 1 × 2 TB, InfiniBand (40 Gb/s)
▸ Bottom tier
  ▹ 144 Pebbles: AMD Bobcat, 8 GB, 5 × 2 TB, Ethernet (1 Gb/s)

Slide 27

Slide 27 text

Conclusions
▸ Statistical representation of full LOFAR catalogue relaxes source association
▸ Sharded database reduces replication
▸ Together with SciLens the infrastructure is scalable
▸ SciQL extends data mining opportunities