Update on LOFAR TKP Database

transientskp
December 04, 2012

Bart Scheers

Transcript

  1. Update on LOFAR TKP Database

    Bart Scheers
    Astronomical Institute "Anton Pannekoek", University of Amsterdam
    Centrum Wiskunde & Informatica, Amsterdam
    TKP Meeting Amsterdam, December 4th, 2012
    Bart Scheers | TKP Meeting | 2012-12-04 LOFAR Databases
  4. The Transients Database

    The Aim
    ◮ Store all LOFAR measurements
    ◮ Build light-curve catalogue
    ◮ Enable fast processing and access (exploit database engine)

    The Schema Design
    ◮ Propagate algorithms to the data
    ◮ Optimise for comparison of latest measurements with a statistical model of all measurements
    ◮ Recently: redesign, renaming, explicit table relations, installing & upgrading

    The Content
    ◮ External catalogues: VLSS(r), WENSS, NVSS, exoplanets
    ◮ Standard frequency bands (as defined for MSSS)
    ◮ Original measurements
    ◮ Deduced data: associations between measurements, cataloguing measurements
    ◮ Meta-data: pipeline configuration and task settings
  5. LOFAR Characteristics, expected volumes & data rates

    Data production
    ◮ Raw data ∼ 25 TB/hr

    Here, we focus on the database
    ◮ Distinct sources: ∼ $10^7 - 10^8$,
      ⊲ which are measured/revisited many, many, many times
    ◮ Single measurement stores ∼ 300 B of data
    ◮ Overall data accumulation about 50-100 TB/yr
    ◮ Peaks may be over 10,000 source measurements per second
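The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal terabytes, a Julian year, and (purely for illustration) that the whole yearly volume consists of 300 B measurement rows, which the slide does not state.

```python
BYTES_PER_MEASUREMENT = 300              # from the slide: ~300 B per measurement
PEAK_MEASUREMENTS_PER_S = 10_000         # from the slide: peak insert rate
TB = 10 ** 12                            # assumption: decimal terabytes
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: Julian year


def peak_ingest_bytes_per_s():
    """Bytes per second the database must absorb at the quoted peak rate."""
    return BYTES_PER_MEASUREMENT * PEAK_MEASUREMENTS_PER_S


def measurements_per_year(volume_tb):
    """Number of 300 B rows that fit in `volume_tb` terabytes."""
    return volume_tb * TB / BYTES_PER_MEASUREMENT


def mean_rate_per_s(volume_tb):
    """Average insert rate implied by a yearly volume of `volume_tb` terabytes."""
    return measurements_per_year(volume_tb) / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(peak_ingest_bytes_per_s())     # 3000000 bytes/s, i.e. 3 MB/s
    for volume in (50, 100):
        print(volume, measurements_per_year(volume), mean_rate_per_s(volume))
```

Under these assumptions the peak rate corresponds to about 3 MB/s of row data, and 50-100 TB/yr corresponds to roughly 5,000-11,000 rows per second on average.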
  6. Exploit the Database Engine

    Move the algorithms to the data, inside the database engine, reducing I/O
    ◮ Build & maintain an up-to-date statistical sky model
    ◮ Source association
    ◮ Monitoring list
    ◮ Transient & variability search
    ◮ Feature extraction
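As an illustration of moving an algorithm to the data, the sketch below updates a per-source running average with a single set-based SQL statement, so no rows travel to the client. SQLite is used only as a convenient stand-in for the actual engine, and the tables `running_source` and `new_measurement` and their columns are hypothetical, not the TKP schema. It also assumes at most one new measurement per source per batch.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with one known source and one new measurement."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE running_source (id INTEGER PRIMARY KEY, n INTEGER, avg_flux REAL);
        CREATE TABLE new_measurement (source_id INTEGER, flux REAL);
        INSERT INTO running_source VALUES (1, 3, 2.0);
        INSERT INTO new_measurement VALUES (1, 6.0);
        """
    )
    return conn


def fold_in_new_measurements(conn):
    """Fold each new measurement into its source's running average inside the engine."""
    # All right-hand sides are evaluated with the pre-update values of n and avg_flux.
    conn.execute(
        """
        UPDATE running_source
           SET avg_flux = (n * avg_flux
                           + (SELECT m.flux FROM new_measurement AS m
                               WHERE m.source_id = running_source.id)) / (n + 1),
               n = n + 1
         WHERE id IN (SELECT source_id FROM new_measurement)
        """
    )
    conn.execute("DELETE FROM new_measurement")  # the batch has been consumed
    conn.commit()


if __name__ == "__main__":
    db = make_demo_db()
    fold_in_new_measurements(db)
    print(db.execute("SELECT n, avg_flux FROM running_source").fetchall())
```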
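  12. Building & maintaining an up-to-date statistical sky model

    ◮ We want to summarise/reduce our data statistically, instead of using all individual datapoints
    ◮ Therefore, we use a more database-friendly approach

      Average: $\overline{x}_N = \frac{1}{N}\sum_{i=1}^{N} x_i \;\Rightarrow\; \overline{x}_{N+1} = \frac{N\,\overline{x}_N + x_{N+1}}{N+1}$

      Weighted average: $\overline{\xi}_N = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} \;\Rightarrow\; \overline{\xi}_{N+1} = \frac{N\,\overline{wx}_N + w_{N+1} x_{N+1}}{N\,\overline{w}_N + w_{N+1}}, \quad w_{N+1} = 1/e_{N+1}^2$

      Here $\overline{wx}_N$ and $\overline{w}_N$ are the running averages of $w_i x_i$ and $w_i$, so that $\overline{\xi}_N = \overline{wx}_N / \overline{w}_N$.

      Variability indices per band:

      Magnitude: $V_\nu = s_\nu / \overline{I}_\nu = \frac{1}{\overline{I}_\nu}\sqrt{\frac{N}{N-1}\left(\overline{I_\nu^2} - \overline{I}_\nu^{\,2}\right)}$

      Significance: $\eta_\nu = \frac{N}{N-1}\left(\overline{w_\nu I_\nu^2} - \frac{\overline{w_\nu I_\nu}^{\,2}}{\overline{w}_\nu}\right)$

    ◮ Store factors for fast calculation
    ◮ http://docs.transientskp.org/tkp/database/schema.html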
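The "store factors" idea can be sketched in Python as follows: keep a handful of running averages per source and band, update them with each new flux measurement, and compute both variability indices from those factors alone. The class and attribute names are illustrative assumptions, not the column names of the TKP schema, and the weight is taken to be $1/e^2$ with $e$ the flux error.

```python
import math
from dataclasses import dataclass


@dataclass
class BandStats:
    """Running factors for one source in one frequency band (hypothetical names)."""
    n: int = 0
    avg_i: float = 0.0       # running mean of I
    avg_i_sq: float = 0.0    # running mean of I^2
    avg_w: float = 0.0       # running mean of w
    avg_w_i: float = 0.0     # running mean of w * I
    avg_w_i_sq: float = 0.0  # running mean of w * I^2

    def add(self, flux, flux_err):
        """Fold in one measurement via avg_{N+1} = (N * avg_N + x_{N+1}) / (N + 1)."""
        w = 1.0 / flux_err ** 2
        n = self.n

        def update(avg, x):
            return (n * avg + x) / (n + 1)

        self.avg_i = update(self.avg_i, flux)
        self.avg_i_sq = update(self.avg_i_sq, flux * flux)
        self.avg_w = update(self.avg_w, w)
        self.avg_w_i = update(self.avg_w_i, w * flux)
        self.avg_w_i_sq = update(self.avg_w_i_sq, w * flux * flux)
        self.n = n + 1

    def weighted_mean(self):
        """Weighted average flux, the ratio of two stored factors."""
        return self.avg_w_i / self.avg_w

    def v(self):
        """Variability magnitude V; requires n >= 2 and a non-zero mean flux."""
        variance = self.n / (self.n - 1) * (self.avg_i_sq - self.avg_i ** 2)
        return math.sqrt(max(variance, 0.0)) / self.avg_i  # clamp round-off below zero

    def eta(self):
        """Variability significance eta; requires n >= 2."""
        spread = self.avg_w_i_sq - self.avg_w_i ** 2 / self.avg_w
        return self.n / (self.n - 1) * spread


if __name__ == "__main__":
    stats = BandStats()
    for flux in (1.0, 2.0, 3.0):
        stats.add(flux, 0.1)
    print(stats.v(), stats.eta())
```

For fluxes 1, 2, 3 with errors of 0.1, the factors give V = 0.5 and eta = 100 (up to floating-point round-off), matching a direct calculation of the sample standard deviation over the mean and of the reduced chi-squared about the weighted mean.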
  13. Source Association (by Position only)

    De Ruiter radius, a dimensionless distance that takes errors into account:

    $r_{ij} = \sqrt{\frac{(\alpha_i\cos\delta_i - \alpha_j\cos\delta_j)^2}{\sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2} + \frac{(\delta_i - \delta_j)^2}{\sigma_{\delta_i}^2 + \sigma_{\delta_j}^2}} < r_{\mathrm{lim}}$
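A minimal Python sketch of this dimensionless distance is given below. It assumes positions and their 1-sigma errors are all expressed in degrees, takes the square-root form of the radius, and ignores right-ascension wrap-around at 0/360 degrees. The `Position` type and function names are hypothetical.

```python
import math
from typing import NamedTuple


class Position(NamedTuple):
    """Sky position with 1-sigma errors, all in degrees (illustrative type)."""
    ra: float
    dec: float
    ra_err: float
    dec_err: float


def de_ruiter_radius(a: Position, b: Position) -> float:
    """Dimensionless distance between two positions, scaled by their errors."""
    d_ra = a.ra * math.cos(math.radians(a.dec)) - b.ra * math.cos(math.radians(b.dec))
    d_dec = a.dec - b.dec
    return math.sqrt(
        d_ra ** 2 / (a.ra_err ** 2 + b.ra_err ** 2)
        + d_dec ** 2 / (a.dec_err ** 2 + b.dec_err ** 2)
    )


def is_associated(a: Position, b: Position, r_lim: float) -> bool:
    """Two positions are considered the same source if r_ij < r_lim."""
    return de_ruiter_radius(a, b) < r_lim


if __name__ == "__main__":
    p1 = Position(0.0, 10.0, 0.001, 0.001)
    p2 = Position(0.0, 10.003, 0.001, 0.001)
    print(de_ruiter_radius(p1, p2))
```

In the example, a declination offset of 0.003 degrees with errors of 0.001 degrees on each position gives a radius of about 2.12.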
  14. Source Association (by Position only)

    Rayleigh distribution: probability of finding a source at $r \geq \rho$

    $p(r \geq \rho) = \exp(-\rho^2/2)$
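Inverting this relation gives the limiting radius that corresponds to a chosen probability of a genuine counterpart falling outside it. The sketch below shows both directions; the function names are illustrative.

```python
import math


def miss_probability(rho: float) -> float:
    """p(r >= rho) = exp(-rho^2 / 2) for the dimensionless radius rho."""
    return math.exp(-rho ** 2 / 2.0)


def radius_for_miss_probability(p: float) -> float:
    """Inverse relation: rho = sqrt(-2 ln p), valid for 0 < p <= 1."""
    return math.sqrt(-2.0 * math.log(p))


if __name__ == "__main__":
    print(radius_for_miss_probability(1e-3))  # about 3.717
```

A miss probability of one in a thousand corresponds to a limiting radius of about 3.717.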
  15. Source Association (by Position only)

    Taking care of types of association: one-to-one, one-to-many, many-to-one, many-to-many (http://docs.transientskp.org/tkp/database/assoc.html)

    Missed ones are processed by the monitoring-list recipe
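One simple way to label candidate associations by type is to count, for every pair, how many partners each side has. The sketch below does this for a list of (known source, new measurement) pairs. The naming convention used here ("one-to-many" meaning one known source matched to several new measurements) is an assumption and may differ from the convention in the linked documentation.

```python
from collections import Counter


def classify_associations(pairs):
    """Label each (known_id, new_id) candidate pair by association type."""
    pairs = list(pairs)
    new_per_known = Counter(known for known, _ in pairs)  # partners of each known source
    known_per_new = Counter(new for _, new in pairs)      # partners of each new measurement
    labels = {}
    for known, new in pairs:
        left = "one" if known_per_new[new] == 1 else "many"
        right = "one" if new_per_known[known] == 1 else "many"
        labels[(known, new)] = f"{left}-to-{right}"
    return labels


if __name__ == "__main__":
    candidates = [("A", 1), ("B", 2), ("B", 3), ("C", 4), ("D", 4)]
    for pair, label in classify_associations(candidates).items():
        print(pair, label)
```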
  16. Monitoring Sources

    ◮ List of sources to be monitored based on position
    ◮ User-defined sources
    ◮ Picked up by the TraP
    ◮ Forced fits at locations by sourcefinder
    ◮ RMS upper limits if no source is found
  17. Transient & Variability Detection

    ◮ Look for deviations in all light curves
    ◮ Use Variability Magnitude ($V_\nu$) and Significance ($\eta_\nu$) indices
    ◮ Reduced $\chi^2$ probability justifies rejection or acceptance of $H_0$ (i.e. the source not being variable)
      ⊲ $p_{\eta_\nu} = \int_{\eta'_\nu = \eta_\nu}^{\infty} p(\eta'_\nu, N-1)\,\mathrm{d}\eta'_\nu$
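The integral above is the survival function of the reduced chi-squared distribution with N - 1 degrees of freedom. The sketch below evaluates it with the standard library only, by applying a series expansion of the regularized incomplete gamma function to the unreduced statistic $\eta_\nu (N-1)$. The function names are illustrative, and the series is adequate only for moderate values; production code would more likely call a library routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-squared variable with `dof` degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2):
    # P = exp(-z) * z^a / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)  # loses precision once the tail drops below ~1e-15


def variability_probability(eta: float, n_points: int) -> float:
    """Tail probability of observing a reduced chi-squared >= eta under H0."""
    dof = n_points - 1
    return chi2_sf(eta * dof, dof)


if __name__ == "__main__":
    print(variability_probability(2.0, 3))  # equals exp(-2) for two degrees of freedom
```

A small tail probability argues against $H_0$, that is, for the source being variable.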
  18. Feature Extraction

    ◮ Obtain characteristics from detected transient sources
    ◮ Duration
    ◮ Peak flux
    ◮ Absolute and relative increase and decrease from background to peak flux, and the increase/decrease ratio
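A rough sketch of extracting such features from a single light curve follows. The slide does not define the quantities precisely, so the definitions here are assumptions made for illustration: the background is supplied by the caller, and the duration is taken as the time span during which the flux stays at or above the midpoint between background and peak.

```python
def extract_features(times, fluxes, background):
    """Return peak, increase and duration features of one light curve.

    `times` and `fluxes` are equal-length sequences; `background` is the
    quiescent flux level (must be non-zero for the relative increase).
    """
    peak_index = max(range(len(fluxes)), key=lambda i: fluxes[i])
    peak_flux = fluxes[peak_index]
    abs_increase = peak_flux - background
    rel_increase = abs_increase / background

    # Assumed duration definition: first to last sample at or above half the rise.
    threshold = background + 0.5 * abs_increase
    bright_times = [t for t, f in zip(times, fluxes) if f >= threshold]
    duration = bright_times[-1] - bright_times[0]

    return {
        "peak_time": times[peak_index],
        "peak_flux": peak_flux,
        "abs_increase": abs_increase,
        "rel_increase": rel_increase,
        "duration": duration,
    }


if __name__ == "__main__":
    print(extract_features([0, 1, 2, 3, 4], [1.0, 1.0, 5.0, 3.0, 1.0], background=1.0))
```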
  19. Transients Database, single node
  20. MonetDB−MySQL, or comparing a column- and row-store

    MySQL 5.0.45 (red line) and MonetDB v5.20.4 Jun2010-SP1 (blue line), run on a desktop computer with a dual-core 64-bit Intel(R) Pentium(R) 4 CPU at 3.00 GHz and 1 GB of RAM, running Fedora 8 (Linux kernel 2.6.26.8-57).
  21. Non-digestivity of Recipes

    [Plot: query processing time (seconds) versus images, grouped per 9 (∼20 sources per image); series: total, ins_temp]
  22. Non-digestivity of Recipes

    [Plot: query processing time (seconds) versus images, grouped per 9 (∼20 sources per image); series: assoc, xtr]
  23. Digestivity of Recipes
  24. Transients Database, from single to multiple sharded nodes
  27. Transients Database, from single to multiple sharded nodes

    [Plot: time [s], logarithmic scale from $10^{-1}$ to $10^{4}$, of load and alter SQL statements for Table1 and Table2; series: load on single node, alter on single node, load over 9 nodes, alter over 9 nodes]

    Load data; alter table, add and update 4 DBL columns
    T1: 4.5 GB, row size 1023 B, 4 Mrows
    T2: 85 GB, row size 467 B, 165 Mrows
  29. Transients Database, from single to multiple sharded nodes

    [Plot: time [s], logarithmic scale from $10^{-3}$ to $10^{2}$, for queries Q1 to Q6; series: cold Q on single node, cold Q over 9 nodes, hot Q on single node, hot Q over 9 nodes]

    Cold mode: after server start, no in-memory data
    Hot mode: in-memory data
  31. Summary & Open Issues

    ◮ Column-stores perform better at high data volumes
    ◮ Maintain good statistical models
    ◮ Sharded databases reduce data replication
    ◮ Merge TraDB with/to LTA
    ◮ More unit tests
    ◮ Refactoring on monitoring
    ◮ Keep monitoring database performance