Update on LOFAR TKP Database

transientskp
December 04, 2012

Bart Scheers

Transcript

  1. Update on LOFAR TKP Database
    Bart Scheers
    Astronomical Institute "Anton Pannekoek", University of Amsterdam
    Centrum Wiskunde & Informatica, Amsterdam
    TKP Meeting
    Amsterdam, December 4th, 2012
    Bart Scheers | TKP Meeting | 2012-12-04 LOFAR Databases

  4. The Transients Database
    The Aim
    ◮ Store all LOFAR measurements
    ◮ Build light-curve catalogue
    ◮ Enable fast processing and access (exploit the database engine)
    The Schema Design
    ◮ Propagate algorithms to the data
    ◮ Optimise for comparison of latest measurements with a
    statistical model of all measurements
    ◮ Recently: redesign, renaming, explicit table relations, installing
    & upgrading
    The Content
    ◮ External catalogues: VLSS(r), WENSS, NVSS, exoplanets
    ◮ Standard frequency bands (as defined for MSSS)
    ◮ Original measurements
    ◮ Deduced data: associations between measurements,
    cataloguing measurements
    ◮ Meta-data: pipeline configuration and task settings

  5. LOFAR Characteristics, expected volumes & data rates
    Data production
    ◮ Raw data ∼ 25 TB/hr
    Here, we focus on the database
    ◮ Distinct sources: ∼10^7 to 10^8,
    ⊲ which are measured/revisited many, many, many times
    ◮ Single measurement stores ∼300 B of data
    ◮ Overall data accumulation about 50 to 100 TB/yr
    ◮ Peaks may be over 10,000 source measurements per second
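The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the slide's numbers (∼300 B per measurement, 10,000 measurements per second at peak); the year-long extrapolation is a hypothetical upper bound, not a measured rate.

```python
# Back-of-envelope check using only the figures quoted on the slide.
BYTES_PER_MEASUREMENT = 300        # "single measurement stores ~300 B"
PEAK_MEASUREMENTS_PER_S = 10_000   # "peaks may be over 10,000 per second"
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Peak ingest rate into the database, in bytes per second.
peak_rate = BYTES_PER_MEASUREMENT * PEAK_MEASUREMENTS_PER_S

# Hypothetical upper bound: that peak rate sustained for a whole year, in TB.
yearly_at_peak_tb = peak_rate * SECONDS_PER_YEAR / 1e12

print(f"peak ingest: {peak_rate / 1e6:.1f} MB/s")
print(f"peak sustained for a year: {yearly_at_peak_tb:.0f} TB")
```

The peak works out to about 3 MB/s; sustained all year it would give roughly 95 TB, the same order of magnitude as the quoted 50 to 100 TB/yr.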

  6. Exploit the Database Engine
    Move the algorithms to the data, inside the database
    engine, reducing I/O
    ◮ Build & maintain an up-to-date statistical sky model
    ◮ Source association
    ◮ Monitoring list
    ◮ Transient & variability search
    ◮ Feature extraction
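As a minimal illustration of moving the algorithm to the data, the sketch below uses Python's built-in sqlite3 as a stand-in engine (the talk itself benchmarks MonetDB and MySQL). The table `source_stats` and its columns are hypothetical, not the TKP schema; the point is that a single UPDATE folds a new flux into a stored running mean without reading any past measurements back out of the database.

```python
import sqlite3

# Hypothetical schema: one row of running aggregates per catalogued source.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE source_stats (
    id INTEGER PRIMARY KEY,
    n INTEGER NOT NULL,       -- number of measurements so far
    avg_flux REAL NOT NULL    -- running mean of the flux
)""")
con.execute("INSERT INTO source_stats VALUES (1, 2, 1.5)")  # mean of 1.0 and 2.0


def add_measurement(con, source_id, flux):
    """Fold one new flux into the stored mean inside the engine, using
    avg_{N+1} = (N * avg_N + x_{N+1}) / (N + 1); no history is read back."""
    con.execute(
        """UPDATE source_stats
           SET avg_flux = (n * avg_flux + ?) / (n + 1),
               n = n + 1
           WHERE id = ?""",
        (flux, source_id),
    )


add_measurement(con, 1, 3.0)
print(con.execute("SELECT n, avg_flux FROM source_stats").fetchone())
```

The row ends as (3, 2.0), the mean of 1.0, 2.0 and 3.0. SQL evaluates every SET expression against the row's old values, so incrementing `n` in the same statement is safe.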

  12. Building & maintaining an up-to-date statistical sky model
    ◮ We want to summarise/reduce our data statistically, instead
    of using all individual datapoints
    ◮ Therefore, we use a more database-friendly approach
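    Average: $\bar{x}_N = \frac{1}{N}\sum_{i=1}^{N} x_i \;\Rightarrow\; \bar{x}_{N+1} = \frac{N\bar{x}_N + x_{N+1}}{N+1}$
    Weighted average: $\xi_N = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} \;\Rightarrow\; \xi_{N+1} = \frac{N\,\overline{wx}_N + w_{N+1}x_{N+1}}{N\,\bar{w}_N + w_{N+1}}, \quad w_{N+1} = 1/e_{N+1}^2$
    Variability indices per band:
    Magnitude: $V_\nu = \frac{s_\nu}{\bar{I}_\nu} = \frac{1}{\bar{I}_\nu}\sqrt{\frac{N}{N-1}\left(\overline{I_\nu^2} - \bar{I}_\nu^{\,2}\right)}$
    Significance: $\eta_\nu = \frac{N}{N-1}\left(\overline{w_\nu I_\nu^2} - \frac{\overline{w_\nu I_\nu}^{\,2}}{\bar{w}_\nu}\right)$
    Overbars denote averages over the N measurements so far; $e_{N+1}$ is the uncertainty of $x_{N+1}$.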
    ◮ Store factors for fast calculation
    ◮ http://docs.transientskp.org/tkp/database/schema.html
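A minimal sketch of the "store factors" idea, assuming per-band fluxes $I$ with uncertainties $e$: the class keeps only running averages and derives $\xi_N$, $V_\nu$ and $\eta_\nu$ from them with the formulas on this slide. The class and field names are hypothetical and do not correspond to the TKP database columns.

```python
import math
from dataclasses import dataclass


@dataclass
class BandStats:
    """Running averages ("factors") kept per source and frequency band.

    Every field is an average over the N measurements seen so far and is
    updated with avg_{N+1} = (N * avg_N + x_{N+1}) / (N + 1), so the raw
    measurements never have to be read back.
    """
    n: int = 0
    avg_i: float = 0.0     # mean of I
    avg_i2: float = 0.0    # mean of I^2
    avg_w: float = 0.0     # mean of w, with w = 1 / e^2
    avg_wi: float = 0.0    # mean of w * I
    avg_wi2: float = 0.0   # mean of w * I^2

    def add(self, flux: float, err: float) -> None:
        """Fold one measurement (flux I with uncertainty e) into the factors."""
        n = self.n
        w = 1.0 / err**2

        def update(avg: float, x: float) -> float:
            return (n * avg + x) / (n + 1)

        self.avg_i = update(self.avg_i, flux)
        self.avg_i2 = update(self.avg_i2, flux**2)
        self.avg_w = update(self.avg_w, w)
        self.avg_wi = update(self.avg_wi, w * flux)
        self.avg_wi2 = update(self.avg_wi2, w * flux**2)
        self.n = n + 1

    def weighted_mean(self) -> float:
        # xi_N = sum(w I) / sum(w) = mean(w I) / mean(w)
        return self.avg_wi / self.avg_w

    def v_index(self) -> float:
        # V = s / mean(I), s = sample standard deviation (needs N >= 2)
        n = self.n
        var = n / (n - 1) * (self.avg_i2 - self.avg_i**2)
        return math.sqrt(max(var, 0.0)) / self.avg_i

    def eta_index(self) -> float:
        # eta = N/(N-1) * (mean(w I^2) - mean(w I)^2 / mean(w)) (needs N >= 2)
        n = self.n
        return n / (n - 1) * (self.avg_wi2 - self.avg_wi**2 / self.avg_w)


stats = BandStats()
for flux, err in [(1.0, 0.1), (1.2, 0.1), (0.9, 0.2)]:
    stats.add(flux, err)
print(stats.weighted_mean(), stats.v_index(), stats.eta_index())
```

Each new measurement costs a constant-size update, and both indices come out of five stored averages plus N, which is what makes them cheap to recompute after every image.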

  13. Source Association (by Position only)
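    De Ruiter radius: a dimensionless distance that takes the positional errors into account
    $r_{ij} = \sqrt{\frac{(\alpha_i\cos\delta_i - \alpha_j\cos\delta_j)^2}{\sigma_{\alpha_i}^2 + \sigma_{\alpha_j}^2} + \frac{(\delta_i - \delta_j)^2}{\sigma_{\delta_i}^2 + \sigma_{\delta_j}^2}} < r_{\lim}$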
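The De Ruiter radius can be sketched as a plain function. It follows the slide's expression literally (RA terms scaled by cos δ before differencing); the argument order, the convention that all angles and uncertainties are in degrees, and the example cut `r_lim = 3.0` are assumptions of this sketch.

```python
import math


def de_ruiter_radius(ra_i, dec_i, sig_ra_i, sig_dec_i,
                     ra_j, dec_j, sig_ra_j, sig_dec_j):
    """Dimensionless De Ruiter radius between sources i and j.

    Follows the slide's expression literally: the RA terms are scaled by
    cos(dec) before differencing. All positions and uncertainties are
    assumed to be in degrees.
    """
    d_ra = (ra_i * math.cos(math.radians(dec_i))
            - ra_j * math.cos(math.radians(dec_j)))
    d_dec = dec_i - dec_j
    return math.sqrt(d_ra**2 / (sig_ra_i**2 + sig_ra_j**2)
                     + d_dec**2 / (sig_dec_i**2 + sig_dec_j**2))


def is_candidate(r, r_lim):
    """A pair is an association candidate when r_ij < r_lim."""
    return r < r_lim


r = de_ruiter_radius(150.0, 2.0, 0.001, 0.001, 150.001, 2.0, 0.001, 0.001)
print(r, is_candidate(r, 3.0))  # r_lim = 3.0 is an arbitrary example value
```

Because each offset is divided by its combined uncertainty, the same angular separation counts as "close" for poorly localised sources and "far" for well localised ones.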

  14. Source Association (by Position only)
    Rayleigh distribution: probability of finding the source at a De Ruiter radius $r \ge \rho$
    $p(r \ge \rho) = \exp(-\rho^2/2)$
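Assuming Gaussian positional errors, the De Ruiter radius of a true counterpart follows this Rayleigh distribution, so the tail probability estimates how many genuine matches a cut $r < r_{\lim}$ would lose. The helpers below (hypothetical names) evaluate the tail and invert it to choose a radius for a given loss fraction.

```python
import math


def rayleigh_tail(rho: float) -> float:
    """p(r >= rho) = exp(-rho^2 / 2): chance that a true match lies beyond rho."""
    return math.exp(-rho**2 / 2)


def radius_for_tail(p: float) -> float:
    """Inverse of rayleigh_tail: the rho beyond which a fraction p of true matches lie."""
    return math.sqrt(-2 * math.log(p))


rho = radius_for_tail(0.001)
print(f"r_lim = {rho:.3f} loses {rayleigh_tail(rho):.4f} of genuine counterparts")
```

For example, `radius_for_tail(0.001)` is about 3.72, so a cut at that radius would be expected to miss roughly 0.1% of genuine counterparts.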

  15. Source Association (by Position only)
    Taking care of types of association:
    one-to-one, one-to-many, many-to-one, many-to-many
    (http://docs.transientskp.org/tkp/database/assoc.html)
    Missed sources are handled by the monitoring-list recipe
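The association types can be told apart by counting how many candidate matches each side of a pair has. This is a simplified sketch, not the TKP recipe: it labels each (catalogue source, new measurement) pair from the match counts of its two ends, with the catalogue side written first, whereas a fuller treatment would group connected pairs before resolving them. The function name is hypothetical.

```python
from collections import Counter


def classify_associations(pairs):
    """Label candidate (catalogue_id, new_id) pairs by association type.

    `pairs` are matches that already passed r_ij < r_lim. Each pair is
    labelled from how many candidates its two ends have:
      one-to-one:   both ends appear in exactly one pair
      one-to-many:  the catalogue source matches several new measurements
      many-to-one:  the new measurement matches several catalogue sources
      many-to-many: both ends have several matches
    """
    per_cat = Counter(cat for cat, _ in pairs)
    per_new = Counter(new for _, new in pairs)
    labels = {}
    for cat, new in pairs:
        cat_has_many = per_cat[cat] > 1
        new_has_many = per_new[new] > 1
        if cat_has_many and new_has_many:
            labels[(cat, new)] = "many-to-many"
        elif cat_has_many:
            labels[(cat, new)] = "one-to-many"
        elif new_has_many:
            labels[(cat, new)] = "many-to-one"
        else:
            labels[(cat, new)] = "one-to-one"
    return labels


pairs = [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (4, "d")]
for pair, kind in classify_associations(pairs).items():
    print(pair, kind)
```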

  16. Monitoring Sources
    ◮ List of sources to be monitored based on position
    ◮ User-defined sources
    ◮ Picked up by the TraP
    ◮ Forced fits at these positions by the sourcefinder
    ◮ RMS upper limits if no source is found

  17. Transient & Variability Detection
    ◮ Look for deviations in all light curves
    ◮ Use Variability Magnitude (Vν) and Significance (ην) indices
    ◮ The reduced χ² probability justifies rejecting or accepting H0
    (i.e. that the source is not variable)
    ⊲ $p_{\eta_\nu} = \int_{\eta'_\nu = \eta_\nu}^{\infty} p(\eta'_\nu, N-1)\, d\eta'_\nu$
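Since $\eta_\nu$ is a reduced $\chi^2$ with $N-1$ degrees of freedom, the integral above equals the ordinary $\chi^2$ survival function evaluated at $(N-1)\eta_\nu$. The sketch below computes it with the standard library only, via closed forms of the regularised upper incomplete gamma function for integer degrees of freedom; where SciPy is available, `scipy.stats.chi2.sf` gives the same value. The function names are hypothetical, and the sketch assumes $N \ge 2$.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Survival function P(X >= x) of a chi-squared variable with k dof.

    Uses closed forms of Q(k/2, x/2), the regularised upper incomplete
    gamma function, valid for integer k >= 1.
    """
    y = x / 2.0
    if k % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{n} y^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (k - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def eta_probability(eta: float, n: int) -> float:
    """p_eta: probability under H0 of a reduced chi^2 >= eta with N - 1 dof."""
    dof = n - 1
    return chi2_sf(eta * dof, dof)


print(eta_probability(1.0, 3), eta_probability(5.0, 10))
```

A small $p_{\eta_\nu}$ means the observed scatter is unlikely for a non-variable source, which argues for rejecting H0; the threshold is a pipeline choice not given on the slide.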

  18. Feature Extraction
    ◮ Obtain characteristics from detected transient sources.
    ◮ Duration
    ◮ Peak flux
    ◮ Absolute and relative increase and decrease from background
    to peak flux, and the increase/decrease ratio
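The features can be sketched for a single light curve given as lists of times and fluxes. The slide does not define the background level or the duration, so this sketch assumes that the first and last measurements stand for the background before and after the event and that the duration is the time span of the light curve; treat both, and the names below, as hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class TransientFeatures:
    duration: float
    peak_flux: float
    abs_increase: float   # peak minus level before the event
    rel_increase: float   # abs_increase relative to the level before
    abs_decrease: float   # peak minus level after the event
    rel_decrease: float   # abs_decrease relative to the level after
    inc_dec_ratio: float  # abs_increase / abs_decrease


def _ratio(a: float, b: float) -> float:
    # Guard against division by zero (e.g. a peak at the last point).
    return a / b if b != 0 else math.inf


def extract_features(times, fluxes):
    """Sketch of the slide's features for one light curve.

    Assumptions (not from the slide): the first and last fluxes stand for
    the background before and after the event, and the duration is the
    time span of the light curve.
    """
    if len(times) != len(fluxes) or len(times) < 2:
        raise ValueError("need at least two matching (time, flux) points")
    peak = max(fluxes)
    before, after = fluxes[0], fluxes[-1]
    inc, dec = peak - before, peak - after
    return TransientFeatures(
        duration=times[-1] - times[0],
        peak_flux=peak,
        abs_increase=inc,
        rel_increase=_ratio(inc, before),
        abs_decrease=dec,
        rel_decrease=_ratio(dec, after),
        inc_dec_ratio=_ratio(inc, dec),
    )


print(extract_features([0.0, 1.0, 2.0, 3.0], [1.0, 5.0, 3.0, 2.0]))
```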

  19. Transients Database, single node

  20. MonetDB vs. MySQL: comparing a column-store and a row-store
    MySQL 5.0.45 (red line) and MonetDB v5.20.4 Jun2010-SP1 (blue line).
    Test machine: a desktop computer with a dual-core 64-bit Intel(R) Pentium(R) 4
    CPU at 3.00 GHz and 1 GB of RAM, running Fedora 8 (Linux kernel 2.6.26.8-57).

  21. Non-digestivity of Recipes
    [Figure: query processing time (seconds) versus images, grouped per 9 (∼20 sources per image); curves: total, ins_temp]

  22. Non-digestivity of Recipes
    [Figure: query processing time (seconds) versus images, grouped per 9 (∼20 sources per image); curve: assoc xtr]

  23. Digestivity of Recipes

  24. Transients Database, from single to multiple sharded nodes

  27. Transients Database, from single to multiple sharded nodes
    [Figure: load and alter SQL statements on Table1 and Table2; time [s] on a log scale; series: load on single node, alter on single node, load over 9 nodes, alter over 9 nodes]
    Load data; alter table, add and update 4 DBL columns
    T1: 4.5 GB, row size 1023 B, 4 Mrows
    T2: 85 GB, row size 467 B, 165 Mrows

  29. Transients Database, from single to multiple sharded nodes
    [Figure: queries Q1 to Q6; time [s] on a log scale; series: cold Q on single node, cold Q over 9 nodes, hot Q on single node, hot Q over 9 nodes]
    Cold mode: after server start, no in-memory data
    Hot mode: in-memory data

  30. Transients Database, from single to multiple sharded nodes

  31. Summary & Open Issues
    ◮ Column-stores perform better at high data volumes
    ◮ Maintain good statistical models
    ◮ Sharded databases reduce data replication
    ◮ Merge TraDB with/to LTA
    ◮ More unit tests
    ◮ Refactor monitoring
    ◮ Keep monitoring database performance
