Transients Database Performance

transientskp
January 09, 2014

Bart Scheers

Transcript

  1. Transients Database Performance
    Bart Scheers
    Centrum Wiskunde & Informatica, Amsterdam
    Astronomical Institute “Anton Pannekoek”, University of Amsterdam
    TKP Project Meeting
    Amsterdam
    January 9th, 2014
    Bart Scheers | TKP Meeting | 2014-01-09 Transients Database Performance

  2. LOFAR Transients Database − The Prerequisites
    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10⁷ − 10⁸
    ⊲ 15,000 measurements per unique source per year
    ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0

  3. LOFAR Transients Database − The Prerequisites
    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10⁷ − 10⁸
    ⊲ 15,000 measurements per unique source per year
    ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0
    ◮ Actively use database ⇒ move algorithms and statistics inside
    database engine
    ◮ Real-time data access, quick responses ⇒ single node
    ◮ Accumulate data over time ⇒ multiple nodes
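
    The storage figures above can be cross-checked with back-of-envelope arithmetic. The sketch below simply multiplies the quoted numbers; the helper name `yearly_volume_tb` is illustrative, and decimal terabytes (10^12 bytes) are an assumption, since the slide does not say which unit convention it uses.

```python
# Back-of-envelope check of the yearly catalogue volume quoted on the slide.
# Assumptions: 1 TB = 10**12 bytes; yearly_volume_tb is a hypothetical helper.

BYTES_PER_MEASUREMENT = 300        # "A measurement is about 300 B"
MEASUREMENTS_PER_SOURCE = 15_000   # per unique source per year
BYTES_PER_TB = 10**12


def yearly_volume_tb(n_sources: int) -> float:
    """Return the naive stored volume in TB/yr for n_sources unique sources."""
    total_bytes = n_sources * MEASUREMENTS_PER_SOURCE * BYTES_PER_MEASUREMENT
    return total_bytes / BYTES_PER_TB


if __name__ == "__main__":
    for n_sources in (10**7, 10**8):
        print(f"{n_sources:.0e} sources: {yearly_volume_tb(n_sources):.0f} TB/yr")
```

    The naive product gives about 45 TB/yr for 10⁷ sources and 450 TB/yr for 10⁸, so the quoted 50 − 100 TB/yr sits at the lower end of that range.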

  4. LOFAR Transients Database − Plugging it in

  5. LOFAR Transients Database − Plugging it in

  6. LOFAR − MonetDB: How it started
    ◮ Compare row-store vs. column-store database
    ◮ Processed 6 series of 1000 images (x axes)
    ◮ Per series the number of sources varied
    ◮ Response times of the two most intensive queries shown on the y axes

  7. LOFAR − MonetDB: How it evolved
    ◮ The Development Cycle
    ⊲ From cutting edge
    ⊲ to crashes
    ⊲ and bugs
    ⊲ And back to cutting edge

  8. Simulations
    ◮ Number of images
    ◮ Source density per image
    ◮ Measure individual Query Response Times
    ⊲ Insertions (images/extracted sources)
    ⊲ Null Detections
    ⊲ Source Association
    ⊲ Transient Detection
    ◮ MonetDB & PostgreSQL
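
    A minimal sketch of how per-query response times might be accumulated in such a simulation. Everything here is an assumption for illustration: `QueryTimer` and `simulate` are hypothetical names, the two-table schema is invented, and the standard-library `sqlite3` module stands in for MonetDB or PostgreSQL. Only the query labels `insert_image` and `insert_extractedsources` are taken from the plots in this deck.

```python
import sqlite3
import time
from collections import defaultdict


class QueryTimer:
    """Accumulate wall-clock run time and call counts per named query."""

    def __init__(self):
        self.accumulated = defaultdict(float)  # query name -> total seconds
        self.calls = defaultdict(int)          # query name -> number of calls

    def timed(self, name, fn, *args):
        """Call fn(*args), adding its elapsed time to the total for `name`."""
        start = time.perf_counter()
        result = fn(*args)
        self.accumulated[name] += time.perf_counter() - start
        self.calls[name] += 1
        return result


def simulate(n_images, sources_per_image):
    """Insert n_images images with a fixed source density, timing each query."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not MonetDB/PostgreSQL
    conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE extractedsource (image INTEGER, ra REAL, decl REAL, flux REAL)"
    )
    timer = QueryTimer()
    for img in range(n_images):
        timer.timed("insert_image", conn.execute,
                    "INSERT INTO image (id) VALUES (?)", (img,))
        # Hypothetical source rows: (image id, ra, decl, flux).
        rows = [(img, float(s), 0.0, 1.0) for s in range(sources_per_image)]
        timer.timed("insert_extractedsources", conn.executemany,
                    "INSERT INTO extractedsource VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return timer, conn


if __name__ == "__main__":
    timer, _ = simulate(n_images=100, sources_per_image=100)
    for name, seconds in timer.accumulated.items():
        print(f"{name}: {timer.calls[name]} calls, {seconds:.4f} s accumulated")
```

    Recording the running total after every image, rather than only at the end, would give curves of accumulated run time against image number like those in the following slides.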

  9. Simulations, Insertion, 100 × 10,000
    Figure (two panels): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ run_1IQ_feb2013-sp6-cr_rocks098_local_100x10000: isrejected, insert_image, insert_detections, insert_extractedsources
    ◮ run_2IQ_pg-v9.1_rocks090_local_100x10000: isrejected, insert_image, insert_extracted_sources

  10. Simulations, Null Detection, 100 × 10,000
    Figure (two panels): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ Panels: run_1NDQueries_feb2013-sp6-cr_rocks098_local_100x10000 and run_2NDQueries_pg-v9.1_rocks090_local_100x10000
    ◮ Queries plotted in both: get_nulldetections, add_nulldetections

  11. Simulations, Source Association, 100 × 10,000
    Figure (two panels): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ Panels: run_1SAQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2SAQ_pg-v9.1_rocks090_local_100x10000
    ◮ Queries plotted in both (29):
    check_meridian_wrap, insert_temprunningcatalog, flag_many_to_many_tempruncat,
    insert_1_to_many_assoc, insert_1_to_many_runcat, insert_1_to_many_runcat_flux,
    insert_1_to_many_basepoint_assoc, insert_1_to_many_skyrgn,
    insert_1_to_many_monitoringlist, insert_1_to_many_transient,
    delete_1_to_many_inactive_assoc, delete_1_to_many_inactive_runcat_flux,
    flag_1_to_many_inactive_runcat, flag_1_to_many_inactive_tempruncat,
    delete_1_to_many_inactive_assocskyrgn, delete_1_to_many_inactive_monitoringlist,
    delete_1_to_many_inactive_transient, insert_1_to_1_assoc, update_1_to_1_runcat,
    update_1_to_1_runcat_flux, insert_1_to_1_runcat_flux, insert_new_runcat,
    insert_new_runcat_flux, insert_new_runcat_skyrgn_assocs_a,
    insert_new_runcat_skyrgn_assocs_b, insert_new_assoc, insert_new_monitoringlist,
    insert_new_transient, delete_inactive_runcat

  12. Simulations, Transient Detection, 100 × 10,000
    Figure (two panels): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ Panels: run_1TSQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2TSQ_pg-v9.1_rocks090_local_100x10000
    ◮ Queries plotted in both: select_updated_variability_indices, update_known_transients, insert_transients, update_known_transients_in_monitoringlist, insert_new_transients_in_monitoringlist
    ◮ 0.83 s/img − 1.2 s/img

  13. Simulations, Source Association, 1,000 × 10,000
    Figure (two panels): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ run_1SAQ_feb2013-sp6-cr_rocks098_local_1000x10000: y axis spans 0 − 3.0 (×10³ s)
    ◮ run_1SAQ_pg-v9.1_rocks095_local_1000x10000: y axis spans 0 − 50 (×10³ s)
    ◮ Queries plotted in both: the 29 source association queries listed on slide 11 (check_meridian_wrap through delete_inactive_runcat)
    ◮ 2.6 s/img − 22 s/img

  14. Simulations, Source Association, 100 × 100,000
    Figure (one panel): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ Run: run_1SAQ_feb2013-sp6-cr_rocks098_local_100x100000
    ◮ Queries plotted: the 29 source association queries listed on slide 11 (check_meridian_wrap through delete_inactive_runcat)
    ◮ 0.89 s/img

  15. RSM Data, Source Association
    Figure (one panel): accumulated run time [s] (×10³) vs. image/query call (×10³)
    ◮ Run: run_3SourceAssociationQueries_feb2013-sp6_rocks098_local_20x10737
    ◮ Queries plotted: the 29 source association queries listed on slide 11 (check_meridian_wrap through delete_inactive_runcat)
    ◮ det.level=10, ∼ 22 sources/img, 2.8 s/img

  16. How to distribute a database
    How to “break up” data over multiple nodes
    ◮ Shard by zone/declination
    ◮ Partition by time
    ◮ Preferably no code changes
    Distributed Databases
    ◮ Use distributed file system
    ◮ Use intelligence and autonomy of storage devices
    ◮ Exploit tiers for data summarisations
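
    One way the two "break up" ideas above could be expressed is sketched below: a declination zone decides which node holds a row, and the observation month names a time partition. This is an illustrative scheme, not the one used in the talk; `zone_of`, `shard_of`, `time_partition`, the 1-degree zone height, and the contiguous zone-to-node mapping are all assumptions.

```python
import math
from datetime import datetime


def zone_of(decl_deg: float, zone_height_deg: float = 1.0) -> int:
    """Map a declination in [-90, 90] degrees to an integer zone index."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90] degrees")
    n_zones = math.ceil(180.0 / zone_height_deg)
    zone = math.floor((decl_deg + 90.0) / zone_height_deg)
    return min(zone, n_zones - 1)  # decl = +90 falls in the last zone


def shard_of(decl_deg: float, n_nodes: int, zone_height_deg: float = 1.0) -> int:
    """Assign contiguous blocks of zones to nodes; returns 0 .. n_nodes - 1."""
    n_zones = math.ceil(180.0 / zone_height_deg)
    zones_per_node = math.ceil(n_zones / n_nodes)
    return zone_of(decl_deg, zone_height_deg) // zones_per_node


def time_partition(taken_at: datetime) -> str:
    """Name a monthly partition, e.g. '2014_01', for an observation time."""
    return f"{taken_at.year:04d}_{taken_at.month:02d}"


if __name__ == "__main__":
    for decl in (-90.0, -0.5, 0.0, 52.9, 90.0):
        print(decl, zone_of(decl), shard_of(decl, n_nodes=4))
    print(time_partition(datetime(2014, 1, 9)))
```

    Because the node is derived from the row's declination alone, such a mapping can live in the loading layer, which is in the spirit of "preferably no code changes" for the queries themselves. Equal-height zones cover less sky area towards the poles, so a real deployment would probably want to balance nodes by source count rather than by zone count.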

  17. SciLens Platform, 300+ node experimentation cluster

  18. From Local to distributed

  19. From Local to distributed

  20. From Local to distributed

  21. From Local to distributed

  22. From Local to distributed

  23. Query Monitoring, distributed vs. single
    Figure (two panels): accumulated run time [s] vs. query call
    ◮ Queries plotted in both: check mw, copy into, flag m-m tempruncat, insert 1-1 assoc, insert 1-1 runcatflux, insert 1-m skyrgn, insert new runcatflux, insert new runcat, insert new skyrgn assoc 1, insert new skyrgn assoc 2, insert tempruncat, insert xtrsrc, update 1-1 runcatflux, update 1-1 runcat, insert image

  24. Multiple Query nodes

  25. Multiple Query nodes

  26. Multiple Query nodes

  27. Conclusions & Future Work
    ◮ Column-store boosts performance
    ◮ Moving algorithms and operations to the data
    ◮ Real-time database
    ⊲ Known TraP queries behave linearly over time
    ⊲ Adding more statistical functions
    ◮ Distributed Databases, using intelligence and autonomy of
    storage devices
    ⊲ Read-only archive performs acceptably
    ⊲ Scalable
    ⊲ Adding more Query Nodes
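
    The claim that queries "behave linearly over time" can be tested on logged timings by fitting a straight line to accumulated run time against image number and inspecting the goodness of fit. The sketch below is one simple way to do that with ordinary least squares; `linear_fit` is a hypothetical helper, and the numbers in the usage section are synthetic, not measurements from the talk.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences of at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a constant y the line fits exactly, so report r_squared = 1.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


if __name__ == "__main__":
    images = list(range(1, 11))
    # Synthetic examples: constant cost per image vs. cost growing per image.
    linear_total = [0.9 * i for i in images]
    growing_total = [0.1 * i * i for i in images]
    print(linear_fit(images, linear_total))
    print(linear_fit(images, growing_total))
```

    With a constant cost per image the slope is the per-image time in s/img and r_squared is close to 1; a curve that bends upwards gives a visibly lower r_squared, which would indicate that per-image cost grows as the database fills.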