Transients Database Performance

transientskp
January 09, 2014

Bart Scheers

Transcript

  1. Transients Database Performance

    Bart Scheers
    Centrum Wiskunde & Informatica, Amsterdam
    Astronomical Institute “Anton Pannekoek”, University of Amsterdam
    TKP Project Meeting, Amsterdam, January 9th, 2014
    Bart Scheers | TKP Meeting | 2014-01-09 Transients Database Performance
  2. LOFAR Transients Database − The Prerequisites

    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10^7 − 10^8
      ⊲ 15,000 measurements per unique source per year
      ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0
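
As a back-of-the-envelope check on these figures, the yearly volume follows from multiplying the source count, the measurements per source, and the measurement size. The sketch is illustrative only: the function name is hypothetical, and it assumes decimal terabytes and that every source really is measured 15,000 times per year.

```python
# Rough storage estimate from the slide's figures (assumes 1 TB = 10**12 bytes).
MEASUREMENTS_PER_SOURCE_PER_YEAR = 15_000
BYTES_PER_MEASUREMENT = 300


def yearly_volume_tb(n_sources: int) -> float:
    """Return the stored measurement volume in TB/yr for n_sources sources."""
    total_bytes = n_sources * MEASUREMENTS_PER_SOURCE_PER_YEAR * BYTES_PER_MEASUREMENT
    return total_bytes / 10**12


for n_sources in (10**7, 10**8):
    print(f"{n_sources:.0e} sources -> {yearly_volume_tb(n_sources):.0f} TB/yr")
```

Under these assumptions the lower source count gives a volume close to the quoted 50 TB/yr, while the upper source count gives several times the quoted 100 TB/yr, which may mean that not every source is measured at the full rate.
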
  3. LOFAR Transients Database − The Prerequisites

    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10^7 − 10^8
      ⊲ 15,000 measurements per unique source per year
      ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0
    ◮ Actively use database ⇒ move algorithms and statistics inside database engine
    ◮ Real-time data access, quick responses ⇒ single node
    ◮ Accumulate data over time ⇒ multiple nodes
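
The idea of moving statistics inside the database engine can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in (the systems compared in this talk are MonetDB and PostgreSQL), and the table and column names are hypothetical. The aggregate is computed by the engine, so only one summary row per source reaches the client instead of every measurement.

```python
import sqlite3

# In-memory stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (source_id INTEGER, flux REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    [(1, 1.0), (1, 3.0), (2, 2.0), (2, 2.0), (2, 5.0)],
)

# The engine computes count and mean per source; the client receives
# one summary row per source rather than all individual measurements.
summary = conn.execute(
    "SELECT source_id, COUNT(*), AVG(flux) "
    "FROM measurement GROUP BY source_id ORDER BY source_id"
).fetchall()

for source_id, n, mean_flux in summary:
    print(source_id, n, mean_flux)
conn.close()
```
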
  4. LOFAR Transients Database − Plugging it in
  5. LOFAR Transients Database − Plugging it in
  6. LOFAR − MonetDB: How it started

    ◮ Compare row-store vs. column-store database
    ◮ Processed 6 series of 1000 images (x axes)
    ◮ Per series the number of sources varied
    ◮ Response times of two most intensive queries shown on the y axes
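
To make the row-store versus column-store distinction concrete, the following toy sketch holds the same records in both layouts. It only illustrates the storage idea and says nothing about how MonetDB or PostgreSQL are implemented; the field names are hypothetical.

```python
# Row layout: each record is stored together, as a row-store would.
rows = [
    {"source_id": 1, "ra": 10.0, "decl": 52.0, "flux": 1.5},
    {"source_id": 2, "ra": 11.0, "decl": 53.0, "flux": 2.5},
    {"source_id": 3, "ra": 12.0, "decl": 54.0, "flux": 3.5},
]

# Column layout: each attribute is stored contiguously, as a column-store would.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one attribute walks every full record in the row layout...
mean_flux_rows = sum(row["flux"] for row in rows) / len(rows)
# ...but touches only a single array in the column layout.
mean_flux_columns = sum(columns["flux"]) / len(columns["flux"])

print(mean_flux_rows, mean_flux_columns)
```

Both layouts give the same answer; the difference is how much unrelated data has to be read to get there, which matters for queries that aggregate a few attributes over many sources.
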
  7. LOFAR − MonetDB: How it evolved

    ◮ The Development Cycle
      ⊲ From cutting edge
      ⊲ to crashes
      ⊲ and bugs
      ⊲ And back to cutting edge
  8. Simulations

    ◮ Number of images
    ◮ Source density per image
    ◮ Measure individual Query Response Times
      ⊲ Insertions (images/extracted sources)
      ⊲ Null Detections
      ⊲ Source Association
      ⊲ Transient Detection
    ◮ MonetDB & PostgreSQL
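
Measuring individual query response times, and accumulating them per query as in the plots on the following slides, can be sketched roughly as follows. This is not the benchmark harness used for the talk: sqlite3 stands in for the database, and the QueryTimer class, the table layout, and the count_images label are hypothetical. The label insert_image is borrowed from the plot legends.

```python
import sqlite3
import time
from collections import defaultdict


class QueryTimer:
    """Accumulate wall-clock run time per named query (hypothetical helper)."""

    def __init__(self, conn):
        self.conn = conn
        self.accumulated = defaultdict(float)  # query name -> total seconds
        self.calls = defaultdict(int)          # query name -> number of calls

    def run(self, name, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        self.accumulated[name] += time.perf_counter() - start
        self.calls[name] += 1
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, obs_time REAL)")
timer = QueryTimer(conn)

# One "insert_image" call per simulated image.
for i in range(100):
    timer.run("insert_image", "INSERT INTO image (obs_time) VALUES (?)", (float(i),))

n_images = timer.run("count_images", "SELECT COUNT(*) FROM image")[0][0]
for name, total in timer.accumulated.items():
    print(f"{name}: {timer.calls[name]} calls, {total:.6f} s accumulated")
```

Recording the running total after every call, rather than only at the end, would give the accumulated-run-time curves that the plots show.
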
  9. Simulations, Insertion, 100 × 10,000

    [Figure: two panels, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panel run_1IQ_feb2013-sp6-cr_rocks098_local_100x10000: isrejected, insert_image, insert_detections, insert_extractedsources
    Panel run_2IQ_pg-v9.1_rocks090_local_100x10000: isrejected, insert_image, insert_extracted_sources
  10. Simulations, Null Detection, 100 × 10,000

    [Figure: two panels, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panel run_1NDQueries_feb2013-sp6-cr_rocks098_local_100x10000: get_nulldetections, add_nulldetections
    Panel run_2NDQueries_pg-v9.1_rocks090_local_100x10000: get_nulldetections, add_nulldetections
  11. Simulations, Source Association, 100 × 10,000

    [Figure: two panels, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panels: run_1SAQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2SAQ_pg-v9.1_rocks090_local_100x10000
    Legend (both panels): check_meridian_wrap, insert_temprunningcatalog, flag_many_to_many_tempruncat, insert_1_to_many_assoc, insert_1_to_many_runcat, insert_1_to_many_runcat_flux, insert_1_to_many_basepoint_assoc, insert_1_to_many_skyrgn, insert_1_to_many_monitoringlist, insert_1_to_many_transient, delete_1_to_many_inactive_assoc, delete_1_to_many_inactive_runcat_flux, flag_1_to_many_inactive_runcat, flag_1_to_many_inactive_tempruncat, delete_1_to_many_inactive_assocskyrgn, delete_1_to_many_inactive_monitoringlist, delete_1_to_many_inactive_transient, insert_1_to_1_assoc, update_1_to_1_runcat, update_1_to_1_runcat_flux, insert_1_to_1_runcat_flux, insert_new_runcat, insert_new_runcat_flux, insert_new_runcat_skyrgn_assocs_a, insert_new_runcat_skyrgn_assocs_b, insert_new_assoc, insert_new_monitoringlist, insert_new_transient, delete_inactive_runcat
  12. Simulations, Transient Detection, 100 × 10,000

    [Figure: two panels, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panels: run_1TSQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2TSQ_pg-v9.1_rocks090_local_100x10000
    Legend (both panels): select_updated_variability_indices, update_known_transients, insert_transients, update_known_transients_in_monitoringlist, insert_new_transients_in_monitoringlist
    ◮ 0.83 s/img − 1.2 s/img
  13. Simulations, Source Association, 1,000 × 10,000

    [Figure: two panels, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panel run_1SAQ_feb2013-sp6-cr_rocks098_local_1000x10000: y axis from 0 to 3.0 (×10^3 s)
    Panel run_1SAQ_pg-v9.1_rocks095_local_1000x10000: y axis from 0 to 50 (×10^3 s)
    Legend (both panels): the same source-association queries as on slide 11
    ◮ 2.6 s/img − 22 s/img
  14. Simulations, Source Association, 100 × 100,000

    [Figure: one panel, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panel run_1SAQ_feb2013-sp6-cr_rocks098_local_100x100000
    Legend: the same source-association queries as on slide 11
    ◮ 0.89 s/img
  15. RSM Data, Source Association

    [Figure: one panel, accumulated run time [s] (×10^3) against image/query call (×10^3)]
    Panel run_3SourceAssociationQueries_feb2013-sp6_rocks098_local_20x10737
    Legend: the same source-association queries as on slide 11
    ◮ det.level=10, ∼ 22 sources/img, 2.8 s/img
  16. How to distribute a database

    How to “break up” data over multiple nodes
    ◮ Shard by zone/declination
    ◮ Partition by time
    ◮ Preferably no code changes
    Distributed Databases
    ◮ Use distributed file system
    ◮ Use intelligence and autonomy of storage devices
    ◮ Exploit tiers for data summarisations
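
One way to picture sharding by declination zone and partitioning by time is as a routing function that maps each measurement to a node and to a time partition. The sketch is an assumption-laden illustration: the zone height, the node count, the monthly partition granularity, and all names are hypothetical and not taken from the talk.

```python
import math
from datetime import datetime

ZONE_HEIGHT_DEG = 1.0   # assumed height of a declination zone
N_NODES = 4             # assumed number of database nodes


def zone_of(decl_deg: float) -> int:
    """Integer declination zone; decl = -90 maps to zone 0."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90] degrees")
    return int(math.floor((decl_deg + 90.0) / ZONE_HEIGHT_DEG))


def shard_of(decl_deg: float) -> int:
    """Node index: contiguous blocks of zones go to the same node."""
    n_zones = int(math.ceil(180.0 / ZONE_HEIGHT_DEG))
    zones_per_node = math.ceil(n_zones / N_NODES)
    # decl = +90 falls just past the last zone, so clamp to the last node.
    return min(zone_of(decl_deg) // zones_per_node, N_NODES - 1)


def time_partition_of(obs_time: datetime) -> str:
    """Partition label by calendar month."""
    return obs_time.strftime("%Y_%m")


print(shard_of(52.9), time_partition_of(datetime(2014, 1, 9)))
```

Keeping contiguous zones on one node means that positional neighbours mostly live together, so a query that matches sources by position would only need to consult a second node near a block boundary. Routing of this kind could sit below the query layer, in keeping with the preference for no code changes.
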
  17. SciLens Platform, 300+ node experimentation cluster
  18. From Local to distributed
  19. From Local to distributed
  20. From Local to distributed
  21. From Local to distributed
  22. From Local to distributed
  23. Query Monitoring, distributed vs. single

    [Figure: two panels, accumulated run time [s] (0 to 1200) against query call (0 to 2000)]
    Legend (both panels): check mw, copy into, flag m-m tempruncat, insert 1-1 assoc, insert 1-1 runcatflux, insert 1-m skyrgn, insert new runcatflux, insert new runcat, insert new skyrgn assoc 1, insert new skyrgn assoc 2, insert tempruncat, insert xtrsrc, update 1-1 runcatflux, update 1-1 runcat, insert image
  24. Multiple Query nodes
  25. Multiple Query nodes
  26. Multiple Query nodes
  27. Conclusions & Future Work

    ◮ Column-store boosts performance
    ◮ Moving algorithms and operations to the data
    ◮ Real-time database
      ⊲ Known TraP queries behave linearly over time
      ⊲ Adding more statistical functions
    ◮ Distributed Databases, using intelligence and autonomy of storage devices
      ⊲ Read-only archive performs acceptably
      ⊲ Scalable
      ⊲ Adding more Query Nodes
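
The statement that queries behave linearly over time can be checked from a series of accumulated run times: if the cost per call stays constant, accumulated time against call number is a straight line. The sketch fits a least-squares line and reports the coefficient of determination. The function name and the sample data are hypothetical and synthetic, not measurements from the talk.

```python
def linear_fit_r2(y):
    """Least-squares line through (i, y[i]); returns (slope, intercept, r2)."""
    n = len(y)
    x = range(n)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic accumulated run times: constant cost per call versus growing cost.
constant_cost = [0.5 * i for i in range(100)]
growing_cost = [0.005 * i * i for i in range(100)]

print(linear_fit_r2(constant_cost)[2])
print(linear_fit_r2(growing_cost)[2])
```

A coefficient of determination very close to 1 is consistent with a constant cost per call, while a clearly lower value points to a per-call cost that grows as the database fills.
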