Transients Database Performance

transientskp
January 09, 2014

Bart Scheers

Transcript

  1. Transients Database Performance
    Bart Scheers
    Centrum Wiskunde & Informatica, Amsterdam
    Astronomical Institute "Anton Pannekoek", University of Amsterdam
    TKP Project Meeting, Amsterdam, January 9th, 2014
    Bart Scheers | TKP Meeting | 2014-01-09 Transients Database Performance
  2. LOFAR Transients Database − The Prerequisites
    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10^7 − 10^8
      ⊲ 15,000 measurements per unique source per year
      ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0
  3. LOFAR Transients Database − The Prerequisites
    ◮ Raw data rate ∼ 1 TB/hr
    ◮ Distinct sources: ∼ 10^7 − 10^8
      ⊲ 15,000 measurements per unique source per year
      ⊲ A measurement is about 300 B
    ◮ Stored source properties reduce to 50 − 100 TB/yr
    ◮ Peaks over 10,000 sources per second
    ◮ n transients per day, n > 0
    ◮ Actively use database ⇒ move algorithms and statistics inside database engine
    ◮ Real-time data access, quick responses ⇒ single node
    ◮ Accumulate data over time ⇒ multiple nodes
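The storage figures on this slide can be checked with a rough back-of-envelope calculation. The sketch below multiplies the quoted source count, measurements per source per year, and bytes per measurement. It assumes decimal terabytes (10^12 B) and that every source is measured at the full quoted rate; neither assumption is stated on the slide.

```python
# Back-of-envelope yearly volume of stored source measurements.
# Inputs are the figures quoted on the slide; the decimal TB (1e12 B)
# and the "every source at the full rate" assumption are ours.
MEASUREMENTS_PER_SOURCE_PER_YEAR = 15_000
BYTES_PER_MEASUREMENT = 300
BYTES_PER_TB = 10**12


def yearly_volume_tb(n_sources: int) -> float:
    """Return the yearly measurement volume in TB for n_sources sources."""
    total_bytes = (
        n_sources * MEASUREMENTS_PER_SOURCE_PER_YEAR * BYTES_PER_MEASUREMENT
    )
    return total_bytes / BYTES_PER_TB


low = yearly_volume_tb(10**7)   # lower bound on distinct sources
high = yearly_volume_tb(10**8)  # upper bound on distinct sources
print(f"{low:.0f} TB/yr to {high:.0f} TB/yr")
```

Under these assumptions the lower bound (about 45 TB/yr) is close to the 50 TB/yr quoted on the slide, while the upper bound exceeds the quoted 100 TB/yr, so the slide's range presumably rests on further assumptions that the transcript does not show.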
  4. LOFAR Transients Database − Plugging it in
  5. LOFAR Transients Database − Plugging it in
  6. LOFAR − MonetDB: How it started
    ◮ Compare row-store vs. column-store database
    ◮ Processed 6 series of 1000 images (x axes)
    ◮ Per series the number of sources varied
    ◮ Response times of the two most intensive queries shown on the y axes
  7. LOFAR − MonetDB: How it evolved
    ◮ The Development Cycle
      ⊲ From cutting edge
      ⊲ to crashes
      ⊲ and bugs
      ⊲ And back to cutting edge
  8. Simulations
    ◮ Number of images
    ◮ Source density per image
    ◮ Measure individual Query Response Times
      ⊲ Insertions (images/extracted sources)
      ⊲ Null Detections
      ⊲ Source Association
      ⊲ Transient Detection
    ◮ MonetDB & PostgreSQL
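Measuring individual query response times and accumulating them per query can be sketched as follows. This is a minimal illustration, not the harness used for the talk: the helper `timed`, the stand-in `fake_query`, and the loop sizes are hypothetical, and a real run would execute SQL against MonetDB or PostgreSQL instead.

```python
import time
from collections import defaultdict

# Accumulated wall-clock run time per named query, in seconds.
accumulated = defaultdict(float)
# Number of calls per named query.
calls = defaultdict(int)


def timed(name, func, *args, **kwargs):
    """Run func and add its wall-clock duration to the total for `name`."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        accumulated[name] += time.perf_counter() - start
        calls[name] += 1


def fake_query(n):
    """Hypothetical stand-in for a database call (no SQL is executed)."""
    return sum(range(n))


# One pass per image: every query type is timed under its own name.
for image_id in range(5):
    timed("insert_image", fake_query, 1_000)
    timed("insert_extractedsources", fake_query, 10_000)

for name in accumulated:
    print(f"{name}: {calls[name]} calls, {accumulated[name]:.6f} s accumulated")
```

Plotting the running total for each name against the call index would give curves of the kind described on the simulation slides (accumulated run time against image/query call).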
  9. Simulations, Insertion, 100 × 10,000
    [Figure: two panels of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panel run_1IQ_feb2013-sp6-cr_rocks098_local_100x10000: isrejected, insert_image, insert_detections, insert_extractedsources.
    Panel run_2IQ_pg-v9.1_rocks090_local_100x10000: isrejected, insert_image, insert_extracted_sources.]
  10. Simulations, Null Detection, 100 × 10,000
    [Figure: two panels of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panel run_1NDQueries_feb2013-sp6-cr_rocks098_local_100x10000: get_nulldetections, add_nulldetections.
    Panel run_2NDQueries_pg-v9.1_rocks090_local_100x10000: get_nulldetections, add_nulldetections.]
  11. Simulations, Source Association, 100 × 10,000
    [Figure: two panels of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panels: run_1SAQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2SAQ_pg-v9.1_rocks090_local_100x10000.
    Legend (both panels): check_meridian_wrap, insert_temprunningcatalog, flag_many_to_many_tempruncat, insert_1_to_many_assoc, insert_1_to_many_runcat, insert_1_to_many_runcat_flux, insert_1_to_many_basepoint_assoc, insert_1_to_many_skyrgn, insert_1_to_many_monitoringlist, insert_1_to_many_transient, delete_1_to_many_inactive_assoc, delete_1_to_many_inactive_runcat_flux, flag_1_to_many_inactive_runcat, flag_1_to_many_inactive_tempruncat, delete_1_to_many_inactive_assocskyrgn, delete_1_to_many_inactive_monitoringlist, delete_1_to_many_inactive_transient, insert_1_to_1_assoc, update_1_to_1_runcat, update_1_to_1_runcat_flux, insert_1_to_1_runcat_flux, insert_new_runcat, insert_new_runcat_flux, insert_new_runcat_skyrgn_assocs_a, insert_new_runcat_skyrgn_assocs_b, insert_new_assoc, insert_new_monitoringlist, insert_new_transient, delete_inactive_runcat.]
  12. Simulations, Transient Detection, 100 × 10,000
    [Figure: two panels of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panels: run_1TSQ_feb2013-sp6-cr_rocks098_local_100x10000 and run_2TSQ_pg-v9.1_rocks090_local_100x10000.
    Legend (both panels): select_updated_variability_indices, update_known_transients, insert_transients, update_known_transients_in_monitoringlist, insert_new_transients_in_monitoringlist.]
    ◮ 0.83 s/img − 1.2 s/img
  13. Simulations, Source Association, 1,000 × 10,000
    [Figure: two panels of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panel run_1SAQ_feb2013-sp6-cr_rocks098_local_1000x10000: run-time axis from 0 to 3.0.
    Panel run_1SAQ_pg-v9.1_rocks095_local_1000x10000: run-time axis from 0 to 50.0.
    Legend (both panels): the same source-association queries as listed on slide 11.]
    ◮ 2.6 s/img − 22 s/img
  14. Simulations, Source Association, 100 × 100,000
    [Figure: one panel of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panel run_1SAQ_feb2013-sp6-cr_rocks098_local_100x100000.
    Legend: the same source-association queries as listed on slide 11.]
    ◮ 0.89 s/img
  15. RSM Data, Source Association
    [Figure: one panel of accumulated run time [s] (×10^3) against image/query call (×10^3).
    Panel run_3SourceAssociationQueries_feb2013-sp6_rocks098_local_20x10737.
    Legend: the same source-association queries as listed on slide 11.]
    ◮ det.level=10, ∼ 22 sources/img, 2.8 s/img
  16. How to distribute a database
    How to "break up" data over multiple nodes
    ◮ Shard by zone/declination
    ◮ Partition by time
    ◮ Preferably no code changes
    Distributed Databases
    ◮ Use distributed file system
    ◮ Use intelligence and autonomy of storage devices
    ◮ Exploit tiers for data summarisations
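Sharding by declination zone and partitioning by time can be illustrated with a small routing sketch. Everything concrete here is an assumption made for illustration: the 1-degree zone height, the four nodes, the contiguous-band assignment, and the monthly partition label are not taken from the slides, and the function names are hypothetical.

```python
import math
from datetime import datetime

ZONE_HEIGHT_DEG = 1.0  # assumed zone height in degrees of declination
N_NODES = 4            # assumed number of storage nodes


def zone_of(decl_deg: float) -> int:
    """Integer zone index for a declination in [-90, 90] degrees."""
    if not -90.0 <= decl_deg <= 90.0:
        raise ValueError("declination out of range")
    return math.floor(decl_deg / ZONE_HEIGHT_DEG)


def node_of(decl_deg: float) -> int:
    """Assign a zone to a node using contiguous declination bands."""
    n_zones = math.ceil(180.0 / ZONE_HEIGHT_DEG)
    # Shift the zone index so that the southernmost zone is 0.
    zone_offset = zone_of(decl_deg) - math.floor(-90.0 / ZONE_HEIGHT_DEG)
    # The zone at exactly +90 degrees would overflow; clamp it to the last node.
    return min(zone_offset * N_NODES // n_zones, N_NODES - 1)


def time_partition(taken_at: datetime) -> str:
    """Label of the (assumed monthly) time partition for an observation."""
    return f"{taken_at.year:04d}_{taken_at.month:02d}"


print(zone_of(52.9), node_of(52.9), time_partition(datetime(2014, 1, 9)))
```

Contiguous bands keep neighbouring zones on the same node, so a positional cross-match mostly stays on one node; sources close to a band edge would still need a look at the adjacent node. Keeping this routing outside the queries is one way to approach the "preferably no code changes" goal.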
  17. SciLens Platform, 300+ node experimentation cluster
  18. From Local to distributed
  19. From Local to distributed
  20. From Local to distributed
  21. From Local to distributed
  22. From Local to distributed
  23. Query Monitoring, distributed vs. single
    [Figure: two panels of accumulated run time [s] (0 to 1200) against query call (0 to 2000).
    Legend (both panels): check mw, copy into, flag m-m tempruncat, insert 1-1 assoc, insert 1-1 runcatflux, insert 1-m skyrgn, insert new runcatflux, insert new runcat, insert new skyrgn assoc 1, insert new skyrgn assoc 2, insert tempruncat, insert xtrsrc, update 1-1 runcatflux, update 1-1 runcat, insert image.]
  24. Multiple Query nodes
  25. Multiple Query nodes
  26. Multiple Query nodes
  27. Conclusions & Future Work
    ◮ Column-store boosts performance
    ◮ Moving algorithms and operations to the data
    ◮ Real-time database
      ⊲ Known TraP queries behave linearly over time
      ⊲ Adding more statistical functions
    ◮ Distributed Databases, using intelligence and autonomy of storage devices
      ⊲ Read-only archive performs acceptably
      ⊲ Scalable
      ⊲ Adding more Query Nodes
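The statement that queries "behave linearly over time" means that accumulated run time grows in proportion to the number of processed images, i.e. the cost per image stays roughly constant. One way to check this on a timing series is a least-squares line fit. The sketch below uses synthetic numbers only: the 0.9 s per image and the quadratic counter-example are invented for illustration and are not measurements from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic accumulated run times (seconds) after each of 100 images.
images = list(range(1, 101))
linear = [0.9 * i for i in images]        # constant cost per image
growing = [0.01 * i * i for i in images]  # cost per image keeps rising

slope, _ = fit_line(images, linear)
print(f"linear series:  slope = {slope:.2f} s/img, R^2 = {r_squared(images, linear):.4f}")
print(f"growing series: R^2 = {r_squared(images, growing):.4f}")
```

For a linear series the fitted slope is the per-image cost (the s/img figures on the simulation slides are of this kind), and R^2 stays close to 1. A series whose per-image cost grows gives a visibly lower R^2 and a curve that bends upward.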