Transient Detection and SQL

Ab44292d7d6f032baf342a98230a6654?s=47 transientskp
January 09, 2014

Transient Detection and SQL

Gijs Molenaar

Ab44292d7d6f032baf342a98230a6654?s=128

transientskp

January 09, 2014
Tweet

Transcript

  1. Transient Detection and SQL LOFAR Transients Key Science Project meeting

    January 9 2014 Gijs Molenaar @gijzelaerr gijs@pythonic.nl
  2. TRAP • TRAnsient Detection Pipeline • Detect Transients in AARTFAAC

    data • 24/7 full sky monitor
  3. Images 10 bands (image cube) • 1 image cube per

    second • In future 10 images per second • 20 to 50 sources per image
  4. Data • Source (unique id) • Position (ra & decl)

    • Error (ra & decl) • Flux • Shape (2d gaussian) • Timestamp • Frequency (band)
  5. Data rate • 500 - 5000 inserts per second •

    May get more in future
  6. Calculations • Associate in frequency • Associate in time •

    Transients detection
  7. SQL Storing, manipulating and retrieving data De facto standard Structured

    data Relational data Old language
  8. The data Unique id, ra, decl, errors, flux, shape, timestamp,

    frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency table
  9. None
  10. None
  11. None
  12. None
  13. None
  14. None
  15. Quite a lot of data • So quite a lot

    of calculations • move calculations to data (base) • Minimal communication
  16. Problems with SQL • Mix of language domains (python, SQL)

    • No SQL standard (various dialects) • Manual: • type checking / conversion • Error handling • schema migrations / instantiation
  17. Problems with monolithic queries • hard to read • hard

    to reason about • Hard to atomize • hard to test • Code reuse difficult • hard to maintain
  18. Example

  19. Solution? • Server side functions?

  20. NO! • DB revisions, versioning hell • For every code

    change new version • Schema migrations • Init functions on startup? • DB specific
  21. SOLUTION

  22. Why? • Object Relational Mapper • Support multiple database dialect

    • Client side error checking (before query execution) • Less code • Better Query Builder Language SQLAlchemy -> SQL assembler -> C
  23. But also • • 3rd party applications! • Schema management

    • Creation and Up/down grading • Tools for creating migration paths
  24. And last but not least • Backwards compatible! • Can

    still do raw queries
  25. example

  26. None
  27. None
  28. Why not earlier? • Unforeseen problem at project start •

    No MonetDB dialect for SQLAlchemy • (I made one)
  29. sqlalchemy-monetdb • pip install sqlalchemy-monetdb • https://github.com/gijzelaerr/sqlalchemy-monetdb • https://github.com/gijzelaerr/djonet •

    (banana)
  30. Possible Disadvantage A bit harder to see the actual SQL

    BUT SQLALchemy very similar to SQL
  31. Conclusion • SQLAlchemy is SQL powertool • I see no

    reason NOT to use SQLAlchemy • Will result in improved code readability and maintainability • (if properly used of course)
  32. Questions?