Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Transient Detection and SQL

transientskp
January 09, 2014

Transient Detection and SQL

Gijs Molenaar

transientskp

January 09, 2014
Tweet

More Decks by transientskp

Other Decks in Science

Transcript

  1. Transient Detection
    and SQL
    LOFAR Transients Key Science Project meeting
    January 9 2014
    Gijs Molenaar
    @gijzelaerr
    [email protected]

    View Slide

  2. TRAP
    • TRAnsient Detection Pipeline
    • Detect Transients in
    AARTFAAC data
    • 24/7 full sky monitor

    View Slide

  3. Images
    10 bands (image cube)
    • 1 image cube per second
    • In future 10 images per second
    • 20 to 50 sources per image

    View Slide

  4. Data
    • Source (unique id)
    • Position (ra & decl)
    • Error (ra & decl)
    • Flux
    • Shape (2d gaussian)
    • Timestamp
    • Frequency (band)

    View Slide

  5. Data rate
    • 500 - 5000 inserts per second
    • May get more in future

    View Slide

  6. Calculations
    • Associate in frequency
    • Associate in time
    • Transients detection

    View Slide

  7. SQL
    Storing, manipulating and retrieving data
    De facto standard
    Structured data
    Relational data
    Old language

    View Slide

  8. The data
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    table

    View Slide

  9. View Slide

  10. View Slide

  11. View Slide

  12. View Slide

  13. View Slide

  14. View Slide

  15. Quite a lot of data
    • So quite a lot of calculations
    • move calculations to data (base)
    • Minimal communication

    View Slide

  16. Problems with SQL

    Mix of language domains (python, SQL)

    No SQL standard (various dialects)

    Manual:

    type checking / conversion

    Error handling

    schema migrations / instantiation

    View Slide

  17. Problems with
    monolithic queries
    • hard to read
    • hard to reason about
    • Hard to atomize
    • hard to test
    • Code reuse difficult
    • hard to maintain

    View Slide

  18. Example

    View Slide

  19. Solution?
    • Server side functions?

    View Slide

  20. NO!
    • DB revisions, versioning hell
    • For every code change new version
    • Schema migrations
    • Init functions on startup?
    • DB specific

    View Slide

  21. SOLUTION

    View Slide

  22. Why?
    • Object Relational Mapper
    • Support multiple database dialect
    • Client side error checking (before query execution)
    • Less code
    • Better Query Builder Language
    SQLAlchemy -> SQL assembler -> C

    View Slide

  23. But also


    3rd party applications!

    Schema management

    Creation and Up/down grading

    Tools for creating migration paths

    View Slide

  24. And last but not least

    Backwards compatible!

    Can still do raw queries

    View Slide

  25. example

    View Slide

  26. View Slide

  27. View Slide

  28. Why not earlier?
    • Unforeseen problem at
    project start
    • No MonetDB dialect for
    SQLAlchemy
    • (I made one)

    View Slide

  29. sqlalchemy-monetdb
    • pip install sqlalchemy-monetdb
    • https://github.com/gijzelaerr/sqlalchemy-monetdb
    • https://github.com/gijzelaerr/djonet
    • (banana)

    View Slide

  30. Possible Disadvantage
    A bit harder to see the actual SQL
    BUT SQLALchemy very similar to SQL

    View Slide

  31. Conclusion

    SQLAlchemy is SQL powertool

    I see no reason NOT to use SQLAlchemy

    Will result in improved code readability and
    maintainability

    (if properly used of course)

    View Slide

  32. Questions?

    View Slide