Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Transient Detection and SQL

transientskp
January 09, 2014

Transient Detection and SQL

Gijs Molenaar

transientskp

January 09, 2014
Tweet

More Decks by transientskp

Other Decks in Science

Transcript

  1. Transient Detection
    and SQL
    LOFAR Transients Key Science Project meeting
    January 9 2014
    Gijs Molenaar
    @gijzelaerr
    [email protected]

    View full-size slide

  2. TRAP
    • TRAnsient Detection Pipeline
    • Detect Transients in
    AARTFAAC data
    • 24/7 full sky monitor

    View full-size slide

  3. Images
    10 bands (image cube)
    • 1 image cube per second
    • In future 10 images per second
    • 20 to 50 sources per image

    View full-size slide

  4. Data
    • Source (unique id)
    • Position (ra & decl)
    • Error (ra & decl)
    • Flux
    • Shape (2d gaussian)
    • Timestamp
    • Frequency (band)

    View full-size slide

  5. Data rate
    • 500 - 5000 inserts per second
    • May get more in future

    View full-size slide

  6. Calculations
    • Associate in frequency
    • Associate in time
    • Transients detection

    View full-size slide

  7. SQL
    Storing, manipulating and retrieving data
    De facto standard
    Structured data
    Relational data
    Old language

    View full-size slide

  8. The data
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    Unique id, ra, decl, errors, flux, shape, timestamp, frequency
    table

    View full-size slide

  9. Quite a lot of data
    • So quite a lot of calculations
    • move calculations to data (base)
    • Minimal communication

    View full-size slide

  10. Problems with SQL

    Mix of language domains (python, SQL)

    No SQL standard (various dialects)

    Manual:

    type checking / conversion

    Error handling

    schema migrations / instantiation

    View full-size slide

  11. Problems with
    monolithic queries
    • hard to read
    • hard to reason about
    • Hard to atomize
    • hard to test
    • Code reuse difficult
    • hard to maintain

    View full-size slide

  12. Solution?
    • Server side functions?

    View full-size slide

  13. NO!
    • DB revisions, versioning hell
    • For every code change new version
    • Schema migrations
    • Init functions on startup?
    • DB specific

    View full-size slide

  14. Why?
    • Object Relational Mapper
    • Support multiple database dialect
    • Client side error checking (before query execution)
    • Less code
    • Better Query Builder Language
    SQLAlchemy -> SQL assembler -> C

    View full-size slide

  15. But also


    3rd party applications!

    Schema management

    Creation and Up/down grading

    Tools for creating migration paths

    View full-size slide

  16. And last but not least

    Backwards compatible!

    Can still do raw queries

    View full-size slide

  17. Why not earlier?
    • Unforeseen problem at
    project start
    • No MonetDB dialect for
    SQLAlchemy
    • (I made one)

    View full-size slide

  18. sqlalchemy-monetdb
    • pip install sqlalchemy-monetdb
    • https://github.com/gijzelaerr/sqlalchemy-monetdb
    • https://github.com/gijzelaerr/djonet
    • (banana)

    View full-size slide

  19. Possible Disadvantage
    A bit harder to see the actual SQL
    BUT SQLALchemy very similar to SQL

    View full-size slide

  20. Conclusion

    SQLAlchemy is SQL powertool

    I see no reason NOT to use SQLAlchemy

    Will result in improved code readability and
    maintainability

    (if properly used of course)

    View full-size slide