Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SQLALchemy in the TRAnsient detection Pipeline

Gijs Molenaar
January 09, 2014

SQLALchemy in the TRAnsient detection Pipeline

About why I think using SQLAlchemy and not only SQL in a big scientific code base is a good idea.

Gijs Molenaar

January 09, 2014
Tweet

More Decks by Gijs Molenaar

Other Decks in Science

Transcript

  1. Images 10 bands (image cube) • 1 image cube per

    second • In future 10 images per second • 20 to 50 sources per image
  2. Data • Source (unique id) • Position (ra & decl)

    • Error (ra & decl) • Flux • Shape (2d gaussian) • Timestamp • Frequency (band)
  3. The data Unique id, ra, decl, errors, flux, shape, timestamp,

    frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency Unique id, ra, decl, errors, flux, shape, timestamp, frequency table
  4. Quite a lot of data • So quite a lot

    of calculations • move calculations to data (base) • Minimal communication
  5. Problems with SQL • Mix of language domains (python, SQL)

    • No SQL standard (various dialects) • Manual: • type checking / conversion • Error handling • schema migrations / instantiation
  6. Problems with monolithic queries • hard to read • hard

    to reason about • Hard to atomize • hard to test • Code reuse difficult • hard to maintain
  7. NO! • DB revisions, versioning hell • For every code

    change new version • Schema migrations • Init functions on startup? • DB specific
  8. Why? • Object Relational Mapper • Support multiple database dialect

    • Client side error checking (before query execution) • Less code • Better Query Builder Language SQLAlchemy -> SQL assembler -> C
  9. But also • • 3rd party applications! • Schema management

    • Creation and Up/down grading • Tools for creating migration paths
  10. Why not earlier? • Unforeseen problem at project start •

    No MonetDB dialect for SQLAlchemy • (I made one)
  11. Conclusion • SQLAlchemy is SQL powertool • I see no

    reason NOT to use SQLAlchemy • Will result in improved code readability and maintainability • (if properly used of course)