Transient detection with SQLAlchemy

Transient detection with SQLAlchemy

About how we use SQLAlchemy as a domain specific language to build maintainable scientific SQL queries.

978e79ad01185b39efcfca1482f0f819?s=128

Gijs Molenaar

May 09, 2014
Tweet

Transcript

  1. Transient Detection with SQLAlchemy Gijs Molenaar PyGrunn 2014

  2. Me • Gijs Molenaar • Scientific Software Engineer • UvA

    - Anton Pannekoek Instituut • SKA - Square Kilometre Array (South Africa) ! • http://pythonic.nl
  3. Lasting only for a short time; impermanent Transient

  4. None
  5. None
  6. None
  7. None
  8. None
  9. None
  10. Source Extraction • Position • Frequency • Time • Intensity

    (flux)
  11. Data rate • Data cubes • Multiple frequencies • 500

    - 5000 sources /s • Will grow in future
  12. Calculations • Associate in time • Associate in frequency •

    Transient Detection
  13. SQL • Storing, manipulating and retrieving data • De facto

    standard • Structured data • Relational data • Old language
  14. insert into id, ra, decl, errors, flux, shape, timestamp, frequency

    id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency id, ra, decl, errors, flux, shape, timestamp, frequency Table
  15. None
  16. None
  17. None
  18. None
  19. None
  20. A lot of data • So quite a lot of

    calculations • move calculations to data • Minimize communication
  21. SQL problems • Mix of language domains (python, SQL) •

    No SQL standard (various dialects) • Manual: • type checking / conversion • Error handling • schema migrations / instantiation
  22. monolithic queries • hard to read • hard to reason

    about • Hard to atomise • hard to test • Code reuse difficult • hard to maintain
  23. Example

  24. Server side functions solution?

  25. no! • DB revisions, versioning hell • For every code

    change new version • Schema migrations • DB specific
  26. SOLUTION

  27. Why? • Object Relational Mapper • Support multiple database dialect

    • Better Query Builder Language! • SQLAlchemy -> SQL C -> assembler
  28. but also • 3rd party applications • Schema management •

    profiling • Backwards compatible, raw queries
  29. None
  30. None
  31. Status • Migrating • Why now? • No MonetDB dialect

    for SQLAlchemy • (We made one)
  32. MonetDB SQLAlchemy • pip install sqlalchemy-monetdb • https://github.com/gijzelaerr/sqlalchemy-monetdb

  33. Conclusions • SQLAlchemy is SQL power tool • No reason

    NOT to use SQLAlchemy • improved code readability and maintainability • (if properly used of course)
  34. Questions?