Overview of [some of] the PS1 Image Processing Pipeline Infrastructure

Presented at Science Data Processing - Pipelines III

Joshua Hoblitt

December 03, 2012

Transcript

  1. Outline
     • Software / hardware / data scale
     • Data management
     • Limited workflow discussion
  2. How the IPP fits into the overall PS1 data flow
     [Diagram: data flow between the telescope, camera, OATS, the IPP, and the science pipelines (MOPS, PSPS)]
  3. Listing of IPP packages
     • lipp • magic • megacamTools • Nebulous • Nebulous-Server • Ohana • pedestal • ppArith • ppBackground • ppbgrestore
     • ppConfigDump • ppImage • ppMerge • ppNoiseMap • ppNorm • ppSim • ppSkycell • ppSmooth • ppStack • ppStats • ppSub
     • ppTranslate • ppViz • psastro • psconfig • psdemo • PS-IPP-Config • PS-IPP-Metadata-Config • PS-IPP-MetaDB • PS-IPP-PStamp
     • psLib • psModules • psphot • pstamp • pstest • psvideophot • pswarp • simtest • console • DataChallenge • DataStore
     • DataStoreServer • dbconfig • dvodist • dvoTools • glueforge • gpc1_test_suite • icd-demo • ippconfig • ippData • ippdb
     • ippdor • ippMonitor • ippScripts • ippTasks • ippTests • ippTools • ippToPsps
  4. ~size of the IPP code base
     Totals grouped by language (dominant language first):
       ansic:   598226 (83.18%)
       perl:     89535 (12.45%)
       python:    9285  (1.29%)
       sh:        9272  (1.29%)
       php:       6985  (0.97%)
       csh:       3358  (0.47%)
       fortran:   1303  (0.18%)
       asm:        786  (0.11%)
       tcl:        391  (0.05%)
       lisp:        42  (0.01%)
     Total Physical Source Lines of Code (SLOC) = 719,183
     Does not count DVO, mana, panTasks, and some scripts
     `sloccount` as of 2012-11-30 (r34754)
  5. ~size of DECam CP for context
     Totals grouped by language (dominant language first):
       perl:     54434 (34.52%)
       ansic:    53046 (33.64%)
       cpp:      46962 (29.79%)
       python:    1535  (0.97%)
       sh:        1473  (0.93%)
       fortran:    158  (0.10%)
       csh:         58  (0.04%)
     Total Physical Source Lines of Code (SLOC) = 157,666
     Does not include astromatic modifications
     `sloccount` of deccp 2.1.1
  6. Hardware Environment @ MHPCC
     • 62 x 5U compute + storage nodes
     • 10 x compute + 4U DAS nodes
     • 66? x 1U compute nodes
     • 4 x DB nodes
     • 4 x 8U/74-disk systems (off site)
     • total raw storage = 3.6PB
     • 147 nodes / 1492 cores
     Stats as of 2012-11-30 courtesy of Eugene Magnier
  7. Data footprint
     • GPC1 “workflow” DB
       – 541,848 raw exposures, including 333,095 science exposures
       – 709,171 processed exposures (we have processed all science exposures 1x, plus nearly completed a full re-processing, plus additional re-processings)
     • DVO 3pi survey DB
       – 27 billion measurements of 1.5 billion astronomical objects
       – ~10TB
     • Storage
       – Nebulous is tracking 1,357,067,950 “instances”
       – Nebulous DB > 1.35TB
       – current usage = 3.2PB (raw + results + short-term outputs)
     Stats as of 2012-11-30 courtesy of Eugene Magnier
  8. Data Management: Nebulous
     • design constraints – driven by workflow
       – Files must be accessible as a local path (cfitsio does not work with FDs)
       – Support seeking on remote files without copying the complete file locally
       – Data replication
       – Scales [more or less] linearly with cluster size
       – Synchronous state between all clients
       – C / Perl clients
     • Paper evaluation of dozens of systems
       – Predates iRODS
  9. Nebulous Data Model
     [Diagram: a storage object (e.g. /foo/file.fits) maps to one or more instances, each stored at /<volume mnt path>/<hashing scheme>/<encoded instance + file name>]
  10. Nebulous Architecture
      • Central server w/ more or less atomic operations
      • All storage nodes + the server need a consistent view of the target storage volumes (NFS cross mounts w/ automounter)
      • Performance trumps safety
        – No mandatory locking / no permissions
        – Clients are trusted implicitly
        – Volume rebalancing, etc. is a batch operation
      • As POSIX-like as possible, e.g. xattrs
  11. Nebulous Server
      • Does not touch data (except for new file creation)
      • Implemented as Perl modules
      • Separate daemon process monitors storage volume usage / status
      • MySQL/InnoDB backend
      • Production system runs under Apache/mod_perl w/ a SOAP RPC adaptation layer
      • Support for memcache / sharding implemented but not in use
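      As a rough illustration of the server-side bookkeeping (not taken from the Nebulous source; the database, table, and column names are invented for this sketch), registering a new instance against the MySQL backend could look like:

        use strict;
        use warnings;
        use DBI;

        # Connect to a hypothetical "nebulous" database on the MySQL backend.
        my $dbh = DBI->connect(
            'DBI:mysql:database=nebulous;host=localhost',
            'nebulous', 'secret',
            { RaiseError => 1, AutoCommit => 1 },
        );

        # "More or less atomic" operation: record one new instance of a
        # storage object as a single INSERT.  Schema is illustrative only.
        my $sth = $dbh->prepare(
            'INSERT INTO instance (so_id, volume, path) VALUES (?, ?, ?)'
        );
        $sth->execute(12345, 'ippstore01:/export/data01', 'ab/cd/file.fits.42');

        $dbh->disconnect;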
  12. Nebulous Clients
      • All regular interaction with the server is via SOAP (sketched below)
      • POSIX-ish API
      • Responsible for handling storage object instance replication management
        – File modifications are tricky
      • POSIX-ish CLI utils: neb-ls, neb-df, neb-rm, etc.
      • IPP packages are configurable for either local files or Nebulous
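      A minimal sketch of what a neb-* style client call over SOAP could look like in Perl; the endpoint URL, namespace, method name (stat), and response shape are all assumptions for illustration, not the real Nebulous RPC interface.

        use strict;
        use warnings;
        use SOAP::Lite;

        # Hypothetical namespace and endpoint for the Nebulous server.
        my $soap = SOAP::Lite
            ->uri('http://nebulous.example.org/Nebulous/Server')
            ->proxy('http://ippserver.example.org/soap');

        # Ask the server for the instances of a storage object, then print
        # the local (NFS cross-mounted) paths it returns.
        my $result = $soap->stat('/foo/file.fits')->result;
        print "$_\n" for @{ $result->{instances} || [] };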
  13. Workflow: panTasks
      • Top level task manager is panTasks
        – DSL derived from the CFHT Ohana suite
        – Tasks poll on time intervals for pending work
        – Does not directly maintain state; no end-to-end concept of a processing run
        – Runs as a regular user
        – Runs jobs via persistent ssh connections to configured nodes
        – There was significant design-time concern about deadlocked processing
  14. panTasks example
      task receive.fileset.load
      host local
      periods -poll $LOADPOLL
      periods -exec $LOADEXEC
      periods -timeout 30
      npending 1
      stdout NULL
      stderr $LOGDIR/receive.fileset.log
      task.exec
          $run = receivetool -pendingfileset
          if ($DB:n == 0)
              option DEFAULT
          else
              # save the DB name for the exit tasks
              option $DB:$receive_DB
              $run = $run -dbname $DB:$receive_DB
              $receive_DB ++
              if ($receive_DB >= $DB:n)
                  set receive_DB = 0
              end
          add_poll_args run
          command $run
      End
      ...
  15. Workflow: ippTools/ippTasks
      • State management is done via a package of CLI utilities called ippTools
        – All state lives in a SQL DB behind wrapper APIs and is queried / updated via the CLI utilities (see the sketch after this slide)
      • Provenance / versioning of every step is preserved
      • Workflow is static, i.e. specific to the IPP processing steps
        – Some steps are optional, others are not triggered by panTasks
      • Built on top of the IPP DB management utilities
        – C does not have a good Perl DBI analog
        – psDB*
        – glueforge / ippDB
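      To make the "state in a SQL DB, driven from CLI utilities" pattern concrete, here is a hedged Perl/DBI sketch; the database, table, and column names are hypothetical and not the real ippdb schema. A polling task asks for pending work and then advances the state of each claimed item, never deleting rows, so provenance is preserved.

        use strict;
        use warnings;
        use DBI;

        # Hypothetical workflow-state table: one row per processing step.
        my $dbh = DBI->connect('DBI:mysql:database=ipptasks;host=localhost',
                               'ipp', 'secret', { RaiseError => 1 });

        # Find pending work, the way a polling task would...
        my $rows = $dbh->selectall_arrayref(
            'SELECT exp_id FROM chip_run WHERE state = ? LIMIT 10',
            { Slice => {} }, 'pending',
        );

        # ...and record that each item has been claimed by advancing its state.
        my $claim = $dbh->prepare(
            'UPDATE chip_run SET state = ? WHERE exp_id = ? AND state = ?'
        );
        $claim->execute('running', $_->{exp_id}, 'pending') for @$rows;

        $dbh->disconnect;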
  16. Locality Optimization
      • Data processing takes place on the storage node that holds the data locally, when possible
      • Chips in the focal plane have, when possible, affinity to a specific storage node (a toy sketch follows)
      • Data is transferred from the summit directly to the target storage node; this is required as part of the transfer parallelization scheme
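      Chip-to-node affinity could be as simple as a static lookup table with a fallback; the chip identifiers and host names below are invented for illustration, and this is not the IPP's actual mechanism.

        use strict;
        use warnings;

        # Hypothetical static affinity map: focal-plane chip -> preferred node.
        my %chip_affinity = (
            'XY01' => 'ippstore01',
            'XY02' => 'ippstore02',
            # ... one entry per focal-plane chip
        );

        # Prefer the node that already holds the chip's pixels; fall back to
        # any available node when the preferred one is down.
        sub target_node {
            my ($chip, $up) = @_;    # $up: hashref of currently usable nodes
            my $preferred = $chip_affinity{$chip};
            return $preferred if defined $preferred && $up->{$preferred};
            my @avail = sort grep { $up->{$_} } keys %$up;
            return $avail[0];
        }

        print target_node('XY01', { ippstore01 => 1, ippstore02 => 1 }), "\n";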
  17. Enclosure to IPP data transfers
      [Diagram: network path from the camera enclosures (16x GRASP) through DataStore / Pixel Host nodes (x8) to the IPP; link types 100baseTX, 1000baseT, 1000baseSX; 1Gbit/s link between PS1 and MHPCC]
  18. A few lessons learned
      • “this pthreads stuff is easy”
        – Threading is a solution to latency issues
        – OpenMP in gcc 4.3+, Intel TBB, etc.
      • You will save all the bits
      • Simulated data != real data
      • “Data Challenges” are important
      • Estimating CPU is difficult
      • Hardware is cheaper than debugging memory errors
      • Post-commissioning support requires more software FTEs than development
      • DSL use needs to be carefully considered
      • Tightly coupling workflow, storage, and cluster design limits reuse but reduces software effort
      • Beware of over-specification