… storage nodes
• 10 x compute + 4U DAS nodes
• 66? x 1U compute nodes
• 4 x DB nodes
• 4 x 8U/74-disk systems (off-site)
• Total raw storage = 3.6 PB
• 147 nodes / 1492 cores
… including 333,095 science exposures
  – 709,171 processed exposures (we have processed all science exposures once, nearly completed a full re-processing, plus additional re-processings)
• DVO 3pi survey DB
  – 27 billion measurements of 1.5 billion astronomical objects
  – ~10 TB
• Storage
  – Nebulous is tracking 1,357,067,950 “instances”
  – Nebulous DB > 1.35 TB
  – Current usage = 3.2 PB (raw + results + short-term outputs)

Stats as of 2012-11-30 courtesy of Eugene Magnier
– Files must be accessible as a local path (cfitsio does not work with FDs)
– Support seeking on remote files without copying the complete file locally (see the sketch below)
– Data replication
– Scales [more or less] linearly with cluster size
– Synchronous state between all clients
– C / Perl clients
• Paper evaluated dozens of systems
  – Predates iRODS
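A minimal sketch (in Perl, matching the client bindings) of how the first two requirements combine: once an instance is visible as a local NFS path, a client can seek straight to the region it needs instead of copying the whole file. The path and offset below are hypothetical.

    # Read one FITS block (2880 bytes) from the middle of a large
    # exposure without copying the whole file locally. Assumes the
    # storage volume is cross-mounted so the path looks local, the
    # same property cfitsio relies on. Path and offset are made up.
    use strict;
    use warnings;
    use Fcntl qw(O_RDONLY SEEK_SET);

    my $path   = '/data/stor01.0/example/exposure.fits';  # hypothetical NFS path
    my $offset = 2880 * 1000;          # jump straight to the 1000th FITS block
    my $buf;

    sysopen(my $fh, $path, O_RDONLY) or die "open $path: $!";
    sysseek($fh, $offset, SEEK_SET)  or die "seek: $!";
    sysread($fh, $buf, 2880) == 2880 or die "short read";
    close($fh);
    printf "read %d bytes at offset %d\n", length($buf), $offset;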
… operations
• All storage nodes + server need a consistent view of target storage volumes (NFS cross-mounts w/ automounter)
• Performance trumps safety
  – No mandatory locking / no permissions
  – Clients are trusted implicitly
  – Volume rebalancing etc. is a batch operation
• As POSIX-like as possible, e.g. xattrs (see the sketch below)
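One illustration of the POSIX-like surface: a sketch using the CPAN module File::ExtAttr to attach and read back an extended attribute. The attribute name, value, and path are hypothetical, not Nebulous's actual metadata schema.

    # Store and retrieve a checksum as a user xattr on a storage volume.
    # File::ExtAttr writes to the user.* namespace by default.
    use strict;
    use warnings;
    use File::ExtAttr qw(setfattr getfattr);

    my $file = '/data/stor01.0/example/exposure.fits';   # hypothetical path
    setfattr($file, 'md5', 'd41d8cd98f00b204e9800998ecf8427e')
        or warn "setfattr failed: $!";
    my $sum = getfattr($file, 'md5');
    print 'stored checksum: ', (defined $sum ? $sum : 'none'), "\n";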
… file creation)
• Implemented as Perl modules
• A separate daemon process monitors storage volume usage / status
• MySQL/InnoDB backend
• Production system runs under Apache/mod_perl w/ a SOAP RPC adaptation layer (see the sketch below)
• Support for memcache / sharding implemented but not in use
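A sketch of what a client call through that SOAP layer might look like, via the CPAN module SOAP::Lite; the endpoint URL, namespace, method name, and neb:// object name are all assumptions for illustration, not the production API.

    # Ask the (hypothetical) Nebulous SOAP endpoint to resolve a storage
    # object to a local NFS path the caller can hand to cfitsio.
    use strict;
    use warnings;
    use SOAP::Lite;

    my $neb = SOAP::Lite
        ->uri('http://nebulous/Nebulous/Server')           # hypothetical namespace
        ->proxy('http://nebserver.example.edu/nebulous');  # hypothetical endpoint

    my $som = $neb->stat_instance('neb://any/example/exposure.fits');
    die $som->faultstring if $som->fault;
    print 'local path: ', $som->result, "\n";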
• POSIXish API
• Responsible for storage object instance replication management
  – File modifications are tricky
• POSIXish CLI utils: neb-ls, neb-df, neb-rm, etc.
• IPP packages configurable for either local files or Nebulous (see the sketch below)
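One plausible shape for that local-vs-Nebulous configurability, dispatching on the file name; the neb:// scheme test and the resolve_nebulous() helper are hypothetical illustrations, not the IPP's actual code.

    # Return something the rest of the pipeline can open as a plain file.
    use strict;
    use warnings;

    sub local_path {
        my ($name) = @_;
        return $name unless $name =~ m{^neb://};   # plain local file: use as-is
        return resolve_nebulous($name);            # ask Nebulous for an NFS path
    }

    sub resolve_nebulous {
        my ($name) = @_;
        # A real client would RPC to the Nebulous server here.
        die "no Nebulous client configured for $name\n";
    }

    print local_path('/tmp/test.fits'), "\n";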
… DSL derived from the CFHT Ohana suite
– Tasks poll on time intervals for pending work
– Does not maintain state itself; no end-to-end concept of a processing run
– Runs as a regular user
– Runs jobs via persistent ssh connections to configured nodes
– There was significant design-time concern about deadlocked processing
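The fragment below, from a receive-stage task definition, gives the flavor of the DSL: the task polls on a 30 second timeout, runs receivetool against pending filesets, and round-robins the -dbname option across the configured databases.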
    -exec $LOADEXEC
    periods -timeout 30
    npending 1
    stdout NULL
    stderr $LOGDIR/receive.fileset.log
    task.exec
        $run = receivetool -pendingfileset
        if ($DB:n == 0)
            option DEFAULT
        else
            # save the DB name for the exit tasks
            option $DB:$receive_DB
            $run = $run -dbname $DB:$receive_DB
            $receive_DB ++
            if ($receive_DB >= $DB:n)
                set receive_DB = 0
            end
        add_poll_args run
        command $run
    End
    ...
… CLI utilities called ippTools
  – All state is managed in a SQL DB via wrapper APIs and is queried / updated via the CLI utilities
• Provenance / versioning of every step is preserved
• Workflow is static, i.e. specific to the IPP processing steps
  – Some steps are optional, others are not; triggered by panTasks
• Built on top of the IPP DB management utilities (see the sketch below)
  – C does not have a good analog of Perl's DBI
  – psDB*
  – glueforge / ippDB
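For contrast, a few lines of the Perl DBI idiom that has no good C analog (the role psDB fills on the C side); the DSN, table, and column names are hypothetical.

    # Query pipeline state from MySQL the way the Perl tools can.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=ipp;host=ippdb01',
                           'ipp', 'secret', { RaiseError => 1 });

    my $sth = $dbh->prepare(
        'SELECT exp_id, state FROM rawExp WHERE state = ? LIMIT 10');
    $sth->execute('pending');
    while (my ($exp_id, $state) = $sth->fetchrow_array) {
        print "$exp_id $state\n";
    }
    $dbh->disconnect;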
… node that holds the data locally when possible
• Chips in the focal plane have affinity to a specific storage node when possible (see the sketch below)
• Data is transferred from the summit directly to the target storage node; required as part of the transfer parallelization scheme
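A toy sketch of chip-to-node affinity: a static map with a deterministic fallback so retries land on the same node. Chip IDs and host names are hypothetical; the real mapping is driven by the cluster configuration.

    use strict;
    use warnings;

    my %affinity = (
        XY01 => 'stor01',
        XY02 => 'stor01',
        XY10 => 'stor02',
        # ... one entry per focal-plane chip
    );
    my @nodes = qw(stor01 stor02 stor03);

    sub target_node {
        my ($chip) = @_;
        return $affinity{$chip} if exists $affinity{$chip};
        my $sum = 0;                       # deterministic hash fallback
        $sum += ord($_) for split //, $chip;
        return $nodes[$sum % @nodes];
    }

    print target_node('XY01'), "\n";   # stor01 (configured affinity)
    print target_node('XY77'), "\n";   # hashed fallback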
• Threading is a solution to latency issues
  – OpenMP in gcc 4.3+, Intel TBB, etc.
• You will save all the bits
• Simulated data != real data
• “Data Challenges” are important
• Estimating CPU is difficult
• Hardware is cheaper than debugging memory errors
• Post-commissioning support requires more software FTEs than development
• DSL use needs to be carefully considered
• Tightly coupling workflow, storage, and cluster design limits reuse but reduces software effort
• Beware of over-specification