Slide 1

Slide 1 text

Bless Transparently logging program outputs https://github.com/haozeke/bless

Slide 2

Slide 2 text

+ Prof. Dr. - Ing. Morris Riedel + Prof. Birgir Hrafnkelsson CECAM Workshop :: Electronic Structure Software Development: Advancing the Modular Paradigm

Slide 3

Slide 3 text

Reproducible Research (1) Community, T. T. W.; Arnold, B.; Bowler, L.; Gibson, S.; Herterich, P.; Higman, R.; Krystalli, A.; Morley, A.; O’Reilly, M.; Whitaker, K. The Turing Way: A Handbook for Reproducible Data Science, 2019. https://doi.org/10.5281/zenodo.3233986 (accessed 2020-09-20).

Slide 4

Slide 4 text

Workflow engines ● Snakemake ● Fireworks ● AiiDA (1) Pizzi, G.; Cepellotti, A.; Sabatini, R.; Marzari, N.; Kozinsky, B. AiiDA: Automated Interactive Infrastructure and Database for Computational Science. Computational Materials Science 2016, 111, 218–230. https://doi.org/10.1016/j.commatsci.2015.09.013.

Slide 5

Slide 5 text

Registering workers ● Plugin ecosystems are present ● Tasks are composable But rely on STABLE underlying codes! (1) Rüßmann, P.; Bertoldo, F.; Blügel, S. The AiiDA-KKR Plugin and Its Application to High-Throughput Impurity Embedding into a Topological Insulator. npj Comput Mater 2021, 7 (1), 1–9. https://doi.org/10.10 38/s41524-020-004 82-5.

Slide 6

Slide 6 text

What about Development?

Slide 7

Slide 7 text

Workflow limitations ● Development code is typically iterated over very quickly ○ No stable API ● Debugging via stdout is common ● Logging is unreliable ● Outputs can be variable ● Timing information may / may not be provided

Slide 8

Slide 8 text

Life without bless ● 1 is stdout ● 2 is stderr blah > foo.out Pipe into tee to view while writing ./a.out > i_swear_this_is_an_important_labeled_run …

Slide 9

Slide 9 text

Desiderata ● Better logging ● Labeling runs ● Unified output ● Compression? ● File (.gz) output ● MongoDB output ● Labels and UUID bless Feature

Slide 10

Slide 10 text

File logging ● Additional information ○ trace! ○ UUID, Label, Time ● Time stamped stdout ○ info! ● bless errors ○ error! ● Time stamped stderr ○ warn! bless --label "orca_nopar" -- $(which orca) orca.inp # Generates export FNAME='orca_nopar_cd523354-35f5-4d1d-a627-082139d243 ef.log.gz' zcat $FNAME | head -n5 [2024-07-17T12:57:20Z TRACE] Label: orca_nopar UUID: cd523354-35f5-4d1d-a627-082139d243ef zcat $FNAME | tail -n 3 [2024-07-17T12:57:20Z INFO] ****ORCA TERMINATED NORMALLY**** [2024-07-17T12:57:20Z INFO] TOTAL RUN TIME: 0 days 0 hours 0 minutes 0 seconds 197 msec [2024-07-17T12:57:20Z TRACE] orca orca.inp took 206ms 450us 233ns to complete.

Slide 11

Slide 11 text

MongoDB Backend ● No additional metadata in the .gz ● Binary stores warn! and info! #[derive(Serialize, Deserialize, Clone, Debug)] pub struct SaveGzipBlobParams<'a> { pub cmd: &'a str, pub args: &'a str, pub label: &'a str, pub duration: &'a str, pub uuid: &'a str, pub file_path: &'a Path, #[serde(with = "bson_datetime_as_rfc3339_string")] pub start_time: DateTime, #[serde(with = "bson_datetime_as_rfc3339_string")] pub end_time: DateTime, }

Slide 12

Slide 12 text

Using mongo { _id: ObjectId('66975007f17cad7b36c2430c'), cmd: '/local/orca', args: 'orca.inp', label: 'orca_nopar_mongod', run_uuid: 'c58d557d-9783-4c48-8f5d-f1c56508730f', duration: '185ms 385us 646ns', start_time: ISODate('2024-07-17T05:00:55.231Z'), end_time: ISODate('2024-07-17T05:00:55.417Z') }, npx mongsh use local # Show all entries db.commands.find() # Suppress blob data db.commands.find({}, { gzip_blob: 0 }) python scripts/get_db_gzip.py --db-name local --collection-name commands --query-field args --query-value orca.inp

Slide 13

Slide 13 text

Timing Commands let stdout_handle = std::thread::spawn(move || { for line in stdout_reader.lines() { match line { Ok(line) => info!("{}", line), Err(e) => error!("Error reading stdout: {}", e), } } }); let start_time = std::time::SystemTime::now(); if let Err(e) = run_command(command, args).await { error!("Failed to run command: {} {}", command, args.join(" ")); error!("Error: {}", e); } let end_time = std::time::SystemTime::now(); Process timer Time-stamping stdout

Slide 14

Slide 14 text

Benchmarks Command Mean [ms] Min [ms] Max [ms] Relative MONGO run 203.7 ± 10.3 187.3 227.8 1.08 ± 0.07 orca orca.inp > b 188.7 ± 8.2 179.8 205.5 1.00 FileWriter run 193.0 ± 8.6 180.4 204.7 1.02 ± 0.06 MONGODB_URI="mongodb://localhost:27017/" bless --label "orca_nopar_mongod" --use-mongodb -- $(which orca) orca.inp bless --label "orca_nopar" -- $(which orca) orca.inp

Slide 15

Slide 15 text

Takeaways ● Near zero overhead ○ Better development logging ● Uses less space ● Adds context ● Provenance, beyond git https://github.com/haozeke/bless Contributors welcome!! v0.1.0

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

Thank you!! Questions?