My Research Environmental Setup

My Research Environmental Setup: Makefile, Tools, Tips, …
Hiroyuki Deguchi ([email protected])
2022/04/28: Manage Your Experiments 2022

Typical Experiments
▪ Setup
▪ Preprocess the dataset
▪ Implement models
▪ Run experiments
▪ Evaluate

Setup: My Python Project Environment
▪ pyenv: Python version manager
▪ poetry: Python project manager
  • Package version manager
  • Virtual environment wrapper
  • Easy to create package commands: replaces `entry_points` of `setup.py`
  • Easy to publish the package: you can publish packages to PyPI with `poetry publish`
▪ git: version control system
▪ make: usually used to compile programs, but I use it as a task runner

Preprocess the dataset
Preprocessing is an important step for ensuring reproducibility.
▪ Write shell scripts instead of typing commands directly.
▪ Pay attention to tool versions, the OS, environment variables, etc.
  e.g., if you use `mosestokenizer.perl`, you should check the following:
  • `LANG=C` or `LANG=en_US.UTF-8`?
  • `perl` or `perl -C` (Unicode switch)?
  • The RELEASE-4.0 version or the HEAD of the master branch?
  • Whether or not to use the `-a` (aggressive hyphen splitting) option.
▪ These miscellaneous problems can be avoided by writing shell scripts, because the script records each of these choices.
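As a minimal sketch of the points above, the script below fixes the locale and records the settings that affect preprocessing next to the data. The helper `record_environment`, the file name `env.txt`, and the variable `TOKENIZER_REV` are hypothetical names chosen for illustration; adapt them to your own tools.

```shell
#!/usr/bin/env bash
set -eu

# Fix the locale so that tools behave the same on every machine.
export LANG=C

# Hypothetical variable: the tag or commit of the tokenizer you checked out.
TOKENIZER_REV="${TOKENIZER_REV:-unknown}"

# Write the settings that affect preprocessing to a file (hypothetical helper).
record_environment() {
    local out_file="$1"
    {
        echo "LANG=${LANG}"
        echo "OS=$(uname -s)"
        echo "BASH_VERSION=${BASH_VERSION}"
        echo "TOKENIZER_REV=${TOKENIZER_REV}"
    } > "${out_file}"
}

work_dir="$(mktemp -d)"
record_environment "${work_dir}/env.txt"
cat "${work_dir}/env.txt"
```

Keeping such a file beside the preprocessed data makes it possible to check later which locale and tool revision produced it.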

Coding Tips for Shell Script
Be aware of the following points:
▪ Use bash instead of sh. (Avoid using `/bin/sh`.)
▪ Environment variables (`export A="aa"`) and shell variables (`A="aa"`) are different.
▪ Note that GNU and BSD (including macOS) commands are different.
▪ Use `set -eu` to stop the script when an unexpected error occurs.
▪ Use the bash built-in commands, especially `[[ ]]` instead of `[ ]`.
  • `[ a.out = $FNAME ]` returns an error when `$FNAME` is empty.
▪ Don’t hard-code absolute paths.
  • `pwd`, `$HOME`, and `$0` may be useful.
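The tips above can be combined into a short script. This is a minimal sketch rather than a complete template; `FNAME` follows the example above, while `SCRIPT_DIR`, `child_view`, and `is_aout` are hypothetical names.

```shell
#!/usr/bin/env bash
# Stop on errors (-e) and on references to unset variables (-u).
set -eu

# Resolve paths relative to this script instead of hard-coding absolute paths.
SCRIPT_DIR="$(cd "$(dirname -- "$0")" && pwd)"

# A shell variable is visible only in this shell; an exported (environment)
# variable is also passed to child processes.
A="aa"
export B="bb"
child_view="$(bash -c 'echo "A=${A:-} B=${B:-}"')"
echo "${child_view}"

# [[ ]] does not word-split unquoted variables, so an empty value is safe.
# With [ ], the unquoted empty $FNAME would vanish and cause an error.
is_aout() {
    local FNAME="$1"
    if [[ a.out = $FNAME ]]; then
        echo "yes"
    else
        echo "no"
    fi
}

is_aout ""
is_aout "a.out"
echo "script directory: ${SCRIPT_DIR}"
```

The child process sees `B` but not `A`, which is the practical difference between the two kinds of variables.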

Implement models
Coding Tips
▪ Reading the official references and writing docs (docstrings) and unit tests will help you.
▪ I use `rsync` for small debugging on remote servers.
  https://github.com/de9uch1/git-rsync
▪ I use `ptpython`, a rich Python REPL.
▪ Keep it simple.
  • The UNIX philosophy ¹, originated by Ken Thompson and Dennis Ritchie.
¹ https://en.wikipedia.org/wiki/Unix_philosophy

Run Experiments: Makefile as a task runner
Pros
▪ It can define the dependencies between files and procedures.
▪ If the target file already exists, the task is not run.
  • It also compares file timestamps, so a task is rerun when its source is newer than its target.
▪ No additional requirements.
Cons
▪ The syntax is complicated and hacky.
▪ If there were a modern task runner that satisfied my functional requirements, I would like to switch…
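The skip-if-up-to-date behavior can be illustrated in plain bash. The sketch below only mimics the rule that `make` applies and says nothing about how `make` is implemented; `needs_rebuild`, `run_task`, and the file names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Return success (0) when the target must be rebuilt: it is missing,
# or the source is newer than the target.
needs_rebuild() {
    local target="$1" source="$2"
    [[ ! -e "${target}" || "${source}" -nt "${target}" ]]
}

work_dir="$(mktemp -d)"
source_file="${work_dir}/model.pt"
target_file="${work_dir}/result.txt"

# touch -t sets an explicit timestamp ([[CC]YY]MMDDhhmm).
touch -t 202201010000 "${source_file}"

run_task() {
    if needs_rebuild "${target_file}" "${source_file}"; then
        echo "run"
        # Pretend that the task wrote the target one day after the source.
        touch -t 202201020000 "${target_file}"
    else
        echo "skip"
    fi
}

first="$(run_task)"     # the target does not exist yet
second="$(run_task)"    # the target is newer than the source
touch -t 202201030000 "${source_file}"
third="$(run_task)"     # the source was updated after the target
echo "${first} ${second} ${third}"
```

The three calls correspond to a first build, an unchanged rerun, and a rerun after the source has changed.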

Run Experiments: Makefile as a task runner
A Makefile can define the dependencies between files and procedures.
▪ Run a task defined as a PHONY target with specified options:
  • make BEAM_SIZE=5 generate

    # Define the dependency as `target: source`.
    # $(@D) is the directory of the target, $< is the (first) source file,
    # and $@ is the target file.
    $(RESULT_FILE): $(NMT_MODEL)
    	mkdir -p $(@D)/
    	fairseq-generate \
    		--path $< \
    		--beam $(BEAM_SIZE) --lenpen 1.0 \
    		$(DATA_BIN_DIR) \
    		> $@

    # Define `generate` as a dummy (phony) target.
    .PHONY: generate
    generate: $(RESULT_FILE)

Run Experiments: Makefile x Help
`make help` prints the available tasks and options. The recipe looks like a magic spell… 😟

    # Run `help` when make is called without a target.
    .DEFAULT_GOAL := help
    .PHONY: help generate
    help:
    	@echo "Usage: make <task> [OPTION1=VAR1] [...]"
    	@echo -e "\nAvailable tasks:"
    	@grep -hE '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
    		awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-18s\033[0m %s\n", $$1, $$2}'
    	@echo -e "\nOptions:"
    	@grep -A1 -hE '^##.+' $(MAKEFILE_LIST) | grep -v '^--' | \
    		perl -pe 's/^##\s*(.+)\n/\1##\t/g' | \
    		awk 'BEGIN {FS = "(:?=[ \t]*|##\t)"}; {printf " \033[36m%-18s\033[0m %s\n%-20s (default: %s)\n", $$2, $$1," ",$$3}'

    generate: $(RESULT_FILE) ## Generate the translation.
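The task-listing part of this recipe can be tried outside `make`. The sketch below is a simplified stand-in, not the exact recipe from the slide: it writes a small hypothetical Makefile to a temporary directory and lists the targets documented with `## ` comments using `awk` alone, without colors. Note that a `$$` in a Makefile recipe becomes a single `$` in the shell. `list_tasks` and the `evaluate` and `clean` rules are invented for the example.

```shell
#!/usr/bin/env bash
set -eu

work_dir="$(mktemp -d)"
makefile="${work_dir}/Makefile"

# A hypothetical Makefile; only the rules with "## " comments are documented.
cat > "${makefile}" <<'EOF'
generate: $(RESULT_FILE) ## Generate the translation.
evaluate: generate ## Score the translation.
clean:
EOF

# Print "target  description" for every rule that carries a "## " comment.
list_tasks() {
    awk -F ':.*## ' '/^[a-zA-Z_-]+:.*## / { printf "%-18s %s\n", $1, $2 }' "$1"
}

list_tasks "${makefile}"
```

Rules without a `## ` comment, such as `clean` here, are left out of the listing, so the comment doubles as an opt-in for the help message.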

Run Experiments: make-runner
https://github.com/de9uch1/make-runner
▪ Easy to use: shows a help message built from `##`-prefixed comments.
▪ GNU-like long options: options can be passed as `--abc xyz` instead of `ABC=xyz`.
▪ No additional requirements: no additional packages are required, and the Makefile can still be run with standalone `make`.
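To illustrate what the long-option style amounts to, the sketch below converts `--abc xyz` pairs into the `ABC=xyz` assignments that `make` accepts. It is a hypothetical illustration and not make-runner's actual implementation: `to_make_vars` is an invented name, and mapping hyphens to underscores is an assumption.

```shell
#!/usr/bin/env bash
set -eu

# Convert "--beam-size 5" style pairs into "BEAM_SIZE=5" style assignments.
# Arguments that do not start with "--" (e.g., the task name) pass through.
to_make_vars() {
    local args=()
    local name
    while [[ $# -gt 0 ]]; do
        if [[ "$1" == --* && $# -ge 2 ]]; then
            name="${1#--}"
            # Upper-case the name and map "-" to "_" (assumed convention).
            name="$(printf '%s' "${name}" | tr 'a-z-' 'A-Z_')"
            args+=("${name}=$2")
            shift 2
        else
            args+=("$1")
            shift
        fi
    done
    printf '%s\n' "${args[@]}"
}

# Prints the arguments that would be handed to make, one per line.
to_make_vars generate --beam-size 5 --lenpen 1.0
```

With such a translation, the printed arguments could be passed to `make` unchanged, which is why the same Makefile keeps working without the wrapper.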

Run Experiments: GNU Parallel
Multiple Parallel Execution with GNU Parallel
▪ Multiple models:
    % parallel mrun evaluate --model {} ::: transformer_base transformer_big
▪ Parameter tuning:
    % parallel mrun evaluate --model transformer_base --dropout 0.{} ::: $(seq 0 5)
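When a run combines several options, it can help to generate the command lines first and inspect them before executing anything. This is a minimal sketch, assuming the `mrun evaluate` task and the option names from the examples above; `list_jobs` is a hypothetical helper. GNU Parallel executes each input line as a command when no command is given on its own command line.

```shell
#!/usr/bin/env bash
set -eu

# Print one command line per (model, dropout) combination.
list_jobs() {
    local model dropout
    for model in transformer_base transformer_big; do
        for dropout in $(seq 0 5); do
            echo "mrun evaluate --model ${model} --dropout 0.${dropout}"
        done
    done
}

# Inspect the jobs first; to execute them, pipe the list to GNU Parallel:
#   list_jobs | parallel
list_jobs
```

GNU Parallel can also build the same combinations directly when it is given two `:::` input sources; the explicit list is mainly useful for reviewing or logging the jobs.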

Evaluate: Quantitative & Qualitative
▪ Quantitative: implement a scorer, build a pipeline, and run `make evaluate`.
▪ Qualitative: implement the analyzer scripts.

Appendix: Environmental Setup
▪ Shell: fish
▪ Shell Prompt: Starship (Rust)
▪ Editor: Emacs
▪ Terminal: Alacritty (Rust) + Tmux
▪ Fuzzy Finder: fzf (Go)
  • History search
  • Directory/File search
  • Repository search (& `cd` alias)
▪ grep replacement: ripgrep (Rust)
▪ cat replacement: bat (Rust)
▪ diff replacement: delta (Rust)
