My Research Environmental Setup

Hiroyuki Deguchi

April 28, 2022

Transcript

  1. My Research Environmental Setup: Makefile, Tools, Tips, …
    Hiroyuki Deguchi (deguchi.hiroyuki.db0@is.naist.jp)
    2022/04/28: Manage Your Experiments 2022
  2. Typical Experiments
    ▪ Setup
    ▪ Preprocess the dataset
    ▪ Implement models
    ▪ Run experiments
    ▪ Evaluate
  3. Setup: My Python Project Environment
    ▪ pyenv: Python version manager
    ▪ poetry: Python project manager
      • Package version manager
      • Virtual environment wrapper
      • Easy to define package commands: replaces `entry_points` in `setup.py`
      • Easy to publish the package: you can publish packages to PyPI with `poetry publish`
    ▪ git: Version control system
    ▪ make: Usually used to compile programs, but I use it as a task runner
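
A minimal sketch of the pyenv + poetry setup as a bash script, assuming both tools are installed. The Python version, the project name, and the `DRY_RUN` switch are hypothetical placeholders; the subcommands (`pyenv install --skip-existing`, `pyenv local`, `poetry new`, `poetry env use`, `poetry install`) are standard pyenv and poetry commands. Package commands that replace `entry_points` would be declared under `[tool.poetry.scripts]` in `pyproject.toml`, which is not shown here.

```bash
#!/usr/bin/env bash
# Sketch: bootstrap a Python project with pyenv + poetry.
# Assumptions: pyenv and poetry are installed; the version and name are placeholders.
set -euo pipefail

PYTHON_VERSION="${PYTHON_VERSION:-3.10.4}"  # hypothetical Python version
PROJECT="${PROJECT:-myproject}"             # hypothetical project name

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_project() {
  run pyenv install --skip-existing "$PYTHON_VERSION"  # install the interpreter once
  run poetry new "$PROJECT"                            # create pyproject.toml and a package skeleton
  run cd "$PROJECT"
  run pyenv local "$PYTHON_VERSION"                    # pin the version in .python-version
  # `python` resolves through the pyenv shim to the pinned version
  # (assumes pyenv's shims are on PATH).
  run poetry env use python
  run poetry install                                   # install dependencies into the virtualenv
}

# Uncomment to actually set up the project:
# setup_project
```

Running `DRY_RUN=1 setup_project` prints the six commands without executing them, which is a cheap way to review the steps before touching a real environment.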
  4. Preprocess the dataset: an important step for ensuring reproducibility
    ▪ Write shell scripts instead of typing commands directly.
    ▪ Pay attention to tool versions, the OS, environment variables, etc.
      e.g., if you use `mosestokenizer.perl`, check the following:
      • LANG=C or LANG=en_US.UTF-8?
      • `perl` or `perl -C` (Unicode switch)?
      • The RELEASE-4.0 version or the HEAD of the master branch?
      • Whether to use the `-a` (aggressive hyphen splitting) option or not.
    ▪ Writing shell scripts helps avoid these miscellaneous problems.
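
The checklist above can be turned into a script skeleton that fixes each choice in one place and records it next to the data. This sketch assumes a mosesdecoder checkout at `$MOSES_DIR` (the default path is hypothetical) with the tokenizer at its standard `scripts/tokenizer/tokenizer.perl` location, English input, and `C.UTF-8` as the chosen locale; `tokenize` and `describe_setup` are illustrative names.

```bash
#!/usr/bin/env bash
# Sketch of a preprocessing script that pins the settings the slide warns about.
# Assumptions: a mosesdecoder checkout at $MOSES_DIR (hypothetical default path),
# English input, and C.UTF-8 as the chosen locale.
set -euo pipefail

export LC_ALL=C.UTF-8                        # pin the locale instead of inheriting LANG
MOSES_DIR="${MOSES_DIR:-$HOME/mosesdecoder}"
TOKENIZER="$MOSES_DIR/scripts/tokenizer/tokenizer.perl"
# -a: aggressive hyphen splitting; -no-escape: no HTML escaping. Decided once, here.
TOKENIZER_OPTS=(-l en -a -no-escape)

# Tokenize stdin to stdout with the pinned settings; `perl -C` enables Unicode I/O.
tokenize() {
  perl -C "$TOKENIZER" "${TOKENIZER_OPTS[@]}"
}

# Print the settings so they can be saved next to the preprocessed data.
describe_setup() {
  echo "LC_ALL=$LC_ALL"
  echo "perl=$(perl -e 'print $^V')"
  # The exact commit answers "RELEASE-4.0 or HEAD of master?" later on.
  echo "moses_commit=$(git -C "$MOSES_DIR" rev-parse HEAD 2>/dev/null || echo unknown)"
  echo "tokenizer_opts=${TOKENIZER_OPTS[*]}"
}

# Example (hypothetical file names):
#   describe_setup > data/tok/SETTINGS
#   tokenize < data/raw/train.en > data/tok/train.en
```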
  5. Coding Tips for Shell Scripts
    Be aware of the following points:
    ▪ Use bash instead of sh. (Avoid using `/bin/sh`.)
    ▪ Environment variables (`export A="aa"`) and shell variables (`A="aa"`) are different: only exported variables are passed to child processes.
    ▪ Note that GNU and BSD (including macOS) commands behave differently.
    ▪ Use `set -eu` to stop the script on unexpected errors (`-e`: a command fails; `-u`: an unset variable is referenced).
    ▪ Use bash built-ins, especially `[[ ]]` instead of `[ ]`.
      • `[ a.out = $FNAME ]` returns an error when $FNAME is empty.
    ▪ Don’t hard-code absolute paths.
      • `pwd`, `$HOME`, and `$0` may be useful.
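
A small bash sketch of the tips above. The function names are hypothetical; `pipefail` is an addition beyond the slide's `set -eu`, and the `sed -i` case is one concrete example of a GNU/BSD difference.

```bash
#!/usr/bin/env bash
# `pipefail` goes beyond the slide's `set -eu`: a pipeline also fails when an
# earlier command in it fails, not only the last one.
set -euo pipefail

# Don't hard-code absolute paths: resolve this script's own directory from $0.
SCRIPT_DIR="$(cd "$(dirname -- "$0")" && pwd)"   # e.g. "$SCRIPT_DIR/tokenize.sh"

# Shell vs. environment variables: only exported ones reach child processes.
show_child_view() {
  local shell_var="only here"       # shell variable
  export DEMO_ENV_VAR="inherited"   # environment variable
  bash -c 'echo "shell_var=${shell_var:-unset} DEMO_ENV_VAR=${DEMO_ENV_VAR:-unset}"'
}

# `[[ ]]` does not word-split, so an empty $fname is simply "not equal";
# `[ a.out = $fname ]` would expand to `[ a.out = ]` and fail with an error.
is_aout() {
  local fname="${1-}"
  [[ $fname == "a.out" ]]
}

# GNU vs. BSD: GNU `sed -i` takes no argument, BSD/macOS needs `-i ''`.
# Writing through a temporary file behaves the same on both.
sed_inplace() {
  local expr="$1" file="$2" tmp
  tmp="$(mktemp)"
  sed "$expr" "$file" > "$tmp"
  mv "$tmp" "$file"
}
```

`show_child_view` prints `shell_var=unset DEMO_ENV_VAR=inherited`, which is the difference between the two kinds of variables in one line.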
  6. Implement Models: Coding Tips
    ▪ Reading the official references, and writing docs (docstrings) and unit tests, will help you.
    ▪ I use `rsync` for quick debugging on remote servers: https://github.com/de9uch1/git-rsync
    ▪ I use `ptpython`, a rich Python REPL.
    ▪ Keep it simple.
      • The UNIX philosophy¹, originated by Ken Thompson and Dennis Ritchie.
    ¹https://en.wikipedia.org/wiki/Unix_philosophy
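
The core of syncing a working tree for remote debugging is a single `rsync` call. This is a plain-rsync sketch, not git-rsync's implementation; the destination is a hypothetical placeholder, and rsync's filter syntax is close to, but not identical to, gitignore's.

```bash
#!/usr/bin/env bash
# Sketch: mirror the current working tree to a (remote) directory with rsync.
# The destination is a placeholder such as user@server:work/myproject (hypothetical).
set -euo pipefail

sync_tree() {
  local dest="$1"
  # -a: archive mode (recursive, keeps permissions and times); -z: compress in transit
  # --delete: remove files on the destination that were deleted locally
  # --exclude='.git/': don't copy the repository history
  # --filter=':- .gitignore': read exclude patterns from each directory's .gitignore
  rsync -az --delete --exclude='.git/' --filter=':- .gitignore' ./ "$dest/"
}

# Usage (hypothetical host and path), run from the repository root:
#   sync_tree user@server:work/myproject
```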
  7. Run Experiments: Makefile as a task runner
    Pros
    ▪ It can define the dependencies between files and procedures.
    ▪ If the target file already exists, the task is not run.
      • make also compares file timestamps, so the task is rerun when a source is newer than the target.
    ▪ (No additional requirements.)
    Cons
    ▪ The syntax is complicated and hacky.
    ▪ If there is a modern task runner that satisfies my functional requirements, I’d like to switch…
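
The skip-if-up-to-date behavior can be illustrated with a simplified bash version of make's check: rebuild when the target is missing or older than any source. Unlike make, this sketch does not recurse into the sources' own dependencies; `needs_rebuild` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified sketch of make's up-to-date check (non-recursive, no pattern rules):
# a target must be rebuilt if it is missing or older than any of its sources.
set -euo pipefail

needs_rebuild() {
  local target="$1" src
  shift
  [[ -e $target ]] || return 0          # missing target: rebuild
  for src in "$@"; do
    if [[ $src -nt $target ]]; then     # source modified after the target
      return 0
    fi
  done
  return 1                              # up to date: make would skip the task
}

# Usage (hypothetical files):
#   if needs_rebuild results/test.txt checkpoints/model.pt; then ...; fi
```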
  8. Run Experiments: Makefile as a task runner
    Makefile can define the dependencies between files and procedures.
    ▪ Run a task defined as a PHONY target with the specified options:
      • make BEAM_SIZE=5 generate

      # Define the dependency by `target: source`.
      # $(@D) is the target's directory, $< the (first) source file, and $@ the target file.
      $(RESULT_FILE): $(NMT_MODEL)
      	mkdir -p $(@D)/
      	fairseq-generate \
      	    --path $< \
      	    --beam $(BEAM_SIZE) --lenpen 1.0 \
      	    $(DATA_BIN_DIR) \
      	    > $@

      .PHONY: generate  # declare `generate` as a dummy target
      generate: $(RESULT_FILE)
  9. Run Experiments: Makefile x Help
    `make help`: it looks like a magic spell… 😟

      # `echo -e` needs bash; plain /bin/sh (e.g. dash) prints "-e" literally.
      SHELL := bash
      .DEFAULT_GOAL := help
      .PHONY: help generate
      help:
      	@echo "Usage: make <task> [OPTION1=VAR1] [...]"
      	@echo -e "\nAvailable tasks:"
      	@grep -hE '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "  \033[36m%-18s\033[0m %s\n", $$1, $$2}'
      	@echo -e "\nOptions:"
      	@grep -A1 -hE '^##.+' $(MAKEFILE_LIST) | grep -v '^--' | perl -pe 's/^##\s*(.+)\n/\1##\t/g' | awk 'BEGIN {FS = "(:?=[ \t]*|##\t)"}; {printf "  \033[36m%-18s\033[0m %s\n%-20s (default: %s)\n", $$2, $$1, " ", $$3}'

      generate: $(RESULT_FILE) ## Generate the translation.
  10. Run Experiments: make-runner
    https://github.com/de9uch1/make-runner
    ▪ Easy to use: shows a help message generated from `##`-prefixed comments.
    ▪ GNU-like long options: options can be passed as `--abc xyz` instead of `ABC=xyz`.
    ▪ No additional requirements: no additional packages are required, and the Makefile still runs with plain `make`.
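
The long-option convenience can be approximated in a few lines of bash. This is a hypothetical sketch, not make-runner's actual code: it rewrites `--beam-size 5` or `--beam-size=5` into the make variable assignment `BEAM_SIZE=5` and passes task names and anything else through unchanged.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (NOT make-runner's code): translate GNU-style long options
# into make variable assignments, e.g. `--beam-size 5` -> `BEAM_SIZE=5`.
# The result is stored in the global array MAKE_ARGS.
make_args() {
  MAKE_ARGS=()
  local name value
  while (( $# > 0 )); do
    case $1 in
      --*=*)                        # --name=value
        name=${1%%=*}
        value=${1#*=}
        shift
        ;;
      --*)                          # --name value
        if (( $# < 2 )); then
          echo "make_args: missing value for $1" >&2
          return 1
        fi
        name=$1
        value=$2
        shift 2
        ;;
      *)                            # task names and plain VAR=value pass through
        MAKE_ARGS+=("$1")
        shift
        continue
        ;;
    esac
    # Strip the leading dashes, upper-case, and map '-' to '_' (portable tr, no bash 4 needed).
    name=$(printf '%s' "${name#--}" | tr 'a-z-' 'A-Z_')
    MAKE_ARGS+=("$name=$value")
  done
}

# Run make with the translated arguments.
mrun_sketch() {
  make_args "$@"
  make ${MAKE_ARGS[@]+"${MAKE_ARGS[@]}"}
}
```

With this, `mrun_sketch evaluate --model transformer_base` would run `make evaluate MODEL=transformer_base`.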
  11. Run Experiments: GNU Parallel
    Multiple parallel executions with GNU Parallel
    ▪ Multiple models:
      % parallel mrun evaluate --model {} ::: transformer_base transformer_big
    ▪ Parameter tuning:
      % parallel mrun evaluate --model transformer_base --dropout 0.{} ::: $(seq 0 5)
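
The two commands can be combined into a grid search, since GNU Parallel runs every combination of multiple `:::` input sources. `mrun` and its options are the ones from the slide; `RUNNER` is a hypothetical hook so the command can be swapped, for example for `echo` as a dry run.

```bash
#!/usr/bin/env bash
# Sketch: grid search over models x dropout rates with GNU Parallel.
# RUNNER is a hypothetical hook; it defaults to the slides' `mrun`.
set -euo pipefail

RUNNER="${RUNNER:-mrun}"

grid_search() {
  # Two input sources (:::) give every combination; {1} and {2} take one value
  # from each. --jobs 2 runs at most two jobs at once; --keep-order prints the
  # outputs in input order rather than completion order.
  parallel --jobs 2 --keep-order "$RUNNER" evaluate --model {1} --dropout 0.{2} \
    ::: transformer_base transformer_big \
    ::: $(seq 0 5)
}
```

`RUNNER=echo grid_search` prints the 12 commands (2 models × 6 dropout rates) instead of running them.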
  12. Evaluate: Quantitative & Qualitative
    Quantitative
    ▪ Implement a scorer, build a pipeline, and run `make evaluate`.
    Qualitative
    ▪ Implement analyzer scripts.
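
A minimal sketch of such a pipeline, assuming the hypotheses come from `fairseq-generate` output (slide 8), whose hypothesis lines have the form `H-<id>`, score, hypothesis, separated by tabs. Exact-match accuracy is only a stand-in scorer here; for translation you would normally use a metric such as BLEU. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of an evaluation pipeline in the spirit of `make evaluate`.
set -euo pipefail

# Extract hypotheses in the original sentence order: fairseq-generate prints
# them in batch order, so sort by the numeric id after "H-".
extract_hyps() {
  grep '^H-' "$1" | sed 's/^H-//' | sort -t $'\t' -k1,1n | cut -f3-
}

# Stand-in scorer: percentage of lines where hypothesis == reference.
# Concatenating "" forces a string (not numeric) comparison in awk.
exact_match() {
  paste "$1" "$2" | awk -F '\t' '
    { n++; if (($1 "") == ($2 "")) m++ }
    END { printf "%.2f\n", n ? 100 * m / n : 0 }'
}

# Usage (hypothetical paths):
#   extract_hyps results/generate.txt > results/hyp.txt
#   exact_match results/hyp.txt data/test.ref
```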
  13. Appendix: Environmental Setup
    ▪ Shell: fish
    ▪ Shell Prompt: Starship (Rust)
    ▪ Editor: Emacs
    ▪ Terminal: Alacritty (Rust) + Tmux
    ▪ Fuzzy Finder: fzf (Go)
      • History search
      • Directory/File search
      • Repository search (& `cd` alias)
    ▪ grep replacement: ripgrep (Rust)
    ▪ cat replacement: bat (Rust)
    ▪ diff replacement: delta (Rust)
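
A bash sketch of repository search with a `cd` alias (the setup above uses fish, so this is only an equivalent idea, not the author's function). It assumes repositories live under `~/src` and that fzf is installed; both the default path and the `REPO_PICKER` hook are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch for ~/.bashrc: fuzzy-pick a git repository and cd into it.
# No `set -e` here, since these functions are meant for an interactive shell.

# Print the working-tree path of every git repository under a root directory.
list_repos() {
  local root="${1:-$HOME/src}"   # hypothetical default location
  # -prune stops find from descending into each .git directory it reports.
  find "$root" -type d -name .git -prune -print 2>/dev/null | sed 's|/\.git$||' | sort
}

# Pick a repository (fzf by default) and cd into it.
repo() {
  local picker="${REPO_PICKER:-fzf}" dir
  # $picker is left unquoted on purpose so a command with arguments also works.
  dir="$(list_repos "$@" | $picker)" || return
  [[ -n $dir ]] && cd "$dir"
}
```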