Slide 1

Slide 1 text

MAD · NOV 22-23 · 2019
Python for System Administrators
Alejandro Guirao
@lekum
github.com/lekum
lekum.org

Slide 2

Slide 2 text

Nerea

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

Smartbird

Slide 7

Slide 7 text


Slide 8

Slide 8 text

Model generation: .csv, .hd5, TensorFlow, PostgreSQL

Slide 9

Slide 9 text

Production infrastructure: Load Balancer, Flask app with . . .

Slide 10

Slide 10 text

[2019-11-05 00:50:10,620] {python_operator.py:96} ERROR - No files found at /mnt/observations/

Slide 11

Slide 11 text

/mnt/observations/<species>/YYYYMMDDHHMM.csv

Slide 12

Slide 12 text

$ cat bald_eagle_john_b_nov19.csv
observation_date,species,observation_coords,notes
03/11/2019 11:56,Haliaeetus leucocephalus,"51.103177,-115.359446",A young couple of black eagles that live by the cliffs
03/11/2019 12:33,Haliaeetus leucocephalus,"50.255013,-110.203278",Lone bald eagle fishing
04/11/2019 07:02,Haliaeetus leucocephalus,"38.588273,-115.117030",Bald eagle flying with a captured marmot

Slide 13

Slide 13 text

collect_csv

    #! /usr/bin/env python

    def collect_csv():
        ...

    if __name__ == "__main__":
        collect_csv()

Slide 14

Slide 14 text

    if __name__ == "__main__":
        source_dir = "latest"
        dest_dir = "observations"
        collect_csv(source_dir, dest_dir)

Slide 15

Slide 15 text

pathlib (Python 3.4+)

Slide 16

Slide 16 text

    from pathlib import Path

    def collect_csv(source_dir, dest_dir):
        """
        Scan for csvs in a source_dir recursively.
        Place each file in a path of <dest_dir>/<species>/<datetime>.csv
        """
        source_dir = Path(source_dir)
        dest_dir = Path(dest_dir)
        for csvfile in source_dir.rglob("*.csv"):
            species = normalized_species(csvfile)
            species_dir = dest_dir / species
            # Create the per-species directory (and any parents) if missing
            species_dir.mkdir(exist_ok=True, parents=True)
            date_time = normalized_datetime(csvfile)
            dest = species_dir / (date_time + ".csv")
            print(f"Renaming {csvfile} to {dest}")
            csvfile.rename(dest)
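The pathlib calls used on this slide can be tried in isolation. What follows is a minimal sketch, assuming a throwaway temporary directory and made-up file names (`latest`, `a.csv`, `nested/b.csv`, `some_species` are illustrative, not from the talk). It exercises the `/` operator, recursive globbing with `rglob`, and `mkdir(parents=True, exist_ok=True)`.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def demo_pathlib(base):
    """Exercise the pathlib features on a scratch directory tree."""
    base = Path(base)
    source = base / "latest"                      # "/" joins path components
    (source / "nested").mkdir(parents=True)
    (source / "a.csv").write_text("x\n")
    (source / "nested" / "b.csv").write_text("y\n")
    (source / "notes.txt").write_text("not a csv\n")

    # rglob walks the whole tree, unlike glob("*.csv") which stays at one level
    found = sorted(p.relative_to(source).as_posix() for p in source.rglob("*.csv"))

    # Move one file into a directory that does not exist yet
    target_dir = base / "observations" / "some_species"
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / "201911031156.csv"
    (source / "a.csv").rename(dest)
    return found, dest.relative_to(base).as_posix()


with TemporaryDirectory() as tmp:
    found, moved = demo_pathlib(tmp)
    print(found)
    print(moved)
```

Note that `rename` only works reliably within one filesystem; moving across mount points would need something like `shutil.move` instead.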

Slide 17

Slide 17 text

    from csv import DictReader

    def normalized_species(csv_filename):
        """
        Return the species of the column 'species' of the csv_filename.
        Normalize it via lowercasing and transforming spaces into "_"
        """
        with open(csv_filename) as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
            return first_row.get("species").lower().replace(" ", "_")

Slide 18

Slide 18 text

    from csv import DictReader
    from datetime import datetime

    def normalized_datetime(csv_filename):
        """
        Return the datetime of the column 'observation_date' of the csv_filename.
        Normalize it with the format YYYYMMDDHHMM
        """
        with open(csv_filename) as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
            src_date_fmt = "%d/%m/%Y %H:%M"   # e.g. 03/11/2019 11:56
            dst_date_fmt = "%Y%m%d%H%M"       # e.g. 201911031156
            obs_date = datetime.strptime(first_row.get("observation_date"), src_date_fmt)
            return obs_date.strftime(dst_date_fmt)

Slide 19

Slide 19 text

Time Format Codes
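As a worked example of the format codes, the sketch below converts the day-first timestamps from the sample CSV into the compact YYYYMMDDHHMM form. The helper name `to_compact` and the module-level constants are illustrative, not part of the talk's code.

```python
from datetime import datetime

SRC_FMT = "%d/%m/%Y %H:%M"   # day/month/year hour:minute, as in the sample CSV
DST_FMT = "%Y%m%d%H%M"       # YYYYMMDDHHMM


def to_compact(timestamp):
    """Parse a day-first timestamp and re-emit it as YYYYMMDDHHMM."""
    return datetime.strptime(timestamp, SRC_FMT).strftime(DST_FMT)


print(to_compact("03/11/2019 11:56"))  # 201911031156
```

`strptime` raises `ValueError` when the input does not match the source format, so a malformed row fails loudly instead of producing a wrong file name.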

Slide 20

Slide 20 text

$ ./collect_csv
Renaming latest/martha_october2019_ospreys.csv to observations/pandion_haliaetus/201910080532.csv
Renaming latest/bald_eagle_john_b_nov19.csv to observations/haliaeetus_leucocephalus/201911031156.csv

Slide 21

Slide 21 text

MORE FLEXIBLE

Slide 22

Slide 22 text

Building CLIs
■ Click
■ docopt
■ argparse

Slide 23

Slide 23 text

argparse

Slide 24

Slide 24 text

    from argparse import ArgumentParser

    def parse_args():
        """
        Parse the CLI args
        """
        parser = ArgumentParser()
        parser.add_argument("--dest-dir",
                            help="Output dir to store the results",
                            default="observations")
        parser.add_argument("--search-paths", nargs='*', default=["."],
                            help="File search paths")
        return parser.parse_args()

Slide 25

Slide 25 text

    if __name__ == "__main__":
        args = parse_args()
        for path in args.search_paths:
            collect_csv(path, args.dest_dir)
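The parser can be checked without touching the command line by passing an explicit argument list to `parse_args`. This is a minimal sketch that rebuilds the same two options; the `build_parser` name and the `prog` value are assumptions made for the example.

```python
from argparse import ArgumentParser


def build_parser():
    """Rebuild the two options shown on the slide."""
    parser = ArgumentParser(prog="collect_csv")
    parser.add_argument("--dest-dir", default="observations",
                        help="Output dir to store the results")
    parser.add_argument("--search-paths", nargs="*", default=["."],
                        help="File search paths")
    return parser


parser = build_parser()
# An empty list means "no CLI arguments", so every option takes its default
defaults = parser.parse_args([])
custom = parser.parse_args(["--dest-dir", "out", "--search-paths", "a", "b"])
print(defaults.dest_dir, defaults.search_paths)
print(custom.dest_dir, custom.search_paths)
```

argparse turns `--dest-dir` into the attribute `dest_dir`, which is why the main block reads `args.dest_dir`.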

Slide 26

Slide 26 text

$ ./collect_csv --help
usage: collect_csv [-h] [--dest-dir DEST_DIR]
                   [--search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]]

optional arguments:
  -h, --help            show this help message and exit
  --dest-dir DEST_DIR   Output dir to store the results
  --search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]
                        File search paths

Slide 27

Slide 27 text

$ ./collect_csv --dest-dir new_output_dir --search-paths original_dir1 original_dir2
Renaming original_dir1/martha_october2019_ospreys.csv to new_output_dir/pandion_haliaetus/201910080532.csv
Renaming original_dir2/bald_eagle_john_b_nov19.csv to new_output_dir/haliaeetus_leucocephalus/201911031156.csv

Slide 28

Slide 28 text

FASTER

Slide 29

Slide 29 text

Concurrency and parallelism in Python
■ threading
■ multiprocessing
■ asyncio

Slide 30

Slide 30 text

    from multiprocessing import Pool
    from functools import partial

    # Fix dest_dir so that map() only has to supply the search path
    collect_csv_dest = partial(collect_csv, dest_dir=args.dest_dir)
    with Pool() as p:
        p.map(collect_csv_dest, args.search_paths)
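The same `partial` plus `map` pattern also works with threads. The sketch below swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library, which is a different technique from the `multiprocessing.Pool` on the slide and may suit I/O-bound work such as renaming files. The `collect` function is a stand-in that only returns a string; it is not the talk's `collect_csv`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def collect(source_dir, dest_dir):
    """Stand-in for the real work: describe what would be done."""
    return f"{source_dir} -> {dest_dir}"


def collect_all(search_paths, dest_dir, max_workers=4):
    """Run collect() for every search path using a pool of threads."""
    # partial fixes dest_dir so map() only has to supply source_dir
    collect_dest = partial(collect, dest_dir=dest_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(collect_dest, search_paths))


print(collect_all(["dir1", "dir2"], "observations"))
```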

Slide 31

Slide 31 text

PostgreSQL pg_dump smartbirdctl

Slide 32

Slide 32 text

    def parse_args():
        """
        Parse the CLI args
        """
        parser = ArgumentParser()
        subparsers = parser.add_subparsers(help="Sub-command help",
                                           dest="subcommand")
        parser_collect_csv = subparsers.add_parser(
            "collect-csv",
            help="Collect csv files and place them in their right directories")
        parser_collect_csv.add_argument("--dest-dir",
                                        help="Output dir to store the results",
                                        default="observations")
        parser_collect_csv.add_argument("--search-paths", nargs='*', default=["."],
                                        help="File search paths")

Slide 33

Slide 33 text

        [...]
        parser_backup_tables = subparsers.add_parser(
            "backup-tables",
            help="Make a pg_dump of the selected tables")
        parser_backup_tables.add_argument("database",
                                          help="Name of the database to make the backup")
        parser_backup_tables.add_argument("tables", nargs='*',
                                          help="List of tables to dump",
                                          default="-")
        parser_backup_tables.add_argument("--outfile",
                                          help="Filename for the backup",
                                          default="backup.sql")
        return parser.parse_args()

Slide 34

Slide 34 text

$ ./smartbirdctl --help
usage: smartbirdctl [-h] {collect-csv,backup-tables} ...

positional arguments:
  {collect-csv,backup-tables}
                        Sub-command help
    collect-csv         Collect csv files and place them in their right
                        directories
    backup-tables       Make a pg_dump of the selected tables

optional arguments:
  -h, --help            show this help message and exit

Slide 35

Slide 35 text

$ ./smartbirdctl backup-tables --help
usage: smartbirdctl backup-tables [-h] [--outfile OUTFILE]
                                  database [tables [tables ...]]

positional arguments:
  database           Name of the database to make the backup
  tables             List of tables to dump

optional arguments:
  -h, --help         show this help message and exit
  --outfile OUTFILE  Filename for the backup

Slide 36

Slide 36 text

    from sys import stdin

    if __name__ == "__main__":
        args = parse_args()
        if args.subcommand == "collect-csv":
            for path in args.search_paths:
                collect_csv(path, args.dest_dir)
        elif args.subcommand == "backup-tables":
            # "-" is the default when no tables are given; an explicit "-"
            # on the command line arrives as the list ["-"]
            if args.tables in ("-", ["-"]):
                tables = stdin.read().split()
            else:
                tables = args.tables
            backup_tables(tables, args.database, args.outfile)
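A quick way to see what argparse stores in `tables` is to parse a few argument lists by hand. In this sketch only the `backup-tables` subcommand is rebuilt, and `wants_stdin` is a hypothetical helper name. The point it illustrates: an explicit `-` on the command line is collected by `nargs='*'` into a one-element list, while omitting the tables falls back to the default, so a check for "read from stdin" may need to accept both shapes.

```python
from argparse import ArgumentParser


def build_parser():
    """Rebuild only the backup-tables subcommand for experimentation."""
    parser = ArgumentParser(prog="smartbirdctl")
    subparsers = parser.add_subparsers(dest="subcommand")
    backup = subparsers.add_parser("backup-tables")
    backup.add_argument("database")
    backup.add_argument("tables", nargs="*", default="-")
    return parser


def wants_stdin(tables):
    """True when the table list should be read from standard input."""
    return tables in ("-", ["-"])


parser = build_parser()
omitted = parser.parse_args(["backup-tables", "observations"])
explicit = parser.parse_args(["backup-tables", "observations", "-"])
named = parser.parse_args(["backup-tables", "observations", "tracks", "birds"])
print(repr(omitted.tables), repr(explicit.tables), repr(named.tables))
```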

Slide 37

Slide 37 text

    from os import cpu_count
    from subprocess import run

    def backup_tables(tables, database, backup_filename):
        """
        Backup a list of tables using pg_dump to a backup_filename
        Notify via Slack in case of failure
        """
        tables_switches = " ".join(f"-t {table}" for table in tables)
        jobs = cpu_count()
        # pg_dump accepts -j (parallel jobs) only with the directory format (-Fd),
        # so the dump is written with -f and backup_filename names a directory
        cmd = f"pg_dump -d {database} {tables_switches} -j {jobs} -Fd -f {backup_filename}"
        pg_dump = run(cmd, shell=True, capture_output=True)

Slide 38

Slide 38 text

popen (3)

Slide 39

Slide 39 text

subprocess
■ run (3.5+)

    subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None,
                   capture_output=False, shell=False, cwd=None, timeout=None,
                   check=False, encoding=None, errors=None, text=None, env=None,
                   universal_newlines=None)

    shlex.split(cmd)
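To illustrate how `run` and `shlex.split` fit together, here is a minimal sketch that runs a command line without a shell and captures its output. The helper `run_cmd` is an assumption for the example, and the Python interpreter itself (`sys.executable`) stands in for an external tool such as pg_dump so the snippet runs anywhere Python does.

```python
import shlex
import sys
from subprocess import run


def run_cmd(cmd):
    """Run a command line without a shell; return (returncode, stdout, stderr)."""
    # shlex.split turns the string into an argument list, honouring quotes
    args = shlex.split(cmd)
    result = run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# sys.executable stands in for an external tool
python = shlex.quote(sys.executable)
ok = run_cmd(f"{python} -c \"print('dumped')\"")
failed = run_cmd(f"{python} -c \"import sys; sys.stderr.write('boom'); sys.exit(3)\"")
print(ok)
print(failed)
```

Without `shell=True` there is no redirection or piping, so output has to be captured (as here) or written by the tool itself. In exchange, arguments are never reinterpreted by a shell.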

Slide 40

Slide 40 text

Posting to Slack

    curl -X POST -H 'Content-type: application/json' \
         --data '{"text":"Allow me to reintroduce myself!"}' \
         YOUR_WEBHOOK_URL
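For comparison, the same POST can be assembled with the standard library alone. This is a sketch under assumptions: the function names are made up, the URL is a placeholder that does not point at a real webhook, and nothing is sent unless `post_to_slack` is actually called.

```python
import json
from urllib.request import Request, urlopen


def build_slack_request(webhook_url, text):
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        webhook_url,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    with urlopen(build_slack_request(webhook_url, text), timeout=timeout) as resp:
        return resp.status


# Placeholder URL: building the request performs no network access
req = build_slack_request("https://hooks.slack.example/services/PLACEHOLDER", "Backup failed")
print(req.get_method(), req.data)
```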

Slide 41

Slide 41 text

    from sys import exit
    from os import environ

        [...]
        if pg_dump.returncode != 0:
            webhook_url = environ.get("SLACK_WEBHOOK_URL")
            if webhook_url:
                msg = f"Failed to {cmd}:\n{pg_dump.stderr.decode()}"
                notify_via_slack(webhook_url, msg)
            exit(pg_dump.returncode)

Slide 42

Slide 42 text

requests

Slide 43

Slide 43 text

    from requests import post

    def notify_via_slack(webhook_url, msg):
        """
        Notify via Slack webhook url
        """
        slack_data = {"text": msg}
        post(webhook_url, json=slack_data)

Slide 44

Slide 44 text

Dependency management in Python
■ Traditionally: requirements.txt, setup.py
■ Pipfile and Pipfile.lock (used via pipenv)
■ pyproject.toml (used via poetry)
■ pip-tools

Slide 45

Slide 45 text

Using a virtual environment

    python -m venv .env --prompt sbctl
    source .env/bin/activate
    (sbctl) python -m pip install requests

Slide 46

Slide 46 text

Automation

https://crontab.guru

# -At prints unaligned rows without headers; cut keeps only the table name column
0 4 * * * psql -At -d observations -c "\dt" | grep tracks | cut -d '|' -f 2 | smartbirdctl backup-tables --outfile /mnt/backups/`date +\%Y\%m\%d\%H\%M\%S`-backup.sql observations -
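If the filtering is done in Python rather than in the shell pipeline, the table names have to be pulled out of psql's `\dt` listing, whose data rows have the shape `schema | name | type | owner`. The sketch below shows one way to do that. The function name, the sample table names and the owner are made up for illustration.

```python
def table_names(listing, keyword):
    """Extract table names containing keyword from a psql table listing."""
    names = []
    for line in listing.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows look like: schema | name | type | owner
        if len(fields) == 4 and fields[2] == "table" and keyword in fields[1]:
            names.append(fields[1])
    return names


# Hypothetical listing in psql's default aligned format
sample = """\
          List of relations
 Schema |    Name     | Type  |   Owner
--------+-------------+-------+-----------
 public | species     | table | smartbird
 public | tracks_2019 | table | smartbird
 public | tracks_2020 | table | smartbird
(3 rows)
"""
print(table_names(sample, "tracks"))
```

Splitting the listing on whitespace alone would also yield the `|` separators, schema and owner columns, which is why the name column is selected explicitly.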

Slide 47

Slide 47 text

/var/log/syslog ✔ ✔ /mnt/backups/... ✔ ☕

Slide 48

Slide 48 text

From: CEO
To: [email protected]
Subject: URGENT - Production is not working

I am trying to run predictions in the production environment and the results are not coherent. Every now and then sightings show up in the middle of the ocean, which are clearly wrong. Please, everyone get this fixed as soon as possible.

Slide 49

Slide 49 text

python3 wsgi.py -m model-002357.hd5
. . .
python3 wsgi.py -m model-002342.hd5
python3 wsgi.py -m model-002357.hd5

Slide 50

Slide 50 text

fabric (2)

Slide 51

Slide 51 text

    #! /usr/bin/env python

    latest_version = "model-002357.hd5"

    def update_host(conn):
        """
        Copy the latest model file, replace the version in the
        service unit and restart the service
        """
        conn.put(latest_version)
        # -E enables extended regexes so that "+" acts as a quantifier;
        # the raw f-string keeps the backslash in "\." intact
        conn.sudo(rf"sed -i -E -e 's/model-[0-9]+\.hd5/{latest_version}/g' /etc/systemd/system/smartbird.service")
        # One sudo call per command, so the restart also runs under sudo
        conn.sudo("systemctl daemon-reload")
        conn.sudo("systemctl restart smartbird")
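The substitution pattern can be tried locally with Python's `re` module before running it against real hosts. This is a minimal sketch: `pin_model` is a hypothetical helper and the `ExecStart` line is an assumed example of what the unit file might contain. In Python's regex syntax `+` is always a quantifier, whereas sed treats it that way only in extended mode (`-E`) or, in GNU sed, when written as `\+`.

```python
import re

# Same pattern as in the sed command: "model-", digits, ".hd5"
MODEL_RE = re.compile(r"model-[0-9]+\.hd5")


def pin_model(unit_text, latest_version):
    """Replace every model file reference in unit_text with latest_version."""
    # A function replacement avoids any escape processing of latest_version
    return MODEL_RE.sub(lambda _match: latest_version, unit_text)


# Assumed example line; the real unit file is not shown in the slides
line = "ExecStart=/usr/bin/python3 wsgi.py -m model-002342.hd5"
print(pin_model(line, "model-002357.hd5"))
```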

Slide 52

Slide 52 text

    import sys
    from fabric2 import ThreadingGroup

    if __name__ == "__main__":
        hosts = sys.argv[1:]
        all_hosts_parallel = ThreadingGroup(*hosts)
        # grep -v grep drops the grep process itself from the listing;
        # warn=True keeps a non-zero exit status from raising an exception
        ps = all_hosts_parallel.run(f"ps aux | grep -v grep | grep {latest_version}",
                                    warn=True, hide=True)
        # Hosts where no process runs the latest model still need the update
        hosts_to_update = [conn for conn, result in ps.items() if not result.ok]
        for conn in hosts_to_update:
            update_host(conn)

Slide 53

Slide 53 text


Slide 54

Slide 54 text


Slide 55

Slide 55 text

Python for system administration
■ Exploration and prototyping
■ Glue
■ "Batteries included" + third-party libraries
■ Not always the best tool

Slide 56

Slide 56 text

Happy hacking!
Alejandro Guirao
@lekum
lekum.org