Python for System Administrators

Since version 1.0, now 25 years ago, Python has been the language of choice for many system administrators to automate all kinds of tasks, letting them handle their day-to-day management of web servers, networks, users, databases... far more efficiently.

In this talk I will explain, putting ourselves for a while in the shoes of a system administrator, how to use Python to build command-line tools that automate repetitive actions and solve a variety of problems. Along the way I will introduce several powerful modules, both from the standard library and from third parties, that help when building our own utilities. Say goodbye to BASH!

Alejandro Guirao Rodríguez

November 22, 2019

Transcript

  1. MAD · NOV 22-23 · 2019 Python for system administrators
    Alejandro Guirao @lekum github.com/lekum lekum.org

  2. MAD · NOV 22-23 · 2019
    $ cat bald_eagle_john_b_nov19.csv
    observation_date,species,observation_coords,notes
    03/11/2019 11:56,Haliaeetus leucocephalus,"51.103177,-115.359446",A young couple of black eagles that live by the cliffs
    03/11/2019 12:33,Haliaeetus leucocephalus,"50.255013,-110.203278",Lone bald eagle fishing
    04/11/2019 07:02,Haliaeetus leucocephalus,"38.588273,-115.117030",Bald eagle flying with a captured marmot

  3. MAD · NOV 22-23 · 2019 collect_csv
    #! /usr/bin/env python

    def collect_csv():
        ...

    if __name__ == "__main__":
        collect_csv()

  4. MAD · NOV 22-23 · 2019
    if __name__ == "__main__":
        source_dir = "latest"
        dest_dir = "observations"
        collect_csv(source_dir, dest_dir)

  5. MAD · NOV 22-23 · 2019
    from pathlib import Path

    def collect_csv(source_dir, dest_dir):
        """
        Scan for csvs in a source_dir recursively.
        Place each file in a path of <dest_dir>/<species>/<YYYYMMDDHHMM>.csv
        """
        source_dir = Path(source_dir)
        dest_dir = Path(dest_dir)
        for csvfile in source_dir.rglob("*.csv"):
            species = normalized_species(csvfile)
            species_dir = dest_dir / species
            species_dir.mkdir(exist_ok=True, parents=True)
            date_time = normalized_datetime(csvfile)
            print(f"Renaming {csvfile} to {species_dir / (date_time + '.csv')}")
            csvfile.rename(species_dir / (date_time + ".csv"))

  6. MAD · NOV 22-23 · 2019
    from csv import DictReader

    def normalized_species(csv_filename):
        """
        Return the species of the column 'species' of the csv_filename.
        Normalize it via lowercasing and transforming spaces into "_"
        """
        with open(csv_filename) as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
        return first_row.get("species").lower().replace(" ", "_")

  7. MAD · NOV 22-23 · 2019
    from datetime import datetime

    def normalized_datetime(csv_filename):
        """
        Return the datetime of the column 'observation_date' of the csv_filename.
        Normalize it with the format YYYYMMDDHHMM
        """
        with open(csv_filename) as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
        src_date_fmt = "%d/%m/%Y %H:%M"
        dst_date_fmt = "%Y%m%d%H%M"
        obs_date = datetime.strptime(first_row.get("observation_date"), src_date_fmt)
        return obs_date.strftime(dst_date_fmt)

  8. MAD · NOV 22-23 · 2019
    $ ./collect_csv
    Renaming latest/martha_october2019_ospreys.csv to observations/pandion_haliaetus/201910080532.csv
    Renaming latest/bald_eagle_john_b_nov19.csv to observations/haliaeetus_leucocephalus/201911031156.csv

  9. MAD · NOV 22-23 · 2019 Building a CLI ⌨
    ▪ Click
    ▪ docopt
    ▪ argparse

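    The next slides build the interface with argparse from the standard library. For comparison only, here is a minimal sketch (not from the talk) of roughly the same options written with the third-party Click package; the option names mirror the argparse version that follows, and the body simply delegates to the collect_csv defined in slide 5.

    #! /usr/bin/env python
    # Sketch only: a Click-based equivalent of the argparse CLI built next
    import click

    @click.command()
    @click.option("--dest-dir", default="observations",
                  help="Output dir to store the results")
    @click.option("--search-paths", multiple=True, default=["."],
                  help="File search paths (repeat the option for several paths)")
    def cli(dest_dir, search_paths):
        """Collect csv files and place them in their right directories"""
        for path in search_paths:
            collect_csv(path, dest_dir)  # collect_csv as defined in slide 5

    if __name__ == "__main__":
        cli()
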
  10. MAD · NOV 22-23 · 2019
    from argparse import ArgumentParser

    def parse_args():
        """ Parse the CLI args """
        parser = ArgumentParser()
        parser.add_argument("--dest-dir",
                            help="Output dir to store the results",
                            default="observations")
        parser.add_argument("--search-paths", nargs='*', default=["."],
                            help="File search paths")
        return parser.parse_args()

  11. MAD · NOV 22-23 · 2019
    if __name__ == "__main__":
        args = parse_args()
        for path in args.search_paths:
            collect_csv(path, args.dest_dir)

  12. MAD · NOV 22-23 · 2019
    $ ./collect_csv --help
    usage: collect_csv [-h] [--dest-dir DEST_DIR]
                       [--search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]]

    optional arguments:
      -h, --help            show this help message and exit
      --dest-dir DEST_DIR   Output dir to store the results
      --search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]
                            File search paths

  13. MAD · NOV 22-23 · 2019
    $ ./collect_csv --dest-dir new_output_dir --search-paths original_dir1 original_dir2
    Renaming original_dir1/martha_october2019_ospreys.csv to new_output_dir/pandion_haliaetus/201910080532.csv
    Renaming original_dir2/bald_eagle_john_b_nov19.csv to new_output_dir/haliaeetus_leucocephalus/201911031156.csv

  14. MAD · NOV 22-23 · 2019 Concurrency and parallelism in Python
    ▪ threading
    ▪ multiprocessing
    ▪ asyncio

  15. MAD · NOV 22-23 · 2019
    from multiprocessing import Pool
    from functools import partial

    collect_csv_dest = partial(collect_csv, dest_dir=args.dest_dir)
    with Pool() as p:
        p.map(collect_csv_dest, args.search_paths)

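    The slide distributes the search paths over worker processes with multiprocessing.Pool. Since collect_csv is mostly I/O-bound (scanning and renaming files), threads would also do the job; below is a minimal sketch (not from the talk) using concurrent.futures from the standard library, under the same assumptions about args.

    # Sketch only: thread-based alternative to the Pool example above
    from concurrent.futures import ThreadPoolExecutor
    from functools import partial

    collect_csv_dest = partial(collect_csv, dest_dir=args.dest_dir)
    with ThreadPoolExecutor() as executor:
        # executor.map() is lazy; list() forces completion and re-raises any exception
        list(executor.map(collect_csv_dest, args.search_paths))
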
  16. MAD · NOV 22-23 · 2019
    def parse_args():
        """ Parse the CLI args """
        parser = ArgumentParser()
        subparsers = parser.add_subparsers(help="Sub-command help", dest="subcommand")
        parser_collect_csv = subparsers.add_parser(
            "collect-csv",
            help="Collect csv files and place them in their right directories")
        parser_collect_csv.add_argument("--dest-dir",
                                        help="Output dir to store the results",
                                        default="observations")
        parser_collect_csv.add_argument("--search-paths", nargs='*', default=["."],
                                        help="File search paths")

  17. MAD · NOV 22-23 · 2019
    [...]
        parser_backup_tables = subparsers.add_parser(
            "backup-tables",
            help="Make a pg_dump of the selected tables")
        parser_backup_tables.add_argument("database",
                                          help="Name of the database to make the backup")
        parser_backup_tables.add_argument("tables", nargs='*', default="-",
                                          help="List of tables to dump")
        parser_backup_tables.add_argument("--outfile", default="backup.sql",
                                          help="Filename for the backup")
        return parser.parse_args()

  18. MAD · NOV 22-23 · 2019
    $ ./smartbirdctl --help
    usage: smartbirdctl [-h] {collect-csv,backup-tables} ...

    positional arguments:
      {collect-csv,backup-tables}
                            Sub-command help
        collect-csv         Collect csv files and place them in their right directories
        backup-tables       Make a pg_dump of the selected tables

    optional arguments:
      -h, --help            show this help message and exit

  19. MAD · NOV 22-23 · 2019
    $ ./smartbirdctl backup-tables --help
    usage: smartbirdctl backup-tables [-h] [--outfile OUTFILE]
                                      database [tables [tables ...]]

    positional arguments:
      database           Name of the database to make the backup
      tables             List of tables to dump

    optional arguments:
      -h, --help         show this help message and exit
      --outfile OUTFILE  Filename for the backup

  20. MAD · NOV 22-23 · 2019
    from sys import stdin

    if __name__ == "__main__":
        args = parse_args()
        if args.subcommand == "collect-csv":
            for path in args.search_paths:
                collect_csv(path, args.dest_dir)
        elif args.subcommand == "backup-tables":
            # "-" can be the default string or an explicit positional argument
            # (a one-element list), as in the cron example later on
            if args.tables in ("-", ["-"]):
                tables = stdin.read().split()
            else:
                tables = args.tables
            backup_tables(tables, args.database, args.outfile)

  21. MAD · NOV 22-23 · 2019
    from os import cpu_count
    from subprocess import run

    def backup_tables(tables, database, backup_filename):
        """
        Backup a list of tables using pg_dump to a backup_filename
        Notify via Slack in case of failure
        """
        tables_switches = " ".join(f"-t {table}" for table in tables)
        jobs = cpu_count()
        cmd = f"pg_dump -d {database} {tables_switches} -j {jobs} -Fc > {backup_filename}"
        pg_dump = run(cmd, shell=True, capture_output=True)

  22. MAD · NOV 22-23 · 2019 subprocess
    subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None,
                   capture_output=False, shell=False, cwd=None, timeout=None,
                   check=False, encoding=None, errors=None, text=None, env=None,
                   universal_newlines=None)
    ▪ run (3.5+)
    ▪ shlex.split(cmd)

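    The slide points at shlex.split(cmd): it splits a command string into an argument list so that run() can be used without shell=True. A minimal sketch (not from the talk; database, table and backup_filename are placeholder variables, and the shell redirection from slide 21 is replaced by pg_dump's own --file option, since > only works through a shell):

    # Sketch only: running pg_dump without shell=True
    from shlex import split
    from subprocess import run, CalledProcessError

    cmd = f"pg_dump -d {database} -t {table} --file {backup_filename}"
    try:
        # check=True raises CalledProcessError on a non-zero exit code;
        # capture_output and text need Python 3.7+
        run(split(cmd), capture_output=True, text=True, check=True)
    except CalledProcessError as exc:
        print(exc.stderr)
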
  23. MAD · NOV 22-23 · 2019 Posting to Slack
    curl -X POST -H 'Content-type: application/json' \
         --data '{"text":"Allow me to reintroduce myself!"}' \
         YOUR_WEBHOOK_URL

  24. MAD · NOV 22-23 · 2019
    from sys import exit
    from os import environ

    [...]
    if pg_dump.returncode != 0:
        webhook_url = environ.get("SLACK_WEBHOOK_URL")
        if webhook_url:
            msg = f"Failed to {cmd}:\n{pg_dump.stderr.decode()}"
            notify_via_slack(webhook_url, msg)
        exit(pg_dump.returncode)

  25. MAD · NOV 22-23 · 2019
    from requests import post

    def notify_via_slack(webhook_url, msg):
        """ Notify via Slack webhook url """
        slack_data = {"text": msg}
        post(webhook_url, json=slack_data)

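    The version on the slide fires and forgets: if the webhook call itself fails, nothing is reported. A slightly more defensive variant (a sketch, not from the talk) adds a timeout and surfaces HTTP errors:

    # Sketch only: same helper, but failures of the webhook call are not silent
    from requests import post

    def notify_via_slack(webhook_url, msg):
        """ Notify via Slack webhook url, raising if the webhook call fails """
        slack_data = {"text": msg}
        response = post(webhook_url, json=slack_data, timeout=5)
        response.raise_for_status()  # raises on 4xx/5xx responses
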
  26. MAD · NOV 22-23 · 2019 Dependency management in Python
    ▪ Traditionally: requirements.txt, setup.py
    ▪ Pipfile and Pipfile.lock (used via pipenv)
    ▪ pyproject.toml (used via poetry)
    ▪ pip-tools

  27. MAD · NOV 22-23 · 2019 Using a virtual environment
    python -m venv .env --prompt sbctl
    source .env/bin/activate
    (sbctl) python -m pip install requests

  28. MAD · NOV 22-23 · 2019 Automation https://crontab.guru
    0 4 * * * psql -d observations -c "\dt" | grep tracks | smartbirdctl backup-tables --outfile /mnt/backups/`date +\%Y\%m\%d\%H\%M\%S`-backup.sql observations -

  29. MAD · NOV 22-23 · 2019
    From: CEO <[email protected]>
    To: [email protected]
    Subject: URGENT - Production is not working

    I am trying to run predictions in the production environment and the results that come out do not make sense. Every now and then sightings appear in the middle of the ocean that are clearly wrong. Please, everybody get on fixing this as soon as possible

  30. MAD · NOV 22-23 · 2019
    python3 wsgi.py -m model-002357.hd5
    . . .
    python3 wsgi.py -m model-002342.hd5
    python3 wsgi.py -m model-002357.hd5

  31. MAD · NOV 22-23 · 2019
    #! /usr/bin/env python

    latest_version = "model-002357.hd5"

    def update_host(conn):
        """
        Copy the latest model file, replace the version in the
        service unit and restart the service
        """
        conn.put(latest_version)
        conn.sudo(f"sed -i -E 's/model-[0-9]+\\.hd5/{latest_version}/g' /etc/systemd/system/smartbird.service")
        conn.sudo("systemctl daemon-reload && systemctl restart smartbird")

  32. MAD · NOV 22-23 · 2019
    import sys
    from fabric2 import ThreadingGroup

    if __name__ == "__main__":
        hosts = sys.argv[1:]
        all_hosts_parallel = ThreadingGroup(*hosts)
        ps = all_hosts_parallel.run(f"ps aux | grep -v grep | grep {latest_version}",
                                    warn=True, hide=True)
        hosts_to_update = [conn for conn, result in ps.items() if not result.ok]
        for conn in hosts_to_update:
            update_host(conn)

  33. MAD · NOV 22-23 · 2019 Python for systems administration
    ▪ Exploration and prototyping
    ▪ Glue
    ▪ "Batteries included" + third-party libraries
    ▪ Not always the best tool