
Python for System Administrators

Since its 1.0 release 25 years ago, Python has been the language of choice for many system administrators who want to automate all kinds of tasks. It lets them manage web servers, networks, users, databases and more far more efficiently in their day-to-day work.

In this talk I will put us in the shoes of a system administrator for a while and explain how to use Python to build command-line tools that automate repetitive actions and solve a variety of problems. Along the way I will introduce several powerful modules, from both the standard library and third parties, that help when building our own utilities. Say goodbye to Bash!

Alejandro Guirao Rodríguez

November 22, 2019

Transcript

  1. MAD · NOV 22-23 · 2019
    Python para administradores
    de sistemas
    Alejandro Guirao
    @lekum
    github.com/lekum
    lekum.org


  2.
    Nerea

  6.
    Smartbird

  8.
    Model generation
    .csv
    .hd5
    TensorFlow
    PostgreSQL

  9.
    Production infrastructure
    Load Balancer
    Flask app with
    .
    .
    .

  10.
    [2019-11-05 00:50:10,620] {python_operator.py:96} ERROR - No files found at /mnt/observations/

  11.
    /mnt/observations/<species>/YYYYMMDDHHMM.csv

  12.
    $ cat bald_eagle_john_b_nov19.csv
    observation_date,species,observation_coords,notes
    03/11/2019 11:56,Haliaeetus leucocephalus,"51.103177,-115.359446",A young couple of black eagles that live by the cliffs
    03/11/2019 12:33,Haliaeetus leucocephalus,"50.255013,-110.203278",Lone bald eagle fishing
    04/11/2019 07:02,Haliaeetus leucocephalus,"38.588273,-115.117030",Bald eagle flying with a captured marmot

  13.
    collect_csv
    #! /usr/bin/env python
    def collect_csv():
        ...

    if __name__ == "__main__":
        collect_csv()

  14.
    if __name__ == "__main__":
        source_dir = "latest"
        dest_dir = "observations"
        collect_csv(source_dir, dest_dir)

  15.
    pathlib (Python 3.4+)

  16.
    from pathlib import Path

    def collect_csv(source_dir, dest_dir):
        """
        Scan for csvs in a source_dir recursively.
        Place each file in a path of <dest_dir>/<species>/<datetime>.csv
        """
        source_dir = Path(source_dir)
        dest_dir = Path(dest_dir)
        for csvfile in source_dir.rglob("*.csv"):
            species = normalized_species(csvfile)
            species_dir = dest_dir / species
            species_dir.mkdir(exist_ok=True, parents=True)
            date_time = normalized_datetime(csvfile)
            print(f"Renaming {csvfile} to {species_dir / (date_time + '.csv')}")
            csvfile.rename(species_dir / (date_time + ".csv"))
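
The pathlib calls used by collect_csv can be tried safely in a throwaway directory. This is only a minimal sketch: the file names and the species directory below are made up for illustration and are not taken from the talk.

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source_dir = root / "latest" / "nested"
    source_dir.mkdir(parents=True)
    (source_dir / "report.csv").write_text("observation_date,species\n")
    (source_dir / "notes.txt").write_text("not a csv\n")

    # rglob walks the tree recursively and only yields matching names.
    found = sorted(p.name for p in (root / "latest").rglob("*.csv"))

    # The / operator joins path segments; mkdir creates missing parents.
    species_dir = root / "observations" / "pandion_haliaetus"
    species_dir.mkdir(parents=True, exist_ok=True)
    target = species_dir / "201910080532.csv"

    # rename moves the file to its new location.
    (source_dir / "report.csv").rename(target)
    moved = target.exists() and not (source_dir / "report.csv").exists()

print(found, moved)
```

Path.rename generally fails when source and destination sit on different filesystems, so if the destination is a separate mount, shutil.move may be the safer choice.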

  17.
    from csv import DictReader

    def normalized_species(csv_filename):
        """
        Return the species of the column 'species' of the csv_filename.
        Normalize it via lowercasing and transforming spaces into "_"
        """
        # newline="" is what the csv module recommends when opening files
        with open(csv_filename, newline="") as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
        return first_row.get("species").lower().replace(" ", "_")
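
To see why DictReader is a better fit here than splitting lines on commas, the sketch below parses one row of the sample file from an in-memory string. The quoted observation_coords field contains a comma and still arrives as a single value. Splitting the coordinates into floats is an extra step added for illustration.

```python
import io
from csv import DictReader

# One header line and one record, shaped like the sample CSV in the slides.
SAMPLE = '''observation_date,species,observation_coords,notes
03/11/2019 11:56,Haliaeetus leucocephalus,"51.103177,-115.359446",A young couple
'''

reader = DictReader(io.StringIO(SAMPLE))
first_row = next(reader)

# Same normalization as normalized_species: lowercase, spaces to underscores.
species = first_row["species"].lower().replace(" ", "_")

# The quoted field is one value even though it contains a comma.
lat, lon = (float(x) for x in first_row["observation_coords"].split(","))

print(species, lat, lon)
```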

  18.
    from csv import DictReader
    from datetime import datetime

    def normalized_datetime(csv_filename):
        """
        Return the datetime of the column 'observation_date' of the csv_filename.
        Normalize it with the format YYYYMMDDHHMM
        """
        with open(csv_filename, newline="") as csvfilename:
            reader = DictReader(csvfilename)
            first_row = next(reader)
        src_date_fmt = "%d/%m/%Y %H:%M"
        dst_date_fmt = "%Y%m%d%H%M"
        obs_date = datetime.strptime(first_row.get("observation_date"), src_date_fmt)
        return obs_date.strftime(dst_date_fmt)
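
The date conversion can be checked in isolation. The sketch below uses the same two format strings as normalized_datetime. The helper name to_stamp is hypothetical, introduced only for this example.

```python
from datetime import datetime

SRC_FMT = "%d/%m/%Y %H:%M"   # day/month/year, as written in the CSV files
DST_FMT = "%Y%m%d%H%M"       # YYYYMMDDHHMM, as expected in the file name


def to_stamp(text):
    """Parse a date in the source format and render it in the target format."""
    return datetime.strptime(text, SRC_FMT).strftime(DST_FMT)


stamp = to_stamp("03/11/2019 11:56")
print(stamp)  # 201911031156

# Input that does not match the source format raises ValueError.
try:
    to_stamp("2019-11-03 11:56")
    rejected = False
except ValueError:
    rejected = True
```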

  19.
    Time Format Codes

  20.
    $ ./collect_csv
    Renaming latest/martha_october2019_ospreys.csv to observations/pandion_haliaetus/201910080532.csv
    Renaming latest/bald_eagle_john_b_nov19.csv to observations/haliaeetus_leucocephalus/201911031156.csv

  21.
    MORE FLEXIBLE

  22.
    Building CLIs
    ■ Click
    ■ docopt
    ■ argparse

  23.
    argparse

  24.
    from argparse import ArgumentParser

    def parse_args():
        """
        Parse the CLI args
        """
        parser = ArgumentParser()
        parser.add_argument("--dest-dir",
                            help="Output dir to store the results",
                            default="observations")
        parser.add_argument("--search-paths",
                            nargs='*',
                            default=["."],
                            help="File search paths")
        return parser.parse_args()

  25.
    if __name__ == "__main__":
        args = parse_args()
        for path in args.search_paths:
            collect_csv(path, args.dest_dir)

  26.
    $ ./collect_csv --help
    usage: collect_csv [-h] [--dest-dir DEST_DIR]
                       [--search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]]

    optional arguments:
      -h, --help            show this help message and exit
      --dest-dir DEST_DIR   Output dir to store the results
      --search-paths [SEARCH_PATHS [SEARCH_PATHS ...]]
                            File search paths

  27.
    $ ./collect_csv --dest-dir new_output_dir --search-paths original_dir1 original_dir2
    Renaming original_dir1/martha_october2019_ospreys.csv to new_output_dir/pandion_haliaetus/201910080532.csv
    Renaming original_dir2/bald_eagle_john_b_nov19.csv to new_output_dir/haliaeetus_leucocephalus/201911031156.csv
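
One convenient way to experiment with an argparse definition is to pass an explicit list to parse_args instead of relying on the real command line. The sketch below rebuilds the two options from the slides. The function name build_parser is an assumption made for this example.

```python
from argparse import ArgumentParser


def build_parser():
    """Return a parser with the same two options shown in the slides."""
    parser = ArgumentParser(prog="collect_csv")
    parser.add_argument("--dest-dir",
                        default="observations",
                        help="Output dir to store the results")
    parser.add_argument("--search-paths",
                        nargs="*",
                        default=["."],
                        help="File search paths")
    return parser


parser = build_parser()

# No arguments at all: the defaults apply.
defaults = parser.parse_args([])

# argparse turns --dest-dir into the attribute dest_dir.
custom = parser.parse_args(["--dest-dir", "new_output_dir",
                            "--search-paths", "original_dir1", "original_dir2"])

print(defaults.dest_dir, defaults.search_paths)
print(custom.dest_dir, custom.search_paths)
```

Passing the argument list explicitly also makes the parser easy to unit test.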

  28.
    FASTER

  29.
    Concurrency and parallelism in Python
    ■ threading
    ■ multiprocessing
    ■ asyncio

  30.
    from multiprocessing import Pool
    from functools import partial

    # args comes from parse_args(); keep this under the __main__ guard
    collect_csv_dest = partial(collect_csv,
                               dest_dir=args.dest_dir)
    with Pool() as p:
        p.map(collect_csv_dest, args.search_paths)
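
Since renaming files is mostly I/O, a thread pool is a plausible alternative to the process pool on the slide. The sketch below swaps in concurrent.futures.ThreadPoolExecutor, which is not what the talk uses, and keeps the same functools.partial trick to pin dest_dir. The collect_csv defined here is a hypothetical stand-in that only reports its arguments.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def collect_csv(source_dir, dest_dir):
    """Hypothetical stand-in for the real collect_csv: report the arguments."""
    return f"{source_dir} -> {dest_dir}"


# partial fixes dest_dir, leaving a one-argument callable suitable for map.
collect_csv_dest = partial(collect_csv, dest_dir="observations")
search_paths = ["original_dir1", "original_dir2"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map returns results in the same order as the inputs.
    results = list(pool.map(collect_csv_dest, search_paths))

print(results)
```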

  31.
    PostgreSQL
    pg_dump
    smartbirdctl

  32.
    def parse_args():
        """
        Parse the CLI args
        """
        parser = ArgumentParser()
        subparsers = parser.add_subparsers(help="Sub-command help",
                                           dest="subcommand")
        parser_collect_csv = subparsers.add_parser("collect-csv",
            help="Collect csv files and place them in their right directories")
        parser_collect_csv.add_argument("--dest-dir",
            help="Output dir to store the results", default="observations")
        parser_collect_csv.add_argument("--search-paths", nargs='*',
            default=["."], help="File search paths")

  33.
    [...]
        parser_backup_tables = subparsers.add_parser("backup-tables",
            help="Make a pg_dump of the selected tables")
        parser_backup_tables.add_argument("database",
            help="Name of the database to make the backup")
        parser_backup_tables.add_argument("tables", nargs='*',
            help="List of tables to dump", default="-")
        parser_backup_tables.add_argument("--outfile",
            help="Filename for the backup", default="backup.sql")
        return parser.parse_args()

  34.
    $ ./smartbirdctl --help
    usage: smartbirdctl [-h] {collect-csv,backup-tables} ...

    positional arguments:
      {collect-csv,backup-tables}
                            Sub-command help
        collect-csv         Collect csv files and place them in their right
                            directories
        backup-tables       Make a pg_dump of the selected tables

    optional arguments:
      -h, --help            show this help message and exit

  35.
    $ ./smartbirdctl backup-tables --help
    usage: smartbirdctl backup-tables [-h] [--outfile OUTFILE]
                                      database [tables [tables ...]]

    positional arguments:
      database           Name of the database to make the backup
      tables             List of tables to dump

    optional arguments:
      -h, --help         show this help message and exit
      --outfile OUTFILE  Filename for the backup

  36.
    from sys import stdin

    if __name__ == "__main__":
        args = parse_args()
        if args.subcommand == "collect-csv":
            for path in args.search_paths:
                collect_csv(path, args.dest_dir)
        elif args.subcommand == "backup-tables":
            # "-" is the default; an explicit "-" on the command line arrives as ["-"]
            if args.tables in ("-", ["-"]):
                tables = stdin.read().split()
            else:
                tables = args.tables
            backup_tables(tables, args.database, args.outfile)
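
As the number of subcommands grows, the if/elif chain can be replaced by attaching a handler to each subparser with set_defaults. This is a sketch of that argparse idiom and not the approach shown in the talk. The handler names cmd_collect_csv and cmd_backup_tables are hypothetical and only return a description of what they would do.

```python
from argparse import ArgumentParser


def cmd_collect_csv(args):
    """Hypothetical handler: describe the collect-csv action."""
    return f"collect {args.search_paths} into {args.dest_dir}"


def cmd_backup_tables(args):
    """Hypothetical handler: describe the backup-tables action."""
    return f"backup {args.tables} from {args.database} to {args.outfile}"


def build_parser():
    parser = ArgumentParser(prog="smartbirdctl")
    # required=True (Python 3.7+) makes a missing subcommand an error.
    subparsers = parser.add_subparsers(dest="subcommand", required=True)

    collect = subparsers.add_parser("collect-csv")
    collect.add_argument("--dest-dir", default="observations")
    collect.add_argument("--search-paths", nargs="*", default=["."])
    collect.set_defaults(func=cmd_collect_csv)

    backup = subparsers.add_parser("backup-tables")
    backup.add_argument("database")
    backup.add_argument("tables", nargs="*")
    backup.add_argument("--outfile", default="backup.sql")
    backup.set_defaults(func=cmd_backup_tables)
    return parser


args = build_parser().parse_args(["backup-tables", "observations", "tracks_2019"])
# Each subparser stored its own handler in args.func, so dispatch is one call.
message = args.func(args)
print(message)
```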

  37.
    from os import cpu_count
    from subprocess import run

    def backup_tables(tables, database, backup_filename):
        """
        Backup a list of tables using pg_dump to a backup_filename
        Notify via Slack in case of failure
        """
        tables_switches = " ".join(f"-t {table}" for table in tables)
        jobs = cpu_count()
        # pg_dump accepts -j only with the directory format (-Fd),
        # so backup_filename names the output directory
        cmd = f"pg_dump -d {database} {tables_switches} -j {jobs} -Fd -f {backup_filename}"
        pg_dump = run(cmd, shell=True, capture_output=True)

  38.
    popen (3)

  39.
    subprocess
    subprocess.run(args, *, stdin=None,
    input=None, stdout=None, stderr=None,
    capture_output=False, shell=False, cwd=None,
    timeout=None, check=False, encoding=None,
    errors=None, text=None, env=None,
    universal_newlines=None)
    ■ run (3.5+)
    shlex.split(cmd)
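
The shlex.split hint on this slide points at running commands without a shell. The sketch below shows the idea under stated assumptions: the pg_dump command line is only split, never executed, and the child process is the current Python interpreter so that the example runs anywhere.

```python
import shlex
import subprocess
import sys

# shlex.split tokenizes a command line the way a POSIX shell would.
cmd = "pg_dump -d observations -t 'my table' -Fc"
argv = shlex.split(cmd)
print(argv)

# With a list and the default shell=False, each item reaches the child
# verbatim; shell metacharacters inside an argument are not interpreted.
tricky = "tracks; echo pwned"
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", tricky],
    capture_output=True, text=True, check=True)
echoed = result.stdout.strip()

# If a shell really is needed, shlex.quote escapes a single argument.
quoted = shlex.quote(tricky)
print(echoed, quoted)
```

A table or database name taken from user input is exactly the kind of value that benefits from one of these two approaches.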

  40.
    Posting to Slack
    curl -X POST -H 'Content-type: application/json' --data '{"text":"Allow me to reintroduce myself!"}' YOUR_WEBHOOK_URL

  41.
    from sys import exit
    from os import environ
    [...]
        if pg_dump.returncode != 0:
            webhook_url = environ.get("SLACK_WEBHOOK_URL")
            if webhook_url:
                msg = f"Failed to {cmd}:\n{pg_dump.stderr.decode()}"
                notify_via_slack(webhook_url, msg)
            exit(pg_dump.returncode)
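
The failure path above can be exercised without a database. In this sketch a child Python process stands in for pg_dump and exits with a non-zero status. The helper run_and_report is hypothetical and simply returns the message that would be sent to Slack.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run argv; return (returncode, message), where message is None on success."""
    proc = subprocess.run(argv, capture_output=True)
    if proc.returncode == 0:
        return 0, None
    # stderr is bytes because text=True was not requested.
    msg = f"Failed to {' '.join(argv)}:\n{proc.stderr.decode()}"
    return proc.returncode, msg


# A child that writes to stderr and exits with status 3.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('boom'); sys.exit(3)"]
code, msg = run_and_report(failing)

ok_code, ok_msg = run_and_report([sys.executable, "-c", "pass"])
print(code, ok_code)
```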

  42.
    requests

  43.
    from requests import post

    def notify_via_slack(webhook_url, msg):
        """
        Notify via Slack webhook url
        """
        slack_data = {"text": msg}
        # requests has no default timeout; without one the call can hang
        post(webhook_url, json=slack_data, timeout=10)
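
When installing requests is not an option, the same webhook call can be sketched with the standard library alone. This is an alternative to the slide's approach, not a description of it. The function build_slack_request and the example URL are assumptions, and the request is only constructed here, never sent.

```python
import json
from urllib.request import Request, urlopen


def build_slack_request(webhook_url, msg):
    """Build the POST request for a Slack incoming webhook (hypothetical helper)."""
    body = json.dumps({"text": msg}).encode("utf-8")
    return Request(webhook_url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def notify_via_slack(webhook_url, msg, timeout=10):
    """Send the message and return the HTTP status code."""
    with urlopen(build_slack_request(webhook_url, msg), timeout=timeout) as resp:
        return resp.status


# Placeholder URL for illustration only; nothing is sent in this example.
req = build_slack_request("https://hooks.example.invalid/services/XXX",
                          "pg_dump failed")
print(req.get_method(), req.data)
```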

  44.
    Dependency management in Python
    ■ Traditionally: requirements.txt, setup.py
    ■ Pipfile and Pipfile.lock (used via pipenv)
    ■ pyproject.toml (used via poetry)
    ■ pip-tools

  45.
    Use a virtual environment
    python -m venv .env --prompt sbctl
    source .env/bin/activate
    (sbctl) python -m pip install requests

  46.
    Automation
    https://crontab.guru
    0 4 * * * psql -d observations -c "\dt" | grep tracks | smartbirdctl backup-tables --outfile /mnt/backups/`date +\%Y\%m\%d\%H\%M\%S`-backup.sql observations -
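
The cron line pipes a psql table listing into smartbirdctl. Splitting such a listing on whitespace would also pick up the schema, the separators and the owner, so it may be worth extracting only the table name column. The sketch below assumes input shaped like psql's \dt listing (Schema | Name | Type | Owner). The sample text and the owner name are invented for illustration.

```python
def table_names(dt_output, needle="tracks"):
    """Return table names containing needle from a psql \\dt style listing."""
    names = []
    for line in dt_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Expected columns: Schema | Name | Type | Owner
        if len(fields) >= 3 and fields[2] == "table" and needle in fields[1]:
            names.append(fields[1])
    return names


# Hypothetical listing, written to resemble psql's aligned output.
SAMPLE = """\
            List of relations
 Schema |    Name     | Type  | Owner
--------+-------------+-------+-------
 public | species     | table | nerea
 public | tracks_2019 | table | nerea
 public | tracks_2018 | table | nerea
(3 rows)
"""

names = table_names(SAMPLE)
print(names)
```

Because each field is stripped after splitting on the pipe character, the same function should also cope with unaligned, pipe-separated output.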

  47.
    /var/log/syslog ✔
    /mnt/backups/... ✔

  48.
    From: CEO
    To: [email protected]
    Subject: URGENT - Production is not working
    I am trying to run predictions in the production environment and the
    results are not coherent. Every now and then there are sightings in the
    middle of the ocean that are clearly wrong. Please, everyone get on
    fixing it as soon as possible.

  49.
    python3 wsgi.py -m model-002357.hd5
    .
    .
    .
    python3 wsgi.py -m model-002342.hd5
    python3 wsgi.py -m model-002357.hd5

  50.
    fabric (2)

  51.
    #! /usr/bin/env python
    latest_version = "model-002357.hd5"

    def update_host(conn):
        """
        Copy the latest model file, replace the version in the service
        unit and restart the service
        """
        conn.put(latest_version)
        # -E enables extended regexes, so + means "one or more"
        conn.sudo(rf"sed -i -E -e 's/model-[0-9]+\.hd5/{latest_version}/g' "
                  "/etc/systemd/system/smartbird.service")
        # One sudo call per command: in "a && b" only a would run under sudo
        conn.sudo("systemctl daemon-reload")
        conn.sudo("systemctl restart smartbird")
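
The substitution that sed performs on the unit file can be previewed locally with Python's re module, where + is a quantifier by default. The unit file content below is a hypothetical fragment built around the command line shown on an earlier slide.

```python
import re

# Hypothetical fragment of the service unit; only the model name matters.
UNIT = """\
[Service]
ExecStart=/usr/bin/python3 wsgi.py -m model-002342.hd5
"""

latest_version = "model-002357.hd5"

# Same pattern as the sed expression: "model-", one or more digits, ".hd5".
updated = re.sub(r"model-[0-9]+\.hd5", latest_version, UNIT)
print(updated)
```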

  52.
    import sys
    from fabric2 import ThreadingGroup

    if __name__ == "__main__":
        hosts = sys.argv[1:]
        all_hosts_parallel = ThreadingGroup(*hosts)
        # grep -v grep drops the grep process itself from the listing
        ps = all_hosts_parallel.run(
            f"ps aux | grep -v grep | grep {latest_version}", warn=True, hide=True)
        hosts_to_update = [conn for conn, result in ps.items() if not result.ok]
        for conn in hosts_to_update:
            update_host(conn)

  55.
    Python for system administration
    ■ Exploration and prototyping
    ■ Glue
    ■ "Batteries included" + third-party libraries
    ■ Not always the best tool

  56.
    Happy hacking!
    Alejandro Guirao
    @lekum
    lekum.org