
"Automating infrastructure at SA Home Loans with Python (and friends)" by Kim van Wyk

Pycon ZA
October 10, 2019


SA Home Loans develops most of its business software in-house, with 5 agile teams of developers, database specialists and testers. Each team is provided with an isolated virtualised clone of the production environment, comprising more than 20 Windows servers, 5 Linux servers and various databases and supporting infrastructure. A combination of Python, Node.js, bash and Powershell is used to glue a variety of open-source apps together and spin up new labs in as automated a fashion as possible. A main focus of this talk is illustrating the tools and methods SAHL uses or has written to do this work, replacing the week of error-prone manual effort previously required to create each lab.

This talk will aim to show that this kind of automation need not be complex or require a large team. The easy-to-learn nature of Python has allowed SAHL's existing developers and devops engineers to work on the above systems after about 2 days of internally-developed Python training and a bit of at-desk assistance when needed.

The talk will also discuss some of the lessons learned while moving away from manual methods to an automated approach.


Transcript

  1. PROD SERVERS
     - ±40 virtualised Windows servers
     - Internally-developed Windows services
     - Windows infrastructure: Active Directory, Exchange, IIS
     - 7 Ubuntu servers
     - Docker hosts

  2. LABS AND TEAMS
     - 18 sandboxed virtualised clones of the prod environment, serving: 5 Agile development teams; front-line support; DBAs; Platforms/Devops
     - Additional infrastructure: NATing; DNS entries; consistent internal host names

  3. PROBLEMS WITH LABS
     - Labs built manually
     - Labour-intensive: 1 week per lab
     - Error-prone: consistency almost impossible
     - No DIY capacity: teams reliant on Devops availability

  4. Adopted several on-premise Open Source tools:
     - Machine Image Creator: Packer
     - Orchestration: Rundeck
     - Virtualisation: OpenStack
     - Containerisation: Docker
     - Container Registry: Harbor
     - Git and Continuous Integration Tooling: Gitlab
     - Project generator: Yeoman
     - Secret Management: Vault
     - Key/Value Store: etcd
     - Web Server: NGINX
     A large portion of the infrastructure is executed on VMWare hosts.

  5. PACKER TEMPLATES
     - Almost all servers in prod and labs derived from source-controlled Packer templates
     - OpenSSH installed on Windows boxes
     - Monitoring and logging tools (Filebeat) installed, feeding into Logstash for log shipping: Docker metrics, Windows events
     - Common public SSH key added to every server
     - Python and some useful libraries installed

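The deck doesn't show how these templates are driven, but a small Python wrapper is one plausible approach. The `packer build`, `-var` and `-var-file` flags below are standard Packer CLI usage; the template and variable names are invented for illustration:

```python
import subprocess

def packer_build_cmd(template_path, var_file=None, variables=None):
    """Build the argv for a `packer build` invocation.

    The template filename and variable names are illustrative only;
    real templates and variables will differ per environment.
    """
    cmd = ["packer", "build"]
    if var_file:
        cmd += ["-var-file", var_file]
    for key, value in (variables or {}).items():
        cmd += ["-var", f"{key}={value}"]
    cmd.append(template_path)
    return cmd

def build_image(template_path, **kwargs):
    # Run Packer and raise if the image build fails.
    subprocess.run(packer_build_cmd(template_path, **kwargs), check=True)

# packer_build_cmd("windows-base.json", variables={"lab": "team-a"})
# -> ['packer', 'build', '-var', 'lab=team-a', 'windows-base.json']
```

Separating command construction from execution makes the wrapper easy to test without actually invoking Packer.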
  6. RUNDECK
     - Orchestration jobs defined in YAML
     - Used to execute scripts on specific targets: Python or Bash in most cases; Powershell to control VMWare and Windows hosts
     - Comprehensive scheduling and threading
     - Job maintenance and configuration via API
     - Full history aids with audit trail

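As a minimal sketch of driving Rundeck via its API from the standard library: the `/api/{version}/job/{id}/run` endpoint and `X-Rundeck-Auth-Token` header follow Rundeck's documented web API, while the server URL, token and job id below are placeholders:

```python
import json
import urllib.request

def run_job_request(base_url, api_token, job_id, options=None):
    """Build an authenticated POST request to trigger a Rundeck job.

    base_url, api_token and job_id are placeholders; the endpoint and
    auth header follow Rundeck's API conventions.
    """
    url = f"{base_url}/api/41/job/{job_id}/run"
    body = json.dumps({"options": options or {}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": api_token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )

# To actually trigger the job:
# with urllib.request.urlopen(run_job_request(...)) as resp:
#     execution = json.load(resp)
```

The JSON response from a real call contains the execution id, which is what feeds the audit-trail history mentioned above.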
  7. WHY RUNDECK?
     - Could have used Puppet, Ansible, Chef etc.
     - GUI easy for non-development teams to drive
     - YAML-based config fairly easy to understand
     - Does the job well enough to allow moving on to the next problem

  8. RUNDECK USAGE
     - Internal Python/Node.js tool builds Rundeck job files from a simplified YAML config
     - Rundeck jobs use internal Python/Bash tooling to upload new jobs from an internal Git repo
     - Developers can add or modify jobs without needing to understand the full complexity
     - Git branching allows teams to develop specific jobs without affecting other teams

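The internal generator isn't shown in the deck; a sketch of the idea is a function that expands a simplified job description into a full Rundeck-style job map. The simplified schema (name/target/script) is invented here, and the output keys only approximate Rundeck's YAML job format — a real generator would cover many more fields and dump the result as YAML:

```python
def expand_job(simple):
    """Expand a simplified job description into a Rundeck-style job map.

    Input keys (name/target/script) are an invented simplified schema;
    output keys loosely follow Rundeck's YAML job definition format.
    """
    return {
        "name": simple["name"],
        "group": simple.get("group", "labs"),
        "description": simple.get("description", ""),
        "nodefilters": {"filter": simple["target"]},
        "sequence": {
            "keepgoing": False,
            "commands": [{"exec": simple["script"]}],
        },
    }

# A real tool would serialise [expand_job(j) for j in jobs] to YAML
# and upload it via the Rundeck API.
```

The point of the indirection is that developers only ever touch the small simplified schema, not the full Rundeck job format.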
  9. VALIDATION, MONITORING & CONTROL
     - Infrastructure validation via InSpec
     - Monitoring tools deployed via Docker container to each lab: Prometheus, ELK Stack, Grafana
     - Rundeck and the above services all deployed as Docker containers
     - Internally developed tooling also Dockerised
     - Docker control via Portainer

  10. CONTAINER DEVELOPMENT & DEPLOYMENT
      - Third-party and internally-developed Docker images served by Harbor
      - Teams can upload images to their own libraries as they wish
      - Promotion to the prod library and subsequent deployment controlled via ticketing; applied to both third-party and internal images
      - Gitlab CI tooling ensures consistent and functional images
      - Yeoman templating aids greatly in eliminating common mistakes

  11. EXISTING SCRIPTS
      - Docker containers useful to wrap a consistent interface around existing scripts
      - "Black box" nature allows support teams to execute jobs without needing to know several languages
      - Rundeck deployment adds a level of auditing that is otherwise manually tracked

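One way to picture that consistent interface is a helper that hides every wrapped script behind the same `docker run` call, whatever language the script inside is written in. The image name and environment variables below are invented examples:

```python
import subprocess

def docker_run_cmd(image, args=(), env=None):
    """Build a uniform `docker run` argv for a wrapped script image.

    The image name and env vars are whatever the wrapped script needs;
    support staff only ever see this one interface.
    """
    cmd = ["docker", "run", "--rm"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    cmd.extend(args)
    return cmd

def run_wrapped(image, **kwargs):
    # Execute the container and surface a non-zero exit as an error.
    subprocess.run(docker_run_cmd(image, **kwargs), check=True)
```

Because the container is a black box, the same call shape works whether the script inside is Python, Bash or Powershell.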
  12. ADVANTAGES
      - Rundeck, Portainer and monitoring tools allow teams to solve ±80% of day-to-day issues in their labs without DevOps team support
      - Implemented over 6 months by a 3-member team of senior developers
      - Supported by a 6-member team of various experience levels
      - Python and Node training provided in-house over 2 weeks was sufficient to enable this support

  13. Backups performed for SQL Server and Postgres databases
      - Originally a collection of SQL Server jobs and Bash
      - Moved to Python-based Docker containers, scheduled via a dedicated DBA Rundeck
      - Same host can be used to interact with all databases
      - Consistent interface across all database types and schemas
      - Different operations all handled in the same way: backup to local storage; copy of backup files to other hosts; restoring backup files; standardised logshipping

  14. Common behaviour baked into all the images:
      - Vault stores DB and file host access credentials
      - pywinrm for file system operations on Windows hosts
      - pymssql to execute SQL on SQL Server instances
      - etcd to store current state of local backup, copy and restore to DR servers
      - YAML config served from internal Git server, pulled directly by the image
      - Explanatory commandline parser via argparse

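The argparse part of that common behaviour might look roughly like the skeleton below. The subcommand and option names are illustrative, not SAHL's actual interface, and the Vault/etcd/pymssql/pywinrm plumbing is omitted:

```python
import argparse

def build_parser():
    """Commandline parser shared by the backup images (a sketch).

    Subcommands mirror the operations named on the previous slide:
    backup, copy and restore; option names are invented examples.
    """
    parser = argparse.ArgumentParser(
        description="Database backup/copy/restore container entrypoint"
    )
    sub = parser.add_subparsers(dest="operation", required=True)

    backup = sub.add_parser("backup", help="Back up a database to local storage")
    backup.add_argument("--database", required=True)

    copy = sub.add_parser("copy", help="Copy backup files to another host")
    copy.add_argument("--destination", required=True)

    restore = sub.add_parser("restore", help="Restore backup files")
    restore.add_argument("--backup-file", required=True)

    return parser
```

Because argparse generates `--help` output for every subcommand, the parser is self-documenting — the "explanatory" part mentioned on the slide.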
  15. Avoid manual fixes if at all possible
      - Will initially take longer, but the pay-off should come quickly
      - Worth asking individual developers about manual steps in their workflow
      - Descriptive naming of automated jobs cuts down on support requirements
      - Easy to underestimate the number of processes that aren't written down
      - Consistent look-and-feel of related tasks eases learning
      - Allow jobs to be re-run without negative consequences
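The last point is the classic idempotency pattern: check the current state before acting, so a rerun is a no-op. A minimal sketch, where the in-memory `records` dict stands in for real state such as DNS entries or an etcd store:

```python
def ensure_dns_entry(records, hostname, address):
    """Idempotent job step: safe to re-run with no extra effect.

    `records` is a stand-in for real state (DNS, etcd, a database);
    the check-then-act shape makes a second run a no-op.
    """
    if records.get(hostname) == address:
        return False  # already in the desired state; nothing to do
    records[hostname] = address
    return True
```

Returning whether anything changed also makes the job's history easier to audit: reruns show up as no-ops rather than repeated changes.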