Drone CI - Delivering Continuous testing for large open source projects

Patrick Jahns

January 31, 2019

Transcript

  1. Delivering CI/CD at ownCloud
    • Hosted on GitHub; consists of ~80 separate GitHub repositories
    • Built on top of a web application stack (PHP, Apache, JavaScript, CSS, HTML)
    • Over 14,000 unit tests and 2,200 acceptance (UI / API) tests
    • A pull request for “core” runs 15 hours of test time (feedback in < 30 mins)
    • Every night we run over 180 hours of tests
    • Various infrastructure components
      – Relational databases (MySQL, MariaDB, PostgreSQL, OracleDB)
      – Memory caches (Memcached, Redis)
      – Storage providers (file system, NFS, SMB, Swift, S3, OneDrive, Dropbox, etc.)
      – Identity / authentication providers (LDAP, Active Directory, Shibboleth / SAML)
      – Other infrastructure components (ClamAV, Elasticsearch, Collabora, etc.)
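
The pull-request numbers above imply a lower bound on parallelism. A minimal Python sketch, assuming test time could be split perfectly evenly across jobs with no clone, setup or queueing overhead (real suites never split that cleanly); the 8-hour nightly window is a hypothetical value, not one the deck gives:

```python
import math


def min_parallel_jobs(total_test_minutes: float, feedback_target_minutes: float) -> int:
    """Lower bound on the number of jobs that must run at the same time to fit
    `total_test_minutes` of serial test time into `feedback_target_minutes` of
    wall-clock time, assuming perfect, overhead-free splitting."""
    if feedback_target_minutes <= 0:
        raise ValueError("feedback target must be positive")
    return math.ceil(total_test_minutes / feedback_target_minutes)


# Pull request for "core": 15 hours of test time, feedback within 30 minutes.
pr_jobs = min_parallel_jobs(15 * 60, 30)

# Nightly: over 180 hours of tests. The 8-hour window is hypothetical;
# the deck does not say how long the nightly runs may take.
nightly_jobs = min_parallel_jobs(180 * 60, 8 * 60)

print(pr_jobs)       # 30
print(nightly_jobs)  # 23
```

Clone time, service start-up and uneven suite lengths all push the real number above this bound.
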
  2. Where it all started... (old infrastructure / setup)
    • Travis CI
      – DAV (Litmus, CardDAV, CalDAV) tests
      – PHP syntax checks
      – Selenium testing (arrived at the beginning of 2017)
    • Jenkins
      – Unit tests with different PHP and database versions
      – Storage-specific tests (Swift, Ceph, Samba, S3)
      – Integration tests
      – Upgrade tests
      – Smashbox tests
  3. Where it all started...
    • CI environment not reproducible locally, e.g. “works for me™”
    • Test suites regularly hit timeouts
    • Feedback / results of test runs sometimes only arrived after days
    • No real plugin system, not extensible
    • Travis wasn’t able to provide extended build capacity for our open-source repositories (only possible for private repositories)*
      *) changed in summer 2018
  4. Where it all started...
    • Difficult to keep up to date
      – Plugin updates result in changes to the config format
      – Only managed via web UI
    • Secrets are managed via web UI or hacky API scripts
    • Frequently ran out of disk space
    • Wasn’t cleaning up containers properly
    • Containers (services) required a lot of bash magic
    • Test runs took hours to complete: very slow feedback cycle
    • Static number of executors
  5. Drone CI: your friendly neighborhood CI system
    • Container-native CI/CD platform (everything runs within containers)
    • Easy to install & maintain (docker pull drone/drone)
    • Isolated builds
    • Multi-arch (amd64, arm64, …)
    • Multi-machine builds (fan-out & fan-in)
    • Simple YAML configuration
    • Integrates with several VCS providers (GitHub, Gitea, GitLab, Bitbucket, …)
    • Rich set of official plugins (any container can be a plugin)
    • Execute locally with “drone exec”
    • Open source (https://github.com/drone)
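
Drone expresses multi-machine builds as several pipelines linked by `depends_on`: pipelines with no unfinished dependencies can start in parallel (fan-out), and a pipeline that lists several dependencies waits for all of them (fan-in). The Python sketch below models only that ordering rule, grouped into stages for readability; it is not Drone's scheduler, and the pipeline names are hypothetical.

```python
from typing import Dict, List, Set


def execution_stages(depends_on: Dict[str, List[str]]) -> List[List[str]]:
    """Group pipelines into stages. Every pipeline in a stage depends only on
    pipelines in earlier stages, so all pipelines of one stage may run in parallel."""
    remaining: Dict[str, Set[str]] = {name: set(deps) for name, deps in depends_on.items()}
    for name, deps in remaining.items():
        unknown = deps - set(remaining)
        if unknown:
            raise ValueError(f"{name} depends on unknown pipelines: {sorted(unknown)}")
    stages: List[List[str]] = []
    done: Set[str] = set()
    while remaining:
        # Fan-out: everything whose dependencies have all finished is ready now.
        ready = sorted(name for name, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        stages.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return stages


# Hypothetical pipeline names; "notify" fans in on all three test pipelines.
pipelines = {
    "lint": [],
    "unit-mysql": ["lint"],
    "unit-postgres": ["lint"],
    "acceptance-api": ["lint"],
    "notify": ["unit-mysql", "unit-postgres", "acceptance-api"],
}
for number, stage in enumerate(execution_stages(pipelines), start=1):
    print(number, stage)
# 1 ['lint']
# 2 ['acceptance-api', 'unit-mysql', 'unit-postgres']
# 3 ['notify']
```

Strict stages are a simplification: a real scheduler can start a pipeline as soon as its own dependencies finish, without waiting for unrelated siblings.
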
  6. Drone CI: your friendly neighborhood CI system
    [Diagram: Server and Agent; on the agent, SERVICES and a WORKSPACE shared by STEP1 (git clone) → STEP2 (make) → STEP3 (publish)]
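
The execution model in the diagram (services running alongside, steps running one after another on a shared workspace) can be sketched as a list of Docker CLI calls. This is an illustration under assumptions, not how Drone's agent is implemented (it does not necessarily shell out to `docker`): the `build-<id>` / `workspace-<id>` naming, the `/drone/src` workspace path and every image in the example are assumptions or placeholders, and the plan is only printed, never executed.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    name: str
    image: str
    commands: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


def env_flags(env: Dict[str, str]) -> List[str]:
    """Turn {"KEY": "value"} into ["-e", "KEY=value", ...]."""
    flags: List[str] = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


def plan_pipeline(build_id: str, services: List[Container], steps: List[Container],
                  workspace: str = "/drone/src") -> List[List[str]]:
    """Docker CLI invocations for one pipeline, in order: services start detached,
    steps run one after another, and all containers share one network and one
    workspace volume."""
    network = f"build-{build_id}"
    volume = f"workspace-{build_id}"
    plan: List[List[str]] = [
        ["docker", "network", "create", network],
        ["docker", "volume", "create", volume],
    ]
    for svc in services:
        # The network alias lets steps reach a service by its name, e.g. host "mysql".
        plan.append(["docker", "run", "-d", "--name", f"{network}-{svc.name}",
                     "--network", network, "--network-alias", svc.name,
                     *env_flags(svc.env), svc.image])
    for step in steps:
        # Every step mounts the same volume, so the checkout from one step is
        # visible to the next. A step without commands is treated like a plugin:
        # the image's own entrypoint runs.
        command = ["sh", "-c", " && ".join(step.commands)] if step.commands else []
        plan.append(["docker", "run", "--rm", "--network", network,
                     "-v", f"{volume}:{workspace}", "-w", workspace,
                     *env_flags(step.env), step.image, *command])
    # Explicit teardown, so no containers, volumes or networks are left behind.
    for svc in services:
        plan.append(["docker", "rm", "-f", f"{network}-{svc.name}"])
    plan.append(["docker", "volume", "rm", volume])
    plan.append(["docker", "network", "rm", network])
    return plan


plan = plan_pipeline(
    "42",
    services=[Container("mysql", "mysql:5.7", env={"MYSQL_ALLOW_EMPTY_PASSWORD": "yes"})],
    steps=[
        Container("clone", "alpine:3.9",
                  ["apk add --no-cache git", "git clone https://github.com/owncloud/core.git ."]),
        Container("make", "buildpack-deps:stretch", ["make"]),  # placeholder build image
        Container("publish", "example/publish-plugin"),         # hypothetical plugin image
    ],
)
for cmd in plan:
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

The explicit teardown at the end is the part that a long-lived Jenkins executor tends to miss, and leftovers of this kind are what fill agent disks over time.
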
  7. Drone CI: let’s migrate to Drone
    • Provision drone-server & drone-agents via Ansible
    • Provide Docker containers for infrastructure components (PHP / databases / storages)
    • Gradual migration of “owncloud/core” from Jenkins / Travis to Drone
      – Basic linting and unit testing
      – Gradually migrated integration / acceptance tests and UI tests
    • Expand Drone to app repositories
      – Required a “plugin” to install and configure ownCloud => https://github.com/owncloud-ci
      – Built further custom plugins, e.g. recorder
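
With one container per infrastructure component, the "unit tests with different PHP and database versions" that Jenkins used to run can be expressed as one pipeline per combination. A Python sketch under assumptions: the versions, images, make target and environment variable names are all hypothetical, and the output is plain dicts shaped like Drone pipeline definitions, not ownCloud's actual configuration.

```python
from itertools import product
from typing import Dict, List

# Hypothetical versions and images; the deck does not list what was tested.
PHP_VERSIONS = ["7.0", "7.1", "7.2"]
DATABASES = {"mysql": "mysql:5.7", "postgres": "postgres:10"}


def unit_test_pipelines(php_versions: List[str], databases: Dict[str, str]) -> List[dict]:
    """One pipeline per (PHP version, database) pair: the database runs as a
    service container and the tests run in a PHP container that reaches it
    by its service name."""
    pipelines = []
    for php, (db_name, db_image) in product(php_versions, databases.items()):
        pipelines.append({
            "kind": "pipeline",
            "name": f"unit-php{php}-{db_name}",
            "services": [{"name": db_name, "image": db_image}],
            "steps": [{
                "name": "phpunit",
                "image": f"php:{php}",               # hypothetical test image
                "commands": ["make test-php-unit"],  # hypothetical make target
                "environment": {"DB_TYPE": db_name, "DB_HOST": db_name},  # hypothetical variables
            }],
        })
    return pipelines


for pipeline in unit_test_pipelines(PHP_VERSIONS, DATABASES):
    print(pipeline["name"])
```

Generating the combinations keeps adding a PHP version or a database to a one-line change, instead of another hand-maintained Jenkins job.
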
  8. Drone CI recap: where are we now?
    • Too many systems to maintain: need to maintain 3 systems (Jenkins, Drone, Travis)
    • Secrets management: Drone provided us with API & UI
    • Frequently ran out of disk space: Docker isn’t great at cleaning up after itself
    • Static number of executors / timeouts: no time restriction, but number of parallel jobs limited
    • Containers required a lot of bash magic: container native
    • CI environment not reproducible locally: containers & drone exec
    • No plugin system / limited extensibility: any container can be a plugin
  9. Entering the golden age (final infrastructure)
    • Dropped Travis and Jenkins entirely
    • Scaling Drone agents on demand
    • Number of test suites still increasing
    • Entirely version-controlled and easily manageable infrastructure
      – Terraform
      – Ansible
      – Hetzner Cloud
      – Autoscaler
  10. Entering the golden age: welcome to “Autoscaler”
    • Support for various infrastructure providers (AWS, Azure, Packet, OpenStack, hetznercloud, …)
    • Simple service connected to the Drone server
    • Hooked into the Drone CLI, e.g. “drone server create”
    • Checks the Drone queue in a loop
    • Launches servers based on a cloud-init config
    • Starts the Drone agent via a remote Docker connection (secured by TLS)
    • Unregisters the Drone agent when it is no longer needed
    • Destroys the server instance only after a minimum amount of time
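
The decision made on each pass of that loop can be sketched as a pure function. This is a Python sketch, not the Drone autoscaler's code: the per-agent `capacity`, the `max_servers` cap and the oldest-first retirement order are assumptions; `min_age` models "destroy only after a minimum amount of time".

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Server:
    name: str
    created_at: float  # seconds since the epoch
    running: int       # builds the agent on this server is currently running


def plan_scaling(pending: int, servers: List[Server], capacity: int,
                 min_age: float, now: float, max_servers: int) -> Tuple[int, List[str]]:
    """Return (servers_to_create, names_of_servers_to_destroy) for one loop iteration.

    capacity:    concurrent builds per agent (hypothetical knob)
    min_age:     never destroy a server younger than this many seconds
    max_servers: upper bound on the fleet size (hypothetical knob)
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    free = sum(max(capacity - s.running, 0) for s in servers)
    if pending > free:
        # Scale up: enough new servers for the builds that cannot be placed yet.
        needed = math.ceil((pending - free) / capacity)
        return min(needed, max(max_servers - len(servers), 0)), []
    # Scale down: retire idle, old-enough servers whose slots are not needed
    # for the pending builds, oldest first.
    spare = free - pending
    destroy: List[str] = []
    for server in sorted(servers, key=lambda s: s.created_at):
        if server.running == 0 and now - server.created_at >= min_age and spare >= capacity:
            destroy.append(server.name)
            spare -= capacity
    return 0, destroy


now = 10_000.0
fleet = [
    Server("agent-1", created_at=now - 7200, running=0),
    Server("agent-2", created_at=now - 600, running=0),   # idle, but too young to destroy
    Server("agent-3", created_at=now - 7200, running=2),
]
print(plan_scaling(pending=0, servers=fleet, capacity=2, min_age=3600, now=now, max_servers=10))
# (0, ['agent-1'])
print(plan_scaling(pending=7, servers=fleet, capacity=2, min_age=3600, now=now, max_servers=10))
# (2, [])
```

A service wrapping this would, on every tick, read the queue from the Drone server, then either create servers from a cloud-init template and start agents over the TLS-secured remote Docker connection, or unregister agents before destroying their servers, and sleep until the next tick.
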
  11. Entering the golden age
    [Chart, November to June: cumulated runtime, time to finish, and time to finish (including queue wait); y-axis 0 to 35,000]
  12. Evolution of our CI: what else is missing?
    • Native integration of build scheduling in the YAML configuration
    • Split feedback on pull requests per test suite
    • Archiving build data (we have lots of build logs)
    • Downstream cross-repository checks
    • Upstream cross-repository checks
    • Ability to restart one branch of a fan-in/fan-out scenario
    • Triggers
      – Invoke builds from different sources
      – Split pipelines into different configurations
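
Restarting one branch of a fan-in/fan-out build means re-running the failed pipeline and everything downstream of it, while keeping the results of upstream and sibling pipelines. A Python sketch of that selection, assuming the same `depends_on` style of graph; this is not an existing Drone feature, and the pipeline names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def pipelines_to_restart(depends_on: Dict[str, List[str]], failed: Iterable[str]) -> Set[str]:
    """Failed pipelines plus everything downstream of them (their transitive
    dependents); upstream and sibling pipelines keep their previous results."""
    # Invert the graph: for each pipeline, which pipelines wait on it.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)
    to_restart: Set[str] = set()
    queue = deque(failed)
    while queue:
        name = queue.popleft()
        if name in to_restart:
            continue
        to_restart.add(name)
        queue.extend(dependents.get(name, []))
    return to_restart


pipelines = {
    "lint": [],
    "unit-mysql": ["lint"],
    "unit-postgres": ["lint"],
    "acceptance-api": ["lint"],
    "notify": ["unit-mysql", "unit-postgres", "acceptance-api"],
}
print(sorted(pipelines_to_restart(pipelines, ["unit-postgres"])))
# ['notify', 'unit-postgres']
```
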
  13. Evolution of our CI: lessons learned
    • Just adding ${FAVORITE_CI} as a tool to your company doesn’t guarantee success
      – Promote the tool in your team / get the team on board
      – Gradual adoption helped to gain traction within teams
    • Everything works great until you move to production
      – Load will kill your application at some point… and your CI system too
    • Smaller infrastructure components vs. monolithic infrastructure
      – Docker default network limitations
      – General resource limits, e.g. disks, IOPS, CPU, memory
    • Technical fallacies
      – Everything that can be unreachable will be unreachable (this is also true for your SaaS repository provider)
      – Database compression is really not a good idea for write-heavy loads
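
One common mitigation when a dependency such as the hosted repository provider is briefly unreachable is to retry with capped exponential backoff and jitter. A Python sketch of that general technique; the deck does not say ownCloud used it, and the exception types treated as transient, the attempt count and the delays are assumptions.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
          max_delay: float = 30.0,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> T:
    """Call `call`, retrying transient failures with capped exponential backoff
    and full jitter; re-raise the last error once all attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential delay,
            # so many agents hitting the same outage do not retry in lockstep.
            sleep(rng.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_clone() -> str:
    # Hypothetical stand-in for a call to the repository provider.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("provider unreachable")
    return "cloned"


print(retry(flaky_clone, sleep=lambda seconds: None))  # cloned
```

Injecting `sleep` and `rng` keeps the backoff testable without real waiting.
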