ContainerDays: How we scale DroneCI on demand

We share how we migrated to the container-based DroneCI platform and the lessons we learned while scaling it for an active open source project like ownCloud. Our journey started with a static legacy CI system, which we gradually replaced with an initially static DroneCI infrastructure. Over the course of half a year, we then migrated to a cloud provider in order to scale the CI system dynamically based on build volume. The lessons learned along the way were contributed back to the DroneCI project and resulted in the DroneCI autoscaler, which automatically scales infrastructure resources on common cloud providers.

Patrick Jahns

June 19, 2018
Transcript

  1. Who are these guys?
    • patrick_jahns / patrickjahns
      – Drone contributor
      – Pythonista
      – ❤ Clean Code & DevOps
      – ownCloud QA Lead / Solution Architect
    • tboerger / tboerger
      – Drone contributor / Plugin manager
      – Gopher
      – Automation addicted
      – ownCloud DevOps Engineer
  2. ownCloud – Technology Stack
    • Built on top of a Web Application Stack (PHP, Apache, JavaScript, CSS)
    • Infrastructure Technologies
      – Relational database (MySQL, MariaDB, PostgreSQL, OracleDB)
      – Memory Cache (Memcached, Redis)
      – Storage Providers (FileSystem, NFS, SMB, Swift, S3, OneDrive, Dropbox, etc.)
      – Identity / Authentication Providers (LDAP, Active Directory, Shibboleth / SAML)
      – Other Infrastructure Components (ClamAV, Elasticsearch, Collabora, etc.)
    • Open Source
      – Hosted on GitHub (https://github.com/owncloud)
      – Consists of owncloud/core and ~80 applications on top
  3. Where it all started... (Old Infrastructure)
    • Travis CI
      – DAV (Litmus, CardDAV, CalDAV) tests
      – PHP syntax checks
      – Selenium testing (arrived mid 2017)
    • Jenkins
      – Unit tests with different PHP and database versions
      – Storages like Swift, Ceph, Samba
      – Integration tests
      – Upgrade tests
      – Smashbox tests
  4. Where it all started... (Our pain with Travis)
    • CI environment not reproducible locally, e.g. “works for me ™”
    • Test suites encountered regular timeouts
    • Feedback / results of test runs sometimes arrived only after days
    • No real plugin system, not extensible
    • Travis wasn’t able to provide extended build power on our open-source repositories (only possible on private repositories)
  5. Where it all started... (Our pain with Jenkins)
    • A pain to keep it up to date
      – Plugin updates result in changes to config format
      – Only managed via web UI
    • Secrets are managed via web UI or hacky API scripts
    • Frequently ran out of disk space
    • Wasn’t cleaning up containers properly
    • Containers (Services) required a lot of bash magic
    • Test results took hours to complete, a very slow feedback cycle
    • Static number of executors
  6. Drone CI: Your friendly neighborhood CI system
    • Container native CI/CD platform (everything runs within containers)
    • Easy to install & maintain (docker pull drone/drone)
    • Isolated builds
    • Simple YAML configuration (superset of docker-compose.yml)
    • Integrates with several VCS providers
    • Rich set of official plugins (any container can be a plugin)
    • Execute locally with “drone exec”
    • Open Source (https://github.com/drone)
  7. Drone CI: Your friendly neighborhood CI system
    (Architecture diagram with the labels: Server, Agent, SERVICES, WORKSPACE, STEP1 git clone, STEP2 make, STEP3 publish)
  8. Drone CI: Let’s migrate to Drone
    • Provision drone-server & drone-agents via Ansible
    • Provide Docker containers for infrastructure components (PHP / databases / storages)
    • Gradual migration of “owncloud/core” from Jenkins / Travis to Drone
      – Basic linting and unit testing
      – Gradually migrated integration / acceptance tests and UI tests
    • Expand Drone to app repositories
      – Required “plugin” to install and configure ownCloud => https://github.com/owncloud-ci
      – Built further custom plugins, e.g. recorder
  9. Drone CI: Recap – Where are we at now?
    • Too many systems to maintain => Need to maintain 3 systems: Jenkins, Drone, Travis
    • Secrets management => Drone provided us with API & UI
    • Frequently ran out of disk space => Docker isn’t great at cleaning up after itself
    • Static number of executors / timeouts => No time restriction, but amount of parallel jobs limited
    • Containers required a lot of bash magic => Container native
    • CI environment not reproducible locally => Containers & drone exec
    • No plugin system / limited extensibility => Any container can be a plugin
  10. Entering the golden age (Final infrastructure)
    • Dropped Travis and Jenkins entirely
    • Scaling Drone agents on demand
    • Number of test suites still increasing
    • Entirely version controlled and easily manageable infrastructure
      – Terraform
      – Ansible
      – Hetzner Cloud
      – Autoscaler
  11. Entering the golden age: Welcome to “Autoscaler”
    • Support for AWS, DigitalOcean, Google, Hetzner Cloud
    • Planned to support Azure, Packet.net, Scaleway
    • Simple service connected to Drone server
    • Hooked into Drone CLI, e.g. “drone server create”
    • Checks the Drone queue in a loop
    • Launches servers based on a cloud-init config
    • Starts the Drone agent via a remote Docker connection (secured by TLS)
    • Unregisters the Drone agent if it is not needed anymore
    • Destroys the server instance after a minimal amount of time
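The loop described on the Autoscaler slide (check the queue, launch servers when builds are waiting, remove agents that are no longer needed after a minimum lifetime) can be sketched roughly as below. This is a minimal illustration in Python, not the Autoscaler's actual code or algorithm: the `Server` type, the `plan` function and the parameters `capacity_per_server`, `min_age_minutes` and `max_servers` are hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record for one cloud instance running a Drone agent.
    name: str
    age_minutes: int      # minutes since the instance was created
    running_builds: int   # builds currently executing on this agent


def plan(pending, servers, capacity_per_server=2,
         min_age_minutes=60, max_servers=10):
    """Decide one loop iteration: (servers_to_create, names_to_destroy).

    All defaults are assumptions for illustration only.
    """
    # Slots that could pick up a queued build right now.
    free_slots = sum(max(capacity_per_server - s.running_builds, 0)
                     for s in servers)
    shortfall = max(pending - free_slots, 0)

    if shortfall > 0:
        # Scale up: enough servers for the waiting builds, capped by the pool limit.
        wanted = math.ceil(shortfall / capacity_per_server)
        room = max(max_servers - len(servers), 0)
        return min(wanted, room), []

    # Scale down: remove idle servers that have lived long enough,
    # oldest first, while the remaining free slots still cover the queue.
    to_destroy = []
    for s in sorted(servers, key=lambda srv: -srv.age_minutes):
        idle = s.running_builds == 0
        old_enough = s.age_minutes >= min_age_minutes
        if idle and old_enough and free_slots - capacity_per_server >= pending:
            to_destroy.append(s.name)
            free_slots -= capacity_per_server
    return 0, to_destroy
```

A real service would call such a function in a loop, feeding it the queue length reported by the Drone server and the list of instances from the cloud provider, and would then create or destroy instances accordingly. Keeping the decision a pure function makes it easy to test without touching any cloud API.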
  12. Entering the golden age
    (Chart, November to mid-June, vertical axis from 0 to 35000. Series: cumulated runtime, time to finish, time to finish (including queue wait))
  13. Evolution of our CI: What else is missing?
    • Native integration of build scheduling
    • Split feedback on pull requests per test suite
    • Archiving build data: we got lots of build logs
    • Downstream cross repository checks
    • Upstream cross repository checks
    • Native Windows agent support
    • Triggers
      – Invoke builds from different sources
      – Split pipelines into different configs
  14. Evolution of our CI: Lessons Learned
    • Just adding ${FAVORITE_CI} as a tool to your company doesn’t guarantee success
      – Promote the tool in your team / get the team on board
      – Gradual adoption helped to gain traction within teams
    • Everything works great, until you move to production
      – Load can kill your system... and also your CI system...
    • Smaller infrastructure components vs. monolithic infrastructure
      – Docker default network limitations
      – General resource limits, e.g. disks, IOPS, CPU, memory