
How to mock a mocking bird - testing dynamic infrastructure

A presentation on the major causes of outages, how to avoid them, and how to mitigate the unavoidable ones through testing and better code quality.

Ranjib Dey

April 18, 2014

Transcript

  1. About the talk
     • Operations specific to distributed systems
     • Types and sources of failures
     • Resiliency patterns
     • Strategies for introducing testing
  2. Common causes of outages
     i. Code changes
     ii. Deployments
     iii. Dependency issues – e.g. GitHub is down
     iv. External factors
         i. Traffic spikes
         ii. Inconsistent I/O
  3. Amplifiers of outages
     • Topology
       – Zookeeper over WAN?
       – MySQL synchronous replication in a network with high latency
     • Type of service
       – Persistence layers are not latency tolerant
       – Web services and deployments
  4. Amplifiers of outages
     • Coupling
       – DB migrations and deployment
     • Code quality
       – Inefficient algorithms (sort, object allocation, mutability; see the sketch below)
       – Inefficient SQL queries
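
     To make the object-allocation and mutability point concrete, here is a small
     illustrative Ruby benchmark (not from the deck): appending with += allocates a
     new, ever-larger string on every iteration, while mutating one buffer with <<
     reuses it.

       require 'benchmark'

       lines = Array.new(50_000) { 'some log line' }

       Benchmark.bm(10) do |bm|
         # Allocates a brand-new string on every iteration
         bm.report('+= (slow)') do
           out = ''
           lines.each { |l| out += l }
         end

         # Mutates a single buffer in place; far fewer allocations
         bm.report('<< (fast)') do
           out = ''
           lines.each { |l| out << l }
         end
       end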
  5. Reference topology
     Too simplistic:
     • Not cross-region
     • Third-party dependencies
     • Operations services
  6. Fault tolerant topology
     • Not designed, but emerged
     • Internet, genes, social networks
     • Evolved in response to scale and failures
  7. Testing : Stage 1
     Assert the happy path scenario (most frequently used) works:

       Feature: zookeeper cluster provisioning
         Scenario: Bootstrapping a zookeeper cluster
           Given I have a chef server with all our cookbooks
           When I run `knife provision zk 3`
           Then I should have "3" nodes with "zk" role
  8. Testing : Stage 1
     Assert absence of known bugs (regressions):

       Feature: zookeeper cluster provisioning
         Scenario: Bootstrapping a zookeeper cluster
           Given I have a chef server with all our cookbooks
           When I run `knife provision zk 3`
           Then I should have "3" nodes with "zk" role
           And all zk nodes should have zk.cnf populated
  9. Testing : Stage 1
     I. Tools: Cucumber, aruba, rspec (step-definition sketch below)
     II. Most valuable with broken or non-deterministic tools
     III. Time consuming
     IV. Steep learning curve
     V. Limited documentation
     VI. Example works: @lordcope, @sethvargo
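
     As a rough illustration of how the custom step above could be wired up with
     aruba and rspec (this file is not from the deck; the shape of knife's JSON
     output is an assumption):

       # features/step_definitions/provision_steps.rb — illustrative sketch.
       # aruba already supplies the built-in step for: When I run `...`
       require 'aruba/cucumber'
       require 'json'

       Then(/^I should have "(\d+)" nodes with "(.*)" role$/) do |count, role|
         cmd = %(knife search node "role:#{role}" -F json)
         run_simple(cmd)
         # Assumption: knife's JSON output carries a top-level "results" count
         data = JSON.parse(output_from(cmd)[/\{.*\}/m])
         expect(data['results']).to eq(count.to_i)
       end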
  10. Testing : Stage 2

       bash 'extract_sumologic' do
         user 'root'
         cwd node[:sumologic][:rootdir]
         code <<-EOH
           [ -x collectorbin ] && collectorbin stop
           tar zxf #{node[:sumologic][:collector][:tarball]}
           chmod 755 sumocollector/collector
           cp sumocollector/tanuki/wrapperdir/wrapper sumocollector
         EOH
         if !File.exists? node[:sumologic][:rootdir]
           action :run
         else
           action :nothing
         end
       end
  11. Testing : Stage 2

       execute 'extract_sumologic' do
         user 'root'
         cwd node[:sumologic][:rootdir]
         command 'cp sumocollector/tanuki/wrapperdir wrapper'
         only_if { File.exists? node[:sumologic][:rootdir] }
       end

       it 'extracts sumologic' do
         stub_command("test -f #{node[:sumologic][:rootdir]}")
         expect(runner).to run_execute('extract_sumologic').with(
           user: 'root',
           cwd: node[:sumologic][:rootdir],
           command: 'cp sumocollector/tanuki/wrapperdir wrapper'
         )
       end
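
     Filled out into a complete ChefSpec file, the fragment above might look like
     this sketch (cookbook name, recipe name, and attribute values are assumptions;
     ChefSpec::Runner was the runner class current in 2014):

       # spec/unit/recipes/default_spec.rb — illustrative only
       require 'chefspec'

       describe 'sumologic::default' do          # assumed cookbook/recipe name
         let(:runner) do
           ChefSpec::Runner.new do |node|
             node.set[:sumologic][:rootdir] = '/opt/sumologic'
           end.converge(described_recipe)
         end

         before do
           # Intercepts shell guards so no real command is executed
           stub_command('test -f /opt/sumologic').and_return(true)
         end

         it 'extracts the collector' do
           expect(runner).to run_execute('extract_sumologic').with(
             user: 'root',
             cwd:  '/opt/sumologic'
           )
         end
       end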
  12. Testing : Stage 2

       include_recipe 'foo'

       code
       some more code
       even more code

       package 'foo' do
         action :install
       end

       template 'baz' do
         action :create
       end

       service 'bar' do
         action [:start, :enable]
       end
  13. Testing : Stage 2

       include_recipe 'foo'
       extend Foo
       value = process(node)

       package 'foo' do
         action :install
       end

       template 'baz' do
         action :create
       end

       service 'bar' do
         action [:start, :enable]
       end

       module Foo
         def process(node)
           code
           some more code
           even more code
         end
       end
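
     The payoff of this extraction is that the logic can now be unit tested with
     plain RSpec, with no Chef run at all. Since the slide's process body is
     pseudocode, the sketch below assumes, purely for illustration, that it
     computes the collector tarball path:

       # Hypothetical concrete version of the slide's module
       module Foo
         def process(node)
           ::File.join(node[:sumologic][:rootdir],
                       node[:sumologic][:collector][:tarball])
         end
       end

       # Plain RSpec: no Chef server, no converge
       require 'rspec'

       describe Foo do
         let(:helper) { Object.new.extend(Foo) }
         let(:node) do
           { sumologic: { rootdir: '/opt/sumologic',
                          collector: { tarball: 'collector.tar.gz' } } }
         end

         it 'builds the tarball path from node attributes' do
           expect(helper.process(node)).to eq('/opt/sumologic/collector.tar.gz')
         end
       end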
  14. Testing : Stage 2
      • Enforce better design
      • Consolidate repeats
      • Use appropriate language / stdlib alternatives
  15. Testing : Stage 2

       search(:node, 'roles:cassandra').partition do |other|
         node.ec2.placement_availability_zone ==
           other.ec2.placement_availability_zone
       end
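
     Search calls like this are what couple a recipe to a live Chef server;
     ChefSpec can stub them out. A sketch (the recipe name and node data are
     assumptions):

       require 'chefspec'

       describe 'cassandra::default' do   # hypothetical recipe holding the search
         let(:peer_a) do
           stub_node('cass1') { |n| n.automatic[:ec2][:placement_availability_zone] = 'us-east-1a' }
         end
         let(:peer_b) do
           stub_node('cass2') { |n| n.automatic[:ec2][:placement_availability_zone] = 'us-east-1b' }
         end
         let(:runner) { ChefSpec::Runner.new.converge(described_recipe) }

         before do
           # Return canned nodes instead of querying a real server
           stub_search(:node, 'roles:cassandra').and_return([peer_a, peer_b])
         end

         it 'converges with the search stubbed' do
           expect { runner }.to_not raise_error
         end
       end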
  16. Testing : Stage 2
      • Enforce better design
      • Consolidate repeats
      • Use appropriate language / stdlib alternatives
      • Use appropriate Chef idioms
  17. Testing : Stage 2

       # Compile-time conditional:
       include_recipe 'foo'

       if node[:foo]
         package 'foo' do
           action :install
         end
         template 'baz' do
           action :create
         end
         service 'bar' do
           action :start
         end
       end

       # Converge-time guards:
       include_recipe 'foo'

       package 'foo' do
         action :install
         only_if { node[:foo] }
       end

       template 'baz' do
         action :create
         only_if { node[:foo] }
       end

       service 'bar' do
         action :start
         only_if { node[:foo] }
       end
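
     With the guard version, the behaviour can be asserted from ChefSpec by
     flipping the attribute. A sketch (the cookbook name is assumed; ChefSpec
     evaluates block guards during its converge):

       require 'chefspec'

       describe 'foo::default' do   # hypothetical recipe with the guarded resources
         context 'when node[:foo] is set' do
           let(:runner) do
             ChefSpec::Runner.new { |node| node.set[:foo] = true }.converge(described_recipe)
           end

           it { expect(runner).to start_service('bar') }
         end

         context 'when node[:foo] is not set' do
           let(:runner) { ChefSpec::Runner.new.converge(described_recipe) }

           it { expect(runner).to_not start_service('bar') }
         end
       end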
  18. Testing : Stage 2
      • Lint & unit testing
        – Typos, syntax errors, logic – knife, ChefSpec, rubocop, foodcritic
          (Rakefile sketch below)
      • Fast
      • Easier to adopt
      • Invaluable for long-term maintainability
      • Shared conventions
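
     One convenient way to wire these tools into a single entry point is a
     Rakefile; a sketch (the task layout is a choice, not from the deck):

       # Rakefile — run every lint/unit stage with a bare `rake`
       require 'rspec/core/rake_task'

       desc 'Ruby syntax check for all cookbook files'
       task :syntax do
         Dir['**/*.rb'].each { |f| sh "ruby -c #{f}" }
       end

       desc 'Style and correctness lints'
       task :lint do
         sh 'rubocop .'
         sh 'foodcritic .'
       end

       RSpec::Core::RakeTask.new(:unit) do |t|
         t.pattern = 'spec/**/*_spec.rb'
       end

       task default: [:syntax, :lint, :unit]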
  19. Testing : Stage 3
      • Deployments
        – Dark launching (sketch below)
        – Canary releases
        – Blue-green deployment
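
     The slide names dark launching without showing code; as a purely hypothetical
     sketch, the idea is to run the new code path on live traffic, log any
     divergence, but always serve the old path's answer:

       # Hypothetical dark-launch wrapper; every class and flag name is illustrative
       def price_for(cart)
         old_result = LegacyPricing.price(cart)

         if Feature.enabled?(:new_pricing)        # hypothetical flag store
           begin
             new_result = NewPricing.price(cart)  # exercised on real traffic
             log_mismatch(cart, old_result, new_result) if new_result != old_result
           rescue => e
             log_dark_launch_error(e)             # the new path must never break users
           end
         end

         old_result   # users always get the old, trusted answer
       end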
  20. Testing : Stage 4
      • Dependency
        – Version compatibility
          • ChefSpec
          • ServerSpec (example below)
        – Hosted services
          • Degraded mode
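
     A minimal serverspec file for a converged zookeeper node might look like the
     sketch below (written against current serverspec conventions; the service and
     client port 2181 are standard zookeeper, but the spec itself is not from the
     deck):

       # spec/zookeeper_spec.rb — asserts the real, converged machine
       require 'serverspec'

       set :backend, :exec   # run the checks on the local machine

       describe package('zookeeper') do
         it { should be_installed }
       end

       describe service('zookeeper') do
         it { should be_enabled }
         it { should be_running }
       end

       describe port(2181) do
         it { should be_listening }
       end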
  21. Testing : Stage 5
      • External factors
        – Traffic patterns
          • Gatling, JMeter
        – Network isolation, latency, jitter etc.
          • iptables, tc (helper sketch below)
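
     A throwaway Ruby helper around `tc netem` can inject latency and jitter for
     the duration of a test block; a sketch (requires root, and the helper name is
     made up):

       # Hypothetical helper: add latency/jitter on an interface while a block runs
       def with_network_delay(interface: 'eth0', delay_ms: 200, jitter_ms: 50)
         system("tc qdisc add dev #{interface} root netem " \
                "delay #{delay_ms}ms #{jitter_ms}ms") or raise 'tc failed'
         yield
       ensure
         system("tc qdisc del dev #{interface} root netem")
       end

       # e.g. run the integration suite under 200ms +/- 50ms latency
       with_network_delay(delay_ms: 200, jitter_ms: 50) do
         system('rspec spec/integration')
       end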
  22. Testing : Stage 5
      • External factors
        – Resource starvation in shared environments
          • ulimit, cgroup for memory (sketch below)
          • nice, cgroup for CPU
          • cgroup blkio for I/O
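
     As a sketch of the cgroup approach to memory starvation (cgroup v1 paths,
     root required; the helper and commands are illustrative):

       # Hypothetical cgroup-v1 helper: run a block's process tree under a memory cap
       require 'fileutils'

       def with_memory_cap(name, limit_bytes)
         dir = "/sys/fs/cgroup/memory/#{name}"
         FileUtils.mkdir_p(dir)
         File.write("#{dir}/memory.limit_in_bytes", limit_bytes.to_s)
         File.write("#{dir}/tasks", Process.pid.to_s)   # current process + children
         yield
       end

       # e.g. verify a service degrades gracefully with only 256 MB available
       with_memory_cap('starvation-test', 256 * 1024 * 1024) do
         system('rspec spec/degraded')
       end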
  23. Testing : Stage 6
      • Combining failures
        – Whole environment provisioning
          • Containers, VMs etc.
          • chef-metal
        – Feedback-driven tests
          • Benchmark across services
          • Measure & enforce minimal system resources
          • Alert on rate of change
  24. Search, ssh & execute

       def knife(klass, *name_args)
         $stdout.sync = true
         klass.load_deps
         plugin = klass.new
         yield plugin.config if Kernel.block_given?
         plugin.name_args = name_args
         plugin.run
       end

       def knife_ssh(search, command, password, concurrency)
         knife Chef::Knife::Ssh, search, command do |config|
           config[:ssh_password] = password
           config[:host_key_verify] = false
           config[:concurrency] = concurrency
         end
       end
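
     Driving the helper, e.g. to restart zookeeper across the cluster (the query,
     command, and concurrency are illustrative):

       # Restart zookeeper on every node with the zk role, 5 nodes at a time
       knife_ssh('roles:zk', 'sudo service zookeeper restart', ENV['SSH_PASSWORD'], 5)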
  25. Summary
      • Accept failures
        – Make them inexpensive and isolated
      • Design matters
        – Read
        – Incremental changes
      • Communication influences design
        – Avoid knowledge silos
        – Adopt cross-team reviews