Tools on Tour - A look at what it takes to test the firehose

Tools on Tour A look at what it takes to
test the firehose @mheap from @datasift 1 Thursday, 5 September 13

A little about me: I’m Michael (@mheap). I work at @datasift. I primarily write PHP, but sometimes I end up writing NodeJS.
Let’s start with a bit about me. I’m @mheap on Twitter and I work for a company called DataSift. We process and filter huge amounts of data in real time, and we’re most commonly known as a company that has access to the Twitter firehose. We store 0.5 TB of data per day, and process much more. I write PHP and Node. Today, we’re going to talk about how we test THIS:

The architecture
This is our current architecture. It looks like quite a lot to test, but it’s not too bad when you break it down into smaller pieces. So, let’s start there.

Let’s start with what’s familiar: Unit Tests
I hope that unit tests are something that everyone’s at least heard of. They’re useful for making sure that, given a certain input, your code will produce a certain output. They usually cover a few dozen lines of code *at the most*.

The architecture
We use unit tests throughout the system for various things, but I have the most experience writing them for the delivery pipeline. We write tests to prove things like “we can parse multiple data formats simultaneously” and “if a subscription is missing a stream hash, validation will fail”. Unfortunately, we have a lot of legacy code that just isn’t testable with unit tests, so we need to find a way to test larger pieces of functionality.
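
As a rough sketch of the second example, a test like that might look something like the following. Our real tests are PHP; this uses Python's unittest purely for illustration, and `validate_subscription` and the `stream_hash` field name are hypothetical stand-ins, not DataSift code.

```python
import unittest


def validate_subscription(subscription: dict) -> list:
    """Return a list of validation errors (hypothetical validator)."""
    errors = []
    if not subscription.get("stream_hash"):
        errors.append("stream_hash is required")
    return errors


class SubscriptionValidationTest(unittest.TestCase):
    def test_missing_stream_hash_fails_validation(self):
        errors = validate_subscription({"name": "my-subscription"})
        self.assertIn("stream_hash is required", errors)

    def test_subscription_with_stream_hash_passes(self):
        errors = validate_subscription({"name": "x", "stream_hash": "abc123"})
        self.assertEqual(errors, [])
```

One input, one expected output, a handful of lines under test: that is about as much as a unit test should cover.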

So how do we test it all? Acceptance Tests
If we can’t test individual components, the next best thing is to prove that it works end to end. We use Behat for this, as it allows our product team to spec features that we can run against features in development. If they pass, the feature is done. It’s nice to have a clear definition of done.

Back to the diagram: acceptance tests test *every* component. We try to do things like “create a destination for us to push data to” to prove that it works for customers. If it works for customers, they can give us money, and that’s the main concern, right? These tests are very brittle, however. You need to program defensively and use lookup tables for things that could change.

The bit in the middle? Integration Tests
As the platform’s so big, we needed something that fits in between unit and acceptance tests: something that lets us test one piece of software in its entirety without testing all of its dependencies as well.

The architecture
Integration testing gets you the best of both worlds. You can prove that a subset of functionality works without needing to test every function call in isolation.

Tools on Tour: Storyplayer
Enter Storyplayer! This is the bit that fits in the middle.

Tools on Tour: He’s @stuherbert. He also works @datasift.
It’s the brainchild of this fella, who unfortunately can’t be with us tonight, which is why you’re stuck with me.

Storyplayer: Acceptance Tests
Storyplayer *can* be your acceptance test layer, but up until yesterday I would have told you that it’s not really where its strengths are. That changed with the addition of SauceLabs support. If you don’t know SauceLabs, they do browser testing in the cloud. Highly recommended.

Storyplayer: Eight Phases
Storyplayer has eight phases. I’m going to quickly run through them, then give you a demo.

Storyplayer: [1] Test Environment Setup [2] Test Setup
The first two are TestEnvironmentSetup and TestSetup. These are used for creating your test conditions.

Storyplayer: [3] Pre-test Prediction [4] Pre-test Inspection
Pre-test prediction is for things like “this story *might* fail but that’s ok”. Pre-test inspection is for getting values out of your test environment before you start.

Storyplayer: [5] Action [6] Post-test Inspection
The action phase is where your test run actually happens. Post-test inspection is where you assert that things happened as you expected.

Storyplayer: [7] Test Teardown [8] Test Environment Teardown
Then you undo what you did in the setup: things like killing processes, stopping VMs and closing web browsers.

Storyplayer: [1] Test Environment Setup [2] Test Setup [3] Pre-test Prediction [4] Pre-test Inspection [5] Action [6] Post-test Inspection [7] Test Teardown [8] Test Environment Teardown
So to recap, we have eight phases.

Of those, only two are mandatory.

Actually, only *one* is mandatory, as the post-test inspection might be done by a person.
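
To make the "eight phases, one mandatory" idea concrete, here is a minimal sketch of a phase runner. This is not Storyplayer's API (Storyplayer is PHP); the phase names, the dict-of-callables story format and `run_story` are all assumptions made up for illustration.

```python
# The eight phases, in the order they run (names are illustrative only).
PHASES = [
    "test_environment_setup",
    "test_setup",
    "pre_test_prediction",
    "pre_test_inspection",
    "action",
    "post_test_inspection",
    "test_teardown",
    "test_environment_teardown",
]


def run_story(story: dict) -> list:
    """Run whichever phases the story defines, in order; return the names run."""
    if "action" not in story:
        raise ValueError("a story must define an action phase")
    ran = []
    for phase in PHASES:
        handler = story.get(phase)
        if handler is None:
            continue  # optional phase that this story does not define
        handler()
        ran.append(phase)
    return ran


log = []
story = {
    "test_setup": lambda: log.append("start service"),
    "action": lambda: log.append("send data"),
}
print(run_story(story))  # ['test_setup', 'action']
```

A story with nothing but an action still runs; in that case a person does the post-test inspection by looking at what happened.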

Storyplayer
* fromXXXX - Get state
* expectsXXXX - Test state
* usingXXXX - Change state
Storyplayer has three kinds of action: “From” gets the state of something, e.g. an IP address from a VM. “Expects” makes sure something is as we expect, e.g. a process is running. “Using” is what actually runs commands during our test.
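
The split between the three kinds of action can be sketched as follows. Again, this is not Storyplayer code: the class names, method names and the in-memory process table are hypothetical, chosen only to show the read / assert / change separation.

```python
class FromProcessTable:
    """'From' actions read state and never change it."""

    def __init__(self, table: dict):
        self._table = table

    def get_is_running(self, name: str) -> bool:
        return self._table.get(name, False)


class ExpectsProcessTable:
    """'Expects' actions fail loudly when state is not what we expect."""

    def __init__(self, table: dict):
        self._table = table

    def is_running(self, name: str) -> None:
        if not self._table.get(name, False):
            raise AssertionError(f"expected process '{name}' to be running")


class UsingProcessTable:
    """'Using' actions are the only ones allowed to change state."""

    def __init__(self, table: dict):
        self._table = table

    def start(self, name: str) -> None:
        self._table[name] = True

    def stop(self, name: str) -> None:
        self._table[name] = False


processes = {}  # stand-in for a real host's process list
UsingProcessTable(processes).start("filtering")
ExpectsProcessTable(processes).is_running("filtering")  # passes silently
print(FromProcessTable(processes).get_is_running("filtering"))  # True
```

Keeping the three apart means you can tell from the prefix alone whether a line in a story can alter the environment.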

Demo Here
Show acceptance test: <quickly swap out to live demo of Behat’s ls example> and a browser demo.

The bit in the middle? Integration Tests
I’m going to focus on its talents as an integration test tool.

The architecture: Public Data ingestion, Private Data ingestion, Supporting services, Filtering, Delivery, Historics. For the test: Doppelgangerd, Filtering, Hornet.
So let’s break the architecture down into groups. This week I’ve been working on our filtering component, and needed to control what went into the service. To do this, we use a tool called Hornet that generates data in our specific format. We also replace our supporting services with something called Doppelgangerd, which serves predefined responses when asked for user info and billing status. As we control everything else, Filtering becomes our unit under test.
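
The idea behind replacing a supporting service with predefined responses can be sketched like this. It is an in-process illustration only: `CannedResponder`, `filter_allowed` and the `billing_status:<id>` request format are invented for the example and say nothing about how Doppelgangerd itself is built or what protocol it speaks.

```python
class CannedResponder:
    """Stand-in for a supporting service: answers from a fixed lookup table."""

    def __init__(self, responses: dict, default=None):
        self._responses = dict(responses)
        self._default = default
        self.requests = []  # every request seen, kept for later inspection

    def handle(self, request: str):
        self.requests.append(request)
        return self._responses.get(request, self._default)


def filter_allowed(user_id: str, user_service) -> bool:
    """Hypothetical unit under test: only filter for accounts in good standing."""
    reply = user_service.handle(f"billing_status:{user_id}")
    return reply == "active"


stub = CannedResponder({
    "billing_status:42": "active",
    "billing_status:7": "suspended",
})
print(filter_allowed("42", stub))  # True
print(filter_allowed("7", stub))   # False
```

Because every answer is fixed up front, any change in behaviour between runs has to come from the unit under test.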

Being able to control your platform like this isn’t really possible when you have multiple people all trying to use the same infrastructure.

Personal environments
For our test environments, we use a combination of VirtualBox, Vagrant and Ansible.

PHPNW13: Go there, learn things
I know I’m probably preaching to the choir, but come to PHPNW13 to learn more. I’ll be there talking about Vagrant + Ansible, so I’ll skip over this bit. If you can’t make it, come and talk to me later if you’re interested.

Storyplayer: [1] Test Environment Setup [2] Test Setup
Your test environment is the set of things that we need to set up every time. These are normally handled by a StoryTemplate. A StoryTemplate is a set of functions that you can apply to your stories to save on setup code. It’s kind of like inheritance, but a story can be based on multiple templates. Your test setup is the things you need that are specific to this story.
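
Here is a small sketch of the "like inheritance, but from several sources" idea. It is only an illustration of composing shared setup code: the `Story` class, `based_on` and the two template functions are hypothetical and are not how Storyplayer defines a StoryTemplate.

```python
class Story:
    """A story collects setup and teardown steps from any number of templates."""

    def __init__(self, name: str):
        self.name = name
        self.setup_steps = []
        self.teardown_steps = []

    def based_on(self, *templates):
        for template in templates:
            template(self)
        return self


def with_virtual_machine(story: Story) -> None:
    story.setup_steps.append("start VM")
    story.teardown_steps.insert(0, "stop VM")  # teardown runs in reverse order


def with_stub_services(story: Story) -> None:
    story.setup_steps.append("start stub services")
    story.teardown_steps.insert(0, "stop stub services")


story = Story("filtering").based_on(with_virtual_machine, with_stub_services)
print(story.setup_steps)     # ['start VM', 'start stub services']
print(story.teardown_steps)  # ['stop stub services', 'stop VM']
```

Each story then only has to spell out the setup that is specific to it.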

Storyplayer: [1] Virtual Machines [2] Doppelgangerd + Hornet
When I was testing Filtering, this involved spinning up a VM and deploying filtering to it, then spinning up Doppelgangerd and Hornet to provide additional data.

Storyplayer: [3] Pre-test Prediction [4] Pre-test Inspection

Storyplayer: [3] Tests shouldn’t usually fail [4] Extract numbers like RAM usage
Honestly, I’ve never had to mark a test as being able to fail, but it’s useful to have the option. Extracting existing numbers is useful, e.g. reading the ACL stats before the test so you can see how many items were processed.
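
Reading a number before the test and comparing it afterwards can be sketched like this. The helper name and the `acl_items` counter are assumptions for the example; in a real run the counter would be read from the service rather than from a dict.

```python
def items_processed_during(read_counter, action) -> int:
    """Return how much a counter moved while `action` ran."""
    before = read_counter()  # pre-test inspection: note the starting value
    action()                 # action: do the thing under test
    after = read_counter()   # post-test inspection: read the value again
    return after - before


stats = {"acl_items": 120}  # stand-in for stats exposed by a running service


def send_three_items() -> None:
    stats["acl_items"] += 3


print(items_processed_during(lambda: stats["acl_items"], send_three_items))  # 3
```

Comparing the difference rather than the absolute value means the test does not care what state the environment was in when it started.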

Storyplayer: [5] Action [6] Post-test Inspection

Storyplayer: [5] Do your things [6] Is the result correct?
DO ALL THE THINGS, then make sure the results you got back are correct. Is the data structure a string? Does it contain what you were expecting?

Storyplayer: [7] Test Teardown [8] Test Environment Teardown

Storyplayer: [7] Stop Doppelgangerd + Hornet [8] Stop VM
Kill everything that we started, ready to try again.

Repeat Runs
100% automated tests are the best. Run them over and over. Store pass/fail results and compare over time. Look at trends: actual usage of RAM, throughput, latency. We’re looking to hire someone to write our data collation + visualisation system. Let me know if you’re interested.
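
As a sketch of what "store results and compare over time" could mean in practice, the snippet below summarises a list of stored runs. The record layout (`passed`, `latency_ms`) and the `summarise` helper are assumptions for illustration, and the numbers are invented sample data, not measurements.

```python
from statistics import mean


def summarise(runs: list, recent: int = 3) -> dict:
    """Pass rate over all runs, plus how mean latency moved in the latest runs."""
    if len(runs) <= recent:
        raise ValueError("need more runs than the size of the recent window")
    passed = sum(1 for run in runs if run["passed"])
    earlier, latest = runs[:-recent], runs[-recent:]
    change = mean(r["latency_ms"] for r in latest) - mean(
        r["latency_ms"] for r in earlier
    )
    return {"pass_rate": passed / len(runs), "latency_change_ms": change}


# Invented sample data: six stored runs of the same story.
runs = [
    {"passed": True, "latency_ms": 100},
    {"passed": True, "latency_ms": 100},
    {"passed": False, "latency_ms": 100},
    {"passed": True, "latency_ms": 130},
    {"passed": True, "latency_ms": 130},
    {"passed": True, "latency_ms": 130},
]
print(summarise(runs))
```

A story can keep passing while a number like latency drifts, which is why the trend is worth storing alongside the pass/fail flag.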

Human Factor: Automated Tests Sometimes Mask Problems
Sometimes, though, automated tests don’t tell the whole story. You need people to look at the numbers and analyse them. For this we use a combination of Graphite and SavageD (another DataSift tool). Look at the numbers in and out. Case in point: Ogre (a message queueing system) and the 0.5s sleep (for testing what happens when it backs up). Easy for a human to miss, but by graphing things it was obvious.

Sometimes you just have to be invasive
Get right in there and put things between services to work out what’s going on.

Last week, I was a Thieving Troll
All our services talk to each other via ZeroMQ. By pretending to be what each service thought it was talking to, I could intercept all traffic and log it to disk before passing it on to the service that was expecting it. All it took was a little bit of configuration editing.
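
The intercept, log and pass-on trick boils down to a relay. The sketch below uses plain stream sockets from Python's standard library instead of ZeroMQ, so it shows the shape of the idea rather than what I actually ran; `relay`, `demo` and the log file name are made up for the example.

```python
import os
import socket
import tempfile


def relay(upstream: socket.socket, downstream: socket.socket, log_path: str) -> int:
    """Copy bytes from upstream to downstream, appending each chunk to a log."""
    total = 0
    with open(log_path, "ab") as log:
        while True:
            chunk = upstream.recv(4096)
            if not chunk:  # upstream closed the connection
                break
            log.write(chunk)           # keep a copy on disk
            downstream.sendall(chunk)  # then pass it on untouched
            total += len(chunk)
    return total


def demo() -> tuple:
    """Push one message through the relay using in-process socket pairs."""
    sender, relay_in = socket.socketpair()
    relay_out, receiver = socket.socketpair()
    log_path = os.path.join(tempfile.mkdtemp(), "tap.log")
    sender.sendall(b'{"id": 1}\n')
    sender.close()  # signals end of stream to the relay
    relay(relay_in, relay_out, log_path)
    relay_out.close()
    delivered = receiver.recv(4096)
    with open(log_path, "rb") as log:
        logged = log.read()
    relay_in.close()
    receiver.close()
    return delivered, logged
```

The receiving side sees exactly what it would have seen anyway, and you end up with a file on disk showing what was really sent.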

When you’re this close, run simple tests
If you’re on a machine talking to sockets, run the simplest test you can. We use zmqpp: if there’s data coming out, it pipes it to stdout. Nothing fancy, but it also means nothing can go wrong. Then you know if the problem is your test or the upstream service. zmqpp doesn’t lie.

And finally: Monitoring Production
Graphite is our best friend. Once you know how your system *should* be running, it’s very easy to work out when it isn’t running correctly. We currently increment about 64,000 metrics per minute. We also introduced audit counters, tracking how much data flows through each service and making sure that the numbers add up. Finally, we want to taint interactions and watch them flow through the system (we’re working on this).
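
"Making sure that the numbers add up" can be sketched as a check on every hop between services. The counter layout, service names and figures below are invented for illustration; they are not our real counters.

```python
def find_imbalances(counters: dict, pipeline: list) -> list:
    """Compare what each service sent with what the next one received."""
    problems = []
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        sent = counters[upstream]["out"]
        received = counters[downstream]["in"]
        if sent != received:
            problems.append((upstream, downstream, sent - received))
    return problems


# Invented sample counters for one time window.
counters = {
    "ingestion": {"in": 1000, "out": 1000},
    "filtering": {"in": 1000, "out": 250},  # dropping data is filtering's job
    "delivery": {"in": 240, "out": 240},
}
print(find_imbalances(counters, ["ingestion", "filtering", "delivery"]))
# [('filtering', 'delivery', 10)]
```

The check compares counts across each link rather than within a service, because a service such as filtering is allowed to emit less than it takes in.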

Roundup: Multiple Strategies
There are multiple strategies for testing. Personally, I really like the integration test strategy, and Storyplayer’s a great tool to accomplish it. Development’s progressing rapidly on it (we’re tagging 1.4 soon!) and the docs are coming along too. It’s tested on OS X and Linux, but should be fairly easy to port to Windows as it’s mostly PHP code.

Finally: Why build tools?
There are plenty of tools out there for testing already, so why do we need to build our own? Off-the-shelf tools are not designed for a pipeline architecture, can’t generate firehose scale (10x before launch), and don’t support complex environments (one environment takes 24 VMs and 8 physical boxes). We build what we need when we need it. * no-one has ever built a billion-dollar software company without investing in their own

DataSift: We’re Hiring Devs, Ops, QA, Sales, Product + more
We’re hiring for pretty much any role you can think of. If you’re interested, catch me later and we can chat about it.

Any Questions?
