
Michael Heap
September 03, 2013

Tools on Tour - A look at what it takes to test the firehose

Testing a platform that has a lot of moving parts isn't an easy thing to do. Although existing tools kind of help, they're quite difficult to retrofit into our workflow. That's why we've ended up building a lot of our own tools. This talk will take a look at the DataSift platform architecture and talk through how we test the system as a whole as well as each individual component. As well as that, we'll cover how to control your environment so that tests are deterministic and we'll take a look at how you can deduce when something goes wrong even if you don't have tests ready to cover that piece of functionality.


Transcript

  1. Tools on Tour: A look at what it takes to test the firehose. @mheap from @datasift.
  2. A little about me. I’m Michael (@mheap). I work at @datasift. I primarily write PHP, but sometimes I end up writing NodeJS. Let’s start with a bit about me. I’m @mheap on Twitter and I work for a company called DataSift. We process and filter huge amounts of data in real time, and we’re most commonly known as a company that has access to the Twitter firehose. We store 0.5 TB of data per day, and process much more. I write PHP and Node. Today, we’re going to talk about how we test THIS:
  3. The architecture. This is our current architecture. It looks like quite a lot to test, but it’s not too bad when you break it down into smaller pieces. So, let’s start there.
  5. Let’s start with what’s familiar: Unit Tests. I hope that unit tests are something that everyone’s at least heard of. They’re useful for making sure that given a certain input, your code will produce a certain output. They usually cover a few dozen lines of code *at the most*.
  6. The architecture. We use unit tests throughout the system for various different things, but I have most experience writing them for the delivery pipeline. We write tests to prove things like “we can parse multiple data formats simultaneously” and “if a subscription is missing a stream hash, validation will fail”. Unfortunately we have a lot of legacy code that just isn’t testable with unit tests, so we need to find a way to test larger pieces of functionality.
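To give a flavour of the “missing stream hash” rule, a unit test for it might look something like this. The `Subscription` class and its fields are invented for illustration, not DataSift’s real code:

```php
<?php
// Hypothetical sketch of a delivery-pipeline style unit check.
// The Subscription class is invented for illustration only.

class Subscription
{
    private $streamHash;

    public function __construct($streamHash)
    {
        $this->streamHash = $streamHash;
    }

    // Validation fails if the subscription is missing a stream hash
    public function validate()
    {
        return !empty($this->streamHash);
    }
}

// Given a certain input, the code produces a certain output
var_dump((new Subscription('abc123'))->validate()); // bool(true)
var_dump((new Subscription(''))->validate());       // bool(false)
```

The point is the scope: a few lines of code, one rule, deterministic input and output.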
  10. So how do we test it all? Acceptance Tests. If we can’t test individual components, the next best thing is to prove that it works end to end. We use Behat for this, as it allows our product team to spec features that we can run against features in development. If they pass, the feature is done. It’s nice to have a clear definition of done.
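A Behat feature in that style might read like the sketch below. The feature and its steps are invented for illustration; real DataSift features are product-specific:

```gherkin
# Hypothetical feature sketch in the Behat/Gherkin style.
Feature: Push delivery
  In order to receive filtered data
  As a DataSift customer
  I need to push interactions to my own endpoint

  Scenario: Create a destination and receive data
    Given I have a valid API key
    When I create an HTTP push destination
    And I subscribe it to a stream
    Then the destination should receive interactions
```

The product team can write and read this without touching the step definitions, which is what gives us the shared definition of done.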
  11. The architecture. Back to the diagram: acceptance tests test *every* component. We try to do things like “create a destination for us to push data to” to prove that it works for customers. If it works for customers, they can give us money, and that’s the main concern, right? These tests are very brittle, however. You need to program defensively and use lookup tables for things that could change.
  14. The bit in the middle? Integration Tests. As the platform’s so big, we needed something that fits in between unit and acceptance tests. Something that lets us test one piece of software in its entirety without testing all of its dependencies as well.
  15. The architecture. Integration testing gets you the best of both worlds. You can prove that a subset of functionality works without needing to test every function call in isolation.
  18. Tools on Tour: Storyplayer. Enter Storyplayer! This is the bit that fits in the middle.
  19. Tools on Tour. He’s @stuherbert. He also works @datasift. It’s the brainchild of this fella, who unfortunately can’t be with us tonight, which is why you’re stuck with me.
  20. Storyplayer: Acceptance Tests. Storyplayer *can* be your acceptance test layer, but up until yesterday I would have told you that it’s not really where its strengths are. That changed with the addition of SauceLabs support. If you don’t know SauceLabs, they’re browser testing in the cloud; highly recommended.
  21. Storyplayer: Eight Phases. Storyplayer has eight phases. I’m going to quickly run through them, then give you a demo.
  22. Storyplayer: [1] Test Environment Setup, [2] Test Setup (highlighted from the full list: [1] Test Environment Setup [2] Test Setup [3] Pre-test Prediction [4] Pre-test Inspection [5] Action [6] Post-test Inspection [7] Test Teardown [8] Test Environment Teardown). The first two are TestEnvironmentSetup and TestSetup. These are used for creating your test conditions.
  23. Storyplayer: [3] Pre-test Prediction, [4] Pre-test Inspection. Pre-test prediction is for things like “this story *might* fail but that’s OK”. Pre-test inspection is for getting values out of your test environment before you start.
  24. Storyplayer: [5] Action, [6] Post-test Inspection. This is where your test run actually happens. Post-test inspection is where you assert that things happened as you expected.
  25. Storyplayer: [7] Test Teardown, [8] Test Environment Teardown. Then you undo what you did in the setup. Things like killing processes, stopping VMs and closing web browsers.
  26. Storyplayer: [1] Test Environment Setup [2] Test Setup [3] Pre-test Prediction [4] Pre-test Inspection [5] Action [6] Post-test Inspection [7] Test Teardown [8] Test Environment Teardown. So to recap, we have eight phases.
  27. Storyplayer: the same eight phases. Of those, only two are mandatory.
  28. Storyplayer: the same eight phases. Actually, only *one* is mandatory, as the post-test inspection might be done by a person.
  29. Storyplayer: * fromXXXX - get state * expectsXXXX - test state * usingXXXX - change state. Storyplayer has three kinds of action. “From” to get the state of something, e.g. an IP address from a VM. “Expects” to make sure something is as we expect, e.g. a process is running. And “Using”, which is what actually runs commands during our test.
  30. Demo. [Live demo here.] Show an acceptance test (quickly swap out to a live demo of Behat’s ls example) and a browser demo.
  31. The bit in the middle? Integration Tests. I’m going to focus on its talents as an integration test tool.
  32. The architecture: Public Data ingestion, Private Data ingestion, Supporting services, Filtering, Delivery, Historics; Hornet and Doppelgangerd alongside Filtering. So let’s break the architecture down into groups. This week I’ve been working on our filtering component, and needed to control what went into the service. To do this, we use a tool called Hornet that generates data in our specific format. We also replace our supporting services with something called Doppelgangerd, which will serve predefined responses when asked for user info and billing status. As we control everything else, Filtering becomes our unit under test.
  42. The architecture. Controlling your platform like this isn’t really possible when you have multiple people all trying to use the same infrastructure.
  49. Personal environments. For our test environments, we use a combination of VirtualBox, Vagrant and Ansible.
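A personal environment in that stack boils down to a Vagrantfile along these lines. This is a minimal sketch; the box name, memory size and playbook path are placeholders, not our real configuration:

```ruby
# Sketch of a personal test environment: a VirtualBox VM provisioned by
# Ansible. Box name, memory size and playbook path are placeholders.
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"

  config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--memory", "2048"]
  end

  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provisioning/playbook.yml"
  end
end
```

One `vagrant up` later, everyone gets the same isolated copy of a service to test against, instead of fighting over shared infrastructure.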
  53. PHPNW13: go there, learn things. I know I’m probably preaching to the choir, but come to PHPNW13 to learn more. I’ll be there talking about Vagrant and Ansible, so I’ll skip over this bit. If you can’t make it, come and talk to me later if you’re interested.
  54. Storyplayer: [1] Test Environment Setup, [2] Test Setup. Your test environment setup is for things that you need to set up every time. These are normally handled by a StoryTemplate. A StoryTemplate is a set of functions that you can apply to your stories to save on setup code. Kind of like inheritance, but you can be based on multiple stories. Your test setup is for things you need that are specific to this story.
  55. Storyplayer: [1] Virtual Machines, [2] Doppelgangerd + Hornet. When I was testing Filtering, this involved spinning up a VM and deploying Filtering to it, then spinning up Doppelgangerd and Hornet to provide additional data.
  56. Storyplayer: [3] Tests shouldn’t usually fail, [4] extract numbers like RAM usage. Honestly, I’ve never had to mark a test as being allowed to fail, but it’s useful to have the option. Extracting existing numbers is useful, e.g. testing the ACL stats to see how many items are processed.
  57. Storyplayer: [5] Do your things, [6] is the result correct? DO ALL THE THINGS, then make sure the results you got back are correct. Is the data structure a string? Does it contain what you were expecting?
  58. Storyplayer: [7] Stop Doppelgangerd + Hornet, [8] Stop VM. Kill everything that we started, ready to try again.
  59. Repeat Runs. 100% automated tests are the best. Run them over and over. Store pass/fail results and compare over time. Look at trends: actual usage of RAM, throughput, latency. We’re looking to hire someone to write our data collation and visualisation system; let me know if you’re interested.
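A repeat-run harness doesn’t need to be clever. Something like this wrapper (where `./run-tests.sh` is a placeholder for whatever command runs your story) logs a timestamped pass/fail per run, ready to be graphed later:

```shell
#!/bin/sh
# Run the same test over and over, keeping a timestamped pass/fail log.
# ./run-tests.sh is a placeholder for your actual test command.
RUNS=3
for i in $(seq 1 "$RUNS"); do
    if ./run-tests.sh > /dev/null 2>&1; then
        result=pass
    else
        result=fail
    fi
    echo "$(date +%s),$result" >> results.csv
done
```

The CSV is deliberately dumb: one line per run means any plotting tool can show you the trend.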
  61. Human Factor: Automated Tests Sometimes Mask Problems. Sometimes, though, automated tests don’t tell the whole story. You need people to look at the numbers and analyse them. For this we use a combination of Graphite and SavageD (another DataSift tool). Look at numbers in and out. Case in point: Ogre (a message queueing system) and the 0.5s sleep (for testing what happens when it backs up). Easy for a human to miss, but by graphing things it was obvious.
  62. Sometimes you just have to be invasive. Get right in there; put things between services to work out what’s going on.
  63. Last week, I was a Thieving Troll. All our services talk to each other via ZeroMQ. By pretending to be what each service thought it was talking to, I could intercept all traffic and log it to disk before passing it on to the service that was expecting it. All it took was a little bit of configuration editing.
  64. When you’re this close, run simple tests. If you’re on a machine talking to sockets, run the simplest test you can. We use zmqpp: if there’s data coming out, it pipes it to stdout. Nothing fancy, but that also means nothing can go wrong. Then you know whether the problem is your test or the upstream service. zmqpp doesn’t lie.
  65. And finally: Monitoring Production. Graphite is our best friend. Once you know how your system *should* be running, it’s very easy to work out when it isn’t running correctly. We currently increment about 64,000 metrics per minute. We also introduced audit counters, tracking how much data flows through each service and making sure that the numbers add up. Finally, taint interactions and watch them flow through the system (we’re still working on this).
  66. Roundup: Multiple Strategies. There are multiple strategies for testing. Personally I really like the integration test strategy, and Storyplayer’s a great tool to accomplish it. Development’s progressing rapidly (we’re tagging 1.4 soon!) and the docs are coming along too. It’s tested on OS X and Linux, but should be fairly easy to port to Windows as it’s mostly PHP code.
  68. Finally: Why build tools? There’s plenty of tools out there for testing already, so why do we need to build our own? Off-the-shelf tools aren’t designed for a pipeline architecture, can’t generate firehose scale (10x before launch), and don’t support complex environments (one environment takes 24 VMs and 8 physical boxes), so we build what we need, when we need it. No one has ever built a billion-dollar software company without investing in their own tools.
  69. DataSift: We’re Hiring. Devs, Ops, QA, Sales, Product and more. We’re hiring for pretty much any role you can think of. If you’re interested, catch me later and we can chat about it.