Evaluating a log analysis platform the wrong way (SF Metrics Meetup August 2017)

Amy Nguyen
August 09, 2017

I was recently asked to investigate whether my team should switch from running our own ELK stack to paying for a SaaS logging vendor. Eventually, I concluded that we should switch, and so we did - but not without encountering significant pushback and unexpected difficulties along the way. In this talk, I'll explain the criteria we started out with for switching, what we did during the evaluation period, and what I wish we had done instead. We'll cover actionable lessons such as how to evaluate security, the right way to ask for feedback, and what you might not have thought to ask about in a vendor trial.

Transcript

  1. Hi! I'm Amy.

    Amy Nguyen @amyngyn SF Metrics Meetup August 2017
    - 3 week old engineer on Stripe's observability team
    - This talk is not about Stripe; that would be incredible
    - amynguyen.net @amyngyn
    (Image caption: me irl)
  2-6. Once upon a time...

    - We used logging to detect and investigate incidents.
    - Lack of expertise with maintaining hosted ELK (Elasticsearch, Logstash, Kibana)
    - Problem? Try adding more instances!
    - ~1 full time engineer needed just for managing ELK problems
    - Why not shovel money towards our sanity?
  7-11. Everything that can go wrong during a trial

    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision
  12-13. Sending your data to the vendor

    Can all of your data get into the system at scale?
    ARE YOU SURE?
    Do I look like someone you want to trust?
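
One way to test the "at scale" question during a trial is to compare how many log lines each pipeline sent with how many the vendor reports as ingested. The sketch below is a hypothetical illustration, not something shown in the talk: the pipeline names, the counts, and the 0.1% tolerance are all assumptions, and how you obtain the ingested counts depends on what the vendor actually exposes.

```python
from dataclasses import dataclass


@dataclass
class DeliveryReport:
    pipeline: str   # hypothetical name for one way of sending data
    sent: int       # lines your side believes it shipped
    ingested: int   # lines the vendor reports as received

    @property
    def loss_ratio(self) -> float:
        """Fraction of sent lines that never showed up at the vendor."""
        if self.sent == 0:
            return 0.0
        return max(self.sent - self.ingested, 0) / self.sent


def pipelines_losing_data(reports, tolerance=0.001):
    """Return the pipelines whose loss ratio exceeds the tolerance."""
    return [r.pipeline for r in reports if r.loss_ratio > tolerance]


# Made-up numbers for two hypothetical ingestion paths.
reports = [
    DeliveryReport("syslog-forwarder", sent=1_000_000, ingested=999_800),
    DeliveryReport("http-bulk-api", sent=250_000, ingested=240_000),
]
print(pipelines_losing_data(reports))
```

Running the same comparison at normal volume and again at peak volume would show whether loss only appears under load.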
  14-16. Managing spikes and quotas

    - What tools do they offer for managing spikes or rogue collectors?
    - What does the contract say?
  17. Summary: Integration is not simple.

    - Test every way you want to send data.
    - Figure out how the vendor would work with you when you pass your quota and how customer support works.
    - Know how you would stop a spike from eating your quota.
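
If the vendor offers no tooling for spikes, one option is a client-side guard in front of each collector. The following is a minimal sketch of a token bucket measured in bytes. It is an assumption about how such a guard could look, not the approach described in the talk. The class name `ByteBudget` and the rate and burst values are hypothetical. What to do with rejected payloads (drop, sample, or buffer) is a policy decision the sketch leaves open.

```python
import time


class ByteBudget:
    """Token bucket measured in bytes: refills steadily, caps bursts."""

    def __init__(self, bytes_per_second, burst_bytes, clock=time.monotonic):
        self.rate = bytes_per_second
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.last = clock()

    def allow(self, size):
        """Return True if a payload of `size` bytes may be forwarded now."""
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
budget = ByteBudget(bytes_per_second=100, burst_bytes=500, clock=lambda: fake_now[0])

first = budget.allow(400)    # fits inside the initial burst allowance
second = budget.allow(400)   # only 100 bytes of budget remain, so this is rejected
fake_now[0] += 3.0           # three seconds pass, refilling 300 bytes
third = budget.allow(400)    # 100 + 300 = 400 bytes are available again
print(first, second, third)
```

Keeping one budget per collector means a single rogue source exhausts only its own allowance rather than the whole quota.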
  18. Everything that can go wrong during a trial

    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision
  19-20. My original requirements

    - Deploy team: Is something wrong in canary?
    - SRE team: Is the site healthy? Why isn't it healthy?

    The good parts of this approach
    - Real customers
    - Concrete problems
    - Narrow scope
  21-24. The bad parts of my approach

    1. Not specific enough about how engineers want to accomplish goals
       - Human-readable dashboards?
       - APIs and scripts?
       - Giant auto-refreshing TV displays?
       - Alerts?
    2. "Requirements" are way bigger than just what engineers/users want! You have to be EXHAUSTIVE.
       - Compliance and legal teams
       - Security and IT teams
       - Engineering managers
       - Executives
  25-30. THE MOST IMPORTANT SLIDE: SECURITY

    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?
    - Is anyone sharing accounts/passwords to access the platform?
    - Are there audit logs for who is querying for what data in the product?
    - How do you manage permissions for who can see each data type?
    - How is data sent to the system? How do you manage API tokens?
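
The question about departed employees can be checked mechanically if you can export the vendor's user list and your own directory. The sketch below assumes both exports are plain lists of email addresses. The function name and the addresses are hypothetical and do not refer to any vendor API.

```python
def find_stale_accounts(vendor_users, active_employees):
    """Return vendor accounts with no matching active employee, sorted."""
    active = {email.strip().lower() for email in active_employees}
    return sorted(
        user
        for user in (u.strip().lower() for u in vendor_users)
        if user not in active
    )


# Hypothetical exports: Bob has left the company but still has a vendor login.
vendor_users = ["alice@example.com", "Bob@example.com", "carol@example.com"]
active_employees = ["alice@example.com", "carol@example.com"]
print(find_stale_accounts(vendor_users, active_employees))
```

Shared accounts tend to surface in the same report, because a login such as a team alias will not match any individual in the directory.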
  31. Takeaway: It's dangerous to go alone!

    1. Pick 1-2 teams with specific use cases and focus on evaluating the heck out of their requirements.
    2. If you're not an expert in X, find someone who can help you evaluate X.
    3. Make a list of every requirement and circulate it to make sure you're not missing anything.
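
The circulated requirements list can be kept as a small data structure that records which stakeholder groups have looked at each item. This is only an illustrative sketch. The group names come from the earlier slide on who has requirements, while the `Requirement` class, the sample entries, and the rule that every group reviews every requirement are assumptions.

```python
from dataclasses import dataclass, field

# Stakeholder groups named in the talk, plus the engineers themselves.
STAKEHOLDERS = {
    "engineers",
    "compliance and legal",
    "security and IT",
    "engineering managers",
    "executives",
}


@dataclass
class Requirement:
    description: str
    reviewed_by: set = field(default_factory=set)


def missing_reviews(requirements):
    """Map each requirement to the stakeholder groups that have not reviewed it."""
    return {
        req.description: sorted(STAKEHOLDERS - req.reviewed_by)
        for req in requirements
        if STAKEHOLDERS - req.reviewed_by
    }


# Hypothetical entries: the first is fully reviewed, the second is not.
requirements = [
    Requirement("SSO and 2FA for all users", reviewed_by=set(STAKEHOLDERS)),
    Requirement("Audit logs for queries", reviewed_by={"security and IT", "engineers"}),
]
print(missing_reviews(requirements))
```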
  32. Everything that can go wrong during a trial

    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision
  33-34. New features the product offers you

    For example, anomaly detection, or something-something wow machine learning!
  35. If the feature does not provide immediate value to you right now in its current state, IT DOES NOT PROVIDE VALUE.
  36. Everything that can go wrong during a trial

    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision
  37-39. Gathering feedback and making a decision

    1. Don't take "OK" for an answer!!!!!!!
    2. Simulate a situation where this product is your only tool. Is your team okay with this?
    3. Be okay with deliberating.
  40-43. Recap: Constant Vigilance

    1. Integrating your data: Test for all of the situations you care about.
    2. Evaluating requirements: Don't do it alone!
    3. Learning about the product: Don't get pulled in by shiny features.
    4. Gathering feedback: Make sure the feedback you get is genuine.