Evaluating a log analysis platform the wrong way (SF Metrics Meetup August 2017)

I was recently asked to investigate whether my team should switch from running our own ELK stack to paying for a SaaS logging vendor. Eventually, I concluded that we should switch, and so we did - but not without encountering significant pushback and unexpected difficulties along the way. In this talk, I'll explain the criteria we started out with for switching, what we did during the evaluation period, and what I wish we had done instead. We'll cover actionable lessons such as how to evaluate security, the right way to ask for feedback, and what you might not have thought to ask about in a vendor trial.

Amy Nguyen

August 09, 2017

Transcript

  1. Amy Nguyen @amyngyn SF Metrics Meetup August 2017
    Evaluating a log analysis
    platform the wrong way

  2. Hi! I'm Amy.
    - 3 week old engineer on Stripe's observability team
    - this talk is not about Stripe; that would be incredible
    - amynguyen.net
    @amyngyn
    me irl

  3. Once upon a time...

  4. Once upon a time...
    - We used logging to detect and investigate incidents.

  5. Once upon a time...
    - We used logging to detect and investigate incidents.
    - Lack of expertise with maintaining hosted ELK (Elasticsearch, Logstash, Kibana)

  6. Once upon a time...
    - We used logging to detect and investigate incidents.
    - Lack of expertise with maintaining hosted ELK (Elasticsearch, Logstash, Kibana)
    - Problem? Try adding more instances!

  7. Once upon a time...
    - We used logging to detect and investigate incidents.
    - Lack of expertise with maintaining hosted ELK (Elasticsearch, Logstash, Kibana)
    - Problem? Try adding more instances!
    - ~1 full-time engineer needed just for managing ELK problems

  8. Once upon a time...
    - We used logging to detect and investigate incidents.
    - Lack of expertise with maintaining hosted ELK (Elasticsearch, Logstash, Kibana)
    - Problem? Try adding more instances!
    - ~1 full-time engineer needed just for managing ELK problems
    - Why not shovel money towards our sanity?

  9. Everything that can go wrong during a trial

  10. Everything that can go wrong during a trial
    1. Integrate your data / system with the product

  11. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests

  12. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet

  13. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision

  14. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision

  15. Sending your data to the vendor
    Can all of your data get into the system at scale?

  16. Sending your data to the vendor
    Can all of your data get into the system at scale?
    ARE YOU SURE?
    Do I look like someone you want to trust?
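
One way to put a number on that question during a trial is to compare how many events each source sent with how many the vendor reports having indexed. The sketch below is illustrative only: `delivery_gaps`, the source names, and the 1% tolerance are assumptions, and how you obtain the received counts depends entirely on the vendor.

```python
def delivery_gaps(sent_counts, received_counts, tolerance=0.01):
    """Return {source: loss_ratio} for sources losing more than `tolerance`.

    sent_counts:     events each source emitted (counted on your side)
    received_counts: events the vendor reports for the same window
                     (hypothetical input; fetch it however the vendor allows)
    """
    gaps = {}
    for source, sent in sent_counts.items():
        if sent == 0:
            continue  # nothing was sent, so there is nothing to lose
        received = received_counts.get(source, 0)
        loss = (sent - received) / sent
        if loss > tolerance:
            gaps[source] = round(loss, 4)
    return gaps


if __name__ == "__main__":
    sent = {"syslog": 1000, "app": 2000, "nginx": 500}
    received = {"syslog": 1000, "app": 1900}  # nginx never arrived
    print(delivery_gaps(sent, received))
```

Running the comparison per source, rather than on a single total, makes it harder for one silent collector to hide behind a healthy aggregate.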

  17. Managing spikes and quotas
    - What tools do they offer for managing spikes or rogue collectors?

  18. Managing spikes and quotas
    - What tools do they offer for managing spikes or rogue collectors?
    - What does the contract say?

  19. Managing spikes and quotas
    - What tools do they offer for managing spikes or rogue collectors?
    - What does the contract say?

  20. Summary: Integration is not simple.
    - Test every way you want to send data.
    - Figure out how the vendor would work with you when you pass your quota and how customer support works.
    - Know how you would stop a spike from eating your quota.
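
If the vendor offers no spike protection, one option is to cap volume on your own side before logs leave the host. The following is a minimal token-bucket sketch, not something from the talk: the `ByteBudget` class, its parameters, and the choice to drop (rather than queue or sample) over-budget payloads are all assumptions for illustration.

```python
import time


class ByteBudget:
    """Token bucket limiting how many log bytes a collector may ship.

    Hypothetical helper: tokens are bytes, refilled at `bytes_per_second`
    up to `burst_bytes`. Over-budget payloads are counted and rejected.
    """

    def __init__(self, bytes_per_second, burst_bytes, clock=time.monotonic):
        self.rate = float(bytes_per_second)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.clock = clock
        self.last = clock()
        self.dropped_bytes = 0

    def allow(self, payload_size):
        """Return True if a payload of `payload_size` bytes may be sent now."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if payload_size <= self.tokens:
            self.tokens -= payload_size
            return True
        self.dropped_bytes += payload_size  # track what the cap cost you
        return False


if __name__ == "__main__":
    budget = ByteBudget(bytes_per_second=1_000_000, burst_bytes=5_000_000)
    line = b'{"level": "error", "msg": "something broke"}'
    if budget.allow(len(line)):
        pass  # hand the line to whatever shipper you use
```

Tracking `dropped_bytes` matters as much as the cap itself: it tells you how much data a rogue collector would have cost against the quota.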

  21. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision

  22. My original requirements
    - Deploy team: Is something wrong in canary?
    - SRE team: Is the site healthy? Why isn't it healthy?

  23. My original requirements
    - Deploy team: Is something wrong in canary?
    - SRE team: Is the site healthy? Why isn't it healthy?
    The good parts of this approach
    - Real customers
    - Concrete problems
    - Narrow scope

  24. The bad parts of my approach
    1. Not specific enough about how engineers want to accomplish goals

  25. The bad parts of my approach
    1. Not specific enough about how engineers want to accomplish goals
      Human-readable dashboards?
      APIs and scripts?
      Giant auto-refreshing TV displays?
      Alerts?

  26. The bad parts of my approach
    1. Not specific enough about how engineers want to accomplish goals
      Human-readable dashboards?
      APIs and scripts?
      Giant auto-refreshing TV displays?
      Alerts?
    2. "Requirements" are way bigger than just what engineers/users want!
      You have to be EXHAUSTIVE.

  27. The bad parts of my approach
    1. Not specific enough about how engineers want to accomplish goals
      Human-readable dashboards?
      APIs and scripts?
      Giant auto-refreshing TV displays?
      Alerts?
    2. "Requirements" are way bigger than just what engineers/users want!
      You have to be EXHAUSTIVE.
      Compliance and legal teams
      Security and IT teams
      Engineering managers
      Executives

  28. THE MOST IMPORTANT SLIDE: SECURITY

  29. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?

  30. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?

  31. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?
    - Is anyone sharing accounts/passwords to access the platform?

  32. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?
    - Is anyone sharing accounts/passwords to access the platform?
    - Are there audit logs for who is querying for what data in the product?

  33. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?
    - Is anyone sharing accounts/passwords to access the platform?
    - Are there audit logs for who is querying for what data in the product?
    - How do you manage permissions for who can see each data type?

  34. THE MOST IMPORTANT SLIDE: SECURITY
    - How do users authenticate? Do they offer SSO and 2FA? Will access be locked down to your VPN?
    - Will users have access to the product after they have left your company?
    - Is anyone sharing accounts/passwords to access the platform?
    - Are there audit logs for who is querying for what data in the product?
    - How do you manage permissions for who can see each data type?
    - How is data sent to the system? How do you manage API tokens?

  35. Takeaway: It's dangerous to go alone!
    1. Pick 1-2 teams with specific use cases and focus on evaluating the heck out of their requirements.
    2. If you're not an expert in X, find someone who can help you evaluate X.
    3. Make a list of every requirement and circulate it to make sure you're not missing anything.
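
The "list of every requirement" can be as simple as a spreadsheet, but a small structured version makes it easy to see which stakeholder groups still have open questions. This is only a sketch under assumptions: the `Requirement` fields and the `open_items` helper are hypothetical, and the sample entries reuse questions from the security slide.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    owner: str                       # stakeholder group, e.g. "Security and IT"
    question: str                    # what must be true of the vendor
    answer: str = ""                 # empty until the trial or vendor answers it
    reviewed_by_owner: bool = False  # has the owning group signed off?


def open_items(requirements):
    """Group questions that are unanswered or not yet reviewed, by owner."""
    pending = {}
    for req in requirements:
        if not req.answer or not req.reviewed_by_owner:
            pending.setdefault(req.owner, []).append(req.question)
    return pending


if __name__ == "__main__":
    checklist = [
        Requirement("Security and IT", "Do they offer SSO and 2FA?",
                    answer="SSO yes, 2FA via IdP", reviewed_by_owner=True),
        Requirement("Security and IT", "Are there audit logs for queries?"),
        Requirement("Deploy team", "Is something wrong in canary?",
                    answer="Dashboard built during trial"),
    ]
    for owner, questions in open_items(checklist).items():
        print(owner, "->", questions)
```

An item counts as open until the owning group has reviewed the answer, which guards against the evaluator marking a requirement as met on someone else's behalf.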

  36. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision

  37. New features the product offers you
    For example, anomaly detection, or something-something wow machine learning!

  38. New features the product offers you
    For example, anomaly detection, or something-something wow machine learning

  39. If the feature does not provide immediate value to you right now in its current state, IT DOES NOT PROVIDE VALUE.

  40. Everything that can go wrong during a trial
    1. Integrate your data / system with the product
    2. Evaluate user requirements and feature requests
    3. Learn about what else the product can offer that you don't have yet
    4. Gather feedback and make a decision

  41. Gathering feedback and making a decision
    1. Don't take "OK" for an answer!!!!!!!

  42. Gathering feedback and making a decision
    1. Don't take "OK" for an answer!!!!!!!
    2. Simulate a situation where this product is your only tool. Is your team okay with this?

  43. Gathering feedback and making a decision
    1. Don't take "OK" for an answer!!!!!!!
    2. Simulate a situation where this product is your only tool. Is your team okay with this?
    3. Be okay with deliberating.

  44. Recap: Constant Vigilance
    1. Integrating your data
      Test for all of the situations you care about.

  45. Recap: Constant Vigilance
    1. Integrating your data
      Test for all of the situations you care about.
    2. Evaluating requirements
      Don't do it alone!

  46. Recap: Constant Vigilance
    1. Integrating your data
      Test for all of the situations you care about.
    2. Evaluating requirements
      Don't do it alone!
    3. Learning about the product
      Don't get pulled in by shiny features.

  47. Recap: Constant Vigilance
    1. Integrating your data
      Test for all of the situations you care about.
    2. Evaluating requirements
      Don't do it alone!
    3. Learning about the product
      Don't get pulled in by shiny features.
    4. Gathering feedback
      Make sure the feedback you get is genuine.

  48. Thanks!