Alert Handling with Datadog Incident Management

Takeshi Kondo

August 25, 2020
Transcript

  1. Alert Handling
    with Datadog Incident Management
    Takeshi Kondo / @chaspy
    2020/08/25
    JDDUG meetup#1

  2. Datadog Incident Management

  3. Datadog Incident Management
    https://www.datadoghq.com/blog/incident-response-with-datadog/

  4. Datadog Incident Management
    https://docs.datadoghq.com/monitors/incident_management/

  5. Datadog Incident Management
    https://docs.datadoghq.com/monitors/incident_management/
    Cool

  6. Who am I
    Takeshi Kondo
    chaspy chaspy_
    Lead Software Engineer, Site Reliability Engineering at Quipper

  7. Agenda
    • Introduction of Datadog Incident Management
    • Alert Handling in Quipper

  8. Agenda
    • Introduction of Datadog Incident Management
    • Alert Handling in Quipper

  9. Incident Response 6-Step Plan
    1. Preparation
    2. Identification
    3. Containment
    4. Eradication
    5. Recovery
    6. Review lessons learned
    https://www.varonis.com/blog/incident-response-plan/

  10. Incident Response 6-Step Plan
    1. Preparation
    2. Identification
    3. Containment
    4. Eradication
    5. Recovery
    6. Review lessons learned -> Postmortem
    https://www.varonis.com/blog/incident-response-plan/
    Incident Management

  11. Datadog Incident Management
    • Overview
    • Timeline
    • Remediation

  12. Datadog Incident Management
    • Overview
    • Timeline
    • Remediation

  13. Datadog Incident Management: Overview

  14. Severity Levels: Smart default and configurable

  15. Status Levels and Properties Fields

  16. Datadog Incident Management: Overview

  17. Datadog Incident Management
    • Overview
    • Timeline
    • Remediation

  18. Datadog Incident Management: Timeline

  19. Datadog Incident Management
    • Overview
    • Timeline
    • Remediation

  20. Datadog Incident Management: Remediation

  21. Agenda
    • Introduction of Datadog Incident Management
    • Alert Handling in Quipper

  22. See “Alerting Strategy for Self-Contained Team”
    https://speakerdeck.com/chaspy/alerting-strategy-for-self-contained-team

  23. Review alerts daily

  24. Review alerts at Daily Standup

  25. Review alerts at Daily Standup

  26. Thank You!
    Takeshi Kondo
    chaspy chaspy_
    Lead Software Engineer at Quipper
