1 Devopsdays global core team 4 Devopsdays İstanbul Community Software Engineer, Technical Evangelist at Opsgenie, Atlassian Podcast: Turuncu Pasaport AWS Community Hero @srhtcn
complex systems. Dev - Ops: better alignment of development and operations. Management - Dev: better alignment of management and development. Put developers on-call.
however, email is not sufficient when alerts are time sensitive and rapid response is necessary. Opsgenie uses multiple communication channels, including email, SMS, mobile push, and voice calls, to ensure recipients are notified in a timely manner.
to empower users to make effective decisions. Opsgenie alerts are not limited to a few characters! Add optional fields to your alerts and attach charts, logs, runbooks, and more to further enrich them, provide context, and enable recipients to determine the right course of action.
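As a rough illustration of an enriched alert, the sketch below assembles an alert body with optional context fields. The helper `build_alert_payload` is hypothetical, and the field names (`message`, `priority`, `description`, `details`, `tags`) and the P1 to P5 priority scale are assumptions modeled on how alert payloads are commonly structured; check the Opsgenie Alert API documentation before relying on them. Nothing is sent over the network here.

```python
import json


def build_alert_payload(message, priority="P3", description="", details=None, tags=None):
    """Assemble an alert body with optional context fields (hypothetical helper)."""
    if priority not in {"P1", "P2", "P3", "P4", "P5"}:
        raise ValueError(f"unknown priority: {priority}")
    payload = {"message": message, "priority": priority}
    if description:
        payload["description"] = description
    if details:
        # Free-form key/value context, e.g. links to runbooks or charts.
        payload["details"] = {str(k): str(v) for k, v in details.items()}
    if tags:
        # De-duplicate and sort tags so the payload is stable.
        payload["tags"] = sorted(set(tags))
    return payload


alert = build_alert_payload(
    "Disk usage above 90% on db-01",
    priority="P2",
    details={"runbook": "https://wiki.example.com/runbooks/disk", "host": "db-01"},
    tags=["disk", "database", "disk"],
)
print(json.dumps(alert, indent=2))
```

The host name and runbook URL are placeholders; the point is that context travels with the alert instead of living in a separate tool.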
directly from the Opsgenie Application. In addition to the default alert actions such as "Add Note" and "Close", you can respond to alerts by executing investigative and corrective actions. For example, you can ping or restart a server or create a service ticket with a click of a button.
remediation actions in response to incoming alerts. Through integration with AWS Systems Manager or other 3rd-party automation platforms, Opsgenie will trigger your response playbooks when an alert meets your predefined criteria. The system can take corrective action without involving your on-call engineers, which reduces both alert fatigue and MTTR.
The alert activity log presents all activity related to the alert: when the alert was created, who was notified, when the notifications were sent, and whether the recipients have seen the alert or taken any action. Tracking is performed seamlessly without requiring specific user action, whenever possible.
differently depending on the source of the alert, priority, or time of day. Opsgenie provides the flexibility to suppress, delay, or expedite alerts based on their content and timing.
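One way to picture content- and time-based handling is a small rule function. This is a minimal sketch, not Opsgenie's policy engine: the function `decide`, the alert fields `source` and `priority`, the 08:00 to 18:00 business hours, and the specific rules are all assumptions chosen for illustration.

```python
from datetime import datetime


def decide(alert, now):
    """Return 'suppress', 'expedite', 'delay', or 'notify' for an alert dict."""
    after_hours = now.hour < 8 or now.hour >= 18
    if alert["source"] == "staging":
        # Assumed rule: never page anyone for the staging environment.
        return "suppress"
    if alert["priority"] == "P1":
        # Assumed rule: highest priority always goes out immediately.
        return "expedite"
    if after_hours and alert["priority"] in {"P4", "P5"}:
        # Assumed rule: low-priority alerts wait until business hours.
        return "delay"
    return "notify"


print(decide({"source": "prod", "priority": "P4"}, datetime(2024, 1, 1, 23, 0)))
```

Rules are checked in order, so the first match wins; a staging P1 alert is suppressed in this sketch because the source rule comes first.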
working and creating alerts? Opsgenie Heartbeats ensures alerting works end-to-end, by checking that monitoring tools are active and connected, and that custom tasks are completed on schedule. When an absence of signal is detected within a specified timeframe, Opsgenie instantly alerts you of the problem.
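The "absence of signal" idea can be sketched as a timer that a monitored task resets with each ping. The `Heartbeat` class below is a hypothetical, in-memory model of that concept and does not call any Opsgenie endpoint; the name, interval, and timestamps are made up for the example.

```python
from datetime import datetime, timedelta


class Heartbeat:
    """Tracks the last ping and reports when the signal has gone missing."""

    def __init__(self, name, interval, started_at):
        self.name = name
        self.interval = interval      # maximum allowed gap between pings
        self.last_ping = started_at

    def ping(self, at):
        # Called by the monitored tool or task each time it checks in.
        self.last_ping = at

    def is_expired(self, now):
        # True when no ping arrived within the allowed interval.
        return now - self.last_ping > self.interval


start = datetime(2024, 1, 1, 0, 0)
hb = Heartbeat("nightly-backup", timedelta(minutes=10), start)
if hb.is_expired(start + timedelta(minutes=11)):
    print(f"heartbeat '{hb.name}' expired: raise an alert")
```

A real setup would have the backup job or monitoring tool send the ping, and a separate checker evaluate expiry on a schedule.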
and custom rotations. Leverage multiple scheduling rules to use different rotations at different times. You can define sophisticated scheduling scenarios such as after-hours coverage, weekdays and weekends, and geographically distributed teams coverage.
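A rotation with separate weekday and weekend rules can be sketched with plain date arithmetic. Everything here is an assumption for illustration: the participant names, the one-day shift length, the rotation start date, and the helpers `on_call` and `who_is_on_call` are not part of any Opsgenie API.

```python
from datetime import datetime, timedelta


def on_call(participants, rotation_start, shift_length, at):
    """Return the participant whose shift covers the moment `at`."""
    if at < rotation_start:
        raise ValueError("rotation has not started yet")
    shifts_elapsed = (at - rotation_start) // shift_length  # whole shifts so far
    return participants[shifts_elapsed % len(participants)]


WEEKDAY_TEAM = ["alice", "bob", "carol"]   # hypothetical names
WEEKEND_TEAM = ["dave", "erin"]
ROTATION_START = datetime(2024, 1, 1)      # a Monday
SHIFT = timedelta(days=1)


def who_is_on_call(at):
    # Scheduling rule: Saturday and Sunday use a different rotation.
    team = WEEKEND_TEAM if at.weekday() >= 5 else WEEKDAY_TEAM
    return on_call(team, ROTATION_START, SHIFT, at)


print(who_is_on_call(datetime(2024, 1, 3, 10, 0)))
```

After-hours coverage or follow-the-sun teams could be modeled the same way, by selecting the rotation from the hour or the responder's region instead of the weekday.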
right teams to be notified based on the source, priority, and timing of the issue. Escalations ensure that the alert gets the necessary attention when an alert is not acknowledged within a certain amount of time. For example, if the person on-call does not respond to a high priority alert within 5 minutes, you can automatically notify another person or team.
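The 5-minute example can be sketched as a list of escalation steps, each with a delay. This is a simplified model under stated assumptions: the step names, the 15-minute third step, and the function `targets_to_notify` are hypothetical, and an acknowledgement simply stops any later steps from firing.

```python
from datetime import datetime, timedelta

# (delay after alert creation, who gets notified) - assumed example policy
ESCALATION_RULES = [
    (timedelta(minutes=0), "on-call engineer"),
    (timedelta(minutes=5), "secondary on-call"),
    (timedelta(minutes=15), "team lead"),
]


def targets_to_notify(created_at, acknowledged_at, now):
    """Return every target whose step was reached while the alert was unacknowledged."""
    # Escalation stops at the acknowledgement time, if there was one.
    cutoff = now if acknowledged_at is None else min(acknowledged_at, now)
    elapsed = cutoff - created_at
    return [target for delay, target in ESCALATION_RULES if delay <= elapsed]


created = datetime(2024, 1, 1, 12, 0)
print(targets_to_notify(created, None, created + timedelta(minutes=6)))
```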
from inside your ChatOps tool, including acknowledging and closing alerts, seeing who is on-call, and defining schedule overrides. Opsgenie has bi-directional integrations with Slack, MS Teams, Campfire, Mattermost, Jabber, Flowdock, Kore, and Moxtra.
communicate with key individuals using your preferred web conferencing provider (WebEx, GoToMeeting, Skype, Jitsi). Conference bridge details are attached to the incident and shared automatically with your team.
to command, control, and coordinate incident response. Through integrated communication and incident resolution tools, it enables you to stop switching between different tools and platforms during incident response. You can view the status and progress of each responder team and track all updates and actions, from a centralized dashboard.
according to organizational specifications. Stakeholders can stay informed about incident resolution progress and service health by automatic notifications, visiting a status page, or subscribing to status page updates.
company has handled over a specified period of time, and the corresponding mean-time-to-acknowledge and mean-time-to-resolve. You can easily visualize how these metrics are trending over time and, with a mouse click, drill down into areas of concern to understand which alerts required more time and attention.
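For reference, mean-time-to-acknowledge and mean-time-to-resolve are plain averages over alert timestamps. The sketch below assumes each alert is a dict with `created`, `acknowledged`, and `resolved` datetimes (hypothetical field names) and skips alerts that never reached the end state.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(alerts, start_key, end_key):
    """Average time between two timestamps, ignoring alerts missing the end one."""
    seconds = [
        (a[end_key] - a[start_key]).total_seconds()
        for a in alerts
        if a.get(end_key) is not None
    ]
    return timedelta(seconds=mean(seconds)) if seconds else None


alerts = [
    {"created": datetime(2024, 1, 1, 10, 0),
     "acknowledged": datetime(2024, 1, 1, 10, 2),
     "resolved": datetime(2024, 1, 1, 10, 30)},
    {"created": datetime(2024, 1, 1, 11, 0),
     "acknowledged": datetime(2024, 1, 1, 11, 6),
     "resolved": datetime(2024, 1, 1, 11, 20)},
]

mtta = mean_duration(alerts, "created", "acknowledged")
mttr = mean_duration(alerts, "created", "resolved")
print("MTTA:", mtta, "MTTR:", mttr)
```

With the sample data, acknowledgement took 2 and 6 minutes and resolution took 30 and 20 minutes, so the averages are 4 and 25 minutes.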
members’ productivity, incident response patterns, and efficiency. Understand which members are responding quickly and establish best practices for everyone.
key to fast incident resolution. During and after an ICC conference, you can analyze team participation in detail. Review attendance and efficiency for each Incident Command Center session.
the post-incident analysis report to understand the actions taken and their timing. Identify how fast people acknowledged the issues, when status changes were communicated, and how teams participated in the resolution. Easily compare different incident responses to identify opportunities for improvement.
the business services they impact and have a clear understanding of which teams need to respond and who needs to be kept up to date on the progress towards resolution. Disparate teams are notified simultaneously and presented with the tools they need to collaborate during resolution.
different workflows for incidents of differing priority using Opsgenie’s incident templates. For each type of incident, predefine the needed response teams, the stakeholders, and the best collaboration channels to resolve problems quickly and communicate them effectively.
systems into a single incident based on the conditions that you specify. Reduce complexity and noise to let your responders focus on the right context and resolve problems quickly.
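Grouping alerts into one incident based on conditions can be sketched as bucketing by a chosen set of fields. This is a minimal sketch of the idea only: the function `group_into_incidents`, the alert fields, and the choice of `service` and `region` as the grouping condition are assumptions for illustration.

```python
from collections import defaultdict


def group_into_incidents(alerts, key_fields=("service", "region")):
    """Bucket alerts that share the same values for `key_fields`."""
    incidents = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(field) for field in key_fields)
        incidents[key].append(alert["message"])
    return dict(incidents)


alerts = [
    {"service": "checkout", "region": "eu", "message": "High latency"},
    {"service": "checkout", "region": "eu", "message": "5xx error rate"},
    {"service": "search", "region": "us", "message": "Node down"},
]

for key, messages in group_into_incidents(alerts).items():
    print(key, "->", messages)
```

Three alerts collapse into two incidents here, so responders for the checkout service see one item with both symptoms attached.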
key to a smooth resolution. Service status pages help make this happen. Stakeholders and responders are able to view information about the status of an incident at any time. Additionally, they can view the status page for any service and report a problem that they have encountered with that service. Problems are logged with detailed notes, and an alert is created and sent to the on-call team member.
phone calls to the right person using Opsgenie on-call schedules. If no one is available, Opsgenie will take a message, generate an alert, and notify the right person via their preferred notification channel. Call details are attached to the notification, and recipients can listen to the message.
the phone. You can specify the order of users or let Opsgenie pick someone randomly. Opsgenie connects the caller only when a live person answers the phone, which it confirms by asking the user to press a key.
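The routing behavior described here can be sketched as trying users one by one until someone confirms with a key press. This is a simplified model, not the product's implementation: `route_call` and the `confirms` callback are hypothetical, and random selection is modeled as shuffling the order, which is one possible reading of "pick someone randomly".

```python
import random


def route_call(users, confirms, pick_randomly=False, rng=None):
    """Return the first user who confirms by key press, or None if nobody does.

    `confirms` is a callable(user) -> bool standing in for the key-press check
    that filters out voicemail.
    """
    order = list(users)
    if pick_randomly:
        (rng or random).shuffle(order)
    for user in order:
        if confirms(user):
            return user
    return None  # nobody confirmed; the caller could leave a message instead


# Example: "alice" goes to voicemail, so the call falls through to "bob".
print(route_call(["alice", "bob", "carol"], lambda user: user != "alice"))
```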
to end. All activity including when the call is received and how it is routed as well as who answered and how long it lasted can be included in your metrics. Calls can also be recorded for training and quality assurance.
to resolve challenges when integrating internal and external solutions.
• Marid can enrich data (e.g. provide the physical location of a server by looking up its Host ID in a database)
• Marid can be used to execute actions that help further investigate and remediate issues (e.g. ping or restart a server)
• Marid can act as an application-level proxy to ensure communication between Opsgenie and other systems when a direct connection is not available (firewall issues)
soon -> Opsgenie Edge Connector (OEC)
On solution that lets you control how accounts hosted on your identity provider authenticate to Opsgenie. Authentication via Single Sign-On is available on both Opsgenie web and mobile applications.
seriously. Below is a summary of our key security practices. If you have any questions, contact us at [email protected], or participate in Opsgenie’s Community Forums.
that Opsgenie never receives the raw version of the payload directly. The encryption application is hosted in your own environment and acts as a bridge between Opsgenie and third-party tools.