Monitoring and Incident Management

Monitor system health, and detect and alert on system malfunctions and failures. While exception monitoring deals with problems within the product, incident management deals with problems in the larger stack of software and hardware.

Why

Unmanaged exceptions and incidents can lead to a breach of the Service Level Agreements (SLAs) with customers. A breach can incur penalties and damage the brand's image in the marketplace.

How

Exceptions and incidents can occur in any software product, but having the right monitoring and management process minimizes their negative impact.

  • Understand the SLA requirements.
  • Put loggers in place within components and subsystems so that every area covered by the SLA is logged.
  • Use a monitoring tool that analyzes the logs to detect, prioritize, and diagnose issues.
  • Have an incident reporting tool to manage the issue backlog.
  • Ensure users can report incidents.
  • Resolve incidents within the terms agreed in the SLA to maintain a high level of customer satisfaction.
  • Perform a root cause analysis of each incident to reduce the chance of similar issues occurring in the future.
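
The detection and prioritization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a substitute for a real monitoring tool: the log line layout, the level-to-severity mapping, the resolution targets in SLA_RESOLUTION, and the names Incident and detect_incidents are all hypothetical and would come from your own logging setup and SLA.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical SLA targets: maximum time allowed to resolve an incident.
SLA_RESOLUTION = {
    "critical": timedelta(hours=4),
    "major": timedelta(hours=24),
    "minor": timedelta(hours=72),
}

# Assumed mapping from log level to incident severity.
LEVEL_TO_SEVERITY = {"CRITICAL": "critical", "ERROR": "major", "WARNING": "minor"}

# Assumed log line layout: "<ISO timestamp> <LEVEL> <component>: <message>"
LOG_PATTERN = re.compile(r"^(\S+) (\w+) ([\w.-]+): (.*)$")


@dataclass
class Incident:
    opened_at: datetime
    severity: str
    component: str
    message: str

    @property
    def due_at(self) -> datetime:
        # Deadline by which the incident must be resolved to stay within the SLA.
        return self.opened_at + SLA_RESOLUTION[self.severity]

    def breaches_sla(self, now: datetime) -> bool:
        # True once the resolution deadline has passed.
        return now > self.due_at


def detect_incidents(log_lines):
    """Turn warning-or-worse log lines into incidents, most urgent first."""
    incidents = []
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        timestamp, level, component, message = match.groups()
        severity = LEVEL_TO_SEVERITY.get(level)
        if severity is None:
            continue  # informational levels are not treated as incidents
        try:
            opened_at = datetime.fromisoformat(timestamp)
        except ValueError:
            continue  # skip lines with an unparseable timestamp
        incidents.append(Incident(opened_at, severity, component, message))
    # Prioritize by SLA deadline so the incident closest to a breach comes first.
    return sorted(incidents, key=lambda incident: incident.due_at)
```

Sorting by deadline rather than by severity alone is a design choice: it keeps an older, lower-severity incident from silently running past its SLA window while newer ones are handled. The resulting list could feed the incident reporting tool's backlog, and breaches_sla could drive alerts.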
