Having spent nearly a decade handling production support for mission-critical applications at multiple banks, I can sum up the issues common to all of those environments in a few words:
- There is a plethora of monitoring tools, yet they all generate false alerts
- For true alerts, the tools offer few actionable suggestions
The market offers many monitoring solutions, but no single tool is good enough on its own.
Every organization runs a variety of technology stacks to serve its business functions, and uses multiple monitoring mechanisms across its infrastructure layers, depending on the anomalies it needs to detect.
The problems with multiple monitoring tools are:
- No correlation or aggregation of alerts across different monitoring tools; correlation and the subsequent anomaly detection are mostly left to the people monitoring the system
- No actionable alerts, as the alerts do not present a coherent, meaningful picture based on the current context
- No mechanism to self-correct the threshold based on the trend; no heuristic learning
- No comprehensive dashboard for the health of the technology stack
Digitate’s ignio™ can change this situation. It integrates seamlessly with a majority of the monitoring tools on the market to:
- consume alerts and manage them intelligently
- consume metrics to derive actionable items and thresholds
Consuming Alerts: ignio’s Cognitive Alert Management capability
ignio uses its Alert Management capability to:
- Prioritize the alerts based on system criticality and its contextual awareness of the system
- Filter alerts based on the normal behavior profile of the system
- Aggregate redundant alerts
- Triage and auto-resolve alerts, or escalate them for manual handling
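The four steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not ignio's implementation or API: the `Alert` fields, the `CRITICALITY`, `NORMAL_BAND`, and `AUTO_FIX` tables, and every function name are hypothetical, and the fixed normal band is an assumed stand-in for a learned behavior profile.

```python
from dataclasses import dataclass

# All names and values here are hypothetical, for illustration only.

@dataclass(frozen=True)
class Alert:
    source: str   # monitoring tool that raised the alert
    host: str
    metric: str
    value: float

# Assumed inputs: criticality per host (higher means more critical),
# a normal band per metric, and known automated fixes per metric.
CRITICALITY = {"payments-db": 3, "batch-01": 1}
NORMAL_BAND = {"cpu_pct": (0.0, 85.0), "disk_pct": (0.0, 90.0)}
AUTO_FIX = {"disk_pct": "purge_temp_files"}


def filter_normal(alerts):
    """Drop alerts whose value lies inside the normal band for the metric."""
    kept = []
    for a in alerts:
        band = NORMAL_BAND.get(a.metric)
        # Metrics without a known band are kept rather than silently dropped.
        if band is None or not (band[0] <= a.value <= band[1]):
            kept.append(a)
    return kept


def aggregate(alerts):
    """Collapse alerts for the same host and metric, keeping the worst value."""
    merged = {}
    for a in alerts:
        key = (a.host, a.metric)
        if key not in merged or a.value > merged[key].value:
            merged[key] = a
    return list(merged.values())


def prioritize(alerts):
    """Order alerts so that the most critical hosts come first."""
    return sorted(alerts, key=lambda a: CRITICALITY.get(a.host, 0), reverse=True)


def triage(alert):
    """Auto-resolve when a known fix exists, otherwise escalate."""
    fix = AUTO_FIX.get(alert.metric)
    return ("auto-resolve", fix) if fix else ("escalate", None)


def manage(alerts):
    """Filter, aggregate, then prioritize the incoming alerts."""
    return prioritize(aggregate(filter_normal(alerts)))
```

In this sketch, two tools reporting high CPU on the same host collapse into a single alert, and an alert inside the normal band never reaches the support team.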
Consuming Metrics: ignio’s cognitive feedback loop to reduce alerts
ignio uses its Smart Trigger functionality to identify dynamic thresholds for various metrics from different tools. These thresholds can be fed back to the monitoring tools to reduce false alerts by as much as 80%.
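How Smart Trigger derives its thresholds is not described here. As a rough illustration of what a dynamic threshold can look like, the sketch below uses a common statistical approach: the mean of recent samples plus a multiple of their standard deviation. The function names, the window size, and the multiplier `k` are all assumptions made for this example.

```python
from statistics import mean, stdev


def dynamic_threshold(history, window=24, k=3.0):
    """Return mean + k * sample standard deviation of the last `window` samples."""
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two samples to estimate a threshold")
    return mean(recent) + k * stdev(recent)


def is_anomalous(value, history, window=24, k=3.0):
    """Flag a value only if it exceeds the threshold learned from recent history."""
    return value > dynamic_threshold(history, window, k)
```

Because the threshold follows the recent trend of each metric, a system that normally runs hot does not trigger the same alert as one that is normally idle, which is the kind of adjustment a static threshold cannot make.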
Combined with ignio’s self-triaging capability, this produces a comprehensive dashboard of KPI health.
Figure 1: ignio can present a comprehensive dashboard
ignio builds this dashboard from its pre-built capabilities and the out-of-the-box KPIs that affect the performance of the applications' various technical layers. The dashboard displays the health of the applications and the underlying infrastructure layers. These views are refreshed periodically using the Incident Management capability and include a link to the PCA graph.
The benefits of this comprehensive dashboard are as follows:
- Localization of anomalies in a complex environment
- Reduced Mean Time To Resolution (MTTR), as the support team is presented with the actual issue and its probable cause
by Saravanan Kandasamy