Higher Business Agility, Resilience, and Customer Satisfaction with ignio AIOps
For a leading supermarket chain in the UK
ABOUT CUSTOMER
The customer is one of the oldest and largest supermarket chains in the United Kingdom. It has more than 1,500 stores in the UK and over the years has expanded into banking, catalog and online retail, and other industries.
VALUE REALIZATION
- Increased focus on strategic initiatives for ops team with reduced system noise and false alerts
- Minimized operational risk
- Improved operational stability, resilience, and business assurance
- Revenue losses averted with timely order fulfillment
- Improved customer experience with seamless operations of the retailer’s online stores
- Improved business availability
Business context
Retail chains with a vast number of locations rely on a large IT backbone to ensure all the systems are working seamlessly. Any issues in the IT infrastructure can result in downtime, which interferes with the stores’ functioning and in turn the customer experience. And because supermarkets operate with very thin profit margins, they can’t afford to lose dissatisfied customers. So they need a solution to ensure maximum IT uptime, with easier issue management and resolution.
The Challenges
As the customer grew, so did its IT landscape. It scaled significantly to ensure seamless operations. However, the implemented system monitoring tool was unable to monitor the entire estate effectively. This resulted in undetected system issues, which caused long outages and business interruptions.
In addition, the legacy monitoring tool was creating false, duplicate, or trivial alerts, and required manual ticket creation. This added to the command center team’s burden, requiring them to spend substantial time distinguishing false positives from genuine alerts and updating trouble tickets. The large number of incidents drove up the Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). Meanwhile, the customer was losing operational efficiency and resilience. Further, it was in danger of losing business due to delayed or stopped transactions.
The Problem - Ineffective Legacy Monitoring Tool and Heavy Dependency on Manual Operations
The customer’s IT landscape generated almost 100,000 alerts a year. A legacy IT monitoring tool captured these alerts and converted all of them to incident reports for the support team to resolve. About 50% of the alerts were unwanted.
The legacy monitoring tool was ineffective in false-positive suppression, filtering, and deduplication of alerts, leading to a substantial number of false alerts being converted to incidents and passed on to the command center team. This resulted in the team spending 3,800 hours per month on false alerts, which had a direct impact on the timely resolution of actual alerts impacting the business.
Dependency on tacit knowledge further affected the MTTD and MTTR, because key insights into problems weren’t being shared with the entire team. This caused operational instability with an impact on productivity, frequent outages, and system unavailability for longer durations.
ignio Solution
Digitate helped the customer review its infrastructure operations and understand the gaps in legacy monitoring tools, types of alerts generated, functioning of the IT landscape, and so on. ignio AIOps was customized to suit the needs of the customer.
ignio performs regular proactive health monitoring of several critical applications after being trained to handle over 150 alert types. ignio AIOps fetches the alerts from an IT system monitoring tool, de-duplicates them, eliminates false positives, and creates incidents only for the genuine alerts. It then triages the incidents automatically to identify probable causes and resolves the issues autonomously. As a result, the team cut the time it spent on false alerts by about half, from 3,800 hours to 1,860 hours per month, saving roughly 22,000 hours a year.
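The flow described above (fetch alerts, drop false positives, de-duplicate, raise incidents only for what remains) can be illustrated with a minimal sketch. This is not ignio's implementation: the `Alert` fields, the threshold-based false-positive rule, and the five-minute de-duplication window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # All field names are hypothetical; a real monitoring feed will differ.
    host: str
    check: str
    value: float
    threshold: float
    timestamp: float  # seconds since some reference time


def triage(alerts, dedup_window=300.0):
    """Return the alerts that should become incidents.

    Assumed rules (illustrative only):
    - an alert whose value is below its threshold is a false positive;
    - an alert for the same (host, check) arriving less than `dedup_window`
      seconds after the previous genuine alert for that pair is a duplicate.
    """
    last_seen = {}  # (host, check) -> timestamp of the latest genuine alert
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.value < alert.threshold:
            continue  # false positive: the metric never crossed its threshold
        key = (alert.host, alert.check)
        previous = last_seen.get(key)
        last_seen[key] = alert.timestamp
        if previous is not None and alert.timestamp - previous < dedup_window:
            continue  # duplicate of an alert that was already handled
        incidents.append(alert)
    return incidents
```

Fed four alerts, of which one repeats an earlier alert within the window and one never crosses its threshold, `triage` returns only the two that warrant an incident. A production system would use far richer rules than a single threshold comparison.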
Automating many repetitive tasks also freed up the support team to tackle less-critical alerts, which were impacting the performance of important technology components. ignio thus helped the customer reduce the time for detecting issues, resolving them, and in turn ensuring maximum availability of its IT landscape to support an extensive chain of stores.
The Problem - Store Link Downtime and Restoration Issue
The customer has about 270 online stores. Each is connected to a JMS (Java Message Service) link application from which it can fetch order fulfillment information such as product and payment details. Previously, whenever the link to the system went down, all 270+ stores would be affected and placed orders could not be retrieved. This delayed or threatened up to 100,000 transactions a day, averaging £90 (about $120 USD) each. To resolve a link issue, the application support team needed to log in and resolve the error manually before the system could restart. Efforts to restore the threatened orders cost the customer more than $500,000 (£375,000) a year in extra staff costs, while delays damaged the consumer experience.
ignio Solution
ignio AIOps continuously monitors the connection between the system and the stores to keep the links available. If the link goes down, ignio autonomously resolves the issue by restarting the server within minutes, preventing order delays and ensuring that the online stores can keep doing business. Because ignio keeps operations running seamlessly, the customer can deliver exemplary customer experiences and on-time order fulfillment.
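The monitor-and-restart behavior described above follows a common watchdog pattern, sketched below under stated assumptions. The `is_link_up` and `restart_server` callables are hypothetical placeholders for whatever health check and restart mechanism an environment provides, and the restart cap with escalation is an illustrative safeguard, not a documented ignio feature.

```python
import time
from typing import Callable


def watch_link(is_link_up: Callable[[], bool],
               restart_server: Callable[[], None],
               checks: int,
               interval_seconds: float = 0.0,
               max_restarts: int = 3) -> int:
    """Poll the link `checks` times and restart the server whenever it is down.

    Returns the number of restarts issued. Raises RuntimeError once the link
    is found down after `max_restarts` restarts have already been attempted,
    so that a human can take over (an assumed escalation policy).
    """
    restarts = 0
    for _ in range(checks):
        if not is_link_up():
            if restarts >= max_restarts:
                raise RuntimeError("link still down after repeated restarts; escalate")
            restart_server()
            restarts += 1
        time.sleep(interval_seconds)  # wait before the next health check
    return restarts
```

Passing the health check and the restart action as parameters keeps the loop independent of any particular messaging product, and makes it easy to exercise with simulated link states.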
ignio Benefits
- 99% improvement in Mean Time to Detect (MTTD)
- 80% reduction in Mean Time to Resolve (MTTR)
- ~80% elimination of unwanted noise
- 22,000 hours of effort saved annually
- 804 incidents resolved autonomously so far
- 90% downtime reduction, from 75 hours/month to 7.5 hours/month
- 10-percentage-point improvement in availability, from 89% to 99%
- Reduction in MTTR by over 90%
- 17% productivity gain
- $0.5 million in overhead costs saved
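The downtime and availability figures in the list above can be cross-checked with a short calculation. The sketch assumes an average month of 730 hours (365 days × 24 hours ÷ 12), a value that does not appear in the source.

```python
HOURS_PER_MONTH = 730.0  # assumption: 365 days * 24 hours / 12 months


def availability_pct(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Availability as a percentage of the period."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


before = availability_pct(75.0)  # downtime before: 75 hours/month
after = availability_pct(7.5)    # downtime after: 7.5 hours/month
downtime_reduction_pct = 100.0 * (75.0 - 7.5) / 75.0

print(f"availability before: {before:.1f}%")                # 89.7%
print(f"availability after:  {after:.1f}%")                 # 99.0%
print(f"downtime reduction:  {downtime_reduction_pct:.0f}%")  # 90%
```

Under that assumption, the results are in line with the reported move from 89% to 99% availability and the 90% reduction in downtime.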