AI offers many opportunities to assist enterprise IT operations. However, operationalizing AI means overcoming several challenges in translating theory into practice. In this blog, we present our experiences and design decisions in operationalizing AI for autonomous IT operations.
The AI fabric at a glance
The following figure presents the different layers of the AI fabric of an AIOps solution.
- The bottom-most layer consists of a library of data mining and machine learning algorithms. These are generic algorithms that are used to realize different use cases.
- The next layer is responsible for developing reasoning intelligence for AIOps. It uses the algorithms of the underlying layer in creative ways to develop reasoning patterns.
- The next layer focusses on the applicability of the reasoning intelligence. It ensures that AI insights are actionable, explainable, trustworthy, and easy to consume.
- The power of AI in the AIOps space can best be harnessed when it is complemented with human intelligence. The next layer focusses on developing augmented intelligence. It leverages GenAI to make the best use of the experience and intuition of the subject matter experts.
- The top layer uses all the underlying layers to operationalize various AIOps use cases.

Reasoning patterns
As the name suggests, reasoning intelligence develops the ability to reason about an enterprise. It can be broadly grouped into descriptive, diagnostic, predictive, and prescriptive reasoning. These reasoning patterns answer the questions ‘what is happening?’, ‘why is it happening?’, ‘what is about to happen?’, and ‘what is the best that can happen?’, respectively.
Consider an example of alert management. The reasoning patterns aim at answering questions such as ‘which alerts are occurring?’, ‘what is the root cause of these alerts?’, ‘which alerts are likely to occur in future?’, and ‘what corrective actions can be taken to resolve and prevent alerts?’.
Consider another example of process management. The reasoning patterns aim at answering questions such as ‘how are the business processes performing?’, ‘what is the root cause of a process delay or failure?’, ‘which processes are likely to be delayed or to fail in future?’, and ‘what corrective actions can be taken to complete the processes on time?’.
The reasoning patterns are developed using various AI/ML tools.
- Descriptive reasoning patterns are developed using algorithms such as context mining, process mining, time-series analysis, pattern mining, and NLP.
- Diagnostic reasoning patterns are driven by algorithms for anomaly detection, interesting subset discovery, classification, and Bayesian reasoning.
- Predictive reasoning patterns involve time-series forecasting, fault propagation models, discrete event predictions, hidden Markov models, and LSTMs.
- Prescriptive reasoning patterns are built using multi-objective optimization, linear programming, genetic algorithms, and simulation modelling.
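To make one of these building blocks concrete, here is a minimal sketch of a diagnostic-style anomaly check using a modified z-score based on the median and the median absolute deviation (MAD). It is only one simple way to flag anomalies and is not presented as the method used in any particular AIOps solution. The function name `mad_anomalies`, the threshold of 3.5, and the sample latencies are assumptions made for illustration.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds threshold.

    Hypothetical helper: uses median and MAD, which are less sensitive
    to the outliers themselves than mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Illustrative (made-up) response-time samples in milliseconds.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 250, 101, 100]
print(mad_anomalies(latencies_ms))  # flags index 7, the 250 ms sample
```

A check like this answers only "which points look unusual?". Turning that into a root cause would need the other diagnostic techniques listed above, such as classification or Bayesian reasoning.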
Applied AI
Business teams often struggle to make the best use of AI-driven insights. Hence, the next layer consists of various ways to improve the consumption of these insights. It focusses on making the insights actionable, explainable, accessible, and trustworthy.
- Actionable: Users often struggle to translate analytical observations into actionable recommendations. Closing this gap requires making the insights aware of the IT domain. Consider a simple example: analytics may stop at the observation that “a time-series shows an increasing trend”. This observation becomes a more usable insight once we know that the time-series is the CPU utilization of a server and that its values range between 0 and 100. The increasing trend can then be read as a risk of CPU saturation, with utilization likely to approach 100% in the next 3 months, and the insight can recommend that saturation be prevented for a year by increasing the compute capacity by 1.2 times.
- Explainable: Artificial intelligence-driven solutions often face resistance in their adoption by businesses. One of the major reasons for this resistance is the lack of explainability of AI insights. Explainability is essential not just to increase business adoption, but also to allow human experts to contribute better by training, giving feedback on, and course-correcting an AI model. There are several ways to enable explainable AI. Black-box algorithms can be complemented with post-hoc explainability methods that enable interpretability by analyzing the response function of a machine learning model. There are cases where a comprehensive explanation cannot be inferred, but even in such cases, techniques can be used to infer the contributions of different features to the output of an ML model.
- Accessible: With so much data being analyzed, users face the challenge of insight fatigue. They are overwhelmed by the amount of information available and find it difficult to sift through the many analytical observations to identify the relevant ones. Data storytelling is a modern approach to bridging this gap. It links similar insights together to create high-level data stories and uses recommendations to adapt these insights for the end user.
- Trustworthy: This is another important dimension to boost AI adoption. Some of the effective ways to drive trustworthiness are data quality assessment, guardrails to avoid data bias, monitoring AI accuracy, assessing data drift, and explainable AI.
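The CPU utilization example under "Actionable" can be sketched in a few lines. The code below is a hedged illustration, not the approach of any specific product: it fits a straight line to monthly utilization, estimates when a limit will be reached, and derives a capacity scale factor. The function names, the 90% limit, the 12-month horizon, and the sample data are all assumptions, as is the simplification that utilization falls in inverse proportion to added capacity.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def capacity_insight(cpu_pct, limit_pct=90.0, horizon_months=12):
    """Turn a raw utilization trend into a domain-aware recommendation.

    Hypothetical helper. Assumes monthly samples, a linear trend, and
    utilization that scales inversely with compute capacity.
    """
    slope, intercept = fit_trend(cpu_pct)
    current = intercept + slope * (len(cpu_pct) - 1)  # fitted latest value
    if slope > 0:
        months_to_limit = max(0.0, (limit_pct - current) / slope)
    else:
        months_to_limit = None  # no upward trend, no projected saturation
    projected = current + slope * horizon_months
    # Capacity multiplier that keeps projected utilization at or below the limit.
    capacity_scale = max(1.0, projected / limit_pct)
    return {
        "slope_pct_per_month": slope,
        "months_to_limit": months_to_limit,
        "capacity_scale": capacity_scale,
    }


# Illustrative (made-up) monthly average CPU utilization in percent.
monthly_cpu = [70, 72, 74, 76, 78, 80]
print(capacity_insight(monthly_cpu))
```

With this sample data the fitted trend is 2 percentage points per month, the 90% limit is reached in 5 months, and a capacity increase of roughly 1.16 times keeps utilization under the limit for the next 12 months. The domain knowledge (the 0 to 100 range, the meaning of saturation, the effect of adding capacity) is what converts the bare trend into a recommendation.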
Augmented intelligence for ITOps
This layer focusses on finding creative ways of human-machine interaction such that AI-driven insights can make the best use of human intuition and experience. The advances in generative AI have accelerated the development of this layer. Generative AI can transform the human-in-the-loop experience of the first mile and the last mile of autonomous IT operations.
- Knowledge Accelerator: The first mile of autonomous IT operations relies on comprehensive knowledge for modeling IT operations. GenAI can transform this first mile process by creating a knowledge accelerator to capture enterprise context and generate automation scripts for various operations such as resource provisioning, service configurations, patch management, and so on. This allows us to easily adapt to technological changes and accelerate the automation of lifecycle-specific service operations.
- AI Assist: The last mile of autonomous IT operations requires human involvement to validate actions, provide guidance when exceptions occur, and consume insights for continuous improvement. GenAI can transform this last mile by creating an intelligent assistant to drive intelligent conversations. The assistant leverages GenAI’s ability to understand language, capture user context, and learn from feedback. As a result, the analytics insights can be consumed in a much simpler and more intuitive way, leading to higher AI adoption, faster incident resolution, and proactive problem management.
AIOps use cases in IT
AI has a lot to offer in the space of AIOps. AI opportunities are present in various use cases such as event management, incident management, performance and capacity management, transformation planning, and many more. However, most organizations lack a systematic journey to operationalize AI. This layer addresses this problem by creating AI-powered use cases for IT operations. These use cases can be broadly grouped into four categories:
- Understand the behavior: These use cases focus on baselining the normal behavior of the system. The use cases detect trends, patterns, changes, anomalies, and problem signatures. They form the basis of advanced analysis such as localizing the root cause of incidents, predicting future events, assessing the impact of change, and so on.
- Derive systemic insights: These use cases focus on deriving systemic insights with a specific focus, such as capacity analysis, risk assessment, change management, and so on.
- Operationalize insights: These use cases operationalize the insights derived from historical behavior for specific operations such as event management and incident management.
- Drive transformation: These use cases leverage the predictive and prescriptive capabilities to predict future behavior, plan for change, and drive transformation plans.
Conclusion
AI has a lot to offer to enterprise IT operations. However, harnessing the real power of AI requires addressing various rubber-meets-the-road challenges. Once these are addressed, AI can not only improve the efficiency and effectiveness of the AIOps solution, but also help drive transformation strategies through adoption by the business teams. This technology holds great potential and hence demands continuous exploration with ethical use and responsible development.