5 Key Factors of a Successful Monitoring Program


Despite spending hundreds of thousands of dollars on monitoring tools, IT organizations continue to find themselves taking a reactive approach to service disruptions instead of a proactive one. It leads many to ask: how do high-performing organizations do it so well? We use the same tools they do, yet they seem to get better results.

The Information Technology Infrastructure Library (ITIL) v3 introduced the event management process to provide a procedural framework for managing an event once it is identified. However, ITIL v3 does not address some of the key factors essential for maximizing the monitoring capabilities that feed the event management process.

Although monitoring may be viewed as the “nerve center” for IT operations, poorly designed monitoring capabilities can pose significant challenges for many IT organizations. These five key factors can help an IT organization improve its monitoring management capabilities and maximize their related value:

1.    Define a Monitoring Framework

Many organizations apply an IT domain-specific approach to monitoring, contributing to excessive costs, a large portfolio of tools, and limited integration between them. Many of these tools overlap in functionality yet are each used for a single domain's purposes.

A fundamental approach to addressing this challenge is to apply a monitoring framework when defining and executing your monitoring strategy. A monitoring framework provides a holistic strategy for understanding, deploying, and using IT monitoring tools. The intent of the framework is to maximize the flexibility of monitoring systems necessary to meet the needs of various IT support groups who depend upon these monitoring systems. This holistic strategy helps an IT organization create a context for each deployed product, and set enterprise-wide expectations for functionality and integration.

The framework is reusable across any number and type of monitoring tools and serves as the glue between element managers (e.g. vendor- or device-specific tools), a centralized console, IT process improvement tools, and process-enabling technologies. The framework dictates the flow of events, enabling operators to view data, perform triage on the information received, and execute the most appropriate action. As the operator moves from data to information to knowledge, the issue being solved changes from a one-dimensional matter of individual events to one of context and experience.

A monitoring framework provides four key benefits: facilitating planning for future monitoring capabilities, establishing a standard set of integrated tools, eliminating similar point solutions, and improving utilization of existing monitoring investments.

2.    Design and Integrate an Event Management Process

The primary goal of event management is to enable a proactive response to signals of expected or unexpected status from applications or infrastructure. A sound, repeatable event management process gives an organization the ability to quickly detect events, understand them, and decide on an appropriate action before an incident or service disruption occurs. This approach allows the business to benefit from more effective and efficient service management.

Without a defined process for handling and managing events, organizations find it difficult to manage, continually improve, and demonstrate the value of their monitoring capabilities along with the services being monitored.

The key elements of the process that should be considered during design include:

Transactional flow

Most organizations understand transactional flow (a.k.a. work flow). However, many organizations believe the work flow is the process. Each process will likely support and enable many different workflows. For example, each event model may have a different work flow (e.g. an application event vs. a network event). In addition, the work flow will differ for each event type within the model (e.g. an alert event vs. an informational event). Organizations that focus on the details of individual workflows during process design often experience delays and stagnation in implementing the process. Acknowledging and understanding the distinction between process design and work flow design is essential to accelerating process design and implementation.
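To make the distinction concrete, the separation of a single process from its many workflows can be sketched as follows; the event models, event types, and actions here are hypothetical illustrations, not a prescribed implementation:

```python
# Hypothetical sketch: one event management process, many workflows.
# The process (detect -> triage -> act) stays fixed; the workflow chosen
# depends on the event model (e.g. application vs. network) and the
# event type within that model (e.g. alert vs. informational).

def application_alert_workflow(event):
    return f"page application on-call for {event['source']}"

def application_info_workflow(event):
    return f"log {event['source']} event for trend reporting"

def network_alert_workflow(event):
    return f"open incident for device {event['source']}"

# One registry keyed by (model, type); the process itself never changes.
WORKFLOWS = {
    ("application", "alert"): application_alert_workflow,
    ("application", "informational"): application_info_workflow,
    ("network", "alert"): network_alert_workflow,
}

def handle_event(event):
    """The event management process: detect, triage, then dispatch."""
    workflow = WORKFLOWS.get((event["model"], event["type"]))
    if workflow is None:
        return "route to default triage queue"
    return workflow(event)

print(handle_event({"model": "network", "type": "alert", "source": "core-sw-01"}))
```

Designing the process means designing `handle_event`; the individual workflow bodies can be added and revised later without stalling the process itself.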

Process methodology

To be effective, a process should be understood and assimilated throughout the entire organization. While individual contributors may be the primary executors of the workflows within the process, IT leadership should understand the process at a level of detail necessary to make improvement decisions.

A process methodology should be consumable: easy for all members of the organization to access and understand (i.e. seeing the big picture while enabling drill-down into the detail needed for their role). It should be actionable: the individual contributor knows exactly how to perform a given procedural step, while management is able to quickly identify process improvement opportunities. And it should be manageable: the process documentation is accompanied by policies and supported by a measurement framework, providing a foundation for continual improvement.

These attributes facilitate an organization's adoption, adaptation, and assimilation of the process. As a result, an organization is more likely to consistently apply their processes.

Data consumption and creation

While a key objective of a process is the ability to effectively and efficiently execute transactions, a point that is often overlooked is the data both used and created by the process during execution. The ability to improve depends on the ability to measure, which depends on quality data, which in turn depends on consistent execution of the process by all contributors. An organization should therefore focus on more than just the transaction; it should also address the data consumed and generated by it. When designing an event management process, keep the end in mind: consider both the data generated to manage the monitored services and the data needed to demonstrate the effectiveness, efficiency, and value of monitoring capabilities.

Process integration

It is important to ensure the event management process is integrated with the other IT service management disciplines. The table below summarizes the recommended integrations.

ITIL Lifecycle Phase / ITIL Process / Suggested Integrations

Strategy
  Strategy management
    • Applying major continual improvement decisions
    • Aligning monitoring strategy with business and technology strategies
  Demand management
    • Providing insight into capacity demands

Design
  Service level management
    • Establishing notification and alert thresholds
    • Service level reporting and dashboards
  Capacity management
    • Setting capacity-specific notification and alert thresholds
  Availability management
    • Setting availability-specific notification and alert thresholds
    • Providing availability-related trends
  Continuity management
    • Detecting potential disruptions to ensure operational resiliency of high-availability services

Transition
  Change management
    • Assessing the impact of changes on existing monitors
    • Establishing a base configuration for identifying potential unauthorized changes
  Release management
    • Performance testing of a release prior to deployment
    • Updating monitors as part of a release
  Configuration management
    • Audit and verification of configuration management database (CMDB) accuracy

Operations
  Incident management
    • Proactive identification of incidents
  Problem management
    • Aiding identification and isolation of technical root causes
    • Providing information for proactive problem management

3.    Apply Service Design Disciplines to Monitoring Services

Many IT organizations simply let each IT domain decide the type and nature of events that are important, without considering the actionable outcomes required. It is common for each IT domain to approach monitoring with an all-inclusive mindset, where all events are surfaced in the monitoring tool(s) and then scaled back if and when necessary. This approach requires a constant event configuration effort that often falls to the bottom of the “to do” list.

A service design approach that includes exception-based alerting addresses these issues by ensuring event models are created, deployed, and revised in a controlled manner. Monitoring should be designed as a service design package (SDP): a collection of documents that define and document a single monitoring service. Initial monitoring SDPs should, at a minimum, cover network, server, storage, database, and key applications.
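As a minimal illustration of exception-based alerting (the metric names and thresholds here are assumptions for the sketch), only events that breach a defined exception condition are surfaced to the operator, rather than every event being shown and scaled back later:

```python
# Illustrative sketch of exception-based alerting: only samples that
# breach a defined exception condition reach the operator console.
# Metric names and threshold values are assumptions, not standards.

THRESHOLDS = {
    "cpu_pct": 90,        # raise only above 90% CPU utilization
    "disk_used_pct": 85,  # raise only above 85% disk usage
    "latency_ms": 500,    # raise only above 500 ms response time
}

def exceptions_only(samples):
    """Yield only the samples that breach an exception threshold."""
    for sample in samples:
        limit = THRESHOLDS.get(sample["metric"])
        if limit is not None and sample["value"] > limit:
            yield {**sample, "severity": "alert"}

samples = [
    {"metric": "cpu_pct", "value": 45, "host": "db01"},
    {"metric": "cpu_pct", "value": 97, "host": "db02"},
    {"metric": "latency_ms", "value": 120, "host": "web01"},
]
print(list(exceptions_only(samples)))  # only db02 breaches its threshold
```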

An SDP should address the following areas:

  • Service identification and ownership – Many IT organizations struggle to define the ultimate owner of the monitoring service. Simply defining and assigning an owner enables those who are responsible to approach monitoring in a coordinated manner.

  • Service goals and objectives – These should be tailored towards availability, performance and support capacity and configuration management. Monitoring data is often used to support several processes of service operations and transition. These objectives should be stated to ensure valuable data is used effectively.

  • Requirements – At a minimum, requirements should include capabilities (i.e. what is monitored and how), operational requirements (i.e. how the service is delivered), interface requirements, and service monitoring requirements (i.e. ensuring the monitoring service itself is operating as expected). These requirements form the backbone of the monitoring service, detailing how a configuration item (CI) is monitored and what operational procedures are required when an event occurs. Requirements within an SDP need careful planning. The exercise of analyzing requirements to ensure the correct type and severity of event is created is known as event modeling. Models contain all the necessary application and infrastructure monitoring configurations. At a minimum, an event model should include:

    • Function (i.e. what needs to be monitored)

    • Outcome (i.e. what needs to happen when an event is received?)

    • Level of importance (severity) related to service hours (e.g. critical, important, useful and low)

    • Capability (i.e. event, performance, capacity)

    • Name of the monitoring tool used

    • Policy or configuration name (e.g. the actual tool configuration template or policy used to ensure the event is realized)

  • Implementation strategy – Testing of a proposed monitoring tool or configuration is critical to ensuring that the right events will be created under the appropriate conditions. Things that should be considered: How will the monitoring be tested? What are its use cases? And how will test events be generated? The implementation strategy should also discuss how the monitoring software should be installed and/or configured and what should happen if for any reason the implementation fails.

  • Process enablement and integration – Monitoring is a subset of event management, and the SDP for the monitoring service should define which processes will integrate through it. At a minimum, most organizations will look to integrate incident and capacity management. Event management, implemented correctly, can bring great efficiency to the incident and capacity management processes. Conversely, implemented poorly, it may create a large number of unnecessary incidents.

  • Organization readiness – Any new tool or monitor will require training so the operations bridge can properly act on defined events. Launching a monitoring service without adequate training can have disastrous effects: it is all too easy to introduce new events for which the operations bridge is unaware of the operational procedures it should execute.
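The minimum event model fields above can be captured as a simple record; the field values below are hypothetical examples, not any particular tool's configuration:

```python
# Hypothetical example of an event model record capturing the minimum
# fields an SDP should define for each monitored condition.

from dataclasses import dataclass

@dataclass
class EventModel:
    function: str    # what needs to be monitored
    outcome: str     # what must happen when the event is received
    severity: str    # critical, important, useful, or low
    capability: str  # event, performance, or capacity
    tool: str        # name of the monitoring tool used
    policy: str      # tool configuration template or policy name

disk_full = EventModel(
    function="file system utilization on database servers",
    outcome="create incident and notify the storage team",
    severity="critical",
    capability="capacity",
    tool="ExampleMonitor",          # hypothetical tool name
    policy="fs-utilization-95pct",  # hypothetical policy name
)
print(disk_full.severity)
```

Keeping event models in a structured form like this makes the controlled creation, deployment, and revision described above auditable.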

Any monitoring service should be an integral part of the design and support of the IT organization's services delivered to the business, where the creation or modification of monitoring services works in tandem with the overarching IT service design to ensure adequate monitoring capabilities are in place.

4.    Plan and Execute a Continuous Improvement Program

Since a key objective of monitoring capabilities is the proactive avoidance of potential service disruptions, organizations often find it challenging to demonstrate the continual improvement and value of these capabilities. A defined, structured continual improvement program (CIP) enables the governance, coordination, and valuation of any improvement efforts.

Foundational to any CIP is a defined measurement framework. A measurement framework for capability improvements will consist of critical success factors (CSFs), key performance indicators (KPIs), and operational measures (OMs) used to manage the effectiveness, efficiency, and compliance or assimilation of the operational management capability (i.e. event management).

  • Critical success factors are aligned with the current period's organizational goals and demonstrate the progress towards attaining these goals. Examples of efficiency-related CSFs include minimizing the cost associated with event or monitoring management, or maximizing the efficiency of handling alerts.

  • Key performance indicators have specific targets associated with performance. For example, the false alert rate: the percentage of alerts that are determined to be false after analysis. Generally, multiple KPIs contribute to a specific CSF.

  • Operational measures are typically volume-related measures (i.e. the number of alerts and the number of false alerts). The OMs are the foundational measures for the KPIs and contribute to the KPI formula.

When establishing a measurement framework, it is important to keep the end in mind. What are your improvement goals? How will you demonstrate attainment of these goals? How will you demonstrate value from attaining these goals? The key is to measure to improve not just to prove. Your measures should demonstrate progress in your CSFs and KPIs; avoid focusing solely on OMs.
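As a simple sketch of how OMs feed KPIs (the measure names, volumes, and target below are assumptions), the false alert rate KPI can be derived from two volume-related OMs and compared against its target:

```python
# Sketch of how operational measures (OMs) feed a KPI, which is then
# judged against a target supporting a CSF such as "maximize the
# efficiency of handling alerts". Names, volumes, and the target
# are illustrative assumptions.

operational_measures = {
    "alerts_total": 400,  # OM: number of alerts this period
    "alerts_false": 36,   # OM: alerts determined false after analysis
}

def false_alert_rate(oms):
    """KPI: percentage of alerts determined to be false after analysis."""
    return 100.0 * oms["alerts_false"] / oms["alerts_total"]

KPI_TARGET_PCT = 10.0  # assumed target for the period

rate = false_alert_rate(operational_measures)
print(f"false alert rate: {rate:.1f}% (target <= {KPI_TARGET_PCT}%)")
print("on target" if rate <= KPI_TARGET_PCT else "improvement needed")
```

Note that the OMs alone (400 alerts, 36 false) prove activity; only the KPI and its target demonstrate improvement.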

5.    Establish a Governance Structure

A well-managed governance structure establishes the organization's vision and strategy; accelerates the organization's alignment, adoption and assimilation of the strategy; and balances the maturation and improvement of the monitoring capabilities in accordance with business objectives and constraints.

Lack of governance over event management and monitoring often leads to excessive tool expenses, segregated tool and process capabilities, and contention between IT domains. For example, allowing technical domains to make independent tool decisions often contributes to an excessive number of tools with limited organization-wide capability. A domain may be well monitored, yet not integrated and correlated with other domains for end-to-end service monitoring. In addition, by not including monitoring during the design or modification of a service, the effectiveness and efficiency of the monitoring will be limited.

An effective and efficient governance framework will include a formally chartered event management committee with executive level membership representing the design, build, and run domains of the IT organization (e.g. application and infrastructure). This committee should be part of a larger IT service management steering committee that governs the entire IT service lifecycle to ensure proper alignment, integration, and maturation of the organization's overall capabilities.

Finally, ensure you measure the effectiveness and efficiency of your governance program. These measures will help you manage the program and mitigate its potential bureaucratic tendencies.

Maximize the Value of Your Monitoring Investments

By applying these five factors, you will improve your IT organization's ability to maximize the value of its monitoring investments and transform its event management capabilities, enhancing its ability to proactively detect potential service degradation and disruptions. Once implemented, the results provide a direct path to achieving value from any monitoring solution, highlighting where efforts and investments should be made. Ultimately, this should improve quality of service and increase business value.

Always keep in mind that your monitoring capabilities require strong integration of services, people, processes, and tools. The best way to get the most out of your monitoring capabilities is to treat them as a continuous process of improvement across all four areas. Finally, your monitoring capabilities should evolve along with your organization's IT services and should always be aligned with your company's objectives.
