Operational Resiliency: The New BC/DR


Companies used to focus predominantly on the disaster recovery part of business continuity and disaster recovery (BC/DR), such as preparing for earthquakes or tornadoes that could affect their corporate headquarters and on-premises data centers. Some companies simply conducted business as usual without thinking about disaster recovery at all, let alone business continuity.

But today’s corporate world has grown much more competitive, more complex, more interconnected and more dependent on technology. That raises the potential for risks and vulnerabilities at the same time that employees, shareholders and customers increasingly expect real-time access and the workforce becomes more mobile. Today, companies shouldn’t just worry about weather events or man-made disruptions that fall under disaster recovery. They should also be concerned about data security breaches and simple technology breakdowns that could result in lost revenue, lower employee productivity, a damaged corporate reputation or regulatory fines. In addition, the overall marketplace today is simply less forgiving of any type of business or technology failure.

One of the key drivers of this change in how companies look at BC/DR is that customers and employees have higher expectations in today’s business environment. Mobile technology has raised expectations for how customers communicate with companies and how employees work. Employees are constantly connected, and customers can search for information, transfer funds and complete all kinds of business transactions whenever and wherever they choose. Tolerance for any downtime is rapidly declining, regardless of the circumstances.

Another driving influence is that competition is stronger. From trading stocks to buying products to choosing healthcare providers, the cost of switching from one company to another is lower today. Reliability and customer service remain two of the few areas of sustainable market advantage. Technology is the backbone of both.

The shift to the cloud, or the movement toward the hybrid data center model, is another key driver. Today, business processes may run on IT infrastructure that lives in conventional enterprise IT systems, on-premises private clouds and third-party off-premises infrastructure.

Additionally, a growing number of industry and government regulations and standards require businesses, directly or indirectly, to be prepared, and they set expectations for very high system uptime. For example, in the financial services industry, trades must be cleared by the end of the business day, or companies face stiff penalties.

A SNAPSHOT: From Traditional Disaster Recovery to Forward-Thinking Operational Resiliency

Operational resiliency is bigger in scope than traditional business continuity and disaster recovery. More people are involved, and they are not just from IT. Here is a breakdown of the differences:

 

Characteristics    | Disaster Recovery            | Operational Resiliency
Focus              | IT                           | The entire company
Purpose            | IT recovery                  | Business continuity
Approach           | Reactive, one-off            | Strategic, proactive
Scope              | Disaster                     | Any business interruption
Responsibility     | Chief information officer    | Chief information officer, senior leadership team and board of directors
Business rationale | Meet regulatory requirements | Essential to business success

 

A More Strategic, Comprehensive Approach

Top executives are recognizing that they should approach BC/DR in a more strategic and comprehensive way, one that makes their companies stronger even when there is no crisis. According to Forrester Research, a global research and advisory firm, 27 percent of enterprises say significantly upgrading BC/DR capabilities is a critical priority, while another 41 percent say it is a high priority.

As a result, many companies are turning to a new best practice called operational resiliency. This concept broadens BC/DR beyond just IT recovery, and even IT resiliency, to overall operational resiliency for a company. Operational resiliency is impossible to achieve without incorporating both technology and business processes, so it requires a fundamental shift in how IT leaders work with other business executives to develop and implement a company-wide strategy. The goal is to make the company more resilient, whether it encounters a small or large disruption.

Why Operational Resiliency Is Within Reach

While the reliance of companies and their stakeholders on technology is quickly making operational resiliency a strategic necessity, technology also makes this goal increasingly possible and affordable. Technology capabilities continue to grow exponentially as prices plunge. For example, basic amounts of data storage used to cost thousands of dollars but now run just 50 cents. Tools also exist now to map, analyze and monitor business processes from start to finish, including suppliers, and to identify disruptions as they occur. In addition, transformational technologies like cloud computing and virtualization give companies the opportunity to become much more flexible and responsive.

However, not every business process and system needs to be equally resilient. If companies can ensure that key business processes continue, they may be able to extend the recovery time objective (RTO, how quickly a process must be restored) and recovery point objective (RPO, how much recent data the business can afford to lose) of other processes and IT systems, which would save money overall. By assessing and prioritizing operations as a whole, companies can strike a balance that keeps costs in check.

To achieve operational resiliency, companies should first make it a corporate imperative, not just an IT department initiative. This priority should be embedded into the entire technology and business flow, from sourcing to business processes to IT infrastructure and applications. It should serve as a key filter for making business and IT decisions, whether it be choosing suppliers or selecting new servers. To ensure ongoing operational resiliency, companies should create a culture in which IT and business units work together to proactively anticipate, manage and incorporate ever-changing technologies, business requirements, potential risks, data dispersion, growth opportunities, and best practices.

The 3-Phase Approach to Operational Resiliency

Phase I: Determining Functional Requirements

Building an operational resiliency culture begins with establishing the desired endpoint: keeping specific critical operations running despite IT outages or other disasters. This effort should include top-level decision-makers who have a broad and deep understanding of the company, so it is generally led by the chief information officer (CIO) or chief technology officer (CTO) along with a team of senior business leaders.

Keep in mind that critical operations vary from company to company. For example, a hedge fund may prioritize its systems for reconciling trades while putting its payroll system on the back burner. Or a restaurant chain needs to record its daily receipts and place orders each night for the next day’s food deliveries, but it can delay ordering new staff uniforms to another day.

To set functional requirements, begin with a business impact analysis that determines the key processes and how they are interdependent, and sets a priority order for those functions. Key considerations when evaluating operations are how much revenue is connected to each process and to what degree customers will be affected. For example, keeping ATMs operating and stocked with cash would be a top concern for a bank since customers expect access to their money whenever they want it.
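One way to picture the business impact analysis is as a simple scoring exercise. The Python sketch below is illustrative only: the process names, revenue figures, 1-to-5 customer-impact scale and equal weighting are all hypothetical assumptions, not values from any real assessment. It scores each process on revenue and customer impact, then raises the score of any process that a more critical process depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    daily_revenue: float   # revenue tied to the process per day (hypothetical)
    customer_impact: int   # 1 (barely noticed) to 5 (customers blocked)
    depends_on: list[str] = field(default_factory=list)


def impact_score(p: Process, max_revenue: float) -> float:
    """Blend revenue share and customer impact into a 0..1 score (equal weights assumed)."""
    revenue_part = p.daily_revenue / max_revenue if max_revenue else 0.0
    customer_part = p.customer_impact / 5
    return 0.5 * revenue_part + 0.5 * customer_part


def prioritize(processes: list[Process]) -> list[str]:
    """Return process names, most critical first.

    A dependency inherits the score of the process built on top of it,
    because an outage in the dependency stops that process too.
    """
    max_revenue = max(p.daily_revenue for p in processes)
    scores = {p.name: impact_score(p, max_revenue) for p in processes}
    # Propagate scores down dependency chains until nothing changes.
    changed = True
    while changed:
        changed = False
        for p in processes:
            for dep in p.depends_on:
                if scores[dep] < scores[p.name]:
                    scores[dep] = scores[p.name]
                    changed = True
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical bank: ATMs rely on the core banking system.
processes = [
    Process("atm_network", 100_000, 5, depends_on=["core_banking"]),
    Process("core_banking", 20_000, 3),
    Process("payroll", 0, 1),
]
ranking = prioritize(processes)
print(ranking)
```

In this made-up example the core banking system would rank low on its own numbers, but it moves up alongside the ATM network because the ATMs cannot run without it. A real analysis would use the company's own criteria and weights.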

As a next step, map each process and understand the infrastructure and applications that enable it. Then, set target service levels, both application uptime and RTOs and RPOs, and identify the gap in current capabilities for reaching that target service level. Because of their importance, most ATMs are architected with dual systems and dual data centers, so if one location goes down, there is an automatic switchover to the other. Of course, not every process demands that level of redundancy to prevent downtime.
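The gap between target and current service levels can be laid out as a small comparison. The sketch below is a hypothetical illustration: the process names and hour values are invented, and RTO and RPO are expressed in hours for simplicity.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevel:
    rto_hours: float   # recovery time objective: time allowed to restore service
    rpo_hours: float   # recovery point objective: window of recent data that may be lost


def service_gap(target: ServiceLevel, current: ServiceLevel) -> ServiceLevel:
    """Hours by which current capability misses the target (0 means the target is met)."""
    return ServiceLevel(
        rto_hours=max(0.0, current.rto_hours - target.rto_hours),
        rpo_hours=max(0.0, current.rpo_hours - target.rpo_hours),
    )


# Hypothetical targets and measured capabilities for two processes.
targets = {"atm_network": ServiceLevel(0.0, 0.0), "payroll": ServiceLevel(72.0, 24.0)}
current = {"atm_network": ServiceLevel(4.0, 1.0), "payroll": ServiceLevel(48.0, 24.0)}

gaps = {name: service_gap(targets[name], current[name]) for name in targets}
for name, gap in gaps.items():
    print(f"{name}: RTO gap {gap.rto_hours} h, RPO gap {gap.rpo_hours} h")
```

A nonzero gap marks where investment is needed; a zero gap, as for payroll here, suggests the current capability already meets (or exceeds) what the business requires.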

Phase II: Developing the Strategy

With a firm foundation from the functional requirements phase, the next phase is to architect a solution. The CIO, CTO and other business executives leading the effort would develop and evaluate several strategic options (e.g., building a new data center compared to relying on a cloud provider for backup), weighing the uptime each option achieves against its costs and risks and illustrating each with functional diagrams. Then, the team should make a recommendation to the broader leadership team.

The solutions should not focus on technology uptime, but instead on productivity or production uptime; in other words, business uptime. Many companies running complex enterprise resource planning (ERP) software, for example, cannot afford to shut down production for 12 to 24 hours once a month for upgrades or patches, so they invest in two complete environments and alternate between them. Each system is shut down for software updates only every other month while the other keeps production running.
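To put the monthly maintenance window in business-uptime terms, a quick calculation helps. The sketch below assumes a 30-day month (720 hours) and treats the maintenance window as the only downtime; both are simplifying assumptions.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, 720 hours


def availability(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Fraction of the period during which production is running."""
    return 1 - downtime_hours / period_hours


best_case = availability(12)   # 12-hour monthly maintenance window
worst_case = availability(24)  # 24-hour monthly maintenance window
print(f"{best_case:.2%} to {worst_case:.2%}")  # prints "98.33% to 96.67%"
```

Under these assumptions, a single environment with a monthly shutdown cannot do better than roughly 97 to 98 percent production uptime, which is the shortfall the second environment is meant to remove.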

So the solutions would aim to make the technology as resilient as possible, such as building two call centers so that if one goes down, calls can be quickly rerouted to the other. They will also likely include lower-tech steps, such as the restaurant phoning in its next-day food order when it cannot order online. Once the team has validated the chosen option with key stakeholders and secured budget approval, it builds a strategy implementation roadmap.

Phase III: Implementing the Strategy

At this point, focus on building and then executing detailed designs and implementation project plans for operational resiliency. Because of the dynamic nature of business, the team should put in place processes and safeguards that ensure operational resiliency is considered and maintained whenever business processes or IT change.

In addition, create an operational recovery plan for IT disasters, including holding recovery exercises to see how well the plan works. However, with the goal of operational resiliency, the traditional approach to disaster recovery is no longer adequate.

For example, it is common from an IT perspective to track the recovery time of a disruption, beginning with the official declaration, which occurs anywhere from two to 10 hours after the actual interruption takes place. In setting a higher performance bar with operational resiliency, the clock should start ticking when the outage occurs. Starting with the occurrence more accurately reflects the true business continuity gap during a disaster—the difference between business requirements and business capabilities. And occurrence is the measure that customers, employees and other stakeholders will use.
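The difference between the two starting points is easy to see with a worked timeline. The timestamps and the 10-hour RTO below are hypothetical, chosen only to illustrate how the same incident can pass one measure and fail the other.

```python
from datetime import datetime, timedelta

# Hypothetical incident timeline.
outage_started = datetime(2024, 3, 1, 2, 0)     # interruption actually occurs
disaster_declared = datetime(2024, 3, 1, 6, 0)  # official declaration, 4 hours later
service_restored = datetime(2024, 3, 1, 14, 0)

it_recovery_time = service_restored - disaster_declared  # traditional IT measure
business_downtime = service_restored - outage_started    # what stakeholders experience
unmeasured = business_downtime - it_recovery_time        # time hidden by the declaration lag

rto = timedelta(hours=10)  # assumed target
print("Recovery time from declaration:", it_recovery_time)
print("Recovery time from occurrence: ", business_downtime)
print("Meets RTO from declaration:", it_recovery_time <= rto)
print("Meets RTO from occurrence: ", business_downtime <= rto)
```

In this example the recovery takes 8 hours when measured from the declaration and appears to meet the target, yet the business was actually down for 12 hours. The 4-hour difference is exactly the delay between the outage and its declaration.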

Technology and Business Linked

With technology and business inextricably linked, an IT crisis today also means a crisis for the business. As a result, companies should take action to become stronger and build operational resiliency so they can remain up and running during any type of crisis. This level of operational performance requires a new mindset and bold action by senior management. It starts with making resiliency a cornerstone of corporate strategy and embedding risk management into all technology and business decisions that a company makes every day.
