IT Focus Area: Infrastructure Optimization
December 5, 2011
5 Steps to Storage Sanity
Organizations of all sizes are desperate for ways to control exploding data storage growth, thanks in part to the digitization of forms, images, and video, and increased use of messaging and social networking applications. Much of this information has to be stored for longer and longer periods—sometimes forever—thanks to numerous regulations, and has to be backed up and protected at a time when backup and maintenance windows are disappearing.
According to International Data Corporation (IDC), worldwide digital information has grown five-fold in just four years, and will reach 2,500 exabytes by 2012. Most organizations have seen data growth rates of anywhere from 30 to 100 percent or more annually. The capital (CAPEX) and operational (OPEX) expenses of purchasing new storage constantly and keeping backup and disaster recovery systems up to date have become a significant burden on information technology (IT) budgets.
There are many emerging technologies and products available to help organizations tame the data storage beast. Most IT departments have heard of deduplication, virtualization, thin provisioning, and continuous data protection. In a time of understaffed IT departments with limited budgets, however, the rapid progress in storage technologies can be daunting. Many struggle with challenges such as:
How to understand and track all the new technologies that impact storage management and backup.
How to decide which technologies and products make sense for an organization’s business requirements and IT infrastructure.
How and when to introduce new technologies and address backup issues without generating excessive CAPEX and OPEX costs and disrupting operations.
How to cope with the time and expense of training IT staff to work with these new technologies.
Addressing these issues can be extremely challenging given the blistering pace of storage growth and the complexity of the solutions for addressing it. However, there is a viable, sane strategy organizations can follow to make the job easier.
Step 1: Understand Your Data
Many organizations pay a lot more for expensive high-performance storage hardware than they need to. Why? Because they don’t understand their data. In many cases, only a portion of organizational data is mission critical enough to be stored on the highest-performing storage. Many organizations store a much higher percentage of data this way than is necessary.
Before addressing strategies for managing storage, backup, compliance, and disaster recovery, it makes sense first to understand your data. What types of data are used and stored in your organization, where does that data come from, where does it go, where is it stored, and how is it protected?
It’s important to take into account both structured data typically stored in relational databases and the growing volume of unstructured data used by organizations’ file, messaging, and media servers. Don’t forget mobile data. Often data stored on dispersed mobile devices is some of the most recent and important data an organization possesses.
Once IT understands where data resides and how it moves through the infrastructure and organization, it can start assessing its relative importance. For example, evaluate which data is mission critical and must have the highest level of access, performance, security, and protection. Frequently this is data that is less than 24 hours old, such as the day’s trades in a financial firm. Also consider which data is important but not strategic and can have a lower level of performance and protection. Frequently this is slightly older data—between 24 and 72 hours old, for example. And finally, identify which data would make up another layer with an even less stringent level of performance and protection. Understanding these factors will help organizations create storage tiers that can reduce the requirement for the highest performance, costliest storage, disaster recovery, and backup systems.
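The age-based tiering rule described above can be sketched in a few lines of code. This is a minimal illustration only; the 24- and 72-hour thresholds come from the examples in the text, and real classification would also weigh business criticality, compliance, and access patterns.

```python
from datetime import datetime, timedelta

# Illustrative thresholds from the text: tier 1 for data under 24 hours old,
# tier 2 for data between 24 and 72 hours old, tier 3 for everything older.
TIER1_MAX_AGE = timedelta(hours=24)
TIER2_MAX_AGE = timedelta(hours=72)

def classify_by_age(last_modified: datetime, now: datetime) -> int:
    """Return a storage tier (1 = mission critical, 3 = least stringent)."""
    age = now - last_modified
    if age < TIER1_MAX_AGE:
        return 1
    if age < TIER2_MAX_AGE:
        return 2
    return 3
```

In practice a rule like this would be one input among several; the point is that once the audit has defined the tiers, assignment can be automated rather than decided file by file.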
Examining all your data will require performing something resembling an audit and can be done most effectively via meetings between IT and the business units. The business units can help IT understand the types of data they use, where it resides, and how critical it is to business revenues and day-to-day operations. Many organizations will also want to include legal counsel to determine which data is affected by compliance issues and how that data might be handled.
Business units can also provide crucial insight into future growth affecting storage, such as acquisitions and new initiatives and lines of business.
Step 2: Understand Your Data Infrastructure
As you audit your organization’s data, take the opportunity to gain a deeper understanding of the storage and other infrastructure used to store it. What storage hardware types and products are used; what are their ages; which systems are under and overutilized; what performance, management, virtualization, and other features are being used; and, perhaps more important, what features are offered by the manufacturer that are not used? Also important is how these systems integrate and work or don’t work together. Do the same for all your backup and disaster recovery hardware, software, and cloud services, if applicable.
This information will prove critical when you get to a future step of evaluating new technologies and their applicability to your organization. You will need to assess the level of upgrading necessary for their use, the potential CAPEX and OPEX costs, and the disruption that introducing these new technologies will bring to your daily operations. You may be surprised to learn that many of the technologies and features you require are actually included in the hardware management tools or backup software you already use, or may be obtained with a relatively simple upgrade.
Indeed, the lines between backup software and storage and disaster recovery solutions have blurred, with many of today’s widely used backup solutions offering technologies such as continuous data protection, array-based snapshots, deduplication, and archiving. Deduplication is also offered as a feature of many virtual tape library and network attached storage (NAS) products. Using what you already have may be less disruptive to operations than introducing brand-new infrastructure with all its cost, integration, and training headaches. The advantages have to be weighed against your performance and infrastructure integration needs.
Step 3: Create and Update Policies
Once you understand your data and its infrastructure, it’s time to take a fresh look at your backup and retention policies, with an eye on the knowledge you’ve gained of the relative importance of different data sources and compliance requirements. You will have to identify the business owners of each data source and work with them and legal counsel to translate regulations into specific policies for storage, protection, security, and retention. Setting a schedule for updating policies regularly is also good practice.
Software and other tools are available to create and document policies and workflow that can be used to automatically enforce policies electronically, but you’ll still have to define the policies first.
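Once defined, policies are easiest to enforce when captured as structured data rather than prose. The sketch below shows one hypothetical way to encode a policy table; the data classes, retention periods, and backup frequencies shown are purely illustrative, not recommendations.

```python
# Hypothetical policy table mapping data classes to retention and backup rules.
# All class names and values are illustrative examples, not prescriptions.
POLICIES = {
    "financial_records": {"retention_days": 2555, "backup": "daily",  "tier": 1},
    "email":             {"retention_days": 1095, "backup": "daily",  "tier": 2},
    "media_archive":     {"retention_days": 365,  "backup": "weekly", "tier": 3},
}

def retention_expired(data_class: str, age_days: int) -> bool:
    """True if data of this class has outlived its defined retention period."""
    return age_days > POLICIES[data_class]["retention_days"]
```

A table like this gives automation tools something concrete to enforce, and gives auditors a single place to verify that regulations have been translated into practice.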
Step 4: Start with Tiering
Once you understand which data requires the highest levels of performance storage, backup, and disaster recovery, start implementing a storage tiering strategy. You can extend the useful life of your most expensive storage hardware and cut CAPEX costs dramatically by moving non-mission-critical data to less expensive storage, with less cost-intensive disaster recovery (DR). You also reduce the OPEX costs of managing and maintaining high-performance storage and disaster recovery over time.
Many organizations implement at least three tiers of storage: business critical, business normal, and archival, the last consisting of data that must be retained but is accessed very rarely.
Step 5: Move on to Data Deduplication and eDiscovery
The most effective next step, in our long consulting experience, is to reduce the sheer volume of data with data deduplication technologies. Data deduplication is one of the most effective ways to tame the storage beast with the highest return, both immediate and in the long run.
An effective deduplication strategy can reduce storage CAPEX and OPEX costs dramatically and enhance service levels in several ways:
Having less data to store—up to a factor of 20 to 1 for many deduplication implementations—immediately translates into lower equipment costs. Existing storage can be retained longer and demand for future capacity is greatly reduced. Our case studies show that CAPEX savings can be dramatic in the second or third year after a deduplication initiative.
Savings in network bandwidth, data center real estate, power, cooling, and management resources are also dramatic when the sheer volume of storage is reduced significantly.
With the size of data reduced, archival data is more accessible and backup window requirements are reduced.
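The capacity math behind these savings is simple to illustrate. The sketch below uses the 20:1 ratio cited above with a hypothetical 100 TB of backup data; actual ratios vary widely by data type and deduplication method.

```python
def dedup_savings(logical_tb: float, ratio: float) -> tuple:
    """Return (physical TB actually stored, percent capacity saved)
    for a given logical data size and deduplication ratio."""
    physical = logical_tb / ratio
    saved_pct = 100.0 * (1 - physical / logical_tb)
    return physical, saved_pct

# Hypothetical example: 100 TB of backup data at a 20:1 deduplication ratio
physical, saved = dedup_savings(100.0, 20.0)
# physical TB stored shrinks to a twentieth of the logical size
```

Even at far more modest ratios, the reduction compounds across equipment, bandwidth, power, and cooling, which is why the CAPEX and OPEX effects noted above can be dramatic.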
A major financial exchange saw spending on storage technology drop by 80 percent over five years with deduplication, from $10 million to $2 million annually, plus measurable service-level agreement (SLA) and uptime improvements, even as the company’s data requirements continued to grow at a fast pace.
Your data and infrastructure discovery process will point you to the best place to start your deduplication efforts to get the biggest bang for your buck. It might be source code archives, information required for legal requirements, or personal storage tables. Typically, deduplication is best applied to one of the less business-critical tiers. Apply what you’ve learned in your data audit.
Implementing deduplication will require an extensive evaluation of deduplication architectures, such as in-line, post-process, object, and block-based deduplication, as well as source- and target-based implementations, to determine which meet your architectural and performance needs. Your choice will also be based on ease of integration with the other technologies you’re considering and with the rest of your storage, backup, and DR infrastructure, and on the practicality of using features already offered by your existing hardware, software, and backup infrastructure.
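To make the block-based approach concrete, the core idea can be sketched in a few lines: data is split into blocks, each block is hashed, and identical blocks are stored only once. This is a simplified illustration, not a production design; real products typically use variable-size chunking, collision handling, and indexing far beyond this.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size chunking for simplicity

def dedup_store(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks, keep each unique block once in
    'store', and return the list of block hashes (the recipe to rebuild)."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks stored only once
        recipe.append(digest)
    return recipe

def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its block recipe."""
    return b"".join(store[d] for d in recipe)
```

The deduplication ratio is simply the logical size of the recipes divided by the physical size of the unique blocks; highly repetitive data, such as backup sets, yields the highest ratios.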
Another essential place to begin, or to consider a major upgrade, is eDiscovery. Companies that think they can get away with not addressing eDiscovery until an audit or lawsuit risk steep fines and painful, time-consuming eDiscovery quagmires. If you have no eDiscovery strategies or technologies in place, now is the time to adopt them. If you haven’t evaluated existing eDiscovery policies and methods recently, it’s probably time to use your data and infrastructure analysis to implement a significant upgrade.
Eye on Cost and Complexity
In evaluating technologies for deduplication, eDiscovery, and other purposes, always keep an eye on cost and complexity. Look at the tools you already have in place and evaluate whether they can be upgraded to offer the new functionality you require. If you can upgrade easily and the technology integrates well with your existing infrastructure, then execute.
If not, take into account whether the new solutions you’re evaluating can integrate easily with each other and your existing infrastructure and the impact they will have on data access and backup. If an upgrade presents many risks and unknowns, it’s important to identify risks, whether staff, technology, or business related, and build a detailed, effective remediation plan.
The number of possible solutions and interactions is endless, so it’s important to get help where you need it. By following a clear, effective strategy, you really can tame the storage beast.