Conducting a Disaster-Recovery Test

Spending time on testing your disaster-recovery plan (DRP) may not be at the top of your list, but it should be. An IT recovery strategy has become crucial for companies of any size, because you never know when trouble might strike. Being a small company doesn’t mean you’re not a target for hackers or that you won’t be hit by some other disaster at some point.

Even more importantly, networks and systems have become more complex and powerful, and companies often rely on them alone to provide services to customers in an “always on” environment. These same systems also store sensitive data, which raises the stakes when something goes wrong.

Just having a disaster-recovery plan isn’t enough, however; you need to know that it works effectively and efficiently. Testing can be expensive and time-consuming, but it is how you validate your recovery plan. It also uncovers issues with the plan and highlights procedures that need to change to avoid those issues in the future.

Your plan should take IT, people and processes into account so that every area of the business is covered and fully prepared for when disaster strikes.

How Often Should You Test Your Disaster-Recovery Plan?

Although there is no single right answer for how often you should test your DRP, you need to make sure that your plan is up to date and reflects your current business. A good general guide is to test whenever your business changes, for example when the lead tester leaves the company or when new IT systems are put in place. Depending on how frequently your company changes, that could mean testing monthly or annually.

By incorporating business and personnel updates into your change-management plan, you can design the testing schedule around the way your business operates. But you should always make provisions to carry out a full-scale test at least once a year, to confirm that minor changes have not affected other aspects of the DRP.

Preparing for a Test

Before you test your DRP, you must be fully prepared to ensure the outcome is as realistic as possible and the results are useful for future tests. Make sure that everyone with DR responsibilities is involved in each testing process and that more than one person is capable of executing each procedure. This way, your business will be better equipped during a disaster if some individuals are unavailable.

Those who created the DRP should avoid being involved in the testing process. This approach reveals whether the instructions are easy to follow and whether the necessary tasks can be carried out properly. It also confirms that the plan can be executed when its designers are absent, and it gives a realistic idea of how long the process takes to complete without their involvement.

Every detail of the test should be recorded, including any issues that arise and how smoothly the overall operation runs. The test should be timed from start to finish, in addition to timing each individual part of the plan to see how long each takes to complete. The final information to record is the impact the test has on the business: how did the downtime affect overall operations, customers and revenue?
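As a rough illustration of recording per-step timings and issues alongside the overall start-to-finish time, here is a minimal Python sketch. The class `DrTestLog`, its methods and the step names are hypothetical; business impact (downtime, customers, revenue) would still need to be assessed and recorded separately.

```python
import time
from contextlib import contextmanager

class DrTestLog:
    """Hypothetical record of a DR test: per-step durations and observed issues."""

    def __init__(self):
        self.steps = []  # list of (step name, seconds taken, issues observed)
        self.started = time.monotonic()
        self.ended = None

    @contextmanager
    def step(self, name):
        issues = []
        began = time.monotonic()
        try:
            yield issues  # the tester appends any issues seen during this step
        finally:
            self.steps.append((name, time.monotonic() - began, issues))

    def finish(self):
        # Stop the start-to-finish clock and return the total elapsed seconds.
        self.ended = time.monotonic()
        return self.ended - self.started

    def report(self):
        for name, seconds, issues in self.steps:
            print(f"{name}: {seconds:.1f}s, issues: {issues or 'none'}")
        if self.ended is not None:
            print(f"Total elapsed: {self.ended - self.started:.1f}s")

# Example usage with hypothetical steps.
log = DrTestLog()
with log.step("Restore database from backup") as issues:
    issues.append("backup catalog out of date")
with log.step("Fail over web servers"):
    pass
log.finish()
log.report()
```

A monotonic clock is used so that system clock adjustments during the test do not distort the recorded durations.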

Ways to Test

1. Plan Review

The plan review is the most basic DRP test: the continuity-management and disaster-recovery planners go over existing processes and identify areas for improvement or potential changes. This review can be carried out regularly without much drain on resources and should be on your business schedule a few times a year.

2. Tabletop Exercise

A tabletop exercise is a good way to test whether everyone involved is fully aware of the DRP and the procedures they must follow in the event of a disaster. It should be treated as a serious rehearsal.

All personnel must gather and carry out a “walk-through” of a disaster scenario, with a given objective to focus on. Each individual should describe the actions he or she would carry out in certain conditions, making sure that they are in line with the DRP. Analyze each response and work out whether the objective was met. Any misunderstandings of the process, or any lack of clarity in the DRP, should be identified and addressed during this exercise.

3. Full-Scale Test

A full-scale test is where your DRP and processes are validated. It must be as close to a real-life scenario as possible, so you will likely need to spend both time and money carrying one out. You must also account for downtime in systems and personnel, as well as any disruption the test will cause to your business. Some companies opt to keep the test secret from employees to fully gauge how they react when disaster strikes.

For these tests, you will need to use company resources such as recovery sites and backup systems, and in some cases send personnel off-site to bring backup systems online and restart business systems.

What If Something Goes Wrong?

If something goes wrong during a DR test, it can cause concern, but the purpose of running a test is to identify and resolve problems so they don’t occur during a real disaster. Any fault that surfaces under test conditions is likely to be even more damaging in a real disaster, so it’s vital to address glitches right away.

Record any faults in detail during testing so they can be categorized and investigated after the test is complete. Then use this information to fix the issues and update the test procedure. Because you should be testing your DRP each time changes are made, this is the perfect time to retest with the fixes in place, to make sure the issues are fully resolved and no new problems have appeared. Keep retesting until your plan runs smoothly and no further issues come up. Conduct the test with different people and in different scenarios, and make sure everyone knows what they are accountable for.
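The categorize, fix and retest loop described above can be sketched as a simple fault log. In this minimal Python example the `Fault` record, the category names (borrowed loosely from the article's IT, people and processes split) and the `ready_to_sign_off` rule are all hypothetical assumptions, not a prescribed method.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Fault:
    description: str
    category: str  # hypothetical categories, e.g. "technology", "people", "process"
    resolved: bool = False

def group_by_category(faults):
    """Group fault descriptions by category for post-test investigation."""
    grouped = defaultdict(list)
    for fault in faults:
        grouped[fault.category].append(fault.description)
    return dict(grouped)

def ready_to_sign_off(faults, retest_faults):
    # Assumed rule: every recorded fault is resolved AND the retest found nothing new.
    return all(f.resolved for f in faults) and not retest_faults

faults = [
    Fault("Runbook step 4 references a retired server", "process"),
    Fault("Only one person can restore the database", "people"),
]
print(group_by_category(faults))
print(ready_to_sign_off(faults, retest_faults=[]))  # False: faults still open
```

Keeping the retest results as a separate list makes the "no new problems have appeared" condition explicit rather than implied.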

For help setting up a DRP, you can also speak with IT consultants to ensure your plan is watertight and you haven’t overlooked any important aspects.

Image (adapted) courtesy of Wladimir Labeikovsky under a Creative Commons license

About the Author

Donald K. Bowker, CBCP, ITIL-F, is a Lead Senior Consultant at Sungard Availability Services. With over 30 years in technology services delivery, Don has been involved in hundreds of IT-related projects, spanning all project phases from assessing client needs to solution delivery through ongoing operations. His positions have included 2 years as director of information technology for an international telecommunications firm, 15 years managing technical services for two data centers and 9 years at Sungard Availability Services delivering operational-resiliency solutions at Fortune 500 companies and SMBs in the continental U.S. At Sungard Availability Services, Don is responsible for leading the delivery of strategic projects. His success is measured in terms of customer satisfaction and the delivery of projects on time and within budget. In addition to project-management responsibilities, Don participates directly in delivery of operational-resiliency services.