In-Depth

Windows 2003 Disaster Recovery Best Practices for the MCSE

Wait long enough and disaster will strike your servers. It's time you developed an effective recovery plan.

As most disaster recovery experts will tell you, it's not a matter of if disaster will strike, but when. If you plan ahead and minimize your risks, not only will you sleep better at night, you'll also be able to recover from a disaster that might otherwise have a significant impact on your organization and your career.

Although most major organizations have some kind of disaster recovery plan, it's amazing how many small- to medium-sized companies don't have such plans in place. It's hard to justify not having a disaster recovery plan, regardless of the size of your organization.

For large enterprises, a recovery plan can be expensive and complex because the financial stakes are higher. They can't afford downtime in their data centers, and their business-critical services must be available 99.999 percent of the time, so they may opt for third-party disaster recovery solutions and/or services.
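To make "99.999 percent" concrete, here is a small sketch that converts an availability target into the downtime it allows per year. It assumes a 365-day year and simply multiplies the minutes in a year by the unavailable fraction; the function name is hypothetical.

```python
# Convert an availability target into the maximum downtime it allows per year.
# Assumes a 365-day year; a leap year adds a tiny amount.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent):
    """Return the minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per year")
```

At 99.999 percent, the budget works out to roughly 5.26 minutes of downtime per year, which is why organizations with that target tend to buy redundancy rather than rely on restoring from backup.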

Smaller organizations might not need the kind of availability that large organizations require, so there's often little incentive to spend on reliability, which can leave them exposed to disaster. Yet for smaller companies a recovery plan makes perfect sense: the cost is low, the networks are smaller, the complexity is lower, and several built-in tools from Microsoft can be used to implement such a plan.

In this article we look at some Windows Server 2003 disaster recovery best practices that you, as a newly minted MCSE, will need to put in place at any business that hires you. Although I focus on business environments, the tools discussed here are built into the Windows operating system, so even home networkers will find the advice useful. And while some of the advice might seem obvious, it's always a good idea to revisit your disaster recovery plans on a regular basis.

First, What's Important
Information technology resources and assets are crucial to an organization's day-to-day operations. When a server is down or data is lost, a quick recovery is important. Disaster recovery is the process of quickly resuming normal business operations after a disaster, such as the loss of electronic data or a computer hardware failure.

Only you can decide what's important to your organization because every environment is different. The first step in a disaster recovery plan is to identify what's important to your business continuity. Then you need to come up with a plan that's reasonable, practical, fits within your budget and has the blessing of your management.

Finally, you need to test and implement such a plan. You should look at your plan periodically to ensure that it meets your needs. Changes in information technology are beyond your control, so it's up to you to stay on top of newer technologies and update your plan to accommodate such changes.


Disaster Anticipation is Key
In an ideal world, you'll be able to avoid a disaster with proper planning. Unfortunately, we don't live in an ideal world. However, this should not discourage you from taking appropriate measures to avoid a disaster.

To avoid disasters, you must anticipate events that can affect your particular business environment. If your Internet Service Provider doesn't offer reliable service, you must plan for that disruption.

Here are some of the actions you can take to minimize risks to your data and hardware:

  • Perform regular backups and keep those backups in a safe and secure place. Ensure that you have a copy of your backup at an off-site location.
  • Use geographically dispersed data centers for redundancy to provide business continuity in case of a regional disaster.
  • Place your servers in a secure locked room and follow the best practices for securing your hardware. When securing hardware, remember to secure servers, routers, switches, hubs, and other network devices.
  • Use uninterruptible power supplies for mission-critical network hardware.
  • Monitor your network and services on a regular basis.

Document or Die
I hate to state the obvious, but a well-developed disaster recovery plan should be written down. Your documented disaster recovery plan should include information on what constitutes a disaster, recovery procedures and guidelines.

Before you document recovery procedures, you must first decide what's at stake. What are you trying to recover? What's crucial to the continuity of your business operations? For some organizations the availability of messaging services will be at the very top of the priority list, while others will consider access to their databases an absolute necessity. A dental clinic may suffer little business impact if its Web server is temporarily down. For a business that depends on e-commerce, however, the availability of its Web servers is of utmost importance.

While there's no one-size-fits-all rule, a disaster recovery plan typically covers electronic data and network infrastructure hardware. Don't limit your plan to network servers; other devices, such as uninterruptible power supplies, external storage devices, and routers, may also be required to maintain continuity of service.

Once you've identified what you must recover in case of a disaster, decide how quickly you need to recover. While it's easy to say "right away," that may not always be feasible or affordable. Keep in mind that you may need additional staff and resources at distant locations and at unusual hours. Once these questions have been answered, the next step is to estimate the cost so that management can weigh its options and decide whether the plan fits within the budget. After that, form a disaster recovery team and make sure its members know their roles and duties. Finally, test your plan by simulating a disaster, and document the entire plan.

To summarize what you must keep in mind when developing your plan, here are some of the questions that you should address:

  • What constitutes a disaster for your organization?
  • How quickly will you be able to recover from the disaster?
  • Do you have the necessary staff and resources to continue your operations after a disaster?
  • What will be the effect of downtime on your business?
  • Who will be responsible for the overall disaster recovery plan?
  • Are the individual team members trained on what to do when the disaster strikes?

Because I'd like to focus on best practices for a bare-metal Windows server, let's look at some of the built-in Windows Server 2003 tools that can be used to recover from a disaster: Backup, Automated System Recovery (ASR), Recovery Console, Safe Mode, Last Known Good Configuration, and Shadow Copies. (MCPmag.com has covered each of these features in detail.)

Test Your Backups
Performing regular backups is an essential part of any data recovery plan, but so is testing those backups regularly. Some administrators test only once, when they first develop the disaster recovery plan. This is a risky proposition. I suggest you make trial restorations part of your regular maintenance routine, for example once every three months.
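One way to make a trial restoration more than a spot check is to restore to an alternate location and compare it with the live data. The sketch below does that with SHA-256 hashes; it assumes the restore went to a separate folder, and the function names and paths are hypothetical.

```python
# Compare a trial restore against the original data, file by file, using SHA-256.
# Assumes the backup was restored to an alternate folder rather than over the originals.

import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path, relative to root, to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_restore(source, restored):
    """Report files missing from the restore, unexpected extras, and content mismatches."""
    src, dst = tree_digests(source), tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "changed": sorted(p for p in set(src) & set(dst) if src[p] != dst[p]),
    }
```

For example, `compare_restore(r"D:\Data", r"E:\RestoreTest\Data")` (placeholder paths) returns the three lists. Files that users legitimately changed after the backup ran will show up as "changed," and the sketch does not compare permissions or other attributes.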

When you perform backups, include the system and boot volumes on your servers along with all the data. This will allow you to restore a server quickly in case of a hard disk failure.

The Backup utility in Windows Server 2003 can create a log. By default, only summary information is logged, which covers only key operations such as loading a tape or failing to open a file. Select the Detailed option to log information about individual files and folders, such as name, date, time, and attributes (see Figure 1). It's a good idea to print a copy of the backup log.

Detailed Backup Logs
Figure 1. Be sure to click the Detailed button to include more information in the Backup Log.
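If you schedule backups from the command line rather than the GUI, the detailed log is selected with a switch. The sketch below builds an ntbackup argument list using the switches I understand ntbackup to document (/J job name, /F file, /M backup type, /V verify, /L log type, /SNAP volume shadow copy); confirm them with `ntbackup /?` on your own server. The job name, selection, and .bkf path are hypothetical.

```python
# Build an ntbackup command line that requests the full (detailed) log and keeps
# volume shadow copy on. Switch names are assumed from ntbackup's documented syntax;
# verify them with "ntbackup /?" before relying on this.

import subprocess


def ntbackup_command(selection, job_name, bkf_path, backup_type="normal"):
    """Return the ntbackup argument list for a backup job written to a .bkf file."""
    return [
        "ntbackup", "backup", *selection,
        "/J", job_name,     # job name recorded in the log
        "/F", bkf_path,     # back up to a file instead of a tape
        "/M", backup_type,  # normal, copy, differential, incremental or daily
        "/V:yes",           # verify the data after the backup completes
        "/L:f",             # f = full (detailed) log; s = summary; n = none
        "/SNAP:on",         # use a volume shadow copy so open files are included
    ]


def run_backup(cmd):
    """Run the job and return ntbackup's exit code (Windows only)."""
    return subprocess.run(cmd, check=False).returncode
```

For example, `ntbackup_command(["C:\\"], "Nightly", r"E:\Backups\nightly.bkf")` produces the list you would pass to `run_backup` from a scheduled task. Keeping the arguments in a list avoids quoting problems with paths that contain spaces.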

Create an Automated System Recovery (ASR) set whenever you install new hardware or service packs, or make other major changes to the operating system. This will help you recover from a system failure. However, you must back up your data separately, because data is not included in the ASR backup set.

Secure your backup media and storage devices, and keep the backup media in a secure, environmentally controlled storage location. Don't rely on a fireproof safe designed to protect paper; if you keep backup tapes in a safe, use one that's specifically meant to protect tape media. Tapes offer a convenient and sometimes inexpensive way to back up data, but they also have drawbacks, such as questionable reliability and limited shelf life. If longer shelf life is important to your business, consider optical media, such as DVDs, or other technologies for backup storage.

Tip: The built-in Backup program doesn't support backing up data directly to a CD device. However, you can first back up data to a hard drive and then copy the backup file to optical media, such as a CD-R or DVD-R disc. The Backup program supports restoring data from CD-R, CD-RW and DVD-R discs.
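Because this two-step approach adds a copy you have to trust, it's worth checking the .bkf file before and after burning. The sketch below checks the file against a disc's nominal capacity and then compares the original with the copy read back from the disc. The capacities are assumed nominal values (check your media's label), the burning step is left to whatever software you use, and the function names are hypothetical.

```python
# Check that a .bkf file fits on the chosen disc, then verify the burned copy.
# Capacities are nominal, assumed values; check the label on your media.

import filecmp
import os

MEDIA_CAPACITY_BYTES = {
    "CD-R 80 min": 360_000 * 2048,  # 360,000 sectors x 2,048 bytes = 737,280,000
    "DVD-R single layer": 4_707_319_808,
}


def fits_on(bkf_path, media):
    """True if the .bkf file is no larger than the chosen disc's nominal capacity."""
    return os.path.getsize(bkf_path) <= MEDIA_CAPACITY_BYTES[media]


def verify_copy(original, disc_copy):
    """Byte-for-byte comparison of the .bkf on disk with the copy read back from disc."""
    filecmp.clear_cache()  # don't reuse a cached answer from an earlier comparison
    return filecmp.cmp(original, disc_copy, shallow=False)
```

Passing `shallow=False` makes `filecmp.cmp` read and compare the contents of both files instead of trusting matching size and timestamp.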

There are several storage solutions available today; do some research and select one that meets most, if not all, of your business requirements. Because every type of media is different, follow the manufacturer's recommendations for rotating, storing, and discarding it. You might take chances at a Las Vegas casino, but don't gamble with your company's data by using bad backup media.

Microsoft recommends you retain at least three backup copies, one of which should be kept off-site. This will offer redundancy in case of media failure. While it may seem that three copies is overkill, when it comes to backing up your corporate data, it's a wise choice.
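The three-copy guideline is easy to check against a simple inventory. The sketch below assumes a hypothetical record per backup copy (label, location, and whether a trial restore from it succeeded) and counts only verified copies, on the theory that an untested copy shouldn't count toward redundancy; that policy choice is mine, not Microsoft's.

```python
# Check a backup inventory against the "at least three copies, one off-site" guideline.
# The BackupCopy record and its fields are hypothetical.

from dataclasses import dataclass


@dataclass
class BackupCopy:
    label: str      # tape or disc label, e.g. "MON"
    location: str   # "onsite" or "offsite"
    verified: bool  # True if a trial restore from this copy succeeded


def meets_retention_rule(copies, minimum=3, offsite_minimum=1):
    """True if there are enough verified copies and enough of them are off-site."""
    usable = [c for c in copies if c.verified]
    offsite = [c for c in usable if c.location == "offsite"]
    return len(usable) >= minimum and len(offsite) >= offsite_minimum
```

A weekly rotation with two on-site copies and one off-site copy passes; drop the off-site copy, or let its trial restore fail, and the check fails.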

Recovery Console
The Recovery Console is useful for administrative tasks such as enabling or disabling services and device drivers, and copying files to local hard disks (including NTFS drives). You can start the Recovery Console from the Setup CD or install it locally on your computer. As a best practice, install the Recovery Console on all your x86-based servers so it's available as a menu option at system startup. You cannot install the Recovery Console on an Itanium-based computer, but you can still start it from the Setup CD.

Do not enable the "Recovery console: Allow automatic administrative logon" setting under Security Options (see Figure 2). Enabling it would let anyone with physical access to the server log on to the Recovery Console with administrative privileges without supplying a password.

Tip: On domain controllers, update the password for the local Administrator account in the Security Accounts Manager (SAM) whenever you update the Active Directory Administrator password. You can change the local SAM Administrator password with NTDSUTIL (set DSRM password) while Active Directory is running, but if you forget that password and the system won't boot, you won't be able to log on to the Recovery Console.

Setting up Recovery Console
Figure 2. Do not allow automatic administrative logon for the Recovery Console.
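To audit the automatic-logon setting across servers without opening each one's Security Options, you can read the registry value behind it. The sketch assumes the policy is stored as the SecurityLevel DWORD under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Setup\RecoveryConsole, with 1 meaning automatic logon is allowed; confirm that location with regedit on your own server before relying on it. It uses Python's standard `winreg` module, which exists only on Windows.

```python
# Report whether Recovery Console automatic administrative logon is enabled.
# Registry location and meaning of the value are assumptions; confirm with regedit.

import sys

RC_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Setup\RecoveryConsole"
RC_VALUE = "SecurityLevel"


def auto_logon_enabled(security_level):
    """Assumed encoding: 1 means the Recovery Console logs on without a password."""
    return security_level == 1


def read_security_level():
    """Return the SecurityLevel value, or None if it isn't set or this isn't Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RC_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, RC_VALUE)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if auto_logon_enabled(read_security_level()):
        print("WARNING: Recovery Console automatic administrative logon is enabled")
    else:
        print("Recovery Console automatic administrative logon is not enabled")
```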

Volume Shadow Copies
Volume Shadow Copies are a Windows Server 2003 feature that provides point-in-time copies of files stored in shared folders on a network server. Users can quickly restore files they accidentally deleted from the server without involving the Help Desk, and they can compare versions of files using the Previous Versions client.

Do not use Volume Shadow Copies on computers that dual-boot, or on volumes that use mount points. By default, shadow copies are created Monday through Friday at 7 a.m. and 12 p.m. If you change the default schedule, make sure you have enough disk space, and remember that each volume holds at most 64 shadow copies; once that limit is reached, the oldest copy is deleted to make room for the newest. In addition, scheduling shadow copies too often can hurt the file server's performance. Microsoft recommends scheduling them no more often than once per hour.
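The 64-copy limit means that a more frequent schedule buys finer granularity at the cost of a shorter history. The sketch below does that arithmetic; it treats the limit as the only constraint, so the result is an upper bound (if the shadow copy storage area fills first, older copies are deleted sooner). The function names are hypothetical.

```python
# How far back can users go before the 64-copy limit forces the oldest copy out?
# Upper bound only: a full storage area can cause earlier deletions.

MAX_SHADOW_COPIES = 64  # per-volume limit


def history_days(copies_per_day, days_per_week, max_copies=MAX_SHADOW_COPIES):
    """Approximate calendar days of history retained under a weekly schedule."""
    copies_per_week = copies_per_day * days_per_week
    return max_copies / copies_per_week * 7


def interval_ok(interval_minutes):
    """Microsoft's guidance: no more than one shadow copy per hour."""
    return interval_minutes >= 60
```

The default schedule (two copies a day, five days a week) yields 10 copies per week, so 64 copies cover about 6.4 weeks, or roughly 45 days. An hourly schedule around the clock yields 168 copies per week and only about 2.7 days of history.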

When using the Windows Backup program to back up your system, make sure that you do not disable the default Volume Shadow Copy backup method. Otherwise, any open system files will not be included in your backup.

For better performance and reliability, store Volume Shadow Copies on a separate disk. This helps prevent shadow copies from being deleted under high I/O load and, on busy servers, improves performance. When you create a volume for shadow copies, set the cluster size to at least 16 KB, especially if you plan to defragment the volume. With a smaller cluster size, the changes caused by defragmentation may cause some shadow copies to be deleted.
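Before dedicating a volume to shadow copies, you can confirm its cluster size. The sketch below computes the cluster size from the sectors-per-cluster and bytes-per-sector values returned by the Win32 `GetDiskFreeSpaceW` function, called through `ctypes`. The drive letter is hypothetical. If the volume is too small, it would have to be reformatted with a larger allocation unit, for example `format S: /FS:NTFS /A:16K`, which erases its contents.

```python
# Check that a volume intended for shadow copies uses clusters of at least 16 KB.
# The Win32 query runs only on Windows; the size check itself is plain arithmetic.

import ctypes
import sys

MIN_CLUSTER_BYTES = 16 * 1024  # 16 KB, per the guidance above


def cluster_size_ok(sectors_per_cluster, bytes_per_sector):
    """True if the volume's cluster size is at least 16 KB."""
    return sectors_per_cluster * bytes_per_sector >= MIN_CLUSTER_BYTES


def query_cluster_geometry(root):
    """Return (sectors_per_cluster, bytes_per_sector) for a root such as 'S:\\'."""
    if sys.platform != "win32":
        raise OSError("GetDiskFreeSpaceW is only available on Windows")
    spc, bps = ctypes.c_ulong(), ctypes.c_ulong()
    free_clusters, total_clusters = ctypes.c_ulong(), ctypes.c_ulong()
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root), ctypes.byref(spc), ctypes.byref(bps),
        ctypes.byref(free_clusters), ctypes.byref(total_clusters))
    if not ok:
        raise ctypes.WinError()
    return spc.value, bps.value
```

On Windows, `cluster_size_ok(*query_cluster_geometry("S:\\"))` answers the question directly. A default 4 KB NTFS volume (8 sectors of 512 bytes) fails the check.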

Tip: If you later decide to delete a volume used for Volume Shadow Copies, first delete the scheduled task that creates the shadow copies. Otherwise, your event log will fill with Event ID 7001 errors indicating that shadow copies could not be created as scheduled.

Warning! Volume Shadow Copies are not a substitute for regular backups. You must not rely on Volume Shadow Copies as your only source of recovering files.

Automated System Recovery
As mentioned earlier, as a best practice you should include ASR backups as part of your system recovery plan. Create ASR backups when you install new hardware or service packs to help you recover from a system failure. However, ASR does have its limitations and it should be used only as a last resort after you've tried other recovery methods, such as Safe Mode, Last Known Good configuration and the Recovery Console.

Remember to back up your data separately because ASR backup sets don't include data.

Tip: One limitation of ASR is that it supports FAT16 volumes only up to 2.1 GB. It doesn't support 4 GB FAT16 partitions that use 64 KB clusters. If you're using 4 GB FAT16 partitions, convert them to NTFS before creating ASR backup sets.
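The 2.1 GB and 4 GB figures fall out of FAT16's 16-bit cluster numbers: about 65,536 clusters times the cluster size. The sketch below shows the arithmetic and encodes the ASR limitation described in the tip; the function names are hypothetical, and the cluster count ignores the handful of reserved values.

```python
# Where FAT16's 2.1 GB and 4 GB ceilings come from, and the ASR limitation above.

FAT16_MAX_CLUSTERS = 2 ** 16  # 16-bit cluster numbers (a few values are reserved)
ASR_FAT16_LIMIT = 2 ** 31     # 2,147,483,648 bytes: the "2.1 GB" ceiling


def fat16_max_volume_bytes(cluster_kb):
    """Largest FAT16 volume for a given cluster size, in bytes."""
    return FAT16_MAX_CLUSTERS * cluster_kb * 1024


def asr_supports_fat16(volume_bytes, cluster_kb):
    """ASR handles FAT16 volumes only with clusters up to 32 KB (about 2.1 GB)."""
    return cluster_kb <= 32 and volume_bytes <= ASR_FAT16_LIMIT
```

With 32 KB clusters the ceiling is 2^31 bytes (about 2.1 billion bytes); with 64 KB clusters it doubles to 2^32 bytes, the "4 GB" partitions ASR can't handle. The conversion itself is done with the built-in `convert` command, for example `convert D: /FS:NTFS` (the drive letter is a placeholder).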

Managing System Startup Behavior
To configure your system's Startup and Recovery settings, open Control Panel | System, click the Advanced tab, and then click Settings under Startup and Recovery. You can configure how long the startup menu is displayed, as well as what happens on a system failure, such as sending an administrative alert and writing an event to the system log. In general, it's a good idea to keep the default behavior, which is to restart automatically after a system failure.

Ideally, disk space is not an issue on your server. If you have plenty of disk space available, configure the server to write a complete memory dump (see Figure 3). If the system crashes, the contents of memory are written to the Memory.dmp file, which you can then use for debugging.

Debugging Disasters
Figure 3. Be sure to do a Complete Memory Dump, which can be useful for debugging recurring disasters.
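"Plenty of disk space" can be made specific. My understanding of Microsoft's guidance is that a complete memory dump needs a paging file on the boot volume at least as large as physical RAM plus 1 MB, and room for a Memory.dmp file about the size of RAM; treat both figures as assumptions to confirm for your configuration. The sketch below applies them, using hypothetical function names.

```python
# Rough sizing check for a complete memory dump. The RAM + 1 MB paging file rule and
# the dump-file estimate are assumptions to confirm against Microsoft's documentation.

MB = 1024 * 1024


def complete_dump_requirements(ram_bytes):
    """Minimum boot-volume paging file and free space for Memory.dmp, in bytes."""
    return {
        "min_pagefile_bytes": ram_bytes + MB,  # physical RAM plus 1 MB
        "dump_file_bytes": ram_bytes + MB,     # rough: all of RAM plus a header
    }


def can_take_complete_dump(ram_bytes, pagefile_bytes, free_bytes):
    """True if both the paging file and the free space meet the estimates above."""
    req = complete_dump_requirements(ram_bytes)
    return (pagefile_bytes >= req["min_pagefile_bytes"]
            and free_bytes >= req["dump_file_bytes"])
```

On a server with 2 GB of RAM, for example, a paging file of exactly 2 GB is 1 MB short, so the check fails even if the disk has room to spare.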

Your Job Is Never Done
Finally, here are some additional considerations to keep in mind when planning for disaster recovery:

  • Make sure your servers support booting from CD-ROM. This requires that your computer's BIOS support the El Torito Bootable CD-ROM format (no emulation mode).
  • Install Recovery Console on all servers for quick recovery.
  • Keep the operating system installation CDs handy.
  • Create a Windows startup disk to work around certain issues, such as damaged boot sectors, damaged master boot records (MBR), damaged system files (NTLDR, NTDETECT.COM, etc.), or virus infections.

Disasters are unavoidable, but by carefully planning, testing, implementing, and periodically updating your backup and recovery policy, you can minimize their effect. If you haven't already implemented such a policy, it's never too late to start. A disaster recovery policy is similar to an insurance policy in some ways: you hope you never have to use it, but having it in place gives you peace of mind.
