In-Depth
Windows 2003 Disaster Recovery Best Practices for the MCSE
Wait long enough and disaster will strike your servers. It's time you developed an effective recovery plan.
As most disaster recovery experts will tell you, when it comes to disaster,
it's not a matter of if but when the disaster will strike.
If you plan ahead and minimize your risks, not only will you sleep better
at night, you'll also be able to recover from a disaster that might otherwise
have a significant impact on your organization and career.
Although most major organizations have some kind of disaster recovery
plan, it's amazing how many small- to medium-sized companies don't have
such plans in place. It's hard to justify not having a disaster recovery
plan, regardless of the size of your organization.
For large enterprises, the recovery plan can be expensive and complex,
as the financial stakes can be higher. They can't afford downtime
in their data centers, and their business-critical services must
be available 99.999 percent of the time, so they may opt to utilize third-party
disaster recovery solutions and/or services.
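To put the 99.999 percent figure in perspective, here is a minimal Python sketch that converts an availability target into a yearly downtime budget. It assumes a 365-day year, and the function name is purely illustrative; a real service-level agreement may measure the period differently.

```python
# Convert an availability target into an allowed-downtime budget.
# Assumption: a 365-day year; real SLAs may define the period differently.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

At 99.999 percent the budget works out to a little over five minutes a year, which is why enterprises with that requirement often look beyond the built-in tools.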
Smaller organizations might not need the kind of availability that large
organizations require, so there's often no incentive to spend on reliability,
which can then leave them exposed to potential disaster. Yet for smaller
companies, a recovery plan makes perfect sense because the cost is low,
the networks are smaller, the complexity level is lower, and there are
several built-in tools from Microsoft that can be used to implement such
a plan.
In this article we look at some of the Windows Server 2003 disaster recovery
best practices that you as a newly minted MCSE will need to put in place
at any business that hires you. Although I focus on business environments,
because the tools discussed here are built-in tools available with the
Windows operating system, even home networkers will find the advice useful.
And while the advice here might seem obvious, it's always a good idea
to revisit your disaster recovery plans on a regular basis.
First, What's Important
Information technology resources and assets are crucial to an organization's
day-to-day operations. When a server is down or the data is lost, a quick
recovery is important. Disaster recovery is the process of resuming normal
business operations quickly after a disaster strikes, such as a loss of
electronic data or a computer hardware failure.
Only you can decide what's important to your organization because every
environment is different. The first step in a disaster recovery plan is
to identify what's important to your business continuity. Then you need
to come up with a plan that's reasonable, practical, fits within your
budget and has the blessing of your management.
Finally, you need to test and implement such a plan. You should look
at your plan periodically to ensure that it meets your needs. Changes
in information technology are beyond your control, so it's up to you to
stay on top of newer technologies and update your plan to accommodate
such changes.
Disaster Anticipation is Key
In an ideal world, you'll be able to avoid a disaster with proper planning.
Unfortunately, we don't live in an ideal world. However, this should not
discourage you from taking appropriate measures to avoid a disaster.
To avoid disasters, you must anticipate events that can affect your particular
business environment. If your Internet Service Provider doesn't offer
reliable service, you must plan for that disruption.
Here are some of the actions you can take to minimize risks to your data
and hardware:
- Perform regular backups and keep those backups in a safe and secure
place. Ensure that you have a copy of your backup at an off-site location.
- Use geographically dispersed data centers for redundancy to provide
business continuity in case of a regional disaster.
- Place your servers in a secure locked room and follow the best practices
for securing your hardware. When securing hardware, remember to secure
servers, routers, switches, hubs, and other network devices.
- Use uninterruptible power supplies for mission-critical network hardware.
- Monitor your network and services on a regular basis.
Document or Die
I hate to state the obvious, but a well-developed disaster recovery plan
should be written down. Your documented disaster recovery plan should
include information on what constitutes a disaster, recovery procedures
and guidelines.
Before you document recovery procedures, you must first decide what's
at stake. What are you trying to recover? What's crucial to the continuity
of your business operations? For some organizations the availability of
messaging services will be at the very top of the priority list, while
others will consider accessibility to the databases an absolute necessity.
A dental clinic's business may not suffer seriously if its Web
server is temporarily down. However, for a business that depends on e-commerce,
availability of Web servers will be of utmost importance.
While there's no one-size-fits-all rule that can be applied to all organizations,
typically a disaster recovery plan will include electronic data and network
infrastructure hardware. Your plan must cover more than the network servers,
because there may be other devices that are required to provide the continuity
in services, such as uninterruptible power supplies, external storage
devices, and routers, to name a few.
Once you've identified what you must recover in case of a disaster, you
should also decide how quickly you want to recover from the disaster.
While it's easy to say the answer should be "right away," that
may not be always feasible or affordable. Keep in mind that you may need
additional staff and resources at distant locations and at unusual hours.
Once these questions have been answered, the next step would be to figure
out how much it's going to cost so management can weigh its options and
decide if it's within the budget. Once that has been accomplished, you
can form a disaster recovery team and make sure that the members know
what their roles and duties are. You will test your plan by simulating
a disaster, and finally document the entire plan.
To summarize what you must keep in mind when developing your plan, here
are some of the questions that you should address:
- What constitutes a disaster for your organization?
- How quickly will you be able to recover from the disaster?
- Do you have the necessary staff and resources to continue your operations
after a disaster?
- What will be the effect of downtime on your business?
- Who will be responsible for the overall disaster recovery plan?
- Are the individual team members trained on what to do when the disaster
strikes?
Because I'd like to focus on best practices using a bare-metal Windows
server, let's look at some of the built-in tools available for Windows
Server 2003 that can be used to recover from a disaster. These tools include
Backup, Automated System Recovery, Recovery Console, Safe Mode, Last Known
Good, and Shadow Copies. (MCPmag.com has covered each of these features
extensively.)
Test Your Backups
Performing regular backups is an essential part of any data recovery plan,
but so is testing them regularly. Some administrators test the plan only
once at the time they develop the disaster recovery plan. This is a risky
proposition. I suggest you make trial restorations a part of your regular
management plan. You may want to consider trial restorations once every
three months.
When you perform the backups, include the system and boot volumes on the servers,
along with all the data. This will allow you to restore the server quickly
in case of a hard disk failure.
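One way to make a trial restoration more than a visual spot check is to compare checksums of a known sample folder against the restored copy. The following is a minimal Python sketch, not part of the Backup utility; the function names are illustrative, and it assumes you restore to an alternate location and compare against a sample that has not changed since the backup ran.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_restore(source: Path, restored: Path) -> Dict[str, List[str]]:
    """Report files that are missing from, or differ in, the restored tree."""
    src, dst = tree_digests(source), tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "changed": sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name]),
    }
```

An empty "missing" and "changed" list suggests the sample restored intact. Files that were legitimately edited after the backup will show up as changed, so a static test folder gives the cleanest result.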
The Backup utility in Windows Server 2003 allows you to create a log.
By default, only the summary information is logged. This only includes
certain key operations, such as loading of a tape or failure to open a
certain file. You must select the Detailed option to log important information
about files and folders, such as the name, date, time, and the attributes
(see Figure 1). It's a good idea to print a copy of the backup log.
Figure 1. Be sure to click the Detailed button
to include more information in the Backup Log.
Create an Automated System Recovery set when you install new hardware,
service packs or make other major changes to the operating system. This
will help in recovering from a system failure. However, you must back
up your data separately, because data is not included in the ASR backup
set.
Secure your backup media and storage devices. Also, ensure that the backup
media is kept in a secure, environmentally controlled storage location.
If you keep backup tapes in a fireproof safe that was designed
to protect paper, switch to a safe that's specifically meant
to protect tape media. Tapes offer a convenient and sometimes inexpensive
way to back up data, but they also have drawbacks, such as limited reliability
and shelf life. If longer shelf life is important
to your business, consider using optical drives, DVDs or other such technologies
for backup storage.
Tip: The built-in Backup program doesn't support backing
up data directly to a CD device. However, you can first back up data
to a hard drive and then copy the backup file to optical media, such as a CD-R
or DVD-R disc. The Backup program supports restoring data from a CD-R,
CD-RW or DVD-R disc.
There are several storage solutions available today; do some research
and select a solution that meets most if not all your business requirements.
Because every medium is different, follow the manufacturer's recommendations
for rotating, storing, and discarding the media. You might take chances
at a Las Vegas casino, but don't do it with your company's data by using
bad backup media.
Microsoft recommends you retain at least three backup copies, one of
which should be kept off-site. This will offer redundancy in case of media
failure. While it may seem that three copies is overkill, when it comes
to backing up your corporate data, it's a wise choice.
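If you track your backup media in a simple inventory, the three-copy rule is easy to check automatically. The sketch below is one hedged way to do it in Python; the `BackupCopy` record and its fields are hypothetical and do not come from any Microsoft tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupCopy:
    label: str     # hypothetical media label, such as a tape number
    offsite: bool  # True if the copy is stored away from the server site


def retention_problems(copies: List[BackupCopy], minimum: int = 3) -> List[str]:
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < minimum:
        problems.append(f"only {len(copies)} of {minimum} required copies exist")
    if not any(copy.offsite for copy in copies):
        problems.append("no copy is stored off-site")
    return problems


if __name__ == "__main__":
    inventory = [
        BackupCopy("tape-01", offsite=False),
        BackupCopy("tape-02", offsite=False),
    ]
    for problem in retention_problems(inventory):
        print("WARNING:", problem)
```

With the two-tape inventory above, the check reports both a missing copy and the lack of an off-site copy.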
Recovery Console
The Recovery Console can be useful for performing administrative tasks
such as enabling or disabling services and device drivers, and for copying
files to local hard disks (including NTFS drives). You can either start
the Recovery Console from the Setup CD or install it locally
on your computer. As a best practice, you should install Recovery Console
on all your x86-based servers so it's available as a menu option at system
startup. You cannot install Recovery Console on an Itanium-based computer,
but you can still use the Setup CD to start the Recovery Console.
Do not enable the "Recovery console: Allow automatic administrative logon"
setting under Security Options (see Figure 2). Enabling it makes it easier
for anyone with physical access to get into your system with administrative
credentials.
Tip: On Domain Controllers, update the password for the
local Administrator account in Security Accounts Manager (SAM) whenever
you update the Active Directory Administrator password. Although the
local SAM Administrator password can be changed using NTDSUTIL (set
DSRM password) while you are running Active Directory, if you forget
the Administrator password for local SAM and your system is not bootable,
you will not be able to log on to the Recovery Console.
Figure 2. Do not allow automatic administrative
logon for the Recovery Console.
Volume Shadow Copies
Volume Shadow Copies are a Windows Server 2003 feature that provides point-in-time
copies of files that are located on a network server. Users can quickly
restore files that they accidentally deleted from the server without involving
the Help Desk. They can also compare the versions of files using the Previous
Versions client.
Do not use Volume Shadow Copies on computers that are dual booting, or
on volumes that use mount points. By default Volume Shadow Copies are
created Monday through Friday at 7 a.m. and noon. If you want to change
the default, make sure you have enough disk space because there is an
upper limit of 64 copies per volume before the oldest copy is deleted
in favor of newer copies. In addition, scheduling Volume Shadow Copies
too often can have an impact on the file server's performance. Microsoft
recommends that you schedule Volume Shadow Copies for no more than once
per hour.
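Before you change the schedule, it helps to work out how far back your users will be able to reach. This Python sketch does the arithmetic for the 64-copy limit; it assumes copies are spread evenly across the scheduled days and that disk space does not run out first, and the function name is only illustrative.

```python
MAX_SHADOW_COPIES = 64  # per-volume limit before the oldest copy is deleted


def retention_in_days(copies_per_day: int, days_per_week: int = 5) -> float:
    """Approximate calendar days of history kept before the oldest copy is deleted."""
    if copies_per_day <= 0 or not 1 <= days_per_week <= 7:
        raise ValueError("schedule needs at least one copy on one to seven days a week")
    copies_per_week = copies_per_day * days_per_week
    return MAX_SHADOW_COPIES / copies_per_week * 7


if __name__ == "__main__":
    print(f"Default schedule: {retention_in_days(2):.1f} days of history")
    print(f"Hourly, every day: {retention_in_days(24, 7):.1f} days of history")
```

The default schedule keeps roughly six weeks of versions, while an hourly schedule running around the clock keeps less than three days, so a more frequent schedule trades history for granularity.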
When using the Windows Backup program to back up your system, make sure
that you do not disable the default Volume Shadow Copy backup method.
Otherwise, any open system files will not be included in your backup.
For better performance and reliability, store Volume Shadow Copies
on a separate disk. This helps prevent Volume Shadow Copies from being
deleted under high I/O load, and it improves performance on busy
servers. When you create a volume for Volume Shadow Copies, set the
cluster size to at least 16 KB, especially if you plan to defragment
the volume. With a cluster size smaller than 16 KB, the number of
changes caused by defragmentation may cause some Volume
Shadow Copies to be deleted.
Tip: If you decide to delete a Volume Shadow Copy volume
at some time, make sure that you first delete the scheduled task of
creating Volume Shadow Copies before deleting the volume. Otherwise,
your event log will be filled with Event ID: 7001 errors indicating
that it cannot create Volume Shadow Copies as scheduled.
Warning! Volume Shadow Copies are
not a substitute for regular backups. You must not rely on Volume Shadow
Copies as your only source of recovering files.
Automated System Recovery
As mentioned earlier, as a best practice you should include ASR backups
as part of your system recovery plan. Create ASR backups when you install
new hardware or service packs to help you recover from a system failure.
However, ASR does have its limitations and it should be used only as a
last resort after you've tried other recovery methods, such as Safe Mode,
Last Known Good configuration and the Recovery Console.
Remember to back up your data separately because ASR backup sets don't
include data.
Tip: One limitation of ASR is that it only supports FAT16 volumes
up to 2.1 GB. It doesn't support 4 GB FAT16 partitions that
use 64 KB clusters. If you are currently using 4 GB FAT16 partitions,
you should convert them to NTFS before creating ASR backup sets.
Managing System Startup Behavior
Clicking on Control Panel | System and then the Advanced tab allows you
to configure your system's Startup and Recovery settings. You can configure
the duration of the startup menu. You can also configure some settings
for system failure, such as sending an administrative alert and writing
events to the system log. In general, it's a good idea to leave your system
setting to the default behavior, which is to restart automatically after
a system failure.
Hopefully disk space is not an issue on your server. If you have lots
of disk space available, configure the server for a complete memory dump
(see Figure 3). In case of a system crash, it will write the information
to the memory.dmp file, which you can then use for debugging.
Figure 3. Be sure to do a Complete Memory Dump,
which can be useful for debugging recurring disasters.
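To judge whether you really have "lots of disk space," you can estimate what a complete memory dump needs. The Python sketch below rests on two assumptions that you should verify against Microsoft's documentation for your system: that the paging file on the boot volume must be at least the size of physical RAM plus 1 MB, and that memory.dmp is roughly the size of RAM. The function names are hypothetical.

```python
from typing import Dict

MB = 1024 * 1024


def complete_dump_requirements(ram_bytes: int) -> Dict[str, int]:
    """Estimate space needed for a complete memory dump.

    Assumptions (verify for your system): page file >= RAM + 1 MB,
    and memory.dmp is about the size of physical RAM.
    """
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return {
        "page_file_bytes": ram_bytes + MB,  # assumed minimum paging file size
        "dump_file_bytes": ram_bytes,       # approximate size of memory.dmp
    }


def has_room_for_dump(ram_bytes: int, free_bytes: int) -> bool:
    """Return True if the free space can hold the estimated memory.dmp file."""
    return free_bytes >= complete_dump_requirements(ram_bytes)["dump_file_bytes"]


if __name__ == "__main__":
    ram = 2 * 1024 * MB  # example: a server with 2 GB of RAM
    needs = complete_dump_requirements(ram)
    print(f"Page file: {needs['page_file_bytes'] // MB} MB, "
          f"dump file: about {needs['dump_file_bytes'] // MB} MB")
```

On a real server you could feed `has_room_for_dump` the free space reported by `shutil.disk_usage` for the volume that holds the dump file.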
Your Job Is Never Done
Finally, here are some additional considerations to keep in mind when
planning for disaster recovery:
- Make sure your servers support booting from CD-ROM. This requires
that your computer's BIOS support the El Torito Bootable CD-ROM format
(no emulation mode).
- Install Recovery Console on all servers for quick recovery.
- Keep the source CD for operating systems handy.
- Create a Windows startup disk to work around certain issues, such
as damaged boot sectors, damaged master boot records (MBR), damaged
system files (NTLDR, NTDETECT.COM, etc.), or virus infections.
Disasters are unavoidable, but at least we can be better prepared to
minimize the effect of disasters by carefully planning, testing, implementing
and occasionally updating our backup and recovery policy. If you haven't
already implemented such a policy, it's never too late to start. A disaster
recovery policy is pretty similar to an insurance policy in some ways: you
hope that you never ever have to use it, but if you have it in place,
it definitely gives you peace of mind.