Tech Line
Disk Signature Disaster
Saving yourself some pain when replacing failed cluster disks.
Chris: We recently had a hard disk fail on our Windows 2003
cluster and it was an absolute nightmare. I replaced the failed disk and
could not get the cluster server to recognize the new disk in order to
restore the missing disk files from backup. I assigned the same drive
letter to the new disk, but each time I would try and bring the disk resource
online, it would fail.
Since we were in a pinch, we decided to scrap the cluster and start again
from scratch. After rebuilding the cluster, we were able to restore files
to the two cluster virtual servers from backup. I'm sure there has to
be an easier way to recover a failed cluster. What else could I have done?
James
After talking with James, I learned that his cluster ran two virtual
servers: a virtual file server and a virtual print server. Each virtual
server resided in its own group on the cluster. With this relatively simple
setup, rebuilding his cluster did not take a long time. Since the point
of having a cluster is high availability, taking down an entire cluster
is never the best option. James ran into this problem because of how
Microsoft Cluster Service (MSCS) treats disk signatures.
MSCS identifies physical disk resources by the disk signature that's
written to each physical disk when the disk is initialized by a Windows
OS. If you replace a physical disk within the cluster, the Cluster service
will see the original disk as failed and will not even see the new disk.
To have the new disk seen as the original disk, the signature recorded
for the original disk in the cluster configuration must match the new
disk's signature. While there are a few tools that can do this, by far
the easiest way to associate the new disk with the failed disk is to run
the Server Cluster Recovery Utility.
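To make the signature mismatch concrete, here is a minimal Python sketch. It assumes an MBR-partitioned disk, where the 4-byte disk signature sits at byte offset 0x1B8 of the first sector. The helper names and sample signature values are hypothetical, and the sample sectors are built in memory rather than read from a real disk. It only illustrates the comparison; it is not how the Cluster service itself is implemented.

```python
import struct

SECTOR_SIZE = 512
MBR_SIGNATURE_OFFSET = 0x1B8  # 4-byte little-endian disk signature in an MBR sector


def read_disk_signature(sector: bytes) -> int:
    """Return the disk signature stored in a raw MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature 0x55AA")
    (signature,) = struct.unpack_from("<I", sector, MBR_SIGNATURE_OFFSET)
    return signature


def make_sample_sector(signature: int) -> bytes:
    """Build an in-memory stand-in for an initialized disk's first sector."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, signature)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


def resource_can_match(stored_signature: int, sector: bytes) -> bool:
    """True when a disk carries the signature the configuration expects."""
    return read_disk_signature(sector) == stored_signature


if __name__ == "__main__":
    # Hypothetical signatures: the failed disk and its freshly initialized replacement.
    old_disk = make_sample_sector(0x1234ABCD)
    new_disk = make_sample_sector(0x9F00E221)

    stored = read_disk_signature(old_disk)  # what the configuration remembers
    print(f"stored signature: 0x{stored:08X}")
    print("replacement matches:", resource_can_match(stored, new_disk))
```

Because the replacement disk gets a new signature when it is initialized, the comparison fails no matter which drive letter you assign to it, which is consistent with what James saw.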
The Server Cluster Recovery Utility is included in the Windows Server
2003 Resource Kit and can be downloaded from Microsoft at http://www.microsoft.com/downloads.
This tool is especially useful when replacing a shared cluster disk or
in a disaster recovery scenario when a cluster is being rebuilt using
new physical disk resources. Oftentimes, after a cluster quorum is restored,
physical disk resources will still not be able to come online. That's
because the signature for the disks stored in the cluster configuration
does not match the signature of the new disks. In these instances, the
Server Cluster Recovery Utility can be used to return the disks to a usable
state.
To use the Server Cluster Recovery Utility, first install the replacement
disk and use Disk Management to initialize and format the new disk as
NTFS. Then go to Cluster Administrator and create a new resource for the
newly added physical disk. Here are the steps:
- In Cluster Administrator, right-click the Resources container, select
New, and then click Resource.
- In the New Resource dialog box, enter a name for the new resource,
select "Physical Disk" as the resource type, and then select
the group with which to associate the resource.
- Select the possible owners for the disk (same as original disk) and
click Next.
- In the Dependencies dialog box, click Next.
- The newly added disk should be displayed in the Disk drop-down menu.
Select the disk and click Finish.
With the newly installed disk associated with the cluster, you can now
use it to replace the failed disk resource. To do this, first ensure that
the Windows Server 2003 Resource Kit Tools are installed on the node where
you plan to perform the procedure and then follow these steps:
- Run clusterrecovery.exe to open the Server Cluster Recovery Utility.
- Once the tool opens, enter the name of your cluster in the Cluster
Name field. Then select the "Replace a physical disk resource"
radio button and click Next.
- Select the original (failed) disk in the "Old physical disk
resource" drop-down menu and then select the new physical disk
from the "New physical disk resource" drop-down menu. Then
click Replace.
- Next you are given a friendly reminder from the Server Cluster Recovery
Utility to delete the original disk resource and then change the drive
letter of the new disk resource so that it matches the drive letter
assigned to the original (failed) disk. Click OK.
- Click Exit to close the Server Cluster Recovery Utility.
- In Cluster Administrator, locate the failed disk resource. The failed
disk resource will be easy to spot in Cluster Administrator because
it will have the word "(lost)" next to its name. Right-click
on the lost resource and select Delete. When prompted to confirm, click
Yes.
- Use Disk Management to change the drive letter associated with the
new disk.
At this point, you can bring the virtual server resources back online
and restore the original virtual server data from backup.
Note that some resources may fail to come online. For example, a File
Share resource will fail if the original folder that the resource is associated
with is not present. After the backup is restored, you will be able to
bring all resources in the group (virtual server) online. Also keep in
mind that depending on how your enterprise backup software is configured,
you'll most likely need to reinstall your backup agent software on
the virtual server in order to perform the restore.
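If you would rather script the final step than click through Cluster Administrator, the following Python sketch builds the corresponding cluster.exe commands. It assumes the cluster.exe command-line tool is available on the node. The group names are hypothetical placeholders, and the helper functions are my own illustration rather than part of any Microsoft tool. By default it only prints the commands so you can review them first.

```python
import subprocess


def group_online_command(group_name: str, cluster_name: str = "") -> list:
    """Build the argument list for: cluster [/cluster:NAME] group "GROUP" /online"""
    cmd = ["cluster"]
    if cluster_name:
        cmd.append(f"/cluster:{cluster_name}")
    cmd += ["group", group_name, "/online"]
    return cmd


def bring_groups_online(groups, cluster_name: str = "", dry_run: bool = True) -> list:
    """Print (dry run) or execute one /online command per group."""
    commands = [group_online_command(g, cluster_name) for g in groups]
    for cmd in commands:
        if dry_run:
            # list2cmdline applies Windows-style quoting to names with spaces.
            print(subprocess.list2cmdline(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical group names standing in for the two virtual servers.
    bring_groups_online(["File Server Group", "Print Server Group"])
```

Pass dry_run=False only on a cluster node, and only after the restore has put back the folders that resources such as File Share depend on. Otherwise expect the same failures described above.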
Before the days of the Server Cluster Recovery Utility, cluster disk
recovery was fraught with pain. As soon as I would hear of a problem,
my mind would instantly fill up with the burnt tooth smell that serves
as an ominous sign at most dentist offices. Now when I hear of a cluster
disk failure, I just smile from ear to ear. This could mean that either
I'm comforted by the ease of the Server Cluster Recovery Utility, or that
my sanity is starting to return!