In-Depth

From the Trenches: The Confusing Case of the Two-PDC Domain

Getting a new ERP application up and running was only the first challenge for this MCP.

I work as a networking consultant, providing services to a number of small and mid-sized clients in the Chicago area. In my work I’ve seen various disasters, “weird” quandaries, and just plain old problems. But the worst thing I saw was when my single NT domain decided to have two Primary Domain Controllers (PDCs) at once!

I was on-site at a mid-sized steel manufacturer, helping it implement J.D. Edwards’ OneWorld enterprise resource planning (ERP) product. This Windows-based application requires an IBM AS/400, an NT server running a SQL database, and fixed TCP/IP addressing. The client software runs on Windows 95 or Windows NT 4.0 Workstation, but any development work (and with any ERP project, there is a ton of development work) has to be done on NT Workstation.

The Scene of the Crime

The client had completed the project’s first phase, and so a number of users were “live” on OneWorld, running on the PDC. The developers now needed a test machine for the next phase. I built a standard NT server on a Dell server-class chassis, and added it to the domain as a Backup Domain Controller (BDC) during the install process. So, I now had a fairly simple network, consisting of a PDC (let’s call it “OLD”), a BDC (“NEW”), and an AS/400, all in a single NT domain. All devices had the then-latest service packs installed.

About two weeks later OLD crashed due to a hardware failure. It was a fairly quick fix; but to allow users some network services, I promoted NEW to PDC while I fixed OLD. Within a few hours, I got OLD back up. It came up as a PDC in Server Manager. I demoted NEW to BDC in Server Manager without errors—or so I thought. After a brief refresh both Server Manager displays agreed; OLD was the PDC, NEW was the BDC. I had taken all client PCs down during this swap and now asked them to log back in.

The majority of people, including all production users (with Windows 95 PCs), were able to proceed with their work. However, none of the developers with NT workstations could get past the domain login screen. Thanks to another consultant’s oversight, we didn’t have the local user account name or password. The developers’ PCs were now useless.

Red Herrings

While attempting to troubleshoot this problem, I realized that the domain was having more serious trouble. Changes made to user account information on the PDC weren’t being communicated to the BDC, despite repeated use of the “Force Synchronization” command on both machines’ Server Manager. Also, NT workstations were unable to join the domain even on a fresh install.

I checked name resolution, and could PING by name. I installed NetBIOS on both servers, rebooted after hours, and again found that the two PCs weren’t talking. I tried to promote NEW to PDC, and was able to do so without errors. (Of course, that should have failed.) Frustrated and with an unhappy client, I called Microsoft Technical Support.

After retracing my steps with IP name resolution, tech support had me try a command line utility called Nltest (available in the Windows NT 4.0 Resource Kit). It has several options, including “force synchronization” and “query.” (For more information, see TechNet article Q158148, “Domain Secure Channel Utility: Nltest.exe,” on Microsoft’s Web site.) The end result was failure. The two servers, OLD and NEW, weren’t talking.
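
For readers who want to repeat that kind of check, here is a minimal sketch in Python that only assembles a few read-only Nltest command lines; it does not run them. The domain name EXAMPLE and the function name are hypothetical, and the switches shown (/dcname, /dclist, /server, /sc_query, /query) are the ones I recall from the Nltest usage text, so verify them against Q158148 before relying on them.

```python
def nltest_checks(domain, pdc, bdc):
    """Build read-only Nltest command lines for a PDC/BDC pair.

    Nothing is executed here; the caller decides how (and whether)
    to run each command on a machine that has Nltest installed.
    """
    return [
        # Which machine does the domain currently name as its PDC?
        ["nltest", f"/dcname:{domain}"],
        # List every domain controller known for the domain.
        ["nltest", f"/dclist:{domain}"],
        # Ask the BDC about the state of its secure channel to the domain.
        ["nltest", f"/server:{bdc}", f"/sc_query:{domain}"],
        # Ask the PDC's Netlogon service for its status.
        ["nltest", f"/server:{pdc}", "/query"],
    ]


if __name__ == "__main__":
    # EXAMPLE is a placeholder domain name; OLD and NEW are the
    # nicknames used for the two servers in this story.
    for command in nltest_checks("EXAMPLE", "OLD", "NEW"):
        print(" ".join(command))
```

The list is limited to queries on purpose: the options that force a sync or reset the secure channel change state, and are better typed by hand once you know what is actually broken.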

The diagnosis was that the “secure channel” between the two PCs had failed. NT servers use this “secure channel” to pass RPC calls between controllers in a domain or between domains in trust relationships. Specifically, the failure was on OLD—the production server! This was why I couldn’t get my NT workstations to connect, even if I did a clean install. The only reason my Windows 95 production machines were working was because that OS doesn’t integrate into the domain as tightly as NT 4.0 Workstation.

The Solution Revealed

At this point I had one choice: Format OLD’s hard drive, reinstall everything, and restore from backup. This was on Tuesday night. Not wanting to lose the weekend as well as three days of the developers’ work, I pressed for another option. Tech support offered a potential way out: Rename the PDC! I’d been taught that doing this would be equivalent to putting a gun to my head, but I had nothing to lose.

After hours that night I stopped all the services (SQL, backup, and the like), ran a backup, and set all but the minimum services to “manual.” Then I renamed OLD (to GIHTW for “God, I Hope This Works”) and rebooted. Much to my surprise, GIHTW came up clean and declared itself PDC of the domain. More important, changes made on OLD/GIHTW’s User Manager immediately appeared on NEW. Time for step two: Change GIHTW back to OLD and reboot. Again, everything worked fine. The two machines, OLD and NEW, were talking to each other and propagating changes. Plus, I was able to restart all the services—including the critical SQL databases—without incident. Even better, the developers could log into their NT workstations without a hitch.

Epilogue

The results left me happy (and my weekend plans intact). And since I’d prepared the client for the worst (re-install and recover), I looked like a hero to him. Also, I learned two valuable lessons from this situation.

First, you should be very careful about promoting and demoting domain controllers. Although it should work fine, it may not. Likewise, you need to verify that your domain is working—looking at Server Manager isn’t enough.
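
One way to verify beyond Server Manager is to compare what each controller actually holds. The sketch below assumes you have already exported the user account names from the PDC and from the BDC by whatever means you trust; the function name and the sample accounts are hypothetical and not taken from the client's network.

```python
def diff_accounts(pdc_accounts, bdc_accounts):
    """Compare account names exported from the PDC and the BDC.

    Returns two sorted lists: accounts present on the PDC but missing
    on the BDC, and accounts present on the BDC but not on the PDC.
    Names are compared case-insensitively.
    """
    pdc = {name.lower() for name in pdc_accounts}
    bdc = {name.lower() for name in bdc_accounts}
    return sorted(pdc - bdc), sorted(bdc - pdc)


if __name__ == "__main__":
    # Made-up sample data: "carol" was added on the PDC but never
    # reached the BDC, which is the symptom described in this story.
    missing, extra = diff_accounts(
        ["alice", "bob", "carol"],
        ["alice", "bob"],
    )
    print("Missing on BDC:", missing)
    print("Only on BDC:", extra)
```

Two empty lists after a forced synchronization suggest that changes are propagating; anything else means the controllers disagree, whatever Server Manager shows.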

Second, don’t give up. In my odyssey several people suggested I “just format and re-install.” By being persistent, I got the client up without risk of data loss or excessive overtime charges.

About the Author

Chris Gerrib, MCP, CNE, has been in high-tech for five years, the last four with Hinsdale, Illinois-based consulting firm Information Technologies International. He started out as a “screwdriver holder” for the senior technicians and worked his way up to his current position as VP of Operations. He holds degrees from Southern Illinois University and the University of Illinois.
