In-Depth
Get Active Directory Replication Right!
There’s a method to the madness of Active Directory replication, but
many of the concepts can be tough to decipher. The following four tales
demonstrate the range of problems you can encounter with this little-understood
aspect of AD.
Scene 1: Deny the Defaults
A company with a hub and spoke network topology was unable to figure
out why they had so many AD replication connections across their WAN links.
They were tapping into one of AD’s most important features—the ability
to use Sites to control replication and authentication traffic. Their
AD design called for a single domain spanning all their WAN links, so
they really needed to make sure replication was as tuned as possible.
I’d seen the same problem they were having on numerous occasions and was
confident of the solution.
The first thing I told them was they hadn’t done anything wrong with
their configuration. Instead, they were seeing the results of a particular
default AD setting. In order to minimize the amount of replication latency
in AD, all Site Links are bridged by default. This means a domain controller
from any site will try to create replication connections to DCs in any
other site that has a DC from the same domain. In addition to that connection,
there will also be a replication connection between every Site in order
to replicate AD Schema and Configuration information.
Figure 1 shows that there were connection objects between all of the
DCs in all the Sites. This is because all four DCs are from the same domain
and, by default, all the Site Links are bridged. Changing the default
settings involves the following steps:
- Open AD Sites and Services.
- Expand Inter-Site Transports.
- Right-click on IP and select Properties.
- On the General tab, uncheck the box that says “Bridge all Site Links.”
After you uncheck this box, the number of replication connections drops
once the Knowledge Consistency Checker (KCC) has run on every DC in the
topology. The KCC runs every 15 minutes by default, but you can trigger
it manually through AD Sites and Services by highlighting the NTDS
Settings under each DC, clicking Action and selecting “Check Replication
Topology.” Figure 2 shows how the replication connections changed after
clearing the “Bridge all Site Links” option.
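The effect of the bridging option can be sketched in a few lines of Python. This is only a simplified model of the behavior described above, not the KCC's actual algorithm: it treats bridged site links as transitive and lists which pairs of sites become eligible for a direct connection. The site names and the function name are made up for illustration.

```python
def eligible_site_pairs(site_links, bridge_all):
    """Return the site pairs that may be joined by a replication connection.

    site_links: iterable of (site_a, site_b) tuples, one per site link.
    bridge_all: True models "Bridge all Site Links" (links are transitive).
    """
    links = [tuple(link) for link in site_links]
    direct = {frozenset(link) for link in links}
    if not bridge_all:
        # Without bridging, only sites that share a site link are eligible.
        return direct

    # With bridging, any two sites joined by a chain of links are eligible.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    pairs = set()
    for start in adjacency:
        seen = {start}
        stack = [start]
        while stack:
            site = stack.pop()
            for neighbour in adjacency[site]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        pairs.update(frozenset((start, other)) for other in seen if other != start)
    return pairs


# Hypothetical hub-and-spoke topology: one hub site, three branch sites.
HUB_AND_SPOKE = [("Hub", "BranchA"), ("Hub", "BranchB"), ("Hub", "BranchC")]

bridged = eligible_site_pairs(HUB_AND_SPOKE, bridge_all=True)
unbridged = eligible_site_pairs(HUB_AND_SPOKE, bridge_all=False)
print(len(bridged), len(unbridged))  # prints "6 3"
```

In this toy topology, bridging makes every branch-to-branch pair eligible, while turning it off leaves only the three hub-to-branch pairs.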
Figure 1. Before: Leaving the default Site Link settings results in a
plethora of connection objects and lots of replication traffic.
Figure 2. After: The same domain after making changes to the Site
Bridging properties.
The company’s last requirement was to reduce replication latency between
their two manufacturing sites. To facilitate this, we simply created a
site link bridge and added the two manufacturing site links to it.
Scene 2: Satellite Slowdown
A company with several satellite WAN links was having problems
getting AD replication to complete successfully. The WAN link bandwidth
should have been enough to allow smooth replication, and they were confused
about why it wasn’t happening. This was an issue I’d dealt with myself
a few years ago, so I was familiar with the problems they were having.
Figure 3 shows an example of their setup.
The first problem was that satellite links are notorious for having higher
latency than other connections such as frame relay. AD uses Remote
Procedure Calls (RPC) as its default replication protocol, and RPC is
extremely sensitive to network latency. My first suggestion was to upgrade
their WAN links from the satellite connections they were using. They said
no; they were committed to making replication work over their current
connections.
Figure 3. Before: The satellite links of this company’s WAN weren’t
replicating properly.
Figure 4. After: The reworked network topology included two new domains
and addition of the SMTP protocol.
Active Directory replication has just two available protocols: RPC and
Simple Mail Transfer Protocol (SMTP). Since their links weren’t able
to support RPC replication, their only other option was to switch to SMTP
replication across the satellite connections.
First, though, we had to address some major Windows 2000-related SMTP
replication restrictions. One is that SMTP replication is only available
between sites, while RPC is the only protocol that you can use within
a site. This makes sense, since you should have plenty of bandwidth within
a site for RPC replication to work without any problems.
The most important restriction is that DCs from the same AD domain can’t
use SMTP to replicate with each other, because the domain partition
replicates only over RPC. So if this company wanted to use SMTP replication,
they’d have to create a separate AD domain for every remote site that
had a satellite connection.
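The two restrictions above reduce to a small decision rule. The following Python sketch encodes only what this article states about Win2K; the class, function, DC names and domain names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainController:
    name: str
    site: str
    domain: str


def allowed_transports(source, target):
    """Return the replication transports two DCs could use, per the rules above."""
    if source.site == target.site:
        # Within a site, RPC is the only protocol available.
        return ["RPC"]
    if source.domain == target.domain:
        # DCs from the same domain can't use SMTP, even across sites.
        return ["RPC"]
    # Different sites and different domains: SMTP becomes an option.
    return ["RPC", "SMTP"]


# Hypothetical DCs for illustration only.
hub_dc = DomainController("HUBDC1", site="Hub", domain="corp.example")
same_domain_remote = DomainController("SATDC1", site="Remote1", domain="corp.example")
new_domain_remote = DomainController("SATDC2", site="Remote1", domain="remote1.corp.example")

print(allowed_transports(hub_dc, same_domain_remote))  # ['RPC']
print(allowed_transports(hub_dc, new_domain_remote))   # ['RPC', 'SMTP']
```

This is why each satellite site needed its own domain: only the last branch of the rule ever returns SMTP.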
They weren’t particularly excited about doing this, but in order to get
their replication working and keep their satellite WAN links, they decided
it was the only solution that made sense. Global Catalog, schema, and
configuration data can replicate over SMTP, so they were still able to
provide a local Global Catalog server for these
remote sites. Figure 4 shows what the SMTP replication topology looked
like.
Configuring SMTP replication was a fairly straightforward process. For
a step-by-step guide to setting it up, see “Additional
Information.”
Scene 3: Beware Consultants Who Know Nothing
A company had been working with another consultant on its AD design but
was questioning his recommendations. The consultant told them they should
have DNS installed on every DC in their environment because AD replication
wouldn’t work if they didn’t. Fortunately, I was able to help them go through
a redesign before they implemented a solution that would have been difficult
to maintain and support.
The advice they’d received was absolutely incorrect. AD was designed
to use DNS to locate services running on DCs. It shouldn’t change the
way you’d normally configure a DNS infrastructure; rather, it should just
build off what’s already in place. Many companies choose to use BIND for
DNS, which wouldn’t be running on the DCs since BIND is typically installed
on either a Unix or Linux platform. Before talking too much about their
DNS infrastructure, we revisited their AD domain design to ensure that
they knew exactly what they wanted. This is always a good idea since every
AD domain requires a DNS domain with the same name. Figure 5 shows an
example of what their AD domain structure and DNS infrastructure looked
like after following the advice of the consultant.
Deciding to create multiple forests is a big decision and one I never
take lightly. Talking with this company’s IT department convinced me that
they had good reason to have the division within their environment. The
reason for having two forests is that they had a section of the network
that was not as trusted as the rest, so they wanted those minimally trusted
domains to have limited access to the rest of the network resources. They
also wanted to ensure that the only DNS records accessible from the external
network were for resources that should be seen.
Figure 5. Before: This company’s proposed network would have had DNS
installed on every domain controller (not a good idea).
Figure 6. After: The redesigned DNS structure, using “Shadow zones” for
the external forest. S.P. represents a Standard Primary zone, S.S. a
Standard Secondary zone.
They were aware that with Win2K DNS, security can be set only on
AD-integrated zones, but they were having trouble figuring out whether
their proposed solution would work. Because the DNS records are stored in
the domain partition in AD, only DCs in the same domain can have an
AD-integrated copy of a DNS zone. So, for example, if a DC from the
public1.net domain hosts an AD-integrated copy of the public1.net DNS
zone, only other public1.net DCs can hold AD-integrated copies of that zone.
Another interesting caveat is that a DNS server that’s also a DC can
host any DNS zone as an AD-integrated zone, including a zone that will
be hosting records for a separate AD domain.
The company was also curious about what a change to their proposed DNS
infrastructure would do to their AD replication topology. I explained
that since the external network had a separate forest, there wouldn’t
be any AD replication between the external and internal networks. I also
showed them another option that would satisfy all their requirements.
Since they were going to stick with Win2K DNS, there was really only
one feasible option to allow them to control what records were seen by
the external network: Shadow zones. When using this method, the DNS servers
in the external network actually have a primary copy of zone files used
in the internal network. The internal domain admins ensure that any records
for machines that should be seen by the external network are manually
added to the external zone file. In this situation, the number of records
was small, so it didn’t add much of an administrative burden. None of
the AD service location records was needed in the external zone files
because there wouldn’t be any replication between the two forests. Figure
6 shows the redesigned DNS infrastructure with the public1.net and public2.net
name servers hosting shadow copies of the internal zones. This allows
the internal administrators to control exactly what records they want
visible to the public network.
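As a rough illustration of what ends up in a shadow zone, the sketch below filters a list of internal records down to the ones approved for external view and leaves out the AD service location (SRV) records. In the scenario above the records were added by hand; this code only models the result. All host names, addresses and the function name are hypothetical.

```python
def shadow_zone(internal_records, published_names):
    """Return the records that would be copied into the external shadow zone.

    internal_records: list of (owner_name, record_type, value) tuples.
    published_names: owner names the internal admins chose to expose.
    SRV records are skipped because no replication crosses the two forests.
    """
    return [
        record
        for record in internal_records
        if record[0] in published_names and record[1] != "SRV"
    ]


# Hypothetical internal zone contents (documentation-range addresses).
INTERNAL = [
    ("www.corp.example", "A", "192.0.2.10"),
    ("mail.corp.example", "A", "192.0.2.25"),
    ("payroll.corp.example", "A", "192.0.2.50"),
    ("_ldap._tcp.corp.example", "SRV", "0 100 389 dc1.corp.example"),
]

external = shadow_zone(INTERNAL, {"www.corp.example", "mail.corp.example"})
for name, rtype, value in external:
    print(name, rtype, value)
```

Only the two approved host records appear in the output; the payroll host and the SRV record stay internal.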
Scene 4: Hidden Costs
A company with multiple redundant WAN links was having trouble getting
their replication connections to work the way they wanted. The company
had connections between two of their branch offices for redundancy, but
figured that since there wasn’t much traffic going over the link it could
be used to reduce replication latency. They’d changed the costs on their
AD site links but still weren’t getting the desired result. Their main
problem was a misunderstanding of how site costs work.
Although the AD connection objects showed the connections between the
two branch offices, the replication traffic was still going over the two
T1 links. To truly see what was going on, we diagrammed their router and
site link costs in their environment (see Figure 7).
Figure 7. The excessive router cost between the two branches was forcing
this company's traffic through the more saturated T1 links.
Notice that the actual network routing cost between the two branch offices
is more than the combined cost between the branch offices and the corporate
hub. This is because the network was designed to route traffic through
the corporate site, with the 256K link serving as a backup
connection. The AD site costs show that the cost between the branch offices
is less than the combined cost between the branches and the corporate
office; however, the traffic was actually going through the corporate
office.
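The mismatch is easier to see with numbers. The Python sketch below compares the direct branch-to-branch cost against the combined cost through the hub, once for site links and once for routers. The cost values are invented for illustration and are not the ones from Figure 7.

```python
def cheapest_path(direct_cost, via_hub_costs):
    """Pick the lower-cost option: the direct link or the two hops via the hub."""
    via_hub = sum(via_hub_costs)
    if direct_cost < via_hub:
        return ("direct", direct_cost)
    return ("via hub", via_hub)


# Hypothetical AD site link costs: direct link is cheaper than going via the hub.
site_link_choice = cheapest_path(direct_cost=100, via_hub_costs=(100, 100))

# Hypothetical router costs: the backup link is weighted far higher.
router_choice = cheapest_path(direct_cost=500, via_hub_costs=(100, 100))

print("Connection objects follow:", site_link_choice)  # ('direct', 100)
print("Packets actually follow:  ", router_choice)     # ('via hub', 200)
```

The two layers reach opposite answers, which is the situation this company was in: the connection objects pointed branch to branch while the packets still went through the corporate office.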
I’ve always felt that the costs of AD site links were one of the most
difficult concepts to understand. The costs placed on Site Links affect
only where the connection objects will connect within the replication
topology. So for example, even though the Site Link cost will ensure that
the connection objects will be directly between the DCs in the two branch
office locations, the network costs force the actual traffic through the
routers at the corporate office. One way to get the actual traffic to
go directly over the 256K link between the two branch offices would be
to change the network routing costs so that the cost between the two branch
offices was less than the combined cost through the corporate office. This
wasn’t optimal in the scenario, however, because that would force all
traffic between the branch offices to follow that same path. The better
way to get just the AD replication traffic to follow that path was to
add routes directly to the DCs. This was done simply by using the
command-line “route add” command on the Win2K DCs in the branch offices.
Normally, DCs would communicate with each other through their default gateways.
The command “route add <destination IP> mask 255.255.255.255 <remote office
router IP>” caused the DCs to communicate across the 256K connection.
Note: The destination IP address was used rather than the subnet because
we wanted only traffic between the DCs to go across that connection.
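If several DCs need the same treatment, the command string can be generated rather than typed by hand. The Python sketch below validates two addresses and builds a host route in the form quoted above; it only prints the command and does not run it. The addresses and the function name are hypothetical.

```python
import ipaddress


def host_route_command(dc_ip, gateway_ip):
    """Build a "route add" command for a single-host route to a remote DC.

    dc_ip: address of the DC in the other branch office.
    gateway_ip: address of the local router on the branch-to-branch link.
    Raises ValueError if either value is not a valid IPv4 address.
    """
    dc = ipaddress.IPv4Address(dc_ip)
    gateway = ipaddress.IPv4Address(gateway_ip)
    # A 255.255.255.255 mask limits the route to this one host.
    return f"route add {dc} mask 255.255.255.255 {gateway}"


# Hypothetical addresses for illustration only.
print(host_route_command("10.2.0.10", "10.1.0.254"))
# prints: route add 10.2.0.10 mask 255.255.255.255 10.1.0.254
```

Because the mask covers a single host, every other machine on the remote subnet keeps using the default gateway.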
Additional Information
Read TechNet's "Active Directory Branch Office Planning Guide Series"
to learn more about AD replication components and examples for
implementing a branch office replication topology. It's available here:
TechNet home | Products & Technologies | Active Directory | Windows 2000
Server | Deploy | Active Directory Branch Office Guide Series.
You'll find useful information in the Windows 2000 Resource Kit on AD
architecture here: TechNet home | Products & Technologies | Windows 2000
Server | Resource Kits | Windows 2000 Server Distributed Systems Guide.
To learn more about configuring SMTP replication, visit TechNet home |
Products & Technologies | Windows 2000 Server | How-To Resources |
Step-by-Step Guide to Setting up ISM-SMTP Replication.
Replication Gratification
I’ve faced many challenges in the last couple of years working with AD.
Every company I’ve worked with has had a unique environment, and I’m never
surprised to see something I haven’t seen before. I hope that these tales will
help you along your path to a smoothly replicating AD environment.