In-Depth
Storage Strategies for a New Exchange
How storage needs have changed in Exchange Server 2007
Planning storage for Exchange Server databases has always been something
of an art form. Databases have to be arranged in a way that reduces the
chances of a catastrophic failure, while maintaining an acceptable level
of performance.
In smaller organizations this might simply mean keeping databases and
transaction logs on separate disks. In larger organizations, ensuring
performance and resilience often means investing in a SAN.
Whether you are planning a large or small Exchange Server deployment,
one of your most important considerations will be capacity planning.
In this article I show you how to calculate your storage needs.
Changes to Exchange Server Databases
In spite of early rumors that Exchange Server 2007 was going to do away
with the JET database format and use SQL Server databases instead, Microsoft
chose to continue using JET in Exchange Server 2007. Although Exchange
2007 retains the same basic database format as its predecessors, there
have been some changes made to JET that will ultimately affect the way
that you have to plan for storage.
Determining Required Disk Space
There is one storage consideration that holds true for all Exchange Server
deployments, simple or complex: The databases must reside on a separate
disk from the transaction logs. Not only does keeping the databases and
transaction logs isolated from each other improve performance, it also
ensures that data that has not yet been backed up will not be lost if
the drive containing the databases were to fail, because the surviving
logs can be replayed against a restored database.
Once you have accepted the fact that transaction logs and databases need
to be stored separately, then the next logical question is how much disk
space you are going to need for each. Fortunately, Microsoft gives us
some simple formulas for determining how much disk space will be needed.
Calculating Space for the Database Disk
Determining how much space you are going to need on the disk that stores
your Exchange database is more complicated than you might initially assume.
It seems logical enough that if you multiply the number of anticipated
mailboxes by the disk quota that you plan on imposing on each mailbox,
you will have derived the maximum database size and will therefore know
how much disk space will be required. On its own, this method won't
even come close to the correct answer.
Multiplying the number of anticipated mailboxes by the mailbox space
quota does give us a good starting point. However, you must also
take into account overhead caused by things like receiving or deleting
messages and the nightly maintenance process.
In order to be able to calculate how much disk space will be consumed
by overhead, you need to have an idea of how much mail users send and
receive each day on average. To see how this process works, let's
pretend that you have 100 users and have set a 250 MB mailbox quota for
each user. Let's also assume that each user sends and receives a
total of about 15 MB of messages each day on average. Of course in the
real world you would also want to plan for future growth, not just account
for the users that you have now.
Database Dumpster
With this in mind, the maximum amount of space that can be consumed by
messages is about 25 GB (250 MB multiplied by 100 mailboxes). We also
need to account for the amount of space that will be consumed by the
dumpster. In case you are unfamiliar with the dumpster, it is a temporary
repository for messages that have been deleted. By default, deleted messages
are retained in the dumpster for 14 days after deletion.
The amount of space consumed by the dumpster will vary for a while because
users may not delete the same number of messages each day. For planning
purposes, though, we have to assume that each user will eventually reach
their storage quota. At that point, each user must delete enough old
messages each day to make room for the new messages that are coming in.
To see how much space could potentially be consumed by the dumpster,
let's go back to my earlier example in which an Exchange 2007 server
contains 100 mailboxes and each mailbox has a 250 MB storage quota. Since
we said that users send and receive a combined total of 15 MB worth of
messages a day, let's pretend that the received messages account
for about 10 MB per user and that sent messages account for the other
5 MB per user. With this in mind, you can determine the maximum amount
of space that should theoretically be consumed by the dumpster by taking
the daily total of received mail per user, multiplying it by the total
number of mailboxes, and then multiplying your result by the 14-day
retention period (10 MB * 100 mailboxes * 14 days).
The result is that about 14 GB of space could be consumed by the dumpster.
Given that the total amount of space consumed by messages is only 25 GB,
the dumpster could add up to 56% on top of the mailbox data itself.
In the real world the overhead percentage will vary depending on the dumpster's
retention period, your inbound mail volume, and your disk quotas. It is
also possible for the dumpster to temporarily exceed its estimated size
if a user unexpectedly deletes a large number of messages.
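As a rough sketch, the dumpster arithmetic above can be expressed in a few lines of Python. The function and variable names are illustrative only and are not part of any Exchange tool, and 1 GB is treated as 1,000 MB to match the rounding used in this article.

```python
def dumpster_size_mb(received_mb_per_day, mailboxes, retention_days=14):
    """Worst-case dumpster size, assuming every mailbox is already at quota."""
    return received_mb_per_day * mailboxes * retention_days

# Example figures from the article: 100 mailboxes, 250 MB quota,
# 10 MB of received mail per user per day.
mailbox_data_mb = 250 * 100
dumpster_mb = dumpster_size_mb(received_mb_per_day=10, mailboxes=100)

print(dumpster_mb / 1000)                           # dumpster size in GB
print(round(100 * dumpster_mb / mailbox_data_mb))   # percent of mailbox data
```

Changing `retention_days` or the received volume shows how quickly the dumpster overhead moves with those two settings.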
White Space
Another factor that you must consider when planning the amount of disk
space to dedicate to your databases is the amount of white space that exists
within the database. White space is created when messages are deleted
from the database. When a message is deleted, an empty database page is
left behind. This empty page consumes disk space even though it doesn't
actually contain data. When the maintenance process runs at night, the
database is defragmented and white space is grouped together, but not removed.
The only way to get rid of white space is to perform an offline defragmentation
(which is not usually recommended).
The amount of white space in a database is constantly changing, but you
can estimate the maximum amount of space consumed by white space by multiplying
the number of mailboxes on the server by the average amount of mail sent
and received by each user each day. Therefore, if you had 100 mailboxes
and each user sent and received a combined total of 15 MB worth of messages
each day on average, then about 1.5 GB of disk space could be consumed by
white space.
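The white space estimate is a single multiplication, sketched here in Python with hypothetical names. It assumes, as the article does, that one day's worth of mail turnover is a reasonable upper bound.

```python
def white_space_mb(mailboxes, daily_mail_mb_per_user):
    """Upper-bound estimate: one day's combined sent and received mail."""
    return mailboxes * daily_mail_mb_per_user

white_space = white_space_mb(mailboxes=100, daily_mail_mb_per_user=15)
print(white_space / 1000)  # white space in GB (1 GB taken as 1,000 MB)
```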
Indexing
In an Exchange 2003 environment, I usually advised people not to use content
indexing because it consumed a tremendous amount of system resources.
In fact, the index itself was 35 to 45 percent of the total size of the
database. In Exchange Server 2007 though, the index has been completely
redesigned and only adds about five percent to the size of the database
that is being indexed.
The Fluff Factor
One last thing that you need to account for when planning your database's
size is something known as the fluff factor. The fluff factor is just
Microsoft speak for the unknown. The basic idea is that your database
will sometimes swell beyond its normal size for reasons beyond your control.
For example, earlier I mentioned that the dumpster size could increase
if a user unexpectedly deleted an excessive number of messages. In order
to account for the fluff factor, Microsoft recommends adding 20% to your
projected database size.
Maintenance and Disaster Recovery
So far I have discussed the primary factors that contribute to a database's
physical size. The amount of disk space that you are going to need is
greater than the database's size, though. Some maintenance and disaster
recovery processes require a temporary copy of the database to be created.
An example of such a process is the offline defragmentation that I mentioned
earlier. To account for any possible maintenance or disaster recovery
processes that need to be performed, you must ensure that the disk housing
your database has room for two copies of the database, plus 10% for overhead.
Calculating Database Size Requirements
As you can see, there are a lot of factors that contribute to the amount
of disk space that your databases are going to consume. We already calculated
that only about 25 GB would be consumed by user mailbox data, but let's
see how much disk space this 25 GB of data will actually require:
Factor | How It's Calculated | Result
Mailbox and Messages | Mailboxes multiplied by mailbox quota | 25 GB
Database Dumpster | Messages received multiplied by mailboxes multiplied by 14 days | 14 GB
White Space | Number of mailboxes multiplied by daily mail volume | 1.5 GB
Index | 5% of the database size | 2.02 GB
Fluff Factor | 20% of the cumulative total so far | 8.5 GB
Maintenance Capacity | Double the database size and add 10% | 61.2 GB
Total Disk Space Needed | | 112.22 GB
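The whole calculation can be sketched as one small Python function. This is an illustration under stated assumptions rather than an official Microsoft formula: the names are hypothetical, 1 GB is treated as 1,000 MB, and "two copies plus 10%" is read as twice the database size with 10% applied to that doubled figure. Because the table rounds each intermediate result, the unrounded output differs slightly from it (about 51.03 GB for the database and 112.27 GB of disk, versus 51.02 GB and 112.22 GB).

```python
def size_database(mailboxes, quota_mb, received_mb_per_day, daily_mail_mb,
                  retention_days=14, index_pct=5, fluff_pct=20, overhead_pct=10):
    """Return each sizing factor in MB, in the same order as the table."""
    mailbox_data = mailboxes * quota_mb
    dumpster = received_mb_per_day * mailboxes * retention_days
    white_space = mailboxes * daily_mail_mb
    subtotal = mailbox_data + dumpster + white_space

    index = subtotal * index_pct / 100
    fluff = (subtotal + index) * fluff_pct / 100
    database = subtotal + index + fluff

    # Assumed reading of "two copies plus 10%": double the database size,
    # then add 10% overhead to the doubled figure.
    disk = 2 * database * (100 + overhead_pct) / 100

    return {
        "mailbox_data": mailbox_data,
        "dumpster": dumpster,
        "white_space": white_space,
        "index": index,
        "fluff": fluff,
        "database": database,
        "maintenance": disk - database,
        "disk": disk,
    }

plan = size_database(mailboxes=100, quota_mb=250,
                     received_mb_per_day=10, daily_mail_mb=15)
for factor, size_mb in plan.items():
    print(f"{factor:>12}: {size_mb / 1000:.2f} GB")
```

Keeping the percentages as parameters makes it easy to rerun the numbers with a different retention period or a more conservative fluff factor.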
Remember Those Log Files
Now that I have shown you how to calculate the space required to support
Exchange Server databases, let's take a look at the log files. Determining
the space requirements for log files isn't nearly as complicated
as figuring out how much space will be needed by databases. All you really
need to know is that every transaction is initially written to a log file,
and that log files are purged when the database is backed up.
To see how this works, let's go back to our earlier example in which
there were a hundred mailboxes and each user sent and received a combined
total of 15 MB worth of messages on average each day. All we would have
to do to figure out how much space would be consumed by log files is to
multiply 15 MB by 100 users. As such, the log files would consume about
1.5 GB of space.
Before you settle on such a small amount of disk space though, there
are two things that you need to keep in mind. First, log files are not
purged if the backup fails. Therefore, I recommend dedicating enough disk
space to accommodate a week's worth of logs just in case you ever
have backup problems. That would mean that your log file volume would
need about 10.5 GB of space.
The other thing to keep in mind is that any databases in a storage group
share a common set of log files. Therefore, you will need to account for
the combined total log file needs of each database in the storage group
when planning log file capacity.
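A minimal Python sketch of the log volume estimate follows. It assumes, as the example above does, that daily log growth roughly equals daily mail volume, and it sums the databases in a storage group because they share one set of logs. The function name and the second database are hypothetical.

```python
def log_volume_mb(daily_log_mb_per_database, days_of_headroom=7):
    """Log space for one storage group, whose databases share a log set."""
    return sum(daily_log_mb_per_database) * days_of_headroom

# One database: 100 users at 15 MB per day, with a week of headroom.
one_database = log_volume_mb([100 * 15])

# Hypothetical storage group holding two databases of the same size.
two_databases = log_volume_mb([100 * 15, 100 * 15])

print(one_database / 1000, two_databases / 1000)  # sizes in GB
```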
Is the Database Too Large?
The last thing that I want to talk about is database capacity. Even though
the database's capacity might technically be unlimited, there is
a practical limit to database size, after which performance begins to
suffer. Microsoft recommends that databases not exceed 100 GB in size
unless some form of continuous replication is being used. If Local Continuous
Replication or Cluster Continuous Replication is being used, then Microsoft
recommends keeping Exchange Server databases under 200 GB in size.
One thing that I want to clarify is that this recommendation applies
to the database itself, not to the amount of disk space set aside for
the database. When we were estimating disk space requirements with our
earlier example, we estimated that just over 112 GB of disk space would
be required. This does not exceed the 100 GB threshold though because
the database itself is only 51.02 GB in size. The remaining 61.2 GB is
just empty disk space that has been set aside for use in maintenance or
disaster recovery situations.
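The check described above can be sketched in Python. The thresholds are the recommendations quoted in this article; the function names are hypothetical, and the value passed in should be the projected database size, not the disk space set aside for it.

```python
def recommended_limit_gb(continuous_replication):
    """Recommended maximum database size quoted in the article."""
    return 200 if continuous_replication else 100

def database_too_large(database_gb, continuous_replication=False):
    """Compare the database itself (not its disk allocation) to the limit."""
    return database_gb > recommended_limit_gb(continuous_replication)

print(database_too_large(51.02))                               # example database
print(database_too_large(150, continuous_replication=False))   # hypothetical size
print(database_too_large(150, continuous_replication=True))    # same size with LCR or CCR
```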
If you did determine that your database was too large, then you would
want to create another database and move some mailboxes to it. Although
a single storage group can hold up to five databases, and a server can
host up to 50 databases (5 in the Standard Edition), you are usually
better off using a separate storage group for each database. Of course
you will also want to dedicate separate disks for each database and for
each set of transaction logs.
Sharing a disk between multiple databases completely defeats the purpose
of keeping the databases small. If your database is too large though and
your server can't accommodate any more disks, one large database
will typically perform better than multiple small databases that share
a common disk.
Room To Grow
As you can see, there are a number of factors to consider when planning
your Exchange Server's storage capacity. Even so, it is worth working
through the calculations because running low on disk space can cause performance
problems and could potentially cause difficulties in disaster recovery
situations.