In-Depth

Storage Strategies for a New Exchange

How storage needs have changed in Exchange Server 2007

Planning storage for Exchange Server databases has always been something of an art form. Databases have to be arranged in a way that reduces the chances of a catastrophic failure, while maintaining an acceptable level of performance.

In smaller organizations this might simply mean keeping databases and transaction logs on separate disks. In larger organizations though, ensuring performance and resilience often means investing in a SAN.

Whether you are planning a large-scale or small-scale Exchange Server deployment, one of your most important considerations will be capacity planning. In this article I show you how to calculate your storage needs.

Changes to Exchange Server Databases
In spite of early rumors that Exchange Server 2007 was going to do away with the JET database format and use SQL Server databases instead, Microsoft chose to continue using JET in Exchange Server 2007. Although Exchange 2007 retains the same basic database format as its predecessors, there have been some changes made to JET that will ultimately affect the way that you have to plan for storage.

Determining Required Disk Space
There is one storage consideration that holds true for all Exchange Server deployments, simple or complex: The databases must reside on a separate disk from the transaction logs. Not only does keeping the databases and transaction logs isolated from each other improve performance, it also ensures that data that has not yet been backed up will not be lost if the drive containing the databases were to fail.

Once you have accepted the fact that transaction logs and databases need to be stored separately, then the next logical question is how much disk space you are going to need for each. Fortunately, Microsoft gives us some simple formulas for determining how much disk space will be needed.

Calculating Space for the Database Disk
Determining how much space you are going to need on the disk that stores your Exchange database is more complicated than you might initially assume. It seems logical enough that if you multiply the number of anticipated mailboxes by the disk quota that you plan on imposing on each mailbox, you will have derived the maximum database size and will therefore know how much disk space will be required. This method won't come close to giving you a correct answer, though.

Multiplying the number of anticipated mailboxes by the mailbox space quota does give us a good starting point. However, you must also take into account overhead caused by things like receiving or deleting messages and the nightly maintenance process.

In order to be able to calculate how much disk space will be consumed by overhead, you need to have an idea of how much mail users send and receive each day on average. To see how this process works, let's pretend that you have 100 users and have set a 250 MB mailbox quota for each user. Let's also assume that each user sends and receives a total of about 15 MB of messages each day on average. Of course in the real world you would also want to plan for future growth, not just account for the users that you have now.

Database Dumpster
With this in mind, the maximum amount of space that can be consumed by messages is about 25 GB (250 MB multiplied by 100 mailboxes). We also need to account for the amount of space that will be consumed by the dumpster. In case you are unfamiliar with the dumpster, it is a temporary repository for messages that have been deleted. By default, deleted messages are retained in the dumpster for 14 days after deletion.

The amount of space consumed by the dumpster will vary for a while because users may not delete the same number of messages each day. Eventually, though, we have to assume that each user will reach their storage quota. At that point, each user must delete enough old messages each day to make room for the new messages that are coming in.

To see how much space could potentially be consumed by the dumpster, let's go back to my earlier example in which an Exchange 2007 server contains 100 mailboxes and each mailbox has a 250 MB storage quota. Since we said that users send and receive a combined total of 15 MB worth of messages a day, let's pretend that the received messages account for about 10 MB per user and that sent messages account for the other 5 MB per user. With this in mind, you can determine the maximum amount of space that should theoretically be consumed by the dumpster by taking the daily total of received mail per user, multiplying it by the total number of mailboxes, and then multiplying your result by the 14-day retention period (10 MB * 100 mailboxes * 14 days).

The result is that about 14 GB of space could be consumed by the dumpster. Given that the mailbox data itself accounts for only 25 GB, the dumpster could add as much as 56% on top of it. In the real world the overhead percentage will vary depending on the dumpster's retention period, your inbound mail volume, and your disk quotas. It is also possible for the dumpster to temporarily exceed its estimated size if a user unexpectedly deletes a large number of messages.
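The dumpster arithmetic above can be sketched in a few lines of Python. This is only an illustration of the article's example: the function name dumpster_size_mb and its parameter names are made up for this sketch, and it follows the article in treating 1 GB as 1,000 MB.

```python
MB_PER_GB = 1000  # assumption: the article's rounding (250 MB x 100 mailboxes = 25 GB)


def dumpster_size_mb(mailboxes, received_mb_per_day, retention_days=14):
    """Estimate the steady-state dumpster size in MB.

    Assumes every user is at quota and deletes as much mail as arrives each day.
    """
    return mailboxes * received_mb_per_day * retention_days


mailboxes = 100
quota_mb = 250

dumpster_mb = dumpster_size_mb(mailboxes, received_mb_per_day=10)
mailbox_data_mb = mailboxes * quota_mb
dumpster_ratio = dumpster_mb / mailbox_data_mb  # dumpster relative to mailbox data

print(f"Dumpster: {dumpster_mb / MB_PER_GB:.1f} GB")
print(f"Relative to mailbox data: {dumpster_ratio:.0%}")
```

Changing retention_days or the received volume shows how quickly the dumpster overhead moves with either setting.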

White Space
Another factor that you must consider when planning the amount of disk space to dedicate to your databases is the amount of white space that exists within the database. White space is created when messages are deleted from the database. When a message is deleted, an empty database page is left behind. This empty page consumes disk space even though it doesn't actually contain data. When the maintenance process runs at night, the database is defragmented and white space is grouped together, but not removed. The only way to get rid of white space is to perform an offline defragmentation (which is not usually recommended).

The amount of white space in a database is constantly changing, but you can estimate the maximum amount of space it consumes by multiplying the number of mailboxes on the server by the average amount of mail sent and received by each user per day. Therefore, if you had 100 mailboxes and each user sent and received a combined total of 15 MB worth of messages each day on average, then about 1.5 GB of disk space could be consumed by white space.
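As a quick illustration, the white space estimate is a single multiplication. The helper name white_space_mb is hypothetical, and the sketch again treats 1 GB as 1,000 MB to match the article's figures.

```python
def white_space_mb(mailboxes, daily_mail_mb_per_user):
    """Estimate the maximum white space in MB: one day's mail volume across all mailboxes."""
    return mailboxes * daily_mail_mb_per_user


estimate_mb = white_space_mb(mailboxes=100, daily_mail_mb_per_user=15)
print(f"White space estimate: {estimate_mb / 1000:.1f} GB")
```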

Indexing
In an Exchange 2003 environment, I usually advised people not to use content indexing because it consumed a tremendous amount of system resources. In fact, the index itself was 35 to 45 percent of the total size of the database. In Exchange Server 2007 though, the index has been completely redesigned and only adds about five percent to the size of the database that is being indexed.

The Fluff Factor
One last thing that you need to account for when planning your database's size is something known as the fluff factor. The fluff factor is just Microsoft speak for the unknown. The basic idea is that your database will sometimes swell beyond its normal size for reasons beyond your control. For example, earlier I mentioned that the dumpster size could increase if a user unexpectedly deleted an excessive number of messages. In order to account for the fluff factor, Microsoft recommends adding 20% to your projected database size.

Maintenance and Disaster Recovery
So far I have discussed the primary factors that contribute to a database's physical size. The amount of disk space that you are going to need is going to be different from the database's size though. There are some maintenance and disaster recovery processes that require a temporary copy of the database to be created. An example of such a process is the offline defragmentation that I mentioned earlier. To account for any possible maintenance or disaster recovery processes that need to be performed, you must ensure that the disk housing your database has room for two copies of the database, plus 10% for overhead.

Calculating Database Size Requirements
As you can see, there are a lot of factors that contribute to the amount of disk space that your databases are going to consume. We already calculated that only about 25 GB would be consumed by user mailbox data, but let's see how much disk space this 25 GB of data will actually require:

| Factor                  | How It's Calculated                                              | Result    |
|-------------------------|------------------------------------------------------------------|-----------|
| Mailbox and Messages    | Mailboxes multiplied by mailbox quota                            | 25 GB     |
| Database Dumpster       | Messages received multiplied by mailboxes multiplied by 14 days  | 14 GB     |
| White Space             | Number of mailboxes multiplied by daily mail volume              | 1.5 GB    |
| Index                   | 5% of the database size                                          | 2.02 GB   |
| Fluff Factor            | 20% of the cumulative total so far                               | 8.5 GB    |
| Maintenance Capacity    | Double the database size and add 10%                             | 61.2 GB   |
| Total Disk Space Needed |                                                                  | 112.22 GB |
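For readers who want to plug in their own numbers, the whole calculation can be sketched as one Python function. The function name database_disk_plan, its parameters, and the dictionary keys are all invented for this illustration. The percentages are the ones quoted in this article, and 1 GB is treated as 1,000 MB. Because the sketch does not round intermediate values, its results differ from the table by a few hundredths of a gigabyte.

```python
def database_disk_plan(mailboxes, quota_mb, received_mb_per_day, daily_mail_mb,
                       retention_days=14, index_rate=0.05, fluff_rate=0.20,
                       maintenance_overhead=0.10):
    """Return the estimated database size and total disk space, in GB."""
    mb_per_gb = 1000  # assumption: matches the article's rounding

    mailbox_gb = mailboxes * quota_mb / mb_per_gb
    dumpster_gb = mailboxes * received_mb_per_day * retention_days / mb_per_gb
    white_space_gb = mailboxes * daily_mail_mb / mb_per_gb

    subtotal_gb = mailbox_gb + dumpster_gb + white_space_gb
    index_gb = subtotal_gb * index_rate                 # 5% of the database so far
    fluff_gb = (subtotal_gb + index_gb) * fluff_rate    # 20% of the cumulative total
    database_gb = subtotal_gb + index_gb + fluff_gb

    # Room for two copies of the database, plus 10% overhead.
    total_disk_gb = database_gb * 2 * (1 + maintenance_overhead)

    return {
        "database_gb": database_gb,
        "maintenance_gb": total_disk_gb - database_gb,
        "total_disk_gb": total_disk_gb,
    }


plan = database_disk_plan(mailboxes=100, quota_mb=250,
                          received_mb_per_day=10, daily_mail_mb=15)
for name, value in plan.items():
    print(f"{name}: {value:.2f}")
```

Keeping the rates as keyword arguments makes it easy to see how a different retention period or fluff factor changes the total.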

Remember Those Log Files
Now that I have shown you how to calculate the space required to support Exchange Server databases, let's take a look at the log files. Determining the space requirements for log files isn't nearly as complicated as figuring out how much space will be needed by databases. All you really need to know is that every transaction is initially written to a log file, and that log files are purged when the database is backed up.

To see how this works, let's go back to our earlier example in which there were a hundred mailboxes and each user sent and received a combined total of 15 MB worth of messages on average each day. All we would have to do to figure out how much space would be consumed by log files is to multiply 15 MB by 100 users. As such, the log files would consume about 1.5 GB of space.

Before you settle on such a small amount of disk space though, there are two things that you need to keep in mind. First, log files are not purged if the backup fails. Therefore, I recommend dedicating enough disk space to accommodate a week's worth of logs just in case you ever have backup problems. That would mean that your log file volume would need about 10.5 GB of space.

The other thing to keep in mind is that any databases in a storage group share a common set of log files. Therefore, you will need to account for the combined total log file needs of each database in the storage group when planning log file capacity.
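The log volume estimate can be sketched the same way. This minimal example assumes, as the article does, that daily log volume roughly equals daily mail volume. The function name log_volume_mb and the tuple layout are hypothetical.

```python
def log_volume_mb(databases, days_of_logs=7):
    """Estimate log volume size in MB for one storage group.

    databases: list of (mailboxes, daily_mail_mb_per_user) tuples, one per
    database in the storage group, since they share a common set of log files.
    """
    daily_mb = sum(mailboxes * daily_mb_per_user
                   for mailboxes, daily_mb_per_user in databases)
    return daily_mb * days_of_logs


# One database with 100 mailboxes at 15 MB per user per day, one week of logs.
single_db_mb = log_volume_mb([(100, 15)])
print(f"Log volume: {single_db_mb / 1000:.1f} GB")

# Two databases in the same storage group share the log volume.
shared_mb = log_volume_mb([(100, 15), (50, 15)])
print(f"Shared log volume: {shared_mb / 1000:.2f} GB")
```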

Is the Database Too Large?
The last thing that I want to talk about is database capacity. Even though the database's capacity might technically be unlimited, there is a practical limit to database sizes, after which performance begins to suffer. Microsoft recommends that databases not exceed 100 GB in size unless some form of continuous replication is being used. If Local Continuous Replication or Cluster Continuous Replication is being used, then Microsoft recommends keeping Exchange Server databases under 200 GB in size.

One thing that I want to clarify is that this recommendation applies to the database itself, not to the amount of disk space set aside for the database. When we were estimating disk space requirements with our earlier example, we estimated that just over 112 GB of disk space would be required. This does not exceed the 100 GB threshold though because the database itself is only 51.02 GB in size. The remaining 61.2 GB is just empty disk space that has been set aside for use in maintenance or disaster recovery situations.
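A small check like the following could be used to compare a projected database size against these recommendations. It is only a sketch: the function name exceeds_recommended_size is made up, and the 100 GB and 200 GB limits are the ones quoted in this article.

```python
def exceeds_recommended_size(database_gb, continuous_replication=False):
    """Return True if the database itself (not the disk) is over the recommended size."""
    limit_gb = 200 if continuous_replication else 100
    return database_gb > limit_gb


# The example database is 51.02 GB, even though its disk needs about 112 GB.
print(exceeds_recommended_size(51.02))
print(exceeds_recommended_size(150))
print(exceeds_recommended_size(150, continuous_replication=True))
```

Note that the value passed in is the database size, not the disk space set aside for it.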

If you did determine that your database was too large, then you would want to create another database and move some mailboxes to it. Although a single Exchange 2007 server can hold up to 50 databases (5 in the Standard Edition) and a storage group can contain more than one database, you are usually better off using a separate storage group for each database. Of course you will also want to dedicate separate disks for each database and for each set of transaction logs.

Sharing a disk between multiple databases completely defeats the purpose of keeping the databases small. If your database is too large though and your server can't accommodate any more disks, one large database will typically perform better than multiple small databases that share a common disk.

Room To Grow
As you can see, there are a number of factors to consider when planning your Exchange Server's storage capacity. Even so, it is worth working through the calculations because running low on disk space can cause performance problems and could potentially cause difficulties in disaster recovery situations.
