All Broken Up -- Microsoft Certified Professional Magazine Online

NTFS is designed to be efficient, but it isn’t foolproof. To avoid seriously fragmented disks, you’ll need careful planning and regular maintenance.

By Michael Chacon
07/01/1999

We spend unconscionable amounts of money making sure that Windows NT has all the resources it needs, and it needs plenty. Tons of RAM, multiple fast processors, striped RAID sets with fast, fat, and wide SCSI drives are all important components of a speed machine. But there are other things you can do to keep performance up to snuff without blowing your budget. For example, one of the most effective yet unexciting tasks you can regularly perform on your NT machines is to defragment the disk drives.

In many cases you probably already have a disk defragmentation program installed. If you don’t, you need to correct that mistake immediately. Regardless of your current situation, I’m going to discuss some of the low-level details of why defragmentation is important and how that process is accomplished.

Inside NTFS

Before I discuss fragmentation solutions, let’s review how data is allocated to disks in NT. I hope all your volumes are formatted with NTFS rather than FAT, for reasons I’ve outlined in these pages many times. As with most file systems, NTFS is contained in a volume, which is a logical partition on a physical disk—and, of course, there can be multiple partitions on one disk. Unlike FAT, which contains areas specifically formatted for use by the various components of the file system, NTFS stores all system files, including the Master File Table (MFT) and the bootstrap file, as ordinary files.

As with the FAT file system, NTFS uses clusters to allocate disk space. The size of the cluster is determined during the format process and can range from 512 bytes up to 64K. The default cluster size for most disks today is 4K to support large partitions, avoid wasting disk space, and minimize disk fragmentation (see Table 1). Also, keep in mind that NTFS file compression isn’t supported on any partition with a cluster size greater than 4K.

Table 1. Default cluster sizes in NTFS

NTFS volume size	Default cluster size
Up to 512M	512 bytes (or the sector size if > 512 bytes)
Larger than 512M and up to 1G	1K
Larger than 1G and up to 2G	2K
Larger than 2G	4K
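
As a quick way to see how the tiers in Table 1 fit together, here is a minimal Python sketch of the lookup. The function name, the sector_bytes parameter, and the use of binary units (1M = 1,048,576 bytes) are my assumptions for illustration; this is not how the format utility itself is written.

```python
MB = 1024 * 1024   # assumption: binary megabytes
GB = 1024 * MB

def default_cluster_size(volume_bytes, sector_bytes=512):
    """Return the default NTFS cluster size in bytes for a volume, per Table 1."""
    if volume_bytes <= 512 * MB:
        # Smallest tier: 512 bytes, or the sector size if that is larger.
        return max(512, sector_bytes)
    if volume_bytes <= 1 * GB:
        return 1024
    if volume_bytes <= 2 * GB:
        return 2048
    return 4096

print(default_cluster_size(400 * MB))  # 512
print(default_cluster_size(9 * GB))    # 4096
```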

Here’s an extreme example to illustrate the point. Let’s say you have 5,000 files that are each 2K in size. On a partition with 2K clusters, they’d consume 10M of disk space (5,000 x 2,000 = 10M), with each file fitting neatly in its own cluster. Theoretically, there wouldn’t be any wasted space or fragmentation. If you copied those same files to a partition with 64K clusters, they’d occupy 320M (5,000 x 64,000 = 320M) of disk space with no cluster fragmentation but with massive internal fragmentation, otherwise known as wasted space: 62K of every 64K cluster would sit empty. NTFS doesn’t concern itself with sector sizes and uses a minimum of one complete cluster for each file, hence the wasted space in the example. The sector size for hard drives is determined when the drive is originally low-level formatted and the tracks on the disk are broken up into sectors.
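
The arithmetic in that example can be checked with a few lines of Python. This is only a sketch: the function names are mine, and it uses the same round decimal units as the example (2K = 2,000 bytes, 64K = 64,000 bytes) and the same one-cluster-minimum rule.

```python
def allocated_bytes(file_bytes, cluster_bytes):
    """Space a file occupies once its size is rounded up to whole clusters."""
    # Integer ceiling division, with a floor of one cluster per file.
    clusters = max(1, (file_bytes + cluster_bytes - 1) // cluster_bytes)
    return clusters * cluster_bytes

def total_and_wasted(file_count, file_bytes, cluster_bytes):
    """Return (total space allocated, space wasted) for identical files."""
    total = file_count * allocated_bytes(file_bytes, cluster_bytes)
    wasted = total - file_count * file_bytes
    return total, wasted

print(total_and_wasted(5000, 2_000, 2_000))   # (10000000, 0)
print(total_and_wasted(5000, 2_000, 64_000))  # (320000000, 310000000)
```

On the 64K partition, 310M of the 320M allocated holds no data at all.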

Avoiding Disk Fragmentation

On the other hand, if you use the same two partitions and one 10M file, you’ll have something else to consider: fragmentation. With the 64K cluster size, the 10M file will be allocated just under 160 clusters, while the 2K cluster partition would allocate a whopping 5,000 clusters. The more clusters needed to store a file, the more likely it is that the clusters won’t be located contiguously on the partition. This lack of contiguity means that the read/write head of the physical disk has to move more often to access any given file.
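
To put numbers on this, the sketch below counts the clusters a file needs and the number of separate fragments in a list of cluster numbers. Both function names and the sample cluster list are hypothetical, and the sizes use the same round decimal units as the example above.

```python
def clusters_needed(file_bytes, cluster_bytes):
    """Number of whole clusters required to hold a file (at least one)."""
    return max(1, -(-file_bytes // cluster_bytes))  # ceiling division

def count_fragments(cluster_numbers):
    """Count contiguous pieces in the ordered list of clusters a file occupies."""
    if not cluster_numbers:
        return 0
    fragments = 1
    for previous, current in zip(cluster_numbers, cluster_numbers[1:]):
        if current != previous + 1:
            # A gap means the head must seek to a new location on disk.
            fragments += 1
    return fragments

print(clusters_needed(10_000_000, 64_000))  # 157
print(clusters_needed(10_000_000, 2_000))   # 5000
print(count_fragments([100, 101, 102, 500, 501, 900]))  # 3
```

A file can never have more fragments than it has clusters, so the 2K partition allows up to 5,000 pieces where the 64K partition allows at most 157.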

Because the read/write operation of a disk drive is the slowest point in the disk access process, keeping file fragmentation to a minimum can play a significant role in system performance. When reading a sequential file in one physical read operation, the system can use read-ahead to extract more of the file’s data and keep it in cache for later retrieval. Extracting this data from cache the next time it’s needed is much faster than performing another physical read. Obviously, in the real world systems don’t have uniformly sized files, but you get the point. The lack of uniformity in file size makes choosing the cluster size to avoid fragmentation a very poor strategy.

Making things more complex for us but more flexible for the file system, NTFS numbers clusters in two ways: Logical Cluster Numbers (LCNs) and Virtual Cluster Numbers (VCNs). LCNs number every cluster on the partition sequentially, and an LCN maps directly to a physical disk address: multiply the LCN by the partition’s cluster size. This provides a byte offset that the disk driver uses to read and write data, which is very low-level stuff. VCNs number the clusters that belong to an individual file, starting at zero and incrementing for as many clusters as the file needs. NTFS addresses a file’s contents by VCN, then maps each VCN to an LCN to find the data on disk.
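
A small Python sketch may help keep the two numbering schemes straight. The tuple layout for the mapping (starting VCN, starting LCN, length in clusters) is a simplified stand-in I made up for illustration, not the on-disk NTFS format, and the sample values are arbitrary.

```python
def lcn_to_byte_offset(lcn, cluster_bytes):
    """Byte offset on the partition: the LCN multiplied by the cluster size."""
    return lcn * cluster_bytes

def vcn_to_lcn(vcn, extents):
    """Translate a file-relative VCN to a partition-wide LCN.

    extents is a hypothetical list of (start_vcn, start_lcn, length) tuples,
    each describing one contiguous stretch of the file on disk.
    """
    for start_vcn, start_lcn, length in extents:
        if start_vcn <= vcn < start_vcn + length:
            return start_lcn + (vcn - start_vcn)
    raise ValueError(f"VCN {vcn} is not mapped for this file")

# A seven-cluster file stored in two pieces.
extents = [(0, 1000, 4), (4, 5200, 3)]

lcn = vcn_to_lcn(5, extents)
print(lcn)                            # 5201
print(lcn_to_byte_offset(lcn, 4096))  # 21303296
```

The file sees clusters 0 through 6 no matter where they land; only the mapping knows that the file jumps from LCN 1003 to LCN 5200 partway through.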

Consistency is Key

The core of any NTFS volume is the Master File Table (MFT), which is implemented as a file containing an array of 1K records, regardless of sector size, each of which represents a file within the partition. Each 1K record contains attributes for the file, such as the security descriptor, filenames, timestamps, and, interestingly enough, the data. I call this interesting because storing the data as just another file attribute helps give NTFS a consistent architecture. If the data fits within the 1K record, it’s stored in the MFT and referred to as a resident attribute. Obviously very few files are this small, so there’s also a nonresident attribute, otherwise referred to as a “run,” that’s stored in the next available clusters. As a file grows in size, more runs are allocated to contain the additional data. Although this process is usually associated with data files, any attribute that can grow is handled in the same manner. For example, if many users have permissions to files individually rather than through group membership, the Discretionary Access Control Lists (DACLs) can grow too large to remain resident, in which case they’ll be allocated in a run.
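
To make the resident/nonresident split concrete, here is a rough Python sketch. The 1K record size comes from the text above; the function name and the idea of lumping all of the record’s other attributes into a single byte count are my own simplifications, and the real record layout carries headers that this ignores.

```python
MFT_RECORD_BYTES = 1024  # the 1K record size described above

def data_is_resident(data_bytes, other_attribute_bytes):
    """Return True if the data fits in the MFT record next to the other attributes.

    Hypothetical, simplified accounting: real records also spend space on
    record and attribute headers.
    """
    return data_bytes + other_attribute_bytes <= MFT_RECORD_BYTES

print(data_is_resident(300, 400))    # True: small file, stored in the MFT
print(data_is_resident(4_000, 400))  # False: data moves out to a run
```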

Another example of non-resident file attributes being stored in runs is a directory with a large number of files. Directories are listed in the MFT like other files except that they have an index root attribute containing a list of the files associated with the directory. If the index of files can’t be contained in the MFT record, a run is created to allocate the overflowing information in as many clusters as necessary to contain the filenames and their associated VCN-to-LCN mappings. Such a consistent approach to treating all information as attributes and any increasing information as runs helps NTFS remain flexible as different data types are created for future applications. Regardless of its source or destination, data is simply stored in attribute streams. NTFS doesn’t need to be concerned with data types—it leaves that issue to higher-level application processes.

Metadata Files

Along with the MFT is another set of files that complete the NTFS structure: metadata files. These files use a $filename naming convention, and each has a particular function in the file system. During the NT boot process, the kernel loads all the device drivers, including the NTFS file system driver. During the volume mounting process, the NTFS system driver looks for the $Boot file, which contains the bootstrap code. The $Boot file is created during the formatting process and is located at a specific disk address. This file locates the physical disk address of the $MFT, which contains the VCN-to-LCN information, to obtain all of the MFT file attributes and MFT runs. The first record in the MFT contains the attributes of the MFT. In this manner the MFT first references itself, then all other files in the partition. The second record in the MFT contains the attributes of a partial copy of the MFT, called $MFTMirr, which is a file placed in the middle of the partition away from the MFT for redundancy purposes. Because these are normal files, you can see them with the DIR command (see Figure 1).

Figure 1. Metadata files use a $filename naming convention and can be viewed using the DIR command.

You can use the $MFTMirr file to locate the metadata files if the MFT is somehow corrupt or missing. By implementing the MFT as a normal file that references itself, NTFS eliminates the need to locate it in any particular area of the partition. This means that NTFS can relocate the MFT file if it encounters a bad cluster or other disk error. Two other interesting files are the $BadClus file, which keeps a record of bad clusters on the disk, and $Volume, which records the volume name, the NTFS version number, and the corrupted-disk bit, which signals that CHKDSK needs to be run against the volume.

Additional Information

A great reference that delves into the NTFS internals even further is David A. Solomon's Inside Windows NT, Second Edition, Microsoft Press, ISBN 1-57231-677-2. Chapter 9 covers NTFS.

Protecting System Files

One of the most compelling architectural benefits of NTFS is its ability to provide transaction-based recovery. This doesn’t extend to the user’s data files, but it does protect the NTFS system files. This means that if the system has a power failure or otherwise comes crashing down, the partition will always be in a consistent state and ready to offer a useful file system to the operating system. Applications can also work to protect user data by periodically flushing the cache to the same log file the system uses.

The transaction-based recovery process is managed by the Log File Service (LFS), part of the NTFS device driver. Each time an NTFS volume is mounted and then accessed by an application, the partition goes through a recovery process where unresolved I/O transactions are either completed or rolled back to the last known consistent state, based on information contained in a transaction log.

To accomplish this, every five seconds the NTFS driver writes a checkpoint record into a metadata file called $Logfile, marking the entry of update records that are copies of two tables of transaction information (see Figure 2). One is the dirty page table, which contains changes to the file structure that haven’t been written to the disk. The other is the transaction table, which is a record of all disk transactions that are underway but haven’t been completed.

Figure 2. The $Logfile metadata file contains update records that are copies of two tables of transaction information: file structure changes that haven't been written to the disk, and disk transactions that are underway but not complete.

During the recovery process the LFS can either redo the steps that make up a complete transaction or undo a partial set of steps of an uncompleted transaction. The LFS knows whether to redo or undo the transaction based on the existence of a record that declares a transaction complete or, in database terminology, “committed.” If there’s no record declaring a transaction committed, the LFS will undo each step recorded by the $Logfile in the reverse order of operation to roll back the transaction. In either case, the file structure will be in a consistent, usable state. Transaction records are written to the log file whenever an operation creates, deletes, or renames a file, sets security permissions, or makes any other type of change to the file system attributes.
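
The redo/undo decision is easier to follow in code. What follows is a toy Python sketch of the general technique, not the LFS itself: the record layout (dictionaries with type, txn, key, old, and new fields), the function name, and the use of a plain dictionary to stand in for the on-disk file structure are all assumptions made for illustration.

```python
def recover(state, log):
    """Bring state to a consistent condition using a list of log records."""
    # A transaction counts as committed only if a commit record exists for it.
    committed = {rec["txn"] for rec in log if rec["type"] == "commit"}

    # Redo pass: reapply every step of committed transactions, oldest first.
    for rec in log:
        if rec["type"] == "update" and rec["txn"] in committed:
            state[rec["key"]] = rec["new"]

    # Undo pass: reverse the steps of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] not in committed:
            if rec["old"] is None:
                state.pop(rec["key"], None)  # the entry did not exist before
            else:
                state[rec["key"]] = rec["old"]
    return state

log = [
    {"type": "update", "txn": 1, "key": "a.txt", "old": None, "new": "created"},
    {"type": "commit", "txn": 1},
    {"type": "update", "txn": 2, "key": "b.txt", "old": "old name", "new": "new name"},
]

# At the crash, transaction 1 had not reached the disk and transaction 2 was half done.
state_at_crash = {"b.txt": "new name"}
print(recover(state_at_crash, log))  # {'b.txt': 'old name', 'a.txt': 'created'}
```

Transaction 1 has a commit record, so its change is reapplied; transaction 2 has none, so its change is reversed.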

Next Month: Tools of the Trade

As you can see, the NTFS environment is a busy and complex place. Although the architecture of the file system is designed to be efficient, the basis of the allocation of disk space is still fundamentally at the cluster level. Because of this, as files are deleted, expanded, and otherwise altered, the runs that the MFT tracks for the data attributes can be scattered all over the disk in fragments that can have a decided impact on I/O performance. Based on this understanding of how NTFS functions, next month I’ll discuss specific tools that help manage this problem, and I’ll show how they actually work with NTFS.
