To see a great web site on the internals of the VFAT file system, check http://www.pcguide.com/ref/hdd/file/fat-c.html
Overall Structure
Master Boot Record | File Allocation Table | File Allocation Table | Root Directory | All Other Data ... The Rest of the Disk |
Master Boot Record (MBR)
When you turn on your PC, the processor has to begin processing. However, your system memory is empty, and the processor doesn't have anything to execute, or really even know where to look for it. To ensure that the PC can always boot regardless of which BIOS is in the machine, chip makers and BIOS manufacturers arrange so that the processor, once turned on, always starts executing at the same place, FFFF0h.
In a similar manner, every hard disk must have a consistent "starting point" where key information is stored about the disk, such as how many partitions it has, what sort of partitions they are, etc. There also needs to be somewhere that the BIOS can load the initial boot program that starts the process of loading the operating system. The place where this information is stored is called the master boot record (MBR). It is also sometimes called the master boot sector or even just the boot sector.
The master boot record is always located at cylinder 0, head 0, and sector 1, the first sector on the disk. This is the consistent "starting point" that the disk always uses. When the BIOS boots the machine, it will look here for instructions and information on how to boot the disk and load the operating system. The master boot record contains the following structures:
Master Partition Table: This small table contains the descriptions of the partitions that are contained on the hard disk. There is only room in the master partition table for the information describing four partitions. Therefore, a hard disk can have only four true partitions, also called primary partitions. Any additional partitions are logical partitions that are linked to one of the primary partitions.
Master Boot Code: The master boot record contains the small initial boot program that the BIOS loads and executes to start the boot process. This program eventually transfers control to the boot program stored on whichever partition is used for booting the PC. This code is not used when just accessing a disk, only on boot up.
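To make the idea of a four-entry partition table a little more concrete, here is a rough sketch in Python. The byte offsets (table at offset 446, 16 bytes per entry, 55h AAh signature in the last two bytes) are the conventional MBR layout and are not stated in the text above; the function names and the sample values are made up for illustration, and the "disk" is a synthetic sector built in memory rather than a real drive.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # conventional layout: boot code comes before this
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR
ENTRY_FORMAT = "<B3xB3xII"     # flag, (CHS start), type, (CHS end), first LBA, sector count


def parse_partition_table(mbr: bytes) -> list:
    """Return the non-empty primary partition entries found in a 512-byte MBR."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid master boot record")
    partitions = []
    for slot in range(4):  # there is room for exactly four primary partitions
        start = PARTITION_TABLE_OFFSET + slot * ENTRY_SIZE
        boot_flag, part_type, first_lba, sectors = struct.unpack(
            ENTRY_FORMAT, mbr[start:start + ENTRY_SIZE])
        if part_type == 0:  # a type of zero marks an unused slot
            continue
        partitions.append({
            "slot": slot,
            "active": boot_flag == 0x80,  # the partition the boot code will start
            "type": part_type,
            "first_lba": first_lba,
            "sectors": sectors,
        })
    return partitions


def build_sample_mbr() -> bytes:
    """Build a synthetic MBR holding one active partition (illustrative values)."""
    mbr = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, 0x80, 0x06, 63, 1_000_000)
    mbr[PARTITION_TABLE_OFFSET:PARTITION_TABLE_OFFSET + ENTRY_SIZE] = entry
    mbr[510:512] = BOOT_SIGNATURE
    return bytes(mbr)


if __name__ == "__main__":
    for partition in parse_partition_table(build_sample_mbr()):
        print(partition)
```

The three unused slots are simply skipped, which is why a disk with a single partition reports only one entry.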
Due to the great importance of the information stored in the master boot record, if it ever becomes damaged or corrupted in some way, serious data loss can be--in fact, often will be--the result. Since the master boot code is the first program executed when you turn on your PC, this is a favorite place for virus writers to target.
Volume Boot Sectors
Each DOS partition (also called a DOS volume) has its own volume boot sector. This is distinct from the master boot sector (or record) that controls the entire disk, but is similar in concept. Each volume boot sector contains the following:
Disk Parameter Block: Also sometimes called the media parameter block, this is a data table that contains specific information about the volume, such as its specifications (size, number of sectors it contains, etc.), label name, etc.
Volume Boot Code: This is code that is specific to the operating system that is using this volume and is used to start the load of the operating system. This code is called by the master boot code that is stored in the master boot record, but only for the primary partition that is set as active. For other partitions, this code sits unused.
The volume boot sector is created when you do a high-level format of a hard disk partition. The boot sector's code is executed directly when the disk is booted, making it a favorite target for virus writers.
File Allocation Tables
The structure that gives the FAT file system its name is the file allocation table. In order to understand what this important table does, you must first understand how space on the hard disk is allocated under DOS (and its derivatives that also use FAT).
While data is stored in 512-byte sectors on the hard disk, for performance reasons individual sectors are not normally allocated to files. The reason is that it would take a lot of overhead (time and space) to keep track of pieces of files that were this small. The hard disk is instead broken into larger pieces called clusters, or alternatively, allocation units. Each cluster contains a number of sectors. Typically, clusters range in size from 2,048 bytes to 32,768 bytes, which corresponds to 4 to 64 sectors each.
The file allocation table is where information about clusters is stored. Each cluster has an entry in the FAT that describes how it is used. This is what tells the operating system which parts of the disk are currently used by files, and which are free for use. The FAT entries are used by the operating system to chain together clusters to form files.
The file allocation tables are stored in the area of the disk immediately following the volume boot sector. Each volume actually contains two identical copies of the FAT; ostensibly, the second one is meant to be a backup of sorts in case of any damage to the first copy. Damage to the FAT can of course result in data loss since this is where the record is kept of which parts of the disk contain which files. The problem with this built-in backup is that the two copies are kept right next to each other on the disk. If, for example, bad sectors develop on the disk where the first copy of the FAT is stored, chances are pretty good that the second copy will be affected as well.
File Chaining and FAT Cluster Allocation
The file allocation table (FAT) is used to keep track of which clusters are assigned to each file. The operating system (and hence any software applications) can determine where a file's data is located by using the directory entry for the file and file allocation table entries. Similarly, the FAT also keeps track of which clusters are open and available for use. When an application needs to create (or extend) a file, it requests more clusters from the operating system, which finds them in the file allocation table.
There is an entry in the file allocation table for each cluster used on the disk. Each entry contains a value that represents how the cluster is being used. There are different codes used to represent the different possible statuses that a cluster can have.
Every cluster that is in use by a file has in its entry in the FAT a cluster number that links the current cluster to the next cluster that the file is using. Then that cluster has in its entry the number of the cluster after it. The last cluster used by the file is marked with a special code that tells the system that it is the last cluster of the file; this is often a number like 65,535 (16 ones in binary format). Since the clusters are linked one to the next in this manner, they are said to be chained. Every file (that uses more than one cluster) is chained in this manner. See the example that follows for more clarification.
In addition to a cluster number or an end-of-file marker, a cluster's entry can contain other special codes to indicate its status. A special code, usually zero, is put in the FAT entry of every open (unused) cluster. This tells the operating system which clusters are available for assignment to files that need more storage space. Another code is used to indicate "bad" clusters. These are clusters where a disk utility (or the user) has previously detected one or more unreliable sectors, due to disk defects. These clusters are marked as bad so that no future attempts will be made to use them.
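The different codes a FAT entry can hold might be sorted out along the following lines. This is only a sketch for 16-bit FAT entries: the text above gives zero for a free cluster and 65,535 as a typical end-of-file marker, while the other specific values used here (FFF7h for a bad cluster, the whole range FFF8h to FFFFh counting as end-of-chain, and the reserved values) are the usual FAT16 conventions and are assumptions as far as this document is concerned. The function name is made up.

```python
FREE = 0x0000              # open (unused) cluster
BAD_CLUSTER = 0xFFF7       # assumed FAT16 convention for a bad cluster
END_OF_CHAIN_MIN = 0xFFF8  # assumed: FFF8h..FFFFh all mean "last cluster of the file"


def classify_fat16_entry(value: int) -> str:
    """Describe what a single 16-bit FAT entry says about its cluster."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a FAT16 entry is a 16-bit value")
    if value == FREE:
        return "free"
    if value == BAD_CLUSTER:
        return "bad"
    if value >= END_OF_CHAIN_MIN:
        return "end of chain"
    if value == 1 or value >= 0xFFF0:
        return "reserved"
    return f"next cluster is {value}"


if __name__ == "__main__":
    for sample in (0, 12721, 0xFFF7, 65535):
        print(f"{sample:#06x}: {classify_fat16_entry(sample)}")
```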
Accessing the entire length of a file is done by using a combination of the file's directory entry and its cluster entries in the FAT. This is confusing to describe, so let's look at an example. Let's consider a disk volume that uses 4,096 byte clusters, and a file in the C:\DATA directory called "PCGUIDE.HTM" that is 20,000 bytes in size. This file is going to require 5 clusters of storage (because 20,000 divided by 4,096 is around 4.88).
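As a quick check of that arithmetic, here is a small sketch that rounds a file size up to whole clusters and also reports how much of the last cluster goes unused. The function names are made up for illustration.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of whole clusters required to hold file_size bytes."""
    # Ceiling division: a partly filled last cluster still has to be allocated.
    return -(-file_size // cluster_size)


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not occupied by its data."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


if __name__ == "__main__":
    size, cluster = 20_000, 4_096
    print(clusters_needed(size, cluster))  # 5
    print(slack_bytes(size, cluster))      # 480
```

So the 20,000-byte file occupies 20,480 bytes of disk space, with 480 bytes of the fifth cluster left unused.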
OK, so we have this file on the disk, and let's say we want to open it up to edit it. We open our editor and ask for the file to be opened. To find the cluster on the disk containing the first part of the file, the system just looks at the file's directory entry to find the starting cluster number for the file; let's suppose it goes there and sees the number 12,720. The system then knows to go to cluster number 12,720 on the disk to load the first part of the file.
To find the second cluster used by this file, the system looks at the FAT entry for cluster 12,720. There, it will find another number, which is the next cluster used by the file. Let's say this is 12,721. So the next part of the file is loaded from cluster 12,721, and the FAT entry for 12,721 is examined to find the next cluster used by the file. This continues until the last cluster used by the file is found. Then, the system will check the FAT entry to find the number of the next cluster, but instead of finding a valid cluster number, it will find a special number like 65,535 (special because it is the largest number you can store in 16 bits). This is the signal to the system that "there are no more clusters in this file". Then it knows it has retrieved the entire file.
Since every cluster is chained to the next one using a number, it isn't necessary for the entire file to be stored in one continuous block on the disk. In fact, pieces of the file can be located anywhere on the disk, and can even be moved after the file has been created. Following these chains of clusters on the disk is done invisibly by the operating system so that to the user, each file appears to be in one continuous chunk of disk space.
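The walk through the chain described above can be sketched in a few lines. Here the FAT is modelled as a plain Python dictionary mapping a cluster number to the value stored in its entry, which is of course a simplification of the on-disk table. The starting cluster 12,720 and the link to 12,721 come from the example; the remaining cluster numbers are invented to show a file whose pieces are not contiguous, and treating any value from FFF8h up as the end marker is an assumed FAT16 convention.

```python
END_OF_CHAIN_MIN = 0xFFF8  # assumed FAT16 convention; 65,535 (FFFFh) falls in this range


def follow_chain(fat: dict, start_cluster: int) -> list:
    """Return the clusters of a file in order, starting from its directory entry."""
    chain = []
    visited = set()
    cluster = start_cluster
    while True:
        if cluster in visited:
            raise ValueError(f"corrupt FAT: chain loops back to cluster {cluster}")
        visited.add(cluster)
        chain.append(cluster)
        entry = fat[cluster]
        if entry >= END_OF_CHAIN_MIN:   # "there are no more clusters in this file"
            return chain
        if entry == 0:
            raise ValueError(f"corrupt FAT: cluster {cluster} links to a free entry")
        cluster = entry


if __name__ == "__main__":
    # A five-cluster file stored in two separate runs on the disk.
    fat = {12720: 12721, 12721: 12722, 12722: 13500, 13500: 13501, 13501: 0xFFFF}
    print(follow_chain(fat, 12720))  # [12720, 12721, 12722, 13500, 13501]
```

The loop check is not part of the description above; it is there only so that a damaged table cannot send the sketch into an endless loop.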
Internal Directory Structures
Every file on the system is stored in a directory. A directory is nothing more than a file itself, except that it is specially structured and marked on the disk so that it has special meaning. A directory is a table that contains information about files (and subdirectories) that it contains, and links to where the file (or subdirectory) data begins on the disk. The paper analogy would be a table of contents to a book, except that directories of course use a hierarchical tree structure and books do not.
Each entry in a directory is 32 bytes in length, and stores the following information:
File Name and Extension: This is the 11-character name of the file using the conventional 8.3 DOS file naming standard, for example, COMMAND.COM. Note that the "dot" in "COMMAND.COM" is implied and not actually stored on the disk. See the Long File Names section below for more on VFAT long file names, which use a special structure. The file name field is also used to indicate directory entries that have been deleted.
File Attribute Byte: There are several different attributes which the operating system uses to give special treatment to certain files; these are stored in a single byte in each directory entry. These attributes are discussed in detail in the File Attributes section below. Note that it is one of these file attributes that indicates whether an entry in the directory represents a "real" file, or a subdirectory.
Last Change Date/Time: There is a space for each file to indicate the date and time that it was created or modified. You should know that these fields can be arbitrarily modified by any program to be whatever they want, so this date/time shouldn't be taken too religiously. I occasionally am asked if the date/time on a file can be used to prove when someone did something or not on their PC. It cannot, because it's too easy to change this information.
File Size: The size of the file in bytes.
Link to Start Cluster: The number of the cluster that starts the file (or subdirectory) is stored in the directory. This is what allows the operating system to find a file when it is needed, and how all the different files and directories are linked together on the disk. See File Chaining and FAT Cluster Allocation above for more on cluster chaining.
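To give a feel for how these fields fit into 32 bytes, here is a sketch that unpacks one classic 8.3 directory entry. The field order and offsets (name, extension, attribute byte, ten reserved bytes, time, date, start cluster, size), the packed date/time format, and the E5h first byte marking a deleted entry are the conventional DOS layout rather than something spelled out in the list above, so treat them as assumptions. The function name and the sample entry are invented.

```python
import struct

# name(8) ext(3) attributes(1) reserved(10) time(2) date(2) start cluster(2) size(4)
ENTRY_FORMAT = "<8s3sB10xHHHI"
assert struct.calcsize(ENTRY_FORMAT) == 32


def parse_directory_entry(raw: bytes) -> dict:
    """Decode one 32-byte short-name directory entry into a dictionary."""
    name, ext, attributes, time_word, date_word, start_cluster, size = struct.unpack(
        ENTRY_FORMAT, raw)
    base = name.decode("ascii", errors="replace").rstrip()
    extension = ext.decode("ascii", errors="replace").rstrip()
    return {
        # The dot is implied on disk, so it is put back here for display.
        "name": f"{base}.{extension}" if extension else base,
        "deleted": name[0] == 0xE5,
        "attributes": attributes,
        # (year, month, day, hour, minute, second); seconds are stored in 2-second steps.
        "modified": (1980 + (date_word >> 9), (date_word >> 5) & 0x0F, date_word & 0x1F,
                     time_word >> 11, (time_word >> 5) & 0x3F, (time_word & 0x1F) * 2),
        "start_cluster": start_cluster,
        "size": size,
    }


if __name__ == "__main__":
    date_word = ((1996 - 1980) << 9) | (8 << 5) | 24   # 24 August 1996
    time_word = (11 << 11) | (11 << 5) | (10 // 2)     # 11:11:10
    raw = struct.pack(ENTRY_FORMAT, b"COMMAND ", b"COM", 0x20,
                      time_word, date_word, 12720, 20000)
    print(parse_directory_entry(raw))
```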
Every regular directory on the disk has two special entries, that refer to the directory itself and to the parent directory. These are named "." (single dot) and ".." (double dot) respectively. These entries are used for navigation purposes; if you type "chdir .." then DOS will change your current directory to the parent of the one you were in.
Root Directory and Regular Directories
The directory at the "base" of the directory structure that defines the logical tree that organizes files on a hard disk is the root directory. The root directory is special because it follows special rules that do not apply to the other, "regular" directories on the hard disk.
There can only be one root directory for any disk volume; obviously, having more than one would result in chaos, and there isn't any need to have more than one anyway. In order to "anchor" the directory tree, the root directory is fixed in place at the start of the DOS volume. It is located directly below the two copies of the FAT, which are themselves directly below the other key disk structures. This contrasts with regular (sub) directories, which can be located anywhere on the disk.
In addition to being fixed in location, the root directory is also fixed in size. Regular directories can have an arbitrary size; they use space on the disk much the way files do, and when more space is needed to hold more entries, the directory can be expanded the same way a file can. The root directory is limited to a specific number of entries because of its special status. The number of entries that the root directory can hold depends on the type of volume:
Volume Type                  Maximum Number of Root Directory Entries
360KB 5.25" Floppy Disk      112
720KB 3.5" Floppy Disk       112
1.2MB 5.25" Floppy Disk      224
1.44MB 3.5" Floppy Disk      224
2.88MB 3.5" Floppy Disk      448
Hard Disk                    512
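Since each directory entry is 32 bytes and sectors are 512 bytes, the limits in the table translate directly into a fixed amount of disk space set aside for the root directory. The following sketch works that out; the dictionary and function names are made up for illustration.

```python
DIRECTORY_ENTRY_SIZE = 32   # bytes per directory entry
SECTOR_SIZE = 512           # bytes per sector

MAX_ROOT_ENTRIES = {
    '360KB 5.25" Floppy Disk': 112,
    '720KB 3.5" Floppy Disk': 112,
    '1.2MB 5.25" Floppy Disk': 224,
    '1.44MB 3.5" Floppy Disk': 224,
    '2.88MB 3.5" Floppy Disk': 448,
    "Hard Disk": 512,
}


def root_directory_sectors(max_entries: int) -> int:
    """Sectors reserved for a fixed-size root directory with max_entries slots."""
    return max_entries * DIRECTORY_ENTRY_SIZE // SECTOR_SIZE


if __name__ == "__main__":
    for volume, entries in MAX_ROOT_ENTRIES.items():
        print(f"{volume}: {entries} entries = {root_directory_sectors(entries)} sectors")
```

A hard disk's 512-entry root directory therefore takes up 32 sectors, or 16,384 bytes, no matter how few files are actually in it.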
Note that the newer FAT32 version of the FAT file system does not have the restriction on placement and size of the root directory. In this enhancement the root directory is treated like a regular directory and can be relocated and expanded in size like any other.
There are a couple of other special things about the root directory. One is that it cannot be deleted; the reason for this I would think to be obvious. Also, the root directory has no parent, since it is at the top of the tree structure. Unlike regular directories, the root directory has no "." or ".." entries of its own. In a directory located directly under the root, the ".." entry does not hold the cluster number of a parent directory the way it normally would, but contains a null value (zero) instead.
Long File Names
Until the release of Windows 95, all file names using DOS or Windows 3.x were limited to the standard eight character file name plus three character file extension. This restriction tends to result in users having to create incredibly cryptic names, and having the situation still like this 15 years after the PC was invented seemed laughable, especially with Microsoft wanting to compare its ease of use to that of the Macintosh. Users want to name their files "Mega Corporation - fourth quarter results.DOC", not "MGCQ4RST.DOC", because the second name will mean zippo to the user a few months after they create it.
Microsoft was determined to bring long file names (LFNs) to Windows 95 much as it had for Windows NT. The latter, however, has a new file system designed from the ground up to allow long file names. Microsoft had a big problem on its hands with Windows 95: it wanted to maintain compatibility with existing disk structures, older versions of DOS and Windows, and older applications. It couldn't just "toss out" everything that came before and start fresh, because doing this would have meant no older programs could read any files that used the new long file names. File names were restricted to "8.3" (standard file name sizes) within the directories on the disk.
What Microsoft needed was a way to implement long file names so that the following goals were all met:
Windows 95 and applications written for Windows 95 could use file names much longer than 11 total characters.
The new long file names could be stored on existing DOS volumes using standard directory structures, for compatibility.
Older pre-Windows-95 software would still be able to access the files that use these new file names, somehow.
The VFAT file system accomplishes these goals, for the most part, as follows. Long file names of up to 255 characters per file can be assigned to any file under Windows 95 or by any program written for Windows 95 (although file names under 100 characters are recommended so that they don't get cumbersome to use). Support for these long file names is also provided by the version of DOS (7.x) that comes with Windows 95. File extensions are maintained, to preserve the way that they are used by software. The long file name is limited to the same characters as standard file names are, except that the following additional characters are allowed: + , ; = [ ].
To allow access by older software, each file that uses a long file name also has a standard file name alias that is automatically assigned to it. This is done by truncating and modifying the file name as follows:
The long file name's extension (up to three characters after a ".") is transferred to the extension of the alias file name.
The first six non-space characters of the long file name are analyzed. Any characters that are valid in long file names but not in standard file names (+ , ; = [ and ]) are replaced by underscores. All lower-case letters are converted to upper case. These six characters are stored as the first six characters of the file name.
The last two characters of the file name are assigned as "~1". If that would cause a conflict because there is already a file with this alias in the directory, then it tries "~2", and so on until it finds a unique alias.
So to take our example from before, "Mega Corporation - fourth quarter results.DOC" would be stored as shown, but also under the alias "MEGACO~1.DOC". If you had previously saved a file called "Mega Corporation - third quarter results.DOC" in the same directory, then that file would be "MEGACO~1.DOC" and the new one would be "MEGACO~2.DOC". Any older software can reference the file using this older name. Note that using spaces in long file names really doesn't cause any problems because Windows 95 applications are designed knowing that they will be commonly used, and because the short file name alias has the spaces removed.
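The three aliasing steps above can be sketched roughly as follows. This follows only the simplified description given here, not the full algorithm Windows 95 actually uses: the function name is made up, dropping stray periods from the first part of the name is an added assumption, and shortening the six characters to make room for a two-digit suffix such as "~10" is likewise an assumption that the steps above do not cover.

```python
EXTRA_LFN_CHARS = set("+,;=[]")   # valid in long names but not in 8.3 names


def short_alias(long_name: str, existing: set) -> str:
    """Build an 8.3 alias for long_name that does not clash with existing aliases."""
    base, dot, extension = long_name.rpartition(".")
    if not dot:                       # no extension at all
        base, extension = long_name, ""
    extension = extension.replace(" ", "")[:3].upper()

    # Keep non-space characters, replacing the extra long-name characters by "_".
    cleaned = "".join("_" if ch in EXTRA_LFN_CHARS else ch
                      for ch in base if ch not in " .").upper()

    number = 1
    while True:
        suffix = f"~{number}"
        stem = cleaned[:8 - len(suffix)]   # six characters while the suffix is "~1".."~9"
        alias = f"{stem}{suffix}.{extension}" if extension else f"{stem}{suffix}"
        if alias not in existing:
            return alias
        number += 1


if __name__ == "__main__":
    taken = set()
    for name in ("Mega Corporation - third quarter results.DOC",
                 "Mega Corporation - fourth quarter results.DOC"):
        alias = short_alias(name, taken)
        taken.add(alias)
        print(f"{name} -> {alias}")
```

Run on the two file names from the example, this gives MEGACO~1.DOC for the third-quarter file and MEGACO~2.DOC for the fourth-quarter one, matching the text.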
Long file names are stored in regular directories using the standard directory entries, but using a couple of tricks. The Windows 95 file system creates a standard directory entry for the file, in which it puts the short file name alias. Then, it uses several additional directory entries to hold the rest of the long file name. A single long file name can use many directory entries (since each entry is only 32 bytes in length), and for this reason it is recommended that long file names not be placed in the root directory, where the total number of directory entries is limited.
In order to make sure that older versions of DOS don't get confused by this non-standard usage, each of the extra directory entries used to hold long file name information is tagged with the following odd combination of file attributes: read-only, hidden, system and volume label. The objective here is to make sure that no older versions of DOS try to do anything with these long file name entries, and also to make sure they don't try to overwrite these entries because they think they aren't in use. That combination of file attributes causes older software to basically ignore the extra directory entries being used by VFAT.
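In terms of the attribute byte, that odd combination works out to the four lowest bits all being set. A minimal sketch of how software might recognize and skip these extra entries is shown below; the bit values are the ones listed in the File Attributes section of this document, while the function names and the sample attribute bytes are invented for illustration.

```python
READ_ONLY = 0b00000001
HIDDEN = 0b00000010
SYSTEM = 0b00000100
VOLUME_LABEL = 0b00001000

# The combination used to tag directory entries that hold long file name pieces.
LFN_ATTRIBUTES = READ_ONLY | HIDDEN | SYSTEM | VOLUME_LABEL   # 00001111
USED_ATTRIBUTE_BITS = 0b00111111                               # only six bits are used


def is_long_name_entry(attribute_byte: int) -> bool:
    """True if a directory entry's attribute byte marks it as a long-name piece."""
    return (attribute_byte & USED_ATTRIBUTE_BITS) == LFN_ATTRIBUTES


def ordinary_entries(attribute_bytes: list) -> list:
    """Filter out the long-name pieces, much as long-name-unaware software ends up doing."""
    return [attr for attr in attribute_bytes if not is_long_name_entry(attr)]


if __name__ == "__main__":
    # Three long-name pieces followed by the short-alias entry (archive bit set).
    entries = [0x0F, 0x0F, 0x0F, 0x20]
    print(ordinary_entries(entries))   # [32]
```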
While long file names are a great idea and improve the usability of Windows 95, Microsoft's streeeeeetch to keep them compatible with old software kind of shows. Basically, the implementation is a hack built on top of the standard FAT file system, and there are numerous problems that you should be aware of when using LFNs.
File Attributes
Each file is stored in a directory, and uses a directory entry that describes its characteristics such as its name and size, and also contains a pointer to where the file is stored on disk. One of the characteristics stored for each file is a set of file attributes that give DOS and application software more information about the file and how it is intended to be used.
The use of attributes is "voluntary". What this means is that any software program can look in the directory entry to discern the attributes of a file, and based on them, make intelligent decisions about how to treat the file. For example, a file management program's delete utility, seeing a file marked as a read-only system file, would be well-advised to at least warn the user before deleting it. However, it doesn't have to. Any program that knows what it is doing can override the attributes of a file, and certainly, viruses will do this routinely.
That said, DOS and most other operating systems assign definite meanings to the attributes stored for files, and will alter their behavior according to what they see. If at a DOS prompt you type "DIR" to list the files in the directory, by default you will not see any files that have the "hidden" attribute set. You have to type "DIR /AH" to see the hidden files.
A file can have more than one attribute attached to it, although only certain combinations really make any sense. The attributes are stored in a single byte, with each bit of the byte representing a specific attribute (actually, only six bits are used of the eight in the byte). Each bit that is set to a one means that the file has that attribute turned on. (These are sometimes called attribute bits or attribute flags). This method is a common way that a bunch of "yes/no" parameters are stored in computers to save space. The following are the attributes and the bits they use in the attribute byte:
Attribute       Bit Code
Read-Only       00000001
Hidden          00000010
System          00000100
Volume Label    00001000
Directory       00010000
Archive         00100000
So, the attribute byte for a hidden, read-only directory would be 00010011, which is simply the codes for those three attributes from the table above, added together. Here is a more detailed description of what these attributes mean (or more accurately, how they are normally used). Note that each of the attributes below applies equally to files and directories (except for the directory attribute of course!):
Read-Only: Most software, when seeing a file marked read-only, will refuse to delete or modify it. This is pretty straight-forward. For example, DOS will say "Access denied" if you try to delete a read-only file. On the other hand, Windows Explorer will happily munch it. Some will choose the middle ground: they will let you modify or delete the file, but only after asking for confirmation.
Hidden: This one is pretty self-explanatory as well; if the file is marked hidden then under normal circumstances it is hidden from view. DOS will not display the file when you type "DIR" unless a special flag is used, as shown in the earlier example.
System: This flag is used to tag important files that are used by the system and should not be altered or removed from the disk. In essence, this is like a "more serious" read-only flag and is for the most part treated in this manner.
Volume Label: Every disk volume can be assigned an identifying label, either when it is formatted, or later through various tools such as the DOS command "LABEL". The volume label is stored in the root directory as a file entry with the label attribute set.
Directory: This is the bit that differentiates between entries that describe files and those that describe subdirectories within the current directory. In theory you can convert a file to a directory by changing this bit, but of course in practice trying to do this would result in a mess because the entry for a directory has to be in a specific format.
Archive: This is a special bit that is used as a "communications link" between software applications that modify files, and those that are used for backup. Most backup software allows the user to do an incremental backup, which only selects for backup any files that have changed since the last backup. This bit is used for this purpose. When the backup software backs up ("archives") the file, it clears the archive bit (makes it zero). Any software that modifies the file subsequently, is supposed to set the archive bit. Then, the next time that the backup software is run, it knows by looking at the archive bits which files have been modified, and therefore which need to be backed up. Again, this use of the bit is "voluntary"; the backup software relies on other software to use the archive bit properly; some programs could modify the file without setting the archive attribute, but fortunately most software is "well-behaved" and uses the bit properly.
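The bit-per-attribute scheme and the archive-bit handshake between applications and backup software can be sketched as follows. The bit values are taken from the table above; the class and function names are invented for illustration, and real software would of course read and write the attribute byte in the file's directory entry rather than in a Python variable.

```python
from enum import IntFlag


class Attr(IntFlag):
    READ_ONLY = 0b00000001
    HIDDEN = 0b00000010
    SYSTEM = 0b00000100
    VOLUME_LABEL = 0b00001000
    DIRECTORY = 0b00010000
    ARCHIVE = 0b00100000


def needs_backup(attributes: Attr) -> bool:
    """An incremental backup selects only files whose archive bit is set."""
    return bool(attributes & Attr.ARCHIVE)


def after_backup(attributes: Attr) -> Attr:
    """The backup software clears the archive bit once the file is archived."""
    return Attr(int(attributes) & ~int(Attr.ARCHIVE))


def after_modification(attributes: Attr) -> Attr:
    """Well-behaved software sets the archive bit whenever it changes the file."""
    return attributes | Attr.ARCHIVE


if __name__ == "__main__":
    hidden_read_only_dir = Attr.DIRECTORY | Attr.HIDDEN | Attr.READ_ONLY
    print(format(int(hidden_read_only_dir), "08b"))   # 00010011

    attributes = after_modification(Attr.READ_ONLY)
    print(needs_backup(attributes))                   # True
    attributes = after_backup(attributes)
    print(needs_backup(attributes))                   # False
```

Combining attributes with the bitwise OR operator gives the same result as adding the codes together, as long as each attribute is included only once.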
Most of the attributes for files can be modified using the DOS ATTRIB command, or by looking at the file's properties through the Windows 95 Windows Explorer or other similar file navigation tools.