EXT3 Links
Why Red Hat 7.2 chose EXT3: http://linuxtoday.com/news_story.php3?ltsn=2001-08-22-004-20-NW-RH
Where to find a paper on the EXT3 design: ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
EXT3 FAQ: http://people.spoiled.org/jha/ext3-faq.html
How EXT2 works: http://euclid.nmu.edu/~randy/Research/Papers/EXT2/
Goals of EXT3
- QUICK fscks
- NO/MINIMAL/SOME DATA LOSS on failure
  Predictable data loss
- Totally EXT2 compatible, with in-place upgrades and downgrades
- Handle VERY large filesystems.
- Has a /.autofsck file
Definition -- Metadata vs Data
Problem: Drives sometimes reorder writes
How a Log works
- There is NO data structure except the log
- On write
  - Write to the log what you intend
  - Write the 'commit record', which must fit into a single block
  - Have a 'sweeper process' come along and clean ignorable parts of the log
- On recovery
  - Everything with a commit record is valid
  - Everything without can be cleaned
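The log steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name LogStore, the "INTENT"/"COMMIT" entry kinds, and the crash_before_commit flag are all made up for illustration and do not come from any real filesystem.

```python
COMMIT = "COMMIT"
INTENT = "INTENT"


class LogStore:
    """Toy log-only store: the append-only list is the only data structure."""

    def __init__(self):
        self.log = []  # each entry: (txid, kind, payload)

    def write(self, txid, key, value, crash_before_commit=False):
        # Write to the log what you intend.
        self.log.append((txid, INTENT, (key, value)))
        if crash_before_commit:
            return  # simulated crash: the commit record never reaches the log
        # Write the commit record (one small entry, standing in for one block).
        self.log.append((txid, COMMIT, None))

    def _committed(self):
        return {txid for txid, kind, _ in self.log if kind == COMMIT}

    def recover(self):
        """Rebuild the state: only entries with a commit record are valid."""
        committed = self._committed()
        state = {}
        for txid, kind, payload in self.log:
            if kind == INTENT and txid in committed:
                key, value = payload
                state[key] = value
        return state

    def sweep(self):
        """Sweeper: drop the entries that recovery would ignore anyway."""
        committed = self._committed()
        self.log = [entry for entry in self.log if entry[0] in committed]


store = LogStore()
store.write(1, "a", "old")
store.write(2, "a", "new")
store.write(3, "b", "lost", crash_before_commit=True)
recovered = store.recover()  # transaction 3 has no commit record, so "b" is absent
```

A real sweeper would also reclaim committed entries that a later commit has superseded; this sketch only removes the uncommitted ones.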
How a journal works
- Writing
  - There are normal data items on a normal disk
  - Create new data blocks holding the new data
  - Write all metadata changes to the log
  - Write the commit record
  - Free the old data blocks
  - Then change the normal metadata items
- Recovery
  - Scan the log
  - For every commit record, do what was committed
  - Ignore the other stuff
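A minimal sketch of the journal write and recovery steps above, in Python. Everything here is hypothetical: the JournaledDisk class, the one-block-per-file layout, and the crash_after parameter exist only to make the ordering visible. Dropping a transaction's records from the journal once it has reached its home location is an assumption added so that recovery never replays stale records.

```python
class JournaledDisk:
    """Toy disk: normal data and metadata, plus a journal for metadata changes."""

    def __init__(self):
        self.data = {}       # block number -> contents (normal data blocks)
        self.metadata = {}   # file name -> block number (normal metadata)
        self.journal = []    # ("META", txid, name, block) or ("COMMIT", txid)
        self.next_block = 0
        self.next_txid = 0

    def _apply(self, name, new_block):
        # Free the old data block, then change the normal metadata item.
        old_block = self.metadata.get(name)
        if old_block is not None and old_block != new_block:
            del self.data[old_block]
        self.metadata[name] = new_block

    def write_file(self, name, contents, crash_after=None):
        txid = self.next_txid
        self.next_txid += 1
        # Create a new data block holding the new data.
        new_block = self.next_block
        self.next_block += 1
        self.data[new_block] = contents
        # Write the metadata change to the log.
        self.journal.append(("META", txid, name, new_block))
        if crash_after == "log":
            return  # simulated crash before the commit record
        # Write the commit record.
        self.journal.append(("COMMIT", txid))
        if crash_after == "commit":
            return  # simulated crash after the commit record
        self._apply(name, new_block)
        # Assumption: release this transaction's journal space once applied.
        self.journal = [rec for rec in self.journal if rec[1] != txid]

    def recover(self):
        # Scan the log; for every commit record, do what was committed.
        committed = {rec[1] for rec in self.journal if rec[0] == "COMMIT"}
        for rec in self.journal:
            if rec[0] == "META" and rec[1] in committed:
                self._apply(rec[2], rec[3])
        # Ignore the other stuff.
        self.journal.clear()

    def read(self, name):
        return self.data[self.metadata[name]]


disk = JournaledDisk()
disk.write_file("f", "v1")
disk.write_file("f", "v2", crash_after="commit")
disk.recover()
after_committed_crash = disk.read("f")    # committed, so recovery applies it
disk.write_file("f", "v3", crash_after="log")
disk.recover()
after_uncommitted_crash = disk.read("f")  # never committed, so the old version stays
```

In this toy model the data block written by an uncommitted transaction is simply left allocated; a real filesystem has to reclaim such blocks somehow.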
The Linux Implementation
What's in a journal
- Metadata blocks
  - Entire blocks of metadata that the file system would like to have inserted into the file system's collection of metadata
  - Even if one byte has changed, we write the whole block
- Descriptor blocks
  - An array of tuples (block in the journal, block on the filesystem to update)
- Headers
  - Lots of them at fixed locations
  - Describe the current head and tail
  - Have a sequence number
  - The uncorrupted header with the highest sequence number wins.
- On Write
  - Write only to the log, both data and metadata (depending on options)
  - If the log fills up, stop all operations from continuing
- On commit
  - Flush the journal log to the disk
  - Update the headers (commit!!!)
  - Flush the updates to their home locations
  - Free the journal space
- On recovery
  - Scan the headers for the highest sequence number
  - Find the head and tail
  - For every record, copy those blocks to their home locations
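The recovery steps above can be sketched as follows. This is a toy model, not the real on-disk format: the Header fields, the CRC32 check used to detect a corrupted header, and the in-memory lists standing in for journal blocks and descriptor blocks are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Header:
    sequence: int
    head: int      # index of the first live descriptor record
    tail: int      # index one past the last live descriptor record
    checksum: int


def header_checksum(sequence, head, tail):
    # Assumption: a CRC over the fields is how corruption is detected here.
    return zlib.crc32(f"{sequence}:{head}:{tail}".encode())


def make_header(sequence, head, tail):
    return Header(sequence, head, tail, header_checksum(sequence, head, tail))


def pick_header(headers):
    """The uncorrupted header with the highest sequence number wins."""
    valid = [h for h in headers
             if h.checksum == header_checksum(h.sequence, h.head, h.tail)]
    return max(valid, key=lambda h: h.sequence)  # raises ValueError if none is valid


def replay(headers, journal_blocks, records, disk):
    """Copy every journaled block between head and tail to its home location."""
    header = pick_header(headers)
    for descriptor in records[header.head:header.tail]:
        # A descriptor is an array of (block in the journal, block on the filesystem).
        for journal_index, home_block in descriptor:
            disk[home_block] = journal_blocks[journal_index]
    return header


journal_blocks = ["inode table v2", "bitmap v2", "inode table v3"]
records = [
    [(0, 7), (1, 3)],   # first transaction: two whole metadata blocks
    [(2, 7)],           # second transaction: block 7 again
]
good = make_header(9, 0, 0)
headers = [
    make_header(1, 0, 1),
    make_header(2, 0, 2),
    Header(9, 0, 0, good.checksum ^ 1),  # highest sequence number, but corrupted
]
disk = {3: "bitmap v1", 7: "inode table v1"}
chosen = replay(headers, journal_blocks, records, disk)
```

Replaying in log order matters: block 7 appears in both transactions, and the later copy has to be the one that ends up in the home location.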
Performance boost: group many operations into a single transaction.
Then structures that are updated all the time, like a directory where many files
are created, or block bitmaps, or quota info, can be written fewer times.
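A small sketch of why grouping helps, assuming (as noted earlier) that a whole block is logged even when one byte changes. The BatchingJournal class and the create_files helper are hypothetical names; the model only counts how many metadata blocks reach the journal.

```python
class BatchingJournal:
    """Toy journal that counts the metadata blocks written at commit time."""

    def __init__(self):
        self.pending = {}        # home block -> latest contents in the open transaction
        self.blocks_logged = 0   # total blocks written to the journal so far

    def update(self, block, contents):
        # A later update to the same block replaces the earlier one in memory.
        self.pending[block] = contents

    def commit(self):
        # Each dirty block is logged once per transaction, however often it changed.
        self.blocks_logged += len(self.pending)
        self.pending.clear()


def create_files(journal, count, files_per_transaction):
    """Create `count` files; each one touches the directory and the block bitmap."""
    for i in range(count):
        journal.update("directory", f"{i + 1} entries")
        journal.update("block bitmap", f"{i + 1} blocks used")
        if (i + 1) % files_per_transaction == 0:
            journal.commit()
    journal.commit()  # commit whatever is still pending
    return journal.blocks_logged


one_per_transaction = create_files(BatchingJournal(), 100, 1)
fifty_per_transaction = create_files(BatchingJournal(), 100, 50)
```

With one file per transaction the two hot blocks are logged 200 times in total; with fifty files per transaction they are logged 4 times.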
EXT3 Journalling modes
- data=journal Log all changes to data and metadata.
  - Needs a larger log
  - Is slower
  - Best integrity
- data=writeback Log metadata only; data is updated in place with no ordering relative to the log
  - Can have some inconsistent data
  - Can never have inconsistent metadata
- data=ordered Write all data to its home location, then log the metadata
  - Default
  - All files will be consistent
  - Some files might be old
The orphaned file story
How the code is written: There is a journalling layer, sometimes
called JFS or JBD (journalling block device). EXT3 itself just thinks in transactions of data blocks.