File System Basics
File System Defined
A file is a named collection of data.
A file is the smallest unit of data that users can name and manipulate within a file system.
Attributes include name, type, location (tape, disk), address (block number), size, maximum size, access control, times created/modified/read, owner, and creating application.
Many file systems also have extended attributes, such as a thumbnail, an access-control list, a capability, or a digital signature.
A directory is a named collection of files.
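To make the attribute list concrete: on a POSIX-style system most of these attributes live in the file's inode and can be read with stat. The sketch below assumes Python's os.stat; the function name and dictionary keys are illustrative, and attributes such as maximum size or creating application have no POSIX equivalent, so they are omitted.

```python
import os
import stat
import time

def describe(path):
    """Collect a few of the attributes listed above for one file (POSIX view)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),
        "type": kind,
        "size": st.st_size,                        # in bytes
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
        "inode": st.st_ino,                        # file system's internal number for the file
    }
```

Calling describe on any existing path returns a dictionary such as {"name": ..., "type": "regular file", "size": ...}.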
Goals of a file system
Should safely store data.
Should allow me to share data with others.
Should protect my data from men in raincoats.
Should help me organize my data.
Should let me recall previous versions of my data.
Should be always available.
Should be fast.
Should help me search for data I've misplaced.
Should let me undelete my data and undo changes to my data.
Should never run out of space.
Should keep root out of my diary.
Types of files
How to identify types
By extension (*.bat, *.exe)
Can associate one application with each extension.
Means all files in a common format must open with the same application (like *.gif).
By an attribute associated with the file.
Used by MVS, Macs, etc.
By magic number (sucks!!)
Is a short code (classically two bytes) at the beginning of the file.
Applications must skip over these bytes.
Cannot tell type without opening file.
Not used by all applications.
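A sketch contrasting the two main approaches above. The lookup tables are hypothetical and tiny; the signatures themselves (ELF, GIF, PNG, the DOS "MZ" header, the "#!" script line) are real ones, and they show that magic numbers are not always two bytes. Note that type_by_magic has to open the file, which is exactly the cost mentioned above.

```python
import os

# Hypothetical, deliberately small tables for illustration.
BY_EXTENSION = {".bat": "batch script", ".exe": "DOS/Windows executable", ".gif": "GIF image"}

MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF executable"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"MZ", "DOS/Windows executable"),
    (b"#!", "script with interpreter line"),
]

def type_by_extension(path):
    # Cheap: looks only at the name and never opens the file.
    _, ext = os.path.splitext(path)
    return BY_EXTENSION.get(ext.lower(), "unknown")

def type_by_magic(path):
    # Must open the file and read its first bytes.
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in MAGIC_NUMBERS:
        if head.startswith(magic):
            return kind
    return "unknown"
```

A file named x.gif that actually starts with the bytes \x7fELF is reported as a GIF image by extension and as an ELF executable by magic number.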
Types include
Text
Executable (sometimes more than one executable format, like .com, .exe, and .bat, or ELF and a.out).
ISAM/BTRIEVE
Fixed record
String of bytes
Application-specific (not all systems support this type).
Tradeoff is functionality vs. bloat.
Picking a few important types works.
Having a very general system also works.
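On a system that only knows "string of bytes" files, a fixed-record type can still be layered on top by the application. A minimal sketch, assuming a hypothetical 16-byte record (a 4-byte integer id plus a 12-byte name); the payoff of fixed-length records is that record n starts at offset n * record size, so a single seek reaches it.

```python
import struct

# Hypothetical layout: little-endian 4-byte int id + 12-byte name (NUL-padded).
RECORD = struct.Struct("<i12s")

def write_records(path, rows):
    with open(path, "wb") as f:
        for rec_id, name in rows:
            # "12s" pads short names with NUL bytes and truncates long ones.
            f.write(RECORD.pack(rec_id, name.encode()))

def read_record(path, n):
    # Fixed size means record n starts at n * RECORD.size: one seek, no scanning.
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        data = f.read(RECORD.size)
    rec_id, raw = RECORD.unpack(data)
    return rec_id, raw.rstrip(b"\0").decode()
```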
Where are file systems stored?
In RAM (e.g. tmpfs)
On a hard drive
On a partition of a hard drive (most common by far)
On a set of hard drives that the OS presents as one hard drive (e.g. RAID)
On a set of hard drives managed by the file system (most advanced)
As fake files (e.g. /proc or /dev)
Files
Operations on files
Sequential Access
Open (set pointer at the beginning)
Read or write (moves pointer for each operation)
Close (de-allocates internal data structures)
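The open/read/close cycle above, sketched with Python file objects; tell() reports the pointer so you can watch it advance. The function name and chunk size are arbitrary choices for the illustration.

```python
def read_sequentially(path, chunk_size=4):
    """Open, read in order, close; the file pointer advances after every read."""
    chunks, positions = [], []
    f = open(path, "rb")                 # open: pointer starts at offset 0
    try:
        while True:
            chunk = f.read(chunk_size)   # read: moves the pointer forward
            if not chunk:
                break                    # end of file
            chunks.append(chunk)
            positions.append(f.tell())
    finally:
        f.close()                        # close: release the OS-level descriptor
    return chunks, positions
```

For a 10-byte file containing abcdefghij, the pointer sits at 4, 8, and 10 after the three reads.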
Conventional file access
Open a file
Resolves names, checks permissions
Can be expensive
Returns a file pointer or an FCB (file control block).
Create a file.
Same as open, but also deletes any previous file of that name, makes a directory entry, and allocates space.
Write a file.
Read a file.
Delete a file.
Truncate a file.
Append to a file (like a log file).
Think accounting software, video games
Seek within a file (moves the file pointer).
Zero a file (overwrite its contents with zeros).
For security reasons.
Sync (commit data to disk).
Undo (discard all changes).
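Most of the conventional operations map onto POSIX system calls, which Python exposes through the os module. A sketch assuming a POSIX-like system; the function names are hypothetical. Two caveats: POSIX has no general "undo" call, so that operation is left out, and overwriting with zeros may not erase the old blocks on SSDs or on copy-on-write and log-structured file systems.

```python
import os

def conventional_ops(path):
    # Create: O_CREAT makes the directory entry, O_TRUNC discards previous contents.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"hello world")    # write at the current pointer
        os.lseek(fd, 6, os.SEEK_SET)    # seek: move the pointer to offset 6
        word = os.read(fd, 5)           # read from the pointer: b"world"
        os.ftruncate(fd, 5)             # truncate: keep only b"hello"
        os.fsync(fd)                    # sync: ask the OS to commit to disk
    finally:
        os.close(fd)

    # Append: O_APPEND forces every write to the current end of file (log style).
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, b"!")
    finally:
        os.close(fd)

    with open(path, "rb") as f:
        contents = f.read()
    os.remove(path)                     # delete: remove the directory entry
    return word, contents

def zero_file(path):
    # Zero: overwrite every byte in place, then force it to disk.
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())
```

conventional_ops returns (b"world", b"hello!") and leaves no file behind.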
Memory-Mapped Access
Unix has this (mmap); AS/400 uses it exclusively.
Map -- bring the file into the address space.
Unmap -- finished with it.
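A sketch of map and unmap using Python's mmap module, which wraps mmap/munmap on Unix. Once mapped, the file is changed with ordinary slice assignment instead of read and write calls. The function name is hypothetical, and the file must be non-empty, since an empty file cannot be mapped.

```python
import mmap

def uppercase_in_place(path):
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)   # map: length 0 means the whole file
        try:
            # Plain memory operations now read and modify the file's bytes.
            mm[:] = mm[:].upper()
            mm.flush()                  # push dirty pages back to the file
        finally:
            mm.close()                  # unmap: finished with it
```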
Directories
Organization
One level
DOS 1.0, Commodore, and Apple did this.
Name collisions are a huge problem.
Two level
IBM VM/CMS
Has names that look like "myproject text a", with spaces separating file name, file type, and file mode.
My Amoco project had directories with thousands of files, all with 8-letter names.
Tree
Works well.
Allows for great organization.
Each file gets ONE ABSOLUTE NAME!!
Graph
Is the result of trees with LINKS.
A file can have many ABSOLUTE NAMES, literally an infinite number if the links form a cycle.
Can make searching hard (but this is solvable).
Database-Like
Allows paths like /usr/lib/date>1-1-97/size>1024K/lib*.a
Can be hard to implement in the kernel.
Cool idea for micro-kernel projects.
VMS has this, but only for versions.
Relational Algebra
Allows for union, intersection, etc.
Suppose you write or delete a file in a union directory: what happens?
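One common answer to the union-directory question, used by overlay-style file systems such as Linux overlayfs: only the top layer is writable, writes land there and hide lower copies, and deletes leave a "whiteout" marker instead of touching the lower layer. The sketch below models directories as Python dicts; the class and method names are hypothetical, and it skips the copy-up step real systems perform before modifying an existing lower file.

```python
class UnionDir:
    """Hypothetical union of directory layers; lookups search the top layer first."""

    WHITEOUT = object()   # marker meaning "deleted in an upper layer"

    def __init__(self, *layers):
        # layers[0] is the writable top; the rest are treated as read-only.
        self.layers = list(layers)

    def lookup(self, name):
        for layer in self.layers:
            if name in layer:
                value = layer[name]
                return None if value is UnionDir.WHITEOUT else value
        return None

    def write(self, name, data):
        # Writes always land in the top layer; lower copies are hidden, never changed.
        self.layers[0][name] = data

    def delete(self, name):
        # A whiteout in the top layer keeps any lower copy hidden.
        self.layers[0][name] = UnionDir.WHITEOUT

    def listing(self):
        seen, names = set(), []
        for layer in self.layers:
            for name, value in layer.items():
                if name in seen:
                    continue          # an upper layer already decided this name
                seen.add(name)
                if value is not UnionDir.WHITEOUT:
                    names.append(name)
        return sorted(names)
```

After writing a.txt and deleting b.txt through the union, the lower layer is unchanged, yet the union lists only a.txt.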
Operations
Read -- dir
Create/delete a file
Move a file.
Rename a file.
Change directory
Make/delete directory.
Append a file to the directory
Traverse the whole file system.
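Traversing the whole file system is where the Graph organization bites: with links, a naive walk can visit a file many times or loop forever. One standard fix, sketched here assuming a POSIX-like system, is to remember each object's (device, inode) pair and skip anything already seen. The function name is hypothetical.

```python
import os
import stat

def traverse(root):
    """Yield each object reachable from root once, even if links create cycles."""
    visited = set()                  # (device, inode) identifies an object uniquely
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)       # follows symbolic links
        except OSError:
            continue                 # e.g. a dangling symbolic link
        key = (st.st_dev, st.st_ino)
        if key in visited:
            continue                 # already reached under another name
        visited.add(key)
        yield path
        if stat.S_ISDIR(st.st_mode):
            # Push in reverse so entries come off the stack in sorted order.
            for entry in sorted(os.listdir(path), reverse=True):
                stack.append(os.path.join(path, entry))
```

A symbolic link from a subdirectory back to the root is detected and skipped instead of being followed forever.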
Protection Basics
Can use the UNIX user/group/other scheme.
Can use ACLs (access control lists).
Give default permissions.
List special people and what they can do.
Sometimes done by a .acl file, sometimes not.
Can use passwords
Imagine keeping a separate password for every file you want to protect.
Works O.K. for just a few files or few directories.
The way the web works.
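The user/group/other scheme and the default-plus-special-people ACL above can each be sketched in a few lines. Assumptions: permissions are the usual octal mode bits, the ACL is a plain dict from user name to bits, the function names are hypothetical, and the root override is deliberately ignored. Note that UNIX picks exactly one class, so an owner can be denied access that the group has.

```python
# Permission bits within one class, in the classic rwx order.
READ, WRITE, EXECUTE = 4, 2, 1

def unix_allows(mode, file_uid, file_gid, uid, gids, want):
    """Pick exactly one class (owner, else group, else other) and test its bits."""
    if uid == file_uid:
        bits = (mode >> 6) & 0o7
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & want) == want

def acl_allows(acl, default, user, want):
    """ACL sketch: special users are listed explicitly, everyone else gets the default."""
    return (acl.get(user, default) & want) == want
```

With mode 0o640, the owner may read and write, group members may only read, and everyone else gets nothing.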
Consistency Basics
Single-Image approach
Everyone sees one image of this file.
Hard to implement with distributed caching (as on a network).
Session approach (Andrew)
Changes are committed at close time.
You see the file as it exists at open time.
Done with whole-file caching.
NFS approach
Cache has a timeout value.
Makes no other guarantees.
Lets people do their best.
Immutable -- see the Bullet file system.
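The NFS approach can be sketched as a client cache that trusts an entry only until a timeout expires. This is a simplified model, not the actual NFS client logic; fetch stands in for a hypothetical call to the server, and the clock is injectable so staleness can be demonstrated.

```python
class TimeoutCache:
    """Client-side cache that trusts an entry only for `ttl` seconds (NFS-style)."""

    def __init__(self, fetch, ttl, clock):
        self.fetch = fetch      # hypothetical: name -> current server contents
        self.ttl = ttl
        self.clock = clock      # time source, e.g. time.monotonic
        self.entries = {}       # name -> (data, time fetched)

    def read(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]       # possibly stale: no other guarantee is made
        data = self.fetch(name)
        self.entries[name] = (data, now)
        return data
```

Within the timeout window the client can keep returning old data after the server copy has changed; that window is exactly what the NFS approach does not guarantee against.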
Example -- The Bullet file system.
GOAL IS SPEED, nothing else.
All files are stored contiguously (this can cause external fragmentation).
All files are named after their starting block number (easy to find).
All files are written ONE TIME at create time.
All files are read in one large chunk.
Can NEVER CHANGE A FILE
Yes, this makes log files hard!
Never have to check cache consistency.
No security -- if you know the number, you're in.
To make this livable for humans, a separate directory server was used to
Provide translations (human-readable names to file numbers).
Implement security.
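A toy model of the Bullet design as described above: each file is written once into a contiguous run of blocks, its "name" is the starting block number, reads return the whole file, and there is no method to modify a file. Block size, first-fit allocation, and deletion are assumptions made for the sketch, and, like the original, it has no security.

```python
class BulletStore:
    """Toy immutable store: contiguous files named by their starting block."""

    def __init__(self, n_blocks, block_size=512):
        self.block_size = block_size
        self.disk = bytearray(n_blocks * block_size)
        self.free = [(0, n_blocks)]     # free extents (start, length), searched first fit
        self.files = {}                 # starting block -> file length in bytes

    def _blocks(self, nbytes):
        return max(1, -(-nbytes // self.block_size))   # ceiling division, min 1 block

    def create(self, data):
        """Write the whole file exactly once; return its starting block number."""
        need = self._blocks(len(data))
        for i, (start, length) in enumerate(self.free):
            if length >= need:
                if length == need:
                    del self.free[i]
                else:
                    self.free[i] = (start + need, length - need)
                offset = start * self.block_size
                self.disk[offset:offset + len(data)] = data
                self.files[start] = len(data)
                return start
        raise OSError("no contiguous run of %d blocks" % need)

    def read(self, number):
        """Read the whole file in one chunk; knowing the number is enough."""
        offset = number * self.block_size
        return bytes(self.disk[offset:offset + self.files[number]])

    def delete(self, number):
        length = self.files.pop(number)
        # Freed extents are not merged with neighbours in this sketch.
        self.free.append((number, self._blocks(length)))
        self.free.sort()
```

With four 4-byte blocks holding files at blocks 0, 1, and 2, deleting file 0 leaves blocks 0 and 3 free but not adjacent, so an 8-byte (two-block) file cannot be created: the external fragmentation mentioned above.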