File System Basics

  1. File system defined

    1. A file is a named collection of data.

    2. A file is the smallest unit of data a user can name and manipulate within a file system.

      1. Attributes include name, type, location (tape, disk), address (block number), size, maxsize, access_control, time created/modified/read, owner, creator_application.

      2. Many file systems also have extended attributes such as a thumbnail, an access-control list, capabilities, or a digital signature.

    3. A directory is a named collection of files.
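
    On a POSIX-style system, several of the attributes above can be read with Python's os.stat. This is a minimal sketch: the helper name describe is hypothetical, and attributes such as the block address, maximum size, or creator application are not reported by os.stat at all.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Collect a few of the file attributes listed above (hypothetical helper)."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),
        "type": kind,
        "size": st.st_size,                           # in bytes
        "access_control": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner": st.st_uid,                           # numeric user id
        "modified": time.ctime(st.st_mtime),
        "read": time.ctime(st.st_atime),              # may be coarse (relatime/noatime)
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        with open(path, "w") as f:
            f.write("hello")
        for key, value in describe(path).items():
            print(f"{key:15} {value}")
```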

  2. Goals of a file system

    1. Should safely store data.

    2. Should allow me to share data with others.

    3. Should protect my data from men in raincoats.

    4. Should help me organize my data.

    5. Should let me recall previous versions of my data.

    6. Should always be available.

    7. Should be fast.

    8. Should help me search for data I've misplaced.

    9. Should let me undelete my data and undo changes to my data.

    10. Should never run out of space.

    11. Should keep root out of my diary.

  3. Types of files

    1. How to identify types

      1. By extension (*.bat, *.exe)

        1. Can associate one application with each extension.

        2. Means all files of a common format must share one application (e.g. every *.gif opens in the same viewer).

      2. By attribute associated with file.

        1. Used by MVS, Macs (type and creator codes), etc.

      3. By magic number (sucks!!)

        1. Is a short code at the beginning of the file, often two bytes (e.g. "#!" for scripts).

        2. Applications must skip over these bytes.

        3. Cannot tell the type without opening the file.

        4. Not used by all applications.

    2. Types include

      1. Text

      2. Executable (sometimes more than one executable format, like .com, .exe, and .bat, or ELF and a.out).

      3. ISAM/Btrieve (indexed record files)

      4. Fixed record

      5. String of bytes

      6. Application specific (not all systems support this type).

      7. The tradeoff is functionality vs. bloat.

        1. Picking a few important types works.

        2. Having a very general system also works.
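
    The contrast between identification by extension and by magic number can be sketched as follows. The extension table and function names are hypothetical; the magic numbers shown (ELF, GIF87a/GIF89a, MZ, "#!") are real leading byte sequences, though real tools such as file(1) check many more.

```python
import os
import tempfile

# Identification by extension: one application per extension (hypothetical table).
EXTENSION_APPS = {
    ".bat": "command interpreter",
    ".exe": "program loader",
    ".gif": "image viewer",
}

# Identification by magic number: a few well-known leading byte sequences.
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF executable"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"MZ", "DOS/Windows executable"),
    (b"#!", "script"),
]


def type_by_extension(filename):
    """Look only at the name; no need to open the file."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_APPS.get(extension, "unknown")


def type_by_magic(path):
    """The file must be opened and its first bytes read to learn its type."""
    with open(path, "rb") as f:
        header = f.read(8)
    for magic, kind in MAGIC_NUMBERS:
        if header.startswith(magic):
            return kind
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "picture.dat")  # misleading extension
        with open(path, "wb") as f:
            f.write(b"GIF89a" + bytes(10))
        print(type_by_extension(path), "/", type_by_magic(path))
```

    The extension lookup is free but trusts the name; the magic-number lookup costs an open and a read but looks at the actual contents.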

  4. Where are file systems stored?

    1. In RAM (e.g. tmpfs)

    2. On a hard drive

    3. On a partition of a hard drive (most common by far)

    4. On a set of hard drives that look like one hard drive to the file system (e.g. RAID)

    5. On a set of hard drives managed by the file system (most advanced)

    6. As fake files, e.g. /proc or /dev, whose contents come from the kernel rather than from stored data

  5. Files

    1. Operations on files

      1. Sequential Access

        1. Open (sets the pointer at the beginning)

        2. Read or write (each operation advances the pointer)

        3. Close (deallocates internal data structures)

      2. Conventional file access

        1. Open a file

          1. Resolves names, checks permissions

          2. Can be expensive (resolving a path may touch many directories)

          3. Returns a file pointer, file descriptor, or FCB (file control block).

        2. Create a file.

          1. Same as open, but also deletes any previous file of that name, makes a directory entry, and allocates space.

        3. Write a file.

        4. Read a file.

        5. Delete a file.

        6. Truncate a file.

        7. Append to a file (like a log file).

          1. Think accounting software, video games

        8. Seek within a file (moves the file pointer).

        9. Zero (overwrite the contents)

          1. For security reasons, so old data cannot be read back.

        10. Sync (commit data to disk).

        11. Undo (kill all changes).

      3. Memory Mapped Access

        1. Unix has this (mmap); AS/400 uses this exclusively (single-level store).

        2. Map -- bring into address space.

        3. Unmap -- finished with it.
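
    The operations above can be walked through on one file with Python's os and mmap modules, which wrap the underlying POSIX calls. A sketch only: the function name file_operations and the file name log.txt are hypothetical, and Undo is omitted because ordinary POSIX file systems offer no such call.

```python
import mmap
import os
import tempfile


def file_operations(directory):
    """Walk through the conventional operations on one file (illustrative only)."""
    path = os.path.join(directory, "log.txt")
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)

    # Create: O_TRUNC discards any previous contents of a file with this name.
    fd = os.open(path, flags, 0o600)
    try:
        # Sequential writes: each write advances the file pointer.
        os.write(fd, b"hello ")
        os.write(fd, b"world\n")
        position = os.lseek(fd, 0, os.SEEK_CUR)  # bytes written so far

        # Seek back to the beginning, then read sequentially.
        os.lseek(fd, 0, os.SEEK_SET)
        first = os.read(fd, 5)

        # Truncate the file to its first five bytes.
        os.ftruncate(fd, 5)

        # Sync: ask the OS to commit the data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)  # Close releases the per-open state held by the OS.

    # Append, as for a log file: writes always go to the current end.
    with open(path, "ab") as f:
        f.write(b"!")

    # Memory-mapped access: map the file into the address space,
    # modify it like a byte array, then unmap it.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            m[0:1] = b"H"
            mapped = m[:]

    with open(path, "rb") as f:
        final = f.read()

    # Zero, then delete. Overwriting in place may not erase old copies on
    # copy-on-write or journaling file systems, or on SSDs.
    with open(path, "r+b") as f:
        f.write(bytes(len(final)))
        f.flush()
        os.fsync(f.fileno())
    os.remove(path)

    return position, first, mapped, final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(file_operations(d))
```

    Run against an empty temporary directory, this returns (12, b'hello', b'Hello!', b'Hello!'): the mmap write lands in the file itself, not in a private copy.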

  6. Directories

    1. Organization

      1. One level

        1. DOS 1.0 did this, as did Commodore and early Apple systems.

        2. Name collisions are a huge problem.

      2. Two level

        1. IBM VM/CMS

        2. Has names like "myproject text a" (file name, file type, file mode), separated by spaces.

        3. My Amoco project had directories with thousands of files, all with 8-letter names.

      3. Tree

        1. Works well.

        2. Allows for great organization.

        3. Each file gets ONE ABSOLUTE NAME!!

      4. Graph

        1. Is the result of trees with LINKS.

        2. A file can have many ABSOLUTE NAMES, literally infinitely many if the links form a cycle.

        3. Can make searching hard (cycles must be detected, but this is solvable).

      5. Database-Like

        1. Allows paths like /usr/lib/date>1-1-97/size>1024K/lib*.a

        2. Can be hard to implement in the kernel.

        3. Cool idea for micro-kernel projects.

        4. VMS has this, but only for file versions (e.g. FOO.TXT;3).

      6. Relational Algebra

        1. Allows for union, intersection, etc. of directories.

        2. Suppose you write or delete a file in a union directory, what happens?

    2. Operations

      1. Read (list) a directory -- e.g. dir

      2. Create/delete a file

      3. Move a file.

      4. Rename a file.

      5. Change directory

      6. Make/delete directory.

      7. Append a file to the directory

      8. Traverse the whole file system.
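
    Two of the ideas above, a graph created by (hard) links and a database-like lookup, can be sketched with Python's os module. The function names all_names and query, and the directory layout in the demo, are hypothetical; query only mimics a path like /usr/lib/date>1-1-97/size>1024K/lib*.a by filtering one directory's entries.

```python
import fnmatch
import os
import tempfile


def all_names(root, target):
    """Return every path under `root` that names the same file as `target`.

    With hard links the tree becomes a graph, so a file is identified by its
    (device, inode) pair rather than by any one path.
    """
    want = os.stat(target)
    names = []
    # os.walk does not follow symbolic links to directories by default,
    # which keeps the traversal from looping on a cyclic graph.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                names.append(path)
    return sorted(names)


def query(directory, pattern, modified_after, min_size):
    """Database-like lookup: select entries by attributes, not by exact name."""
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file() or not fnmatch.fnmatchcase(entry.name, pattern):
                continue
            st = entry.stat()
            if st.st_mtime > modified_after and st.st_size > min_size:
                hits.append(entry.name)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        lib = os.path.join(base, "lib")
        os.mkdir(lib)                                   # make directory
        big = os.path.join(lib, "libbig.a")
        with open(big, "wb") as f:                      # create a file
            f.write(bytes(2048))
        with open(os.path.join(lib, "tmp.a"), "wb") as f:
            f.write(bytes(10))
        os.rename(os.path.join(lib, "tmp.a"),           # rename a file
                  os.path.join(lib, "libsmall.a"))
        os.link(big, os.path.join(base, "alias.a"))     # second absolute name
        print(sorted(os.listdir(lib)))                  # read the directory
        print(all_names(base, big))
        print(query(lib, "lib*.a", 0, 1024))
```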

  7. Protection Basics

    1. Can use the UNIX user/group/other scheme.

    2. Can use ACLs (access control lists).

      1. Give default permissions.

      2. List special people and what they can do.

      3. Sometimes done by a .acl file, sometimes not.

    3. Can use passwords

      1. Imagine keeping one password for every file you want to protect.

      2. Works O.K. for just a few files or few directories.

      3. The way much of the web works (password-protected pages and directories).
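
    The user/group/other check and a simple ACL can be sketched as small predicates. Both function names and the dictionary-based ACL format are hypothetical; the Unix rule shown (pick exactly one class, owner first, then group, then other, and test only its bits) follows the classic scheme, with the superuser deliberately not special-cased.

```python
def unix_allows(mode, file_uid, file_gid, uid, gids, want):
    """Classic Unix permission check (sketch; root is not special-cased).

    `want` is 'r', 'w', or 'x'. Only ONE class of bits is consulted: the
    owner's if the user owns the file, else the group's, else other's.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if uid == file_uid:
        bits = (mode >> 6) & 0o7
    elif file_gid in gids:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return bool(bits & bit)


def acl_allows(acl, default, user, want):
    """Hypothetical ACL: special people are listed, everyone else gets `default`."""
    return want in acl.get(user, default)


if __name__ == "__main__":
    # Owner uid 100, group gid 50, mode rw-r----- (0o640).
    print(unix_allows(0o640, 100, 50, 100, {50}, "w"))  # owner may write
    print(unix_allows(0o640, 100, 50, 200, {50}, "w"))  # group may only read
    acl = {"alice": "rw", "bob": ""}                    # bob is explicitly denied
    print(acl_allows(acl, "r", "carol", "r"))           # default applies to carol
```

    Because the owner class is chosen first, an owner can be denied access that the group has; an ACL avoids that by naming people directly.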

  8. Consistency Basics

    1. Single-Image approach

      1. Everyone sees one image of this file.

      2. Hard to implement with distributed caching (as on a network).

    2. Session approach (Andrew File System)

      1. Changes are committed at close time.

      2. You see the file as it exists at open time.

      3. Done with whole-file caching.

    3. NFS approach

      1. Cache has a timeout value.

      2. Makes no other guarantees.

      3. Lets people do their best.

    4. Immutable approach -- see the Bullet file system.
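
    The session and NFS approaches differ mainly in when a client talks to the server. The toy model below shows that difference; every class is hypothetical and in memory. It is not how AFS or NFS are implemented: real AFS uses server callbacks to invalidate cached copies, and real NFS clients use adaptive attribute-cache timeouts.

```python
import time


class Server:
    """Holds the authoritative copy of every file (hypothetical, in memory)."""

    def __init__(self):
        self.files = {}


class SessionFile:
    """Andrew-style session semantics with whole-file caching."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.data = server.files.get(name, "")  # copy the whole file at open

    def read(self):
        return self.data  # the open-time copy plus our own writes

    def write(self, text):
        self.data = text  # changes stay local until close

    def close(self):
        self.server.files[self.name] = self.data  # commit at close


class TimeoutCache:
    """NFS-style client cache: trust a cached copy for `ttl` seconds, nothing more."""

    def __init__(self, server, ttl, clock=time.monotonic):
        self.server = server
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # name -> (data, time fetched)

    def read(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is None or now - entry[1] >= self.ttl:
            entry = (self.server.files.get(name, ""), now)  # refetch from server
            self.cache[name] = entry
        return entry[0]  # may be stale by up to ttl seconds
```

    With SessionFile, a reader that opened the file before another client's close keeps seeing the old version; with TimeoutCache, a reader sees a change only after its cached copy expires.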

  9. Example -- The Bullet file system.

    1. GOAL IS SPEED, nothing else.

    2. All files are stored contiguously. (This can cause external fragmentation.)

    3. All files are named after their starting block number. (easy to find)

    4. All files are written ONE TIME at create time.

    5. All files are read in one large chunk.

    6. Can NEVER CHANGE A FILE

      1. Yes, this makes log files hard!

      2. Never have to check cache consistency.

    7. No security -- if you know the number, you're in.

    8. To make this livable for humans, it had a directory server to

      1. Provide translations from names to file numbers.

      2. Implement security.
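
    The Bullet ideas (contiguous storage, a file named by its starting block, one write at create time, one-chunk reads) fit in a toy block store. Everything here is hypothetical: the class name, the first-fit allocator, and the tiny disk. A delete operation is included only to show how external fragmentation arises.

```python
class BulletStore:
    """Toy Bullet-style file store (hypothetical): contiguous, immutable files."""

    def __init__(self, nblocks, block_size=512):
        self.block_size = block_size
        self.used = [False] * nblocks
        self.disk = bytearray(nblocks * block_size)
        self.files = {}  # name (starting block) -> length in bytes

    def _blocks_for(self, length):
        return max(1, -(-length // self.block_size))  # ceiling division

    def _find_run(self, need):
        """First-fit search for `need` contiguous free blocks."""
        run = 0
        for i, in_use in enumerate(self.used):
            run = 0 if in_use else run + 1
            if run == need:
                return i - need + 1
        return None

    def create(self, data):
        """Write the whole file once; its name is its starting block number."""
        need = self._blocks_for(len(data))
        start = self._find_run(need)
        if start is None:
            raise OSError("no contiguous run of free blocks (external fragmentation)")
        for b in range(start, start + need):
            self.used[b] = True
        offset = start * self.block_size
        self.disk[offset:offset + len(data)] = data
        self.files[start] = len(data)
        return start

    def read(self, name):
        """Read the file in one chunk. No security: knowing the number is enough."""
        offset = name * self.block_size
        return bytes(self.disk[offset:offset + self.files[name]])

    def delete(self, name):
        """Free the file's blocks (included only to show fragmentation)."""
        length = self.files.pop(name)
        for b in range(name, name + self._blocks_for(length)):
            self.used[b] = False

    # Deliberately no write/modify method: files never change after create,
    # so cached copies never need a consistency check.
```

    For example, on a four-block disk holding files at blocks 0, 1-2, and 3, deleting the files at 0 and 3 leaves two free blocks that are not adjacent, so a new two-block file cannot be created even though there is enough total space.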