Memory management background.
What you can do depends on hardware support
What you can do depends on the software you want to run.
Goals
Processes can access RAM
RAM use minimized
Processes protected
Everything is fast
Ease of programming for applications
Address binding.
Assuming the file to be run resides on a disk as a binary image ...
Addresses of variables/functions could be fixed at compile time
Generates absolute code (the fastest kind).
Code must be loaded into a fixed location and kept there.
Addresses could be fixed at load time
Compiler generates relocatable code; the loader turns it into absolute code for the chosen load address.
Load step can be long.
Code cannot be moved once loaded.
Addresses could be variable during run time
Code can still be absolute, but only with respect to its own logical addresses.
Can be moved during execution.
Requires hardware support
Most CPUs have this support. Some (like car computers or Nintendo) do not.
Hardware provides the illusion that the process has not moved, even though it has.
Distinguishes between logical and physical addresses.
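Load-time binding can be pictured with a toy loader. This is only a sketch: the image format (a list of words plus a list of indexes that hold addresses) and the name load are made up for illustration, not taken from any real loader.

```python
def load(image, relocations, load_base):
    """Return a copy of image with every address word bound to load_base.

    image       -- list of words; address words are relative to the image start
    relocations -- indexes of the words in image that hold addresses
    load_base   -- where the loader decided to place the image in memory
    """
    bound = list(image)
    for index in relocations:
        bound[index] += load_base   # patch address words only, leave data alone
    return bound


# Hypothetical image: words 1 and 3 are addresses, words 0 and 2 are plain data.
image = [10, 4, 20, 0]
relocations = [1, 3]

print(load(image, relocations, 1000))   # [10, 1004, 20, 1000]
```

Every address word has to be patched before the program can start, which is why the load step can be long, and the patched copy is only valid at that one load address.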
Swapping
Bring whole process in
Run process
Swap whole process out
But what about speed?
Dynamic loading
Idea: Don't load code until it's actually called.
Linux Implementation: Load first small chunk, then jump into it. Let rest come in on page faults.
Advantage: Program starts running quickly. Code that is never called is never loaded, which saves memory.
Disadvantage: Total loading time can be increased.
Dynamic Linking
Idea: share libraries between processes.
Implementation:
Process includes a stub for each dynamically linked subroutine.
When the stub is called, it loads the library function (hopefully already in shared memory) and gets the address of the routine.
Can call routine ...
Can change caller to call routine directly, then call routine.
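The stub idea can be sketched as follows. This is an illustration under assumptions, not how any particular linker does it: SHARED_LIBRARY, Process, link and call are hypothetical names, and a Python dictionary stands in for the caller's call slots.

```python
# Hypothetical shared library: routine name -> routine (stands in for shared memory).
SHARED_LIBRARY = {"square": lambda x: x * x}


class Process:
    def __init__(self):
        self.call_table = {}   # slots the process calls through
        self.lookups = 0       # how many times a stub had to resolve a routine

    def link(self, name):
        """Install a stub for the dynamically linked routine called name."""
        def stub(*args):
            self.lookups += 1
            routine = SHARED_LIBRARY[name]    # get the address of the routine
            self.call_table[name] = routine   # change caller to call it directly
            return routine(*args)             # then call the routine
        self.call_table[name] = stub

    def call(self, name, *args):
        return self.call_table[name](*args)


p = Process()
p.link("square")
print(p.call("square", 3))   # 9, goes through the stub
print(p.call("square", 4))   # 16, calls the routine directly
print(p.lookups)             # 1
```

Only the first call pays for the lookup; later calls skip the stub entirely.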
Overlays
Idea: Have a fixed portion of memory to bring in routines.
Implementation:
Partition program into phases
Load and run phase one
Load and run phase two ...
Advantages
Can work in a very small amount of memory
Don't need any O.S. support
Disadvantages
Need to partition application
Need to program phases explicitly.
Used especially when DOS had a 640K limit.
Hardware support
Logical vs. Physical addressing
Logical is what the program generates and sees. Process only cares about logical addresses.
Physical is what the memory (RAM) sees. RAM only cares about physical.
The translation from logical to physical is done by the MMU (memory management unit).
Segmentation, Partitioning and Swapping
Swapping
Idea: Free up memory by storing other partitions on disk.
Advantage:
Can run more processes than have memory for.
Can use swapping to relocate partitions and reduce external fragmentation.
Disadvantage:
Must swap WHOLE PARTITIONS. That's expensive.
A 100K partition on a 1M/sec disk requires 2 * 100 / 1000 = 0.2 seconds to swap (once out, once back in).
Used a lot as a medium term scheduler. Toss processes out when you get overloaded.
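The swap-cost arithmetic above can be written as a small function. It is a rough sketch that assumes, as the example does, that 1M is 1000K and that only transfer time matters (seek and rotational delay are ignored). The name swap_time is made up here.

```python
def swap_time(partition_kb, disk_kb_per_sec):
    """Seconds to write a whole partition out and later read it back in."""
    one_way = partition_kb / disk_kb_per_sec   # transfer time in one direction
    return 2 * one_way                         # out, then back in


print(swap_time(100, 1000))   # 0.2
```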
Segmentation
Idea: Use simple hardware (a base register and a limit register) to give multiple processes memory at the same time.
Scheme:
Give each process a place in RAM. Set the base register to the start of this segment, and the limit register to its length.
When process generates an address A,
MMU generates A + base
MMU checks that A is within the limit (less than the segment length). If it is not, the MMU traps to the OS.
Can store a process anywhere in memory, and have it think it's at a different place.
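The base/limit scheme above can be sketched in a few lines. This is a software model of what the MMU does in hardware; MemoryFault and translate are illustrative names, and the addresses are made up.

```python
class MemoryFault(Exception):
    """Raised where real hardware would trap to the operating system."""


def translate(logical, base, limit):
    """Map a logical address to a physical one using base and limit registers."""
    if logical < 0 or logical >= limit:
        raise MemoryFault(f"address {logical} is outside a segment of length {limit}")
    return base + logical


print(translate(100, base=4000, limit=600))   # 4100
print(translate(100, base=9000, limit=600))   # 9100, same logical address after a move
```

The process keeps using logical address 100 in both cases; only the base register changed.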
Advantages: Simple, can relocate things at will.
Disadvantage:
Each segment must be contiguous in physical memory.
Problems
External fragmentation: Free space exists but not in a big enough single piece to be useful.
Internal fragmentation: Applications have free space in each segment, but the operating system cannot use it.
Allocation schemes can help
First fit. Put new segment in the first free hole that is big enough. (Cheap, easy.)
Best fit. Put new segment in the smallest free hole that is big enough. (Saves big free holes, produces many small free holes.)
Worst fit. Put new segment in the biggest free hole. (Produces no small free holes, but uses up big free holes.)
In a personal experiment of 10 tries, first fit won 1.5, worst fit won 1, best fit won 3.5, and 4 were ties.
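The three placement rules differ only in which fitting hole they pick. A minimal sketch, assuming the free list is just a list of hole sizes in address order (choose_hole and the hole sizes are made up for illustration):

```python
def choose_hole(holes, size, strategy):
    """Return the index of the hole to use, or None if no hole is big enough.

    holes    -- list of free-hole sizes, in address order
    size     -- size of the new segment
    strategy -- "first", "best" or "worst"
    """
    fits = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    if not fits:
        return None
    if strategy == "first":
        return fits[0][1]     # first hole in address order that fits
    if strategy == "best":
        return min(fits)[1]   # smallest hole that fits
    if strategy == "worst":
        return max(fits)[1]   # biggest hole
    raise ValueError(f"unknown strategy: {strategy}")


holes = [500, 200, 900, 300]   # made-up free holes, in K
for strategy in ("first", "best", "worst"):
    print(strategy, choose_hole(holes, 250, strategy))
# first 0
# best 3
# worst 2
```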
Example Problem
Process | Memory | Start Time | Required Time
1       | 600K   | 1          | 10
2       | 1000K  | 5          | 5
3       | 300K   | 7          | 20
4       | 700K   | 10         | 8
5       | 500K   | 15         | 15
6       | 600K   | 16         | 10
7       | 1000K  | 20         | 5
8       | 300K   | 21         | 20
9       | 700K   | 24         | 8
10      | 500K   | 27         | 15
Compaction can help.
Simple to do (just move memory and change relocation registers)
Can be expensive (lots of memory copying).
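Compaction can be modelled as sliding every segment toward address 0 and recording its new base. This sketch assumes segments are (name, base, length) tuples and counts how much memory would have to be copied; compact and the segment layout are illustrative only.

```python
def compact(segments):
    """Slide segments toward address 0.

    segments -- list of (name, base, length) tuples
    Returns (new_segments, amount_copied).
    """
    new_segments = []
    next_free = 0
    copied = 0
    for name, base, length in sorted(segments, key=lambda s: s[1]):
        if base != next_free:
            copied += length                            # this segment must be moved
        new_segments.append((name, next_free, length))  # new relocation register value
        next_free += length
    return new_segments, copied


segments = [("A", 0, 600), ("B", 1000, 300), ("C", 1700, 500)]
print(compact(segments))
# ([('A', 0, 600), ('B', 600, 300), ('C', 900, 500)], 800)
```

Updating the bases is trivial; the 800 units of copying are where the cost goes.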
PAGING ...
Basic Idea: Permit noncontiguous memory allocation.
Break memory up into pages. Each page is contiguous and normally a power of two in size, typically 512 bytes to 8K bytes.
Break logical addresses into page-num and offset on a bit boundary.
Translate logical into physical via PAGETABLE[page-num]+offset.
Associate with each page a set of permission bits, and a valid-invalid bit.
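The translation step can be sketched like this. It assumes 2K pages and a page table whose entries are (frame base address, valid bit) pairs; permission bits are left out to keep it short. PageFault, translate and the table contents are made up for illustration.

```python
PAGE_BITS = 11                  # assumed 2K pages: 2**11 bytes
PAGE_SIZE = 1 << PAGE_BITS
OFFSET_MASK = PAGE_SIZE - 1


class PageFault(Exception):
    """Raised where hardware would trap on an invalid page."""


def translate(logical, page_table):
    """Translate via PAGETABLE[page-num] + offset.

    page_table[page_num] is (frame_base, valid); frame_base is a physical address.
    """
    page_num = logical >> PAGE_BITS    # high bits select the page
    offset = logical & OFFSET_MASK     # low bits are the offset within the page
    if page_num >= len(page_table):
        raise PageFault(f"page {page_num} is beyond the page table")
    frame_base, valid = page_table[page_num]
    if not valid:
        raise PageFault(f"page {page_num} is not valid")
    return frame_base + offset


# Hypothetical table: pages 0 and 2 are mapped, page 1 is invalid.
page_table = [(0x6000, True), (0x0000, False), (0x1800, True)]
print(hex(translate(0x1005, page_table)))   # 0x1805
```

Because the split falls on a bit boundary, finding the page number and offset is a shift and a mask, with no division needed.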
Problems
Can slow a CPU down. Normally the page table is cached in a Translation Lookaside Buffer, which is associative memory.
The TLB must now be cleared on every page table change.
Normally, can get 98% hit rates.
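To see what a 98% hit rate buys, here is a rough effective-access-time estimate. The timings (20 ns TLB lookup, 100 ns memory access) are made-up numbers, and the model assumes a single-level page table, so a TLB miss costs exactly one extra memory access. The name effective_access_time is illustrative.

```python
def effective_access_time(hit_rate, tlb_ns, memory_ns):
    """Average time per memory reference with a TLB in front of the page table."""
    hit = tlb_ns + memory_ns        # TLB lookup, then the access itself
    miss = tlb_ns + 2 * memory_ns   # TLB lookup, page table read, then the access
    return hit_rate * hit + (1 - hit_rate) * miss


# Assumed timings: 20 ns TLB lookup, 100 ns memory access.
print(round(effective_access_time(0.98, tlb_ns=20, memory_ns=100), 1))   # 122.0
```

Under these assumptions paging costs about 22% over a bare 100 ns access, compared with 100% if every reference had to read the page table first.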
Can slow process switch time down, since the page table must be switched. Luckily, most CPUs offer a "page table base register", and only that need be changed. Alternatively, the CPU can hold a PID field and each page a PID field. On the IBM/360, access is allowed only if the fields are equal or one is zero.
Allocating pages can be slow, since the OS might need to keep track of hundreds of pages. (linked list of free pages can help).
Internal fragmentation. Average waste is 1/2 page per segment.
For large address space machines, the page table can become ridiculously large.
A 32-bit CPU with 2K pages has 2^32 / 2^11 = 2^21 pages!
This is a problem for large address-space machines, not large memory machines!
A two-level page table can be used, with unneeded sections simply not allocated. (Used on the VAX.)
For 64-bit machines, this is not enough. Instead, 3-level (SPARC) and 4-level (Motorola) paging schemes can be used.
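A two-level lookup with unallocated sections can be sketched as below. The bit split (11 offset bits for 2K pages, then 10 inner and 11 outer bits of a 32-bit address) is an assumption chosen for illustration, as are the names split and translate. Dictionaries stand in for tables so that missing sections take no space.

```python
OFFSET_BITS = 11   # assumed 2K pages
INNER_BITS = 10    # assumed split of the remaining page-number bits
OUTER_BITS = 11


def split(logical):
    """Break a 32-bit logical address into (outer index, inner index, offset)."""
    offset = logical & ((1 << OFFSET_BITS) - 1)
    inner = (logical >> OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    outer = logical >> (OFFSET_BITS + INNER_BITS)
    return outer, inner, offset


def translate(logical, outer_table):
    """outer_table maps outer index -> inner table (inner index -> frame base)."""
    outer, inner, offset = split(logical)
    inner_table = outer_table.get(outer)   # None means this section was never allocated
    if inner_table is None or inner not in inner_table:
        raise LookupError(f"no mapping for address {logical:#x}")
    return inner_table[inner] + offset


# Only one inner table exists; every other section is simply not allocated.
outer_table = {0: {0: 0x4000, 1: 0x8800}}
print(hex(translate(0x0804, outer_table)))   # 0x8804
```

A process that touches only a few regions of its address space pays for a few inner tables rather than one entry per possible page.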
Can also use an INVERTED PAGE TABLE. (IBM RT, HP)
The table contains one entry for every physical page, not every virtual one. Entries are tuples of the form <PID, PAGE-NUM>.
When a virtual address is to be translated, look up <PID, PAGE-NUM> in the table. Say it is found at location 'i'.
The physical address is then <i, offset>.
The inverted table no longer holds all associations, since some virtual pages will be swapped out. Each process must still keep a whole page table of its own.
Not sure how to share memory without redoing the inverted page table on each switch.
Not sure how to map in the O.S
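The inverted page table lookup described above can be sketched as follows. This assumes 2K pages and uses a plain linear scan for clarity; hashing on the <PID, PAGE-NUM> pair would be the usual way to speed it up. The name translate and the table contents are made up for illustration.

```python
PAGE_SIZE = 2048   # assumed 2K pages


def translate(pid, logical, inverted_table):
    """inverted_table[i] is the (pid, page_num) held in physical frame i, or None."""
    page_num, offset = divmod(logical, PAGE_SIZE)
    for frame, entry in enumerate(inverted_table):   # linear scan, one entry per frame
        if entry == (pid, page_num):
            return frame * PAGE_SIZE + offset        # physical address is <i, offset>
    # Not resident: the process's own page table would be needed to find it on disk.
    raise LookupError(f"page {page_num} of process {pid} is not in memory")


# Hypothetical machine with four physical frames; frame 1 is free.
inverted_table = [(7, 0), None, (3, 5), (7, 2)]
print(translate(7, 2 * PAGE_SIZE + 10, inverted_table))   # 6154
```

The table has four entries because there are four frames, no matter how large each process's virtual address space is.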