3rd Normal Form and Database Design
Requirements of third normal form:
-
All items are atomic
-
All tables have a primary key
-
Every row is determined by it's primary key
-
No duplicate rows
-
Every column is dependent on ONLY the primary key.
Why atomic items?
Consider a database like the one below. How would you determine
the total number of hammers ordered in a year?
This often happens for addresses, where ZIP code and CITY and
STATE should all be seperate. However, often streat number and
street name are in the same field. That's generally OK but can be
viewed as a technical violation.
CusID |
Item |
Date |
Amount |
0001 |
3 screwdrivers, 2 hammers |
01/03/97 |
$12.72 |
Why a Primary Key?
Every column should be dependent on the primary key, which can be either
a single column or a group of columns. Consider a database like the
one below. Note how there is no primary key, and as a result you
end up repeatidly typing in the same customer address. BAD!
The second set of tables below every table has a primary key, and there
is not repeated data entry. Breaking a table into two tables like
this is called decomposition.However, to get the address of those who ordered
a fridge, you must do a join.
CustID |
Address |
Item |
Quant |
0001 |
1212 West Fair |
TV |
2 |
0002 |
918 Cleveland |
Fridge |
1 |
0001 |
1212 West Fair |
Fridge |
1 |
CustID |
Address |
0001 |
1212 West Fair |
0002 |
918 Cleveland |
|
CustID |
Item |
Quant |
0001 |
TV |
2 |
0001 |
Fridge |
1 |
0002 |
Fridge |
1 |
|
Why no dependency except on primary key?
Every column should be dependent on only the primary key. Otherwise,
inconsistency can arrise. Consider the table below. The primary
key is the Car#, but because the address is not dependent soley on the
car#, it ends up being repeated data.
Car# |
Owner |
Color |
Address |
19765 |
Vance Armstrong |
Red |
1298 Fair Ave |
19767 |
Vance Armstrong |
White |
1298 Fair Ave |
90876 |
Juli Baracz |
Green |
7 Third St. |
When to denormalize?
Genearally, never. But if denormalizing can avoid an expensive
join, and that join would happen over and over again, and performance is
critical, then you might denormalize. But if so, you will have to
add code to every application that changes the database to check for consistency.