Database Query Optimization
There are several ways a database query can be executed. Some are
much more effecient than others. To be concrete, consider a movie
database with 100,000 movies listed, including 10,000 movies made after
1970. Also consider a actors database with 10,000 actors/movie pairs,
including 20 listings showing the twenty moves made by John Wayne.
I'll guess that there are 3,000 actors who have made movies since the seventies.
Finally, assume that John Wayne made 10 movies after 1970.
How does one execute this query?
select * from movies, actors where actor.name="John Wayne" and year
> 1970 and actor.movie_title = movie.title
Strategy One
-
Join the two tables together. That produces 1,000,000,000 records.
-
Search the records for all listings with actor.name="John Wayne".
That cuts it down to 2,000,000 records.
-
Search that result for all movies with a year > 1970. That produces
200,000 records.
-
Search the result for matching titles in actor.movie_title and movie.title.
That gets the ten records you want.
Total records handled. 100,000+10,000+1,000,000,000+2,000,000+200,000+10
= 1,002,310,010 records
Strategy Two
-
Join the tables together. That produces 1,000,000,000 records.
-
Search the result for matching titles in actor.movie_title and movie.title.
That gets 100,000 records.
-
Search the records for all listings with actor.name="John Wayne".
That cuts it down to 20 records.
-
Search that result for all movies with a year > 1970. That produces
the ten records you want.
Total records handeled: 100,000 + 10,000 + 1,000,000,000 + 100,000
+ 20 + 10 = 1,000,210,030 records
Strategy Three
-
Search the actor table for all records with name="John Wayne". That
produces 20 records.
-
Search the movie table for all records with year > 1970. That
produces 10,000 records.
-
Join the two resukts above. That produces 200,000 records.
-
Search the result for matching titles in actor.movie_title and movie.title.
That gets the ten records you want.
Total records handled: 100,000 + 10,000 + 20 + 10,000 + 200,000
+ 10 = 320,030
Strategy Four
-
Search the movie table for all records with year > 1970. That
produces 10,000 records.
-
Find each movie title in the actor database. That produces a list
of every actor who's made at least one move in the seventies, maybe ???
3000 records. THIS IS A SCATTERED OPERATION, and not
just a search through the list
-
Search that result for actors with name "John Wayne". That produces
the ten records you want.
This handles 100,000 + 10,000 + 3,000 + 10=113,010 records, including
a scattered read of 3000 records.
Strategy Five
-
Search the name table for actors matching "John Wayne". That produces
20 records.
-
Find each movie title in the movie database. This produces 20 records.
THIS IS A SCATTERED OPERATION.
-
Search the results for movies with year > 1970. That gets the ten
movies you want.
Total records handled: 10,000 + 20 + 10 including 20 scattered
reads.