Notes About HTTP
HTTP Basics
HTTP is the Hypertext Transfer Protocol. The basic scenario
is the following:
- Server opens a socket and waits for a connection at some well-known port
  number (generally 80).
- Client connects to the server at that port on that host.
- Client sends a request down the socket to the server.
- Server sends a reply.
- Server closes the connection.
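The five steps above can be sketched with plain sockets. This is only an illustration under some assumptions: the server runs in a background thread on the loopback address, it uses whatever free port the operating system hands out rather than port 80, and the names serve_once and fetch are made up for this example.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read one request, send one reply, then close."""
    conn, _addr = listener.accept()            # wait for a client to connect
    with conn:
        # A single recv is enough for this tiny request; a real server
        # would keep reading until it has seen the blank line.
        request = conn.recv(4096)
        path = request.split()[1].decode("ascii")   # second token is the resource
        body = f"you asked for {path}\n".encode("ascii")
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-type: text/plain\r\n"
                     b"\r\n" + body)
    # Leaving the "with" block closes the connection (the last step).

def fetch(host: str, port: int, path: str) -> bytes:
    """Connect, send one GET request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                       # empty read: server closed
                break
            chunks.append(data)
    return b"".join(chunks)

# Server side: open a socket and wait on a port (0 = any free port).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
server.start()

# Client side: connect, request, read the reply.
reply = fetch("127.0.0.1", port, "/index.html")
server.join(timeout=5)
listener.close()
print(reply.decode("ascii"))
```

The client knows the reply is complete only because the server closes the connection, which is why the read loop runs until recv returns nothing.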
If the client wants more than one item from a server, it must open a new
connection for each item requested. HTTP/1.1 allows multiple requests per
connection, but is not yet widely implemented. Most clients can still have
several requests in progress at once, even with HTTP versions earlier than
1.1, by opening multiple simultaneous connections.
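A rough sketch of the simultaneous-connection idea, using a thread pool. The function fetch_item is a hypothetical stand-in for "open a connection, request one item, read the reply"; here it only sleeps to imitate network latency, and the paths are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_item(path: str) -> str:
    """Hypothetical stand-in for one connection fetching one item."""
    time.sleep(0.05)                 # pretend network latency
    return f"contents of {path}"

paths = ["/index.html", "/logo.gif", "/style.css"]

# Each worker thread plays the role of one simultaneous connection.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_item, paths))   # results keep the order of paths

for path, result in zip(paths, results):
    print(path, "->", result)
```

Because the three waits overlap, the whole batch should take roughly one latency period rather than three.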
HTTP Caching
Clients don't normally go straight to the server for a file. Instead, they
check the cache first, and request the file only if it is not found
there. The cache is a store of recently requested files.
Sometimes the client will verify the cache contents with the server.
This incurs the latency penalty, but not the transfer penalty. Verification
is done with the HEAD method, or with a GET carrying an If-Modified-Since
modifier (see below).
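The lookup logic might look roughly like the sketch below. Everything here is an assumption made for illustration: the class name CachingClient, the two callables head and get that stand in for real requests, and the use of the last-modified string as the thing being compared.

```python
from typing import Callable, Dict, Tuple

Entry = Tuple[str, bytes]   # (last-modified string, file body)

class CachingClient:
    """Check the cache first; go to the server only on a miss or a stale entry."""

    def __init__(self, head: Callable[[str], str], get: Callable[[str], Entry]):
        self.head = head                  # cheap request: meta-data only
        self.get = get                    # full request: meta-data and body
        self.cache: Dict[str, Entry] = {}
        self.transfers = 0                # number of full bodies transferred

    def fetch(self, path: str, verify: bool = True) -> bytes:
        entry = self.cache.get(path)
        if entry is not None:
            # Verifying costs one round trip (latency) but no body transfer.
            if not verify or self.head(path) == entry[0]:
                return entry[1]
        self.transfers += 1               # miss or stale: pay for the transfer
        entry = self.get(path)
        self.cache[path] = entry
        return entry[1]

# A dict plays the part of the server's files.
server_files = {"/index.html": ("Wed, 15 Oct 1997 18:14:24 GMT", b"<HTML>v1</HTML>")}
client = CachingClient(head=lambda p: server_files[p][0],
                       get=lambda p: server_files[p])

first = client.fetch("/index.html")      # miss: full transfer
second = client.fetch("/index.html")     # hit: verified, no transfer
server_files["/index.html"] = ("Wed, 22 Oct 1997 04:02:44 GMT", b"<HTML>v2</HTML>")
third = client.fetch("/index.html")      # stale: full transfer again
```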
When caching works, it has several advantages:
- Improves response time
- Reduces network load
- Reduces server load
- Improves performance of OTHER clients and OTHER requests
Caching also has several disadvantages:
- Slows response time when it fails
- Makes hit counts hard to measure
- Takes substantial disk/memory resources
HTTP Proxies
A proxy is a server/client combination that sits between the original server
and the original client. In other words, the picture changes from
the arrangement on the left below to the one on the right. Proxies are useful
for implementing network security, for (sometimes) improving performance,
and for solving some network addressing/routing problems. Most clients
do not use proxies, however.
Client <----> Server          Client <----> Proxy <----> Server
One common use is to put a proxy at each gateway to cache files and
reduce traffic across the network. One study I read said that if all
interior Internet gateways had a proxy server, total Internet traffic
could be reduced by 30%.
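A toy model of a caching proxy, to show the "server on one side, client on the other" idea. No real networking is involved: a handler is just a function from a path to a body, and the names make_origin and make_caching_proxy are invented for this sketch.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], bytes]    # takes a path, returns a file body

def make_origin(files: Dict[str, bytes], log: List[str]) -> Handler:
    """The original server: answers requests and logs each one it sees."""
    def handle(path: str) -> bytes:
        log.append(path)
        return files[path]
    return handle

def make_caching_proxy(upstream: Handler) -> Handler:
    """A proxy: looks like a server to the client, like a client to upstream."""
    cache: Dict[str, bytes] = {}
    def handle(path: str) -> bytes:
        if path not in cache:
            cache[path] = upstream(path)    # acting as a client
        return cache[path]                  # acting as a server
    return handle

origin_log: List[str] = []
origin = make_origin({"/index.html": b"<HTML>hello</HTML>"}, origin_log)
proxy = make_caching_proxy(origin)

# Three client requests go to the proxy; only the first reaches the origin.
replies = [proxy("/index.html") for _ in range(3)]
print(len(replies), "replies,", len(origin_log), "request seen by the origin")
```

The origin's log records a single request even though three were served, which is also why caching makes hit counts hard to measure.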
HTTP Requests
Requests go from the client to the server and ask the server to perform
some service. Each request starts with a method, followed by a
resource indicator (generally a filename) and a protocol version.
Optionally, there can be one or more modifiers. There are three main
methods, used in the examples below.
- GET /index.html HTTP/1.0
  retrieve the meta-data and the body of /index.html
- HEAD /robots.txt HTTP/1.0
  retrieve only the meta-data of /robots.txt
- PUT /my/secret/file HTTP/1.0
  create or modify the file on the server
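As a small illustration, the first line of a request can be split into its three parts as follows. The names RequestLine and parse_request_line are hypothetical, and the sketch assumes the three parts are separated by whitespace with no spaces inside the resource.

```python
from typing import NamedTuple

class RequestLine(NamedTuple):
    method: str      # e.g. GET, HEAD, PUT
    resource: str    # generally a filename
    version: str     # protocol version

def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, resource, and protocol version."""
    parts = line.strip().split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    method, resource, version = parts
    return RequestLine(method, resource, version)

parsed = parse_request_line("GET /index.html HTTP/1.0\r\n")
print(parsed.method, parsed.resource, parsed.version)
```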
All requests can have one or more modifiers. Examples include:
- If-Modified-Since: Sat, 29 Oct 1994 19:43:21 GMT
- Content-Length: 3472
- Authorization: Basic Qwxyehsuzjehgsoiznshyebsn
The If-Modified-Since modifier tells the server to send the data only if
the data has changed since the given date. This is most useful for
clients that wish to cache.
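On the server side, the decision could be sketched like this. The function name should_send_body and the two sample modification times are assumptions for the example; the date parsing uses email.utils.parsedate_to_datetime from the Python standard library, which understands this date format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def should_send_body(if_modified_since: str, last_modified: datetime) -> bool:
    """Return True only if the file changed after the date the client gave."""
    since = parsedate_to_datetime(if_modified_since)
    return last_modified > since

header_value = "Sat, 29 Oct 1994 19:43:21 GMT"

# Two hypothetical files: one older than the client's copy, one newer.
unchanged = datetime(1994, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
changed = datetime(1994, 11, 5, 8, 30, 0, tzinfo=timezone.utc)

print(should_send_body(header_value, unchanged))   # client's copy is still good
print(should_send_body(header_value, changed))     # client needs the new data
```

When the data has not changed, a server typically answers with status 304 (Not Modified) and no body, so the client pays for the round trip but not for the transfer.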
Of the methods above, only PUT uses the Content-Length modifier, which
gives the length in bytes of the file body that follows. All PUT requests
must have a body.
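A minimal sketch of assembling a PUT request, to show where Content-Length comes from. The function name build_put_request and the resource /notes.txt are made up for the example. The length is counted in bytes after encoding, not in characters.

```python
def build_put_request(resource: str, body: bytes) -> bytes:
    """Assemble a PUT request: request line, Content-Length, blank line, body."""
    head = (f"PUT {resource} HTTP/1.0\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n").encode("ascii")
    return head + body

# Five characters, but six bytes once the accented letter is encoded as UTF-8.
body = "caf\u00e9\n".encode("utf-8")
request = build_put_request("/notes.txt", body)
print(request)
```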
The Authorization modifier carries the user's name and password encoded
in base 64. This scheme protects against only the most casual snooping,
since base-64 encoding can be reversed by anyone, with no secret key
required.
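To see how little protection this gives, here is a short sketch that builds a Basic value and then recovers the name and password from it using nothing but the standard base64 module. The function names and the sample credentials are invented for the example; the "name:password" layout is the one the Basic scheme uses.

```python
import base64
from typing import Tuple

def basic_auth_value(user: str, password: str) -> str:
    """Build the value of an Authorization modifier for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def decode_basic_auth(value: str) -> Tuple[str, str]:
    """Recover the user name and password; no secret is needed to do so."""
    scheme, token = value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError(f"not a Basic authorization value: {value!r}")
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password

value = basic_auth_value("Aladdin", "open sesame")
print(value)
print(decode_basic_auth(value))   # anyone who sees the header can do this
```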
HTTP Responses
Responses come from the server to the client in response to client requests.
Each response is a series of lines describing the status (success or failure)
of the request, followed optionally by the meta-data for the requested
object and optionally the body of the file.
GET requests return a status code, and if successful the file meta-data
and the file data. The status code is the first line returned by
the server, the meta-data are the next few lines, and the body of the file
starts after the first blank line. For example,
HTTP/1.0 200 OK
Date: Wed, 22 Oct 1997 04:02:44 GMT
Server: Apache/1.1.1
Content-type: text/html
Content-length: 2919
Last-modified: Wed, 15 Oct 1997 18:14:24 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD> (Body continues here)
Responses to HEAD and PUT requests look just like the response to a GET
request, but without the body.
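The layout of a response (status line, meta-data lines, blank line, body) can be taken apart with a few lines of code. This is a sketch only: parse_response is a hypothetical name, the sample response is shortened from the one above, and lines are assumed to end in CR LF as they do on the wire.

```python
from typing import Dict, Tuple

def parse_response(raw: bytes) -> Tuple[int, str, Dict[str, str], bytes]:
    """Split a response into status code, reason, meta-data, and body."""
    head, _sep, body = raw.partition(b"\r\n\r\n")   # first blank line
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)  # the status line
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")    # split at the first colon
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Date: Wed, 22 Oct 1997 04:02:44 GMT\r\n"
       b"Content-type: text/html\r\n"
       b"\r\n"
       b"<HTML><HEAD></HEAD></HTML>\n")

status, reason, headers, body = parse_response(raw)
print(status, reason)
print(headers["content-type"], len(body), "bytes of body")
```

For a reply to HEAD or PUT the same function works; the body simply comes back empty.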
HTTP Performance
HTTP performance can be divided into several parts:
- Network latency
- Network bandwidth
- Server latency
- Server bandwidth
Latency is the time from when the request is issued until the first byte
of the answer is received. Bandwidth is the rate at which data flows
after the first byte is received. For large files, bandwidth across
the Internet dominates total time (normal Internet bandwidth is 4 to 40
KB/sec). Network latency is typically in the hundred-millisecond
range. Server latency and bandwidth are hard to quantify, but depend on
many factors:
- Server load
- File type (cgi-bin files and database requests are slow)
- File location (files reached across the network or deep inside
  subdirectories are slow)
- Reference frequency (files that are recently accessed are fast)
Access to small files across the local net to our server (Euclid) can take
about 100 ms. Full downloads of very large files across the whole
Internet can take hours.
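A back-of-the-envelope model makes the split between latency and bandwidth concrete: total time is roughly latency plus size divided by bandwidth. The file sizes below are assumptions picked for illustration; the latency and bandwidth figures are the rough ones quoted above.

```python
def transfer_time_seconds(size_kb: float, latency_s: float,
                          bandwidth_kb_per_s: float) -> float:
    """Estimate total time as latency plus size divided by bandwidth."""
    return latency_s + size_kb / bandwidth_kb_per_s

# A 2 KB file at 40 KB/sec: latency is most of the total.
small = transfer_time_seconds(size_kb=2, latency_s=0.1, bandwidth_kb_per_s=40)

# A 100,000 KB file at 4 KB/sec: bandwidth is essentially all of the total.
large = transfer_time_seconds(size_kb=100_000, latency_s=0.1, bandwidth_kb_per_s=4)

print(f"small file: {small:.2f} seconds")
print(f"large file: {large / 3600:.1f} hours")
```

This ignores server load and the other factors listed above, so it should be read as a lower bound rather than a prediction.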
If there is a modem anywhere in the download path then normally
modem performance dominates over other considerations.