Notes About HTTP

HTTP Basics

HTTP is the Hypertext Transfer Protocol.  The basic scenario is the following.
  1. Server opens a socket and waits for a connection at some well-known port number (generally 80).
  2. Client connects to the server at that port on that host.
  3. Client sends a request down the socket to the server.
  4. Server sends a reply.
  5. Server closes the connection.
If the client wants more than one item from a server, it must open a new connection for each item requested.  HTTP/1.1 allows multiple requests per connection, but is not yet widely implemented.  Most clients can still have several requests in progress at once, even with HTTP versions earlier than 1.1, by opening multiple simultaneous connections.
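The five steps above can be sketched with plain sockets.  This is only an illustration under stated assumptions, not a real web server: the names serve_once, fetch and demo are made up for this sketch, the server runs in a thread on the local machine, and it listens on an OS-chosen port rather than port 80 so that no special privileges are needed.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, read the request, reply, close."""
    conn, _addr = listener.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        body = b"hello\n"
        header = b"HTTP/1.0 200 OK\r\nContent-Length: " + str(len(body)).encode("ascii")
        conn.sendall(header + b"\r\n\r\n" + body)   # step 4: send the reply
    # Leaving the "with" block closes the connection (step 5).


def fetch(host: str, port: int, path: str) -> bytes:
    """Client side: connect, send one request, read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:   # step 2
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))  # step 3
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    # Step 1: open a listening socket. Port 0 asks the OS for any free port
    # (an assumption of this sketch; a real server would normally use 80).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        return fetch("127.0.0.1", port, "/index.html")
    finally:
        server.join()
        listener.close()
```

Calling demo() returns the raw reply bytes.  Fetching a second item would mean calling fetch again, which opens a second connection, as described above.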
 

HTTP Caching

Clients don't normally just request files.  Instead, they check the cache first, and request a file from the server only if it is not found there.  The cache is a store of recently requested files.  Sometimes the client will verify the cached copy with the server before using it.  This incurs the latency penalty, but not the transfer penalty.  Verification is done with the HEAD method, or with a GET request that carries the If-Modified-Since modifier (see below).
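The check-the-cache-first behaviour might look roughly like the sketch below.  Everything here is hypothetical: ClientCache and fake_server_fetch are names invented for the illustration, the "server" is just a function that records how often it is called, and verification with the server is left out to keep the sketch short.

```python
from typing import Callable, Dict


class ClientCache:
    """Keeps recently requested files and asks the server only on a miss."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # how to get a file from the server
        self._store: Dict[str, bytes] = {}   # the cache itself

    def get(self, name: str) -> bytes:
        if name in self._store:              # found in the cache: no request at all
            return self._store[name]
        data = self._fetch(name)             # not found: pay latency and transfer
        self._store[name] = data
        return data


calls = []  # records every request that actually reaches the "server"


def fake_server_fetch(name: str) -> bytes:
    """Stand-in for a real request to the server (an assumption of this sketch)."""
    calls.append(name)
    return f"contents of {name}".encode("ascii")


cache = ClientCache(fake_server_fetch)
first = cache.get("/index.html")    # goes to the server
second = cache.get("/index.html")   # answered from the cache
```

After the two calls, calls holds a single entry, since only the first request reached the server.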

When caching works it has several advantages

Caching does have several disadvantages

HTTP Proxies

A proxy is a server/client combination that sits between the original server and the original client.  In other words, the picture changes from the arrangement on the left to the one on the right in the diagram below.  Proxies are useful for implementing network security, for (sometimes) improving performance, and for solving some network addressing/routing problems.  Most clients do not use proxies, however.
        Client <----> Server         Client <----> Proxy <----> Server
One common use of a proxy is to place one at each gateway to cache files, which reduces traffic across the network.  One study I read said that if all interior Internet gateways had a proxy server, total Internet traffic could be reduced by 30%.

HTTP Requests

Requests go from the client to the server, and ask the server to perform some service.  Each request starts with a method, followed by a resource-indicator (generally a filename) and a protocol-version.  There are three main methods discussed in these notes: GET, HEAD, and PUT.  All requests can optionally have one or more modifiers.  Examples include the following.

The If-Modified-Since modifier tells the server to send the data only if the data has changed since the given date.  This is most useful for clients that wish to cache.
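As a rough illustration of the request layout just described (method, resource-indicator, protocol-version, then modifiers), the sketch below assembles the text of a request.  The function name build_request and the example filename and date are assumptions made for this illustration.  Each line of a request ends with a carriage return and line feed, and a blank line marks the end of the modifiers.

```python
from typing import Mapping, Optional


def build_request(method: str, resource: str,
                  modifiers: Optional[Mapping[str, str]] = None,
                  version: str = "HTTP/1.0") -> str:
    """Return the text of a request with no body."""
    lines = [f"{method} {resource} {version}"]        # the request line
    for name, value in (modifiers or {}).items():     # one modifier per line
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"            # blank line ends the request


# A GET that asks for the file only if it changed after the given date.
conditional_get = build_request(
    "GET", "/index.html",
    {"If-Modified-Since": "Sat, 29 Oct 1994 19:43:31 GMT"},
)
```

Printed with print(conditional_get), this shows the request line first, then the If-Modified-Since line, then the blank line.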

Of these methods, the Content-Length modifier is used only with PUT, and gives the length of the file body that follows.  All PUT requests must have a body.

The Authorization modifier carries the user's name and password in base-64 encoding.  This scheme protects against only the most casual snooping, since base-64 encoding can be decoded by anyone, with no need to know a secret.
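To see how little protection base-64 gives, the sketch below builds an Authorization value and then decodes it again without any secret.  It assumes the common "Basic" form, in which the name and password are joined with a colon before encoding.  The function names and the sample name and password are made up for the illustration.

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Build the value of an Authorization modifier (assumed 'Basic' form)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def snoop(header_value: str) -> str:
    """What an eavesdropper can do: decode the value with no secret at all."""
    _scheme, token = header_value.split(" ", 1)
    return base64.b64decode(token).decode("utf-8")


header_value = basic_authorization("Aladdin", "open sesame")
recovered = snoop(header_value)   # "Aladdin:open sesame"
```

Anyone who sees header_value on the wire can run the equivalent of snoop on it, which is why this scheme stops only casual snooping.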
 

HTTP Responses

Responses come from the server to the client in response to client requests.  Each response is a series of lines describing the status (success or failure) of the request, optionally followed by the meta-data for the requested object and the body of the file.

GET requests return a status code and, if successful, the file meta-data and the file data.  The status code is on the first line returned by the server, the meta-data are the next few lines, and the body of the file starts after the first blank line.
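The layout described above (status line, meta-data lines, blank line, body) can be pulled apart with a few lines of code.  The sketch below is a minimal illustration, not a complete parser.  The function name parse_response and the sample response are invented here and are not taken from a real server.

```python
from typing import Dict, Tuple


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw response into (status code, meta-data, body)."""
    # The body starts after the first blank line.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # First line: protocol-version, status code, then an optional reason text.
    _version, code, *_reason = lines[0].split(" ", 2)
    meta: Dict[str, str] = {}
    for line in lines[1:]:                 # the next few lines are meta-data
        name, _colon, value = line.partition(":")
        meta[name.strip()] = value.strip()
    return int(code), meta, body


# A hypothetical successful reply to a GET request.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 14\r\n"
          b"\r\n"
          b"<html></html>\n")
status, meta, body = parse_response(sample)
```

Here status is 200, meta holds the two meta-data lines, and body is the 14 bytes after the blank line.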
 

Responses to HEAD and PUT requests are just like responses to GET requests, but without the body.
 

HTTP Performance

HTTP performance can be divided into several parts.  Latency is the time from when the request is issued until the first byte of the answer is received.  Bandwidth is the rate at which data flows after the first byte is received.  For large files, bandwidth across the Internet dominates total time (normal Internet bandwidth is 4 to 40 KB/sec).  Network latency is typically in the hundred-millisecond range.  Server latency/bandwidth is hard to quantify, but depends on many factors.  Access to small files across the local net to our server (Euclid) can take about 100 ms.  Full downloads of very large files across the whole Internet can take hours.
If there is a modem anywhere in the download path, then modem performance normally dominates over other considerations.
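Taking the definitions above at face value, the total time for one request is roughly the latency plus the file size divided by the bandwidth.  The sketch below works two examples with figures in the ranges quoted above.  The function name and the two file sizes are assumptions chosen for illustration, and the model ignores server load, connection setup and everything else.

```python
def estimate_seconds(size_kb: float, latency_s: float,
                     bandwidth_kb_per_s: float) -> float:
    """Rough total time: wait for the first byte, then stream the rest."""
    return latency_s + size_kb / bandwidth_kb_per_s


# Small file (2 KB) at 40 KB/sec with 100 ms latency:
# 0.1 + 2 / 40 = 0.15 seconds, so latency is most of the total.
small = estimate_seconds(size_kb=2, latency_s=0.1, bandwidth_kb_per_s=40)

# Very large file (100,000 KB) at 4 KB/sec with the same latency:
# 0.1 + 100000 / 4 = 25000.1 seconds, so bandwidth is nearly all of it.
large = estimate_seconds(size_kb=100_000, latency_s=0.1, bandwidth_kb_per_s=4)
large_hours = large / 3600   # a little under 7 hours
```

The second case shows why full downloads of very large files can take hours, while for small files the hundred-millisecond latency is what the user notices.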