Domain Name System

Purpose

The purpose of the domain name system is to translate domain names into IP numbers.  Secondary purposes include translating IP numbers back into domain names, and keeping track of MAIL HOSTS for each domain name.

It is possible to have an IP number and yet have no domain name.  Our lab printer is an example of such a machine.

Organization

The domain name system is a hierarchical system, just like a file system.  Names are to be read right to left (reverse order).  It must be hierarchical since otherwise the administration becomes unwieldy.  There is no canonical list for the same reason.
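
To make the right-to-left reading concrete, here is a minimal sketch that lists the domains a name passes through, starting from the top level.  The function name lookup_path is made up for this illustration.

```python
def lookup_path(name: str) -> list[str]:
    """Return the chain of domains met when reading a name right to left."""
    labels = name.rstrip(".").split(".")
    # Build suffixes starting from the rightmost label: edu, purdue.edu, ...
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(lookup_path("xinu.cs.purdue.edu"))
# ['edu', 'purdue.edu', 'cs.purdue.edu', 'xinu.cs.purdue.edu']
```

This mirrors walking a file system path from the root directory down, except that the root sits at the right end of the name.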

The top level has the names .com, .org, .gov, .mil, .net, .edu, the country names, and a bunch of others like .aero.  The generic names are all managed for the next year by a company in Ann Arbor.  The country names are managed by organizations based in that country.  (Weird fact: the Federated States of Micronesia (.fm) and Armenia (.am) have both sold rights to radio stations.)

Some nations split their country names into .edu, .gov, and so on, like we do.  Australia has a .edu.au, and Japan has .ac.jp (academic) and .go.jp (government).  Other nations, like the Netherlands, do not.

Names

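Domain names are not limited to Roman letters.  Internationalized domain names allow most of Unicode, which is encoded into plain ASCII labels (Punycode) before it is sent in a DNS query.  Other parts of the URL generally allow whatever the host's OS allows.  Maximum component length is 63 bytes, and max total length is 255 bytes.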
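
As a rough illustration of the encoding and the length limits, the sketch below uses Python's built-in "idna" codec (which implements the older IDNA 2003 rules) and then checks the 63-byte and 255-byte limits.  The function name to_dns_ascii is made up for this example, and the 255-byte check assumes the limit is counted in the on-the-wire form, where each component carries one length byte.

```python
def to_dns_ascii(name: str) -> bytes:
    """Encode a name to its ASCII form and check the DNS length limits."""
    # Non-ASCII components become Punycode labels starting with "xn--".
    # The codec itself rejects empty or over-long labels with UnicodeError,
    # which is a subclass of ValueError.
    ascii_name = name.rstrip(".").encode("idna")
    labels = ascii_name.split(b".")
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"component {label!r} is not 1..63 bytes long")
    # Wire form: one length byte per component plus a final zero byte.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    if wire_length > 255:
        raise ValueError(f"name needs {wire_length} bytes, the limit is 255")
    return ascii_name


print(to_dns_ascii("bücher.example"))  # b'xn--bcher-kva.example'
```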

THERE IS NO CONNECTION BETWEEN DOMAIN NAMES AND PHYSICAL LOCATIONS.  There is also no connection between domain names and physical networks.  a.a.com and a.b.com may or may not be in the same building and on the same LAN.  They could even be the same machine.

Each domain name usually corresponds to one IP address (although a name can have several address records), and more than one name can correspond to the same address.

There is no canonical list of domain names.  No one knows all the names on the internet.

There is no way to tell, just by looking at a name, if it describes a subdomain or a specific host.  What is cs.purdue.edu?  cs.nmu.edu?

Generally each host will have a default domain name that it appends to any unqualified names.  Some hosts have more than one, and try each one in order.
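
A minimal sketch of that behavior follows.  The function name candidate_names is made up, and treating any name that contains a dot as already qualified is a simplifying assumption; real resolvers have more detailed rules.

```python
def candidate_names(name: str, default_domains: list[str]) -> list[str]:
    """List the full names a host would try, in order, for a typed name."""
    if "." in name:
        # Assumption for this sketch: a name with a dot is already qualified.
        return [name.rstrip(".")]
    # Unqualified name: append each default domain in the configured order.
    return [f"{name}.{domain}" for domain in default_domains]


print(candidate_names("euclid", ["nmu.edu", "cs.nmu.edu"]))
# ['euclid.nmu.edu', 'euclid.cs.nmu.edu']
```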

DNS Software

DNS servers (sometimes called nameservers) are programs that listen on UDP port 53 (TCP port 53 is optionally used).  Note that since they use UDP, delivery is unreliable: packets can be lost or arrive out of order.  Resolvers are pieces of software that clients use to look up names.  Often resolvers are shared libraries that the client can link to, and that offer functions like gethostbyname().  All resolvers and nameservers speak the same DNS protocol.  If a packet is lost on the internet, resolvers generally use a timeout and retransmit mechanism.
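
To show roughly what a resolver does, the sketch below builds a minimal query packet by hand and sends it over UDP with a timeout and retransmit loop.  The function names, the three tries, and the two second timeout are choices made for this illustration, not what any particular resolver library does, and the reply is returned as raw bytes without being parsed.

```python
import socket
import struct


def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (qtype 1 = A record, class 1 = internet)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 others.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each component prefixed by its length, then a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label
        for label in name.rstrip(".").encode("ascii").split(b".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query_with_retry(packet: bytes, server: str,
                     tries: int = 3, timeout: float = 2.0) -> bytes:
    """Send a query to UDP port 53, retransmitting whenever we time out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(tries):
            sock.sendto(packet, (server, 53))
            try:
                reply, _addr = sock.recvfrom(512)
            except socket.timeout:
                continue  # the query or the reply was lost: send again
            if reply[:2] == packet[:2]:  # reply ID must match our query ID
                return reply
    raise TimeoutError(f"no reply from {server} after {tries} tries")
```

Calling query_with_retry(build_query("cs.nmu.edu", 0x1234), server) needs network access and the IP number of a real nameserver for server.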

Each DNS namespace is required to have both a primary nameserver and a backup nameserver.  These must have different access paths to the internet (they cannot use the same cable modem).  Lots of companies will provide this service for you if you want.

DNS Lookup

Different locations can translate the same name to different IP numbers.  Netflix.com might translate to one server in California and a different one in Connecticut.

There are two ways to do a DNS lookup.  You can either use recursive mode, or iterative mode.  I *think* almost everyone uses iterative mode.

Recursive mode:
You ask the nameserver to find the answer.  If it does not know, it asks a second nameserver, which might ask a third, and so on.  The answers propagate backwards until the final answer gets to you.  For euclid to look up xinu.cs.purdue.edu ...

  1. Euclid asks lisa.nmu.edu (our nameserver)
  2. Lisa does not know, so it asks 'dot'.  Lisa knows dot since all nameservers know dot.
  3. Dot does not know, so it asks the nameserver for .edu (which it knows since .edu is directly under dot).
  4. The nameserver for .edu does not know, so it asks the nameserver for purdue.edu (which it is required to know).
  5. The nameserver for purdue.edu does not know, so it asks the nameserver for cs.purdue.edu (which it is required to know).
  6. This nameserver knows the answer.
  7. The nameserver for cs.purdue.edu replies with the answer to purdue.edu.
  8. The nameserver for purdue.edu replies with the answer to .edu.
  9. The nameserver for .edu replies with the answer to 'dot'.
  10. The nameserver for 'dot' replies with the answer to lisa.nmu.edu.
  11. Lisa.nmu.edu replies with the answer to euclid
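
The recursive walkthrough above can be sketched with a toy model in which each nameserver asks the next one on the client's behalf.  This is a teaching model only: the Nameserver class, its methods, and the address 192.0.2.10 are invented for the illustration, and nothing here speaks the real DNS protocol.

```python
class Nameserver:
    """Toy nameserver that answers in recursive mode."""

    def __init__(self, zone, records=None, root=None):
        self.zone = zone
        self.records = records or {}  # names this server can answer itself
        self.children = []            # nameservers for zones directly below
        self.root = root              # how a local nameserver reaches 'dot'

    def delegate(self, child):
        self.children.append(child)
        return child

    def resolve(self, name, trace):
        trace.append(self.zone)  # record who was asked, in order
        if name in self.records:
            return self.records[name]
        for child in self.children:
            if name == child.zone or name.endswith("." + child.zone):
                # Ask the child ourselves and pass its reply back up.
                return child.resolve(name, trace)
        if self.root is not None:
            return self.root.resolve(name, trace)
        return None  # nobody below us knows the name


dot = Nameserver(".")
edu = dot.delegate(Nameserver("edu"))
purdue = edu.delegate(Nameserver("purdue.edu"))
cs = purdue.delegate(
    Nameserver("cs.purdue.edu", {"xinu.cs.purdue.edu": "192.0.2.10"})
)
lisa = Nameserver("nmu.edu", root=dot)  # stands in for lisa.nmu.edu

trace = []
answer = lisa.resolve("xinu.cs.purdue.edu", trace)
print(answer, trace)
```

The nested return statements play the role of steps 6 through 11: each server hands the answer back to whoever asked it.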

Iterative mode:
You ask the nameserver for the answer.  If it does not know, it points you to the next nameserver to ask, and you keep following these pointers until one of them knows.
  1. Euclid asks lisa.nmu.edu (our nameserver).
  2. Lisa does not know, but does tell us the address for 'dot'.
  3. We ask dot for the IP number for the nameserver of 'edu'
  4. Dot tells us the IP number for edu.
  5. We ask edu for the IP number of the nameserver of purdue.edu.
  6. Edu's nameserver answers.
  7. We ask the nameserver of purdue.edu for the IP number of the nameserver of cs.purdue.edu.
  8. The nameserver answers
  9. We ask the nameserver for cs.purdue.edu for the IP number of xinu.cs.purdue.edu
  10. That nameserver answers.  We have an answer!!
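
The iterative walkthrough above can be sketched the same way, except that here the client does all the asking and each server only answers or points onward.  The SERVERS table, the server labels such as "edu-ns", and the address 192.0.2.10 are all invented for this illustration.

```python
# Toy data: what each server can answer itself, and which zones it hands off.
SERVERS = {
    "lisa":      {"hosts": {}, "delegations": {".": "dot"}},
    "dot":       {"hosts": {}, "delegations": {"edu": "edu-ns"}},
    "edu-ns":    {"hosts": {}, "delegations": {"purdue.edu": "purdue-ns"}},
    "purdue-ns": {"hosts": {}, "delegations": {"cs.purdue.edu": "cs-ns"}},
    "cs-ns":     {"hosts": {"xinu.cs.purdue.edu": "192.0.2.10"},
                  "delegations": {}},
}


def ask(server, name):
    """One query to one server: an answer, a referral, or not found."""
    data = SERVERS[server]
    if name in data["hosts"]:
        return ("answer", data["hosts"][name])
    matches = [zone for zone in data["delegations"]
               if zone == "." or name == zone or name.endswith("." + zone)]
    if not matches:
        return ("not found", None)
    # Refer the client to the most specific zone that covers the name.
    return ("referral", data["delegations"][max(matches, key=len)])


def resolve_iteratively(name, start="lisa", max_hops=10):
    """Follow referrals ourselves until some server gives a final answer."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        if kind != "referral":
            return None, path
        server = value  # ask the server we were pointed to
    return None, path


print(resolve_iteratively("xinu.cs.purdue.edu"))
```

Compared with recursive mode, every query and every reply passes through the client, so no server has to wait on another server.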

Ways to Make this Faster

Caching.  Keep track of recently asked questions, and the associated answers.  Also keep track of negative answers (host not found).  This works even better for recursive queries, since the cache becomes a centralized resource for the whole organization/network/group.  All cached entries are marked non-authoritative, and also have a time to live field to prevent stale data from lasting forever.
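
A minimal sketch of such a cache follows, including negative answers and time to live.  The class name DnsCache and its methods are made up for this example, and the clock is passed in only so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Toy cache of answers, each with a time to live in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (expiry time, IP number or None)

    def put(self, name, ip, ttl):
        """Store an answer; ip=None records a negative answer (not found)."""
        self._entries[name] = (self._clock() + ttl, ip)

    def get(self, name):
        """Return (hit, ip).  Expired entries are dropped and count as misses.

        Anything returned from here is a cached copy, so a real server would
        mark it non-authoritative.
        """
        entry = self._entries.get(name)
        if entry is None:
            return (False, None)
        expires_at, ip = entry
        if self._clock() >= expires_at:
            del self._entries[name]
            return (False, None)
        return (True, ip)


now = [0.0]  # fake clock so the example does not have to sleep
cache = DnsCache(clock=lambda: now[0])
cache.put("xinu.cs.purdue.edu", "192.0.2.10", ttl=30)
cache.put("nosuch.cs.purdue.edu", None, ttl=10)  # negative answer
print(cache.get("xinu.cs.purdue.edu"))    # (True, '192.0.2.10')
print(cache.get("nosuch.cs.purdue.edu"))  # (True, None)
now[0] = 10.0
print(cache.get("nosuch.cs.purdue.edu"))  # (False, None), it has expired
```

Returning a (hit, ip) pair keeps a cached "host not found" distinct from a name that was never asked about.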

Collapse the hierarchy.  Have dot store not just the big six plus country codes, but also the answers for all of .com.  Have the nameserver for purdue.edu store all the answers for the whole university (which also reduces the total number of nameservers, which might help in the world of $$).

Reverse Name Lookups

To look up an IP number aaa.bbb.ccc.ddd and get the hostname, simply run a standard query on ddd.ccc.bbb.aaa.in-addr.arpa.
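
Building the reversed name is simple enough to show directly.  The helper reverse_name is made up for this example; Python's standard ipaddress module offers the same result through the reverse_pointer attribute, and 192.0.2.5 is just a sample address.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Build the in-addr.arpa name for an IPv4 number by reversing its parts."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_name("192.0.2.5"))  # 5.2.0.192.in-addr.arpa
# The standard library computes the same name:
print(ipaddress.ip_address("192.0.2.5").reverse_pointer)
```

The parts are reversed so that the most general part of the address ends up on the right, matching the right-to-left hierarchy of ordinary names.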

Other Things the DNS Can Tell You

A DNS lookup can tell you

Record ID  Contents
A          IPv4 number for the host
AAAA       IPv6 number for the host
CNAME      The canonical hostname (useful if it's an alias)
HINFO      CPU and OS type
MINFO      Mail information
MX         The mail exchange (what machine receives mail for this machine)
NS         The authoritative name server for this domain
SOA        Start of authority: administrative details for the zone this nameserver is authoritative for
TXT        Notes
SPF        Sender Policy Framework, an anti-fake-email idea

DNSSEC

DNSSEC is cryptographic signing of the DNS records.  Without it, anyone who can affect your packets can send you wrong answers.  It uses public-key cryptography and a complicated system of signatures to prove that records are correct.  DNSSEC is used by some but not all organizations.

Fake DNS

One easy way to implement a censorship system is to force the DNS server to provide a bad IP number for the sites you don't want seen.  This only allows you to block whole web sites, not individual web pages (Wikipedia has both wholesome and icky content).  It can be defeated by having people use IP numbers instead of names.

Charter used to return the IP number of their own server for every nonexistent DNS lookup, so that they could show you ads.  If you opted out of it, they would give you a different IP number for every nonexistent DNS lookup, one that went to a "web site not found" page.  This messed up email.

Interesting Questions

  1. Why should a nameserver know the IP number of its parent in the hierarchy?  Would the domain name be enough?
  2. How would you get a list of all names in the DNS?
  3. Would it make sense to allow queries like a*l.com?

Links

International domain names at http://www.nunames.nu/lldemo/default.htm.
World's longest domain names at http://www.oreillynet.com/onlamp/blog/2005/06/the_worlds_longest_domain_name.html.
List of all top level domains at http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains.
How many hostnames are there at http://www.domaintools.com/internet-statistics/.