DNS: THE DOMAIN NAME SYSTEM

An Internet Protocol (IP) address is a 32-bit integer. To send a message, the sender must include the destination address, but people prefer to assign machines pronounceable, easily remembered names (host names). The Domain Name System exists to bridge that gap. Logical names also free users from having to know the physical location of a host: a host may be moved to a different network while users continue to use the same logical name.

The Domain Name System (DNS) is a distributed database used by TCP/IP applications to map between hostnames and IP addresses, and to provide electronic mail routing information. Each site (a university department, a campus, a company, or a department within a company, for example) maintains its own database of information and runs a server program that other systems across the Internet can query. The DNS provides the protocol that allows clients and servers to communicate with each other.

An application accesses the DNS through a resolver (fig.1). The resolver takes a hostname and returns the IP address, or takes an IP address and looks up the hostname. As fig.1 shows, the resolver must return the IP address before the application can ask TCP to open a connection or send a datagram using UDP.
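To make the two directions of lookup concrete, here is a minimal sketch in Python that hands both requests to the operating system's resolver through the standard `socket` module. The function names `name_to_address` and `address_to_name` are made up for this illustration; only the `socket` calls are real library functions.

```python
import socket


def name_to_address(hostname: str) -> str:
    """Forward lookup: ask the system resolver for an IPv4 address."""
    return socket.gethostbyname(hostname)


def address_to_name(ip_address: str) -> str:
    """Reverse lookup: ask the system resolver for the primary host name."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    return hostname
```

An application would call `name_to_address(...)` first and only then open a TCP connection or send a UDP datagram to the address it gets back. Both calls raise an `OSError` subclass when the resolver cannot find a mapping.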

DNS Organization

The Domain Name System uses a hierarchical naming scheme known as domain names, which is similar to the Unix filesystem tree. The root of the DNS tree is a special node with a null label. The label of each node (except the root) can be up to 63 characters long. The domain name of any node in the tree is the list of labels, starting at that node and working up to the root, with a period ("dot") separating the labels. (Individual sections of a name might represent sites or groups, but the domain system simply calls each section a label.) The difference from the Unix filesystem is that a Unix path name is written starting at the root and working down, while a domain name is written starting at the node and "going up" to the root. Writing the labels in this order makes it possible to compress messages that contain multiple domain names.

Thus, the domain name "tau.ac.il" contains three labels: "tau", "ac", and "il". Any suffix of a domain name is also called a domain. In the above example the lowest-level domain is "tau.ac.il" (the domain name for Tel-Aviv University, an academic organization in Israel), the second-level domain is "ac.il" (the domain name for academic organizations in Israel), and the top-level domain (for this name) is "il" (the domain name for Israel). The node il sits directly below the root, which makes it a second-level node of the tree when the root itself is counted (fig.2).
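The relationship between labels and domains can be sketched in a few lines of Python. The helper names `split_labels` and `enclosing_domains` are invented for this example; the 63-character check reflects the label limit mentioned above.

```python
MAX_LABEL_LENGTH = 63  # per-label limit stated in the text


def split_labels(domain_name: str) -> list[str]:
    """Split a dotted domain name into its labels, lowest level first."""
    labels = domain_name.rstrip(".").split(".")
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_LENGTH:
            raise ValueError(f"label {label!r} must be 1 to 63 characters long")
    return labels


def enclosing_domains(domain_name: str) -> list[str]:
    """List every suffix of the name, from the full name up to the top level."""
    labels = split_labels(domain_name)
    return [".".join(labels[i:]) for i in range(len(labels))]


print(split_labels("tau.ac.il"))       # ['tau', 'ac', 'il']
print(enclosing_domains("tau.ac.il"))  # ['tau.ac.il', 'ac.il', 'il']
```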

Every node in the tree must have a unique domain name, but the same label can be used at different points in the tree. The top-level domains are divided into three areas:

* 1. arpa is a special domain used for address-to-name mapping.
* 2. The seven 3-character domains are called the generic (or organizational) domains.
* 3. The 2-character domains are based on the country codes. These are called the country (or geographical) domains.

The seven generic domains are listed in fig.3:

* Domain Name Meaning
* COM Commercial organizations
* EDU Educational institutions
* GOV Government institutions
* MIL Military groups
* NET Major network support centers
* ORG Organizations other than those above
* INT International organizations

Fig.3 The three-character generic domains

The Internet scheme can accommodate a wide variety of organizations, and allows each group to choose between geographical and organizational naming hierarchies. Most sites follow the Internet scheme so they can attach their TCP/IP installations to the connected Internet without changing names.

A zone is a subtree of the DNS that is administered separately. A common zone is a second-level domain, "ac.il" for example. Many second-level domains in turn divide their zone into smaller zones.
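As a small illustration of the three top-level areas listed earlier (arpa, generic, and country), the following sketch classifies a name by its last label. The function name `top_level_area` and the "unknown" fallback are assumptions made for this example, and the set of generic domains is just the seven from fig.3.

```python
GENERIC_DOMAINS = {"com", "edu", "gov", "mil", "net", "org", "int"}


def top_level_area(domain_name: str) -> str:
    """Classify a domain name by the area its top-level domain belongs to."""
    tld = domain_name.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld == "arpa":
        return "arpa"
    if tld in GENERIC_DOMAINS:
        return "generic"
    if len(tld) == 2 and tld.isalpha():
        return "country"
    return "unknown"  # not covered by the three areas described in the text


print(top_level_area("tau.ac.il"))     # country
print(top_level_area("in-addr.arpa"))  # arpa
```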

Whenever a new system is installed in a zone, the DNS administrator for the zone allocates a name and an IP address for the new system and enters these into the name server's database. A name server is said to have authority for one zone or multiple zones. Often, the server software executes on a dedicated processor, and the machine itself is then called the name server.

The person responsible for a zone must provide a primary name server for that zone and one or more secondary name servers. The main difference between a primary and a secondary is that the primary loads all the information for the zone from disk files, while the secondaries obtain all the information from the primary. The process by which a secondary obtains the information from its primary is called a zone transfer.

When a new host is added to the zone, the administrator adds the appropriate information (name and IP address) to a disk file on the system running the primary. The primary name server is then notified to reread its configuration files. The secondaries query the primary on a regular basis (normally every 3 hours), and if the primary contains newer data, the secondary obtains the new data using a zone transfer.
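The polling behaviour described above can be modelled with a toy sketch. The `Primary` and `Secondary` classes are hypothetical, and the `serial` counter is an assumption used here to decide whether the primary "contains newer data"; the text does not say how that comparison is made. The host name and address are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    serial: int = 1  # assumed version counter, bumped on every change
    records: dict[str, str] = field(default_factory=dict)

    def add_host(self, name: str, address: str) -> None:
        """Simulate the administrator editing the zone file and reloading."""
        self.records[name] = address
        self.serial += 1


@dataclass
class Secondary:
    serial: int = 0
    records: dict[str, str] = field(default_factory=dict)

    def refresh(self, primary: Primary) -> bool:
        """Poll the primary; copy the whole zone only if it has newer data."""
        if primary.serial <= self.serial:
            return False
        self.records = dict(primary.records)  # the zone transfer
        self.serial = primary.serial
        return True


primary, secondary = Primary(), Secondary()
primary.add_host("host1.example", "192.0.2.10")
print(secondary.refresh(primary))  # True: newer data was transferred
print(secondary.refresh(primary))  # False: nothing changed since the last poll
```

In a real deployment `refresh` would run on a timer (every 3 hours in the text's example) rather than being called by hand.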

If a name server doesn't contain the information requested, it must contact another name server. Not every server, however, knows how to contact every other server. Instead, every name server must know how to contact the root name servers. The root servers in turn know the name and location (i.e. IP address) of each authoritative name server for all the second-level domains. There are six root servers in the world (at the time of writing), and every primary name server has to know the address of at least one root server. From the domain tree in fig.2 we can derive the tree of servers depicted in fig.4.

In practice, an organization often collects information from all of its sub-zones into a single server. This gives the tree in fig.5, which is more realistic than fig.4.

Note that the tree in fig.5 shows only how a given server can contact other servers. It does not indicate physical network connections: servers may be located at arbitrary locations on the network. The tree of servers is therefore a logical connection between servers, which use the Internet for communication.
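The idea of a logical tree of servers can be illustrated with a toy simulation in which a lookup starts at the root and follows referrals downward. Everything here is hypothetical: the server names, the `SERVERS` table, and the address 192.0.2.1 are placeholders, and each "server" is just a dictionary rather than a real network peer.

```python
# Each toy server knows some answers and which server to try for a given domain.
SERVERS = {
    "root": {"answers": {}, "referrals": {"il": "il-server"}},
    "il-server": {"answers": {}, "referrals": {"ac.il": "ac-il-server"}},
    "ac-il-server": {"answers": {"tau.ac.il": "192.0.2.1"}, "referrals": {}},
}


def query(server: str, name: str):
    """Return ("answer", address), ("referral", next_server) or ("error", None)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    # Prefer the longest domain suffix this server can refer us to.
    matches = [d for d in data["referrals"] if name == d or name.endswith("." + d)]
    if matches:
        return "referral", data["referrals"][max(matches, key=len)]
    return "error", None


def resolve(name: str, start: str = "root", max_hops: int = 10):
    """Follow referrals from the starting server until an answer is found."""
    server, path = start, [start]
    for _ in range(max_hops):
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            return None, path
        server = value
        path.append(server)
    return None, path


print(resolve("tau.ac.il"))
# ('192.0.2.1', ['root', 'il-server', 'ac-il-server'])
```

The returned path shows which servers were contacted; it says nothing about where those servers sit physically, which is the point made above.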

DNS Caching

A fundamental property of the DNS is caching. That is, when a name server receives information about a mapping, it caches that information. A later query for the same mapping can then use the cached result and does not cause additional queries to other servers. The DNS uses caching to reduce the cost of lookups.
How does it work?
Every server has a cache of recently used names, as well as a record of where the mapping information for each name was obtained. When a client asks the server to resolve a certain name, the server does as follows:

1. The server checks whether it has authority for the name. If so, it answers from its own database and doesn't need cached information.
2. If not, the server checks its cache to see whether the name has been resolved recently. If so, the server reports the cached information to the client.

Cached information is correct only as long as the original binding has not changed since the server cached it. Since information about a particular name can change, the server may have incorrect information in its cache. The Time to Live (TTL) value is used to decide when to age out information. Whenever an authority responds to a request, it includes a TTL value in the response which specifies how long it guarantees the binding to remain valid.
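The two lookup steps and the TTL-based aging can be sketched together as follows. The class `TtlCache`, the function `lookup`, and the `ask_other_server` callback are all names invented for this example; the callback stands in for whatever query the server would send to another name server, and is assumed to return an (address, TTL in seconds) pair.

```python
import time


class TtlCache:
    """Cache of name -> address bindings that expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # the binding has aged out
            return None
        return address


def lookup(name, authoritative, cache, ask_other_server):
    """Resolve a name: own zone data first, then the cache, then another server."""
    if name in authoritative:  # step 1: the server has authority for the name
        return authoritative[name]
    cached = cache.get(name)   # step 2: the name was resolved recently
    if cached is not None:
        return cached
    address, ttl_seconds = ask_other_server(name)
    cache.put(name, address, ttl_seconds)
    return address
```

Passing the clock in as a parameter is a design choice that makes expiry easy to exercise without waiting for real time to pass.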

DNS MESSAGE FORMAT

When a user wants to send a message, the user invokes an application program and supplies the name of the machine with which the application must communicate. The application program must find the machine's IP address. It passes the domain name to a local resolver (L.R.) and requests an IP address. The local resolver checks its cache and:

* If the L.R. has an answer, it returns the answer.
* If the L.R. has no answer, it sends a message to the server. The server then returns a similar message that contains the answers to the questions for which the server has bindings. If the server can't answer, its response carries information about other servers that the client can contact.

Fig.6 shows the DNS message format.

Explanation of Fig.6:
o The IDENTIFICATION field is set by the client and returned by the server.
o The 16-bit PARAMETER field consists of:
    + bit 0 - QR: 0 means the message is a query, 1 means it's a response.
    + bits 1-4 - OPCODE:
        # 0 - a standard query (the normal value).
        # 1 - an inverse query.
        # 2 - a server status request.
    + bit 5 - Authoritative Answer: set when the name server is authoritative for the domain in the question section.
    + bit 6 - Truncated: set if the message was truncated. With UDP this means that the total size of the reply exceeded 512 bytes, and only the first 512 bytes of the reply were returned.
    + bit 7 - Recursion Desired: this bit can be set in a query and is then returned in the response.
    + bit 8 - Recursion Available.
    + bits 9-11 - must be 0.
    + bits 12-15 - Return Code: 0 means no error, 3 means name error.
o The fields labeled NUMBER OF ... each give a count of the entries in the corresponding section of the message.
o The QUESTION SECTION contains the queries for which answers are desired. The client fills in only the question section; the server returns the question together with the answers in its response. Each question has a Query Domain Name followed by Query Type and Query Class fields (as depicted in fig.7).
o The ANSWER, AUTHORITY, and ADDITIONAL INFORMATION sections each consist of a set of resource records that describe domain names and mappings. Each resource record describes one name (as depicted in fig.8).
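The fixed part of the message described above (IDENTIFICATION, PARAMETER, and the four NUMBER OF ... counts) can be packed and unpacked with Python's `struct` module. This is a sketch under the assumption that the header is six 16-bit fields in network byte order and that bit 0 in the list above is the most significant bit of PARAMETER; the function names and dictionary keys are made up for the example.

```python
import struct

HEADER = struct.Struct("!6H")  # six 16-bit big-endian fields, 12 bytes in total


def build_query_header(identification: int, recursion_desired: bool = True) -> bytes:
    """Build the header of a standard query that carries one question."""
    parameter = 0x0100 if recursion_desired else 0  # bit 7: Recursion Desired
    return HEADER.pack(identification, parameter, 1, 0, 0, 0)


def parse_header(message: bytes) -> dict:
    """Decode the fixed header at the start of a DNS message."""
    ident, parameter, questions, answers, authority, additional = (
        HEADER.unpack_from(message)
    )
    return {
        "identification": ident,
        "is_response": bool(parameter & 0x8000),         # bit 0
        "opcode": (parameter >> 11) & 0xF,               # bits 1-4
        "authoritative": bool(parameter & 0x0400),       # bit 5
        "truncated": bool(parameter & 0x0200),           # bit 6
        "recursion_desired": bool(parameter & 0x0100),   # bit 7
        "recursion_available": bool(parameter & 0x0080), # bit 8
        "return_code": parameter & 0xF,                  # bits 12-15
        "counts": (questions, answers, authority, additional),
    }


header = parse_header(build_query_header(0x1234))
print(header["is_response"], header["recursion_desired"], header["counts"])
# False True (1, 0, 0, 0)
```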

The RESOURCE DOMAIN NAME field contains the domain name to which the resource record refers, and can be of arbitrary length. The TYPE field specifies the type of the data record. The CLASS field specifies its class. The TIME TO LIVE field contains an integer that specifies the number of seconds the information in this resource record can be cached. It is used by clients who have requested a name binding and may want to cache the results. The RESOURCE DATA LENGTH field specifies the count of octets in the RESOURCE DATA field.
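A resource record with this layout can be decoded as sketched below. Several details are assumptions not spelled out in the text: the name is taken to be stored as a sequence of length-prefixed labels ending in a zero byte, TYPE and CLASS as 16-bit integers, TIME TO LIVE as a 32-bit integer, and the data length as a 16-bit integer. Compressed names are deliberately not handled, and all function names are invented for the example.

```python
import struct

FIXED = struct.Struct("!HHIH")  # TYPE, CLASS, TIME TO LIVE, RESOURCE DATA LENGTH


def encode_name(domain_name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in domain_name.split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def decode_name(data: bytes, offset: int):
    """Decode an uncompressed name; return it with the offset just past it."""
    labels = []
    while True:
        length = data[offset]
        offset += 1
        if length == 0:
            break
        if length > 63:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(data[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), offset


def parse_resource_record(data: bytes, offset: int = 0):
    """Parse one resource record; return it with the offset of the next one."""
    name, offset = decode_name(data, offset)
    rtype, rclass, ttl, length = FIXED.unpack_from(data, offset)
    offset += FIXED.size
    record = {"name": name, "type": rtype, "class": rclass, "ttl": ttl,
              "data": data[offset:offset + length]}
    return record, offset + length


# A hand-built record: the type, class, TTL and address are placeholder values.
raw = encode_name("tau.ac.il") + FIXED.pack(1, 1, 3600, 4) + bytes([192, 0, 2, 1])
record, next_offset = parse_resource_record(raw)
print(record["name"], record["ttl"], list(record["data"]))
# tau.ac.il 3600 [192, 0, 2, 1]
```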

This document was written by Meir Galperin & Ira Gordin
for the "Protocols and Computer Networks" course,
lectured by Dr. Debby Koren
Studied at Tel-Aviv University.