
*******
DNS 101
*******

The Domain Name System (DNS) is central to how the internet we use today works. Before the introduction of DNS, networked computers were referenced solely by IP address. Over time, these addresses became hard to remember and use on a daily basis, and so DNS was born.

A Brief History of DNS
======================

As the popularity of the internet grew, and more networked computers came online, there was an increasing need to be able to reference remote machines in a less-confusing way than solely by IP address. With a small enough network, referencing machines by IP address alone can work absolutely fine. The addition of descriptive names, however, makes referencing machines much easier.

The first example of DNS was a ``HOSTS.TXT`` file created by staff running `ARPANET <http://en.wikipedia.org/wiki/ARPANET>`__. ARPANET staff would amend this global ``HOSTS.TXT`` file on a regular basis, and it was distributed to anyone on the internet who wanted to use it to reference machines by name rather than by number. Eventually, as the internet grew, it was realised that a more automated system for mapping descriptive names to IP addresses was needed. To that end, ``HOSTS.TXT`` can be seen as the immediate forerunner to the Domain Name System we use today.

In 1983, Paul Mockapetris authored `RFC 882 <http://tools.ietf.org/html/rfc882>`__, which describes how a system mapping memorable names to unmemorable IP addresses could work. A team of students at `UC Berkeley <http://berkeley.edu>`__ created the first implementation of Mockapetris' ideas in 1984, naming their creation the Berkeley Internet Name Domain (BIND) server. Today, BIND is still the most widely-used nameserver software on the internet with over 70% of domains using it, according to the `ISC <http://isc.org/downloads/bind>`__ and Don Moore's `survey <http://mydns.bboy.net/survey/>`__. Since then, a number of RFC documents have been published which have continued to improve how DNS works and runs.

Terminology
===========

Domain name
^^^^^^^^^^^
A domain name is likely the way you interface with DNS most often when browsing the internet. Examples are, quite literally, everywhere - a very limited set of examples includes ``google.com`` and ``wikipedia.org``.

Top-Level Domain
^^^^^^^^^^^^^^^^
A top-level domain (TLD) is an important, but rather generic, part of a domain name. Examples include ``com``, ``net``, ``gov`` and ``org`` - they were originally defined in `RFC 920 <http://tools.ietf.org/html/rfc920>`__. ICANN controls the TLDs, and delegates responsibility for the registration and maintenance of specific domains to registrars.

Fully Qualified Domain Name (FQDN)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A fully-qualified domain name is equivalent to the absolute name. Domain names can also be relative to one another, and can therefore become ambiguous at times. Specifying an FQDN relative to the root ensures that you have specified exactly which domain you are interested in. Examples of FQDNs include ``www.google.com.`` and ``www.gov.uk.``; note the trailing dot, which stands for the root.

IP address
^^^^^^^^^^
An IP address is used to uniquely address a machine on a network in numerical (IPv4) or alphanumerical (IPv6) form. It is important that we understand the concept of "network" used here to be relative to what we are trying to achieve. For instance, trying to contact another computer inside your home or office network means that the IP address of the machine you are trying to reach must be unique within your home or office. In terms of websites and publicly-accessible information available via the internet, the "network" is - in fact - the internet.

There are two versions of IP in use: IPv4 and IPv6. The latter is becoming increasingly popular as we get close to running out of available IPv4 addresses.

An IPv4 address is written as four period-separated decimal numbers, each between 0 and 255 (so up to three digits each). For instance, ``8.8.8.8`` and ``102.92.190.91`` are examples of IPv4 addresses. As more devices and people across the world come online, the demand for IPv4 addresses has outstripped supply, and the pool of unallocated addresses is all but exhausted. This is where IPv6 comes in.

IPv6 follows similar principles to IPv4 - it allows machines to be uniquely referenced on the network on which they reside - but its addresses are four times as long (128 bits rather than 32) and are written in hexadecimal, which vastly increases the number of available addresses. They are written as ``2001:0db8:85a3:0042:1000:8a2e:0370:7334``, although short-hand notations do exist (for instance, ``::1`` is the loopback address, which always refers to the local machine).
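
As a rough illustration of the two address forms, Python's standard ``ipaddress`` module can parse both and convert between the long and short IPv6 notations. This is only a sketch using the example addresses from this section; the variable names are arbitrary.

.. code-block:: python

   import ipaddress

   # Parse the example addresses used in this section.
   v4 = ipaddress.ip_address("8.8.8.8")
   v6 = ipaddress.ip_address("2001:0db8:85a3:0042:1000:8a2e:0370:7334")
   loopback = ipaddress.ip_address("::1")

   print(v4.version, v6.version)  # 4 6
   print(int(v4))                 # the IPv4 address as a single 32-bit integer
   print(v6.compressed)           # leading zeros in each group are dropped
   print(loopback.exploded)       # "::1" written out in full
   print(loopback.is_loopback)    # True

The ``compressed`` and ``exploded`` attributes show that the short-hand and long-hand spellings are the same address written two ways.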

Zonefile
^^^^^^^^
A zonefile is simply a text file made up of a variety of different records for an individual domain. Each line of a zonefile contains the name of a particular host or domain, followed by the type of the record and the value associated with it. For instance, in ``google.com``'s zonefile, there may exist a line which denotes that ``www`` translates, via an ``A record``, to ``173.194.34.68``.

Records
^^^^^^^
A DNS record is a single mapping between a domain and relevant data - for instance, an IP address in the case of an ``A record``, or a mail server's domain name, in the case of an ``MX record``. Many records make up a zonefile.

How DNS works
=============

Root Servers
^^^^^^^^^^^^
At the very top of the DNS tree are the root servers. The root zone they serve is managed by the `Internet Corporation for Assigned Names and Numbers <https://icann.org>`__ (ICANN). At the time of writing, there are thirteen named root servers - however, each of these is actually a pool of servers, acting in a load-balanced fashion in order to deal with the huge number of requests they receive from billions of internet users every day. Their purpose is to handle requests for information about Top-Level Domains, such as ``.com`` and ``.net``, where lower-level nameservers cannot answer the request themselves. The root servers don't hold any records of real use, insofar as they cannot answer a query for an ordinary hostname on their own. Instead, they respond with details of which nameserver the requester should ask next.

For example, let's assume that a request for ``www.google.com`` came straight in to a root server. The root server would look in its records for ``www.google.com``, but would not find it. The best it could produce is a partial match for ``.com``: the nameservers responsible for that TLD. It sends this information back in its response to the original request.

TLD Servers
^^^^^^^^^^^
Once the root server has replied to the request for ``www.google.com``, the requesting machine asks the nameserver named in that reply where ``www.google.com`` is. At this stage, it knows that this server handles ``.com``, so it can get some way further in mapping the name to an IP address. The TLD server will try to find ``www.google.com`` in its records, but it will only be able to reply with details of the nameservers for ``google.com``.

Domain-level nameservers
^^^^^^^^^^^^^^^^^^^^^^^^
By this stage, the original request for ``www.google.com`` has been responded to twice: once by the root server to tell it that it doesn't handle any records, but knows where ``.com`` is handled, and once by the TLD server which says that it handles ``.com``, and knows where ``google`` is. We've still got one more stage to get to, though - that's the ``www`` stage. For this, the request is played against the server responsible for ``google.com``, which duly looks up ``www.google.com`` in its records and responds with an IP address (or more, depending on the configuration).

We've finally got to the end of a full request! In reality, a DNS query usually completes in a fraction of a second, and there are measures for making DNS faster still, which we'll come on to in these DNS chapters.
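
The walk from root server to TLD server to domain-level nameserver can be sketched as a small simulation. Everything here is an assumption made for illustration: the three "servers" are hard-coded dictionaries with made-up labels (``root``, ``com-tld``, ``google-ns``), no network traffic is involved, and the address is simply the example value used earlier in this chapter.

.. code-block:: python

   # Hypothetical, hard-coded "servers": each maps a name suffix to either a
   # referral (the next server to ask) or a final answer. Real nameservers
   # hold far more data and are reached over the network.
   SERVERS = {
       "root": {"com.": ("referral", "com-tld")},
       "com-tld": {"google.com.": ("referral", "google-ns")},
       "google-ns": {"www.google.com.": ("answer", "173.194.34.68")},
   }


   def best_match(records, name):
       """Return the entry whose key is the longest suffix of ``name``."""
       matches = [key for key in records
                  if name == key or name.endswith("." + key)]
       if not matches:
           return None
       return records[max(matches, key=len)]


   def resolve(name, server="root"):
       """Follow referrals from the root until a server gives an answer."""
       path = []
       while True:
           path.append(server)
           result = best_match(SERVERS[server], name)
           if result is None:
               raise LookupError(f"{server} knows nothing about {name}")
           kind, value = result
           if kind == "answer":
               return value, path
           server = value  # referral: ask the next server the same question


   address, path = resolve("www.google.com.")
   print(path)     # ['root', 'com-tld', 'google-ns']
   print(address)  # 173.194.34.68

Each server is asked the same question; only the last one in the chain can actually answer it, which mirrors the three stages described above.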

Resource types
==============

Whilst, at its most basic, DNS is responsible for mapping easily-remembered domain names to IP addresses, it is also used as a form of key/value database for the internet. DNS can hold details of which mail servers are responsible for a domain's mail, as well as arbitrary human-readable text which is best placed in DNS for whatever reason.

The most common types you'll see are:

- ``A``: responsible for mapping individual hosts to an IP address, for instance ``www`` in the ``google.com`` zonefile to ``173.194.34.67``
- ``AAAA``: similar to an ``A`` record, except for IPv6. It could be used to map ``www`` in the ``google.com`` zonefile to ``2001:4860:b002::68``
- ``CNAME``: used to alias one record to another, for instance ``bar.example.com.`` could be aliased to ``foo.example.com.``
- ``MX``: specifies mail servers responsible for handling mail for the domain. A priority is also assigned to denote an order of responsibility
- ``SOA``: specifies authoritative details about a zonefile, including the zonemaster's email address, serial number (for revision purposes) and primary nameserver
- ``SRV``: a semi-generic record used to specify a location. Used by newer services instead of creating protocol-specific records such as ``MX``
- ``TXT``: originally for human-readable information that did not fit other records, but now mostly used to create `SPF <http://en.wikipedia.org/wiki/Sender_Policy_Framework>`__ records

There's a good in-depth list of every record type, the description of its use and the related RFC in which it is defined in `this Wikipedia article <http://en.wikipedia.org/wiki/List_of_DNS_record_types>`__.
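
To make the key/value view concrete, the sketch below indexes a handful of records by name and type, then orders ``MX`` records by priority, where the lowest number is the most preferred. The records are illustrative values only, and ``mail_servers`` is a hypothetical helper rather than part of any DNS library.

.. code-block:: python

   from collections import defaultdict

   # Illustrative records only, as (name, type, value) tuples.
   # An MX value is a (priority, mail server) pair.
   RECORDS = [
       ("example.com.", "MX", (20, "mail.foo.com.")),
       ("example.com.", "MX", (10, "mail.example.com.")),
       ("www.example.com.", "A", "2.4.6.8"),
       ("example.com.", "TXT", "v=spf1 a -all"),
   ]

   # Index by (name, type): this pair is the "key" in the key/value view.
   index = defaultdict(list)
   for name, rtype, value in RECORDS:
       index[(name, rtype)].append(value)


   def mail_servers(domain):
       """Return the domain's mail servers, most preferred (lowest number) first."""
       return [host for _, host in sorted(index.get((domain, "MX"), []))]


   print(mail_servers("example.com."))      # ['mail.example.com.', 'mail.foo.com.']
   print(index[("www.example.com.", "A")])  # ['2.4.6.8']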

An example zonefile
===================

.. code-block:: bash

   $TTL     86400         ; specified in seconds, but could be 24h or 1d
   $ORIGIN  example.com.  ; the origin is absolute, so it ends with a dot

   @ 1D IN SOA ns1.example.com. hostmaster.example.com. (
               123456 ; serial
               3H     ; refresh
               15     ; retry
               1w     ; expire
               3h     ; minimum
               )

         IN NS ns1.example.com.
         IN NS ns2.example.com. ; Good practice to specify multiple nameservers for fault-tolerance
         IN NS ns1.foo.com.     ; Using external nameservers for fault-tolerance is even better
         IN NS ns1.bar.com.     ; And multiple external nameservers is better still!

         IN MX 10 mail.example.com. ; The lowest number (10) marks the highest priority mail server, so it is the first to be used
         IN MX 20 mail.foo.com.     ; If the highest priority mail server is unavailable, fall back to this one

   ns1   IN A     1.2.3.4
   ns1   IN AAAA  1234:5678:a123::12 ; A and AAAA records can co-exist happily. Useful for supporting early IPv6 adopters.
   ns2   IN A     5.6.7.8
   ns2   IN AAAA  1234:5678:a123::89
   mail  IN A     1.3.5.7
   www   IN A     2.4.6.8
   sip   IN CNAME www.example.com.
   ftp   IN CNAME www.example.com.
   mail  IN TXT   "v=spf1 a -all"

   _sip._tcp.example.com. IN SRV 0 5 5060 sip.example.com.
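
One zonefile detail that is easy to get wrong: a name ending in a dot is absolute, while any other name has the zone's ``$ORIGIN`` appended to it. A small sketch of that rule follows; the ``qualify`` helper is hypothetical and written only for this illustration.

.. code-block:: python

   def qualify(name, origin):
       """Expand a zonefile name to a fully-qualified name.

       ``origin`` is expected to be absolute, i.e. to end with a dot.
       """
       if name == "@":
           return origin           # "@" is shorthand for the origin itself
       if name.endswith("."):
           return name             # already absolute: leave untouched
       return f"{name}.{origin}"   # relative: append the origin


   ORIGIN = "example.com."
   print(qualify("www", ORIGIN))               # www.example.com.
   print(qualify("@", ORIGIN))                 # example.com.
   print(qualify("ns1.example.com.", ORIGIN))  # ns1.example.com.
   print(qualify("ns1.example.com", ORIGIN))   # ns1.example.com.example.com.

The last call shows why a forgotten trailing dot matters: the name is treated as relative and the origin is appended a second time.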

Host-specific DNS configuration
===============================

If you are administering systems, specifically Unix systems, you should be aware of two pieces of host-side configuration which allow your machines to interface with DNS:

  - ``/etc/hosts``
  - ``/etc/resolv.conf``

``/etc/hosts``
^^^^^^^^^^^^^^

The ``/etc/hosts`` file acts as a local alternative to DNS. You might use it when you want to override a DNS record on one particular machine only, without affecting that record and its use for others. Alternatively, it can be used as a back-up to DNS: if you specify the hosts that are mission-critical in your infrastructure inside ``/etc/hosts``, then they can still be addressed by name even if the nameserver(s) holding your zonefile are down.

However, ``/etc/hosts`` is not a replacement for DNS - in fact, it is far from it: DNS has a much richer set of records that it can hold, whereas ``/etc/hosts`` can only hold the equivalent of ``A`` and ``AAAA`` records. An ``/etc/hosts`` file might, therefore, look like:

.. code-block:: bash

   127.0.0.1         localhost
   255.255.255.255   broadcasthost
   ::1               localhost
   fe80::1%lo0       localhost

   192.168.2.2       sql01
   192.168.2.3       sql02
   192.168.1.10      puppetmaster puppet pm01

The first four lines of this ``/etc/hosts`` file are typically created automatically on a Unix machine and are used at boot: they shouldn't be changed unless you really know what you're doing! The last two of those four lines are the IPv6 equivalents of the first line.

After these first four lines, we can specify a name and map it to an IP address. In the above example, we've mapped ``sql01`` to ``192.168.2.2``, which means that on a host with the above ``/etc/hosts`` configuration, we could refer to ``sql01`` alone and get to the machine responding as ``192.168.2.2``. You'll see a similar example for ``sql02``, too. However, there is a slightly odd example for the box named ``puppetmaster``, in that multiple friendly names exist for the one box living at ``192.168.1.10``. When referenced in this way - with multiple space-separated names against one IP address - the box can be reached at any of the specified names. In effect, ``puppetmaster``, ``puppet``, and ``pm01`` are all valid ways to address ``192.168.1.10``.
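
The format is simple enough to read with a few lines of code. The sketch below turns hosts-file text into a lookup table from name to address; ``parse_hosts`` and ``SAMPLE_HOSTS`` are hypothetical names, and the sample is a trimmed copy of the file above with one comment added.

.. code-block:: python

   SAMPLE_HOSTS = """\
   127.0.0.1         localhost
   ::1               localhost

   192.168.2.2       sql01
   192.168.2.3       sql02
   192.168.1.10      puppetmaster puppet pm01  # several names, one address
   """


   def parse_hosts(text):
       """Map each name to the list of addresses it appears against."""
       table = {}
       for line in text.splitlines():
           line = line.split("#", 1)[0].strip()  # drop comments and padding
           if not line:
               continue                          # skip blank lines
           address, *names = line.split()
           for name in names:
               table.setdefault(name, []).append(address)
       return table


   hosts = parse_hosts(SAMPLE_HOSTS)
   print(hosts["sql01"])      # ['192.168.2.2']
   print(hosts["pm01"])       # ['192.168.1.10']
   print(hosts["localhost"])  # ['127.0.0.1', '::1']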

``/etc/resolv.conf``
^^^^^^^^^^^^^^^^^^^^

``/etc/resolv.conf`` exists on Unix machines to allow system administrators to set the nameservers which the machine should use. A DNS domain can also be specified in this file. An example ``/etc/resolv.conf`` might look like:

.. code-block:: bash

   domain     opsschool
   nameserver 192.168.1.1
   nameserver 192.168.1.2
   nameserver 192.168.1.3

In this example, we specify that ``192.168.1.1``, ``192.168.1.2`` and ``192.168.1.3`` can be used by the host with the above configuration to query DNS. The resolver tries them in the order listed, moving on to the next nameserver if the previous one does not respond, whenever it resolves (i.e. makes a request for an entry and waits for a response) a host in DNS.

Setting the ``domain`` directive - as in the above example, where we specified it as ``opsschool`` - allows users to specify hosts by name relative to the domain. For instance, a user could reference ``sql01``, and queries may be sent to the specified nameservers asking for records for both ``sql01`` and ``sql01.opsschool``. In most cases, the responses should match - just be careful if they don't, as you'll end up with some very confused machines when DNS has split-brained like this!
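
A minimal sketch of how such a file could be read, and of the names a short hostname might expand to, is shown below. ``parse_resolv_conf`` and ``candidate_names`` are hypothetical helpers; a real resolver also honours other directives and options (such as ``search`` and ``ndots``) that affect which names are tried and in what order, so treat the candidate list as an approximation.

.. code-block:: python

   SAMPLE_RESOLV_CONF = """\
   domain     opsschool
   nameserver 192.168.1.1
   nameserver 192.168.1.2
   nameserver 192.168.1.3
   """


   def parse_resolv_conf(text):
       """Extract the ``domain`` and the ordered list of nameservers."""
       domain, nameservers = None, []
       for line in text.splitlines():
           fields = line.split("#", 1)[0].split()
           if len(fields) < 2:
               continue  # blank, comment-only, or incomplete line
           if fields[0] == "domain":
               domain = fields[1]
           elif fields[0] == "nameserver":
               nameservers.append(fields[1])
       return domain, nameservers


   def candidate_names(name, domain):
       """Names that may be queried for ``name`` (ordering is an assumption)."""
       if name.endswith(".") or domain is None:
           return [name]  # absolute name, or no domain to append
       return [f"{name}.{domain}", name]


   domain, nameservers = parse_resolv_conf(SAMPLE_RESOLV_CONF)
   print(domain)                            # opsschool
   print(nameservers)                       # ['192.168.1.1', '192.168.1.2', '192.168.1.3']
   print(candidate_names("sql01", domain))  # ['sql01.opsschool', 'sql01']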