Systems Engineering and Programming

Herein I delve into a variety of topics involving creating systems, modifying them, administering them, and programming for them.


The 'dig' command and things to be aware of

On Unix, there are various commands that can be used for performing lookups on hosts and IP addresses in the Domain Name System (DNS). The 'nslookup' command is commonly the go-to command for this, but I find it clunky...something from the past. For casual lookups, I use the 'host' command, because it's simple, straightforward, and accepts either a hostname or an IP address without your having to do anything special to tell the command which is which.

The best, modern command for DNS lookups is the 'dig' command (where the name stands for Domain Information Groper). The most fundamental syntax for the command is:

dig Name [Type]

Example: dig lowes.com

You can direct the lookup to a specific DNS server by adding @ and the server's name, like:

dig lowes.com @use2.akam.net

You can add a record type, to ask for a specific type of DNS record to be returned as the results, such as A for the host record, MX for the mail server(s) for this host, NS for its name servers, etc. There is also type ANY, which is supposed to return all the types. However, the ANY spec is widely misunderstood: you will see many websites blithely saying that ANY will return everything about the system. But that's not really true! ANY is very misleading. What ANY actually does is return all the information that the DNS server currently has. Given that the DNS servers we get info from are caching what they got from another DNS server closer to the one where the system is hosted, what you are getting is what's currently in the cache. What's in the cache reflects the queries that the DNS server recently received about the system, and so may be incomplete.

For example, you do an ANY type query on a system and you don't get back any MX records: does that mean that there are no MX records associated with the system? Not necessarily. It typically means that no recent MX lookups have been done for the system. If your ANY lookup did not return MX records, and then you repeat the lookup using type MX and get MX records, and then you repeat the ANY lookup, you will see the MX records magically appear in the output. They will have conspicuously longer TTL values than the other records, because they were freshly acquired from DNS servers further upstream.
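The ANY, MX, ANY sequence described above can be tried with a few commands. This is only a rough sketch: the function name any_then_mx and the DIG override variable are made up for illustration, the +noall and +answer options just trim dig's output down to the answer section, and what you actually see depends on which resolver you are querying and what it happens to have cached. Some DNS servers nowadays decline to give a full answer to ANY queries at all, in which case you may get very little back.

```shell
# Hypothetical helper: run ANY, then MX, then ANY again for one name,
# printing only the answer section each time.
# DIG is an illustrative override so a different dig binary can be used.
any_then_mx() {
    name="$1"
    echo "== ANY (before) =="
    "${DIG:-dig}" +noall +answer "$name" ANY
    echo "== MX =="
    "${DIG:-dig}" +noall +answer "$name" MX
    echo "== ANY (after) =="
    "${DIG:-dig}" +noall +answer "$name" ANY
}

# Usage (needs network access and the dig command):
#   any_then_mx lowes.com
```

In the output, the second column of each record is its remaining TTL in seconds, which is where freshly cached MX records would stand out.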

Understanding Apache configuration processing

If you have been thrust into Apache you will be faced with what so many of us have: trying to get your head around how its configuration is processed. As with anything which has evolved over many years to meet diverse requirements, the Apache configuration can be as bewildering as the U.S. tax code. In trying to understand it you can perform Web searches for information and come away frustrated with what's out there. The best information is the official Apache documentation, which is better written and better organized than info you will find for other subsystems; its authors go above and beyond to provide how-to and don't-do information. Herein I present what I hope is further helpful info, based upon my experience with Apache in the Linux environment.

Multiple pieces can contribute to it

The basic Apache configuration is in directory /etc/httpd/conf/, in the installed file httpd.conf. (The providers put a lot of helpful comments into the file to help guide your coding.) That file is yours to modify at will: httpd RPM package updates will leave it alone. Personally, I like to leave the httpd.conf file as is, and instead create my own file in that directory, by copying httpd.conf and editing as needed. Apache will use your copy instead if you change file /etc/sysconfig/httpd to contain a line like: OPTIONS="-f /etc/httpd/conf/our-httpd.conf".

It's important to understand that, these days, that .conf file is not intended to contain the entirety of your Apache configuration. If you inspect the httpd.conf that gets installed, fairly near the top you will find directive line: Include conf.d/*.conf. That specification tells Apache to also pull in all files having a .conf suffix that reside in /etc/httpd/conf.d/, at that point in the configuration. (The /etc/httpd/ portion comes from the ServerRoot directive earlier in the httpd.conf file.) That directory provides some modularity, allowing other packages to conveniently contribute to the configuration, without having to somehow alter a monolithic configuration file.
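To see how this is set up on your own system, you can pull the relevant lines out of the configuration file. A small sketch: show_includes is a made-up helper name, and IncludeOptional is in the pattern on the assumption that newer httpd.conf files may use that directive in place of Include.

```shell
# Hypothetical helper: print the ServerRoot and Include lines of a
# configuration file, with line numbers. Commented-out lines are
# skipped because the pattern requires the directive to come first.
show_includes() {
    grep -n -E '^[[:space:]]*(ServerRoot|Include|IncludeOptional)[[:space:]]' "$1"
}

# Usage, with the path taken from the layout described above:
#   show_includes /etc/httpd/conf/httpd.conf
```

The line numbers show where in the file the conf.d/ fragments get pulled in, which matters because directives are processed in sequence.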

What you see in httpd.conf is a series of directives, on a series of sequential lines. All of this constitutes what Apache calls "Main", a term borrowed from programming, meaning the mainline portion of the whole. (You'll also see this named Main Server and main_server.) This defines the main body of the Apache configuration. Okay, so "main body" suggests that there are other sections, right? Yes. Those sections are principally the <VirtualHost> sections, in which you can code specific handling of requests which arrive via a specific IP address (as in the server having multiple network interfaces) and/or a specific port number, and possibly a specific server name encoded in an HTTP header that is part of the request (which constitutes what is called name-based virtual hosting, a separate discussion). Okay, so you can code certain directives within the <VirtualHost> section; but do you have to code an exhaustive set of directives? No, because Apache performs "inheritance": directives which are not coded in the VirtualHost block but which are in Main are inherited from Main...with the default exception of rewrite directives (which you can cause to be inherited in a VirtualHost via RewriteOptions Inherit and its companions).
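Here is a small illustration of that inheritance, as a sketch rather than a recommended setup. The address, host names, and paths are invented for the example, and the rewrite lines assume that mod_rewrite is loaded.

```apache
# Main (outside any <VirtualHost> block): applies server-wide.
ServerAdmin webmaster@example.com
DocumentRoot "/var/www/html"
ErrorLog logs/error_log

# Hypothetical virtual host: it codes only what differs from Main.
<VirtualHost 192.0.2.10:80>
    ServerName intranet.example.com
    DocumentRoot "/var/www/intranet"
    # ServerAdmin and ErrorLog are not coded here, so the values
    # from Main are inherited.

    # Rewrite rules are the exception: they are not inherited from
    # Main unless you ask for that in the block.
    RewriteEngine On
    RewriteOptions Inherit
</VirtualHost>
```

Requests that reach this virtual host are served from /var/www/intranet, but errors are still logged to the error log named in Main.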

VirtualHost and SSL

There is lots of good info on the Web about the nuances of coding VirtualHost. I will touch on something which engenders some confusion and hesitancy. Historically, Apache operated on plain text, HTTP protocol requests arriving on its port 80. As provided, the Apache configuration has no VirtualHost sections for port 80 processing, so you can freely devise your own; and the simplicity of HTTP allows your VirtualHost coding to be rather simple. As privacy and security became needed in Web services, Apache added certificate-based SSL request capability, through port 443, protocol HTTPS. (In reality, HTTPS is just HTTP with encrypted transmission: what the browser composes and Apache ultimately sees is plain text HTTP.) This was implemented via RPM package mod_ssl, which contributes configuration fragment file /etc/httpd/conf.d/ssl.conf. This conf file contains a variety of complex, SSL-named configuration directives, which contribute to Main, and a VirtualHost section coded as: <VirtualHost _default_:443>. This VirtualHost section turns on SSL processing, and causes logging to go to SSL-specific log files, such as ssl_access_log rather than the ordinary HTTP log access_log.

Okay, here's the quandary: This VirtualHost section, which is essential to SSL processing, is exceptional in that it is provided to you, and it seems to be the thing that deals with all HTTPS traffic. (An aside: the '_default_' coding says that this is an IP-based virtual host construct, as opposed to '*' in that position designating name-based virtual hosting, where the latter would have a ServerName directive in the block to match a hostname arriving in the HTTP header.) If you want to code your own HTTPS VirtualHost for special handling of requests, how do you do that? Per Apache rules, you would code your block to appear after the ssl.conf one, achievable by adding a file to /etc/httpd/conf.d/ with a name which alphabetically follows ssl.conf. This is because, for a given IP address and port number among multiple VirtualHost blocks, if a request does not exactly match any of them, the first one in the configuration sequence is what is used to handle the request. (Your VirtualHost is added after the one in ssl.conf so that the SSL directives which ssl.conf contributed to Main will be inherited by your VirtualHost block.) While port 80 VirtualHost constructs are rather simple and don't require mandatory directives, the port 443 VirtualHost you would want to create needs to contain SSL directives like those you find in the VirtualHost that is in ssl.conf. Thus, your port 443 VirtualHost has to start out with a bunch of directives unrelated to your processing needs.
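To make the file-naming point concrete, here is a sketch that creates such a fragment and shows where it sorts relative to ssl.conf. It works in a scratch directory by default, so it is safe to try; on a real server CONF_D would be /etc/httpd/conf.d. The file name zz-mysite.conf, the server name, and the document root are invented, and the certificate paths are placeholders to be replaced with whatever your ssl.conf uses. The address is written as *:443 on the assumption of Apache 2.4, where _default_ is documented as an alias for *; on an older release you may need to mirror the address form that ssl.conf uses.

```shell
# CONF_D defaults to a scratch directory so this sketch touches nothing
# real; point it at /etc/httpd/conf.d only when you mean it.
CONF_D="${CONF_D:-$(mktemp -d)}"

# Stand-in for the ssl.conf that mod_ssl provides (created only if absent).
[ -e "$CONF_D/ssl.conf" ] || : > "$CONF_D/ssl.conf"

# Hypothetical site file; the "zz-" prefix makes it sort after ssl.conf.
cat > "$CONF_D/zz-mysite.conf" <<'EOF'
<VirtualHost *:443>
    ServerName www.example.com
    DocumentRoot "/var/www/mysite"

    # SSL directives modeled on the VirtualHost in ssl.conf;
    # the certificate paths are placeholders.
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/localhost.crt
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
</VirtualHost>
EOF

# List the fragments in sorted order, which is the order in which a
# wildcard Include reads them.
order=$(cd "$CONF_D" && LC_ALL=C ls -1 -- *.conf)
printf '%s\n' "$order"
```

In the scratch directory this lists ssl.conf first and zz-mysite.conf second. On a real system, running apachectl configtest after adding the file, and apachectl -S to see how the virtual hosts were parsed, is a sensible check before restarting.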