Scapy: DNS Query Script

Thursday, 8 August 2019

I’m new to Python although I’ve been programming for years in other languages. So imagine my surprise when I found out how great scapy is! In a couple of earlier posts I wrote about crafting custom packets and using scapy commands, so now I’m going to talk about programming with scapy. Instead of finishing scripts and then moving on to something else, it’s nice to write a post about one because going over it in detail really reinforces what I’ve learned.

This script is a basic DNS lookup tool not unlike nslookup or dig, only not as full featured. It currently only supports querying A recs and NS recs, but it lets you first look up the nameservers for a domain, then find the A rec using an authoritative nameserver.

I’m using GitHub for scapy scripts despite being a longtime Bitbucket user, so feel free to grab the code and follow along as I go through it. In the future I’ll set up a Gitea server and write a post about how it goes. Meanwhile, here is the dns_query.py script. If people see it I might even be guilted into going back and refactoring it. It will never be dig, but honestly it could use more features, like looking up MX and SOA recs.

So the basic usage is this:
Usage: $ sudo ./dns_query.py <domain>
$ sudo ./dns_query.py <domain> [A|NS|ALL]
$ sudo ./dns_query.py <domain> A [<nameserver>]

The DNS lookup results look like this:

[Screenshot: simple IP address lookup]
[Screenshot: DNS lookup for A rec – not bothering to show TTL, etc.]
[Screenshot: query 8.8.8.8 for google.com nameservers]

So let’s go through just how easy it was to make this happen using scapy. Since this is the first script I’ve posted about, and one of the first I’ve written in Python, I’ll go through all of it.

#!/usr/bin/env python3
import sys, re
from scapy.all import DNS, DNSQR, IP, UDP, sr1

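I start out with this line invoking the Python interpreter through env, which finds python3 wherever it lives on the PATH instead of assuming a fixed location. The Python launcher on Windows understands this form too, although apparently scapy on Windows is dicey. It looks a bit weird to me compared with just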
#!/usr/bin/python3
Anyway, on line 2 we import sys to read the arguments from the command line and re for the regular expressions that check for valid input. On line 3 we import just the things from scapy that we’ll use.

[Screenshot: if the script gets called with no args it shows usage hints]
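The argument handling behind those usage hints can be sketched roughly as follows. This is only an illustration: parse_args is a hypothetical helper name, and defaulting to an A lookup when no record type is given is an assumption. Only the 8.8.4.4 fallback nameserver comes from the script itself.

```python
def parse_args(argv):
    """Return (domain, qtype, nameserver), or None when no domain was given.

    Hypothetical helper: the default record type "A" is an assumption,
    the default nameserver matches the one used in the script.
    """
    if len(argv) < 2:
        return None  # caller should print the usage hints and exit
    domain = argv[1]
    qtype = argv[2].upper() if len(argv) > 2 else "A"
    nameserver = argv[3] if len(argv) > 3 else "8.8.4.4"
    return domain, qtype, nameserver


# Passing an explicit list instead of sys.argv makes this easy to try out
print(parse_args(["dns_query.py", "example.com", "ns"]))
```

In the real script the list would be sys.argv, and a None result would lead to printing the usage text.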

Here is my check for a valid domain name. My regex is both flawed and less than optimal in terms of efficiency, which is unusual. Typically people strive for correctness and either performance or readability. People find regular expressions tricky but I love them, so let me explain this one.

domain = sys.argv[1]
# [A-Za-z] rather than [A-z], which would also match [ \ ] ^ _ and `
validdomain = re.compile(r'^(?:[A-Za-z]\w*\.)+[A-Za-z]{2,}$')
if validdomain.match(domain) is None:
    sys.exit("Sorry, \"{}\" is not a valid domain name".format(domain))


This regex says: match a string that starts with a letter, followed by any number of word characters (letters, digits and underscores, roughly [A-Za-z0-9_]) and then a period. That group can repeat one or more times, and then the string needs to end with 2 or more letters. The non-capturing, repeating group of characters followed by a dot allows for a variable number of subdomains, like example.com or www.invidio.us or ftp.us.debian.org, or some.ridiculously.long.subsubsubsubsub.domen.de.
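To see the pattern in action, here is a small sketch that runs it against a few names. The letter range is spelled out as [A-Za-z] here, and the sample list is just an illustration. The last two names are perfectly valid domains, which shows the kind of thing this regex rejects: it has no room for hyphens or for labels starting with a digit.

```python
import re

# The pattern from the script, as a raw string with an explicit letter range
VALID_DOMAIN = re.compile(r'^(?:[A-Za-z]\w*\.)+[A-Za-z]{2,}$')

samples = [
    "example.com",        # one label plus TLD
    "ftp.us.debian.org",  # the group repeats for each subdomain
    "localhost",          # no dot, so the group never matches
    "my-site.com",        # hyphen is not a word character
    "1e100.net",          # label starts with a digit
]

results = {name: VALID_DOMAIN.match(name) is not None for name in samples}
for name, ok in results.items():
    print("{:20} {}".format(name, "accepted" if ok else "rejected"))
```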

So this accepts invalid domain names and rejects valid ones in specific ways, and I’m ok with those edge cases, but it’s important to know where it fails. First of all there is no limit on length, so a thousand-character domain name is considered valid, just like one with 99 subdomains. Valid domain names are actually limited to 253 characters in length, and 63 characters is the maximum length for each label (the parts between the dots).
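If the length limits mattered, they could be checked separately from the regex. A minimal sketch, where within_dns_length_limits is a hypothetical helper that is not part of dns_query.py:

```python
def within_dns_length_limits(domain):
    """Check the 253-character total and 63-character per-label limits."""
    name = domain.rstrip(".")  # ignore the trailing dot of a fully qualified name
    labels = name.split(".")
    # Every label must be non-empty and at most 63 characters long
    return len(name) <= 253 and all(1 <= len(label) <= 63 for label in labels)


print(within_dns_length_limits("example.com"))
print(within_dns_length_limits("a" * 64 + ".com"))
```

The first call passes and the second fails, because its first label is 64 characters long.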

There are also some valid domains that are not recognized as such by this script because they are not ASCII. 鎌倉市.com is wrongly considered invalid because it contains Unicode characters.
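One way such names could be handled is to convert them to their ASCII form first. The sketch below uses Python's built-in idna codec; to_ascii is a hypothetical helper, and the script itself does not do this.

```python
def to_ascii(domain):
    """Convert an internationalized domain name to its ASCII (punycode) form."""
    # The built-in "idna" codec encodes each non-ASCII label with an xn-- prefix
    return domain.encode("idna").decode("ascii")


print(to_ascii("example.com"))  # plain ASCII names pass through unchanged
print(to_ascii("鎌倉市.com"))
```

Note that the converted name contains hyphens in its xn-- prefix, so the regex above would still reject it unless hyphens were allowed inside labels.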

Below is the core logic this script depends on. It’s scapy and it needs a bit of explanation. Packets are formed in layers in scapy, starting with the one that’s implied here – layer 2. Scapy has good default values for pretty much everything it seems, including assuming Ethernet frames for the data link layer.

NAMESERVER = sys.argv[3] if len(sys.argv) > 3 else '8.8.4.4'
......
i = IP(dst=NAMESERVER)
u = UDP(dport=53)
resp = sr1(i/u/DNS(rd=1, qd=DNSQR(qname=domain, qtype='A')), verbose=0)

The IP and UDP layers are set separately for the sake of readability. They could have been set in the already lengthy last line. For the IP layer, a destination must be set; how else is scapy supposed to know where to send this packet? The UDP layer actually does default to destination port 53, presumably since lots of UDP traffic is DNS, but I decided to be explicit about it. The last line does all the heavy lifting for the entire program: it constructs the packet, sends it and stores the reply.

So breaking that line down further, scapy builds packets by using the slash character ‘/’ to stack the layers, with layer 2 assumed to be Ethernet if you rely on the default. The whole packet is passed to the scapy command sr1(), which sends it at layer 3 and returns one reply packet, as designated by the 1 in sr1. Scapy has an entire family of commands that send and receive packets like this.

send : Send packets at layer 3
sendp : Send packets at layer 2
sendpfast : Send packets at layer 2 using tcpreplay for performance
sr : Send and receive packets at layer 3
sr1 : Send packets at layer 3 and return only the first answer
sr1flood : Flood and receive packets at layer 3 and return only the first answer
srbt : Send and receive using a bluetooth socket
srbt1 : Send and receive 1 packet using a bluetooth socket
srflood : Flood and receive packets at layer 3
srloop : Send a packet at layer 3 in loop and print the answer each time
srploop : Send a packet at layer 2 in loop and print the answer each time
srp : Send and receive packets at layer 2
srp1 : Send and receive packets at layer 2 and return only the first answer
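One thing worth knowing about sr1() is that it returns None when no reply arrives, so giving it a timeout and checking the result avoids hanging or crashing on an unreachable nameserver. The sketch below is an illustration only: query_a and describe_reply are hypothetical names, the 2 second timeout is an arbitrary choice, and the messages are not the ones dns_query.py prints.

```python
def describe_reply(rcode, ancount):
    """Turn the DNS response code and answer count into a short message."""
    if rcode == 3:  # 3 is NXDOMAIN: the name does not exist
        return "domain not found"
    if rcode != 0:
        return "server returned error code {}".format(rcode)
    if ancount == 0:
        return "no answer records"
    return "{} answer record(s)".format(ancount)


def query_a(domain, nameserver="8.8.4.4", timeout=2):
    """Send one A query and describe the reply (sending needs root)."""
    # Imported here so the helper above can be used without scapy installed
    from scapy.all import DNS, DNSQR, IP, UDP, sr1

    query = IP(dst=nameserver) / UDP(dport=53) / DNS(rd=1, qd=DNSQR(qname=domain, qtype="A"))
    resp = sr1(query, timeout=timeout, verbose=0)
    if resp is None:  # sr1 gives up after the timeout and returns None
        return "no reply within {} seconds".format(timeout)
    if not resp.haslayer(DNS):
        return "reply was not a DNS packet"
    return describe_reply(resp[DNS].rcode, resp[DNS].ancount)
```

Calling query_a("example.com") as root would send the packet; describe_reply can be tried on its own without any network access.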

The DNS query is constructed by setting the Recursion Desired flag (rd=1), which asks the nameserver to resolve the name recursively on our behalf, and adding a DNS question record (DNSQR) with the domain to ask about. Finally, if no A record comes back in the answer, we simply print that this domain was not found.
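To make it clearer what rd=1 and the question record amount to, here is a sketch that builds a comparable query by hand with the standard library, following the message layout in RFC 1035. It only illustrates the wire format and is not how scapy is implemented; build_dns_query and the fixed query ID are assumptions for the example.

```python
import struct


def build_dns_query(domain, qtype=1, query_id=0x1234, recursion_desired=True):
    """Build a raw DNS query: a 12-byte header followed by one question."""
    # RD is a single bit in the 16-bit flags word, not a recursion depth
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, then question/answer/authority/additional counts
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # qtype 1 is an A record, and class 1 is IN (Internet)
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


message = build_dns_query("example.com")
print(message.hex())
```

Bytes 2 and 3 of the result hold the flags, so switching recursion_desired off changes them from 01 00 to 00 00 and nothing else.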

[Screenshot: part of the result when “ALL” is chosen]

That’s all it takes to do a DNS lookup with scapy – barely 50 lines of code! Above is a portion of the result when using the “ALL” option instead of ‘A’ or ‘NS’. It’s quite long and probably not too useful. Let’s walk through another script real soon.