Anonymous user counting through DNS

By Gerco on Monday 27 October 2014 22:00 - Comments (5)

Imagine the following scenario:
You are maintaining a free application and you have a fair number of users, but you don’t know exactly how many. Because you want to understand your user base a bit better, you would like to count the number of users. This is the situation I found myself in recently.

Requirements
My requirements for a system like this were as follows:
  • Must work through restrictive corporate proxies;
  • Collect the minimum amount of data possible;
  • Do not annoy users;
  • Little application size increase;
  • Count number of users as accurately as feasible.
I investigated a few options, but most of them were unacceptable to me for various reasons: they would bloat my application, collect too much data, or cause issues with the user’s network configuration. A lot of my users are behind a corporate proxy, and my application is implemented in Java. Those of you who are familiar with Java will know that HTTP requests can, more or less at random, pop up a proxy authentication dialog. Since I didn’t want to annoy my users, I needed another solution.

I came up with a solution that fulfills most of the above requirements:
  • Works through proxies because the application never makes a request to any server outside the corporate network;
  • Collects only a random unique id that gets created on application startup. No user data is included whatsoever, not even the user’s IP address.
  • Cannot pop up authentication dialogs or cause application delays due to firewalls restricting access to outside;
  • Does not require any new libraries since all the required code is already built into every OS.
This is how it works:
  1. On application startup, the application creates a random uuid;
  2. The application then attempts to resolve $uuid.stats.domainname.tld through DNS;
  3. The DNS zone file for domainname.tld delegates stats.domainname.tld to a custom DNS server;
  4. That custom DNS server logs every query for $uuid.stats.domainname.tld.
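Steps 1 and 2 can be sketched in Go, the language used for the server later in this post. This is only an illustration and not the application's actual (Java) client code: newInstanceID and queryName are hypothetical names, and stats.domainname.tld is the placeholder zone from the steps above. A single DNS label is limited to 63 bytes, so the sketch hex-encodes 16 random bytes into a 32-character label.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newInstanceID returns 16 random bytes as 32 hex characters.
// Hypothetical helper: any random id that fits in one DNS label would do.
func newInstanceID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// queryName builds the fully qualified name that the client resolves.
func queryName(id, zone string) string {
	return id + "." + zone + "."
}

func main() {
	id, err := newInstanceID()
	if err != nil {
		fmt.Println("could not generate id:", err)
		return
	}
	// The zone is the placeholder used throughout this post.
	fmt.Println(queryName(id, "stats.domainname.tld"))
}
```

The printed name is different on every run, which is what keeps intermediate DNS caches from answering the query before it reaches the custom server.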
There you have it: just count the unique ids in the DNS server’s log file and you’re done. This even has the added bonus of working on machines that have no internet access for security reasons. Even though they cannot connect anywhere outside the corporate network, they can usually still resolve DNS queries! Now on to the implementation, which is surprisingly simple:

Client
In the client application, all you need to do is generate a unique id and resolve it. In Java, this can be done as follows:

Java:
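import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.*;

// uuid is a field holding the random id generated at application startup
public String getLatestVersionNumber() throws NamingException {
  Hashtable<String, String> env = new Hashtable<String, String>();
  env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
  DirContext ctx = new InitialDirContext(env);
  Attributes attrs = ctx.getAttributes(uuid + ".stats.domainname.tld", new String[] { "TXT" });
  Attribute attr = attrs.get("TXT");
  // attr is null when the reply contains no TXT record
  return attr == null ? null : attr.get().toString();
}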


In this case, I’m retrieving the TXT record, since that allows my custom DNS server to return some information to the client, such as the most recent version of the application (to alert the user that an update is available, for example).

DNS configuration
The client code above causes the application to send a DNS request for the name $uuid.stats.domainname.tld. To answer it, the resolver has to work its way down the hierarchy, resolving (in order) tld, domainname.tld and finally stats.domainname.tld. We can’t (or don’t want to) control which DNS server serves .tld or domainname.tld, so we add the following records to the domainname.tld zone to direct queries for *.stats.domainname.tld to our custom server:

code:
; Set nameserver for stats.domainname.tld. to stats-ns.domainname.tld
stats     IN     NS     stats-ns

; Set IP address for stats-ns.domainname.tld. to 1.2.3.4
stats-ns  IN     A      1.2.3.4


Replace 1.2.3.4 with the IP address of the machine that will run your custom DNS server. You will need root access to that machine, since a DNS server must listen on port 53, which is a privileged port. A shared hosting server will not work.

DNS Server
On to the meat of the matter. Since I like experimenting with programming languages and my current language du jour is Go, I’m implementing this server in Go. I used the excellent dns library from Miek Gieben. The server is based on his “reflect” example (with most of the code removed). The only interesting part is below:

Go:
func (h DNSHandler) ServeDNS(w dns.ResponseWriter, r *dns.Msg) {
	m := new(dns.Msg)
	m.SetReply(r)

	for _, q := range r.Question {
		// We only respond to TXT queries. Anything else does not exist on this DNS server
		if q.Qtype == dns.TypeTXT {
			var responseText string

			// q.Name is fully qualified (it ends in a dot), so config.DomainName must be too
			if strings.HasSuffix(q.Name, "."+h.config.DomainName) {
				// The unique id is the first label of the queried name
				uuid := q.Name[:strings.Index(q.Name, ".")]
				c := *newCheckIn(uuid, time.Now())
				saveCheckin(c)

				responseText = getLatestApplicationVersion()
			} else {
				// The query is for config.DomainName, reply nothing
			}

			if len(responseText) > 0 {
				t := new(dns.TXT)
				t.Hdr = dns.RR_Header{Name: q.Name, Rrtype: dns.TypeTXT, Class: dns.ClassINET, Ttl: 3600}
				t.Txt = []string{responseText}
				m.Answer = append(m.Answer, t)
			}
		}
	}

	w.WriteMsg(m)
}


This simplified code stores the unique ids received and replies with the latest application version.

That’s all there is to it! Completely anonymous user counting without collecting any personal data, not even the user’s IP address. Naturally this only counts instances of the application and not the number of humans using it, but this is as close as you’re likely to get.


Comments


By Tweakers user johnkeates, Tuesday 28 October 2014 03:37

Easier: make an HTTP request to a vhost and log vhost domains.

http://uuid.stats.domain.tld/

Have nginx / apache / lighty listen for all domains, record requests and put them in a database (use a database logger and a custom log format). Because the HTTP Host header is sent with the request, it's not going to have any trouble with DNS caches and problematic hotspots. Bonus: you can record UA strings that are sent out anyway. You _could_ record IP addresses. But DNS servers can do that too. Any app using the IP protocol can. You can read it off the socket at any time you want.

By Tweakers user Gerco, Tuesday 28 October 2014 03:54

@johnkeates: Not quite true. Making an HTTP request can trigger the proxy login dialog box, which it seems to do randomly, but consistently over many systems I've tested.

Also: a DNS request passes through the operating system's DNS resolver, which then calls the ISP's DNS servers, which in turn go through some number of other servers before finally arriving at my DNS server. The machine making the request to MY server, however, is NOT the client machine, but some machine further up the chain.

The whole idea of this approach is that I cannot store user IP addresses, even if I wanted to, simply because it's not the user's machine that connects to mine.

[Comment edited on Tuesday 28 October 2014 03:55]


By Tweakers user WoLFjuh, Tuesday 28 October 2014 23:05

There are, to my knowledge, four cases where this does/will not work:
  • Having no internet connection while resolving (this is the obvious one, but just for verbosity ;) )
  • When the user is on a network with a captive portal. In this case it depends on the implementation of the captive portal whether all dns requests (in your case TXT rr's) are captured and replaced by a reply that redirects the user to a login site.
  • When the user is on a network with an HTTP proxy server for all internet communications. In this case the resolving of a dns request could be done by the proxy (HTTP or SOCKS) server instead of the client's own resolver.
  • And last, when defensive software blocks you. Some users are allergic to sending even the smallest fingerprint to a vendor. You could be added to the block-lists of popular malware/privacy/anti-spyware applications, and maybe to a greater extent than you would have hoped for (like blocking your complete domain name).
Of course this should all be put in perspective against how precise you want your measurements to be, and some cases are just too small to take into account for your solution. I would recommend also implementing a simple HTTP connection, maybe even as the primary solution (while using the proxy settings defined globally on the client device).

I was testing with a small dns resolver to do something similar: determining (or approximating) the dns servers the client was using by taking interest in the source IP of the dns request. I used PowerDNS with Lua script.

By Tweakers user Gerco, Tuesday 28 October 2014 23:32

Interesting @WoLFjuh. I didn't consider the captive portal use case. That would block it quite effectively if they also redirect TXT record queries. I'll have to do some testing to see what that does.

In the proxy server case, I think it will work just fine. It doesn't matter who resolves the query, the proxy, the client, their ISP's DNS server or anyone else. Only that it gets resolved somehow. Proxy or no proxy, if that unique hostname gets resolved, I can count it as a user.

I'm not expecting to be blocked by any malware scanners; what I'm doing is quite benign in my opinion and is really no different from performing an update check over HTTP. In any case, my user base is so small that I shouldn't be on anyone's radar anyway.

By Tweakers user WoLFjuh, Wednesday 29 October 2014 00:00

Usually, in the case of a proxy server, there is only an internal dns server which does not resolve internet zones, only internal zones. The reason for this setup is that a proxy server can fully monitor all internet access on OSI layer 7. If I were asked to create such a setup, there would be no outside communications whatsoever.

I worked at various hotspot providers and I can tell you not all captive portal setups will redirect -all- dns traffic. Some block all dns traffic until authorized, some will redirect all RR's, but some will only replace A, CNAME or AAAA requests with the dns entry of the captive portal.

It is not uncommon to use dns to tunnel traffic. A proper setup of a network with proxy-only internet access or a captive portal would not allow dns traffic other than internal and/or the bare necessities to fulfill its needs.
