Abstract (my emphasis throughout)
Previous attacks that link the sender and receiver of traffic in the Tor network (“correlation attacks”) have generally relied on analyzing traffic from TCP connections. The TCP connections of a typical client application, however, are often accompanied by DNS requests and responses. This additional traffic presents more opportunities for correlation attacks.
This paper quantifies how DNS traffic can make Tor users more vulnerable to correlation attacks. We investigate how incorporating DNS traffic can make existing correlation attacks more powerful and how DNS lookups can leak information to third parties about anonymous communication. We (i) develop a method to identify the DNS resolvers of Tor exit relays; (ii) develop a new set of correlation attacks (DefecTor attacks) that incorporate DNS traffic to improve precision; (iii) analyze the Internet-scale effects of these new attacks on Tor users; and (iv) develop improved methods to evaluate correlation attacks. First, we find that there exist adversaries who can mount DefecTor attacks: for example, Google’s DNS resolver observes almost 40% of all DNS requests exiting the Tor network.
We also find that DNS requests often traverse ASes that the corresponding TCP connections do not transit, enabling additional ASes to gain information about Tor users’ traffic. We then show that an adversary who can mount a DefecTor attack can often determine the website that a Tor user is visiting with perfect precision, particularly for less popular websites where the set of DNS names associated with that website may be unique to the site. We also use the Tor Path Simulator (TorPS) in combination with traceroute data from vantage points co-located with Tor exit relays to estimate the power of AS-level adversaries who might mount DefecTor attacks in practice.
Key Take Home Messages
I’ll save you reading the academic paper if you are time-poor:
1. Correlation attacks that use TCP flows become considerably more powerful when they also correlate DNS request traffic.
2. Attackers observing occasional DNS requests may still be able to link both ends of the communication, even if they can’t observe TCP traffic between the Tor exit and the server.
3. Loading a single webpage can generate hundreds of DNS requests to many different domains.
4. Because of the way Tor exit relays are currently configured, Google (yes, your good friend Google) is in a central position to deanonymize Tor users if it so wished: almost 40% of the DNS requests exiting the Tor network are resolved by Google’s public DNS servers.
5. Note: simply increasing the scale of the Tor network itself is not much of a defense against this fingerprinting technique.
6. Less frequently visited websites - the kind that privacy-conscious people like to visit e.g. Wikileaks, Whonix etc. - are more prone to deanonymization.
7. Not all Tor client (browser) locations are equal. The data suggests that, due to network effects, you will on average be de-anonymized sooner in France, Germany and Russia than in (ironically) the US or UK.
8. Defenses against website fingerprinting in Tor are needed now more than ever, because of the increased precision this attack makes possible. That is, Tor developers need to implement padding as per Torspec #254 as soon as possible.*
*This comes with significant network overhead and requires specific fine-tuning, which is why it hasn’t yet been implemented.
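To make points 1 and 6 a little more concrete, here is a toy sketch of how observed DNS requests could be used to prune the candidate sites produced by a website fingerprinting attack. It is only an illustration of the general idea, not the paper's actual method: the site names, the domain sets and the function names are all made up.

```python
# Hypothetical data: each site maps to the set of domains its page load resolves.
SITE_DOMAINS = {
    "popular-news.example": {"popular-news.example", "cdn.example", "ads.example"},
    "popular-shop.example": {"popular-shop.example", "cdn.example", "ads.example"},
    "small-forum.example": {"small-forum.example", "cdn.example"},
}


def unique_domains(site_domains):
    """Return, per site, the domains that no other site in the map uses."""
    result = {}
    for site, domains in site_domains.items():
        others = set()
        for other, other_domains in site_domains.items():
            if other != site:
                others |= other_domains
        result[site] = domains - others
    return result


def filter_candidates(candidates, observed_dns, site_domains):
    """Keep only candidates with at least one unique domain seen in DNS traffic."""
    uniq = unique_domains(site_domains)
    return [site for site in candidates if uniq[site] & observed_dns]


# Candidates guessed from the encrypted client-side traffic (made up).
candidates = ["popular-news.example", "small-forum.example"]
# Domains the adversary saw resolved at the exit side in the same time window (made up).
observed = {"cdn.example", "small-forum.example"}

print(filter_candidates(candidates, observed, SITE_DOMAINS))
```

Shared domains such as the hypothetical `cdn.example` say little on their own; it is the domain that only one site uses that lets the observer discard the other candidate.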
Short-term solutions for Tor developers and Tor exit relay operators:
Exit relay operators should avoid public resolvers such as Google and OpenDNS. Instead, they should either use the resolvers provided by their ISP, or run their own, particularly if the operator’s ISP already hosts many other exit relays. Local resolvers can further be optimized to minimize information leakage, by (for example) enabling QNAME minimization.
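For readers unfamiliar with QNAME minimization, the sketch below shows roughly what it changes: instead of sending the full name to every nameserver in the delegation chain, a minimizing resolver reveals only one more label at each step. This is a simplification that assumes a zone cut at every label; the function name and the example domain are mine, not the paper's.

```python
def minimized_queries(qname):
    """List the names a minimizing resolver would reveal, from the TLD downward."""
    labels = qname.rstrip(".").split(".")
    # Step i reveals only the last i labels of the full name.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


for name in minimized_queries("www.sensitive.example.org"):
    print(name)
```

Under that assumption, the root servers would only be asked about `org.` and the `org.` servers only about `example.org.`, so fewer parties on the path ever see the full hostname.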
In addition to making recommendations to exit relay operators, we can remotely influence the cache of each exit relay’s resolver. For example, using exitmap, we can continuously resolve potentially sensitive domains over each exit relay, right before their TTLs expire. In such a setup, an attacker gains no advantage from observing DNS traffic from the exit relays because the domain is always in every exit relay’s resolver cache. This approach scales poorly, considering the potentially large number of domain names that would need to be cached (recall that the long tail of unpopular sites is most vulnerable to DefecTor attacks), but it allows us to eliminate DNS-based correlation attacks for a select number of sites.
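A back-of-the-envelope sketch of why this cache-warming idea scales poorly. It assumes, purely for illustration, that every lookup resets the cached entry to a full TTL (real resolvers may behave differently), and it does not use exitmap itself; all function names and numbers are hypothetical.

```python
def refresh_times(ttl, margin, horizon):
    """Times (in seconds) at which to re-resolve so the entry never expires."""
    if not 0 < margin < ttl:
        raise ValueError("margin must be positive and smaller than ttl")
    interval = ttl - margin  # refresh `margin` seconds before expiry
    times = []
    t = interval
    while t < horizon:
        times.append(t)
        t += interval
    return times


def total_lookups(n_domains, n_exits, ttl, margin, horizon):
    """Lookups needed to keep every domain warm on every exit for `horizon` seconds."""
    return n_domains * n_exits * len(refresh_times(ttl, margin, horizon))


print(refresh_times(ttl=60, margin=5, horizon=300))
print(total_lookups(n_domains=1000, n_exits=1000, ttl=60, margin=5, horizon=3600))
```

The cost grows with the product of domains, exit relays and refreshes per domain, which is why this only looks practical for a short list of sensitive sites.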
Finally, Tor can fix the Tor clipping bug we discovered and consider significantly increasing the minimum TTL for the DNS cache at exit relays to make DefecTor attacks less precise. This adjustment requires finding the longest acceptable TTL that does not noticeably degrade the user experience. Further, as soon as the clipping bug is fixed, operators of sensitive websites can opt to increase the TTL of their DNS records.
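To see why a longer minimum TTL makes the attack less precise, here is a rough sketch of TTL clipping in an exit relay's DNS cache. The bounds (60 and 1800 seconds) are illustrative defaults I picked, not values taken from this post, and the clipping bug itself is not modelled because its details are not described here.

```python
import math


def clip_ttl(ttl, min_ttl=60, max_ttl=1800):
    """Clamp a DNS record's TTL (seconds) into [min_ttl, max_ttl]."""
    return max(min_ttl, min(ttl, max_ttl))


def max_upstream_queries(record_ttl, window, min_ttl=60, max_ttl=1800):
    """Upper bound on the queries one exit sends for one domain in `window` seconds."""
    return math.ceil(window / clip_ttl(record_ttl, min_ttl, max_ttl))


# A record with a 30-second TTL, requested constantly for one hour,
# under three hypothetical minimum-TTL settings.
for floor in (60, 600, 1800):
    print(floor, max_upstream_queries(record_ttl=30, window=3600, min_ttl=floor))
```

The fewer queries leave the exit relay, the fewer timing points an observer of DNS traffic has to line up against client-side traffic.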
Long-term solutions for Tor developers:
Zhu et al. proposed T-DNS, which employs several TCP optimizations to transport the DNS protocol over TLS and TCP. The TLS layer provides confidentiality between exit relays and their resolvers. Finally, site operators whose users are particularly concerned about safety should offer an onion service as an alternative. Facebook, for example, set up facebookcorewwwi.onion. When connecting to the onion service, Tor users never leave the Tor network, and hence do not need DNS (as long as the onion service does not embed non-onion service content).
Deploying defenses against website fingerprinting attacks in Tor [noted in point 8 above] should be an important long-term goal, as well. Although growing the Tor network will help defend against DefecTor attacks to some degree, the most important change is to deploy defenses against these attacks in the first place. Since DefecTor attacks significantly increase precision of website fingerprinting attacks, defenses should be designed to significantly reduce the recall of website fingerprinting attacks, even when the website fingerprinting attack is configured to sacrifice precision for recall.