Strong Flow Correlation Attacks on Tor Using Deep Learning

iry · September 21, 2018, 1:51am

Latest shocking research on traffic correlation, published at the top security conference CCS: https://arxiv.org/pdf/1808.07285.pdf

We show that with moderate learning, DeepCorr can correlate Tor connections (and therefore break its anonymity) with accuracies significantly higher than existing algorithms, and using substantially shorter lengths of flow observations. For instance, by collecting only about 900 packets of each target Tor flow (roughly 900KB of Tor data), DeepCorr provides a flow correlation accuracy of 96% compared to 4% by the state-of-the-art system of RAPTOR using the same exact setting.

We hope that our work demonstrates the escalating threat of flow correlation attacks on Tor given recent advances in learning algorithms, calling for the timely deployment of effective countermeasures by the Tor community.
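For anyone new to the term: flow correlation means matching a flow seen entering Tor with the flow seen leaving it by comparing their timing patterns. The sketch below is a naive statistical baseline on synthetic data, not DeepCorr, which learns its comparison function with deep learning. The traffic model, the 200 ms latency, the jitter value and all function names are assumptions made up for illustration.

```python
import random
from statistics import mean


def inter_packet_delays(timestamps):
    """Convert packet arrival times into the gaps between consecutive packets."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def pearson(xs, ys):
    """Pearson correlation of two sequences, truncated to the shorter length."""
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def synthetic_flow(rng, n_packets):
    """Hypothetical traffic: packet gaps drawn from an exponential distribution (mean 50 ms)."""
    now, times = 0.0, []
    for _ in range(n_packets):
        now += rng.expovariate(20.0)
        times.append(now)
    return times


def through_network(rng, times, jitter):
    """Assumed network model: fixed 200 ms latency plus small random jitter per packet."""
    return sorted(t + 0.2 + rng.uniform(0.0, jitter) for t in times)


def best_match(entry_flow, exit_flows):
    """Index of the exit flow whose timing pattern correlates best with entry_flow."""
    entry_delays = inter_packet_delays(entry_flow)
    scores = [pearson(entry_delays, inter_packet_delays(f)) for f in exit_flows]
    return max(range(len(exit_flows)), key=lambda i: scores[i])


rng = random.Random(0)  # fixed seed so the run is repeatable
entry_flows = [synthetic_flow(rng, 300) for _ in range(5)]
exit_flows = [through_network(rng, flow, jitter=0.005) for flow in entry_flows]

# For each entry flow, report which exit flow it was matched to.
matches = [best_match(flow, exit_flows) for flow in entry_flows]
print(matches)
```

On clean synthetic traffic like this, even the naive baseline pairs each entry flow with its own exit flow. Real Tor traffic is far noisier, which is where the paper claims a learned correlator pulls ahead of earlier systems such as RAPTOR.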

HulaHoop · September 21, 2018, 4:22pm

Scary indeed. I’ve pinged tor-dev about it to let them know and to see what they are working on.

torjunkie · September 22, 2018, 1:32am

The real question is how long it takes an advanced adversary to implement DeepCorr based on this research - since they have multi-billion dollar budgets, I’d say not very long at all.

The countermeasures are interesting, but will not be implemented network-wide anytime soon:

6.1 Obfuscate Traffic Patterns

[…]

We see that meek and obfs4 with IAT=0 provide no protection to DeepCorr; note that a 0.5 TP is comparable to what we get for bare Tor if trained on only 400 flows (see Figure 9), therefore we expect correlation results similar to bare Tor with a larger training set. The results are intuitive: meek merely obfuscates a bridge’s IP and does not deploy traffic obfuscation (except for adding natural network noise). Also obfs4 with IAT=0 solely obfuscates packet contents, but not traffic features. On the other hand, we see that DeepCorr has a significantly lower performance in the presence of obfs4 with IAT=1 (again, DeepCorr’s accuracy will be higher for a real-world adversary who collects more training flows). Our results suggest that (public) Tor relays should deploy a traffic obfuscation mechanism like obfs4 with IAT=1 to resist advanced flow correlation techniques like DeepCorr. However, this is not a trivial solution due to the increased cost, increased overhead (bandwidth and CPU), and reduced QoS imposed by such obfuscation mechanisms. Even the majority of Obfsproxy Tor bridges run obfs4 without traffic obfuscation (IAT=0). Therefore, designing an obfuscation mechanism tailored to Tor that makes the right balance between performance, cost, and anonymity remains a challenging problem for future work.
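To illustrate the trade-off described in that excerpt, here is a toy model of timing obfuscation: every packet is held back by an independent random delay, and the timing correlation with the original flow is measured again. This is not obfs4's actual IAT algorithm. The delay distribution, the traffic model and the function names are assumptions chosen only to show the general effect.

```python
import random
from statistics import mean


def gaps(times):
    """Gaps between consecutive packet times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def delay_packets(rng, times, max_delay):
    """Toy obfuscation: hold each packet for a random delay in [0, max_delay] seconds."""
    return sorted(t + rng.uniform(0.0, max_delay) for t in times)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical original flow: 500 packets with exponential gaps (mean 50 ms).
now, sent = 0.0, []
for _ in range(500):
    now += rng.expovariate(20.0)
    sent.append(now)

scores = {}
for max_delay in (0.002, 0.05, 0.5):
    observed = delay_packets(rng, sent, max_delay)
    scores[max_delay] = pearson(gaps(sent), gaps(observed))
    # A uniform delay in [0, max_delay] adds max_delay / 2 of latency on average.
    print(f"max delay {max_delay:.3f}s  "
          f"correlation {scores[max_delay]:.2f}  "
          f"expected added latency {max_delay / 2:.3f}s")
```

In this toy setting the timing correlation only falls away once the added delays are large compared with the natural gaps between packets, and that is exactly the latency cost the authors mention when they talk about reduced QoS.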

6.2 Reduce An Adversary’s Chances of Performing Flow Correlation

[…]

To counter, several proposals suggest new relay selection mechanisms for Tor that reduce the interception chances of malicious ASes. None of such alternatives have been deployed by Tor due to their negative impacts on performance, costs, and privacy. We argue that designing practical AS-aware relay selection mechanisms for Tor is a promising avenue to defend against flow correlation attacks on Tor.
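A rough idea of what AS-aware relay selection means in practice: avoid guard/exit combinations where the same AS sits on both the client-to-guard path and the exit-to-destination path, since that AS could observe both ends of the circuit. The sketch below assumes the AS paths are already known. The relay names and AS numbers are hypothetical (private-use ASNs), and this is not how Tor currently selects relays.

```python
from itertools import product

# Hypothetical data: ASes crossed on each path segment (private-use ASNs).
client_to_guard = {
    "guardA": {64512, 64520},
    "guardB": {64513, 64530},
}
exit_to_destination = {
    "exitX": {64520, 64540},
    "exitY": {64541, 64542},
}


def as_disjoint_pairs(entry_paths, exit_paths):
    """Keep (guard, exit) pairs whose two path segments share no AS."""
    return [
        (guard, exit_relay)
        for (guard, entry_ases), (exit_relay, exit_ases)
        in product(entry_paths.items(), exit_paths.items())
        if entry_ases.isdisjoint(exit_ases)
    ]


pairs = as_disjoint_pairs(client_to_guard, exit_to_destination)
print(pairs)
# [('guardA', 'exitY'), ('guardB', 'exitX'), ('guardB', 'exitY')]
```

The pair guardA/exitX is dropped because AS 64520 appears on both segments. A real mechanism would also need measured or inferred AS paths, which change over time, and it would have to weigh the performance and privacy costs that the excerpt says have blocked deployment so far.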

9jnc7 · September 22, 2018, 3:19am

I have doubts about their claims. The Tor Project has a disclaimer somewhere on their site about misleading research by eager computer scientists. I can’t find it, though… does someone remember that statement?

That’s a MASSIVE improvement. DeepCorr research…

Maybe that’s the reason for their amazing success?

Notice anything funny with the following statement?

torjunkie · September 22, 2018, 5:18am

Presumably if the Tor devs or other researchers are keen on testing the validity of these results, they will try to repeat them - in some large-scale chutney network or using public data.

Reference: TorChutneyGuide (Tor Project wiki, Legacy / Trac, on GitLab)

One research paper does not equal a confirmed new attack vector, but it is concerning.