
Tor Entry Guard Persistence Assumptions

As Qubes-Whonix users, we have not been “benefitting” from Tor’s Persistent Entry Guard design and will not until bind-directory functionality is implemented upstream. Lately, I’ve been wondering if I want persistence at all.

First, let’s assume that I am a permanently stationary user. So disregard any drawbacks related to Persistence & Location Tracking.

Some arguments in favor of persistence (from some heavyweights):
https://blog.torproject.org/blog/improving-tors-anonymity-changing-guard-parameters
https://tor.stackexchange.com/questions/386/why-is-a-longer-guard-rotation-period-with-fewer-guards-better-than-the-other-wa

I’ve had an uneasiness with the conclusions above, and @mirimir’s comment on Peter Palfrader’s Stack Exchange answer helped me pin down my thoughts. The comment first:

I’ve never quite understood why it’s better for a few users to be more-likely compromised while most users are less-likely compromised. Is it that, once a given user has been compromised, further compromise doesn’t matter? Do these models assume particular patterns of Tor usage, such as daily versus occasional?

I think my question is somewhat analogous to that comment:
Would I rather have a 100% probability of having 5% of my traffic observed and correlated? (by using non-persistent entry guards)
Or would I rather have a 5% probability of having 100% of my traffic observed and correlated? (by using persistent entry guards)

Obviously, the proportions used are completely made up but I think my point stands across a wide range of inputs. I think (like @mirimir) there is an implied assumption, in the conclusions that favor persistence, that any observation is fatal. Since I spend most of my time online watching Beyonce videos on YouTube :wink: , clearly that assumption does not hold for me. While third-party observation of my Beyonce viewing habits would certainly violate my privacy, I would not consider that as fatal a de-anonymization as having 100% of my traffic observed and correlated.

If all of my traffic is sensitive, and observation of any of it will result in complete de-anonymization, then in my example, both scenarios will equate and there will be a 5% chance of being de-anonymized regardless of the entry-guard persistence model. However, if only a portion, say X, of my traffic is critically sensitive, then in my example, non-persistent entry guards will only be fatal X*5% of the time. So I would be willing to risk more and more of my traffic being observed as the amount of sensitive traffic decreases.
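
For concreteness, the trade-off above can be put into a tiny toy calculation, reusing the same made-up 5% figures. Nothing here models real Tor; it only illustrates the expected-damage argument:

```python
# Toy comparison of the two guard models described above, using the
# made-up 5% figures from this post. Not a model of real Tor.

def p_fatal_nonpersistent(x_sensitive, p_observed=0.05):
    """Non-persistent guards: 5% of circuits are observed, spread
    evenly, so a fatal observation needs the observed circuit to
    also be one of the sensitive ones."""
    return x_sensitive * p_observed

def p_fatal_persistent(x_sensitive, p_bad_guard=0.05):
    """Persistent guard: with 5% probability the guard is bad and
    sees everything, which is fatal as soon as any sensitive
    traffic exists at all."""
    return p_bad_guard if x_sensitive > 0 else 0.0

# If all traffic is sensitive, the two models equate (both 5%);
# as the sensitive fraction X shrinks, non-persistent guards are
# fatal only X * 5% of the time.
for x in (1.0, 0.5, 0.1):
    print(f"X={x}: non-persistent {p_fatal_nonpersistent(x):.3f}, "
          f"persistent {p_fatal_persistent(x):.3f}")
```

With X = 1.0 both functions return 0.05, matching the claim that the two scenarios equate when all traffic is sensitive.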

Does this reasoning hold any validity?

Good day,

While I can see where you’re coming from, I can’t quite agree with the result.

After all, if only the entry guards are drawn from a limited pool, your exit node being compromised is still as unlikely as before, which mathematically makes it harder to deanonymize you even when one entry node in your limited pool is an observer.

Or, to compare the two:

Random guards:

Both entry and exit nodes are chosen anew periodically. That means that after some time and enough random connections, you will be observed. It will only happen for that one connection, but it will happen. And given the random nature of this system, it is rather likely.

Persistent guards:

The entry guards aren’t changed randomly, but kept for a certain period of time. The rest of the circuit is chosen randomly. What this means is that, even when you have a “listening” entry guard in your list, an adversary still needs the luck to get the rest of the randomly built circuit right, as he needs the exit node as well to eavesdrop. Because the first node isn’t randomly chosen, there are two possible scenarios.

First: There is no “evil” node in the list, thus you’re safe for the near future.

Second: There is one in the list. Still, he can only eavesdrop on a small percentage of your connections, as he needs to get the right exit node for correlation as well. That makes this scenario rather safe too, even though an adversary is on our list, as every connection where he doesn’t also control the exit is worthless to him.

Looking at it from that angle, I feel like making the entry node random as well (like the exit) would only harm anonymity.

Have a nice day,

Ego

@Ego (I think) I understand the arguments in favor of persistence, so I don’t disagree with anything you’ve written.

Also, it was probably a bad idea to use numbers in my OP (especially since the numbers are completely unrealistic).

I missed these relevant notes from the comments section of https://blog.torproject.org/blog/improving-tors-anonymity-changing-guard-parameters:

ajohnson:

As you can tell, for the uses of Tor that seem most likely to me, users are better off giving up none or all of the activity to a malicious guard. Users are trying to protect behavior rather than TCP connections. Behavior operates on the order of weeks, months, and years, which - not coincidentally - is the length of time that a given guard is used.

arma:

In general it seems to me that the advertising companies are doing really well at the “see a sample of a user’s behavior and piece together the rest of it” problem. That is, if Google only got to see a random 10% of a user’s traffic, all their algorithms (to track, recognize, and predict people) would probably still work fine.

These comments seem to imply that there is indeed an assumption that even a fraction of a user’s traffic holds predictive power and is likely to be fatal. Then it should follow logically that as a user’s internet usage becomes more diverse, more random, and more anonymous (as opposed to pseudonymous), the relative value of persistent entry guards should decrease. How to quantify that, I have no idea… (already failed miserably once).

Good day,

Yes, no one argues against that. My point, though, was that regardless of which system (random vs. persistent entry guards) is used, you will eventually take a route where your traffic is observable, simply by force of randomness. The thing is that persistent entry guards make this far less likely to happen. It will still happen sometimes, but mathematically speaking, it won’t happen nearly as often as when choosing a random route for every connection.

Have a nice day,

Ego

I share your uneasiness with the current “solution” (extending the lifetime of an entry guard, or of entry guards in general), because I see a couple of problems with the research that has been done.
Some quotes from: https://blog.torproject.org/blog/improving-tors-anonymity-changing-guard-parameters

Tariq’s paper considered a quite small adversary: he let all the clients pick honest guards, and then added one new small guard to the 800 or so existing guards

The second paper to raise the issue is from Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Weinmann in Luxembourg. […] In this case they run the numbers for a much larger adversary: if they run 13.8% of the Tor network for eight months there’s more than a 90% chance of a given hidden service using their guard sometime during that period.

And that brings us to the third paper, by Aaron Johnson et al: Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries (upcoming at CCS 2013) […] they simulated running a few entry guards that together make up 10% of the guard capacity in the Tor network, and they showed that (again using historical Tor network data, but this time from October 2012 to March 2013) the chance that the user has made a circuit using the adversary’s relays is more than 80% by the six month mark.
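
These quoted figures can be sanity-checked with a back-of-the-envelope formula. The guard count and rotation period below are my own assumptions for illustration, not the papers’ actual methodology:

```python
# Rough sanity check on the quoted figures. Assumes (my numbers, not
# the papers' methodology) that a client holds 3 guards at a time and
# re-picks them roughly monthly, while an adversary controls a
# fraction f of guard capacity. The chance of ever selecting a bad
# guard over the period is then 1 - (1 - f)^(guards * rotations).

def p_bad_guard_ever(f, guards_per_rotation=3, rotations=6):
    return 1 - (1 - f) ** (guards_per_rotation * rotations)

# ~10% adversary over six months: lands near the "more than 80%" quoted
print(f"{p_bad_guard_ever(0.10):.0%}")

# ~13.8% adversary over eight months: exceeds the 90% quoted
print(f"{p_bad_guard_ever(0.138, rotations=8):.0%}")
```

Under these assumptions the 10%/six-month case comes out around 85%, so the papers’ numbers are at least in the right ballpark.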

I think it’s quite funny that they don’t (want to) think about a real adversary, like the US, UK, German or Russian evil alphabet agencies.
I mean, 10-13% is a joke, considering who funded and created this mess.
We know that the agencies gather everything, and we know that they run Tor entry, middle and exit nodes by the hundreds or thousands (why wouldn’t they? Snowden even ran some for them (not to support the network, duh)). We know that they run early, fast and reliable nodes.
What would be the best thing to do, then? Right, stick with one to three guards for a long time, and “screw you” if it’s a .gov-run node, you’re gonna like it :wink: and we’ve got some juicy .gov exit nodes for you too (tasty).
As you may be able to tell by now, I’m no Tor fanboy; the exact opposite would fit.
You can hide from some “3rd World” dictator, but you can’t hide from those funding Tor :sweat_smile: (don’t bite the hand that feeds you (and created you)).
Let’s help Tor against the great Chinese firewall and the bad bad Persians (Iranians) who block the internet; after that we can bring them some peace and Democrazy. (lmfao) (sorry, I’m getting offtopic)

Brave new World in 1984

my 2 cents

Good day,

What do you mean by that? Where do you get the idea that the authors of those papers didn’t consider this being done by a government agency?

Why? Using smaller samples and calculating based on those is how almost every scientific study is done. 10% of the ENTIRE Tor network is actually quite a lot for such a study and thus leads to an easily reviewable result. To give you a comparison: even though lab mice are smaller than men, they are still very helpful and deliver perfectly valid data for studying drugs.

Yes, but that’s what fixed entry guards are for: to make complete control over the network not just harder, but even where possible, still meaningless, because, as opposed to simply choosing every guard at random, the entire connection is harder to control.

And if you don’t stick to a few selected entry guards over a long time? Well, then it will be much easier and faster to gain access to a connection, simply by continuously forcing new connections.

That doesn’t explain, though, why controlling the connection is still harder than with the alternative (random entry guards), as the studies, as well as the most basic statistical analysis, suggest. And like I’ve said, with such a giant pool of results (10%, do you know how big that test was?! Considering the size of Tor as a whole, even a pool of just 1% would deliver a statistical significance high enough to take the results for granted; there is no way those results can’t be at least a bit trustworthy), this seems to be more than just an assumption based on some small statistics.

Have a nice day,

Ego

Agreed. My point (question, really) is that although it happens less often, it seems like it would hurt more when it does. So the (rhetorical) question is: would I prefer to suffer smaller hits more often, or risk a major hit with smaller probability?

Thinking some more about it and taking into account my/most users’ usage patterns, I’m starting to agree with ajohnson’s view that very little of what we do online is independent and random. Will try to get a sense of this over the next few months - as in, which of my circuits are non-fatal if observed…

Good day,

Not sure I understand this. The “hit” has the same size once it happens. After all, even when the entry guard is persistent, the rest of the circuit only lasts until the connection is “reloaded”. And once that happens, even the malicious entry guard can’t eavesdrop. So while the leak is just as big, it is less likely. Or am I misunderstanding something?

Have a nice day,

Ego

Yes. Poor phrasing on my part.

The research papers assume: 1 hit and you’re dead. (Hit = bad entry guard + bad exit node.) They don’t consider subsequent hits because you’re already dead.

What if 1 hit doesn’t kill you? Then consider 2 scenarios:

A. Random entry guards: probability of being hit = P(bad entry guard) * P(bad exit node)
B. Persistent entry guard: probability of being hit = zero (if your guard is good) or P(bad exit node) (if your guard is bad)

What I was trying to say in my previous post was that with random guards, yes, you are guaranteed to be hit over time - but the hits will be spread out and represent some (small) fraction of your total traffic.

With Persistent guards, on average, you will be hit less - that is the point of all the research papers. But if you are unlucky and are stuck with a bad entry guard, you will be hit much more often.
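
The trade-off between scenarios A and B can be sketched with a small Monte Carlo simulation. All probabilities here are made up (5% bad guards, 5% bad exits, 1000 circuits per user); this only illustrates the spread-out-versus-concentrated argument, not real Tor:

```python
import random

# Monte Carlo sketch of scenarios A and B above, with made-up numbers:
# 5% of guards and 5% of exits are malicious, 1000 circuits per user.
# A "hit" is a circuit whose guard AND exit are both malicious.

P_BAD = 0.05
CIRCUITS = 1000

def hits_random_guards(rng):
    # A: a fresh random guard and exit for every circuit
    return sum(rng.random() < P_BAD and rng.random() < P_BAD
               for _ in range(CIRCUITS))

def hits_persistent_guard(rng):
    # B: one guard kept for the whole period; exits still random
    if rng.random() >= P_BAD:   # guard is honest: zero hits, ever
        return 0
    return sum(rng.random() < P_BAD for _ in range(CIRCUITS))

rng = random.Random(0)
users = 10_000
a = [hits_random_guards(rng) for _ in range(users)]
b = [hits_persistent_guard(rng) for _ in range(users)]

print("average hits:      A", sum(a) / users, " B", sum(b) / users)
print("users w/ >10 hits: A", sum(h > 10 for h in a),
      " B", sum(h > 10 for h in b))
```

In this deliberately symmetric toy the average number of hits comes out the same for both models; what the persistent model changes is the distribution: most users get zero hits, while a small unlucky minority takes many.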

In this situation, I’m much more worried about the worst-case scenario (catastrophic risk) as opposed to average probabilities. It’s one thing to walk away with a loss after making a good probabilistic bet at the track. But it’s of little comfort to say you made a good probabilistic bet if you’re the one that rolled an evil entry guard. When it comes to security, I think you have to begin with the assumption that you will be unlucky.

Also, given this, it’s vital to determine whether 1 hit really kills you or not.


Random guards increase the probability for everyone of choosing a malicious entry guard, and they increase discoverability risks for hidden services. Paul Syverson mentioned they chose to settle on the “few clients compromised longer” model as a lesser evil. It’s not a good idea to go against the recommendations of the people who wrote the software.

With upcoming protections like padding, the risks will be minimal even if your guard is bad.


Incorrect. Whonix bind-directories is in place. bind-dirs is the improved, generalized, upstreamed version.


Hi Ego (I knew you would ask those questions :wink: )
(I would like to talk to you some time so we could discuss this exact issue in more detail.)

I think you know the amount of budget all those agencies have at their disposal? They didn’t consider this because they stated (like you did) that 10-13% is a lot; but considering that there are several agencies, and that many of them share their work, it isn’t.
And think about it: how many people do you know who run Tor relays or exit nodes? I know I do, but no one else. And I don’t see a huge demand among normal people to host them.
Most users use the Tor network but don’t run relays or exit nodes.
So I ask you: who is running those thousands of fast and reliable nodes?
“Freedom” and “privacy” loving people (of whom there aren’t that many (when it comes to actually doing something)) who can afford this much bandwidth?
Come on.

I haven’t said anything against how this study got its results.
I said they calculated with numbers that are too small.

It depends on how you look at it; as I said above, 10% isn’t a challenge for large adversaries.

:joy:
Thanks, you made me laugh.
It doesn’t matter how much something is downscaled if the study itself isn’t done reasonably.

[quote=“Ego, post:6, topic:2559”]
To make complete control over the network not just harder, but even where possible, still meaningless, because, as opposed to simply choosing every guard at random, the entire connection is harder to control.
[/quote]

If you have more “good honest” people running them, yes; if not, you’re screwed.
Like I said above, I don’t think that there are more people running them than .gov spooks. (That is, in the end, what it was designed for: .gov spooks hiding, while we minions make them blend in.)

Why? If the network is mostly run by “good” people, it shouldn’t matter that much, and the client could be made more cautious: if connections continuously break, don’t change connections that frequently, and then decide to stick to something like a guard node. I am not totally against that idea; I just don’t think it can work in the current Tor network and concept in general.

Most of these problems exist because in the early days we emphasized reachability (“make sure Tor works”) over anonymity (“be very sure that your guard is gone before you try another one”).

Again, it wasn’t designed to hide you or me; it’s for those we want to hide from.

If your guard is controlled, you stay with that controlled guard for a long time. Either way it’s pretty bad, and it’s intended.

[quote=“Ego, post:6, topic:2559”]
And like I’ve said, with such a giant pool of results (10%, do you know how big that test was?! Considering the size of Tor as a whole, even a pool of just 1% would deliver a statistical significance high enough to take the results for granted.
[/quote]

Why do you keep bringing that number up? It’s bad: 10% = 80% likely owned after 6 months. That isn’t good, and it’s again too optimistic.

A bit, sure; as a whole, well… you can trust anything you like.

[quote=“entr0py, post:9, topic:2559”]
With Persistent guards, on average, you will be hit less - that is the point of all the research papers. But if you are unlucky and are stuck with a bad entry guard, you will be hit much more often.
[/quote]

Right, and if you really depend on that anonymity, it’s game over.
Otherwise, you could get lucky if the entry or exit isn’t compromised, or if they don’t get enough information.

Why wouldn’t an adversary slowly set up a massive number of exit nodes and entry guards?

FYI: a good read on why all mixing anonymity networks will sooner or later fail:
https://cpunks.org/pipermail/cypherpunks/2016-June/013212.html

You too :wink:

:open_mouth: Had no idea! Assumed because of this that they weren’t implemented yet:

TODO: Does not work yet. Files need to exist first.

/usr/lib/qubes/bind-dirs.sh umount

/usr/lib/qubes/bind-dirs.sh

Is it documented yet? Or which scripts should I look at to learn about them?

https://github.com/Whonix/qubes-whonix/blob/master/usr/lib/qubes-whonix/bind-directories

https://groups.google.com/forum/#!topic/qubes-devel/tcYQ4eV-XX4

I’m not getting a satisfying answer - I know it’s probably because it doesn’t exist.

That’s fine unless I’m one of those clients. I would rather they increase everyone’s risk by a marginal amount to ensure that no one experiences a catastrophic failure. (More Gaussian, less heavy-tailed)

Generally, yes - but that’s what’s great about open-source! :stuck_out_tongue:
You’re just being polite; you should’ve said:

It’s not a good idea to go against the recommendations of people who are smarter and more knowledgeable than you.

That I can live with :slight_smile:

Good day,

Sure, maybe we can arrange something.

Oh, I see what you mean. I was presuming you meant small in the context of a study.

Well, first of all, a lot of institutions, like the EFF, as well as many universities, run quite massive Tor nodes. Complete control of the stronger nodes by government agencies is thus rather unlikely. Adding to that, even if the bigger part of the network is controlled by agencies, they’d still need a lot of luck; less with random nodes than with persistent ones. Because if your node is persistent AND owned by organisation A, that organisation still needs the luck to get you to use an exit node controlled by them as well. Which, while possible, isn’t as likely to happen as when both entry and exit are random.

That isn’t necessarily the case. It is possible to scale these numbers and adapt them to almost any scenario, including a completely government-controlled Tor network, in which, like mentioned before, the persistent guards would still win: if you have no bad guard in your list, you’re fine, and if you have one, your observers still need the luck to find you using one of their exits at the right time, making it harder under any scenario.

That is the next question. How big is the Tor network? And how big is 10% of it? And would it realistically be possible to control a significant amount of it?

According to this, the entire Tor network has an advertised/potential bandwidth of about 130 Gbit/s for all guard nodes and about 40 Gbit/s for all exit nodes: https://metrics.torproject.org/bandwidth-flags.html

Now, since such an operation needs to be stretched out over the entire planet, and since it can hardly be done using big “government owned” server farms (as those would be easily spotted and filtered out), it is most likely that an agency would use private hosting providers, which are already famous for being used to host Tor nodes. According to this, the fastest server provider is AWS, whose servers apparently offer up to 10 Gbit/s backbone, though they only reach 7 Gbit/s: https://blog.serverdensity.com/network-performance-aws-google-rackspace-softlayer/

According to Amazon, AWS offers 38 server farms which provide this monumental speed. Thus, if a government agency wanted to, they could (provided the necessary finances, of course) simply and secretly buy a bit more than half of the complete bandwidth and computational power AWS offers and “take over” the entire Tor network. Admittedly, that wasn’t the conclusion I wanted to find, but here we go…
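
Written out, the arithmetic behind that estimate looks like this (the bandwidth figures come from the pages linked above and may be outdated; the result is only as good as those inputs):

```python
# Back-of-the-envelope version of the estimate above. The figures are
# taken from the pages linked in this post and may be outdated.

guard_capacity_gbit = 130  # advertised bandwidth, all guard nodes
exit_capacity_gbit = 40    # advertised bandwidth, all exit nodes
per_server_gbit = 7        # realistic AWS throughput per the benchmark

# Fast servers needed to match the entire network's guard + exit
# capacity, i.e. roughly a 50% share of the then-doubled network:
servers_needed = (guard_capacity_gbit + exit_capacity_gbit) / per_server_gbit
print(round(servers_needed), "servers")  # roughly two dozen
```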

That however doesn’t change the fact that making the first guard random would decrease the security even further.

Let’s imagine a network with one bad guy. Let’s name him Bob. Everyone else is good, meaning they don’t try to eavesdrop on connections.

So Bob wants to listen in on what the rest of the network is doing and thus sets up an entry and an exit node. Now, with random entry nodes, if he waits long enough, he will get lucky and find some poor person using his entry and exit node simultaneously. He has won. If, however, the entry guards are persistent, everyone chooses three entry guards beforehand. If his isn’t among them, he has lost for the next few months. If it is (which already requires luck), he still needs enough luck to find someone now choosing his exit node as well. That means that here he needs the luck of the random example, plus some “extra luck”.

If Bob controlled a bigger part of the entire network, the persistent method would still be more secure, because with random entry guards it is now far more likely for you to “fall into his trap”.

Adding to that, proofing something like Tor against people trying to “force” a route upon you will be hard. Because of its public nature, every server used is fundamentally different and has its quirks. These often can’t be predicted or filtered out. It is hard to judge whether something is malicious or not in this context, and such a “filter concept” would potentially scare away the few still keeping the network alive. That quote you posted sadly still applies to a certain extent.

Yes, but that controlled guard is only a problem if an adversary also controls the exit. Which, like mentioned before, still needs the luck normally solely required to “attack” a completely random connection.

Like I’ve said, I misinterpreted you: I took the stated size as small for a science paper, not small in relation to an agency.

Have a nice day,

Ego

Great, looking forward to it. For me, the weekend works best.

Well, I would love to explain why the EFF is not the white knight in shining armor who’s going to save us, but that would get me too far offtopic.
As for universities, please think about that again: who funds them? Right…

[quote=“Ego, post:16, topic:2559”]
including a completely government-controlled Tor network, in which, like mentioned before, the persistent guards would still win: if you have no bad guard in your list, you’re fine, and if you have one, your observers still need the luck to find you using one of their exits at the right time, making it harder under any scenario.
[/quote]

Sorry, but this doesn’t make any sense: if it’s 100% controlled, why should there be a good entry guard and exit node? Magic?

[quote=“Ego, post:16, topic:2559”]
Now, since such an operation needs to be stretched out over the entire planet, and since it can hardly be done using big “government owned” server farms (as those would be easily spotted and filtered out), it is most likely that an agency would use private hosting providers, which are already famous for being used to host Tor nodes
[/quote]
https://cpunks.org/pipermail/cypherpunks/2016-June/013212.html
Right, and that is what they are doing. Again, tell me: why wouldn’t they do this?

It’s not a question of random or persistent; it’s a question of whether you trust this node, and whether you trust it to see 6+ months of your traffic.

[quote=“Ego, post:16, topic:2559”]
Yes, but that controlled guard is only a problem if an adversary also controls the exit. Which, like mentioned before, still needs the luck normally solely required to “attack” a completely random connection.
[/quote]
He does control the exit nodes; that’s what the whole copyright/CP/drugs etc. abuse reporting is/was about: to scare Tor exit node admins into shutting down their services.

But let’s not get this conversation more offtopic (than I already have (sorry, @entr0py and the rest)), and let’s discuss the rest in a Mumble chat or whatever you like.

I think we can agree that this topic is highly dependent on your adversary, and that we agree to disagree on certain points.

You too :wink:

Edit:
You may want to read what the Tor developers say about a realistic scenario:
http://www.nrl.navy.mil/itd/chacs/biblio/users-get-routed-traffic-correlation-tor-realistic-adversaries

Hah, no worries! I haven’t had time to read through your dialogues yet, but I would love an Executive Summary once you guys reach a consensus.
