IP address anonymity set
I forgot about that snippet I found. Reminding us here.
(First, an earlier 2018 paper I linked to here isn’t clear on this, but the authors mention connecting to 200 Zcash nodes and seem to assume that was the size of the network in 2018.)
Then the paper in question: https://arxiv.org/pdf/1907.09755.pdf
In 2019:
Makes sense for a four-month observation. There was also official Tor .onion support in zcashd at that time, though perhaps the situation now is similar - a fair number of Tor IPs - just as direct IPs rather than .onion addresses.
I guess the 5k IPs in my peers.dat are not necessarily unique nodes, just all the IPs used by nodes. (Given that 90%+ are not Tor IPs, and thus far less likely to cycle per node user, it gives me hope if most peers.dat IPs are very recent, e.g. from the last 30 days, but definitely not if it contains IPs from 12 months ago or earlier.)
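A rough sketch of the recency check I have in mind, in Python. It does not parse peers.dat itself (that is a binary format); it assumes the entries have already been dumped somehow into (ip, last_seen) pairs. The function name, the record layout and the sample addresses are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def recent_addresses(records, now, max_age_days=30):
    """Return the set of distinct IPs last seen within max_age_days of `now`.

    `records` is a hypothetical list of (ip, last_seen) pairs that were
    already extracted from peers.dat by some other tool.
    """
    cutoff = now - timedelta(days=max_age_days)
    return {ip for ip, last_seen in records if last_seen >= cutoff}

# Made-up sample data (documentation-range IPs), not real peers.
now = datetime(2021, 6, 1, tzinfo=timezone.utc)
sample = [
    ("192.0.2.10", now - timedelta(days=3)),     # recent
    ("192.0.2.10", now - timedelta(days=200)),   # same IP, older sighting
    ("198.51.100.7", now - timedelta(days=400)), # stale, over a year old
    ("203.0.113.5", now - timedelta(days=29)),   # just inside the window
]
fresh = recent_addresses(sample, now)
stale_share = 1 - len(fresh) / len({ip for ip, _ in sample})
```

If the stale share turns out to be large, the 5k figure says very little about how many nodes are online right now.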
I learned that nodes are only distinguishable by (WAN) IP:PORT, and that they clash on the network if two of them share both. That is another reason to reintroduce .onion support, BTW: someone could run 10 zcashd hidden service node addresses on one static IPv4 VPS, which increases deniability.
This is helpful for node blurriness however, and it’s interesting: how can an attacker know at any given time how many nodes there are, since there’s so much IP flooding and shape-shifting, and no easy way to know the ratio of total observed IPs to total separate nodes in any given time range?
Some Tor-connected nodes may turn on for short times (only spewing out 4 Tor IPs within ten minutes), and not long periods (spewing 200 Tor IPs in one session). Perhaps to most threat actors, there is some deniability as to what the ratio is.
This is why we really need a reasonably accurate estimate of the current number of nodes.
An attacker would have to be running several spy nodes, maybe at least 10% of the network, so they can see all connections at once. That is not hard to do among only 200-300 nodes. They would then know all the data and not have to guess about any of this.
And if the attacker is running reliable nodes, they’ll be evenly spread out across the network and recommended by DNS seeders (or themselves be DNS seeders), and BTW users like me are never connected to other Tor nodes. Inherent design problem? Time to fix? (Well, -onlynet=onion solved that when it was offered. :/) So I assume a spy node can trivially observe ‘ninja’ Tor nodes starting and ending sessions, with no plausible case that they are multiple nodes.
The constantly changing IPs in the spy node’s logs won’t help - a sane analyst would determine that it’s probably one single node, and not multiple nodes. Deniability is a weak defence in that case. (Deniability needs a convincing plausible explanation and for the attacker to know about that explanation).
(This is why connecting to just one external peer, with -maxconnections=1, may reduce the risk of this entire attack to a large degree.)
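To put rough numbers on the spy-node argument, here is a back-of-envelope sketch in Python. It assumes each outbound peer is picked independently and uniformly at random from the network, which real peer selection does not strictly do, and it ignores inbound connections that spy nodes could open towards listening nodes. The figure of 8 outbound peers is my assumption (the default I believe zcashd inherited from Bitcoin), not something I have verified.

```python
def p_at_least_one_spy(spy_fraction, outbound_peers):
    """Chance that at least one of `outbound_peers` connections lands on a
    spy node, if a fraction `spy_fraction` of all nodes are spies and each
    peer is chosen independently at random (simplifying assumption)."""
    return 1.0 - (1.0 - spy_fraction) ** outbound_peers

# 10% spy nodes, as in the estimate above.
p_default = p_at_least_one_spy(0.10, 8)  # assumed default outbound count
p_single = p_at_least_one_spy(0.10, 1)   # -maxconnections=1

print(f"8 outbound peers: {p_default:.2f}")
print(f"1 outbound peer:  {p_single:.2f}")
```

Under those assumptions a node with 8 outbound peers has roughly a 57% chance of touching a spy in any given session, against 10% with a single peer. With one peer it is all or nothing though: if that one peer is a spy, it sees everything the node relays.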
So anyway, maybe the 474 Tor IPs I found in peers.dat translate to far fewer than 474 nodes. Tor IPs change very often; perhaps 100 of the IPs are attributable to my own damn recent usage. (Laughable anonymity at the network level.)
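For what it is worth, the numbers from this post can be turned into crude bounds. This Python sketch assumes every other Tor-connected node contributed somewhere between 4 and 200 distinct IPs (the short and long session figures mentioned earlier, treated here as per-node totals) and that no IP is shared between nodes. Both assumptions are shaky, since Tor exit IPs are shared and a node can run many sessions, so treat the output as an illustration and not a measurement.

```python
import math

def node_count_bounds(observed_ips, own_ips, min_ips_per_node, max_ips_per_node):
    """Crude (lower, upper) bounds on the number of other nodes behind a
    pile of observed IPs, assuming each node accounts for between
    min_ips_per_node and max_ips_per_node IPs and no IP is shared."""
    others = observed_ips - own_ips
    upper = others // min_ips_per_node            # every node shows few IPs
    lower = math.ceil(others / max_ips_per_node)  # every node shows many IPs
    return lower, upper

# 474 Tor IPs in peers.dat, perhaps 100 of them my own.
lower, upper = node_count_bounds(474, 100, 4, 200)
print(f"between {lower} and {upper} other Tor-connected nodes")
```

That gives anywhere from 2 to 93 other nodes behind the remaining 374 IPs, which mostly shows how little the raw IP count pins down.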
We need to get to the bottom of this.