This is a continuation of the IP address weakness discussion in this thread.
I am currently trying to figure out:
- How many Zcash peer nodes there really are
- How many of them are on Tor exit node IPs
That way we can know how big the anonymity set currently is for people using Zcash in the most private way: running your own full node on Tor IPs.
This is an underestimated linkability vector for high-security users. Even if the ZEC sender address, recipient address and Tx amount are hidden, the IP address of the node used to send your transactions is a major fingerprint for probabilistically linking them together. It does not matter how many wallets you use, or how many ‘churns’ you do, to thwart tracing or data matching (e.g. to unlink an externally exposed zaddr on your way in from a known Tx amount on your way out).
Zcash is a famous cryptocurrency in the top 50, repeatedly mentioned by Snowden. We should assume that certain big data companies like Chainalysis, hostile governments, or other malign actors are running at least some surveillance nodes/DNS seeders 24/7. Why wouldn’t they? There’s money in it - there’s valuable data to sell to law enforcement, dictators, or insurance companies who want to invade your privacy, just waiting to be easily harvested.
To such threat actors (not even necessarily nation states), the type of IP address is a heuristic fingerprint attached to every shielded ZEC transaction being watched by other peers. People could collect this real-time metadata today, then use it or sell it to other parties in the future and correlate it with more sophisticated blockchain analysis, or even outright quantum decryption of shielded Tx values and recipient addresses one day.
Some hope
Initially I was despairing. Blockchair’s list of nodes shows only ~200 nodes at the current block height, and none of them are Tor IPs. The 8 different peers I’ve been connected to (to check: `zcash-cli getpeerinfo | grep "addr\":"`) have never been Tor IPs and have always all been on Blockchair’s list.
(If Tor IPs are deprioritised when peers are passed on to other peers, that is probably not an issue as long as all Tor ninja users get connected to highly rated non-Tor peers in the same way: the network-map heuristics are the same, and nothing is compromised for the sake of reliability.)
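To turn that `getpeerinfo` check into a clean list that can be compared against other lists, a small helper like this could work. It is only a sketch, assuming the JSON has one `"addr": "<ip>:<port>"` pair per line and that IPv6 peers are written in brackets; `extract_peer_ips` and `my-peers.txt` are hypothetical names.

```shell
#!/bin/sh
# Sketch: print the distinct peer IPs found in `zcash-cli getpeerinfo` output.
# Assumption: each peer has a line like   "addr": "1.2.3.4:8233",
# and IPv6 peers look like "[2001:db8::1]:8233".
extract_peer_ips() {
  # Match the exact "addr" key (not "addrlocal") and drop the :port suffix,
  # then remove IPv6 brackets and de-duplicate.
  sed -n 's/.*"addr": *"\(.*\):[0-9]*".*/\1/p' | sed 's/[][]//g' | sort -u
}

# Usage:
#   zcash-cli getpeerinfo | extract_peer_ips > my-peers.txt
```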
But then a flicker of hope: if Blockchair is not listing my Tor IP node, and others in this forum say they currently run zcashd on Tor, then clearly Blockchair’s data is neither complete nor accurate.
My research
To check my own node’s IP, I temporarily add `-listen -discover` to my `zcashd` command. `debug.log` then shows my node’s IP in the `advertizing address` line. I run `tail -f debug.log | grep advertizing` to easily collect the cycling Tor IPs, which change quite quickly.
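Rather than watching `tail -f` by hand, the rotating addresses can also be pulled out of the log after the fact. This sketch assumes the log lines contain `advertizing address <ip>:<port>` as described above, with IPv6 in brackets; the function name, output file and the `~/.zcash/debug.log` path (the usual Linux default) are assumptions to adjust.

```shell
#!/bin/sh
# Sketch: list the distinct addresses this node advertised, one per line.
# Assumption: lines contain "advertizing address <ip>:<port>" (the log
# spelling), with IPv6 written as "[...]:<port>".
advertised_ips() {
  # Keep the address token, strip the trailing :port and any IPv6 brackets,
  # then de-duplicate.
  grep -o 'advertizing address [^ ]*' "$1" |
    awk '{ print $3 }' |
    sed 's/:[0-9]*$//; s/[][]//g' |
    sort -u
}

# Usage:
#   advertised_ips ~/.zcash/debug.log > my-node-ips.txt
```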
Then I go to https://metrics.torproject.org/exonerator.html to confirm that my node’s IPs are Tor exit nodes; https://check.torproject.org/torbulkexitlist is another way. (Note: Tor’s officially published list appears incomplete, e.g. it has no IPv6 exit addresses, even though Tor has many of them and my zcashd node sometimes uses one. https://www.dan.me.uk/torlist could be complete (it has ~8k lines, while Tor Metrics shows ~7k nodes in total), but it includes non-exit IPs, so a hit there is only a possible Tor match.)
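Since the official exit list and the dan.me.uk list differ in coverage, it may help to save them as separate local files and merge them only when a broad "possible Tor" list is wanted. A minimal sketch, assuming each download is plain text with one IP per line; the function and file names are hypothetical, and a hit in the merged list is still only a possible exit match, for the reason given above.

```shell
#!/bin/sh
# Sketch: merge locally saved Tor IP lists into one de-duplicated file.
# Assumption: one IP per line; CRLF endings, blank lines and "#" comment
# lines are tolerated and dropped.
merge_tor_lists() {
  out=$1
  shift
  cat "$@" | tr -d '\r' | grep -v -e '^#' -e '^[[:space:]]*$' | sort -u > "$out"
}

# Usage:
#   curl -s https://check.torproject.org/torbulkexitlist > tor-exit-list.txt
#   curl -s https://www.dan.me.uk/torlist > dan-torlist.txt
#   merge_tor_lists tor-possible-list.txt tor-exit-list.txt dan-torlist.txt
```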
My continued hope: the zcashd log seems to show actual IPs of other nodes I don’t necessarily connect to. Is this right? Are these attempted connections, or peer IPs returned from seeds? I have run `tail -f debug.log | grep "SOCKS5 connecting"` a few times, and it has always returned IPs that differ from my `advertizing address` and are not on Blockchair, AND some of them are Tor exit IPs! Yesterday I suddenly found these 12 (among the larger number of IPs collected in total), and they were NOT identical to my node’s `advertizing address`:
109.70.100.19
5.2.78.69
213.164.204.146
185.220.101.2
185.112.144.68
185.129.61.5
193.110.95.34
185.220.101.139
109.70.100.84
109.70.100.80
185.247.226.96
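The `SOCKS5 connecting` lines can be collected the same way and filtered against the node’s own advertised addresses in one step. This is a sketch assuming the lines read `SOCKS5 connecting <host>` (host only, no port) and that the node’s own IPs are saved one per line, e.g. from the `advertizing address` lines; the names are hypothetical. These are connection attempts, so they show addresses zcashd tried to reach, not necessarily live nodes.

```shell
#!/bin/sh
# Sketch: distinct hosts zcashd tried to reach via SOCKS5, minus our own IPs.
# Assumptions: log lines read "SOCKS5 connecting <host>", and $2 is a file
# with one of this node's own IPs per line (it may be empty).
socks_targets() {
  log=$1
  own=$2
  # -x: whole-line match, -F: fixed strings, -v: drop anything listed in $own
  grep -o 'SOCKS5 connecting [^ ]*' "$log" |
    awk '{ print $3 }' |
    sort -u |
    grep -v -x -F -f "$own"
}

# Usage:
#   socks_targets ~/.zcash/debug.log my-node-ips.txt > socks-ips.txt
```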
(BTW, my one-liner to cross-reference locally against known Tor IP lists saved to text files: first clean up the Zcash-harvested IPs to one IP per line (IPv6 must be expanded to full form); this then returns which node IPs are a Tor IP match: `awk 'FNR==NR{a[$1];next}($1 in a){print}' manually-gathered-zcash-peer-list.txt tor-exit-list.txt`)
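The one-liner only works if both files write each address the same way, which is why IPv6 needs expanding first. Below is a sketch that does that expansion in awk and then reuses the same `FNR==NR` lookup on normalised copies; `expand_ipv6` and `tor_matches` are hypothetical helper names, and embedded-IPv4 forms like `::ffff:1.2.3.4` are deliberately not handled.

```shell
#!/bin/sh
# Sketch: normalise IPs so both lists compare equal, then intersect them.
# expand_ipv6: IPv4 lines pass through unchanged; IPv6 lines become the
# full 8-group, zero-padded, lower-case form. Blank lines are skipped.
# Not handled: embedded IPv4 such as ::ffff:1.2.3.4.
expand_ipv6() {
  awk '
    function pad(g) { while (length(g) < 4) g = "0" g; return g }
    NF == 0 { next }
    !/:/ { print $1; next }
    {
      n = split(tolower($1), halves, "::")
      nh = (halves[1] == "") ? 0 : split(halves[1], h, ":")
      nt = (n == 2 && halves[2] != "") ? split(halves[2], t, ":") : 0
      out = ""
      for (i = 1; i <= nh; i++) out = out pad(h[i]) ":"
      if (n == 2) for (i = 0; i < 8 - nh - nt; i++) out = out "0000:"
      for (i = 1; i <= nt; i++) out = out pad(t[i]) ":"
      print substr(out, 1, length(out) - 1)
    }' "$@"
}

# tor_matches PEERS TOR: print Tor-list entries that are also in PEERS,
# comparing normalised forms with the same awk lookup as the one-liner.
tor_matches() {
  tmp=$(mktemp)
  expand_ipv6 "$1" > "$tmp"
  expand_ipv6 "$2" | awk 'FNR==NR{a[$1];next}($1 in a){print}' "$tmp" -
  rm -f "$tmp"
}

# Usage:
#   tor_matches z-peers.txt tor-exit-list.txt > tor-matches.txt
```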
Maybe there are many more than 200 regular nodes? 300, 500, 1000? Hours later, another check returned no Tor IPs, which was strange. So I haven’t fully concluded exactly what data I’m gathering.
More hope:
I realised `peers.dat` might contain the node peers I’m looking for. From one such file I had lying around, I extracted four other nodes with Tor IPs. Positive!
184.105.220.24
185.220.101.41
185.220.101.40
45.56.70.111
How I extracted those IPs (the extraction seemed severely truncated, though): with this tool, make a copy of `peers.dat` in another location, then run `./bitcoin-data-tool.py --datadir=~/another/location --peers`.
Neither those four Tor-IP Zcash nodes nor the many other non-Tor IPs I extracted were on the Blockchair list. Blockchair makes Zcash look underused! Not good.
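To reproduce that comparison with Blockchair, a set difference over two one-IP-per-line files is enough. A minimal sketch using `sort` and `comm`, assuming the Blockchair list has been saved locally as plain IPs; `not_in_list` and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print IPs present in file $1 but absent from file $2.
# comm(1) needs both inputs sorted the same way, so sort copies first.
not_in_list() {
  a=$(mktemp)
  b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # -2 hides lines only in $b, -3 hides lines in both: leaves lines only in $a
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Usage:
#   not_in_list z-peers.txt blockchair-nodes.txt > not-on-blockchair.txt
```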
Major more hope:
I found a Go tool, bitpeers (blog instructions), that extracts fuller (maybe full) data from `peers.dat`. After installing the latest Go tarball, this install command worked: `go install github.com/RaghavSood/bitpeers/cmd/bitpeers@latest`. I then called the Go binary directly: `~/go/bin/bitpeers --format text --filepath ~/another/location/peers.dat`. The massive list of IPs looks like individual node peers to me (telling data points include `Attempts:` and `Source:`, the latter presumably being the seeder or peer IP the address was learned from).
I gathered 5527 peers. (Crazy. Is this like Bitcoin being falsely reported in the media, and even on stats sites, as having 10k nodes when it actually has 100k? I wonder what Monero’s count is if properly measured.)
From there, let’s turn that into a clean list of IPs, keeping only those on port `8233`, i.e. the vast majority (nodes on a different port aren’t helpful for anonymity-set purposes: if you normally use the default port, those nodes can’t plausibly be you. If they aren’t already, custom ports should be discouraged in the Zcash project docs):
`~/go/bin/bitpeers --addressonly --format text --filepath ~/another/location/peers.dat | grep ':8233$' | sed 's/:8233$//' > z-peers.txt`
(IPv6 addresses are annoying to deal with; I could have filtered them out but just left them in.)
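A slightly more defensive version of that filter strips trailing whitespace, keeps only lines ending in `:8233` (so an IPv6 address that merely contains `8233` is left intact), removes IPv6 brackets and de-duplicates. It is a sketch assuming bitpeers prints one `address:port` per line, with IPv6 bracketed; `default_port_ips` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read "address:port" lines on stdin, keep default-port (8233) peers,
# and print bare, de-duplicated IPs.
# Assumption: IPv6 entries with a port are bracketed, e.g. "[2001:db8::1]:8233".
default_port_ips() {
  # Trim trailing spaces/CR, keep lines ending in :8233, drop that suffix,
  # remove IPv6 brackets, then de-duplicate.
  sed 's/[[:space:]]*$//' |
    grep ':8233$' |
    sed 's/:8233$//; s/[][]//g' |
    sort -u
}

# Usage (flags as in the command above):
#   ~/go/bin/bitpeers --addressonly --format text \
#     --filepath ~/another/location/peers.dat | default_port_ips > z-peers.txt
```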
From there (using the `awk` command), 424 of the 5067 peers on port 8233 are Tor exit nodes: 8.4% of a decently large number (compared to ~200). Not as dire as I feared. The share may be slightly higher still, given IPv6 Tor nodes.
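For the record, the percentage is simply matches over default-port peers: 100 × 424 / 5067 ≈ 8.37, which rounds to 8.4%. A tiny helper makes it easy to recompute as the lists change; it is a sketch assuming the counts come from one-IP-per-line files (e.g. via `wc -l`), and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: print matches/total as a percentage with one decimal place.
tor_share() {
  awk -v m="$1" -v n="$2" 'BEGIN {
    if (n + 0 == 0) { print "n/a"; exit }
    printf "%.1f%%\n", 100 * m / n
  }'
}

# Usage:
#   tor_share 424 5067                                  # prints 8.4%
#   tor_share "$(wc -l < tor-matches.txt)" "$(wc -l < z-peers.txt)"
```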
Making sense of `peers.dat`:
How many of these IPs are actual peers used recently, and how recently? I assume they’re not long-dead peers, but I could be wrong. Perhaps, for resiliency, nodes are handed an extreme range of IPs (prioritised at several levels), including super-low-priority ones stored since their earliest observation, which could be years ago.
Better scraping needed:
I was going to run https://github.com/ZcashFoundation/dnsseeder to try to scrape hundreds of peer IPs and map the network out more properly, but it seems you need a domain name (or free dynamic DNS) and a VPS with a stable IP. I don’t have time to set that up securely.
Perhaps a Zcash god like @str4d, who knows how to interpret the huge list of peers, is already in a position to easily give it a spin, share the full data collected via the DNS seeder, and provide a list of matched Tor IPs plus the percentage they make up of the full current peer set.
The Zcash Foundation should also publish monthly stats like this, so users know how safe they are.
Please note that Zcash network-level anonymity has regressed because support for v3 .onion nodes has not been implemented (even though Bitcoin has it). Wonderful projects like this one, funded by Zcash grants, later fell by the wayside, possibly because they were no longer supported by the zcashd software.
I also hope a researcher reads this and proposes a proper study that attacks the Zcash shielded pool in real time, to see how fingerprintable the ~200 transactions occurring each day are based solely on IP address pattern analysis, using data gathered from N nodes. (No need to even analyse timing heuristics.) I’m sure Zcash would give them a grant to fund it at a decent scale (i.e. a meaningful number N of nodes injected into the network by the researcher).