IP address fingerprinting and resultant Tx traceability on the Zcash network

This is a continuation of the IP address weakness discussion in this thread.

I am trying to figure out, currently:

  1. How many Zcash peer nodes there really are
  2. How many of them are on Tor exit node IPs

So that we can know how big the anonymity set is for people using Zcash in the most private way currently - running your own full node on Tor IPs.

This is an underestimated linkability vector for high security users. Even if ZEC sender address / recipient address / Tx amount is hidden, the IP address of the node used to send your transactions is a major fingerprint to probabilistically link them together. It does not matter how many wallets you use, or ‘churns’ you do, to thwart tracing or data matching (e.g. to unlink an externally exposed zaddr on your way in from a known Tx amount on your way out).

Zcash is a famous cryptocurrency in the top 50, repeatedly mentioned by Snowden. We should assume that certain big data companies like Chainalysis, hostile governments, or other malign actors are running at least some surveillance nodes/DNS seeders 24/7. Why wouldn’t they? There’s money in it - there’s valuable data to sell to law enforcement, dictators, or insurance companies who want to invade your privacy, just waiting to be easily harvested.

To such threat actors (not even nation states), the type of IP address is a heuristics fingerprint attached to every shielded ZEC transaction being watched by other peers. People could collect this real-time metadata today, then use it/sell it to other parties in the future and correlate it with other sophisticated analysis of the blockchain or outright quantum decryption of shielded Tx values and recipient address one day.

Some hope

Initially I was despairing. Blockchair’s list of nodes lists only ~200 nodes on the current block height and none of them are Tor IPs. The 8 various peers I’ve been connected to (to check: zcash-cli getpeerinfo | grep "addr\":") have never been Tor IPs and have always been listed in Blockchair’s list.
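For anyone who prefers structured output over grep, here is a minimal sketch of pulling the peer IPs out of the getpeerinfo JSON. It assumes each peer object carries an "addr" field in host:port form (IPv6 in square brackets), as the grep above suggests; the sample data is made up, and the function name is my own.

```python
import json

def peer_ips(getpeerinfo_json):
    """Return the host part of every peer's "addr" field."""
    ips = []
    for peer in json.loads(getpeerinfo_json):
        addr = peer["addr"]
        # Split on the LAST colon so IPv6 addresses stay intact.
        host, sep, _port = addr.rpartition(":")
        if not sep:
            host = addr  # no port present
        ips.append(host.strip("[]"))
    return ips

# Made-up sample in the assumed shape of `zcash-cli getpeerinfo` output.
sample = '[{"addr": "203.0.113.7:8233"}, {"addr": "[2001:db8::1]:8233"}]'
print(peer_ips(sample))
```

The real output could be fed in by capturing the command's stdout and passing it to peer_ips.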

(If Tor IPs are deprioritised as peers to pass on to other peers, probably not an issue if all Tor ninja users get connected to high rated non-Tor peers in the same way - same network map heuristics there, nothing compromised for the sake of reliability.)

But a flicker of hope appeared: if Blockchair is not listing my Tor IP node - and others in this forum currently say they run zcashd on Tor - then clearly Blockchair’s data is not complete or accurate.

My research

  • To check my own node’s IP, I temporarily add -listen -discover to my zcashd command. Then debug.log shows my node IP in the advertizing address line. I run tail -f 'debug.log' | grep advertizing to collect the cycling Tor IPs, which change quite quickly.

  • Then I go to https://metrics.torproject.org/exonerator.html to confirm that my node IPs are Tor exit nodes - also https://check.torproject.org/torbulkexitlist is another way. (Note: Tor’s officially published list appears incomplete, e.g. no IPv6 exit addresses, of which there are many on Tor, such as my zcashd node sometimes. https://www.dan.me.uk/torlist could be complete (since there’s 8k lines there and Tor metrics show ~7k total nodes), but it includes non-exit IPs so is only a possible Tor match.)

  • My continued hope: the zcashd log seems to show actual IPs of other nodes I don’t necessarily connect to. Is this right? Are they attempted connections, or peer IPs returned from seeds? I have run tail -f debug.log | grep "SOCKS5 connecting" a few times and it has always returned IPs different from my advertizing address, and which are not on Blockchair, AND some of which are Tor exit IPs! I suddenly found these 12 yesterday (among the larger number of collected total IPs), and they were NOT identical to my node’s advertizing address:

109.70.100.19
5.2.78.69
213.164.204.146
185.220.101.2
185.112.144.68
185.129.61.5
193.110.95.34
185.220.101.139
109.70.100.84
109.70.100.80
185.247.226.96
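A rough sketch of how the IPs could be collected from the log automatically rather than by eye. The only thing taken from above is the "SOCKS5 connecting" marker; the exact layout of the rest of each log line is an assumption, the sample lines are invented, and this only handles IPv4.

```python
import ipaddress
import re

# Loose IPv4 pattern; candidates are validated with ipaddress afterwards.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def socks5_targets(log_lines):
    """Return unique IPv4 addresses from 'SOCKS5 connecting' lines, in first-seen order."""
    seen = []
    for line in log_lines:
        if "SOCKS5 connecting" not in line:
            continue
        for candidate in IPV4.findall(line):
            try:
                ip = str(ipaddress.ip_address(candidate))
            except ValueError:
                continue  # e.g. an octet above 255
            if ip not in seen:
                seen.append(ip)
    return seen

# Invented log lines; the real debug.log format may differ.
log = [
    "2024-01-01 12:00:00 SOCKS5 connecting 198.51.100.7:8233",
    "2024-01-01 12:00:01 some other message mentioning 203.0.113.1",
    "2024-01-01 12:00:02 SOCKS5 connecting 198.51.100.7:8233",
    "2024-01-01 12:00:03 SOCKS5 connecting 203.0.113.9:8233",
]
print(socks5_targets(log))
```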

(BTW, my one-liner to locally cross reference against known Tor IP lists saved to local text files: first clean up zcash-harvested IPs to one IP per line - IPv6 must be expanded to full form - and this returns what node IPs are a Tor IP match: awk 'FNR==NR{a[$1];next}($1 in a){print}' manually-gathered-zcash-peer-list.txt tor-exit-list.txt)
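The manual IPv6 expansion step can be avoided with Python's ipaddress module, which treats compressed and expanded forms of the same address as equal. A minimal sketch of the same cross-reference, assuming both inputs hold one IP per line (the function names and sample addresses are mine, for illustration only):

```python
import ipaddress

def normalise(line):
    """Return an ip_address object, or None if the line is not an IP."""
    try:
        return ipaddress.ip_address(line.strip().strip("[]"))
    except ValueError:
        return None

def tor_matches(peer_lines, tor_lines):
    """Return peers that also appear in the Tor list, deduplicated, in canonical form."""
    tor = {ip for ip in map(normalise, tor_lines) if ip is not None}
    matches = []
    for ip in map(normalise, peer_lines):
        if ip is not None and ip in tor and ip not in matches:
            matches.append(ip)
    return [str(ip) for ip in matches]

# Illustrative data using documentation-range addresses.
peers = ["198.51.100.2", "2001:0db8:0000:0000:0000:0000:0000:0001", "192.0.2.1", "garbage"]
tor_list = ["198.51.100.2", "2001:db8::1", "# comment line"]
print(tor_matches(peers, tor_list))
```

With real files, the two lists would come from reading the peer file and the Tor list line by line.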

Maybe there are many more than 200 regular nodes? 300, 500? 1000? Hours later another check returned no Tor IPs, which was strange. So I’m not 100% sure exactly what data I’m gathering.

More hope:

I realised peers.dat might contain node peers I’m looking for. In one such file I had lying around, I extracted four other nodes with Tor IPs. Positive!

184.105.220.24
185.220.101.41
185.220.101.40
45.56.70.111

How I extracted those IPs (it seemed severely truncated extraction though): this tool, make a copy of peers.dat to another location, ./bitcoin-data-tool.py --datadir=~/another/location --peers

Both those four Tor IP zcash nodes, and many more non-Tor IPs extracted, were not contained on the Blockchair list. Blockchair makes Zcash look underused! Not good.

Major more hope:

I found a Go script bitpeers (blog instructions) to extract fuller/full data from peers.dat. Install latest Go tarball, working install command: go install github.com/RaghavSood/bitpeers/cmd/bitpeers@latest and then I called the Go binary directly: ~/go/bin/bitpeers --format text --filepath ~/another/location/peers.dat - the massive list of IPs look like individual node peers to me (telling datapoints like Attempts: - and Source: must be the seeder IP it’s gotten from.)

I gathered 5527 peers. (Crazy. Is this like Bitcoin falsely being reported as having 10k nodes in the media, and even on stats sites, when actually it’s 100k? Wonder what Monero is if properly measured?)

From there, let’s turn that into a clean list of IPs - also just ones using port 8233, i.e the vast majority. (Ones using a different port aren’t helpful nodes for anonymity set purposes - if you normally use the default port, those nodes can’t possibly plausibly be you. If not already, custom ports should be discouraged by Zcash project in docs):

~/go/bin/bitpeers --addressonly --format text --filepath ~/another/location/peers.dat | grep ":8233" | sed 's/:8233//g' > z-peers.txt

(IPv6 addresses are annoying to deal with, could filter them out but just left them there.)

From there (using awk command), 424 of 5067 peers on port 8233 are Tor exit nodes - 8.4% of a decently large number (compared to ~200). Not as dire as I feared. Maybe slightly more too, due to IPv6 Tor nodes.
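One caveat with the grep/sed pipeline: ":8233" also matches inside ports like 18233 or inside an IPv6 address. A minimal sketch of a stricter filter plus the percentage arithmetic, assuming the address-only output is one host:port per line with IPv6 hosts in square brackets (that output shape is my assumption, and the sample lines are invented):

```python
def hosts_on_port(lines, port=8233):
    """Keep only entries whose port is exactly `port`; return the bare hosts."""
    hosts = []
    for line in lines:
        # Split on the LAST colon so IPv6 hosts are not cut apart.
        host, sep, found_port = line.strip().rpartition(":")
        if sep and found_port == str(port):
            hosts.append(host.strip("[]"))
    return hosts

def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total

sample = [
    "198.51.100.4:8233",
    "198.51.100.5:18233",      # different port, must be dropped
    "[2001:db8::8233]:8233",   # IPv6 host that happens to contain 8233
    "203.0.113.9:8234",
]
print(hosts_on_port(sample))
print(round(percent(424, 5067), 1))  # the Tor share quoted above
```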

Making sense of peers.dat:

How many IPs are current actual peers used recently - and how recently? I assume they’re not long-dead peers, but I could be wrong. Perhaps an extreme range of IPs (prioritised at several levels) is delivered to nodes including super low priority ones stored since the date of earliest observation which could be years ago, for resiliency.

Better scraping needed:

I was going to run https://github.com/ZcashFoundation/dnsseeder to try to scrape hundreds of peer IPs to map it out more properly but seems you need a domain name / free dynamic DNS / stable IP VPS. I don’t have time to securely set that up.

Perhaps a Zcash god like @str4d is already in a position to easily give it a spin and share the full data collected via the DNS seeder (and knows how to interpret the huge list of peers), and provide a list of matched Tor IPs, and a percentage calculation of how many that is among the full current peer set.

Zcash foundation should also present monthly stats of things like this - so users know how safe they are.

Please note that Zcash network-level anonymity has regressed because support for v3 .onion nodes has not been implemented (even though Bitcoin has had it). Wonderful projects like this, which were given Zcash grants, later fell by the wayside; one possible reason is that the zcashd software no longer supports them.

I also hope a researcher reads this and proposes a proper study to attack the Zcash shielded pool in real-time to see how fingerprintable the ~200 Tx’s occurring each day are, solely based on IP address pattern analysis, if gathering data from N nodes. (No need to even analyse timing heuristics.) I’m sure Zcash would give them a grant to fund it at a decent scale (i.e. scale of N nodes injected into the network by researcher).


This is a bit over my head, but just would like to say that I welcome and appreciate this discourse.


IP address anonymity set

I forgot about that snippet I found. Reminding us here.

(Firstly an earlier 2018 paper I linked to here isn’t clear but they mention connecting to 200 Zcash nodes and seem to assume it was the size in 2018.)

Then the paper in question: https://arxiv.org/pdf/1907.09755.pdf

In 2019:

Makes sense for a four month observation. There was also official Tor .onion support in zcashd at that time, though perhaps now it is similar situation - fair amount of Tor IPs - just not on .onion addresses but direct IPs.

I guess the 5k IPs in my peers.dat are not necessarily unique nodes, but just all the IPs used by nodes. (Given 90%+ are not Tor IPs, and thus far less likely to cycle per node user, it gives me hope if most peers.dat IPs are very recent, e.g. last 30 days, but definitely not if it contains IPs from 12 months ago or earlier.)

I learned that nodes are only distinguishable by (WAN) IP:PORT (and clash on the network if they share both - another reason to reintroduce .onion support BTW - someone can run 10 zcashd hidden service node addresses on one static IPv4 VPS, which increases deniability).

This is helpful for node blurriness however, and it’s interesting: how can an attacker know at any given time how many nodes there are, since there’s so much IP flooding and shape-shifting, and no easy way to know the ratio of total observed IPs to total separate nodes in any given time range?

Some Tor-connected nodes may turn on for short times (only spewing out 4 Tor IPs within ten minutes), and not long periods (spewing 200 Tor IPs in one session). Perhaps to most threat actors, there is some deniability as to what the ratio is.

This is why we need to really know the pretty accurate estimate of current node numbers.

Attacker would have to be running several spy nodes, maybe at least 10% of the network so they can see all connections at once. Not that hard to do among only 200-300 nodes. They’ll know all the data and not have to guess about all this.

And, if attacker is running reliable nodes, they’ll be evenly spread out across the network and recommended by DNS seeders (or themselves be DNS seeders), and BTW users like me are never connected to other Tor nodes. Inherent design problem? Time to fix? (Well, -onlynet=onion solved that when it was offered. :/) So I assume a spy node can trivially observe ‘ninja’ Tor nodes starting and ending sessions without plausibility of them being multiple nodes.

The constantly changing IPs in the spy node’s logs won’t help - a sane analyst would determine that it’s probably one single node, and not multiple nodes. Deniability is a weak defence in that case. (Deniability needs a convincing plausible explanation and for the attacker to know about that explanation).

(This is why connecting to just one external peer, with -maxconnections=1, may reduce the risk of this entire attack to a large degree.)
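To put rough numbers on the spy-node risk, here is a back-of-envelope sketch. It assumes each peer is picked independently and uniformly at random from the network, which is a simplification: as noted above, reliable spy nodes may be favoured, so the real figure could be worse. The function name is mine.

```python
def p_observed(spy_fraction, connections):
    """Chance that at least one of `connections` peers is a spy node,
    under the uniform, independent peer selection assumption."""
    return 1.0 - (1.0 - spy_fraction) ** connections

# Attacker runs 10% of nodes; compare 8 peers with a single peer.
for k in (8, 1):
    print(k, "peer(s):", round(p_observed(0.10, k), 3))
```

Under these assumptions, 8 peers gives roughly a 57% chance of touching at least one spy node, versus 10% with a single peer.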

So anyway, maybe the 424 Tor IPs I found in peers.dat translate to far fewer than 424 nodes. Tor IPs change very often; perhaps 100 of the IPs are attributable to my own damn recent usage. (Laughable anonymity at the network level.)

We need to get to the bottom of this.

Timing analysis anonymity set

More on the other closely related problem worth bringing in - the ‘anonymity set’ of how many Tx’s are happening every day, which affects timing analysis (Attacker here is anyone looking at the blockchain, timestamps only):

  • Timezone fingerprint - when you don’t tend to transact reveals your likely timezone
  • Time of day fingerprint - habit of paying for things same time in the day, even if it’s 3am in your own timezone? Heuristic.
  • Closeness of Tx’s to each other - the more close to each other, the dramatically more likely (due to human behaviour) they’re related.
  • Pattern of spacing between your Tx’s - you’re paying for something every 3 days, for whatever reason? Link.
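As an illustration of the timezone fingerprint in the first bullet, here is a toy sketch of what an analyst might do with nothing but timestamps: bucket them by UTC hour and look for the quietest stretch, which hints at sleeping hours. The 8-hour window and the sample data are my own assumptions, not a description of any real tool.

```python
from collections import Counter
from datetime import datetime, timezone

def quietest_window(timestamps, width=8):
    """Return the UTC start hour of the `width`-hour window with the fewest Tx's."""
    counts = Counter(
        datetime.fromtimestamp(t, tz=timezone.utc).hour for t in timestamps
    )

    def load(start):
        # Sum activity over the window, wrapping around midnight.
        return sum(counts[(start + i) % 24] for i in range(width))

    return min(range(24), key=load)

# Invented activity: one Tx per hour from 08:00 to 23:00 UTC on one day.
day_start = datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp()
activity = [day_start + hour * 3600 for hour in range(8, 24)]
print(quietest_window(activity))
```

The more shielded Tx's there are from other users in every hour, the less such a quiet window stands out.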

All of the above issues in Zcash would be drastically reduced if there were just more shielded Tx’s happening every day.

There’s only 200-something right now.

200 per day means on average one every 7.2 minutes. Not bad if you think of it that way, since everything else about Zcash is extraordinarily private and anonymous (or can be).

But still very small. Chainalysis etc. could bring in other data points and eliminate some transactions to make what remains sparser. So the more the better. We really need more.

Once numbers are high enough, even just 1000 per day perhaps (one Tx every 1.4 mins), none of the above will matter - past some threshold, nothing can be told apart! There’s too many dark holes with no links to elsewhere for them to pull apart the data points.
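The interval figures above are just minutes per day divided by Tx's per day. A small sketch to check them or try other volumes:

```python
MINUTES_PER_DAY = 24 * 60

def minutes_between(tx_per_day):
    """Average gap in minutes between Tx's at a given daily volume."""
    return MINUTES_PER_DAY / tx_per_day

for volume in (200, 1000):
    print(volume, "Tx/day ->", round(minutes_between(volume), 2), "min apart")
```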

(Monero has tens of thousands of Tx per day. God knows it needs it. Hilarious how IMO Monero still isn’t as safe as Zcash.)

But crazily, it would take just one person setting up a zcash-cli script on a VPS to start sending 1 zatoshi shielded Tx’s between itself at random intervals all day long, to have an impact immediately.

Start by sending a small amount, slowly increase the amount to make it look like normal growth, and no one would even know that it’s just one person doing this service for everybody.
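A minimal sketch of only the "random intervals" part of such a script. It assumes send times are drawn as a Poisson process (exponentially distributed gaps), which is my choice of model rather than anything specified above, and it deliberately leaves out the actual zcash-cli call.

```python
import random

def send_times(rate_per_day, hours, seed=None):
    """Return send offsets (seconds from start) for a window of `hours`,
    with exponentially distributed gaps averaging 86400 / rate_per_day."""
    rng = random.Random(seed)
    rate_per_second = rate_per_day / 86400.0
    window = hours * 3600.0
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_second)
        if t >= window:
            return times
        times.append(t)

# Plan roughly 200 sends over one day; the seed just makes the run repeatable.
schedule = send_times(rate_per_day=200, hours=24, seed=1)
print(len(schedule), "sends planned")
```

Raising rate_per_day gradually from one day to the next would give the slow growth described above.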

Could also be on Tor to help cover for other Zcash Tor users. I’d bet people in Monero community have run scripts like this, just to help XMR in the same way.

I keep thinking of good ideas and not doing them, I know that. :man_facepalming: But this may inspire someone.


If you have a concrete plan to do R&D on them, please apply for a Zcash Community Grant.


Thx, Zchurn. This is great to stir the pot. Your notes here make it easier for others to get started.

I fully and freely consent for anyone to use my ideas and apply for their own grant. :slight_smile:

@zooko et al. is there any transcript or recording of this discussion on the same topic?

Also I hope someone can help solve the OP of this thread eventually.


Zcash has solved private money transfer using cryptocurrency, and Zcash users have brought up (over the years) the missing piece of privacy at the network layer. There have been discussions around leveraging Nym, and Zcash Community Grants has funded Arti and OMR to anonymize & protect interactions with the Zcash blockchain.

Ideally, the growth in shielded transactions should occur organically. If the VPS is sending the transactions to itself all day, wouldn’t it be easy for a traffic analyzer to just isolate the transactions broadcast from the single VPS and continue surveilling the network?

Do you have recommendations to help decentralize Zcash nodes? Any other ideas to improve privacy at the network layer?


Quite possibly, but the benefit of this is to thwart blockchain analysis, which is just one of the problems discussed here. There are many threat models for users.

A good attacker could probably isolate one such VPS, but a good node operator doing this could possibly thwart them in part by using all those ‘ninja node’ ideas I’ve thought of (to be hard to find among the nodes in general). There are also ways to send raw transactions using other people’s nodes, I see - even in a web browser.

A diversity of these approaches could be used, even in an automated way. But it should be scaled up with new nodes added if need be, like a torservers.net-type project, to add more capacity to handle the higher Tx volume. (Zcash is so nice and quick right now.)

I don’t know the dynamics of spam transactions clogging up node networks and how they’re dealt with if no one is offering enough nodes to deal with the capacity. Perhaps miners are incentivised to add more nodes anyway?

Anyway, if an attacker - doing the hard work to determine which Txs in shielded pool are ‘normal’ ones vs. the VPS’ ones - is not sharing their findings with the world, then the VPS project would still help protect against every other attacker who’s not done 24/7/365 network sniffing of Zcash historically nor has access to such powerful data.

Yes.

Much has been suggested in recent threads, and this thread is as much about learning how bad the weakness currently is.

But quick list off the top of my head:

  • Dandelion++

  • As you say, one of the modern, low-latency-enough mix network technologies (like Nym)

  • A full analysis of malicious peer sniffing issues by devs and allocating funding to see what major design overhauls can be designed to leapfrog it on the defence side, while retaining low latency at scale.

  • Fixing .onion node support once again, and making it the default, as Bisq does, so that everyone is running one.

  • Another idea I’ve not mentioned yet to take the above further: Like the Bisq app, make every node (in Bisq’s case, it’s every user opening their GUI or CLI app) run as a Tor .onion hidden service. If there’s redundancy of multiple peer connections by default and a robust design to make sure nodes have enough connections ready to relay transactions at any given time (I notice monerod can struggle with this), it might be fast and reliable enough. If need be, keep seeders on faster clearnet (and nodes connect to them via DNS or raw IP), but make ALL regular nodes run as tor .onion services and code out clearnet/direct IPv4/direct IPv6 node peering completely. Seeders only deliver onion addresses to nodes. Then EVERY node has the SAME IP address signature. One major problem solved.

I don’t know if there is a recording of that. Perhaps the Zcash Foundation does.

Here’s the recording…
