Feel free to add your system specs and benchmark results.
i7-2600 (4 core, 2MB L3 cache per core) 3.4GHz, ddr3 1333
i5-5200U (2 core, 1.5MB L3 cache per core) up to 2.2GHz, ddr3 1600
Thanks, that’s good info.
i5-750 (4 core, 2 MB L3 cache per core) 2.66 GHz, ddr3 1333
70 to 76 seconds, same as before
The number of cores should not matter, as the benchmark should be running 1 thread. The benchmark is not a complete test.
Can you swap your RAM to see the effect?
Intel(R) Core™ i7-6700K CPU @ 4.00GHz:
alex@sky:~/mine/zcash ./src/zcash-cli zcbenchmark solveequihash 1
“runningtime” : 35.48080400
That’s impressive. DDR4 2133MHz memory?
Are we looking for a high or a low runningtime value?
Yes, the machine is equipped with DDR4 memory, but the 35 s was the lower bound.
Lower is better. It’s approximately the time 1 core takes to do 2 solves. Typical times I’ve seen here are around 50 seconds, so a 4-core CPU can do about 10 solves per minute.
Wow, that’s fast! How much DDR4 is installed?
The machine has 64 GB but a single core uses just between 100 & 500 MB.
I suspect that the amount of L3 cache is a significant factor in Equihash performance. It’d be interesting to test on a Xeon or other cpu with silly amounts of L3…
He’s got the extreme i7, which has twice as much L3 cache as some Xeons. I’ve got as much or more L3 cache as yours (2 MB/core) but I’m 40% slower. You’ve got more bandwidth on the processor that is slower than mine, and more processor than me on the one that has my bandwidth. So it seems performance is determined by RAM bandwidth times CPU speed. The sorting routines needed are not able to make much use of L3. The whole memory needs to be randomly accessed throughout. It does not do much good to pull in a bunch of data that you will not need on the next processing step.
My understanding of Equihash is that the total dataset is divided into cells and, while sorting occurs throughout the dataset, some sorting is also done within individual cells. I’m inclined to think that if the total amount of L3 cache is sufficient to contain an entire cell of the dataset, then the cell-specific sorting will be sped up.
Any opinion I state in regards to the Equihash paper is subject to error. I read a few science papers here and there, and that’s one of the steepest learning curves I’ve seen.
It’ll be interesting to see how quad channel ddr4 affects speeds on an X99 platform.
Unless there’s a way to allocate memory in a particular channel and Zcash takes advantage of it, a more effective way to acquire hashing power is probably to get a bunch of cheap, single channel systems.
--Edit-- Just wanted to add a graph of the available results…
4x E7-8880 v3 @ 2.30GHz (16 cores per package = 64 physical, 128 virtual cores) and 1952 GB of DDR4 RAM (2133 MHz?):
“runningtime” : 61.11619100 on a single core. Ignoring hyperthreading, that’d give about 2.09 solves per second.
i7-5820k and 32 GB of DDR4 RAM (2133 MHz):
“runningtime” : 42.83929600 on a single core. Ignoring hyperthreading, that’d give about 0.28 solves per second.
Left that 64-core machine running for the last ~1.5 hours, got 17 of the last 36 blocks. Only about 48 GB of its RAM is being used. Also I listed each package as 16 cores rather than 18, as I’m in a virtualized environment and only allocated 16 of the 18 cores of each package to the VM this is running in.
Thank you for posting those results.
Are you able to view a per-process system monitor on that Xeon system? btw How are you determining the solves per second figures?
Are you talking about something like htop? Certainly can:
To get the solves per second, I’m just dividing the solvetime by the number of available cores (yields seconds per equihash solution), then calculating (1/seconds per equihash solution) to get equihash solutions per second, and multiplying by 2 (since I believe each equihash solution is two block candidates):
61.116 / 64 = 0.955 seconds per equihash solution
1 / 0.955 = 1.047 equihash solutions per second
1.047 * 2 = 2.094 solves per second
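For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the same arithmetic. The function name is made up for illustration, the factor of 2 is the poster's belief about block candidates per equihash solution rather than a confirmed constant, and the 6 physical cores for the i7-5820k is an assumption (the core count isn't stated in the thread). It also assumes perfect scaling across cores, which memory bandwidth may well prevent.

```python
def solves_per_second(runningtime_s: float, cores: int,
                      solves_per_solution: int = 2) -> float:
    """Estimate whole-system solves per second from a single-core benchmark.

    Assumes every core performs like the benchmarked one (no bandwidth
    contention, no turbo boost differences), so treat it as an upper bound.
    """
    seconds_per_solution = runningtime_s / cores        # e.g. 61.116 / 64
    solutions_per_second = 1.0 / seconds_per_solution   # e.g. 1 / 0.955
    return solutions_per_second * solves_per_solution   # e.g. 1.047 * 2


# Figures reported in this thread; the i7-5820k core count is assumed to be 6.
xeon = solves_per_second(61.116191, cores=64)
i7_5820k = solves_per_second(42.839296, cores=6)

print(f"4x E7-8880 v3: {xeon:.2f} solves/s")
print(f"i7-5820k:      {i7_5820k:.2f} solves/s")
```

Real throughput with every core pinned at 100% could come out lower than these estimates.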
Since the equihash solver only runs on a single core, it’s possible that turbo boost makes the solvetime a bit more optimistic than would be encountered when pinning all of the cores at 100%.
Each core that you hope to use will require a dedicated memory channel / path to memory - and, even with that, each of those cores is still going to take somewhere between 30 and 60 seconds to complete a solution.
btw I suspect that your virtualised instance is being constrained by the host OS. If you’re able to increase the priority of the virtualisation software you’re using and have as few other programs running on the host and virtual OS, I think you’ll get a much better result on that system.
Correct, each solution itself takes about 60 seconds, but if you had 64 cores each finding a solution every 60 seconds, then that’s a bit more than one solution per second on average for the entire system.
You bring up a good point about memory bandwidth saturation. Is there any way to perform a multithreaded benchmark? Unfortunately, getmininginfo doesn’t return any semblance of a local hashrate. It probably wouldn’t be hard to throw a quick hack into the code to do some kind of rolling average for the number of solutions in the last x timeframe.
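A rough idea of what that rolling average could look like, as a standalone Python sketch. This is not Zcash code: the class and method names are hypothetical, and the miner would need to call `record()` itself each time it finds a solution.

```python
import time
from collections import deque


class RollingSolutionRate:
    """Track solutions found within the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self._timestamps = deque()  # times at which solutions were found

    def _evict(self, now: float) -> None:
        # Drop solutions that have fallen out of the window.
        cutoff = now - self.window_s
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def record(self, now: float = None) -> None:
        """Call once per solution found."""
        now = time.monotonic() if now is None else now
        self._timestamps.append(now)
        self._evict(now)

    def rate(self, now: float = None) -> float:
        """Average solutions per second over the window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return len(self._timestamps) / self.window_s


# Example with explicit timestamps instead of the real clock.
tracker = RollingSolutionRate(window_s=60.0)
for t in (0.0, 10.0, 20.0, 30.0):
    tracker.record(now=t)
rate_at_30 = tracker.rate(now=30.0)   # 4 solutions in the window
rate_at_75 = tracker.rate(now=75.0)   # only the solutions at t=20 and t=30 remain
```

With all mining threads reporting into one tracker (behind a lock if they run concurrently), the rate would show whether throughput actually scales with core count or flattens out once memory bandwidth is saturated.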
The 61 seconds seems about right to me, since each core is at 2.3 GHz. It’s an EC2 box, so I don’t have direct control over the hypervisor, although I doubt anything else is running on the host OS.