Toomim Bros GPU mining software and cloud mining

I think R9 Nanos will do much better on this algo with their 64 compute units.

76.5 ms now. That should be 24.6 Sol/s.

@miner we do have at least one R9 295 in our arsenal, but I don’t think it would be worth our time to test it. It would likely give similar results to the other AMD cards from the same generation. Thanks for the offer, though. We also will not use any computers that we do not physically control during testing. I prefer not to tempt people with the chance of stealing our code.

@austin-williams I made a goof in my math: 1% of 20 Sol/s is 0.2 Sol/s, and 0.1% would be 0.02 Sol/s. Off by 5x.

I lived in India for 6 months. They don’t have enough electricity to go around. It’s not terribly expensive (about $0.07/kWh), but rolling blackouts are a daily occurrence. If the government is subsidizing a smartphone program that ends up being used for burning scarce electricity, I doubt the program would last very long.

Also, if a $300 GPU makes about $1 per day on 24 Sol/s, a midrange smartphone would make around $0.01 or less per day. You also would need to subtract the data costs for stratum from that. Stratum uses about 3 kB/minute. Data costs around Rs0.004/kB for 3G, or about $0.001 per minute. Unless you also have free wifi (which is uncommon in rural India), this means you would be making a loss of $1.439/day even when ignoring electricity costs.
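The arithmetic behind that loss estimate can be sketched in a few lines of Python. This is only an illustration using the figures quoted above ($0.01/day revenue as an upper bound, $0.001 per minute of 3G data); the function name `daily_net_usd` is made up for this example.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes of continuous stratum traffic


def daily_net_usd(revenue_per_day, data_cost_per_minute):
    """Net mining income per day in USD, ignoring electricity."""
    data_cost_per_day = data_cost_per_minute * MINUTES_PER_DAY
    return revenue_per_day - data_cost_per_day


# Figures from the post: at most $0.01/day revenue, about $0.001/minute for data.
net = daily_net_usd(revenue_per_day=0.01, data_cost_per_minute=0.001)
print(f"net per day: {net:.2f} USD")
```

With these inputs the data bill is about $1.44/day, so the net loss lands between roughly $1.43 and $1.44/day depending on how far below $0.01 the phone's revenue actually is.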

71 ms, which should be about 26.4 Sol/s.

Great progress, Jonathan!

Don’t forget to update your top post with the CPU & GPU summary…

We are changing our standard benchmark methodology slightly. Instead of doing the average of 10 runs, we’re going to start doing the average of 50 runs. For some reason, this speeds things up by a few percent. The new benchmark should be more precise/reliable as well as being a better indicator of actual mining performance.

These two numbers were using the same code except for the benchmark method.

Old benchmark: 68 ms
New benchmark: 66 ms (28.5 Sol/s)
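For anyone who wants to try the same averaging approach on their own solver, here is a minimal Python sketch of a benchmark loop of this shape. The names `benchmark` and `solve` are hypothetical placeholders and not our actual code; the sketch just times a solver callable over a fixed set of nonces and reports the mean time per run in milliseconds.

```python
import time


def benchmark(solve, runs=50):
    """Return the mean wall-clock time per run in milliseconds.

    `solve` is any callable taking a nonce; nonces 0..runs-1 are used so
    that every invocation of the benchmark does the same work.
    """
    total_seconds = 0.0
    for nonce in range(runs):
        start = time.perf_counter()
        solve(nonce)
        total_seconds += time.perf_counter() - start
    return 1000.0 * total_seconds / runs


if __name__ == "__main__":
    # Stand-in workload; replace with a real solver call.
    print(benchmark(lambda nonce: sum(range(10000)), runs=50))
```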

Snarks wouldn’t help in this case because they are used to prove logical statements, not IRL performance of an algo on particular hardware.

64.5 ms (~29.1 Sol/s). Our code now runs faster than 1/1,000,000th of the speed for Ethereum mining.
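For reference, the ms-to-Sol/s conversions in this thread can be reproduced with a short Python sketch, assuming about 1.88 solutions per run (the average quoted elsewhere in this thread). The function name `sols_per_second` is just for illustration.

```python
def sols_per_second(ms_per_run, sols_per_run=1.88):
    """Convert time per solver run (ms) to solutions per second.

    sols_per_run=1.88 is an assumed average number of solutions per run.
    """
    return sols_per_run / (ms_per_run / 1000.0)


for ms in (76.5, 66.0, 64.5):
    print(f"{ms} ms -> {sols_per_second(ms):.1f} Sol/s")
```

Under that assumption, 76.5 ms, 66 ms and 64.5 ms come out at 24.6, 28.5 and 29.1 Sol/s respectively.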

How are you generating the unsorted data that gets fed into your program? Does it vary from one run to another?

We are running blake2b with an empty header string (todo/fixme) plus a nonce, then the counter. The nonce varies for each of the 50 runs in our benchmark, but it’s the same set of nonces each time the benchmark is run. I think the nonces are just the values 0 through 49 in either text or uint form. The blake2b execution is included in the benchmark times.
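To make the input generation concrete, here is a rough Python sketch of hashing header, nonce, and counter with blake2b. Several details are assumptions rather than facts from the post: the nonce and counter are encoded here as 4-byte little-endian integers (the post is unsure whether the nonce is text or uint), the digest is 50 bytes, and the optional personalization is "ZcashPoW" followed by n=200 and k=9 as 32-bit little-endian values, which is my understanding of what Zcash uses. The function name `input_hash` is hypothetical.

```python
import hashlib
import struct

# Assumed personalization: "ZcashPoW" || le32(n=200) || le32(k=9), 16 bytes total.
ZCASH_PERSON = b"ZcashPoW" + struct.pack("<II", 200, 9)


def input_hash(nonce, counter, header=b"", person=b""):
    """blake2b over header || nonce || counter.

    Assumptions: nonce and counter are 4-byte little-endian unsigned ints,
    and the digest length is 50 bytes.
    """
    h = hashlib.blake2b(digest_size=50, person=person)
    h.update(header)
    h.update(struct.pack("<I", nonce))
    h.update(struct.pack("<I", counter))
    return h.digest()


if __name__ == "__main__":
    plain = input_hash(nonce=0, counter=0)
    personalized = input_hash(nonce=0, counter=0, person=ZCASH_PERSON)
    # Personalization changes every output, so solution counts will differ.
    print(plain.hex() != personalized.hex())
```

Any mismatch in personalization, nonce encoding, or starting nonce gives a completely different set of hashes, so two solvers will only agree on solution counts if all of these match.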

@zooko @daira might it be a good idea to consider raising the K parameter in the event these kinds of CPU/GPU performance claims pan out?

After all, the K parameter was reduced (solely?) to reduce solving times to be fast enough to make p2pool feasible.

They’ve said that they are not doing that before launch, but they will consider it for the next (first?) hard fork.

Fast GPU miners do not compromise security. On the contrary, they enhance it. They do make it a bit harder for amateur, hobbyist, and casual miners, but they make it much harder for botnets or for government supercomputer attacks. There’s a tradeoff to be had. While this is not the scenario that they were aiming for, it’s not necessarily a bad one.

One of today’s earlier optimizations improved performance by about 2.5 ms at the expense of correctness. We were getting totally bogus results. Bug fixed, back to 67 ms.

I tested out an RX 480. In order to do so, I had to switch drivers from fglrx 15.something (15.10?) to the new AMDGPU-PRO driver.

122 ms on the 480. D’oh!

On a hunch, I switched back to the R9 290 without switching drivers. 134 ms.

Believe it or not, this is good news. There is a well-documented difference in OpenCL performance between versions of the AMD drivers, with the 16.x series usually performing worse, at least for Ethereum mining. This can often be ameliorated even with the new drivers by making a copy of the old version’s OpenCL.dll or libOpenCL.so file and loading it at runtime. On Windows, you do that by putting it in the same folder as the mining program. On Linux, you do that with "LD_PRELOAD=/path/to/good/libOpenCL.so ./MyMiningSoftware". I’ll have to hunt down the desired files and give that a try with the RX 480.

The RX 480 does about 10% better than the R9 290 (122 ms vs. 134 ms) with the AMDGPU-PRO driver and its libOpenCL.so. Perhaps it will still do 10% better with the Catalyst 15 libOpenCL.so? If so, the RX 480 might get down to 60 ms. I’m curious to find out. Tomorrow, maybe.

@jtoomim Have you tested any High Bandwidth Memory GPUs yet?

Jonathan, as a correctness/coverage test, could you run your solver on the 10000 nonces from 10000 through 19999, and tell us the total number of solutions found?

@mjdecour We don’t have any HBM devices yet. I think our first one (a Fury) will arrive on Tuesday. I’m hopeful, but I’ve heard from ethash devs that you can’t make good use of the HBM unless your memory access size is large and/or sequential, so it’s possible they might be mediocre.

@tromp We’ve got a bug right now that prevents us from doing more than around 100 runs in a single execution. Until we fix that, testing 10k nonces would be inconvenient. We get 1.77 solutions in the first 50 nonces, averaged per run. I expect the difference between that and 1.88 is due to statistical variance.

We get 1.77 solutions in the first 50 nonces.

I just ran my solver with empty header on nonces 0 through 49 and get 94 total solutions, for an exact 1.88 average per run.

I guess our blake2b personalization differs…
I tried to exactly match that in str4d’s solver.
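As a rough sanity check on whether 1.77 vs. 1.88 over 50 runs is within normal statistical variance, one can compare the totals under a simple model. The sketch below assumes the number of solutions per run is approximately Poisson distributed, which is an assumption made for illustration and not something established in this thread; the function name `z_score_total` is hypothetical.

```python
import math


def z_score_total(observed_mean, expected_mean, runs):
    """Z-score of the observed total vs. the expected total.

    Assumes per-run solution counts are roughly Poisson, so the total over
    `runs` runs has variance equal to its expected value.
    """
    expected_total = expected_mean * runs
    observed_total = observed_mean * runs
    return (observed_total - expected_total) / math.sqrt(expected_total)


z = z_score_total(observed_mean=1.77, expected_mean=1.88, runs=50)
print(f"z = {z:.2f}")
```

Under that model the gap is a bit more than half a standard deviation, so the averages alone cannot distinguish a correct solver from one hashing different inputs. Comparing the exact solution counts per nonce would be a much stronger test.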

Our blake2b is missing personalization at the moment. My C code does it properly, but it hasn’t been implemented in the OpenCL code yet.

Our blake2b is missing personalization

Ok; I just disabled personalization in my code and reran to find a total of 100 solutions. Hmmm, still some unexplained difference in our blakes…

I wouldn’t be surprised. We haven’t spent much time on the OpenCL blake2b code, and haven’t checked its output for correctness yet.

Actually, I think our initial nonce value is not 0…