The paper's stated goal seems to imply that it solves the problem of GPUs having an advantage over CPUs:
the requirement for fast verification so far made it an easy prey for
GPU-, ASIC-, and botnet-equipped users. The attempts to rely on
memory-intensive computations in order to remedy the disparity
between architectures have resulted in slow or broken schemes
However, the paper goes on to say:
Thus the total advantage of GPU over CPU is about the factor of 4
So the paper seems to refute itself: by its own numbers, it doesn't achieve the goal stated in the abstract. I don't get this at all.
Also, the study showing a 4x GPU advantage over CPUs is from 2011, but GPU performance has pulled steadily ahead of CPU performance in the past few years (GPUs benefit directly from Moore's law because you can fit roughly twice as many GPU cores every few years, whereas the number of cores per CPU has increased at a much slower rate). So the 2011 study seems largely irrelevant in 2016, and the GPU vs. CPU efficiency factor at the same price point might be something like 10x or 15x now.
Am I missing something, or does this totally not work?
Update: OK, so it seems that without factoring in electricity, the cost-efficiency advantage is 4x-10x for GPUs vs. CPUs. Now what I'm puzzling over is how Zcash can have only a 4-10x efficiency gain for GPUs vs. CPUs, whereas Ethereum has a 100x efficiency gain for dedicated GPU hardware versus typical user devices. Theoretically, if typical laptops are in fact memory-bound when running the Equihash/Ethash mining algorithms, then the gain should only be 4-10x.
However, I hypothesize that the actual bottleneck for typical laptops / user devices is in fact not memory, but the number of cores (e.g., for a laptop with 32 GB of RAM and just 2 cores, having more threads per core yields only a limited gain in mining efficiency, so you're effectively only using something like 2-4 GB of that RAM for mining).
For example, for Ethereum I saw numbers of about 30 MH/s for a GPU costing around $300. However, a MacBook Pro Retina mining on its GPU manages about 3 MH/s, and CPU mining on the same machine is about 0.3 MH/s, roughly 100x slower than the dedicated GPU. So my guess is that the optimum number of threads per core is just 1 or 2, so much less memory and memory bandwidth is used. Thoughts?
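To make the ratios I'm comparing explicit, here is a rough sketch of the arithmetic in Python. The hash rates are just the approximate figures quoted above, not benchmarks I ran, and the device labels and helper names (`speedup`, `mhs_per_dollar`) are made up for illustration.

```python
# Approximate Ethereum hash rates from the post, in MH/s.
# These are rough quoted figures, not measured benchmarks.
hashrates_mhs = {
    "dedicated_gpu": 30.0,  # GPU costing around $300
    "macbook_gpu": 3.0,     # MacBook Pro Retina, GPU mining
    "macbook_cpu": 0.3,     # MacBook Pro Retina, CPU mining
}


def speedup(fast, slow):
    """Raw hash-rate ratio between two devices (ignores price and power)."""
    return hashrates_mhs[fast] / hashrates_mhs[slow]


def mhs_per_dollar(device, price_usd):
    """Hash rate per dollar of hardware cost (ignores electricity)."""
    return hashrates_mhs[device] / price_usd


if __name__ == "__main__":
    pairs = [
        ("dedicated_gpu", "macbook_cpu"),
        ("dedicated_gpu", "macbook_gpu"),
        ("macbook_gpu", "macbook_cpu"),
    ]
    for fast, slow in pairs:
        print(f"{fast} vs {slow}: {speedup(fast, slow):.0f}x")
    # Only the dedicated GPU has a price quoted in the post.
    print(f"dedicated_gpu: {mhs_per_dollar('dedicated_gpu', 300):.2f} MH/s per dollar")
```

Note that this compares raw hash rate only. The 4-10x figure is a cost-efficiency ratio at the same price point, so the two numbers are not directly comparable unless the laptop's price is factored in as well.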