DDR3 1333 is maxed out at 21 S/s on the best CPUs and the xenoncat code. DDR3 1600 and DDR4 2400 results are in agreement with this (when using 2 cards).
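For a rough sense of the bandwidth numbers behind these comparisons, theoretical peak bandwidth is the transfer rate times the bus width in bytes. Below is a minimal Python sketch of that formula. It assumes "2 cards" means two DIMMs in dual-channel mode (2 × 64-bit). The function name `peak_bandwidth_gbs` is hypothetical, and the GDDR5 line is an illustrative bus, not a specific card. Real sustained bandwidth is lower than these peaks.

```python
# Theoretical peak memory bandwidth from transfer rate and bus width.
# This is a textbook peak figure; measured (sustained) bandwidth is lower.

def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfer_rate_mts: effective transfers per second in millions (1333 for DDR3-1333)
    bus_width_bits: total data bus width (64 bits per DDR3 channel)
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Assumption: "2 cards" = two DIMMs running dual channel (2 x 64-bit = 128-bit).
    for speed in (1333, 1600):
        single = peak_bandwidth_gbs(speed, 64)
        dual = peak_bandwidth_gbs(speed, 128)
        print(f"DDR3-{speed}: {single:.1f} GB/s single channel, {dual:.1f} GB/s dual channel")
    # The same formula applies to a GPU bus: hypothetical 256-bit GDDR5 at 7000 MT/s.
    print(f"256-bit GDDR5 @ 7000 MT/s: {peak_bandwidth_gbs(7000, 256):.0f} GB/s")
```

This prints about 10.7/21.3 GB/s for DDR3-1333, 12.8/25.6 GB/s for DDR3-1600, and 224 GB/s for the example GPU bus.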
The Equihash paper says the GTX480's GDDR5 bus was 134 GB/s vs 17 GB/s for DDR3 1600. Using this as a baseline, together with the CPU observations above, and looking at the GDDR5 clock (MHz) and bus width of each GPU, I get the following:
RX 470 4GB: 5.4x faster than DDR3 1333 = 114 S/s max (my baseline)
RX 480 4GB: bus is 7% faster than the RX 470 = 121 S/s max
RX 480 8GB: bus is 21% faster = 138 S/s max (best buy?)
R9 270x 2GB: bus is 25% faster = 142 S/s max
R9 280x 3GB: bus is 34% faster = 152 S/s max
R9 290 4GB: bus is 49% faster = 170 S/s max
HD 7850 2GB: bus is 75% as fast = 85 S/s max
GTX 1070 8GB: bus is 20% faster = 136 S/s max (worst buy?)
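The list above is linear scaling: take the baseline rate and multiply by each card's bus bandwidth relative to the RX 470 4GB. Here is a small Python sketch of that arithmetic. The relative factors are copied from the list, and the names (`RELATIVE_BUS`, `estimate_max_sps`) are hypothetical. It assumes throughput scales exactly with bandwidth, which is the premise of the post rather than a measured result.

```python
# Linear scaling of the baseline S/s by relative bus bandwidth.
# Factors are the percentages stated in the list (relative to the RX 470 4GB).

CPU_DDR3_1333_SPS = 21   # best CPU result with the xenoncat code (from the text)
RX470_MULTIPLIER = 5.4   # RX 470 4GB vs DDR3 1333 (from the text)

RELATIVE_BUS = {
    "RX 470 4GB": 1.00,
    "RX 480 4GB": 1.07,
    "RX 480 8GB": 1.21,
    "R9 270x 2GB": 1.25,
    "R9 280x 3GB": 1.34,
    "R9 290 4GB": 1.49,
    "HD 7850 2GB": 0.75,
    "GTX 1070 8GB": 1.20,
}


def estimate_max_sps(baseline_sps: float, relative_bus: dict[str, float]) -> dict[str, float]:
    """Scale a baseline rate linearly by each card's relative memory bandwidth."""
    return {card: baseline_sps * factor for card, factor in relative_bus.items()}


if __name__ == "__main__":
    # 21 * 5.4 = 113.4 S/s; the list rounds this up to 114 S/s as its baseline.
    print(f"raw baseline: {CPU_DDR3_1333_SPS * RX470_MULTIPLIER:.1f} S/s")
    baseline = 114
    for card, sps in estimate_max_sps(baseline, RELATIVE_BUS).items():
        print(f"{card}: {sps:.0f} S/s max")
```

The printed values can differ from the list by 1 S/s because the list rounds some results up and truncates others.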
The R9 Nano and Fury use HBM instead of GDDR5, so I can't compare them directly, except by using Optiminer's data: he has a Nano running 38% faster than where GDDR5 memory would have placed it with his code (GDDR5 would put it at 127 S/s). So I get these future max values:
R9 Nano: 187 S/s max
Fury: 374 S/s ??
Compare this to all the reported results and I think you'll see that memory bus bandwidth determines how fast every card and CPU can go.
The RX 470's power draw is not maxed out, but its GDDR5 bandwidth is.
Therefore devs should spend more core power (extra computation) to spare the GDDR5 bus.
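One way to see why spending core power to save bus traffic helps is a roofline-style bound, a standard performance model swapped in here for illustration: throughput is the lower of a compute cap and a memory cap (bandwidth divided by bytes moved per solution). The Python sketch below uses entirely hypothetical numbers and a hypothetical function name, `attainable_sps`; it is not based on any miner's actual figures.

```python
# Roofline-style sketch: throughput = min(compute cap, memory cap).
# Every number here is hypothetical, chosen only to show the trade-off.

def attainable_sps(compute_limit_sps: float, bandwidth_gbs: float,
                   gb_per_solution: float) -> float:
    """Return min(compute cap, bandwidth / memory traffic per solution)."""
    memory_limit_sps = bandwidth_gbs / gb_per_solution
    return min(compute_limit_sps, memory_limit_sps)


if __name__ == "__main__":
    bandwidth = 200.0  # GB/s, hypothetical card
    # Light on compute, heavy on memory traffic: memory bound at 100 S/s.
    before = attainable_sps(compute_limit_sps=200.0, bandwidth_gbs=bandwidth,
                            gb_per_solution=2.0)
    # Spend extra core work (e.g. recomputing instead of storing) to cut traffic:
    # the compute cap drops to 180, but the memory cap rises to about 133.
    after = attainable_sps(compute_limit_sps=180.0, bandwidth_gbs=bandwidth,
                           gb_per_solution=1.5)
    print(f"before: {before:.1f} S/s, after: {after:.1f} S/s")
```

This prints `before: 100.0 S/s, after: 133.3 S/s`: as long as the memory cap is the one that binds, trading compute for less bus traffic raises the rate.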
GPU algorithms can become as efficient as the CPU algorithm if they do not parallelize the code in an inefficient way. So far they are doing well, since the watts are not maxed out. From here they will probably have to parallelize in less efficient ways, so the numbers above are upper limits, assuming the CPUs are already maxed out.