One could build an FPGA system that beats GPU performance, but it won't be cheap, and cost will probably be the deciding factor. FPGAs can achieve extremely high throughput when latency isn't an issue, and they allow very flexible memory designs that can be tailored to a specific algorithm; Equihash offers many places where memory could be saved. In the end, though, getting meaningful speeds would still require extremely expensive FPGAs, likely coupled with extremely costly static RAM.
ASICs are typically even faster, but the up-front costs are prohibitive for the amount of money involved here. The resulting systems would also cost on the order of thousands of USD because of the RAM requirements. It doesn't make sense to spend that much on something that is still limited by memory bandwidth.
I think SoC FPGAs could provide somewhat better performance than GPUs while using cheaper GPU-style RAM. There is typically a very fast interface into the FPGA fabric for extending the processor's functions, and the processor could make it easier to pack memory so that accesses are more efficient. Still, the costs are going to be higher than a GPU's, though power usage would probably be lower. The biggest problem is that you would probably have to design your own board.
So in the end I think the only real thing blocking FPGAs is economics. Maybe five years down the road, when FPGAs with HBM are reasonably priced, things will be different.
With GPUs, companies like NVIDIA are working on ways to reduce power consumption, and that is the real cost. For example, with a 1050 Ti I'm getting around 140 Sol/s, but the laptop and card are pulling close to 80 W. Since my power is close to $0.25/kWh, electricity is about half of my costs.
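To make the electricity claim concrete, here is a quick back-of-envelope calculation using the figures from the example above (~80 W draw, $0.25/kWh); both numbers are taken from the post, not measured independently:

```python
# Rough electricity-cost estimate for the 1050 Ti example above.
watts = 80.0            # total draw of laptop + card (from the post)
price_per_kwh = 0.25    # local electricity price in USD (from the post)

kwh_per_day = watts * 24 / 1000          # energy used per day
cost_per_day = kwh_per_day * price_per_kwh

print(f"{kwh_per_day:.2f} kWh/day -> ${cost_per_day:.2f}/day")
# 1.92 kWh/day -> $0.48/day
```

Roughly $0.48/day, or about $14/month, which is easily on par with the coin revenue from a card in that class.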
The fact that the dev team would probably change the algorithm is another factor. That completely kills ASICs, and currently FPGAs wouldn't be a minor investment either because of the memory requirements. If N were 100, some FPGAs would probably have enough RAM blocks, but N is 200, and that means roughly 1000x more memory.
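The "1000x" figure follows from how Equihash's memory need scales with N. A sketch, assuming the standard parameterization where the initial list holds on the order of 2^(N/(K+1)+1) entries and K stays at 9 (as in Equihash(200,9)):

```python
# Back-of-envelope Equihash memory scaling.
# Assumption: initial list size ~ 2^(N/(K+1) + 1) entries, K = 9.
def list_entries(n, k=9):
    return 2 ** (n // (k + 1) + 1)

ratio = list_entries(200) / list_entries(100)
print(ratio)  # 1024.0 -- roughly the "1000x" mentioned above
```

Going from N=100 to N=200 at K=9 multiplies the list size by 2^10 = 1024, which is why on-chip RAM blocks that could hold the N=100 case fall far short for N=200.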