Like many others, I was hyped when it came out. Then it turned out that BERT with half the param count still kicks their butts in accuracy.
They also only model the .real component and ignore the .imaginary component entirely, which you can't do and expect good results.
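To make the point about dropping the imaginary part concrete, here is a minimal NumPy sketch (my own toy example, not the model's actual code). For a real signal, the real part of the FFT only encodes the even (mirror-symmetric) part of the signal, so a signal and its circular reversal become indistinguishable. The names `x`, `x_rev`, and `recon` are just illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Circular time reversal: x_rev[n] = x[(-n) mod N].
x_rev = np.roll(x[::-1], 1)

X = np.fft.fft(x)
X_rev = np.fft.fft(x_rev)

# For real input, FFT(x_rev) is the complex conjugate of FFT(x),
# so the real parts match even though the signals differ.
same_real = np.allclose(X.real, X_rev.real)
same_signal = np.allclose(x, x_rev)

# Inverting only the real part recovers the even part of x, not x itself.
recon = np.fft.ifft(X.real).real
even_part = 0.5 * (x + x_rev)

print(same_real, same_signal, np.allclose(recon, even_part))
```

In other words, keeping only `.real` throws away everything about the signal that is antisymmetric under reversal, which for a sequence includes a good chunk of the ordering information.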
But, FFTs are so cool and under-explored that I'm sure they'll be making the rounds in NNs soon. There are lots of advantages to frequency space representations.
I don't understand why the first operations of a CNN aren't an FFT-based decomposition into spatial wavelets. That's basically all the first layers are doing anyway. In fact, the filters usually learn both an edge detector and a centroid detector at a given orientation. You can get both at the same time with complex wavelets.
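As a rough illustration of the "both at the same time" claim, here is a sketch using a complex Gabor filter, which is one common choice of complex wavelet (my assumption; other complex wavelets would work too). The helper `complex_gabor` and its parameter values are made up for this example. The real part is even-symmetric and fires on a centered bar, while the imaginary part is odd-symmetric and fires on a step edge at the same orientation.

```python
import numpy as np

def complex_gabor(size=21, wavelength=8.0, sigma=4.0, theta=0.0):
    """Hypothetical helper: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the filter's orientation.
    u = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * u / wavelength)

kernel = complex_gabor()
half = kernel.shape[0] // 2
y, x = np.mgrid[-half:half + 1, -half:half + 1]

# Two test patches centered on the filter:
edge = np.sign(x).astype(float)        # vertical step edge (odd in x)
bar = (np.abs(x) <= 1).astype(float)   # thin vertical bar (even in x)

# A single filter response is just the inner product with the patch.
edge_resp = np.sum(kernel * edge)
bar_resp = np.sum(kernel * bar)

print("edge:", edge_resp)  # real part ~0, imaginary part large
print("bar: ", bar_resp)   # real part large, imaginary part ~0
```

So one complex filter gives the edge response and the centered-feature response as its two components, and the magnitude `abs(response)` is roughly insensitive to which of the two you are looking at.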