When Can Self-Attention Be Replaced by Feed Forward Layers?

When computing the hidden representation for each time step of a sequence, a self-attention layer has a global view of the entire sequence.
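This global view can be illustrated with a minimal NumPy sketch of scaled dot-product self-attention. It is a simplification, not the paper's model: the learned query/key/value projections are replaced by identity maps for brevity, and a single head is assumed. The point is that every output time step is a softmax-weighted mixture over all input time steps.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity
    query/key/value projections (an assumption for brevity).
    x: (T, d) sequence; returns (T, d) outputs where each row mixes
    information from every time step."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the full sequence
    return weights @ x                               # each output attends globally

T, d = 5, 4
x = np.random.default_rng(0).normal(size=(T, d))
out = self_attention(x)
```

Perturbing a single input step changes every output row, which is exactly the global receptive field a position-wise feed-forward layer lacks.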