RWKV6, xLSTM, Mamba, Griffin, GLA, HGRN2, ... are all similar: "matrix-valued dynamic exponential decay" 🙂 The only differences are sharing some parameters / adding some tweaks / adding some attention (hybrid).
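To make "matrix-valued dynamic exponential decay" concrete, here is a minimal sketch of the generic recurrence S_t = diag(w_t) S_{t-1} + k_t v_t^T with readout o_t = S_t^T q_t. It is not the exact formulation of any of the models above. The function name, the argument names and the gate parametrization w_t = exp(-exp(g_t)) are assumptions chosen for illustration; the parametrization is just one way to keep the decay in (0, 1).

```python
import math

def decay_recurrence(qs, ks, vs, gs):
    """Generic sketch: S_t = diag(w_t) S_{t-1} + k_t v_t^T, o_t = S_t^T q_t.

    qs, ks, gs: lists of length-d_k vectors; vs: list of length-d_v vectors.
    gs holds hypothetical pre-activation gate values computed from the input,
    so the decay w_t is data-dependent ("dynamic").
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    # The state is a d_k x d_v matrix ("matrix-valued"), starting at zero.
    S = [[0.0] * d_v for _ in range(d_k)]
    outs = []
    for q, k, v, g in zip(qs, ks, vs, gs):
        # Assumed parametrization: w = exp(-exp(g)) always lies in (0, 1).
        w = [math.exp(-math.exp(gi)) for gi in g]
        for i in range(d_k):
            for j in range(d_v):
                # Decay row i of the old state, then add the outer product k v^T.
                S[i][j] = w[i] * S[i][j] + k[i] * v[j]
        # Read out with the query: o_j = sum_i q_i * S[i][j].
        outs.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return outs
```

In this picture, making w_t a scalar instead of a vector, or tying it to k_t, would be examples of the parameter-sharing differences mentioned above.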
RWKV6 is the most battle-tested AFAIK: 7B dense @ 2.5T (attention-free to…