Moonshot AI's 'Attention Residuals' Paper Rethinks Transformer Architecture