Linear Attention Sequence Parallelism

Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

Apr 3, 2024