Papers

(2024). HGRN2: Gated Linear RNNs with State Expansion. In COLM 2024.
(2024). Linear Attention Sequence Parallelism. arXiv preprint arXiv:2404.02882.
(2024). CO2: Efficient Distributed Training with Full Communication-Computation Overlap. In ICLR 2024 (Spotlight).
(2024). Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models. arXiv preprint arXiv:2401.04658.