Papers

(2025). A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond. In arXiv preprint arXiv:2503.21614.
(2025). Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts. In ICLR 2025 SCOPE Workshop (Oral).
(2025). Liger: Linearizing Large Language Models to Gated Recurrent Structures. In arXiv preprint arXiv:2503.01496.
(2025). MoM: Linear Sequence Modeling with Mixture-of-Memories. In arXiv preprint arXiv:2502.13685.
(2025). LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid. In arXiv preprint arXiv:2502.07563.
(2025). MiniMax-01: Scaling Foundation Models with Lightning Attention. In arXiv preprint arXiv:2501.08313.
(2024). LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training. In arXiv preprint arXiv:2411.15708.
(2024). Scaling Laws for Linear Complexity Language Models. In EMNLP 2024.
(2024). Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention. In ICML 2024.
(2024). Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective. In ICML 2024 Workshop.