Weigao Sun

Young Scientist

About Me

I am a Young Scientist at Shanghai AI Laboratory. I am fortunate to collaborate with Prof. Yu Cheng, working on the evolution of foundation model architectures, including algorithm and system co-innovations in Linear Sequence Modeling (Linear Attention) and Mixture-of-Experts.

See our research projects Linear-MoE, Linearization, and MoM for technical details. My previous work on Lightning Attention and the LASP series provides key techniques behind the MiniMax-01 456B LLM and VLM.

🔥 I am looking for talented interns to work with me on the above projects and beyond. If you are interested, please feel free to reach out with your CV or any questions.

From 2020 to 2022, I was an AI Researcher at Linx Lab, Turing Architecture and Design Department, 2012 Lab, Huawei, supervised by Jiashu Lin and Heng Liao, where I worked on large-scale distributed training algorithms (see CO2). I led a collaboration with Pengcheng Laboratory to complete the MLPerf V1.0 ranking, training ResNet50 and BERT on up to 1,000 Ascend 910 chips. While at Huawei, I also served as the project leader on the Huawei side of the Science and Technology Innovation 2030 – Next Generation Artificial Intelligence Major Project.

I earned my PhD from Huazhong University of Science and Technology (HUST) in April 2020, co-supervised by Prof. Hai-Tao Zhang and Prof. Ye Yuan, and was jointly trained at the School of Artificial Intelligence and Automation (AIA) and the HUST Innovation Institute (with the First-class Grant).

(Updated at 2025.05)

Interests
  • Linear Attention
  • Efficient Sequence Modeling
  • Mixture-of-Experts
  • Large Language Models
Recent Papers
(2025). A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond. In arXiv preprint arXiv:2503.21614.
(2025). Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts. In ICLR 2025-SCOPE Workshop (Oral).
(2025). Liger: Linearizing Large Language Models to Gated Recurrent Structures. In arXiv preprint arXiv:2503.01496.
(2025). MoM: Linear Sequence Modeling with Mixture-of-Memories. In arXiv preprint arXiv:2502.13685.
(2025). LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid. In arXiv preprint arXiv:2502.07563.
(2025). MiniMax-01: Scaling Foundation Models with Lightning Attention. In arXiv preprint arXiv:2501.08313.
Recent News