MiniMax-01: Scaling Foundation Models with Lightning Attention · Jan 14, 2025 · Weigao Sun, et al.