LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Nov 24, 2024 · Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng
PDF · Cite · Code