Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Jan 9, 2024 · Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

Last updated on Jan 9, 2024

Authors: Weigao Sun (Young Scientist)