Weigao Sun

Linear Attention Sequence Parallelism

Apr 3, 2024
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong
PDF · Cite · Code
Last updated on Apr 3, 2024
