Tensor Parallelism Under Pressure: What Breaks When You Scale Width
A field guide to the failure modes that emerge when you scale tensor parallelism across GPU ranks: communication bottlenecks, numerical drift, and load imbalance.
Magos Veridian
· 4 min read