Flash Attention Memory Footprint: What Your GPU Actually Allocates During Prefill
A practical breakdown of how FlashAttention allocates GPU memory during prefill, where the pressure points are, and how to diagnose OOM failures before they surprise you.
Magos Veridian
4 min read