4 GPU Programming

GPU architecture, using the RTX 5090 as an example. On the implementation side:

- 170 SMs (streaming multiprocessors).
- Each SM has 4 warp schedulers and can hold up to 64 resident warps.
  - Each warp is 32 execution contexts (threads), so one SM can hold up to \(64 \times 32 = 2048\) execution contexts.
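The per-SM arithmetic above scales to the whole chip. The following is a minimal Python sketch that takes the figures quoted in this section (170 SMs, 4 schedulers per SM, 64 warps per SM, 32 contexts per warp) as assumed inputs. `GpuSpec` and its methods are hypothetical names chosen for illustration, and `warps_per_scheduler` assumes the resident warps are split evenly across the schedulers. On real hardware, you can query these limits at runtime through the CUDA `cudaDeviceProp` fields `multiProcessorCount`, `maxThreadsPerMultiProcessor`, and `warpSize`, and it is worth checking the inputs against those values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical container for the per-SM limits quoted in the notes."""

    num_sms: int            # streaming multiprocessors on the chip
    schedulers_per_sm: int  # warp schedulers per SM
    max_warps_per_sm: int   # resident warps an SM can hold
    warp_size: int = 32     # execution contexts (threads) per warp

    def contexts_per_sm(self) -> int:
        # Resident execution contexts on one SM.
        return self.max_warps_per_sm * self.warp_size

    def warps_per_scheduler(self) -> int:
        # Assumes resident warps are split evenly across the schedulers.
        return self.max_warps_per_sm // self.schedulers_per_sm

    def total_warps(self) -> int:
        # Resident warps across all SMs.
        return self.num_sms * self.max_warps_per_sm

    def total_contexts(self) -> int:
        # Resident execution contexts across the whole GPU.
        return self.num_sms * self.contexts_per_sm()


# Figures from the RTX 5090 example above.
spec = GpuSpec(num_sms=170, schedulers_per_sm=4, max_warps_per_sm=64)
print(spec.contexts_per_sm())      # 2048
print(spec.warps_per_scheduler())  # 16
print(spec.total_warps())          # 10880
print(spec.total_contexts())       # 348160
```

Under these figures, filling every SM to its resident limit takes \(170 \times 2048 = 348160\) threads. That is on the order of \(10^5\) threads, far more than the number of SMs.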