Compared to the HGX H200, the newer HGX B200 offers a massive performance boost for AI workloads: FP8, INT8, FP16/BF16, and TF32 Tensor Core throughput all improve by 125%, i.e. 2.25x the H200's numbers.
FP32 and FP64, however, see a smaller leap of around 18.5%, and FP64 Tensor Core performance actually takes a hit, dropping by about 40%; Blackwell clearly prioritizes low-precision AI formats over HPC-oriented double precision.
The B200 does shine in the memory department, offering greater total capacity (1.5 TB vs 1.1 TB) and double the NVSwitch GPU-to-GPU bandwidth (1,800 GB/s vs 900 GB/s). That faster interconnect is a game-changer for large-scale AI model training, where GPUs spend a meaningful share of each step synchronizing gradients, as the rough estimate below illustrates.
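To put the doubled interconnect in perspective, here is a back-of-envelope sketch (not a benchmark): the time for one idealized ring all-reduce of BF16 gradients across the 8 GPUs, for a hypothetical 70B-parameter model, at each system's per-GPU NVLink bandwidth.

```python
# Rough estimate: one ring all-reduce of BF16 gradients across 8 GPUs.
# The 70B-parameter model is a hypothetical example; real training runs
# overlap communication with compute, so treat these as illustrations.

PARAMS = 70e9          # hypothetical model size
BYTES_PER_PARAM = 2    # BF16 gradients
N_GPUS = 8

grad_bytes = PARAMS * BYTES_PER_PARAM
# A ring all-reduce moves 2 * (N - 1) / N of the buffer over each GPU's links.
traffic_per_gpu = 2 * (N_GPUS - 1) / N_GPUS * grad_bytes

for system, bw_gb_s in [("HGX H200", 900), ("HGX B200", 1800)]:
    seconds = traffic_per_gpu / (bw_gb_s * 1e9)
    print(f"{system}: ~{seconds * 1e3:.0f} ms per gradient all-reduce")

# HGX H200: ~272 ms per gradient all-reduce
# HGX B200: ~136 ms per gradient all-reduce
```

Halving that per-step communication cost is exactly where the doubled NVLink bandwidth pays off.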
However, the estimated price changes the picture. The B200 system costs about 21.5% more, so while the AI performance boost is big, the compute-per-dollar improvement is less dramatic, at around 85% for most AI operations (still huge). For workloads leaning on FP32 and FP64, you actually get slightly less bang for your buck (about -2.5%), and for FP64 Tensor Core workloads, substantially less (about -51%).
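To make the derived columns concrete: the performance difference is simply the B200/H200 throughput ratio, and compute per dollar divides that ratio by the price ratio (352,500 / 290,000 ≈ 1.2155). A minimal Python sketch, using figures from the table below, reproduces a few rows:

```python
# Reproduce the derived columns of the comparison table from the raw figures.
H200_PRICE = 290_000  # USD, estimated
B200_PRICE = 352_500  # USD, estimated
PRICE_RATIO = B200_PRICE / H200_PRICE  # ~1.2155, i.e. +21.55%

def pct(ratio: float) -> str:
    """Format a ratio as a signed percentage change."""
    return f"{(ratio - 1) * 100:+.2f}%"

# (feature, HGX H200 value, HGX B200 value); units match the table rows
rows = [
    ("FP8 Tensor Core (PFLOPS)",   32,  72),
    ("FP32 (TFLOPS)",             540, 640),
    ("FP64 Tensor Core (TFLOPS)", 540, 320),
]

for name, h200, b200 in rows:
    perf = b200 / h200                # raw throughput ratio
    per_dollar = perf / PRICE_RATIO   # throughput ratio per price ratio
    print(f"{name:28s} perf {pct(perf):>8s}   per dollar {pct(per_dollar):>8s}")

# FP8 Tensor Core (PFLOPS)     perf +125.00%   per dollar  +85.11%
# FP32 (TFLOPS)                perf  +18.52%   per dollar   -2.50%
# FP64 Tensor Core (TFLOPS)    perf  -40.74%   per dollar  -51.25%
```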
Feature | Unit | HGX H200 (8x H200 SXM) | HGX B200 (8x B200 SXM) | Performance Difference | Compute per Dollar Difference |
---|---|---|---|---|---|
INT8 Tensor Core | POPS | 32 | 72 | 125.00% | 85.11% |
FP4 Tensor Core | PFLOPS | – | 144 | – | – |
FP6 Tensor Core | PFLOPS | – | 72 | – | – |
FP8 Tensor Core | PFLOPS | 32 | 72 | 125.00% | 85.11% |
FP16/BF16 Tensor Core | PFLOPS | 16 | 36 | 125.00% | 85.11% |
TF32 Tensor Core | PFLOPS | 8 | 18 | 125.00% | 85.11% |
FP32 | TFLOPS | 540 | 640 | 18.52% | -2.50% |
FP64 | TFLOPS | 270 | 320 | 18.52% | -2.50% |
FP64 Tensor Core | TFLOPS | 540 | 320 | -40.74% | -51.25% |
Memory | TB | 1.1 | 1.5 | 36.36% | 12.18% |
NVSwitch GPU-to-GPU Bandwidth | GB/s | 900 | 1800 | 100.00% | 64.52% |
Total Aggregate Bandwidth | TB/s | 7.2 | 14.4 | 100.00% | 64.52% |
Estimated Price | USD | 290,000 | 352,500 | 21.55% | – |