Announced on 18 March 2024 by NVIDIA CEO Jensen Huang at GTC 2024
NVLink Enables Trillion-Parameter-Scale AI Models
Latest Blackwell Tensor Cores and TensorRT-LLM Compiler Reduce LLM Inference Operating Cost and Energy by up to 25 times
New Accelerators Enable Innovation in Data Processing, Electronic Design Automation, Computer-Aided Engineering and Quantum Computing
After months of anticipation, NVIDIA CEO Jensen Huang took the stage today at GTC 2024, “The #1 AI Conference for Developers”. His keynote kicked off the event, promising to reveal cutting-edge innovations that will power the new era of generative AI.
The keynote unveiled NVIDIA's latest breakthrough, the NVIDIA Blackwell architecture, which succeeds the NVIDIA Hopper architecture launched on 20 September 2022. The NVIDIA Blackwell GPU is named after David Harold Blackwell, a statistician and mathematician who specialised in game theory and statistics.
The NVIDIA Blackwell GPU will enable organisations across the globe to build and run real-time inference on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor. The NVIDIA Blackwell architecture features six transformative technologies for generative AI and accelerated computing, which will help unlock breakthroughs in data processing, electronic design automation, computer-aided engineering and quantum computing.
NVIDIA Blackwell GPUs are powered by six revolutionary technologies that will enable AI training and real-time LLM inference for models scaling up to 10 trillion parameters. The ground-breaking GPU will include the following features:
The architecture also adds chip-level capabilities that use AI-based preventative maintenance to run diagnostics and forecast reliability issues. This maximises system uptime and improves resiliency, allowing massive-scale AI deployments to run uninterrupted for weeks or even months at a time while reducing operating costs.
Based on the NVIDIA Blackwell architecture, the NVIDIA B200 Tensor Core GPU delivers a massive leap forward in speeding up inference workloads, making real-time performance possible for resource-intensive, multitrillion-parameter language models.
Two B200 GPUs are combined in Blackwell’s flagship accelerator, the NVIDIA GB200 Grace Blackwell Superchip, which also includes an NVIDIA Grace CPU. The GB200 provides up to a 30x performance increase compared to the NVIDIA H100 Tensor Core GPU for LLM inference workloads and reduces cost and energy consumption by up to 25x.
For the highest AI performance, GB200 supports the NVIDIA Quantum-X800 InfiniBand and Spectrum™-X800 Ethernet platforms which deliver advanced networking options at speeds up to 800 Gb/s. The GB200 NVL72 also includes NVIDIA BlueField®-3 data processing units to enable cloud network acceleration, composable storage, zero-trust security and GPU compute elasticity in hyperscale AI clouds.
The GB200 is a key component of the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale platform for the most compute-intensive workloads. It combines 36 Grace Blackwell Superchips, which together include 72 B200 GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink.
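The rack composition described here reduces to simple arithmetic. The following is a minimal Python sketch, not an NVIDIA tool: it multiplies the per-GPU "up to" figures quoted in this article's specification table (192 GB of HBM3e and 20 petaFLOPS of FP4 per GPU in the GB200 NVL72 column) by the GPU count. The constant and function names are hypothetical, and the totals are naive peak sums rather than measured performance.

```python
# Composition of one GB200 NVL72 rack, as stated in the article.
SUPERCHIPS_PER_RACK = 36
GPUS_PER_SUPERCHIP = 2   # two B200 GPUs per GB200 Superchip
CPUS_PER_SUPERCHIP = 1   # one Grace CPU per GB200 Superchip

# Per-GPU "up to" values from the article's specification table.
HBM_GB_PER_GPU = 192
FP4_PFLOPS_PER_GPU = 20


def rack_totals(superchips=SUPERCHIPS_PER_RACK):
    """Return naive peak totals for a rack with the given number of Superchips."""
    gpus = superchips * GPUS_PER_SUPERCHIP
    cpus = superchips * CPUS_PER_SUPERCHIP
    return {
        "gpus": gpus,
        "cpus": cpus,
        "hbm_tb": gpus * HBM_GB_PER_GPU / 1000,              # GB -> TB
        "fp4_exaflops": gpus * FP4_PFLOPS_PER_GPU / 1000,    # petaFLOPS -> exaFLOPS
    }


totals = rack_totals()
print(totals)
```

With the figures above, the sketch reports 72 GPUs and 36 CPUs, matching the counts in the text, along with roughly 13.8 TB of GPU memory and 1.44 exaFLOPS of peak FP4 compute per rack.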
To help accelerate the development of Blackwell-based servers from its partner network, NVIDIA announced NVIDIA HGX B200, a server board that links eight B200 GPUs through high-speed interconnects to develop the world’s most powerful x86 generative AI platforms. HGX B200 supports networking speeds up to 400 Gb/s through the Quantum-2 InfiniBand and Spectrum-X Ethernet platforms, along with support for BlueField-3 DPUs.
Check out the latest NVIDIA Blackwell GPU specifications for NVIDIA Blackwell GB200, B200 and B100:
| Per-GPU specification | NVIDIA Blackwell GB200 NVL72 | NVIDIA Blackwell B200 | NVIDIA Blackwell B100 |
| --- | --- | --- | --- |
| FP4 Tensor Core | 20 petaFLOPS | 18 petaFLOPS | 14 petaFLOPS |
| FP8/FP6 Tensor Core | 10 petaFLOPS | 9 petaFLOPS | 7 petaFLOPS |
| INT8 Tensor Core | 10 petaOPS | 9 petaOPS | 7 petaOPS |
| FP16/BF16 Tensor Core | 5 petaFLOPS | 4.5 petaFLOPS | 3.5 petaFLOPS |
| TF32 Tensor Core | 2.5 petaFLOPS | 2.2 petaFLOPS | 1.8 petaFLOPS |
| FP64 Tensor Core | 45 teraFLOPS | 40 teraFLOPS | 30 teraFLOPS |
| GPU memory | Up to 192 GB HBM3e | Up to 192 GB HBM3e | Up to 192 GB HBM3e |
| Bandwidth | Up to 8 TB/s | Up to 8 TB/s | Up to 8 TB/s |
| Multi-Instance GPU (MIG) | 7 | 7 | 7 |
| Decompression Engine | Yes | Yes | Yes |
| Decoders | 2x 7 NVDEC, 2x 7 NVJPEG | 2x 7 NVDEC, 2x 7 NVJPEG | 2x 7 NVDEC, 2x 7 NVJPEG |
| Power | Up to 1200 W | Up to 1000 W | Up to 700 W |
| Interconnect | 5th Generation NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s | 5th Generation NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s | 5th Generation NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s |
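To put the 192 GB memory figure in context with the trillion-parameter models mentioned earlier, the following rough Python sketch estimates how much memory the weights alone would occupy at different precisions. It rests on simplifying assumptions: bytes per parameter are taken directly from the format bit widths (FP4 = 0.5, FP8 = 1, FP16 = 2), and KV cache, activations and runtime overhead are ignored, so real deployments need more. The helper names are hypothetical.

```python
import math

# Storage per parameter, derived from the bit width of each format (assumption:
# no extra scaling factors or metadata are stored alongside the weights).
BYTES_PER_PARAM = {"FP4": 0.5, "FP8": 1.0, "FP16": 2.0}

# Per-GPU "up to" memory capacity from the specification table.
HBM_GB_PER_GPU = 192


def weight_memory_gb(num_params, precision):
    """Memory needed for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus_for_weights(num_params, precision, hbm_gb=HBM_GB_PER_GPU):
    """Smallest GPU count whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(num_params, precision) / hbm_gb)


ONE_TRILLION = 1_000_000_000_000
for precision in BYTES_PER_PARAM:
    size_gb = weight_memory_gb(ONE_TRILLION, precision)
    gpus = min_gpus_for_weights(ONE_TRILLION, precision)
    print(f"{precision}: {size_gb:.0f} GB of weights, at least {gpus} GPUs")
```

Under these assumptions, a one-trillion-parameter model needs about 500 GB for its weights at FP4, which is more than a single 192 GB GPU can hold, so such models have to be spread across several GPUs.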
NVIDIA Blackwell is the latest ground-breaking GPU architecture, announced on 18 March 2024 by NVIDIA CEO Jensen Huang at GTC 2024, “The #1 AI Conference for Developers”, in San Jose.
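NVIDIA Blackwell power consumption varies by product. According to the specifications above, each GPU draws up to 1200 W in the GB200 NVL72, up to 1000 W for the B200 and up to 700 W for the B100.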
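The per-GPU power limits in the "Power" row of the specification table can be turned into a rough upper bound on energy use. The Python sketch below is illustrative only: it assumes every GPU runs at its maximum rated power for the whole period and counts GPUs alone, leaving out CPUs, networking and cooling. The dictionary and function names are hypothetical.

```python
# Maximum per-GPU power in watts, from the "Power" row of the specification table.
MAX_POWER_W = {"GB200": 1200, "B200": 1000, "B100": 700}


def max_energy_kwh(num_gpus, watts_per_gpu, hours):
    """Upper-bound GPU energy in kWh, assuming constant draw at the rated maximum."""
    return num_gpus * watts_per_gpu * hours / 1000


# Example: the eight B200 GPUs of an HGX B200 board running for one day.
hgx_b200_daily_kwh = max_energy_kwh(8, MAX_POWER_W["B200"], 24)
print(hgx_b200_daily_kwh)
```

For eight B200 GPUs this gives an upper bound of 8 kW of GPU power, or 192 kWh over 24 hours.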