Nvidia’s New Dual GPU Die Architecture

Guest Post

Just yesterday Nvidia unveiled its new architecture, named Blackwell, at its developer conference. What’s interesting about Blackwell is that not only is the chip physically larger, it also combines two GPU dies.

That increase in size of course means there’s much more room for transistors: Blackwell has 208 billion transistors across its two dies, while the H100 has 80 billion, leading to a significant performance gain. In fact, Nvidia says Blackwell can do 20 petaflops while the previous H100 can only do 4 petaflops. That’s a fivefold increase in compute throughput.
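For a quick sanity check on those numbers, here is a minimal Python sketch that works out the ratios. It uses only the figures quoted in this post; the variable names are mine, and the petaflops figures are headline numbers rather than anything I have measured.

```python
# Headline figures quoted in this post (not independently measured).
BLACKWELL_TRANSISTORS = 208e9
H100_TRANSISTORS = 80e9
BLACKWELL_PETAFLOPS = 20.0
H100_PETAFLOPS = 4.0

# How much bigger each number got from one generation to the next.
transistor_ratio = BLACKWELL_TRANSISTORS / H100_TRANSISTORS
throughput_ratio = BLACKWELL_PETAFLOPS / H100_PETAFLOPS

# Quoted throughput gain relative to the transistor gain.
per_transistor_gain = throughput_ratio / transistor_ratio

print(f"Transistors: {transistor_ratio:.1f}x")
print(f"Quoted throughput: {throughput_ratio:.1f}x")
print(f"Throughput per transistor: {per_transistor_gain:.2f}x")
```

The transistor count grows by 2.6x while the quoted throughput grows by 5x, so on these figures the extra transistors alone don’t account for the whole gain.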

God only knows how hot Nvidia’s new systems will run.

That datacenter waste heat is going to be a significant environmental and efficiency problem in the future. I know people are already working on it, but solutions don’t seem to be keeping pace with the rise of artificial intelligence. AI is going to suck down electricity and spit out loads of heat. I wish I could see the plans for Microsoft’s newly built datacenters to see exactly how they’re managing heat.

Anyway, Blackwell will lead to very significant computing power gains for AI companies… if they can get their hands on chips, that is. We might even see an increase in generation speed for ChatGPT and DALL-E, which aren’t exactly the fastest responders around.

Now let’s ask AI how the increased transistor count from using both larger dies and a double die configuration affects performance…

The total transistor count in a GPU (Graphics Processing Unit) is a key factor that significantly affects its performance. This relationship is rooted in the basic functions of transistors within the GPU architecture and how they contribute to its overall capabilities. Here’s a breakdown of how the total transistor count impacts GPU performance:

1. Processing Power

  • Each transistor can be thought of as a tiny switch that can either block or allow the flow of electrical current. In a GPU, millions or even billions of these transistors work together to perform complex calculations required for rendering graphics. The higher the transistor count, the more processing units (cores) a GPU can contain, leading to greater parallel processing capabilities. This results in faster and more efficient rendering of graphics and handling of compute tasks.

2. Energy Efficiency

  • Modern transistors are designed to be as small as possible, allowing more of them to be packed into a given space. This miniaturization, the trend behind Moore’s Law, also leads to improvements in energy efficiency. Smaller transistors can perform the same amount of work at a lower energy cost, or more work at the same energy cost, compared to older designs. This is crucial for high-performance gaming and professional graphics workstations, as well as for mobile devices where battery life is important.

3. Functional Complexity

  • A higher transistor count allows for more sophisticated GPU architectures. This means not just more cores, but also the inclusion of specialized processing units for tasks like ray tracing, tensor operations (useful for AI computations), and advanced shading. These specialized units can dramatically improve performance and enable new features for graphics rendering and computational tasks that were not possible or too slow on older GPUs with fewer transistors.

4. Heat Dissipation

  • While having more transistors can lead to better performance, it also presents challenges in terms of heat production. More transistors switching on and off more frequently can generate a lot of heat, which must be effectively dissipated to prevent thermal throttling or damage. Advances in cooling technology and power management are therefore also essential to fully leverage the benefits of increased transistor counts.

5. Manufacturing Complexity and Cost

  • Increasing the transistor count can make the manufacturing process more complex and expensive. This is because higher transistor densities require more precise fabrication technologies, which can increase the cost of GPUs. However, consumers and professionals who require high-performance computing are often willing to pay these higher prices for the significant gains in performance.

In conclusion, the total transistor count is a fundamental indicator of a GPU’s potential performance. It influences not just the raw computational power, but also the efficiency, functionality, and even the cost of the GPU. As semiconductor technology continues to advance, finding innovative ways to increase transistor counts while managing heat and power consumption remains a key challenge for GPU manufacturers.
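To put a rough shape on the heat point in the answer above, here is a small Python sketch of the standard first-order model for CMOS switching power, P ≈ N · α · C · V² · f. The function name and every input value below are hypothetical placeholders chosen for illustration; they are not Blackwell or H100 specifications, and only the two transistor counts come from this post.

```python
def dynamic_power_watts(n_transistors, activity, capacitance_farads,
                        voltage, frequency_hz):
    """First-order switching power: N * alpha * C * V^2 * f."""
    return (n_transistors * activity * capacitance_farads
            * voltage ** 2 * frequency_hz)

# Hypothetical operating point, for illustration only.
ACTIVITY = 0.1          # fraction of transistors switching per cycle
CAPACITANCE = 1e-16     # switched capacitance per transistor, in farads
VOLTAGE = 0.8           # supply voltage, in volts
FREQUENCY = 1.5e9       # clock frequency, in hertz

base = dynamic_power_watts(80e9, ACTIVITY, CAPACITANCE, VOLTAGE, FREQUENCY)
more = dynamic_power_watts(208e9, ACTIVITY, CAPACITANCE, VOLTAGE, FREQUENCY)
cooler = dynamic_power_watts(208e9, ACTIVITY, CAPACITANCE, 0.7, FREQUENCY)

print(f"More transistors, same settings: {more / base:.2f}x the power")
print(f"Same transistors, lower voltage: {cooler / more:.2f}x the power")
```

In this model power grows in step with the transistor count but with the square of the voltage, which is why lowering voltage is such an effective lever for keeping heat in check. Real chips also leak power when idle, which this sketch ignores.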