Nvidia researchers have published a paper detailing the various ways the company is exploring how Multi-Chip Module (MCM) designs can be deployed in future products.

As computing becomes more and more heterogeneous, Nvidia seems to be looking for a way to add flexibility to its semiconductor designs. This could be achieved by "mixing and matching" different hardware blocks according to the intended workloads, and that's exactly where MCM comes in.

The first factual information on Nvidia's research into MCM came to light in 2017, when the company demonstrated how an MCM design with four chiplets could outperform the biggest monolithic GPU that could be built at the time by a whopping 45.5%. Cutting up a large die into several smaller ones helps improve yields (smaller dies have fewer chances of carrying critical manufacturing defects) and allows more computing resources to be chained together than a single, monolithic die ever could. Of course, being smaller, these chips should also offer better thermals and power efficiency than their larger brethren.

Nvidia's doubling down on MCM GPUs is called the Composable On Package GPU, or COPA. This latest research piece is more concerned with how Nvidia will handle the increasing differentiation between HPC and AI workloads, which have been drifting apart for a while now.

[Diagram: a monolithic GPU, which crams all the execution units and caches of a true general-purpose GPU onto a single die, compared against composable designs]

Clearly, Nvidia is concerned that its single-product approach (read: the GA100 accelerator and its predecessors) will start losing ground to increasing workload specialization in those areas. COPA allows different hardware blocks to be mixed and matched, favoring certain workload requirements to the detriment of others, and enables a larger number of more specialized (and more performant) chip designs. To that effect, Nvidia has been simulating how different MCM designs and configurations could let it pick the required hardware blocks for each workload.

The paper shows how a 25% memory bandwidth reduction slows down HPC workloads by an average of just 4%. Cutting the available bandwidth by a further 25% added another 10% performance penalty. So with 50% less memory bandwidth (and the hardware that enables it removed), Nvidia can claw back die space for other, more appropriate hardware blocks that deliver more performance for the target workload than was lost.
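To put those numbers together, here is a minimal back-of-the-envelope sketch in Python. The slowdown figures are the averages quoted above; the break-even framing is a simplification for illustration, not Nvidia's methodology.

```python
# Back-of-the-envelope model of the bandwidth trade-off described
# above. The slowdown figures are the paper's quoted HPC averages;
# the break-even math is an illustrative simplification.

# Cumulative average HPC slowdown per memory-bandwidth reduction:
# a 25% cut costs ~4%, and a further 25% cut adds ~10 points more.
slowdown = {0.25: 0.04, 0.50: 0.14}

for cut, penalty in slowdown.items():
    print(f"{cut:.0%} less bandwidth -> {1 - penalty:.2f}x of baseline")

# The wager: halving bandwidth (and dropping the PHYs and memory
# controllers that feed it) frees die area for workload-specific
# blocks. The swap pays off once those blocks buy back the ~14% lost:
break_even = 1 / (1 - slowdown[0.50])
print(f"Reallocated blocks must deliver a >= {break_even:.2f}x speedup "
      "to come out ahead")
```

In other words, under these figures the reclaimed silicon only has to speed up the target workload by roughly 16% for the trade to be a net win.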
Not all hardware blocks are made equal, however. Certain blocks can't currently be separated without incurring extreme performance penalties. COPA is Nvidia's attempt to simulate the effects of multiple chiplet design decisions and how they relate to performance.

The company's approach prioritizes the high-margin HPC and AI markets first, which makes sense, especially considering how multiple companies have been encroaching on that space with their own custom solutions (for example, Cerebras with its Wafer Scale Engine and Lightelligence with its photonics-based PACE).

However, this same workload-driven semiconductor philosophy can be applied across Nvidia's GPU-based product stack, including consumer GeForce. MCM for GeForce does present more difficulties, of course. Scaling workloads that are already designed to be split across potentially thousands of nodes in a supercomputer is inherently different from scaling real-time gaming workloads. In fact, Nvidia has basically pulled the plug on SLI (Scalable Link Interface) multi-GPU gaming solutions.
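As a closing illustration of the mix-and-match composition described above, here is a toy sketch. Every block name, area figure, and workload weight in it is an invented placeholder rather than a value from Nvidia's research; it only shows how one shared library of blocks can be recombined into differently specialized designs.

```python
# Toy sketch of composing domain-specific GPUs from a shared set of
# hardware blocks, in the spirit of COPA. All block names, area
# figures, and workload weights are invented for illustration.

BLOCKS = {
    # block: (die-area units, HPC weight, deep-learning weight)
    "fp64_units":   (30, 1.0, 0.2),
    "tensor_cores": (30, 0.3, 1.0),
    "large_cache":  (25, 0.4, 0.9),
    "hbm_phy_full": (20, 0.8, 0.6),
    "hbm_phy_half": (10, 0.7, 0.5),  # half the bandwidth, half the area
}

AREA_BUDGET = 85  # arbitrary die budget, same for every design

def area(config):
    """Total die area consumed by a configuration."""
    return sum(BLOCKS[b][0] for b in config)

def score(config, workload):
    """Crude performance proxy: sum of block weights (1 = HPC, 2 = DL)."""
    return sum(BLOCKS[b][workload] for b in config)

# Two compositions of the same building blocks, each trading away
# what its target workload needs least:
designs = {
    "HPC-oriented": ["fp64_units", "large_cache", "hbm_phy_full"],
    "DL-oriented":  ["tensor_cores", "large_cache", "hbm_phy_half"],
}

for name, cfg in designs.items():
    assert area(cfg) <= AREA_BUDGET, f"{name} exceeds the die budget"
    print(f"{name}: area={area(cfg)}, "
          f"HPC={score(cfg, 1):.1f}, DL={score(cfg, 2):.1f}")
```

Each design scores highest on its own workload proxy, which is the whole point of composability: neither die carries silicon its target market barely uses.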