
GB200 NVL72 - Powering the new era of computing.

GB200 NVL72 is an ODM co-development project in which hundreds of engineers from Nvidia and Supermicro worked together for over a year. It stands as the most powerful AI and HPC infrastructure on Earth.

Executive Summary

The GB200 NVL72 SuperCluster features a new, advanced in-rack coolant distribution unit (CDU) and custom cold plates designed for the compute trays that house the NVIDIA GB200 Grace™ Blackwell Superchips. Drawing on extensive experience deploying large-scale direct-to-chip liquid-cooled (DLC) AI systems, Supermicro's liquid-cooling technology powers NVIDIA GB200 NVL72, an exascale computer in a single rack that delivers up to 25x more performance than the previous generation at the same power.

The GB200 NVL72 delivers exascale computing in a single rack with fully integrated liquid cooling. It incorporates 72 NVIDIA Blackwell GPUs and 36 Grace CPUs, interconnected by NVIDIA's largest NVLink™ network to date. The NVLink Switch System provides 130 terabytes per second (TB/s) of total low-latency GPU communication bandwidth, boosting performance for AI and high-performance computing (HPC) workloads.
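As a quick sanity check on the rack-level figures above, the sketch below divides the quoted totals per device. It assumes the 130 TB/s aggregate is split evenly across all 72 GPUs, which is a simplification; the function names are hypothetical and used only for illustration. The result, roughly 1.8 TB/s per GPU, lines up with the per-GPU bandwidth NVIDIA quotes for fifth-generation NVLink, and the 2:1 GPU-to-CPU ratio matches the Grace Blackwell Superchip, which pairs one Grace CPU with two Blackwell GPUs.

```python
# Rack-level figures quoted for one GB200 NVL72 rack.
TOTAL_NVLINK_TBPS = 130  # aggregate GPU communication bandwidth, TB/s
NUM_GPUS = 72            # Blackwell GPUs per rack
NUM_CPUS = 36            # Grace CPUs per rack


def per_gpu_bandwidth(total_tbps: float, gpus: int) -> float:
    """Average NVLink bandwidth per GPU, assuming (hypothetically) an even split."""
    return total_tbps / gpus


def gpus_per_cpu(gpus: int, cpus: int) -> float:
    """Number of Blackwell GPUs per Grace CPU in the rack."""
    return gpus / cpus


if __name__ == "__main__":
    print(f"{per_gpu_bandwidth(TOTAL_NVLINK_TBPS, NUM_GPUS):.2f} TB/s per GPU")
    print(f"{gpus_per_cpu(NUM_GPUS, NUM_CPUS):.0f} GPUs per Grace CPU")
```

Running it prints about 1.81 TB/s per GPU and 2 GPUs per Grace CPU.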

My main responsibilities

Tracked product development progress, test results, and HW, SW, and FW compatibility from the component level through the L10 and L11 integration stages.

Requested product allocations from external partners, and planned and managed limited internal resources to support the engineering team's needs.

Coordinated cross-functional teams to request and prioritize resources for the project, collaborating with nearly every department within the company.

Co-launched the product with Nvidia during the GTC 2024 keynote. Built a static demo rack for the company booth and published announcements across multiple marketing channels, drawing significant attention across the semiconductor industry.

Hosted several weekly product alignment meetings with internal and external stakeholders to review product status. Established effective communication and escalation channels and created the necessary standard operating procedures (SOPs).

The project is ongoing.

Highlight

Enabled the company to become the first and only Nvidia ODM partner to successfully boot up the entire infrastructure.

© 2024-Present Xuedinan Gao. All Rights Reserved.