
Elon Musk shows off Cortex AI supercluster — first look at Tesla’s 50,000 Nvidia H100s


Elon Musk’s supercomputing exploits continue to press forward this week, as Musk shared a video of his newly renamed “Cortex” AI supercluster on X. The recent expansion to Tesla’s “Giga Texas” plant will contain 70,000 AI servers and will require 130 megawatts (MW) of power and cooling at launch, scaling up to 500 MW by 2026.

Musk’s video of the Cortex supercluster shows off the in-progress assembly of a staggering number of server racks. From the fuzzy footage, the racks appear to be laid out in rows of 16 compute racks each, with four or so non-GPU racks splitting the rows. Each compute rack holds eight servers. Somewhere between 16 and 20 rows of server racks are visible in the 20-second clip, so rough napkin math suggests around 2,000 GPU servers can be seen, less than 3% of the estimated full-scale deployment.
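That napkin math can be sketched in a few lines. The rack, server, and row counts below are estimates read off the fuzzy clip, not official Tesla figures:

```python
# Napkin math for the visible portion of the Cortex build-out.
# All inputs are rough estimates from the 20-second video.
compute_racks_per_row = 16   # estimated compute racks per row
servers_per_rack = 8         # servers per compute rack
visible_rows = 16            # low end of the 16-20 rows visible

visible_servers = compute_racks_per_row * servers_per_rack * visible_rows
print(visible_servers)  # 2048, i.e. roughly 2,000 servers

full_deployment = 70_000     # announced AI server count for Giga Texas
share = visible_servers / full_deployment
print(f"{share:.1%}")  # 2.9%, i.e. less than 3% of full scale
```

Even at the high end of 20 visible rows, the clip would show about 2,560 servers, still under 4% of the planned deployment.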

Musk shared in Tesla’s July earnings call that the Cortex supercluster will be Tesla’s largest training cluster to date, containing “50,000 [Nvidia] H100s, plus 20,000 of our hardware.” This is a smaller number than Musk previously shared, with tweets from June estimating Cortex would house 50,000 units of Tesla’s Dojo AI hardware. Previous remarks from the Tesla CEO also suggest that Tesla’s own hardware will come online at a later date, with Cortex expected to be solely Nvidia-powered at launch.

The Cortex training cluster is being built to “solve real-world AI,” per Musk’s posts on X. According to Tesla’s Q2 2024 earnings call, that means training Tesla’s Full Self-Driving (FSD) autopilot system—which will power consumer Teslas and the promised “Cybertaxi” product—and training AI for the Optimus robot, an autonomous humanoid expected to enter limited production in 2025 for use in Tesla’s manufacturing process.

Cortex first turned heads in the press thanks to the massive fans under construction to chill the entire supercluster, shown off by Musk in June. The fan stack cools the Supermicro-provided liquid cooling solution, built to handle the eventual 500 MW power-and-cooling load. For context, an average coal power plant outputs around 600 MW.

Cortex joins Elon Musk’s stable of supercomputers in development. So far, the first of Musk’s data centers to become operational is the Memphis Supercluster, owned by xAI and powered by 100,000 Nvidia H100s. All 100,000 of Memphis’ GPUs are connected on a single RDMA (remote direct memory access) fabric, and are likewise cooled with help from Supermicro. Musk has also announced plans for a $500 million Dojo supercomputer in Buffalo, New York, another Tesla operation.

The Memphis Supercluster is also expected to upgrade its H100 base to 300,000 B200 GPUs, but delays in Blackwell production due to design flaws have pushed that massive order back by several months. As one of the largest single customers of Nvidia AI GPUs, Musk seems to be following Jensen Huang’s CEO math: “The more you buy, the more you save.” Time will tell whether that rings true for Musk and his supercomputer collection.
