Elon Musk’s xAI has a new AI supercomputer that runs on a whopping 100,000 Nvidia GPUs.
That could help Musk’s company catch up to some AI frontrunners, like Meta.
But some tech leaders aren’t impressed.
Elon Musk’s AI company, xAI, recently unveiled a supercomputer dubbed Colossus. And as its name implies, it’s big.
The computer is an artificial-intelligence training system that Musk says runs on a whopping 100,000 Nvidia H100 chips, powerful graphics processing units that have become critical to the AI race.
To put that into perspective, Meta’s Llama 3 large language model was trained with 16,000 H100 chips. Meta said in March it would continue to invest in its AI infrastructure by adding two 24,000-chip clusters.
In other words, Musk’s Colossus is powerful. And it could help him catch up to the AI industry’s frontrunners.
But some prominent tech leaders are not so sure.
LinkedIn’s cofounder Reid Hoffman told The Information, a tech publication, that the xAI supercomputer was mere “table stakes” in the competitive field of generative AI.
According to The Information, Hoffman meant that Colossus only allows xAI to catch up to other, more advanced AI companies, like OpenAI and Anthropic.
Chris Lattner, the CEO of Modular AI, said during a panel discussion at The Information’s AI Summit last week that Musk’s heavy reliance on Nvidia’s expensive and finite chips was also inconsistent with the billionaire’s effort to build his own custom AI supercomputer, called Dojo.
Meta, Microsoft, Alphabet, and Amazon are all developing their own AI chips, even as they continue to stockpile Nvidia GPUs.
“The difference is that Elon has been working on Dojo for many years now,” Lattner told Business Insider in an email.
Musk has expressed concern about the challenges of acquiring more of Nvidia’s highly sought-after chips and said that his Dojo project will help decrease his company’s dependence on the chipmaker.
“We do see a path to being competitive with Nvidia with Dojo,” Musk said during a Tesla earnings call in July. “We kind of have no choice.”
When talking about Colossus on X this month, Musk said he aimed to double the supercomputer’s chips to 200,000 in a few months.
He said the cluster was built in just 122 days — an impressive feat that no other company has matched, The Information said.
It’s unclear whether Colossus runs 100,000 GPUs at the same time, which would require sophisticated networking technology and a lot of energy.
“Musk previously said the 100,000-chip cluster was up and running in late June,” The Information reported. “But at that time, a local electric utility said publicly that xAI only had access to a few megawatts of power from the local grid.”
Last month, CNBC reported that an environmental advocacy group had said that xAI was running gas turbines without authorization to produce more power for its data center.
The outlet reported that the Southern Environmental Law Center wrote in a letter to the local health department that xAI had installed and was operating at least 18 unpermitted turbines, “with more potentially on the way,” to supplement its massive energy needs.
The local utility, Memphis Light, Gas and Water, told CNBC it had provided 50 megawatts of power to xAI since the beginning of August but that the facility required an additional 100 megawatts to operate.
Data-cluster developers told The Information that a few megawatts could power only a few thousand GPUs. Musk’s company would need another electric substation to get enough power to run 100,000 chips.
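As a rough check on these power figures, the sketch below estimates how many GPUs a given power budget could run. It rests on assumptions that are not from the reporting: about 700 watts per H100 (Nvidia's stated maximum for the SXM version) and a hypothetical 1.5x overhead factor for CPUs, networking, and cooling. Real facilities vary, so treat the numbers as ballpark only.

```python
# Back-of-the-envelope estimate of GPU count versus power budget.
# Assumptions (not from the article):
#   H100_WATTS - roughly the maximum board power of an H100 SXM GPU
#   OVERHEAD   - hypothetical multiplier for CPUs, networking, and cooling
H100_WATTS = 700
OVERHEAD = 1.5


def gpus_supported(megawatts: float) -> int:
    """Return how many GPUs the given power budget could run."""
    watts_per_gpu = H100_WATTS * OVERHEAD
    return int(megawatts * 1_000_000 // watts_per_gpu)


def megawatts_needed(gpus: int) -> float:
    """Return the power, in megawatts, needed to run the given GPU count."""
    return gpus * H100_WATTS * OVERHEAD / 1_000_000


if __name__ == "__main__":
    for mw in (3, 50, 150):
        print(f"{mw} MW -> about {gpus_supported(mw):,} GPUs")
    print(f"100,000 GPUs -> about {megawatts_needed(100_000):.0f} MW")
```

Under these assumptions, 3 megawatts would run fewer than 3,000 GPUs, while 100,000 GPUs would need about 105 megawatts.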
Hoffman and Musk did not immediately respond to requests for comment from BI.