AMD says creaky old servers are a big problem for AI

  • AMD says clunky old servers are taking up much-needed data center power

  • Upgrading and consolidating servers could free up power that can then be used for AI

  • Dell’Oro Group’s Baron Fung agreed consolidation can yield footprint and power savings, but noted there are good reasons enterprises and hyperscalers are stretching replacement cycles

High-power artificial intelligence chips from the likes of Nvidia and AMD have been stealing headlines. But is there room for them in data center server racks? Perhaps – and there could be more room, lots more, if aging servers were cleared out.

Robert Hormuth, corporate VP of architecture and strategy for AMD’s Data Center Solutions Group, told Silverlinings there are millions of old servers gumming up the works in data centers. That’s a big problem both for data center operators, who are already struggling with power constraints and rising energy costs across the globe, and for enterprises looking to add AI servers to the mix to take advantage of everything the technology has to offer.

“The vacancy rate of colo[cation facilities] and data centers is at an all-time low around the world,” Hormuth said.

“Most of the new construction that is going on is already allocated…that’s hurting a lot of enterprises as they plan the future. How do you bring AI into your business if you have no more power, space, cooling?”

Hormuth and data from Gartner indicated that part of the problem is that enterprises and cloud providers alike have been stretching IT refresh cycles to rein in opex and cut costs. So, replacement timelines that once ranged from three to five years have been pushed to five to seven years.

But given recent advancements in packaging, process nodes and components, those aging servers are wasting a lot of power.

As Hormuth explained, a single higher-wattage chip is more efficient than multiple lower-power chips totaling the same wattage (one 1,000-watt chip versus two 500-watt chips, for instance). That’s because a lot of power is lost in the communication channels when two chips need to talk to each other.

“You end up spending a lot of power trying to get from chip A to chip B,” he said. “Any copper trace is resistance. The further you go, the more resistance, the harder you have to drive – the more electricity you have to push to get from point A to point B because of that inherent resistance…when you integrate it together, I can use a lot lower power [serializer/deserializer].”
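Hormuth’s point can be sketched with some back-of-envelope math. The energy-per-bit figures below are illustrative assumptions, not AMD data: board-level serializer/deserializer links between separate chips commonly cost several picojoules per bit, while short on-package die-to-die links can cost well under one.

```python
# Back-of-envelope: power spent moving data between chips.
# Energy-per-bit values are assumed for illustration only.

PJ = 1e-12  # joules per picojoule


def link_power_watts(bandwidth_gbps: float, energy_pj_per_bit: float) -> float:
    """Power drawn by a link carrying `bandwidth_gbps` gigabits per second."""
    bits_per_second = bandwidth_gbps * 1e9
    return bits_per_second * energy_pj_per_bit * PJ


bandwidth = 500  # Gb/s of chip-to-chip traffic (assumed)

# Two discrete chips talking over board-level copper traces:
board_level = link_power_watts(bandwidth, energy_pj_per_bit=6.0)
# The same traffic over a short, integrated on-package link:
on_package = link_power_watts(bandwidth, energy_pj_per_bit=0.5)

print(f"Board-level SerDes: {board_level:.2f} W")
print(f"On-package link:    {on_package:.2f} W")
```

Under these assumed figures, integrating the two dies cuts the interconnect power for the same traffic by more than an order of magnitude, which is the “lower power [serializer/deserializer]” effect Hormuth describes.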

Fewer servers, more power

This is where server consolidation comes in.

By upgrading to servers with newer chips, Hormuth continued, organizations can consolidate their gear at “incredible ratios.” For instance, an enterprise could go from six or eight servers down to one, depending on the age of the original servers.

He added one AMD customer was able to consolidate 15 servers to three. That move, he said, left enough power for the company to add an AI server into its rack.

How much power are we talking here? Well, Hormuth said that depending on its load, memory and storage configuration, a general purpose server can consume around 2,200 watts of power. AI servers, meanwhile, can run into the 5,000- to 6,000-watt range, he said.
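Those figures make it easy to see how a 15-to-3 consolidation could free up room for an AI server. The per-server wattages below are assumptions for illustration; the article only gives a ~2,200-watt figure for a loaded general-purpose server and a 5,000- to 6,000-watt range for an AI server.

```python
# Rough math on the 15-to-3 consolidation anecdote. Wattages are assumed.

OLD_SERVER_W = 1_000   # assumed draw of each aging general-purpose server
NEW_SERVER_W = 2_200   # assumed draw of each fully loaded replacement server
AI_SERVER_W = 6_000    # top of the range Hormuth cited


def power_freed(old_count: int, new_count: int) -> int:
    """Watts recovered by replacing `old_count` old servers with `new_count` new ones."""
    return old_count * OLD_SERVER_W - new_count * NEW_SERVER_W


freed = power_freed(15, 3)
print(f"Power freed: {freed} W")
print(f"Enough for one AI server: {freed >= AI_SERVER_W}")
```

Under these assumptions the swap recovers 8,400 watts, comfortably more than a single AI server needs, which is consistent with the customer anecdote above.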

“While I do see some major companies putting it more and more in their boardroom and CXO topics, I think we need to get a little more urgency around consolidation, energy savings, efficiency to really propel the industry forward with AI,” Hormuth stated.

Factors at play

We have to note these comments are a little self-serving coming from AMD, since server consolidation would obviously help boost sales of its Epyc chips and other portfolio products. And the company probably wouldn’t mind the bump, given revenue for full-year 2023 fell 4% to $22.7 billion while net income dropped 35% to $854 million.

However, Dell’Oro Group’s Baron Fung said there is some truth to AMD’s claims.

As Hormuth indicated, Fung said cloud providers and enterprises can reap efficiency gains from replacing older servers with the latest CPUs. He noted one server OEM hosted a tour of a customer data center last year where the customer was able to replace “many hundreds of servers with fewer than a hundred” upgraded ones. That delivered footprint and power reduction benefits for the customer, Fung said.

“The latest servers have more processor cores, memory channels and capacity, have higher speed interfaces inside and out of the server, etc.,” he told Silverlinings in an email. “As a result, each server can host more users and applications at once…The latest CPUs can also be customized with embedded accelerators that can accelerate AI inference workloads, so that would be a bonus as well.”

But on the flip side, Fung said there are good reasons enterprises and cloud providers are stretching server replacement cycles.

On the enterprise side, IT budgets have remained stagnant for a variety of reasons, Fung said. In addition to pulling back on spending in light of an uncertain macroeconomic climate, companies are also up against rising costs, pandemic supply chain hangovers and a push to spend more on AI and less on servers for general purpose compute.

Enterprises also have the option to tap into various computing services from cloud providers, allowing them to skirt the need to update their own kit.

But hyperscale cloud providers are also stretching server lifespans, albeit for different reasons. Here, Fung said that because hyperscalers often build their own servers, they’ve been able to not only improve reliability with design refinements but also to “optimize their infrastructure through software to prolong the [life] of the infrastructure.”

The latter lets cloud providers free up more spending for AI and other investments.

Plus, “with Moore’s Law of the CPU slowing, there is less incentive to replace servers with every new CPU update,” Fung explained.

That said, Fung conceded that the latest CPUs from Intel and AMD do offer a “big step up” compared to previous generations.

We'll be watching closely to see whether a server upgrade wave sweeps the market and what that could mean for AI.