Gazettabyte

Marvell has unveiled a chip that enables copper cables to send 1.6 terabits-per-second (Tbps) of data between equipment in the data centre.

Copper cabling, also referred to as direct attach copper, is the standard interconnect between compute nodes in a server, and between servers when building larger computing systems.

Venu Balasubramonian

Data centre operators prefer to use passive copper cables. A copper cable costs less than an optical cable, a critical consideration when tens of thousands may be used in a large data centre.

Compute servers using the latest processors and AI accelerator chips have increasing input-output (I/O) requirements. This is causing interface speeds between servers, and between servers and switches, to keep doubling, from 400 gigabits-per-second (Gbps) to 800Gbps and soon 1.6Tbps.

Moreover, with each speed hike, the copper cable’s reach shrinks. A copper cable carrying 25Gbps has a reach of 7m, but this falls to 2m at 100Gbps and to only 1m at 200Gbps.

The solution is to add a digital signal processing (DSP) chip to the passive copper cabling to create an ‘active’ electrical cable. The chip boosts the signal, thereby extending the reach. (See diagram below.)

Source: Marvell.

“As speeds go up and the physical distance remains the same, the interconnects have to become active,” says Venu Balasubramonian, vice president of product marketing, connectivity business unit at Marvell.

Marvell says its Alaska chip is the industry’s first to enable 1.6Tbps active electrical cables.

Data centre networking

Two main networks, the front-end and the backend, are used in data centres supporting AI workloads.

Source: Marvell.

The front-end network, using a traditional Clos network, interfaces servers with the outside world. The Clos network uses a hierarchy of Ethernet switches, with the top-of-rack switches connecting a rack’s servers to leaf and spine switches. The network enables any server to communicate with any other server.
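To make the any-to-any property concrete, here is a toy Python model of a Clos fabric. It is a sketch under assumed parameters: the pod grouping, the switch counts and the node names (`srv`, `tor`, `leaf`, `spine`) are hypothetical and not drawn from any Marvell or operator design. Each server hangs off its rack's top-of-rack switch, each top-of-rack switch links to every leaf in its pod, and every leaf links to every spine; a breadth-first search then counts the links between servers.

```python
from collections import deque


def build_clos(pods: int, racks_per_pod: int, servers_per_rack: int,
               leaves_per_pod: int, spines: int) -> dict[str, set[str]]:
    """Return an undirected adjacency map for a toy three-tier Clos fabric."""
    graph: dict[str, set[str]] = {}

    def link(a: str, b: str) -> None:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for p in range(pods):
        for r in range(racks_per_pod):
            tor = f"tor{p}.{r}"
            # Each server connects to its rack's top-of-rack switch.
            for s in range(servers_per_rack):
                link(f"srv{p}.{r}.{s}", tor)
            # Each top-of-rack switch connects to every leaf in its pod.
            for leaf in range(leaves_per_pod):
                link(tor, f"leaf{p}.{leaf}")
        # Every leaf connects to every spine, joining the pods together.
        for leaf in range(leaves_per_pod):
            for sp in range(spines):
                link(f"leaf{p}.{leaf}", f"spine{sp}")
    return graph


def hops_from(graph: dict[str, set[str]], src: str) -> dict[str, int]:
    """Breadth-first search: number of links from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


fabric = build_clos(pods=2, racks_per_pod=2, servers_per_rack=2,
                    leaves_per_pod=2, spines=2)
servers = [n for n in fabric if n.startswith("srv")]
dist = hops_from(fabric, "srv0.0.0")
print(dist["srv0.0.1"], dist["srv0.1.0"], dist["srv1.0.0"])  # 2 4 6
assert all(s in dist for s in servers)  # any server reaches any other
```

In this toy fabric, servers in the same rack are two links apart, servers in different racks of the same pod four, and servers in different pods six.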

The second, the backend network, is optimised to meet the networking requirements of AI. When training an AI model, the servers' accelerator chips perform intensive calculations before exchanging their results, and these steps are repeated many times. The goal is to keep the AI accelerators occupied while minimising the delay when data is exchanged. To meet these traffic requirements, the backend network uses either InfiniBand or Ethernet.

The diagram below shows the typical reaches connecting the compute nodes in a rack and between racks.

Source: Marvell.

Copper links are preferred for all the links within reach. These are the point-to-point links in a rack and the connections between servers and the top-of-rack switches. Links between adjacent racks or switches are also within copper's reach. But optical connections must be used for distances of 5m or more.

"Up to five meters, and previously seven meters, you could connect with passive copper, that has been the interconnect all along," says Balasubramonian. "Now those links are getting replaced with active copper, and if copper can do it, that is what customers prefer."

The continual advance in AI accelerator chips' processing performance is also reflected in their I/O requirements.

Nvidia's latest Blackwell graphics processing unit (GPU) uses 200 gigabit-per-second serialiser-deserialisers (serdes), while AI accelerator designs from other vendors are also adopting 200-gigabit serdes, says Balasubramonian.

The drastic shortening of the reach of passive copper cabling at 200Gbps is driving active electrical cabling usage.

The Alaska DSP

The Alaska chip is implemented using a 5nm CMOS process. To achieve 1.6Tbps, the DSP supports eight channels at 200Gbps, each implementing 4-level pulse amplitude modulation (PAM-4) signalling.
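The lane arithmetic can be checked in a few lines of Python. PAM-4 uses four amplitude levels, so each symbol carries log2(4) = 2 bits. This is a sketch only: it ignores forward error correction and line-coding overhead, so real 200Gbps lanes run at a somewhat higher symbol rate than the figure printed, and the function names are illustrative.

```python
import math


def bits_per_symbol(levels: int) -> float:
    """A PAM-N symbol carries log2(N) bits, so PAM-4 carries 2 bits."""
    return math.log2(levels)


def aggregate_gbps(lanes: int, lane_gbps: float) -> float:
    """Total link capacity is the number of lanes times the per-lane rate."""
    return lanes * lane_gbps


def symbol_rate_gbaud(lane_gbps: float, levels: int) -> float:
    """Symbols per second needed for a payload rate (FEC/coding overhead ignored)."""
    return lane_gbps / bits_per_symbol(levels)


total = aggregate_gbps(lanes=8, lane_gbps=200)
baud = symbol_rate_gbaud(lane_gbps=200, levels=4)
print(f"{total / 1000} Tbps over 8 lanes, {baud} GBd per lane")
# 1.6 Tbps over 8 lanes, 100.0 GBd per lane
```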

The DSP device amplifies, equalises, and reshapes the signals to achieve extended link distances. The Alaska chip also has a ‘gearbox’ feature that translates between signal speeds. This enables end users to adopt new servers with AI chips that support 200Gbps while using existing switches which may only have 100Gbps ports.
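A gearbox's job can be pictured as redistributing one fast lane's data across several slower lanes and back again. The Python sketch below uses simple round-robin symbol interleaving between one 200Gbps lane and two 100Gbps lanes. This is an assumption for illustration: the article does not describe how the Alaska gearbox maps data, and real Ethernet gearboxes work on standardised lane and FEC structures rather than raw symbols. The names `split_lane` and `merge_lanes` are hypothetical.

```python
from collections.abc import Sequence


def split_lane(symbols: Sequence[int], ways: int = 2) -> list[list[int]]:
    """Deal one fast lane's symbols round-robin onto `ways` slower lanes."""
    if len(symbols) % ways:
        raise ValueError("symbol count must be a multiple of the lane ratio")
    return [list(symbols[i::ways]) for i in range(ways)]


def merge_lanes(lanes: list[list[int]]) -> list[int]:
    """Re-interleave the slower lanes back into one fast lane."""
    merged: list[int] = []
    for group in zip(*lanes):
        merged.extend(group)
    return merged


# PAM-4 symbol levels (0..3) on one hypothetical 200Gbps host lane.
host_lane = [0, 1, 2, 3, 3, 2, 1, 0]
switch_lanes = split_lane(host_lane)  # two 100Gbps lanes toward the switch
print(switch_lanes)  # [[0, 2, 3, 1], [1, 3, 2, 0]]
assert merge_lanes(switch_lanes) == host_lane
```

Each slower lane carries half the symbols, which is why two 100Gbps ports can absorb one 200Gbps lane without losing data.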

The Alaska chip also includes telemetry and debug features so that data centre operators can monitor the status of traffic flows and diagnose any networking issues.

The chip measures 12mm x 14mm, so as to occupy as little space as possible inside a QSFP-DD or OSFP module, says Balasubramonian.

Using the Alaska device for active electrical cabling means 50Gbps signals can span over 7m, 100Gbps signals over 5m, and 200Gbps signals over 3m.
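Putting the reach figures together gives a simple link-planning rule: use passive copper if it reaches, active copper if the DSP extends it far enough, and optics otherwise. The Python sketch below encodes the per-lane reach values quoted in this article; the lookup tables and the `pick_medium` function are hypothetical, and real deployments depend on cable gauge, connector losses and vendor specifications.

```python
# Per-lane reach figures quoted in the article, in metres, keyed by Gbps.
PASSIVE_REACH_M = {25: 7.0, 100: 2.0, 200: 1.0}
ACTIVE_REACH_M = {50: 7.0, 100: 5.0, 200: 3.0}  # with the Alaska DSP


def pick_medium(lane_gbps: int, distance_m: float) -> str:
    """Hypothetical rule: passive copper first, then active copper, then optics."""
    passive = PASSIVE_REACH_M.get(lane_gbps)
    if passive is not None and distance_m <= passive:
        return "passive copper"
    active = ACTIVE_REACH_M.get(lane_gbps)
    if active is not None and distance_m <= active:
        return "active copper"
    return "optical"


for metres in (0.5, 2.5, 4.0):
    print(f"200Gbps over {metres}m: {pick_medium(200, metres)}")
```

At 200Gbps, a 0.5m link stays passive, a 2.5m link needs an active cable and a 4m link falls to optics, mirroring the preference for passive copper wherever it reaches.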

The 1.6Tbps active electrical cables using the Alaska device also use thinner-gauge copper wire. The thinner wiring makes connecting systems easier, as thinner cabling can be bent more tightly. The less bulky cabling also improves airflow, helping equipment cooling.

Marvell says it is working with such active electrical cabling specialists as Amphenol, Molex and TE Connectivity.

Future trends

Marvell points out that AI servers are becoming increasingly distributed. The trend is for a board holding N GPUs to be split into two boards, and in future four, hosting the same number of GPUs.

Source: Marvell.

This will require even more copper interconnects. Passive copper cabling will be used where possible. “If you can do it with direct attached copper, you would, because there will be no power and latency impact [as no DSP chip need be added],” says Balasubramonian.

Marvell expects a combination of passive and active copper cabling to be used in the data centre, with the percentage of links served by passive cabling shrinking as speeds increase.

Marvell typically develops two generations of chip at each speed, with the second released some two years after the first.

The next chip will likely support 1.6Tbps with half the number of channels. This implies 400-gigabit serdes, which with PAM-4 signalling means a symbol rate of roughly 200 gigabaud, to achieve 4x400Gbps links.

The cabling will require not just a new generation of serdes but also connectors and cables for a 1.6Tbps active electrical cable implemented using 4x400Gbps channels.

The goal at 400Gbps would be to achieve a reach of 2m. “We don’t yet know [if that is possible],” says Balasubramonian. “It is early.”