Broadcom samples the first 51.2-terabit switch chip
Tuesday, August 16, 2022 at 5:55PM
Roy Rubenstein in Bailly, Bob Wheeler, Broadcom, Dell'Oro, Humboldt, OFC 2022, Pete Del Vecchio, Tomahawk 4, Tomahawk 5

Part 1: Broadcom's Tomahawk 5

Broadcom is sampling the world's first 51.2-terabit switch chip.

With the Tomahawk 5, Broadcom continues to double switch silicon capacity every 24 months; the first Tomahawk, a 3.2-terabit device, was launched in September 2014.

"Broadcom is once again first to market at 51.2Tbps," says Bob Wheeler, principal analyst at Wheeler's Network. "It continues to execute, while competitors have struggled to deliver multiple generations in a timely manner."

 

Tomahawk family

Hyperscalers use the Tomahawk switch chip family in their data centres.

Broadcom launched the 25.6-terabit Tomahawk 4 in December 2019. The chip uses 512 serdes, each a 50-gigabit PAM-4 interface. At the time, 50-gigabit PAM-4 matched the 8-channel electrical input-output (I/O) of 400-gigabit optical modules.

Certain hyperscalers wanted to wait for 400-gigabit optical modules using four 100-gigabit PAM-4 electrical channels, so, in late 2020, Broadcom launched the Tomahawk4-100G switch chip, which employs 256, 100-gigabit PAM-4 serdes.

Tomahawk 5 doubles the 100-gigabit PAM-4 serdes to 512. However, given that 200-gigabit electrical interfaces are several years off, Broadcom is unlikely to launch a second-generation Tomahawk 5 with 256, 200-gigabit PAM-4 serdes. 
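Each capacity quoted above is the lane count multiplied by the per-lane rate. The minimal Python sketch below uses only the figures in this article. The dictionary and function names are illustrative and do not come from any Broadcom tool:

```python
# Aggregate switch capacity = number of serdes lanes x per-lane rate.
# Lane counts and rates are the figures quoted in the article.
CHIPS = {
    "Tomahawk 4": (512, 50),        # 512 x 50-gigabit PAM-4 serdes
    "Tomahawk4-100G": (256, 100),   # 256 x 100-gigabit PAM-4 serdes
    "Tomahawk 5": (512, 100),       # 512 x 100-gigabit PAM-4 serdes
}

def capacity_tbps(lanes: int, gbps_per_lane: int) -> float:
    """Return aggregate capacity in terabits per second."""
    return lanes * gbps_per_lane / 1000

for name, (lanes, rate) in CHIPS.items():
    print(f"{name}: {lanes} x {rate}G = {capacity_tbps(lanes, rate)} Tbps")
```

The same arithmetic shows why a hypothetical part with 256 200-gigabit serdes would also reach 51.2 terabits: `capacity_tbps(256, 200)` returns 51.2.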

 

Source: Broadcom, Gazettabyte

Switch ICs

Broadcom has three switch chip families: Trident, Jericho and Tomahawk.

The three switch chip families are needed since no one switch chip architecture can meet all the markets' requirements.

With its programmable pipeline, Trident targets enterprises, while Jericho targets service providers.

According to Peter Del Vecchio, Broadcom's product manager for the Tomahawk and Trident lines, there is some crossover. For example, certain hyperscalers favour the Trident's programmable pipeline for their top-of-rack switches, which interface to the higher-capacity Tomahawk switch chips at the aggregation layer.

 

Monolithic design

The Tomahawk 5 continues Broadcom's approach of using a monolithic die design.

"It [the Tomahawk 5] is not reticle-limited, and going to [the smaller] 5nm [CMOS process] helps," says Del Vecchio.

The alternative approach, a main die plus chiplets, adds overall latency and consumes more power because the die and chiplets must be interfaced. Power consumption and signal delay rise whether a high-speed serial link or a slower, wider parallel bus connects the two.

Equally, such a disaggregated design requires an interposer on which the two die types sit, adding cost.

 

Chip features

Broadcom says the capacity of its switch chips has increased 80x in the last 12 years; in 2010, Broadcom launched the 640-gigabit Trident.

Broadcom has also improved energy efficiency by 20x during the same period.
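A little arithmetic can cross-check the two growth figures. The sketch below assumes steady exponential growth between the dates quoted. It also reads "energy efficiency" as gigabits per watt, so total chip power is capacity divided by efficiency. Both are assumptions, not Broadcom definitions:

```python
import math

def doubling_period_years(growth_factor: float, years: float) -> float:
    """Years per doubling implied by a total growth factor over a period."""
    return years / math.log2(growth_factor)

# Tomahawk: 3,200 Gbps (2014) -> 51,200 Gbps (2022), a 16x increase
print(round(doubling_period_years(51_200 / 3_200, 2022 - 2014), 2))  # 2.0

# Trident (640 Gbps, 2010) -> Tomahawk 5 (51,200 Gbps, 2022)
capacity_growth = 51_200 / 640                                     # 80.0
print(round(doubling_period_years(capacity_growth, 12), 2))         # 1.9

# Assumed reading: efficiency in Gb/W, so power grows by capacity / efficiency
print(capacity_growth / 20)                                         # 4.0
```

On this reading, an 80x capacity gain with a 20x efficiency gain implies that total switch chip power rose about fourfold over the 12 years.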

"Delivering less than 1W per 100Gbps is pretty astounding given the diminishing benefits of moving from a 7nm to a 5nm process technology," says Wheeler.

"In general, we have achieved a 30 per cent plus power savings between Tomahawk generations in terms of Watts-per-gigabit," says Del Vecchio.


These power savings come not just from advances in CMOS process technology but also from architectural improvements, custom physical IP designed for switch silicon, and physical design expertise.
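The quoted figure of less than 1W per 100Gbps, applied to the whole chip, gives a rough power ceiling. The "30 per cent plus" saving per generation compounds in the same way. This is a back-of-the-envelope sketch: the function names are hypothetical, and the result is an upper bound derived from the quotes, not a Broadcom specification:

```python
def power_ceiling_w(capacity_gbps: int, watts_per_100g: float = 1.0) -> float:
    """Upper bound on chip power if every 100 Gbps costs at most watts_per_100g."""
    return capacity_gbps / 100 * watts_per_100g

def next_gen_w_per_gbit(current_w_per_gbit: float, saving: float = 0.30) -> float:
    """Watts per gigabit after a generational saving (0.30 means 30 per cent)."""
    return current_w_per_gbit * (1 - saving)

# "Less than 1W per 100Gbps" applied to the full 51.2 Tbps of Tomahawk 5
print(power_ceiling_w(51_200))  # 512.0

# A 30 per cent saving: 0.01 W/Gb (1 W per 100 Gbps) becomes about 0.007 W/Gb
print(next_gen_w_per_gbit(0.01))
```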

"We create six to eight switch chips every year, so we've gotten very good at optimising for power," says Del Vecchio.

The latest switch IC also adds features to support artificial intelligence (AI)/machine learning, an increasingly important hyperscaler workload.

AI/machine learning traffic comprises a small number of massive 'elephant' flows alongside many small 'mice' flows. The switch chip adds elephant-flow load balancing to tackle the congestion that can arise when the two flow classes mix.
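Conventional switches spread traffic across equal-cost links by hashing packet header fields, so every packet of a given flow follows the same link. The toy Python comparison below shows why this approach struggles with a handful of long-lived elephant flows. The hash function, source ports and greedy least-loaded placement are all illustrative assumptions, not Broadcom's dynamic load-balancing algorithm. Destination UDP port 4791 is the port RoCEv2 uses:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_port: int
    dst_port: int
    rate_gbps: float

def toy_hash(flow: Flow, n_links: int) -> int:
    """Hypothetical stand-in for a switch's header hash (real ASICs hash more fields)."""
    return (flow.src_port * 31 + flow.dst_port) % n_links

def static_hash_balance(flows: list[Flow], n_links: int) -> list[float]:
    """Each flow is pinned to whichever link its header hash selects."""
    load = [0.0] * n_links
    for flow in flows:
        load[toy_hash(flow, n_links)] += flow.rate_gbps
    return load

def utilisation_aware_balance(flows: list[Flow], n_links: int) -> list[float]:
    """Each new flow goes to the currently least-loaded link (lowest index wins ties)."""
    load = [0.0] * n_links
    for flow in flows:
        best = min(range(n_links), key=lambda i: load[i])
        load[best] += flow.rate_gbps
    return load

# Four 100-gigabit elephant flows to UDP port 4791 (RoCEv2), hypothetical source ports
elephants = [Flow(src_port=p, dst_port=4791, rate_gbps=100.0) for p in (1000, 1004, 1008, 1012)]
print(static_hash_balance(elephants, 4))        # [0.0, 0.0, 0.0, 400.0]
print(utilisation_aware_balance(elephants, 4))  # [100.0, 100.0, 100.0, 100.0]
```

Under static hashing, the four flows' headers happen to hash alike, so all of them land on one link while the other three sit idle. Utilisation-aware placement spreads them one per link.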

"The problem with AI workloads is that the flows are relatively static so that traditional hash-based load balancing will send them over the same links," says Wheeler. "Broadcom has added dynamic balancing that accounts for link utilisation to distribute better these elephant flows."

The Tomahawk 5 also provides more telemetry information so data centre operators can better see and tackle overall traffic congestion.

The chip also adds virtualisation support, including improved security for workloads in massively shared infrastructure.

Del Vecchio says that, with 800-gigabit optical modules emerging and 1.6-terabit ones on the horizon, the Tomahawk 5 is designed to handle multiples of 400 Gigabit Ethernet (GbE), including support for 800-gigabit optical modules.

Eight of the chip's 100-gigabit physical-layer interfaces are combined to form an 800-gigabit interface (8 by 100 gigabit). This feeds the MAC, the packet-processing pipeline and the Memory Management Unit to create a logical 800-gigabit port. "After the MAC, it's one flow, not at 400 gigabits but now at 800 gigabits," says Del Vecchio.
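Forming a logical port from physical lanes can be pictured as grouping serdes indices. This sketch simplifies by assuming that contiguous lanes are bonded; real lane-to-port maps are device- and board-specific. The function name is hypothetical:

```python
def bond_lanes(total_lanes: int, lanes_per_port: int) -> list[list[int]]:
    """Group consecutive serdes lane indices into logical ports.

    Assumes contiguous lanes are bonded; real lane maps are device-specific.
    """
    if total_lanes % lanes_per_port:
        raise ValueError("lane count must divide evenly into ports")
    return [list(range(start, start + lanes_per_port))
            for start in range(0, total_lanes, lanes_per_port)]

LANE_GBPS = 100  # Tomahawk 5 serdes rate
ports_800g = bond_lanes(512, 8)
print(len(ports_800g), len(ports_800g[0]) * LANE_GBPS)  # 64 800
print(ports_800g[1])  # [8, 9, 10, 11, 12, 13, 14, 15]
```

Bonding four lanes per port gives 128 400-gigabit ports, and bonding two gives 256 200-gigabit ports. These are the configurations Broadcom cites for a Tomahawk 5 switch.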

Market research firm Dell'Oro says that 400GbE accounts for 15 per cent of port revenues and that this share will rise to 57 per cent by 2026.

Broadcom also cites independent lab test data showing that its support for RDMA over Converged Ethernet (RoCE) matches the performance of InfiniBand.

"We're attempting to correct the misconception promoted by competition that InfiniBand is needed to provide good performance for AI/machine learning workloads," says Del Vecchio. The tests used previous-generation silicon, not the Tomahawk 5.

"We're saying this now since machine learning workloads are becoming increasingly common in hyperscale data centres," says Del Vecchio.

The chip's serdes can drive 4m of direct-attach copper cabling, enough reach to connect equipment within a rack or between two adjacent racks.

 

Software support

Broadcom offers a software development kit (SDK) to create applications. The same SDK is common to all three of its switch chip families.

Source: Broadcom.

Broadcom also supports the Switch Abstraction Interface (SAI). This standards-based programming interface sits on top of the SDK, allowing the programming of switches independent of the silicon provider.
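The layering works like this: applications are written against a vendor-neutral interface, and one adapter per silicon vendor translates the calls. The toy Python sketch below illustrates the idea. Every class, method and SDK call name in it is hypothetical. It is not the real SAI API, which is defined as a C interface. The adapters also return a description of the call they would make rather than touching hardware:

```python
from abc import ABC, abstractmethod

class SwitchAbstraction(ABC):
    """Hypothetical vendor-neutral interface in the spirit of SAI (not the real SAI API)."""

    @abstractmethod
    def add_route(self, prefix: str, next_hop: str) -> str:
        """Install a route; returns a description of the underlying SDK call."""

class VendorAAdapter(SwitchAbstraction):
    """Hypothetical adapter mapping neutral calls onto vendor A's SDK."""

    def add_route(self, prefix: str, next_hop: str) -> str:
        return f"vendor_a_sdk.l3_route_add({prefix!r}, {next_hop!r})"

class VendorBAdapter(SwitchAbstraction):
    """Hypothetical adapter mapping the same calls onto vendor B's SDK."""

    def add_route(self, prefix: str, next_hop: str) -> str:
        return f"vendor_b_sdk.route_insert(dst={prefix!r}, nh={next_hop!r})"

def install_default_route(switch: SwitchAbstraction) -> str:
    # Application logic is written once, against the abstraction only.
    return switch.add_route("0.0.0.0/0", "10.0.0.1")

for backend in (VendorAAdapter(), VendorBAdapter()):
    print(install_default_route(backend))
```

Swapping the backend changes which SDK call is made, while `install_default_route` stays the same. The cost of this indirection is that a new capability must first be added to the shared interface before portable code can use it.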

Broadcom says some customers prefer to program directly to its own SDK. It can take time for changes to filter up into the SAI, and a customer may want a feature that Broadcom can develop quickly using its SDK.

 

System benefits

Doubling switch chip capacity every 24 months delivers system-level benefits. Implementing a 51.2-terabit switch using the current-generation Tomahawk 4 requires six such devices.
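The article does not spell out where the figure of six comes from. One standard way to arrive at it assumes a non-blocking two-tier leaf-spine (folded Clos) arrangement built from 25.6-terabit chips. Each leaf devotes half its capacity to front-panel ports and half to uplinks. Four leaves therefore provide 51.2 terabits of ports, and two spines are needed to terminate their uplinks. The sketch below works through this; the function names are hypothetical:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)

def two_tier_clos_chips(target_gbps: int, chip_gbps: int) -> tuple[int, int]:
    """(leaf, spine) chip counts for a non-blocking two-tier leaf-spine switch.

    Assumes each leaf splits its capacity evenly between front-panel ports
    and spine uplinks, and each spine uses all its capacity for leaf links.
    """
    if target_gbps <= chip_gbps:
        return 1, 0  # a single chip is enough; no fabric needed
    leaves = ceil_div(target_gbps, chip_gbps // 2)
    uplink_gbps = leaves * (chip_gbps // 2)
    spines = ceil_div(uplink_gbps, chip_gbps)
    return leaves, spines

leaves, spines = two_tier_clos_chips(51_200, 25_600)  # Tomahawk 4 building block
print(leaves, spines, leaves + spines)                 # 4 2 6
print(two_tier_clos_chips(51_200, 51_200))             # (1, 0) with one Tomahawk 5
```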

Source: Broadcom.

Now a single Tomahawk 5 chip in a 2-rack-unit (2RU) switch can support 64 800-gigabit, 128 400-gigabit or 256 200-gigabit ports.

These switch boxes are air-cooled, says Broadcom.

 

Co-packaged optics

In early 2021, at a J.P. Morgan analyst event, Broadcom revealed its co-packaged optics roadmap. It highlighted Humboldt, a 25.6-terabit switch chip co-packaged with optics, and Bailly, a 51.2-terabit fully co-packaged optics design.

At OFC 2022, Broadcom demonstrated a 25.6Tbps switch that sent half of the traffic using optical engines.

Also shown was a mock-up of Bailly, a 51.2-terabit switch chip co-packaged with eight optical engines, each at 6.4Tbps.

Broadcom will offer customers a fully co-packaged optics Tomahawk 5 design but has not yet given a date.

Broadcom can also support customers wanting tailored connectivity with, say, three-quarters of the Tomahawk 5's interfaces using optical engines and the remainder using electrical interfaces to front-panel optics.
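The split between optical engines and electrical lanes is simple arithmetic under two assumptions. Each optical engine carries 6.4Tbps, as in the Bailly mock-up, and each electrical lane carries 100 gigabits. The sketch below works through it under those assumptions. The names are hypothetical, and actual configurations depend on what Broadcom offers:

```python
from fractions import Fraction

CHIP_GBPS = 51_200    # Tomahawk 5 capacity
ENGINE_GBPS = 6_400   # per optical engine, as in the Bailly mock-up
LANE_GBPS = 100       # per electrical serdes lane

def split_io(optical_share: Fraction) -> tuple[int, int]:
    """Return (optical engines, electrical 100G lanes) for a given optical share."""
    optical_gbps = CHIP_GBPS * optical_share
    if optical_gbps % ENGINE_GBPS:
        raise ValueError("optical share must map to a whole number of engines")
    engines = int(optical_gbps // ENGINE_GBPS)
    lanes = int((CHIP_GBPS - optical_gbps) // LANE_GBPS)
    return engines, lanes

print(split_io(Fraction(1)))     # (8, 0)  fully co-packaged, as in Bailly
print(split_io(Fraction(3, 4)))  # (6, 128)
print(split_io(Fraction(1, 2)))  # (4, 256)
```

With three-quarters optical, the chip needs six engines, and 128 electrical 100-gigabit lanes carry the remaining 12.8 terabits to the front panel.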

Article originally appeared on Gazettabyte (https://www.gazettabyte.com/).
See website for complete article licensing information.