Tencent makes its co-packaged optics move

- Tencent is the first hyperscaler to announce it is deploying a co-packaged optics switch chip
- Tencent will use Broadcom’s Humboldt that combines its 25.6-terabit Tomahawk 4 switch chip with four optical engines, each 3.2 terabit-per-second (Tbps)
Part 2: Broadcom’s co-packaged optics
Tencent will use Broadcom’s Tomahawk 4 switch chip co-packaged with optics for its data centres.
“We are now partnered with the hyperscaler to deploy this in a network,” says Manish Mehta, vice president of marketing and operations optical systems division, Broadcom. “This is a huge step for co-packaged optics overall.”
Broadcom demonstrated a working version of a Humboldt switch at OFC earlier this year.
The Chinese hyperscaler will use Broadcom’s 25.6Tbps Tomahawk 4 Humboldt, a hybrid design where half of the chip’s input-output (I/O) is optical and half is the chip’s serialisers-deserialisers (serdes) that connect to pluggable modules on the switch’s front panel.
Four Broadcom 3.2-terabit silicon photonics-based optical engines are co-packaged alongside the Tomahawk 4 chip to implement 12.8Tbps of optical I/O.
Broadcom demonstrated a working version of a Humboldt switch at OFC earlier this year.
Co-packaged optics
Broadcom started its co-packaged optics development work in 2019.
“One of the reasons for our investment in co-packaged optics was that we did see firsthand the ongoing limits of copper interconnect being approached,” says Mehta.
The transmission reach of copper links continues to shrink as the signalling speed has increased from 25 gigabits-per-second (Gbps) non-return to zero (NRZ) to PAM-4 (4-level pulse amplitude modulation) based signalling at 56Gbps, 112Gbps and, in the coming years, 224Gbps. Power consumption is also rising with each speed hike.
Broadcom says data centres now use 1 million optical interconnects, but that much of the connectivity is still copper-based, linking adjacent racks and equipment within the rack.
“Hyperscalers spend ten times more on interconnects than switching silicon,” says Mehta. Given these trends, there needs to be a continual improvement in the power profile, cost and scaled manufacturing of optical interconnect, he says.
In the short term, what is driving interest in co-packaged optics is overcoming the limitations of copper, says Broadcom.
In early 2021, Broadcom detailed at a JP Morgan event its co-packaged optics roadmap. Outlined was the 25.6-terabit Humboldt to be followed by Bailly, a 51.2-terabit all co-packaged optics design using Broadcom’s Tomahawk 5 switch chip which is now sampling.
Humboldt uses DR4 (4×100-gigabit using 4 fibres) whereas the 51.2-terabit Bailly will add multiplexing-demultiplexing and use the FR4 specification (4×100-gigabit wavelengths per fibre).
Technology and partners
Broadcom’s in-house technology includes lasers (VCSELs and EMLs), mixed-signal expertise (trans-impedance amplifiers and drivers), and silicon photonics, as well as its switch chips.
Broadcom uses a remote laser source for its co-packaged optics design. Placing the laser away from the package (the switch chip and optics) means no cooling is needed.
Broadcom is working with 15 partners to enable its co-packaged optics, highlighting the breadth of expertise required and the design complexity.

There are two prominent use cases for the hybrid I/O Humboldt.
One is for top-of-rack switches, where the electrical interfaces support short-reach copper links connecting the servers in a rack, while the optical links connect the top-of-rack box to the next layer of aggregation switching.
The second use is at the aggregation layer, where the electrical I/O connects other switches in the rack while the optical links connect to switch layers above or below the aggregation layer.
“There is a use case for having pluggable ports where you can deploy low-cost direct-attached copper,” says Mehta.
Broadcom says each data centre operator will have their own experience with their manufacturing partners as they deploy co-packaged optics. Tencent has decided to enter the fray with 25.6-terabit switches.
“It is not just Broadcom developing the optical solution; it is also ensuring that our manufacturing partner is ready to scale,” says Mehta.
Ruijie Networks is making the two-rack-unit (2RU) switch platform for Tencent based on Broadcom’s co-packaged optics solution. The co-packaged optics interfaces are routed to 16 MPO connectors while the switch supports 32, 400-gigabit QSFP112 modules.
“It’s always important to have your lead partner [Tencent] for any deployment like this, someone you’re working closely with to get it to market,” says Mehta. “But there is interest from other customers as well.”
Cost and power benefits
Broadcom says co-packaged optics will lower the optical cost-per-bit by 40 per cent while the system (switch platform) power savings will be 30 per cent.
Humboldt more than halves the power compared to using pluggables. Broadcom’s co-packaged optics consumes 7W for each 800-gigabits of bandwidth, whereas an equivalent 800-gigabit optical module consumes 16-18W.
Its second-generation design will embrace 5nm CMOS rather than 7nm and still more than halve the power: an 800-gigabit pluggable will consume 14-15W, whereas it will be 5.5W for the same co-packaged optics bandwidth.
Broadcom will move to CMOS for its second-generation electrical IC; it uses silicon germanium at present.
Power and operational cost savings are a longer-term benefit for data centre operators, says Broadcom. A more immediate concern is the growing challenge of managing the thermal profile when designing switching systems. “The amount of localised heat generation of these components is making systems quite challenging,” says Mehta.
A co-packaged design eliminates pluggables, making system design easier by improving airflow via the front panel and reducing the power required for optical interconnect.
“They’ve been telling us this directly,” says Mehta. “It’s been a pretty good testimonial to the benefits they can see for system design and co-packaged optics.”
Roadmap
At OFC 2022, Broadcom also showed a mock-up of Bailly, a 51.2 terabit switch chip co-packaged with eight 6.4Tbps optical engines.
Broadcom will offer customers a fully co-packaged optics Tomahawk 5 design but has not given a date.
Since Broadcom has consistently delivered a doubling of switch silicon capacity every 24 months, a 102.4-terabit Tomahawk 6 is scheduled to sample in the second half of 2024.
That timescale suggests it will be too early to use 224Gbps serdes being specified by the OIF. Indeed, Mehta believes 112Gbps serdes will have “a very long life”.
That would require the next-generation 102.2Tbps to integrate 1024, 100Gbps serdes on a die. Or, if that proves too technically challenging, then, for the first time, Broadcom’s switching ASIC may no longer be a monolithic die.
Broadcom’s networking group is focused on high-speed serial electrical interfaces. But the company is encouraged by developments such as the open standard UCIe for package interconnect, which looks at slower, wider parallel electrical interfaces to support chiplets. UCIe promises to benefit co-packaged optics.
Broadcom’s view is that it is still early with many of these design challenges.
“Our goal is to understand when we need to be ready and when we need to be launching our silicon on the optical side,” says Mehta. “That’s something we are working towards; it’s still not clear yet.”
Broadcom samples the first 51.2-terabit switch chip

- Broadcom’s Tomahawk 5 marks the era of the 51.2-terabit switch chip
- The 5nm CMOS device consumes less than 500W
- The Tomahawk 5 uses 512, 100-gigabit PAM-4 (4-level pulse amplitude modulation) serdes (serialisers-deserialisers)
- Broadcom will offer a co-packaged version combining the chip with eight 6.4 terabit-per-second (Tbps) optical engines
Part 1: Broadcom’s Tomahawk 5
Broadcom is sampling the world’s first 51.2-terabit switch chip.
With the Tomahawk 5, Broadcom continues to double switch silicon capacity every 24 months; Broadcom launched the first 3.2-terabit Tomahawk was launched in September 2014.
“Broadcom is once again first to market at 51.2Tbps,” says Bob Wheeler, principal analyst at Wheeler’s Network. “It continues to execute, while competitors have struggled to deliver multiple generations in a timely manner.”
Tomahawk family
Hyperscalers use the Tomahawk switch chip family in their data centres.
Broadcom launched the 25.6-terabit Tomahawk 4 in December 2019. The chip uses 512 serdes, but these are 50-gigabit PAM-4. At the time, 50-gigabit PAM-4 matched the optical modules’ 8-channel input-output (I/O).
Certain hyperscalers wanted to wait for 400-gigabit optical modules using four 100-gigabit PAM-4 electrical channels, so, in late 2020, Broadcom launched the Tomahawk4-100G switch chip, which employs 256, 100-gigabit PAM-4 serdes.
Tomahawk 5 doubles the 100-gigabit PAM-4 serdes to 512. However, given that 200-gigabit electrical interfaces are several years off, Broadcom is unlikely to launch a second-generation Tomahawk 5 with 256, 200-gigabit PAM-4 serdes.

Switch ICs
Broadcom has three switch chip families: Trident, Jericho and the Tomahawk.
The three switch chip families are needed since no one switch chip architecture can meet all the markets’ requirements.
With its programable pipeline, Trident targets enterprises, while Jericho targets service providers.
According to Peter Del Vecchio, Broadcom’s product manager for the Tomahawk and Trident lines, there is some crossover. For example, certain hyperscalers favour the Trident’s programmable pipeline for their top-of-rack switches, which interface to the higher-capacity Tomahawk switches chips at the aggregation layer.
Monolithic design
The Tomahawk 5 continues Broadcom’s approach of using a monolithic die design.
“It [the Tomahawk5] is not reticule-limited, and going to [the smaller] 5nm [CMOS process] helps,” says Del Vecchio.
The alternative approach – a die and chiplets – adds overall latency and consumes more power, given the die and chiplets must be interfaced. Power consumption and signal delay also rise whether a high-speed serial or a slower, wider parallel bus is used to interface the two.
Equally, such a disaggregated design requires an interposer on which the two die types sit, adding cost.
Chip features
Broadcom says the capacity of its switch chips has increased 80x in the last 12 years; in 2010, Broadcom launched the 640-gigabit Trident.
Broadcom has also improved energy efficiency by 20x during the same period.
“Delivering less than 1W per 100Gbps is pretty astounding given the diminishing benefits of moving from a 7nm to a 5nm process technology,” says Wheeler.
“In general, we have achieved a 30 per cent plus power savings between Tomahawk generations in terms of Watts-per-gigabit,” says Del Vecchio.

These power savings are not just from advances in CMOS process technology but also architectural improvements, custom physical IP designed for switch silicon and physical design expertise.
“We create six to eight switch chips every year, so we’ve gotten very good at optimising for power,” says Del Vecchio
The latest switch IC also adds features to support artificial intelligence (AI)/ machine learning, an increasingly important hyperscaler workload.
AI/ machine learning traffic flows have a small number of massive ‘elephant’ flows alongside ‘mice’ flows. The switch chip adds elephant flow load balancing to tackle congestion that can arise when the two flow classes mix.
“The problem with AI workloads is that the flows are relatively static so that traditional hash-based load balancing will send them over the same links,” says Wheeler. “Broadcom has added dynamic balancing that accounts for link utilisation to distribute better these elephant flows.”
The Tomahawk 5 also provides more telemetry information so data centre operators can better see and tackle overall traffic congestion.
The chip has added virtualisation support, including improved security of workloads in a massively shared infrastructure.
Del Vecchio says that with emerging 800-gigabit optical modules and 1.6 terabit ones on the horizon, the Tomahawk 5 is designed to handle multiples of 400 Gigabit Ethernet (GbE) and will support 800-gigabit optical modules.
The chip’s 100-gigabit physical layer interfaces are combined to form 800 gigabit (8 by 100 gigabit), which is fed to the MAC, packet processing pipeline and the Memory Management Unit to create a logical 800-gigabit port. “After the MAC, it’s one flow, not at 400 gigabits but now at 800 gigabits,” says Del Vecchio.
Market research firm, Dell’Oro, says that 400GbE accounts for 15 per cent of port revenues and that by 2026 it will rise to 57 per cent.
Broadcom also cites independent lab test data showing that its support for RDMA over Converged Ethernet (RoCE) matches the performance of Infiniband.
“We’re attempting to correct the misconception promoted by competition that Infiniband is needed to provide good performance for AI/ machine learning workloads,” says Del Vecchio. The tests used previous generation silicon, not the Tomahawk 5.
“We’re saying this now since machine learning workloads are becoming increasingly common in hyperscale data centres,” says Del Vecchio.
As for the chip’s serdes, they can drive 4m of direct attached copper cabling, with sufficient reach to connect equipment within a rack or between two adjacent racks.
Software support
Broadcom offers a software development kit (SDK) to create applications. The same SDK is common to all three of its switch chip families.
Broadcom also supports the Switch Abstraction Interface (SAI). This standards-based programming interface sits on top of the SDK, allowing the programming of switches independent of the silicon provider.
Broadcom says some customers prefer to use its custom SDK. It can take time for changes to filter up, and a customer may want something undertaken that Broadcom can develop quickly using its SDK.
System benefits
Doubling the switch chip’s capacity every 24 months delivers system benefits.That is because implementing a 51.2-terabit switch using the current generation Tomahawk 4 requires six such devices.

Now a single 2-rack-unit (2RU) Tomahawk 5 switch chip can support 64 by 800-gigabit, 128 by 400-gigabit and 256 by 200-gigabit modules.
These switch boxes are air-cooled, says Broadcom.
Co-packaged optics
In early 2021 at a J.P Morgan analyst event, Broadcom revealed its co-packaged optics roadmap that highlighted Humboldt, a 25.6-terabit switch chip co-packaged with optics, and Bailly, a 51.2-terabit fully co-packaged optics design.
At OFC 2022, Broadcom demonstrated a 25.6Tbps switch that sent half of the traffic using optical engines.
Also shown was a mock-up of Bailly, a 51.2 terabit switch chip co-packaged with eight optical engines, each at 6.4Tbps.
Broadcom will offer customers a fully co-packaged optics Tomahawk 5 design but has not yet given a date.
Broadcom can also support a customer if they want tailored connectivity with, say, 3/4 of the Tomahawk 5 interfaces using optical engines and the remainder using electrical interfaces to front panel optics.
Enabling 800-gigabit optics with physical layer ICs
Broadcom recently announced a family of 800-gigabit physical layer (PHY) chips. The device family is the company’s first 800-gigabit ICs with 100-gigabit input-output (I/O) interfaces.

Source: Broadcom
Moving from 50-gigabit to 100-gigabit-based I/O enables a new generation of 800-gigabit modules aligned with the latest switch chips.
“With the switch chip having 100-gigabit I/Os, PHYs are needed with the same interfaces,” says Machhi Khushrow, senior director of marketing, physical layer products division at Broadcom.
Broadcom’s latest 25.6 terabit-per-second (Tbps) Tomahawk 4 switch chip using 100-gigabit I/O was revealed at the same time.
800-gigabit PHY devices
The portfolio comprises three 800-gigabit PHY ICs. All operate at a symbol rate of 53 gigabaud, use 4-level pulse amplitude modulation (PAM-4) and are implemented in a 7nm CMOS process.
Two devices are optical PHYs: the BCM87800 and the BCM87802. These ICs are used within 800-gigabit optical modules such as the QSFP-DD800 and the OSFP form factors. The difference between the two chips is that the BCM87802 includes an integrated driver.
The third PHY - the BCM87360 - is a retimer IC used on line cards. Whether the chip is needed depends on the line card design and signal-integrity requirements; for example, whether the line card is used within a pizza box or part of a chassis-based platform.

Source: Broadcom
“If it is a higher-density card that is relatively small, it may only need 15 per cent of the ports with retimers,” says Khushrow. “If the line card is larger, where things fan out to longer traces, retimers may be needed for all the ports.”
All three 800-gigabit PHYs have eight 100-gigabit transmit and eight receive channels (8:8, as shown in the top diagram).
Applications
The optical devices support several 800-gigabit module designs that use either silicon photonics, directly modulated lasers (DMLs) or externally-modulated lasers (EMLs).
The 800-gigabit PHYs support the DR8 module (8 single-mode fibres, 500m reach), two 400-gigabit DR4 (4 single-mode fibres, 500m) or two FR4 in a module (each 4 wavelengths on a single-mode fibre, 2km) as well as the SR8, a parallel VCSEL-based design with a reach of 100m over parallel multi-mode fibre.
Timescales
Given the availability of these PHYs and that 800-gigabit modules will soon appear, will the development diminish the 400-gigabit market opportinity?
“This is independent of 400-gigabit [module] deployments,” says Khushrow.
The hyperscalers are deploying different architectures. There are hyperscalers that are only now transitioning to 200-gigabit modules while others are transitioning to 400- gigabit. They will all transition to 800 gigabit, he says: “How and when they transition are all at different points.”
Some of the hyperscalers deploying 400-gigabit modules are looking at 800 gigabit, and their deployment plans are maybe two to three years out. “We don’t expect 800 gigabit to cannibalise 400 gigabit, at least not in the near term,” he says.
Broadcom says 800-gigabit modules to ship in the second half of this year. “It all depends on how the switch infrastructure, line cards and optics become available,” says Khushrow.
Next developments
The landscape for high-speed networking in the data centre is changing and optics is moving closer to the switch chip, whether it is on-board optics or co-packaged optics.
“People are looking at both options,” says Khushrow.”It depends on the architecture of the data centre whether they use on-board optics or co-packaged optics.”
Meanwhile, the OIF is working on a 200-gigabit electrical interface standard.
Co-packaged optics is challenging and the technology has its own issues whereas optical transceivers are easier to use and deploy, says Khushrow.
Current industry thinking is that some form of co-packaged optics will be used with the adevnt of next-generation 51.2-terabit switch chips. But even with such capacity switches, pluggables will continue to be used, he says.
There will still be a need for PHYs, whether for pluggables, co-packaged designs or on the linecard.
“We will continue to provide those on our roadmap,” says Khushrow. “It is just a matter of what the form factor will be, whether it will be a packaged part or a die part.”


