Gazettabyte is asking industry figures for their thoughts after attending the recent 50th-anniversary ECOC show in Frankfurt. Here are the first contributions from Huawei's Maxim Kuschnerov, Coherent's Vipul Bhatt, and Broadcom's Rajiv Pancholy.
Maxim Kuschnerov, Director R&D, Optical & Quantum Communication Laboratory at Huawei.
At ECOC, my main interest concerned the evolution of data centre networking to 400 gigabits per lane for optics and electronics. Historically, the adoption of new optical line rates always preceded that of the serdes electrical interconnects, but now copper cables are likely to drive much of the leading development work at 400 gigabits per lane.
Arista Networks argued that 448G-PAM6 works better for copper, while 448G-PAM4 is the better choice for optics - a recurring argument. While PAM6 signalling is certainly more suitable for longer copper cables, it will face even tougher challenges on the optical side with increasing reflection requirements in newly built, dusty data centres. Also, a linear drive option for future Ethernet will be imperative, given the DSP's increasing share of the power consumption in pluggable modules. Here, a native 448G-PAM4 format for the serdes (the attachment unit interface or AUI) and optics looks more practical.
My most important takeaway regarding components was the initial feasibility of electro-absorption modulated lasers (EMLs) with a greater than 100GHz analogue bandwidth, presented by Lumentum and Mitsubishi publicly and other companies privately. Along with thin-film lithium niobate (TFLN) Mach–Zehnder modulators suited for Direct Reach (DR) applications with shared lasers, EMLs have historically offered low cost, small size and native laser integration.
For 1.6-terabit modules, everyone is waiting on the system availability of 224-gigabit serdes at a switch and network interface card (NIC) level. The power consumption of 1.6-terabit optical modules will improve with 3nm CMOS DSPs and native 200 gigabits per lane. Still, it gets into an unhealthy region where the network cable power consumption is in the same ballpark as the system function of switching. Here, the bet on LPO certainly didn't pay off at 100 gigabits per lane and will not pay off at 200 gigabits per lane at scale. The question is whether linear receive optics (LRO) or half-retimed approaches will enter the market. Technically, it's feasible. So, it might take one big market player with enough vertical integration capability and a need to reduce power consumption to move the needle into this more proprietary, closed-system direction. Nvidia showcased their PAM4 DSP at the show. Just saying...
212G VCSELs are still uncertain. There is a tight initial deployment window to be hit if these high-speed VCSELs are to displace single-mode fibre-based optics at the major operators. Coherent's results of 34GHz bandwidth are not sufficient and don't look like something that could yet be produced at scale. Claims by some companies that a 400 gigabit per lane VCSEL is feasible sound hollow for now, with the industry crawling around the 30GHz bandwidth window.
Last but not least, co-packaged optics. For years, this technology couldn't escape gimmick status. Certainly, reliability, serviceability, and testability of co-packaged optics using today's methodology would make a deployment impractical. However, the big prize at 400 gigabit per lane is saving power - a significant operational expense for operators - something that is too attractive to ignore.
The targets of improving optics diagnostics, developing higher-performance dust-reflection DSP algorithms to deal with multi-path interference, adopting more resiliency to failure in the network, and introducing a higher degree of laser sparing are not insurmountable tasks if the industry sets its mind to them. Given the ludicrous goals of the AI industry, which is reactivating and rebranding nuclear power plants, a significant reduction in network power might finally serve a higher purpose than just building a plumber's pipe.
Vipul Bhatt, Vice President of Marketing, Datacom Vertical, Coherent
ECOC 2024 was the most convincing testimony that the optical transceiver industry has risen to the challenge of AI’s explosive growth. There was hype, but I saw more solid work than hype. I saw demonstrations and presentations affirming that the 800-gigabit generation was maturing quickly, while preparations are underway for the next leap to 1.6 terabit and then 3.2 terabit.
This is no small feat, because the optics for AI is more demanding in three ways. I call them the three P’s of AI optics: performance, proliferation, and pace.
Performance because 200 gigabit PAM4 optical lanes must work with a low error rate at higher bandwidth. Proliferation because the drive to reduce power consumption has added new transceiver variants like linear pluggable optics (LPO) and linear receive optics (LRO). And pace because the specifications of AI optics are evolving at a faster pace than traditional IEEE standards.
Rajiv Pancholy, Director of Hyperscale Strategy and Products, Optical Systems Division, Broadcom
As generative AI systems move to unsupervised, transformer-based parallel architectures, there is less time for resending packets due to data transmission errors. Improved bit error rates are thus required to reduce training times while higher interconnect bandwidth and data rates are needed to support larger GPU clusters. These compute networks are already moving to 224 gigabit PAM4 well before the previous generation at 112 gigabit PAM4 was allowed to reach hyperscale deployment volumes.
The problem is scalability with a high radix that supports all-to-all connectivity. The power for a single rack of 72 GPUs is 120kW, and even with liquid cooling, this becomes challenging. Interconnecting larger scale-up and scale-out AI computing clusters requires more switching layers, which increases latency.
Furthermore, after 224 gigabit PAM4, the losses through copper at 448 gigabit PAM4 make link distances from the ASIC too short. Moving to modulation schemes like PAM-6 or PAM-8 presents a problem for the optics, which would need to stay at 448 gigabit PAM4 to minimize crosstalk and insertion losses.
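To put rough numbers on the modulation trade-off: a higher-order format lowers the symbol rate needed for a 448-gigabit lane, which is the attraction for lossy copper, at the cost of more closely spaced signal levels. The sketch below is illustrative only. It ignores FEC overhead, and the 2.5 bits per symbol used for PAM6 is an assumed practical mapping rather than a figure taken from this article.

```python
import math

LINE_RATE_GBPS = 448  # per-lane rate discussed in the text


def symbol_rate_gbaud(line_rate_gbps: float, bits_per_symbol: float) -> float:
    """Symbol rate (GBd) needed to carry line_rate_gbps at bits_per_symbol."""
    return line_rate_gbps / bits_per_symbol


# Ideal information per symbol is log2(number of levels).
# Assumption: PAM6 is taken as 2.5 bits/symbol, slightly below log2(6).
BITS_PER_SYMBOL = {
    "PAM4": math.log2(4),
    "PAM6": 2.5,
    "PAM8": math.log2(8),
}

SYMBOL_RATES = {
    name: symbol_rate_gbaud(LINE_RATE_GBPS, bits)
    for name, bits in BITS_PER_SYMBOL.items()
}

if __name__ == "__main__":
    for name, rate in SYMBOL_RATES.items():
        print(f"{name}: {rate:.1f} GBd")
```

Under these assumptions, PAM4 needs 224 GBd while PAM6 and PAM8 need roughly 179 GBd and 149 GBd respectively.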
Supporting 448 gigabit PAM4 with optics then potentially requires new materials to be integrated into silicon, like thin-film lithium niobate (TFLN) and barium titanate (BaTiO3), electro-optic (EO) polymers, and III-V materials like indium phosphide (InP) and gallium arsenide (GaAs). So now we have a gearbox and, potentially, a higher forward error correction (FEC) coding gain is required, adding more power and latency before the signal even gets to the transmit-side optics.
There were 1.6-terabit OSFP transceivers operating with eight lanes of 212.5 gigabit PAM4 while vendors continue to work towards a 3.2-terabit OSFP-XD. With 32 x 3.2Tbps pluggables operating at 40W each, the optical interconnect power would be 1.3kW for a 102.4Tbps switch. And if you use 64 x 1.6Tbps OSFP at 25W each, the optical interconnect power will be even higher, at 1.6kW. I wonder how linear pluggable optics can compensate for all the path impairments and reflections at high data rates from pluggable solutions. Perhaps you can relax link budgets, temperature requirements, and interoperability compliance.
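The switch power figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the module counts, rates and wattages quoted in the text; the function and variable names are hypothetical, and the per-bit energy is a derived figure, not one stated by the contributor.

```python
def total_capacity_tbps(num_modules: int, gbps_per_module: int) -> float:
    """Aggregate optical capacity of the switch faceplate in Tbps."""
    return num_modules * gbps_per_module / 1000


def total_power_kw(num_modules: int, watts_per_module: int) -> float:
    """Total pluggable-optics power in kW."""
    return num_modules * watts_per_module / 1000


def energy_pj_per_bit(watts_per_module: int, gbps_per_module: int) -> float:
    """Module energy per bit; W per Gbps equals nJ/bit, so scale to pJ/bit."""
    return watts_per_module / gbps_per_module * 1000


# (module count, Gbps per module, W per module), as quoted in the text
CONFIGS = {
    "32 x 3.2T": (32, 3200, 40),
    "64 x 1.6T": (64, 1600, 25),
}

if __name__ == "__main__":
    for name, (count, gbps, watts) in CONFIGS.items():
        print(
            f"{name}: {total_capacity_tbps(count, gbps):.1f} Tbps, "
            f"{total_power_kw(count, watts):.2f} kW, "
            f"{energy_pj_per_bit(watts, gbps):.2f} pJ/bit"
        )
```

Both configurations reach the same 102.4Tbps. The 3.2-terabit modules come to 1.28kW (the 1.3kW quoted, after rounding) and the 1.6-terabit modules to 1.6kW, which works out to 12.5 and about 15.6 picojoules per bit respectively.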
The best session this year was the last ECOC Market Focus panel on the Tuesday, which kept everyone a bit longer before they could figure out where in Frankfurt Oktoberfest beer was on tap. The panel addressed “Next-Gen Networking Optics like 1.6T or 3.2T”. All but one of the participants discussed the need for, and a migration to, co-packaged optics, which we at Broadcom first demonstrated in March 2022.
It was great to also present at the ECOC Market Focus forum. My presentation was titled “Will you need CPO in 3 years?” Last year in Glasgow, I gave a similar presentation: “Will you need CPO in 5 years?”