OFC 2024 industry reflections: Part 2
Gazettabyte is asking industry figures for their thoughts after attending the recent OFC show in San Diego. Here are the thoughts from Ciena, Celestial AI and Heavy Reading.
Dino DiPerna, Senior Vice President, Global Research and Development at Ciena.
Power efficiency was a key theme at OFC this year. Although it has been a prevalent topic for some time, it stood out more than usual at OFC 2024 as the industry strives to make further improvements.
There was a vast array of presentations focused on power efficiency gains and technological advancements, with sessions highlighting picojoule-per-bit (pJ/b) requirements; high-speed interconnect evolution, including co-packaged optics (CPO), linear pluggable optics (LPO), and linear retimer optics (LRO); new materials such as thin-film lithium niobate; coherent transceiver evolution; and liquid cooling.
And the list of technologies goes on. The industry is innovating across multiple fronts to support data centre architecture requirements and carbon footprint reduction goals as energy efficiency tops the list of network provider needs.
Another hot topic at OFC was overcoming network scale challenges, with explorations of new frequency bands and fibre types.
One surprise from the show, presented in the post-deadline session, was the achievement of less than 0.11dB/km loss for hollow-core optical fibre. This offers a new avenue for delivering the higher capacities required in future networks, so it is one to keep an eye on for sure.
Preet Virk, Co-founder and Chief Operating Officer at Celestial AI.
This year’s OFC was all about AI infrastructure. Since it is an optical conference, the focus was on optical connectivity. A common theme was that interconnect bandwidth is the oxygen of AI infrastructure. Celestial AI fully agrees, and adds memory capacity to the list given the Memory Wall problem.
Traditionally, OFC has focused on inter- and intra-data centre connectivity. This year’s OFC made clear that chip-to-chip connectivity is also a critical bottleneck. We discussed our high-bandwidth, low-latency, and low-power Photonic Fabric solutions for compute-to-memory and compute-to-compute connectivity, which were well received at the show.
We appeared to be the only company whose optical chiplet satisfies the bandwidth requirements of high-bandwidth memory: HBM3 and the coming HBM4.
Sterling Perrin, Senior Principal Analyst, Heavy Reading.
OFC is the premier global event for the optics industry and the place to go to get up to speed quickly on trends that will drive the optics industry through the year and beyond. There’s always a theme that ties optics into the overall communications industry zeitgeist. This year’s theme, of course, is AI. OFC themes are sometimes a stretch – think connected cars – but this is not the case for the role of optics in AI where the need is immediate. And the role is clear: higher capacities and lower power consumption.
OFC took place one week after Nvidia’s GTC event, at which president and CEO Jensen Huang unveiled the Grace Blackwell Superchip. The timing was a perfect catalyst for discussions about the urgency of 800-gigabit and 1.6-terabit connectivity within the data centre.
At a Sunday workshop on linear pluggable optics (LPO), Alibaba’s Chongjin Xie presented a slide comparing LPO and 400-gigabit DR4 that showed a 50 per cent reduction in power consumption, a 100 per cent reduction in latency, and a 30 per cent reduction in production cost. But, as Xie and many others noted throughout the conference, LPO feasibility at 200 gigabits per lane remains a major industry challenge that has yet to be solved.
Another topic of intense debate within the data centre is InfiniBand versus Ethernet. InfiniBand delivers the high capacity and extremely low latency required for AI training, but it’s expensive, highly complex, and closed. The Ultra Ethernet Consortium aims to build an open, Ethernet-based alternative for AI and high-performance computing. But Nvidia product architect Ashkan Seyedi was sceptical about the need for high-performance Ethernet. During a media luncheon, he noted that InfiniBand was developed as a high-performance, low-latency alternative to Ethernet for high-performance computing. Current Ethernet efforts, therefore, are largely trying to re-create InfiniBand, in his view.
The comments above are all about connectivity within the data centre. Outside the data centre, OFC buzz was harder to find. What about AI and data centre interconnect? It’s not here yet. Connectivity between racks and AI clusters is measured in metres, for many reasons. There was much talk about building distributed data centres in the future as a means of reducing the demands on individual power grids, but it’s preliminary at this point.
While data centres strive toward 1.6 terabit, 400 gigabit seems to be the data rate of highest interest for most telecom operators (i.e., non-hyperscalers), with pluggable optics as the preferred form factor. I interviewed the OIF’s inimitable Karl Gass, who was dressed in a shiny golden suit, about their impressive coherent demo that included 23 suppliers and demonstrated 400ZR, 400G ZR+, 800ZR, and OpenROADM.
Lastly, quantum-safe networking popped up several times at Mobile World Congress this year, and the theme continued at OFC. The topic looks poised to move out of academia and into networks, and optical networking has a central role to play. I learned two things.
First, “Q-Day”, when quantum computers can break public encryption keys, may be many years away, but certain entities such as governments and financial institutions want their traffic to be quantum safe well in advance of the elusive Q-Day.
Second, “quantum safe” may not require quantum technology, though, like most new areas, there is debate here. In the fighting-quantum-without-quantum camp, Israel-based start-up CyberRidge has developed an approach to transmitting keys and data, safe from quantum computers, that it calls photonic level security.
The computing problem of our time: Moving data

- Celestial AI’s Photonic Fabric technology can deliver up to 700 terabits per second of bidirectional bandwidth per chip package.
- The start-up has recently raised $100 million in funding.
The size of AI models that implement machine learning continues to grow staggeringly fast.
Such AI models are used for computer vision, large language models such as ChatGPT, and recommendation systems that rank items such as search results and music playlists.
The workhorse silicon used to build such AI models is the graphics processing unit (GPU). GPU processing performance and memory capacity may be advancing impressively, but AI model growth is far outpacing GPUs' processing and input-output [I/O] capabilities.
To tackle large AI model workloads, hundreds and even thousands of GPUs are deployed in parallel to boost overall processing performance and high-performance memory capacity.
But it is proving hugely challenging to scale such parallel systems and feed sufficient data to the expensive processing nodes so they can do their work.
Or as David Lazovsky, CEO of start-up Celestial AI, puts it: data movement has become the computing problem of our time.
Input-output bottleneck
The challenges of data movement and of scaling hardware for machine learning have caused certain AI start-ups to refocus, looking beyond AI processor development to how silicon photonics can tackle the input-output [I/O] bottleneck.
Lightelligence is one such start-up; Celestial AI is another.
Founded in 2020, Celestial AI has raised $100 million in its latest round of funding, and $165 million overall.
Celestial AI’s products include the Orion AI processor and its Photonic Fabric, an optoelectronic system-in-package comprising a silicon photonics chip and the associated electronics IC.
The Photonic Fabric has two technological differentiators: a thermally stable optical modulator, and an electrical IC implemented in advanced CMOS.

Thermally stable modulation
Many companies use a ring resonator modulator for their co-packaged optics designs, says Lazovsky. Ring resonator modulators are tiny but sensitive to heat, so they must be temperature-controlled to work optimally.
“The challenge of rings is that they are thermally stable to about one degree Celsius,” says Lazovsky.
Celestial AI uses silicon photonics as an interposer such that it sits under the ASIC, a large chip operating at high temperatures.
“Using silicon photonics to deliver optical bandwidth to a GPU that’s running at 500-600 Watts, that’s just not going to work for a ring,” says Lazovsky, adding that even integrating silicon photonics into memory chips that consume 30W will not work.
Celestial AI's modulator is 60x more thermally stable than a ring modulator.
The start-up uses continuous-wave distributed feedback (DFB) lasers as the light source, the same lasers used for 400-gigabit DR4 and FR4 pluggable transceivers, and sets their wavelength to the high end of the operating window.
The result is a 60-degree window within which the silicon photonics circuits can operate. “We can also add closed-loop control if necessary,” says Lazovsky.
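As a quick sanity check, the 60-degree window follows directly from the two figures quoted above. The short Python sketch below makes the arithmetic explicit; the one-degree ring tolerance and the 60x stability factor come from the article, and the multiplication is simply an inference, not Celestial AI's own model.

```python
# Back-of-the-envelope check of the thermal-stability claim (a sketch;
# the ~1 degree C ring tolerance and the 60x factor are quoted in the
# article, the rest is simple arithmetic).
ring_tolerance_c = 1.0   # ring modulators stay in spec to about 1 degree C
stability_factor = 60    # Celestial AI's modulator is said to be 60x more stable

operating_window_c = ring_tolerance_c * stability_factor
print(f"Implied operating window: ~{operating_window_c:.0f} degrees C")
# ~60 degrees C, matching the 60-degree window quoted by Lazovsky
```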
Celestial AI is not revealing the details of its technology, but the laser source is believed to be external to the silicon photonics chip.
A key challenge, then, is getting the modulator to work stably so close to the ASIC, and this Celestial AI says it has done.
Advanced CMOS electronics
The start-up says TSMC’s 4nm and 5nm CMOS are the process nodes used for the Photonic Fabric’s electronics IC that accompanies the optics.
“We are qualifying our technology for both 4nm and 5nm,” says Lazovsky. “Celestial AI’s current products are built using TSMC 5nm, but we have also validated the Photonic Fabric using 4nm for the ASIC in support of our IP licensing business.”
The electronics IC includes the modulator’s drive circuitry and the receiver’s trans-impedance amplifier (TIA).
Celestial AI has deliberately chosen to implement the electronics in a separate chip rather than use a monolithic design as done by other companies. With a monolithic chip, the optics and electronics are implemented using the same 45nm silicon photonics process.
But a 45nm process for the electronics is already an old process, says the start-up.
Using state-of-the-art 4nm or 5nm CMOS cuts down the area and the power requirements of the modulation driver and TIA. The optics and electronics are tightly aligned, less than 150 microns apart.
“We are mirroring the layout of our drivers and TIAs in electronics with the modulator and the photodiode in silicon photonics such that they are directly on top of each other,” says Lazovsky.
The proximity ensures a high signal-to-noise ratio; no advanced forward error correction (FEC) scheme or digital signal processor (DSP) is needed. The short distances also reduce latency.
This contrasts with co-packaged optics, where chiplets surround the ASIC to provide optical I/O but take up valuable space alongside the ASIC edge, referred to as beachfront.
If the ASIC is a GPU, such chiplets must compete with stacked memory packages – the latest version being High Bandwidth Memory 3 (HBM3) – that also must be placed close to the ASIC.
There is also only so much space for the HBM3's 1,024-bit-wide interface to move data, a problem shared by co-packaged optics, says Lazovsky.
Using the Universal Chiplet Interconnect Express (UCIe) interface, for example, there is a limit to the bandwidth that can be distributed, not just to the chip but across the chip too.
“The beauty of the Photonic Fabric is not just that we have much higher bandwidth density, but that we can deliver that bandwidth anywhere within the system,” says Lazovsky.
The interface comes from below the ASIC and can deliver data to where it is needed: to the ASIC’s compute engines and on-chip Level 2 cache memory.
Bandwidth density
Celestial AI’s first-generation implementation uses four channels of 56-gigabit non-return-to-zero (NRZ) signalling to deliver up to 700 terabits-per-second (Tbps) of total bidirectional bandwidth per package.
How this number is arrived at has not been disclosed, but it is based on feeding the I/O via the ASIC's surface area rather than the chip's edges.
To put that in perspective, Nvidia's latest Hopper H100 Tensor Core GPU uses five HBM3 sites. These deliver 80 gigabytes of memory and over three terabytes-per-second (roughly 27Tbps) of total memory bandwidth.
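To make the comparison concrete, the sketch below puts the quoted figures in common units. The 3.35 terabytes-per-second figure is Nvidia's published HBM3 bandwidth for the H100 SXM part; the lane-count decomposition is only an inference from the stated per-channel rate, since Celestial AI has not disclosed how the 700Tbps total is derived.

```python
# Rough comparison of the quoted bandwidth figures (a sketch; the lane-count
# decomposition is an assumption, not a disclosed figure).
photonic_fabric_bw_tbps = 700   # claimed bidirectional bandwidth per package
lane_rate_gbps = 56             # first-generation NRZ channel rate

implied_lanes = photonic_fabric_bw_tbps * 1000 / lane_rate_gbps
print(f"Implied lanes (both directions): {implied_lanes:,.0f}")    # 12,500

h100_hbm3_tb_per_s = 3.35       # Nvidia H100 SXM total HBM3 bandwidth
h100_hbm3_tbps = h100_hbm3_tb_per_s * 8
print(f"H100 HBM3 bandwidth: ~{h100_hbm3_tbps:.0f}Tbps")           # ~27Tbps
print(f"Ratio: ~{photonic_fabric_bw_tbps / h100_hbm3_tbps:.0f}x")  # ~26x
```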
The industry trend is to add more HBM in-package, but AI models are growing hundreds of times faster. “You need orders of magnitude more memory for a single workload than can fit on a chip,” says Lazovsky.
Accordingly, vast amounts of efficient I/O are needed to link AI processors to remote pools of high-bandwidth memory by disaggregating memory from compute.
Celestial AI is now working on its second-generation interface, expected in 18 months. The newer interface quadruples the package bandwidth to more than 2,000Tbps: it uses 4-level pulse amplitude modulation (PAM-4) signalling to deliver 112Gbps per channel and doubles the channel count from four to eight.
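The quoted scaling is internally consistent: doubling the per-channel rate and doubling the channel count implies a fourfold increase over the first generation, comfortably above the stated 2,000Tbps floor. A minimal sketch of that arithmetic:

```python
# Second-generation scaling arithmetic, as implied by the article (a sketch;
# the company has stated only ">2,000Tbps" for the second generation).
gen1_bw_tbps = 700        # first-generation package bandwidth
rate_scaling = 112 / 56   # NRZ 56Gbps -> PAM-4 112Gbps per channel
lane_scaling = 8 / 4      # channel count doubles from four to eight

gen2_bw_tbps = gen1_bw_tbps * rate_scaling * lane_scaling
print(f"Implied second-generation bandwidth: {gen2_bw_tbps:,.0f}Tbps")  # 2,800Tbps
```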
“The fight is about bandwidth density, getting large-scale parameters from external memory to the point of computing as efficiently as possible,” says Lazovsky.
By efficiently, Lazovsky means bandwidth, energy, and latency. And for AI applications, low latency translates into revenue.
Celestial AI believes its Photonic Fabric technology is game-changing due to the bandwidth density it achieves while overcoming the beachfront issue.
Composable memory
Celestial AI changed its priorities to focus on memory disaggregation after working with hyperscalers for the last two years.
The start-up will use its latest funding to expand its commercial activities.
“We’re building optically interconnected, high-capacity and high-bandwidth memory systems to allow our customers to develop composable resources,” says Lazovsky.
Celestial AI is using its Photonic Fabric to enable 16 servers (via PCI Express cards) to access a single high-capacity pool of optically enabled DDR, HBM, or hybrid memory.
Another implementation will use its technology in chiplet form via the UCIe interface. Here, the bandwidth is 14.4Tbps, more than twice that of the leading co-packaged optics solutions.
Celestial AI also has an optical multi-chip interconnect bridge (OMIB), enabling an ASIC to access pooled high-capacity external memory in a 40ns round trip. OMIB can also be used to link chips optically on a multi-chip module.
Celestial AI stressed that its technology is not limited to memory disaggregation. The Photonic Fabric came out of the company's efforts to scale systems of multiple Orion AI processors.
Celestial AI supports the JEDEC HBM standard and CXL 2.0 and 3.0, as well as other physical interface technologies such as Nvidia's NVLink and AMD's Infinity Fabric.
“It is not limited to our proprietary protocol,” says Lazovsky.
The start-up is in discussions with ‘multiple’ companies interested in its technology, while Broadcom is a design services partner. Near Margalit, vice president and general manager of Broadcom’s optical systems division, is a technical advisor to the start-up.
Overall, the industry trend is to move from general computing to accelerated computing in data centres. That will drive more AI processors and more memory and compute disaggregation.
“It is optical,” says Lazovsky. “There is no other way to do it.”
