The computing problem of our time: Moving data

David Lazovsky

  • Celestial AI’s Photonic Fabric technology can deliver up to 700 terabits per second of bidirectional bandwidth per chip package.
  • The start-up has recently raised $100 million in funding.

The size of AI models that implement machine learning continues to grow at a staggering pace.

Such AI models are used for computer vision, large language models such as ChatGPT, and recommendation systems that rank items such as search results and music playlists.

The workhorse silicon used to build such AI models is the graphics processing unit (GPU). GPU processing performance and memory capacity are advancing impressively, but AI model growth is far outpacing their processing and input-output (I/O) capabilities.

To tackle large AI model workloads, hundreds or even thousands of GPUs are deployed in parallel to boost overall processing performance and provide high-performance memory capacity.

But it is proving hugely challenging to scale such parallel systems and feed sufficient data to the expensive processing nodes so they can do their work.

Or, as David Lazovsky, CEO of start-up Celestial AI, puts it: data movement has become the computing problem of our time.

Input-output bottleneck

The challenges of moving data and of scaling hardware for machine learning have caused certain AI start-ups to refocus, looking beyond AI processor development to how silicon photonics can tackle the I/O bottleneck.

Lightelligence is one such start-up; Celestial AI is another.

Founded in 2020, Celestial AI has raised $100 million in its latest round of funding, and $165 million overall.

Celestial AI’s products include the Orion AI processor and its Photonic Fabric, an optoelectronic system-in-package comprising a silicon photonics chip and the associated electronics IC.

The Photonic Fabric uses two technological differentiators: a thermally stable optical modulator, and an electrical IC implemented in advanced CMOS.

The Photonic Fabric. Source: Celestial AI

Thermally stable modulation

Many companies use a ring resonator modulator for their co-packaged optics designs, says Lazovsky. Ring resonator modulators are tiny but sensitive to heat, so they must be temperature-controlled to work optimally.

“The challenge of rings is that they are thermally stable to about one degree Celsius,” says Lazovsky.

Celestial AI uses silicon photonics as an interposer such that it sits under the ASIC, a large chip operating at high temperatures.

“Using silicon photonics to deliver optical bandwidth to a GPU that’s running at 500-600 Watts, that’s just not going to work for a ring,” says Lazovsky, adding that even integrating silicon photonics into memory chips that consume 30W will not work.

Celestial AI uses a modulator that is 60 times more thermally stable than a ring modulator.

The start-up uses continuous-wave distributed feedback (DFB) lasers as the light source, the same lasers used for 400-gigabit DR4 and FR4 pluggable transceivers, and sets their wavelength to the high end of the operating window.

The result is a 60-degree window within which the silicon photonics circuits can operate. “We can also add closed-loop control if necessary,” says Lazovsky.

Celestial AI is not revealing the details of its technology, but the laser source is believed to be external to the silicon photonics chip.

Thus a key challenge is getting the modulator to work stably so close to the ASIC, and this Celestial AI says it has done.

Advanced CMOS electronics

The start-up says TSMC’s 4nm and 5nm CMOS are the process nodes used for the Photonic Fabric’s electronics IC that accompanies the optics.

“We are qualifying our technology for both 4nm and 5nm,” says Lazovsky. “Celestial AI’s current products are built using TSMC 5nm, but we have also validated the Photonic Fabric using 4nm for the ASIC in support of our IP licensing business.”

The electronics IC includes the modulator’s drive circuitry and the receiver’s trans-impedance amplifier (TIA).

Celestial AI has deliberately chosen to implement the electronics in a separate chip rather than use a monolithic design as done by other companies. With a monolithic chip, the optics and electronics are implemented using the same 45nm silicon photonics process.

But a 45nm process for the electronics is already an old process, says the start-up.

Using state-of-the-art 4nm or 5nm CMOS cuts down the area and the power requirements of the modulation driver and TIA. The optics and electronics are tightly aligned, less than 150 microns apart.

“We are mirroring the layout of our drivers and TIAs in electronics with the modulator and the photodiode in silicon photonics such that they are directly on top of each other,” says Lazovsky.

The proximity ensures a high signal-to-noise ratio; no advanced forward error correction (FEC) scheme or digital signal processor (DSP) is needed. The short distances also reduce latency.

This contrasts with co-packaged optics, where chiplets surround the ASIC to provide optical I/O but take up valuable space alongside the ASIC edge, referred to as beachfront.

If the ASIC is a GPU, such chiplets must compete with stacked memory packages – the latest version being High Bandwidth Memory 3 (HBM3) – that also must be placed close to the ASIC.

There is also only so much space for the HBM3’s 1024-bit wide interface to move data, a problem also shared by co-packaged optics, says Lazovsky.
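
To see why this is a constraint, it helps to work through the per-stack arithmetic. The Python sketch below is illustrative only: the 1024-bit width comes from the article, while the 6.4Gb/s per-pin rate is the JEDEC HBM3 headline figure, assumed here rather than taken from the text.

```
# Back-of-envelope HBM3 per-stack bandwidth: bus width x per-pin data rate.
# The 1024-bit width is from the article; the 6.4 Gb/s pin rate is the
# JEDEC HBM3 headline figure, assumed here for illustration.
bus_width_bits = 1024        # HBM3 interface width per stack
pin_rate_gbps = 6.4          # assumed per-pin data rate (JEDEC HBM3 maximum)

stack_bw_gbps = bus_width_bits * pin_rate_gbps   # aggregate, gigabits per second
stack_bw_gbytes = stack_bw_gbps / 8              # gigabytes per second

print(f"Per stack: {stack_bw_gbps:.0f} Gb/s ({stack_bw_gbytes:.0f} GB/s)")
# -> roughly 6554 Gb/s, about 819 GB/s per stack; reaching terabytes per second
#    means placing several such 1024-bit interfaces along the ASIC edge.
```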

Using the Universal Chiplet Interconnect Express (UCIe) interface, for example, there is a limit to the bandwidth that can be distributed, not just to the chip but across the chip too.

“The beauty of the Photonic Fabric is not just that we have much higher bandwidth density, but that we can deliver that bandwidth anywhere within the system,” says Lazovsky.

The interface comes from below the ASIC and can deliver data to where it is needed: to the ASIC’s compute engines and on-chip Level 2 cache memory.

Bandwidth density

Celestial AI’s first-generation implementation uses four channels of 56-gigabit non-return-to-zero (NRZ) signalling to deliver up to 700 terabits per second (Tbps) of total bidirectional bandwidth per package.

How this number is arrived at has not been disclosed, but it is based on feeding the I/O through the ASIC’s surface area rather than the chip’s edges.

To put that in perspective, Nvidia’s latest Hopper H100 Tensor Core GPU uses five HBM3 sites. These sites deliver 80 gigabytes of memory and over three terabytes per second – 30Tbps – of total memory bandwidth.
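
A rough ratio using the article’s own headline numbers, with the caveat that the two figures are not like-for-like (total package I/O versus HBM memory bandwidth):

```
# Rough comparison of the headline figures quoted above; not like-for-like,
# since one is total package I/O and the other is HBM memory bandwidth.
photonic_fabric_tbps = 700   # Celestial AI, bidirectional bandwidth per package
h100_hbm3_tbps = 30          # Nvidia H100, total HBM3 memory bandwidth

print(f"~{photonic_fabric_tbps / h100_hbm3_tbps:.0f}x")   # roughly 23x
```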

The industry trend is to add more HBM memory in-package, but AI models are growing hundreds of times faster. “You need orders of magnitude more memory for a single workload than can fit on a chip,” he says.

Accordingly, vast amounts of efficient I/O are needed to link AI processors to remote pools of high-bandwidth memory by disaggregating memory from compute.

Celestial AI is now working on its second-generation interface, expected in 18 months. The newer interface uses 4-level pulse amplitude modulation (PAM-4) signalling to deliver 112Gbps per channel and doubles the channel count from four to eight, quadrupling the package bandwidth to more than 2,000Tbps.
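
As a sanity check on that scaling, the sketch below works through the per-port arithmetic from the figures given. It is illustrative only, since Celestial AI has not disclosed how the package totals are aggregated.

```
# Per-port scaling from the figures in the article. Celestial AI has not
# disclosed how the package totals (700Tbps, >2,000Tbps) are aggregated,
# so only the per-port ratio is computed here.

def port_bandwidth_gbps(channels: int, rate_gbps: float) -> float:
    """Aggregate bandwidth of one port: channel count x per-channel rate."""
    return channels * rate_gbps

gen1 = port_bandwidth_gbps(channels=4, rate_gbps=56)    # 56G NRZ
gen2 = port_bandwidth_gbps(channels=8, rate_gbps=112)   # 112G PAM-4

print(f"Gen 1: {gen1:.0f} Gb/s, Gen 2: {gen2:.0f} Gb/s, scaling: {gen2 / gen1:.0f}x")
# -> 224 Gb/s vs 896 Gb/s, a 4x step that matches the quadrupled package bandwidth
```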

“The fight is about bandwidth density, getting large-scale parameters from external memory to the point of computing as efficiently as possible,” says Lazovsky.

By efficiently, Lazovsky means bandwidth, energy, and latency. And low latency for AI applications translates to revenues.

Celestial AI believes its Photonic Fabric technology is game-changing because of the bandwidth density it achieves while overcoming the beachfront issue.

Composable memory

Celestial AI changed its priorities to focus on memory disaggregation after working with hyperscalers for the last two years.

The start-up will use its latest funding to expand its commercial activities.

“We’re building optically interconnected, high-capacity and high-bandwidth memory systems to allow our customers to develop composable resources,” says Lazovsky.

Celestial AI is using its Photonic Fabric to enable 16 servers (via PCI Express cards) to access a single high-capacity pool of optically enabled DDR, HBM, or hybrid memory.

Another implementation will use its technology in chiplet form via the UCIe interface. Here, the bandwidth is 14.4Tbps, more than twice the speed of the leading co-packaged optics solutions.

Celestial AI also has an optical multi-chip interconnect bridge (OMIB), enabling an ASIC to access pooled high-capacity external memory in a 40ns round trip. OMIB can also be used to link chips optically on a multi-chip module.

Celestial AI stressed that its technology is not limited to memory disaggregation. The Photonic Fabric emerged from the company’s work on scaling systems built from multiple Orion AI processors.

Celestial AI supports the JEDEC HBM standard and CXL 2.0 and 3.0, as well as other physical interface technologies such as Nvidia’s NVLink and AMD’s Infinity Fabric.

“It is not limited to our proprietary protocol,” says Lazovsky.

The start-up is in discussions with ‘multiple’ companies interested in its technology, while Broadcom is a design services partner. Near Margalit, vice president and general manager of Broadcom’s optical systems division, is a technical advisor to the start-up.

Overall, the industry trend is to move from general computing to accelerated computing in data centres. That will drive more AI processors and more memory and compute disaggregation.

“It is optical,” says Lazovsky. “There is no other way to do it.”


OpenLight's CEO on its silicon photonics strategy

Adam Carter, CEO of OpenLight

Adam Carter, recently appointed the CEO of OpenLight, discusses the company’s strategy and the market opportunities for silicon photonics.

Adam Carter’s path to becoming OpenLight’s first CEO is a circuitous one.

OpenLight, a start-up, offers the marketplace an open silicon photonics platform with integrated lasers and gain blocks.

Having worked at Cisco and Oclaro, which was acquired by Lumentum in 2018, Carter decided to take six months off. Covid then hit, prolonging his time out.

Carter returned as a consultant working with firms, including a venture capitalist (VC). The VC alerted him to OpenLight’s search for a CEO.

Carter’s interest in OpenLight was immediate. He already knew the technology and OpenLight’s engineering team and recognised the platform’s market potential.

“If it works in the way I think it can work, it [the platform] could be very interesting for many companies who don’t have access to the [silicon photonics] technology,” says Carter.

Offerings and strategy

OpenLight’s silicon photonics technology originated at Aurrion, a fabless silicon photonics start-up from the University of California, Santa Barbara.

Aurrion’s heterogeneously integrated silicon photonics technology incorporated III-V materials, enabling lasers to be part of the photonic integrated circuit (PIC).

Juniper Networks bought Aurrion in 2016 and, in 2022, spun out the unit that became OpenLight, with Synopsys joining Juniper in backing the start-up.

OpenLight offers companies two services.

The first is design services for firms with no silicon photonics design expertise. OpenLight will develop a silicon photonics chip to meet the company’s specifications and take the design to production.

“If you don’t have a silicon photonics design team, we will do reference architectures for you,” says Carter.

The design is passed to Tower Semiconductor, a silicon photonics foundry that OpenLight, and before that, Juniper, worked with. Chip prototype runs are wafer-level tested and passed to the customer.

OpenLight gives the company the GDSII (Graphic Design System) file, which defines the mask set the company orders from Tower for the PIC’s production.

OpenLight also serves companies with in-house silicon photonics expertise that until now have not had access to a silicon photonics process with active components: lasers, semiconductor optical amplifiers (SOAs), and modulators.

The components are part of the process design kit (PDK), the set of files that models a foundry’s fabrication process. A company can choose a PDK that best suits its silicon photonics design for the foundry to then make the device.

OpenLight offers two PDKs via Tower Semiconductor: a Synopsys PDK and one from Luceda Photonics.

OpenLight does not make components, but offers reference designs. OpenLight gets a small royalty with every wafer shipped when a company’s design goes to production.

“They [Tower] handle the purchasing orders, the shipments, and if required, they’ll send it to the test house to produce known good die on each wafer,” says Carter.

OpenLight plans to expand the foundries it works with. “You have to give customers the maximum choice,” says Carter.

Design focus

OpenLight’s design team continues to add components to its library.

At the OFC show in March, held in San Diego, OpenLight announced a 224-gigabit indium phosphide optical modulator to enable 200-gigabit optical lanes. OpenLight also demoed an eight-by-100-gigabit transmitter alongside Synopsys’s 112-gigabit serialiser-deserialiser (serdes).

OpenLight also offers a ‘PDK sampler’ for firms to gain confidence in its process and designs.

The sampler comes with two PICs. One PIC has every component offered in OpenLight’s PDK so a customer can probe and compare test results with the simulation models of Tower’s PDKs.

“You can get confidence that the process and the design are stable,” says Carter.

The second PIC is the eight-by-100-gigabit DR8 design demoed at OFC.

The company is also working on different laser structures to improve the picojoule-per-bit performance of its existing design.

“Three picojoules per bit will be the benchmark, and it will go lower as we understand more about reducing these numbers through design and process,” says Carter.
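
For context, the power implied by an energy-per-bit figure scales linearly with line rate (power = energy per bit × bit rate). The sketch below uses the 3pJ/bit benchmark quoted above; the 800Gb/s line rate is an illustrative assumption, not an OpenLight figure.

```
# Power implied by an energy-per-bit figure: P = E_bit * bit_rate.
# 3 pJ/bit is the benchmark quoted by Carter; the 800 Gb/s line rate is an
# illustrative assumption (a DR8-style 8x100G transmitter), not an OpenLight spec.
energy_per_bit_joules = 3.0e-12     # 3 picojoules per bit
line_rate_bps = 800e9               # 800 Gb/s, assumed for illustration

power_watts = energy_per_bit_joules * line_rate_bps
print(f"3 pJ/bit at 800 Gb/s = {power_watts:.1f} W")   # about 2.4 W
```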

The company wants to offer the most updated components via its PDK, says Carter.

OpenLight’s small design team can’t do everything at once, he says: “And if I have to license other people’s designs into my PDK, I will, to make sure my customer has a maximum choice.”

Market opportunities

OpenLight’s primary market focus is communications, an established and significant market that will continue to grow in the coming years.

To that can be added artificial intelligence (AI) and machine learning, memory, and high-speed computing, says Carter.

“If you listen to companies like Google, Meta, and Amazon, what they’re saying is that most of their investment in hardware is going into what is needed to support AI and machine learning,” says Carter. “There is a race going on right now.”

When AI and machine learning take off, the volumes of optical connections will grow considerably since the interfaces will not just be for networking but also computing, storage, and memory.

“The industry is not quite ready yet to do that ramp at the bandwidths and the densities needed,” he says, but this will be needed in three to four years.

Large contract manufacturers also see volumes coming and are looking at how to offer optical subassembly, he says.

Another market opportunity is telecoms and, in particular, coherent optics for metro networks. However, unit volumes will be critical. “Because I am in a foundry, at scale, I have to fill it with wafers,” says Carter.

Simpler coherent designs – ‘coherent lite’ – connecting data centre buildings could be helpful. There is much interest in short-reach connections over 10km distances at 1.6 terabits or higher capacity, where coherent could be important and deliver large volumes, he says.

Emerging markets for OpenLight’s platform include lidar, where OpenLight is seeing interest, high-performance computing, and healthcare.

“Lidar is different as it is not standardised,” he says. It is a lucrative market, given how the industry has been funded.

OpenLight wants to offer lidar companies early access to components that they need. Many of these companies have silicon photonics design teams but may not have the actives needed for next-generation products, he says.

“I have a thesis that says everywhere a long-wavelength single-mode laser goes is potential for a PIC,” says Carter.

Healthcare opportunities include a monitoring PIC placed on a person’s wrist. Carter also cites machine vision, and cell phone makers who want improved camera depth perception in handsets.

Carter is excited by these emerging silicon photonics markets that promise new incremental revenue streams. But timing will be key.

“We have to get into the right market at the right time with the right product,” says Carter. “If we can do that, then there are opportunities to grow and not rely on one market segment.”

As CEO, how does he view success at OpenLight?

“The employees here, some of whom have been here since the start of Aurrion, have never experienced commercial success,” says Carter. “If that happens, and I think it will because that is why I joined, that would be something I could be proud of.”

