Has the era of co-packaged optics finally arrived?

Ayar Labs’ CEO, Mark Wade

Mark Wade, the recently appointed CEO of Ayar Labs, says his new role feels strangely familiar. He finds himself revisiting tasks he performed in the early days of the start-up he co-founded.

“In the first two years, I would do external-facing stuff during the day and then start working on our chips from 5 PM to midnight,” says Wade, who until last year was the company’s chief technology officer (CTO).

More practically, says Wade, he has spent much of his first months as CEO living out of a suitcase, meeting customers, investors, and shareholders.


History

Ayar Labs is bringing its technology to market to add high-bandwidth optical input-output (I/O) to large ASICs.

The technology was first revealed in a 2015 paper published in the science journal Nature. In it, the optical circuitry needed for the interfaces was implemented using a standard CMOS process.

Vladimir Stojanovic, then an associate professor of electrical engineering and computer science at the University of California, Berkeley, described how, for the first time, a microprocessor could communicate with the external world using something other than electronics.

Stojanovic has since left his Berkeley professorship to become Ayar Labs’ CTO, following Wade’s appointment as CEO.

Focus

“A few years ago, we made this pitch that machine-learning clusters would be the biggest opportunity in the data centre,” says Wade. “And for efficient clusters, you need optical I/O.” Now, connectivity in artificial intelligence (AI) systems is a vast and growing problem. “The need is there, and our product is timed well,” says Wade.

Ayar Labs has spent the last year focusing on manufacturing and has established low-volume production lines. The company made approximately 10,000 optical chiplets in 2023 and expects similar volumes this year. It also offers SuperNova, an external laser source product that provides the light needed by its optical chiplet.

Ayar Labs’ optical input-output (I/O) roadmap: the electrical interface evolves from Intel’s AIB to the UCIe standard and moves to faster data rates, while the optical side adds more wavelengths, growing the total I/O per chiplet and per packaged system. Source: Ayar Labs.

The products are being delivered to early adopter customers while Ayar Labs establishes the supply chain, product qualification, and packaging needed for volume manufacturing.

Wade says that some of its optical chiplets are being used in other, non-AI segments. Ayar Labs has demonstrated its optical I/O working with FPGAs in electronics systems for military applications. But the primary demand is for AI system connectivity, whether compute to compute, compute to memory, compute to storage, or compute to a memory-semantic switch.

“A memory-semantic switch allows the scaling of a compute fabric whereby a bunch of devices need to talk to each other’s memory,” says Wade.

Wade cites Nvidia’s NVSwitch as one example: the first-layer switch chip at the rack level that supports many GPUs in a non-blocking compute fabric. Another example of a memory-semantic switch is the open standard Compute Express Link (CXL).

The need for co-packaged optics

At the Optica Executive Forum event held alongside the recent OFC show, several speakers questioned the need for I/O based on optical chiplets, also called co-packaged optics.

Google’s Hong Liu, a Distinguished Engineer in its Technical Infrastructure group, described co-packaged optics as an ‘N+2 years’ technology: perpetually arriving two years from now, whatever the current year N.

Nvidia’s Ashkan Seyedi stressed that copper continues to be the dominant interconnect for AI because it beats optics on metrics such as bandwidth density, power, and cost. Existing data centre optical networking technology cannot simply be repackaged as optical compute I/O, as it does not beat copper. Seyedi also shared a table showing how much more expensive optics is in dollars per gigabit-per-second ($/Gbps).

Wade addresses these points by noting, first, that nobody is making money at the application layer of AI. Partly, this is because the underlying hardware infrastructure for AI is so costly.

“It [the infrastructure] doesn’t have the [networking] throughput or power efficiency to create the headroom for an application to be profitable,” says Wade.

The accelerator chips from the likes of Nvidia and Google are highly efficient in executing the mathematics needed for AI. But it is still early days when it comes to the architectures of AI systems, and more efficient hardware architectures will inevitably follow.

AI workloads also continue to grow at a remarkable rate. They are already so large that they must be spread across systems using ever more accelerator chips. With the parallel processing used to execute the workloads, data has to be shared periodically between all the accelerators using an ‘all-to-all’ collective exchange.

“With large models, machines are 50 per cent efficient, and they can get down to 30 per cent or even 20 per cent,” says Wade. This means the expensive hardware is idle for more than half the time, and the issue will only worsen as models grow. According to Wade, optical I/O promises the required bandwidth density (more terabits-per-second per millimetre), power efficiency, and latency.
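
As a rough, hypothetical illustration of why interconnect bandwidth drives utilisation (the figures below are invented for the sketch, not Wade’s or any vendor’s numbers), a back-of-the-envelope model of a training step that alternates compute with a non-overlapped all-to-all exchange can be written in a few lines of Python:

# Hypothetical sketch only: how communication time erodes accelerator utilisation.
def utilisation(compute_s, exchange_bytes, link_gbps):
    """Fraction of time spent computing, assuming the all-to-all exchange
    cannot be overlapped with compute."""
    comm_s = (exchange_bytes * 8) / (link_gbps * 1e9)   # seconds spent exchanging data
    return compute_s / (compute_s + comm_s)

# Example: 10 ms of compute per step, 1 GB exchanged per accelerator per step.
for gbps in (400, 800, 3200):   # assumed electrical versus optical I/O link rates
    print(f"{gbps} Gb/s link -> {utilisation(0.010, 1e9, gbps):.0%} utilisation")
# Prints roughly 33%, 50% and 80% - the same shape as the figures Wade quotes.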

“These products need to be proven and qualified for volume production,” he adds. “They are not going to get into massive-scale systems until they are qualified for huge-scale production.”

Wade describes what is happening now as a land grab. Demand for AI accelerators is outstripping supply, and how the economics of these systems can be improved is still being worked out.

“It is not about making the hardware cheaper, just how to ensure the system is more efficiently utilised,” says Wade.  “This is a big capital asset; the aim is to have enough AI workload throughput so end-applications have a viable cost.”

This will be the focus as the market hits its stride in the coming two to three years. “It is unacceptable that a $100 million system is spending up to 80 per cent of its time doing nothing,” says Wade.

Wade also addresses the comments made that day at the Optica Executive Forum. “The place where [architectural] decisions are getting discussed and made is with the system-on-chip architects,” he says. “It’s they that decide, not [those at] a fibre-optics conference.”

He also questions the assumption that Google and Nvidia will shun using co-packaged optics.

Market opportunity

Wade does a simple back-of-an-envelope calculation to size the likely overall market opportunity by the early 2030s for co-packaged optics.

In the coming years, he reckons, there will be 1,000 optical chiplets per server and 1,000 servers per data centre, while 1,000 new data centres using AI clusters will be built. That multiplies out to a billion devices. Even if the total addressable opportunity is only several hundred million optical chiplets, that is still a massive opportunity by 2032, he says.
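
Restating that back-of-the-envelope arithmetic with Wade’s round numbers (his illustrative figures, not forecasts):

# Wade's back-of-the-envelope market sizing, reproduced literally.
chiplets_per_server = 1_000
servers_per_data_centre = 1_000
new_ai_data_centres = 1_000

total_devices = chiplets_per_server * servers_per_data_centre * new_ai_data_centres
print(f"{total_devices:,} optical chiplets")   # 1,000,000,000 - a billion devices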

Wade expects Ayar Labs to ship 100,000-plus chiplets in the 2025-26 timeframe, with volumes ramping to the millions in the two years after that.

“That is the ramp we are aiming for,” he says. “Using optical I/O to build a balanced composable system architecture.” If co-packaged optics does emerge in such volumes, it will disrupt the optical component business and the mainstream technologies used today.

“Let me finish with this,” says Wade. “If we are still having this conversation in two years’ time, then we have failed.”


Silicon photonics adds off-chip comms to a RISC-V processor

A group of researchers has developed a microprocessor that uses integrated silicon photonics to send and receive data.

"For the first time a system - a microprocessor - has been able to communicate with the external world using something other than electronics," says Vladimir Stojanovic, associate professor of electrical engineering and computer science at the University of California, Berkeley. 

 

Vladimir Stojanovic

The microprocessor is the result of work that started at MIT nearly a decade ago as part of a project sponsored by the US Defense Advanced Research Projects Agency (DARPA) to investigate the integration of photonics and electronics for off-chip and even intra-chip communications.     

The chip features a dual-core 1.65GHz processor based on the open RISC-V instruction set and 1 megabyte of static RAM, and integrates 70 million transistors and 850 optical components.

The work is also notable in that the optical components were developed without making any changes to an IBM 45nm CMOS process used to fabricate the processor. The researchers have demonstrated two of the processors communicating optically, with the RISC core on one chip reading and writing to the memory of the second device and executing programs such as image rendering.

This CMOS process approach to silicon photonics, dubbed 'zero-change' by the researchers, differs from that of the optical industry. So far silicon photonics players have customised CMOS processes to improve the optical components' performance. Many companies also develop the silicon photonics separately, using a trailing-edge 130nm or 90nm CMOS process while implementing the driver electronics on a separate chip using more advanced CMOS. That is because photonic devices such as a Mach-Zehnder modulator are relatively large and waste expensive silicon real-estate if implemented using a leading-edge process.  

IBM is one player that has developed the electronics and optics on one chip, using a 90nm CMOS process. However, the company says the electronics use feature sizes closer to 65nm to achieve electrical speeds of 25 gigabits-per-second (Gbps) and, the process being a custom one, 50-gigabit rates will only be possible using 4-level pulse-amplitude modulation (PAM-4).

 

We are now reaping the benefits of this very precise process which others cannot do because they are operating at larger process nodes

    

"Our approach is that photonics is sort of like a second-class citizen to transistors but it is still good enough," says Stojanovic. This way, photonics can be part of an advanced CMOS process.

Pursuing a zero-change process was initially met with scepticism and took significant work by the researchers to develop. "People thought that making no changes to the process would be super-restrictive and lead to very poor [optical] device performance," says Stojanovic. Indeed, the first designs produced didn't work. "We didn't understand the IBM process and the masks enough, or it [the etching] would strip off certain stuff we'd put on to block certain steps."

But the team slowly mastered the process, making simple optical devices before moving on to more complex designs. Now the team believes its building-block components such as its vertical grating couplers have leading-edge performance while its ring-resonator modulator is close to matching the optical performance of designs using custom CMOS processes. 

"We are now reaping the benefits of this very precise process which others cannot do because they are operating at larger process nodes," says Stojanovic.     

 

Silicon photonics design

The researchers use a micro ring-resonator for their modulator design. The ring-resonator is much smaller than a Mach-Zehnder design, at 10 microns in diameter. Stojanovic says the vertical grating couplers measure 10 to 20 microns, while the silicon waveguides are 0.5 microns wide.

Photonic components are big relative to transistors, but for the links, it is the transistors that occupy more area than the photonics. "You can pack a lot of utilisation in a very small chip area," he says.

A key challenge with a micro ring-resonator is ensuring its stability. As the name implies, modulation of light occurs when the device is in resonance, but the resonance drifts with temperature, greatly impairing performance.

Stojanovic cites how even the bit sequence can affect the modulator's temperature. "Given the microprocessor data is uncoded, you can have random bursts of zeros," he says. "When it [the modulator] drops the light, it self-heats: if it is modulating a [binary] zero it gets heated more than letting a one go through." 

The researchers have had to develop circuitry that senses the bit-sequence pattern and counteracts the ring's self-heating. But the example also illustrates the advantage of combining photonics and electronics. "If you have a lot of transistors next to the modulator, it is much easier to tune it and make it work," says Stojanovic.  
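
As a purely illustrative sketch of the idea (the article does not describe the circuit at this level of detail, and the function name, gains and linear thermal model below are assumptions rather than the researchers' implementation), pattern-aware compensation might estimate the recent density of zeros and back off the ring's heater drive accordingly:

# Illustrative only: a toy model of bit-pattern-aware thermal compensation
# for a micro ring-resonator modulator. Gains and the linear heating model are assumed.
def heater_trims(bits, k_self_heat=1.0, gain=0.9, window=64):
    """Return one heater adjustment per window of bits. Modulating a zero drops
    the light into the ring and self-heats it, so the controller reduces heater
    drive in proportion to the recent density of zeros."""
    trims = []
    for i in range(0, len(bits), window):
        chunk = bits[i:i + window]
        zero_density = chunk.count(0) / len(chunk)
        trims.append(-gain * k_self_heat * zero_density)   # back off heater to compensate
    return trims

# Example: an uncoded burst of zeros (as Stojanovic describes) forces a large trim.
print(heater_trims([0] * 64 + [1, 0] * 32))   # [-0.9, -0.45]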

 

A prototype set-up of the chip-to-chip interconnect using silicon photonics. Source: Vladimir Stojanovic

 

Demonstration

The team used two microprocessors - one CPU talking to the memory of the second chip 4m away. Two chips were used rather than one - going off-chip before returning - to prove that the communication was indeed optical since there is also an internal electrical bus on-chip linking the CPU and memory. "We wanted to demonstrate chip-to-chip because that is where we think the biggest bang for the buck is," says Stojanovic.

In the demonstration, a single laser operating at 1,183nm feeds the two paths linking the memory and processor. Each link runs at 2.5Gbps for a total bandwidth of 5Gbps. However, the microprocessor was clocked at one-eightieth of its 1.65GHz clock speed because only one wavelength was used to carry data. The microprocessor design can support 11 wavelengths for a total bandwidth of 55Gbps, while the silicon photonics technology itself will support between 16 and 32 wavelengths overall.
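
The bandwidth figures follow from straightforward multiplication, restated in the sketch below; extending to 16-32 wavelengths assumes the same 2.5Gbps per wavelength per link, which the article implies but does not state:

# Restating the demonstration arithmetic: per-wavelength rate x links x wavelengths.
per_wavelength_gbps = 2.5
links = 2   # the two paths linking processor and memory

def aggregate_gbps(wavelengths):
    return per_wavelength_gbps * links * wavelengths

print(aggregate_gbps(1))    # 5.0  Gbps - the demonstration, one wavelength
print(aggregate_gbps(11))   # 55.0 Gbps - the full microprocessor design
# 16 to 32 wavelengths, at the same assumed line rate, would give 80 to 160 Gbps.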

The group is already lab-testing a new iteration of the chip that promises to run the processor at full speed. The latest chip also features improved optical functions. "It has better devices all over the place: better modulators, photo-detectors and gratings; it keeps evolving," says Stojanovic.

 

We can ship that kind of bandwidth [3.2 terabits] from a single chip

 

Ayar Labs

Ayar Labs is a start-up, still in stealth mode, established to use zero-change silicon photonics to make interconnect chips for data centre platforms.

Stojanovic says the microprocessor demonstrator is an example of a product two generations beyond existing pluggable modules. Ayar Labs will focus on on-board optics, which he describes as the next generation of product. On-board optics sit on a card, close to the chip. Optics integrated within the chip will eventually be needed, he says, but only once applications require greater bandwidth and denser interfaces.

"One of the nice things is that this technology is malleable; it can be put in various form factors to satisfy different connectivity applications," says Stojanovic. 

What Ayar Labs aims to do is replace the QSFP pluggable modules on the face plate of a switch with one chip next to the switch silicon that can have a capacity of 3.2 terabits. "We can ship that kind of bandwidth from a single chip," says Stojanovic.

Such a chip promises cost reduction, given that a large part of the cost of an optical design is in the packaging. Here, 32 individually packaged 100 Gigabit Ethernet QSFP modules can be replaced with a single optical module based on the chip (32 x 100Gbps = 3.2 terabits). "That cost reduction is the key to enabling deeper penetration of photonics, and has been a barrier for silicon photonics [volumes] to ramp," says Stojanovic.

There is also the issue of how to couple the laser to the silicon photonics chip. Stojanovic says such high-bandwidth interface ICs require multiple lasers: "You definitely don't want hundreds of lasers flip-chipped on top [of the optical chip], you have to have a different approach".  

Ayar Labs has not detailed what it is doing, but Stojanovic says its approach is more radical than simply sharing one laser across a few links: "Think about the laser as the power supply to the box, or maybe a few racks," he says.

The start-up is also exploring using standard polycrystalline silicon rather than the more specialist silicon-on-insulator wafers.

"Poly-silicon is much more lossy, so we have had to do special tricks in that process to make it less so," says Stojanovic. The result is that changes are needed to be made to the process; this will not be a zero-change process. But Stojanovic says the changes are few in number and relatively simple, and that it has already been shown to work. 

Having such a process available would allow photonics to be added to transistors made using the most advanced CMOS processes - 16nm and even 7nm. "Then silicon-on-insulator becomes redundant; that is our end goal," says Stojanovic.

 

Further information

Single-chip microprocessor that communicates directly using light, Nature, Volume 528, 24-31 December 2015

Ayar Labs website

