Lumentum's optical circuit switch for AI data centres

Peter Roorda

Part 3: Data Centre Switching

The resurgence of optical circuit switches for use in data centres is gaining momentum, driven by artificial intelligence (AI) workloads that require scalable connectivity.

Lumentum is one of several companies that showcased an optical circuit switch at the OFC event in San Francisco in March. Lumentum’s R300 switch optically connects any of its 300 input ports to any of its 300 output ports. The switch uses micro-electro-mechanical systems (MEMS), tiny mirrors that move electrostatically, to steer light from an input port to the chosen output port.

The R300 addresses the network needs of AI data centres, helping link large numbers of AI accelerator chips such as graphics processing units (GPUs).

“We’ve been talking to all the hyperscalers in North America and China,” says Peter Roorda, general manager of the switching business unit at Lumentum. “The interest is pretty broad for the applications of interconnecting GPUs and AI clusters; that’s the exciting one.”

Optical circuit switches

In a large-scale data centre, two or three tiers of electrical switch platforms link the many servers’ processors. The number of tiers needed depends on the overall processor count. The same applies to the back-end network used for AI workloads. These tiers of electrical switches are arranged in what is referred to as a Clos or “Fat Tree” architecture.

Tiers of electrical switches arranged in a Clos architecture. Source: Lumentum

Google presented a paper in 2022 revealing that it had been using an internally developed MEMS-based optical circuit switch for several years. Google used its optical circuit switches to replace all the top-tier ‘spine’ layer electrical switches across its data centres, resulting in significant cost and power savings.

Google subsequently revealed a second use for its switches: directly connecting racks of its tensor processing unit (TPU) accelerator chips. Google can move workloads across thousands of TPUs in a cluster, using its hardware efficiently and bypassing a rack when a fault arises.

Google’s revelation rejuvenated interest in optical switch technology, and at OFC, Lumentum showed its first R300 optical switch product in operation.

Unlike packet switches, which use silicon to process data at the packet level, an optical circuit switch sets up a fixed, point-to-point optical connection, akin to a telephone switchboard, for the duration of a session.
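To make the distinction concrete, here is a minimal sketch, in Python, of what a circuit switch does. It models connectivity only and is not Lumentum’s software: a one-to-one cross-connect map with no packet inspection, buffering, or queuing.

```python
# Minimal model of an optical circuit switch: a one-to-one map from
# input ports to output ports. Illustrative only, not Lumentum's code.
class OpticalCircuitSwitch:
    def __init__(self, radix: int = 300):
        self.radix = radix
        self.cross_connects: dict[int, int] = {}  # input port -> output port

    def connect(self, in_port: int, out_port: int) -> None:
        """Steer a mirror so light from in_port lands on out_port."""
        if not (0 <= in_port < self.radix and 0 <= out_port < self.radix):
            raise ValueError("port out of range")
        if out_port in self.cross_connects.values():
            raise ValueError("output port already in use")  # circuits are exclusive
        self.cross_connects[in_port] = out_port

    def route(self, in_port: int) -> int:
        """Light entering in_port exits the mapped port, whatever its data rate."""
        return self.cross_connects[in_port]

switch = OpticalCircuitSwitch()
switch.connect(0, 42)   # a fixed point-to-point circuit, held for the session
assert switch.route(0) == 42
```

Because the switch only redirects light, a circuit set up this way carries 400-gigabit, 800-gigabit, or faster traffic unchanged.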

The optical switch is ideal for scenarios where large, sustained data flows are required, such as in AI training clusters.

How optical circuit switches (blue boxes) are used in a data centre. Source: Lumentum

Merits

The optical circuit switch’s benefits include cost and power savings and improved latency. Optical-based switch ports are data-rate independent. They can support 400 gigabit, 800 gigabit, and soon 1.6-terabit links without requiring an upgrade.

“Now, it’s not apples to apples; the optical circuit switch is not a packet switch,” says Roorda. “It’s just a dumb circuit switch, so there must be control plane software to manage it.” However, the cost, power, space savings, and port transparency incentives suffice for the hyperscalers to invest in the technology.

The MEMS-based R300

Lumentum has a 20-year history with MEMS. It first used the technology in its wavelength-selective switches for telecom networks before adopting liquid crystal on silicon (LCOS) technology.

“We have 150,000 MEMS-based wavelength selective switches in the field,” says Roorda. “This gives us a lot of confidence about their reliability.”

MEMS-based switches are notorious for their manufacturing complexity, an area where Lumentum’s long experience counts.

“This is a key claim as users are worried about the mechanical aspect of MEMS’ reliability,” says Michael Frankel, an analyst at LightCounting Market Research, which published an April report covering Ethernet, InfiniBand and optical switches in cloud data centres. “Having a reliable volume manufacturer is critical.”

In its system implementation, Google revealed that it uses bi-directional transceivers in conjunction with its optical circuit switches.

“Using bi-directional ports is clever because you get to double the ports out of your optical circuit switch for the same money,” says Mike DeMerchant, Lumentum’s senior director of product line management, optical circuit switch. “But then you need customised, non-standard transceivers.”

A bi-directional design complicates the control plane management software because bi-directional transceivers effectively create two sets of connections. “The two sets of transceivers can only talk in a limited fashion between each other, so you have to manage that additional control plane complexity,” says DeMerchant.

Lumentum enters the market with a 300×300 radix switch. Some customers have asked about a 1,000×1,000 port switch. From a connectivity perspective, bigger is better, says Roorda. “But bigger is also harder; if there is a problem with that switch, the consequences of a failure—the blast radius—are larger too,” he says.

Mike DeMerchant

Lumentum says there are requests for smaller optical circuit switches and expects to offer a portfolio of different-sized products in the next two years.

The R300 switch is cited as having a 3dB insertion loss, but Roorda says the typical performance is close to 1.5dB at the start of life. “And 3dB is good enough for using a standard off-the-shelf -FR4 or a -DR4 or -DR8 optical module [with the switch],” says Roorda.

A 400G QSFP-DD FR4 module uses four wavelengths on a single-mode fibre and has a reach of 2km, whereas a DR4 or DR8 uses a single wavelength on each fibre and has 4 or 8 single-mode fibre outputs, respectively, with a reach of 500m.

An FR4 interface is ideal with an optical circuit switch since its multiple wavelengths share a single fibre and can be routed through one port. However, many operators use DR4 and DR8 interfaces and are exploring using such transceivers with the switch.

“More ports would be consumed, diluting the cost-benefit, but the power savings would still be significant,” says Roorda. Additionally, in some applications, individually routing and recombining the separate ‘rails’ of a DR4 or DR8 offers greater networking granularity. Here, the optical circuit switch still provides value, he says.
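The port arithmetic is straightforward. The sketch below uses assumed port accounting rather than vendor figures, showing how many links each module type can push through a 300-port switch when every fibre ‘rail’ takes its own port.

```python
# Back-of-the-envelope port accounting (assumptions, not vendor data):
# an FR4 carries four wavelengths on one fibre, so it needs one switch
# port per direction; DR4/DR8 use parallel fibres, so routing each
# 'rail' individually consumes four or eight ports.
PORTS_PER_LINK = {"FR4": 1, "DR4": 4, "DR8": 8}

def links_supported(switch_radix: int, module: str) -> int:
    return switch_radix // PORTS_PER_LINK[module]

for module in PORTS_PER_LINK:
    print(f"{module}: {links_supported(300, module)} links per 300-port switch")
# FR4: 300, DR4: 75, DR8: 37 links per direction
```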

One issue with an optical circuit switch compared to an electrical one is that the optical signal passes through the switch’s input and output ports unregenerated before reaching the destination transceiver, adding an extra 3dB of loss. By contrast, with an electrical switch, the signal is regenerated optically by the pluggable transceiver at the output port.

LightCounting’s Frankel also highlights the switch’s loss numbers. “Lumentum’s claim of a low loss – under 2dB – and a low back reflection (some 60dB) are potential differentiators,” he says. “It is also a broadband design – capable of operating across the O-, C- and L-bands: O-band for data centre and C+L for telecom.”
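These loss figures matter because the link must still close once the switch sits between two off-the-shelf transceivers. A simple link-budget check, using illustrative numbers that are assumptions rather than values from the article or any module specification, shows the idea:

```python
# Illustrative link budget; all values are assumptions for the sketch.
tx_power_dbm = -1.0        # assumed transmitter launch power
rx_sensitivity_dbm = -6.0  # assumed receiver sensitivity
fibre_loss_db = 0.5        # assumed patch cords and connectors
switch_loss_db = 3.0       # R300 worst case; ~1.5dB typical at start of life

margin_db = tx_power_dbm - (fibre_loss_db + switch_loss_db) - rx_sensitivity_dbm
print(f"Remaining margin: {margin_db:.1f} dB")  # 1.5 dB with these numbers
```

As long as the margin stays positive, a standard off-the-shelf module can drive the link through the switch.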

Software and hyperscaler control

Lumentum controls the switch using the open-source, Linux-based SONiC (Software for Open Networking in the Cloud) network operating system (NOS). The hyperscalers will add the higher-level control plane management using their own proprietary software.

“It’s the basic control features for the optics, so we’re not looking to get into the higher control plane,” says DeMerchant.
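The article does not detail the R300’s actual management interface, but the division of labour can be pictured as the hyperscaler’s control plane handing declarative cross-connect state down to the SONiC-based NOS. The snippet below is hypothetical, for illustration only; the device name and fields are invented:

```python
# Hypothetical illustration only; not Lumentum's or SONiC's actual API.
cross_connect_intent = {
    "switch": "pod1-ocs3",  # invented device name
    "connections": [
        {"in_port": 17, "out_port": 212},
        {"in_port": 18, "out_port": 213},
    ],
}

def apply_intent(intent: dict) -> None:
    # A real NOS would validate and drive the MEMS mirrors; here we just log.
    for conn in intent["connections"]:
        print(f"{intent['switch']}: cross-connect {conn['in_port']} -> {conn['out_port']}")

apply_intent(cross_connect_intent)
```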

Challenges and scalability

Designing a 300×300 optical circuit switch is complicated. “It’s a lot of mirrors,” says Roorda. “You’ve got to align them, so it is a complicated, free-space, optical design.”

Reliability and scalable manufacturing are hurdles. “The ability to build these things at scale is the big challenge,” says Roorda. Lumentum argues that its stable MEMS design results in a reliable, simpler, and less costly switch.

Lumentum envisions data centres evolving to use a hybrid switching architecture, blending optical circuit switches with Ethernet switches.

Roorda compares it to how telecom networks transitioned to using reconfigurable optical add-drop multiplexers (ROADMs). “It’ll be hybridised with packet switches because you need to sort the packets sometimes,” says Roorda.

Future developments may include multi-wavelength switching and telecom applications for optical circuit switches. “For sure, it is something that people are talking about,” he adds.

Lumentum says its R300 will be generally available in the second half of this year.


Oriole’s fast optical reconfigurable network

Georgios Zervas, CTO of Oriole Networks.

Part 1: Data Centre Switching
  • Start-up Oriole Networks has developed a photonic network to link numerous accelerator chips in an artificial intelligence (AI) data centre.
  • The fast photonic network is reconfigurable every 100 nanoseconds and is designed to replace tiers of electrical switches.
  • Oriole says its photonic networking saves considerable power and ensures the network is no longer a compute bottleneck.

In a London office bathed in spring sunlight, the team from Oriole Networks, a University College London (UCL) spin-out, detailed its vision for transforming AI and high-performance computing (HPC) data centres.

Oriole has developed a networking solution, dubbed Prism, that uses fast reconfigurable optical circuit switches to replace the tiers of electrical packet switches used to connect racks of AI processors in the data centre.

Electrical switches perform a crucial role in the data centre by enabling the scaling of AI computers comprising thousands of accelerator chips. Such chips – graphics processing units (GPUs), tensor processing units (TPUs), or more generically xPUs – are used to tackle large AI computational workloads.

The workloads include training large AI models and performing inference once a model is trained, when it draws on what it has learned to respond to prompts.

Oriole’s novel network is based on optical circuit switches that can switch rapidly in response to changes in the workload, allocating xPU resources as required, a job that electrical packet switches already do very well.

Origins

The view from Oriole’s London office.

Founded in 2023, Oriole builds on over a decade of research work by Georgios Zervas and his research team at UCL.

The start-up has raised $35 million, including a $22 million Series A led by investment firm Plural’s Ian Hogarth, a technology entrepreneur and Chair of the UK’s AI Security Institute.

The company, now 50-strong, has two UK offices—one in London and a site in Paignton—and one in Palo Alto.

Oriole’s team blends photonics expertise, including Paignton’s former Lumentum coherent transceiver group, with networking and programmable logic design talent from Intel’s former Altera division west of London, experience gained addressing hyperscalers’ needs.

AI data centre metrics

Power is a key constraint limiting the productivity of an AI data centre.

“You can only get so much power to a data centre site,” says Joost Verberk, vice president, business development and marketing at Oriole. “Once that is determined, everything else follows; the systems and networking must be as power efficient as possible so that all the power can go to the GPUs.”

Joost Verberk

Oriole highlights two metrics Nvidia’s Jensen Huang used at the company’s recent GTC event to quantify AI data centre efficiency.

One is tokens per second per megawatt (tokens/s/MW). Tokens are data elements, such as a portion of a word or a patch of pixels from a digital image, that are fed to or produced by an AI model. The more tokens created, the more productive the data centre.

The second metric is response speed, measured in tokens per second (tokens/s), which gauges the latency a user experiences.

Oriole says these two metrics are not always aligned, but the goal is to use less power while producing more tokens faster.
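A worked example, using made-up numbers purely for illustration, shows how the first metric is computed and why the two metrics can pull in different directions:

```python
# Made-up numbers, for illustration only.
tokens_per_second = 2_000_000  # assumed cluster-wide token output
site_power_mw = 40             # assumed data centre power budget

efficiency = tokens_per_second / site_power_mw  # tokens/s/MW
print(f"{efficiency:,.0f} tokens/s per MW")     # 50,000 tokens/s/MW

# Larger batch sizes tend to raise tokens/s/MW but can slow each
# individual response, hurting the second metric, tokens/s per user.
```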

Discussing tokens implies that the data centre’s hardware is used for inference. However, Oriole stresses that training AI models using less power is also a goal. Oriole’s optical networking solution can be applied to both inference and training.

Going forward, only a handful of companies, such as the hyperscalers, will train the largest AI models. Many smaller AI clusters will be deployed and used for inference.

“By 2030, 80 per cent of AI will be inferencing,” says James Regan, CEO of Oriole.

Networking implications

Inferencing, by its nature, means that the presented AI tasks change continually. One implication is that the network linking the AI processors must be dynamic: grabbing processors for a given task and releasing them on completion.

Georgios Zervas, Oriole’s CTO, points out that while Nvidia uses the same GPU for training and inferencing, Google’s latest TPU, Ironwood, has inferencing enhancements. Google also has AI computing clusters dedicated to inference jobs.

Amazon Web Services, meanwhile, has separate accelerator chips for inferencing and training. The two processors have different interconnect bandwidth requirements (input-output, or I/O), with the inferencing processor’s requirement being lower.

For training, the data exchange between the processors/xPUs, depending on how the task is parallelised, is highly predictable. “You can create a series of optical short-lasting circuits that minimise collective communication time,” says Zervas. However, the switches must be deterministic and synchronous. “You should not have [packet] queues,” he says.

Inferencing, which may access many AI ‘mixture of experts’ models, requires a more dynamic system. “Different tokens will go to different sets of experts, spread across the xPUs,” says Zervas. “Sometimes, some xPUs batch the queries and then flush them out.”

The result is non-deterministic traffic, getting closer to the traffic patterns of traditional Cloud data centres. Here, the network must be reconfigured quickly, in hundreds of nanoseconds.

“What we say is that a nanosecond-speed optical circuit switch has a place wherever any electrical packet switch has a place,” says Zervas. It’s still a circuit switch, stresses Zervas, even at such fast switching speeds, since there is a guaranteed path between two points. This is unlike ‘best effort’ traffic in a traditional electrical switch, where packets can be dropped.

“In our case, that link can last just as short as [the duration of] a packet,” says Zervas. “Our switches can be reconfigured every 100 nanoseconds.”

Once the link is established, data is sent to the other end without encountering queuing. Or, as Zervas puts it, the switching matches the granularity of packets yet has delivery guarantees that only a circuit can deliver.

Optics’ growing role in data centre networking

Currently, protocols such as InfiniBand or Ethernet are used to connect racks of xPUs in what is commonly referred to as the scale-out network. For the xPUs to talk to each other, a traditional Clos or ‘fat tree’ architecture comprising a hierarchy of electrical switches is used.

Because of the distances spanned within a data centre, pluggable optical transceivers link each xPU, via its network interface card, across the switching network to the destination network interface card and its xPU.

In a newer development, Broadcom and Nvidia have announced electrical switches that integrate optics with the switch silicon. Using such co-packaged optics circumvents the need for pluggable optical transceivers on the front panel of an electrical switch platform.

Google has also developed its data centre architecture to include optical circuit switches instead of the top tier of large electrical switches. In such a hybrid network, electrical switches still dominate the overall network. However, using the optical layer saves cost and power and allows Google to reconfigure the interconnect between its TPU racks as it moves workloads around.

However, Google’s optical circuit switch’s configuration speed is far slower than Oriole’s, certainly not nanoseconds.

With its Prism architecture, Oriole is taking the radical step of replacing all the electrical switching, not just the top tier. The result is a flat passive optical network.

Traditional vs Prism Diagram

“Switching happens at the edge [of the network] and the core is fully passive; it is made just of glass,” says Verberk.

The resulting network has zero packet loss and is highly synchronous. Eliminating electrical switches reduces overall power and system complexity while delivering direct xPU to xPU high-speed connectivity.

Prism architecture

Oriole’s first announcement is the Prism architecture that hinges on three system components:

  • A PCI Express (PCIe)-based network interface card.
  • A novel pluggable module, the XTR, that includes the optical transceiver and switching.
  • A photonic router that houses athermal arrayed waveguide gratings (AWGs) to route the different wavelengths of light. The router box is passive and has no electronics.

“You go optically from the GPU out to another GPU, and the only [electrical-optical] conversion that happens is at the network interface card next to each GPU,” says Verberk.

The PCIe-based network interface card uses 800-gigabit optics and integrates with standard software ecosystems.

Built around an FPGA that includes ARM processors, the card supports protocols like Nvidia’s NCCL (Nvidia Collective Communications Library) and AMD’s RCCL (ROCm Communication Collectives Library) via plugins, ensuring compatibility with existing AI software frameworks.

The network interface card acts as a deterministic data transport, mapping collective operations used for AI computation (e.g., Message Passing Interface operations like all-reduce, scatter-gather) to optical paths with minimal latency.

For training, the card’s scheduler maps deterministic traffic patterns directly to wavelengths and fibres. For inference, it reconfigures dynamically based on workload demands, using a standard direct memory access (DMA) engine.
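As an illustration of what such deterministic mapping could look like, and not Oriole’s actual scheduler, the sketch below compiles the reduce-scatter phase of a ring all-reduce into a sequence of fixed exchanges, one set per 100-nanosecond slot:

```python
# Illustrative sketch, not Oriole's scheduler: compile a ring all-reduce's
# reduce-scatter phase into per-slot (sender, receiver, chunk) exchanges.
def ring_allreduce_schedule(num_ranks: int) -> list[list[tuple[int, int, int]]]:
    """Each 100 ns slot: every rank forwards one chunk to its neighbour."""
    schedule = []
    for step in range(num_ranks - 1):
        slot = [(rank, (rank + 1) % num_ranks, (rank - step) % num_ranks)
                for rank in range(num_ranks)]
        schedule.append(slot)
    return schedule

for t, slot in enumerate(ring_allreduce_schedule(4)):
    print(f"slot {t}: {slot}")
```

Because every exchange is known in advance, each slot can be pinned to a wavelength and fibre with no queuing.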

The XTR pluggable module is the heart of Prism’s switching capability. “Within a pluggable form factor unit, we do transmission, reception, and switching,” says Zervas.

The photonic network combines three dimensions of switching: optical wavelengths, space switching, and time slots (time-division multiplexing).

The wavelength, or colour, is selected using a fast tunable laser.

The space switching inside the XTR pluggable refers to the selected fibre path. “You have a ribbon of fibres, and you can choose which fibres you want to go to,” says Regan.

James Regan

The time dimension refers to time slots of 100 nanoseconds, the time it takes for the tunable laser to settle on a new wavelength. Overall, rapid colour changes can be used to route data to specific nodes.

“The modulated channel can determine which communication group or cluster you can go to, and the fibre route can determine the logical rack you’re going to, and then the colour of light you’re carrying can determine the node ID within the rack,” says Zervas.
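That description suggests a destination address decomposing into the three switching dimensions. The sketch below captures the idea; the field names and mapping are assumptions for illustration, not Oriole’s implementation:

```python
# Assumed address decomposition, for illustration only.
from typing import NamedTuple

class OpticalAddress(NamedTuple):
    channel: int     # modulated channel -> communication group or cluster
    fibre: int       # fibre within the ribbon -> logical rack
    wavelength: int  # laser colour -> node ID within the rack

def to_optical_address(group: int, rack: int, node: int) -> OpticalAddress:
    return OpticalAddress(channel=group, fibre=rack, wavelength=node)

print(to_optical_address(group=2, rack=14, node=7))
```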

The photonic routers, passive arrayed waveguide gratings, form Prism’s core. “They’re just glass, which means they are athermal,” says Regan, highlighting their reliability and zero power consumption. These N-by-N arrayed waveguide gratings route light based on wavelength and fibre selection, acting like prisms.

“On one port, let’s say the input port, we have a colour red; if it’s red, it comes to the first output, if it’s blue, to the second, if it’s purple, to the third, etc.,” says Zervas.
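Zervas’s description matches the standard cyclic routing rule for an N-by-N arrayed waveguide grating, sketched below; whether Oriole’s device follows exactly this rule is an assumption:

```python
# Standard cyclic-AWG routing rule (a simplified model of the device):
# the output port depends only on the input port and the wavelength.
def awg_output_port(input_port: int, wavelength_index: int, n: int) -> int:
    return (input_port + wavelength_index) % n

n = 8
# From input port 0: 'red' (index 0) exits port 0, 'blue' port 1, ...
for colour, wl in [("red", 0), ("blue", 1), ("purple", 2)]:
    print(f"{colour}: input 0 -> output {awg_output_port(0, wl, n)}")
```

No power is consumed: the grating passively steers each colour to its port.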

Stacked racks of multiple arrayed waveguide gratings can handle large-scale clusters, maintaining a single optical hop for consistent signal-to-noise ratio and insertion loss.

“Every node to every other node goes through this only once, ensuring uniform performance across thousands of GPUs,” says Zervas.

Prism’s power and compute efficiencies

In an example 8,000-GPU cluster, Prism eliminates 128 leaf and 64 spine electrical switches, cutting the number of optical transceivers by 60 per cent.

For even larger AI clusters of over 16,000 GPUs, a third tier of switching is typically needed. Here, Prism reduces the number of transceivers by 77 per cent.
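A simple accounting model, sketched below under assumptions of our own rather than Oriole’s methodology, shows where the savings come from: every optical hop in a Clos network needs a transceiver at each end, while Prism keeps only the single XTR per GPU.

```python
# Assumed accounting model, for illustration; Oriole's own figures
# (60% and 77%) reflect its particular cabling assumptions.
def clos_transceivers(gpus: int, optical_hops_per_gpu: int) -> int:
    return gpus * optical_hops_per_gpu * 2  # two transceivers per link

def prism_transceivers(gpus: int) -> int:
    return gpus  # one XTR per GPU; the photonic router core is passive

gpus = 8_000
for hops in (2, 3):
    saving = 1 - prism_transceivers(gpus) / clos_transceivers(gpus, hops)
    print(f"{hops} optical hops per path: {saving:.0%} fewer transceivers")
# Prints 75% and 83% under this simple model.
```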

Prism reduces overall power not only by removing optical transceivers but also by eliminating the electrical switching and the associated cooling it needs.

Unlike Ethernet packet switching, Prism’s optical circuits guarantee delivery without queuing, reconfiguring every 100 nanoseconds, a timescale that matches packet durations.

For training, Prism reduces communication overhead to under 1 per cent; in existing networks, it is typically tens of per cent. This means the GPUs rarely wait for data and spend their time processing.

Market and deployment strategy

Oriole targets three segments: enterprises such as financial traders; HPC users such as car makers; and switch makers and hyperscalers.

“Our potential customer base is much wider,” says Regan, contrasting with chip-level optical input-output players focusing on specific chip vendors and hyperscalers.

Prism also features an Ethernet gateway that allows integration with existing data centres, avoiding a rip-and-replace. “You could just do that in the pieces of your data centre where you need it, or where you do new builds,” says Regan.

Oriole’s roadmap includes lab demonstrations this summer, alpha hardware by early 2026, deployable products by the end of 2026, and production ramp-up in 2027. Manufacturing is outsourced to high-volume contract manufacturers.

Challenges and outlook

Convincing hyperscalers to adopt a non-standard software stack remains a hurdle. “It becomes a collaboration,” says Zervas, noting the hyperscalers’ use of proprietary protocols.

Oriole’s full-stack approach—spanning Nvidia’s CUDA libraries to photonic circuits—does set it apart.

“It’s not often you bump into a company that has deep expertise in both [photonics and computing],” says Regan, contrasting Oriole with photonics-only or computing-only competitors.

“We’re building something here,” says Regan. “We’re building a major European player for networking, for AI, and arbitrary workloads into the future.”

