Building an AI supercomputer using silicon photonics

- Luminous Computing is betting its future on silicon photonics as an enabler for an artificial intelligence (AI) supercomputer
Silicon photonics is now mature enough to be used to design complete systems.
So says Michael Hochberg (pictured), who has been behind four start-ups, including Luxtera and Elenion, whose products used the technology. Hochberg has also co-authored a book on silicon photonics design with Lukas Chrostowski.
In the first phase of silicon photonics, from 2000 to 2010, people wondered whether they could even do a design using the technology.
“Almost everything that was being done had to fit into an existing socket that could be served by some other material system,” says Hochberg.
A decade later it was more the case that sockets couldn’t be served without using silicon photonics. “Silicon photonics had dominated every one of the transceiver verticals that matter: intra data centre, data centre interconnect, metro and long haul,” he says.
Now people have started betting their systems on silicon photonics, says Hochberg, citing lidar, quantum optics, co-packaged optics and biosensing as examples.
Several months ago Hochberg joined as president of Luminous Computing, a start-up that recently came out of stealth mode after raising $105 million in Series A funding.
Luminous is betting its future on silicon photonics as an enabler for an artificial intelligence (AI) supercomputer that it believes will significantly outperform existing platforms.
Machine learning
The vision of AI is to take tasks that were exclusively the domain of the human mind and automate them at scale, says Hochberg.
Just in the last decade, the AI community has advanced from doing things using machine learning (ML) that are trivial for humans to tasks that only the most talented experts can achieve.
“We have reached the point where machine learning capabilities are superhuman in many respects,” says Hochberg. “Where they produce results quantifiably better than humans can.”
But achieving such machine learning progress has required huge amounts of data and hardware.
“The training runs for the state-of-the-art recommendation engines and natural language models take tens to hundreds of thousands of GPUs (graphics processing units) and they run from months to years,” says Hochberg.
Moreover, the computational demands associated with machine learning training aren’t just doubling every 18 months, like with Moore’s law, but every 3-4 months. “And for memory demands, it is even faster,” he says.
What that means is that the upper limit for such training runs is complete data centres.
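To give a sense of how quickly those two doubling rates diverge, the short sketch below compounds each over five years; the calculation is purely illustrative and the figures are not Luminous's.

```python
# Rough illustration (not from the article): compare compute growth under an
# 18-month doubling time (Moore's law) with a 3.5-month doubling time
# (the reported pace of ML training demand). Figures are illustrative only.

def growth_factor(months_elapsed: float, doubling_time_months: float) -> float:
    """Return the multiplicative growth after a given number of months."""
    return 2 ** (months_elapsed / doubling_time_months)

years = 5
months = years * 12

moores_law = growth_factor(months, 18)   # ~10x over five years
ml_demand = growth_factor(months, 3.5)   # ~145,000x over five years

print(f"Over {years} years: Moore's law ~{moores_law:,.0f}x, "
      f"ML training demand ~{ml_demand:,.0f}x")
```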
Luminous Computing wants to develop AI hardware that scales quickly and simply. And a key element of that will be to use silicon photonics to interconnect the hardware.
“One of the central challenges in scaling up big clusters is that you have one kind of bus between your CPU and memory, another between your CPU and GPU, another between the GPUs in a box and yet another – InfiniBand – between the boxes,” says Hochberg.
These layers of connectivity run at different speeds and latencies, which complicates programming at scale. Such systems also result in expensive hardware like GPUs being under-utilised.
“What we are doing is throwing massive optical interconnect at this problem and we are building the system around this optical interconnect,” says Hochberg.
Using sufficient interconnect will enable the computation to scale and will simplify the software. “It is going to be simple to use our system because if you need anything in memory, you just go and get it because there is bandwidth to spare.”
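The sketch below illustrates why mixed link tiers complicate data movement, using assumed, representative bandwidth and latency figures for a conventional GPU cluster (none of the numbers are Luminous's). The wide spread in per-gigabyte transfer time across tiers is what makes data placement and programming awkward at scale.

```python
# Illustrative sketch (numbers are assumptions, not Luminous figures): time to
# move 1 GB over the different link tiers found in a typical GPU cluster,
# each with its own bandwidth and latency.

LINKS = {
    "CPU <-> memory (DDR)":     {"gbytes_per_s": 200, "latency_us": 0.1},
    "CPU <-> GPU (PCIe)":       {"gbytes_per_s": 32,  "latency_us": 1.0},
    "GPU <-> GPU (in-box)":     {"gbytes_per_s": 300, "latency_us": 1.0},
    "box <-> box (InfiniBand)": {"gbytes_per_s": 25,  "latency_us": 2.0},
}

def transfer_time_ms(size_gbytes: float, link: dict) -> float:
    """Latency plus serialisation time for one transfer, in milliseconds."""
    return link["latency_us"] / 1e3 + size_gbytes / link["gbytes_per_s"] * 1e3

for name, link in LINKS.items():
    print(f"{name:28s} {transfer_time_ms(1.0, link):7.2f} ms per GB")
```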
Supercomputing approach
Luminous is not ready to reveal its supercomputer architecture. But the company says it is vertically integrated and is designing the complete system including the processing and interconnect.
When the company started in 2018, it planned to use a photonic processor as the basis of its compute, but the class of problems it could solve was deemed insufficiently impactful.
The company then switched to developing a set of ASICs designed around the capabilities of the optics. And it is the optics that rearchitects how data moves within the supercomputer.
“That is the place where you get order-of-magnitude advantages,” says Hochberg.
The architecture will tackle a variety of AI tasks typically undertaken by hyperscalers. “If we can enable them to run models that are bigger than what can be run today while using much smaller programming teams, that has enormous economic impact,” he says.
Hochberg also points out that organisations in many markets want to use machine learning: “They would love to have the ability to train on very large data sets but they don’t have a big distributed systems engineering team to figure out how to scale things up onto big-scale GPUs; that is a market that we want to help.”
Prospective customers of Luminous’s system are so keen to access such technology that they are helping the company. “That is something I didn’t experience in the optical transceiver world,” quips Hochberg.
The supercomputer will be modular, says Luminous, but its smallest module will have much greater processing capability than, say, a platform hosting 8 or 16 GPUs.
Silicon photonics
Luminous is confident in using silicon photonics to realise its system even though the design will push the technology beyond how it has been used until now.
“You are always making a bet in this space that you can do something that is more complex than anything anyone else is doing because you are going to ship your product a couple of years hence,” says Hochberg.
Luminous has confidence because of the experience of its design team, the design tools it has developed, and its understanding of advanced manufacturing processes.
“We have people that know how to stand up complex things,” says Hochberg.
Status
Luminous’s staff currently numbers around 100, having doubled in the last year, and is set to double again by year-end.
The company is busy modelling how machine learning algorithms will run on its system. “Not just today’s models but also tomorrow’s models,” says Hochberg.
Meanwhile, there is a huge amount of work to be done to deliver the first hardware by 2024.
“We have a bunch of big complex chips we have to build, we have software that has to live on top of it, and it all has to come together and work,” concludes Hochberg.
CW-WDM MSA charts a parallel path for optics
Artificial intelligence (AI) and machine learning have become an integral part of the businesses of the webscale players.
The mega data centre players apply machine learning to the treasure trove of data collected from users to improve services and target advertising.

They can also use their data centres to offer cloud-based AI services.
Training neural networks on large data sets is so computationally intensive that it is driving new processor and networking requirements.
It is also impacting optics. Optical interfaces will need to become faster to cope with the amount of data, and that means interfaces with more parallel channels.
Anticipating these trends, a group of companies has formed the Continuous-Wave Wavelength Division Multiplexing (CW-WDM) multi-source agreement (MSA).
The CW-WDM MSA will specify laser sources and the wavelength grids they use. The lasers will operate in the O-band (1260nm-1360nm) used for datacom optics.
The MSA is defining eight-, 16- and 32-wavelength grids and will build on work done by the ITU-T and the IEEE.
This is good news for the laser manufacturers, says Chris Cole, chair of the CW-WDM MSA (pictured), given they have already shipped millions of lasers for datacom.
“In general, lasers are typically the hardest thing,” he says.
Wavelength count
The majority of datacom pluggable modules deployed today use either one or four optical channels. “When I started in optics 20 years ago it was all about single wavelengths,” says Cole.
Four channels were first used successfully for 40-gigabit interfaces. “That is when we introduced coarse wavelength-division multiplexing (CWDM),” says Cole.
Four wavelengths are the standard approach for 100, 200 and 400-gigabit optical modules. Spreading data across four channels simplifies the design of the electrical and optical interfaces.
“But we are ready to move on because the ability to increase parallel channels - be it parallel fibres or wavelengths - is much greater than the ability to push speed,” says Cole. “If all we do is rely on a four-wavelength paradigm and we keep pushing speed, we will run into a brick wall.”
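The arithmetic behind Cole's point is simple: aggregate throughput is the number of wavelengths multiplied by the per-wavelength rate, so capacity can be grown by adding channels rather than by pushing each channel ever faster. A minimal sketch, with illustrative rates rather than MSA figures:

```python
# Simple arithmetic behind the argument (illustrative rates, not MSA figures):
# total interface throughput is channels x per-channel rate, so adding
# wavelengths scales capacity without raising each channel's line rate.

def aggregate_gbps(wavelengths: int, gbps_per_wavelength: int) -> int:
    """Aggregate interface throughput in Gb/s."""
    return wavelengths * gbps_per_wavelength

print(aggregate_gbps(4, 100))   # 400G with four 100G channels
print(aggregate_gbps(8, 100))   # 800G by doubling the channel count
print(aggregate_gbps(4, 200))   # 800G by doubling the per-channel speed instead
```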
Integration
Adopting more parallel channels will have two consequences on the optics, says Cole.
One is that photonic integration will become the only practical way to build multi-channel designs. Eight-channel designs are possible using discrete components, but that approach won’t be cost-competitive for designs of 16 or more channels.
“It has to be photonic integration because as you get to eight and later, 16 and 32 wavelengths, it is not supportable in a small size with conventional approaches,” says Cole.
The MSA favours silicon photonics integration but indium phosphide or polymer integration platforms could be used.
The MSA will also require wavelengths to be packed far more closely than the 20nm spacing used for CWDM. Techniques now exist that enable tighter wavelength spacings without needing dedicated cooling.
One approach is to separate the laser from the silicon chip - a switch chip or processor - that generates a lot of heat. Light from the remote source is fed to the optics over a fibre, making temperature control of the laser more straightforward.
Cole also highlights Juniper Networks’ athermal silicon photonics, which controls wavelength drift on the grid without requiring a thermo-electric cooler. Juniper gained the technology with its acquisition of Aurrion in 2016.
Specification work
“Using the O-band has a lot of advantages,” says Cole. “That is where all the datacom optics are.”
The optical loss in the O-band may be double that of the C-band but this is not an issue for datacom’s short spans.
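A short worked example using typical single-mode fibre attenuation figures (representative values, not taken from the MSA) shows why the extra O-band loss is tolerable over datacom distances.

```python
# Worked example with typical single-mode fibre attenuation (roughly
# 0.33 dB/km in the O-band vs 0.18 dB/km in the C-band; representative
# values, not MSA numbers). Over datacom spans the O-band penalty is small.

O_BAND_DB_PER_KM = 0.33
C_BAND_DB_PER_KM = 0.18

for span_km in (0.5, 2.0, 10.0):
    o_loss = O_BAND_DB_PER_KM * span_km
    c_loss = C_BAND_DB_PER_KM * span_km
    print(f"{span_km:4.1f} km span: O-band {o_loss:.2f} dB, "
          f"C-band {c_loss:.2f} dB, penalty {o_loss - c_loss:.2f} dB")
```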
The MSA is to define a technology roadmap rather than a specific product, says Cole. First-generation products will use eight wavelengths, followed by 16- and then 32-wavelength designs. Sixty-four and even 128-channel counts may be specified once the technology is established.
“Initially we did [specify 64 and 128 channels] but we took it out,” says Cole. “We’ll know a lot more if we are successful over three generations; we’ll figure out what we need to do when we get to that point.”
The MSA is proposing two bands, one 18nm wide (1291nm-1309nm) and the other 36nm wide (1282nm-1318nm). Eight, 16 and 32 wavelengths are assigned across both bands.
“It’s smack in the middle of the CWDM4 grid which is the largest shipping laser grid ever, and it is smack on top of the LWDM4 grid [used by -LR4 modules] which is the next highest grid to ship in volume,” says Cole.
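As a rough illustration of what those band widths imply, the sketch below divides each proposed band evenly among eight, 16 and 32 channels. The even-split assumption is mine for illustration only; the actual grid points and spacings are defined by the MSA itself.

```python
# Illustrative only: if channels were divided evenly across the two proposed
# bands (18 nm: 1291-1309 nm, 36 nm: 1282-1318 nm), the implied per-channel
# slot width would be as below. The real grid points come from the MSA.

BANDS_NM = {"18 nm band": (1291.0, 1309.0), "36 nm band": (1282.0, 1318.0)}

for name, (start, stop) in BANDS_NM.items():
    width = stop - start
    for channels in (8, 16, 32):
        slot = width / channels
        print(f"{name}: {channels:2d} channels -> ~{slot:.2f} nm per channel slot")
```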
The MSA will also specify continuous-wave laser parameters such as the output power, spectral width, variation in power between the wavelengths, and allowable wavelength shift.
Members
Cole started work on the CW-WDM MSA in collaboration with Ayar Labs while he was still at II-VI. Now at Luminous Computing, Cole, along with MSA editor Matt Sysak of Ayar Labs and associate editor Dave Lewis of Lumentum, is preparing the first MSA draft and has solicited comments from members as to what to include in the specifications.
The MSA has 11 promoter members: Arista, Ayar Labs, CST Global, imec, Intel, Lumentum, Luminous Computing, MACOM, Quintessent, Sumitomo Electric, and II-VI.
The MSA has created a new observer member status to get input from companies that otherwise would be put off joining an MSA due to the associated legal requirements.
“So we have an observer category that if someone is serious and they want to see a subset of the material the MSA is working on and provide feedback, we welcome that,” says Cole.
The observer members are AMF, Axalume, Broadcom, Coherent Solutions, Furukawa Electric, GlobalFoundries, Keysight Technologies, NeoPhotonics, NVIDIA, Samtec, Scintil Photonics, and Tektronix.
“This MSA is meant to be inclusive, and it is meant to foster innovation and foster as broad an industry contribution as possible,” concludes Cole.
Further information
The CW-WDM MSA has several documents and technical papers on its website. The first document is the CW-WDM MSA grid proposal while the rest are technical papers addressing developments and applications driving the need for high-channel-count optical interfaces.

