Building an AI supercomputer using silicon photonics 
Thursday, March 24, 2022 at 12:33PM
Roy Rubenstein in GPUs, Lukas Chrostowski, Luminous Computing, Michael Hochberg, artificial intelligence, silicon photonics, supercomputing

Silicon photonics is now mature enough to be used to design complete systems.

So says Michael Hochberg (pictured), who has been behind four start-ups, including Luxtera and Elenion, whose products used the technology. Hochberg has also co-authored a book on silicon photonics design with Lukas Chrostowski.

In the first phase of silicon photonics, from 2000 to 2010, people wondered whether they could even do a design using the technology.

“Almost everything that was being done had to fit into an existing socket that could be served by some other material system,” says Hochberg.

A decade later it was more the case that sockets couldn’t be served without using silicon photonics. “Silicon photonics had dominated every one of the transceiver verticals that matter: intra data centre, data centre interconnect, metro and long haul,” he says.

Now people have started betting their systems on silicon photonics, says Hochberg, citing lidar, quantum optics, co-packaged optics and biosensing as examples.

Several months ago Hochberg joined Luminous Computing as president. The start-up recently came out of stealth mode after raising $105 million in Series A funding.

Luminous is betting its future on silicon photonics as an enabler for an artificial intelligence (AI) supercomputer that it believes will significantly outperform existing platforms.

 

Machine learning

The vision of AI is to take tasks that were exclusively the domain of the human mind and automate them at scale, says Hochberg.

In the last decade alone, the AI community has advanced from using machine learning (ML) for tasks that are trivial for humans to tackling tasks that only the most talented experts can achieve.

“We have reached the point where machine learning capabilities are superhuman in many respects,” says Hochberg. “Where they produce results quantifiably better than humans can.”

But achieving such machine learning progress has required huge amounts of data and hardware.

“The training runs for the state-of-the-art recommendation engines and natural language models take tens to hundreds of thousands of GPUs (graphics processing units) and they run from months to years,” says Hochberg.

Moreover, the computational demands associated with machine learning training are not doubling every 18 months, as with Moore’s law, but every 3-4 months. “And for memory demands, it is even faster,” he says.

What that means is that the upper limit for such training runs is a complete data centre.
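To put the two doubling rates Hochberg cites side by side, the short Python sketch below compares how much demand grows over a single 18-month window. It is illustrative arithmetic only: the function name `growth_factor` and the choice of an 18-month window are assumptions made for this example, not figures or methods from Luminous.

```python
# Illustrative arithmetic only: compare growth under the doubling
# periods quoted in the article. The function name and the 18-month
# comparison window are assumptions chosen for this sketch.

def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Return how many times a quantity multiplies over elapsed_months
    if it doubles once every doubling_months."""
    return 2.0 ** (elapsed_months / doubling_months)


if __name__ == "__main__":
    window = 18  # months: one Moore's-law doubling period, as quoted
    cases = [
        ("Moore's law (doubling every 18 months)", 18),
        ("ML training (doubling every 4 months)", 4),
        ("ML training (doubling every 3 months)", 3),
    ]
    for label, period in cases:
        print(f"{label}: x{growth_factor(window, period):.1f}")
```

Under these assumptions, a quantity that doubles every 18 months grows twofold over the window, while one that doubles every 3-4 months grows roughly 23 to 64 times over the same period.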

Luminous Computing wants to develop AI hardware that scales quickly and simply. And a key element of that will be to use silicon photonics to interconnect the hardware.

“One of the central challenges scaling up big clusters is that you have one kind of bus between your CPU and memory, another between your CPU and GPU, another between the GPUs in a box and yet another - Infiniband - between the boxes,” says Hochberg.

These layers of connectivity run at different speeds and latencies that complicate programming for scale. Such systems also result in expensive hardware like GPUs being under-utilised.

“What we are doing is throwing massive optical interconnect at this problem and we are building the system around this optical interconnect,” says Hochberg.

Using sufficient interconnect will enable the computation to scale and will simplify the software. “It is going to be simple to use our system because if you need anything in memory, you just go and get it because there is bandwidth to spare.”

 

Supercomputing approach

Luminous is not ready to reveal its supercomputer architecture. But the company says it is vertically integrated and is designing the complete system including the processing and interconnect.

When the company started in 2018, it planned to use a photonic processor as the basis of the compute, but the class of problems it could solve was deemed insufficiently impactful.

The company then switched to developing a set of ASICs designed around the capabilities of the optics. And it is the optics that rearchitects how data moves within the supercomputer.

“That is the place where you get order-of-magnitude advantages,” says Hochberg.

The architecture will tackle a variety of AI tasks typically undertaken by hyperscalers. “If we can enable them to run models that are bigger than what can be run today while using much smaller programming teams, that has enormous economic impact,” he says.

Hochberg also points out that many organisations want to use machine learning for lots of markets: “They would love to have the ability to train on very large data sets but they don't have a big distributed systems engineering team to figure out how to scale things up onto big-scale GPUs; that is a market that we want to help.”

Potential customers of Luminous’s system are so keen to access such technology that they are helping the company. “That is something I didn’t experience in the optical transceiver world,” quips Hochberg.

The supercomputer will be modular, says Luminous, but its smallest module will have much greater processing capability than, say, a platform hosting 8 or 16 GPUs.

 

Silicon photonics

Luminous is confident in using silicon photonics to realise its system, even though the design will go beyond how the technology has been used until now.

“You are always making a bet in this space that you can do something that is more complex than anything anyone else is doing because you are going to ship your product a couple of years hence,” says Hochberg.

Luminous has confidence because of the experience of its design team, the design tools it has developed and its understanding of advanced manufacturing processes.

“We have people that know how to stand up complex things,” says Hochberg.

 

Status

Luminous’s staff is currently around 100, a doubling in the last year. And it is set to double again by year-end.

The company is busy modelling how machine learning algorithms will run on its system. “Not just today’s models but also tomorrow’s models,” says Hochberg.

Meanwhile, there is a huge amount of work to be done to deliver the first hardware by 2024.

“We have a bunch of big complex chips we have to build, we have software that has to live on top of it, and it all has to come together and work,” concludes Hochberg.

Article originally appeared on Gazettabyte (https://www.gazettabyte.com/).
See website for complete article licensing information.