The spectacular rise in the capabilities of artificial intelligence (AI) is directly attributable to the scaling of the computing hardware used to train AI models.
“People discovered early on that if you increase the size of those models and the amount of data to train those models, you get a big step-up in accuracy and performance,” says Nigel Toon, CEO and chairman of AI processor firm Graphcore. “The results have been stunning.”
Toon cites research showing that, for large language models, model size and training data must be scaled in equal proportion.
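The research Toon alludes to is usually associated with compute-optimal ("Chinchilla"-style) scaling laws. As a rough illustration of why model size and data grow together, the sketch below assumes training compute of roughly 6 × parameters × tokens and a fixed tokens-per-parameter ratio; both constants are illustrative assumptions, not figures from the article.

```python
# Minimal sketch of compute-optimal ("Chinchilla"-style) scaling.
# Assumptions for illustration: training compute C ~ 6 * N * D FLOPs and a
# fixed tokens-per-parameter ratio (~20 is a commonly cited rule of thumb).

import math

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    """Return an (N, D) split for a training budget C using C ~ 6*N*D."""
    n_params = math.sqrt(c_flops / (6.0 * tokens_per_param))  # parameters
    d_tokens = tokens_per_param * n_params                    # training tokens
    return n_params, d_tokens

if __name__ == "__main__":
    for c in (1e21, 1e23, 1e25):
        n, d = compute_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
    # A 100x increase in compute raises both N and D by ~10x each,
    # i.e. model size and data are scaled together, as Toon describes.
```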
However, AI developers have started to see a slowdown in the gains achieved solely by such scaling. This is leading to new thinking in how engineers build an AI model and how it generates its output when prompted. The result is a new wave of AI, says Toon.
Model changes
Toon introduces several concepts to explain the characteristics of AI's latest wave.
Instead of a single model containing all the learned information, models can be combined, each with its own expertise, an approach known as a mixture of experts.
GPT-4 marked OpenAI's first step down this path, with some eight experts, says Toon, while DeepSeek, a Chinese research company, has taken the approach much further by using many more experts.
"Rather than having one model that contains all the information, you end up with many more models, each of which is an expert in a particular area," says Toon. "Then you find a way of working out which of those experts you will call upon at any particular moment."
Another crucial development is how the model performs its reasoning, referred to as agentic AI. What is notable about agentic workflows is that instead of producing a one-shot output, the model performs what Toon calls a chain of thought. The model goes back and does some reflection, says Toon. The results are promising, delivering performance akin to using a much larger model.
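A minimal sketch of such a reflect-and-revise loop is shown below, with generate standing in for a call to any language model; the prompts and stopping convention are assumptions for illustration, not the workflow of any particular product.

```python
# Minimal sketch of a "generate, reflect, revise" loop.
# generate is a stand-in for a call to a language model of your choice.

from typing import Callable

def answer_with_reflection(question: str,
                           generate: Callable[[str], str],
                           max_rounds: int = 3) -> str:
    """Produce a draft, let the model critique its own chain of thought,
    and revise until the critique reports no remaining problems."""
    draft = generate(f"Answer step by step:\n{question}")
    for _ in range(max_rounds):
        critique = generate(
            f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
            "List any mistakes or gaps. Reply DONE if there are none."
        )
        if critique.strip().upper().startswith("DONE"):
            break  # the reflection step found nothing to fix
        draft = generate(
            f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nWrite an improved answer."
        )
    return draft
```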
We are thus on the cusp of a new wave of AI, says Toon, with OpenAI's o1 release one of the first indications of this. DeepSeek also uses a reasoning approach, coupled with its large mixture of experts.
This ability to go back and apply reasoning is also important for context, a concept that reflects how much information an AI model can keep in view at once.
Toon cites the example of using a large language model to generate a long piece of text or an AI model creating a video sequence. Maintaining context across a whole piece becomes more and more difficult, especially with video.
"In generating that, you want to go back, you want to try different trains of thought, you want to pull in different pieces of information, and you probably want to pull in different experts," says Toon. "The complexity of the models you end up building and the inference process that you apply over those models are just increasing."
AI system scaling
Toon stresses that the next wave of AI will require continual scaling of computing and networking systems.
"On the one hand, you can say the age of scaling is maybe over, but it is one-dimensional scaling that is over," says Toon. "The models will still get bigger and will be much more complex."
The next wave of models will need more computing power and their underlying structure will change. They will consist of multiple models working together, and there will be numerous steps before the model generates its output.
Toon expects clusters of AI accelerators such as graphics processing units (GPUs) to become larger still, while the way the accelerators interact will also change: "It's going to become more complex." The way a GPU talks to memory will also change because of the need to store context. "You will want to pull pieces back and forth," says Toon.
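One reason context reshapes memory traffic is the key/value (KV) cache that transformer-style models keep for every token they have seen. The back-of-the-envelope sketch below sizes that cache for a hypothetical model configuration; the formula and numbers are illustrative assumptions, not a description of any specific GPU or Graphcore design.

```python
# Why context drives memory traffic: the KV cache grows linearly with
# context length and must live somewhere the accelerator can reach.
# The configuration below is a hypothetical 70B-class model, for illustration.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to hold keys and values for one sequence."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
    return per_token * context_tokens

size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      context_tokens=128_000)
print(f"~{size / 2**30:.1f} GiB of KV cache per sequence")
# At long contexts this no longer fits comfortably next to the weights,
# which is why runtimes shuttle cache blocks between device and host
# memory -- "pulling pieces back and forth", as Toon puts it.
```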
So not only will the model's make-up change, but inferencing will become increasingly important.
"Rather than just producing a set of tokens, it's going backwards and forwards, maybe producing multiple sets of tokens, working out which are the right ones, and changing things," says Toon. "There's a real imperative here [with inferencing], because that is cost to the user."
Performing the inferencing promptly and computationally efficiently will thus be key.
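The "multiple sets of tokens" pattern Toon mentions can be as simple as best-of-n sampling with a scoring pass, sketched below. Here generate and score are hypothetical stand-ins for a model call and a verifier; the point is that every extra candidate is extra inference compute, and hence cost to the user.

```python
# Best-of-n sampling: produce several candidate completions, then keep the
# one a scoring function rates highest. generate() and score() are
# placeholders for a model call and a verifier/reward model.

from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 4) -> str:
    """Sample n candidates and return the highest-scoring one. Each extra
    candidate is extra inference compute, i.e. extra cost to the user."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```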
Open-source AI
Toon is a proponent of an open-source approach to AI.
"When you're in a phase of dynamic innovation, which we're still at, sharing that knowledge across different innovative groups will allow people to move forward much more quickly."
Adopting an open-source approach will also make AI more responsible, he argues. "The more eyeballs you have on it from clever people, the better it will be," says Toon.
Graphcore
SoftBank Group acquired Toon's company, Graphcore, in July 2024 as part of the group's broader AI strategy.
"Distinct from the Vision Fund, [a huge technology investment fund managed by Softbank], we are a SoftBank Group company," says Toon. "We sit alongside ARM under the SoftBank Group, which is helping us build the next generation of products."
SoftBank's telecommunications arm, SoftBank Corp., is one of several Asian telcos that view AI as a crucial business opportunity.
In September, SoftBank Corp. announced that it is working with photonics chip specialist NewPhotonics to develop technology for linear pluggable optics, co-packaged optics, and an all-optical switch fabric for the AI-RAN initiative.
Toon notes a growing divide in AI strategy between Asia on the one hand and the US and Europe on the other. "I'm not sure if it's a good thing, but it is part of what is going on in the world," he says. He is also a member of the UK Research and Innovation (UKRI) board, a non-governmental body sponsored by the UK's Department for Science, Innovation and Technology. "It helps to steer £9 billion ($11.5 billion) from the UK government into universities and the research councils that fit within UKRI and Innovate UK," says Toon.
Toon is the author of the book How AI thinks: How we built it, how it can help us, and how we can control it, published in 2024.