Nokia's 4.8-terabit FP5 packet-processing chipset

Part 1: IP routing: Nokia’s latest FP5 and router platforms
Nokia has unveiled its latest packet-processing silicon that will be the mainstay of its IP router platforms for years to come.
The FP5 chipset is rated at 4.8 terabits-per-second (Tbps), a twelvefold improvement in Nokia’s packet-processing silicon performance in a decade. (See chart.)

Communications service provider (CSP) BT says Nokia’s 7750 router platforms equipped with the FP5 chipset will deliver every use case it needs for its Multi Service Edge; from core routing, MPLS-VPN, broadband network gateways (BNG), to mobile backhaul and Ethernet.
The FP5 announcement comes four years after Nokia unveiled its existing flagship router chipset, the FP4. The FP4 was announced as a 2.4Tbps chipset but Nokia upgraded its packet-processing rating to 3Tbps.
“We announced what we knew but then, through subsequent development and testing, the performance ended up at 3Tbps,” says Heidi Adams, head of IP and optical networks marketing at Nokia.
The FP5 may also exceed its initial 4.8Tbps rating.
Nokia will use the FP5 to upgrade its existing platforms and power new router products; it will not license the chipset nor will it offer it for use in open router platforms.
Nokia’s chipset evolution
At the heart of Nokia’s router silicon is a 2D array of packet processing cores.
The FP3, announced in 2011 by Alcatel-Lucent (acquired by Nokia in 2016), used 288 packet processing cores arranged in a 32×9 array. Each row of cores acted as a packet-processing pipeline that could be partitioned to perform independent tasks. The array’s columns performed table look-ups and each column could be assigned several tasks.
Nokia didn’t detail how the FP4 upgraded the array of cores. But the performance enhancement was significant; the FP4 delivers a 7.5x improvement in packet processing performance compared to the FP3.
The 16nm CMOS FP4 chipset includes a traffic manager (q-chip), packet processor (p-chip), the t-chip that interfaces to the router fabric, and what was then a new chip, the e-chip.
The e-chip acts as a media access controller (MAC) that parcels data from the router’s client-side pluggable optical modules for the p-chip.
Nokia even designed memory for the FP4 whereby instructions can be implemented during memory access and the memory can be allocated to perform different types of look-up and buffering, depending on requirements.
To maximise the memory’s performance, Nokia used advanced packaging for the FP4’s p-chip and q-chip. The resulting 2.5D-packaged p-chip comprises the packet processor die and stacks of memory. The q-chip is also a multi-chip module containing RISC processors and buffering memory.
The FP4 uses 56Gbps PAM-4 serialiser-deserialiser (serdes) interfaces, technology that Nokia secured from Broadcom.
FP5’s features
The FP5 builds on the major architectural upgrade undertaken with the FP4.
Using a 7nm CMOS process technology, Nokia’s FP5 designers have combined on-chip what were two separate FP4 chips: the packet processor (p-chip) and traffic manager (q-chip).
The FP5 chipset consumes a quarter of the power of the FP4 in terms of watts-per-gigabit (0.1W/Gig for the FP5 compared to the FP4’s 0.4W/Gig).
Consolidating two chips into one accounts for part of the power savings. Using 112Gbps serdes and a more advanced CMOS process are other factors.
Nokia has also added encryption hardware blocks to the chip’s ports. The hardware blocks implement the MACsec algorithm and can also encrypt layer 2.5 and layer 3 traffic.
The chipset can handle packet flows as large as 1.6 terabits. “We don’t have any physical interfaces that support flows at that rate,” says Adams. “It’s an indicator that the chipset is ready for much more.”
The e-chip, which Nokia describes as a tremendously important device, has also been upgraded. As well as the MAC function, it acts as an early-stage packet processor, performing pre-processing and pre-classification tasks on the traffic.
The e-chip also performs pre-buffering for the packer processor. Using multiple such devices allows the line card to expand the forwarding limit of the FP5’s packet processor. This enables Nokia’s routers to perform what it calls intelligent aggregation (IA). “We can bring in more traffic, increase the number of ingress ports even if those ports start to get fully loaded, because of the chipset architecture being fully buffered,” says Adams. “The result is a 30 per cent uplift in the stated capacity numbers.”
The FP5 chipset has been taped out and the silicon is being tested in Nokia’s lab.
Router platforms
IP core routers are tasked with moving large amounts of IP traffic across a network backbone. IP edge routers, in contrast, typically aggregate a variety of services such as mobile transport, residential traffic or act as gateways.

The platforms that will use the FP5 are classified by Nokia as edge routers. “The boundaries have blurred,” says Adams. “It is more important to look at how applications are deployed and what the requirements are.”
The platforms using the FP5 are the existing 7750 SR-14s and 7750-SR7s routers that were announced with the launch of the FP4.
These chassis were designed to accommodate Nokia’s current and next-generation router cards. “This allows operators to retain the same chassis and support a mix of FP4 and FP5 cards, growing into them gradually,” says Adams.
Nokia has announced three other platforms: two mid-range platforms, the 7750 SR2-se and the 7750 SR1-se, and the 7750 SR-1 that will be available in six variants. “They [the SR-1 boxes] are going to be available in a range of configurations and different port speeds,” says Adams.
Platforms using the FP5 chipset will ship in the first half of 2022, starting with the SR-1.
Nokia also announced an FP5 expandable media adaptor (XMA) line card for the non-fixed platforms (the 7750 SR-14s/ SR-7s and SR-2se). The card supports 36 pluggable slots and with 400 Gigabit Ethernet (GbE) has a capacity of 14.4Tbps full-duplex or 19.2Tbps in intelligent aggregation mode.
The card will also support 400ZR and ZR+ coherent modules and is ready for 800GbE pluggables that will double the card’s capacity ratings.
Nokia says the FP5 improves the throughput of the XMA card by a factor of three: Nokia’s 4.8Tbps XMA (12Tbps IA) uses four FP4 chipsets while the latest 14.4Tbps (19.2Tbps IA) XMA uses six FP5 chipsets.
Custom silicon versus disaggregated designs
Nokia says the benefits of having its own chipset justify the intellectual effort and development expense, even when advanced merchant silicon is available and certain CSPs are embracing open disaggregated routers.
“We feel there is a need in the industry for platforms based on this kind of technology,” says Adams.
What is important is the total cost of ownership and that Nokia’s systems are deployed in critical networks where resiliency, reliability, the feature set and network security are all critical, says Adams.
Nokia also points to the progress it has made since the launch of the FP4. “We have secured 350 projects, two-thirds of which were new footprints or competitive displacements,” says Adams. Nokia’s IP revenues in 2020 were $3.2 billion.
That said, Nokia also partners with merchant silicon vendors: the 7250 IXR interconnect router uses merchant silicon, for example.
“If I look at disaggregation, absolutely, it is an interesting area,” says Adams. “But I think it is very early days.”
Neil McRae, managing director and chief architect at BT, says that while some operators are looking at disaggregated software and hardware, BT doesn’t believe this is necessarily the best solution in terms of performance, reliability or cost.

“Increasingly, the ratio of capital investment in core networking is moving towards optical transceivers than router silicon,” says McRae. “But to get the most out of the network and the router, using custom silicon for the most demanding cases still delivers the best outcomes.
“In our live network but also in our testing, the integrated solution is more reliable, easier to operate and a significant improvement from a total cost of ownership point of view,” says McRae.
BT says it will be able to scale interfaces on the 7750 from 1-400Gbs using the FP5 and Nokia’s SR-OS routing networking software.
BT also highlights the importance of reliability under demand, pointing out how the CSP’s traffic has doubled during the pandemic without impacting its customers.
“Nokia’s understanding of how the underlying silicon is going to react in different situations gives them a significant advantage in building the software on top that performs in challenging situations,” says McRae.
Chip Strategy
Nokia says that were it to sell its FP5 silicon as a standalone product, it would enter a very different design environment.
“You are designing to the requirements of multiple customers versus designing for your systems,” says Adams.
Nokia’s belief is that there is strong demand for platforms designed to purpose.
“We are staying true to that strategy,” says Adams.
The era of cloud-scale routeing
Nokia's FP4 p-chip. The multi-chip module shows five packages: the p-chip die surrounded by four memory stacks. Each stack has five memory die. The p-chip and memory stacks are interconnected using an interposer. - Nokia has unveiled the FP4, a 2.4 terabit-per-second network processor that has 6x the throughput of its existing FP3.
- The FP4 is a four-IC chipset implemented using 16nm CMOS FinFET technology. Two of the four devices in the chipset are multi-chip modules.
- The FP4 uses 56 gigabit-per-second serial-deserialiser (serdes) technology from Broadcom, implemented using PAM-4 modulation. It also supports terabit flows.
- Nokia announced IP edge and core router platforms that will use the FP4, the largest configuration being a 0.58 petabit switching capacity router.
Much can happen in an internet minute. In that time, 4.1 million YouTube videos are viewed, compared to 2.8 million views a minute only last year. Meanwhile, new internet uses continue to emerge. Take voice-activated devices, for example. Amazon ships 50 of its Echo devices every minute, almost one a second.
Given all that happens each minute, predicting where the internet will be in a decade’s time is challenging. But that is the task Alcatel-Lucent’s (now Nokia’s) chip designers set themselves in 2011 after the launch of its FP3 network processor chipset that powers its IP-router platforms.
Six years on and its successor - the FP4 - has just been announced. The FP4 is the industry’s first multi-terabit network processor that will be the mainstay of Nokia’s IP router platforms for years to come.
Cloud-scale routing
At the FP4’s launch, Nokia’s CEO, Rajeev Suri, discussed the ‘next chapter’ of the internet that includes smart cities, new higher-definition video formats and the growing number of connected devices.
IP traffic is growing at a compound annual growth rate (CAGR) of 25 percent through to 2022, according to Nokia Bell Labs, while peak data rates are growing at a 39 percent CAGR. Nokia Bell Labs also forecasts that the number of connected devices will grow from 12 billion this year to 100 billion by 2025.
Basil Alwan, Nokia’s president of IP and optical networks, said the internet has entered the era of cloud-scale routeing. When delivering a cloud service, rarely is the request fulfilled by one data centre. Rather, several data centres are involved in fulfilling the tasks. “One transaction to the cloud is multiplied,” said Alwan.
IP traffic is also becoming more dynamic, while the Internet of Things presents a massive security challenge.
Alwan also mentioned how internet content providers have much greater visibility into their traffic whereas the telcos’ view of what flows in their networks is limited. Hence their interest in analytics to understand and manage their networks better.
These are the trends that influenced the design of the FP4.
We put a big emphasis on making sure we had a high degree of telemetry coming out at the chip level
FP4 goals
Telemetry, the sending of measurement data for monitoring purposes, and network security were two key design goals for the FP4.
Steve Vogelsang“We put a big emphasis on making sure we had a high degree of telemetry coming out at the chip level,” said Steve Vogelsang, CTO for Nokia's IP and optical business.
Tasks include counters, collecting statistics and packet copying. “This is to make sure we have the instrumentation coming off these systems that we can use to drive the [network] analytics platform,” said Vogelsang.
Being able to see the applications flowing in the network benefits security. Distributed Denial-of-Service (DDoS) attacks are handled by diverting traffic to a ‘scrubbing centre’ where sophisticated equipment separates legitimate IP packets from attack traffic that needs scrubbing.
The FP4 supports the deeper inspection of packets. “Once we identify a threat, we can scrub that traffic directly in the network,” said Vogelsang. Nokia claims that that the FP4 can deal with over 90 percent of the traffic that would normally go to a scrubbing centre.
Chipset architecture
Nokia’s current FP3 network processor chipset comprises three devices: the p-chip network processor, the q-chip traffic manager and the t-chip fabric interface device.
The p-chip network processor inspects packets and performs table look-ups using fast-access memory to determine where packets should be forwarded. The q-chip is the traffic manager that oversees the packet flows and decides how packets should be dealt with, especially when congestion occurs. The third FP3 chip is the t-chip that interfaces to the router fabric.
The FP4 retains the three chips and adds a fourth: the e-chip - a media access controller (MAC) that parcels data from the router’s client-side pluggable optical modules for the p-chip. However, while the FP4 retains the same nomenclature for the chips as the FP3, the CMOS process, chip architecture and packaging used to implement the FP4 are significantly more advanced.
The FP4 can deal with over 90 percent of the traffic that would normally go to a scrubbing centre
Nokia is not providing much detail regarding FP4 chipset's architecture, unlike the launch of the FP3. “We wanted to focus on the re-architecture we have gone through,” said Vogelsang. But looking at the FP3 design, insight can be gained as to how the FP4 has likely changed.
The FP3’s p-chip uses 288 programmable cores. Each programmable core can process two instructions each clock cycle and is clocked at 1GHz.
The 288 cores are arranged as a 32-row-by-9-column array. Each row of cores can be viewed as a packet-processing pipeline. A row pipeline can also be segmented to perform independent tasks. The array’s columns are associated with table look-ups. The resulting FP3 p-chip is a 400-gigabit network processor.
Vogelsang said there is limited scope to increase the clock speed of the FP4 p-chip beyond 1GHz. Accordingly, the bulk of the FP4’s sixfold throughput improvement is the result of a combination of programmable core enhancements, possible a larger core array and, most importantly, system improvements. In particular, the memory architecture is now packaged within the p-chip for fast look-ups, while the chipset’s input-output lanes have been boosted from 10 gigabits-per-second (Gbps) to 50Gbps.
Nokia has sought to reuse as much of the existing microcode to program the cores for the FP4 p-chip but has added new instructions to take advantage of changes in the pipeline.
Software compatibility already exists at the router operating system level. The same SROS router operating system runs on Nokia’s network processors, merchant hardware from the like of Broadcom and on x86 instruction-set microprocessors in servers using virtualisation technology.
Such compatibility is achieved using a hardware abstraction layer that sits between the operating system and the underlying hardware. “The majority of the software we write has no idea what the underlying hardware is,” said Vogelsang.
Nokia has a small team of software engineers focussed on the FP4’s microcode changes but, due to the hardware abstraction layer, such changes are transparent to the main software developers.
The FP3’s traffic manager, the q-chip, comprises four reduced instruction set computer (RISC) cores clocked at 900MHz. This too has been scaled up for the FP4 but Nokia has not given details.
The t-chip interfaces to the switch fabric that sits on a separate card. In previous generations of router products, a mid-plane is used, said Nokia. This has been scrapped with the new router products being announced. Instead, the switch cards are held horizontally in the chassis and the line cards are vertical. “A bunch of metal guides are used to guide the two cards and they directly connect to each other,” said Vogelsang. “The t-chips are what interface to these connectors inside the system.”
The MAC e-chip interfaces to the line card’s pluggable modules and support up to a terabit flow. Indeed, the MAC will support integer multiples of 100 Gigabit Ethernet from 100 gigabit to 1 terabit. Nokia has a pre-standard implementation of FlexMAC that allows it to combine lanes across multiple transceivers into a single interface.
Nokia will have line cards that support 24 or 36 QSFP-DD pluggable modules, with each module able to support 400 Gigabit Ethernet.
The FP4 is also twice as power efficient, consuming 4 gigabit/W.
We wanted to make sure we used a high-volume chip-packaging technology that was being driven by other industries and we found that in the gaming industry
Design choices
One significance difference between the two network processor generations is the CMOS process used. Nokia skipped 28nm and 22nm CMOS nodes to go from 40nm CMOS for the FP3 to 16nm FinFET for the FP4. “We looked at that and we did not see all the technologies we would need coming together to get the step-function in performance that we wanted,” said Vogelsang.
Nokia also designed its own memory for the FP4.
“A challenge we face with each generation of network processor is finding memories and memory suppliers that can offer the performance we need,” said Vogelsang. The memory Nokia designed is described as intelligent: instructions can effectively be implemented during memory access and the memory can be allocated to do different types of look-up and buffering, depending on requirements.
Another key area associated with maximising the performance of the memory is the packaging. Nokia has adopted multi-chip module technology for the p-chip and the q-chip.
“We wanted to make sure we used a high-volume chip-packaging technology that was being driven by other industries and we found that in the gaming industry,” said Vogelsang, pointing out that the graphics processing unit (GPU) has similar requirements to those of a network processor. GPUs are highly memory intensive while manipulating bits on a screen is similar to manipulating headers and packets.
The resulting 2.5D packaged p-chip comprises the packet processor die and stacks of memory. Each memory stack comprises 5 memory die. All sit on an interposer substrate - itself a die that is used for dense interconnect of devices. The resulting FP4 p-chip is thus a 22-die multi-chip module.
“Our memory stacks are connected at the die edges and do not use through-silicon vias,” said Vogelsang. “Hence it is technically a 2.5D package [rather than 3D].”
The q-chip is also implemented as a multi-chip module containing RISC processors and buffering memory, whereas the router fabric t-chip and MAC e-chip are single-die ICs.
The FP4’s more advanced CMOS process also enables significantly faster interfaces. The FP4 uses PAM-4 modulation to implement 56Gbps interfaces. “You really need to run those bit rates much much higher to get the traffic into and out of the chip,” said Vogelsang.
Nokia says it is using embedded serialiser-deserialiser interface technology from Broadcom.
Next-gen routers
Nokia has also detailed the IP edge and core routers that will use the FP4 network processor.
The 7750 Service Router (SR-s) edge router family will support up to 144 terabits in a single shelf. This highest capacity configuration is the 7750 SR-14. It is a 24-rack-unit-plus-the-power-supply high chassis and supports a dozen line cards, each 12Tbps when using 100-gigabit modules, or 24x400GbE when using QSFP-DD modules.
Another new platform is the huge 7950 Extensible Routing System (XRS-XC) IP core router which can be scaled to 576 terabits - over half a petabit - when used in a six-chassis configuration. Combining the six chassis does not make require the use of front-panel client-side interfaces. Instead, dedicated interfaces are used with active optical cables to interlink the chassis.
The first router products will be shipped to customers at the year end with general availability expected from the first quarter of 2018.
A Terabit network processor by 2015?
Given that 100 Gigabit merchant silicon network processors will appear this year only, it sounds premature to discuss Terabit devices. But Alcatel-Lucent's latest core router family uses the 400 Gigabit FP3 packet-processing chipset, and one Terabit is the next obvious development.
Source: Gazettabyte
Core routers achieved Terabit scale awhile back. Alcatel-Lucent's recently announced IP core router family includes the high-end 32 Terabit 7950 XRS-40, expected in the first half of 2013. The platform has 40 slots and will support up to 160, 100 Gigabit Ethernet interfaces.
Its FP3 400 Gigabit network processor chipset, announced in 2011, is already used in Alcatel-Lucent's edge routers but the 7950 is its first router platform family to exploit fully the chipset's capability.
The 7950 family comes with a selection of 10, 40 and 100 Gigabit Ethernet interfaces. Alcatel-Lucent has designed the router hardware such that the card-level control functions are separate from the Ethernet interfaces and FP3 chipset that both sit on the line card. The re-design is to preserve the service provider's investment. Carrier modules can be upgraded independently of the media line cards, the bulk of the line card investment.
The 7950 XRS-20 platform, in trials now, has 20 slots which take the interface modules - dubbed XMAs (media adapters) - that house the various Ethernet interface options and the FP3 chipset. What Alcatel-Lucent calls the card-level control complex is the carrier module (XCM), of which there are up to are 10 in the system. The XCM, which includes control processing, interfaces to the router's system fabric and holds up to two XMAs.
There are two XCM types used with the 7950 family router members. The 800 Gigabit-per-second (Gbps) XCM supports a pair of 400Gbps XMAs or 200Gbps XMAs, while the 400Gbps XCM supports a single 400Gbps XMA or a pair of 200Gbps XMAs.
The slots that host the XCMs can scale to 2 Terabits, suggesting that the platforms are already designed with the next packet processor architecture in mind.
FP3 chipset
The FP3 chipset, like the previous generation 100Gbps FP2, comprises three devices: the P-chip network processor, a Q-chip traffic manager and the T-chip that interfaces to the router fabric.
The P-chip inspects packets and performs the look ups that determine where the packets should be forwarded. The P-chip determines a packet's class and the quality of service it requires and tells the Q-chip traffic manager in which queue the packet is to be placed. The Q-chip handles the packet flows and makes decisions as to how packets should be dealt with, especially when congestion occurs.
The basic metrics of the 100Gbps FP2 P-chip is that it is clocked at 840GHz and has 112 micro-coded programmable cores, arranged as 16 rows by 7 columns. To scale to 400Gbps, the FP3 P-chip is clocked at 1GHz (x1.2) and has 288 cores arranged as a 32x9 matrix (x2.6). The cores in the FP3 have also been re-architected such that two instructions can be executed per clock cycle. However this achieves a 30-35% performance enhancement rather than 2x since there are data dependencies and it is not always possible to execute instructions in parallel. Collectively the FP3 enhancements provide the needed 4x improvement to achieve 400Gbps packet processing performance.
The FP3's traffic manager Q-chip retains the FP2's four RISC cores but the instruction set has been enhanced and the cores are now clocked at 900GHz.
Terabit packet processing
Alcatel-Lucent has kept the same line card configuration of using two P-chips with each Q-chip. The second P-chip is viewed as an inexpensive way to add spare processing in case operators need to support more complex service mixes in future. However, it is rare that in the example of the FP2-based line card, the capability of the second P-chip has been used, says Alcatel-Lucent.
Having the second P-chip certainly boosts the overall packet processing on the line card but at some point Alcatel-Lucent will develop the FP4 and the next obvious speed hike is 1 Terabit.
Moving to a 28nm or an even more advanced CMOS process will enable the clock speed of the P-chip to be increased but probably not by much. A 1.2GHz clock would still require a further more-than-doubling of the cores, assuming Alcatel-Lucent doesn't also boost processing performance elswhere to achieve the overall 2.5x speed-up to a 1 Terabit FP4.
However, there are two obvious hurdles to be overcome to achieve a Terabit network processor: electrical interface speeds and memory.
Alcatel-Lucent settled on 10Gbps SERDES to carry traffic between the chips and for the interfaces on the T-chip, believing the technology the most viable and sufficiently mature when the design was undertaken. A Terabit FP4 will likely adopt 25Gbps interfaces to provide the required 2.5x I/O boost.
Another even more significant challenge is the memory speed improvement needed for the look up tables and for packet buffering. Alcatel-Lucent worked with the leading memory vendors when designing the FP3 and will do the same for its next-generation design.
Alcatel-Lucent, not surprisingly, will not comment on devices it has yet to announce. But the company did say that none of the identified design challenges for the next chipset are insurmountable.
Further reading:
Network processors to support multiple 100 Gigabit flows
A more detailed look at the FP3 in New Electronics, click here
