PCI-SIG releases the next PCI Express bus specification

The Peripheral Component Interconnect Express (PCIe) 6.0 specification doubles the data rate to deliver 64 giga-transfers-per-second (GT/s) per lane.
For a 16-lane configuration, the resulting bidirectional data transfer capacity is 256 gigabytes-per-second (GBps).
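As a sanity check on those headline figures, the short sketch below (in Python, ignoring protocol overheads such as FLIT framing and FEC) shows how the per-lane rate scales to the x16 numbers quoted.

```python
# Minimal sketch: raw PCIe 6.0 x16 bandwidth from the per-lane rate.
# These are the headline figures quoted above; protocol overhead
# (FLIT framing, FEC, CRC) is ignored here.

GT_PER_S_PER_LANE = 64          # PCIe 6.0 raw transfer rate per lane
BITS_PER_TRANSFER = 1           # one bit moves per transfer per lane
LANES = 16

# Per-direction throughput in gigabytes per second
per_direction_GBps = GT_PER_S_PER_LANE * BITS_PER_TRANSFER * LANES / 8
bidirectional_GBps = 2 * per_direction_GBps

print(per_direction_GBps)   # 128.0 GB/s each way
print(bidirectional_GBps)   # 256.0 GB/s, matching the quoted figure
```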
“We’ve doubled the I/O bandwidth in two and a half years, and the average pace is now under three years,” says Al Yanes, President of the Peripheral Component Interconnect – Special Interest Group (PCI-SIG).
With the specification now published, PCI-SIG members can start planning their products.
Users of FPGA-based accelerators, for example, know that in 12 to 18 months there will be motherboards running at such rates, says Yanes.
Applications
The PCIe bus is used widely for such applications as storage, processors, artificial intelligence (AI), the Internet of Things (IoT), mobile, and automotive.
In servers, PCIe has been adopted for storage and by general-purpose processors and specialist devices such as FPGAs, graphics processor units (GPUs) and AI hardware.
The CXL standard enables server disaggregation by interconnecting processors, accelerator devices, memory, and switching, with the protocol sitting on top of the PCIe physical layer. The NVM Express (NVMe) storage standard similarly uses PCIe.
“If you are on those platforms, you know you have a healthy roadmap; this technology has legs,” says Yanes.
A focus area for PCI-SIG is automotive, which accounts for the recent membership growth; the organisation now has 900 members. PCI-SIG has also created a new workgroup addressing automotive.
Yanes attributes the automotive industry’s interest in PCIe to the need for bandwidth and real-time analysis within cars. Advanced driver assistance systems, for example, use a variety of sensors and technologies such as AI.
PCIe 6.0
The PCIe bus uses a dual simplex scheme – serial transmissions in both directions – referred to as a lane. The bus can be configured in several lane configurations: x1, x2, x4, x8, x12, x16 and x32, although x2, x12 and x32 are rarely used.
PCIe 6.0’s 64GT/s per lane is double that of PCIe 5.0, which is already emerging in ICs and products.
IBM’s latest 7nm POWER10 16-core processor, for example, uses the PCIe 5.0 bus as part of its I/O, while the latest data processing units (DPUs) from Marvell (Octeon 10) and Nvidia (BlueField 3) also support PCIe 5.0.
To achieve the 64GT/s transfer rate, the PCIe bus has adopted 4-level pulse amplitude modulation (PAM-4) signalling. This requires forward error correction (FEC) to offset PAM-4’s higher bit error rate while minimising the impact on latency. And low latency is key, given that the PCIe PHY layer is used by protocols such as CXL that carry coherency and memory traffic (see the IEEE Micro article).
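To illustrate why PAM-4 was chosen, the sketch below maps pairs of bits onto four amplitude levels. Carrying two bits per symbol means the 64GT/s rate needs only the same 32-gigabaud symbol rate as PCIe 5.0’s NRZ signalling; the cost is a smaller eye, which is why FEC is needed. The Gray-coded mapping shown is illustrative, not the exact coding defined in the specification.

```python
# Illustrative PAM-4 mapping: 2 bits -> 1 of 4 amplitude levels.
# Gray coding is shown as a typical choice; the exact levels and
# coding in the PCIe 6.0 specification may differ.
PAM4_LEVELS = {
    (0, 0): -3,
    (0, 1): -1,
    (1, 1): +1,
    (1, 0): +3,
}

def pam4_encode(bits):
    """Pair up a bit stream and map each pair to a PAM-4 level."""
    assert len(bits) % 2 == 0
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

print(pam4_encode([1, 0, 0, 1, 1, 1, 0, 0]))  # [3, -1, 1, -3]

# Two bits per symbol: 64 GT/s of data needs only a 32 GBaud symbol
# rate, the same symbol rate as PCIe 5.0's 32 GT/s NRZ (1 bit/symbol).
bit_rate_gtps = 64
bits_per_symbol = 2
print(bit_rate_gtps / bits_per_symbol)  # 32.0 GBaud
```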
The latest specification also adopts flow control unit (FLIT) encoding. Here, fixed 256-byte packets are sent: 236 bytes of data and 20 bytes of overhead, including the cyclic redundancy check (CRC).
Using fixed-length packets simplifies the encoding, says Yanes. Since the PCIe 3.0 specification, 128b/130b encoding has been used for clock recovery and data alignment. With FLIT’s fixed-sized packets, no encoding bits are needed. “They know where the data starts and where it ends,” says Yanes.
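A back-of-the-envelope sketch of what the fixed FLIT format means for usable bandwidth, using the 236-of-256-byte split described above and ignoring other link-level overheads:

```python
# FLIT efficiency sketch: fixed 256-byte flow control units,
# 236 bytes of which carry data (the remainder is overhead such as CRC).
FLIT_BYTES = 256
PAYLOAD_BYTES = 236

efficiency = PAYLOAD_BYTES / FLIT_BYTES
print(f"{efficiency:.1%}")          # ~92.2% of the raw rate carries data

# Applied to the x16 raw figure of 128 GB/s per direction:
raw_GBps_per_direction = 64 * 16 / 8
print(raw_GBps_per_direction * efficiency)  # ~118 GB/s of payload per direction
```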
Silicon designed for PCIe 6.0 will also be able to use FLITs with earlier standard PCIe transfer speeds.
Yanes says power-saving modes have been added with the release. Both ends of a link can agree to make lanes inactive when they are not being used.

Status and developments
IP blocks for PCIe 6.0 already exist, while demonstrations and technology validations will occur this year. The first products using PCIe 6.0 will appear in 2023.
Yanes expects PCIe 6.0 to be used first in servers with accelerators for AI and machine learning, and where 800 Gigabit Ethernet will be needed.
PCI-SIG is also working to develop new cabling for PCIe 5.0 and PCIe 6.0 for sectors such as automotive. This will aid the technology’s adoption, he says.
Meanwhile, work has begun on PCIe 7.0.
“I would be ecstatic if we can double the data rate to 128GT/s in two and a half years,” says Yanes. “We will be investigating that in the next couple of months.”
One challenge with the PCIe standard is that it borrows its underlying technology from telecom and datacom, yet the transfer rates it uses are higher than the equivalent rates in those industries.
So, while PCIe 6.0 has adopted 64GT/s, the equivalent rate used in telecom is only 56Gbps. The same will apply if PCI-SIG chooses 128GT/s as the next data rate, given that telecom uses 112Gbps.
Yanes notes, however, that telecom requires much greater reaches whereas PCIe runs on motherboards, albeit ones using advanced printed circuit board (PCB) materials.
800G MSA defines PSM8 while eyeing 400G’s progress

A key current issue regarding data centres is forecasting the uptake of 400-gigabit optics.
If a rapid uptake of 400-gigabit optics occurs, it will also benefit the transition to 800-gigabit modules. But if the uptake of 400-gigabit optics is slower, some hyperscalers could defer and wait for 800-gigabit pluggables instead.
So says Maxim Kuschnerov, a spokesperson for the 800G Pluggable MSA (multi-source agreement).
The 800G MSA has issued its first 800-gigabit pluggable specification.
Dubbed the PSM8, the design uses the same components as 400-gigabit optics, doubling capacity in the same QSFP-DD pluggable form factor.
“Four-hundred-gigabit modules hitting volume is crucially important because the 800-gigabit specification leverages 400-gigabit components,” says Kuschnerov. “The more 400-gigabit is delayed, it impacts everything that comes after.”
PSM8
The PSM8 is an eight-channel parallel single-mode (PSM) fibre design, each fibre carrying 100 gigabits of data.
The 100m-reach PSM8 version 1.0 specification was published in August, less than a year after the 800G MSA was announced.
The 800G Pluggable MSA is developing two other 800-gigabit specifications based on 200-gigabit electrical and optical lanes.
One is a 500m four-fibre 800-gigabit implementation, each fibre a 200-gigabit channel. This is an 800-gigabit equivalent of the existing 400-gigabit IEEE DR4 standard.
The second design is a single-fibre four-channel coarse wavelength-division multiplexing (CWDM) with a 2km reach, effectively an 800-gigabit CWDM4.
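All three designs reach the same 800-gigabit total from different lane counts and per-lane rates, as the simple tally below illustrates (the labels are shorthand, not official MSA names):

```python
# The three 800-gigabit designs described above, expressed as
# lanes x per-lane rate. Labels are shorthand, not official names.
designs = {
    "PSM8 (8 parallel single-mode fibres)": (8, 100),
    "DR4-style (4 fibres)":                 (4, 200),
    "CWDM4-style (1 fibre, 4 wavelengths)": (4, 200),
}

for name, (lanes, gbps_per_lane) in designs.items():
    print(f"{name}: {lanes} x {gbps_per_lane}G = {lanes * gbps_per_lane}G")
# Each variant totals 800 gigabits; they differ in fibre count,
# per-lane rate and reach (100m, 500m and 2km respectively).
```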
Specifications
The 800G MSA chose to tackle a parallel single-mode fibre design because the components needed already exist. Meanwhile, a competing initiative, the IEEE’s 100-gigabit-per-lane multi-mode fibre approach, will have a shorter reach.
“The IEEE has an activity for 100-gigabit per lane for multi-mode but the reach is 50m,” says Kuschnerov. “How much market will you get with a limited-reach objective?”
In contrast, the 100m reach of the PSM8 better serves applications in the data centre and offers a path for single-mode fibre which, long-term, will provide general data centre connectivity, argues Kuschnerov, whether parallel fibre or a CWDM approach.
Investment will also be needed to advance multi-mode optics to achieve 100 gigabits whereas PSM8 will use 50 gigabaud optics already used by 400-gigabit modules.
Kuschnerov stresses that the PSM8 is not a repackaging of two IEEE 400-gigabit DR4 designs. The PSM8 uses more relaxed specifications to reduce cost, a possibility given PSM8’s 100m reach compared to the DR4’s 500m.
“We have relaxed various specifications to enable more choice,” says Kuschnerov. For example, externally modulated lasers (EMLs), directly modulated lasers (DMLs) and silicon photonics-based designs can all be used.
The transmitter power has also been reduced by 2.5dB compared to the DR4, while the extinction ratio of the modulator is 1.5dB less.
The need for an 800-gigabit module in a QSFP-DD form factor is to serve emerging 25.6-terabit Ethernet switches. Using 400-gigabit optics, a 2-rack-unit-high (2RU) switch is needed, whereas a 1RU switch platform is possible using 800-gigabit pluggables.
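The port-count arithmetic behind that claim is straightforward. The sketch below assumes roughly 32 pluggable cages fit on a 1RU front panel, a typical figure rather than one taken from the MSA:

```python
# Port-count arithmetic for a 25.6-terabit switch ASIC.
# Assumption: roughly 32 QSFP-DD cages fit on a 1RU front panel,
# a typical figure rather than one from the MSA itself.
SWITCH_CAPACITY_GBPS = 25_600
CAGES_PER_RU = 32

for module_gbps in (400, 800):
    ports = SWITCH_CAPACITY_GBPS // module_gbps
    rack_units = -(-ports // CAGES_PER_RU)   # ceiling division
    print(f"{module_gbps}G modules: {ports} ports -> {rack_units}RU front panel")

# 400G modules: 64 ports -> 2RU front panel
# 800G modules: 32 ports -> 1RU front panel
```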
“The big data centre players all have different plans and their own roadmaps,” says Kuschnerov. “From our observation of the industry, the upgrading speed for 400 gigabit and 800 gigabit is slower than what was expected a year ago.”
First samples of the PSM8 module are expected in the second half of 2021 with volume production in 2023.
800-gigabit PSM4 and CWDM4
The members of the MSA have already undertaken pre-development work on the two other specifications that use 200-gigabit-per-lane optics: the 800-gigabit PSM4 and the CWDM4.
“It was a lot of work discussing the feasibility of 200-gigabit-per-lane,” says Kuschnerov. There is much experimental work to be done regarding the choice of modulation format and forward error correction (FEC) scheme, which will need to be incorporated in future 4-level pulse-amplitude modulation (PAM-4) digital signal processors.
“We are progressing; the key is low power and low latency, which is crucial here,” says Kuschnerov. A trade-off will be needed in the chosen FEC scheme, ensuring sufficient coding gain while minimising its contribution to the overall latency.
As for the modulation scheme, while different PAM schemes are possible, PAM-4 already looks like the front runner, says Kuschnerov.

The 800G Pluggable MSA is at the proof-of-concept stage, having demonstrated working 200-gigabit-per-lane optics at the recent CIOE show held in Shenzhen, China. “Some of the components used are not just prototypes but are designed for this use case, although we are not there yet with an end-to-end product,” says Kuschnerov.
The designs will require 200-gigabit electrical and optical lanes. The OIF has just started work on 200-gigabit electrical interfaces, work that will likely only be completed in 2025. Achieving the required power consumption will also be a challenge.
Catalyst
Since the 800G Pluggable MSA embraced 200-gigabit-per-lane technology just over a year ago, other initiatives have adopted the rate.
The IEEE has started its ‘Beyond 400G’ initiative to define the next Ethernet specification, with both 800-gigabit and 1.6-terabit optics under consideration. So has the OIF, with its work on a next-generation 224-gigabit electrical interface.
“These activities will enable a 200-gigabit ecosystem,” says Kuschnerov. “Our focus is on 800-gigabit but it is having a much wider impact beyond 4×200-gigabit; it is impacting 1.6 terabits and impacting serdes (serialisers/deserialisers).”
The 800G Pluggable MSA is doing its small part but what is needed is the development of an end-to-end 200-gigabit ecosystem, he says: “This is a challenging undertaking.”
The 800G Pluggable MSA now has 40 members including hyperscalers, switch makers, systems vendors, and component and module makers.
The CWDM8 MSA avoids PAM-4 to fast-track 400G
Another multi-source agreement (MSA) group has been created to speed up the market introduction of 400-gigabit client-side optical interfaces.
The CWDM8 MSA is described by its founding members as a pragmatic approach to provide 400-gigabit modules in time for the emergence of next-generation switches next year. The CWDM8 MSA was announced at the ECOC show held in Gothenburg last week.
The eight-wavelength coarse wavelength-division multiplexing (CWDM) MSA is being promoted as a low-cost alternative to the IEEE 802.3bs 400 Gigabit Ethernet Task Force’s 400-gigabit eight-wavelength specifications, and as less risky than the newly launched 100G Lambda MSA specifications based on four 100-gigabit wavelengths for 400 gigabits.
“The 100G Lambda has merits and we are also part of that MSA,” says Robert Blum, director of strategic marketing and business development at Intel’s silicon photonics product division. “We just feel the time to get to 100-gigabit-per-lambda is really when you get to 800 Gigabit Ethernet.”
Intel is one of the 11 founding companies of the CWDM8 MSA.
Specification
The CWDM8 MSA will develop specifications for 2km and 10km links. The MSA uses wavelengths spaced 20nm apart. As a result, unlike the IEEE’s 400GBASE-FR8 and 400GBASE-LR8, which use the tightly spaced LAN-WDM wavelength scheme, no temperature control of the lasers is required. “It is just like the CWDM4 but you add four more wavelengths,” says Blum.
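As an illustration of such a grid, the snippet below generates eight wavelengths at 20nm spacing starting from the CWDM4 band. Starting at 1271nm is an assumption made for illustration rather than a value quoted from the MSA text:

```python
# Illustrative CWDM8 grid: eight channels at 20nm spacing.
# Starting the grid at 1271nm (the CWDM4 band) is an assumption
# for illustration, not a figure quoted from the MSA document.
START_NM = 1271
SPACING_NM = 20

cwdm8_grid = [START_NM + i * SPACING_NM for i in range(8)]
print(cwdm8_grid)
# [1271, 1291, 1311, 1331, 1351, 1371, 1391, 1411]
# The first four match CWDM4; the wide 20nm spacing is what lets
# the lasers run uncooled, unlike the tighter LAN-WDM grid.
```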
The CWDM8 MSA also differs from the IEEE specifications and the 100G Lambda MSA in that it does not use 4-level pulse-amplitude modulation (PAM-4). Instead, 50-gigabit non-return-to-zero (NRZ) signalling is used for each of the eight wavelengths.
The MSA will use the standard CDAUI-8 8x50-gigabit PAM-4 electrical interface. Accordingly, a retimer chip will be required inside the module to translate the incoming 50-gigabit PAM-4 electrical signal on each lane to 50-gigabit NRZ. According to Intel, several companies are developing such a chip.
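The retimer’s job is a change of line code at the same per-lane bit rate, which the arithmetic below sketches: PAM-4 carries two bits per symbol on the electrical side, NRZ one bit per symbol on the optical side.

```python
# Same 50-gigabit payload per lane, different line codes on either
# side of the module's retimer.
BIT_RATE_GBPS = 50

electrical_bits_per_symbol = 2   # PAM-4 on the CDAUI-8 host interface
optical_bits_per_symbol = 1      # NRZ on each CWDM8 wavelength

print(BIT_RATE_GBPS / electrical_bits_per_symbol)  # 25.0 GBaud electrical
print(BIT_RATE_GBPS / optical_bits_per_symbol)     # 50.0 GBaud optical
# The retimer recovers the 25 GBaud PAM-4 stream and re-transmits it
# as 50 GBaud NRZ, so no optical PAM-4 DSP is needed in the module.
```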
Benefits
Customers are telling Intel that they need 400-gigabit duplex-fibre optical modules early next year and that they want to have them in production by the end of 2018.
“When we looked at what is available and how to do an optical interface, there was no good solution that would allow us to meet those timelines, fit the power budget of the QSFP-DD [module] and be at the cost points required for data centre deployment,” says Blum.
An 8x50-gigabit NRZ approach is seen as a pragmatic solution to meet these requirements.
No PAM-4 physical layer DSP chip is needed since NRZ is used. The link budget is significantly better compared to using PAM-4 modulation. And there is a time-to-market advantage since the technologies used for the CWDM8 are already proven.
This is not the case for the emerging 100-gigabit-per-wavelength MSA that uses 50-gigabaud PAM-4. “PAM-4 makes a lot of sense on the electrical side, a low-bandwidth [25 gigabaud], high signal-to-noise ratio link, but it is not the ideal when you have high bandwidth on the optical components [50 gigabaud] and you have a lot of noise,” says Blum.
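Blum’s noise argument can be put into rough numbers: with four levels instead of two, each PAM-4 eye has one third of the NRZ amplitude, a penalty of roughly 9.5dB before any coding gain is counted. A minimal sketch:

```python
import math

# Eye-amplitude penalty of PAM-4 relative to NRZ at the same
# peak-to-peak swing: 4 levels leave 3 eyes, each 1/3 the height.
nrz_levels, pam4_levels = 2, 4

eye_ratio = (nrz_levels - 1) / (pam4_levels - 1)   # 1/3
penalty_db = 20 * math.log10(1 / eye_ratio)
print(f"{penalty_db:.1f} dB")  # ~9.5 dB smaller eye amplitude
# This is why PAM-4 optics lean on FEC and DSP, and why the CWDM8 MSA
# preferred 50-gigabit NRZ for a near-term 400-gigabit module.
```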
One hundred gigabits per wavelength will be needed for the optical path, says Blum, but for 800 Gigabit Ethernet with its eight electrical channels and eight optical ones. “We just think it [100-gigabit PAM4] is going to take longer than some people believe.” Meanwhile, the CWDM8 is the best approach to meet market demand for 400-gigabit duplex interfaces to support the next-generation data centre switches expected next year, says Blum.
The founding members of the CWDM8 MSA include chip and optical component players as well as switch system makers. Unlike the 100G Lambda MSA, the CWDM8 MSA counts no large-scale data centre operators among its members.
The members are Accton, Barefoot Networks, Credo Semiconductor, Hisense, Innovium, Intel, MACOM, Mellanox, Neophotonics and Rockley Photonics.


