The APC’s blueprint for silicon photonics

The Advanced Photonics Coalition (APC) wants to smooth the path for silicon photonics to become a high-volume manufacturing technology.
The organisation is talking to companies to identify common issues whose solutions will benefit silicon photonics as a whole.
The Advanced Photonics Coalition wants to act as an industry catalyst to prove technologies and reduce the risk associated with their development, says Jeffery Maki, Distinguished Engineer at Juniper Networks and a member of the Advanced Photonics Coalition’s board.
Origins
The Advanced Photonics Coalition was unveiled at the Photonic-Enabled Cloud Computing (PECC) Industry Summit jointly held with Optica last October.
The Coalition was formerly known as the Consortium for On-Board Optics (COBO), an industry initiative led by Microsoft.
Microsoft wanted a standard for on-board optics, which until then had been a proprietary technology. At the time, on-board optics was seen as an important stepping stone between pluggable optical modules and their ultimate successor, co-packaged optics.
After years of work developing specifications and products, Microsoft chose not to adopt on-board optics in its data centres. Although COBO added other work activities, such as co-packaged optics, the organisation lost momentum and members.
Maki stresses that COBO always intended to tackle other work besides its on-board optics starting point.
Now, this is the Advanced Photonics Coalition’s goal: to have a broad remit to create working groups to address a range of issues.
Tackling technologies
Many standards organisations publish specifications but leave the implementation technologies to their member companies. In contrast, the Advanced Photonics Coalition is taking a technology focus. It wants to remove hurdles associated with silicon photonics to ease its adoption.
“Today, we see the artificial intelligence and machine learning opportunities growing, both in software and hardware,” says Maki. “We see a need in the coming years for more hardware and innovative solutions, especially in power, latency, and interconnects.”
Work Groups
In the past, systems vendors like Cisco or Juniper drove industry initiatives, and other companies fell in line. More recently, it was the hyperscalers that took on the role.
There is less of that now, says Maki: “We have a lot of companies with technologies and good ideas, but there is not a strong leadership.”
The Advanced Photonics Coalition wants to fill that void and address companies’ common concerns in critical areas. “Key customers will then see the value of, and be able to access, that standard or technology that’s then fostered,” says Maki.
The Advanced Photonics Coalition has yet to announce new working groups but it expects to do so in 2024.
One area of interest is silicon photonics foundries and their process design kits (PDKs). Each foundry has a PDK, made up of tools, models, and documentation, to help engineers with the design and manufacture of photonic integrated devices.
“A starting point might be support for more than one foundry in a multi-foundry PDK,” says Maki. “Perhaps a menu item to select the desired foundry where more than one foundry has been verified to support.”
Silicon photonics has long been promoted as a high-volume manufacturing technology for the optical industry. “But it is not if it has been siloed into separate efforts such that there is not that common volume,” says Maki.
Such a PDK effort would identify gaps that each foundry would need to fill. “The point is to provide for more than one foundry to be able to produce the item,” he says.
A company is also talking to the Advanced Photonics Coalition about co-packaged optics. The company has developed an advanced co-packaged optics solution, but it is proprietary.
“Even with a proprietary offering, one can make changes to improve market acceptance,” says Maki. The aim is to identify the areas of greatest contention and remedy them first, for example, the external laser source. “Opening that up to other suppliers through standards adoption, existing or new, is one possibility,” he says.
The Advanced Photonics Coalition is also exploring optical interconnecting definitions with companies. “How we do fibre-attached to silicon photonics, there’s a desire that there is standardisation to open up the market more,” says Maki. “That’s more surgical but still valuable.”
And there are discussions about a working group to address co-packaged optics for the radio access network (RAN). Ericsson is one company interested in co-packaged optics for the RAN. Another working group being discussed could tackle optical backplanes.
Maki says there are opportunities here to benefit the industry.
“Companies should understand that nothing is slowing them down or blocking them from doing something other than their ingenuity or their own time,” he says.
Status
COBO had 50 members earlier in 2023. Now, the membership listed on the website has dropped to 39, and the number could dip further; companies that joined for COBO may still decide to leave.
At the time of writing, a new, as-yet-unannounced member has joined the Advanced Photonics Coalition, taking the membership to 40.
“Some of those companies that left, we think they will return once we get the working groups formed,” says Maki, who remains confident that the organisation will play an important industry role.
“Every time I have a conversation with a company about the status of the market and the needs that they see for the coming years, there’s good alignment amongst multiple companies,” he says.
There is an opportunity for an organisation to focus on the implementation aspects and the various technology platforms and bring more harmony to them, something other standards organisations don’t do, says Maki.
The various paths to co-packaged optics

Near package optics has emerged as companies have encountered the complexities of co-packaged optics. It should not be viewed as an alternative to co-packaged optics but rather a pragmatic approach for its implementation.
Co-packaged optics will be one of several hot topics at the upcoming OFC show in March.
Placing optics next to silicon is seen as the only way to meet the future input-output (I/O) requirements of ICs such as Ethernet switches and high-end processors.
For now, pluggable optics do the job of routing traffic between Ethernet switch chips in the data centre. The pluggable modules sit on the switch platform’s front panel at the edge of the printed circuit board (PCB) hosting the switch chip.
But with switch silicon capacity doubling every two years, engineers are being challenged to get data into and out of the chip while ensuring power consumption does not rise.
One way to boost I/O and reduce power is to use on-board optics, bringing the optics onto the PCB nearer the switch chip to shorten the electrical traces linking the two.
The Consortium of On-Board Optics (COBO), set up in 2015, has developed specifications to ensure interoperability between on-board optics products from different vendors.
However, the industry has favoured a shorter still link distance, coupling the optics and ASIC in one package. Such co-packaging is tricky, which explains why yet another approach has emerged: near package optics.
I/O bottleneck
“Everyone is looking for tighter and tighter integration between a switch ASIC, or ‘XPU’ chip, and the optics,” says Brad Booth, president at COBO and principal engineer, Azure hardware architecture at Microsoft. XPU is the generic term for an IC such as a CPU, a graphics processing unit (GPU) or even a data processing unit (DPU).
What kick-started interest in co-packaged optics was the desire to reduce power consumption and cost, says Booth. These remain important considerations but the biggest concern is getting sufficient bandwidth on and off these chips.
“The volume of high-speed signalling is constrained by the beachfront available to us,” he says.
Booth cites the example of a 16-lane PCI Express bus that requires 64 electrical traces for data alone, not including the power and ground signalling. “I can do that with two fibres,” says Booth.
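As a back-of-envelope check on that figure, the trace count is simply lanes multiplied by directions and by the two wires of each differential pair. The sketch below, in Python, assumes one differential pair per lane per direction for data; it illustrates the arithmetic rather than the full PCI Express pin-out.

```python
# Rough trace-count arithmetic behind the PCIe example (data signals only;
# assumes one differential pair per lane in each direction).
lanes = 16
directions = 2            # transmit and receive
wires_per_pair = 2        # positive and negative legs of a differential pair

data_traces = lanes * directions * wires_per_pair
print(data_traces)        # 64 electrical traces, versus two fibres optically
```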

Near package optics
With co-packaged optics, the switch chip is typically surrounded by 16 optical modules, all placed on an organic substrate (see diagram below).
“Another name for it is a multi-chip module,” says Nhat Nguyen, senior director, solutions architecture at optical I/O specialist, Ayar Labs.
A 25.6-terabit Ethernet switch chip requires sixteen 1.6-terabit-per-second (1.6Tbps) optical modules, while upcoming 51.2-terabit switch chips will use 3.2Tbps modules.
“The issue is that the multi-chip module can only be so large,” says Nguyen. “It is challenging with today’s technology to surround the 51.2-terabit ASIC with 16 optical modules.”
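The module counts follow directly from dividing the switch capacity by the per-module optical rate, as this minimal sketch shows using the figures quoted above:

```python
# Module-count arithmetic: switch capacity divided by per-module rate.
for switch_tbps, module_tbps in [(25.6, 1.6), (51.2, 3.2)]:
    modules = switch_tbps / module_tbps
    print(f"{switch_tbps} Tb/s switch -> {modules:.0f} x {module_tbps} Tb/s modules")
# Both generations need 16 modules around the ASIC, so the constraint becomes
# the substrate area available, not the module count.
```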

Near package optics tackles this by using a high-performance PCB substrate – an interposer – that sits on the host board, in contrast to co-packaged optics where the modules surround the chip on a multi-chip module substrate.
The near package optics interposer is more spacious, making the signal routing between the chip and optical modules easier while still meeting signal integrity requirements. Using the interposer also means the whole PCB doesn’t need upgrading, which would be extremely costly.
Some co-packaged optics designs will use components from multiple suppliers. One concern is how to service a failed optical engine when testing the design before deployment. “That is one reason why a connector-based solution is being proposed,” says Booth. “And that also impacts the size of the substrate.”
A larger substrate is also needed to support both electrical and optical interfaces from the switch chip.
Platforms will not become all-optical immediately and direct-attached copper cabling will continue to be used in the data centre. However, the issue with electrical signalling, as mentioned, is it needs more space than fibre.
“We are in a transitional phase: we are not 100 per cent optics, we are not 100 per cent electrical anymore,” says Booth. “How do you make that transition and still build these systems?”
Perspectives
Ayar Labs views near package optics as akin to COBO. “It’s an attempt to bring COBO much closer to the ASIC,” says Hugo Saleh, senior vice president of commercial operations and managing director of Ayar Labs U.K.
However, COBO’s president, Booth, stresses that near package optics is different from COBO’s on-board optics work.
“The big difference is that COBO uses a PCB motherboard to do the connection whereas near package optics uses a substrate,” he says. “It is closer than where COBO can go.”
It means that with near package optics, there is no high-speed data bandwidth going through the PCB.
Booth says near package optics came about once it became obvious that the latest 51.2-terabit designs – the silicon, optics and the interfaces between them – cannot fit on even the largest organic substrates.
“It was beyond the current manufacturing capabilities,” says Booth. “That was the feedback that came back to Microsoft and Facebook (Meta) as part of our Joint Development Foundation.”
Near package optics is thus a pragmatic solution to an engineering challenge, says Booth. The larger substrate remains a form of co-packaging but it has been given a distinct name to highlight that it is different to the early-version approach.
Nathan Tracy, TE Connectivity and the OIF’s vice president of marketing, admits he is frustrated that the industry is using two terms since co-packaged optics and near package optics achieve the same thing. “It’s just a slight difference in implementation,” says Tracy.
The OIF is an industry forum studying the applications and technology issues of co-packaging and this month published its framework Implementation Agreement (IA) document.
COBO is another organisation working on specifications for co-packaged optics, focussing on connectivity issues.

Technical differences
Ayar Labs highlights the power penalty of near package optics due to its longer channel lengths.
For near package optics, lengths between the ASIC and optics can be up to 150mm with the channel loss constrained to 13dB. This is why the OIF is developing the XSR+ electrical interface, to expand the XSR’s reach for near package optics.
In contrast, co-packaged optics confines the modules and host ASIC to within 50mm of each other. “The channel loss here is limited to 10dB,” says Nguyen. Co-packaged optics has a lower power consumption because of the shorter spans and the 3dB saving in channel loss.
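For readers less used to decibel budgets, the 3dB difference corresponds to roughly a factor of two in linear terms. The short conversion below uses only the loss figures quoted above; it illustrates the dB arithmetic, not a full link-budget analysis.

```python
# Convert the quoted channel-loss budgets from decibels to linear attenuation factors.
def db_to_linear(db):
    return 10 ** (db / 10)

print(round(db_to_linear(13)))  # ~20x attenuation budget (near package optics, 150mm)
print(round(db_to_linear(10)))  # ~10x attenuation budget (co-packaged optics, 50mm)
```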
Ayar Labs highlights its optical engine technology, the TeraPHY chiplet that combines silicon photonics and electronics in one die. The optical module surrounding the ASIC in a co-packaged design typically comprises three chips: the DSP, electrical interface and photonics.
“We can place the chiplet very close to the ASIC,” says Nguyen. The distance between the ASIC and the chiplet can be as close as 3-5mm. When the chiplet sits on the same interposer as the ASIC, Ayar Labs refers to such a design using a third term: in-package optics.
Ayar Labs says its chiplet can also be used for optical modules as part of a co-packaged design.
The very short distances using the chiplet result in a power efficiency of 5pJ/bit whereas that of an optical module is 15pJ/bit. Using TeraPHY for an optical module co-packaged design, the power efficiency is some 7.5pJ/bit, half that of a 3-chip module.
A 3-5mm distance also reduces the latency, while the bandwidth density of the chiplet, measured in gigabits-per-second per millimetre, is higher than that of the optical module.
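Those energy-per-bit figures translate directly into watts once an aggregate bandwidth is assumed. The sketch below uses 51.2Tbps, matching the switch generation discussed above; the efficiency figures are the ones quoted by Ayar Labs.

```python
# Optical-I/O power at an assumed 51.2 Tb/s aggregate bandwidth.
bits_per_second = 51.2e12

for label, pj_per_bit in [("in-package chiplet", 5.0),
                          ("chiplet-based co-packaged module", 7.5),
                          ("three-chip optical module", 15.0)]:
    watts = bits_per_second * pj_per_bit * 1e-12
    print(f"{label}: {watts:.0f} W")
# 5 pJ/bit -> 256 W, 7.5 pJ/bit -> 384 W, 15 pJ/bit -> 768 W
```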
Co-existence
Booth refers to near package optics as ‘CPO Gen-1’, the first generation of co-packaged optics.
“In essence, you have got to use technologies you have in hand to be able to build something,” says Booth. “Especially in the timeline that we want to demonstrate the technology.”
Is Microsoft backing near package optics?

“We are definitely saying yes if this is what it takes to get the first level of specifications developed,” says Booth.
But that does not mean the first products will be exclusively near package optics.
“Both will be available and around the same time,” says Booth. “There will be near package optics solutions that will be multi-vendor, and there will be more vertically-integrated designs, like Broadcom, Intel and others can do.”
From an end-user perspective, a multi-vendor capability is desirable, says Booth.
Ayar Labs’ Saleh sees two developing paths.
The first is optical I/O to connect chips in a mesh or as part of memory semantic designs used for high-performance computing and machine learning. Here, the highest bandwidth and lowest power are key design goals.
Ayar Labs has just announced a strategic partnership with high performance computing leader, HPE, to design future silicon photonics solutions for HPE’s Slingshot interconnect that is used for upcoming Exascale supercomputers and also in the data centre.
The second path concerns Ethernet switch chips and here Saleh expects both solutions to co-exist: near package optics will be an interim solution with co-packaged optics dominating longer term. “This will move more slowly as there needs to be interoperability and a wide set of suppliers,” says Saleh.
Booth expects continual design improvements to co-packaged optics. Further out, he expects 2.5D and 3D chip packaging techniques, where silicon is stacked vertically, to be used as part of co-packaged optics designs.
Silicon photonics webinar

Daryl Inniss and I assess how the technology and marketplace has changed since we published our silicon photonics book at the end of 2016. Click here to view the webinar. Ours is the first of a series of webinars that COBO, the Consortium of On-Board Optics, is hosting.
Intel combines optics with its Tofino 2 switch chip

Part 1: Co-packaged Ethernet switch
The advent of co-packaged optics has moved a step closer with Intel’s demonstration of a 12.8-terabit Ethernet switch chip with optical input-output (I/O).
The design couples a Barefoot Tofino 2 switch chip to up to 16 optical ‘tiles’ – each tile, a 1.6-terabit silicon photonics die – for a total I/O of 25.6 terabits.
“It’s an easy upgrade to add our next-generation 25.6-terabit [switch chip] which is coming shortly,” says Ed Doe, Intel’s vice president, connectivity group, general manager, Barefoot division.
Intel acquired switch-chip maker Barefoot seven months ago, after which it started the co-packaged optics project.
Intel also revealed that it is in the process of qualifying four new optical transceivers – a 400Gbase-DR4, a 200-gigabit FR4, a 100-gigabit FR1 and a 100Gbase-LR4 – to add to its portfolio of 100-gigabit PSM4 and CWDM4 modules.
Urgency
Intel had planned to showcase the working co-packaged switch at the OFC conference and exhibition, held last week in San Diego. But after withdrawing from the show due to the Coronavirus outbreak, Intel has continued to demonstrate the working co-packaged switch at its offices in Santa Clara.

“We have some visionaries of the industry coming through and being very excited, making comments like: ‘This is an important milestone’,” says Hong Hou, corporate vice president, general manager, silicon photonics product division at Intel.
“There are a lot of doubts still [about co-packaged optics], in the reliability, the serviceability, time-to-market, and the right intercept point [when it will be needed]: is it 25-, 51- or 102-terabit switch chips?” says Hou. “But no one says this is not going to happen.”
If the timing for co-packaged optics remains uncertain, why the urgency?
“There has been a lot of doubters as to whether it is possible,” says Doe. “We had to show that this was feasible and more than just a demo.”
Intel has also been accumulating IP from its co-packaging work. Topics include the development of a silicon-photonics ring modulator, ensuring optical stability and signal integrity, 3D packaging, and passive optical alignment. Intel has also developed a fault-tolerant design that adds a spare laser to each tile to ensure continued working should the first laser fail.
“We can diagnose which laser is the source of the problem, and we have a redundant laser for each channel,” says Hou. “So instead of 16 lasers we have 32 functional lasers but, at any one time, only half are used.”
Co-packaged optics
Ethernet switches connected in the data centre currently use pluggable optics. The switch chip resides on a printed circuit board (PCB) and is interfaced to the pluggable modules via electrical traces.
But given that the capacity of Ethernet switch ICs is doubling every two years, the power consumption of the I/O continues to rise, yet the power delivered to a data centre is limited. Accordingly, solutions are required that double the switch speed without increasing the power consumed.
One option is embedded optics, such as the COBO initiative. Here, optics are moved from the switch’s faceplate onto the PCB, closer to the switch chip. This shortens the electrical traces while overcoming the limit imposed by the number of pluggable modules that can fit on the switch’s faceplate. Freeing up the faceplate by removing pluggables also improves airflow to cool the switch.
The second, more ambitious approach is co-packaged optics where optics are combined with the switch ASIC in the one package.
Co-packaged optics can increase the overall I/O on and off the switch chip, something that embedded optics doesn’t address. And by placing the optics next to the ASIC, the drive requirements of the high-speed serialiser-deserialisers (serdes) are simplified.
Meanwhile, pluggable optics continue to advance in the form factors used and their speeds as well as developments such as fly-over cables that lower the loss connecting the switch IC to the front-panel pluggables.
Yet certain hyperscalers are not convinced about co-packaged optics.
Microsoft and Facebook announced last year the formation of the Co-Packaged Optics (CPO) Collaboration to help guide the industry to develop the elements needed for packaging optics. But Google and Alibaba said at OFC that they prefer the flexibility and ease of maintenance of pluggables.
Data centre trends
The data centre is a key market for Intel which sells high-end server microprocessors, switch ICs, FPGAs and optical transceivers.
Large-scale data centres deploy 100,000 servers, 50,000 switches and over one million optical modules. And a million pluggable modules equate to $150M to $250M of potential revenue, says Intel.

“One item that is understated is the [2:1] ratio of servers to switches,” says Doe. “We have seen a trend in recent years where the layers of switching in data centres have increased significantly.”
One reason for more switching layers is that traffic over-subscription is no longer used. With top-of-rack switches, a 3:1 over-subscription was common, which limited the uplink bandwidth the switch needed.
However, the changing nature of the computational workloads now requires that any server can talk to any other server.
“You can’t afford to have any over-subscription at any layer in the network,” says Doe. “As a result, you need to have a lot more bandwidth: an equal amount of downlink bandwidth to uplink bandwidth.”
Another factor that has increased the data centre’s switch layer count is the replacement of chassis switches with disaggregated pizza boxes. Typically, a chassis switch encompasses three layers of switching.
“Disaggregation is a factor but the big one is the 1:1 [uplink-downlink bandwidth] ratio, not just at the top-of-rack switch but all the way through,” says Doe. “They [the hyperscalers] want to have uniform bandwidth throughout the entire data centre.”
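To see how the over-subscription ratio feeds switching bandwidth, consider a hypothetical top-of-rack switch; the server count and link speed below are illustrative assumptions, not figures from Broadcom.

```python
# Illustrative top-of-rack arithmetic: a hypothetical rack of 48 servers,
# each with a 50-gigabit link. The over-subscription ratio sets the uplink
# bandwidth the switch must carry towards the layers above.
servers = 48
server_link_gbps = 50
downlink_gbps = servers * server_link_gbps   # 2,400 Gb/s facing the servers

for ratio in (3, 1):                          # 3:1 over-subscribed vs 1:1
    uplink_gbps = downlink_gbps / ratio
    print(f"{ratio}:1 over-subscription -> {uplink_gbps:.0f} Gb/s of uplinks")
# Moving from 3:1 to 1:1 triples the uplink bandwidth, and that bandwidth must
# be matched at every switching layer above the rack.
```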
Tofino switch IC
Barefoot has two families of Tofino chips. The first-generation Tofino devices have a switching capacity ranging from 1.2 to 6.4 terabits and are implemented using a 16nm CMOS process. The Tofino 2 devices, implemented using a 7nm CMOS IC, range from 4 terabits to 12.8 terabits.
“What we have coming soon is the Tofino next-generation, which will go to both 25 terabits and 51 terabits,” says Doe.
Intel is not discussing future products but Doe hints that both switch ICs will be announced jointly rather than the typical two-year delay between successive generations of switch IC. This also explains the urgency of the company’s co-packaging work.
The 12.8-terabit Tofino 2 chip comprises the switch core dies and four electrical I/O tiles that house the device’s serdes.
“The benefit of the tile design is that it allows us to easily swap the tiles for higher-speed serdes – 112 gigabit-per-second (Gbps) – once they become available,” says Doe. And switching the tiles to optical was already envisaged by Barefoot.
Optical tile
Intel’s 1.6-terabit silicon-photonics tile includes two integrated lasers (active and spare), a ring modulator, an integrated modulator driver, and receiver circuitry. “We also have on-chip a v-groove which allows for passive optical alignment,” says Hou.
Each tile implements the equivalent of four 400GBASE-DR4s. The 500m-reach DR4 comprises four 100-gigabit channels, each sent over single-mode fibre.
“This is a standards-based interface,” says Robert Blum, Intel’s director of strategic marketing and business development, as the switch chip must interact with standard-based optics.
The switch chip and the tiles sit on an interposer. Having an interposer will enable different tiles and different system-on-chips to be used in future.
Hou says that having the laser integrated with the tile saves power. This contrasts with designs where the laser is external to the co-packaged design.
The argument for using an external laser is that it is remote from the switch chip which runs hot. But Hou says that the switch chip itself has efficient thermal management which the tile and its laser(s) can exploit. Each tile consumes 35W, he says.
As for laser reliability, Intel points to its optical modules that it has been selling since 2016 when it started selling the PSM4.
Hou claims Intel’s hybrid laser design, where the gain chip is separated from the cavity, is far more reliable than a III-V facet cavity.
“We have shipped over three million 100-gigabit transceivers, primarily the PSM4. The DPM [defects per million] is 28-30, about two orders of magnitude less than our closest competitor,” says Hou. “Eight out of ten times the cause of the failure of a transceiver is the laser, and nine out of ten times, the laser failure is due to a cavity problem.”
The module’s higher reliability reduces the maintenance needed, and enables data centre operators to offer more stringent service-level agreements, says Hou.
Intel says it will adopt wavelength-division multiplexing (WDM) to enable a 3.2-terabit tile which will be needed with the 51.2-terabit Tofino.

Switch platform
Intel’s 2-rack-unit (2RU) switch platform is a hybrid design: interfaced to the Tofino 2 are four tiles as well as fly-over cables to connect the chip to the front-panel pluggables.
“The hyperscalers are most interested in co-packaging but when you talk to enterprise equipment manufacturers, their customers may not have a fabric as complicated as that of the hyperscalers,” says Hou. “Bringing pluggables in there allows for a transition.”
The interposer design uses vertical plug-in connectors, enabling a mix of optical and electrical interfaces. “It is pretty easy, at the last minute, to [decide to] bring in 10 optical [interfaces] and six fly-over cables [to connect] to the pluggables,” says Hou.
“This is not like on-board optics,” adds Blum. “This [connector arrangement] is part of the multi-chip package, it doesn’t go through the PCB. It allows us to have [OIF-specified] XSR serdes and get the power savings.”
Intel expects its co-packaged design to deliver a 30 per cent power saving as well as a 25 to 30 per cent cost saving. And now that it has a working platform, Hou expects more engagements with customers seeking these benefits and the design’s higher bandwidth density.
“This can stimulate more discussions and drive an ecosystem formation around this technology,” concludes Hou.
See Part 2: Ranovus outlines its co-packaged optics plans.
Lumentum completes sale of certain datacom lines to CIG
Brandon Collings, CTO of Lumentum, talks CIG, 400ZR and 400ZR+, COBO, co-packaged optics and why silicon photonics is not going to change the world.
Lumentum has completed the sale of part of its datacom product lines to design and manufacturing company, Cambridge Industries Group.

The sale will lower the company's quarterly revenues by between $20 million and $25 million. Lumentum also said that it will stop selling datacom transceivers within the next 12 to 18 months.
The move highlights how fierce competition and diminishing margins from the sale of client-side modules is causing optical component companies to rethink their strategies.
Lumentum’s focus is now to supply its photonic chips to the module makers, including CIG. “From a value-add point of view, there is a lot more value in selling those chips than the modules,” says Brandon Collings, CTO of Lumentum.
400ZR and ZR+
Lumentum will continue to design and sell line-side coherent optical modules, however.
“With coherent, there is a lot of complexity and challenge in the module’s design and manufacture,” says Collings. “We believe we can extract the value we need to continue in that business.”
The emerging 400ZR and 400ZR+ are examples of such challenging coherent interfaces.
The 400ZR specification, developed by the Optical Internetworking Forum (OIF), is a 400-gigabit coherent interface with an 80km reach. The 400-gigabit-per-second (Gbps) line rate will be achieved using a 64-gigabaud symbol rate and a 16-QAM modulation scheme.
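The headline numbers fit together as follows: dual-polarisation 16-QAM carries eight bits per symbol, so 64 gigabaud yields roughly 512Gbps on the line, with the capacity above the 400-gigabit payload absorbed by FEC and framing. The sketch below illustrates that arithmetic; the dual-polarisation assumption and the overhead interpretation are for illustration, not OIF figures.

```python
# Back-of-envelope 400ZR line-rate arithmetic (dual-polarisation 16-QAM assumed).
baud_rate_g = 64        # gigabaud, as quoted
bits_per_symbol = 4     # 16-QAM carries 4 bits per symbol
polarisations = 2       # dual-polarisation coherent transmission

raw_gbps = baud_rate_g * bits_per_symbol * polarisations
payload_gbps = 400
print(raw_gbps, raw_gbps - payload_gbps)  # 512 Gb/s raw, ~112 Gb/s for FEC and framing
```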
“[400ZR] is not client-side. Sixty-four gigabaud is very hard to do in such an extremely compact form factor.”
Module makers will implement the 400ZR interface using client-side pluggable modules such as the QSFP-DD and the OSFP to enable data centre operators to add coherent interfaces directly to their switches.
But implementing 400ZR will be a challenge. “This is not client-side,” says Collings. “Sixty-four gigabaud is very hard to do in such an extremely compact form factor.”
First samples of 400ZR modules are expected by year-end.
The 400ZR+ interface, while not a formal specification, is a catch-all term for 400-gigabit coherent interfaces that exceed the 400ZR specification. The 400ZR+ will be a multi-rate design supporting additional line rates of 300, 200 and 100Gbps. Such rates, coupled with more advanced forward-error correction (FEC) schemes, will enable the 400ZR+ to span much greater distances than 80km.
The 400ZR+ interface helps the developers of next-generation coherent DSP chips to recoup their investment by boosting the overall market their devices can address. “It is basically a way of saying I’m going to spend $50 million developing a coherent DSP, and the 400ZR market alone is not big enough for that investment,” says Collings.
Lumentum says there will be some additional functionality that will be possible to fit into a QSFP-DD such that at least one of the ZR+ modes will be supported. But given the QSFP-DD module’s compactness and power constraints, the ZR+ will also be implemented in the CFP2 form factor that has the headroom needed to fully exploit the coherent DSP’s capabilities to also address metro and regional networks.
400ZR+ modules are expected in volume by the end of 2020 or early 2021.
DSP economics
Lumentum will need to source a coherent DSP for its 400ZR/ ZR+ designs as it does not have its own coherent chip. At the recent OFC show held in San Diego, the talk was of new coherent DSP players entering the marketplace to take advantage of the 400ZR/ZR+ opportunity. Collings says he is aware of five DSP players but did not cite names.
NEL and Inphi are the two established suppliers of merchant coherent DSPs. Lumentum (Oclaro) has partnered with Acacia Communications to use its Meru DSP for Lumentum’s CFP2-DCO design, although it is questionable whether Acacia will license its DSP for 400ZR/ ZR+, at least initially.
“God forbid if 10 or more players are doing this as no matter how you slice it, people will be losing [money].”
Lumentum and Oclaro also partnered with Ciena to use its WaveLogic Ai for a long-haul module. That leaves room for at least one more provider of a coherent DSP that could be a new entrant or an established system vendor that will license an internal design.
Collings points out that it makes no sense economically to have more than five players. If it takes $50 million to tape out a 7nm CMOS coherent DSP, the five players will invest a total of $250 million. And if the investment cost for the module, photonics and everything else is a comparable amount, that equates to $500 million being spent on the 400-gigabit coherent generation.
As for the opportunity, Collings talks of a total of up to 500,000 ports a year by 2020. That equates to $1,000 of investment for every device sold in the first year. “God forbid if 10 or more players are doing this as no matter how you slice it, people will be losing [money].”
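Collings’ arithmetic can be reproduced in a few lines; all the figures below are his, as quoted above.

```python
# Collings' back-of-envelope economics for 400-gigabit coherent DSPs.
players = 5
dsp_tapeout_cost = 50e6                       # $50M per 7nm coherent DSP
dsp_investment = players * dsp_tapeout_cost   # $250M across the industry
module_and_photonics = 250e6                  # "a comparable amount"
total_investment = dsp_investment + module_and_photonics   # $500M in total

ports_per_year = 500_000
print(total_investment / ports_per_year)      # $1,000 of investment per port in year one
```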
Beyond Pluggables
The evolution of optics beyond pluggables was another topic under discussion at OFC.
The Consortium of On-Board Optics (COBO), the developer of an interoperable optical solution that embeds optics on the line card, had a stand at the show and a demonstration of its technology. In turn, co-packaged optics, the stage after COBO in the evolution of optical interfaces that will integrate the optics with the silicon in one package, is now also on companies' agendas.
Collings explains that COBO came about because the industry thought on-board optics would be needed, given the challenge of 400-gigabit pluggables meeting the interface density required for 12.8-terabit switches. “I shared that opinion four to five years ago,” he says, adding that Lumentum is a member of COBO.
“That problem is real. It is a matter of how far the current engineering can go before it becomes too painful.”
But 400-gigabit optics has been engineered to meet the required faceplate density, including ZR for coherent. As a result, COBO is less applicable. “That need to break the paradigm is a lot less,” he says.
That said, Collings says COBO has driven valuable industry discussion given that the data centre is heading in a direction where 32 ports of 800-gigabit interfaces will be needed to get data in and out of next-generation, 25-terabit switches.
“That problem is real,” says Collings. “It is a matter of how far the current engineering can go before it becomes too painful.” Scaling indefinitely what is done today is not an option, he says.
It is possible with the next generation of switch chip to simply use a two-rack-unit box with twice as many 400-gigabit modules. “That has already been done at the 100-gigabit generation that lasted longer because it doubled up the 100-gigabit port count,” he says.
“In the generation after that, you are now asking for stuff that looks very challenging with today’s technology,” he says. “And that is where co-packaging is focused, the 50-terabit switch generation.” Switches using such capacity silicon are expected in the next four years.
But this is where it gets tricky, as co-packaging not only presents significant technical challenges but also will change the supply chain and business models.
Collings points out that hyperscalers do not like making big pioneering investments in new technology; rather, they favour buying commodity hardware. “They don’t like risk, they love competition, and they like a healthy ecosystem,” he says.
“There is a lot of talk from the technology direction of how we can solve this problem [using co-packaged optics] but I think on the business side, the risk side, the investment side is putting a lot of pressure on that actually happening,” says Collings. “Where it ends up I don’t honestly know.”
Silicon photonics
One trend evident at OFC was the growing adoption of silicon photonics by optical component companies.
Indeed, the market research firm, LightCounting, in a research note summarising OFC 2019, sees silicon photonics as a must-have technology given co-packaged optics is now clearly on the industry’s roadmap.
However, Collings stresses that Lumentum’s perspective remains unchanged regarding the technology.
“It’s a fabless exercise so we can participate in silicon photonics and, quite frankly, that is why a lot of other companies are participating because the barrier to entry is quite low,” says Collings. “Nevertheless, we look at silicon photonics as another tool in the toolbox: it has advantages in some areas, some significant disadvantages in others, and in some places, it is simply comparable.”
When looking at a design from a system perspective, such as a module, other considerations come into play besides the cost of the silicon photonics chip itself. Collings cites the CFP2 coherent module. While the performance of its receiver is good using silicon photonics, the modulator is questionable. You also need a laser and a semiconductor optical amplifier to compensate for silicon photonics’ higher loss, he says.
The alternative is to use an indium phosphide-based design and that has its own design issues. “What we are finding when you look at the right level is that the two are the same or indium phosphide has the advantage,” says Collings. “And as we go faster, we are finding silicon is not really keeping up in bandwidth and performance.”
As a result, Lumentum is backing indium phosphide for coherent operating at 64 gigabaud.
“A lot of people are talking about silicon photonics because they can talk about it,” says Collings. “It’s not worthless, don’t get me wrong, but its success outside of Acacia has been niche, and Acacia is top notch at doing this stuff.”
Switch chips, not optics, set the pace in the data centre
Broadcom is doubling the capacity of its switch silicon every 18-24 months, a considerable achievement given that Moore’s law has slowed down.
Last December, Broadcom announced it was sampling its Tomahawk 3 - the industry’s first 12.8-terabit switch chip - just 14 months after it announced its 6.4-terabit Tomahawk 2.
Such product cycle times are proving beyond the optical module makers; if producing next-generation switch silicon takes up to two years, optics is taking three, says Broadcom.
“Right now, the problem with optics is that they are the laggards,” says Rochan Sankar, senior director of product marketing at switch IC maker, Broadcom. “The switching side is waiting for the optics to be deployable.”
The consequence, says Broadcom, is that in the three years spanning a particular optical module generation, customers have deployed two generations of switches. For example, the 3.2-terabit Tomahawk-based switches and the higher-capacity Tomahawk 2 ones both use QSFP28 and SFP28 modules.
In future, a closer alignment in the development cycles of the chip and the optics will be required, argues Broadcom.
Switch chips
Broadcom has three switch chip families, each addressing a particular market. As well as the Tomahawk, Broadcom has the Trident and Jericho families (see table).

All three chips are implemented using a 16nm CMOS process. Source: Broadcom/ Gazettabyte.
“You have enough variance in the requirements such that one architecture spanning them all is non-ideal,” says Sankar.
The Tomahawk is a streamlined architecture for use in large-scale data centres. The device is designed to maximise the switching capacity both in terms of bandwidth-per-dollar and bandwidth-per-Watt.
“The hyperscalers are looking for a minimalist feature set,” says Sankar. They consider the switching network as an underlay, a Layer 3 IP fabric, and they want the functionality required for a highly reliable interconnect for the compute and storage, and nothing more, he says.
Right now, the problem with optics is that they are the laggards
Production of the Tomahawk 3 integrated circuit (IC) is ramping and the device has already been delivered to several webscale players and switch makers, says Broadcom.
The second, Trident family addresses the enterprise and data centres. The chip includes features deliberately stripped from the Tomahawk 3, such as support for Layer 2 tunnelling and advanced policy to enforce enterprise network security. The Trident also has a programmable packet-processing pipeline deemed unnecessary in large-scale data centres.
But such features are at the expense of switching capacity. “The Trident tends to be one generation behind the Tomahawk in terms of capacity,” says Sankar. The latest Trident 3 is a 3.2-terabit device.
The third, Jericho family is for the carrier market. The chip includes a packet processor and traffic manager and comes with the accompanying switch fabric IC dubbed Ramon. The two devices can be scaled to create huge capacity IP router systems exceeding 200 terabits of capacity. “The chipset is used in many different parts of the service provider’s backbone and access networks,” says Sankar. The Jericho 2, announced earlier this year, has 10 terabits of capacity.
Trends
Broadcom highlights several trends driving the growing networking needs within the data centre.
One is how microprocessors used within servers continue to incorporate more CPU cores while flash storage is becoming disaggregated. “Now the storage is sitting some distance from the compute resource that needs very low access times,” says Sankar.
The growing popularity of the public cloud is also forcing data centre operators to seek greater server utilisation to ‘pack more tenants per rack’.
There are also applications such as deep learning that use other computing ICs such as graphics processor units (GPUs) and FPGAs. “These push very high bandwidths through the network and the application creates topologies where any element can talk to any element,” says Sankar. This requires a ‘flat’ networking architecture that uses the fewest networking hops to connect the communicating nodes.
Such developments are reflected in the growth in server links to the first level or top-of-rack (TOR) switches, links that have gone from 10 to 25 to 50 and 100 gigabits. “Now you have the first 200-gigabit network interface cards coming out this year,” says Sankar.
Broadcom has been able to deliver 12.8 terabits-per-second in 16nm, whereas some competitors are waiting for 7nm
Broadcom says the TOR switch is not the part of the data centre network experiencing greatest growth. Rather, it is the layers above - the leaf-and-spine switching layers - where bandwidth requirements are accelerating the most. This is because the radix - the switch’s inputs and outputs - is increasing with the use of equal-cost multi-path (ECMP) routing. ECMP is a forwarding technique to distribute the traffic over multiple paths of equal cost to a destination port. “The width of the ECMP can be 4-way, 8-way and 16-way,” says Sankar. “That determines the connectivity to the next layer up.”
It is such multi-layered leaf-spine architectures that the Tomahawk 3 switch silicon addresses.
Tomahawk 3
The Tomahawk 3 is implemented using a 16nm CMOS process and features 256 50-gigabit PAM-4 serialiser-deserialiser (serdes) interfaces to enable the 12.8-terabit throughput.
“Broadcom has been able to deliver 12.8 terabits-per-second in 16nm, whereas some competitors are waiting for 7nm,” says Bob Wheeler, vice president and principal analyst for networking at the Linley Group.
Sankar says Broadcom undertook significant engineering work to move from the 16nm Tomahawk 2’s 25-gigabit non-return-to-zero serdes to a 16nm-based 50G PAM-4 design. The resulting faster serdes design requires only marginally more die area while cutting the power per gigabit by 40 per cent.
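The headline switch capacity is simply the serdes count multiplied by the lane rate, as the minimal sketch below shows using the figures quoted above.

```python
# Switch capacity from the serdes figures quoted above.
serdes_count = 256
serdes_gbps = 50                           # 50-gigabit PAM-4 lanes
print(serdes_count * serdes_gbps / 1000)   # 12.8 Tb/s; 100G lanes would give 25.6 Tb/s
```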
The Tomahawk 3 also features a streamlined packet-processing pipeline and improved shared buffering. In the past, a switch chip could implement one packet-processing pipeline, says Wheeler. But at 12.8 terabit-per-second (Tbps), the aggregate packet rate exceeds the capacity of a single pipeline. “Broadcom implements multiple ingress and egress pipelines, each connected with multiple port blocks,” says Wheeler. The port blocks include MACs and serdes. “The hard part is connecting the pipelines to a shared buffer, and Broadcom doesn’t disclose details here.”
Source: Broadcom.
The chip also has telemetry support that exposes packet information to allow the data centre operators to see how their networks are performing.
Adopting a new generation of switch silicon also has system benefits.
One is reducing the number of hops between endpoints to achieve a lower latency. Broadcom cites how a 128x100 Gigabit Ethernet (GbE) platform based on a single Tomahawk 3 can replace six 64x100GbE switches in a two-tier arrangement. This reduces latency by 60 per cent, from 1 microsecond to 400 nanoseconds.
There are also system cost and power consumption benefits. Broadcom uses the example of Facebook’s Backpack modular switch platform. The 8 rack unit (RU) chassis uses two tiers of switches - 12 Tomahawk chips in total. Using the Tomahawk 3, the chassis can be replaced with a 1RU platform, reducing the power consumption by 75 percent and system cost by 85 percent.
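The six-to-one consolidation follows from non-blocking leaf-and-spine arithmetic. The sketch below infers a plausible leaf/spine split for illustration; the article only gives the switch totals.

```python
# Why six 64-port switches are needed to build a non-blocking 128-port fabric.
target_ports = 128
switch_radix = 64

leaf_down = switch_radix // 2                 # half the ports face servers...
leaf_up = switch_radix - leaf_down            # ...half face the spine (non-blocking)
leaves = target_ports // leaf_down            # 4 leaf switches
spines = (leaves * leaf_up) // switch_radix   # 2 spine switches
print(leaves + spines)   # 6 switches; traffic crosses 3 hops rather than 1
```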
Many in the industry have discussed the possibility of using the next 25.6-terabit generation of switch chip in early trials of in-package optics
Aligning timelines
Both the switch-chip vendors and the optical module players are challenged to keep up with the growing networking capacity demands of the data centre. The fact that next-generation optics takes about a year longer than the silicon is not new. It happened with the transition from 40-gigabit QSFP+ to 100-gigabit QSFP28 optical modules and now from the 100-gigabit QSFP28 to 200 gigabit QSFP56 and 400-gigabit QSFP-DD production.
“400-gigabit optical products are currently sampling in the industry in both OSFP and QSFP-DD form factors, but neither has achieved volume production,” says Sankar.
Broadcom is using 400-gigabit modules with its Tomahawk 3 in the lab, and customers are doing the same. However, the hyperscalers are not yet deploying Tomahawk 3-based data centre network designs using 400-gigabit optics. Rather, the switches are using existing QSFP28 interfaces or, in some cases, 200-gigabit optics. But 400-gigabit optics will follow.
The consequence of the disparity in the silicon and optics development cycles is that while the data centre players want to exploit the full capacity of the switch once it becomes available, they can’t. This means the data centre upgrades conducted - what Sankar calls ‘mid-life kickers’ - are costlier to implement. In addition, given that most cloud data centres are fibre-constrained, doubling the number of fibres to accommodate the silicon upgrade is physically prohibitive, says Broadcom.
“The operator can't upgrade the network any faster than the optics cadence, leading to a much higher overall total cost of ownership,” says Sankar. They must scale out to compensate for the inability to scale up the optics and the silicon simultaneously.
Optical I/O
Scaling the switch chip’s input-output (I/O) presents its own system challenges. “The switch-port density is becoming limited by the physical fanout a single chip can support,” says Sankar. “You can’t keep doubling pins.”
It will be increasingly challenging to increase the I/O to 512 or 1024 serdes in future switch chips while satisfying the system link budget, and to achieve both in a power-efficient manner. This is another reason why aligning the scaling of the optics and the serdes speeds with the switching element is desirable, says Broadcom.
Broadcom says electrical interfaces will certainly scale for its next-generation 25.6-terabit switch chip.
Linley Group’s Wheeler expects the 25.6-terabit switch will be achieved using 256 100-gigabit PAM4 serdes. “That serdes rate will enable 800 Gigabit Ethernet optical modules,” he says. “The OIF is standardising serdes via CEI-112G while the IEEE 802.3 has the 100/200/400G Electrical Interfaces Task Force running in parallel.”
But system designers already acknowledge that new ways to combine the switch silicon and optics are needed.
“One level of optimisation is the serdes interconnect between the switch chip and the optical module itself,” says Sankar, referring to bringing of optics on-board to shorten the electrical paths the serdes must drive. The Consortium of On-Board Optics (COBO) has specified just such an interoperable on-board optics solution.
“The stage after that is to integrate the optics with the IC in a single package,” says Sankar.
Broadcom is not saying which generation of switch chip capacity will require in-package optics. But given the IC roadmap of doubling switch capacity at least every two years, there is an urgency here, says Sankar.
The fact that there are few signs of in-package developments should not be mistaken for inactivity, he says: “People are being very quiet about it.”
Brad Booth, chair of COBO and principal network architect for Microsoft’s Azure Infrastructure, says COBO does not have a view as to when in-package optics will be needed.
Discussions are underway within the IEEE, OIF and COBO on what might be needed for in-package optics and when, says Booth: “One thing that many people do agree upon is that COBO is solving some of the technical problems that will benefit in-package optics such as optical connectivity inside the box.”
The move to in-package optics represents a considerable challenge for the industry.
“The transition and movement to in-package optics will require the industry to answer a lot of new questions that faceplate pluggable just doesn’t handle,” says Booth. “COBO will answer some of these, but in-package optics is not just a technical challenge, it will challenge the business-operating model.”
Booth says demonstrations of in-package optics can already be done with existing technologies. And given the rapid timelines of switch chip development, many in the industry have discussed the possibility of using the next 25.6-terabit generation of switch chip in early trials of in-package optics, he says.
There continues to be strong interest in white-box systems and strong signalling to the market to build white-box platforms
White boxes
While the dominant market for the Tomahawk family is the data centre, a recent development has been the use of the 3.2-terabit Tomahawk chip within open-source platforms such as the Telecom Infra Project’s (TIP) Voyager and Cassini packet optical platforms.
Ciena has also announced its own 8180 platform that supports 6.4 terabits of switching capacity, yet Ciena says the 8180 uses a Tomahawk 3, implying the platform will scale to 12.8Tbps.
Niall Robinson, vice president, global business development at ADVA, a member of TIP and the Voyager initiative, makes the point that since the bulk of the traffic remains within the data centre, the packet optical switch capacity and the switch silicon it uses need not be the latest generation IC.
“Eventually, the packet-optical boxes will migrate to these larger switching chips but with some considerable time lag compared to their introduction inside the data centre,” says Robinson.
The advent of 400-gigabit client-port optics will drive the move to higher-capacity platforms such as the Voyager because it is these larger chips that can support 400-gigabit ports. “Perhaps a Jericho 2 at 9.6-terabit is sufficient compared to a Tomahawk 3 at 12.8-terabit,” says Robinson.
Edgecore Networks, the originator of the Cassini platform, says it too is interested in the Tomahawk 3 for its Cassini platform.
“We have a Tomahawk 3 platform that is sampling now,” says Bill Burger, vice president, business development and marketing, North America at Edgecore Networks, referring to a 12.8Tbps open networking switch, contributed to the Open Compute Project (OCP), that supports 32 400-gigabit QSFP-DD modules.
Broadcom’s Sankar highlights the work of the OCP and TIP in promoting disaggregated hardware and software. The initiatives have created a forum for open specifications, increased the number of hardware players and therefore competition while reducing platform-development timescales.
“There continues to be strong interest in white-box systems and strong signalling to the market to build white-box platforms,” says Sankar.
The issue, however, is the lack of volume deployments to justify the investment made in disaggregated designs.
“The places in the industry where white boxes have taken off continues to be the hyperscalers, and a handful of hyperscalers at that,” says Sankar. “The industry has yet to take up disaggregated networking hardware at the rate that the apparent demand would suggest.”
Sankar is looking for the industry to narrow the choice of white-box solutions available and for the emergence of a consumption model for white boxes beyond just several hyperscalers.
400ZR will signal coherent’s entry into the datacom world
- 400ZR will have a reach of 80km and a target power consumption of 15W
- The coherent interface will be available as a pluggable module that will link data centre switches across sites
- Huawei expects first modules to be available in the first half of 2020
- At OFC, Huawei announced its own 250km 400-gigabit single-wavelength coherent solution that is already being shipped to customers
Coherent optics will finally cross over into datacom with the advent of the 400ZR interface. So claims Maxim Kuschnerov, senior R&D manager at Huawei.
400ZR is an interoperable 400-gigabit single-wavelength coherent interface being developed by the Optical Internetworking Forum (OIF).
The 400ZR will be available as a pluggable module and as on-board optics using the COBO specification. The IEEE is also considering a proposal to adopt the 400ZR specification, initially for the data-centre interconnect market. “Once coherent moves from the OIF to the IEEE, its impact in the marketplace will be multiplied,” says Kuschnerov.
But developing a 400ZR pluggable represents a significant challenge for the industry. “Such interoperable coherent 16-QAM modules won’t happen easily,” says Kuschnerov. “Just look at the efforts of the industry to have PAM-4 interoperability, it is a tremendous step up from on-off keying.”
Despite the challenges, 400ZR products are expected by the first half of 2020.
400ZR use cases
The web-scale players want to use the 400ZR coherent interface to link multiple smaller buildings, up to 80km apart, across a metropolitan area to create one large virtual data centre. This is a more practical solution than trying to find a large enough location that is affordable and can be fed sufficient power.
Once coherent moves from the OIF to the IEEE, its impact in the marketplace will be multiplied
Given how servers, switches and pluggables in the data centre are interoperable, the attraction of the 400ZR is obvious, says Kuschnerov: “It would be a major bottleneck if you didn't have [coherent interface] interoperability at this scale.”
Moreover, the advent of the 400ZR interface will signal the start of coherent in datacom. Higher-capacity interfaces are doubling every two years or so due to the webscale players, says Kuschnerov, and with the advent of 800-gigabit and 1.6-terabit interfaces, coherent will be used for ever-shorter distances, from 80km to 40km and even 10km.
At 10km, volumes will be an order of magnitude greater than similar-reach dense wavelength-division multiplexing (DWDM) interfaces for telecom. “Datacom is a totally different experience, and it won’t work if you don’t have a stable supply base,” he says. “We see the ZR as the first step combining coherent technology and the datacom mindset.”
Data centre players will plug 400ZR modules into their switch-router platforms, avoiding the need to interface the switch-router to a modular, scalable DWDM platform used to link data centres.
The 400ZR will also find use in telecom. One use case is backhauling residential traffic over a cable operator’s single spans that tend to be lossy. Here, ZR can be used at 200 gigabits - using 64 gigabaud signalling and QPSK modulation - to extend the reach over the high-loss spans. Similarly, the 400ZR can also be used for 5G mobile backhaul, aggregating multiple 25-gigabit streams.
Another application is for enterprise connectivity over distances greater than 10km. Here, the 400ZR will compete with direct-detect 40km ER4 interfaces.
Having several use cases, not just data-centre interconnect, is vital for the success of the 400ZR. “Extending ZR to access and metro-regional provides the required diversity needed to have more confidence in the business case,” says Kuschnerov.
The 400ZR will support 400 gigabits over a single wavelength with a reach of 80km, while the target power consumption is 15W.
The industry is still undecided as to which pluggable form factor to use for 400ZR. The two candidates are the QSFP-DD and the OSFP. The QSFP-DD provides backward compatibility with the QSFP+ and QSFP28, while the OSFP is a fresh design that is also larger. This simplifies the power management at the expense of module density; 32 OSFPs can fit on a 1-rack-unit faceplate compared to 36 QSFP-DD modules.
The choice of form factor reflects a broader industry debate concerning 400-gigabit interfaces. But 400ZR is a more challenging design than 400-gigabit client-side interfaces in terms of trying to cram optics and the coherent DSP within the two modules while meeting their power envelopes.
The OSFP is specified to support 15W while simulation results published at OFC 2018 suggest that the QSFP-DD will meet the 15W target. Meanwhile, the 15W power consumption will not be an issue for COBO on-board optics, given that the module sits on the line card and differs from pluggables in not being confined within a cage.
Kuschnerov says that even if it proves that only the OSFP of the two pluggables supports 400ZR, the interface will still be a success given that a pluggable module will exist that delivers the required face-plate density.
400G coherent
Huawei announced at OFC 2018 its own single-wavelength 400-gigabit coherent technology for use with its OptiX OSN 9800 optical and packet OTN platform, and it is already being supplied to customers.
The 400-gigabit design supports a variety of baud rates and modulation schemes. For a fixed-grid network, 34 gigabaud signalling enables 100 gigabits using QPSK, and 200 gigabits using 16-QAM, while at 45 gigabaud 200 gigabits using 8-QAM is possible. For flexible-grid networks, 64 gigabaud is used for 200-gigabit transmission using QPSK and 400 gigabits using 16-QAM.
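All of these combinations follow the same symbol-rate arithmetic: the net rate is bounded by baud rate multiplied by bits per symbol and by two polarisations, with the remainder absorbed by FEC and framing. The sketch below tabulates the quoted modes; the dual-polarisation assumption and the overhead interpretation are for illustration, not Huawei figures.

```python
# The quoted baud-rate/modulation combinations, with raw rates assuming
# dual-polarisation transmission; the gap to the net rate covers FEC and framing.
modes = [
    (34, "QPSK", 2, 100),
    (34, "16-QAM", 4, 200),
    (45, "8-QAM", 3, 200),
    (64, "QPSK", 2, 200),
    (64, "16-QAM", 4, 400),
]
for baud, name, bits_per_symbol, net_gbps in modes:
    raw_gbps = baud * bits_per_symbol * 2
    print(f"{baud} Gbaud {name}: {raw_gbps} Gb/s raw -> {net_gbps} Gb/s net")
```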
Huawei uses an algorithm called channel-matched shaping to improve optical performance in terms of data transmission and reach. This algorithm includes such techniques as pre-emphasis, faster-than-Nyquist, and Nyquist shaping. According to Kuschnerov, the goal is to squeeze as much capacity out of a network’s physical channel so that advanced coding techniques such as probabilistic constellation shaping can be used to the full. For Huawei’s first 400-gigabit wavelength solution, constellation shaping is not used but this will be added in its upcoming coherent designs.
Huawei has already demonstrated the transmission of 400 gigabits over 250km of fibre. “Current generation 400G-per-lambdas does not enable long-haul or regional transmission so the focus is on shorter reach metro or data-centre-interconnect environments,” says Kuschnerov.
When longer reaches are needed, Huawei can offer two line cards, each supporting 200 gigabits, or a single line card hosting two 200-gigabit modules. The 200 gigabits per wavelength is achieved using 64 gigabaud and QPSK modulation, giving a 2,500km reach.
Until now, such long-haul distances have been served using 100-gigabit wavelengths. Now, says Kuschnerov, 200 gigabits at 64 gigabaud is becoming the norm in many newly built networks, while the 34-gigabaud, 200-gigabit option is favoured in existing networks based on a 50GHz grid.
Rockley Photonics showcases its in-packaged design at OFC
The packaged design includes Rockley's own 2-billion-transistor layer-3 router chip and its silicon photonics-based optical transceivers. The layer-3 router chip, described as a terabit device, also includes the mixed-signal circuits needed for the optical transceivers' transmit and receive paths.
Source: Rockley Photonics (annotated by Gazettabyte).
Rockley says it is using 500m-reach PSM4 transceivers for the design and that while a dozen ribbon cables are shown, this does not mean there are 12 100-gigabit PSM4 transceivers. The company is not saying what the total optical input-output is.
Source: Rockley Photonics (annotated by Gazettabyte).
The company has said it is not looking to enter the marketplace as a switch-chip player competing with the likes of Broadcom, Intel, Cavium, Barefoot Networks and Innovium. Developing such a device and remaining competitive requires considerable investment, and that is not Rockley's focus. Instead, it is using its router chip as a demonstrator to show the marketplace what can be done and that the technology works.
When asked what progress Rockley is making showcasing its technology, its CEO Andrew Rickman said: “It is going very well but nothing we can say publicly.”
The switch-chip makers continue to use electrical interfaces for their state-of-the-art switches, which have a capacity of 12.8 terabits. It remains to be seen which generation of switch chip will finally adopt in-packaged optics, and whether on-board optics designs such as COBO will be adopted first.
For the full interview with CEO Andrew Rickman, click here.
New MSA to enable four-lambda 400-gigabit modules
A new 100-gigabit single-wavelength multi-source agreement (MSA) has been created to provide the industry with 2km and 10km interfaces: single-wavelength designs at 100 gigabits and four-wavelength designs at 400 gigabits.
The MSA is backed by 22 founding companies including Microsoft, Alibaba and Cisco Systems.
The initiative started work two months ago and a draft specification is expected before the year end.
“Twenty-two companies is a very large MSA at this stage, which shows the strong interest in this technology,” says Mark Nowell, distinguished engineer, data centre switching at Cisco Systems and co-chair of the 100G Lambda MSA. “It is clear this is going to be the workhorse technology for the industry for quite a while.”
Phased approach
The 100G Lambda MSA is a phased project. In the first phase, three single-mode fibre optical interfaces will be specified: a 100-gigabit 2km link (100G-FR), a 100-gigabit 10km link (100G-LR), and a 2km 400-gigabit coarse wavelength-division multiplexed (CWDM) design, known as the 400G-FR4. A 10km version of the 400-gigabit CWDM design (400G-LR4) will be developed in the second phase.
For the specifications, the MSA will use work already done by the IEEE that has defined two 100-gigabit-per-wavelength specifications. The IEEE 802.3bs 400 Gigabit Ethernet Task Force has defined a 400-gigabit parallel fibre interface over 500m, referred to as DR4 (400GBASE-DR4). The second, the work of the IEEE 802.3cd 50, 100 and 200 Gigabit Ethernet Task Force, defines the DR (100GBASE-DR), a 100-gigabit single lane specification for 500m.
“The data rate is known, the type of forward-error correction is the same, and we have a starting point with the DR specs - we know what their transmit levels and receive levels are,” says Nowell. The new MSA will need to contend with the extra signal loss to extend the link distances to 2km and 10km.
With the 2km 400G-FR4 specification, the design must contend not only with the longer distance but also with the loss introduced by the optical multiplexer and demultiplexer used to combine and separate the four wavelengths transmitted over the single-mode fibre.
“It is really a technical problem, one of partitioning the specifications to account for the extra loss of the link channel,” says Nowell.
One way to address the additional loss is to increase the transmitter’s laser power but that raises the design’s overall power consumption. And since the industry continually improves receiver performance - its sensitivity - over time, any decision to raise the transmitter power needs careful consideration. “There is always a trade off,” says Nowell. “You don't want to put too much power on the transmitter because you can’t change that specification.”
The MSA will need to decide whether the transmitter power is increased or is kept the same and then the focus will turn to the receiver technology. “This is where a lot of the hard work occurs,” he says.
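In link-budget terms, the partitioning question is where the extra channel loss gets absorbed: a hotter transmitter, a more sensitive receiver, or some of each. The sketch below illustrates the structure of that trade-off using entirely hypothetical dB figures, not values from the MSA.

```python
# Illustrative link-budget partitioning for a longer-reach, four-wavelength
# link. All dB figures are hypothetical placeholders, not MSA values; the
# point is only the structure of the trade-off described above.

def rx_margin_db(tx_power_dbm: float, channel_loss_db: float,
                 rx_sensitivity_dbm: float) -> float:
    """Margin left after the channel: received power minus receiver sensitivity."""
    received_dbm = tx_power_dbm - channel_loss_db
    return received_dbm - rx_sensitivity_dbm

# Hypothetical 2km FR4-style channel: fibre loss plus mux and demux loss.
channel_loss = 2 * 0.5 + 2 * 1.5   # 2km of fibre plus mux/demux, in dB

# Option A: raise the transmitter power (costs module power budget).
print("Hotter Tx :", rx_margin_db(tx_power_dbm=1.0,
                                  channel_loss_db=channel_loss,
                                  rx_sensitivity_dbm=-6.0), "dB margin")

# Option B: keep the transmitter power and demand a better receiver instead.
print("Better Rx :", rx_margin_db(tx_power_dbm=-1.0,
                                  channel_loss_db=channel_loss,
                                  rx_sensitivity_dbm=-8.0), "dB margin")
```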
Origins
The MSA came about after the IEEE 802.3bs 400 Gigabit Ethernet Task Force defined 2km (400GBASE-FR8) and 10km (400GBASE-LR8) interfaces based on eight 50-gigabit-per-second wavelengths. “There was concern or skepticism that some of the IEEE specifications for 2km and 10km at 400 gigabits were going to be the lowest cost,” says Nowell. Issues include fitting eight wavelengths within the modules as well as the cost of eight lasers. Many of the large cloud players wanted a four-wavelength solution, and they wanted it specified.
The debate then turned to whether to get the work done within the IEEE or to create an MSA. Given the urgency with which the industry wanted such a specification, there was concern that starting and completing the project within an IEEE framework would take too long, so the decision was made to create the MSA.
“The aim is to write these specifications as quickly as we can but with the assumption that the IEEE will pick up the challenge of taking on the same scope,” says Nowell. “So the specs are planned to be written following IEEE methodology.” That way, when the IEEE does address this, it will have work it can reference.
“We are not saying that the MSA spec will go into the IEEE,” says Nowell. “We are just making it so that the IEEE, if they chose, can quickly and easily have a very good starting point.”
Form factors
The MSA specification does not dictate the modules to be used when implementing the 100-gigabit-based wavelength designs. An obvious candidate for the single-wavelength 2km and 10km designs is the SFP-DD. And Nowell says the OSFP and the QSFP-DD pluggable optical modules as well as COBO, the embedded optics specification, will be used to implement 400G-FR4. “From Cisco’s point of view, we believe the QSFP-DD is where it is going to get most of its traction,” says Nowell, who is also co-chair of the QSFP-DD MSA.
Nowell points out that the industry knows how to build systems using the QSFP form factors: how the systems are cooled and how the high-speed tracks are laid down. The development of the QSFP-DD enables the industry to reuse this experience to build new high-density systems.
“And the backward compatibility of the QSFP-DD is massively important,” he says. A QSFP-DD port also supports the QSFP28 and QSFP modules. Nowell says there are customers that buy the latest 100-gigabit switches but use lower-speed 40-gigabit QSFP modules until their network needs 100 gigabits. “We have customers that say they want to do the same thing with 100 and 400 gigabits,” says Nowell. “That is what motivated us to solve that backward-compatibility problem.”
Roadmap
A draft specification of the phase one work will be published by the 22 founding companies this year. Once published, other companies - ‘contributors’ - will join and add their comments and requirements. Further refinement will then be needed before the final MSA specification, expected by mid-2018. Meanwhile, the development of the 10km 400G-LR4 interface will start during the first half of 2018.
The MSA work is focussed on developing the 100-gigabit and 400-gigabit specifications. But Nowell says the work will help set up what comes next after 400 gigabits, whether that is 800 gigabits, one terabit or whatever.
“Once a technology gets widely adopted, you get a lot of maturity around it,” he says. “A lot of knowledge about where and how it can be extended.”
There are now optical module makers building eight-wavelength optical solutions while in the IEEE there are developments to start 100-gigabit electrical interfaces, he says: “There are a lot of pieces out there that are lining up.”
The 22 founding members of the 100G Lambda MSA Group are: Alibaba, Arista Networks, Broadcom, Ciena, Cisco, Finisar, Foxconn Interconnect Technology, Inphi, Intel, Juniper Networks, Lumentum, Luxtera, MACOM, MaxLinear, Microsoft, Molex, NeoPhotonics, Nokia, Oclaro, Semtech, Source Photonics and Sumitomo Electric.
COBO targets year-end to complete specification
Part 3: 400-gigabit on-board optics
- COBO will support 400-gigabit and 800-gigabit interfaces
- Three classes of module have been defined, the largest supporting at least 17.5W
The Consortium for On-board Optics (COBO) is scheduled to complete its module specification this year.
A draft specification defining the mechanical aspects of the embedded optics - the dimensions, connector and electrical interface - is already being reviewed by the consortium’s members.
“The draft specification encompasses what we will do inside the data centre and what will work for the coherent market,” says Brad Booth, chair of COBO and principal network architect for Microsoft’s Azure Infrastructure.
COBO was established in 2015 to create an embedded optics multi-source agreement (MSA). On-board optics have long been available but until now these have been proprietary solutions.
“Our goal [with COBO] was to get past that proprietary aspect,” says Booth. “That is its true value - it can be used for optical backplane or for optical interconnect and now designers will have a standard to build to.”
Specification
The COBO modules are designed to be interchangeable, although, unlike front-panel optical modules, they are not ‘hot-pluggable’: a module cannot be replaced while the card is powered.
The COBO design supports 400-gigabit multi-mode and single-mode optical interfaces. The electrical interface chosen is the IEEE-defined CDAUI-8: eight lanes, each at 50 gigabits, implemented using a 25-gigabaud symbol rate and 4-level pulse-amplitude modulation (PAM-4). COBO also supports an 800-gigabit interface using two tightly coupled COBO modules.
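The lane arithmetic behind CDAUI-8 is simple: PAM-4 carries two bits per symbol, so a nominally 25-gigabaud lane delivers 50 gigabits, and eight lanes give the 400-gigabit aggregate. A minimal sketch of the nominal payload rates (the real electrical lanes run slightly faster to carry FEC overhead):

```python
# Nominal CDAUI-8 lane arithmetic: PAM-4 carries 2 bits per symbol, so a
# ~25-gigabaud lane delivers ~50 Gb/s, and eight lanes give 400 Gb/s.
# These are nominal payload figures; FEC overhead is ignored.

PAM4_BITS_PER_SYMBOL = 2
LANES = 8
SYMBOL_RATE_GBAUD = 25

lane_gbps = SYMBOL_RATE_GBAUD * PAM4_BITS_PER_SYMBOL       # 50 Gb/s per lane
aggregate_gbps = lane_gbps * LANES                          # 400 Gb/s
print(f"{LANES} lanes x {lane_gbps} Gb/s = {aggregate_gbps} Gb/s")

# Two tightly coupled COBO modules double this to an 800-gigabit interface.
print(f"Paired modules: {2 * aggregate_gbps} Gb/s")
```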
The consortium has defined three module categories that vary in length. The classes reflect different power envelopes: the shortest module supports multi-mode and other lower-power designs, while the longest supports coherent designs. “The beauty of COBO is that the connectors and the connector spacings are the same no matter what length [of module] you use,” says Booth.
The COBO module is described as table-like, a very small printed circuit board that sits on two connectors. One connector is for the high-speed signals and the other for the power and control signals. “You don't have to have the cage [of a pluggable module] to hold it because of the two-structure support,” says Booth.
To be able to interchange classes of module, a ‘keep-out’ area is used. This area refers to board space that is deliberately left empty to ensure the largest COBO module form factor will fit. A module is inserted onto the board by first pushing it downwards and then sliding it along the board to fit the connection.
Booth points out that module failures are typically due to the optical and electrical connections rather than the optics itself. This is why the repeatable accuracy of pick-and-place machines is favoured for the module’s insertion. “The thing you want to avoid is having touch points in the field,” he says.
Coherent
A working group was set up soon after the consortium’s launch to investigate using the MSA for coherent interfaces. This work has now been included in the draft specification. “We realised that leaving it [the coherent work] out was going to be a mistake,” says Booth.
The main coherent application envisaged is the 400ZR specification being developed by the Optical Internetworking Forum (OIF).
The OIF 400ZR interface is the result of Microsoft’s own Madison project specification work. Microsoft went to the industry with several module requirements for metro and data centre interconnect applications.
Madison 1.0 was a two-wavelength 100-gigabit module using PAM-4 that resulted in Inphi’s 80km ColorZ module that supports up to 4 terabits over a fibre. Madison 1.5 defines a single-wavelength 100-gigabit module to support 6.4 to 7.2 terabits on a fibre. “Madison 1.5 is probably not going to happen,” says Booth. “We have left it to the industry to see if they want to build it and we have not had anyone come forward yet.”
Madison 2.0 specified a 400-gigabit coherent-based design to support a total capacity of 38.4 terabits - 96 wavelengths of 400 gigabits.
Microsoft initially envisioned a 43 gigabaud 64-QAM module. However, the OIF's 400ZR project has since adopted a 60-gigabaud 16-QAM module which will achieve either 48 wavelengths at 100GHz spacing or 64 wavelengths at 75GHz spacing, capacities of 19.2Tbps and 25.6Tbps, respectively.
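The capacities follow from the channel count times the 400-gigabit per-wavelength rate, with the channel counts set by dividing the usable optical band by the grid spacing. The quick check below assumes a nominal 4.8THz usable C-band; that band width is an assumption for the arithmetic rather than a figure from the specification.

```python
# Capacity check for the wavelength plans above: channels = usable band /
# grid spacing, capacity = channels x 400G. The 4.8THz usable C-band width
# is a nominal assumption, not a figure from the spec.

USABLE_BAND_GHZ = 4800
PER_WAVELENGTH_GBPS = 400

for spacing_ghz in (100, 75):
    channels = USABLE_BAND_GHZ // spacing_ghz
    capacity_tbps = channels * PER_WAVELENGTH_GBPS / 1000
    print(f"{spacing_ghz}GHz grid: {channels} wavelengths -> {capacity_tbps:.1f} Tb/s")
# 100GHz grid: 48 wavelengths -> 19.2 Tb/s
# 75GHz grid: 64 wavelengths -> 25.6 Tb/s

# Madison 2.0's original target, for comparison: 96 x 400G.
print(f"Madison 2.0 target: {96 * PER_WAVELENGTH_GBPS / 1000:.1f} Tb/s")
```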
Once Microsoft started talking about Madison 2.0, other large internet content providers came forward saying they had similar requirements, which led to the initiative being driven into the OIF. The result is the 400ZR specification, which the large-scale data centre players want built by as many module companies as possible.
Booth highlights the difference in Microsoft’s coherent interface volume requirements just in the last year. In 2017, the number of coherent metro links Microsoft will use will be 10x greater than the number of metro and long-haul coherent links it used in 2016.
“Because it is an order of magnitude more, we need to have some level of specification, some level of interop because now we're getting to the point where if I have an issue with any single supplier, I do not want my business impeded by it,” he says.
Regarding the COBO module, Booth stresses that it will be the optical designers that will determine the different coherent specifications possible. Thermal simulation work already shows that the module will support 17.5W and maybe more.
“There is a lot more capability in this module than there is in a standard pluggable, only because we don’t have the constraint of a cage,” says Booth. “We can always go up in height and we can always add more heat sink.”
Booth says the COBO specification will likely need a couple more members’ reviews before its completion. “Our target is still to have this done by the end of the year,” he says.
Amended on Sept 4th, added comment about the 400ZR wavelength plans and capacity options




