Ayar Labs and Intel add optical input-output to an FPGA

Start-up Ayar Labs, working with Intel, has interfaced its TeraPHY optical chiplet to the chip giant’s Stratix 10 FPGA.
Hugo Saleh
Intel has teamed with several partners in addition to Ayar Labs for its FPGA-based system-in-package design, part of a US Defense Advanced Research Projects Agency (DARPA) project.
Ayar Labs used the Hot Chips conference, held in Palo Alto, California in August, to detail its first TeraPHY chiplet product and its interface to the high-end FPGA.
Origins
Ayar Labs was established to commercialise research that originated at MIT. The MIT team worked on integrating both photonics and electronics on a single die without changing the CMOS process.
The start-up has developed such building-block optical components in CMOS as a vertical coupler grating and a micro-ring resonator for modulation, while the electronic circuitry can be used to control and stabilise the ring resonator’s operation.
Ayar Labs has also developed an external laser source that can power up to 256 optical channels, each operating at 16, 25 or 32 gigabits-per-second (Gbps).
The company has two strategic investors: Intel Capital, the investment arm of Intel, and semiconductor firm GlobalFoundries.
The start-up received $24 million in funding late last year and has used the funding to open a second office in Santa Clara, California, and double its staff to about 40.
Markets
Ayar Labs has identified four markets for its silicon photonics technology.
The first is the military, aerospace and government market segment. Indeed, the Intel FPGA system-in-package is for a phased-array radar application.
Two further markets are high-performance computing and artificial intelligence, and telecommunications and the cloud.
Computer vision and advanced driver-assistance systems make up the fourth market segment. Here, the start-up’s expertise in silicon photonics is used not for optical I/O but for a LIDAR sensor, says Hugo Saleh, Ayar Labs’ vice president of marketing and business development.
Stratix 10 system-in-package
The Intel phased-array radar system-in-package is designed to take in huge amounts of RF data that is down-converted and digitised using an RF chiplet. The data is then pre-processed on the FPGA and sent optically, using Ayar Labs’ TeraPHY chiplets, for further processing in the cloud.

“To digitise all that information you need multiple TeraPHY chiplets per FPGA to pull the information back into the cloud,” says Saleh. A single phased-array radar can use as many as 50,000 FPGAs.
Such a radar design can be applied to civilian as well as military applications, where it can track tens of thousands of objects.
Moreover, it is not just FPGAs that the TeraPHY chiplet can be interfaced to.
Large aerospace companies developing flight control systems also develop their own ASICs. “Almost every single aerospace company we have talked to as part of our collaboration with Intel has said they have custom ASICs,” says Saleh. “They want to know how they can procure, package and test the chiplets and bring them to market.”
It is one thing to integrate a chiplet but photonics is tricky
TeraPHY chiplet
Two Intel-developed technologies are used to interface the TeraPHY chiplet to the Stratix 10 FPGA.
The first is Intel’s Advanced Interface Bus (AIB), a parallel electrical interface technology. The second is the Embedded Multi-die Interconnect Bridge (EMIB) which supports the dense I/O needed to interface the main chip, in this case, the FPGA to a chiplet.
EMIB is a sliver of silicon designed to support I/O. The EMIBs are embedded in an organic substrate on which the dies sit; one is for each chiplet-FPGA interface. The EMIB supports various bump pitches to enable dense I/O connections.
Ayar Labs’ first TeraPHY product uses 24 AIB cells for its electrical interface. Each cell supports 20 channels, each operating at 2Gbps. Each cell thus carries 40Gbps, giving the chiplet an overall electrical bandwidth of 960Gbps.
The TeraPHY’s optical interface uses 10 transmitter-receiver pairs, each pair supporting 8 optical channels that can operate at 16Gbps, 25Gbps or 32Gbps. The result is that the TeraPHY supports a total optical bandwidth ranging from 1.28Tbps to 2.56Tbps.
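Those headline figures follow directly from the per-channel numbers. A quick sanity check (a minimal Python sketch using only the counts and rates quoted above):

```python
# Back-of-the-envelope check of the TeraPHY bandwidth figures.

# Electrical side: 24 AIB cells, each with 20 channels at 2 Gbps.
aib_cells = 24
channels_per_cell = 20
gbps_per_channel = 2
electrical_gbps = aib_cells * channels_per_cell * gbps_per_channel
print(f"Electrical bandwidth: {electrical_gbps} Gbps")  # 960 Gbps

# Optical side: 10 transmit/receive macros, each with 8 channels,
# at 16, 25 or 32 Gbps per channel.
macros = 10
channels_per_macro = 8
for rate_gbps in (16, 25, 32):
    optical_tbps = macros * channels_per_macro * rate_gbps / 1000
    print(f"Optical bandwidth at {rate_gbps} Gbps/channel: {optical_tbps:.2f} Tbps")
# Prints 1.28, 2.00 and 2.56 Tbps - hence the 1.28-2.56 Tbps range.
```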
The optical bandwidth is deliberately higher than the electrical bandwidth, says Saleh: “Just because you have ten [transmit/ receive] macros on the die doesn’t mean you have to use all ten.”
Also, the chiplet supports a crossbar switch that allows one-to-many connections such that an electrical channel can be sent out on more than one optical interface and vice versa.
For the Intel FPGA system-in-package, two TeraPHY chiplets are used, each supporting 16Gbps channels, such that the combined optical I/O of the two chiplets is up to 5.12 terabits-per-second.
Ramifications
Saleh stresses the achievement in integrating optics in-package: “It is one thing to integrate a chiplet but photonics is tricky.”
Ayar Labs flip-chips its silicon and etches on the backside. “Besides all the hard work that goes into figuring how to do that, and keeping it hermetically sealed, you still have to escape light,” he says. “Escaping light out of the package that is intended to be high volume requires significant engineering work.” This required working very closely with Intel’s packaging department.
Now the challenge is to take the demonstrator chip to volume manufacturing.
Saleh also points to a more fundamental change that will need to take place with the advent of chip designs using optical I/O.
Over many years, compute power - in the form of advanced microprocessors that incorporate ever more CPU cores - has doubled every two years or so. In contrast, I/O has advanced at a much slower pace: 5 or 10 percent annually.
This has resulted in application software for high-performance computing being written to take this bandwidth-compute disparity into account, reducing the number of memory accesses and minimising I/O transactions.
“Software now has to be architected to take advantage of all this new performance and all this new bandwidth,” he says. “We are going to see tremendous gains in performance because of it.”
Ayar Labs says it is on schedule to deliver its first TeraPHY chiplet product in volume to lead customers by the second half of 2020.
PCI Express back on track with latest specifications
Richard Solomon and Scott Knowlton are waiting for me in the lobby of a well-known Tel-Aviv hotel overseeing the sunlit Mediterranean Sea.
Richard Solomon
Solomon, vice chair of the PCI Special Interest Group (PCI-SIG), and Knowlton, its marketing working group co-chair, are visiting Israel to deliver a training event addressing the PCI Express (PCIe) high-speed serial bus standard.
With over 750 member companies, PCI-SIG conducts several training events around the world each year. The locations are chosen where there is a concentration of companies and engineers undertaking PCIe designs. “These are chip, board and systems architects,” says Solomon.
PCI-SIG has hit its stride after a prolonged quiet period. The group completed the PCIe 4.0 standard in 2017, seven years after it launched PCIe 3.0. PCIe 4.0 doubles the serial bus speed and, with the advent of PCIe 5.0, the speed will double again.
“We were late with PCIe 4.0,” admits Solomon. But with the introduction of the PCIe 5.0 standard in the first quarter of 2019, the serial bus’ speed progression will be back on track. “PCIe 5.0 is where the industry needs it to be.”
The latest training event is addressing the transition to PCIe 5.0. “User implementation stuff; the PHY, controller and verification IP,” says Knowlton. Verification IP refers to the protocols and interfaces needed to verify a PCIe 5.0-enabled chip design.
Markets
PCIe is used in a range of industries. In the cloud, the serial bus is used for servers and storage.
For servers, PCIe has been adopted by general-purpose microprocessors and more specialist devices such as FPGAs, graphics processing units and AI hardware.
The technology is also being used by enterprises, with PCIe switch silicon adopted in data centres to enable server redundancy and failover.
PCIe 5.0 is where the industry needs it to be
PCIe is also being used for storage and, in particular, solid-state drives (SSDs). That is because PCIe 4.0 transfers data at 16 gigabits-per-second (Gbps) per lane, and lanes can be used in parallel, typically in a by-four (x4) or a by-16 (x16) configuration.
The proportion of SSDs that use PCIe is expected to grow from a quarter in 2018 to over three quarters in 2022, according to Forward Insights. Meanwhile, IDC forecasts that the SSD market will grow at a compound annual growth rate of 15 percent from 2016 to 2021.
PCIe is also employed in mobile handsets and Internet of Things designs. PCI-SIG attributes its adoption in these applications to its speed and lane-width flexibility as well as its power efficiency.
PCIe specification data rates. Source: PCI-SIG
Bus specifications
The PCIe bus uses point-to-point communications. The standard uses a dual-simplex scheme - a serial transmission in each direction - referred to as a lane. Lanes can be bundled in a variety of configurations - x1, x2, x4, x8, x12, x16 and x32 - although x2, x12 and x32 are rarely, if ever, used in practice.
Scott Knowlton
The first two iterations of PCIe, versions 1.0 and 2.0, delivered 2.5 and 5 gigatransfers-per-second (GT/s) per lane per direction, respectively.
A transfer refers to an encoded bit. The first two PCIe versions use an 8b/10b encoding scheme such that for every ten bits sent, only eight are data. This is why the data transfer rates per lane per direction are 2Gbps and 4Gbps (250 and 500 megabytes-per-second), respectively (see table).
With PCIe 3.0, the decision was made to increase the transfer rate to 8GT/s per lane, based on the assumption that no equalisation would be needed to counter inter-symbol interference at that speed, says Solomon. Equalisation turned out to be needed after all, but the assumption explains why PCIe 3.0 adopted 8GT/s rather than 10GT/s.
Another PCIe 3.0 decision was to move to a 128b/130b scheme to reduce the encoding overhead from 20 percent to just over 1 percent. This is why the transfer rate and bit rate are almost equal from the PCIe 3.0 standard onwards (see table).
The recent PCIe 4.0 specification doubles the transfer rate from 8GT/s to 16GT/s while PCIe 5.0 will achieve 32GT/s per lane per direction.
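The effect of the encoding change is easy to reproduce. This minimal sketch reconstructs the per-lane data rates from the transfer rates and encoding schemes quoted above (the x16 figure assumes all 16 lanes in one direction):

```python
# Per-lane PCIe data rates: transfer rate x encoding efficiency.

generations = [
    # (version, GT/s per lane, data bits, encoded bits)
    ("1.0", 2.5, 8, 10),      # 8b/10b encoding: 20% overhead
    ("2.0", 5.0, 8, 10),
    ("3.0", 8.0, 128, 130),   # 128b/130b: just over 1% overhead
    ("4.0", 16.0, 128, 130),
    ("5.0", 32.0, 128, 130),
]

for version, gt_s, data_bits, coded_bits in generations:
    gbps = gt_s * data_bits / coded_bits
    x16_gbytes = gbps * 16 / 8  # a x16 link, per direction, in GB/s
    print(f"PCIe {version}: {gbps:5.2f} Gbps/lane, x16 = {x16_gbytes:.1f} GB/s per direction")
```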
When more than one lane is used, the encoded data is distributed across the lanes. A PCIe controller is used at each end of a lane to make sense of the bits. Meanwhile, a PCIe switch, a separate chip, can be used when fan out is needed to distribute the point-to-point links.
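To picture how a multi-lane link carries one data stream, consecutive bytes can be thought of as dealt round-robin across the lanes. The sketch below illustrates only the principle; the real PCIe striping rules add framing and alignment symbols:

```python
# Illustrative round-robin striping of a byte stream across PCIe lanes.

def stripe(data: bytes, num_lanes: int) -> list[list[int]]:
    """Deal consecutive bytes across the lanes in round-robin order."""
    lanes = [[] for _ in range(num_lanes)]
    for i, byte in enumerate(data):
        lanes[i % num_lanes].append(byte)
    return lanes

def merge(lanes: list[list[int]]) -> bytes:
    """The controller at the far end reassembles the original stream."""
    total = sum(len(lane) for lane in lanes)
    return bytes(lanes[i % len(lanes)][i // len(lanes)] for i in range(total))

payload = bytes(range(16))
assert merge(stripe(payload, 4)) == payload  # a x4 link
```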
Compliance testing and design issues
Compliance testing of PCIe 4.0 will only begin in early 2019 even though the standard was completed in 2017. Solomon says this interval is actually one of PCI-SIG's shorter ones. It takes time to refine the exact electrical testing to be used, he says, and there is only so much that can be done until silicon arrives.
Given that there are now 28Gbps and 56Gbps serialiser-deserialiser (serdes) technologies available, why were the PCIe 4.0 and PCIe 5.0 lane speeds not faster? Solomon says the latest PCIe standards were chosen to be multiples of the PCIe 3.0’s 8GT/s lane speed to ensure backward compatibility.
That said, designing systems at PCIe 4.0 and PCIe 5.0 signalling speeds is a challenge. Printed circuit boards need to be multi-layer and use higher-quality materials, while retimer ICs are needed to achieve signal distances of 20 inches.
Solomon stresses that not all systems require such signal reaches; one example is the dense electronics being developed for automotive designs that use AI techniques to make sense of their environment.
And with that, Solomon apologises and gets up: “I have a session to present”.
SDM and MIMO: An interview with Bell Labs
Part 2: The capacity crunch and the role of SDM
The argument for spatial-division multiplexing (SDM) - the sending of optical signals down parallel fibre paths, whether multiple modes, cores or fibres - is the coming ‘capacity crunch’. The information-carrying capacity limit of fibre, for so long described as limitless, is being approached due to the continual yearly high growth in IP traffic. But if there is a looming capacity crunch, why are we not hearing about it from the world’s leading telcos?
“It depends on who you talk to,” says Peter Winzer, head of the optical transmission systems and networks research department at Bell Labs. The incumbent telcos have relatively low traffic growth - 20 to 30 percent annually. “I believe fully that it is not a problem for them - they have plenty of fibre and very low growth rates,” he says.
Twenty to 30 percent growth rates can only be described as ‘very low’ when you consider that cable operators are experiencing 60 percent year-on-year traffic growth while it is 80 to 100 percent for the web-scale players. “The whole industry is going through a tremendous shift right now,” says Winzer.
In a recent paper, Winzer and colleague Roland Ryf extrapolate wavelength-division multiplexing (WDM) trends, starting with the 100-gigabit interfaces adopted in 2010. Assuming an annual traffic growth rate of 40 to 60 percent, 400-gigabit interfaces become necessary in 2013 to 2014 - and the authors point out that 400-gigabit transponder deployments did start in 2013. Terabit transponders are forecast for 2016 to 2017, while 10-terabit commercial interfaces are expected from 2020 to 2024.
In turn, while WDM system capacities have scaled a hundredfold since the late 1990s, this will not continue. That is because systems are approaching the non-linear Shannon limit, which puts the upper capacity of a fibre at around 75 terabits-per-second.
Starting with 10-terabit-capacity systems in 2010 and a 30 to 40 percent annual growth rate in core network traffic, the authors forecast that 40-terabit systems will be required shortly. By 2021, 200-terabit systems will be needed - already exceeding one fibre's capacity - while petabit-capacity systems will be required by 2028.
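Such forecasts follow from simple compounding. A minimal sketch, using the 2010 baseline of 10-terabit systems; note that the 30 percent case lands close to the 200-terabit (2021) and petabit-class (2028) figures quoted:

```python
# Compounding the core-network traffic growth rates cited above,
# starting from 10 Tbps systems in 2010.

base_year, base_tbps = 2010, 10

for growth in (0.30, 0.40):
    capacity = base_tbps
    for year in range(base_year + 1, 2029):
        capacity *= 1 + growth
        if year in (2021, 2028):
            print(f"{growth:.0%} growth: {year} needs ~{capacity:,.0f} Tbps")
# 30% growth: 2021 needs ~179 Tbps, 2028 needs ~1,124 Tbps (a petabit).
```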
Even if I’m off by an order of magnitude, and it is 1000, 100-gigabit lines leaving the data centre; there is no way you can do that with a single WDM system
Parallel spatial paths are the only physical multiplexing dimension remaining to expand capacity, argue the authors, explaining Bell Labs’ interest in spatial-division multiplexing for optical networks.
If the telcos do not require SDM-based systems anytime soon, that is not the case for the web-scale data centre operators. They could deploy SDM as soon as 2018 to 2020, says Winzer.
The web-scale players are talking about 400,000-server data centres in the coming three to five years. “Each server will have a 25-gigabit network interface card and if you assume 10 percent of the traffic leaves the data centre, that is 10,000, 100-gigabit lines,” says Winzer. “Even if I’m off by an order of magnitude, and it is 1000, 100-gigabit lines leaving the data centre; there is no way you can do that with a single WDM system.”
SDM and MIMO
SDM can be implemented in several ways. The simplest way to create parallel transmission paths is to bundle several single-mode fibres in a cable. But speciality fibre can also be used, either multi-core or multi-mode.
For the demo, Bell Labs used such a fibre, a coupled 3-core one, but Sebastian Randel, a member of technical staff, says its SDM receiver could also be used with a fibre supporting a few spatial modes. By slightly increasing the diameter of a single-mode fibre, not only the fundamental mode but also two second-order modes are supported. “Our signal processing would cope with that fibre as well,” says Winzer.
The signal processing referred to, which restores the multiple transmissions at the receiver, implements multiple-input, multiple-output (MIMO) processing. MIMO is a well-known signal processing technique used in wireless and digital subscriber line (DSL) systems.
They are garbled up, that is what the rotation is; undoing the rotation is called MIMO
Multi-mode fibre can support as many as 100 spatial modes. “But then you have a really big challenge to excite all 100 spatial modes individually and detect them individually,” says Randel. In turn, the digital signal processing computation required for the 100 modes is tremendous. “We can’t imagine we can get there anytime soon,” says Randel.
Instead, Bell Labs used 60km of the 3-core coupled fibre for its real-time SDM demo. The transmission distance could have been much longer; 60km simply happened to be the length of the fibre sample. Bell Labs chose the coupled-core fibre for the real-time MIMO demonstration as it is the most demanding case, says Winzer.
The demonstration can be viewed as an extension of coherent detection used for long-distance 100 gigabit optical transmission. In a polarisation-multiplexed, quadrature phase-shift keying (PM-QPSK) system, coupling occurs between the two light polarisations. This is a 2x2 MIMO system, says Winzer, comprising two inputs and two outputs.
For PM-QPSK, one signal is sent on the x-polarisation and the other on the y-polarisation. The signals travel at different speeds while coupling strongly along the fibre, says Winzer: “The coherent receiver with the 2x2 MIMO processing is able to undo that coupling and undo the different speeds because you selectively excite them with unique signals.” This allows both polarisations to be recovered.
With the 3-core coupled fibre, strong coupling arises between the three signals and their individual two polarisations, resulting in a 6x6 MIMO system (six inputs and six outputs). The transmission rotates the six signals arbitrarily while the receiver, using 6x6 MIMO, rotates them back. “They are garbled up, that is what the rotation is; undoing the rotation is called MIMO.”
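The ‘rotation’ can be made concrete with a toy numerical model: mix six QPSK tributaries with a random unitary 6x6 matrix standing in for the coupled-core fibre, then undo the mixing at the receiver. This is a deliberately simplified sketch - the channel here is known and static, whereas a real-time receiver must estimate it adaptively and track its variation:

```python
import numpy as np

rng = np.random.default_rng(7)

# Six tributaries: three coupled cores x two polarisations,
# each carrying unit-energy QPSK symbols.
num_symbols = 1000
bits = rng.integers(0, 2, size=(6, 2 * num_symbols))
tx = ((2 * bits[:, ::2] - 1) + 1j * (2 * bits[:, 1::2] - 1)) / np.sqrt(2)

# Model the coupled-core fibre as a random unitary 6x6 "rotation",
# plus a little receiver noise. (A real fibre's coupling also varies
# with time and frequency.)
q, _ = np.linalg.qr(rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6)))
noise = 0.01 * (rng.normal(size=tx.shape) + 1j * rng.normal(size=tx.shape))
rx = q @ tx + noise

# 6x6 MIMO equalisation: undo the rotation. The inverse of a unitary
# matrix is its conjugate transpose.
recovered = q.conj().T @ rx

errors = int(np.sum(np.sign(recovered.real) != np.sign(tx.real))
             + np.sum(np.sign(recovered.imag) != np.sign(tx.imag)))
print(f"Bit errors after equalisation: {errors} of {2 * tx.size}")  # 0
```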
Demo details
For the demo, Bell Labs generated 12, 2.5-gigabit signals. These signals are modulated onto an optical carrier at 1550nm using three nested lithium niobate modulators. A ‘photonic lantern’ - an SDM multiplexer - couples the three signals orthogonally into the fibre’s three cores.
The photonic lantern comprises three single-mode fibre inputs fed by the three single-mode PM-QPSK transmitters while its output places the fibres closer and closer until the signals overlap. “The lantern combines the fibres to create three tiny spots that couple into a single fibre, either single mode or multi-mode,” says Winzer.
At the receiver, another photonic lantern demultiplexes the three signals which are detected using three integrated coherent receivers.
Don’t do MIMO for MIMO’s sake, do MIMO when it helps to bring the overall integrated system cost down
To implement the MIMO processing, Bell Labs built a 28-layer printed circuit board which connects the three integrated coherent receiver outputs to 12, 5-gigasample-per-second, 10-bit analogue-to-digital converters. The result is a 600 gigabit-per-second aggregate digital data stream. This huge data stream is fed to a Xilinx Virtex-7 XC7V2000T FPGA using 480 parallel lanes, each at 1.25 gigabit-per-second. It is the FPGA that implements the 6x6 MIMO algorithm in real time.
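The 600-gigabit figure is straightforward arithmetic, and the FPGA's input lanes are provisioned to match (a quick check):

```python
# Aggregate data rates in Bell Labs' real-time MIMO demonstrator.

adcs = 12
gsamples_per_sec = 5e9   # 5 GS/s per converter
bits_per_sample = 10
adc_output_gbps = adcs * gsamples_per_sec * bits_per_sample / 1e9
print(f"ADC aggregate output: {adc_output_gbps:.0f} Gbps")  # 600 Gbps

fpga_lanes = 480
gbps_per_lane = 1.25
print(f"FPGA input capacity: {fpga_lanes * gbps_per_lane:.0f} Gbps")  # 600 Gbps
```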
“Computational complexity is certainly one big limitation and that is why we have chosen a relatively low symbol rate - 2.5 Gbaud, ten times less than commercial systems,” says Randel. “But this helps us fit the [MIMO] equaliser into a single FPGA.”
Future work
With the growth in IP traffic, optical engineers are going to have to use space and wavelengths. “But how are you going to slice the pie?” says Winzer.
With the example of 10,000, 100-gigabit wavelengths, will 100 WDM channels be sent over 100 spatial paths or 10 WDM channels over 1,000 spatial paths? “That is a techno-economic design optimisation,” says Winzer. “In those systems, to get the cost-per-bit down, you need integration.”
That is what Bell Labs’ engineers are working on: optical integration to reduce the overall spatial-division multiplexing system cost. “Integration will happen first across the transponders and amplifiers; fibre will come last,” says Winzer.
Winzer stresses that MIMO-SDM is not primarily about fibre, a point he says is frequently misunderstood. The point is to enable systems that can tolerate crosstalk.
“So if some modulator manufacturer can build arrays with crosstalk and sell the modulator at half the price they were able to before, then we have done our job,” says Winzer. “Don’t do MIMO for MIMO’s sake, do MIMO when it helps to bring the overall integrated system cost down.”
Further Information:
Space-division Multiplexing: The Future of Fibre-Optics Communications, click here
For Part 1, click here
Altera’s 30 billion transistor FPGA
- The Stratix 10 features a routing architecture that doubles overall clock speed and core performance
- The programmable family supports the co-packaging of transceiver chips to enable custom FPGAs
- The Stratix 10 family supports up to 5.5 million logic elements
- Enhanced security features stop designs from being copied or tampered with
Altera has detailed its most powerful FPGA family to date. Two variants of the Stratix 10 family have been announced: FPGAs, and system-on-chip (SoC) devices that include a quad-core, 64-bit ARM Cortex-A53 processor alongside the programmable logic. The ARM processor can be clocked at up to 1.5GHz.
The Stratix 10 family is implemented using Intel’s 14nm FinFET process and supports up to 5.5 million logic elements. The largest device in Altera’s 20nm Arria family of FPGAs has 1.15 million logic elements, equating to 6.4 billion transistors. “Extrapolating, this gives a figure of some 30 billion transistors for the Stratix 10,” says Craig Davis, senior product marketing manager at Altera.
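That extrapolation is easy to reproduce (a quick sketch using the figures quoted):

```python
# Altera's transistor-count extrapolation: scale the Arria figure by
# the ratio of logic elements.
arria_logic_elements = 1.15e6
arria_transistors = 6.4e9
stratix10_logic_elements = 5.5e6

estimate = arria_transistors * stratix10_logic_elements / arria_logic_elements
print(f"Estimated Stratix 10 transistor count: {estimate / 1e9:.1f} billion")
# ~30.6 billion - hence "some 30 billion transistors".
```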
Altera's HyperFlex routing architecture. Shown (pointed to by the blue arrow) are the HyperFlex registers that sit at the junction of the interconnect traces. Also shown are the adaptive logic module blocks. Source: Altera.
The FPGA family uses a routing fabric, dubbed HyperFlex, to connect the logic blocks. HyperFlex is claimed to double the clock speed compared with designs implemented using Altera’s Stratix V devices, achieving gigahertz rates. “Having that high level of performance allows us to get to 400 gigabit and one terabit OTN (Optical Transport Network) systems,” says Davis.
The FPGA company detailed the Stratix 10 a week after Intel announced its intention to acquire Altera for US $16.7 billion.
Altera is also introducing with the FPGA family what it refers to as heterogeneous 3D system packaging and integration. The technology enables a designer to customise the FPGA’s transceivers by co-packaging separate transceiver integrated circuits (ICs) alongside the FPGA.
Different line-rate transceivers can be supported to meet a design's requirements: 10, 28 or 56 gigabit-per-second (Gbps), for example. It also allows different protocols such as PCI Express (PCIe), and different modulation formats including optical interfaces. Altera has already demonstrated a prototype FPGA co-packaged with optical interfaces, while Intel is developing silicon photonics technology.
HyperFlex routing
The maximum speed an FPGA design can be clocked is determined by the speed of its logic and the time it takes to move data from one part of the chip to another. Increasingly, it is the routing fabric rather than the logic itself that dictates the total delay, says Davis.
This has led the designers of the Stratix 10 to develop the HyperFlex architecture that adds a register at each junction of the lines interconnecting the logic elements.
Altera first tackled routing delay a decade ago by redesigning the FPGA’s logic building block. Altera went from a 4-input look-up table logic building block to a more powerful 8-input one that includes output registers. Using the more complex logic element - the adaptive logic module (ALM) - simplifies the overall routing. “You are essentially removing one layer of routing from your system,” says Davis.
When an FPGA is programmed, a configuration file dictates how the wires, and hence the device’s logic, are connected. The refinement with HyperFlex is that there are now registers at the locations where the switching between the traces occurs. A register can either be bypassed or used.
“It allows us to put the registers anywhere in the design, essentially placing them in an optimum place for a given route across the FPGA,” says Davis. The hyper-registers in the device’s routing outnumber the standard registers in the ALM blocks by a factor of ten.
Using the registers, designers can introduce data pipelining to reduce the overall delay, and it is this pipelining, combined with the advanced 14nm CMOS process, that allows a design to run at gigahertz rates.
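The principle can be illustrated with a toy timing model: a design's clock period is set by its longest register-to-register delay, so inserting registers along a long route splits it into shorter segments and raises the achievable clock. The delay figures below are illustrative, not Altera's:

```python
# Toy model of HyperFlex-style pipelining. The achievable clock is set
# by the slowest register-to-register path in the design.

def max_clock_mhz(path_delays_ns):
    """Clock frequency permitted by the slowest register-to-register path."""
    return 1000 / max(path_delays_ns)

# Without hyper-registers: one long route dominates.
unpipelined = [0.9, 1.1, 2.5]  # ns; the 2.5 ns route limits the clock
print(f"Unpipelined: {max_clock_mhz(unpipelined):.0f} MHz")  # 400 MHz

# With hyper-registers: the long route is split at two routing junctions.
pipelined = [0.9, 1.1, 0.8, 0.9, 0.8]
print(f"Pipelined:   {max_clock_mhz(pipelined):.0f} MHz")  # ~909 MHz
```

The price of the extra registers is added pipeline latency, which the design must be re-balanced to absorb - one reason the supporting software tools were such an effort.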
“We have made the registers small. They add one or two percent to the total die area, but in return it gives us the ability to go to twice the performance,” says Davis. “That is a good trade-off.”
The biggest change in getting HyperFlex to work has been with the software tools, says Davis. HyperFlex and the associated tools have taken over three years to develop.
“This is a fundamental change,” says Davis. “It [HyperFlex] is relatively simple but it is key; and it is this that allows customers to get to this doubling of core performance.”
The examples cited by Altera certainly suggest significant improvements in speed, density, power dissipation, but I want to see that in real-world designs
Loring Wirbel, The Linley Group
Applications
Altera says that over 100 customer designs have now been processed using the Stratix 10 development tools.
It cites as an example a current 400-gigabit design implemented using a Stratix V FPGA that requires a 1024-bit-wide bus clocked at 390MHz. The wide bus consumes considerable chip area, and routing it to avoid congestion is non-trivial.
Porting the design to a Stratix 10 enables the bus to be clocked at 781MHz, such that the bus width can be halved to 512 bits. “It reduces congestion, makes it easier to do timing closure and ship the design,” says Davis. “This is why we think Stratix 10 is so important for high-performance applications like OTN and data centres.” Timing closure refers to the tricky part of a design where the engineer may have to iterate to ensure that the design meets all its timing requirements.
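The bus trade-off is pure arithmetic - throughput is bus width times clock rate (a quick check of the figures quoted):

```python
# The 400-gigabit bus example: doubling the clock halves the bus width
# needed for the same throughput.

for width_bits, clock_mhz in ((1024, 390), (512, 781)):
    gbps = width_bits * clock_mhz / 1000
    print(f"{width_bits}-bit bus at {clock_mhz} MHz: {gbps:.0f} Gbps")
# Both configurations deliver ~400 Gbps.
```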
In another example, a data centre design, a single Stratix 10 device can replace five Stratix V ICs on one card. The five FPGAs are clocked at 250MHz, run PCIe Gen2 x8 interfaces and DDR3 x72 memory clocked at 800MHz, and consume 120W overall. One Stratix 10 chip clocked at 500MHz supports faster PCIe Gen3 x8 and wider DDR3 x144 memory clocked at 1.2GHz, with only 44W consumed.
Loring Wirbel, senior analyst at The Linley Group, says that Altera’s insertion of pipelined registers to cut average trace lengths is unique.
“The more important question is, can the hyper-register topology regularly gain the type of advantages claimed?” says Wirbel. “The examples cited by Altera certainly suggest significant improvements in speed, density, power dissipation, but I want to see that in real-world designs.”
We are also looking at optical transceivers directly connected to the FPGA
Craig Davis, Altera
Connectivity tiles
Altera recognises that future FPGAs will support a variety of transceiver types. Not only are there different line speeds to be supported but also different modulation schemes. “You can’t build one transceiver that fits all of these requirements and even if you could, it would not be an optimised design,” says Davis.
Instead, Altera is exploiting Intel’s embedded multi-die interconnect bridge (EMIB) technology to interface the FPGA and transceivers, dubbed connectivity tiles. The bridge technology is embedded into the chip’s substrate and enables dense interconnect between the core FPGA and the transceiver IC.
Intel claims fewer wafer processing steps are required to make the EMIB compared to other 2.5D interposer processes. An interposer is an electrical design that provides connectivity. “This is a very simple ball-grid sort of interposer, nothing like the Xilinx interposer,” says Wirbel. “But it is lower cost and not intended for the wide range of applications that more advanced interposers use.”
Using this approach, a customer can add the desired interfaces to their design, optical as well as electrical. “We are also looking at optical transceivers directly connected to the FPGA,” says Davis.
Wirbel says such links would simplify interfacing to OTN mappers, and data centre designs that use optical links between racks and for the top-of-rack switch.
“Intel wants to see a lot more use of optics directly on the server CPU board, something that the COBO Alliance agrees with in part, and they may steer the on-chip TOSA/ ROSA (transmitter and receiver optical sub-assembly) toward intra-board applications,” he says.
But this is more into the future. “It's fine if Intel wants to pursue those things, but it should not neglect common MSAs for OTN and Ethernet applications of a more traditional sort,” says Wirbel.
The benefit of the system-in-package integration is that different FPGAs can be built without having to create an expensive new mask set each time. “You can build a modular, lego-block FPGA and all that is different is the package substrate,” says Davis.
Security and software
The Stratix 10 also includes security features to protect companies’ intellectual property from being copied or tampered with.
The FPGA features security hardware that protects circuitry from being tampered with; the bitstream that is loaded to configure the FPGA must be decrypted first.
The FPGA is also split into sectors such that parts of the device can have different degrees of security. The sectoring is useful for cloud-computing applications where the FPGA is used as an accelerator to the server host processor. As a result, different customers’ applications can be run in separate sectors of the FPGA to ensure that they are protected from each other.
The security hardware also allows features to be included in a design that the customer can unlock and pay for once needed. For example, a telecom platform could be upgraded to 100 Gigabit while the existing 40 Gig live network traffic runs unaffected in a separate sector.
Altera has upgraded its FPGA software tools in anticipation of the Stratix 10. Features include a hierarchical design flow to simplify the partitioning of a design project across a team of engineers, and the ability to use cloud computing to speed up design compilation time.
What applications will require such advanced FPGAs, and which customers will be willing to pay a premium price for them? Wirbel says the top applications will remain communications.
“The emergence of new 400 Gig OTN transport platforms, and the emergence of all kinds of new routers and switches with 400 Gig interfaces, will keep a 40 percent communication base for FPGAs overall solid at Altera,” he says.
Wirbel also expects server accelerator boards, where FPGA-based accelerators are used for applications such as financial trading and physics simulation, to be an important market. “But Intel must consider the accelerator board market as an ideal place for Stratix 10 on its own, and not merely as a vehicle for promoting a future Xeon-plus-FPGA hybrid,” he says.
Altera will have engineering samples of the Stratix 10 towards the end of 2015, with shipments to customers to follow.
Altera optical FPGA in 100 Gigabit Ethernet traffic demo
Altera is demonstrating its optical FPGA at OFC/NFOEC, being held in Los Angeles this week. The FPGA, coupled to parallel optical interfaces, is being used to send and receive 100 Gigabit Ethernet packets of various sizes.
The technology demonstrator comprises an Altera Stratix IV FPGA with 28, 11.3Gbps electrical transceivers coupled to two Avago Technologies' MicroPod optical modules.
"FPGAs are now being used for full system level solutions"
Kevin Cackovic, Altera
The MicroPods - a 12x10Gbps transmitter and a 12x10Gbps receiver - are co-packaged with the FPGA. "All the interconnect between the serdes and the optics is on the package, not on the board," says Steve Sharp, marketing program manager, fiber optic products division at Avago. Such a design benefits signal integrity and power consumption, he says: "It opens up a different world for FPGA users, and for system integration for optic users."
Both Altera and Avago stress that the optical FPGA has been designed deliberately using proven technologies. "We wanted to focus on demonstrating the integration of the optics, not pushing either of the process technologies to the absolute edge," says Sharp.
The nature of FPGA designs has changed in recent years, says Kevin Cackovic, senior strategic marketing manager of Altera's transmission business unit. Many designs no longer use FPGAs solely to interface application-specific standard products to ASICs, or as a co-processor. "FPGAs are now being used for full system level solutions, things like a framer or MAC technology, forward error correction at very high rates, mapper engines, packet processing and traffic management," he says.
Having its FPGAs in such designs has highlighted for Altera current and upcoming system bottlenecks. "This is what is driving our interest in looking at this technology and what is possible integrating the optics into the FPGA," says Cackovic. Applications requiring the higher bandwidth and the greater reach of optical - rack-to-rack rather than chip-to-module - include next-generation video, cloud computing and 3D gaming, he says.
Altera has yet to announce its product plans for the optical FPGA design. Meanwhile, Avago says it is looking at higher-speed versions of the MicroPod.
"The request for higher line rates is obviously there," says Sharp. "Whether it goes all the way to 28 [Gigabit] or one of the steps in-between, we are not sure yet."
Transport processors now at 100 Gigabit
Cortina Systems has detailed its CS605x family of transport processors that support 100 Gigabit Ethernet and Optical Transport Network (OTN).
The CS6051 transport processor architecture. Source: Cortina Systems
The application-specific standard product (ASSP) family from Cortina Systems is aimed at dense wavelength division multiplexing (DWDM) platforms, packet optical transport systems, carrier Ethernet switch routers and Internet Protocol edge and core routers. The chip family can also be used in data centre top-of-rack Ethernet aggregation switches.
"Our traditional business in OTN has been in the WDM market," says Alex Afshar, product line manager, transport products at Cortina Systems. "What we see now is demand across all those platforms."
ASSP versus FPGA
Until now, equipment makers have used field-programmable gate arrays (FPGAs) to implement 100 Gigabit-per-second (Gbps) designs. This is an important sector for FPGA vendors, with Altera and Xilinx making several company acquisitions to bolster their IP offerings for the high-end sector. System vendors have also used FPGA board-based designs from specialist firm TPACK, acquired by Applied Micro in 2010.
The advantage of an FPGA design is that it allows faster entry to market, while supporting relevant standards as they mature. FPGAs also enable equipment makers to use their proprietary intellectual property (IP); for example, advanced forward error correction (FEC) codes, to distinguish their designs.
However, once a market reaches a certain maturity, ASSPs become available. "ASSPs are more efficient in terms of cost, power and integration," says Afshar.
But industry analysts point out that ASSP vendors have a battle on their hands. "In this class of product, there is a lot of customisation and proprietary design and FPGAs are well suited for that," says Jag Bolaria, senior analyst at The Linley Group.
CS605x family
The CS605x extends Cortina's existing CS604x family of 40Gbps OTN transport processors, launched in April 2011. The CS605x devices aggregate 40 Gigabit Ethernet or OTN streams into 100Gbps, or map between 100 Gigabit Ethernet and OTN frames. Combining devices from the two families enables 10 and 40 Gigabit OTN/Ethernet traffic to be aggregated into 100 Gigabit streams.
The CS6051 is the 100 Gigabit family's flagship device. It can interface directly to three 40Gbps optical modules, a 100 Gigabit CFP or a 12x10Gbps CXP module. The device also supports an Interlaken interface of up to 120 Gigabit (10x12.5Gbps) to connect to devices such as network processors, traffic managers and FPGAs.
The CS6051 supports several forward error correction (FEC) codes: the standard G.709 FEC, a 9.4dB coding-gain FEC with only a 7% overhead, and an 'ultra-FEC' whose strength can be varied with the overhead, from 7% to 20%.
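What a variable-overhead FEC means in practice is that the redundancy rides on top of the client payload, so a stronger code requires a faster line rate. A minimal sketch for a 100Gbps client signal (the intermediate overhead value is arbitrary, for illustration):

```python
# Line rate needed to carry a 100 Gbps client signal as FEC overhead grows.
# More overhead generally buys more coding gain, at the cost of line rate.

payload_gbps = 100
for overhead in (0.07, 0.13, 0.20):
    line_rate = payload_gbps * (1 + overhead)
    print(f"{overhead:.0%} FEC overhead -> {line_rate:.0f} Gbps line rate")
```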
The CS6053 is similar to the CS6051 but uses a standard G.709 FEC only; it is aimed at system vendors with their own powerful FECs, such as the latest soft-decision FEC. The CS6052 supports Ethernet and OTN mapping but not aggregation, while the CS6054 supports Ethernet only. It is the CS6054 that is used for top-of-rack switches in the data centre.
The devices consume 10-12W. Samples of the CS605x family have been available since October 2011, with volume production due in the first half of this year.
Further reading:
For a more detailed discussion of the CS605x family, click on the article featured in New Electronics
FPGA transceiver speed hikes bring optics to the fore

Despite rapid increases in the transceiver speeds of field-programmable gate arrays (FPGA), the transition to optical has begun.
FPGA vendors Xilinx and Altera have increased their on-chip transceiver speeds fourfold since 2005, from 6.5Gbps to 28Gbps. But signal integrity issues and the rapid decline in reach associated with higher speeds mean optics is becoming a relevant option.
Altera has unveiled a prototype with two 12x10Gbps optical engines but has yet to reveal its product plans. Xilinx believes that FPGA optical interfaces are still several years off with requirements being met with electrical interfaces for now.
Altera unveils its optical FPGA prototype
Altera has been showcasing a field-programmable gate array (FPGA) chip with optical interfaces. The 'optical FPGA' prototype makes use of parallel optical interfaces from Avago Technologies.
Combining the FPGA with optics extends the reach of the chip's transceivers to up to 100m. Such a device, once commercially available, will be used to connect high-speed electronics on a line card without requiring exotic printed circuit board (PCB) materials. An optical FPGA will also be used to link equipment such as Ethernet switches in the data centre.
"It is solving a problem the industry is going to face," says Craig Davis, product marketing manager at Altera. "As you go to faster bit-rate transceivers, the losses on the PCB become huge."
What has been done
Altera's optical FPGA technology demonstrator couples a large FPGA - a Stratix IV EP4S100G5 - to two Avago 'MicroPod' 12x10.3 gigabit-per-second (Gbps) optical engines.
Avago's MicroPod 12x10Gbps optical engine device
The FPGA used has 28, 11.3Gbps electrical transceivers. In the optical FPGA implementation, 12 of the interfaces connect to the two MicroPods: a transmitter optical sub-assembly (TOSA) and a receiver optical sub-assembly (ROSA).
The MicroPod measures 8x8mm and uses 850nm VCSELs. The two optical engines interface to an MTP connector and consume 2-3W. Each MicroPod sits in a housing - a land grid array compression socket - that is integrated as part of the FPGA package.
"The reason we are doing it [the demonstrator] with a 10 Gig FPGA and 10 Gig transceivers is that they are known, good technologies," says Davis. "It is a production GT part and known Avago optics."
Why it matters
FPGAs, with their huge digital logic resources and multiple high-speed electrical interfaces, are playing an increasingly important role in telecom and datacom equipment as the cost to develop application-specific standard product (ASSP) devices continues to rise.
The 40nm-CMOS Stratix IV FPGA family has up to 32, 11.3Gbps transceivers, while Altera's latest 28nm Stratix V FPGAs support up to 66x14.1Gbps transceivers, or 4x28Gbps plus 32x12.5Gbps electrical transceivers on-chip.
Altera's FPGAs can implement the 10GBASE-KR backplane standard at spans of up to 40 inches. "You have got the distances on the line card, the two end connectors and whatever the distances are across a 19-inch rack," says Davis. Moving to 28Gbps transceivers, the distance reduces significantly, to several inches only. To counter such losses, expensive PCB materials must be used.
One way to solve this problem is to go optical, says Davis. Adding 12-channel 10Gbps optical engines means that the reach of the FPGAs is up to 100m, simplifying PCB design and reducing cost while enabling racks and systems to be linked.
The multimode fibre connector to the MicroPod
Developing an optical FPGA prototype highlights that chip vendors already recognise the role optical interfaces will play.
It is also good news for optical component players as the chip market promises a future with orders of magnitude greater volumes than the traditional telecom market.
The optical FPGA is one target market for silicon photonics players. One, Luxtera, has already demonstrated its technology operating at 28Gbps.
What next
Altera stresses that this is a technology demonstrator only.
The company has not made any announcements regarding when its first optical FPGA product will be launched, and whether the optical technology will enter the market interfacing to its FPGAs' 11.3Gbps, 14.1Gbps or highest-speed 28Gbps transceivers.
The underside of the FPGA, showing the 1,932-pin ball grid array
Intelligent networking: Q&A with Alcatel-Lucent's CTO
Alcatel-Lucent's corporate CTO, Marcus Weldon, talks to Gazettabyte in a Q&A. Here, in Part 1, he discusses the future of the network, why developing in-house ASICs is important, and why Bell Labs is researching quantum computing.
Marcus Weldon (left) with Jonathan Segel, executive director in the corporate CTO Group, holding the lightRadio cube. Photo: Denise Panyik-Dale
Q: The last decade has seen the emergence of Asian Pacific players. In Asia, engineers’ wages are lower while the scale of R&D there is hugely impressive. How is Alcatel-Lucent, active across a broad range of telecom segments, ensuring it remains competitive?
A: Obviously we have a Chinese presence ourselves and also in India. It varies by division but probably half of our workforce in R&D is in what you would consider a low-cost country. We are already heavily present in those areas and that speaks to the wage issue.
But we have decided to use the best global talent. This has been a trait of Bell Labs in particular but also of the company. We believe one of our strengths is the global nature of our R&D. We have educational disciplines from different countries, and different expertise and engineering foci etc. Some of the Eastern European nations are very strong in maths, engineering and device design. So if you combine the best of those with the entrepreneurship of the US, you end up with a very strong mix of an R&D population that allows for the greatest degree of innovation.
We have no intention to go further towards a low-cost country model. There was a tendency for that a couple of years ago but we have pulled back as we found that we were losing our innovation potential.
We are happy with the mix we have even though the average salary is higher as a result. And if you take government subsidies into account in European nations, you can get almost the same rate for a European engineer as for a Chinese engineer, as far as Alcatel-Lucent is concerned.
One more thing: Chinese university students, interestingly, work so hard to get into university that university itself is a period where they actually slack off. There have been several articles in the media about this. In the four years students spend at university, away from home for the first time, they tend to relax.
Chinese companies were complaining that the quality of engineers out of university was ever decreasing because of what was, they argued, essentially a slacker generation of overworked high-school students who relaxed at college. The companies found they had to retrain these people once employed to bring them to the level needed.
So that is another small effect which you could argue is a benefit of not being in China for some of our R&D.
Alcatel-Lucent's Bell Labs: Can you spotlight noteworthy examples of research work being done?
Certainly the lightRadio cube work is pure Bell Labs. The adaptive antenna array design, to give you an example, was done between the US - Bell Labs' Murray Hill - and Stuttgart, so two non-Asian Bell Labs sites were involved in the innovations. These are wideband designs that can operate at any frequency and are technology-agnostic, so they can operate for GSM, 3G and LTE (Long Term Evolution).
"We believe that next-generation network intelligence, 10-15 years from now, might rely on quantum computing"
The designs can also form beams so you can be very power-efficient. Power efficiency in the antenna is great as you want to put the power where it is needed and not just have omni (directional) as the default power distribution. You want to form beams where capacity is needed.
That is clearly a big part of what Bell Labs has been focussing on in the wireless domain, as well as all the overlaying technologies that allow you to do beam-forming. Power amplifier efficiency is another: an inefficient amplifier is another way you lose power and operate at a more costly operational expense. The magic inside that is another Bell Labs focus in wireless.
In optics, it is moving from 100 Gig to 400 Gig coherent. We are one of the early innovators in 100 Gig coherent and we are now moving forward to higher-order modulation and 400 Gig.
On the DSL side, it is the vectoring/crosstalk-cancellation work, where we have developed our own ASIC because the market could not meet the need we had. The algorithms ended up producing a component that will be in the first release of our products, to maintain a market advantage.
We do see a need for some specialised devices: the FlexPath FP3 network processor, the IPTV product, the OTN (Optical Transport Network) switch at the heart of our optical products - our own ASIC - and the vectoring/crosstalk-cancellation engine in our DSL products. Those are the innovations Bell Labs comes up with, and very often they lead to our portfolio innovations.
There is also a lot of novel stuff like quantum computing that is on the fringes of what people think telecoms is going to leverage but we are still active in some of those forward-looking disciplines.
We have quite a few researchers working on quantum computing, leveraging some of the material expertise that we have to fabricate novel designs in our lab and then create little quantum computing structures.
Why would quantum computing be useful in telecom?
It is very good for parsing and pattern matching. So when you are doing complex searches or analyses, then quantum computing comes to the fore.
We do believe there will be processing that will benefit from quantum computing constructs to make decisions in ever-increasingly intelligent networks. Quantum computing has certain advantages in terms of its ability to recognise complex states and do complex calculations. We believe that next-generation network intelligence, 10-15 years from now, might rely on quantum computing.
We don't have a clear application in mind other than we believe it is a very important space that we need to be pioneering.
"Operators realise that their real-estate resource - including down to the central office - is not the burden that it appeared to be a couple of years ago but a tremendous asset"
You wrote a recent blog on the future of the network. You mentioned the emergence of one network with the melding of wireless and wireline, and that this will halve the total cost of ownership. This is impressive, but is it enough?
The half number relates to the lightRadio architecture, and there are many ingredients in it. The most notable is that traffic growth is accounted for in that halving of the total cost of ownership. We calculated what the likely traffic demand would be going forward: a 30-fold increase in five years.
Based on that growth, when we computed what the lightRadio architecture - the adaptive antenna arrays, small cells and the move to LTE - delivers when mapped onto that traffic demand, the number that comes out is that you can build the network for that demand, with those new technologies, and still halve the total cost of ownership.
It really is quite a bit more aggressive than it appears because it is taking account of a very significant growth in traffic.
Can we build that network and still lower the cost? The answer is yes.
You also say that intelligence will be increasingly distributed in the network, taking advantage of Moore's Law. This raises two questions. First, when does it make sense to make your own ASICs?
When I say ASICs I include FPGAs. FPGAs are your own design just on programmable silicon and normally you evolve that to an ASIC design once you get to the right volumes.
There is a thing called an NRE (non-recurring engineering) cost - a non-refundable engineering cost to produce an ASIC in a fab. So you have to have a certain volume that makes it worthwhile to produce that ASIC, rather than keeping the design in an FPGA, which is a more expensive component because it is programmable and has excess logic. The economics say an FPGA is the right way for sub-10,000 volumes per annum, whereas for millions of parts you would do an ASIC.
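The volume argument Weldon describes reduces to a break-even calculation. A minimal sketch; the cost figures are illustrative assumptions, not Alcatel-Lucent's numbers:

```python
# FPGA-versus-ASIC economics: the ASIC's one-off NRE cost must be
# amortised over enough units to beat the FPGA's higher unit price.
# All figures below are illustrative assumptions.

nre_cost = 5_000_000     # one-off ASIC engineering and mask cost ($)
asic_unit_cost = 100     # per device, once in production ($)
fpga_unit_cost = 1_000   # high-end FPGAs carry a steep unit price ($)

break_even = nre_cost / (fpga_unit_cost - asic_unit_cost)
print(f"Break-even volume: ~{break_even:,.0f} units")  # ~5,600 units

for volume in (1_000, 10_000, 1_000_000):
    fpga_total = volume * fpga_unit_cost
    asic_total = nre_cost + volume * asic_unit_cost
    winner = "FPGA" if fpga_total < asic_total else "ASIC"
    print(f"{volume:>9,} units: {winner} is cheaper")
```

With these assumptions the crossover sits in the thousands of units, consistent with the sub-10,000 rule of thumb Weldon cites.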
We work on both those types of designs. And generally, and I think even Huawei would agree with us, a lot of the early innovation is done in FPGAs because you are still playing with the feature set.
Photo: Denise Panyik-Dale
Often there is no standard at that point, there may be preliminary work that is ongoing, so you do the initial innovation pre-standard using FPGAs. You use a DSP or FPGA that can implement a brand new function that no one has thought of, and that is what Bell Labs will do. Then, as it starts becoming of interest to the standard bodies, you have it implemented in a way that tries to follow what the standard will be, and you stay in a FPGA for that process. At some point later, you take a bet that the functionality is fixed and the volume will be high enough, and you move to an ASIC.
So it is fairly commonplace for novel technology to be implemented by the [system] vendors, and only in the end stage, when it has become commoditised, does it move to commercial silicon, meaning a Broadcom or a Marvell.
Also around the novel components we produce there are a whole host of commercial silicon components from Texas Instruments, Broadcom, Marvell, Vitesse and all those others. So we focus on the components where the magic is, where innovation is still high and where you can't produce the same performance from a commercial part. That is where we produce our own FPGAs and ASICs.
Is this trend becoming more prevalent? And if so, is it because of the increasing distribution of intelligence in the network?
I think it is, but only partly because of intelligence. The other part is speed. We are reaching the real edges of processing speed, and generally the commercial parts are not at the nanometre of [CMOS process] technology that can keep up.
To give an example, our FlexPath processor for the router product we have is on 40nm technology. Generally ASICs are a technology generation behind FPGAs. To get the power footprint and the packet-processing performance we need, you can't do that with commercial components. You can do it in a very high-end FPGA but those devices are generally very expensive because they have extremely low yields. They can cost hundreds or thousands of dollars.
The tendency is to use FPGAs for the initial design but very quickly move to an ASIC because those [FPGA] parts are so rare and expensive; nor do they have the power footprint that you want. So if you are running at very high speeds - 100Gbps, 400Gbps - you run very hot, it is a very costly part, and you quickly move to an ASIC.
Because of intelligence [in the network] we need to be making our own parts but again you can implement intelligence in FPGAs. The drive to ASICs is due to power footprint, performance at very high speeds and to some extent protection of intellectual property.
FPGAs can be reverse-engineered so there is some trend to use ASICs to protect against loss of intellectual property to less salubrious members of the industry.
Second, how will intelligence impact the photonic layer in particular?
You have all these dimensions you can trade off against each other. There are things like flexible bit-rate optics and flexible modulation schemes to accommodate that. And there is the intelligence of soft-decision FEC (forward error correction), where you squeeze more out of a channel by not just making a hard decision - is it a '0' or a '1'? - but giving the decoder a hint as to whether it is likely to be a '0' or a '1'. That improves your signal-to-noise ratio, which allows you to go further with a given optics.
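The hard/soft distinction can be made concrete: a hard decision keeps only the sign of the received sample, while a soft decision also keeps a confidence value - a log-likelihood ratio (LLR) - that the FEC decoder can exploit. A minimal sketch for binary signalling over a Gaussian channel:

```python
# Hard versus soft decisions on a noisy received sample.
# A '0' is transmitted as -1.0 and a '1' as +1.0.

def hard_decision(sample: float) -> int:
    """Keep only the sign: is it a '0' or a '1'?"""
    return 1 if sample > 0 else 0

def soft_decision(sample: float, noise_var: float) -> float:
    """Log-likelihood ratio: the sign gives the bit, the magnitude the
    confidence. (Standard result for binary signalling in Gaussian noise.)"""
    return 2 * sample / noise_var

for sample in (0.9, 0.1, -0.8):
    llr = soft_decision(sample, noise_var=0.5)
    print(f"sample {sample:+.1f}: hard -> {hard_decision(sample)}, "
          f"soft LLR -> {llr:+.1f}")
# The +0.1 sample decodes as '1' either way, but its small LLR tells a
# soft-decision decoder to treat that bit as unreliable.
```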
So you have several intelligent elements that you are going to co-ordinate to have an adaptive optical layer.
I do think that is the largest area.
Another area is smart or next-generation ROADMs - what we call colourless, contentionless and directionless.
There is a sense that as you start distributing resources in the network - caching resources and computing resources - there will be far more meshing in the metro network. There will be a need to route traffic optically to locally positioned resources - highly distributed data centre resources - and so there will be more photonic switching of traffic. Think of it as photonic offload to a local resource.
We are increasingly seeing operators realise that their real-estate resource - including down to the central office - is not the burden that it appeared to be a couple of years ago but a tremendous asset if you want to operate a private cloud infrastructure and offer it as a service, as you are closer to the user with lower latency and more guaranteed performance.
So if you think about that infrastructure, with highly distributed processing resources and offloading that at the photonic layer, essentially you can easily recognise that traffic needs to go to that location. You can argue that there will be more photonic switching at the edge because you don't need to route that traffic, it is going to one destination only.
This is an extension of the whole idea of the converged backbone architecture we have, with interworking between the IP and optical domains: you don't route traffic that you don't need to route. If you know it is going to a peering point, you can keep that traffic in the optical domain and not send it up through the routing core to be constantly routed when you know from the start where it is going.
So as you distribute computing and caching resources, you would offload in the optical layer rather than attempt to packet-process everything.
There are smarts at that level too - photonic switching - as well as the intelligent photonic layer.
For the second part of the Q&A, click here
