The key elements of NFV usage: A guide
Orchestration, service assurance, service fulfilment, automation and closed-loop automation. These are important concepts associated with network functions virtualisation (NFV) technology being adopted by telecom operators as they transition their networks to become software-driven and cloud-based.
Prayson Pate (pictured), CTO of the Ensemble division at ADVA Optical Networking, explains the technologies and their role and gives each a status update.
Orchestration
Network functions virtualisation (NFV) is based on the idea of replacing physical appliances - telecom boxes - with software running on servers performing the same networking role.
Using NFV speeds up service development and deployment while reducing equipment and operational costs.
It also allows operators to work with multiple vendors rather than be dependent on a single vendor providing the platform and associated custom software.
Operators want to adopt software-based virtual network functions (VNFs) running on standard servers, storage and networking, referred to as NFV infrastructure (NFVI).
In such an NFV world, the term orchestration refers to the control and management of virtualised services, composed of virtual network functions and executed on the NFV infrastructure.
The use of virtualised services has created the need for a new orchestration layer that sits between the existing operations support system-billing support system (OSS-BSS) and the NFV infrastructure (see diagram below). This orchestration layer performs the following tasks:
- Manages the catalogue of virtual network functions developed by vendors and by the open-source communities.
- Translates incoming service requests to create the virtualised implementation using the underlying infrastructure.
- Links the virtual network functions as required to create a service, referred to as a service chain. This service chain may be on one server or it may be distributed across the network.
- Performs the various management tasks for the virtual network functions: setting them up, scaling them up and down, updating and upgrading them, and terminating them - the ‘lifecycle management’ of virtual network functions. The orchestrator also ensures their resiliency.
The ETSI standards body, the NFV Industry Specification Group (ETSI NFV ISG), leads the industry effort to define the architecture for NFV, including orchestration.
Several companies are providing proprietary and pre-standard NFV orchestration solutions, including ADVA Optical Networking, Amdocs, Ciena, Ericsson, IBM, Netcracker and others. In addition, there are open source initiatives such as the Linux Foundation Networking Fund’s Open Network Automation Platform (ONAP), ETSI NFV ISG’s Open Source MANO (OSM) and OpenStack’s Tacker initiative.
Source: ETSI GS NFV 002
Service assurance
Service providers promise that their service will meet a certain level of performance defined in the service level agreement (SLA).
Service assurance refers to the measurement of parameters such as packet loss and latency associated with a service; parameters which are compared against the SLA. Service assurance also remedies any SLA shortfalls. More sophisticated parameters can also be measured such as privacy and responses to distributed denial-of-service (DDOS) attacks.
NFV enables telcos to create and launch services more quickly and economically. But an end customer only cares about the service, not the underlying technology. Customers will not stand for a less reliable service or a service with inferior performance just because it is implemented as a virtual function on a server.
Service assurance is not a new concept, but the nature of a virtualised implementation means a new approach is required. No longer is there a one-to-one association between services and network elements, so the linkages between services, the building-block virtual network functions, and the underlying virtual infrastructure need to be understood. Just as the services are virtualised, so the service assurance process needs virtualised components such as virtual probes and test heads.
The telcos’ operations groups are concerned about how to deploy and support virtualised services. Innovations in service assurance will make their job easier and enable them to do what they could not do before.
EXFO, Ixia, Spirent, and Viavi supply virtual probes and test heads. These may be used for initial service verification, ongoing monitoring, and active troubleshooting. Active troubleshooting is a powerful concept as it enables an operator to diagnose issues without dispatching a technician and equipment.
Service fulfilment
Service fulfilment refers to the configuration and delivery of a service to a customer at one or more locations.
Service fulfilment is essential for an operator because it is how orders are turned into revenue. The more quickly and accurately a service is fulfilled, the sooner the operator gets paid. Prompt fulfilment also leads to greater customer satisfaction and reduced churn.
Early-adopter operators see NFV as a way to improve service fulfilment. Verizon is using its NFV-based service offering to speed up service fulfilment. When a customer orders a service, Verizon instructs the manufacturer to ship a server to the customer. Once connected and powered at the customer’s site, the server calls home and is configured. Combined with optional LTE, a customer can get a service on demand without waiting for a technician. This significantly improves the traditional model where a customer may wait weeks before being able to use the telco’s service.
Network automation
Network automation uses machines instead of trained staff to operate the network. For NFV, the automated software tasks include configuration, operation and monitoring of network elements.
The benefits of network automation include speed and accuracy of service fulfilment - humans can err - along with reduced operational costs.
Telcos have been using network automation for high-volume services and to manage complexity. That said, many operators include manual steps in their process. Such a hands-on approach doesn’t work with cloud technologies such as NFV. Cloud customers can acquire, deploy and operate services without any manual interaction from the webscale players. Likewise, NFV must be automated if telecom operators are to benefit from its potential.
Network automation is closely tied to orchestration. Commercial suppliers and open-source groups are working to ensure that service orders flow automatically from high-level systems down to implementation, dubbed flow-through provisioning and that ‘zero-touch’ provisioning that removes all manual steps becomes a reality. But for this to happen, open and standard interfaces are needed.
Closed-loop automation
Closed-loop automation adds a feedback loop to network automation. The feedback enables the automation to take into account changing network conditions such as loading and network failures, as well as dynamic service demands such as bandwidth changes or services wanted by users.
Closed-loop automation compares the network’s state against rules and policies, replacing what were previously staff decisions. These systems are sometimes referred to as intent-based, as they focus on the desired intent or result rather than on the inputs to the network controls.
Service providers are also investigating adding artificial intelligence and machine learning to closed loop automation. Artificial intelligence and machine learning can replace the hard-coded rules with adaptive and dynamic pattern recognition, allowing anomalies to be detected, adapted to, and even predicted.
Closed-loop automation offloads operational teams not only from manual control but also from manual management processes. Human decisions and planning are replaced by policy-driven control, while human reasoning is replaced by artificial intelligence and machine learning algorithms.
Policy systems or ‘engines’ have existed for a while for functions such as network and file access, but these engines were not closed-loop; there was no feedback. These policy concepts have now been updated to include desired network state, such that a feedback loop is needed to compare the current status with the desired one.
A closed-loop automation system makes dynamic changes to ensure a targeted operational state is reached even when network or service conditions change. This approach enables service providers to match capacity with demand, solve traffic management and network quality issues, and manage 5G and Internet of Things upgrades.
Closed loop automation is complex. Employing artificial intelligence and machine learning will require interfaces to be defined that allow network data into the intelligent systems and enable the outputs to be used.
Several suppliers have announced products supporting closed-loop automation or intent-based networking, including Apstra, Cisco Systems, Forward Networks, Juniper Networks, Nokia, and Veriflow Systems. In addition, the open source ONAP project is also pursuing work in this area.
Will white boxes predominate in telecom networks?
Will future operator networks be built using software, servers and white boxes or will traditional systems vendors with years of network integration and differentiation expertise continue to be needed?
AT&T’s announcement that it will deploy 60,000 white boxes as part of its rollout of 5G in the U.S. is a clear move to break away from the operator pack.
The service provider has long championed network transformation, moving from proprietary hardware and software to a software-controlled network based on virtual network functions running on servers and software-defined networking (SDN) for the control switches and routers.
Glenn WellbrockNow, AT&T is going a stage further by embracing open hardware platforms - white boxes - to replace traditional telecom hardware used for data-path tasks that are beyond the capabilities of software on servers.
For the 5G deployment, AT&T will, over several years, replace traditional routers at cell and tower sites with white boxes, built using open standards and merchant silicon.
“White box represents a radical realignment of the traditional service provider model,” says Andre Fuetsch, chief technology officer and president, AT&T Labs. “We’re no longer constrained by the capabilities of proprietary silicon and feature roadmaps of traditional vendors.”
But other operators have reservations about white boxes. “We are all for open source and open [platforms],” says Glenn Wellbrock, director, optical transport network - architecture, design and planning at Verizon. “But it can’t just be open, it has to be open and standardised.”
Wellbrock also highlights the challenge of managing networks built using white boxes from multiple vendors. Who will be responsible for their integration or if a fault occurs? These are concerns SK Telecom has expressed regarding the virtualisation of the radio access network (RAN), as reported by Light Reading.
“These are the things we need to resolve in order to make this valuable to the industry,” says Wellbrock. “And if we don’t, why are we spending so much time and effort on this?”
Gilles Garcia, communications business lead director at programmable device company, Xilinx, says the systems vendors and operators he talks to still seek functionalities that today’s white boxes cannot deliver. “That’s because there are no off-the-shelf chips doing it all,” says Garcia.
We’re no longer constrained by the capabilities of proprietary silicon and feature roadmaps of traditional vendors
White boxes
AT&T defines a white box as an open hardware platform that is not made by an original equipment manufacturer (OEM).
A white box is a sparse design, built using commercial off-the-shelf hardware and merchant silicon, typically a fast router or switch chip, on which runs an operating system. The platform usually takes the form of a pizza box which can be stacked for scaling, while application programming interfaces (APIs) are used for software to control and manage the platform.
As AT&T’s Fuetsch explains, white boxes deliver several advantages. By using open hardware specifications for white boxes, they can be made by a wider community of manufacturers, shortening hardware design cycles. And using open-source software to run on such platforms ensures rapid software upgrades.
Disaggregation can also be part of an open hardware design. Here, different elements are combined to build the system. The elements may come from a single vendor such that the platform allows the operator to mix and match the functions needed. But the full potential of disaggregation comes from an open system that can be built using elements from different vendors. This promises cost reductions but requires integration, and operators do not want the responsibility and cost of both integrating the elements to build an open system and integrating the many systems from various vendors.
Meanwhile, in AT&T’s case, it plans to orchestrate its white boxes using the Open Networking Automation Platform (ONAP) - the ‘operating system’ for its entire network made up of millions of lines of code.
ONAP is an open software initiative, managed by The Linux Foundation, that was created by merging a large portion of AT&T’s original ECOMP software developed to power its software-defined network and the OPEN-Orchestrator (OPEN-O) project, set up by several companies including China Mobile and China Telecom.
AT&T has also launched several initiatives to spur white-box adoption. One is an open operating system for white boxes, known as the dedicated network operator system (dNOS). This too will be passed to The Linux Foundation.
The operator is also a key driver of the open-based reconfigurable optical add/ drop multiplexer multi-source agreement, the OpenROADM MSA. Recently, the operator announced it will roll out OpenROADM hardware across its network. AT&T has also unveiled the Akraino open source project, again under the auspices of the Linux Foundation, to develop edge computing-based infrastructure.
At the recent OFC show, AT&T said it would limit its white box deployments in 2018 as issues are still to be resolved but that come 2019, white boxes will form its main platform deployments.
Xilinx highlights how certain data intensive tasks - in-line security, performed on a per-flow basis, routing exceptions, telemetry data, and deep packet inspection - are beyond the capabilities of white boxes. “White boxes will have their place in the network but there will be a requirement, somewhere else in the network for something else, to do what the white boxes are missing,” says Garcia.
Transport has been so bare-bones for so long, there isn’t room to get that kind of cost reduction
AT&T also said at OFC that it expects considerable capital expenditure cost savings - as much as a halving - using white boxes and talked about adopting in future reverse auctioning each quarter to buy its equipment.
Niall Robinson, vice president, global business development at ADVA Optical Networking, questions where such cost savings will come from: “Transport has been so bare-bones for so long, there isn’t room to get that kind of cost reduction. He also says that there are markets that already use reverse auctioning but typically it is for items such as components. “For a carrier the size of AT&T to be talking about that, that is a big shift,” says Robinson.
Layer optimisation
Verizon’s Wellbrock first aired reservations about open hardware at Lightwave’s Open Optical Conference last November.
In his talk, Wellbrock detailed the complexity of Verizon’s wide area network (WAN) that encompasses several network layers. At layer-0 are the optical line systems - terminal and transmission equipment - onto which the various layers are added: layer-1 Optical Transport Network (OTN), layer-2 Ethernet and layer-2.5 Multiprotocol Label Switching (MPLS). According to Verizon, the WAN takes years to design and a decade to fully exploit the fibre.
“You get a significant saving - total cost of ownership - from combining the layers,” says Wellbrock. “By collapsing those functions into one platform, there is a very real saving.” But there is a tradeoff: encapsulating the various layers’ functions into one box makes it more complex.
“The way to get round that complexity is going to a Cisco, a Ciena, or a Fujitsu and saying: ‘Please help us with this problem’,” says Wellbrock. “We will buy all these individual piece-parts from you but you have got to help us build this very complex, dynamic network and make it work for a decade.”
Next-generation metro
Verizon has over 4,000 nodes in its network, each one deploying at least one ROADM - a Coriant 7100 packet optical transport system or a Fujitsu Flashwave 9500. Certain nodes employ more than one ROADM; once one is filled, a second is added.
“Verizon was the first to take advantage of ROADMs and we have grown that network to a very large scale,” says Wellbrock.
The operator is now upgrading the nodes using more sophiticated ROADMs, as part of its next-generation metro. Now each node will need only one ROADM that can be scaled. In 2017, Verizon started to ramp and upgraded several hundred ROADM nodes and this year it says it will hit its stride before completing the upgrades in 2019.
“We need a lot of automation and software control to hide the complexity of what we have built,” says Wellbrock. This is part of Verizon’s own network transformation project. Instead of engineers and operational groups in charge of particular network layers and overseeing pockets of the network - each pocket being a ‘domain’, Verizon is moving to a system where all the networks layers, including ROADMs, are managed and orchestrated using a single system.
The resulting software-defined network comprises a ‘domain controller’ that handles the lower layers within a domain and an automation system that co-ordinates between domains.
“Going forward, all of the network will be dynamic and in order to take advantage of that, we have to have analytics and automation,” says Wellbrock.
In this new world, there are lots of right answers and you have to figure what the best one is
Open design is an important element here, he says, but the bigger return comes from analytics and automation of the layers and from the equipment.
This is why Wellbrock questions what white boxes will bring: “What are we getting that is brand new? What are we doing that we can’t do today?”
He points out that the building blocks for ROADMs - the wavelength-selective switches and multicast switches - originate from the same sub-system vendors, such that the cost points are the same whether a white box or a system vendor’s platform is used. And using white boxes does nothing to make the growing network complexity go away, he says.
“Mixing your suppliers may avoid vendor lock-in,” says Wellbrock. “But what we are saying is vendor lock-in is not as serious as managing the complexity of these intelligent networks.”
Wellbrock admits that network transformation with its use of analytics and orchestration poses new challenges. “I loved the old world - it was physics and therefore there was a wrong and a right answer; hardware, physics and fibre and you can work towards the right answer,” he says. “In this new world, there are lots of right answers and you have to figure what the best one is.”
Evolution
If white boxes can’t perform all the data-intensive tasks, then they will have to be performed elsewhere. This could take the form of accelerator cards for servers using devices such as Xilinx’s FPGAs.
Adding such functionality to the white box, however, is not straightforward. “This is the dichotomy the white box designers are struggling to address,” says Garcia. A white box is light and simple so adding extra functionality requires customisation of its operating system to run these application. And this runs counter to the white box concept, he says.
We will see more and more functionalities that were not planned for the white box that customers will realise are mandatory to have
But this is just what he is seeing from traditional systems vendors developing designs that are bringing differentiation to their platforms to counter the white-box trend.
One recent example that fits this description is Ciena’s two-rack-unit 8180 coherent network platform. The 8180 has a 6.4-terabit packet fabric, supports 100-gigabit and 400-gigabit client-side interfaces and can be used solely as a switch or, more typically, as a transport platform with client-side and coherent line-side interfaces.
The 8180 is not a white box but has a suite of open APIs and has a higher specification than the Voyager and Cassini white-box platforms developed by the Telecom Infra Project.
“We are going through a set of white-box evolutions,” says Garcia. “We will see more and more functionalities that were not planned for the white box that customers will realise are mandatory to have.”
Whether FPGAs will find their way into white boxes, Garcia will not say. What he will say is that Xilinx is engaged with some of these players to have a good view as to what is required and by when.
It appears inevitable that white boxes will become more capable, to handle more and more of the data-plane tasks, and as a response to the competition from traditional system vendors with their more sophisticated designs.
AT&T’s white-box vision is clear. What is less certain is whether the rest of the operator pack will move to close the gap.
