Making best use of data at the network's edge 
Tuesday, August 13, 2024 at 11:57AM
Roy Rubenstein in AnyLog, EdgeLake, Federated learning, Moshe Shadmon, The Linux Foundation, edge data, telecom operators

Moshe Shadmon has always been interested in data, the type that is spread out and requires scrutiny.  

Moshe Shadmon

He read law at university but was also fascinated by maths and computers.

By the time Shadmon graduated with a law degree, he had set up a software company. He never practised law. 

"I think that part [not having an engineering degree] has always allowed me to look at things differently," he says.

More recently, Shadmon's interest in data has focussed on the network edge. Here, the data is typically spread across locations and too plentiful to fit within one machine.

"If the data needs to be managed across many machines, it is a problem," says Shadmon. "Suddenly, solutions become complicated and expensive."

 

Distributed edge data 

Edge data refers to data generated by sensors or Internet of Things (IoT) devices located at several sites. Extracting insights from such edge data is challenging.

Shadmon refers to this as a 'big data' problem, 'big' being a relative term: data volumes keep growing in step with the hardware that generates and stores them. Data generated two decades ago looks tiny compared with today's volumes. The proliferation of IoT devices, with billions now deployed, is a testament to such growth.

The real challenge with edge data lies in its management. There is currently no efficient technology to manage such distributed data - the data is raw and has no universal format. It is an issue many players in the industry can relate to. 

Adding management software to the endpoints is also a challenge, as edge hardware typically has limited resources. Alternatively, moving the data to the cloud, where software tools and ample computing are available, is expensive: processing and storage must be rented, and networking is needed to upload the data.

"Companies move the data to the cloud or into centralized databases, not because it's a great way to deal with the data, but because they don't have a choice," says Shadmon.

It is these edge data challenges that Shadmon's start-up company, AnyLog, is addressing.

Shadmon founded AnyLog six years ago. AnyLog spent its first five years developing the edge data management platform. In the last year, AnyLog has been demonstrating its working product and is collaborating with large companies, such as IBM, that are building offerings using its technology.

AnyLog has also contributed an open-source version of its edge data technology to the Linux Foundation, a project known as EdgeLake.

 

The technology's workings

The hardware at the edge typically comprises one or more sensors, a programmable logic controller—an industrial computer interfaced to sensors—and an edge 'node' that extracts the data. The node may be a switch, a gateway, or a server next to the sensors and is typically connected to a network. 

AnyLog has developed software that resides at the node. "You plug it on the node, and it's a stack of services that manages the data in an automated way," says Shadmon. "You could think of it as the equivalent of the data services you have in the cloud."

The software does two clever things.

It adds a virtual layer that makes the data across all the nodes of interest appear centralised. Any one of these nodes can be queried, and the software identifies the locations where the relevant data resides to satisfy the query. The outcome is identical to a setup where all the data is stored in the cloud, except that here the data remains at the edge.

Blockchain technology is used to locate and manage the distributed data. According to Shadmon, this is transparent to the end user: the blockchain holds the 'metadata' - data about the data - and serves as a directory to identify where the needed data is located.
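
As a rough sketch of the idea (the field names below are invented for illustration; the article does not describe AnyLog's actual metadata schema), each directory entry might map a slice of the data to the edge node that holds it:

```python
# Hypothetical directory entry: the field names are invented for illustration
# and are not AnyLog's actual metadata schema.
directory_entry = {
    "table": "electricity",            # logical data set
    "location": "San Francisco",       # where the readings originate
    "node": "node-17.sf.example.net",  # edge node holding this slice of data
    "from_ts": "2024-08-13T10:57:00Z", # time range covered on that node
    "to_ts": "2024-08-13T11:57:00Z",
}
```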

Shadmon cites a smart city example where the query is to quantify the electricity usage in San Francisco in the last hour. There may be thousands of nodes hosting the data. The technology identifies the nodes holding the relevant electricity-usage data for San Francisco. These nodes are accessed, and they return their data to the first node, which then performs the aggregation.
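
A minimal Python sketch of that scatter-gather flow, with every function name and address a hypothetical stand-in rather than an AnyLog API, might look as follows:

```python
# Sketch of the scatter-gather query described above. All names are
# hypothetical stand-ins: the directory lookup, the per-node query and
# the final aggregation are stubbed for illustration.

from concurrent.futures import ThreadPoolExecutor

SQL = ("SELECT SUM(kwh) FROM electricity "
       "WHERE location = 'San Francisco' AND ts >= NOW() - INTERVAL '1 hour'")

def nodes_holding(table, location):
    """Ask the metadata directory which edge nodes hold relevant data."""
    # Simulated lookup; in AnyLog's design the directory lives on a blockchain.
    return ["node-3.sf.example.net", "node-9.sf.example.net",
            "node-17.sf.example.net"]

def query_node(node, sql):
    """Run the query on one edge node; stubbed to return a partial sum in kWh."""
    return {"node-3.sf.example.net": 412.5,
            "node-9.sf.example.net": 390.0,
            "node-17.sf.example.net": 118.2}[node]

# The node that first receives the query fans it out, then aggregates the replies.
nodes = nodes_holding("electricity", "San Francisco")
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda n: query_node(n, SQL), nodes))

print(f"San Francisco electricity use, last hour: {sum(partials):.1f} kWh")
```

Only the partial results cross the network; the raw readings stay on their nodes.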

Data may also be more substantial than time-stamped electricity usage numbers. For example, the data could be video streams from high-definition cameras across multiple locations.

The key benefit of AnyLog's approach is that only the needed data is read wherever it is stored. This avoids moving and processing all the data from multiple locations into the cloud. Moreover, any of the nodes can be queried to satisfy a request.

"If you don't get the performance you need, you can add more nodes and increase the data distribution," says Shadmon. "Now you will have a higher degree of parallelism and less data on each node; it's a very scalable model." 

AnyLog's technology can also be used for machine learning at the edge, a market opportunity that excites Shadmon. 

 

AI at the edge 

Engineers must decide how they apply machine learning to data at the edge.

The necessary AI training and inferencing hardware can be deployed at the edge, but only if the application justifies such a solution. More commonly, the data is first moved to the cloud, especially when the edge data is spread across locations. Once the data is in the cloud, AI hardware and software can be applied.  

"What companies want to do is to enable AI in real-time in a simple and cost-effective way," says Shadmon. Cloud is used not because it's a great solution but because the alternative - building and trying to deal with the data at the edge - is much more complicated, he says. 

AnyLog's proprietary solution — and the Linux Foundation open-source EdgeLake equivalent — enables the training of an AI model using federated learning without having to move the local data.

Source: AnyLog.

The data at each node is used for local training, creating a 'sub-model'. The AnyLog software can locate and aggregate all the sub-models to form the complete training model, which is then pushed to each node for AI inferencing at the network edge. The AI learning cycle is repeated - see diagram - to incorporate new data as it is generated.
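
The article does not say which aggregation scheme AnyLog uses; federated averaging (FedAvg), the standard approach, gives a feel for the cycle. Below is a minimal sketch in Python/NumPy, with synthetic data and hypothetical function names, that trains a sub-model on each node and combines them, weighted by how much data each node holds:

```python
# Minimal federated-averaging sketch of the cycle described above.
# FedAvg is the standard aggregation method; it is used here for
# illustration only, not as AnyLog's confirmed implementation.

import numpy as np

def local_train(weights, local_data, lr=0.01, epochs=5):
    """Train a sub-model on one node's data (linear regression by gradient descent)."""
    w = weights.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w, len(y)

def aggregate(sub_models):
    """Combine the sub-models into one global model, weighted by sample count."""
    total = sum(n for _, n in sub_models)
    return sum(w * (n / total) for w, n in sub_models)

rng = np.random.default_rng(0)
# Three edge nodes, each holding its own local data (synthetic here).
nodes_data = [(rng.normal(size=(100, 3)), rng.normal(size=100)) for _ in range(3)]

global_w = np.zeros(3)
for _ in range(10):                                        # the repeated learning cycle
    subs = [local_train(global_w, d) for d in nodes_data]  # local training per node
    global_w = aggregate(subs)                             # aggregate into global model
    # global_w would now be pushed back to each node for inferencing
print("Global model weights:", global_w)
```

The weighting means nodes with more local data pull the global model proportionally harder, while the raw data itself never leaves the nodes.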

"All of this is automated," says Shadmon. 

 

Bypassing the cloud players 

Today, the telcos are the leading connectivity providers uploading data from the edge to the cloud. 

"But they are not just moving the data; the telcos are also moving the business from the edge to the cloud," says Shadmon. It is the cloud computing players, not the telcos, that benefit from data hosting and data processing.   

However, by virtualising the data, a telco's network also serves the end-user's data requirements; the cloud players are bypassed. Here is an edge opportunity for the telcos. For once, they can take business away from the cloud providers, says Shadmon: "Every per cent of data that remains at the edge and doesn't go to the cloud is a multi-billion-dollar opportunity for the telcos."

AnyLog is in discussion with several telcos.
