
How to design production-ready AI solutions

AI in telecom

With new technologies such as 5G, edge computing and IoT continuing to advance, telecom network operations are becoming increasingly complex. The mix of legacy and new technologies, multiple frequency bands and spectrum, and heterogeneous devices and applications all contribute to the challenges facing future networks. AI and ML technologies can be transformational in helping achieve the desired network performance while meeting energy efficiency, security and latency requirements.

Telecommunications networks generate huge amounts of data, which presents great potential for AI and ML. However, traditional telecom operations were originally built for telephone calls and later evolved to support more dynamic data connections from 4G/LTE onward. The networks and their data were largely static, designed mainly to generate alarms and to operate in a break/fix pattern. Implementing the system and process changes needed to support AI and ML, and to address the more dynamic nature of future networks, is therefore critical and highly beneficial for telecom operators. For instance, basic log or telemetry data does not contain enough information to diagnose a network issue accurately. By adding transactional data that indicate the sequence of events, together with curated error codes, and applying the latest natural language processing and deep learning techniques, AI/ML solutions can recommend resolutions for an issue with accuracy above human performance [1].

Distributed nature of 5G and future networks

Data is crucial for any AI or ML application. In 5G and future networks, data can be collected and processed in a highly distributed manner (Figure 1): at a radio base station, a regional data center or a central data center.

Figure 1: Data collected in a distributed manner in a network.

The time and space complexity of the data and the ML model vary greatly depending on where the data is processed (see Figure 2). The design that best suits a given situation depends on the latency and performance requirements of the applications that consume the AI/ML output.


Figure 2: Time and space complexity of ML model and data.

Compute and communication restrictions

AI and ML workloads ultimately run on compute resources. If the application that consumes the AI/ML output, the data and the AI/ML model all sit in the cloud, with elastic compute and sufficient communication bandwidth (effectively unlimited compute and bandwidth), performance and latency requirements are relatively easy to meet. Things become more challenging when the consuming application is located in a resource-constrained and/or bandwidth-limited environment, for example at the edge. Figure 3 illustrates how compute and communication restrictions shape the design choices for AI/ML solutions.


Figure 3: Design decisions with respect to time and space complexity of AI/ML model and data.

Centralized training/retraining and inference

When a large quantity of data is needed, or the model requires extensive compute to reach the required accuracy/performance, the data typically need to be processed and stored at a centralized location where more compute, storage and memory resources are available (see Figure 4). Because central locations are normally farther away from the application, and because of the size of the data, the complexity of the model and the communication limitations, execution usually cannot be achieved in real time. Training data must be relayed to and stored on the central server for training and retraining. Inference data must also be relayed to the central server for the inference compute, after which the model output is relayed back to the edge and consumed by the application.


Figure 4: Training, retraining and inference all happen on the central server.
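To make the round-trip argument concrete, the sketch below estimates end-to-end latency for centralized inference under a deliberately simple model: one network round trip, plus the time to push the request and response through the link, plus central compute time. It ignores queuing, protocol overhead and retransmissions, and the `Link` class, the function names and all numbers are hypothetical illustrations rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical edge-to-central link; values are illustrative, not measured."""
    bandwidth_mbps: float  # usable throughput in megabits per second
    rtt_ms: float          # network round-trip time in milliseconds


def transfer_ms(payload_bytes: int, link: Link) -> float:
    """Time to push a payload through the link, in milliseconds."""
    bits = payload_bytes * 8
    return bits / (link.bandwidth_mbps * 1e6) * 1e3


def central_inference_latency_ms(request_bytes: int, response_bytes: int,
                                 compute_ms: float, link: Link) -> float:
    """Edge -> central -> edge: one round trip, upload, central compute, download."""
    return (link.rtt_ms
            + transfer_ms(request_bytes, link)
            + compute_ms
            + transfer_ms(response_bytes, link))


def meets_budget(latency_ms: float, budget_ms: float) -> bool:
    """True if the estimated latency fits the application's latency budget."""
    return latency_ms <= budget_ms


# Example: 1 MB of inference input, 1 kB of output, 5 ms of central compute.
link = Link(bandwidth_mbps=100.0, rtt_ms=20.0)
latency = central_inference_latency_ms(1_000_000, 1_000, compute_ms=5.0, link=link)
print(f"{latency:.2f} ms, fits 10 ms budget: {meets_budget(latency, 10.0)}")
```

With a 100 Mbit/s link and a 20 ms round trip, uploading a 1 MB inference payload alone costs about 80 ms, so a 10 ms application budget cannot be met centrally no matter how fast the central server is. Shrinking the payload or moving inference closer to the application are the levers discussed in the following sections.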

One benefit of this design is that it is highly scalable and relatively easy to maintain. Centralized automated processes and workflows can be set up to monitor data quality, apply retraining policies and deliver inference results automatically. The same centrally deployed infrastructure can be reused across multiple models, applications and sites. However, this design is only suitable for systems that require high performance, can tolerate longer latency, and have enough communication bandwidth to meet their latency requirements.
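A centralized retraining policy of the kind described above can be sketched as a small rule set evaluated on a schedule. This example assumes three hypothetical triggers: data drift measured with the Population Stability Index (PSI) over binned feature distributions, accuracy on recently labelled data, and model age. The thresholds (PSI above 0.2, accuracy below 0.9, older than 30 days) are illustrative assumptions, not values from the production systems described here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions given as proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


@dataclass
class RetrainPolicy:
    max_psi: float = 0.2        # assumed drift threshold
    min_accuracy: float = 0.9   # assumed minimum accuracy on recent labelled data
    max_age_days: int = 30      # assumed maximum model age

    def reasons_to_retrain(self, baseline_bins: Sequence[float], recent_bins: Sequence[float],
                           recent_accuracy: float, model_age_days: int) -> List[str]:
        """Return the triggers that fired; an empty list means keep the current model."""
        reasons = []
        drift = psi(baseline_bins, recent_bins)
        if drift > self.max_psi:
            reasons.append(f"drift: PSI={drift:.3f}")
        if recent_accuracy < self.min_accuracy:
            reasons.append(f"accuracy: {recent_accuracy:.2f}")
        if model_age_days > self.max_age_days:
            reasons.append(f"age: {model_age_days} days")
        return reasons


policy = RetrainPolicy()
print(policy.reasons_to_retrain([0.25] * 4, [0.7, 0.1, 0.1, 0.1], 0.95, 10))
```

Returning the list of reasons, rather than a bare boolean, lets the central workflow log why each retraining job was triggered, which helps when the same policy governs many models and sites.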

Distributed training/retraining and inference

At the other end of the spectrum, if ultra-low latency is the priority, the data must be stored and processed close to or on the edge, where compute, storage and memory are often limited. In this case the AI/ML model must be simple enough to run in real time on the limited resources available at the edge (Figure 5).

The benefit of this design is that it is lean and fast. However, each instance requires its own infrastructure, workflow and monitoring process, and maintaining many such instances at scale requires additional synchronization and coordination.


Figure 5: Training, retraining and inference all happen on the edge.
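On the edge, "simple enough to run in real time" can mean a model whose training, retraining and inference each cost a few arithmetic operations per sample. As a hypothetical illustration (not the model used in any system described here), the detector below keeps an exponentially weighted mean and variance of a single KPI stream in constant memory, flags a sample as anomalous when it lies more than `threshold` standard deviations from the running mean, and then folds the sample into its state, so every new observation also retrains the model.

```python
import math


class EwmaDetector:
    """Hypothetical constant-memory anomaly detector for one KPI stream."""

    def __init__(self, alpha: float = 0.1, threshold: float = 3.0, warmup: int = 10):
        self.alpha = alpha          # weight of the newest sample
        self.threshold = threshold  # alarm distance in standard deviations
        self.warmup = warmup        # samples to observe before raising alarms
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, x: float) -> bool:
        """Score x against the current state, then fold it in. True means anomalous."""
        if self.count == 0:
            self.mean = x
            self.count = 1
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (self.count >= self.warmup and std > 0
                      and abs(diff) > self.threshold * std)
        # Incremental exponentially weighted mean and variance update.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return is_anomaly


detector = EwmaDetector()
stream = [10.0, 10.5] * 25 + [30.0]  # steady KPI followed by one spike
flags = [detector.update(x) for x in stream]
print([i for i, flagged in enumerate(flags) if flagged])
```

For simplicity, anomalous samples are folded into the state as well, which inflates the variance after a large spike; a production detector might skip or clip such samples. The `alpha`, `threshold` and `warmup` defaults are arbitrary assumptions.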

Centralized/distributed training, distributed inference

When the latency requirements lie somewhere between these two extremes, there is more flexibility in how workloads are distributed and optimized. Optimizing the distribution of computation and communication load under latency, privacy, security and performance requirements is also an active research area in distributed machine learning [2], for example federated learning and split learning.
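Federated learning keeps raw data on the edge and exchanges only model parameters. A common aggregation rule is federated averaging (FedAvg), in which the central server averages the clients' locally trained weights, weighted by how many samples each client trained on. The sketch below shows only that aggregation step on plain Python lists; it illustrates the general technique, not the method proposed in [2], and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fed_avg(client_updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's local sample count."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(client_updates[0][0])
    global_w = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            global_w[i] += w * n / total
    return global_w


# Two hypothetical edge sites: site B trained on three times as much data.
print(fed_avg([([1.0, 2.0], 100), ([3.0, 4.0], 300)]))
```

In a full round, the server would broadcast the averaged weights back to the edge sites, each site would train locally for a few epochs, and the cycle would repeat.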

The scenario in Figure 6 illustrates the case where ETL (extract, transform, load) for batch training data requires more compute and processing power than the edge device can provide. Here, the batch data is relayed to the central server, where the heavy-lifting part of the ETL is completed. The processed data is then transferred back to the edge for training. This approach is useful when dimension reduction can be applied to the batch data, scaling training and inference down to a workload the edge can manage. Since the relay only affects training data, the application can still be expected to achieve the desired inference latency. Because training still happens on the edge, this design is only suitable for models that do not require extensive compute to train.


Figure 6: Centralize part of the ETL process, distribute everything else.
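The split in Figure 6 can be illustrated with a deliberately simple form of dimension reduction: the central server computes the variance of every feature over the batch and keeps only the k most variable ones, and the edge then applies that column selection to each record. Variance-based feature selection is an assumed stand-in here; PCA or learned embeddings would follow the same pattern of fitting centrally and applying cheaply at the edge. Function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def select_features_centrally(batch: Sequence[Sequence[float]], k: int) -> List[int]:
    """Central, heavier ETL step: indices of the k highest-variance features."""
    n_features = len(batch[0])
    variances = [pvariance([row[j] for row in batch]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])  # keep the original column order


def project_on_edge(record: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Edge, lightweight step: apply the centrally chosen column selection."""
    return [record[j] for j in keep]


# Columns 0 and 2 are constant in this toy batch, so they carry no information.
batch = [[1.0, 0.0, 5.0, 100.0],
         [1.0, 10.0, 5.0, 200.0],
         [1.0, 20.0, 5.0, 300.0]]
keep = select_features_centrally(batch, k=2)  # sent to the edge once
reduced_batch = [project_on_edge(row, keep) for row in batch]  # smaller training set
print(keep, project_on_edge([1.0, 7.0, 5.0, 150.0], keep))
```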

Figure 7 illustrates another scenario, in which the initial model is trained centrally. This design offloads the resource-intensive initial training to the central server, while less resource-demanding retraining (via transfer learning, for example) is conducted on the edge.


Figure 7: Centralized initial training, distributed retraining and inference.
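A minimal way to picture Figure 7 is warm-start fine-tuning: the central server trains a model on plenty of data, and the edge continues training from those weights on a small local dataset. The sketch uses a one-feature logistic regression trained with plain stochastic gradient descent as a hypothetical stand-in; transfer learning for deep models would typically freeze most pretrained layers and retrain only the last ones, but the division of work between central and edge is the same.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """P(y = 1 | x) for a linear model whose last weight is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def sgd_logistic(weights: Sequence[float], data: Sequence[Example],
                 lr: float, epochs: int) -> List[float]:
    """Continue SGD on log-loss from the given weights (warm start)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # d(log-loss)/dz for one example
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


# Central: stands in for a large training set; decision boundary near x = 0.
central_data = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
initial = sgd_logistic([0.0, 0.0], central_data, lr=0.1, epochs=200)

# Edge: small local dataset whose boundary lies between 1.5 and 3.0.
edge_data = [([1.0], 0), ([1.5], 0), ([3.0], 1), ([3.5], 1)]
tuned = sgd_logistic(initial, edge_data, lr=0.1, epochs=2000)

print(round(predict(initial, [1.0]), 3), round(predict(tuned, [1.0]), 3))
```

The centrally trained model classifies x = 1.0 as positive, while the edge site's data says it is negative; a short, inexpensive fine-tuning run on the edge moves the boundary without repeating the initial training.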

For each of these scenarios, communication bandwidth needs to be evaluated carefully. When bandwidth is limited and the payload needed to relay the data or model is too large, there are two options to reduce the transmission overhead: quantization, which uses low-precision values for computation so that the data becomes smaller, and sparsification, which minimizes the coding length of stochastic gradients so that the transmitted model updates become smaller.
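The two payload-reduction options can be sketched on plain lists of floats. Quantization here is symmetric linear int8 quantization, one common scheme that maps each value to an integer in [-127, 127] (one signed byte when serialized) plus a single shared scale factor; sparsification here is top-k selection, a simple heuristic that sends only the k largest-magnitude gradient entries as (index, value) pairs. Neither is necessarily the exact scheme surveyed in [2], and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric linear quantization to integers in [-127, 127] plus one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> List[float]:
    """Approximate reconstruction on the receiving side."""
    return [x * scale for x in q]


def sparsify_top_k(grad: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep only the k largest-magnitude gradient entries as (index, value) pairs."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return [(i, grad[i]) for i in sorted(top)]


def densify(pairs: Sequence[Tuple[int, float]], size: int) -> List[float]:
    """Rebuild a dense vector on the receiving side; dropped entries become zero."""
    out = [0.0] * size
    for i, v in pairs:
        out[i] = v
    return out


values = [1.0, -0.5, 0.25, 0.01]
q, scale = quantize_int8(values)
grad = [0.1, -2.0, 0.05, 1.5, -0.3]
print(q, densify(sparsify_top_k(grad, k=2), len(grad)))
```

Top-k sparsification discards information in every round; practical schemes often accumulate the dropped residual locally and add it to the next round's gradient (error feedback) so that small updates are not lost permanently.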

Balance pipeline complexity and maintenance cost

Although it is sometimes necessary to assemble the machine learning components, supporting infrastructure and workflow manually when first implementing an ML solution, achieving scalability at a reasonable support and operational cost requires the workflow to be orchestrated and automated. Figure 8 shows the ML development process and its requirements. Through iterative experimentation during the development phase, variations in data cleaning, feature engineering, model architecture and hyperparameter tuning need to be carried out, evaluated and finalized. Once the orchestrated experiment has completed successfully, it must produce the model, pipeline and workflow management logic (shown in pink in Figure 8) needed to automate pipeline and model execution in ML deployment. All of these ML models, pipelines and workflows, together with their lineage, dependencies and triggering logic, then have to be reproduced in the deployment environment, implemented and validated with live production data (Figure 9).

If any of these steps is not automated, the cost of data scientists and data engineers constantly monitoring, tweaking and tuning the models by hand must be added to the maintenance cost of the entire system, on top of the infrastructure setup and the hardware (compute and communication) utilization. Deploying ML systems that are not sufficiently automated is therefore far from ideal, especially when managing and maintaining hundreds of models across thousands of nodes.


Figure 8: ML development.


Figure 9: ML deployment and execution.
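The orchestration requirements above (dependencies, lineage and triggering logic) can be illustrated with a toy pipeline runner built on Python's standard `graphlib.TopologicalSorter` (Python 3.9+). Each step runs only after its dependencies, a short content hash of every output is recorded as lineage, and the deploy step acts only if evaluation passes. The step names and the stand-in "model" (a mean) are hypothetical; a production system would normally rely on a dedicated workflow orchestrator rather than a hand-written runner.

```python
import hashlib
import json
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], Any]


def run_pipeline(steps: Dict[str, Step], deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run steps in dependency order and record simple lineage for each output."""
    results: Dict[str, Any] = {}
    lineage: Dict[str, Dict[str, Any]] = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, [])}
        results[name] = steps[name](inputs)
        payload = json.dumps(results[name], sort_keys=True, default=str).encode()
        lineage[name] = {
            "inputs": sorted(inputs),  # upstream steps this output depends on
            "output_sha": hashlib.sha256(payload).hexdigest()[:12],  # output fingerprint
        }
    results["_lineage"] = lineage
    return results


# Hypothetical steps; the "model" is just a mean to keep the example small.
steps: Dict[str, Step] = {
    "ingest": lambda _: [3.0, 4.0, None, 5.0],
    "clean": lambda i: [x for x in i["ingest"] if x is not None],
    "train": lambda i: {"mean": sum(i["clean"]) / len(i["clean"])},
    "evaluate": lambda i: {"passed": abs(i["train"]["mean"] - 4.0) < 0.5},
    # Triggering logic: deploy only when evaluation passes.
    "deploy": lambda i: "deployed" if i["evaluate"]["passed"] else "skipped",
}
deps = {"clean": ["ingest"], "train": ["clean"],
        "evaluate": ["train"], "deploy": ["train", "evaluate"]}

out = run_pipeline(steps, deps)
print(out["deploy"], out["_lineage"]["deploy"])
```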

Use case examples

The following two production ML systems demonstrate two variations of the implementation.

In Figure 10, training, retraining and inference all happen on a central server in the Ericsson network (an implementation of Figure 4), and the model output is used by an application that supports a CSP network. In this use case, repair center data is collected by pipelines connected to the machine learning solution, where it is used to train a model that identifies and predicts potential issues. This enables preemptive resolution and can potentially cut repair costs significantly. The solution also includes automatic retraining, which minimizes human intervention and maintenance costs while keeping the model up to date as conditions change.


Figure 10: Production ML implementation for CSP (centralized training/retraining and inference).

An example of distributed training/retraining and inference (an implementation of Figure 5) is illustrated in Figure 11. In this case, the Ericsson network intelligence (ENI) solution provides enough compute and storage capacity in the CSP network for the entire machine learning lifecycle to be completed in a distributed manner. We have implemented this solution to monitor network health and detect anomalies in a timely manner.


Figure 11: Production ML implementation for CSP (distributed training/retraining and inference).

Conclusion

5G is a prime catalyst for executing more workloads at the edge, along with more low-latency, data-driven use cases and applications. Lowering operational costs and ensuring returns on network investments are key priorities for service providers that want to use AI to leap ahead of the competition. When applying AI and ML solutions, it is important to consider design concepts related to the distributed nature of data, resource limitations, pipeline readiness and maintenance costs. Given the specific application and available resources, a balanced and systematic design approach should be taken to maximize the benefits for a given cost structure. Applying AI/ML is not only about algorithms or models, but also about carefully designing how each component interacts, to maximize the ROI of an often costly AI/ML project. With well-designed implementations, AI/ML will play an increasingly important role in managing and optimizing future networks and related use cases, realizing its full potential.

References:

[1] How to automatically resolve trouble tickets with machine learning

[2] Omar Nassef, Wenting Sun, Hakimeh Purmehdi, Mallik Tatipamula, Toktam Mahmoodi. A survey: Distributed Machine Learning for 5G and beyond. Computer Networks, Volume 207, 2022, 108820. ISSN 1389-1286. https://doi.org/10.1016/j.comnet.2022.108820

