Altamimi, Sadi
2023-05-18
http://hdl.handle.net/10393/44971
http://dx.doi.org/10.20381/ruor-29177

Reinforcement learning (RL) has been at the core of recent advances towards the AI promise of general intelligence. Unlike other machine learning (ML) paradigms, such as supervised learning (SL), which learn to mimic how humans act, RL tries to mimic how humans learn, and in many tasks it has discovered new strategies and achieved super-human performance. This is possible mainly because RL algorithms are allowed to interact with the world and collect the data they need for training by themselves. This is not possible in SL, where the ML model is limited to a dataset collected by humans, which can be biased towards sub-optimal solutions. The downside of RL is its high cost when trained on real systems: the actions taken by an RL model during the initial phase of training are essentially random. To overcome this issue, it is common to train RL models in simulators before deploying them in production. However, designing a realistic simulator that faithfully resembles the real environment is difficult. Furthermore, simulator-based approaches do not utilize the sheer amount of field data available at their disposal. This work investigates new ways to bridge the gap between SL and RL through an offline pre-training phase. The idea is to use the field data to pre-train RL models in an offline setting (similar to SL), and then allow them to safely explore and improve their performance beyond human level. The proposed training pipeline includes: (i) a process to convert static datasets into an RL environment, (ii) an MDP-aware data-augmentation process for the offline dataset, and (iii) a pre-training step that improves the RL exploration phase.
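The three pipeline steps above can be sketched in miniature. This is a hypothetical illustration, not the thesis implementation: the names (`OfflineEnv`, `augment`, `pretrain_q`), the transition-tuple format, and the tabular Q-learning update are all assumptions chosen to make the idea concrete.

```python
import random

# (i) Wrap a static dataset of logged transitions as an RL environment
# that replays episodes, so a standard RL training loop can consume it.
class OfflineEnv:
    def __init__(self, episodes):
        # episodes: list of [(state, action, reward, next_state), ...]
        self.episodes = episodes
        self._ep, self._i = None, 0

    def reset(self):
        self._ep, self._i = random.choice(self.episodes), 0
        return self._ep[0][0]  # first logged state

    def step(self, _action_ignored):
        # Replay the logged transition regardless of the agent's action;
        # online fine-tuning would replace this with the real system.
        _s, _a, reward, next_state = self._ep[self._i]
        self._i += 1
        done = self._i >= len(self._ep)
        return next_state, reward, done

# (ii) MDP-aware augmentation: jitter rewards slightly but keep the
# state transitions intact, so augmented data still respects the
# logged dynamics.
def augment(episodes, noise=0.01, copies=2):
    out = list(episodes)
    for ep in episodes:
        for _ in range(copies):
            out.append([(s, a, r + random.uniform(-noise, noise), s2)
                        for (s, a, r, s2) in ep])
    return out

# (iii) Offline pre-training: fit a tabular Q-function on the logged
# data before any (costly) online exploration begins.
def pretrain_q(episodes, alpha=0.5, gamma=0.9, sweeps=50):
    actions = {a for ep in episodes for (_, a, _, _) in ep}
    q = {}
    for _ in range(sweeps):
        for ep in episodes:
            for s, a, r, s2 in ep:
                best_next = max((q.get((s2, b), 0.0) for b in actions),
                                default=0.0)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q
```

In this sketch the pre-trained Q-table would seed the online agent, so its initial actions reflect the logged behaviour rather than being random.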
We show how to apply this approach to design an action recommendation engine (ARE) that automates network operation centers (NOCs), a task still tackled by teams of network professionals using hand-crafted rules. Our RL algorithm learns to maximize the Quality of Experience (QoE) of NOC users while incurring lower operational costs (OPEX) than traditional algorithms. Furthermore, our algorithm is scalable and can be used to control large-scale networks of arbitrary size.

Language: en
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords: Network Automation; Reinforcement Learning
Title: Automating Network Operation Centers using Reinforcement Learning
Type: Thesis