Kumar, Rushil
2026-04-01
http://hdl.handle.net/10393/51494
https://doi.org/10.20381/ruor-31827

Path-planning and control tasks have always been quintessential problems in Robotics. The application of Deep Reinforcement Learning (Deep-RL) to these problems has enabled far more dynamic and adaptive solutions than traditional model-based approaches. In recent years, considerable research attention has shifted towards safe reinforcement learning, where uncertainty and safety constraints are incorporated directly into the learning process. These methods ensure that agents learn policies that avoid unsafe behaviors, such as collisions, while still exploring effectively. However, for complex, high-dimensional control tasks, the learning and deployment pipeline becomes increasingly burdensome, as Deep-RL models require substantial computational resources. Sparse and pruned networks offer a potential solution for deployment efficiency. Nonetheless, training such networks from scratch remains challenging due to their limited representational capacity. Learning highly complex tasks with a sparse model from the outset is particularly difficult, as the reduced parameter count restricts the model's ability to explore, often resulting in unstable learning dynamics. This issue is especially pronounced for off-policy actor-critic algorithms, which rely heavily on stable function approximation. Conversely, pruning a fully trained dense model may appear to be a viable alternative. However, doing so risks removing parameters that encode essential behaviors or safety-related information. Such indiscriminate loss can lead to unpredictable or unsafe policies, especially in environments governed by physical constraints, where certain safety conditions may be implicitly encoded in specific network weights.
In this work, we propose a research hypothesis, based on the Lottery Ticket Hypothesis (LTH), that guides the development of an off-policy reinforcement learning framework, which we call Trikaya, designed to robustly train a shortened network (with significantly fewer active nodes) from scratch while preserving physical safety throughout learning. By leveraging the tangent space of the environment's Constraint Space Manifold (CSM) (via ATACOM environments), the framework ensures that all learnable actions adhere to the safety constraints inherent to the environment. Lottery-ticket sub-networks are extracted from a fully trained dense model using Iterative Magnitude Pruning (IMP) or One-Shot Magnitude Pruning (OSMP). The resulting masks are then applied at the LTH reinitialization step to build the shortened network, which, after training, accomplishes tasks with metrics similar to those of the full dense network while maintaining safe learning steps. Furthermore, the proposed method works with off-policy algorithms, making it suitable for low-resource hardware such as Raspberry Pi boards. It is shown that the method withstands learning via exploration and still achieves satisfactory convergence at significant pruning scales.

Language: en
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Subjects: Robotics; Path-Planning; Control; Deep Reinforcement Learning; Optimisation
Title: Leveraging Constraint Space Manifolds and Lottery Tickets to Learn Complex Control Tasks with Pruned Neural Networks
Type: Thesis
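The tangent-space safety idea the abstract describes can be illustrated with a minimal sketch: a raw policy action is projected onto the null space of the constraint Jacobian, so the commanded motion stays on the constraint manifold c(q) = 0. The function name `safe_action`, the toy linear constraint, and the SVD-based null-space computation below are illustrative assumptions, not the thesis's actual ATACOM implementation.

```python
import numpy as np

def safe_action(a_raw, J_c):
    """Project a raw policy action onto the tangent space of the
    constraint manifold, i.e. the null space of the constraint
    Jacobian J_c, so the commanded motion stays on c(q) = 0."""
    # Orthonormal basis of the null space of J_c via SVD.
    _, s, Vt = np.linalg.svd(J_c)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                  # (n, n - rank) null-space basis
    # Express the action in tangent coordinates, then map back.
    return N @ (N.T @ a_raw)

# Toy example: one linear constraint q1 + q2 = const in 3-D.
J = np.array([[1.0, 1.0, 0.0]])
a = np.array([1.0, 0.0, 2.0])
a_safe = safe_action(a, J)
```

Any component of the action along the constraint normal is removed, so exploration noise in the policy output cannot drive the system off the manifold.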
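The mask-extraction and reinitialization steps can likewise be sketched. This is a schematic reading of IMP and the LTH rewind, with hypothetical helper names (`imp_masks`, `rewind`); real IMP retrains the network between pruning rounds, which is elided here.

```python
import numpy as np

def imp_masks(weights, sparsity, rounds):
    """Sketch of Iterative Magnitude Pruning (IMP): over several
    rounds, zero out the smallest-magnitude surviving weights until
    the target sparsity is reached. rounds == 1 gives one-shot
    magnitude pruning (OSMP). Retraining between rounds is elided."""
    masks = {k: np.ones_like(w) for k, w in weights.items()}
    # Per-round removal fraction so the compounded survivor fraction
    # equals (1 - sparsity) after all rounds.
    per_round = 1.0 - (1.0 - sparsity) ** (1.0 / rounds)
    for _ in range(rounds):
        for k, w in weights.items():
            alive = np.abs(w[masks[k] == 1])
            thresh = np.quantile(alive, per_round)
            masks[k] = np.where(np.abs(w) < thresh, 0.0, masks[k])
    return masks

def rewind(init_weights, masks):
    """LTH reinitialization: apply the masks to the ORIGINAL
    (pre-training) weights, yielding the sparse ticket to retrain."""
    return {k: init_weights[k] * masks[k] for k in init_weights}

# Toy demo: 100 trained weights, pruned to 75% sparsity in 2 rounds.
trained = {"fc": np.arange(1.0, 101.0).reshape(10, 10)}
masks = imp_masks(trained, sparsity=0.75, rounds=2)
```

The surviving mask entries identify the sub-network; applying them to the initial weights gives the sparse model that is then trained from scratch inside the safe-action framework.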