Robust Query Optimization in Relational Databases Based on Approximate Probabilistic Machine Learning
| dc.contributor.author | Kamali, Seyed Mohammad Amin | |
| dc.contributor.supervisor | Kantere, Verena | |
| dc.date.accessioned | 2025-10-21T17:27:16Z | |
| dc.date.available | 2025-10-21T17:27:16Z | |
| dc.date.issued | 2025-10-21 | |
| dc.description.abstract | Query processing systems rely on a query optimizer to explore alternative execution plans for a given query and choose the one expected to be optimal. The optimizer estimates costs using parameters such as cardinalities that are unknown at compile time, necessitating reliance on estimated values subject to errors. Traditional optimizers target expected optimality without considering the inherent uncertainties in these estimates, frequently resulting in suboptimal runtime performance when estimation errors occur. This thesis addresses this fundamental limitation by formalizing the concept of robustness in query optimization and developing a novel approach called Roq (Robust Query Optimization) based on approximate probabilistic machine learning. Within the context of Digital Transformation and Innovation, database management systems serve as critical technological infrastructure that underpins modern data-driven initiatives. As organizations increasingly rely on analytics for decision support, the reliability and efficiency of database systems become paramount for successful digital modernization efforts. The robustness of query execution directly impacts the stability of applications and services that fuel digital transformation across sectors. We first establish a theoretical framework that decomposes cost model uncertainty into distinct components: plan risk (uncertainty inherent to the plan structure) and estimation risk (uncertainty in cost estimates due to modeling limitations). Building on this foundation, we propose a principled methodology for quantifying suboptimality risk, defined as the likelihood of a plan being suboptimal at runtime compared to alternatives. This risk quantification framework employs variational inference techniques to capture both aleatoric (data) and epistemic (model) uncertainties. The contributions of this work include: (1) a novel learned cost model architecture that leverages Graph Neural Networks to capture complex structural characteristics of queries and plans, (2) risk-aware plan evaluation strategies that utilize uncertainty quantification to select robust plans, and (3) RobOpt, a practical workload optimization tool that demonstrates the real-world applicability of our theoretical framework, with a human-centered approach. Our extensive experimental evaluation across multiple benchmarks (JOB, CEB, DSB, and TPC-DS+) demonstrates that Roq consistently outperforms state-of-the-art approaches. The risk-aware plan selection strategies improve the 99th percentile of suboptimality by up to 44.4% compared to existing learned optimizers while improving the average runtime performance by up to 23.0%. Moreover, Roq exhibits exceptional robustness to workload shifts, maintaining stable performance where baseline methods degrade significantly. Analysis of computational overhead confirms that the benefits of risk-aware optimization can be realized with minimal compilation-time impact, making the approach practical for real-world deployment. By addressing uncertainty explicitly rather than ignoring it, this research establishes a foundation for more predictable, reliable, and efficient database performance, thus supporting the broader objectives of digital transformation by enhancing the technological backbone upon which modern data-centric innovations depend. The integration of advanced machine learning techniques with traditional database systems exemplifies how cross-disciplinary approaches can address longstanding challenges in the digitalization landscape. | |
| dc.identifier.uri | http://hdl.handle.net/10393/50937 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31462 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.subject | query optimization | |
| dc.subject | machine learning | |
| dc.subject | databases | |
| dc.subject | robustness | |
| dc.subject | probabilistic machine learning | |
| dc.subject | relational database management systems | |
| dc.subject | digital transformation | |
| dc.title | Robust Query Optimization in Relational Databases Based on Approximate Probabilistic Machine Learning | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Génie / Engineering | |
| thesis.degree.level | Doctoral | |
| thesis.degree.name | PhD | |
| uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science | |
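
The abstract defines suboptimality risk as the likelihood that a plan turns out to be suboptimal at runtime compared to an alternative. As a rough illustration only, and not the thesis's actual method (which the abstract describes as using variational inference and Graph Neural Networks), the sketch below assumes each plan's cost prediction is an independent Gaussian with a mean and a standard deviation. The names `PlanEstimate`, `suboptimality_risk`, `pick_lowest_expected_cost`, `pick_risk_aware`, and the parameter `k` are hypothetical. The risk-aware rule shown is a simple mean-plus-k-sigma ranking, swapped in here as a stand-in for the risk-aware strategies the abstract mentions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    """Hypothetical container for a plan's predicted cost distribution."""
    name: str
    mean: float  # predicted cost (arbitrary units, assumed)
    std: float   # predictive standard deviation (assumed Gaussian)


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def suboptimality_risk(a: PlanEstimate, b: PlanEstimate) -> float:
    """P(cost_a > cost_b), assuming independent Gaussian cost predictions."""
    spread = math.sqrt(a.std ** 2 + b.std ** 2)
    if spread == 0.0:
        # Both predictions are point estimates: the comparison is deterministic.
        return 1.0 if a.mean > b.mean else 0.0
    return normal_cdf((a.mean - b.mean) / spread)


def pick_lowest_expected_cost(plans: list[PlanEstimate]) -> PlanEstimate:
    """Classical selection: ignore uncertainty and take the lowest mean."""
    return min(plans, key=lambda p: p.mean)


def pick_risk_aware(plans: list[PlanEstimate], k: float = 1.0) -> PlanEstimate:
    """Conservative selection: rank plans by mean + k * std (an assumed rule)."""
    return min(plans, key=lambda p: p.mean + k * p.std)


# Illustrative numbers only; they are not taken from the thesis.
plans = [
    PlanEstimate("plan_a", mean=100.0, std=50.0),  # cheap on average, uncertain
    PlanEstimate("plan_b", mean=105.0, std=5.0),   # slightly costlier, stable
]

print("risk(plan_a worse than plan_b):", round(suboptimality_risk(plans[0], plans[1]), 3))
print("lowest expected cost:", pick_lowest_expected_cost(plans).name)
print("risk-aware (k=1):", pick_risk_aware(plans, k=1.0).name)
```

With these made-up numbers, the expected-cost rule prefers `plan_a` while the conservative rule prefers `plan_b`, which is the kind of trade-off between average cost and tail behaviour that the abstract attributes to risk-aware plan selection.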
