Author: Aghababaeyan, Zohreh
Date: 2024-11-29
Handle: http://hdl.handle.net/10393/49918
DOI: https://doi.org/10.20381/ruor-30733

Abstract:
Deep Neural Networks (DNNs) have achieved remarkable success in fields such as image recognition, medical diagnostics, and autonomous systems. However, like traditional software, DNNs are susceptible to failures and therefore require rigorous testing to ensure their reliability. Despite their transformative potential, their weaknesses can lead to severe outcomes in practice, such as accidents involving self-driving cars or incorrect treatment recommendations from AI-based health systems. These risks underscore the pressing need for thorough testing of DNNs deployed in safety-critical applications.

Testing DNNs, however, comes with unique challenges. One significant challenge is the high cost of data labeling: manually labeling large datasets for fault detection is both expensive and resource-intensive. Another lies in the complexity of test case selection, where identifying the most fault-revealing inputs requires advanced methods to avoid inefficiency. Additionally, model comparison poses its own difficulties, as comparing DNN models for updates, compression, or selection demands specialized techniques to uncover their behavioral differences. This thesis addresses these challenges through practical and innovative solutions:

- Fault Estimation: A novel clustering-based approach that groups similar mispredictions, enhancing the precision of fault detection and improving model reliability assessment.
- Test Adequacy Metrics: A thorough evaluation of existing metrics and the introduction of a diversity-based adequacy metric that excels at detecting critical faults, particularly in safety-critical systems.
- DeepGD Framework: A black-box test selection framework that reduces labeling costs by identifying diverse and fault-revealing inputs, significantly improving testing efficiency.
- DiffGAN for Model Comparison: A GAN-based method that generates test inputs exposing behavioral differences between models, ensuring reliable updates and robust model selection, especially in domains such as autonomous driving and healthcare.

By tackling the real-world challenges of DNN testing, this work provides scalable, practical solutions that enhance safety, reliability, and robustness in high-risk applications. It directly addresses the pressing need for rigorous testing of DNNs, where failures can have life-altering consequences. These contributions mark a significant step toward making DNN models safer and more dependable.

Language: en
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords: Coverage; Deep Neural Network (DNN); Diversity; Faults; Test; Test Case Selection; DNN Fault Detection; Multi-Objective Optimization; Deep Learning Model Evaluation; Uncertainty Metrics; Model Retraining Guidance; Differential Testing; Test Generation; GAN
Title: Enhancing Deep Neural Network Reliability: Test Adequacy, Selection, and Generation Strategies for Fault Detection
Type: Thesis