Reducing the Cost of Test Data Labelling for Deep-Learning Systems: An Empirical Study

dc.contributor.author: Wu, Taoyu
dc.contributor.supervisor: Sabetzadeh, Mehrdad
dc.contributor.supervisor: Nejati, Shiva
dc.date.accessioned: 2024-12-18T16:12:45Z
dc.date.available: 2024-12-18T16:12:45Z
dc.date.issued: 2024-12-18
dc.description.abstract: Deep learning (DL) systems have achieved remarkable success across various domains, including healthcare, autonomous driving, and facial recognition. However, ensuring the reliability of these systems requires comprehensive testing with accurately labeled datasets, and producing such labels is a resource-intensive and costly process, especially for large-scale data. This thesis presents an empirical study aimed at reducing the cost of test data labeling for deep-learning systems through automated and semi-automated methods. We propose two approaches: Human-in-the-Loop labeling and Pseudo Labeling. The Human-in-the-Loop approach combines machine learning with human expertise: humans intervene only in challenging cases, which keeps labeling accuracy high while minimizing human effort. The Pseudo Labeling method, on the other hand, iteratively refines model-generated labels through repeated training cycles, offering a more automated alternative to manual labeling. Through extensive experiments on seven diverse datasets, we evaluated these approaches against vision-based deep-learning baseline models, including VGG16, ResNet50, and ViT_13B, focusing on the trade-off between human effort and labeling accuracy. Our results show that the Human-in-the-Loop approach achieves near-human labeling accuracy (up to 99%) while requiring less human effort than traditional methods. Conversely, Pseudo Labeling performs well in scenarios with minimal human intervention but generally does not outperform Human-in-the-Loop or the baseline models when larger amounts of labeled data are available. This research provides insights into balancing the cost of labeling against the need for high accuracy in deep-learning system testing. The findings highlight the potential of Human-in-the-Loop as a practical solution for scenarios requiring both high precision and efficiency, while also recognizing the limitations of automated methods such as Pseudo Labeling.
Future work could explore hybrid models and further optimizations to reduce the overall cost of test data labeling while maintaining high system reliability.
dc.identifier.uri: http://hdl.handle.net/10393/49986
dc.identifier.uri: https://doi.org/10.20381/ruor-30788
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Data Labelling
dc.subject: Deep-Learning System Testing
dc.subject: Active Learning
dc.subject: Pseudo Labelling Learning
dc.subject: Human-in-the-Loop
dc.title: Reducing the Cost of Test Data Labelling for Deep-Learning Systems: An Empirical Study
dc.type: Thesis [en]
thesis.degree.discipline: Sciences / Science
thesis.degree.level: Masters
thesis.degree.name: MCS
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science
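
The abstract describes the two labeling strategies only at a high level. The Python sketch below shows one common way each idea can be realized. It is an illustration, not the thesis's implementation. It makes the following assumptions:

- A toy nearest-centroid classifier stands in for VGG16, ResNet50, or ViT.
- An item counts as a "challenging case" when the model's top softmax probability falls below an assumed threshold of 0.9.
- A callback stands in for the human annotator.

All names (`train_centroids`, `predict_proba`, `human_in_the_loop`, `pseudo_label`, `ask_human`, `threshold`, `rounds`) are hypothetical. The pseudo-labeling loop is a generic confidence-thresholded self-training variant. It re-predicts every unlabeled item each round, so earlier pseudo-labels can be revised.

```python
# Illustrative sketch only: hypothetical helpers, a toy classifier in place of
# a deep-learning model, and an assumed confidence threshold of 0.9.
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Example = Tuple[Point, str]


def train_centroids(data: Sequence[Example]) -> Dict[str, Point]:
    """Toy stand-in for a DL model: one mean vector (centroid) per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict_proba(centroids: Dict[str, Point], x: Point) -> Dict[str, float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def human_in_the_loop(centroids: Dict[str, Point], items: Sequence[Point],
                      ask_human: Callable[[Point], str],
                      threshold: float = 0.9) -> Tuple[List[str], int]:
    """Keep confident model labels; send low-confidence items to a human."""
    labels: List[str] = []
    human_queries = 0
    for x in items:
        probs = predict_proba(centroids, x)
        best = max(probs, key=probs.get)
        if probs[best] >= threshold:
            labels.append(best)           # model label accepted as-is
        else:
            labels.append(ask_human(x))   # challenging case: a human decides
            human_queries += 1
    return labels, human_queries


def pseudo_label(labeled: Sequence[Example], unlabeled: Sequence[Point],
                 rounds: int = 3, threshold: float = 0.9
                 ) -> Tuple[Dict[str, Point], Dict[int, str]]:
    """Self-training: retrain on labeled data plus confident pseudo-labels."""
    pseudo: Dict[int, str] = {}  # index into `unlabeled` -> current pseudo-label
    for _ in range(rounds):
        train = list(labeled) + [(unlabeled[i], y) for i, y in pseudo.items()]
        centroids = train_centroids(train)
        new_pseudo: Dict[int, str] = {}
        for i, x in enumerate(unlabeled):
            probs = predict_proba(centroids, x)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                new_pseudo[i] = best      # re-predicted each round, so it may change
        if new_pseudo == pseudo:
            break                         # pseudo-labels have stabilized
        pseudo = new_pseudo
    final = list(labeled) + [(unlabeled[i], y) for i, y in pseudo.items()]
    return train_centroids(final), pseudo


if __name__ == "__main__":
    seed = [((0.0, 0.0), "cat"), ((4.0, 4.0), "dog")]  # small human-labeled set
    items = [(0.5, 0.5), (2.0, 2.0), (3.8, 4.1)]
    model = train_centroids(seed)
    labels, asked = human_in_the_loop(model, items, ask_human=lambda x: "dog")
    print(labels, asked)  # ['cat', 'dog', 'dog'] 1  ((2.0, 2.0) is a tie, so a human is asked)
    _, pseudo = pseudo_label(seed, items)
    print(pseudo)         # {0: 'cat', 2: 'dog'}  (the ambiguous item stays unlabeled)
```

Top-class probability is only one possible routing criterion. Margin- or entropy-based uncertainty measures from active learning are common alternatives. Whichever criterion is used, the threshold directly trades the number of human queries against the risk of accepting wrong model labels.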

Files

Original bundle

Name: Wu_Taoyu_2024_thesis.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed to upon submission