Reducing the Cost of Test Data Labelling for Deep-Learning Systems: An Empirical Study

Wu, Taoyu

dc.contributor.author	Wu, Taoyu
dc.contributor.supervisor	Sabetzadeh, Mehrdad
dc.contributor.supervisor	Nejati, Shiva
dc.date.accessioned	2024-12-18T16:12:45Z
dc.date.available	2024-12-18T16:12:45Z
dc.date.issued	2024-12-18
dc.description.abstract	Deep learning (DL) systems have achieved remarkable success across various domains, including healthcare, autonomous driving, and facial recognition. However, ensuring the reliability of these systems requires comprehensive testing with accurately labeled datasets, which is a resource-intensive and costly process, especially for large-scale data. This thesis presents an empirical study aimed at reducing the cost of test data labeling for deep-learning systems through automated and semi-automated methods. We propose two approaches: Human-in-the-Loop labeling and Pseudo Labeling. The Human-in-the-Loop approach combines machine learning with human expertise, intervening only in challenging cases to ensure high labeling accuracy while minimizing human effort. The Pseudo Labeling method, on the other hand, iteratively refines labels generated by the model through repeated training cycles, offering a more automated alternative to manual labeling. Through extensive experiments across seven diverse datasets, we evaluated these approaches against vision-based deep-learning baseline models, including VGG16, ResNet50, and ViT_13B, focusing on the trade-offs between human effort and labeling accuracy. Our results demonstrate that the Human-in-the-Loop approach achieves near-human labeling accuracy (up to 99%) while requiring less human effort than traditional methods. Conversely, Pseudo Labeling performs well in scenarios with minimal human intervention but generally does not outperform Human-in-the-Loop or baseline models when larger amounts of labeled data are available. This research provides valuable insights into balancing the cost of labeling with the need for high accuracy in deep-learning system testing. The findings highlight the potential of Human-in-the-Loop as a practical solution for scenarios requiring high precision and efficiency, while also recognizing the limitations of automated methods like Pseudo Labeling. Future work could explore hybrid models and further optimizations to reduce the overall cost of test data labeling while maintaining high system reliability.
dc.identifier.uri	http://hdl.handle.net/10393/49986
dc.identifier.uri	https://doi.org/10.20381/ruor-30788
dc.language.iso	en
dc.publisher	Université d'Ottawa / University of Ottawa
dc.rights	Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Data Labelling
dc.subject	Deep-Learning System Testing
dc.subject	Active Learning
dc.subject	Pseudo Labelling Learning
dc.subject	Human-in-the-Loop
dc.title	Reducing the Cost of Test Data Labelling for Deep-Learning Systems: An Empirical Study
dc.type	Thesis	en
thesis.degree.discipline	Sciences / Science
thesis.degree.level	Masters
thesis.degree.name	MCS
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Wu_Taoyu_2024_thesis.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission

Collections

- Thèses, 2011 - // Theses, 2011 -