
Large Language Models: Towards Safety, Robustness, and Understanding

dc.contributor.author: Crothers, Evan
dc.contributor.supervisor: Viktor, Herna L.
dc.contributor.supervisor: Japkowicz, Nathalie
dc.date.accessioned: 2024-11-20T20:54:49Z
dc.date.available: 2024-11-20T20:54:49Z
dc.date.issued: 2024-11-20
dc.description.abstract: The Transformer neural network architecture has had an enormous impact on state-of-the-art language model performance across a wide range of tasks in the text domain. For large language models based on this architecture to be suitable for widespread use, it is critical to ensure that they are not abused for malicious purposes, that they are robust against adversarial attack, and that their behaviour is well understood. The rapid proliferation of user-friendly interfaces to generative language models in particular, such as ChatGPT, highlights the pressing need to prevent abuse of large language models while improving the adversarial robustness of systems designed to detect machine-generated text. This thesis makes significant contributions towards these goals in several ways. We begin with an in-depth survey of the categories of malicious attacks associated with machine-generated text, a threat modelling exercise exploring the cybersecurity threats related to these attacks, and a comprehensive overview of detection methodologies, with recommendations for improving defenses. This work was featured by cybersecurity expert Bruce Schneier as "a solid grounding amongst all of the hype", and a talk on the paper was presented as part of the United Nations "AI for Good" speaker series. Second, we demonstrate a new technique that augments Transformer-derived features with statistical features to improve adversarial robustness in the detection of computer-generated text - an important problem for the detection of spam and disinformation, and a setting where adversarial attacks are likely. Third, we determine to what extent existing metrics for assessing machine-generated text align with subjective human assessment, identifying gaps between computational metrics and human judgment. Finally, we perform an in-depth assessment of how masking-based faithfulness measures are applied to Transformer text classifiers, demonstrating pitfalls in faithfulness-based model comparisons, investigating the underlying mechanisms that cause these issues to arise, and determining the impact of relying on such measures on adversarial robustness and fairness.
dc.identifier.uri: http://hdl.handle.net/10393/49870
dc.identifier.uri: https://doi.org/10.20381/ruor-30696
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject: large language models
dc.subject: generative AI
dc.subject: fairness
dc.subject: explainability
dc.subject: adversarial attacks
dc.subject: cybersecurity
dc.subject: NLP
dc.subject: ethical AI
dc.subject: interpretability
dc.subject: neural networks
dc.subject: machine learning
dc.title: Large Language Models: Towards Safety, Robustness, and Understanding
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Crothers_Evan_2024_thesis.pdf
Size: 3.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission