
Large Language Models: Towards Safety, Robustness, and Understanding

dc.contributor.author: Crothers, Evan
dc.contributor.supervisor: Viktor, Herna L.
dc.contributor.supervisor: Japkowicz, Nathalie
dc.date.accessioned: 2024-11-20T20:54:49Z
dc.date.available: 2024-11-20T20:54:49Z
dc.date.issued: 2024-11-20
dc.description.abstract: The Transformer neural network architecture has had an enormous impact on state-of-the-art language model performance across a wide range of tasks in the text domain. For large language models based on this architecture to be suitable for widespread use, it is critical to ensure that they are not abused for malicious purposes, that they are robust against adversarial attack, and that their behaviour is well understood. The rapid proliferation of user-friendly interfaces to generative language models in particular, such as ChatGPT, highlights the pressing need to prevent abuse of large language models while improving the adversarial robustness of systems designed to detect machine-generated text. This thesis makes significant contributions towards these goals in several ways. We begin with an in-depth survey of the categories of malicious attacks associated with machine-generated text, a threat modelling exercise exploring the cybersecurity threats related to these attacks, and a comprehensive overview of detection methodologies, with recommendations for improving defenses. This work was featured by cybersecurity expert Bruce Schneier as "a solid grounding amongst all of the hype", and a talk on the paper was presented as part of the United Nations "AI for Good" speaker series. Second, we demonstrate a new technique that augments Transformer-derived features with statistical features to improve adversarial robustness in the detection of computer-generated text - an important problem for the detection of spam and disinformation, and a setting where adversarial attacks are likely. Third, we determine to what extent existing metrics for assessing machine-generated text align with subjective human assessment, identifying gaps between computational metrics and human judgment. Finally, we perform an in-depth assessment of how masking-based faithfulness measures are applied to Transformer text classifiers, demonstrating pitfalls in faithfulness-based model comparisons, investigating the underlying mechanisms that cause these issues to arise, and determining the impact of relying on such measures on adversarial robustness and fairness.
dc.identifier.uri: http://hdl.handle.net/10393/49870
dc.identifier.uri: https://doi.org/10.20381/ruor-30696
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject: large language models
dc.subject: generative AI
dc.subject: fairness
dc.subject: explainability
dc.subject: adversarial attacks
dc.subject: cybersecurity
dc.subject: NLP
dc.subject: ethical AI
dc.subject: interpretability
dc.subject: neural networks
dc.subject: machine learning
dc.title: Large Language Models: Towards Safety, Robustness, and Understanding
dc.type: Thesis
thesis.degree.discipline: Génie / Engineering
thesis.degree.level: Doctoral
thesis.degree.name: PhD
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science

Files

Original bundle

Name: Crothers_Evan_2024_thesis.pdf
Size: 3.95 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission