Ethical Detection of Online Influence Campaigns Using Transformer Language Models

Crothers, Evan

dc.contributor.author	Crothers, Evan
dc.contributor.supervisor	Viktor, Herna
dc.contributor.supervisor	Japkowicz, Nathalie
dc.date.accessioned	2020-08-18T19:21:04Z
dc.date.available	2020-08-18T19:21:04Z
dc.date.issued	2020-08-18	en_US
dc.description.abstract	The past five years have seen the rapid escalation of online influence campaigns: coordinated attempts to covertly exploit social media platforms to undermine democratic elections and manipulate public opinion. These campaigns threaten the electoral process of democratic countries, erode confidence in the integrity of online social spaces, and undermine trust in mainstream news media. The detection of online influence campaigns (OIC) is a formidable problem, with significant active development within the field of applied artificial intelligence. Models based on the Transformer architecture, a specific type of neural network architecture amenable to transferring the capability of large pre-trained language models to novel domains, are a promising instrument for counteracting these campaigns. The focus of this thesis is the intelligent application of such deep learning techniques under real-world conditions for the improved detection of online influence campaigns, while remaining mindful of the ethical implications of automated systems that impact public political expression. This thesis contributes new methodologies for reducing algorithmic bias in supervised detection of online influence campaigns, as well as a novel unsupervised process for improving OIC detection. In the case of supervised approaches, where labelled text from past influence campaigns is used for detecting new campaigns, we present a method for reducing algorithmic bias through careful additional preprocessing and evaluation procedures. In the case of unsupervised approaches, which operate in the absence of labelled data from prior campaigns, algorithmic bias is mitigated through the incorporation of a human analyst to provide additional oversight.
The supervised detection approach presented in this thesis includes an assessment of the potential for discrimination against non-native English speakers that may result from Transformer-based classifiers when applied to OIC detection in online communities. The findings indicate that while Transformer features derived from the text of user comments can be leveraged to identify suspect activity, this approach can lead to the emergence of algorithmic bias targeting non-native English grammar and keywords over-represented in past influence campaigns. Drawing on research in native language identification (NLI), "named entity masking" (NEM) is demonstrated to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. The novel unsupervised process builds a user representation by averaging multiple Transformer output embeddings for user-provided submission titles. With dimensionality reduction via Uniform Manifold Approximation and Projection (UMAP), this user representation can be visualized as a projection that a human analyst can use to identify similar posting patterns among active users within a community. By incorporating ethical oversight by trained human operators, this approach results in a practical system that can be used effectively to facilitate analysis of social media communities, while providing a higher ethical standard than a fully automated solution. The usefulness of this solution is demonstrated quantitatively by leveraging past ground-truth data to perform an extrinsic cluster quality analysis on the projection, and qualitatively through an analysis of accounts that have faced disciplinary action from the host social media platform since the analysis took place.
Together, the research and methodologies presented in this work represent a substantial improvement to the rigour of contemporary supervised and unsupervised OIC detection systems, and represent a promising future direction for ethical and effective detection techniques.	en_US
dc.identifier.uri	http://hdl.handle.net/10393/40853
dc.identifier.uri	http://dx.doi.org/10.20381/ruor-25079
dc.language.iso	en	en_US
dc.publisher	Université d'Ottawa / University of Ottawa	en_US
dc.subject	neural networks	en_US
dc.subject	transformer	en_US
dc.subject	ethics	en_US
dc.subject	online influence campaigns	en_US
dc.subject	cybersecurity	en_US
dc.subject	algorithmic bias	en_US
dc.subject	elections	en_US
dc.subject	AI	en_US
dc.subject	machine learning	en_US
dc.subject	democratic institutions	en_US
dc.subject	natural language processing	en_US
dc.subject	NLP	en_US
dc.subject	applied AI	en_US
dc.subject	AAI	en_US
dc.title	Ethical Detection of Online Influence Campaigns Using Transformer Language Models	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Génie / Engineering	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	MCS	en_US
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en_US

Files

Original bundle

Showing items 1 - 1 of 1

Name: Crothers_Evan_2020_thesis.pdf
Size: 3.61 MB
Format: Adobe Portable Document Format

License bundle

Showing items 1 - 1 of 1

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission

Collections

- Theses, 2011 -