Ethical Detection of Online Influence Campaigns Using Transformer Language Models

dc.contributor.author | Crothers, Evan
dc.contributor.supervisor | Viktor, Herna
dc.contributor.supervisor | Japkowicz, Nathalie
dc.date.accessioned | 2020-08-18T19:21:04Z
dc.date.available | 2020-08-18T19:21:04Z
dc.date.issued | 2020-08-18 | en_US
dc.description.abstract | The past five years have seen the rapid escalation of online influence campaigns: coordinated attempts to covertly exploit social media platforms to undermine democratic elections and manipulate public opinion. These campaigns threaten the electoral processes of democratic countries, erode confidence in the integrity of online social spaces, and undermine trust in mainstream news media. The detection of online influence campaigns (OIC) is a formidable problem and an area of significant active development within applied artificial intelligence. Models based on the Transformer architecture are a promising instrument for counteracting these campaigns. The Transformer is a type of neural network architecture well suited to transferring the capabilities of large pre-trained language models to novel domains. The focus of this thesis is the application of such deep learning techniques under real-world conditions to improve the detection of online influence campaigns, while remaining mindful of the ethical implications of automated systems that affect public political expression. This thesis contributes new methodologies for reducing algorithmic bias in the supervised detection of online influence campaigns, as well as a novel unsupervised process for improving OIC detection. In the supervised case, where labelled text from past influence campaigns is used to detect new campaigns, we present a method for reducing algorithmic bias through additional preprocessing and evaluation procedures. In the unsupervised case, which operates without labelled data from prior campaigns, algorithmic bias is mitigated by incorporating a human analyst who provides additional oversight.
The supervised detection approach presented in this thesis includes an assessment of the potential for discrimination against non-native English speakers that may arise when Transformer-based classifiers are applied to OIC detection in online communities. The findings indicate that Transformer features derived from the text of user comments can be leveraged to identify suspect activity. However, this approach can also lead to algorithmic bias that targets non-native English grammar and keywords over-represented in past influence campaigns. Drawing on research in native language identification (NLI), "named entity masking" (NEM) is shown to produce sentence features that are robust to this shortcoming while maintaining comparable classification accuracy. The novel unsupervised process constructs a user representation by averaging multiple Transformer output embeddings of user-provided submission titles. After dimensionality reduction with Uniform Manifold Approximation and Projection (UMAP), this representation can be visualized as a projection that a human analyst can use to identify similar posting patterns among active users within a community. By incorporating ethical oversight by trained human operators, this approach yields a practical system that can effectively facilitate the analysis of social media communities while meeting a higher ethical standard than a fully automated solution. The usefulness of this solution is demonstrated in two ways. Quantitatively, an extrinsic cluster quality analysis of the projection is performed using past ground-truth data. Qualitatively, the analysis focuses on accounts that have faced disciplinary action from the host social media platform since the analysis took place.
Together, the research and methodologies presented in this work represent a substantial improvement in the rigour of contemporary supervised and unsupervised OIC detection systems, and a promising future direction for ethical and effective detection techniques. | en_US
dc.identifier.uri | http://hdl.handle.net/10393/40853
dc.identifier.uri | http://dx.doi.org/10.20381/ruor-25079
dc.language.iso | en | en_US
dc.publisher | Université d'Ottawa / University of Ottawa | en_US
dc.subject | neural networks | en_US
dc.subject | transformer | en_US
dc.subject | ethics | en_US
dc.subject | online influence campaigns | en_US
dc.subject | cybersecurity | en_US
dc.subject | algorithmic bias | en_US
dc.subject | elections | en_US
dc.subject | AI | en_US
dc.subject | machine learning | en_US
dc.subject | democratic institutions | en_US
dc.subject | natural language processing | en_US
dc.subject | NLP | en_US
dc.subject | applied AI | en_US
dc.subject | AAI | en_US
dc.title | Ethical Detection of Online Influence Campaigns Using Transformer Language Models | en_US
dc.type | Thesis | en_US
thesis.degree.discipline | Génie / Engineering | en_US
thesis.degree.level | Masters | en_US
thesis.degree.name | MCS | en_US
uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science | en_US
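
The named entity masking (NEM) step described in the abstract can be illustrated with a minimal Python sketch. It assumes that masking means replacing each named entity mention with a placeholder for its entity type, for example `[GPE]`. The exact placeholder scheme used in the thesis is not stated in this record. The hypothetical `ENTITY_LEXICON` dictionary stands in for the output of a trained named entity recognizer. `mask_named_entities` is an illustrative name, not the thesis's code.

```python
import re

# Hypothetical lexicon standing in for a real named entity recognizer.
# Keys are surface forms; values are entity-type labels.
ENTITY_LEXICON = {
    "Alice Smith": "PERSON",
    "Reddit": "ORG",
    "Ottawa": "GPE",
}


def mask_named_entities(text, lexicon=ENTITY_LEXICON):
    """Replace every known entity mention in `text` with a [TYPE] placeholder."""
    if not lexicon:
        # An empty alternation would match empty strings, so return early.
        return text
    # Try longer names first so multi-word entities are masked as a whole.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: "[" + lexicon[m.group(0)] + "]", text)


if __name__ == "__main__":
    print(mask_named_entities("Alice Smith posted on Reddit from Ottawa."))
    # [PERSON] posted on [ORG] from [GPE].
```

Masking entity names before feature extraction is meant to keep a classifier from keying on specific names that are over-represented in past campaigns. In practice, the entity spans would come from an NER model rather than a fixed list.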
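
One way to carry out the assessment of potential discrimination against non-native English speakers is to compare how often a classifier wrongly flags benign comments from each writer group. This quantity is the false positive rate per group. The sketch below is a generic fairness check, not necessarily the thesis's exact procedure. The group names, the label convention (1 = flagged as influence-campaign content, 0 = benign), the data, and the function name are all hypothetical.

```python
from typing import Dict, Hashable, Sequence


def false_positive_rate_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Per group: share of truly benign items (y_true == 0) predicted as 1."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    negatives: Dict[Hashable, int] = {}
    false_pos: Dict[Hashable, int] = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] = negatives.get(group, 0) + 1
            if pred == 1:
                false_pos[group] = false_pos.get(group, 0) + 1
    # Groups with no benign items are omitted, since their rate is undefined.
    return {g: false_pos.get(g, 0) / n for g, n in negatives.items()}


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0, 0, 0, 1, 1, 1, 0, 1]
    groups = ["native"] * 4 + ["non-native"] * 4
    print(false_positive_rate_by_group(y_true, y_pred, groups))
    # {'native': 0.25, 'non-native': 0.75}
```

A markedly higher rate for the non-native group on otherwise benign text would signal the kind of bias the abstract describes. Re-running the check after masking is one way to see whether NEM narrows the gap.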
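
The unsupervised user representation, an average of several Transformer output embeddings of a user's submission titles, amounts to element-wise mean pooling. The sketch below assumes that some Transformer encoder has already produced the per-title embeddings. The short toy vectors and the function names `mean_embedding` and `user_representations` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of one user's title embeddings."""
    if not embeddings:
        raise ValueError("user has no embedded titles")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimensionality")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def user_representations(title_embeddings: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Map each user to the mean of the embeddings of their submission titles."""
    return {user: mean_embedding(vecs) for user, vecs in title_embeddings.items()}


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real Transformer embeddings are far larger.
    toy = {
        "user_a": [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]],
        "user_b": [[0.5, 0.5, 0.5]],
    }
    print(user_representations(toy))
    # {'user_a': [2.0, 1.0, 1.0], 'user_b': [0.5, 0.5, 0.5]}
```

Stacking these per-user vectors into a matrix produces the input to the UMAP projection the abstract describes. With the `umap-learn` package, that step would look something like `umap.UMAP(n_components=2).fit_transform(matrix)`. The thesis's actual UMAP settings are not given in this record.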
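
The abstract reports an extrinsic cluster quality analysis of the projection against past ground-truth data but does not name the measure used. Purely as an illustration, the sketch below computes cluster purity, one standard extrinsic measure. Each cluster is credited with its most common ground-truth label, and purity is the fraction of all points covered by those majority labels. The sketch assumes that cluster assignments already exist, for example from a clustering algorithm run on the projected points. The labels and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def purity(cluster_ids: Sequence[Hashable], true_labels: Sequence[Hashable]) -> float:
    """Fraction of points whose label matches their cluster's majority label."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("no points to score")
    by_cluster: Dict[Hashable, Counter] = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    # Sum the size of the majority label within each cluster.
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["oic", "oic", "other", "other", "other", "other"]
    print(purity(clusters, labels))  # 5 of 6 points match their cluster's majority
```

Purity rewards many small clusters, so it is usually reported alongside other measures. It serves here only to show what an extrinsic comparison against ground truth involves.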

Files

Original bundle

Name: Crothers_Evan_2020_thesis.pdf
Size: 3.61 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed to upon submission