Explainable Prompt Learning for Movie Review Sentiment Analysis

Author: Stilwell, Sean
Institution: Université d'Ottawa | University of Ottawa
Date: 2024-03-19
Type: Thesis
Language: en
Keywords: Natural Language Processing; Explainable AI; Large Language Models; Sentiment Analysis
Handle: http://hdl.handle.net/10393/46044
DOI: https://doi.org/10.20381/ruor-30220
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)

Large language models have transformed the field of natural language processing through their outstanding ability to analyze and comprehend human-written text. A popular recent approach for applying these models to various tasks is prompting, in which the model is given a prompt that guides it towards solving a particular task. This approach has succeeded on a variety of tasks and has led to the concept of prompt learning, in which the language model is fine-tuned on the prompts themselves, yielding further gains on many tasks. In this work, we explore prompting and prompt learning for sentiment analysis of movie reviews from the IMDB dataset. We conduct two sentiment classification experiments. In the first, we present a set of human-engineered prompts, together with the movie reviews, to a collection of language models and obtain strong results. In the second, we apply prompt learning by fine-tuning the selected language models on the prompts themselves, achieving a state-of-the-art accuracy of 98.53% with the Llama 2 model. All models achieve stronger results with prompt learning, demonstrating the effectiveness of this approach. Beyond prompting and prompt learning, we also explore the field of Explainable AI (XAI). To the best of our knowledge, no existing work has applied XAI to prompt learning systems.
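The prompting and prompt-learning setup summarized above can be illustrated with a minimal Python sketch. The template wording, the answer words in `VERBALIZER`, and the helper names `build_prompt`, `build_training_example`, and `parse_prediction` are hypothetical illustrations, not the thesis's actual prompts or code. Under this assumption, prompting fills the template with a review and maps the model's continuation back to a label, while prompt learning appends the gold answer word to the same filled template to form a fine-tuning sequence.

```python
from typing import Optional

# Hypothetical template, not one of the thesis's actual human-engineered prompts.
TEMPLATE = (
    "Classify the sentiment of the following movie review as positive or negative.\n"
    "Review: {review}\n"
    "Sentiment:"
)

# Hypothetical verbalizer: maps answer words to class labels (1 = positive, 0 = negative).
VERBALIZER = {"positive": 1, "negative": 0}


def build_prompt(review: str) -> str:
    """Fill the template with a review (used for prompting and at inference time)."""
    return TEMPLATE.format(review=review.strip())


def build_training_example(review: str, label: int) -> str:
    """Prompt followed by the gold answer word: one sequence for prompt-based fine-tuning."""
    answer = next(word for word, value in VERBALIZER.items() if value == label)
    return f"{build_prompt(review)} {answer}"


def parse_prediction(generated: str) -> Optional[int]:
    """Map the model's generated continuation to a label, or None if no answer word matches."""
    text = generated.strip().lower()
    for word, value in VERBALIZER.items():
        if text.startswith(word):
            return value
    return None
```

In a prompting run, the output of `build_prompt` would be passed to a language model and `parse_prediction` applied to its continuation; in a prompt-learning run, sequences from `build_training_example` would form the fine-tuning set. Sharing one template and verbalizer keeps training and inference prompts identical.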
We apply a variety of XAI methods to our prompt learning system to generate human-understandable explanations for its predictions, and we compare these methods using several metrics. We evaluate how faithfully the explanations reflect the model's decision-making process using the Faithfulness-by-Construction pipeline, attaining a peak sufficiency of 77.05%. Through human evaluation, we also obtain an adequate justification rate of 75%, an understandability of 100%, and a trustworthiness of 86%. This work contributes a novel prompt-learning-based framework for sentiment analysis of movie reviews that achieves state-of-the-art results, together with a comprehensive evaluation of a variety of language models under both prompting and prompt learning. It also contributes a novel framework for applying XAI methods to prompt learning systems, along with a comprehensive evaluation of the quality of the explanations these methods generate. Finally, we provide insights into the behaviours of various language models and the importance of effective prompt engineering for this task.
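The sufficiency measurement can be sketched in Python under one stated assumption: that the Faithfulness-by-Construction pipeline follows a FRESH-style design, in which the highest-scoring tokens under an XAI method's importance scores are extracted as a rationale, a separate classifier is trained and evaluated on those rationales alone, and that classifier's accuracy is reported as sufficiency. The keep ratio, the helper names `extract_rationale` and `sufficiency`, and the toy keyword classifier are hypothetical; in practice the scores would come from an XAI method and the classifier would be a trained model.

```python
from typing import Callable, List, Sequence


def extract_rationale(tokens: Sequence[str], scores: Sequence[float], ratio: float = 0.2) -> List[str]:
    """Keep the top `ratio` fraction of tokens by importance score, in original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(1, round(ratio * len(tokens)))  # always keep at least one token
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort by position so the rationale reads as a sub-sequence of the review.
    return [tokens[i] for i in sorted(top)]


def sufficiency(
    predict: Callable[[List[str]], int],
    rationales: Sequence[List[str]],
    labels: Sequence[int],
) -> float:
    """Accuracy of a rationale-only classifier (the sufficiency score under the stated assumption)."""
    if len(rationales) != len(labels) or not labels:
        raise ValueError("rationales and labels must be non-empty and of equal length")
    correct = sum(predict(r) == y for r, y in zip(rationales, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy data: real importance scores would come from an XAI method.
    tokens = ["the", "film", "was", "truly", "wonderful"]
    scores = [0.01, 0.10, 0.02, 0.30, 0.90]
    rationale = extract_rationale(tokens, scores, ratio=0.4)
    print(rationale)  # ['truly', 'wonderful']

    # Hypothetical stand-in for a classifier trained only on rationales.
    def toy_predict(words: List[str]) -> int:
        return 1 if "wonderful" in words else 0

    print(sufficiency(toy_predict, [rationale, ["boring", "plot"]], [1, 0]))  # 1.0
```

Because the rationale-only classifier never sees the tokens that were discarded, a high score under this design suggests the extracted rationales carry enough information to support the prediction on their own.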