Enhancing Legal Compliance and Regulation Analysis with Large Language Models
Publisher
Université d'Ottawa | University of Ottawa
Abstract
Context: Software is increasingly pervasive in regulated industries, where compliance with regulations is crucial. Driven by pressing concerns such as data protection and privacy, certain industries, including healthcare, have incorporated specific measures into their compliance frameworks to better address the role of software. Nonetheless, many industries, despite their growing reliance on digital monitoring and automation, have yet to give adequate consideration to software, as the pressure to adapt has not been as strong. This thesis focuses on two domains with complementary challenges: (i) food safety, with regulations that remain largely technology-neutral and therefore demand novel methods to connect legal provisions with systems and software requirements, and (ii) privacy and policy, where provisions already exhibit a close connection to software and systems, but compliance checking methods could be further improved.
Problem: The introduction of Industry 4.0 technologies, particularly the Internet of Things (IoT), has significantly transformed the food industry, enabling real-time monitoring and control of critical processes. Yet food-safety regulations, like many others, remain deliberately technology-agnostic to ensure long-term relevance, promote innovation, and maintain market neutrality. This creates a gap between regulations and modern food-safety systems, which increasingly depend on software. Bridging this gap requires systematically identifying and operationalizing the requirements-related provisions within regulations. In parallel, current approaches to legal compliance checking, especially in the privacy and policy domains, often rely on sentences as the unit of analysis, apply coarse-grained classification strategies, and do not automatically justify their compliance decisions. These approaches also demand significant manual effort, which limits their practical usefulness for stakeholders who must demonstrate and maintain compliance.
Approach: This thesis develops and empirically evaluates a suite of methods that leverage Large Language Models (LLMs) to address these challenges. Contributions include: (1) a Grounded Theory (GT) study of food-safety regulations, resulting in a conceptual characterization of food-safety concepts closely related to systems and software requirements; (2) an empirical evaluation of four families of LLMs (BERT, GPT, Llama, and Mixtral) for automatic classification of requirements-related provisions in food-safety regulations; (3) a study of compliance checking for privacy and policy regulations (specifically General Data Protection Regulation (GDPR) Data Processing Agreements), assessing the effectiveness of state-of-the-art LLMs (GPT, Mixtral, Mistral, Zephyr, Phi), and demonstrating the benefits of paragraph-level context and of providing explanations and justifications; and (4) a quasi-experimental study on deriving Behavior-Driven Development (BDD) artifacts from regulations using LLMs (Llama and Claude).
Outcomes: The thesis advances regulatory analysis and compliance checking by contributing (1) a conceptual model of requirements-related food-safety concepts and the resulting annotated dataset; (2) LLM-based pipelines for classification of legal provisions and compliance checking of regulatory artifacts; and (3) empirical evidence from a quasi-experiment on translating legal provisions into behavioural specifications.
Keywords
Legal Compliance, Requirements Engineering, Large Language Models, Food Safety
