Automated Log Analysis: Failure Prediction and Anomaly Detection Using Machine Learning and Large Language Models
Publisher: Université d'Ottawa / University of Ottawa
Abstract
The dependability of modern software systems is becoming increasingly crucial as their complexity and scope continue to grow. Log data recorded during system execution can be leveraged to predict failures and detect anomalies automatically. However, designing accurate and efficient log-based analysis methods remains challenging due to the diversity of system environments, the instability of logs over time, and the scarcity of labeled data for training. This thesis addresses these challenges by systematically evaluating deep learning models for log-based failure prediction and by proposing a hybrid data-efficient approach that combines machine learning and a Large Language Model (LLM) for anomaly detection in unstable logs.
The first part of the thesis focuses on failure prediction using log data. While several Machine Learning (ML) and Deep Learning (DL) methods have been proposed, existing empirical studies are limited in scope, often examining only a subset of DL architectures or narrow dataset conditions. This thesis systematically investigates the combination of different log embedding strategies with major types of DL architectures, including Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and transformers. To enable a comprehensive evaluation, we design a modular architecture to accommodate various embedding strategies with different DL encoder configurations and synthesize 360 datasets with controlled characteristics, such as dataset size and failure rate, across three distinct system behavioral models. Experimental results demonstrate that CNN-based models with the Logkey2vec embedding strategy achieve the best overall performance, particularly when the dataset size exceeds 350 instances or when the failure rate exceeds 7.5%. These findings provide actionable insights into selecting effective models depending on dataset characteristics.
The second part of the thesis addresses Anomaly Detection in Unstable Logs (ULAD), a more realistic but underexplored scenario in which logs evolve due to software or environmental changes. Existing approaches based on machine learning typically require substantial labeled data, whereas LLMs can generalize with little data but struggle to capture structured log patterns. To address these complementary limitations, this thesis introduces FlexLog, a novel hybrid approach that integrates simple ML models (Decision Trees (DT), K-Nearest Neighbors (KNN), and a Single-Layer Feedforward Network (SLFN)) with an LLM (Mistral) through ensemble learning. FlexLog further incorporates a cache and a retrieval-augmented generation (RAG) component to improve time efficiency and accuracy, respectively. We evaluate FlexLog on four datasets specifically configured for unstable log anomaly detection. Results show that FlexLog consistently outperforms baseline methods by at least 1.2 percentage points (pp) in F1 score while reducing labeled data requirements by 62.87 pp. When trained on the same amount of data as the baselines, FlexLog achieves up to a 13 pp increase in F1 score on the ADFA-U dataset while maintaining an inference time of less than one second per log sequence, making it suitable for most practical applications.
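As an informal illustration of the ensemble-plus-cache idea (not FlexLog's actual components), the sketch below combines four stub predictors by majority vote and memoizes repeated sequences. The stub rules and `functools.lru_cache` stand in, respectively, for the trained DT, KNN, SLFN, and Mistral models and for FlexLog's cache; all names and rules are hypothetical.

```python
from functools import lru_cache

# Stub predictors over a log-sequence string, standing in for a trained
# Decision Tree, KNN, SLFN, and LLM. Each returns True if it deems the
# sequence anomalous.
def dt_predict(seq):   return "ERROR" in seq
def knn_predict(seq):  return seq.count("fail") > 0
def slfn_predict(seq): return len(seq) > 80
def llm_predict(seq):  return "ERROR" in seq or "fail" in seq  # Mistral stand-in

@lru_cache(maxsize=1024)  # cache: repeated sequences skip recomputation
def ensemble_predict(seq):
    votes = [dt_predict(seq), knn_predict(seq),
             slfn_predict(seq), llm_predict(seq)]
    # Simple majority vote, with ties resolved toward "anomalous".
    return sum(votes) * 2 >= len(votes)

print(ensemble_predict("open db; ERROR timeout; retry"))  # True (anomalous)
print(ensemble_predict("open db; read ok; close db"))     # False (normal)
```

The voting step shows why the combination is data-efficient: the LLM stand-in needs no per-dataset training, while the lightweight models can be fit on the small labeled portion and override occasional LLM mistakes.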
By systematically evaluating deep learning architectures for log-based failure prediction and introducing a hybrid ML-LLM framework for detecting unstable log anomalies, this thesis provides a unified and empirical foundation for advancing log-based dependability analysis. To ensure reproducibility and enable future research, all datasets, tools, and experimental results are made publicly available.
