Multimodal Emotion Recognition Using Physiological Signals
Publisher
Université d'Ottawa / University of Ottawa
Abstract
Affective computing aims to develop systems capable of recognizing and interpreting human emotions, yet existing multimodal datasets frequently suffer from poor signal quality, high inter-subject variability, and inconsistent evaluation protocols. To address these gaps, this thesis develops and validates a comprehensive framework for multimodal emotion recognition using physiological signals, namely electrocardiogram (ECG), electrodermal activity (EDA), and respiration (RSP), augmented with speech-based representations. The goal was to establish standardized preprocessing workflows, rigorous signal quality assessment (SQA), and reproducible baseline experiments to support the development and technical validation of a large-scale physiological dataset. The framework was applied to a dataset collected from 99 participants, containing synchronized physiological recordings, speech responses, and self-reported emotional annotations during exposure to validated video stimuli.
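As a rough illustration of one standardized preprocessing step, the Python sketch below band-pass filters a raw ECG trace with a zero-phase Butterworth filter. The sampling rate, cutoff frequencies, and filter order are illustrative assumptions, not the settings used in the thesis.

# Minimal preprocessing sketch: zero-phase band-pass filtering of raw ECG.
# The 1000 Hz sampling rate and 0.5-40 Hz passband are placeholder choices.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_ecg(ecg, fs=1000.0, low=0.5, high=40.0, order=4):
    """Apply a zero-phase Butterworth band-pass filter to a raw ECG signal."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, ecg)  # forward-backward filtering avoids phase shift

# Example: filter 60 s of synthetic noise standing in for a recording.
ecg_raw = np.random.randn(60 * 1000)
ecg_clean = bandpass_ecg(ecg_raw)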
To ensure data integrity, a rigorous SQA and artifact-removal pipeline was applied across modalities, integrating established ECG and respiration metrics with newly designed EDA-specific indicators. Using this refined dataset, multiple emotion-classification experiments were conducted under a strict subject-independent evaluation protocol, comparing fixed 30-second windows with emotion-triggered temporal segments. Across all tasks (binary arousal, binary valence, and multiclass emotion recognition), trigger-based segments consistently produced clearer and more discriminative physiological patterns. A Random Forest classifier achieved the strongest overall performance, including 78.8% multiclass accuracy using physiological features alone.
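A minimal sketch of such a subject-independent protocol follows, assuming a precomputed feature matrix (one row per window or trigger-based segment), class labels, and an array of participant IDs; leave-one-subject-out cross-validation guarantees that no participant contributes segments to both training and test folds. All names, shapes, and values are placeholders, not the thesis configuration.

# Subject-independent evaluation sketch with a Random Forest classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))            # placeholder physiological features
y = rng.integers(0, 4, size=300)          # placeholder multiclass emotion labels
subjects = rng.integers(0, 10, size=300)  # placeholder participant IDs

# Each fold holds out every segment from one participant, so no subject
# appears in both the training and the test data.
clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=subjects)
print(f"mean subject-independent accuracy: {scores.mean():.3f}")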
To explore multimodal enhancement, speech embeddings were fused with handcrafted physiological features. This early-fusion approach led to substantial improvements across all tasks, most notably increasing multiclass accuracy from 78.8% to 97% when using trigger-based segments. These findings demonstrate that speech provides complementary affective information that enhances physiological representations. A subject-wise evaluation was also conducted to examine emotion separability across individuals and to identify video-specific misclassification patterns that reveal how different stimuli elicit varying physiological responses. Overall, this thesis delivers a validated multimodal dataset, reproducible processing pipelines, and strong baseline benchmarks that provide a solid foundation for future research in physiological and multimodal emotion recognition.
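A minimal sketch of the early-fusion idea, assuming per-segment speech embeddings and handcrafted physiological feature vectors are concatenated into a single joint vector before classification; the dimensionalities and classifier settings are illustrative, not those reported above.

# Early-fusion sketch: concatenate speech embeddings with handcrafted
# physiological features, then train one classifier on the joint vector.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

n_segments = 300
phys = np.random.randn(n_segments, 40)     # handcrafted ECG/EDA/RSP features
speech = np.random.randn(n_segments, 768)  # e.g., a pretrained speech embedding
y = np.random.randint(0, 4, size=n_segments)

fused = np.concatenate([phys, speech], axis=1)  # early fusion at feature level
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(fused, y)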
Keywords
Machine Learning, Affect Recognition, Multimodal Machine Learning, Affective Computing, Emotion Recognition
