Multimodal Emotion Recognition Using Physiological Signals
| dc.contributor.author | Dhothar, Mehakdeep Kaur | |
| dc.contributor.supervisor | Bolić, Miodrag | |
| dc.date.accessioned | 2025-11-19T19:37:06Z | |
| dc.date.available | 2025-11-19T19:37:06Z | |
| dc.date.issued | 2025-11-19 | |
| dc.description.abstract | Affective computing aims to develop systems capable of recognizing and interpreting human emotions, yet existing multimodal datasets frequently suffer from limitations such as poor signal quality, high inter-subject variability, and inconsistent evaluation protocols. To address these gaps, this thesis develops and validates a comprehensive framework for multimodal emotion recognition using three physiological signals, Electrocardiogram (ECG), Electrodermal Activity (EDA), and Respiration (RSP), augmented with speech-based representations. The goal was to establish standardized preprocessing workflows, rigorous signal quality assessment (SQA), and reproducible baseline experiments to support the development and technical validation of a large-scale physiological dataset. This framework was applied to a dataset collected from 99 participants, containing synchronized physiological recordings, speech responses, and self-reported emotional annotations during exposure to validated video stimuli. To ensure data integrity, a rigorous SQA and artifact-removal pipeline was applied across modalities, integrating established ECG and respiration metrics with newly designed EDA-specific indicators. Using this refined dataset, multiple emotion-classification experiments were conducted under a strict subject-independent evaluation protocol, comparing fixed 30-second windows with emotion-triggered temporal segments. Across all tasks (binary arousal, binary valence, and multiclass emotion recognition), trigger-based segments consistently produced clearer and more discriminative physiological patterns. Random Forest achieved the strongest overall performance, including 78.8% multiclass accuracy using physiological features alone. To explore multimodal enhancement, speech embeddings were fused with handcrafted physiological features. This early-fusion approach led to substantial improvements across all tasks, most notably increasing multiclass accuracy from 78.8% to 97% on trigger-based segments. These findings demonstrate that speech provides complementary affective information that enhances physiological representations. A subject-wise evaluation was also conducted to examine emotion separability across individuals and to identify video-specific misclassification patterns that reveal how different stimuli elicit varying physiological responses. Overall, this thesis delivers a validated multimodal dataset, reproducible processing pipelines, and strong baseline benchmarks that provide a solid foundation for future research in physiological and multimodal emotion recognition. | |
| dc.identifier.uri | http://hdl.handle.net/10393/51064 | |
| dc.identifier.uri | https://doi.org/10.20381/ruor-31529 | |
| dc.language.iso | en | |
| dc.publisher | Université d'Ottawa / University of Ottawa | |
| dc.rights | Attribution-NonCommercial 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject | Machine Learning | |
| dc.subject | Affect Recognition | |
| dc.subject | Multimodal Machine Learning | |
| dc.subject | Affective Computing | |
| dc.subject | Emotion Recognition | |
| dc.title | Multimodal Emotion Recognition Using Physiological Signals | |
| dc.type | Thesis | en |
| thesis.degree.discipline | Génie / Engineering | |
| thesis.degree.level | Masters | |
| thesis.degree.name | MASc | |
| uottawa.department | Science informatique et génie électrique / Electrical Engineering and Computer Science |
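
The abstract describes early fusion of speech embeddings with handcrafted physiological features, evaluated under a subject-independent protocol with a Random Forest classifier. A minimal sketch of that pipeline shape is given below, using entirely synthetic data; the feature dimensions, the five-fold grouped split, and the forest hyperparameters are illustrative assumptions, not the thesis's actual configuration.

```python
# Sketch: early fusion + subject-independent evaluation with a Random Forest.
# All data is synthetic; dimensions and parameters are assumptions for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)

n_subjects, segs_per_subject = 10, 12
n_seg = n_subjects * segs_per_subject
subjects = np.repeat(np.arange(n_subjects), segs_per_subject)  # segment -> subject id

# Handcrafted physiological features per segment (e.g. ECG/EDA/RSP statistics).
phys = rng.normal(size=(n_seg, 20))
# Speech embedding per segment (64-d is an assumed dimension).
speech = rng.normal(size=(n_seg, 64))

# Early fusion: concatenate the feature vectors before classification.
fused = np.hstack([phys, speech])

# Synthetic 4-class emotion labels.
y = rng.integers(0, 4, size=n_seg)

# Subject-independent protocol: GroupKFold ensures no subject's segments
# appear in both the training and the test fold.
cv = GroupKFold(n_splits=5)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, fused, y, cv=cv, groups=subjects)
print(f"Mean subject-independent accuracy: {scores.mean():.3f}")
```

With random labels the accuracy hovers near chance; the point of the sketch is the data flow (fuse, then split by subject, then classify), not the numbers.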
