Title: Deep Learning Architectures for Enhanced Emotion Recognition from EEG and Facial Expressions
Author: Soleimani, Sareh
Date: 2024-03-08
Handle: http://hdl.handle.net/10393/46012
DOI: https://doi.org/10.20381/ruor-30202
Language: en
Keywords: Machine Learning; Deep Learning; Data Augmentation; Affective Computing
Type: Thesis

Abstract:

Human emotion plays a central role in experiences associated with decision-making, interaction, and cognitive processes. Emotion recognition has therefore become an important area of research in affective computing for human-computer interaction (HCI), and there is a growing need for automatic emotion recognition systems in applications such as robotics, games, surveillance, and healthcare. In this thesis, we aim to improve automated human emotion recognition using machine learning and deep learning approaches, and we contribute three solutions for recognizing human emotional states.

First, we propose a hybrid emotion prediction model that extracts frequency- and time-domain information from electroencephalogram (EEG) signals. The model is a cascade of deep networks consisting of a pre-trained convolutional neural network (CNN) and residual blocks of recurrent networks: the former extracts spatial features from the signal, while the latter learns the temporal dynamics of multi-channel EEG signals and introduces shortcuts across neural layers to improve the deep network's training efficiency. Compared with existing state-of-the-art methods on the DEAP dataset, the proposed model achieves accuracies of 0.61 and 0.63 on the validation set and 0.65 and 0.68 on the test set for the valence and arousal emotional dimensions, respectively.

Second, we propose a novel framework, the Contrastive Learning GAN-based Graph Neural Network, to recognize emotion from EEG signals. The framework integrates self-supervised learning with supervised learning to capture high-quality EEG representations and to overcome inter-subject and intra-subject variability in emotion. We compare the proposed model with recent state-of-the-art emotion recognition models on the DEAP and MAHNOB-HCI datasets. The results show that the proposed model achieves higher recognition performance than previous models, with test-set classification accuracies of 0.64 and 0.66 on DEAP and 0.66 and 0.71 on MAHNOB-HCI for the valence and arousal emotional dimensions, respectively; on the training sets, it achieves 0.74 and 0.74 on DEAP and 0.74 and 0.78 on MAHNOB-HCI. We also conduct an in-depth examination of how each component of the proposed model contributes to emotion recognition accuracy.

Third, we propose a novel Transformer-based bimodal model that uses EEG and facial expressions to perform emotion recognition. We deploy transformer encoders to integrate information across different frequency and data-channel regions. We evaluate the proposed model on the DEAP and MAHNOB-HCI datasets. Our experimental results demonstrate that the proposed model surpasses existing techniques, achieving bimodal accuracies of 0.66 and 0.72 (training) and 0.65 and 0.66 (testing) on DEAP, and 0.69 and 0.73 (training) and 0.61 and 0.68 (testing) on MAHNOB-HCI, for the valence and arousal emotional dimensions, respectively.
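As a rough illustration of the first model's cascade, the following PyTorch sketch pairs a small convolutional front end (standing in for the pre-trained CNN) with residual recurrent blocks over multi-channel EEG. The layer sizes, kernel choices, and classifier head are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch (not the thesis code): CNN extracts spatial features per
# EEG time window; residual GRU blocks model temporal dynamics, with
# shortcuts across layers to ease training of the deep cascade.
import torch
import torch.nn as nn

class ResidualGRUBlock(nn.Module):
    """GRU layer with a shortcut connection across it."""
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                  # x: (batch, time, dim)
        out, _ = self.gru(x)
        return self.norm(x + out)          # shortcut across the recurrent layer

class CNNRecurrentEmotionNet(nn.Module):
    def __init__(self, n_channels=32, feat_dim=64, n_blocks=2, n_classes=2):
        super().__init__()
        # Stand-in for the pre-trained CNN: 1-D convolutions over time,
        # treating the EEG channels as input feature maps.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, feat_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.blocks = nn.Sequential(*[ResidualGRUBlock(feat_dim)
                                      for _ in range(n_blocks)])
        self.head = nn.Linear(feat_dim, n_classes)  # e.g. low/high valence

    def forward(self, x):                  # x: (batch, channels, samples)
        h = self.cnn(x).transpose(1, 2)    # -> (batch, time, feat_dim)
        h = self.blocks(h)
        return self.head(h[:, -1])         # classify from the final time step

model = CNNRecurrentEmotionNet()
eeg = torch.randn(8, 32, 512)              # 8 trials, 32 channels, 512 samples
print(model(eeg).shape)                    # torch.Size([8, 2])
```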
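The second framework combines contrastive self-supervision with a graph view of the EEG channels. The sketch below illustrates only that idea: a minimal graph encoder over channels trained with an NT-Xent contrastive loss between two views of the same trials. The learnable adjacency, the noise augmentation (standing in for the GAN-generated views), and the projection head are all assumptions for illustration.

```python
# Minimal sketch (not the thesis code): self-supervised contrastive
# pre-training of a graph encoder over EEG channels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEncoder(nn.Module):
    """One graph-convolution step over the EEG channel graph, then pooling."""
    def __init__(self, n_channels=32, in_dim=128, hid=64, out_dim=32):
        super().__init__()
        # Learnable channel adjacency (could instead come from electrode layout).
        self.adj = nn.Parameter(torch.eye(n_channels))
        self.lin = nn.Linear(in_dim, hid)
        self.proj = nn.Linear(hid, out_dim)    # projection head for contrastive space

    def forward(self, x):                      # x: (batch, channels, features)
        a = torch.softmax(self.adj, dim=-1)    # row-normalized adjacency
        h = F.relu(self.lin(a @ x))            # aggregate neighboring channels
        return self.proj(h.mean(dim=1))        # pool channels -> (batch, out_dim)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss: each view's positive is the other view of the same trial."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2n, d)
    sim = z @ z.t() / tau                              # pairwise similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, float('-inf'))         # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

enc = GraphEncoder()
x = torch.randn(16, 32, 128)                   # 16 trials, 32 channels, 128 features
view1 = x + 0.1 * torch.randn_like(x)          # noise stands in for GAN augmentation
view2 = x + 0.1 * torch.randn_like(x)
loss = nt_xent(enc(view1), enc(view2))
loss.backward()
```

After pre-training, the encoder would be fine-tuned with labeled valence/arousal data, which is the supervised half of the framework's hybrid objective.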
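For the third contribution, the sketch below shows one common way to let transformer encoders attend jointly over two modalities: project EEG and facial-expression features into a shared token space, tag them with modality embeddings, and encode the concatenated sequence. The token layout, feature sizes, and fusion-by-concatenation are assumptions for illustration; the thesis model's exact architecture may differ.

```python
# Minimal sketch (not the thesis code): a bimodal transformer that fuses
# EEG tokens and facial-expression tokens in one self-attention stack.
import torch
import torch.nn as nn

class BimodalTransformer(nn.Module):
    def __init__(self, eeg_dim=128, face_dim=64, d_model=64, n_classes=2):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.face_proj = nn.Linear(face_dim, d_model)
        self.modality = nn.Embedding(2, d_model)    # tells the two token types apart
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, eeg, face):
        # eeg: (batch, eeg_tokens, eeg_dim); face: (batch, face_tokens, face_dim)
        e = self.eeg_proj(eeg) + self.modality.weight[0]
        f = self.face_proj(face) + self.modality.weight[1]
        h = self.encoder(torch.cat([e, f], dim=1))  # attention spans both modalities
        return self.head(h.mean(dim=1))             # pool all tokens, then classify

model = BimodalTransformer()
eeg = torch.randn(4, 32, 128)     # e.g. one token per EEG channel
face = torch.randn(4, 16, 64)     # e.g. one token per facial region or frame
print(model(eeg, face).shape)     # torch.Size([4, 2])
```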