
Multi-modal Feature Fusion Using Full Sequences for Dynamic Hand Gesture Recognition with Simulated Robotic Arm Control



Publisher

Université d'Ottawa | University of Ottawa

Creative Commons

Attribution 4.0 International

Abstract

Dynamic hand gesture recognition (DHGR) enables accessible human-robot interaction by interpreting sequential hand movements rather than static poses. Previous DHGR systems have focused on the RGB modality of their datasets and ignored depth. This thesis addresses that gap with a multi-modal classifier that preserves temporal integrity. The InceptionV3-LSTM architecture is recreated using a public RGB-depth dataset of six dynamic gestures. Full 40-frame sequences are used with stratified 5-fold cross-validation to prevent sequences from being split across folds. The feature extraction pipeline fuses visual and landmark features from both the RGB and depth modalities in parallel InceptionV3 streams, feeding a stacked LSTM-RNN. The results demonstrate that full-sequence multi-modal training reduces overfitting: validation loss decreases steadily while accuracy exceeds the RGB-only baseline. This work contributes a multi-modal DHGR pipeline demonstrated in a simulated robotic arm control application.
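The abstract's data-splitting step (keeping each full 40-frame sequence intact while stratifying gesture classes across five folds) can be sketched in plain Python. The function name and the per-class sequence counts below are illustrative assumptions, not details taken from the thesis; the point is only that folds are formed over sequence indices, so no sequence's frames are divided between training and validation.

```python
from collections import defaultdict

def stratified_sequence_folds(labels, n_folds=5):
    """Assign sequence indices to stratified folds.

    Each element of `labels` is the gesture class of one full
    40-frame sequence. Splitting at the sequence level guarantees
    that all frames of a sequence land in the same fold.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label, indices in by_class.items():
        # Round-robin assignment keeps each class evenly
        # represented in every fold (stratification).
        for i, idx in enumerate(indices):
            folds[i % n_folds].append(idx)
    return folds

# Hypothetical example: 6 gesture classes, 10 sequences per class.
labels = [c for c in range(6) for _ in range(10)]
folds = stratified_sequence_folds(labels, n_folds=5)
```

With these counts, every fold receives 12 sequences (2 per class), and each cross-validation round uses one fold for validation and the remaining four for training.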

Keywords

dynamic hand gesture recognition, multi-modal fusion, long short-term memory, full sequence data splitting, RGB modality, depth modality, convolutional neural network
