What, When, and Where Exactly? Human Activity Detection in Untrimmed Videos Using Deep Learning

Rahman, Md Atiqur

dc.contributor.author	Rahman, Md Atiqur
dc.contributor.supervisor	Laganière, Robert
dc.date.accessioned	2023-12-06T18:45:23Z
dc.date.available	2023-12-06T18:45:23Z
dc.date.issued	2023-12-06	en_US
dc.description.abstract	Over the past decade, there has been an explosion in the volume of video data, including internet videos and surveillance camera footage. These videos often feature extended durations with unedited content, predominantly filled with background clutter, while the relevant activities of interest occupy only a small portion of the footage. Consequently, there is a compelling need for advanced processing techniques to automatically analyze this vast reservoir of video data, specifically with the goal of identifying the segments that contain the events of interest. Given that humans are the primary subjects in these videos, comprehending human activities plays a pivotal role in automated video analysis. This thesis seeks to tackle the challenge of detecting human activities from untrimmed videos, aiming to classify and pinpoint these activities both in their spatial and temporal dimensions. To achieve this, we propose a modular approach. We begin by developing a temporal activity detection framework, and then progressively extend the framework to support activity detection in the spatio-temporal dimension. To perform temporal activity detection, we introduce an end-to-end trainable deep learning model leveraging 3D convolutions. Additionally, we propose a novel and adaptable fusion strategy to combine both the appearance and motion information extracted from a video, using RGB and optical flow frames. Importantly, we incorporate the learning of this fusion strategy into the activity detection framework. Building upon the temporal activity detection framework, we extend it by incorporating a spatial localization module to enable activity detection both in space and time in a holistic end-to-end manner. To accomplish this, we leverage shared spatio-temporal feature maps to jointly optimize both spatial and temporal localization of activities, thus making the entire pipeline more effective and efficient. 
Finally, we introduce several novel techniques for modeling actor motion, specifically designed for efficient activity recognition. This is achieved by harnessing 2D pose information extracted from video frames and then representing human motion through bone movement, bone orientation, and body joint positions. Our experimental evaluations, conducted using benchmark datasets, showcase the effectiveness of the proposed temporal and spatio-temporal activity detection methods when compared to the current state-of-the-art methods. Moreover, the proposed motion representations excel in both performance and computational efficiency. Ultimately, this research shall pave the way forward towards imbuing computers with social visual intelligence, enabling them to comprehend human activities in any given time and space, opening up exciting possibilities for the future.	en_US
dc.identifier.uri	http://hdl.handle.net/10393/45709
dc.identifier.uri	http://dx.doi.org/10.20381/ruor-29913
dc.language.iso	en	en_US
dc.publisher	Université d'Ottawa / University of Ottawa	en_US
dc.subject	Machine Learning	en_US
dc.subject	Deep Learning	en_US
dc.subject	Human Activity Detection	en_US
dc.subject	Untrimmed Video Analysis	en_US
dc.subject	Pose-based Motion Modeling	en_US
dc.title	What, When, and Where Exactly? Human Activity Detection in Untrimmed Videos Using Deep Learning	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Génie / Engineering	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	PhD	en_US
uottawa.department	Science informatique et génie électrique / Electrical Engineering and Computer Science	en_US

Files

Original bundle

Name: Rahman_Md_Atiqur_2023_thesis.pdf
Size: 12.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission

Collections

- Thèses, 2011 - // Theses, 2011 -