What, When, and Where Exactly? Human Activity Detection in Untrimmed Videos Using Deep Learning

dc.contributor.author: Rahman, Md Atiqur
dc.contributor.supervisor: Laganière, Robert
dc.date.accessioned: 2023-12-06T18:45:23Z
dc.date.available: 2023-12-06T18:45:23Z
dc.date.issued: 2023-12-06 (en_US)
dc.description.abstract (en_US): Over the past decade, there has been an explosion in the volume of video data, including internet videos and surveillance camera footage. These videos often feature extended durations with unedited content, predominantly filled with background clutter, while the relevant activities of interest occupy only a small portion of the footage. Consequently, there is a compelling need for advanced processing techniques to automatically analyze this vast reservoir of video data, specifically with the goal of identifying the segments that contain the events of interest. Given that humans are the primary subjects in these videos, comprehending human activities plays a pivotal role in automated video analysis. This thesis seeks to tackle the challenge of detecting human activities from untrimmed videos, aiming to classify and pinpoint these activities both in their spatial and temporal dimensions. To achieve this, we propose a modular approach. We begin by developing a temporal activity detection framework, and then progressively extend the framework to support activity detection in the spatio-temporal dimension. To perform temporal activity detection, we introduce an end-to-end trainable deep learning model leveraging 3D convolutions. Additionally, we propose a novel and adaptable fusion strategy to combine both the appearance and motion information extracted from a video, using RGB and optical flow frames. Importantly, we incorporate the learning of this fusion strategy into the activity detection framework. Building upon the temporal activity detection framework, we extend it by incorporating a spatial localization module to enable activity detection both in space and time in a holistic end-to-end manner. To accomplish this, we leverage shared spatio-temporal feature maps to jointly optimize both spatial and temporal localization of activities, thus making the entire pipeline more effective and efficient.
Finally, we introduce several novel techniques for modeling actor motion, specifically designed for efficient activity recognition. This is achieved by harnessing 2D pose information extracted from video frames and then representing human motion through bone movement, bone orientation, and body joint positions. Our experimental evaluations, conducted using benchmark datasets, showcase the effectiveness of the proposed temporal and spatio-temporal activity detection methods when compared to the current state-of-the-art methods. Moreover, the proposed motion representations excel in both performance and computational efficiency. Ultimately, this research shall pave the way forward towards imbuing computers with social visual intelligence, enabling them to comprehend human activities in any given time and space, opening up exciting possibilities for the future.
dc.identifier.uri: http://hdl.handle.net/10393/45709
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-29913
dc.language.iso: en (en_US)
dc.publisher: Université d'Ottawa / University of Ottawa (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Human Activity Detection (en_US)
dc.subject: Untrimmed Video Analysis (en_US)
dc.subject: Pose-based Motion Modeling (en_US)
dc.title: What, When, and Where Exactly? Human Activity Detection in Untrimmed Videos Using Deep Learning (en_US)
dc.type: Thesis (en_US)
thesis.degree.discipline: Génie / Engineering (en_US)
thesis.degree.level: Doctoral (en_US)
thesis.degree.name: PhD (en_US)
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science (en_US)
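
The abstract mentions representing human motion through bone movement, bone orientation, and body joint positions derived from 2D pose. The record does not say how the thesis computes these, so the following is only a minimal illustrative sketch under stated assumptions: the three-joint skeleton, the `BONES` list, and all function names are hypothetical, and a bone is simply taken to be the vector between two connected joints.

```python
import math

# Hypothetical 3-joint skeleton: shoulder (0), elbow (1), wrist (2).
# Each bone is a (parent_joint, child_joint) index pair. This layout is an
# assumption for illustration, not the skeleton used in the thesis.
BONES = [(0, 1), (1, 2)]

def bone_vectors(joints, bones=BONES):
    """Return one (dx, dy) vector per bone for a single frame of 2D joints."""
    return [(joints[b][0] - joints[a][0], joints[b][1] - joints[a][1])
            for a, b in bones]

def bone_orientations(joints, bones=BONES):
    """Return the angle in radians of each bone relative to the x-axis."""
    return [math.atan2(dy, dx) for dx, dy in bone_vectors(joints, bones)]

def bone_movement(prev_joints, next_joints, bones=BONES):
    """Return the change of each bone vector between two consecutive frames."""
    prev = bone_vectors(prev_joints, bones)
    nxt = bone_vectors(next_joints, bones)
    return [(nx - px, ny - py) for (px, py), (nx, ny) in zip(prev, nxt)]

# Two made-up frames: the forearm rotates from horizontal to vertical.
frame0 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
frame1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

orientations = bone_orientations(frame1)
movement = bone_movement(frame0, frame1)
```

In this toy example the upper-arm bone does not move between frames, while the forearm bone changes by (-1.0, 1.0) and ends up oriented at a right angle to the x-axis. Features of this kind are cheap to compute compared with dense optical flow, which is consistent with the efficiency motivation stated in the abstract.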

Files

Original bundle

Name: Rahman_Md_Atiqur_2023_thesis.pdf
Size: 12.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission