Real-Time Video Object Detection with Temporal Feature Aggregation

dc.contributor.author: Chen, Meihong
dc.contributor.supervisor: Lang, Jochen
dc.date.accessioned: 2021-10-05T18:00:57Z
dc.date.available: 2021-10-05T18:00:57Z
dc.date.issued: 2021-10-05 [en_US]
dc.description.abstract: In recent years, various high-performance networks have been proposed for single-image object detection. An obvious choice is to design a video detection network based on state-of-the-art single-image detectors. However, video object detection remains challenging because individual frames in a video are of lower quality, so temporal information is needed for high-quality detection results. In this thesis, we design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network. We use YOLOv3 as the base detector. To exploit inter-frame information, we propose feature aggregation based on a temporal network. Our temporal network uses Appearance-Preserving 3D Convolution (AP3D) to extract aligned features in the temporal dimension. Our multi-scale detector and multi-scale temporal network communicate at each scale and also across scales. In this thesis, the temporal network takes 4, 8, or 16 frames as input, and we name the corresponding variants TemporalNet-4, TemporalNet-8 and TemporalNet-16. Our approach achieves 77.1% mAP (mean Average Precision) on the ImageNet VID 2017 dataset with TemporalNet-4, while TemporalNet-16 achieves 80.9% mAP, a competitive result on this video object detection benchmark. Our network also runs in real time, at 35 ms/frame. [en_US]
dc.identifier.uri: http://hdl.handle.net/10393/42790
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-27007
dc.language.iso: en [en_US]
dc.publisher: Université d'Ottawa / University of Ottawa [en_US]
dc.subject: Attention Mechanism [en_US]
dc.subject: AP3D [en_US]
dc.subject: CNN [en_US]
dc.subject: Octave Convolution [en_US]
dc.subject: One-Stage Detection [en_US]
dc.subject: Video Object Detection [en_US]
dc.title: Real-Time Video Object Detection with Temporal Feature Aggregation [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Génie / Engineering [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: MCS [en_US]
uottawa.department: Science informatique et génie électrique / Electrical Engineering and Computer Science [en_US]
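
The figures quoted in the abstract can be put in more familiar units with a little arithmetic. The sketch below is only an illustration, not code from the thesis: it converts the reported 35 ms/frame latency into frames per second and computes the mAP difference between the two reported variants. The names `frames_per_second`, `LATENCY_MS_PER_FRAME` and `MAP_BY_VARIANT` are hypothetical, and the abstract does not state which TemporalNet variant the 35 ms figure refers to.

```python
# Figures quoted in the abstract; variable names are illustrative only.
LATENCY_MS_PER_FRAME = 35.0  # the abstract does not say which variant this applies to
MAP_BY_VARIANT = {"TemporalNet-4": 77.1, "TemporalNet-16": 80.9}  # % mAP, ImageNet VID 2017


def frames_per_second(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to throughput in frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


fps = frames_per_second(LATENCY_MS_PER_FRAME)
map_gain = MAP_BY_VARIANT["TemporalNet-16"] - MAP_BY_VARIANT["TemporalNet-4"]

print(f"{fps:.1f} frames/s")          # throughput implied by 35 ms/frame
print(f"{map_gain:.1f} mAP points")   # gain from 4 to 16 input frames
```

Under these numbers, 35 ms/frame corresponds to roughly 28.6 frames per second, and moving from 4 to 16 input frames is associated with a gain of 3.8 mAP points.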

Files

Original bundle

Name: Chen_Meihong_2021_thesis.pdf
Size: 14.66 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.65 KB
Format: Item-specific license agreed upon to submission