Temporal Pyramid Structure for Video Frame Interpolation
Publisher
Université d'Ottawa | University of Ottawa
Abstract
The most prevalent approach to video frame interpolation uses optical flow to guide frame warping and typically considers only the two adjacent frames. However, such methods often fail to capture long-range temporal dependencies and produce significant deformation in complex motion scenarios. We propose a novel Temporal Pyramid Attention (TPA) block, which employs a temporal pyramid structure to connect four frames within a sliding window for the generation of intermediate frames. The temporal pyramid structure consists of three layers that leverage multi-level features, estimate the frame window, and connect with a gated recurrent unit (GRU) to generate a bi-directional feature flow. Furthermore, the dual pyramid structure incorporates channel attention mechanisms, enabling the interpolation of three frames in a single process. The TPA block employs a multi-scale approach to effectively capture temporal dependencies and spatial correlations, enhancing the quality of interpolated frames. Our model achieves state-of-the-art performance on the Vimeo90K septuplet dataset compared with existing methods using pre-trained parameters.
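The four-frame sliding window and the three interpolated frames can be illustrated with a minimal indexing sketch. It is not the thesis implementation: the helper names `sliding_windows` and `split_septuplet` are hypothetical, and the split of a septuplet into frames 1, 3, 5, 7 as inputs and frames 2, 4, 6 as targets is an assumption inferred from the abstract, not a detail it states.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], size: int = 4) -> Iterator[Tuple[T, ...]]:
    """Yield every run of `size` consecutive input frames (hypothetical helper)."""
    if size < 2:
        raise ValueError("window size must be at least 2")
    for start in range(len(frames) - size + 1):
        yield tuple(frames[start:start + size])


def split_septuplet(septuplet: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split a 7-frame clip into 4 input frames and 3 target frames.

    Assumption: positions 1, 3, 5, 7 are given and positions 2, 4, 6 are
    the intermediate frames to be interpolated in a single pass.
    """
    if len(septuplet) != 7:
        raise ValueError("expected exactly 7 frames")
    inputs = list(septuplet[0::2])   # frames 1, 3, 5, 7
    targets = list(septuplet[1::2])  # frames 2, 4, 6
    return inputs, targets


if __name__ == "__main__":
    clip = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
    inputs, targets = split_septuplet(clip)
    print("inputs:", inputs)
    print("targets:", targets)
    for window in sliding_windows(inputs, size=4):
        print("window:", window)
```

Under this assumed split, a single septuplet yields exactly one four-frame window; a longer video would yield one window per starting position.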
Keywords
Deep learning, Video frame interpolation, Gated recurrent unit, Knowledge distillation, Temporal feature extraction
