Temporal Pyramid Structure for Video Frame Interpolation
En cours de chargement...
Date
Authors
Nom de la revue
ISSN de la revue
Titre du volume
Éditeur
Université d'Ottawa | University of Ottawa
Résumé
The most prevalent structure in video frame interpolation involves using optical flow to guide frame warping, which typically considers only the two adjacent frames. However, these methods often fail to capture long-range temporal dependencies and often result in significant deformation in complex motion scenarios. We propose a novel Temporal Pyramid Attention (TPA) block, which employs a temporal pyramid structure to connect four frames within a sliding window for the generation of intermediate frames. The temporal pyramid structure consists of three layers to leverage multi-level features, estimate the frame window, and connect with a GRU to generate a bi-directional feature flow. Furthermore, the dual pyramid structure incorporates channel attention mechanisms, enabling the interpolation of three frames in a single process. The TPA block employs a multi-scale approach to effectively capture temporal dependencies and spatial correlations, enhancing the quality of interpolated frames. Our model achieves a state-of-the-art performance on the Vimeo90K septuplet dataset compared to existing methods using pre-trained parameters.
Description
Mots-clés
Deep learning, Video frame interpolation, Gated recurrent unit, Knowledge distillation, Temporal feature extraction

