Author: Yang, Jiaqi
Date: 2025-01-09
URI: http://hdl.handle.net/10393/50062
DOI: https://doi.org/10.20381/ruor-30831

Abstract: The most prevalent structure in video frame interpolation uses optical flow to guide frame warping, typically considering only the two adjacent frames. These methods often fail to capture long-range temporal dependencies and can produce significant deformation in complex motion scenarios. We propose a novel Temporal Pyramid Attention (TPA) block, which employs a temporal pyramid structure to connect four frames within a sliding window for the generation of intermediate frames. The temporal pyramid structure consists of three layers that leverage multi-level features, estimate the frame window, and connect with a GRU to generate a bi-directional feature flow. Furthermore, the dual pyramid structure incorporates channel attention mechanisms, enabling the interpolation of three frames in a single process. The TPA block employs a multi-scale approach to effectively capture temporal dependencies and spatial correlations, enhancing the quality of interpolated frames. Our model achieves state-of-the-art performance on the Vimeo90K septuplet dataset compared with existing methods using pre-trained parameters.

Language: en
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Keywords: Deep learning; Video frame interpolation; Gated recurrent unit; Knowledge distillation; Temporal feature extraction
Title: Temporal Pyramid Structure for Video Frame Interpolation
Type: Thesis
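The abstract mentions that the dual pyramid incorporates channel attention mechanisms. As a minimal NumPy sketch of what such a mechanism typically computes (a squeeze-and-excitation style gate; the layer sizes, reduction ratio, and weight names here are illustrative assumptions, not the thesis's actual implementation):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).
    feat: (C, H, W) feature map; w1: (C, C//r) and w2: (C//r, C) are the
    bottleneck weights, with r the assumed channel reduction ratio."""
    squeeze = feat.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # per-channel sigmoid gate in (0, 1)
    return feat * scale[:, None, None]             # reweight each channel

# Toy usage with random weights (shapes only; no trained parameters)
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // r))
w2 = rng.standard_normal((C // r, C))
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4): same shape as the input, channels rescaled
```

Because the gate lies in (0, 1), the output never exceeds the input in magnitude; the attention only down-weights less informative channels rather than amplifying any.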