Stochastic Dynamics for Video Infilling


Abstract

In this paper, we introduce a stochastic generation framework (SDVI) to infill long intervals in video sequences. Video interpolation enhances temporal resolution by producing transitional frames for the short interval between every two consecutive frames. Video infilling, by contrast, aims to complete long intervals in a video sequence. Our framework models infilling as a constrained stochastic generation process and sequentially samples dynamics from the inferred distribution. SDVI consists of two parts: (1) bi-directional constraint propagation, which guarantees spatio-temporal coherence among frames, and (2) a stochastic sampling process, which generates dynamics from the inferred distributions. Experimental results show that SDVI generates clear and varied sequences. Moreover, motions in the generated sequences are realistic and transition smoothly from the given start frame to the end frame.
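To make "a constrained stochastic generation process that sequentially samples dynamics" concrete, here is a toy sketch. It does not reproduce SDVI: the code is not yet released, so this swaps in a much simpler, well-known constrained process, a discrete Brownian bridge in pixel space. Each new frame is drawn from the exact conditional distribution given the previous frame and the fixed end frame, so every path starts at the start frame and ends at the end frame. The function name `bridge_infill` and the noise scale `sigma` are hypothetical.

```python
import numpy as np


def bridge_infill(start, end, num_steps, sigma=0.05, rng=None):
    """Sample num_steps + 1 frames pinned at `start` and `end`.

    Toy stand-in (NOT the SDVI model): a discrete Brownian bridge, where each
    frame is sampled conditioned on the previous frame and the end constraint.
    """
    rng = np.random.default_rng() if rng is None else rng
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    frames = [start]
    x = start
    for t in range(num_steps):
        remaining = num_steps - t  # steps left until the end frame
        # Conditional mean drifts toward the end frame.
        mean = x + (end - x) / remaining
        # Conditional std shrinks to exactly 0 on the final step.
        std = sigma * np.sqrt((remaining - 1) / remaining)
        x = mean + std * rng.standard_normal(start.shape)
        frames.append(x)
    return np.stack(frames)


# Usage: 7 in-between frames for one interval (8 steps from start to end).
rng = np.random.default_rng(0)
clip = bridge_infill(np.zeros((64, 64)), np.ones((64, 64)), num_steps=8, rng=rng)
print(clip.shape)  # (9, 64, 64)
```

With `sigma=0` the path reduces to linear interpolation between the two frames; a larger `sigma` yields more varied intermediate frames while both endpoints stay fixed.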


Training and Inference of Our Model

[Figure: Training | Inference]

[GitHub] Code will be released after the paper review process.


Paper

Anonymous, Anonymous, Anonymous, Anonymous
Stochastic Video Long-term Interpolation.
arXiv (preprint)



Video Results

Here we show more qualitative results on UCF101, Stochastic Moving MNIST (SMMNIST), KTH, and BAIR Robot Arm Pushing. For each sequence (a block of 3 rows and 3 columns), except for UCF101, the 1st column is the 8th ground-truth frame and the 3rd column is the 16th ground-truth frame. We set the opacity of these two columns to 0.7 to emphasize the 2nd column, which is our video result.
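The layout described above can be approximated offline as follows. This is a minimal sketch under stated assumptions: frames are grayscale arrays in [0, 1], "opacity 0.7" means alpha-compositing over a white page background, and the three columns of one row are concatenated horizontally. The helper names `fade` and `triptych` are hypothetical.

```python
import numpy as np


def fade(frame, opacity=0.7, background=1.0):
    """Alpha-composite a frame in [0, 1] over a uniform background (assumed white)."""
    return opacity * np.asarray(frame, dtype=float) + (1.0 - opacity) * background


def triptych(gt_start, generated, gt_end, opacity=0.7):
    """One row: faded ground-truth frame 8 | generated frame | faded ground-truth frame 16."""
    return np.concatenate(
        [fade(gt_start, opacity), np.asarray(generated, dtype=float), fade(gt_end, opacity)],
        axis=1,  # side by side along the width axis
    )
```

Only the outer columns are faded, so the generated middle column keeps full contrast against its neighbours.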


Section 1: UCF101 (frame rate increased from 2 fps to 16 fps)
(The noisy dots are artifacts of GIF creation. Each clip concatenates 1 ground-truth frame, 7 generated frames, the next ground-truth frame, and so on.)
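Going from 2 fps to 16 fps is an 8× increase, so each interval between two ground-truth keyframes contains 8 − 1 = 7 generated frames. The playback order described above can be sketched as below; the function `interleave` and its `factor` parameter are hypothetical names used only for illustration.

```python
def interleave(keyframes, generated, factor=8):
    """Build the playback order: key 0, its generated frames, key 1, ..., last key.

    generated[i] holds the factor - 1 frames synthesized between
    keyframes[i] and keyframes[i + 1].
    """
    if len(generated) != len(keyframes) - 1:
        raise ValueError("need exactly one generated segment per interval")
    out = []
    for key, segment in zip(keyframes, generated):
        if len(segment) != factor - 1:
            raise ValueError(f"each interval needs {factor - 1} generated frames")
        out.append(key)      # ground-truth keyframe
        out.extend(segment)  # generated in-between frames
    out.append(keyframes[-1])  # closing ground-truth keyframe
    return out
```

For `n` keyframes the result has `n + (n - 1) * (factor - 1)` frames, which for 2 fps input and `factor=8` matches a 16 fps clip.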












Section 2: BAIR Robot Arm Pushing (64×64, temporal resolution increased 9×)
Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

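The page does not state how the "best SSIM sample" is chosen. One plausible reading is best-of-N selection: draw several samples and keep the one with the highest mean per-frame SSIM against the ground truth. The sketch below assumes exactly that, with frames in [0, 1]. For brevity it uses a simplified global SSIM (the whole image as one window), whereas the standard metric averages over local windows. The names `global_ssim` and `best_ssim_sample` are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """SSIM over the whole image as a single window (simplified, not windowed SSIM)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))


def best_ssim_sample(samples, ground_truth):
    """Return (index, score) of the sample with the highest mean per-frame SSIM."""
    scores = [
        np.mean([global_ssim(f, g) for f, g in zip(sample, ground_truth)])
        for sample in samples
    ]
    best = int(np.argmax(scores))
    return best, scores[best]
```

A "random sample" then corresponds to any single draw, while the best-SSIM sample is the draw closest to the ground truth under this metric.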


Section 3: KTH (128×128, temporal resolution increased 8×)
Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:



Section 4: BAIR Robot Arm Pushing (64×64, temporal resolution increased 8×)
Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:



Section 5: KTH Action (64×64, temporal resolution increased 8×)
Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:



Section 6: Stochastic Moving MNIST (64×64, temporal resolution increased 8×)
Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample:

Ground truth:

Our model random sample:

Our model best SSIM sample: