Conference Agenda

Session
IVC3-O: Image and Video Compression 3
Time:
Wednesday, 23/Sept/2020:
4:15pm - 5:15pm

Session Chair: Lu Zhang
Location: Virtual platform

Presentations
4:15pm - 4:30pm
⭐ This paper has been nominated for the best paper award.

Video Coding for Machines with Feature-Based Rate-Distortion Optimization

Kristian Fischer, Fabian Brand, Christian Herglotz, André Kaup

Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

Common state-of-the-art video codecs are optimized to deliver a low bitrate while providing a certain quality for the final human observer, which is achieved by rate-distortion optimization (RDO). However, with the steady improvement of neural networks solving computer vision tasks, more and more multimedia data is no longer observed by humans but directly analyzed by neural networks. In this paper, we propose a standard-compliant feature-based RDO (FRDO) designed to increase coding performance when the decoded frame is analyzed by a neural network in a video coding for machines scenario. To that end, we replace the pixel-based distortion metrics in the conventional RDO of VTM-8.0 with distortion metrics calculated in the feature space created by the first layers of a neural network. In several tests with the segmentation network Mask R-CNN and single images from the Cityscapes dataset, we compare the proposed FRDO and its hybrid version HFRDO, with different distortion measures in the feature space, against the conventional RDO. With HFRDO, up to 5.49 % bitrate can be saved compared to the VTM-8.0 implementation in terms of Bjøntegaard Delta Rate, using the weighted average precision as quality metric. Additionally, allowing the encoder to vary the quantization parameter results in coding gains of up to 9.95 % for the proposed HFRDO compared to conventional VTM.
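The core idea of FRDO, replacing the pixel-space distortion inside the rate-distortion cost with a distortion measured in a network's feature space, can be illustrated with a toy sketch. The paper uses the first layers of Mask R-CNN inside VTM-8.0; the single hand-written convolution kernel and all function names below are illustrative assumptions, not the authors' implementation.

```python
def conv3x3(img, kernel):
    """Valid 3x3 convolution followed by ReLU -- a toy stand-in for the
    first layers of a real network such as Mask R-CNN."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = sum(kernel[j][i] * img[y - 1 + j][x - 1 + i]
                      for j in range(3) for i in range(3))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out

def mse(a, b):
    """Mean squared error between two equally sized 2D arrays."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n

def feature_distortion(orig, recon, kernel):
    """Distortion measured in feature space instead of pixel space."""
    return mse(conv3x3(orig, kernel), conv3x3(recon, kernel))

def rd_cost(orig, recon, rate, lam, kernel):
    """Lagrangian cost J = D_feature + lambda * R used for mode decisions."""
    return feature_distortion(orig, recon, kernel) + lam * rate
```

With a zero-sum (Laplacian-like) kernel, for example, a constant brightness offset changes the pixel MSE but leaves the feature distortion at zero, which is exactly the kind of task-irrelevant error a feature-space metric can ignore.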

Fischer-Video Coding for Machines with Feature-Based Rate-Distortion Optimization-140.pdf


4:30pm - 4:45pm

A Triangulation-Based Backward Adaptive Motion Field Subsampling Scheme

Fabian Brand1, Jürgen Seiler1, Elena Alshina2, André Kaup1

1Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany; 2Huawei Technologies Duesseldorf GmbH

Optical flow procedures are used to generate dense motion fields which approximate true motion. Such fields contain a large amount of data, and if such a field needs to be transmitted, the raw data usually exceeds that of the two images it was computed from. In many scenarios, however, it is of interest to transmit a dense motion field efficiently. Most prominently, this is the case in inter prediction for video coding.

In this paper, we propose a transmission scheme based on subsampling the motion field. Since a field subsampled with a regularly spaced pattern usually yields suboptimal results, we propose an adaptive subsampling algorithm that preferentially samples vectors at positions where changes in motion occur. The subsampling pattern is fully reconstructable without the need to signal position information. We show an average gain of 2.95 dB in mean squared error compared to regular subsampling. Furthermore, we show that an additional prediction stage can improve the results by a further 0.43 dB, for a total gain of 3.38 dB.
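As a rough intuition for content-adaptive subsampling, the sketch below keeps a fixed budget of motion vectors at the positions where the field varies most. This toy version is forward-adaptive and omits the paper's key contributions (triangulation-based reconstruction and a backward-adaptive pattern that needs no position signaling); all names are illustrative assumptions.

```python
def motion_gradient(field, y, x):
    """L1 difference of a motion vector to its right and lower neighbours."""
    u, v = field[y][x]
    g = 0.0
    for dy, dx in ((0, 1), (1, 0)):
        ny, nx = y + dy, x + dx
        if ny < len(field) and nx < len(field[0]):
            nu, nv = field[ny][nx]
            g += abs(u - nu) + abs(v - nv)
    return g

def adaptive_subsample(field, budget):
    """Keep the `budget` vectors with the largest local motion change,
    i.e. sample densely where the motion field varies."""
    scored = [(motion_gradient(field, y, x), y, x)
              for y in range(len(field)) for x in range(len(field[0]))]
    scored.sort(reverse=True)
    return {(y, x): field[y][x] for _, y, x in scored[:budget]}
```

On a field that is piecewise constant with one motion boundary, the selected positions cluster along that boundary, which is where regular subsampling loses the most information.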

Brand-A Triangulation-Based Backward Adaptive Motion Field Subsampling Scheme-108.pdf


4:45pm - 5:00pm

Graph-based skeleton data compression

Pratyusha Das, Antonio Ortega

University of Southern California, United States of America

With the advancement of reliable, fast, and portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks on the face, hands, and feet. This leads to a huge amount of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning, and entropy coding with run-length codes. We evaluate the compression performance of the proposed method on the large NTU-RGB+D activity dataset. Our method outperforms a 1D discrete cosine transform method applied along the temporal direction. In near-lossless mode, the proposed compression does not affect action recognition performance.
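The reference scheme the paper compares against, a 1D DCT along the temporal direction followed by quantization and run-length coding, can be sketched in a few lines; the proposed GSC replaces the transform with a separable spatio-temporal graph transform and uses non-uniform quantization, neither of which is reproduced here. Function names and the uniform quantizer are illustrative assumptions.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II along one joint-coordinate trajectory over time."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def quantize(coeffs, step):
    """Uniform quantization; a small step keeps the scheme near-lossless."""
    return [round(c / step) for c in coeffs]

def run_length(levels):
    """(value, run) pairs -- long zero runs after quantization compress well."""
    out = []
    for v in levels:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out
```

A smooth joint trajectory concentrates its energy in the first few coefficients, so quantization produces long zero runs that the run-length stage collapses.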

Das-Graph-based skeleton data compression-272.pdf


5:00pm - 5:15pm
⭐ This paper has been nominated for the best paper award.

Optical Flow and Mode Selection for Learning-based Video Coding

Théo Ladune1,2, Pierrick Philippe1, Wassim Hamidouche2, Lu Zhang2, Olivier Déforges2

1Orange, France; 2Univ. Rennes, INSA Rennes, CNRS, IETR, UMR 6164, France

This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet aims at computing and conveying the optical flow and a pixel-wise coding mode selection. The optical flow is used to perform a prediction of the frame to code. The coding mode selection enables competition between direct copy of the prediction and transmission through CodecNet.

The proposed coding scheme is assessed under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track test conditions, where it is shown to perform on par with the state-of-the-art video codec ITU/MPEG HEVC. Moreover, the possibility of copying the prediction makes it possible to learn the optical flow in an actual end-to-end fashion, i.e., without pre-training or a dedicated loss term.
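The pixel-wise competition between copying the motion-compensated prediction and transmitting pixels through CodecNet can be sketched as a per-pixel blend. In the paper the mode mask is produced by MOFNet and learned jointly with the rest of the system; here it is simply passed in as an input, and all names are illustrative assumptions.

```python
def combine(prediction, codec_out, mode_mask):
    """Pixel-wise mode selection: mask = 1 copies the prediction
    (costs no CodecNet rate), mask = 0 uses the transmitted pixels.
    Intermediate mask values blend the two sources."""
    h, w = len(prediction), len(prediction[0])
    return [[mode_mask[y][x] * prediction[y][x]
             + (1.0 - mode_mask[y][x]) * codec_out[y][x]
             for x in range(w)] for y in range(h)]
```

Because well-predicted pixels can be copied for free, the system is pushed toward learning a useful optical flow without any dedicated flow supervision.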

Ladune-Optical Flow and Mode Selection for Learning-based Video Coding-113.pdf