Conference Agenda

LIC-GC: Grand Challenge on Learning-based Image Coding
Thursday, 24/Sept/2020: 9:40am - 10:40am

Session Chair: Touradj Ebrahimi
Location: Virtual platform

9:40am - 9:55am

Overview of the grand challenge

Joao Ascenso

Instituto Superior Técnico, Portugal

This talk introduces the Grand Challenge on Learning-based Image Coding and gives an overview of it.

9:55am - 10:10am

A Hybrid Layered Image Compressor with Deep-Learning Technique

Wei-Cheng Lee1, Chih-Peng Chang1, Wen-Hsiao Peng1,2, Hsueh-Ming Hang1,2

1National Chiao Tung University; 2Pervasive AI Research (PAIR) Labs

This paper presents a detailed description of NCTU's proposal for learning-based image compression, in response to the JPEG AI Call for Evidence Challenge. The proposed compression system features a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter refines the quality of the base layer by sending a latent residual signal. In particular, a base-layer-guided attention module focuses the residual extraction on critical high-frequency areas. To reconstruct the image, a neural-network-based synthesizer combines this latent residual signal with the base-layer output in a non-linear fashion. The proposed method shows rate-distortion performance comparable to single-layer VVC intra in terms of common objective metrics, but offers better subjective quality, particularly at high compression ratios. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system uses 18M network parameters in 16-bit floating-point format. On average, encoding an image on an Intel Xeon Gold 6154 takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. In contrast, decoding is dominated by the residual decoder and the synthesizer and requires 31 seconds per image.
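
As a rough aid to reading the complexity figures in this abstract, the short sketch below converts them into a weight-storage size and an encode-to-decode time ratio. It only restates numbers from the abstract; the function names are illustrative, and the storage figure assumes raw weights with no container overhead.

```python
# Figures quoted in the abstract above.
NUM_PARAMS = 18_000_000   # "18M network parameters"
BYTES_PER_PARAM = 2       # 16-bit floating point = 2 bytes per parameter
ENCODE_MINUTES = 13.5     # average encoding time per image
DECODE_SECONDS = 31.0     # average decoding time per image


def model_size_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Raw weight storage in MB (1 MB = 10**6 bytes), ignoring file overhead."""
    return num_params * bytes_per_param / 1e6


def encode_to_decode_ratio(encode_minutes: float, decode_seconds: float) -> float:
    """How many times longer encoding takes than decoding."""
    return encode_minutes * 60.0 / decode_seconds


model_mb = model_size_megabytes(NUM_PARAMS, BYTES_PER_PARAM)
ratio = encode_to_decode_ratio(ENCODE_MINUTES, DECODE_SECONDS)

print(f"Weights: {model_mb:.0f} MB")  # 36 MB
print(f"Encoding is about {ratio:.1f}x slower than decoding")
```

Under these assumptions the weights occupy about 36 MB, and encoding takes roughly 26 times as long as decoding.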

10:10am - 10:25am

Learned Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution

Jianping Lin1,2, Mohammad Akbari2, Haisheng Fu3, Qian Zhang3, Shang Wang3, Jie Liang2, Dong Liu1, Feng Liang3, Guohe Zhang3, Chengjie Tu4

1University of Science and Technology of China; 2Simon Fraser University, Canada; 3Xi’an Jiaotong University, China; 4Tencent Technologies, China

In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components. The LF components have lower resolution than the HF components, which can improve the rate-distortion performance, similar to a wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more of the spatial structure of the information and enable more effective filtering between the HF and LF components, which further improves the performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the auto-encoder, which allows the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme performs very similarly to BPG.
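
The resolution relationship between the two components can be illustrated with a toy decomposition. The sketch below is not GoConv or GoTConv: it swaps the learned convolutions for a fixed 2x2 averaging filter, purely to show an LF map at half resolution alongside a full-resolution HF map that carries the remaining detail. The function names and the sample values are hypothetical.

```python
def split_frequencies(x):
    """Split a 2D map (even height and width) into a half-resolution
    low-frequency map and a full-resolution high-frequency map."""
    h, w = len(x), len(x[0])
    # LF: average of each non-overlapping 2x2 block (half resolution).
    lf = [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1]
             + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]
    # HF: what remains after subtracting the upsampled LF map.
    hf = [[x[i][j] - lf[i // 2][j // 2] for j in range(w)] for i in range(h)]
    return lf, hf


def merge_frequencies(lf, hf):
    """Invert split_frequencies by adding the upsampled LF map back to HF."""
    h, w = len(hf), len(hf[0])
    return [[hf[i][j] + lf[i // 2][j // 2] for j in range(w)] for i in range(h)]


feature_map = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
lf, hf = split_frequencies(feature_map)
restored = merge_frequencies(lf, hf)

print("LF (2x2):", lf)
print("HF (4x4):", hf)
```

In this toy version the smooth lower half of the map ends up entirely in the LF component (its HF rows are all zero), which hints at why a lower-resolution LF branch can save bits on smooth content.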

10:25am - 10:40am

L2C -- Learning to Learn to Compress

Nannan Zou1, Honglei Zhang2, Francesco Cricri2, Hamed Tavakoli2, Jani Lainema2, Miska Hannuksela2, Emre Aksu2, Esa Rahtu1

1Tampere University; 2Nokia Technologies, Finland

In this paper, we present an end-to-end meta-learned system for image compression. Traditional machine-learning-based approaches to image compression train one or more neural networks for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation, or benevolent overfitting, to the input content. To reduce the gap between training and inference conditions, we propose a new training paradigm for learned image compression that is based on meta-learning. In a first phase, the neural networks are trained normally. In a second phase, the Model-Agnostic Meta-Learning (MAML) approach is adapted to the specific case of image compression: the inner loop performs latent tensor overfitting, and the outer loop updates both the encoder and decoder neural networks based on the overfitting performance. Furthermore, after meta-learning, we propose to overfit and cluster the bias terms of the decoder on training image patches, so that at inference time the optimal content-specific bias terms can be selected at the encoder side. Finally, we propose a new probability model for lossless compression, which combines concepts from both multi-scale and super-resolution probability model approaches. We show the benefits of all our proposed ideas via carefully designed experiments.
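
The latent overfitting step described in this abstract can be sketched with a toy example. The code below is only an illustration under strong assumptions: the decoder is a hypothetical fixed element-wise scaling rather than a neural network, the loss is squared-error distortion only (a real codec would also include a rate term), and the meta-learning outer loop is omitted. All names and values are invented for the example.

```python
def decode(latent, weights):
    """Toy 'decoder': element-wise scaling of the latent (stand-in for a network)."""
    return [w * z for w, z in zip(weights, latent)]


def distortion(latent, weights, target):
    """Sum of squared errors between the reconstruction and the target."""
    return sum((r - t) ** 2 for r, t in zip(decode(latent, weights), target))


def overfit_latent(latent, weights, target, lr=0.1, steps=50):
    """Gradient descent on the latent only; the decoder weights stay fixed."""
    z = list(latent)
    for _ in range(steps):
        recon = decode(z, weights)
        # d/dz_i of (w_i * z_i - t_i)^2 is 2 * w_i * (w_i * z_i - t_i).
        grads = [2.0 * w * (r - t) for w, r, t in zip(weights, recon, target)]
        z = [zi - lr * g for zi, g in zip(z, grads)]
    return z


decoder_weights = [1.0, 0.5, 2.0]   # hypothetical fixed decoder
image = [0.8, -0.3, 1.2]            # hypothetical test "image"
initial_latent = [0.5, 0.0, 0.5]    # stand-in for the encoder's output

refined_latent = overfit_latent(initial_latent, decoder_weights, image)
before = distortion(initial_latent, decoder_weights, image)
after = distortion(refined_latent, decoder_weights, image)
print(f"distortion before: {before:.4f}, after: {after:.6f}")
```

Only the latent changes here, so the decoder receives a different signal but needs no update, which is what makes this kind of per-image adaptation possible at the encoder side.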
