LIC-GC: Grand Challenge on Learning-based Image Coding
9:40am - 9:55am
Overview of the grand challenge
Instituto Superior Técnico, Portugal
This serves as an introduction and an overview of the grand challenge on Learning-based Image Coding Challenge
9:55am - 10:10am
A Hybrid Layered Image Compressor with Deep-Learning Technique
1National Chiao Tung University; 2Pervasive AI Research (PAIR) Labs
This paper presents a detailed description of NCTU's proposal for learning-based image compression, in response to the JPEG AI Call for Evidence Challenge. The proposed compression system features a VVC intra codec as the base layer and a learning-based residual codec as the enhancement layer. The latter aims to refine the quality of the base layer via sending a latent residual signal. In particular, a base-layer-guided attention module is employed to focus the residual extraction on critical high-frequency areas. To reconstruct the image, this latent residual signal is combined with the base-layer output in a non-linear fashion by a neural-network-based synthesizer. The proposed method shows comparable rate-distortion performance to single-layer VVC intra in terms of common objective metrics, but presents better subjective quality particularly at high compression ratios. It consistently outperforms HEVC intra, JPEG 2000, and JPEG. The proposed system incurs 18M network parameters in 16-bit floating-point format. On average, the encoding of an image on Intel Xeon Gold 6154 takes about 13.5 minutes, with the VVC base layer dominating the encoding runtime. On the contrary, the decoding is dominated by the residual decoder and the synthesizer, requiring 31 seconds per image.
10:10am - 10:25am
Learned Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution
1University of Science and Technology of China; 2Simon Fraser University, Canada; 3Xi’an Jiaotong University, China; 4Tencent Technologies, China
In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, and the LF components have lower resolution than HF components, which can improve the rate-distortion performance, similar to wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more spatial structure of the information, and enable more effective filtering between the HF and LF components, which further improve the performance. In addition, we develop a variable-rate scheme using the Lagrangian parameter to modulate all the internal feature maps in the auto-encoder, which allows the scheme to achieve the large bitrate range of the JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to BPG.
10:25am - 10:40am
L2C -- Learning to Learn to Compress
1Tampere University; 2Nokia Technologies, Finland
In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural network for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation or benevolent overfitting to the input content. In order to reduce the gap between training and inference conditions, we propose a new training paradigm for learned image compression, which is based on meta-learning. In a first phase, the neural networks are trained normally. In a second phase, the Model-Agnostic Meta-learning approach is adapted to the specific case of image compression, where the inner-loop performs latent tensor overfitting, and the outer loop updates both encoder and decoder neural networks based on the overfitting performance. Furthermore, after meta-learning, we propose to overfit and cluster the bias terms of the decoder on training image patches, so that at inference time the optimal content-specific bias terms can be selected at encoder-side. Finally, we propose a new probability model for lossless compression, which combines concepts from both multi-scale and super-resolution probability model approaches. We show the benefits of all our proposed ideas via carefully designed experiments.