DLM1-P: Deep Learning for Multimedia 1
10:50am - 10:55am
Parallelized Rate-Distortion Optimized Quantization Using Deep Learning
Qualcomm, The Netherlands
Rate-Distortion Optimized Quantization (RDOQ) has played an important role in the coding performance of recent video compression standards such as H.264/AVC, H.265/HEVC, VP9 and AV1. This scheme yields significant reductions in bit-rate at the expense of relatively small increases in distortion. Typically, RDOQ algorithms are prohibitively expensive to implement on real-time hardware encoders due to their sequential nature and their need to frequently obtain entropy coding costs. This work addresses this limitation using a neural network-based approach, which learns to trade off rate and distortion during offline supervised training. As these networks are based solely on standard arithmetic operations that can be executed on existing neural network hardware, no additional area-on-chip needs to be reserved for dedicated RDOQ circuitry. We train two classes of neural networks, a fully-convolutional network and an auto-regressive network, and evaluate each as a post-quantization step designed to refine cheap quantization schemes such as scalar quantization (SQ). Both network architectures are designed to have a low computational overhead. After training, they are integrated into the HM 16.20 implementation of HEVC, and their video coding performance is evaluated on a subset of the H.266/VVC SDR common test sequences. Comparisons are made to the RDOQ and SQ implementations in HM 16.20. Our method outperforms the SQ baseline, and on average reaches 45% of the performance of the iterative HM RDOQ algorithm.
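As a rough illustration of the rate-distortion trade-off that RDOQ makes on top of scalar quantization, the sketch below picks, for each coefficient, the quantization level that minimizes a Lagrangian cost J = D + lambda * R. It is a toy example, not the HM 16.20 algorithm or the networks from the talk: the function names are hypothetical, the rate model `toy_rate_bits` is an invented stand-in for real entropy coding costs, and each coefficient is treated independently, whereas actual RDOQ is sequential.

```python
import math

def scalar_quantize(coeff, step):
    """Plain scalar quantization: round the coefficient to the nearest level."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)

def toy_rate_bits(level):
    """Hypothetical rate proxy (an assumption, not CABAC): bigger levels cost more."""
    if level == 0:
        return 1.0
    return 2.0 + 2.0 * math.log2(abs(level) + 1)

def toy_rdoq(coeffs, step, lam):
    """Per coefficient, choose the candidate level minimizing J = D + lam * R."""
    levels = []
    for c in coeffs:
        sq = scalar_quantize(c, step)
        sign = -1 if sq < 0 else 1
        # Candidates: zero, one level closer to zero, and the SQ level itself.
        candidates = sorted({0, sign * max(abs(sq) - 1, 0), sq}, key=abs)

        def cost(q):
            distortion = (c - q * step) ** 2
            return distortion + lam * toy_rate_bits(q)

        levels.append(min(candidates, key=cost))
    return levels

coeffs = [10.0, 5.5, -1.2, 0.6]
print(toy_rdoq(coeffs, step=2.0, lam=0.0))  # lam = 0 reduces to plain SQ
print(toy_rdoq(coeffs, step=2.0, lam=1.0))  # larger lam favours cheaper levels
```

With `lam = 0` only distortion matters, so the result matches scalar quantization; increasing `lam` pushes small coefficients toward zero, trading distortion for rate.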
10:55am - 11:00am
Deep Learning Off-the-shelf Holistic Feature Descriptors for Visual Place Recognition in Challenging Conditions
Tampere University, Finland
In this paper, we present a comprehensive study on the utility of deep learning feature extraction methods for the visual place recognition task in three challenging conditions: appearance variation, viewpoint variation, and a combination of both. We extensively compare the performance of convolutional neural network architectures with batch normalization layers in terms of the fraction of correct matches. These architectures are primarily trained for image classification and object detection problems and are used as holistic feature descriptors for the visual place recognition task. To verify our results, we use four real-world place recognition datasets. Our investigation demonstrates that convolutional neural network architectures coupled with batch normalization and trained for other computer vision tasks outperform architectures that are specifically designed for place recognition.
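To make the evaluation metric concrete, the sketch below matches each query descriptor to its most similar reference descriptor and reports the fraction of correct matches. It is only a minimal illustration under stated assumptions: the abstract does not say which similarity measure is used, so cosine similarity is assumed here, the function names are hypothetical, and the short vectors stand in for holistic descriptors that would normally come from a pretrained network.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_places(query_descs, ref_descs):
    """For each query, return the index of the most similar reference descriptor."""
    matches = []
    for q in query_descs:
        sims = [cosine_similarity(q, r) for r in ref_descs]
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches

def fraction_correct(matches, ground_truth):
    """Fraction of queries whose retrieved reference index equals the ground truth."""
    correct = sum(1 for m, g in zip(matches, ground_truth) if m == g)
    return correct / len(matches)

# Toy descriptors (illustrative values only).
refs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.7, 0.1, 0.6]]
matches = match_places(queries, refs)
print(matches, fraction_correct(matches, [0, 1, 2]))
```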
11:00am - 11:05am
Learned BRIEF -- transferring the knowledge from hand-crafted to learning-based descriptors
Ghent University, Belgium
In this paper, we present a novel approach for designing local image descriptors that learn from data and from hand-crafted descriptors. In particular, we construct a learning model that first mimics the behaviour of a hand-crafted descriptor and then learns to improve upon it in an unsupervised manner. We demonstrate the use of this knowledge-transfer framework by constructing the learned BRIEF descriptor based on the well-known hand-crafted descriptor BRIEF. We implement our learned BRIEF with a convolutional autoencoder architecture. Evaluation on the HPatches benchmark for local image descriptors shows the effectiveness of the proposed approach in the tasks of patch retrieval, patch verification, and image matching.
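For readers unfamiliar with the hand-crafted baseline, the sketch below shows the basic idea behind a BRIEF-style descriptor: each bit is the result of an intensity comparison between two pixel locations in a patch, and descriptors are compared with the Hamming distance. This is a simplified illustration, not the learned BRIEF model from the talk. The function names, the uniform random sampling of pixel pairs, and the omission of patch smoothing are simplifying assumptions.

```python
import random

def make_brief_pairs(patch_size, n_bits, seed=0):
    """Sample n_bits pairs of (row, col) locations; uniform sampling is an assumption."""
    rng = random.Random(seed)

    def point():
        return (rng.randrange(patch_size), rng.randrange(patch_size))

    return [(point(), point()) for _ in range(n_bits)]

def brief_descriptor(patch, pairs):
    """One bit per pair: 1 if the first pixel is darker than the second, else 0."""
    return [1 if patch[r1][c1] < patch[r2][c2] else 0
            for (r1, c1), (r2, c2) in pairs]

def hamming_distance(d1, d2):
    """Number of bit positions where two binary descriptors differ."""
    return sum(b1 != b2 for b1, b2 in zip(d1, d2))

# Toy 8x8 patch with a simple intensity ramp (illustrative data only).
size = 8
patch = [[c + size * r for c in range(size)] for r in range(size)]
brighter = [[v + 10 for v in row] for row in patch]

pairs = make_brief_pairs(size, n_bits=32)
d_patch = brief_descriptor(patch, pairs)
d_brighter = brief_descriptor(brighter, pairs)

# A constant brightness offset leaves every comparison unchanged.
print(hamming_distance(d_patch, d_brighter))
```

Because the bits depend only on the ordering of pixel intensities, the descriptor is unaffected by a uniform brightness shift, which is one reason binary tests of this kind are a useful starting point for a learned descriptor to mimic.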