Conference Agenda

Overview and details of the sessions of this conference.

Session Overview
DLM3-P: Deep Learning for Multimedia 3
Thursday, 24/Sept/2020: 5:25pm - 5:40pm

Session Chair: Esa Rahtu
Location: Virtual platform

5:25pm - 5:30pm

Generalized Operational Classifiers for Material Identification

Xiaoyue Jiang1, Ding Wang1, Dat Thanh Tran2, Serkan Kiranyaz3, Moncef Gabbouj2, Xiaoyi Feng1

1Northwestern Polytechnical University, China, People's Republic of; 2Tampere University, Finland; 3Qatar University, Qatar

Material is one of the intrinsic features of objects, and consequently material recognition plays an important role in image understanding. The same material may take on various shapes and appearances while keeping the same physical characteristics, which makes material recognition challenging. Besides suitable features, a powerful classifier can also improve the overall recognition performance. Due to the limitations of the classical linear neurons used in all shallow and deep neural networks, such as CNNs, we propose to apply generalized operational neurons to construct a classifier adaptively. These generalized operational perceptrons (GOP) contain a set of linear and nonlinear neurons and possess a structure that can be built progressively. This makes the GOP classifier more compact and able to discriminate complex classes easily. The experiments demonstrate that GOP networks trained on a small portion of the data (4%) can achieve performance comparable to state-of-the-art models trained on much larger portions of the dataset.

Jiang-Generalized Operational Classifiers for Material Identification-217.pdf

5:30pm - 5:35pm

Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest

Kevin Riou1, Jingwen Zhu1, Suiyi Ling1,2, Mathis Piquet1, Vincent Truffault3, Patrick Le Callet1

1University of Nantes; 2CAPACITÉS, France; 3CTIFL

Confinement during COVID-19 has had serious effects on agriculture all over the world. As one efficient solution, mechanical harvest/auto-harvest based on object detection and robotic harvesters has become an urgent need. Within the auto-harvest system, a robust few-shot object detection model is one of the bottlenecks, since the system is required to deal with new vegetable/fruit categories and the collection of large-scale annotated datasets for all the novel categories is expensive. Many few-shot object detection models have been developed by the community. Yet whether they can be employed directly in real-life agricultural applications is still questionable, as there is a context gap between the commonly used training datasets and the images collected in real-life agricultural scenarios. To this end, in this study, we present a novel cucumber dataset and propose two data augmentation strategies that help to bridge the context gap. Experimental results show that 1) the state-of-the-art few-shot object detection model performs poorly on the novel ‘cucumber’ category; and 2) the proposed augmentation strategies outperform the commonly used ones.

Riou-Few-Shot Object Detection in Real Life-197.pdf

5:35pm - 5:40pm

Auto-Encoder based Structured Dictionary Learning

Deyin Liu1, Lin Wu2, Liangchen Liu3, Qichang Hu4, Lin Qi1

1Zhengzhou University, China, People's Republic of; 2Hefei University of Technology, China, People's Republic of; 3The University of Queensland, Australia; 4Motovis Australia Pty Ltd Adelaide, Australia

Dictionary learning and deep learning are two popular representation learning paradigms, which can be combined to boost the classification task. However, existing combination methods often learn multiple dictionaries embedded in a cascade of layers, and a specialized classifier accordingly. This may inadvertently lead to overfitting and high computational cost. In this paper, we present a deep auto-encoding architecture which is coupled with a dictionary layer to directly produce a dictionary for classification. To empower the dictionary with discrimination, we construct the dictionary with class-specific sub-dictionaries, and introduce supervision by imposing category constraints. The proposed framework is inspired by a sparse optimization method, namely the Iterative Shrinkage-Thresholding Algorithm (ISTA), which characterizes the learning process by forward-propagation based optimization w.r.t. the dictionary only. Extensive experiments demonstrate the effectiveness of our method in image classification.
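
For readers unfamiliar with the Iterative Shrinkage Thresholding Algorithm named in the abstract, the following is a minimal sketch of the generic textbook algorithm for sparse coding with a fixed dictionary. It is not the authors' architecture, which the abstract does not specify. The function names `soft_threshold` and `ista`, the objective 0.5 * ||D x - y||^2 + lam * ||x||_1, and the parameters `lam`, `step`, and `n_iter` are assumptions introduced for illustration only.

```python
def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of t * L1 norm)."""
    return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def ista(D, y, lam, step, n_iter=200):
    """Approximately minimize 0.5 * ||D x - y||^2 + lam * ||x||_1.

    D is a list of rows (m x k) and y has length m. The step size is
    assumed to satisfy step <= 1 / L, where L is the largest eigenvalue
    of D^T D; this is a hypothetical interface, not one from the paper.
    """
    m, k = len(D), len(D[0])
    x = [0.0] * k
    for _ in range(n_iter):
        # Residual r = D x - y
        r = [sum(D[i][j] * x[j] for j in range(k)) - y[i] for i in range(m)]
        # Gradient of the smooth least-squares term: D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(k)]
        # Gradient step followed by soft thresholding (the shrinkage step)
        x = soft_threshold([x[j] - step * g[j] for j in range(k)], lam * step)
    return x


if __name__ == "__main__":
    # Toy example with an identity dictionary: the small coefficient is zeroed.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(ista(identity, [3.0, 0.5], lam=1.0, step=1.0))
```

Each iteration is a linear map followed by an elementwise nonlinearity, which is why ISTA-style updates are often unrolled into network layers.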

Liu-Auto-Encoder based Structured Dictinoary Learning-235.pdf