Conference Agenda

Overview and details of the sessions of this conference. Please register as a participant (free of charge) and then log in to access downloads in the detailed view. Select a date or location to show only the sessions on that day or at that location. Select a single session for a detailed view.

Session Overview
ASS-P: Audio Source Separation
Tuesday, 22/Sept/2020:
6:25pm - 6:45pm

Session Chair: Julio Carabias Orti
Location: Virtual platform

6:25pm - 6:30pm

Audio-Fingerprinting via Dictionary Learning

Christina Saravanos, Dimitris Ampeliotis, Kostas Berberidis

University of Patras, Greece

In recent years, several successful schemes have been proposed to solve the song identification problem. These techniques construct a signal's audio fingerprint either by employing conventional signal processing techniques or by computing its sparse representation in the time-frequency domain. This paper proposes a new audio-fingerprinting scheme that constructs a unique and concise representation of an audio signal by applying a dictionary, learnt here via the well-known K-SVD algorithm on a song database. The promising experimental results suggest that the proposed approach not only performed rather well in identifying the signal content of several audio clips (even in cases where this content had been distorted by noise) but also surpassed the recognition rate of a Shazam-based paradigm.
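To make the idea concrete, here is a minimal, hypothetical sketch of dictionary-based fingerprinting: sparse-code each spectral frame against a fixed dictionary (standing in for one learned with K-SVD, which is not implemented here) via greedy matching pursuit, and use the set of atoms selected across frames as a compact binary fingerprint. All names and parameters are illustrative, not the paper's.

```python
import numpy as np

def sparse_code(x, D, k=3):
    """Greedy matching pursuit: pick k atoms of dictionary D that
    best explain frame x (illustrative stand-in for K-SVD coding)."""
    r = x.astype(float).copy()
    code = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))   # most correlated atom
        w = float(D[:, j] @ r)
        code[j] += w
        r -= w * D[:, j]                      # remove its contribution
    return code

def fingerprint(frames, D, k=3):
    # Binary indicator of which atoms were used anywhere in the clip.
    acc = sum(np.abs(sparse_code(f, D, k)) for f in frames)
    return (acc > 0).astype(float)

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
song = [rng.normal(size=64) for _ in range(10)]
fp_clean = fingerprint(song, D)
noisy = [f + 0.05 * rng.normal(size=64) for f in song]
fp_noisy = fingerprint(noisy, D)
# Jaccard overlap between clean and noisy fingerprints
match = (fp_clean * fp_noisy).sum() / np.maximum(fp_clean, fp_noisy).sum()
print(f"fingerprint overlap: {match:.2f}")
```

Because atom selection depends on the dominant correlations, small additive noise tends to leave the selected support largely unchanged, which is the intuition behind the scheme's robustness.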

Saravanos-Audio-Fingerprinting via Dictionary Learning-232.pdf

6:30pm - 6:35pm

Audio Inpainting based on Self-similarity for Sound Source Separation Applications

Dan Barry, Alessandro Ragano, Andrew Hines

University College Dublin, Ireland

Sound source separation algorithms have advanced significantly in recent years, but many can suffer from objectionable artefacts. These include phasiness, transient smearing, high-frequency loss, an unnatural-sounding noise floor and reverberation, to name a few. One of the main reasons is that, in many algorithms, individual time-frequency bins are attributed to only one source at a time, meaning that many time-frequency bins will be set to zero for a separated source. This leads to an impressive signal-to-interference ratio, but at the cost of natural-sounding resynthesis. Here, we present a simple algorithm capable of audio inpainting based on self-similarity within the signal. The algorithm attempts to use the non-zero bin values observed in similar frames as substitutes for the zero bin values in the current analysis frame. We present results from subjective listening tests which show a preference for the inpainted audio over the original audio produced by a simple source separation algorithm. Further, we use the Fréchet Audio Distance metric to evaluate the perceptual effect of the proposed inpainting algorithm. The results of this evaluation support the subjective test preferences.
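The core substitution step can be sketched as follows. This is a simplified, hypothetical reading of self-similarity inpainting on a toy magnitude spectrogram (real systems operate on complex STFT frames): for each frame with zeroed bins, find the most similar other frame using only the surviving bins, and copy its values into the holes.

```python
import numpy as np

def inpaint(spec):
    """Fill zeroed bins in each frame with values copied from the most
    similar other frame (cosine similarity on the non-zero bins)."""
    out = spec.copy()
    n_frames = spec.shape[1]
    for t in range(n_frames):
        frame = spec[:, t]
        zeros = frame == 0.0
        if not zeros.any():
            continue
        nz = ~zeros
        sims = []
        for u in range(n_frames):
            if u == t:
                sims.append(-np.inf)          # never match a frame to itself
                continue
            ref = spec[:, u]
            denom = np.linalg.norm(frame[nz]) * np.linalg.norm(ref[nz]) + 1e-12
            sims.append(float(frame[nz] @ ref[nz]) / denom)
        best = spec[:, int(np.argmax(sims))]
        fill = zeros & (best != 0.0)          # only substitute non-zero values
        out[fill, t] = best[fill]
    return out

rng = np.random.default_rng(1)
spec = np.abs(rng.normal(size=(32, 8)))       # toy magnitude spectrogram
masked = spec.copy()
masked[::4, 3] = 0.0                          # bins zeroed by a binary mask
filled = inpaint(masked)
print(int(np.count_nonzero(filled == 0.0)))
```

Non-zero bins are never overwritten, so the separated content is preserved and only the spectral holes left by the binary mask are filled.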

Barry-Audio Inpainting based on Self-similarity for Sound Source Separation Applications-178.pdf

6:35pm - 6:40pm

Reverberant Audio Blind Source Separation via Local Convolutive Independent Vector Analysis

Fangchen Feng, Azeddine Beghdadi

Université Sorbonne Paris Nord, France

In this paper, we propose a new formulation of the blind source separation problem for audio signals with convolutive mixtures to improve the separation performance of Independent Vector Analysis (IVA). The proposed method benefits both from the recently investigated convolutive approximation model and from IVA approaches that exploit cross-band information to avoid permutation alignment. We first exploit the link between IVA and Sparse Component Analysis (SCA) methods through structured sparsity. We then propose a new framework combining the convolutive narrowband approximation with the Windowed-Group-Lasso (WGL). The model is optimized with an alternating optimization approach in which the convolutive kernel and the source components are jointly optimized. The proposed approach outperforms existing methods in numerical evaluations in terms of objective measures.
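As a rough illustration (in our own notation, which may differ from the paper's exact formulation), a windowed group-lasso regularizer couples each time-frequency source coefficient with a window of its neighbours, which is what lets sparsity act across bands rather than bin by bin:

```latex
\min_{S}\;\frac{1}{2}\sum_{f}\bigl\|X_f - A_f S_f\bigr\|_F^2
\;+\;\lambda\sum_{f,t}\sqrt{\sum_{(f',t')\,\in\,\mathcal{N}(f,t)} w_{f',t'}\,\bigl|S_{f',t'}\bigr|^{2}}
```

Here $X_f$ collects the mixture STFT coefficients in band $f$, $A_f$ is the narrowband mixing operator, $S_f$ the source coefficients, $\mathcal{N}(f,t)$ a window of neighbouring time-frequency points, and $w$ non-negative window weights; these symbols are our assumptions for illustration.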

Feng-Reverberant Audio Blind Source Separation via Local Convolutive Independent Vector Analysis-202.pdf

6:40pm - 6:45pm

Solos: A Dataset for Audio-Visual Music Analysis

Juan F. Montesinos, Olga Slizovskaia, Gloria Haro

Universitat Pompeu Fabra, Spain

In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task. These videos, gathered from YouTube, consist of solo musical performances of 13 different instruments. Compared to previously proposed audio-visual datasets, Solos is cleaner, since a large proportion of its recordings are auditions and manually checked recordings, ensuring that there is no background noise and no effects added in video post-processing. Besides, it is, to the best of our knowledge, the only dataset that contains the whole set of instruments present in the URMP [1] dataset, a high-quality dataset of 44 multi-instrument audio-visual recordings of classical music pieces with individual audio tracks. URMP was intended to be used for source separation; we therefore evaluate the performance on the URMP dataset of two different BSS models trained on Solos.

Montesinos-Solos A Dataset for Audio-Visual Music Analysis-236.pdf