ASP-P: Audio signal processing
6:25pm - 6:30pm
Reversible Degradation for Item Validation on MP3 Digital Files
1Centro Universitário IESB; 2Universidad Peruana de Ciencias Aplicadas; 3Samsung R&D Institute Brazil; 4Banco de Crédito del Perú
This paper proposes an item validation implementation for digital MP3 audio files based on the definition of reversible degradation. Digital goods give a buyer no guarantee about the purchase outcome. Reversible degradation of an MP3 file generates a degraded audio file for item validation, together with redundancy bytes for its later correction. The degraded file remains audible and meaningful, which makes this method an item validation solution that does not require the judgment of a trusted third party. The reversible degradation algorithm used in this work is based on a systematic Reed-Solomon error correction code with 16-bit symbols. The Reed-Solomon redundancy bytes are sent from the seller to the buyer to recover the original MP3 file if and only if the buyer is authorized to retrieve the full version of the degraded, audible MP3 file. Transmitting the Reed-Solomon redundancy bytes instead of retransmitting the entire original MP3 file saves around 91% of the data on the transit link for MP3 bit rates of 128 kbps, 192 kbps, and 320 kbps. The architecture and entropy data are evaluated to show the feasibility of the prototype.
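The degrade-then-repair idea can be illustrated with a toy sketch. It is not the authors' algorithm: a single XOR parity byte per block stands in for the 16-bit Reed-Solomon code, the damaged position is fixed and known to both sides, and the MP3 frame structure is ignored entirely. The names `degrade`, `restore`, and `BLOCK` are hypothetical.

```python
from functools import reduce
from operator import xor

BLOCK = 10  # bytes per block; an arbitrary choice for this sketch


def degrade(data, block=BLOCK):
    """Return (degraded copy, redundancy bytes) for the original data."""
    degraded = bytearray(data)
    parity = bytearray()
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        parity.append(reduce(xor, chunk, 0))  # one parity byte per block
        degraded[start] = 0                   # damage the first byte of the block
    return bytes(degraded), bytes(parity)


def restore(degraded, parity, block=BLOCK):
    """Rebuild the original from the degraded copy plus the redundancy bytes."""
    restored = bytearray(degraded)
    for i, start in enumerate(range(0, len(degraded), block)):
        rest = degraded[start + 1:start + block]
        # parity XOR (all undamaged bytes) yields the damaged byte
        restored[start] = reduce(xor, rest, parity[i])
    return bytes(restored)


original = bytes(range(256)) * 4
degraded, parity = degrade(original)
recovered = restore(degraded, parity)
```

In this toy setting the seller would ship `degraded` first and `parity` only after authorization; the redundancy is one byte per block, so it is a small fraction of the file. The actual overhead in the paper depends on Reed-Solomon parameters that the abstract does not state.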
6:30pm - 6:35pm
Automatic Gain Control for Enhanced HDR Performance on Audio
MannLab Canada, 330 Dundas Street West, Toronto, Ontario, M5T 1G5
We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates for the receiver's dynamic range by ensuring that the incoming signal stays within the desired range, while HDR uses these multi-channel gains to extend the dynamic range of the composited signal. The results validate that the benefits of each method compound when the two are used together. The HDR AGC method is simulated to show performance gains under various conditions. The method is then implemented using a custom PCB and a microcontroller to show its feasibility in real-world, real-time applications.
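A minimal sketch of how the two ideas might fit together, under stated assumptions: the converter is modeled as an ideal clip-and-round quantizer, the composite simply takes the highest-gain channel that is not near clipping, and the AGC rescales every gain so that a measured peak lands at a target level on the lowest-gain channel. None of these choices, nor the names `quantize`, `hdr_composite`, and `agc_update`, come from the paper.

```python
def quantize(x, bits=8):
    """Clip to [-1, 1] and round to a signed grid, like an idealized ADC."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return round(clipped * levels) / levels


def hdr_composite(sample, gains, bits=8, headroom=0.95):
    """Estimate the input from the highest-gain channel that is not near clipping."""
    for g in sorted(gains, reverse=True):
        reading = quantize(sample * g, bits)
        if abs(reading) < headroom:
            return reading / g
    g = min(gains)
    return quantize(sample * g, bits) / g  # every channel clipped: best effort


def agc_update(gains, peak, target=0.5):
    """Rescale all gains so the lowest-gain channel sees `peak` at `target`."""
    if peak == 0:
        return list(gains)
    scale = target / (peak * min(gains))
    return [g * scale for g in gains]


gains = [1.0, 16.0]
quiet = hdr_composite(0.003, gains)   # resolved by the high-gain channel
loud = hdr_composite(0.9, gains)      # falls back to the low-gain channel
new_gains = agc_update(gains, peak=4.0)
```

Here a quiet sample that a single unity-gain channel would round to zero is still resolved by the high-gain channel, and a peak that would clip every channel leads the AGC step to scale all gains down together.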
6:35pm - 6:40pm
An Evolutionary-based Generative Approach for Audio Data Augmentation
Augsburg University, Germany
In this paper, we introduce a novel framework to augment raw audio data for machine learning classification tasks.
In the first part of our framework, we employ a generative adversarial network (GAN) to create new variants of the audio samples that already exist in the source dataset for the classification task.
In the second step, we then utilize an evolutionary algorithm to search the input domain space of the previously trained GAN, with respect to predefined characteristics of the generated audio.
This way we are able to generate audio in a controlled manner that contributes to an improvement in classification performance of the original task.
To validate our approach, we chose to test it on the task of soundscape classification.
We show that our approach leads to a substantial improvement in classification results compared with both a training routine without data augmentation and training with uncontrolled GAN-based data augmentation.
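The second step, searching a generator's input space for outputs with a chosen characteristic, can be sketched as follows. This is only an illustration under assumptions: `toy_generator` is a hypothetical stand-in for a trained GAN (it just maps two latent values to a sine wave), the target characteristic is an RMS level, and the search is a simple elitist evolution strategy rather than whatever algorithm the authors used.

```python
import math
import random


def toy_generator(z):
    """Hypothetical stand-in for a trained GAN generator: latent -> waveform."""
    freq = 100.0 + 50.0 * z[0]
    amp = 1.0 / (1.0 + math.exp(-z[1]))
    return [amp * math.sin(2 * math.pi * freq * n / 8000) for n in range(256)]


def rms(wave):
    return math.sqrt(sum(s * s for s in wave) / len(wave))


def fitness(z, target_rms=0.5):
    """Higher is better: negative distance of the output RMS from the target."""
    return -abs(rms(toy_generator(z)) - target_rms)


def evolve(generations=40, pop_size=20, elite=5, sigma=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]  # elitism: the best latents survive unchanged
        children = [
            [gene + rng.gauss(0, sigma) for gene in rng.choice(parents)]
            for _ in range(pop_size - elite)
        ]
        pop = parents + children
    return max(pop, key=fitness)


best_latent = evolve()
augmented_sample = toy_generator(best_latent)
```

In a real pipeline the fitness function would score whatever predefined characteristics of the generated audio matter for the classifier, and the selected latents would be decoded into additional training samples.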
6:40pm - 6:45pm
Wavelet Scattering Transform and CNN for Closed Set Speaker Identification
1CNRS NormaSTIC, France; 2GREYC, UNICAEN, France
In real-world applications, the performance of speaker identification systems degrades as both the amount and the quality of the available speech decrease. To address this, we propose a speaker identification system that identifies a person from short utterances with few training examples, so that only a very small amount of data, a sentence of 2-4 seconds, is used. To achieve this, we propose a novel end-to-end convolutional neural network (CNN) that operates on the raw waveform for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN and learn the remaining layers in a supervised manner. The experiments show that our hybrid architecture, which combines the wavelet scattering transform with a CNN, performs efficient feature extraction for speaker identification even with a small number of short training samples.
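To give a feel for what a fixed wavelet front end computes, here is a rough first-order sketch in plain Python. It is an assumption-laden simplification, not the authors' network: the filters are Gaussian-windowed complex exponentials (Morlet-like), a global time average stands in for the scattering low-pass filter, there is no second order, and the learned CNN layers are omitted. The names `morlet` and `scatter_first_order` are hypothetical.

```python
import cmath
import math


def morlet(center, length=65, width=8.0):
    """Morlet-like band-pass filter; `center` is in cycles per sample."""
    half = length // 2
    return [
        math.exp(-(t * t) / (2 * width * width)) * cmath.exp(2j * math.pi * center * t)
        for t in range(-half, half + 1)
    ]


def scatter_first_order(signal, centers):
    """One coefficient per filter: the time average of |signal * psi|."""
    coeffs = []
    for c in centers:
        psi = morlet(c)
        size = len(psi)
        mags = []
        for n in range(len(signal) - size + 1):
            # "valid" convolution: flip the filter and slide it over the signal
            acc = sum(signal[n + k] * psi[size - 1 - k] for k in range(size))
            mags.append(abs(acc))  # modulus gives local shift invariance
        coeffs.append(sum(mags) / len(mags))
    return coeffs


centers = [0.05, 0.1, 0.2]
tone = [math.sin(2 * math.pi * 0.1 * n) for n in range(256)]
coeffs = scatter_first_order(tone, centers)
```

Because these filters are fixed rather than learned, a front end of this kind needs no training data of its own, which is one plausible reason such a design suits the few-sample setting described in the abstract.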