Title: Applications of information geometry to audio signal processing
Abstract: In this talk, we present some applications of information geometry to audio signal processing. We seek a comprehensive framework that allows us to quantify, process and represent the information contained in audio signals. In digital audio, a sound signal is generally encoded as a waveform, and a common problem is to extract relevant information about the signal by computing sound features from this waveform. A key issue in this context is to bridge the gap between the raw signal or low-level features (e.g. attack time, frequency content) and the symbolic properties or high-level features (e.g. speaker, instrument, music genre). We address this issue by employing the theoretical framework of information geometry. In general terms, information geometry is a field of mathematics that studies the notions of probability and information by way of differential geometry [1]. The main idea is to analyze the differential-manifold structure possessed by certain families of probability distributions, which thus form a statistical manifold. We aim to investigate the intrinsic geometry of families of probability distributions that represent audio signals, and to manipulate informative entities of sounds within this geometry. We focus on the statistical manifolds related to exponential families. Exponential families are parametric families of probability distributions that encompass most of the distributions commonly used in statistical learning. Moreover, exponential families equipped with the dual exponential and mixture affine connections possess two dual affine coordinate systems, the natural and the expectation parameters respectively. The underlying dually flat geometry exhibits a strong Hessian dualistic structure, induced by a twice-differentiable convex function, called the potential, together with its Legendre-Fenchel conjugate.
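As an illustration (not taken from the talk itself), the Bregman divergence induced by a convex potential F can be sketched in a few lines of NumPy. The function names below are assumptions for the example; the two potentials shown recover, respectively, half the squared Euclidean distance and the Kullback-Leibler divergence, which is exactly the sense in which this geometry generalizes the Euclidean case.

```python
import numpy as np

def bregman(F, grad_F, p, q):
    """Bregman divergence D_F(p || q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Potential F(x) = ||x||^2 / 2 recovers half the squared Euclidean distance.
F_euc = lambda x: 0.5 * np.dot(x, x)
g_euc = lambda x: x

# Negative Shannon entropy recovers the Kullback-Leibler divergence
# between discrete probability distributions.
F_ent = lambda x: np.sum(x * np.log(x))
g_ent = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.8])
q = np.array([0.5, 0.5])
print(bregman(F_euc, g_euc, p, q))  # equals 0.5 * ||p - q||^2
print(bregman(F_ent, g_ent, p, q))  # equals KL(p || q)
```

For an exponential family, taking F to be the log-partition function in the natural parameters yields the divergences that the dually flat structure pairs through Legendre-Fenchel conjugation.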
This geometry generalizes the standard self-dual Euclidean geometry, with two dual Bregman divergences in place of the self-dual Euclidean distance, as well as dual geodesics, a generalized Pythagorean theorem and dual projections. However, Bregman divergences are generalized distances that are not symmetric and do not satisfy the triangle inequality in general. From a computational viewpoint, several machine learning algorithms that rely on the strong metric properties of the Euclidean distance are therefore no longer directly applicable. Yet, recent works have proposed to generalize some of these algorithms to the case of exponential families and their associated Bregman divergences.
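A small sketch (an illustration of the standard result, not code from the talk) makes both points above concrete: the generalized KL divergence, a Bregman divergence on the positive orthant, is visibly asymmetric, yet the right-sided centroid of any Bregman divergence is still the arithmetic mean, which is what allows k-means-style clustering to carry over to this setting.

```python
import numpy as np

# Generalized Kullback-Leibler divergence: the Bregman divergence generated
# by the potential F(x) = sum_j (x_j log x_j - x_j) on the positive orthant.
def gen_kl(p, q):
    return np.sum(p * np.log(p / q) - p + q)

p = np.array([0.2, 0.8])
q = np.array([0.5, 0.5])
# Asymmetry: D(p||q) and D(q||p) differ in general.
print(gen_kl(p, q), gen_kl(q, p))

# For any Bregman divergence D_F, the minimizer of sum_i D_F(x_i || c)
# over c is the arithmetic mean of the points -- the key fact behind the
# Bregman generalization of the k-means centroid step.
X = np.random.default_rng(0).uniform(0.1, 1.0, size=(50, 3))
mean = X.mean(axis=0)
loss = lambda c: sum(gen_kl(x, c) for x in X)
print(loss(mean) <= loss(mean + 0.05))  # the mean beats a perturbed centroid
```

This mean-as-centroid property holds for every Bregman divergence, whereas the left-sided centroid generally differs; keeping track of which side is used is the main bookkeeping cost when porting Euclidean algorithms to this geometry.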
Publication Year: 2011
Publication Date: 2011-09-01
Language: en
Type: preprint
Cited By Count: 1