Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Cyrus Vahidi; Han Han; Changhong Wang; Mathieu Lagrange; György Fazekas; Vincent Lostanlen

Journal article, Journal of the Audio Engineering Society, Year: 2023

French title: Mésostructures : au-delà de la perte spectrale en analyse temps-fréquence différentiable

Cyrus Vahidi

Role: Corresponding author
PersonId : 1259680

Queen Mary University of London

Han Han

Role: Author

Laboratoire des Sciences du Numérique de Nantes

Changhong Wang

Role: Author

Laboratoire des Sciences du Numérique de Nantes

Mathieu Lagrange

Role: Author
PersonId : 4329
IdHAL : mathieu-lagrange

Laboratoire des Sciences du Numérique de Nantes

György Fazekas

Role: Author

Queen Mary University of London

Vincent Lostanlen

Role: Author
PersonId : 749246
IdHAL : lostanlen
ORCID : 0000-0003-0580-1651
IdRef : 203022769

Laboratoire des Sciences du Numérique de Nantes

Abstract

Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in recent applications of deep learning to the analysis and synthesis of musical audio. Currently, autoencoders and neural audio synthesizers are only trained and evaluated at the scale of microstructure: i.e., local amplitude variations up to 100 milliseconds or so. In this paper, we formulate and address the problem of mesostructural audio modeling via a composition of a differentiable arpeggiator and time-frequency scattering. We empirically demonstrate that time-frequency scattering serves as a differentiable model of similarity between synthesis parameters that govern mesostructure. By exposing the sensitivity of short-time spectral distances to time alignment, we motivate the need for a time-invariant and multiscale differentiable time-frequency model of similarity at the level of both local spectra and spectrotemporal modulations.
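The sensitivity of short-time spectral distances to time alignment can be illustrated with a small numerical sketch. The example below is not the method of the paper: it uses a hypothetical toy arpeggio (the function names, note frequencies, and window sizes are assumptions made for illustration) and compares a frame-by-frame spectrogram distance with a distance between time-averaged spectra. Time averaging is only a crude stand-in for time invariance; unlike time-frequency scattering, it discards the spectrotemporal modulations that characterize mesostructure.

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    """Magnitude STFT with a Hann window, shaped (frames, frequency bins)."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def toy_arpeggio(sr=8000, note_dur=0.125, freqs=(220.0, 330.0, 440.0, 660.0), n_repeats=4):
    """Hypothetical arpeggio: a repeating cycle of short, Hann-enveloped sine tones."""
    t = np.arange(int(sr * note_dur)) / sr
    env = np.hanning(len(t))
    cycle = np.concatenate([env * np.sin(2 * np.pi * f * t) for f in freqs])
    return np.tile(cycle, n_repeats)

x = toy_arpeggio()
y = np.roll(x, 1000)  # same arpeggio, circularly delayed by one note (0.125 s at 8 kHz)

S_x, S_y = stft_mag(x), stft_mag(y)

# Frame-by-frame spectrogram distance, relative to the norm of the reference.
frame_rel = np.linalg.norm(S_x - S_y) / np.linalg.norm(S_x)

# Distance between time-averaged spectra, relative to the norm of the reference.
m_x, m_y = S_x.mean(axis=0), S_y.mean(axis=0)
avg_rel = np.linalg.norm(m_x - m_y) / np.linalg.norm(m_x)

print(f"frame-by-frame relative distance: {frame_rel:.3f}")
print(f"time-averaged relative distance:  {avg_rel:.3f}")
```

Both signals contain the same notes in the same cyclic order, yet the frame-by-frame distance is large because each frame of the delayed signal is compared against a different note. The time-averaged distance is small, but at the price of losing the note order altogether, which is why a model that is both time-invariant and sensitive to spectrotemporal modulations is needed.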

Subject areas

Sound [cs.SD]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

JAES_2023___Mesostructures__Beyond_Spectrogram_Loss_in_Differentiable_Time_Frequency_Analysis (1).pdf (8.7 MB)

Origin: Files produced by the author(s)

Contributor: Vincent Lostanlen

https://hal.science/hal-04118474

Submitted on: Tuesday, June 6, 2023, 10:52:17

Last modified on: Tuesday, April 23, 2024, 10:18:03

Long-term archiving on: Thursday, September 7, 2023, 18:43:52

Dates and versions

hal-04118474, version 1 (06-06-2023)

Identifiers

HAL Id: hal-04118474, version 1

Cite

Cyrus Vahidi, Han Han, Changhong Wang, Mathieu Lagrange, György Fazekas, Vincent Lostanlen. Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis. Journal of the Audio Engineering Society, 2023, 71 (9), pp. 577-585. ⟨hal-04118474⟩

Collections

INSTITUT-TELECOM CNRS INRIA EC-NANTES UNAM LS2N LS2N-SIMS NANTES-UNIVERSITE

39 views

22 downloads
