Fusion of Multimodal Information in Music Content Analysis

Authors: Slim Essid, Gaël Richard


Cite As

Slim Essid and Gaël Richard. Fusion of Multimodal Information in Music Content Analysis. In Multimodal Music Processing. Dagstuhl Follow-Ups, Volume 3, pp. 37-52, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)


Music is often processed solely through its acoustic realization. This is restrictive in the sense that music is clearly a highly multimodal concept, where various types of heterogeneous information can be associated with a given piece of music (a musical score, musicians' gestures, lyrics, user-generated metadata, etc.). This has recently led researchers to approach music through its various facets, giving rise to "multimodal music analysis" studies. This article gives a concise overview of methods that have been successfully employed in multimodal signal analysis. In particular, their use in music content processing is discussed in more detail through five case studies that highlight different multimodal integration techniques. The case studies include an example of cross-modal correlation for music video analysis, an audiovisual drum transcription system, a description of the concept of informed source separation, a discussion of multimodal dance-scene analysis, and an example of user-interactive music analysis. In the light of these case studies, some perspectives on multimodality in music processing are finally suggested.
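The cross-modal correlation mentioned among the case studies can be illustrated with canonical correlation analysis (CCA), a standard tool for quantifying shared structure between two feature streams, such as audio and video descriptors extracted from a music video. The sketch below uses synthetic features and a textbook whitening-plus-SVD formulation; it is an illustrative assumption, not the specific method developed in the chapter:

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-6):
    """Leading canonical correlation between feature sets X and Y.

    X: (n_samples, dx), e.g. per-frame audio features.
    Y: (n_samples, dy), e.g. per-frame video features.
    A small ridge term `reg` keeps the covariance factorizations stable.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each modality; the singular values of the whitened
    # cross-covariance are the canonical correlations.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    M = Wx @ Cxy @ Wy.T
    return np.linalg.svd(M, compute_uv=False)[0]

# Synthetic data: one latent signal drives a dimension of both streams.
rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 1))
audio = np.hstack([shared + 0.1 * rng.standard_normal((500, 1)),
                   rng.standard_normal((500, 3))])
video = np.hstack([shared + 0.1 * rng.standard_normal((500, 1)),
                   rng.standard_normal((500, 2))])
noise = rng.standard_normal((500, 4))  # unrelated stream, for contrast

print(first_canonical_correlation(audio, video))  # high: shared latent signal
print(first_canonical_correlation(audio, noise))  # low: no shared structure
```

A high leading canonical correlation between the two streams signals temporally co-varying content, which is the kind of evidence such cross-modal analyses exploit when aligning or jointly indexing audio and video.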
Keywords

  • Multimodal music processing
  • music signals indexing and transcription
  • information fusion
  • audio
  • video

