Exploration server on music in Saarland

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing

Internal identifier: 000010 (PascalFrancis/Curation); previous: 000009; next: 000011

Authors: Meinard Müller [Germany]; Nanzhu Jiang [Germany]; Peter Grosche [Germany]

Source: IEEE transactions on audio, speech, and language processing [ISSN 1558-7916]; 2013.

RBID: Pascal:13-0149238

French descriptors

English descriptors

Abstract

The automatic extraction of structural information from music recordings constitutes a central research topic. In this paper, we deal with a subproblem of audio structure analysis called audio thumbnailing with the goal to determine the audio segment that best represents a given music recording. Typically, such a segment has many (approximate) repetitions covering large parts of the recording. As the main technical contribution, we introduce a novel fitness measure that assigns a fitness value to each segment that expresses how much and how well the segment "explains" the repetitive structure of the entire recording. The thumbnail is then defined to be the fitness-maximizing segment. To compute the fitness measure, we describe an optimization scheme that jointly performs two error-prone steps, path extraction and grouping, which are usually performed successively. As a result, our approach is even able to cope with strong musical and acoustic variations that may occur within and across related segments. As a further contribution, we introduce the concept of fitness scape plots that reveal global structural properties of an entire recording. Finally, to show the robustness and practicability of our thumbnailing approach, we present various experiments based on different audio collections that comprise popular music, classical music, and folk song field recordings.
pA  
A01 01  1    @0 1558-7916
A03   1    @0 IEEE trans. audio speech lang. process.
A05       @2 21
A06       @2 3-4
A08 01  1  ENG  @1 A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing
A11 01  1    @1 MÜLLER (Meinard)
A11 02  1    @1 NANZHU JIANG
A11 03  1    @1 GROSCHE (Peter)
A14 01      @1 International Audio Laboratories Erlangen, which is a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS @2 91058 Erlangen @3 DEU @Z 1 aut.
A14 02      @1 Saarland University and the Max-Planck Institut fur Informatik @2 66123 Saarbrücken @3 DEU @Z 2 aut. @Z 3 aut.
A20       @1 531-543
A21       @1 2013
A23 01      @0 ENG
A43 01      @1 INIST @2 26266 @5 354000173296660070
A44       @0 0000 @1 © 2013 INIST-CNRS. All rights reserved.
A45       @0 34 ref.
A47 01  1    @0 13-0149238
A60       @1 P
A61       @0 A
A64 01  1    @0 IEEE transactions on audio, speech, and language processing
A66 01      @0 USA
C01 01    ENG  @0 The automatic extraction of structural information from music recordings constitutes a central research topic. In this paper, we deal with a subproblem of audio structure analysis called audio thumbnailing with the goal to determine the audio segment that best represents a given music recording. Typically, such a segment has many (approximate) repetitions covering large parts of the recording. As the main technical contribution, we introduce a novel fitness measure that assigns a fitness value to each segment that expresses how much and how well the segment "explains" the repetitive structure of the entire recording. The thumbnail is then defined to be the fitness-maximizing segment. To compute the fitness measure, we describe an optimization scheme that jointly performs two error-prone steps, path extraction and grouping, which are usually performed successively. As a result, our approach is even able to cope with strong musical and acoustic variations that may occur within and across related segments. As a further contribution, we introduce the concept of fitness scape plots that reveal global structural properties of an entire recording. Finally, to show the robustness and practicability of our thumbnailing approach, we present various experiments based on different audio collections that comprise popular music, classical music, and folk song field recordings.
C02 01  X    @0 001D04A04A1
C02 02  X    @0 001D04A04A2
C02 03  X    @0 001D04A03
C03 01  X  FRE  @0 Reconnaissance automatique @5 01
C03 01  X  ENG  @0 Automatic recognition @5 01
C03 01  X  SPA  @0 Reconocimiento automático @5 01
C03 02  3  FRE  @0 Extraction caractéristique @5 02
C03 02  3  ENG  @0 Feature extraction @5 02
C03 03  X  FRE  @0 Extraction information @5 03
C03 03  X  ENG  @0 Information extraction @5 03
C03 03  X  SPA  @0 Extracción información @5 03
C03 04  X  FRE  @0 Analyse son @5 04
C03 04  X  ENG  @0 Sound analysis @5 04
C03 04  X  SPA  @0 Análisis sonido @5 04
C03 05  X  FRE  @0 Analyse structurale @5 05
C03 05  X  ENG  @0 Structural analysis @5 05
C03 05  X  SPA  @0 Análisis estructural @5 05
C03 06  X  FRE  @0 Structure périodique @5 06
C03 06  X  ENG  @0 Periodic structure @5 06
C03 06  X  SPA  @0 Estructura periódica @5 06
C03 07  X  FRE  @0 Optimisation @5 07
C03 07  X  ENG  @0 Optimization @5 07
C03 07  X  SPA  @0 Optimización @5 07
C03 08  X  FRE  @0 Acoustique musicale @5 08
C03 08  X  ENG  @0 Musical acoustics @5 08
C03 08  X  SPA  @0 Acústica musical @5 08
C03 09  X  FRE  @0 Robustesse @5 09
C03 09  X  ENG  @0 Robustness @5 09
C03 09  X  SPA  @0 Robustez @5 09
C03 10  X  FRE  @0 Traitement signal @5 46
C03 10  X  ENG  @0 Signal processing @5 46
C03 10  X  SPA  @0 Procesamiento señal @5 46
C07 01  X  FRE  @0 Traitement information @5 10
C07 01  X  ENG  @0 Information processing @5 10
C07 01  X  SPA  @0 Procesamiento información @5 10
N21       @1 125
N44 01      @1 OTO
N82       @1 OTO
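
The flat record above appears to follow a simple pattern: a field tag such as `A11`, optional indicators, then one or more `@`-prefixed subfields. As a minimal sketch under that assumption (the function name `parse_field` and both regular expressions are illustrative and not part of Dilib or of any INIST tooling), a single line could be split like this:

```python
import re

# Assumed layout, inferred only from the lines shown in this record:
# a tag (one uppercase letter + two digits), whitespace-separated
# indicators, then "@<code> <value>" subfields.
LINE_RE = re.compile(r"^(?P<tag>[A-Z]\d{2})(?P<indicators>[^@]*)(?P<rest>@.*)?$")
SUBFIELD_RE = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")


def parse_field(line):
    """Split one flat record line into (tag, indicators, subfields).

    Returns None for lines that do not start with a field tag,
    such as the "pA" marker that opens the record.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    indicators = match.group("indicators").split()
    # Each subfield becomes a (code, value) pair, in order of appearance.
    subfields = SUBFIELD_RE.findall(match.group("rest") or "")
    return match.group("tag"), indicators, subfields


if __name__ == "__main__":
    print(parse_field("A11 01  1    @1 MÜLLER (Meinard)"))
    print(parse_field("A14 02      @1 Saarland University @2 66123 Saarbrücken @3 DEU @Z 2 aut. @Z 3 aut."))
```

Repeated subfield codes (for example the two `@Z` entries of field `A14`) are kept as separate pairs rather than merged, so the author-to-affiliation mapping is preserved.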

Links to Exploration step

Pascal:13-0149238

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>International Audio Laboratories Erlangen, which is a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS</s1>
<s2>91058 Erlangen</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Nanzhu Jiang" sort="Nanzhu Jiang" uniqKey="Nanzhu Jiang" last="Nanzhu Jiang">NANZHU JIANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Saarland University and the Max-Planck Institut fur Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Grosche, Peter" sort="Grosche, Peter" uniqKey="Grosche P" first="Peter" last="Grosche">Peter Grosche</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Saarland University and the Max-Planck Institut fur Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0149238</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0149238 INIST</idno>
<idno type="RBID">Pascal:13-0149238</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000004</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000010</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>International Audio Laboratories Erlangen, which is a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS</s1>
<s2>91058 Erlangen</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Nanzhu Jiang" sort="Nanzhu Jiang" uniqKey="Nanzhu Jiang" last="Nanzhu Jiang">NANZHU JIANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Saarland University and the Max-Planck Institut fur Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Grosche, Peter" sort="Grosche, Peter" uniqKey="Grosche P" first="Peter" last="Grosche">Peter Grosche</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Saarland University and the Max-Planck Institut fur Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic recognition</term>
<term>Feature extraction</term>
<term>Information extraction</term>
<term>Musical acoustics</term>
<term>Optimization</term>
<term>Periodic structure</term>
<term>Robustness</term>
<term>Signal processing</term>
<term>Sound analysis</term>
<term>Structural analysis</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance automatique</term>
<term>Extraction caractéristique</term>
<term>Extraction information</term>
<term>Analyse son</term>
<term>Analyse structurale</term>
<term>Structure périodique</term>
<term>Optimisation</term>
<term>Acoustique musicale</term>
<term>Robustesse</term>
<term>Traitement signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The automatic extraction of structural information from music recordings constitutes a central research topic. In this paper, we deal with a subproblem of audio structure analysis called audio thumbnailing with the goal to determine the audio segment that best represents a given music recording. Typically, such a segment has many (approximate) repetitions covering large parts of the recording. As the main technical contribution, we introduce a novel fitness measure that assigns a fitness value to each segment that expresses how much and how well the segment "explains" the repetitive structure of the entire recording. The thumbnail is then defined to be the fitness-maximizing segment. To compute the fitness measure, we describe an optimization scheme that jointly performs two error-prone steps, path extraction and grouping, which are usually performed successively. As a result, our approach is even able to cope with strong musical and acoustic variations that may occur within and across related segments. As a further contribution, we introduce the concept of fitness scape plots that reveal global structural properties of an entire recording. Finally, to show the robustness and practicability of our thumbnailing approach, we present various experiments based on different audio collections that comprise popular music, classical music, and folk song field recordings.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1558-7916</s0>
</fA01>
<fA03 i2="1">
<s0>IEEE trans. audio speech lang. process.</s0>
</fA03>
<fA05>
<s2>21</s2>
</fA05>
<fA06>
<s2>3-4</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing</s1>
</fA08>
<fA11 i1="01" i2="1">
<s1>MÜLLER (Meinard)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>NANZHU JIANG</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>GROSCHE (Peter)</s1>
</fA11>
<fA14 i1="01">
<s1>International Audio Laboratories Erlangen, which is a joint institution of the University of Erlangen-Nuremberg and Fraunhofer IIS</s1>
<s2>91058 Erlangen</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Saarland University and the Max-Planck Institut fur Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA20>
<s1>531-543</s1>
</fA20>
<fA21>
<s1>2013</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>26266</s2>
<s5>354000173296660070</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2013 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>34 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>13-0149238</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>IEEE transactions on audio, speech, and language processing</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>The automatic extraction of structural information from music recordings constitutes a central research topic. In this paper, we deal with a subproblem of audio structure analysis called audio thumbnailing with the goal to determine the audio segment that best represents a given music recording. Typically, such a segment has many (approximate) repetitions covering large parts of the recording. As the main technical contribution, we introduce a novel fitness measure that assigns a fitness value to each segment that expresses how much and how well the segment "explains" the repetitive structure of the entire recording. The thumbnail is then defined to be the fitness-maximizing segment. To compute the fitness measure, we describe an optimization scheme that jointly performs two error-prone steps, path extraction and grouping, which are usually performed successively. As a result, our approach is even able to cope with strong musical and acoustic variations that may occur within and across related segments. As a further contribution, we introduce the concept of fitness scape plots that reveal global structural properties of an entire recording. Finally, to show the robustness and practicability of our thumbnailing approach, we present various experiments based on different audio collections that comprise popular music, classical music, and folk song field recordings.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D04A04A1</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001D04A04A2</s0>
</fC02>
<fC02 i1="03" i2="X">
<s0>001D04A03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Reconnaissance automatique</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Automatic recognition</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Reconocimiento automático</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE">
<s0>Extraction caractéristique</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG">
<s0>Feature extraction</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Extraction information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Information extraction</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Extracción información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Analyse son</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Sound analysis</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Análisis sonido</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Analyse structurale</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Structural analysis</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Análisis estructural</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Structure périodique</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Periodic structure</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Estructura periódica</s0>
<s5>06</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Optimisation</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Optimization</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Optimización</s0>
<s5>07</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Acoustique musicale</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Musical acoustics</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Acústica musical</s0>
<s5>08</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Robustesse</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Robustness</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Robustez</s0>
<s5>09</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Traitement signal</s0>
<s5>46</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Signal processing</s0>
<s5>46</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Procesamiento señal</s0>
<s5>46</s5>
</fC03>
<fC07 i1="01" i2="X" l="FRE">
<s0>Traitement information</s0>
<s5>10</s5>
</fC07>
<fC07 i1="01" i2="X" l="ENG">
<s0>Information processing</s0>
<s5>10</s5>
</fC07>
<fC07 i1="01" i2="X" l="SPA">
<s0>Procesamiento información</s0>
<s5>10</s5>
</fC07>
<fN21>
<s1>125</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
</inist>
</record>
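
As displayed, the record uses the prefixes `wicri:` and `inist:` without declaring them, so a strict XML parser is likely to reject it as it stands. The following is a hedged sketch of one workaround using Python's standard `xml.etree.ElementTree`: wrap the record in an element that binds both prefixes to placeholder URIs. The URIs and the helper names `parse_record` and `english_keywords` are assumptions made for illustration, not the namespaces or API actually used by Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumed); the real ones are not shown on this page.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)
WRAPPER_CLOSE = "</wrapper>"


def parse_record(record_xml):
    """Parse a <record> fragment whose wicri:/inist: prefixes are undeclared.

    The fragment must not start with an XML declaration, because it is
    embedded inside a wrapper element before parsing.
    """
    root = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    return root.find("record")


def english_keywords(record):
    """Return the terms listed under <keywords scheme="KwdEn">."""
    return [term.text for term in record.findall(".//keywords[@scheme='KwdEn']/term")]


if __name__ == "__main__":
    sample = (
        '<record><TEI><teiHeader><profileDesc><textClass>'
        '<keywords scheme="KwdEn" xml:lang="en">'
        '<term>Optimization</term><term>Robustness</term>'
        '</keywords></textClass></profileDesc></teiHeader></TEI></record>'
    )
    print(english_keywords(parse_record(sample)))
```

The same `record` object can be queried for the INIST fields, for example `record.findtext(".//fA21/s1")` for the publication year.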

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/PascalFrancis/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000010 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Curation/biblio.hfd -nk 000010 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    PascalFrancis
   |étape=   Curation
   |type=    RBID
   |clé=     Pascal:13-0149238
   |texte=   A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024