Exploration server for music in Saarland

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Towards Timbre-Invariant Audio Features for Harmony-Based Music

Internal identifier: 000005 (PascalFrancis/Curation); previous: 000004; next: 000006

Authors: Meinard Müller [Germany]; Sebastian Ewert [Germany]

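Source: IEEE transactions on audio, speech, and language processing [1558-7916]; 2010, vol. 18, no. 3, pp. 649-662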

RBID : Pascal:10-0137839

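French descriptors

Analyse son, Son musical, Recherche information, Invariance, Analyse discriminante, Analyse cepstrale, Tonie, Evaluation performance, Etat actuel, Extraction caractéristique, Timbre, Signal acoustique, Traitement signal

English descriptors

Acoustic signal, Cepstral analysis, Discriminant analysis, Feature extraction, Information retrieval, Invariance, Musical sound, Performance evaluation, Pitch(acoustics), Signal processing, Sound analysis, State of the art, Timbre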

Abstract

Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the-art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.
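
The procedure outlined in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the authors' implementation. It assumes the input is one frame of non-negative energies indexed by pitch number on an equal-tempered scale. The function names, the logarithmic compression, its constant (`compression`), the number of discarded coefficients (`n_discard`) and the final normalization are all hypothetical choices made for this example; the abstract does not state them.

```python
import math


def dct_basis(n):
    """Return the n x n orthonormal DCT-II basis as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
                     for m in range(n)])
    return rows


def timbre_reduced_chroma(pitch_frame, n_discard=54, compression=1000.0):
    """Sketch of a timbre-reduced chroma vector for a single frame.

    pitch_frame: non-negative energies, where index p is the pitch number.
    n_discard and compression are assumed, illustrative parameters.
    """
    n = len(pitch_frame)
    basis = dct_basis(n)

    # Logarithmic compression, as in MFCC-style processing (assumption).
    log_pitch = [math.log(1.0 + compression * e) for e in pitch_frame]

    # Pitch-frequency cepstral coefficients: DCT along the pitch axis.
    coeffs = [sum(b * v for b, v in zip(row, log_pitch)) for row in basis]

    # Discard the lower coefficients, which are assumed to carry timbre.
    for k in range(min(n_discard, n)):
        coeffs[k] = 0.0

    # Inverse DCT: back to the pitch scale using only the upper coefficients.
    reduced = [sum(basis[k][m] * coeffs[k] for k in range(n)) for m in range(n)]

    # Project onto the 12 chroma bins by summing pitches an octave apart.
    chroma = [0.0] * 12
    for p, value in enumerate(reduced):
        chroma[p % 12] += value

    # Normalize to unit Euclidean length (assumption).
    norm = math.sqrt(sum(c * c for c in chroma))
    return [c / norm for c in chroma] if norm > 0 else chroma


# Example: a frame with energy on three pitches (a C major triad).
frame = [0.0] * 120
frame[60], frame[64], frame[67] = 1.0, 0.5, 0.5
chroma = timbre_reduced_chroma(frame)
```

Setting `n_discard=0` skips the coefficient removal, so the function reduces to a chroma vector computed from the log-compressed pitch energies, which gives a baseline for comparison.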
pA  
A01 01  1    @0 1558-7916
A03   1    @0 IEEE trans. audio speech lang. process.
A05       @2 18
A06       @2 3
A08 01  1  ENG  @1 Towards Timbre-Invariant Audio Features for Harmony-Based Music
A09 01  1  ENG  @1 SPECIAL ISSUE ON SIGNAL MODELS AND REPRESENTATIONS OF MUSICAL AND ENVIRONMENTAL SOUNDS
A11 01  1    @1 MÜLLER (Meinard)
A11 02  1    @1 EWERT (Sebastian)
A12 01  1    @1 DAVID (Bertrand) @9 ed.
A12 02  1    @1 GOTO (Masataka) @9 ed.
A12 03  1    @1 DAUDET (Laurent) @9 ed.
A12 04  1    @1 SMARAGDIS (Paris) @9 ed.
A14 01      @1 Saarland University and the Max-Planck Institut für Informatik @2 66123 Saarbrücken @3 DEU @Z 1 aut.
A14 02      @1 Multimedia Signal Processing Group, Department of Computer Science III, Bonn University @2 53117 Bonn @3 DEU @Z 2 aut.
A15 01      @1 Institut TELECOM; TELECOM ParisTech; CNRS @2 75634 Paris @3 FRA @Z 1 aut.
A15 02      @1 National Institute of Advanced Industrial Science and Technology (AIST) @2 Tsukuba 305-8568 @3 JPN @Z 2 aut.
A15 03      @1 Université Pierre et Marie Curie-Paris 6, Institut Jean le Rond d'Alembert-LAM @2 75015 Paris @3 FRA @Z 3 aut.
A15 04      @1 Adobe Systems, Inc. @2 Newtown, MA 02466 @3 USA @Z 4 aut.
A20       @1 649-662
A21       @1 2010
A23 01      @0 ENG
A43 01      @1 INIST @2 26266 @5 354000181408720200
A44       @0 0000 @1 © 2010 INIST-CNRS. All rights reserved.
A45       @0 39 ref.
A47 01  1    @0 10-0137839
A60       @1 P
A61       @0 A
A64 01  1    @0 IEEE transactions on audio, speech, and language processing
A66 01      @0 USA
C01 01    ENG  @0 Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.
C02 01  X    @0 001D04A04A1
C02 02  X    @0 001D04A03
C03 01  X  FRE  @0 Analyse son @5 01
C03 01  X  ENG  @0 Sound analysis @5 01
C03 01  X  SPA  @0 Análisis sonido @5 01
C03 02  X  FRE  @0 Son musical @5 02
C03 02  X  ENG  @0 Musical sound @5 02
C03 02  X  SPA  @0 Sonido musical @5 02
C03 03  X  FRE  @0 Recherche information @5 03
C03 03  X  ENG  @0 Information retrieval @5 03
C03 03  X  SPA  @0 Búsqueda información @5 03
C03 04  X  FRE  @0 Invariance @5 04
C03 04  X  ENG  @0 Invariance @5 04
C03 04  X  SPA  @0 Invarianza @5 04
C03 05  X  FRE  @0 Analyse discriminante @5 05
C03 05  X  ENG  @0 Discriminant analysis @5 05
C03 05  X  SPA  @0 Análisis discriminante @5 05
C03 06  3  FRE  @0 Analyse cepstrale @5 06
C03 06  3  ENG  @0 Cepstral analysis @5 06
C03 07  X  FRE  @0 Tonie @5 07
C03 07  X  ENG  @0 Pitch(acoustics) @5 07
C03 07  X  SPA  @0 Altura sonida @5 07
C03 08  X  FRE  @0 Evaluation performance @5 08
C03 08  X  ENG  @0 Performance evaluation @5 08
C03 08  X  SPA  @0 Evaluación prestación @5 08
C03 09  X  FRE  @0 Etat actuel @5 09
C03 09  X  ENG  @0 State of the art @5 09
C03 09  X  SPA  @0 Estado actual @5 09
C03 10  3  FRE  @0 Extraction caractéristique @5 10
C03 10  3  ENG  @0 Feature extraction @5 10
C03 11  X  FRE  @0 Timbre @5 11
C03 11  X  ENG  @0 Timbre @5 11
C03 11  X  SPA  @0 Sello @5 11
C03 12  X  FRE  @0 Signal acoustique @5 46
C03 12  X  ENG  @0 Acoustic signal @5 46
C03 12  X  SPA  @0 Señal acústica @5 46
C03 13  X  FRE  @0 Traitement signal @5 47
C03 13  X  ENG  @0 Signal processing @5 47
C03 13  X  SPA  @0 Procesamiento señal @5 47
N21       @1 088

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Towards Timbre-Invariant Audio Features for Harmony-Based Music</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Ewert, Sebastian" sort="Ewert, Sebastian" uniqKey="Ewert S" first="Sebastian" last="Ewert">Sebastian Ewert</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0137839</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0137839 INIST</idno>
<idno type="RBID">Pascal:10-0137839</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000009</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000005</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Towards Timbre-Invariant Audio Features for Harmony-Based Music</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
<author>
<name sortKey="Ewert, Sebastian" sort="Ewert, Sebastian" uniqKey="Ewert S" first="Sebastian" last="Ewert">Sebastian Ewert</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Acoustic signal</term>
<term>Cepstral analysis</term>
<term>Discriminant analysis</term>
<term>Feature extraction</term>
<term>Information retrieval</term>
<term>Invariance</term>
<term>Musical sound</term>
<term>Performance evaluation</term>
<term>Pitch(acoustics)</term>
<term>Signal processing</term>
<term>Sound analysis</term>
<term>State of the art</term>
<term>Timbre</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse son</term>
<term>Son musical</term>
<term>Recherche information</term>
<term>Invariance</term>
<term>Analyse discriminante</term>
<term>Analyse cepstrale</term>
<term>Tonie</term>
<term>Evaluation performance</term>
<term>Etat actuel</term>
<term>Extraction caractéristique</term>
<term>Timbre</term>
<term>Signal acoustique</term>
<term>Traitement signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1558-7916</s0>
</fA01>
<fA03 i2="1">
<s0>IEEE trans. audio speech lang. process.</s0>
</fA03>
<fA05>
<s2>18</s2>
</fA05>
<fA06>
<s2>3</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>Towards Timbre-Invariant Audio Features for Harmony-Based Music</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>SPECIAL ISSUE ON SIGNAL MODELS AND REPRESENTATIONS OF MUSICAL AND ENVIRONMENTAL SOUNDS</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>MÜLLER (Meinard)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>EWERT (Sebastian)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>DAVID (Bertrand)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>GOTO (Masataka)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1">
<s1>DAUDET (Laurent)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="04" i2="1">
<s1>SMARAGDIS (Paris)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01">
<s1>Institut TELECOM; TELECOM ParisTech; CNRS</s1>
<s2>75634 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02">
<s1>National Institute of Advanced Industrial Science and Technology (AIST)</s1>
<s2>Tsukuba 305-8568</s2>
<s3>JPN</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03">
<s1>Université Pierre et Marie Curie-Paris 6, Institut Jean le Rond d'Alembert-LAM</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA15 i1="04">
<s1>Adobe Systems, Inc.</s1>
<s2>Newtown, MA 02466</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</fA15>
<fA20>
<s1>649-662</s1>
</fA20>
<fA21>
<s1>2010</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>26266</s2>
<s5>354000181408720200</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2010 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>39 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>10-0137839</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>IEEE transactions on audio, speech, and language processing</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D04A04A1</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001D04A03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Analyse son</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Sound analysis</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Análisis sonido</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Son musical</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Musical sound</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Sonido musical</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Recherche information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Information retrieval</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Búsqueda información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Invariance</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Invariance</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Invarianza</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Analyse discriminante</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Discriminant analysis</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Análisis discriminante</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE">
<s0>Analyse cepstrale</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG">
<s0>Cepstral analysis</s0>
<s5>06</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Tonie</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Pitch(acoustics)</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Altura sonida</s0>
<s5>07</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Evaluation performance</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Performance evaluation</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Evaluación prestación</s0>
<s5>08</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Etat actuel</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>State of the art</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Estado actual</s0>
<s5>09</s5>
</fC03>
<fC03 i1="10" i2="3" l="FRE">
<s0>Extraction caractéristique</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="3" l="ENG">
<s0>Feature extraction</s0>
<s5>10</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Timbre</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>Timbre</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Sello</s0>
<s5>11</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Signal acoustique</s0>
<s5>46</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Acoustic signal</s0>
<s5>46</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Señal acústica</s0>
<s5>46</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE">
<s0>Traitement signal</s0>
<s5>47</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG">
<s0>Signal processing</s0>
<s5>47</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA">
<s0>Procesamiento señal</s0>
<s5>47</s5>
</fC03>
<fN21>
<s1>088</s1>
</fN21>
</pA>
</standard>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/PascalFrancis/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000005 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Curation/biblio.hfd -nk 000005 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    PascalFrancis
   |étape=   Curation
   |type=    RBID
   |clé=     Pascal:10-0137839
   |texte=   Towards Timbre-Invariant Audio Features for Harmony-Based Music
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024