Exploration server for music in Saarland

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Towards Timbre-Invariant Audio Features for Harmony-Based Music

Internal identifier: 000007 (PascalFrancis/Checkpoint); previous: 000006; next: 000008

Authors: Meinard Müller [Germany]; Sebastian Ewert [Germany]

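Source: IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 649-662, 2010 (ISSN 1558-7916)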

RBID : Pascal:10-0137839

Abstract

Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the-art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.
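
The procedure outlined in the abstract (cepstral coefficients on a pitch scale, discard the lower ones, project the rest onto 12 chroma bins) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the logarithmic compression step, the 120 pitch bands, the cut-off `keep_from=55`, the constant `compression=1000.0`, and the final normalization are all assumptions chosen for illustration; the abstract does not state them. The input is assumed to be one frame of non-negative energies indexed by a MIDI-like pitch number.

```python
import math


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def timbre_reduced_chroma(pitch_energy, keep_from=55, compression=1000.0):
    """Sketch of a chroma vector with the lower cepstral coefficients removed.

    pitch_energy: non-negative energies, one per pitch band (assumed layout:
                  index p corresponds to MIDI-like pitch p).
    keep_from:    index of the first cepstral coefficient that is kept
                  (illustrative assumption).
    compression:  constant of the logarithmic compression (assumption).
    """
    n = len(pitch_energy)
    dct = dct_matrix(n)

    # 1. Logarithmic compression of the pitch energies.
    logged = [math.log(1.0 + compression * e) for e in pitch_energy]

    # 2. Forward DCT: cepstral coefficients on a pitch scale.
    coeffs = [sum(dct[k][i] * logged[i] for i in range(n)) for k in range(n)]

    # 3. Discard the lower (timbre-related) coefficients, keep the upper ones.
    for k in range(min(keep_from, n)):
        coeffs[k] = 0.0

    # 4. Inverse DCT (the transpose, since the matrix is orthonormal).
    restored = [sum(dct[k][i] * coeffs[k] for k in range(n)) for i in range(n)]

    # 5. Project the pitch bands onto the 12 chroma bins.
    chroma = [0.0] * 12
    for pitch, value in enumerate(restored):
        chroma[pitch % 12] += value

    # 6. Normalize to unit Euclidean length unless the vector is near zero.
    norm = math.sqrt(sum(c * c for c in chroma))
    if norm > 1e-9:
        chroma = [c / norm for c in chroma]
    return chroma


# Usage: a single active pitch band at index 60 (chroma bin 60 % 12 == 0).
frame = [0.0] * 120
frame[60] = 1.0
features = timbre_reduced_chroma(frame)
```

Because the lowest coefficient is removed, the resulting vector can contain negative entries, and a spectrally flat frame maps to a vector close to zero in this sketch.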


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Towards Timbre-Invariant Audio Features for Harmony-Based Music</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ewert, Sebastian" sort="Ewert, Sebastian" uniqKey="Ewert S" first="Sebastian" last="Ewert">Sebastian Ewert</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Bonn</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0137839</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0137839 INIST</idno>
<idno type="RBID">Pascal:10-0137839</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000009</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000005</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000007</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000007</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Towards Timbre-Invariant Audio Features for Harmony-Based Music</title>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ewert, Sebastian" sort="Ewert, Sebastian" uniqKey="Ewert S" first="Sebastian" last="Ewert">Sebastian Ewert</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Bonn</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE transactions on audio, speech, and language processing</title>
<title level="j" type="abbreviated">IEEE trans. audio speech lang. process.</title>
<idno type="ISSN">1558-7916</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Acoustic signal</term>
<term>Cepstral analysis</term>
<term>Discriminant analysis</term>
<term>Feature extraction</term>
<term>Information retrieval</term>
<term>Invariance</term>
<term>Musical sound</term>
<term>Performance evaluation</term>
<term>Pitch(acoustics)</term>
<term>Signal processing</term>
<term>Sound analysis</term>
<term>State of the art</term>
<term>Timbre</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse son</term>
<term>Son musical</term>
<term>Recherche information</term>
<term>Invariance</term>
<term>Analyse discriminante</term>
<term>Analyse cepstrale</term>
<term>Tonie</term>
<term>Evaluation performance</term>
<term>Etat actuel</term>
<term>Extraction caractéristique</term>
<term>Timbre</term>
<term>Signal acoustique</term>
<term>Traitement signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1558-7916</s0>
</fA01>
<fA03 i2="1">
<s0>IEEE trans. audio speech lang. process.</s0>
</fA03>
<fA05>
<s2>18</s2>
</fA05>
<fA06>
<s2>3</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>Towards Timbre-Invariant Audio Features for Harmony-Based Music</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>SPECIAL ISSUE ON SIGNAL MODELS AND REPRESENTATIONS OF MUSICAL AND ENVIRONMENTAL SOUNDS</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>MÜLLER (Meinard)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>EWERT (Sebastian)</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>DAVID (Bertrand)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>GOTO (Masataka)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1">
<s1>DAUDET (Laurent)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="04" i2="1">
<s1>SMARAGDIS (Paris)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Saarland University and the Max-Planck Institut für Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Multimedia Signal Processing Group, Department of Computer Science III, Bonn University</s1>
<s2>53117 Bonn</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA15 i1="01">
<s1>Institut TELECOM; TELECOM ParisTech; CNRS</s1>
<s2>75634 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</fA15>
<fA15 i1="02">
<s1>National Institute of Advanced Industrial Science and Technology (AIST)</s1>
<s2>Tsukuba 305-8568</s2>
<s3>JPN</s3>
<sZ>2 aut.</sZ>
</fA15>
<fA15 i1="03">
<s1>Université Pierre et Marie Curie-Paris 6, Institut Jean le Rond d'Alembert-LAM</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>3 aut.</sZ>
</fA15>
<fA15 i1="04">
<s1>Adobe Systems, Inc.</s1>
<s2>Newtown, MA 02466</s2>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</fA15>
<fA20>
<s1>649-662</s1>
</fA20>
<fA21>
<s1>2010</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>26266</s2>
<s5>354000181408720200</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2010 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>39 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>10-0137839</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>IEEE transactions on audio, speech, and language processing</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>Chroma-based audio features are a well-established tool for analyzing and comparing harmony-based Western music that is based on the equal-tempered scale. By identifying spectral components that differ by a musical octave, chroma features possess a considerable amount of robustness to changes in timbre and instrumentation. In this paper, we describe a novel procedure that further enhances chroma features by significantly boosting the degree of timbre invariance without degrading the features' discriminative power. Our idea is based on the generally accepted observation that the lower mel-frequency cepstral coefficients (MFCCs) are closely related to timbre. Now, instead of keeping the lower coefficients, we discard them and only keep the upper coefficients. Furthermore, using a pitch scale instead of a mel scale allows us to project the remaining coefficients onto the 12 chroma bins. We present a series of experiments to demonstrate that the resulting chroma features outperform various state-of-the art features in the context of music matching and retrieval applications. As a final contribution, we give a detailed analysis of our enhancement procedure revealing the musical meaning of certain pitch-frequency cepstral coefficients.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D04A04A1</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001D04A03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Analyse son</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Sound analysis</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Análisis sonido</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Son musical</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Musical sound</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Sonido musical</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Recherche information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Information retrieval</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Búsqueda información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Invariance</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Invariance</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Invarianza</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Analyse discriminante</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Discriminant analysis</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Análisis discriminante</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE">
<s0>Analyse cepstrale</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG">
<s0>Cepstral analysis</s0>
<s5>06</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Tonie</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Pitch(acoustics)</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Altura sonida</s0>
<s5>07</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Evaluation performance</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Performance evaluation</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Evaluación prestación</s0>
<s5>08</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Etat actuel</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>State of the art</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Estado actual</s0>
<s5>09</s5>
</fC03>
<fC03 i1="10" i2="3" l="FRE">
<s0>Extraction caractéristique</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="3" l="ENG">
<s0>Feature extraction</s0>
<s5>10</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Timbre</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>Timbre</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Sello</s0>
<s5>11</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Signal acoustique</s0>
<s5>46</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Acoustic signal</s0>
<s5>46</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Señal acústica</s0>
<s5>46</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE">
<s0>Traitement signal</s0>
<s5>47</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG">
<s0>Signal processing</s0>
<s5>47</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA">
<s0>Procesamiento señal</s0>
<s5>47</s5>
</fC03>
<fN21>
<s1>088</s1>
</fN21>
</pA>
</standard>
</inist>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>District de Cologne</li>
<li>Rhénanie-du-Nord-Westphalie</li>
<li>Sarre (Land)</li>
</region>
<settlement>
<li>Bonn</li>
<li>Sarrebruck</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Sarre (Land)">
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
</region>
<name sortKey="Ewert, Sebastian" sort="Ewert, Sebastian" uniqKey="Ewert S" first="Sebastian" last="Ewert">Sebastian Ewert</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/PascalFrancis/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000007 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Checkpoint/biblio.hfd -nk 000007 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    PascalFrancis
   |étape=   Checkpoint
   |type=    RBID
   |clé=     Pascal:10-0137839
   |texte=   Towards Timbre-Invariant Audio Features for Harmony-Based Music
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024