Exploration server for music in Saarland

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Automatic Transcription of Recorded Music

Internal identifier: 000005 (PascalFrancis/Checkpoint); previous: 000004; next: 000006

Authors: Peter Grosche [Germany]; Björn Schuller [Germany]; Meinard Müller [Germany]; Gerhard Rigoll [Germany]

Source: Acta acustica united with acustica (Print), ISSN 1610-1928; 2012, vol. 98, no. 2, pp. 199-215

RBID : Pascal:12-0406083

French descriptors

English descriptors

Abstract

The automatic transcription of music recordings, with the objective of deriving a score-like representation from a given audio representation, is a fundamental and challenging task. In particular, for polyphonic music recordings with overlapping sound sources, current transcription systems still have difficulty accurately extracting the parameters of individual notes specified by pitch, onset, and duration. In this article, we present a music transcription system that is carefully designed to cope with various facets of music. One main idea of our approach is to consistently employ a mid-level representation that is based on a musically meaningful pitch scale. To achieve the necessary spectral and temporal resolution, we use a multi-resolution Fourier transform enhanced by an instantaneous frequency estimation. Subsequently, having extracted pitch and note onset information from this representation, we employ Hidden Markov Models (HMM) for determining the note events in a context-sensitive fashion. As another contribution, we evaluate our transcription system on an extensive dataset containing audio recordings of various genres. Here, in contrast to many previous approaches, we do not rely only on synthetic audio material, but evaluate our system on real audio recordings, using MIDI-audio synchronization techniques to automatically generate reference annotations.
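
As a rough illustration of what a mid-level representation on a "musically meaningful pitch scale" can look like, the sketch below pools spectral magnitudes into one bin per MIDI pitch. It is not the authors' method: the equal-tempered mapping with A4 = 440 Hz, the function names `frequency_to_midi_pitch` and `pool_spectrum_by_pitch`, and the simple summation are all assumptions made for this example.

```python
import math

A4_HZ = 440.0   # assumed reference tuning (not stated in the abstract)
A4_MIDI = 69    # MIDI note number conventionally assigned to A4


def frequency_to_midi_pitch(freq_hz: float) -> int:
    """Map a frequency in Hz to the nearest equal-tempered MIDI pitch."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def pool_spectrum_by_pitch(freqs_hz, magnitudes):
    """Sum spectral magnitudes into one bin per MIDI pitch (hypothetical helper)."""
    pitch_energy = {}
    for freq, mag in zip(freqs_hz, magnitudes):
        if freq <= 0:
            continue  # skip the DC bin, which has no pitch
        pitch = frequency_to_midi_pitch(freq)
        pitch_energy[pitch] = pitch_energy.get(pitch, 0.0) + mag
    return pitch_energy
```

In such a representation, neighbouring Fourier bins that fall within the same semitone contribute to a single pitch band, which is one simple way to align a spectrum with note-level events.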


Affiliations:


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Automatic Transcription of Recorded Music</title>
<author>
<name sortKey="Grosche, Peter" sort="Grosche, Peter" uniqKey="Grosche P" first="Peter" last="Grosche">Peter Grosche</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and MPI Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Schuller, Bjorn" sort="Schuller, Bjorn" uniqKey="Schuller B" first="Björn" last="Schuller">Björn Schuller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Institute for Human-Machine Communication, Technische Universität München</s1>
<s2>80333 München</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and MPI Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rigoll, Gerhard" sort="Rigoll, Gerhard" uniqKey="Rigoll G" first="Gerhard" last="Rigoll">Gerhard Rigoll</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Institute for Human-Machine Communication, Technische Universität München</s1>
<s2>80333 München</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">12-0406083</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 12-0406083 INIST</idno>
<idno type="RBID">Pascal:12-0406083</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000005</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000009</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000005</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000005</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Automatic Transcription of Recorded Music</title>
<author>
<name sortKey="Grosche, Peter" sort="Grosche, Peter" uniqKey="Grosche P" first="Peter" last="Grosche">Peter Grosche</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and MPI Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Schuller, Bjorn" sort="Schuller, Bjorn" uniqKey="Schuller B" first="Björn" last="Schuller">Björn Schuller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Institute for Human-Machine Communication, Technische Universität München</s1>
<s2>80333 München</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Saarland University and MPI Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rigoll, Gerhard" sort="Rigoll, Gerhard" uniqKey="Rigoll G" first="Gerhard" last="Rigoll">Gerhard Rigoll</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Institute for Human-Machine Communication, Technische Universität München</s1>
<s2>80333 München</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Acta acustica united with acustica : (Print)</title>
<title level="j" type="abbreviated">Acta acust. united Acust. : (Print)</title>
<idno type="ISSN">1610-1928</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Acta acustica united with acustica : (Print)</title>
<title level="j" type="abbreviated">Acta acust. united Acust. : (Print)</title>
<idno type="ISSN">1610-1928</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Annotation</term>
<term>Audio acoustics</term>
<term>Automatic transcription</term>
<term>Context aware</term>
<term>Duration</term>
<term>Fourier transformation</term>
<term>Hidden Markov model</term>
<term>Markov model</term>
<term>Multiresolution analysis</term>
<term>Music</term>
<term>Musical acoustics</term>
<term>Pitch(acoustics)</term>
<term>Sound record</term>
<term>Sound recording</term>
<term>Sound reproduction</term>
<term>Sound source</term>
<term>Spectral limit</term>
<term>Synchronization</term>
<term>Time resolution</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Acoustique musicale</term>
<term>Enregistrement son</term>
<term>Tonie</term>
<term>Limite spectrale</term>
<term>Sensibilité contexte</term>
<term>Reproduction son</term>
<term>Emission sonore enregistrée</term>
<term>Synchronisation</term>
<term>Transcription automatique</term>
<term>Source sonore</term>
<term>Musique</term>
<term>Annotation</term>
<term>Durée</term>
<term>Transformation Fourier</term>
<term>Modèle Markov caché</term>
<term>Modèle Markov</term>
<term>Résolution temporelle</term>
<term>Analyse multirésolution</term>
<term>Acoustique audio</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Musique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The automatic transcription of music recordings, with the objective of deriving a score-like representation from a given audio representation, is a fundamental and challenging task. In particular, for polyphonic music recordings with overlapping sound sources, current transcription systems still have difficulty accurately extracting the parameters of individual notes specified by pitch, onset, and duration. In this article, we present a music transcription system that is carefully designed to cope with various facets of music. One main idea of our approach is to consistently employ a mid-level representation that is based on a musically meaningful pitch scale. To achieve the necessary spectral and temporal resolution, we use a multi-resolution Fourier transform enhanced by an instantaneous frequency estimation. Subsequently, having extracted pitch and note onset information from this representation, we employ Hidden Markov Models (HMM) for determining the note events in a context-sensitive fashion. As another contribution, we evaluate our transcription system on an extensive dataset containing audio recordings of various genres. Here, in contrast to many previous approaches, we do not rely only on synthetic audio material, but evaluate our system on real audio recordings, using MIDI-audio synchronization techniques to automatically generate reference annotations.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1610-1928</s0>
</fA01>
<fA03 i2="1">
<s0>Acta acust. united Acust. : (Print)</s0>
</fA03>
<fA05>
<s2>98</s2>
</fA05>
<fA06>
<s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>Automatic Transcription of Recorded Music</s1>
</fA08>
<fA11 i1="01" i2="1">
<s1>GROSCHE (Peter)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>SCHULLER (Björn)</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>MÜLLER (Meinard)</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>RIGOLL (Gerhard)</s1>
</fA11>
<fA14 i1="01">
<s1>Saarland University and MPI Informatik</s1>
<s2>66123 Saarbrücken</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Institute for Human-Machine Communication, Technische Universität München</s1>
<s2>80333 München</s2>
<s3>DEU</s3>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</fA14>
<fA20>
<s1>199-215</s1>
</fA20>
<fA21>
<s1>2012</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>6827</s2>
<s5>354000506780780010</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2012 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>92 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>12-0406083</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Acta acustica united with acustica : (Print)</s0>
</fA64>
<fA66 i1="01">
<s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>The automatic transcription of music recordings, with the objective of deriving a score-like representation from a given audio representation, is a fundamental and challenging task. In particular, for polyphonic music recordings with overlapping sound sources, current transcription systems still have difficulty accurately extracting the parameters of individual notes specified by pitch, onset, and duration. In this article, we present a music transcription system that is carefully designed to cope with various facets of music. One main idea of our approach is to consistently employ a mid-level representation that is based on a musically meaningful pitch scale. To achieve the necessary spectral and temporal resolution, we use a multi-resolution Fourier transform enhanced by an instantaneous frequency estimation. Subsequently, having extracted pitch and note onset information from this representation, we employ Hidden Markov Models (HMM) for determining the note events in a context-sensitive fashion. As another contribution, we evaluate our transcription system on an extensive dataset containing audio recordings of various genres. Here, in contrast to many previous approaches, we do not rely only on synthetic audio material, but evaluate our system on real audio recordings, using MIDI-audio synchronization techniques to automatically generate reference annotations.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001B40C60</s0>
</fC02>
<fC02 i1="02" i2="X">
<s0>001B40C75</s0>
</fC02>
<fC02 i1="03" i2="X">
<s0>001B40C38</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Acoustique musicale</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Musical acoustics</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Acústica musical</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Enregistrement son</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Sound recording</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Registro sonido</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Tonie</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Pitch(acoustics)</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Altura sonida</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Limite spectrale</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Spectral limit</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Límite espectral</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Sensibilité contexte</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Context aware</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Sensibilidad contexto</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Reproduction son</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Sound reproduction</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Reproducción sonido</s0>
<s5>11</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Emission sonore enregistrée</s0>
<s5>15</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Sound record</s0>
<s5>15</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Registro sonoro</s0>
<s5>15</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Synchronisation</s0>
<s5>16</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Synchronization</s0>
<s5>16</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Sincronización</s0>
<s5>16</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Transcription automatique</s0>
<s5>18</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Automatic transcription</s0>
<s5>18</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Transcripción automática</s0>
<s5>18</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Source sonore</s0>
<s5>19</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Sound source</s0>
<s5>19</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Fuente sonora</s0>
<s5>19</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Musique</s0>
<s5>20</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>Music</s0>
<s5>20</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Música</s0>
<s5>20</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Annotation</s0>
<s5>21</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Annotation</s0>
<s5>21</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Anotación</s0>
<s5>21</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE">
<s0>Durée</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG">
<s0>Duration</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA">
<s0>Duración</s0>
<s5>23</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE">
<s0>Transformation Fourier</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG">
<s0>Fourier transformation</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA">
<s0>Transformación Fourier</s0>
<s5>24</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE">
<s0>Modèle Markov caché</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="ENG">
<s0>Hidden Markov model</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="SPA">
<s0>Modelo Markov oculto</s0>
<s5>25</s5>
</fC03>
<fC03 i1="16" i2="X" l="FRE">
<s0>Modèle Markov</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="ENG">
<s0>Markov model</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="SPA">
<s0>Modelo Markov</s0>
<s5>26</s5>
</fC03>
<fC03 i1="17" i2="X" l="FRE">
<s0>Résolution temporelle</s0>
<s5>33</s5>
</fC03>
<fC03 i1="17" i2="X" l="ENG">
<s0>Time resolution</s0>
<s5>33</s5>
</fC03>
<fC03 i1="17" i2="X" l="SPA">
<s0>Resolución temporal</s0>
<s5>33</s5>
</fC03>
<fC03 i1="18" i2="X" l="FRE">
<s0>Analyse multirésolution</s0>
<s5>34</s5>
</fC03>
<fC03 i1="18" i2="X" l="ENG">
<s0>Multiresolution analysis</s0>
<s5>34</s5>
</fC03>
<fC03 i1="18" i2="X" l="SPA">
<s0>Análisis multiresolución</s0>
<s5>34</s5>
</fC03>
<fC03 i1="19" i2="X" l="FRE">
<s0>Acoustique audio</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="ENG">
<s0>Audio acoustics</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="SPA">
<s0>Acústica audio</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fN21>
<s1>317</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
</inist>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>Bavière</li>
<li>District de Haute-Bavière</li>
<li>Sarre (Land)</li>
</region>
<settlement>
<li>Munich</li>
<li>Sarrebruck</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Sarre (Land)">
<name sortKey="Grosche, Peter" sort="Grosche, Peter" uniqKey="Grosche P" first="Peter" last="Grosche">Peter Grosche</name>
</region>
<name sortKey="Muller, Meinard" sort="Muller, Meinard" uniqKey="Muller M" first="Meinard" last="Müller">Meinard Müller</name>
<name sortKey="Rigoll, Gerhard" sort="Rigoll, Gerhard" uniqKey="Rigoll G" first="Gerhard" last="Rigoll">Gerhard Rigoll</name>
<name sortKey="Schuller, Bjorn" sort="Schuller, Bjorn" uniqKey="Schuller B" first="Björn" last="Schuller">Björn Schuller</name>
</country>
</tree>
</affiliations>
</record>
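
For readers who want to pull author names and cities out of a record like the one above, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It works on a simplified fragment modelled on the `<titleStmt>` block: the `wicri:` and `inist:` prefixed attributes and elements are left out because their namespace declarations do not appear in this page, so a namespace-aware parser would likely reject the full record as shown. The `FRAGMENT` constant and the `list_authors` helper are hypothetical names introduced for this example.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the record above (an assumption:
# prefixed wicri:/inist: parts are omitted, and only two authors are kept).
FRAGMENT = """
<titleStmt>
  <title xml:lang="en" level="a">Automatic Transcription of Recorded Music</title>
  <author>
    <name first="Peter" last="Grosche">Peter Grosche</name>
    <affiliation>
      <country>Allemagne</country>
      <placeName>
        <settlement type="city">Sarrebruck</settlement>
      </placeName>
    </affiliation>
  </author>
  <author>
    <name first="Björn" last="Schuller">Björn Schuller</name>
    <affiliation>
      <country>Allemagne</country>
      <placeName>
        <settlement type="city">Munich</settlement>
      </placeName>
    </affiliation>
  </author>
</titleStmt>
"""


def list_authors(xml_text):
    """Return (name, city) pairs for every <author> element in the fragment."""
    root = ET.fromstring(xml_text)
    authors = []
    for author in root.iter("author"):
        name = author.findtext("name")
        city = author.findtext("affiliation/placeName/settlement")
        authors.append((name, city))
    return authors


if __name__ == "__main__":
    for name, city in list_authors(FRAGMENT):
        print(f"{name}: {city}")
```

The city values are returned exactly as stored in the record, that is, in their French forms.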

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/PascalFrancis/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000005 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Checkpoint/biblio.hfd -nk 000005 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    PascalFrancis
   |étape=   Checkpoint
   |type=    RBID
   |clé=     Pascal:12-0406083
   |texte=   Automatic Transcription of Recorded Music
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024