
Linguistic documents synchronizing sound and text

Internal identifier: 000420 (Istex/Corpus); previous: 000419; next: 000421


Authors: Michel Jacobson; Boyd Michailovsky; John B. Lowe

Source:

Speech Communication [ISSN 0167-6393]; 2000.

RBID : ISTEX:76F37F4EC8D5D4F4473AAD428436716F4418582A

Abstract

The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.
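The layered annotation described in the abstract (metadata at the text level, free translation at the sentence level, glosses at the word level, time-alignment per sentence) can be illustrated with a minimal Python sketch. The element and attribute names below (`TEXT`, `S`, `AUDIO`, `TRANSL`, `W`, `FORM`, `GLOSS`, `start`, `end`) and the sample content are assumptions made for illustration; they are not taken from the LACITO DTD, which this record does not reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: names and content are illustrative only,
# not copied from the LACITO DTD.
DOC = """
<TEXT>
  <HEADER><TITLE>Sample narrative</TITLE></HEADER>
  <S id="s1">
    <AUDIO start="0.00" end="2.40"/>
    <TRANSL>First sentence (free translation).</TRANSL>
    <W><FORM>word1</FORM><GLOSS>gloss1</GLOSS></W>
    <W><FORM>word2</FORM><GLOSS>gloss2</GLOSS></W>
  </S>
  <S id="s2">
    <AUDIO start="2.40" end="5.10"/>
    <TRANSL>Second sentence (free translation).</TRANSL>
    <W><FORM>word3</FORM><GLOSS>gloss3</GLOSS></W>
  </S>
</TEXT>
"""


def load_sentences(xml_text):
    """Return one dict per sentence: id, time span, translation, (form, gloss) pairs."""
    root = ET.fromstring(xml_text.strip())
    sentences = []
    for s in root.iter("S"):
        audio = s.find("AUDIO")
        sentences.append({
            "id": s.get("id"),
            "start": float(audio.get("start")),
            "end": float(audio.get("end")),
            "translation": s.findtext("TRANSL"),
            "words": [(w.findtext("FORM"), w.findtext("GLOSS")) for w in s.iter("W")],
        })
    return sentences


def sentence_at(sentences, t):
    """Return the sentence whose [start, end) span contains time t (seconds), or None."""
    for s in sentences:
        if s["start"] <= t < s["end"]:
            return s
    return None


sentences = load_sentences(DOC)
current = sentence_at(sentences, 3.0)
print(current["id"], current["translation"])
```

Looking up a sentence from a playback time, as `sentence_at` does, is one plausible way to keep a text view in step with the audio; the project's own applet and scripts may work differently.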

URL:

https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/pdf

DOI: 10.1016/S0167-6393(00)00070-4

Links to Exploration step

ISTEX:76F37F4EC8D5D4F4473AAD428436716F4418582A

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Linguistic documents synchronizing sound and text</title>
<author><name sortKey="Jacobson, Michel" sort="Jacobson, Michel" uniqKey="Jacobson M" first="Michel" last="Jacobson">Michel Jacobson</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: jacobson@idf.ext.jussieu.fr</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Michailovsky, Boyd" sort="Michailovsky, Boyd" uniqKey="Michailovsky B" first="Boyd" last="Michailovsky">Boyd Michailovsky</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="B Lowe, John" sort="B Lowe, John" uniqKey="B Lowe J" first="John" last="B. Lowe">John B. Lowe</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:76F37F4EC8D5D4F4473AAD428436716F4418582A</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0167-6393(00)00070-4</idno>
<idno type="url">https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000420</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Linguistic documents synchronizing sound and text</title>
<author><name sortKey="Jacobson, Michel" sort="Jacobson, Michel" uniqKey="Jacobson M" first="Michel" last="Jacobson">Michel Jacobson</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
<affiliation><mods:affiliation>E-mail: jacobson@idf.ext.jussieu.fr</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="Michailovsky, Boyd" sort="Michailovsky, Boyd" uniqKey="Michailovsky B" first="Boyd" last="Michailovsky">Boyd Michailovsky</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
</author>
<author><name sortKey="B Lowe, John" sort="B Lowe, John" uniqKey="B Lowe J" first="John" last="B. Lowe">John B. Lowe</name>
<affiliation><mods:affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Speech Communication</title>
<title level="j" type="abbrev">SPECOM</title>
<idno type="ISSN">0167-6393</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">1–2</biblScope>
<biblScope unit="page" from="79">79</biblScope>
<biblScope unit="page" to="96">96</biblScope>
</imprint>
<idno type="ISSN">0167-6393</idno>
</series>
<idno type="istex">76F37F4EC8D5D4F4473AAD428436716F4418582A</idno>
<idno type="DOI">10.1016/S0167-6393(00)00070-4</idno>
<idno type="PII">S0167-6393(00)00070-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0167-6393</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.</div>
</front>
</TEI>
<istex><corpusName>elsevier</corpusName>
<author><json:item><name>Michel Jacobson</name>
<affiliations><json:string>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</json:string>
<json:string>E-mail: jacobson@idf.ext.jussieu.fr</json:string>
</affiliations>
</json:item>
<json:item><name>Boyd Michailovsky</name>
<affiliations><json:string>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</json:string>
</affiliations>
</json:item>
<json:item><name>John B. Lowe</name>
<affiliations><json:string>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</json:string>
</affiliations>
</json:item>
</author>
<language><json:string>eng</json:string>
</language>
<originalGenre><json:string>Full-length article</json:string>
</originalGenre>
<abstract>The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.</abstract>
<qualityIndicators><score>7.616</score>
<pdfVersion>1.2</pdfVersion>
<pdfPageSize>544 x 743 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>1482</abstractCharCount>
<pdfWordCount>7380</pdfWordCount>
<pdfCharCount>43092</pdfCharCount>
<pdfPageCount>18</pdfPageCount>
<abstractWordCount>218</abstractWordCount>
</qualityIndicators>
<title>Linguistic documents synchronizing sound and text</title>
<pii><json:string>S0167-6393(00)00070-4</json:string>
</pii>
<genre><json:string>research-article</json:string>
</genre>
<host><volume>33</volume>
<pii><json:string>S0167-6393(00)X0050-7</json:string>
</pii>
<editor><json:item><name>S. Bird and J. Harrington</name>
</json:item>
</editor>
<pages><last>96</last>
<first>79</first>
</pages>
<conference><json:item><name>Speech Annotation and Corpus Tools Speech Annotation</name>
</json:item>
</conference>
<issn><json:string>0167-6393</json:string>
</issn>
<issue>1–2</issue>
<genre><json:string>journal</json:string>
</genre>
<language><json:string>unknown</json:string>
</language>
<title>Speech Communication</title>
<publicationDate>2001</publicationDate>
</host>
<categories><wos><json:string>COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS</json:string>
<json:string>ACOUSTICS</json:string>
</wos>
</categories>
<publicationDate>2001</publicationDate>
<copyrightDate>2001</copyrightDate>
<doi><json:string>10.1016/S0167-6393(00)00070-4</json:string>
</doi>
<id>76F37F4EC8D5D4F4473AAD428436716F4418582A</id>
<score>0.13144752</score>
<fulltext><json:item><original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/pdf</uri>
</json:item>
<json:item><original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/tei"><teiHeader><fileDesc><titleStmt><title level="a">Linguistic documents synchronizing sound and text</title>
</titleStmt>
<publicationStmt><authority>ISTEX</authority>
<publisher>ELSEVIER</publisher>
<availability><p>ELSEVIER</p>
</availability>
<date>2001</date>
</publicationStmt>
<notesStmt><note type="content">Fig. 1: Transcription, interlinear gloss and free translation of a Hayu narrative.</note>
<note type="content">Fig. 2: An example of Shoebox annotation.</note>
<note type="content">Fig. 3: Fragment of the LACITO DTD.</note>
<note type="content">Fig. 4: Markup of a linguistic text.</note>
<note type="content">Fig. 5: SoundIndex1.</note>
<note type="content">Fig. 6: SoundIndex2.</note>
<note type="content">Fig. 7: Browsing the archive.</note>
<note type="content">Fig. 8: Communication from script to applet in text-driven mode.</note>
<note type="content">Fig. 9: Communication between script and applet in time-driven mode.</note>
<note type="content">Fig. 10: Order of execution of anchored elements.</note>
</notesStmt>
<sourceDesc><biblStruct type="inbook"><analytic><title level="a">Linguistic documents synchronizing sound and text</title>
<author><persName><forename type="first">Michel</forename>
<surname>Jacobson</surname>
</persName>
<email>jacobson@idf.ext.jussieu.fr</email>
<note type="correspondence"><p>Corresponding author. Fax: +33-1-49583779</p>
</note>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
</author>
<author><persName><forename type="first">Boyd</forename>
<surname>Michailovsky</surname>
</persName>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
</author>
<author><persName><forename type="first">John</forename>
<surname>B. Lowe</surname>
</persName>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
</author>
</analytic>
<monogr><title level="j">Speech Communication</title>
<title level="j" type="abbrev">SPECOM</title>
<idno type="pISSN">0167-6393</idno>
<idno type="PII">S0167-6393(00)X0050-7</idno>
<meeting><addName>Speech Annotation and Corpus Tools</addName>
<addName>Speech Annotation</addName>
</meeting>
<editor><persName>S. Bird and J. Harrington</persName>
</editor>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="2000"></date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">1–2</biblScope>
<biblScope unit="page" from="79">79</biblScope>
<biblScope unit="page" to="96">96</biblScope>
</imprint>
</monogr>
<idno type="istex">76F37F4EC8D5D4F4473AAD428436716F4418582A</idno>
<idno type="DOI">10.1016/S0167-6393(00)00070-4</idno>
<idno type="PII">S0167-6393(00)00070-4</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><creation><date>2001</date>
</creation>
<langUsage><language ident="en">en</language>
</langUsage>
<abstract xml:lang="en"><p>The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.</p>
</abstract>
<abstract xml:lang="fr"><p>Le Programme Archivage du LACITO (Laboratoire de Langues et Civilisations à Tradition Orale du CNRS) a pour but la pérennisation, l'exploitation et la diffusion de documents linguistiques intégrant texte et son, en particulier les enregistrements faits et transcrits sur le terrain par les chercheurs du laboratoire. L'annotation (transcription, analyse, gloses interlinéaires, traductions) est balisée selon la norme XML et synchronisé phrase par phrase avec l'enregistrement numérisé, pour donner accès simultanément au texte et au son. Dans la mesure du possible des outils logiciels génériques et librement disponibles sont utilisés. Les documents produits sont consultés à l'aide des browsers les plus courants sur Internet. Le texte balisé est manipulé à l'aide d'outils génériques XML. Le programme a développé (1) un outil de création, SoundIndex, qui facilite la synchronisation du son avec le texte, (2) un applet Java qui permet aux browsers d'accéder au son, (3) des feuilles de style XSL qui définissent les “vues” sur les données, et (4) une interface (CGI) qui permet à l'utilisateur de choisir entre les documents et les vues disponibles ainsi que de formuler des requêtes, par exemple, pour chercher un mot particulier. Une centaine de documents dans une vingtaine de langues ont été préparés, dont certains sont disponibles sur Internet.</p>
</abstract>
</profileDesc>
<revisionDesc><change when="2000-08-02">Registration</change>
<change when="2000">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item><original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata><istex:metadataXml wicri:clean="Elsevier, elements deleted: ce:floats; body; tail"><istex:xmlDeclaration>version="1.0" encoding="utf-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//ES//DTD journal article DTD version 4.5.2//EN//XML" URI="art452.dtd" name="istex:docType"><istex:entity SYSTEM="gr1" NDATA="IMAGE" name="gr1"></istex:entity>
<istex:entity SYSTEM="gr2" NDATA="IMAGE" name="gr2"></istex:entity>
<istex:entity SYSTEM="gr3" NDATA="IMAGE" name="gr3"></istex:entity>
<istex:entity SYSTEM="gr4" NDATA="IMAGE" name="gr4"></istex:entity>
<istex:entity SYSTEM="gr5" NDATA="IMAGE" name="gr5"></istex:entity>
<istex:entity SYSTEM="gr6" NDATA="IMAGE" name="gr6"></istex:entity>
<istex:entity SYSTEM="gr7" NDATA="IMAGE" name="gr7"></istex:entity>
<istex:entity SYSTEM="gr8" NDATA="IMAGE" name="gr8"></istex:entity>
<istex:entity SYSTEM="gr9" NDATA="IMAGE" name="gr9"></istex:entity>
<istex:entity SYSTEM="gr10" NDATA="IMAGE" name="gr10"></istex:entity>
</istex:docType>
<istex:document><converted-article version="4.5.2" docsubtype="fla"><item-info><jid>SPECOM</jid>
<aid>1108</aid>
<ce:pii>S0167-6393(00)00070-4</ce:pii>
<ce:doi>10.1016/S0167-6393(00)00070-4</ce:doi>
<ce:copyright type="full-transfer" year="2001">Elsevier Science B.V.</ce:copyright>
</item-info>
<head><ce:title>Linguistic documents synchronizing sound and text</ce:title>
<ce:author-group><ce:author><ce:given-name>Michel</ce:given-name>
<ce:surname>Jacobson</ce:surname>
<ce:cross-ref refid="CORR1">*</ce:cross-ref>
<ce:e-address>jacobson@idf.ext.jussieu.fr</ce:e-address>
</ce:author>
<ce:author><ce:given-name>Boyd</ce:given-name>
<ce:surname>Michailovsky</ce:surname>
</ce:author>
<ce:author><ce:given-name>John</ce:given-name>
<ce:surname>B. Lowe</ce:surname>
</ce:author>
<ce:affiliation><ce:textfn>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</ce:textfn>
</ce:affiliation>
<ce:correspondence id="CORR1"><ce:label>*</ce:label>
<ce:text>Corresponding author. Fax: +33-1-49583779</ce:text>
</ce:correspondence>
</ce:author-group>
<ce:date-accepted day="2" month="8" year="2000"></ce:date-accepted>
<ce:abstract><ce:section-title>Abstract</ce:section-title>
<ce:abstract-sec><ce:simple-para>The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.</ce:simple-para>
</ce:abstract-sec>
</ce:abstract>
<ce:abstract xml:lang="fr"><ce:section-title>Résumé</ce:section-title>
<ce:abstract-sec><ce:simple-para>Le Programme Archivage du LACITO (Laboratoire de Langues et Civilisations à Tradition Orale du CNRS) a pour but la pérennisation, l'exploitation et la diffusion de documents linguistiques intégrant texte et son, en particulier les enregistrements faits et transcrits sur le terrain par les chercheurs du laboratoire. L'annotation (transcription, analyse, gloses interlinéaires, traductions) est balisée selon la norme XML et synchronisé phrase par phrase avec l'enregistrement numérisé, pour donner accès simultanément au texte et au son. Dans la mesure du possible des outils logiciels génériques et librement disponibles sont utilisés. Les documents produits sont consultés à l'aide des browsers les plus courants sur Internet. Le texte balisé est manipulé à l'aide d'outils génériques XML. Le programme a développé (1) un outil de création, SoundIndex, qui facilite la synchronisation du son avec le texte, (2) un applet Java qui permet aux browsers d'accéder au son, (3) des feuilles de style XSL qui définissent les “vues” sur les données, et (4) une interface (CGI) qui permet à l'utilisateur de choisir entre les documents et les vues disponibles ainsi que de formuler des requêtes, par exemple, pour chercher un mot particulier. Une centaine de documents dans une vingtaine de langues ont été préparés, dont certains sont disponibles sur Internet.</ce:simple-para>
</ce:abstract-sec>
</ce:abstract>
</head>
</converted-article>
</istex:document>
</istex:metadataXml>
<mods version="3.6"><titleInfo><title>Linguistic documents synchronizing sound and text</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA"><title>Linguistic documents synchronizing sound and text</title>
</titleInfo>
<name type="personal"><namePart type="given">Michel</namePart>
<namePart type="family">Jacobson</namePart>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
<affiliation>E-mail: jacobson@idf.ext.jussieu.fr</affiliation>
<description>Corresponding author. Fax: +33-1-49583779</description>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">Boyd</namePart>
<namePart type="family">Michailovsky</namePart>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal"><namePart type="given">John</namePart>
<namePart type="family">B. Lowe</namePart>
<affiliation>CNRS/LACITO, 7 rue Guy Moquet, Bat. 23, 94800 Villejuif, France</affiliation>
<role><roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="Full-length article"></genre>
<originInfo><publisher>ELSEVIER</publisher>
<dateIssued encoding="w3cdtf">2001</dateIssued>
<copyrightDate encoding="w3cdtf">2001</copyrightDate>
</originInfo>
<language><languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription><internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic materials, mainly in unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented into sentences (roughly) and words. Annotations are associated with different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time-alignment is at the sentence and optionally at the word level. The project makes maximum use of standard, generic software tools. Marked-up data are processed using freely available XML software and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time-alignment, (2) a Java applet, which enables browsers to access time-aligned speech, (3) XSL stylesheets, which specify “views” on the data, and (4) Common Gateway Interface (CGI) scripts, which allow the user to choose documents and views and to enter queries. Current objectives include development of the annotation and software to facilitate linguistic research beyond simple browsing. Over 100 texts in 20 languages have been processed at the time of writing; some of these are available on the Internet for browsing and simple querying.</abstract>
<abstract lang="fr">Le Programme Archivage du LACITO (Laboratoire de Langues et Civilisations à Tradition Orale du CNRS) a pour but la pérennisation, l'exploitation et la diffusion de documents linguistiques intégrant texte et son, en particulier les enregistrements faits et transcrits sur le terrain par les chercheurs du laboratoire. L'annotation (transcription, analyse, gloses interlinéaires, traductions) est balisée selon la norme XML et synchronisé phrase par phrase avec l'enregistrement numérisé, pour donner accès simultanément au texte et au son. Dans la mesure du possible des outils logiciels génériques et librement disponibles sont utilisés. Les documents produits sont consultés à l'aide des browsers les plus courants sur Internet. Le texte balisé est manipulé à l'aide d'outils génériques XML. Le programme a développé (1) un outil de création, SoundIndex, qui facilite la synchronisation du son avec le texte, (2) un applet Java qui permet aux browsers d'accéder au son, (3) des feuilles de style XSL qui définissent les “vues” sur les données, et (4) une interface (CGI) qui permet à l'utilisateur de choisir entre les documents et les vues disponibles ainsi que de formuler des requêtes, par exemple, pour chercher un mot particulier. Une centaine de documents dans une vingtaine de langues ont été préparés, dont certains sont disponibles sur Internet.</abstract>
<note type="content">Fig. 1: Transcription, interlinear gloss and free translation of a Hayu narrative.</note>
<note type="content">Fig. 2: An example of Shoebox annotation.</note>
<note type="content">Fig. 3: Fragment of the LACITO DTD.</note>
<note type="content">Fig. 4: Markup of a linguistic text.</note>
<note type="content">Fig. 5: SoundIndex1.</note>
<note type="content">Fig. 6: SoundIndex2.</note>
<note type="content">Fig. 7: Browsing the archive.</note>
<note type="content">Fig. 8: Communication from script to applet in text-driven mode.</note>
<note type="content">Fig. 9: Communication between script and applet in time-driven mode.</note>
<note type="content">Fig. 10: Order of execution of anchored elements.</note>
<relatedItem type="host"><titleInfo><title>Speech Communication</title>
</titleInfo>
<titleInfo type="abbreviated"><title>SPECOM</title>
</titleInfo>
<name type="conference"><namePart>Speech Annotation and Corpus Tools</namePart>
<namePart>Speech Annotation</namePart>
</name>
<name type="personal"><namePart>S. Bird and J. Harrington</namePart>
<role><roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="journal">journal</genre>
<originInfo><dateIssued encoding="w3cdtf">200101</dateIssued>
</originInfo>
<identifier type="ISSN">0167-6393</identifier>
<identifier type="PII">S0167-6393(00)X0050-7</identifier>
<part><date>200101</date>
<detail type="issue"><title>Speech Annotation and Corpus Tools</title>
</detail>
<detail type="volume"><number>33</number>
<caption>vol.</caption>
</detail>
<detail type="issue"><number>1–2</number>
<caption>no.</caption>
</detail>
<extent unit="issue pages"><start>1</start>
<end>176</end>
</extent>
<extent unit="pages"><start>79</start>
<end>96</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">76F37F4EC8D5D4F4473AAD428436716F4418582A</identifier>
<identifier type="DOI">10.1016/S0167-6393(00)00070-4</identifier>
<identifier type="PII">S0167-6393(00)00070-4</identifier>
<accessCondition type="use and reproduction" contentType="">© 2001 Elsevier Science B.V.</accessCondition>
<recordInfo><recordContentSource>ELSEVIER</recordContentSource>
<recordOrigin>Elsevier Science B.V., ©2001</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments><istex:catWosTEI uri="https://api.istex.fr/document/76F37F4EC8D5D4F4473AAD428436716F4418582A/enrichments/catWos"><teiHeader><profileDesc><textClass><classCode scheme="WOS">COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS</classCode>
<classCode scheme="WOS">ACOUSTICS</classCode>
</textClass>
</profileDesc>
</teiHeader>
</istex:catWosTEI>
</enrichments>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Istex/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000420 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000420 | SxmlIndent | more
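The record printed by these commands can also be inspected programmatically. The following is a minimal Python sketch, not part of Dilib. It assumes the record is available as a string and that, as in the dump on this page, prefixes such as `wicri:`, `mods:` and `istex:` appear without namespace declarations, which a namespace-aware parser rejects. The helper names `strip_prefixes` and `summarize`, and the shortened `SAMPLE` record, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the record above (assumption: real
# records follow the same element layout).
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title>Linguistic documents synchronizing sound and text</title>
<author><name sortKey="Jacobson, Michel">Michel Jacobson</name>
<affiliation><mods:affiliation>CNRS/LACITO</mods:affiliation></affiliation></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1016/S0167-6393(00)00070-4</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def strip_prefixes(xml_text):
    """Drop undeclared namespace prefixes so ElementTree can parse the record.

    This is a textual workaround: it assumes no '<' in character data and
    no prefixed-attribute look-alikes inside text content.
    """
    # <mods:affiliation> -> <affiliation>, </mods:affiliation> -> </affiliation>
    xml_text = re.sub(r'(</?)[A-Za-z_][\w.-]*:', r'\1', xml_text)
    # wicri:clean="..." -> clean="..." (xml:lang is predeclared and left alone)
    xml_text = re.sub(r'(\s)(?!xml:)[A-Za-z_][\w.-]*:([\w.-]+=")', r'\1\2', xml_text)
    return xml_text


def summarize(record_xml):
    """Extract title, DOI and author names from the TEI header of a record."""
    root = ET.fromstring(strip_prefixes(record_xml))
    return {
        "title": root.findtext(".//titleStmt/title"),
        "doi": root.findtext('.//idno[@type="doi"]'),
        "authors": [n.text for n in root.iterfind(".//titleStmt/author/name")],
    }


print(summarize(SAMPLE))
```

Stripping prefixes loses the distinction between vocabularies (for example TEI versus MODS `affiliation`), which is acceptable for a quick summary but not for faithful reprocessing of the record.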

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:76F37F4EC8D5D4F4473AAD428436716F4418582A
   |texte=   Linguistic documents synchronizing sound and text
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024
