The Bible as a parallel corpus : Annotating the "book of 2000 tongues"

Internal identifier: 000074 (PascalFrancis/Corpus); previous: 000073; next: 000075

Authors: P. Resnik; M. B. Olsen; M. Diab

Source :

Computers and the humanities [ 0010-4817 ] ; 1999.

RBID : Francis:524-99-12218

French descriptors

Pascal (Inist)
- Linguistique informatique, Annotation de corpus, Texte électronique, Traduction, Alignement, Recherche linguistique, Encodage, Bible, TEI, Corpus parallèle.

English descriptors

KwdEn :
- Alignment, Computational linguistics, Corpus annotation, Electronic text, Parallel corpus, TEI, Translation.

Abstract

We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present-day language.

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 0010-4817`
A02	`01`			`@0 COHUAD`
A03		`1`		`@0 Comput. humanit.`
A05				`@2 33`
A06				`@2 1-2`
A08	`01`	`1`	`ENG`	`@1 The Bible as a parallel corpus : Annotating the "book of 2000 tongues"`
A09	`01`	`1`	`ENG`	`@1 Selected papers from TEI 10: Celebrating the tenth anniversary of the Text Encoding Initiative`
A11	`01`	`1`		`@1 RESNIK (P.)`
A11	`02`	`1`		`@1 OLSEN (M. B.)`
A11	`03`	`1`		`@1 DIAB (M.)`
A12	`01`	`1`		`@1 MYLONAS (Elli) @9 ed.`
A12	`02`	`1`		`@1 RENEAR (Allen) @9 ed.`
A14	`01`			`@1 Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland @2 College Park, MD 20742 @3 USA @Z 1 aut. @Z 2 aut. @Z 3 aut.`
A15	`01`			`@1 Scholarly Technology Group, Brown University @2 Providence, RI @3 USA @Z 1 aut. @Z 2 aut.`
A20				`@1 129-153`
A21				`@1 1999`
A23	`01`			`@0 ENG`
A43	`01`			`@1 INIST @2 14902 @5 354000084333370100`
A44				`@0 0000 @1 © 1999 INIST-CNRS. All rights reserved.`
A45				`@0 1 p.1/2`
A47	`01`	`1`		`@0 524-99-12218`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Computers and the humanities`
A66	`01`			`@0 NLD`
A68	`01`	`1`	`FRE`	`@1 La Bible en tant que corpus parallèle : Annoter le "livre des 2000 langues"`
A69	`01`	`1`	`FRE`	`@1 Sélection d'articles célébrant le 10e anniversaire de la TEI`
A99				`@0 22 notes`
C01	`01`		`ENG`	`@0 We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language`
C02	`01`	`L`		`@0 52478 @1 XV`
C02	`02`	`L`		`@0 524`
C03	`01`	`L`	`FRE`	`@0 Linguistique informatique @5 01`
C03	`01`	`L`	`ENG`	`@0 Computational linguistics @5 01`
C03	`02`	`L`	`FRE`	`@0 Annotation de corpus @5 02`
C03	`02`	`L`	`ENG`	`@0 Corpus annotation @5 02`
C03	`03`	`L`	`FRE`	`@0 Texte électronique @5 04`
C03	`03`	`L`	`ENG`	`@0 Electronic text @5 04`
C03	`04`	`L`	`FRE`	`@0 Traduction @5 05`
C03	`04`	`L`	`ENG`	`@0 Translation @5 05`
C03	`05`	`L`	`FRE`	`@0 Alignement @5 06`
C03	`05`	`L`	`ENG`	`@0 Alignment @5 06`
C03	`06`	`L`	`FRE`	`@0 Recherche linguistique @5 08`
C03	`07`	`L`	`FRE`	`@0 Encodage @4 INC @5 31`
C03	`08`	`L`	`FRE`	`@0 Bible @4 INC @5 32`
C03	`09`	`L`	`FRE`	`@0 TEI @4 CD @5 96`
C03	`09`	`L`	`ENG`	`@0 TEI @4 CD @5 96`
C03	`10`	`L`	`FRE`	`@0 Corpus parallèle @4 CD @5 97`
C03	`10`	`L`	`ENG`	`@0 Parallel corpus @4 CD @5 97`
N21				`@1 193`

A30	`01`	`1`	`ENG`	`@1 Text Encoding Initiative 10th Anniversary Conference @3 Providence, RI USA @4 1997-11`
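
The listing above follows a regular pattern: a field tag, up to three positional cells (which the XML version of this record labels `i1`, `i2` and `l`), and a final cell of subfields introduced by `@` plus a one-character code. A minimal Python sketch of reading one such row is given here. It assumes the columns are tab-separated and the cells are wrapped in backticks, as they appear on this page; the function name `parse_field_line` is hypothetical and is not part of Dilib or any Inist tool.

```python
import re

def parse_field_line(line):
    """Split one row of the listing into tag, indicators, language and subfields."""
    # Assumption: columns are tab-separated and cells are wrapped in backticks.
    cells = [c.strip().strip("`").strip() for c in line.split("\t")]
    tag, rest = cells[0], cells[1:]
    # The last cell carries the subfields; the cells before it are positional.
    *positional, data = rest
    # Pad so that missing columns read as empty strings.
    positional += [""] * (3 - len(positional))
    i1, i2, lang = positional[:3]
    # Each subfield starts with "@" followed by a one-character code.
    subfields = [(code, value.strip())
                 for code, value in re.findall(r"@(\w)\s([^@]*)", data)]
    return {"tag": tag, "i1": i1, "i2": i2, "lang": lang, "subfields": subfields}

# Example row copied from the listing (field A12, second editor).
editor = parse_field_line("A12\t`02`\t`1`\t\t`@1 RENEAR (Allen) @9 ed.`")
```

The sketch would mis-split a subfield value that itself contains an `@` character; none of the rows in this record do.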

Inist format (server)

NO :	FRANCIS 524-99-12218 INIST
FT :	(La Bible en tant que corpus parallèle : Annoter le "livre des 2000 langues")
ET :	The Bible as a parallel corpus : Annotating the "book of 2000 tongues"
AU :	RESNIK (P.); OLSEN (M. B.); DIAB (M.); MYLONAS (Elli); RENEAR (Allen)
AF :	Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland/College Park, MD 20742/Etats-Unis (1 aut., 2 aut., 3 aut.); Scholarly Technology Group, Brown University/Providence, RI/Etats-Unis (1 aut., 2 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	Computers and the humanities; ISSN 0010-4817; Coden COHUAD; Pays-Bas; Da. 1999; Vol. 33; No. 1-2; Pp. 129-153; Bibl. 1 p.1/2
LA :	Anglais
EA :	We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language
CC :	52478; 524
FD :	Linguistique informatique; Annotation de corpus; Texte électronique; Traduction; Alignement; Recherche linguistique; Encodage; Bible; TEI; Corpus parallèle
ED :	Computational linguistics; Corpus annotation; Electronic text; Translation; Alignment; TEI; Parallel corpus
LO :	INIST-14902.354000084333370100
ID :	524
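
The server format above is a flat list of two-letter labels, each followed by a colon and a value, with multi-valued fields such as `FD` and `ED` separated by semicolons. A minimal Python sketch of reading it into a dictionary could look like the following. The helper names `parse_server_record` and `split_list` are hypothetical, and the sketch assumes one field per line, as displayed on this page.

```python
def parse_server_record(text):
    """Map each label (e.g. 'ET', 'ED') to its raw value string."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, since titles may contain colons too.
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a label
        record[label.strip()] = value.strip()
    return record

def split_list(value):
    """Split a semicolon-separated field into a list of trimmed items."""
    return [item.strip() for item in value.split(";") if item.strip()]

sample = (
    'ET :\tThe Bible as a parallel corpus : Annotating the "book of 2000 tongues"\n'
    "CC :\t52478; 524\n"
    "ED :\tComputational linguistics; Corpus annotation; Electronic text; "
    "Translation; Alignment; TEI; Parallel corpus\n"
)
record = parse_server_record(sample)
descriptors = split_list(record["ED"])
```

Splitting on semicolons is only appropriate for list-like fields; the `AF` field also uses semicolons between affiliations, but its internal `/` and parenthesised author references would need separate handling.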

Links to Exploration step

Francis:524-99-12218

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</title>
<author><name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Olsen, M B" sort="Olsen, M B" uniqKey="Olsen M" first="M. B." last="Olsen">M. B. Olsen</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Diab, M" sort="Diab, M" uniqKey="Diab M" first="M." last="Diab">M. Diab</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">524-99-12218</idno>
<date when="1999">1999</date>
<idno type="stanalyst">FRANCIS 524-99-12218 INIST</idno>
<idno type="RBID">Francis:524-99-12218</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000074</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</title>
<author><name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Olsen, M B" sort="Olsen, M B" uniqKey="Olsen M" first="M. B." last="Olsen">M. B. Olsen</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Diab, M" sort="Diab, M" uniqKey="Diab M" first="M." last="Diab">M. Diab</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Computers and the humanities</title>
<title level="j" type="abbreviated">Comput. humanit.</title>
<idno type="ISSN">0010-4817</idno>
<imprint><date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Computers and the humanities</title>
<title level="j" type="abbreviated">Comput. humanit.</title>
<idno type="ISSN">0010-4817</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Alignment</term>
<term>Computational linguistics</term>
<term>Corpus annotation</term>
<term>Electronic text</term>
<term>Parallel corpus</term>
<term>TEI</term>
<term>Translation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Linguistique informatique</term>
<term>Annotation de corpus</term>
<term>Texte électronique</term>
<term>Traduction</term>
<term>Alignement</term>
<term>Recherche linguistique</term>
<term>Encodage</term>
<term>Bible</term>
<term>TEI</term>
<term>Corpus parallèle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0010-4817</s0>
</fA01>
<fA02 i1="01"><s0>COHUAD</s0>
</fA02>
<fA03 i2="1"><s0>Comput. humanit.</s0>
</fA03>
<fA05><s2>33</s2>
</fA05>
<fA06><s2>1-2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG"><s1>The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Selected papers from TEI 10: Celebrating the tenth anniversary of the Text Encoding Initiative</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>RESNIK (P.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>OLSEN (M. B.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>DIAB (M.)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>MYLONAS (Elli)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>RENEAR (Allen)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA15 i1="01"><s1>Scholarly Technology Group, Brown University</s1>
<s2>Providence, RI</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA15>
<fA20><s1>129-153</s1>
</fA20>
<fA21><s1>1999</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>14902</s2>
<s5>354000084333370100</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 1999 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>1 p.1/2</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>524-99-12218</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Computers and the humanities</s0>
</fA64>
<fA66 i1="01"><s0>NLD</s0>
</fA66>
<fA68 i1="01" i2="1" l="FRE"><s1>La Bible en tant que corpus parallèle : Annoter le "livre des 2000 langues"</s1>
</fA68>
<fA69 i1="01" i2="1" l="FRE"><s1>Sélection d'articles célébrant le 10<sup>e</sup> anniversaire de la TEI</s1>
</fA69>
<fA99><s0>22 notes</s0>
</fA99>
<fC01 i1="01" l="ENG"><s0>We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language</s0>
</fC01>
<fC02 i1="01" i2="L"><s0>52478</s0>
<s1>XV</s1>
</fC02>
<fC02 i1="02" i2="L"><s0>524</s0>
</fC02>
<fC03 i1="01" i2="L" l="FRE"><s0>Linguistique informatique</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="L" l="ENG"><s0>Computational linguistics</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="L" l="FRE"><s0>Annotation de corpus</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="L" l="ENG"><s0>Corpus annotation</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="L" l="FRE"><s0>Texte électronique</s0>
<s5>04</s5>
</fC03>
<fC03 i1="03" i2="L" l="ENG"><s0>Electronic text</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="L" l="FRE"><s0>Traduction</s0>
<s5>05</s5>
</fC03>
<fC03 i1="04" i2="L" l="ENG"><s0>Translation</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="L" l="FRE"><s0>Alignement</s0>
<s5>06</s5>
</fC03>
<fC03 i1="05" i2="L" l="ENG"><s0>Alignment</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="L" l="FRE"><s0>Recherche linguistique</s0>
<s5>08</s5>
</fC03>
<fC03 i1="07" i2="L" l="FRE"><s0>Encodage</s0>
<s4>INC</s4>
<s5>31</s5>
</fC03>
<fC03 i1="08" i2="L" l="FRE"><s0>Bible</s0>
<s4>INC</s4>
<s5>32</s5>
</fC03>
<fC03 i1="09" i2="L" l="FRE"><s0>TEI</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="09" i2="L" l="ENG"><s0>TEI</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="10" i2="L" l="FRE"><s0>Corpus parallèle</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="10" i2="L" l="ENG"><s0>Parallel corpus</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fN21><s1>193</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>Text Encoding Initiative 10th Anniversary Conference</s1>
<s3>Providence, RI USA</s3>
<s4>1997-11</s4>
</fA30>
</pR>
</standard>
<server><NO>FRANCIS 524-99-12218 INIST</NO>
<FT>(La Bible en tant que corpus parallèle : Annoter le "livre des 2000 langues")</FT>
<ET>The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</ET>
<AU>RESNIK (P.); OLSEN (M. B.); DIAB (M.); MYLONAS (Elli); RENEAR (Allen)</AU>
<AF>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland/College Park, MD 20742/Etats-Unis (1 aut., 2 aut., 3 aut.); Scholarly Technology Group, Brown University/Providence, RI/Etats-Unis (1 aut., 2 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Computers and the humanities; ISSN 0010-4817; Coden COHUAD; Pays-Bas; Da. 1999; Vol. 33; No. 1-2; Pp. 129-153; Bibl. 1 p.1/2</SO>
<LA>Anglais</LA>
<EA>We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language</EA>
<CC>52478; 524</CC>
<FD>Linguistique informatique; Annotation de corpus; Texte électronique; Traduction; Alignement; Recherche linguistique; Encodage; Bible; TEI; Corpus parallèle</FD>
<ED>Computational linguistics; Corpus annotation; Electronic text; Translation; Alignment; TEI; Parallel corpus</ED>
<LO>INIST-14902.354000084333370100</LO>
<ID>524</ID>
</server>
</inist>
</record>
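
As an illustration of how the descriptor lists in the XML above might be read programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It works on a shortened copy of the `<textClass>` element only. This is a deliberate assumption: the full record uses an `inist:` prefix with no namespace declaration visible on this page, so a namespace-aware parser would probably reject the record as shown. The function name `keywords_by_language` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined; ElementTree expands it to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Shortened copy of the <textClass> element from the record above.
fragment = """<textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Alignment</term>
<term>Parallel corpus</term>
<term>TEI</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Alignement</term>
<term>Corpus parallèle</term>
<term>TEI</term>
</keywords>
</textClass>"""

def keywords_by_language(xml_text):
    """Group <term> values by the xml:lang of their enclosing <keywords>."""
    root = ET.fromstring(xml_text)
    result = {}
    for keywords in root.iter("keywords"):
        lang = keywords.get(XML_NS + "lang", "")
        terms = [t.text.strip() for t in keywords.iter("term") if t.text]
        result.setdefault(lang, []).extend(terms)
    return result

terms = keywords_by_language(fragment)
```

Grouping by `xml:lang` rather than by the `scheme` attribute is a design choice for this sketch; either attribute distinguishes the two lists in this record.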

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000074 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000074 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Francis:524-99-12218
   |texte=   The Bible as a parallel corpus : Annotating the "book of 2000 tongues"
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024

	TEI exploration server
	Warning: this site is under development! Warning: this site is machine-generated from raw corpora, so the information has not been validated.
