Exploration server on the TEI

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information it contains has therefore not been validated.

TEI Analytics: converting documents into a TEI format for cross-collection text analysis

Internal identifier: 000266 ( Istex/Corpus ); previous: 000265; next: 000267


Author: Brian L. Pytlik Zillig

Source:

RBID : ISTEX:5A0FF8D4446AA580A277839CACF787F8BEEC9271

Abstract

For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.
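The "schema harvesting" technique mentioned in the abstract derives conversion rules from the target schema rather than from each source format. As a rough illustration only — this is not the Abbot implementation, and the tag map below is hypothetical — normalizing arbitrary source element names onto a small TEI-like subset might look like:

```python
import xml.etree.ElementTree as ET

# Hypothetical source-to-target tag map. Abbot's real mapping is
# derived from the TEI-A schema itself ("schema harvesting"), not
# hard-coded like this.
TAG_MAP = {"doc": "text", "heading": "head", "paragraph": "p"}

def convert(elem):
    """Return a copy of elem with tags renamed into the target subset."""
    new = ET.Element(TAG_MAP.get(elem.tag, elem.tag), elem.attrib)
    new.text, new.tail = elem.text, elem.tail
    for child in elem:
        new.append(convert(child))
    return new

src = ET.fromstring("<doc><heading>Title</heading><paragraph>Text.</paragraph></doc>")
out = convert(src)
print(ET.tostring(out, encoding="unicode"))
# <text><head>Title</head><p>Text.</p></text>
```

The real tool reportedly generates its transformation from the TEI-A schema and handles elements with no target equivalent; a hand-written map like this is only a sketch of the renaming step.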

URL:
DOI: 10.1093/llc/fqp005

Links to Exploration step

ISTEX:5A0FF8D4446AA580A277839CACF787F8BEEC9271

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
<author wicri:is="90%">
<name sortKey="Pytlik Zillig, Brian L" sort="Pytlik Zillig, Brian L" uniqKey="Pytlik Zillig B" first="Brian L." last="Pytlik Zillig">Brian L. Pytlik Zillig</name>
<affiliation>
<mods:affiliation>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: bpytlikz@unlnotes.unl.edu</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5A0FF8D4446AA580A277839CACF787F8BEEC9271</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1093/llc/fqp005</idno>
<idno type="url">https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000266</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
<author wicri:is="90%">
<name sortKey="Pytlik Zillig, Brian L" sort="Pytlik Zillig, Brian L" uniqKey="Pytlik Zillig B" first="Brian L." last="Pytlik Zillig">Brian L. Pytlik Zillig</name>
<affiliation>
<mods:affiliation>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: bpytlikz@unlnotes.unl.edu</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2009-06">2009-06</date>
<biblScope unit="volume">24</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="187">187</biblScope>
<biblScope unit="page" to="192">192</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">5A0FF8D4446AA580A277839CACF787F8BEEC9271</idno>
<idno type="DOI">10.1093/llc/fqp005</idno>
<idno type="ArticleID">fqp005</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract">For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.</div>
</front>
</TEI>
<istex>
<corpusName>oup</corpusName>
<author>
<json:item>
<name>Brian L. Pytlik Zillig</name>
<affiliations>
<json:string>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</json:string>
<json:string>E-mail: bpytlikz@unlnotes.unl.edu</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Original Articles</value>
</json:item>
</subject>
<articleId>
<json:string>fqp005</json:string>
</articleId>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.</abstract>
<qualityIndicators>
<score>4.635</score>
<pdfVersion>1.4</pdfVersion>
<pdfPageSize>538.583 x 697.323 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>1</keywordCount>
<abstractCharCount>779</abstractCharCount>
<pdfWordCount>2695</pdfWordCount>
<pdfCharCount>18264</pdfCharCount>
<pdfPageCount>6</pdfPageCount>
<abstractWordCount>120</abstractWordCount>
</qualityIndicators>
<title>TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>24</volume>
<publisherId>
<json:string>litlin</json:string>
</publisherId>
<pages>
<last>192</last>
<first>187</first>
</pages>
<issn>
<json:string>0268-1145</json:string>
</issn>
<issue>2</issue>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1477-4615</json:string>
</eissn>
<title>Literary and Linguistic Computing</title>
</host>
<categories>
<wos>
<json:string>LINGUISTICS</json:string>
<json:string>LITERATURE</json:string>
</wos>
</categories>
<publicationDate>2009</publicationDate>
<copyrightDate>2009</copyrightDate>
<doi>
<json:string>10.1093/llc/fqp005</json:string>
</doi>
<id>5A0FF8D4446AA580A277839CACF787F8BEEC9271</id>
<score>0.18960893</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a">TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Oxford University Press</publisher>
<availability>
<p>OUP</p>
</availability>
<date>2009-04-15</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a">TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
<author>
<persName>
<forename type="first">Brian L.</forename>
<surname>Pytlik Zillig</surname>
</persName>
<email>bpytlikz@unlnotes.unl.edu</email>
<affiliation>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Literary and Linguistic Computing</title>
<idno type="pISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2009-06"></date>
<biblScope unit="volume">24</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="187">187</biblScope>
<biblScope unit="page" to="192">192</biblScope>
</imprint>
</monogr>
<idno type="istex">5A0FF8D4446AA580A277839CACF787F8BEEC9271</idno>
<idno type="DOI">10.1093/llc/fqp005</idno>
<idno type="ArticleID">fqp005</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2009-04-15</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract>
<p>For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.</p>
</abstract>
<textClass>
<keywords scheme="keyword">
<list>
<item>
<term>Original Articles</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2009-04-15">Created</change>
<change when="2009-06">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus oup" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="utf-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" URI="journalpublishing.dtd" name="istex:docType"></istex:docType>
<istex:document>
<article article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">litlin</journal-id>
<journal-id journal-id-type="hwp">litlin</journal-id>
<journal-title>Literary and Linguistic Computing</journal-title>
<issn pub-type="ppub">0268-1145</issn>
<issn pub-type="epub">1477-4615</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.1093/llc/fqp005</article-id>
<article-id pub-id-type="publisher-id">fqp005</article-id>
<article-categories>
<subj-group>
<subject>Original Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>TEI Analytics: converting documents into a TEI format for cross-collection text analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Pytlik Zillig</surname>
<given-names>Brian L.</given-names>
</name>
</contrib>
</contrib-group>
<aff>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</aff>
<author-notes>
<corresp>
<bold>Correspondence:</bold>
Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA.
<bold>E-mail:</bold>
<email>bpytlikz@unlnotes.unl.edu</email>
;
<email>bzillig1@unl.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>6</month>
<year>2009</year>
</pub-date>
<pub-date pub-type="epub">
<day>15</day>
<month>4</month>
<year>2009</year>
</pub-date>
<volume>24</volume>
<issue>2</issue>
<issue-title>Special Issue 'Selected papers from Digital Humanities 2008, University of Oulu, Finland, June 25–29'</issue-title>
<fpage>187</fpage>
<lpage>192</lpage>
<permissions>
<copyright-statement>© The Author 2009. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org</copyright-statement>
<copyright-year>2009</copyright-year>
</permissions>
<abstract>
<p>For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.</p>
</abstract>
</article-meta>
</front>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo>
<title>TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA">
<title>TEI Analytics: converting documents into a TEI format for cross-collection text analysis</title>
</titleInfo>
<name type="personal">
<namePart type="given">Brian L.</namePart>
<namePart type="family">Pytlik Zillig</namePart>
<affiliation>Center for Digital Research in the Humanities, University of Nebraska, Lincoln, NE, USA</affiliation>
<affiliation>E-mail: bpytlikz@unlnotes.unl.edu</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article"></genre>
<subject>
<topic>Original Articles</topic>
</subject>
<originInfo>
<publisher>Oxford University Press</publisher>
<dateIssued encoding="w3cdtf">2009-06</dateIssued>
<dateCreated encoding="w3cdtf">2009-04-15</dateCreated>
<copyrightDate encoding="w3cdtf">2009</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract>For the purposes of large-scale analysis of XML/SGML files, converting humanities texts into a common form of markup represents a technical challenge. The MONK (Metadata Offer New Knowledge) Project has developed both a common format, TEI Analytics (a TEI subset designed to facilitate interoperability of text archives) and a command-line tool, Abbot, that performs the conversion. Abbot relies upon a new technique, schema harvesting, developed by the author to convert text documents into TEI-A. This article has two aims: first, to describe the TEI-A format itself and, second, to outline the methods used to convert files. More generally, it is hoped that the techniques described will lead to greater interoperability of text documents for text analysis in a wider context.</abstract>
<relatedItem type="host">
<titleInfo>
<title>Literary and Linguistic Computing</title>
</titleInfo>
<genre type="journal">journal</genre>
<identifier type="ISSN">0268-1145</identifier>
<identifier type="eISSN">1477-4615</identifier>
<identifier type="PublisherID">litlin</identifier>
<identifier type="PublisherID-hwp">litlin</identifier>
<part>
<date>2009</date>
<detail type="title">
<title>Special Issue 'Selected papers from Digital Humanities 2008, University of Oulu, Finland, June 25–29'</title>
</detail>
<detail type="volume">
<caption>vol.</caption>
<number>24</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>2</number>
</detail>
<extent unit="pages">
<start>187</start>
<end>192</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">5A0FF8D4446AA580A277839CACF787F8BEEC9271</identifier>
<identifier type="DOI">10.1093/llc/fqp005</identifier>
<identifier type="ArticleID">fqp005</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© The Author 2009. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org</accessCondition>
<recordInfo>
<recordContentSource>OUP</recordContentSource>
</recordInfo>
</mods>
</metadata>
<covers>
<json:item>
<original>true</original>
<mimetype>image/tiff</mimetype>
<extension>tiff</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/covers/tiff</uri>
</json:item>
</covers>
<annexes>
<json:item>
<original>true</original>
<mimetype>image/jpeg</mimetype>
<extension>jpeg</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/annexes/jpeg</uri>
</json:item>
<json:item>
<original>true</original>
<mimetype>image/gif</mimetype>
<extension>gif</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/annexes/gif</uri>
</json:item>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/annexes/pdf</uri>
</json:item>
</annexes>
<enrichments>
<istex:catWosTEI uri="https://api.istex.fr/document/5A0FF8D4446AA580A277839CACF787F8BEEC9271/enrichments/catWos">
<teiHeader>
<profileDesc>
<textClass>
<classCode scheme="WOS">LINGUISTICS</classCode>
<classCode scheme="WOS">LITERATURE</classCode>
</textClass>
</profileDesc>
</teiHeader>
</istex:catWosTEI>
</enrichments>
<serie></serie>
</istex>
</record>
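The XML record above can also be queried directly with standard tools. A minimal sketch using Python's standard library (the excerpt reproduces identifier fields from the record; namespace prefixes such as wicri: and mods: in the full record are omitted here for simplicity):

```python
import xml.etree.ElementTree as ET

# Minimal excerpt of the biblStruct from the record above.
record = """<biblStruct>
  <idno type="istex">5A0FF8D4446AA580A277839CACF787F8BEEC9271</idno>
  <idno type="DOI">10.1093/llc/fqp005</idno>
  <idno type="ArticleID">fqp005</idno>
</biblStruct>"""

root = ET.fromstring(record)
# Collect every <idno> into a {type: value} dictionary.
ids = {idno.get("type"): idno.text for idno in root.iter("idno")}
print(ids["DOI"])  # 10.1093/llc/fqp005
```

Parsing the full record would additionally require declaring the wicri, mods, json, and istex namespace prefixes before loading it.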

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000266 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000266 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:5A0FF8D4446AA580A277839CACF787F8BEEC9271
   |texte=   TEI Analytics: converting documents into a TEI format for cross-collection text analysis
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024