Exploration server for the University of Trier

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Rule-based Search in Text Databases with Nonstandard Orthography

Internal identifier: 001661 (Istex/Corpus); previous: 001660; next: 001662

Authors: Thomas Pilz; Wolfram Luther; Norbert Fuhr; Ulrich Ammon

Source:

RBID : ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65

Abstract

In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).
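
The abstract mentions a generalized Levenshtein similarity measure for classifying results. The following is only a minimal sketch of that general idea, not the authors' implementation: an edit distance in which selected character operations cost less than the default of 1. The cost tables `SUB_COST` and `INDEL_COST`, the function names, and all weights are hypothetical values chosen for illustration.

```python
from typing import Dict, Tuple

# Hypothetical, illustrative costs (not taken from the article):
# cheap substitutions for letter pairs that often alternate in older spellings.
SUB_COST: Dict[Tuple[str, str], float] = {
    ("y", "i"): 0.2, ("i", "y"): 0.2,
    ("c", "k"): 0.3, ("k", "c"): 0.3,
}
# Hypothetical cheap insertion/deletion, e.g. a silent "h" as in "Theil" vs "Teil".
INDEL_COST: Dict[str, float] = {"h": 0.4}


def weighted_distance(a: str, b: str) -> float:
    """Levenshtein distance with per-character operation costs (default 1.0)."""
    # First row: cost of building each prefix of b from the empty string.
    prev = [0.0]
    for ch in b:
        prev.append(prev[-1] + INDEL_COST.get(ch, 1.0))
    for ca in a:
        # First column: cost of deleting the prefix of a read so far.
        cur = [prev[0] + INDEL_COST.get(ca, 1.0)]
        for j, cb in enumerate(b, start=1):
            sub = 0.0 if ca == cb else SUB_COST.get((ca, cb), 1.0)
            cur.append(min(
                prev[j] + INDEL_COST.get(ca, 1.0),     # delete ca
                cur[j - 1] + INDEL_COST.get(cb, 1.0),  # insert cb
                prev[j - 1] + sub,                     # match or substitute
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_distance(a, b) / longest
```

Under these assumed weights, the historical spelling "seyn" lies at distance 0.2 from "sein", while an unrelated pair such as "haus" and "maus" keeps the full distance of 1.0, so spelling variants rank above ordinary near-misses.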

Url:
DOI: 10.1093/llc/fql020

Links to Exploration step

ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<author>
<name sortKey="Pilz, Thomas" sort="Pilz, Thomas" uniqKey="Pilz T" first="Thomas" last="Pilz">Thomas Pilz</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Fuhr, Norbert" sort="Fuhr, Norbert" uniqKey="Fuhr N" first="Norbert" last="Fuhr">Norbert Fuhr</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author wicri:is="90%">
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation>
<mods:affiliation>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1093/llc/fql020</idno>
<idno type="url">https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001661</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001661</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<author>
<name sortKey="Pilz, Thomas" sort="Pilz, Thomas" uniqKey="Pilz T" first="Thomas" last="Pilz">Thomas Pilz</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Fuhr, Norbert" sort="Fuhr, Norbert" uniqKey="Fuhr N" first="Norbert" last="Fuhr">Norbert Fuhr</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
</author>
<author wicri:is="90%">
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation>
<mods:affiliation>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2006-06">2006-06</date>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="179">179</biblScope>
<biblScope unit="page" to="186">186</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</idno>
<idno type="DOI">10.1093/llc/fql020</idno>
<idno type="local">fql020</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</div>
</front>
</TEI>
<istex>
<corpusName>oup</corpusName>
<author>
<json:item>
<name>Thomas Pilz</name>
<affiliations>
<json:string>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</json:string>
</affiliations>
</json:item>
<json:item>
<name>Wolfram Luther</name>
<affiliations>
<json:string>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</json:string>
</affiliations>
</json:item>
<json:item>
<name>Norbert Fuhr</name>
<affiliations>
<json:string>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</json:string>
</affiliations>
</json:item>
<json:item>
<name>Ulrich Ammon</name>
<affiliations>
<json:string>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</json:string>
</affiliations>
</json:item>
<json:item>
<name>Ulrich Ammon</name>
<affiliations>
<json:string>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</json:string>
<json:string>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.</json:string>
</affiliations>
</json:item>
</author>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>research-article</json:string>
</originalGenre>
<abstract>In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</abstract>
<qualityIndicators>
<score>7.081</score>
<pdfVersion>1.4</pdfVersion>
<pdfPageSize>539 x 697 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>1028</abstractCharCount>
<pdfWordCount>3329</pdfWordCount>
<pdfCharCount>23146</pdfCharCount>
<pdfPageCount>8</pdfPageCount>
<abstractWordCount>146</abstractWordCount>
</qualityIndicators>
<title>Rule-based Search in Text Databases with Nonstandard Orthography</title>
<refBibs>
<json:item>
<host>
<author>
<json:item>
<name>U Ammon</name>
</json:item>
</author>
<title>Variationslinguistik/Linguistics of Variation</title>
<publicationDate>1998</publicationDate>
</host>
</json:item>
<json:item>
<author>
<json:item>
<name>D Biella</name>
</json:item>
<json:item>
<name>E Dyllong</name>
</json:item>
<json:item>
<name>H Kaiser</name>
</json:item>
<json:item>
<name>W Luther</name>
</json:item>
<json:item>
<name>Th Mittmann</name>
</json:item>
</author>
<host>
<pages>
<last>25</last>
<first>23</first>
</pages>
<author></author>
<title>Ein Arbeitsbericht zum Duisburger Retrodigitalisierungsprojekt. Kolloquium</title>
<publicationDate>2002-09</publicationDate>
</host>
<title>Wege zur digitalen Erfassung der Nachwirkung Nietzsches in Deutschland von 1865–1945</title>
<publicationDate>2002-09</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>D Biella</name>
</json:item>
<json:item>
<name>E Dyllong</name>
</json:item>
<json:item>
<name>H Kaiser</name>
</json:item>
<json:item>
<name>W Luther</name>
</json:item>
<json:item>
<name>Th Mittmann</name>
</json:item>
</author>
<host>
<pages>
<last>12</last>
<first>8</first>
</pages>
<author></author>
<title>Proceedings ICHIM03</title>
<publicationDate>1945-09</publicationDate>
</host>
<title>Edition électronique de la réception de Nietzsche des années 1865 à</title>
<publicationDate>1945-09</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>R Christmann</name>
</json:item>
<json:item>
<name>T Schares</name>
</json:item>
</author>
<host>
<volume>18</volume>
<pages>
<last>22</last>
<first>11</first>
</pages>
<issue>1</issue>
<author></author>
<title>Literary and Linguistic Computing</title>
<publicationDate>2003</publicationDate>
</host>
<title>Towards the User: the digital edition of the Deutsche Wörterbuch by</title>
<publicationDate>2003</publicationDate>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>Deutsch Diachron Digital (DDD). Berlin</name>
</json:item>
</author>
<publicationDate>2005-12-20</publicationDate>
</host>
</json:item>
<json:item>
<author>
<json:item>
<name>N Fuhr</name>
</json:item>
<json:item>
<name>K Großjohann</name>
</json:item>
</author>
<host>
<volume>22</volume>
<pages>
<last>356</last>
<first>313</first>
</pages>
<issue>2</issue>
<author></author>
<title>ACM Transactions on Information Systems</title>
<publicationDate>2004</publicationDate>
</host>
<title>XIRQL: an XML query language based on information retrieval concepts</title>
<publicationDate>2004</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>N Fuhr</name>
</json:item>
<json:item>
<name>N Gövert</name>
</json:item>
<json:item>
<name>K Großjohann</name>
</json:item>
</author>
<host>
<pages>
<first>449</first>
</pages>
<author></author>
<title>Proceedings of the 25th Annual International Conference on Research and Development in Information Retrieval</title>
<publicationDate>2002</publicationDate>
</host>
<title>HyREX: Hypermedia Retrieval Engine for XML</title>
<publicationDate>2002</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>S Hockey</name>
</json:item>
</author>
<host>
<pages>
<last>24</last>
<first>7</first>
</pages>
<author></author>
<title>Literary and Linguistic Computing</title>
<publicationDate>2004</publicationDate>
</host>
<title>Living with google: perspectives on humanities computing and digital libraries</title>
<publicationDate>2004</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>N Ide</name>
</json:item>
<json:item>
<name>P Bonhomme</name>
</json:item>
<json:item>
<name>L Romary</name>
</json:item>
</author>
<host>
<pages>
<first>31</first>
</pages>
<author></author>
<title>Proceedings of the 2nd International Conference on Language Resources &amp; Evaluation</title>
<publicationDate>2000-03</publicationDate>
</host>
<title>XCES: An XML-based Encoding Standard for Linguistic Corpora</title>
<publicationDate>2000-03</publicationDate>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>R Keller</name>
</json:item>
</author>
<title>Die Deutsche Sprache und ihre historische Entwicklung. trl. Karl-Heinz Mulagk</title>
<publicationDate>1986</publicationDate>
</host>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>S Kempken</name>
</json:item>
</author>
<title>Bewertung historischer und regionaler Schreibvarianten mit Hilfe von Abstandsmaßen. Diploma Thesis</title>
<publicationDate>2005</publicationDate>
</host>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>H H Munske</name>
</json:item>
</author>
<title>Orthographie als Sprachkultur</title>
<publicationDate>1997</publicationDate>
</host>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>Th Pilz</name>
</json:item>
</author>
<title>Unscharfe Suche in Textdatenbanken mit nichtstandardisierter Rechtschreibung am Beispiel von Frakturtexten zur Nietzsche-Rezeption. Thesis</title>
<publicationDate>2003</publicationDate>
</host>
</json:item>
<json:item>
<author>
<json:item>
<name>P Rayson</name>
</json:item>
<json:item>
<name>D Archer</name>
</json:item>
<json:item>
<name>N Smith</name>
</json:item>
</author>
<host>
<pages>
<last>17</last>
<first>14</first>
</pages>
<author></author>
<title>Proceedings of the Corpus Linguistics 2005 conference</title>
<publicationDate>2005-07</publicationDate>
</host>
<title>VARD versus Word: a comparison of the UCREL variant detector and modern spell checkers on English historical corpora</title>
<publicationDate>2005-07</publicationDate>
</json:item>
<json:item>
<author>
<json:item>
<name>E Ristad</name>
</json:item>
<json:item>
<name>P Yianilos</name>
</json:item>
</author>
<host>
<volume>20</volume>
<pages>
<last>532</last>
<first>522</first>
</pages>
<issue>5</issue>
<author></author>
<title>IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
<publicationDate>1998</publicationDate>
</host>
<title>Learning string edit distance</title>
<publicationDate>1998</publicationDate>
</json:item>
<json:item>
<host>
<author>
<json:item>
<name>C M Sperberg-McQueen</name>
</json:item>
<json:item>
<name>L Burnard</name>
</json:item>
</author>
<title>Guidelines for Electronic Text Encoding and Interchange (TEI P3) Chicago and Oxford: Text Encoding Initiative</title>
<publicationDate>2001</publicationDate>
</host>
</json:item>
<json:item>
<author>
<json:item>
<name>J Zobel</name>
</json:item>
<json:item>
<name>J Dart</name>
</json:item>
</author>
<host>
<pages>
<last>172</last>
<first>166</first>
</pages>
<author></author>
<title>Proceedings of the 19th International Conference on Research and Development in Information Retrieval (SIGIR'96)</title>
<publicationDate>1996</publicationDate>
</host>
<title>Phonetic string matching: lessons from information retrieval</title>
<publicationDate>1996</publicationDate>
</json:item>
</refBibs>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>21</volume>
<publisherId>
<json:string>litlin</json:string>
</publisherId>
<pages>
<last>186</last>
<first>179</first>
</pages>
<issn>
<json:string>0268-1145</json:string>
</issn>
<issue>2</issue>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1477-4615</json:string>
</eissn>
<title>Literary and Linguistic Computing</title>
</host>
<categories>
<wos>
<json:string>social science</json:string>
<json:string>linguistics</json:string>
</wos>
<scienceMetrix>
<json:string>arts &amp; humanities</json:string>
<json:string>communication &amp; textual studies</json:string>
<json:string>languages &amp; linguistics</json:string>
</scienceMetrix>
</categories>
<publicationDate>2006</publicationDate>
<copyrightDate>2006</copyrightDate>
<doi>
<json:string>10.1093/llc/fql020</json:string>
</doi>
<id>623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</id>
<score>0.027072497</score>
<fulltext>
<json:item>
<extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/pdf</uri>
</json:item>
<json:item>
<extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<respStmt>
<resp>Bibliographic references retrieved via GROBID</resp>
<name resp="ISTEX-API">ISTEX-API (INIST-CNRS)</name>
</respStmt>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Oxford University Press</publisher>
<availability>
<p>© The Author 2006. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org</p>
</availability>
<date>2006-04-20</date>
</publicationStmt>
<notesStmt>
<note>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany. E-mail: pilz@informatik.uni-duisburg.de</note>
</notesStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<author xml:id="author-1">
<persName>
<forename type="first">Thomas</forename>
<surname>Pilz</surname>
</persName>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
</author>
<author xml:id="author-2">
<persName>
<forename type="first">Wolfram</forename>
<surname>Luther</surname>
</persName>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
</author>
<author xml:id="author-3">
<persName>
<forename type="first">Norbert</forename>
<surname>Fuhr</surname>
</persName>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
</author>
<author xml:id="author-4">
<persName>
<forename type="first">Ulrich</forename>
<surname>Ammon</surname>
</persName>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
</author>
<author xml:id="author-5">
<persName>
<forename type="first">Ulrich</forename>
<surname>Ammon</surname>
</persName>
<affiliation>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</affiliation>
<affiliation>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="pISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2006-06"></date>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="179">179</biblScope>
<biblScope unit="page" to="186">186</biblScope>
</imprint>
</monogr>
<idno type="istex">623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</idno>
<idno type="DOI">10.1093/llc/fql020</idno>
<idno type="local">fql020</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2006-04-20</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</p>
</abstract>
</profileDesc>
<revisionDesc>
<change when="2006-04-20">Created</change>
<change when="2006-06">Published</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-12-22">References added</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="corpus oup" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="US-ASCII"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" URI="journalpublishing.dtd" name="istex:docType"></istex:docType>
<istex:document>
<article xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">litlin</journal-id>
<journal-id journal-id-type="hwp">litlin</journal-id>
<journal-title>Literary and Linguistic Computing</journal-title>
<abbrev-journal-title abbrev-type="publisher">Lit Linguist Computing</abbrev-journal-title>
<issn pub-type="ppub">0268-1145</issn>
<issn pub-type="epub">1477-4615</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="other">fql020</article-id>
<article-id pub-id-type="doi">10.1093/llc/fql020</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Rule-based Search in Text Databases with Nonstandard Orthography</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Pilz</surname>
<given-names>Thomas</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Luther</surname>
<given-names>Wolfram</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fuhr</surname>
<given-names>Norbert</given-names>
</name>
</contrib>
<aff>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</aff>
</contrib-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ammon</surname>
<given-names>Ulrich</given-names>
</name>
</contrib>
<aff>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</aff>
</contrib-group>
<author-notes>
<corresp>
<bold>Correspondence:</bold>
T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.
<bold>E-mail:</bold>
<ext-link xlink:href="pilz@informatik.uni-duisburg.de" ext-link-type="email">pilz@informatik.uni-duisburg.de</ext-link>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>4</month>
<year>2006</year>
</pub-date>
<pub-date pub-type="ppub">
<month>June</month>
<year>2006</year>
</pub-date>
<volume>21</volume>
<issue>2</issue>
<fpage>179</fpage>
<lpage>186</lpage>
<permissions>
<copyright-statement>© The Author 2006. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org</copyright-statement>
<copyright-year>2006</copyright-year>
</permissions>
<abstract xml:lang="en">
<p>In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</p>
</abstract>
<custom-meta-wrap>
<custom-meta>
<meta-name>hwp-legacy-fpage</meta-name>
<meta-value>179</meta-value>
</custom-meta>
<custom-meta>
<meta-name>cover-date</meta-name>
<meta-value>June 2006</meta-value>
</custom-meta>
<custom-meta>
<meta-name>hwp-legacy-dochead</meta-name>
<meta-value>Original Article</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
</article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Rule-based Search in Text Databases with Nonstandard Orthography</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>Rule-based Search in Text Databases with Nonstandard Orthography</title>
</titleInfo>
<name type="personal">
<namePart type="given">Thomas</namePart>
<namePart type="family">Pilz</namePart>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wolfram</namePart>
<namePart type="family">Luther</namePart>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Norbert</namePart>
<namePart type="family">Fuhr</namePart>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ulrich</namePart>
<namePart type="family">Ammon</namePart>
<affiliation>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen, Germany</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ulrich</namePart>
<namePart type="family">Ammon</namePart>
<affiliation>Institute of German Language and Literature Studies, University of Duisburg-Essen, Germany</affiliation>
<affiliation>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany.</affiliation>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="research-article"></genre>
<originInfo>
<publisher>Oxford University Press</publisher>
<dateIssued encoding="w3cdtf">2006-06</dateIssued>
<dateCreated encoding="w3cdtf">2006-04-20</dateCreated>
<copyrightDate encoding="w3cdtf">2006</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</abstract>
<note type="author-notes">Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56, Germany. E-mail: pilz@informatik.uni-duisburg.de</note>
<relatedItem type="host">
<titleInfo>
<title>Literary and Linguistic Computing</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Lit Linguist Computing</title>
</titleInfo>
<genre type="journal">journal</genre>
<identifier type="ISSN">0268-1145</identifier>
<identifier type="eISSN">1477-4615</identifier>
<identifier type="PublisherID">litlin</identifier>
<identifier type="PublisherID-hwp">litlin</identifier>
<part>
<date>2006</date>
<detail type="volume">
<caption>vol.</caption>
<number>21</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>2</number>
</detail>
<extent unit="pages">
<start>179</start>
<end>186</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</identifier>
<identifier type="DOI">10.1093/llc/fql020</identifier>
<identifier type="local">fql020</identifier>
<accessCondition type="use and reproduction" contentType="copyright">© The Author 2006. Published by Oxford University Press on behalf of ALLC and ACH. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org</accessCondition>
<recordInfo>
<recordContentSource>OUP</recordContentSource>
</recordInfo>
</mods>
</metadata>
<covers>
<json:item>
<extension>tiff</extension>
<original>true</original>
<mimetype>image/tiff</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/covers/tiff</uri>
</json:item>
</covers>
<annexes>
<json:item>
<extension>jpeg</extension>
<original>true</original>
<mimetype>image/jpeg</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/annexes/jpeg</uri>
</json:item>
<json:item>
<extension>gif</extension>
<original>true</original>
<mimetype>image/gif</mimetype>
<uri>https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/annexes/gif</uri>
</json:item>
</annexes>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Rhénanie/explor/UnivTrevesV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001661 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 001661 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Rhénanie
   |area=    UnivTrevesV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65
   |texte=   Rule-based Search in Text Databases with Nonstandard Orthography
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Sat Jul 22 16:29:01 2017. Site generation: Wed Feb 28 14:55:37 2024