TEI Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Learning to Lemmatise Slovene Words

Internal identifier: 000431 (Istex/Corpus); previous: 000430; next: 000432

Authors: Sašo Džeroski; Tomaž Erjavec

Source: Learning Language in Logic, Lecture Notes in Computer Science, vol. 1925, pp. 69-88 (2000)

RBID : ISTEX:927620C06EAB28B7791C8F925DB80BB0EEA4CFAB

Abstract

Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as their word forms cannot be matched against a lexicon giving the correct lemma, its part of speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to lemmatise into two subproblems: the first is learning to perform morphosyntactic tagging, and the second is learning to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used for the first subproblem, and a first-order decision list learning system is used to learn rules for the second. The dataset is the 90,000-word Slovene translation of Orwell’s ‘1984’, split into a training set and a validation set. The validation set is the Appendix of the novel, on which the two components are tested extensively, singly and in combination. The trained model is then applied to an open-domain test set of 25,000 words, pre-annotated with their lemmas. Of these, 13,000 noun or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.
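
The two-stage decomposition described in the abstract (tag first, then derive the lemma from the word form and its tag) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' system: the tag labels, the hand-written rules, and the lookup-based `toy_tagger` are all hypothetical stand-ins for the trained trigram tagger and the learned first-order decision list, and the Slovene forms of "miza" are chosen here purely as an example.

```python
from typing import Callable, List, Sequence, Tuple

# A decision list is an ORDERED list of rules; the first matching rule fires.
# Each rule is (tag, suffix_to_strip, suffix_to_append).
# The tags and rules below are hand-written toys, not rules from the paper.
Rule = Tuple[str, str, str]

TOY_RULES: List[Rule] = [
    ("N-fem-sg-gen", "e", "a"),    # e.g. "mize"   -> "miza"
    ("N-fem-pl-ins", "ami", "a"),  # e.g. "mizami" -> "miza"
    ("N-fem-sg-nom", "", ""),      # nominative singular is already the lemma
]


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    """Stand-in for stage 1. A real tagger would use trigram statistics;
    this one just looks tokens up in a tiny hypothetical table."""
    lookup = {
        "miza": "N-fem-sg-nom",
        "mize": "N-fem-sg-gen",
        "mizami": "N-fem-pl-ins",
    }
    return [lookup.get(token, "UNKNOWN") for token in tokens]


def analyse(word_form: str, tag: str, rules: Sequence[Rule]) -> str:
    """Stage 2: apply the first rule whose tag and suffix both match."""
    for rule_tag, strip, append in rules:
        if rule_tag == tag and word_form.endswith(strip):
            stem = word_form[: len(word_form) - len(strip)]
            return stem + append
    # No rule applies: fall back to returning the word form unchanged.
    return word_form


def lemmatise(
    tokens: Sequence[str],
    tagger: Callable[[Sequence[str]], List[str]],
    rules: Sequence[Rule],
) -> List[str]:
    """Chain the two stages: tag every token, then analyse form + tag."""
    tags = tagger(tokens)
    return [analyse(word, tag, rules) for word, tag in zip(tokens, tags)]


if __name__ == "__main__":
    print(lemmatise(["miza", "mize", "mizami"], toy_tagger, TOY_RULES))
```

Because the second stage depends on the tag produced by the first, a tagging error can propagate into a wrong lemma, which is presumably why the abstract reports testing the two components both singly and in combination.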

Url: https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/pdf
DOI: 10.1007/3-540-40030-3_5

Links to Exploration step

ISTEX:927620C06EAB28B7791C8F925DB80BB0EEA4CFAB

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning to Lemmatise Slovene Words</title>
<author>
<name sortKey="Dzeroski, Saso" sort="Dzeroski, Saso" uniqKey="Dzeroski S" first="Sašo" last="Džeroski">Sašo Džeroski</name>
<affiliation>
<mods:affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: saso.dzeroski@ijs.si</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Erjavec, Tomaz" sort="Erjavec, Tomaz" uniqKey="Erjavec T" first="Tomaž" last="Erjavec">Tomaž Erjavec</name>
<affiliation>
<mods:affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: tomaz.erjavec@ijs.si</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:927620C06EAB28B7791C8F925DB80BB0EEA4CFAB</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-40030-3_5</idno>
<idno type="url">https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000431</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Learning to Lemmatise Slovene Words</title>
<author>
<name sortKey="Dzeroski, Saso" sort="Dzeroski, Saso" uniqKey="Dzeroski S" first="Sašo" last="Džeroski">Sašo Džeroski</name>
<affiliation>
<mods:affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: saso.dzeroski@ijs.si</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Erjavec, Tomaz" sort="Erjavec, Tomaz" uniqKey="Erjavec T" first="Tomaž" last="Erjavec">Tomaž Erjavec</name>
<affiliation>
<mods:affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>E-mail: tomaz.erjavec@ijs.si</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2000</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">927620C06EAB28B7791C8F925DB80BB0EEA4CFAB</idno>
<idno type="DOI">10.1007/3-540-40030-3_5</idno>
<idno type="ChapterID">5</idno>
<idno type="ChapterID">Chap5</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984’, split into a training and validation set. The validation set is the Appendix of the novel, on which extensive testing of the two components, singly and in combination, is performed. The trained model is then used on an open-domain testing set, which has 25.000 words, pre-annotated with their word lemmas. Here 13.000 nouns or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.</div>
</front>
</TEI>
<istex>
<corpusName>springer</corpusName>
<author>
<json:item>
<name>Sašo Džeroski</name>
<affiliations>
<json:string>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</json:string>
<json:string>E-mail: saso.dzeroski@ijs.si</json:string>
</affiliations>
</json:item>
<json:item>
<name>Tomaž Erjavec</name>
<affiliations>
<json:string>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</json:string>
<json:string>E-mail: tomaz.erjavec@ijs.si</json:string>
</affiliations>
</json:item>
</author>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>OriginalPaper</json:string>
</originalGenre>
<abstract>Abstract: Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984’, split into a training and validation set. The validation set is the Appendix of the novel, on which extensive testing of the two components, singly and in combination, is performed. The trained model is then used on an open-domain testing set, which has 25.000 words, pre-annotated with their word lemmas. Here 13.000 nouns or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.</abstract>
<qualityIndicators>
<score>8</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>648 x 864 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>1685</abstractCharCount>
<pdfWordCount>6868</pdfWordCount>
<pdfCharCount>41142</pdfCharCount>
<pdfPageCount>20</pdfPageCount>
<abstractWordCount>259</abstractWordCount>
</qualityIndicators>
<title>Learning to Lemmatise Slovene Words</title>
<chapterId>
<json:string>5</json:string>
<json:string>Chap5</json:string>
</chapterId>
<genre>
<json:string>research-article</json:string>
</genre>
<serie>
<volume>2</volume>
<editor>
<json:item>
<name>G. Goos</name>
</json:item>
<json:item>
<name>J. Hartmanis</name>
</json:item>
<json:item>
<name>J. van Leeuwen</name>
</json:item>
</editor>
<issn>
<json:string>0302-9743</json:string>
</issn>
<language>
<json:string>unknown</json:string>
</language>
<title>Lecture Notes in Computer Science</title>
<copyrightDate>2000</copyrightDate>
</serie>
<host>
<editor>
<json:item>
<name>James Cussens</name>
<affiliations>
<json:string>Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK</json:string>
<json:string>E-mail: jc@cs.york.ac.uk</json:string>
</affiliations>
</json:item>
<json:item>
<name>Sašo Džeroski</name>
<affiliations>
<json:string>Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia</json:string>
<json:string>E-mail: saso.dzeroski@ijs.si</json:string>
</affiliations>
</json:item>
</editor>
<subject>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Artificial Intelligence (incl. Robotics)</value>
</json:item>
<json:item>
<value>Mathematical Logic and Formal Languages</value>
</json:item>
</subject>
<isbn>
<json:string>978-3-540-41145-1</json:string>
</isbn>
<language>
<json:string>unknown</json:string>
</language>
<title>Learning Language in Logic</title>
<bookId>
<json:string>3-540-40030-3</json:string>
</bookId>
<volume>1925</volume>
<pages>
<last>88</last>
<first>69</first>
</pages>
<issn>
<json:string>0302-9743</json:string>
</issn>
<genre>
<json:string>book-series</json:string>
</genre>
<eisbn>
<json:string>978-3-540-40030-1</json:string>
</eisbn>
<copyrightDate>2000</copyrightDate>
<doi>
<json:string>10.1007/3-540-40030-3</json:string>
</doi>
</host>
<publicationDate>2002</publicationDate>
<copyrightDate>2000</copyrightDate>
<doi>
<json:string>10.1007/3-540-40030-3_5</json:string>
</doi>
<id>927620C06EAB28B7791C8F925DB80BB0EEA4CFAB</id>
<score>0.13086782</score>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Learning to Lemmatise Slovene Words</title>
<respStmt xml:id="ISTEX-API" resp="Bibliographic references retrieved via GROBID" name="ISTEX-API (INIST-CNRS)"></respStmt>
<respStmt>
<resp>Bibliographic references retrieved via GROBID</resp>
<name resp="ISTEX-API">ISTEX-API (INIST-CNRS)</name>
</respStmt>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<availability>
<p>SPRINGER</p>
</availability>
<date>2000</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Learning to Lemmatise Slovene Words</title>
<author>
<persName>
<forename type="first">Sašo</forename>
<surname>Džeroski</surname>
</persName>
<email>saso.dzeroski@ijs.si</email>
<affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</affiliation>
</author>
<author>
<persName>
<forename type="first">Tomaž</forename>
<surname>Erjavec</surname>
</persName>
<email>tomaz.erjavec@ijs.si</email>
<affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</affiliation>
</author>
</analytic>
<monogr>
<title level="m">Learning Language in Logic</title>
<idno type="pISBN">978-3-540-41145-1</idno>
<idno type="eISBN">978-3-540-40030-1</idno>
<idno type="pISSN">0302-9743</idno>
<idno type="DOI">10.1007/3-540-40030-3</idno>
<idno type="BookID">3-540-40030-3</idno>
<idno type="BookTitleID">62983</idno>
<idno type="BookSequenceNumber">1925</idno>
<idno type="BookVolumeNumber">1925</idno>
<idno type="BookChapterCount">18</idno>
<editor>
<persName>
<forename type="first">James</forename>
<surname>Cussens</surname>
</persName>
<email>jc@cs.york.ac.uk</email>
<affiliation>Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Sašo</forename>
<surname>Džeroski</surname>
</persName>
<email>saso.dzeroski@ijs.si</email>
<affiliation>Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia</affiliation>
</editor>
<imprint>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<date type="published" when="2000"></date>
<biblScope unit="volume">1925</biblScope>
<biblScope unit="page" from="69">69</biblScope>
<biblScope unit="page" to="88">88</biblScope>
</imprint>
</monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<editor>
<persName>
<forename type="first">G.</forename>
<surname>Goos</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">J.</forename>
<surname>Hartmanis</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">J.</forename>
<surname>van Leeuwen</surname>
</persName>
</editor>
<biblScope>
<date>2000</date>
</biblScope>
<biblScope unit="volume">2</biblScope>
<idno type="pISSN">0302-9743</idno>
<idno type="seriesId">558</idno>
</series>
<series>
<title level="s">Lecture Notes in Artificial Intelligence</title>
<title level="s" type="sub">Subseries of Lecture Notes in Computer Science</title>
<editor>
<persName>
<forename type="first">G.</forename>
<surname>Goos</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">J.</forename>
<surname>Hartmanis</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">J.</forename>
<surname>van Leeuwen</surname>
</persName>
</editor>
<editor>
<persName>
<forename type="first">James</forename>
<surname>Cussens</surname>
</persName>
<email>jc@cs.york.ac.uk</email>
<affiliation>Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Sašo</forename>
<surname>Džeroski</surname>
</persName>
<email>saso.dzeroski@ijs.si</email>
<affiliation>Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Jaime</forename>
<forename type="first">G.</forename>
<surname>Carbonell</surname>
</persName>
<affiliation>Carnegie Mellon University, Pittsburgh, PA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Jörg</forename>
<surname>Siekmann</surname>
</persName>
<affiliation>University of Saarland, Saarbrücken, Germany</affiliation>
</editor>
<biblScope type="seriesId">1244</biblScope>
</series>
<idno type="istex">927620C06EAB28B7791C8F925DB80BB0EEA4CFAB</idno>
<idno type="DOI">10.1007/3-540-40030-3_5</idno>
<idno type="ChapterID">5</idno>
<idno type="ChapterID">Chap5</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2000</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Abstract: Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984’, split into a training and validation set. The validation set is the Appendix of the novel, on which extensive testing of the two components, singly and in combination, is performed. The trained model is then used on an open-domain testing set, which has 25.000 words, pre-annotated with their word lemmas. Here 13.000 nouns or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.</p>
</abstract>
<textClass>
<keywords scheme="Book Subject Collection">
<list>
<label>SUCO11645</label>
<item>
<term>Computer Science</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Book Subject Group">
<list>
<label>I</label>
<label>I21017</label>
<label>I16048</label>
<item>
<term>Computer Science</term>
</item>
<item>
<term>Artificial Intelligence (incl. Robotics)</term>
</item>
<item>
<term>Mathematical Logic and Formal Languages</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2000">Published</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-03-19">References added</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-07-26">References added</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Springer, Publisher found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//Springer-Verlag//DTD A++ V2.4//EN" URI="http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd" name="istex:docType"></istex:docType>
<istex:document>
<Publisher>
<PublisherInfo>
<PublisherName>Springer Berlin Heidelberg</PublisherName>
<PublisherLocation>Berlin, Heidelberg</PublisherLocation>
</PublisherInfo>
<Series>
<SeriesInfo SeriesType="Series" TocLevels="0">
<SeriesID>558</SeriesID>
<SeriesPrintISSN>0302-9743</SeriesPrintISSN>
<SeriesTitle Language="En">Lecture Notes in Computer Science</SeriesTitle>
</SeriesInfo>
<SeriesHeader>
<EditorGroup>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>G.</GivenName>
<FamilyName>Goos</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>J.</GivenName>
<FamilyName>Hartmanis</FamilyName>
</EditorName>
</Editor>
<Editor>
<EditorName DisplayOrder="Western">
<GivenName>J.</GivenName>
<Particle>van</Particle>
<FamilyName>Leeuwen</FamilyName>
</EditorName>
</Editor>
</EditorGroup>
</SeriesHeader>
<Book Language="En">
<BookInfo BookProductType="Reference work" Language="En" MediaType="eBook" NumberingStyle="Unnumbered" TocLevels="0">
<BookID>3-540-40030-3</BookID>
<BookTitle>Learning Language in Logic</BookTitle>
<BookVolumeNumber>1925</BookVolumeNumber>
<BookSequenceNumber>1925</BookSequenceNumber>
<BookDOI>10.1007/3-540-40030-3</BookDOI>
<BookTitleID>62983</BookTitleID>
<BookPrintISBN>978-3-540-41145-1</BookPrintISBN>
<BookElectronicISBN>978-3-540-40030-1</BookElectronicISBN>
<BookChapterCount>18</BookChapterCount>
<BookHistory>
<OnlineDate>
<Year>2002</Year>
<Month>2</Month>
<Day>1</Day>
</OnlineDate>
</BookHistory>
<BookCopyright>
<CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2000</CopyrightYear>
</BookCopyright>
<BookSubjectGroup>
<BookSubject Code="I" Type="Primary">Computer Science</BookSubject>
<BookSubject Code="I21017" Priority="1" Type="Secondary">Artificial Intelligence (incl. Robotics)</BookSubject>
<BookSubject Code="I16048" Priority="2" Type="Secondary">Mathematical Logic and Formal Languages</BookSubject>
<SubjectCollection Code="SUCO11645">Computer Science</SubjectCollection>
</BookSubjectGroup>
<BookContext>
<SeriesID>558</SeriesID>
</BookContext>
</BookInfo>
<BookHeader>
<EditorGroup>
<Editor AffiliationIDS="Aff1">
<EditorName DisplayOrder="Western">
<GivenName>James</GivenName>
<FamilyName>Cussens</FamilyName>
</EditorName>
<Contact>
<Email>jc@cs.york.ac.uk</Email>
</Contact>
</Editor>
<Editor AffiliationIDS="Aff2">
<EditorName DisplayOrder="Western">
<GivenName>Sašo</GivenName>
<FamilyName>Džeroski</FamilyName>
</EditorName>
<Contact>
<Email>saso.dzeroski@ijs.si</Email>
</Contact>
</Editor>
<Affiliation ID="Aff1">
<OrgDivision>Department of Computer Science</OrgDivision>
<OrgName>University of York</OrgName>
<OrgAddress>
<Postcode>YO10 5DD</Postcode>
<City>Heslington</City>
<State>York</State>
<Country>UK</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff2">
<OrgName>Jožef Stefan Institute</OrgName>
<OrgAddress>
<Street>Jamova 39</Street>
<Postcode>1000</Postcode>
<City>Ljubljana</City>
<Country>Slovenia</Country>
</OrgAddress>
</Affiliation>
</EditorGroup>
</BookHeader>
<Part ID="Part2">
<PartInfo TocLevels="0">
<PartID>2</PartID>
<PartNumber>2</PartNumber>
<PartSequenceNumber>2</PartSequenceNumber>
<PartTitle>Morphology &amp; Phonology</PartTitle>
<PartChapterCount>3</PartChapterCount>
<PartContext>
<SeriesID>558</SeriesID>
<BookID>3-540-40030-3</BookID>
<BookTitle>Learning Language in Logic</BookTitle>
</PartContext>
</PartInfo>
<Chapter ID="Chap5" Language="En">
<ChapterInfo ChapterType="OriginalPaper" ContainsESM="No" Language="En" NumberingStyle="Unnumbered" TocLevels="0">
<ChapterID>5</ChapterID>
<ChapterDOI>10.1007/3-540-40030-3_5</ChapterDOI>
<ChapterSequenceNumber>5</ChapterSequenceNumber>
<ChapterTitle Language="En">Learning to Lemmatise Slovene Words</ChapterTitle>
<ChapterFirstPage>69</ChapterFirstPage>
<ChapterLastPage>88</ChapterLastPage>
<ChapterCopyright>
<CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2000</CopyrightYear>
</ChapterCopyright>
<ChapterHistory>
<RegistrationDate>
<Year>2002</Year>
<Month>1</Month>
<Day>31</Day>
</RegistrationDate>
<OnlineDate>
<Year>2002</Year>
<Month>2</Month>
<Day>1</Day>
</OnlineDate>
</ChapterHistory>
<ChapterGrants Type="Regular">
<MetadataGrant Grant="OpenAccess"></MetadataGrant>
<AbstractGrant Grant="OpenAccess"></AbstractGrant>
<BodyPDFGrant Grant="Restricted"></BodyPDFGrant>
<BodyHTMLGrant Grant="Restricted"></BodyHTMLGrant>
<BibliographyGrant Grant="Restricted"></BibliographyGrant>
<ESMGrant Grant="Restricted"></ESMGrant>
</ChapterGrants>
<ChapterContext>
<SeriesID>558</SeriesID>
<PartID>2</PartID>
<BookID>3-540-40030-3</BookID>
<BookTitle>Learning Language in Logic</BookTitle>
</ChapterContext>
</ChapterInfo>
<ChapterHeader>
<AuthorGroup>
<Author AffiliationIDS="Aff5">
<AuthorName DisplayOrder="Western">
<GivenName>Sašo</GivenName>
<FamilyName>Džeroski</FamilyName>
</AuthorName>
<Contact>
<Email>saso.dzeroski@ijs.si</Email>
</Contact>
</Author>
<Author AffiliationIDS="Aff5">
<AuthorName DisplayOrder="Western">
<GivenName>Tomaž</GivenName>
<FamilyName>Erjavec</FamilyName>
</AuthorName>
<Contact>
<Email>tomaz.erjavec@ijs.si</Email>
</Contact>
</Author>
<Affiliation ID="Aff5">
<OrgDivision>Department for Intelligent Systems</OrgDivision>
<OrgName>Jožef Stefan Institute</OrgName>
<OrgAddress>
<Street>Jamova 39</Street>
<Postcode>SI-1000</Postcode>
<City>Ljubljana</City>
<Country>Slovenia</Country>
</OrgAddress>
</Affiliation>
</AuthorGroup>
<Abstract ID="Abs1" Language="En">
<Heading>Abstract</Heading>
<Para>Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984’, split into a training and validation set. The validation set is the Appendix of the novel, on which extensive testing of the two components, singly and in combination, is performed. The trained model is then used on an open-domain testing set, which has 25.000 words, pre-annotated with their word lemmas. Here 13.000 nouns or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.</Para>
</Abstract>
</ChapterHeader>
<NoBody></NoBody>
</Chapter>
</Part>
</Book>
<SubSeries>
<SubSeriesInfo>
<SubSeriesID>1244</SubSeriesID>
<SubSeriesTitle Language="En">Lecture Notes in Artificial Intelligence</SubSeriesTitle>
<SubSeriesSubTitle Language="En">Subseries of Lecture Notes in Computer Science</SubSeriesSubTitle>
</SubSeriesInfo>
<SubSeriesHeader>
<EditorGroup>
<Editor AffiliationIDS="Aff3">
<EditorName DisplayOrder="Western">
<GivenName>Jaime</GivenName>
<GivenName>G.</GivenName>
<FamilyName>Carbonell</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff4">
<EditorName DisplayOrder="Western">
<GivenName>Jörg</GivenName>
<FamilyName>Siekmann</FamilyName>
</EditorName>
</Editor>
<Affiliation ID="Aff3">
<OrgName>Carnegie Mellon University</OrgName>
<OrgAddress>
<City>Pittsburgh</City>
<State>PA</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff4">
<OrgName>University of Saarland</OrgName>
<OrgAddress>
<City>Saarbrücken</City>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
</EditorGroup>
</SubSeriesHeader>
</SubSeries>
</Series>
</Publisher>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Learning to Lemmatise Slovene Words</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>Learning to Lemmatise Slovene Words</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sašo</namePart>
<namePart type="family">Džeroski</namePart>
<affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</affiliation>
<affiliation>E-mail: saso.dzeroski@ijs.si</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tomaž</namePart>
<namePart type="family">Erjavec</namePart>
<affiliation>Department for Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000, Ljubljana, Slovenia</affiliation>
<affiliation>E-mail: tomaz.erjavec@ijs.si</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="OriginalPaper"></genre>
<originInfo>
<publisher>Springer Berlin Heidelberg</publisher>
<place>
<placeTerm type="text">Berlin, Heidelberg</placeTerm>
</place>
<dateIssued encoding="w3cdtf">2002-02-01</dateIssued>
<copyrightDate encoding="w3cdtf">2000</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">Abstract: Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984’, split into a training and validation set. The validation set is the Appendix of the novel, on which extensive testing of the two components, singly and in combination, is performed. The trained model is then used on an open-domain testing set, which has 25.000 words, pre-annotated with their word lemmas. Here 13.000 nouns or adjective tokens are previously unseen cases. Tested on these unknown words, our method achieves an accuracy of 81% on the lemmatisation task.</abstract>
<relatedItem type="host">
<titleInfo>
<title>Learning Language in Logic</title>
</titleInfo>
<name type="personal">
<namePart type="given">James</namePart>
<namePart type="family">Cussens</namePart>
<affiliation>Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK</affiliation>
<affiliation>E-mail: jc@cs.york.ac.uk</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sašo</namePart>
<namePart type="family">Džeroski</namePart>
<affiliation>Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia</affiliation>
<affiliation>E-mail: saso.dzeroski@ijs.si</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="book-series" displayLabel="Reference work"></genre>
<originInfo>
<copyrightDate encoding="w3cdtf">2000</copyrightDate>
<issuance>monographic</issuance>
</originInfo>
<subject>
<genre>Book-Subject-Collection</genre>
<topic authority="SpringerSubjectCodes" authorityURI="SUCO11645">Computer Science</topic>
</subject>
<subject>
<genre>Book-Subject-Group</genre>
<topic authority="SpringerSubjectCodes" authorityURI="I">Computer Science</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I21017">Artificial Intelligence (incl. Robotics)</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I16048">Mathematical Logic and Formal Languages</topic>
</subject>
<identifier type="DOI">10.1007/3-540-40030-3</identifier>
<identifier type="ISBN">978-3-540-41145-1</identifier>
<identifier type="eISBN">978-3-540-40030-1</identifier>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="BookTitleID">62983</identifier>
<identifier type="BookID">3-540-40030-3</identifier>
<identifier type="BookChapterCount">18</identifier>
<identifier type="BookVolumeNumber">1925</identifier>
<identifier type="BookSequenceNumber">1925</identifier>
<identifier type="PartChapterCount">3</identifier>
<part>
<date>2000</date>
<detail type="part">
<title>2: Morphology &amp; Phonology</title>
</detail>
<detail type="volume">
<number>1925</number>
<caption>vol.</caption>
</detail>
<extent unit="pages">
<start>69</start>
<end>88</end>
</extent>
</part>
<recordInfo>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2000</recordOrigin>
</recordInfo>
</relatedItem>
<relatedItem type="series">
<titleInfo>
<title>Lecture Notes in Computer Science</title>
</titleInfo>
<name type="personal">
<namePart type="given">G.</namePart>
<namePart type="family">Goos</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">J.</namePart>
<namePart type="family">Hartmanis</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">J.</namePart>
<namePart type="family">van Leeuwen</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<copyrightDate encoding="w3cdtf">2000</copyrightDate>
<issuance>serial</issuance>
</originInfo>
<relatedItem type="constituent">
<titleInfo>
<title>Lecture Notes in Artificial Intelligence</title>
<subTitle>Subseries of Lecture Notes in Computer Science</subTitle>
</titleInfo>
<name type="personal">
<namePart type="given">G.</namePart>
<namePart type="family">Goos</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">J.</namePart>
<namePart type="family">Hartmanis</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">J.</namePart>
<namePart type="family">van Leeuwen</namePart>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">James</namePart>
<namePart type="family">Cussens</namePart>
<affiliation>Department of Computer Science, University of York, YO10 5DD, Heslington, York, UK</affiliation>
<affiliation>E-mail: jc@cs.york.ac.uk</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sašo</namePart>
<namePart type="family">Džeroski</namePart>
<affiliation>Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia</affiliation>
<affiliation>E-mail: saso.dzeroski@ijs.si</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jaime</namePart>
<namePart type="given">G.</namePart>
<namePart type="family">Carbonell</namePart>
<affiliation>Carnegie Mellon University, Pittsburgh, PA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jörg</namePart>
<namePart type="family">Siekmann</namePart>
<affiliation>University of Saarland, Saarbrücken, Germany</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="sub-series"></genre>
<identifier type="SubSeriesID">1244</identifier>
</relatedItem>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="SeriesID">558</identifier>
<part>
<detail type="volume">
<number>2</number>
<caption>vol.</caption>
</detail>
</part>
<recordInfo>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2000</recordOrigin>
</recordInfo>
</relatedItem>
<identifier type="istex">927620C06EAB28B7791C8F925DB80BB0EEA4CFAB</identifier>
<identifier type="DOI">10.1007/3-540-40030-3_5</identifier>
<identifier type="ChapterID">5</identifier>
<identifier type="ChapterID">Chap5</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Springer-Verlag Berlin Heidelberg, 2000</accessCondition>
<recordInfo>
<recordContentSource>SPRINGER</recordContentSource>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2000</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments>
<istex:refBibTEI uri="https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/enrichments/refBib">
<teiHeader></teiHeader>
<text>
<front></front>
<body></body>
<back>
<listBibl>
<biblStruct xml:id="b0">
<analytic>
<title level="a" type="main">TnT - a statistical part-of-speech tagger</title>
<author>
<persName>
<forename type="first">T</forename>
<surname>Brants</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Sixth Applied Natural Language Processing Conference ANLP-2000 Seattle</title>
<meeting>the Sixth Applied Natural Language Processing Conference ANLP-2000 Seattle
<address>
<addrLine>WA</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2000"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b1">
<analytic>
<title level="a" type="main">Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging</title>
<author>
<persName>
<forename type="first">E</forename>
<surname>Brill</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Computational Linguistics</title>
<imprint>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="543" to="565"></biblScope>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b2">
<analytic>
<title level="a" type="main">Creating a tagset, lexicon and guesser for a French tagger</title>
<author>
<persName>
<forename type="first">J</forename>
<surname>Chanod</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">P</forename>
<surname>Tapanainen</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the ACL SIGDAT workshop From Text to Tags: Issues in Multilingual Language Analysis Dublin</title>
<meeting>the ACL SIGDAT workshop From Text to Tags: Issues in Multilingual Language Analysis Dublin</meeting>
<imprint>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b3">
<analytic>
<title level="a" type="main">Part-of-speech tagging using Progol</title>
<author>
<persName>
<forename type="first">J</forename>
<surname>Cussens</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the 6th International Workshop on Inductive Logic Programming</title>
<meeting>the 6th International Workshop on Inductive Logic Programming
<address>
<addrLine>Berlin</addrLine>
</address>
</meeting>
<imprint>
<publisher>Springer</publisher>
<date type="published" when="1997"></date>
<biblScope unit="page" from="93" to="108"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b4">
<analytic>
<title level="a" type="main">Morphosyntactic tagging of Slovene using Progol</title>
<author>
<persName>
<forename type="first">J</forename>
<surname>Cussens</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Džeroski</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Inductive Logic Programming; 9th International Workshop ILP-99, Proceedings, No. 1634 in Lecture Notes in Artificial Intelligence</title>
<editor>Džeroski, S., &amp; Flach, P.</editor>
<meeting>
<address>
<addrLine>Berlin</addrLine>
</address>
</meeting>
<imprint>
<publisher>Springer</publisher>
<date type="published" when="1999"></date>
<biblScope unit="page" from="68" to="79"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b5">
<analytic>
<title level="a" type="main">A practical part-of-speech tagger</title>
<author>
<persName>
<forename type="first">D</forename>
<surname>Cutting</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<surname>Kupiec</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<surname>Pedersen</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">P</forename>
<surname>Sibun</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Third Conference on Applied Natural Language Processing</title>
<meeting>the Third Conference on Applied Natural Language Processing
<address>
<addrLine>Trento, Italy</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="1992"></date>
<biblScope unit="page" from="133" to="140"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b6">
<analytic>
<title level="a" type="main">MBT: A memory-based part of speech tagger-generator</title>
<author>
<persName>
<forename type="first">W</forename>
<surname>Daelemans</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<surname>Zavrel</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">P</forename>
<surname>Berck</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Gillis</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Fourth Workshop on Very Large Corpora</title>
<editor>Ejerhed, E., &amp; Dagan, I.</editor>
<meeting>the Fourth Workshop on Very Large Corpora
<address>
<addrLine>Copenhagen</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="1996"></date>
<biblScope unit="page" from="14" to="27"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b7">
<analytic>
<title level="a" type="main">Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages</title>
<author>
<persName>
<forename type="first">L</forename>
<surname>Dimitrova</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">N</forename>
<surname>Ide</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">H.-J</forename>
<surname>Kaalep</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">V</forename>
<surname>Petkevič</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">D</forename>
<surname>Tufiş</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">COLING-ACL '98</title>
<imprint>
<date type="published" when="1998"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b8">
<analytic>
<title></title>
</analytic>
<monogr>
<title level="j">–319 Montréal</title>
<imprint></imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b9">
<monogr>
<title level="m" type="main">Morphosyntactic Tagging of Slovene: Evaluating PoS Taggers and Tagsets</title>
<author>
<persName>
<forename type="first">S</forename>
<surname>Džeroski</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<surname>Zavrel</surname>
</persName>
</author>
<imprint>
<date type="published" when="1999"></date>
<publisher>Jožef Stefan Institute</publisher>
<pubPlace>Ljubljana</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b10">
<analytic>
<title level="a" type="main">The ELAN Slovene-English Aligned Corpus</title>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Machine Translation Summit VII</title>
<meeting>the Machine Translation Summit VII</meeting>
<imprint>
<date type="published" when="1999"></date>
<biblScope unit="page" from="349" to="357"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b11">
<analytic>
<title level="a" type="main">Specifications and notation for lexicon encoding. MULTEXT-East final report D1</title>
</analytic>
<monogr>
<title level="m">M. M</title>
<editor>Erjavec, T., &amp;</editor>
<meeting>
<address>
<addrLine>Ljubljana</addrLine>
</address>
</meeting>
<imprint>
<publisher>Jožef Stefan Institute</publisher>
<date type="published" when="1997"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b12">
<analytic>
<title></title>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">A</forename>
<surname>Lawson</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">L</forename>
<surname>Romary</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Compendium of Multilingual Resources. CD-ROM</title>
<imprint>
<biblScope unit="volume">ISBN</biblScope>
<biblScope unit="page" from="3" to="922641"></biblScope>
<date type="published" when="1998"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b13">
<analytic>
<title level="a" type="main">Learning multilingual morphology with CLOG</title>
<author>
<persName>
<forename type="first">S</forename>
<surname>Manandhar</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Džeroski</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">T</forename>
<surname>Erjavec</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Inductive Logic Programming ; 8th International Workshop ILP-98, Proceedings, No. 1446 in Lecture Notes in Artificial Intelligence</title>
<editor>Page, D.</editor>
<imprint>
<publisher>Springer</publisher>
<date type="published" when="1998"></date>
<biblScope unit="page" from="135" to="144"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b14">
<analytic>
<title level="a" type="main">Automatic rule induction for unknown-word guessing</title>
<author>
<persName>
<forename type="first">A</forename>
<surname>Mikheev</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Computational Linguistics</title>
<imprint>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="405" to="424"></biblScope>
<date type="published" when="1997"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b15">
<analytic>
<title level="a" type="main">Induction of first-order decision lists: Results on learning the past tense of English verbs</title>
<author>
<persName>
<forename type="first">R</forename>
<forename type="middle">J</forename>
<surname>Mooney</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<forename type="middle">E</forename>
<surname>Califf</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Journal of Artificial Intelligence Research</title>
<imprint>
<biblScope unit="page" from="1" to="24"></biblScope>
<date type="published" when="1995"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b16">
<analytic>
<title level="a" type="main">A maximum entropy part of speech tagger</title>
<author>
<persName>
<forename type="first">A</forename>
<surname>Ratnaparkhi</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proc. ACL-SIGDAT Conference on Empirical Methods in Natural Language Processing</title>
<meeting>ACL-SIGDAT Conference on Empirical Methods in Natural Language Processing
<address>
<addrLine>Philadelphia</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="1996"></date>
<biblScope unit="page" from="491" to="497"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b17">
<monogr>
<title level="m" type="main">Guidelines for Electronic Text Encoding and Interchange</title>
<author>
<persName>
<forename type="first">C</forename>
<forename type="middle">M</forename>
<surname>Sperberg-McQueen</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">L</forename>
<surname>Burnard</surname>
</persName>
</author>
<imprint>
<date type="published" when="1994"></date>
<pubPlace>Chicago and Oxford</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b18">
<monogr>
<title level="m" type="main">An implementation of a probabilistic tagger</title>
<author>
<persName>
<forename type="first">R</forename>
<surname>Steetskamp</surname>
</persName>
</author>
<imprint>
<date type="published" when="1995"></date>
<publisher>TOSCA Research Group, University of Nijmegen</publisher>
<biblScope unit="page">48</biblScope>
<pubPlace>Nijmegen</pubPlace>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b19">
<monogr>
<title level="m" type="main">Syntactic Wordclass Tagging</title>
<editor>van Halteren, H.</editor>
<imprint>
<publisher>Kluwer</publisher>
<date type="published" when="1999"></date>
</imprint>
</monogr>
</biblStruct>
</listBibl>
</back>
</text>
</istex:refBibTEI>
<json:item>
<type>refBibs</type>
<uri>https://api.istex.fr/document/927620C06EAB28B7791C8F925DB80BB0EEA4CFAB/enrichments/refBibs</uri>
</json:item>
</enrichments>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000431 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000431 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:927620C06EAB28B7791C8F925DB80BB0EEA4CFAB
   |texte=   Learning to Lemmatise Slovene Words
}}


This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024