OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Learning the Logic of Simple Phonotactics

Internal identifier: 001D08 (Istex/Curation); previous: 001D07; next: 001D09

Authors: F. Tjong Kim Sang [Belgium]; John Nerbonne [Netherlands]

RBID : ISTEX:70C7D08C9A5C9D7C6CC886ADF30144BD359087A2

Abstract

We report on experiments which demonstrate that by abductive inference it is possible to learn enough simple phonotactics to distinguish words from non-words for a simplified set of Dutch, the monosyllables. The monosyllables are distinguished in input so that segmentation is not problematic. Frequency information is withheld as is negative data. The methods are all tested using ten-fold cross-validation as well as a fixed number of randomly generated strings. Orthographic and phonetic representations are compared. The work presented in this chapter is part of a larger project comparing different machine learning techniques on linguistic data.

DOI: 10.1007/3-540-40030-3_7

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning the Logic of Simple Phonotactics</title>
<author>
<name sortKey="Tjong Kim Sang, F" sort="Tjong Kim Sang, F" uniqKey="Tjong Kim Sang F" first="F." last="Tjong Kim Sang">F. Tjong Kim Sang</name>
<affiliation wicri:level="1">
<mods:affiliation>CNTS - Language Technology Group, University of Antwerp, Belgium</mods:affiliation>
<country xml:lang="fr">Belgique</country>
<wicri:regionArea>CNTS - Language Technology Group, University of Antwerp</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: erikt@uia.ua.ac.be</mods:affiliation>
<country wicri:rule="url">Belgique</country>
</affiliation>
</author>
<author>
<name sortKey="Nerbonne, John" sort="Nerbonne, John" uniqKey="Nerbonne J" first="John" last="Nerbonne">John Nerbonne</name>
<affiliation wicri:level="1">
<mods:affiliation>Alfa-informatica, BCN, University of Groningen, The Netherlands</mods:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Alfa-informatica, BCN, University of Groningen</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: nerbonne@let.rug.nl</mods:affiliation>
<country wicri:rule="url">Pays-Bas</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:70C7D08C9A5C9D7C6CC886ADF30144BD359087A2</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-40030-3_7</idno>
<idno type="url">https://api.istex.fr/document/70C7D08C9A5C9D7C6CC886ADF30144BD359087A2/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001E31</idno>
<idno type="wicri:Area/Istex/Curation">001D08</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Learning the Logic of Simple Phonotactics</title>
<author>
<name sortKey="Tjong Kim Sang, F" sort="Tjong Kim Sang, F" uniqKey="Tjong Kim Sang F" first="F." last="Tjong Kim Sang">F. Tjong Kim Sang</name>
<affiliation wicri:level="1">
<mods:affiliation>CNTS - Language Technology Group, University of Antwerp, Belgium</mods:affiliation>
<country xml:lang="fr">Belgique</country>
<wicri:regionArea>CNTS - Language Technology Group, University of Antwerp</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: erikt@uia.ua.ac.be</mods:affiliation>
<country wicri:rule="url">Belgique</country>
</affiliation>
</author>
<author>
<name sortKey="Nerbonne, John" sort="Nerbonne, John" uniqKey="Nerbonne J" first="John" last="Nerbonne">John Nerbonne</name>
<affiliation wicri:level="1">
<mods:affiliation>Alfa-informatica, BCN, University of Groningen, The Netherlands</mods:affiliation>
<country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Alfa-informatica, BCN, University of Groningen</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: nerbonne@let.rug.nl</mods:affiliation>
<country wicri:rule="url">Pays-Bas</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2000</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">70C7D08C9A5C9D7C6CC886ADF30144BD359087A2</idno>
<idno type="DOI">10.1007/3-540-40030-3_7</idno>
<idno type="ChapterID">7</idno>
<idno type="ChapterID">Chap7</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: We report on experiments which demonstrate that by abductive inference it is possible to learn enough simple phonotactics to distinguish words from non-words for a simplified set of Dutch, the monosyllables. The monosyllables are distinguished in input so that segmentation is not problematic. Frequency information is withheld as is negative data. The methods are all tested using ten-fold cross-validation as well as a fixed number of randomly generated strings. Orthographic and phonetic representations are compared. The work presented in this chapter is part of a larger project comparing different machine learning techniques on linguistic data.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D08 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 001D08 | SxmlIndent | more
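
Outside Dilib, the same record can be read with a general-purpose XML parser. The following is a minimal sketch in Python, not part of Dilib. It assumes the record text has the shape shown above. Because the record uses the wicri: and mods: prefixes without declaring them, a strict parser would reject it, so the sketch first declares them with placeholder URIs (an assumption made purely for illustration). The function name parse_record and the shortened SAMPLE record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the record uses the wicri: and
# mods: prefixes without declaring them, so they are declared here only to
# make the text acceptable to a strict XML parser.
NS_DECLS = 'xmlns:wicri="urn:example:wicri" xmlns:mods="urn:example:mods"'


def parse_record(xml_text: str) -> dict:
    """Extract title, authors and DOI from a <record> string (hypothetical helper)."""
    # Declare the prefixes on the root element before parsing.
    patched = xml_text.replace("<record>", f"<record {NS_DECLS}>", 1)
    root = ET.fromstring(patched)
    header = root.find("./TEI/teiHeader/fileDesc")
    title = header.findtext("./titleStmt/title")
    authors = [name.text for name in header.findall("./titleStmt/author/name")]
    doi = header.findtext("./publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Shortened copy of the record above, kept inline so the sketch is self-contained.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Learning the Logic of Simple Phonotactics</title>
<author>
<name>F. Tjong Kim Sang</name>
<affiliation wicri:level="1">
<mods:affiliation>CNTS - Language Technology Group, University of Antwerp, Belgium</mods:affiliation>
</affiliation>
</author>
<author>
<name>John Nerbonne</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/3-540-40030-3_7</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

info = parse_record(SAMPLE)
print(info["title"])
print("; ".join(info["authors"]))
print(info["doi"])
```

In practice the string passed to parse_record could be the output of the HfdSelect command above, assuming that output is a single record of this form.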

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:70C7D08C9A5C9D7C6CC886ADF30144BD359087A2
   |texte=   Learning the Logic of Simple Phonotactics
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024