OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Successfully detecting and correcting false friends using channel profiles

Internal identifier: 000A67 (Main/Curation); previous: 000A66; next: 000A68

Authors: Ulrich Reffle [Germany]; Annette Gotscharek [Germany]; Christoph Ringlstetter [Germany]; Klaus U. Schulz [Germany]

Source: International journal on document analysis and recognition (Print) [ISSN 1433-2833]; 2009

RBID: Pascal:10-0180628

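French descriptors: Texte; Reconnaissance caractère; Reconnaissance optique caractère; Dictionnaire; Correction automatique; Désambiguïsation; Analyse statistique; Approche probabiliste; Correction erreur

English descriptors: Automatic correction; Character recognition; Dictionaries; Disambiguation; Error correction; Optical character recognition; Probabilistic approach; Statistical analysis; Text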

Abstract

The detection and correction of false friends, also called real-word errors, is a notoriously difficult problem. On realistic data, the break-even point for automatic correction could not be reached so far: the number of additional infelicitous corrections outnumbered the useful corrections. We present a new approach where we first compute a profile of the error channel for the given text. During the correction process, the profile (1) helps to restrict attention to a small set of "suspicious" lexical tokens of the input text where it is "plausible" to assume that the token represents a false friend. In this way, recognition of false friends is improved. Furthermore, the profile (2) helps to isolate the "most promising" correction suggestion for "suspicious" tokens. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains by fully automatic correction of false friends.
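The two roles of the profile described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the profile values, the trigram counts, the `min_prob` threshold, and the function names `candidates` and `correct` are all hypothetical, chosen only to show how a channel profile could first flag a suspicious token and then rank its correction candidates before a trigram check decides.

```python
from collections import Counter

# Hypothetical toy data: none of these values come from the paper.
LEXICON = {"the", "modern", "modem", "art", "is"}
# Channel profile: (intended substring, OCR output substring) -> estimated frequency.
PROFILE = {("rn", "m"): 0.05, ("e", "c"): 0.001}
# Toy word trigram counts; missing trigrams count as 0.
TRIGRAMS = Counter({("the", "modern", "art"): 12, ("the", "modem", "is"): 7})


def candidates(token, profile, lexicon, min_prob):
    """Lexicon words that a frequent profiled error could have turned into `token`."""
    found = {}
    for (intended, observed), prob in profile.items():
        if prob < min_prob:
            continue  # rare error patterns do not make a token suspicious
        start = token.find(observed)
        while start != -1:
            # Undo the error pattern at this position.
            word = token[:start] + intended + token[start + len(observed):]
            if word in lexicon and word != token:
                found[word] = max(prob, found.get(word, 0.0))
            start = token.find(observed, start + 1)
    return found


def correct(tokens, profile=PROFILE, lexicon=LEXICON, trigrams=TRIGRAMS, min_prob=0.01):
    """Replace suspicious lexical tokens when the trigram context prefers a candidate."""
    out = list(tokens)
    # The first and last tokens are skipped because they lack a full trigram context.
    for i in range(1, len(tokens) - 1):
        token = tokens[i]
        if token not in lexicon:
            continue  # non-words are a different problem from false friends
        cands = candidates(token, profile, lexicon, min_prob)
        if not cands:
            continue  # (1) the profile gives no reason to suspect this token
        best = max(cands, key=cands.get)  # (2) most promising suggestion per profile
        left, right = out[i - 1], tokens[i + 1]
        if trigrams[(left, best, right)] > trigrams[(left, token, right)]:
            out[i] = best
    return out
```

With this toy data, `correct(["the", "modem", "art"])` replaces "modem" with "modern", while `correct(["the", "modem", "is"])` leaves the token unchanged because the trigram counts favor the original reading.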

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Successfully detecting and correcting false friends using channel profiles</title>
<author>
<name sortKey="Reffle, Ulrich" sort="Reffle, Ulrich" uniqKey="Reffle U" first="Ulrich" last="Reffle">Ulrich Reffle</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Gotscharek, Annette" sort="Gotscharek, Annette" uniqKey="Gotscharek A" first="Annette" last="Gotscharek">Annette Gotscharek</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ringlstetter, Christoph" sort="Ringlstetter, Christoph" uniqKey="Ringlstetter C" first="Christoph" last="Ringlstetter">Christoph Ringlstetter</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Klaus U" sort="Schulz, Klaus U" uniqKey="Schulz K" first="Klaus U." last="Schulz">Klaus U. Schulz</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0180628</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0180628 INIST</idno>
<idno type="RBID">Pascal:10-0180628</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000195</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000582</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000182</idno>
<idno type="wicri:doubleKey">1433-2833:2009:Reffle U:successfully:detecting:and</idno>
<idno type="wicri:Area/Main/Merge">000A76</idno>
<idno type="wicri:Area/Main/Curation">000A67</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Successfully detecting and correcting false friends using channel profiles</title>
<author>
<name sortKey="Reffle, Ulrich" sort="Reffle, Ulrich" uniqKey="Reffle U" first="Ulrich" last="Reffle">Ulrich Reffle</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Gotscharek, Annette" sort="Gotscharek, Annette" uniqKey="Gotscharek A" first="Annette" last="Gotscharek">Annette Gotscharek</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Ringlstetter, Christoph" sort="Ringlstetter, Christoph" uniqKey="Ringlstetter C" first="Christoph" last="Ringlstetter">Christoph Ringlstetter</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
<author>
<name sortKey="Schulz, Klaus U" sort="Schulz, Klaus U" uniqKey="Schulz K" first="Klaus U." last="Schulz">Klaus U. Schulz</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
<s2>80538 Munich</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Bavière</region>
<region type="district" nuts="2">District de Haute-Bavière</region>
<settlement type="city">Munich</settlement>
</placeName>
<orgName type="university">Université Louis-et-Maximilien de Munich</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic correction</term>
<term>Character recognition</term>
<term>Dictionaries</term>
<term>Disambiguation</term>
<term>Error correction</term>
<term>Optical character recognition</term>
<term>Probabilistic approach</term>
<term>Statistical analysis</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Texte</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Dictionnaire</term>
<term>Correction automatique</term>
<term>Désambiguïsation</term>
<term>Analyse statistique</term>
<term>Approche probabiliste</term>
<term>Correction erreur</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Dictionnaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The detection and correction of false friends, also called real-word errors, is a notoriously difficult problem. On realistic data, the break-even point for automatic correction could not be reached so far: the number of additional infelicitous corrections outnumbered the useful corrections. We present a new approach where we first compute a profile of the error channel for the given text. During the correction process, the profile (1) helps to restrict attention to a small set of "suspicious" lexical tokens of the input text where it is "plausible" to assume that the token represents a false friend. In this way, recognition of false friends is improved. Furthermore, the profile (2) helps to isolate the "most promising" correction suggestion for "suspicious" tokens. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains by fully automatic correction of false friends.</div>
</front>
</TEI>
</record>
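
The record above can also be read outside Dilib. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The record uses the `wicri:` and `inist:` prefixes without declaring them, so a strict parser rejects it as is; the sketch wraps it in an element that binds both prefixes to placeholder URIs. Those URIs, the `wrapper` element, and the `summarize` helper are assumptions made for illustration and are not part of Wicri or Dilib. The embedded record is abbreviated to one author.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record above (one author kept for brevity).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Successfully detecting and correcting false friends using channel profiles</title>
<author>
<name sortKey="Reffle, Ulrich" first="Ulrich" last="Reffle">Ulrich Reffle</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>CIS, University of Munich, Oettingenstr 67</s1>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Pascal:10-0180628</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs (assumptions, not the real Wicri/INIST namespaces):
# they only give the undeclared prefixes a binding so the parser accepts the record.
WRAPPER_OPEN = '<wrapper xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist">'
WRAPPER_CLOSE = "</wrapper>"


def summarize(record_xml):
    """Return the title, author names, and RBID found in one record."""
    root = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "rbid": root.findtext(".//idno[@type='RBID']"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```

Unprefixed elements such as `title` and `idno` carry no namespace in this record, so plain paths are enough to reach them; only the prefixed attribute and element names depend on the placeholder bindings.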

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A67 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Curation/biblio.hfd -nk 000A67 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Curation
   |type=    RBID
   |clé=     Pascal:10-0180628
   |texte=   Successfully detecting and correcting false friends using channel profiles
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024