Exploration server for computer science research in Lorraine

Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm

Internal identifier: 003B66 (Main/Exploration); previous: 003B65; next: 003B67

Authors: Th. Meilender [France]; Abdel Belaïd [France]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2009

RBID : Pascal:09-0372219

French descriptors

English descriptors

Abstract

This paper describes a method for segmenting a continuous document flow. A document flow is a sequence of successive scanned pages, fed into a production chain, that represents several documents with no explicit separation mark between them. To separate the documents for recognition, the content of the successive pages must be analyzed and the boundary pages of each document identified. The proposed method is similar to the variable horizon models (VHM), or multi-grams, used in speech recognition. It consists in maximizing the likelihood of the flow given the Markov models of its constituent elements. Because computing this likelihood over the whole flow is NP-complete, the solution is to evaluate it within windows covering a reduced number of observations. The first results, obtained on homogeneous flows of invoices, reach more than 75% precision and 90% recall.
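
The idea of choosing document boundaries that maximize the likelihood of the flow, with a bounded horizon, can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' modified Backward-Forward algorithm: the page labels ("H", "B", "T"), the toy scoring function `segment_log_likelihood`, and the `max_len` bound on document length are all hypothetical stand-ins for the Markov models and observation windows mentioned in the abstract.

```python
import math


def segment_log_likelihood(pages):
    """Toy log-likelihood that a run of page labels forms one document.

    Hypothetical model (not from the paper): a document should start with
    a header-like page "H", end with a totals-like page "T", and contain
    body pages "B" in between.
    """
    score = math.log(0.9 if pages[0] == "H" else 0.1)
    score += math.log(0.9 if pages[-1] == "T" else 0.1)
    for label in pages[1:-1]:
        score += math.log(0.8 if label == "B" else 0.2)
    return score


def segment_flow(pages, max_len=4):
    """Split a page flow into documents of at most max_len pages.

    best[end] holds the best total log-likelihood of any segmentation of
    pages[:end]; back[end] holds where the last document of that
    segmentation starts.
    """
    n = len(pages)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only look back max_len pages: the bounded horizon.
        for start in range(max(0, end - max_len), end):
            cand = best[start] + segment_log_likelihood(pages[start:end])
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Walk the back-pointers to recover (start, end) page ranges.
    bounds = []
    end = n
    while end > 0:
        bounds.append((back[end], end))
        end = back[end]
    bounds.reverse()
    return bounds


flow = ["H", "B", "T", "H", "T", "H", "B", "B", "T"]
print(segment_flow(flow))  # prints [(0, 3), (3, 5), (5, 9)]
```

Each pair is a half-open page range, so the toy flow is split into documents of three, two, and four pages. With the horizon bounded by `max_len`, this sketch evaluates at most `len(pages) * max_len` candidate segments.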


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm</title>
<author>
<name sortKey="Meilender, Th" sort="Meilender, Th" uniqKey="Meilender T" first="Th." last="Meilender">Th. Meilender</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>University Nancy 2 - LORIA</s1>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaid">Abdel Belaïd</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>University Nancy 2 - LORIA</s1>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372219</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372219 INIST</idno>
<idno type="RBID">Pascal:09-0372219</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000262</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000765</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000228</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000228</idno>
<idno type="wicri:doubleKey">0277-786X:2009:Meilender T:segmentation:of:continuous</idno>
<idno type="wicri:Area/Main/Merge">003C62</idno>
<idno type="wicri:Area/Main/Curation">003B66</idno>
<idno type="wicri:Area/Main/Exploration">003B66</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm</title>
<author>
<name sortKey="Meilender, Th" sort="Meilender, Th" uniqKey="Meilender T" first="Th." last="Meilender">Th. Meilender</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>University Nancy 2 - LORIA</s1>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaid">Abdel Belaïd</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>University Nancy 2 - LORIA</s1>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<wicri:noRegion>University Nancy 2 - LORIA</wicri:noRegion>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="laboratoire" n="5">Laboratoire lorrain de recherche en informatique et ses applications</orgName>
<orgName type="university">Université de Lorraine</orgName>
<orgName type="institution">Centre national de la recherche scientifique</orgName>
<orgName type="institution">Institut national de recherche en informatique et en automatique</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Computational complexity</term>
<term>Markov model</term>
<term>Segmentation</term>
<term>Speech recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Algorithme</term>
<term>Segmentation</term>
<term>Reconnaissance parole</term>
<term>Modèle Markov</term>
<term>Complexité calcul</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper describes a method for segmenting a continuous document flow. A document flow is a sequence of successive scanned pages, fed into a production chain, that represents several documents with no explicit separation mark between them. To separate the documents for recognition, the content of the successive pages must be analyzed and the boundary pages of each document identified. The proposed method is similar to the variable horizon models (VHM), or multi-grams, used in speech recognition. It consists in maximizing the likelihood of the flow given the Markov models of its constituent elements. Because computing this likelihood over the whole flow is NP-complete, the solution is to evaluate it within windows covering a reduced number of observations. The first results, obtained on homogeneous flows of invoices, reach more than 75% precision and 90% recall.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Nancy</li>
</settlement>
<orgName>
<li>Centre national de la recherche scientifique</li>
<li>Institut national de recherche en informatique et en automatique</li>
<li>Laboratoire lorrain de recherche en informatique et ses applications</li>
<li>Université de Lorraine</li>
</orgName>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Meilender, Th" sort="Meilender, Th" uniqKey="Meilender T" first="Th." last="Meilender">Th. Meilender</name>
</noRegion>
<name sortKey="Belaid, A" sort="Belaid, A" uniqKey="Belaid A" first="A." last="Belaid">Abdel Belaïd</name>
</country>
</tree>
</affiliations>
</record>
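
As a rough illustration of how the record above could be read programmatically, the following Python sketch extracts the title, author names, and English keywords. It rests on one assumption: the record uses the `wicri:` and `inist:` prefixes without declaring them, so the sketch wraps it in a hypothetical `<wrapper>` element that binds those prefixes to placeholder URIs (`urn:example:...`), which are not the real namespaces. The function name `summarize_record` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): needed only so that a strict
# XML parser accepts the undeclared wicri: and inist: prefixes.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}


def summarize_record(record_xml):
    """Return title, authors, and English keywords from a <record> string."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    root = ET.fromstring(f"<wrapper {decls}>{record_xml}</wrapper>")

    # titleStmt holds the article title and one <author> per author.
    title_stmt = root.find(".//titleStmt")
    title = title_stmt.findtext("title")
    authors = [name.text for name in title_stmt.findall("author/name")]

    # English descriptors are the <term> children of keywords scheme="KwdEn".
    keywords = [
        term.text
        for kw in root.iter("keywords")
        if kw.get("scheme") == "KwdEn"
        for term in kw.findall("term")
    ]
    return {"title": title, "authors": authors, "keywords": keywords}
```

Passing the text of the `<record>` element to `summarize_record` returns a dictionary with the keys `title`, `authors`, and `keywords`.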

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003B66 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 003B66 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:09-0372219
   |texte=   Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022