Multipage Administrative Document Stream Segmentation
Internal identifier: 003460 (Hal/Curation); previous: 003459; next: 003461
Authors: Hani Daher; Mohamed-Rafik Bouguelia [France]; Belaïd Abdel [France]; Vincent Poulain D'Andecy [France]
Abstract
We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture denotes a clear break, and thus probably marks the end of a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.
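The three-way decision described in the abstract (continuity, rupture, or uncertain) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the classifier is reduced to a list of continuity probabilities, the two thresholds are hypothetical, and a segment is labelled a fragment whenever one of its boundaries is an uncertain cut.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds: the abstract does not say how uncertainty is measured.
CONTINUITY_MIN = 0.7  # p(continuity) at or above this: pages stay together
RUPTURE_MAX = 0.3     # p(continuity) at or below this: clear break


@dataclass
class Segment:
    pages: List[int]
    kind: str  # "document" or "fragment"


def segment_stream(p_continuity: List[float]) -> List[Segment]:
    """Split pages 0..n into documents and fragments.

    p_continuity[i] is an assumed classifier score for the relationship
    between page i and page i + 1.
    """
    segments: List[Segment] = []
    current = [0]
    opened_by_uncertain_cut = False
    for i, p in enumerate(p_continuity):
        if p >= CONTINUITY_MIN:
            # Continuity: the next page joins the current segment.
            current.append(i + 1)
            continue
        # Otherwise we cut. A cut that is not a clear rupture is an
        # over-segmentation, so the pieces on both sides are fragments.
        uncertain = p > RUPTURE_MAX
        kind = "fragment" if (opened_by_uncertain_cut or uncertain) else "document"
        segments.append(Segment(current, kind))
        current = [i + 1]
        opened_by_uncertain_cut = uncertain
    last_kind = "fragment" if opened_by_uncertain_cut else "document"
    segments.append(Segment(current, last_kind))
    return segments


if __name__ == "__main__":
    for seg in segment_stream([0.9, 0.1, 0.5, 0.95, 0.05]):
        print(seg.kind, seg.pages)
```

In the framework described above, both kinds of segment would then go to the verification module; that step, the correction module and the incremental learning are not modelled in this sketch.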
Url:
DOI: 10.1109/ICPR.2014.176
Links toward previous steps (curation, corpus...)
- to stream Hal, to step Corpus: 003460 (to go to this record in the Curation step)
Links to Exploration step
- Hal:hal-01254785
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Multipage Administrative Document Stream Segmentation</title>
<author><name sortKey="Daher, Hani" sort="Daher, Hani" uniqKey="Daher H" first="Hani" last="Daher">Hani Daher</name>
</author>
<author><name sortKey="Bouguelia, Mohamed Rafik" sort="Bouguelia, Mohamed Rafik" uniqKey="Bouguelia M" first="Mohamed-Rafik" last="Bouguelia">Mohamed-Rafik Bouguelia</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-206042" status="VALID"><orgName>Recognition of writing and analysis of documents</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau Rocquencourt - BP 105 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Abdel, Belaid" sort="Abdel, Belaid" uniqKey="Abdel B" first="Belaïd" last="Abdel">Belaïd Abdel</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-206042" status="VALID"><orgName>Recognition of writing and analysis of documents</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau Rocquencourt - BP 105 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Poulain D Andecy, Vincent" sort="Poulain D Andecy, Vincent" uniqKey="Poulain D Andecy V" first="Vincent" last="Poulain D'Andecy">Vincent Poulain D'Andecy</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23810" status="VALID"><orgName>Itesoft R&amp;D</orgName>
<desc><address><addrLine>Parc d'Andron - Le Sequoïa 30470 Aimargues</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.itesoft.fr</ref>
</desc>
<listRelation><relation active="#struct-365824" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-365824" type="direct"><org type="institution" xml:id="struct-365824" status="INCOMING"><orgName>ITESOFT</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01254785</idno>
<idno type="halId">hal-01254785</idno>
<idno type="halUri">https://hal.inria.fr/hal-01254785</idno>
<idno type="url">https://hal.inria.fr/hal-01254785</idno>
<idno type="doi">10.1109/ICPR.2014.176</idno>
<date when="2014-08-24">2014-08-24</date>
<idno type="wicri:Area/Hal/Corpus">003460</idno>
<idno type="wicri:Area/Hal/Curation">003460</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Multipage Administrative Document Stream Segmentation</title>
<author><name sortKey="Daher, Hani" sort="Daher, Hani" uniqKey="Daher H" first="Hani" last="Daher">Hani Daher</name>
</author>
<author><name sortKey="Bouguelia, Mohamed Rafik" sort="Bouguelia, Mohamed Rafik" uniqKey="Bouguelia M" first="Mohamed-Rafik" last="Bouguelia">Mohamed-Rafik Bouguelia</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-206042" status="VALID"><orgName>Recognition of writing and analysis of documents</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau Rocquencourt - BP 105 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Abdel, Belaid" sort="Abdel, Belaid" uniqKey="Abdel B" first="Belaïd" last="Abdel">Belaïd Abdel</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-206042" status="VALID"><orgName>Recognition of writing and analysis of documents</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau Rocquencourt - BP 105 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Poulain D Andecy, Vincent" sort="Poulain D Andecy, Vincent" uniqKey="Poulain D Andecy V" first="Vincent" last="Poulain D'Andecy">Vincent Poulain D'Andecy</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-23810" status="VALID"><orgName>Itesoft R&amp;D</orgName>
<desc><address><addrLine>Parc d'Andron - Le Sequoïa 30470 Aimargues</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.itesoft.fr</ref>
</desc>
<listRelation><relation active="#struct-365824" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-365824" type="direct"><org type="institution" xml:id="struct-365824" status="INCOMING"><orgName>ITESOFT</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
<idno type="DOI">10.1109/ICPR.2014.176</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture denotes a clear break, and thus probably marks the end of a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.</div>
</front>
</TEI>
<hal api="V3"><titleStmt><title xml:lang="en">Multipage Administrative Document Stream Segmentation</title>
<author role="aut"><persName><forename type="first">Hani</forename>
<surname>Daher</surname>
</persName>
<email>hani.daher@loria.fr</email>
<idno type="halauthor">964569</idno>
</author>
<author role="aut"><persName><forename type="first">Mohamed-Rafik</forename>
<surname>Bouguelia</surname>
</persName>
<email>mohamed.bouguelia@loria.fr</email>
<idno type="halauthor">1074926</idno>
<affiliation ref="#struct-206042"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Belaïd</forename>
<surname>Abdel</surname>
</persName>
<email>abelaid@loria.fr</email>
<ptr type="url" target="http://www.loria.fr/~abelaid/"></ptr>
<idno type="halauthor">964570</idno>
<affiliation ref="#struct-206042"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Vincent</forename>
<surname>Poulain D'Andecy</surname>
</persName>
<email></email>
<idno type="halauthor">671734</idno>
<affiliation ref="#struct-23810"></affiliation>
</author>
<editor role="depositor"><persName><forename>Abdel</forename>
<surname>Belaid</surname>
</persName>
<email>abelaid@loria.fr</email>
</editor>
</titleStmt>
<editionStmt><edition n="v1" type="current"><date type="whenSubmitted">2016-01-12 16:38:56</date>
<date type="whenModified">2016-02-01 13:35:57</date>
<date type="whenReleased">2016-01-12 16:38:56</date>
<date type="whenProduced">2014-08-24</date>
</edition>
<respStmt><resp>contributor</resp>
<name key="113588"><persName><forename>Abdel</forename>
<surname>Belaid</surname>
</persName>
<email>abelaid@loria.fr</email>
</name>
</respStmt>
</editionStmt>
<publicationStmt><distributor>CCSD</distributor>
<idno type="halId">hal-01254785</idno>
<idno type="halUri">https://hal.inria.fr/hal-01254785</idno>
<idno type="halBibtex">daher:hal-01254785</idno>
<idno type="halRefHtml">ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp.966 - 971 &lt;10.1109/ICPR.2014.176&gt;</idno>
<idno type="halRef">ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp.966 - 971 &lt;10.1109/ICPR.2014.176&gt;</idno>
</publicationStmt>
<seriesStmt><idno type="stamp" n="CNRS">CNRS - Centre national de la recherche scientifique</idno>
<idno type="stamp" n="LORIA-TALC" p="LORIA">Traitement automatique des langues et des connaissances</idno>
<idno type="stamp" n="LORIA2">Publications du LORIA</idno>
<idno type="stamp" n="INRIA">INRIA - Institut National de Recherche en Informatique et en Automatique</idno>
<idno type="stamp" n="UNIV-LORRAINE">Université de Lorraine</idno>
<idno type="stamp" n="LORIA">LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications</idno>
</seriesStmt>
<notesStmt><note type="audience" n="2">International</note>
<note type="invited" n="0">No</note>
<note type="popular" n="0">No</note>
<note type="peer" n="1">Yes</note>
<note type="proceedings" n="1">Yes</note>
</notesStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Multipage Administrative Document Stream Segmentation</title>
<author role="aut"><persName><forename type="first">Hani</forename>
<surname>Daher</surname>
</persName>
<email>hani.daher@loria.fr</email>
<idno type="halAuthorId">964569</idno>
</author>
<author role="aut"><persName><forename type="first">Mohamed-Rafik</forename>
<surname>Bouguelia</surname>
</persName>
<email>mohamed.bouguelia@loria.fr</email>
<idno type="halAuthorId">1074926</idno>
<affiliation ref="#struct-206042"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Belaïd</forename>
<surname>Abdel</surname>
</persName>
<email>abelaid@loria.fr</email>
<ptr type="url" target="http://www.loria.fr/~abelaid/"></ptr>
<idno type="halAuthorId">964570</idno>
<affiliation ref="#struct-206042"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Vincent</forename>
<surname>Poulain D'Andecy</surname>
</persName>
<idno type="halAuthorId">671734</idno>
<affiliation ref="#struct-23810"></affiliation>
</author>
</analytic>
<monogr><meeting><title>ICPR 2014 - 22nd International Conference on Pattern Recognition</title>
<date type="start">2014-08-24</date>
<date type="end">2014-08-28</date>
<settlement>Stockholm</settlement>
<country key="SE">Sweden</country>
</meeting>
<imprint><biblScope unit="pp">966 - 971 </biblScope>
</imprint>
</monogr>
<idno type="doi">10.1109/ICPR.2014.176</idno>
</biblStruct>
</sourceDesc>
<profileDesc><langUsage><language ident="en">English</language>
</langUsage>
<textClass><classCode scheme="halDomain" n="info">Computer Science [cs]</classCode>
<classCode scheme="halTypology" n="COMM">Conference papers</classCode>
</textClass>
<abstract xml:lang="en">We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either continuity or rupture. A rupture denotes a clear break, and thus probably marks the end of a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e. a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.</abstract>
</profileDesc>
</hal>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Hal/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003460 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Hal/Curation/biblio.hfd -nk 003460 | SxmlIndent | more
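Once a record has been extracted with one of the commands above, its main fields can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a small, hand-simplified excerpt of the record; the `SAMPLE` string and the `summarize` helper are illustrative assumptions, not part of Dilib. The full record as shown on this page uses the `wicri:` and `hal:` prefixes without declaring them, so a strict parser would likely need those namespace declarations added before it accepts the complete document.

```python
import xml.etree.ElementTree as ET

# Hand-simplified excerpt of the record (assumption: prefixed elements and
# attributes have been left out so that the fragment parses on its own).
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Multipage Administrative Document Stream Segmentation</title>
<author><name first="Hani" last="Daher">Hani Daher</name></author>
<author><name first="Mohamed-Rafik" last="Bouguelia">Mohamed-Rafik Bouguelia</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1109/ICPR.2014.176</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def summarize(record_xml: str) -> dict:
    """Return the title, author names and DOI found in a record string."""
    root = ET.fromstring(record_xml)
    stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = stmt.findtext("title")
    authors = [name.text for name in stmt.findall("author/name")]
    # ElementTree's limited XPath supports attribute predicates like this one.
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```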
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Hal |étape= Curation |type= RBID |clé= Hal:hal-01254785 |texte= Multipage Administrative Document Stream Segmentation }}
This area was generated with Dilib version V0.6.33.