Recognition of Table of Contents for Electronic Library
Internal identifier: 000158 (Hal/Checkpoint); previous: 000157; next: 000159
Authors: Abdel Belaïd [France]; Nabil Murshed
Source:
French descriptors
- mix:
Abstract
A labeling approach for automatic recognition of Tables of Contents (ToC) is described in this paper. A prototype is used for electronic consulting of scientific papers in a digital library system named Calliope. This method operates on a roughly structured ASCII file, produced by OCR. The recognition approach operates by text labeling without using any a priori model. Labeling is based on a Part of Speech Tagging (PoS) which is initiated by a primary labeling of text component using some specific dictionaries.
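As a rough illustration of what a dictionary-driven primary labeling pass might look like, the sketch below assigns a coarse label to each token of a ToC line. It is not the authors' implementation: the dictionaries (`FIRST_NAMES`, `STOP_WORDS`), the label set, and the function names are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical mini-dictionaries; the dictionaries used in the paper are not given here.
FIRST_NAMES = {"abdel", "nabil"}
STOP_WORDS = {"of", "for", "the", "a", "and"}


def primary_label(token):
    """Assign a coarse label to one token using pattern and dictionary lookups."""
    lowered = token.lower().strip(".,;:")
    if re.fullmatch(r"\d+", lowered):
        return "NUM"        # e.g. a page number
    if lowered in FIRST_NAMES:
        return "FIRSTNAME"  # dictionary hit, likely part of an author field
    if lowered in STOP_WORDS:
        return "STOP"       # function word, likely inside a title
    if token[:1].isupper():
        return "CAP"        # capitalized word not found in any dictionary
    return "WORD"


def label_line(line):
    """Label every whitespace-separated token of one ToC line."""
    return [(tok, primary_label(tok)) for tok in line.split()]
```

A later tagging stage could then refine these coarse labels from context, for example by deciding that a `CAP` token following a `FIRSTNAME` token is a surname.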
Links to Exploration step
Hal:inria-00099147
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Recognition of Table of Contents for Electronic Library</title>
<author><name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-2362" status="OLD"><orgName>READ</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-160" type="direct"><org type="laboratory" xml:id="struct-160" status="OLD"><orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect"><org type="institution" xml:id="struct-300291" status="OLD"><orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc><address><addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect"><org type="institution" xml:id="struct-300292" status="OLD"><orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc><address><addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect"><org type="institution" xml:id="struct-300293" status="OLD"><orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Lorraine</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName><settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Lorraine</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Murshed, Nabil" sort="Murshed, Nabil" uniqKey="Murshed N" first="Nabil" last="Murshed">Nabil Murshed</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:inria-00099147</idno>
<idno type="halId">inria-00099147</idno>
<idno type="halUri">https://hal.inria.fr/inria-00099147</idno>
<idno type="url">https://hal.inria.fr/inria-00099147</idno>
<date when="2000">2000</date>
<idno type="wicri:Area/Hal/Corpus">000102</idno>
<idno type="wicri:Area/Hal/Curation">000102</idno>
<idno type="wicri:Area/Hal/Checkpoint">000158</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Recognition of Table of Contents for Electronic Library</title>
<author><name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-2362" status="OLD"><orgName>READ</orgName>
<orgName type="acronym">READ</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
<listRelation><relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-160" type="direct"><org type="laboratory" xml:id="struct-160" status="OLD"><orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect"><org type="institution" xml:id="struct-300291" status="OLD"><orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc><address><addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect"><org type="institution" xml:id="struct-300292" status="OLD"><orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc><address><addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect"><org type="institution" xml:id="struct-300293" status="OLD"><orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Lorraine</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName><settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Lorraine</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author><name sortKey="Murshed, Nabil" sort="Murshed, Nabil" uniqKey="Murshed N" first="Nabil" last="Murshed">Nabil Murshed</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="fr"><term>digital libraries</term>
<term>reconnaissance de documents</term>
<term>structural analysis</term>
<term>tagging</term>
<term>|| étiquetage morpho-syntaxique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A labeling approach for automatic recognition of Tables of Contents (ToC) is described in this paper. A prototype is used for electronic consulting of scientific papers in a digital library system named Calliope. This method operates on a roughly structured ASCII file, produced by OCR. The recognition approach operates by text labeling without using any a priori model. Labeling is based on a Part of Speech Tagging (PoS) which is initiated by a primary labeling of text component using some specific dictionaries.</div>
</front>
</TEI>
<hal api="V3"><titleStmt><title xml:lang="en">Recognition of Table of Contents for Electronic Library</title>
<author role="aut"><persName><forename type="first">Abdel</forename>
<surname>Belaïd</surname>
</persName>
<email></email>
<idno type="halauthor">129619</idno>
<orgName ref="#struct-441569"></orgName>
<affiliation ref="#struct-2362"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Nabil</forename>
<surname>Murshed</surname>
</persName>
<email></email>
<idno type="halauthor">130329</idno>
<orgName ref="#struct-364660"></orgName>
</author>
<editor role="depositor"><persName><forename>Publications</forename>
<surname>Loria</surname>
</persName>
<email>publications@loria.fr</email>
</editor>
</titleStmt>
<editionStmt><edition n="v1" type="current"><date type="whenSubmitted">2006-09-26 08:51:21</date>
<date type="whenModified">2016-05-19 01:05:19</date>
<date type="whenReleased">2006-09-28 15:22:46</date>
<date type="whenProduced">2000</date>
</edition>
<respStmt><resp>contributor</resp>
<name key="108626"><persName><forename>Publications</forename>
<surname>Loria</surname>
</persName>
<email>publications@loria.fr</email>
</name>
</respStmt>
</editionStmt>
<publicationStmt><distributor>CCSD</distributor>
<idno type="halId">inria-00099147</idno>
<idno type="halUri">https://hal.inria.fr/inria-00099147</idno>
<idno type="halBibtex">belaid:inria-00099147</idno>
<idno type="halRefHtml">4th International Workshop on Document Analysis Systems - DAS'2000, 2000, Rio de Janeiro, Brésil, 28 p, 2000</idno>
<idno type="halRef">4th International Workshop on Document Analysis Systems - DAS'2000, 2000, Rio de Janeiro, Brésil, 28 p, 2000</idno>
</publicationStmt>
<seriesStmt><idno type="stamp" n="INRIA">INRIA - Institut National de Recherche en Informatique et en Automatique</idno>
<idno type="stamp" n="CNRS">CNRS - Centre national de la recherche scientifique</idno>
<idno type="stamp" n="LORIA2">Publications du LORIA</idno>
<idno type="stamp" n="LABO-LORIA-SET" p="LORIA">LABO-LORIA-SET</idno>
<idno type="stamp" n="LORIA">LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications</idno>
<idno type="stamp" n="LORIA-TALC" p="LORIA">Traitement automatique des langues et des connaissances</idno>
<idno type="stamp" n="UNIV-LORRAINE">Université de Lorraine</idno>
<idno type="stamp" n="INPL">Institut National Polytechnique de Lorraine</idno>
</seriesStmt>
<notesStmt><note type="commentary">Conference with proceedings and peer review. International.</note>
<note type="audience" n="2">International</note>
<note type="invited" n="0">No</note>
<note type="popular" n="0">No</note>
<note type="peer" n="1">Yes</note>
<note type="proceedings" n="1">Yes</note>
</notesStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Recognition of Table of Contents for Electronic Library</title>
<author role="aut"><persName><forename type="first">Abdel</forename>
<surname>Belaïd</surname>
</persName>
<idno type="halAuthorId">129619</idno>
<orgName ref="#struct-441569"></orgName>
<affiliation ref="#struct-2362"></affiliation>
</author>
<author role="aut"><persName><forename type="first">Nabil</forename>
<surname>Murshed</surname>
</persName>
<idno type="halAuthorId">130329</idno>
<orgName ref="#struct-364660"></orgName>
</author>
</analytic>
<monogr><idno type="localRef">A00-R-283 || belaïd00b</idno>
<meeting><title>4th International Workshop on Document Analysis Systems - DAS'2000</title>
<date type="start">2000</date>
<settlement>Rio de Janeiro, Brésil</settlement>
</meeting>
<imprint><biblScope unit="pp">28 p</biblScope>
<date type="datePub">2000</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
<profileDesc><langUsage><language ident="en">English</language>
</langUsage>
<textClass><keywords scheme="author"><term xml:lang="fr">digital libraries</term>
<term xml:lang="fr">tagging</term>
<term xml:lang="fr">structural analysis</term>
<term xml:lang="fr">|| étiquetage morpho-syntaxique</term>
<term xml:lang="fr">reconnaissance de documents</term>
</keywords>
<classCode scheme="halDomain" n="info.info-oh">Computer Science [cs]/Other [cs.OH]</classCode>
<classCode scheme="halTypology" n="COMM">Conference papers</classCode>
</textClass>
<abstract xml:lang="en">A labeling approach for automatic recognition of Tables of Contents (ToC) is described in this paper. A prototype is used for electronic consulting of scientific papers in a digital library system named Calliope. This method operates on a roughly structured ASCII file, produced by OCR. The recognition approach operates by text labeling without using any a priori model. Labeling is based on a Part of Speech Tagging (PoS) which is initiated by a primary labeling of text component using some specific dictionaries.</abstract>
</profileDesc>
</hal>
</record>
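For readers who want to pull a few fields out of a record like the one above without Dilib, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It works on a reduced excerpt (`SAMPLE_RECORD`) rather than the full record, because the full record uses the `wicri:` and `hal:` prefixes without declaring them, which a strict XML parser rejects as unbound prefixes. The function name `summarize_record` and the choice of fields are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Reduced excerpt of the record above; elements using the undeclared
# wicri: and hal: prefixes are deliberately left out.
SAMPLE_RECORD = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Recognition of Table of Contents for Electronic Library</title>
<author><name>Abdel Belaïd</name></author>
<author><name>Nabil Murshed</name></author>
</titleStmt>
<publicationStmt><idno type="halId">inria-00099147</idno>
<date when="2000">2000</date>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def summarize_record(xml_text):
    """Return the title, author names, HAL identifier, and year of one record."""
    root = ET.fromstring(xml_text)
    date = root.find(".//publicationStmt/date")
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [name.text for name in root.findall(".//titleStmt/author/name")],
        "hal_id": root.findtext(".//publicationStmt/idno[@type='halId']"),
        "year": date.get("when") if date is not None else None,
    }


if __name__ == "__main__":
    print(summarize_record(SAMPLE_RECORD))
```

To apply the same function to the full record, one option is to first wrap it in an element that declares the `wicri` and `hal` prefixes with placeholder namespace URIs; the real URIs are not given on this page.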
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Hal/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000158 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Hal/Checkpoint/biblio.hfd -nk 000158 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Hal |étape= Checkpoint |type= RBID |clé= Hal:inria-00099147 |texte= Recognition of Table of Contents for Electronic Library }}
This area was generated with Dilib version V0.6.32.