Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Automatic document navigation for digital content re-mastering

Internal identifier: 001646 (Main/Exploration); previous: 001645; next: 001647

Authors: XIAOFAN LIN [United States]; Steven Simske [United States]

RBID: Pascal:04-0470730

Abstract

This paper presents a novel method for automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system that automatically constructs navigation links in re-mastered books. We then introduce the core link-building algorithm, which is based on text matching. The method uses a tree-structured dictionary and a directed graph of the table of contents to perform the text matching efficiently. Information fusion further increases the robustness of the algorithm. We discuss experimental results on the MIT Press digital library project and illustrate the key functional features of the system. We also investigate how the quality of the OCR engine affects the linking algorithm. Finally, we point out the analogy between this work and Web link mining.
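The abstract does not spell out the algorithm. As a minimal sketch only, assuming each table-of-contents (TOC) entry is a plain title string and each body page is a list of OCR'd lines, the Python below stores the titles in a word-level trie (one possible reading of "tree-structured dictionary") and applies a simplified ordering rule in place of the directed graph: an entry may not link to a page earlier than the page chosen for the entry before it. The names `build_trie`, `match_line` and `link_toc` are hypothetical, and neither information fusion nor OCR-error tolerance is modeled.

```python
import re
from typing import Dict, List, Optional


def normalize(text: str) -> List[str]:
    # Lowercase and keep only alphanumeric tokens, so punctuation noise is ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entry: Optional[int] = None  # index of the TOC title ending here


def build_trie(titles: List[str]) -> TrieNode:
    # Word-level trie over the TOC titles (a duplicate title keeps the last index).
    root = TrieNode()
    for idx, title in enumerate(titles):
        node = root
        for word in normalize(title):
            node = node.children.setdefault(word, TrieNode())
        node.entry = idx
    return root


def match_line(root: TrieNode, line: str) -> Optional[int]:
    # Skip leading section numbers ("2", "3"), then walk the trie from the
    # start of the line and return the longest complete title found.
    words = normalize(line)
    while words and words[0].isdigit():
        words.pop(0)
    node: Optional[TrieNode] = root
    found: Optional[int] = None
    for word in words:
        node = node.children.get(word)
        if node is None:
            break
        if node.entry is not None:
            found = node.entry
    return found


def link_toc(titles: List[str], pages: List[List[str]]) -> Dict[int, int]:
    # Collect every body page on which each title starts a line.
    root = build_trie(titles)
    candidates: Dict[int, List[int]] = {i: [] for i in range(len(titles))}
    for page_no, lines in enumerate(pages):
        for line in lines:
            idx = match_line(root, line)
            if idx is not None:
                candidates[idx].append(page_no)
    # Simplified order constraint: entry i may not link to a page earlier than
    # the page chosen for the previous linked entry; take the earliest such page.
    links: Dict[int, int] = {}
    last = 0
    for idx in range(len(titles)):
        allowed = [p for p in candidates[idx] if p >= last]
        if allowed:
            links[idx] = allowed[0]
            last = allowed[0]
    return links


titles = ["Introduction", "Text Matching", "Experimental Results"]
pages = [
    ["1 Introduction", "Experimental results are summarized later."],
    ["2 Text Matching", "Titles are looked up in a trie."],
    ["3 Experimental Results", "See Text Matching for details."],
]
print(link_toc(titles, pages))  # {0: 0, 1: 1, 2: 2}
```

In this toy input, "Experimental Results" also matches a line on page 0; the ordering rule rejects that candidate because "Text Matching" was already linked to page 1.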




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Automatic document navigation for digital content re-mastering</title>
<author>
<name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">04-0470730</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0470730 INIST</idno>
<idno type="RBID">Pascal:04-0470730</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000532</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000258</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000501</idno>
<idno type="wicri:doubleKey">1017-2653:2004:Xiaofan Lin:automatic:document:navigation</idno>
<idno type="wicri:Area/Main/Merge">001712</idno>
<idno type="wicri:Area/Main/Curation">001646</idno>
<idno type="wicri:Area/Main/Exploration">001646</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Automatic document navigation for digital content re-mastering</title>
<author>
<name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Digitized document</term>
<term>Digitizing</term>
<term>Electronic book</term>
<term>Information browsing</term>
<term>Remastering</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Algorithme</term>
<term>Navigation information</term>
<term>Livre électronique</term>
<term>Numérisation</term>
<term>Rematriçage</term>
<term>Document numérisé</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matching for building the links. The proposed method utilizes the tree-structured dictionary and directional graph of the table of contents to efficiently conduct the text matching. Information fusion further increases the robustness of the algorithm. The experimental results on the MIT Press digital library project are discussed and the key functional features of the system are illustrated. We have also investigated how the quality of the OCR engine affects the linking algorithm. In addition, the analogy between this work and Web link mining has been pointed out.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Californie">
<name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
</region>
<name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
</country>
</tree>
</affiliations>
</record>
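The record above uses the `inist:` and `wicri:` prefixes without declaring them, so a namespace-aware parser such as Python's `xml.etree.ElementTree` rejects it as-is. A minimal sketch, assuming it is acceptable to wrap the record in an element that binds those prefixes to placeholder URIs, extracts the title, authors, RBID and English keywords from a trimmed copy of the record. The `urn:hypothetical:` URIs and the `parse_record` helper are illustrative only and are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above, keeping the undeclared inist:/wicri: prefixes.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Automatic document navigation for digital content re-mastering</title>
<author>
<name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
</inist:fA14>
<country>États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Pascal:04-0470730</idno>
</publicationStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Digitized document</term>
<term>Digitizing</term>
<term>Electronic book</term>
<term>Information browsing</term>
<term>Remastering</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text: str) -> dict:
    # Bind the undeclared prefixes to placeholder URIs so the parser accepts them.
    wrapped = ('<wrapper xmlns:inist="urn:hypothetical:inist" '
               'xmlns:wicri="urn:hypothetical:wicri">' + xml_text + "</wrapper>")
    root = ET.fromstring(wrapped)
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "rbid": root.findtext(".//idno[@type='RBID']"),
        "keywords_en": [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")],
    }


try:
    ET.fromstring(RECORD)  # expected to raise ParseError (unbound prefix)
except ET.ParseError as err:
    print("raw record rejected:", err)

info = parse_record(RECORD)
print(info["authors"])  # ['XIAOFAN LIN', 'Steven Simske']
```

Wrapping leaves the record's own content untouched; prefixed names simply come back in `{uri}local` form (for example, the `wicri:level` attribute becomes `{urn:hypothetical:wicri}level`).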

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001646 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001646 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:04-0470730
   |texte=   Automatic document navigation for digital content re-mastering
}}
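When many records need such links, the `{{Explor lien}}` call can be generated rather than typed. The sketch below assumes only the parameter names and column layout visible in the template above; `explor_link` is a hypothetical helper, not part of Dilib or the Wicri tooling, and the French parameter names (`étape`, `clé`, `texte`) are kept exactly as they appear on this page.

```python
def explor_link(wiki: str, area: str, flux: str, step: str,
                key_type: str, key: str, text: str) -> str:
    # Hypothetical helper: emits the {{Explor lien}} call with the parameter
    # names used on this page, padding "name=" to 9 characters so the values
    # line up as in the template above.
    params = [("wiki", wiki), ("area", area), ("flux", flux), ("étape", step),
              ("type", key_type), ("clé", key), ("texte", text)]
    lines = [f"   |{name + '=':<9}{value}" for name, value in params]
    return "\n".join(["{{Explor lien", *lines, "}}"])


print(explor_link("Ticri/CIDE", "OcrV1", "Main", "Exploration", "RBID",
                  "Pascal:04-0470730",
                  "Automatic document navigation for digital content re-mastering"))
```

With the values of this record, the output reproduces the template above line for line.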

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024