Automatic document navigation for digital content re-mastering
Internal identifier: 001646 (Main/Exploration); previous: 001645; next: 001647
Authors: Xiaofan Lin [United States]; Steven Simske [United States]
Source:
- SPIE proceedings series [ 1017-2653 ] ; 2004.
French descriptors
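- Pascal (Inist) : Algorithme, Navigation information, Livre électronique, Numérisation, Rematriçage, Document numérisé.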
- Wicri :
- topic : Numérisation.
English descriptors
- KwdEn : Algorithm, Digitized document, Digitizing, Electronic book, Information browsing, Remastering.
Abstract
This paper presents a novel method for automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system that automatically constructs navigation links into re-mastered books. We then introduce the core algorithm, which builds the links by text matching. The proposed method uses a tree-structured dictionary and a directional graph of the table of contents to conduct the text matching efficiently. Information fusion further increases the robustness of the algorithm. Experimental results on the MIT Press digital library project are discussed, and the key functional features of the system are illustrated. We also investigate how the quality of the OCR engine affects the linking algorithm. Finally, we point out the analogy between this work and Web link mining.
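The abstract names the ingredients of the linking algorithm but this record does not give its details. The sketch below is therefore only an illustration of the general idea, not the authors' implementation: table-of-contents titles are stored in a word-level trie (one possible reading of "tree-structured dictionary"), each page is scanned against it, and entries are linked in table-of-contents order so that a later entry never points to an earlier page (a simplified stand-in for the directional graph). All names (`tokenize`, `build_trie`, `find_titles`, `link_toc`, `first_body_page`) and the sample pages are hypothetical, and the sketch uses exact word matching with no tolerance for OCR errors and no information fusion.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_trie(titles):
    """Build a word-level trie over the titles.

    The key "$" (never a valid token) marks the end of a title and holds
    the list of title indices that end at that node.
    """
    root = {}
    for idx, title in enumerate(titles):
        node = root
        for word in tokenize(title):
            node = node.setdefault(word, {})
        node.setdefault("$", []).append(idx)
    return root


def find_titles(trie, page_text):
    """Return the indices of all titles whose word sequence occurs on the page."""
    words = tokenize(page_text)
    found = set()
    for start in range(len(words)):
        node = trie
        for word in words[start:]:
            node = node.get(word)
            if node is None:
                break
            if "$" in node:
                found.update(node["$"])
    return found


def link_toc(titles, pages, first_body_page=0):
    """Map each title index to the first page containing it, in TOC order.

    Assumption: entries appear in the book in the same order as in the
    table of contents, so the search for each entry starts at the page
    linked to the previous one. Titles that are never found are skipped.
    """
    trie = build_trie(titles)
    hits = [find_titles(trie, page) for page in pages]
    links = {}
    current = first_body_page
    for idx in range(len(titles)):
        for page_no in range(current, len(pages)):
            if idx in hits[page_no]:
                links[idx] = page_no
                current = page_no
                break
    return links


# Hypothetical sample data: page 0 is the table of contents itself.
titles = ["Introduction", "Text Matching", "Experimental Results"]
pages = [
    "Contents Introduction 1 Text Matching 5 Experimental Results 9",
    "1 Introduction Re-mastered books need navigation links.",
    "2 Text Matching The core algorithm matches titles against page text.",
    "3 Experimental Results Linking accuracy is reported per book.",
]
print(link_toc(titles, pages, first_body_page=1))  # {0: 1, 1: 2, 2: 3}
```

Starting the search at `first_body_page` keeps the table-of-contents page, which contains every title, from capturing all the links. A title phrase that also occurs in running text on an earlier page would still produce a false link in this sketch; the robustness measures mentioned in the abstract are presumably aimed at cases like that.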
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000532
- to stream PascalFrancis, to step Curation: 000258
- to stream PascalFrancis, to step Checkpoint: 000501
- to stream Main, to step Merge: 001712
- to stream Main, to step Curation: 001646
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Automatic document navigation for digital content re-mastering</title>
<author><name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">04-0470730</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0470730 INIST</idno>
<idno type="RBID">Pascal:04-0470730</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000532</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000258</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000501</idno>
<idno type="wicri:doubleKey">1017-2653:2004:Xiaofan Lin:automatic:document:navigation</idno>
<idno type="wicri:Area/Main/Merge">001712</idno>
<idno type="wicri:Area/Main/Curation">001646</idno>
<idno type="wicri:Area/Main/Exploration">001646</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Automatic document navigation for digital content re-mastering</title>
<author><name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Hewlett-Packard Laboratories, 1501 Page Mill Road, MS 1203</s1>
<s2>Palo Alto, CA 94304</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithm</term>
<term>Digitized document</term>
<term>Digitizing</term>
<term>Electronic book</term>
<term>Information browsing</term>
<term>Remastering</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Algorithme</term>
<term>Navigation information</term>
<term>Livre électronique</term>
<term>Numérisation</term>
<term>Rematriçage</term>
<term>Document numérisé</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matching for building the links. The proposed method utilizes the tree-structured dictionary and directional graph of the table of contents to efficiently conduct the text matching. Information fusion further increases the robustness of the algorithm. The experimental results on the MIT Press digital library project are discussed and the key functional features of the system are illustrated. We have also investigated how the quality of the OCR engine affects the linking algorithm. In addition, the analogy between this work and Web link mining has been pointed out.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Xiaofan Lin" sort="Xiaofan Lin" uniqKey="Xiaofan Lin" last="Xiaofan Lin">XIAOFAN LIN</name>
</region>
<name sortKey="Simske, Steven" sort="Simske, Steven" uniqKey="Simske S" first="Steven" last="Simske">Steven Simske</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001646 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001646 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:04-0470730 |texte= Automatic document navigation for digital content re-mastering }}
This area was generated with Dilib version V0.6.32.