OCR Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Document Reverse Engineering: From Paper to XML

Internal identifier: 001993 (Main/Merge); previous: 001992; next: 001994

Authors: Kyong-Ho Lee [United States]; Yoon-Chul Choy [South Korea]; Sung-Bae Cho [South Korea]; Xiao Tang [United States]; Victor Mccrary [United States]

RBID : ISTEX:09C513E24DF93766F77EC8FA412D79E76CCDC8F0

Abstract

Since XML embeds logical structure information in documents, it is widely used as a universal format for structured documents on the Web. This makes it attractive to automatically convert paper-based documents with a logical hierarchy into XML representations. Document image analysis and understanding [1] consists of two phases: geometric and logical structure analysis. Because the two phases take different kinds of data as input, it may not be desirable to apply the same method to both. Targeting multi-page technical journal documents, we present a hybrid of knowledge-based and syntactic methods for geometric and logical structure analysis of document images.

DOI: 10.1007/3-540-45869-7_53

Links to Exploration step

ISTEX:09C513E24DF93766F77EC8FA412D79E76CCDC8F0

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Document Reverse Engineering: From Paper to XML</title>
<author>
<name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong-Ho" last="Lee">Kyong-Ho Lee</name>
</author>
<author>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon-Chul" last="Choy">Yoon-Chul Choy</name>
</author>
<author>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
</author>
<author>
<name sortKey="Tang, Xiao" sort="Tang, Xiao" uniqKey="Tang X" first="Xiao" last="Tang">Xiao Tang</name>
</author>
<author>
<name sortKey="Mccrary, Victor" sort="Mccrary, Victor" uniqKey="Mccrary V" first="Victor" last="Mccrary">Victor Mccrary</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:09C513E24DF93766F77EC8FA412D79E76CCDC8F0</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45869-7_53</idno>
<idno type="url">https://api.istex.fr/document/09C513E24DF93766F77EC8FA412D79E76CCDC8F0/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003B19</idno>
<idno type="wicri:Area/Istex/Curation">003853</idno>
<idno type="wicri:Area/Istex/Checkpoint">001027</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Lee K:document:reverse:engineering</idno>
<idno type="wicri:Area/Main/Merge">001993</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Document Reverse Engineering: From Paper to XML</title>
<author>
<name sortKey="Lee, Kyong Ho" sort="Lee, Kyong Ho" uniqKey="Lee K" first="Kyong-Ho" last="Lee">Kyong-Ho Lee</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Institute of Standards and Technology, 100 Bureau Drive, 20889, Gaithersburg, MD</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: kyongho@nist.gov</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Choy, Yoon Chul" sort="Choy, Yoon Chul" uniqKey="Choy Y" first="Yoon-Chul" last="Choy">Yoon-Chul Choy</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. Computer Science, Yonsei Univ., 134 Shinchon-dong, 120-749, Seodaemun-ku, Seoul</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Cho, Sung Bae" sort="Cho, Sung Bae" uniqKey="Cho S" first="Sung-Bae" last="Cho">Sung-Bae Cho</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Dept. Computer Science, Yonsei Univ., 134 Shinchon-dong, 120-749, Seodaemun-ku, Seoul</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Tang, Xiao" sort="Tang, Xiao" uniqKey="Tang X" first="Xiao" last="Tang">Xiao Tang</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Institute of Standards and Technology, 100 Bureau Drive, 20889, Gaithersburg, MD</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: xiao.tang@nist.gov</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Mccrary, Victor" sort="Mccrary, Victor" uniqKey="Mccrary V" first="Victor" last="Mccrary">Victor Mccrary</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Institute of Standards and Technology, 100 Bureau Drive, 20889, Gaithersburg, MD</wicri:regionArea>
<placeName>
<region type="state">Maryland</region>
</placeName>
</affiliation>
<affiliation>
<wicri:noCountry code="no comma">E-mail: victor.mccrary@nist.gov</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">09C513E24DF93766F77EC8FA412D79E76CCDC8F0</idno>
<idno type="DOI">10.1007/3-540-45869-7_53</idno>
<idno type="ChapterID">53</idno>
<idno type="ChapterID">Chap53</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Since XML embeds logical structure information in documents, it is widely used as a universal format for structured documents on the Web. This makes it attractive to automatically convert paper-based documents with a logical hierarchy into XML representations. Document image analysis and understanding [1] consists of two phases: geometric and logical structure analysis. Because the two phases take different kinds of data as input, it may not be desirable to apply the same method to both. Targeting multi-page technical journal documents, we present a hybrid of knowledge-based and syntactic methods for geometric and logical structure analysis of document images.</div>
</front>
</TEI>
</record>
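
The record above can be read with a general-purpose XML parser. The following is a minimal sketch in Python, not part of Dilib. It assumes the record is available as a string without an XML declaration. Because the record uses the `wicri:` prefix without declaring it, the sketch wraps the record in a hypothetical `wrapper` element that binds the prefix to a placeholder namespace URI (`urn:example:wicri`, an assumption made only so that the parser accepts the input). The function name `parse_record` and the trimmed `SAMPLE` record are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder URI (assumption): the record never declares the wicri: prefix.
WICRI_NS = "urn:example:wicri"

# Trimmed copy of the record above, kept short for illustration.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Document Reverse Engineering: From Paper to XML</title>
<author><name sortKey="Lee, Kyong Ho">Kyong-Ho Lee</name></author>
<author><name sortKey="Choy, Yoon Chul">Yoon-Chul Choy</name></author>
</titleStmt>
<publicationStmt>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45869-7_53</idno>
</publicationStmt>
</fileDesc></teiHeader>
</TEI>
</record>"""


def parse_record(record_xml: str) -> dict:
    """Extract title, authors, year and DOI from a <record> string."""
    # Bind the undeclared wicri: prefix so the parser does not reject it.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    root = ET.fromstring(wrapped)
    title_stmt = root.find(".//titleStmt")
    pub_stmt = root.find(".//publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "year": pub_stmt.find("date").get("when"),
        "doi": pub_stmt.findtext("idno[@type='doi']"),
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

Reading only `titleStmt` and `publicationStmt` avoids the repeated author entries under `sourceDesc`, which carry the same names together with affiliation details.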

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001993 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001993 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:09C513E24DF93766F77EC8FA412D79E76CCDC8F0
   |texte=   Document Reverse Engineering: From Paper to XML
}}
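
When links to many records are needed, the template call above can be generated instead of typed by hand. The sketch below is an illustration in Python, not a Wicri or Dilib tool. The function name `explor_link` and its argument names are hypothetical; only the template name and its parameter names (`wiki`, `area`, `flux`, `étape`, `type`, `clé`, `texte`) come from the example above. The column padding simply mimics that example's layout.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Build an {{Explor lien}} template call as wiki text."""
    # Parameter names are copied from the template example on this page.
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "name=" to 9 characters so the values line up in one column.
        lines.append(f"   |{name + '=':<9}{value}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(explor_link(
        wiki="Ticri/CIDE",
        area="OcrV1",
        flux="Main",
        step="Merge",
        key_type="RBID",
        key="ISTEX:09C513E24DF93766F77EC8FA412D79E76CCDC8F0",
        text="Document Reverse Engineering: From Paper to XML",
    ))
```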

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024