OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Automatic document processing: A survey

Internal identifier: 002840 (Main/Merge); previous: 002839; next: 002841


Authors: Yuan Y. Tang [Hong Kong]; Seong-Whan Lee [South Korea]; Ching Y. Suen [South Korea]

Source: Pattern Recognition (ISSN 0031-3203), vol. 29, no. 12, pp. 1931-1952, 1996

RBID : ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415

Abstract

Surveys of the basic concepts and underlying techniques are presented in this paper. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers to document analysis; mapping the geometric structure into logical structure deals with document understanding. Both types of document structures and the two areas of document processing are discussed. Two categories of methods have been used in document analysis, namely, (1) hierarchical methods including top-down and bottom-up approaches, (2) non-hierarchical methods including modified fractal signature. Tree transform, formatting knowledge and description language approaches have been used in document understanding. A particular case of form document processing is discussed. Form description and form registration approaches are presented. A form processing system is also introduced. Finally, many techniques, such as skew detection, Hough transform, Gabor filters, projection, crossing counts, form definition language, etc., which have been used in these approaches are discussed.

Url: https://api.istex.fr/document/481F02E8D4008B6F673C509E6D43CC955535D415/fulltext/pdf
DOI: 10.1016/S0031-3203(96)00044-1


Links to Exploration step

ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Automatic document processing: A survey</title>
<author>
<name sortKey="Tang, Yuan Y" sort="Tang, Yuan Y" uniqKey="Tang Y" first="Yuan Y." last="Tang">Yuan Y. Tang</name>
</author>
<author>
<name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
</author>
<author>
<name sortKey="Suen, Ching Y" sort="Suen, Ching Y" uniqKey="Suen C" first="Ching Y." last="Suen">Ching Y. Suen</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415</idno>
<date when="1996" year="1996">1996</date>
<idno type="doi">10.1016/S0031-3203(96)00044-1</idno>
<idno type="url">https://api.istex.fr/document/481F02E8D4008B6F673C509E6D43CC955535D415/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B82</idno>
<idno type="wicri:Area/Istex/Curation">000B67</idno>
<idno type="wicri:Area/Istex/Checkpoint">001B07</idno>
<idno type="wicri:doubleKey">0031-3203:1996:Tang Y:automatic:document:processing</idno>
<idno type="wicri:Area/Main/Merge">002840</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Automatic document processing: A survey</title>
<author>
<name sortKey="Tang, Yuan Y" sort="Tang, Yuan Y" uniqKey="Tang Y" first="Yuan Y." last="Tang">Yuan Y. Tang</name>
<affiliation wicri:level="1">
<country wicri:rule="url">Hong Kong</country>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Computing Studies, Hong Kong Baptist University, Kowloon Tong, Kowloon</wicri:regionArea>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Korea University, 1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136–701</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Suen, Ching Y" sort="Suen, Ching Y" uniqKey="Suen C" first="Ching Y." last="Suen">Ching Y. Suen</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Korea University, 1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136–701</wicri:regionArea>
<placeName>
<settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">29</biblScope>
<biblScope unit="issue">12</biblScope>
<biblScope unit="page" from="1931">1931</biblScope>
<biblScope unit="page" to="1952">1952</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">481F02E8D4008B6F673C509E6D43CC955535D415</idno>
<idno type="DOI">10.1016/S0031-3203(96)00044-1</idno>
<idno type="PII">S0031-3203(96)00044-1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Surveys of the basic concepts and underlying techniques are presented in this paper. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers to document analysis; mapping the geometric structure into logical structure deals with document understanding. Both types of document structures and the two areas of document processing are discussed. Two categories of methods have been used in document analysis, namely, (1) hierarchical methods including top-down and bottom-up approaches, (2) non-hierarchical methods including modified fractal signature. Tree transform, formatting knowledge and description language approaches have been used in document understanding. A particular case of form document processing is discussed. Form description and form registration approaches are presented. A form processing system is also introduced. Finally, many techniques, such as skew detection, Hough transform, Gabor filters, projection, crossing counts, form definition language, etc., which have been used in these approaches are discussed.</div>
</front>
</TEI>
</record>
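
To read this record outside Dilib

The record above uses the `wicri:` prefix without declaring a namespace for it, so a strict XML parser will likely reject it as it stands. The following is a minimal Python sketch, not part of Dilib or Wicri, that wraps the record in an element binding `wicri:` to a placeholder URI and then pulls out a few bibliographic fields. The namespace URI `urn:example:wicri`, the function name `parse_record`, and the abridged `SAMPLE` excerpt are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URI: the record never declares the "wicri:" prefix,
# so one is bound here purely to make the text parseable.
WICRI_NS = "urn:example:wicri"

# Abridged excerpt of the record shown above (illustration only).
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<biblStruct>
<analytic>
<title level="a">Automatic document processing: A survey</title>
<author>
<name sortKey="Tang, Yuan Y">Yuan Y. Tang</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Hong Kong</country>
</affiliation>
</author>
<author>
<name sortKey="Lee, Seong Whan">Seong-Whan Lee</name>
</author>
</analytic>
<series>
<title level="j">Pattern Recognition</title>
<imprint>
<biblScope unit="page" from="1931">1931</biblScope>
<biblScope unit="page" to="1952">1952</biblScope>
</imprint>
</series>
<idno type="DOI">10.1016/S0031-3203(96)00044-1</idno>
</biblStruct>
</TEI>
</record>"""


def parse_record(xml_text):
    """Extract title, authors, journal, DOI and page range from a record."""
    # Wrap the record so that the "wicri:" prefix is bound to a namespace.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{xml_text}</wrapper>'
    root = ET.fromstring(wrapped)
    bibl = root.find(".//biblStruct")

    # The page range is split across two biblScope elements (from / to).
    pages = {}
    for scope in bibl.iter("biblScope"):
        if scope.get("unit") == "page":
            for key in ("from", "to"):
                if scope.get(key):
                    pages[key] = scope.get(key)

    return {
        "title": bibl.findtext("analytic/title"),
        "authors": [name.text for name in bibl.findall("analytic/author/name")],
        "journal": bibl.findtext("series/title"),
        "doi": bibl.findtext("idno[@type='DOI']"),
        "pages": pages,
    }


record = parse_record(SAMPLE)
print(record["title"], record["doi"])
```

Wrapping the text rather than editing it leaves the record itself untouched. If the real namespace URI for `wicri:` is known, it should replace the placeholder.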

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002840 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002840 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415
   |texte=   Automatic document processing: A survey
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024