Automatic document processing: A survey
Internal identifier: 002840 (Main/Merge); previous: 002839; next: 002841
Authors: Yuan Y. Tang [Hong Kong]; Seong-Whan Lee [South Korea]; Ching Y. Suen [South Korea]
Source:
- Pattern Recognition [0031-3203]; 1996.
Abstract
Surveys of the basic concepts and underlying techniques are presented in this paper. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers to document analysis; mapping the geometric structure into logical structure deals with document understanding. Both types of document structures and the two areas of document processing are discussed. Two categories of methods have been used in document analysis, namely, (1) hierarchical methods including top-down and bottom-up approaches, (2) non-hierarchical methods including modified fractal signature. Tree transform, formatting knowledge and description language approaches have been used in document understanding. A particular case of form document processing is discussed. Form description and form registration approaches are presented. A form processing system is also introduced. Finally, many techniques, such as skew detection, Hough transform, Gabor filters, projection, crossing counts, form definition language, etc., which have been used in these approaches are discussed.
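Two of the techniques named in the abstract, projection and crossing counts, can be illustrated with a minimal sketch. This is not code from the paper: the function names, the list-of-lists image representation (1 = ink, 0 = background), the definition of a crossing as a background-to-ink transition along a row, and the toy `page` are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink pixel, 0 = background


def horizontal_projection(img: Image) -> List[int]:
    """Number of ink pixels in each row (projection onto the vertical axis)."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Number of ink pixels in each column (projection onto the horizontal axis)."""
    return [sum(col) for col in zip(*img)]


def crossing_counts(img: Image) -> List[int]:
    """Per row, count transitions from background (0) to ink (1) along the scan line."""
    counts = []
    for row in img:
        previous = 0
        transitions = 0
        for pixel in row:
            if pixel == 1 and previous == 0:
                transitions += 1
            previous = pixel
        counts.append(transitions)
    return counts


def ink_bands(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Runs of consecutive indices where profile > threshold, as (start, end) with end exclusive."""
    bands = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


# Hypothetical 6x6 binary page with two "text lines" separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

rows = horizontal_projection(page)
print(rows)                   # ink pixels per row
print(ink_bands(rows))        # candidate text-line bands
print(crossing_counts(page))  # background-to-ink transitions per row
```

Splitting a page at the gaps of its projection profiles is one common way a top-down layout analysis can begin; a real system would work on scanned images and would first need to correct skew.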
Url:
DOI: 10.1016/S0031-3203(96)00044-1
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000B82
- to stream Istex, to step Curation: 000B67
- to stream Istex, to step Checkpoint: 001B07
Links to Exploration step
ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Automatic document processing: A survey</title>
<author><name sortKey="Tang, Yuan Y" sort="Tang, Yuan Y" uniqKey="Tang Y" first="Yuan Y." last="Tang">Yuan Y. Tang</name>
</author>
<author><name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
</author>
<author><name sortKey="Suen, Ching Y" sort="Suen, Ching Y" uniqKey="Suen C" first="Ching Y." last="Suen">Ching Y. Suen</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415</idno>
<date when="1996" year="1996">1996</date>
<idno type="doi">10.1016/S0031-3203(96)00044-1</idno>
<idno type="url">https://api.istex.fr/document/481F02E8D4008B6F673C509E6D43CC955535D415/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B82</idno>
<idno type="wicri:Area/Istex/Curation">000B67</idno>
<idno type="wicri:Area/Istex/Checkpoint">001B07</idno>
<idno type="wicri:doubleKey">0031-3203:1996:Tang Y:automatic:document:processing</idno>
<idno type="wicri:Area/Main/Merge">002840</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Automatic document processing: A survey</title>
<author><name sortKey="Tang, Yuan Y" sort="Tang, Yuan Y" uniqKey="Tang Y" first="Yuan Y." last="Tang">Yuan Y. Tang</name>
<affiliation wicri:level="1"><country wicri:rule="url">Hong Kong</country>
</affiliation>
<affiliation wicri:level="1"><country xml:lang="fr">Hong Kong</country>
<wicri:regionArea>Department of Computing Studies, Hong Kong Baptist University, Kowloon Tong, Kowloon</wicri:regionArea>
<wicri:noRegion>Kowloon</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lee, Seong Whan" sort="Lee, Seong Whan" uniqKey="Lee S" first="Seong-Whan" last="Lee">Seong-Whan Lee</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Korea University, 1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136–701</wicri:regionArea>
<placeName><settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Suen, Ching Y" sort="Suen, Ching Y" uniqKey="Suen C" first="Ching Y." last="Suen">Ching Y. Suen</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Korea University, 1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136–701</wicri:regionArea>
<placeName><settlement type="city">Séoul</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">29</biblScope>
<biblScope unit="issue">12</biblScope>
<biblScope unit="page" from="1931">1931</biblScope>
<biblScope unit="page" to="1952">1952</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">481F02E8D4008B6F673C509E6D43CC955535D415</idno>
<idno type="DOI">10.1016/S0031-3203(96)00044-1</idno>
<idno type="PII">S0031-3203(96)00044-1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Surveys of the basic concepts and underlying techniques are presented in this paper. A basic model for document processing is described. In this model, document processing can be divided into two phases: document analysis and document understanding. A document has two structures: geometric (layout) structure and logical structure. Extraction of the geometric structure from a document refers to document analysis; mapping the geometric structure into logical structure deals with document understanding. Both types of document structures and the two areas of document processing are discussed. Two categories of methods have been used in document analysis, namely, (1) hierarchical methods including top-down and bottom-up approaches, (2) non-hierarchical methods including modified fractal signature. Tree transform, formatting knowledge and description language approaches have been used in document understanding. A particular case of form document processing is discussed. Form description and form registration approaches are presented. A form processing system is also introduced. Finally, many techniques, such as skew detection, Hough transform, Gabor filters, projection, crossing counts, form definition language, etc., which have been used in these approaches are discussed.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002840 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002840 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:481F02E8D4008B6F673C509E6D43CC955535D415 |texte= Automatic document processing: A survey }}
This area was generated with Dilib version V0.6.32.