Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Content-lossless document image compression based on structural analysis and pattern matching

Internal identifier: 001D77 (Main/Merge); previous: 001D76; next: 001D78


Authors: Yibing Yang [Australia]; Hong Yan [Australia]; Donggang Yu [Australia]

Source: Pattern Recognition (Elsevier), vol. 33, no. 8, pp. 1277-1293

RBID : ISTEX:319D7164E812A6A66B87554AB4FAC97B8586A393

Abstract

This paper presents a highly efficient content-lossless document image compression scheme. The method consists of three stages. First, the image is analysed and segmented into symbols and position parameters by analysing the relation of the foreground to the background and their connectivity. Second, an initial representative symbol set is extracted from the symbols in the image and matched by direction-based bit-map analysis and matching; the final set of representative and synthetic patterns, with fewer repeated symbols, is then formed from this initial set by multi-stage structural clustering and by representative pattern derivation and synthesis. This final component set is reorganized into a compact library image. Finally, a high compression ratio is achieved by coding the relative positions of symbols and the parameters of representative patterns with adaptive arithmetic coders of different orders, and the library image with the Q-Coder. Our scheme achieves much better compression and smaller error maps than most alternative systems. Its lossiness can be reduced to a very small level through well-defined pattern derivation and synthesis, at some cost in compression ratio. The method guarantees content-lossless reconstruction under our symbol-level content-lossless criterion. It can easily be combined with soft pattern matching to extend it to a lossless mode. In addition, combining the method and its low-redundancy component library with the JBIG1 progressive mode provides content-lossless progressive transmission. The method can also handle various symbolic images, including images with nested symbols such as Chinese characters, by means of symbol segmentation based only on connectivity and position-based bit-map reconstruction.
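The first two stages described in the abstract (segmenting the page into symbols plus positions, then matching symbols against a library of representative patterns) can be illustrated with a minimal Python sketch. It is not the authors' method: symbols are taken to be 8-connected components found by a plain flood fill, and the paper's direction-based bit-map matching and multi-stage structural clustering are replaced by a simple pixel-mismatch (Hamming distance) test against a growing library, an assumed stand-in. The functions `connected_components`, `mismatch`, `build_library` and `reconstruct` and the `max_errors` threshold are hypothetical names introduced for illustration; the relative offsets collected in `refs` are what an entropy coder (not shown) would then compress.

```python
from collections import deque

# Hypothetical sketch, NOT the authors' algorithm: symbols are 8-connected
# components, and a plain pixel-mismatch test stands in for the paper's
# direction-based matching and multi-stage structural clustering.


def connected_components(image):
    """Return (top, left, bitmap) for each 8-connected foreground component."""
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects every pixel of this symbol.
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Crop the symbol to its bounding box (only its own pixels are set).
            top = min(py for py, _ in pixels)
            left = min(px for _, px in pixels)
            height = max(py for py, _ in pixels) - top + 1
            width = max(px for _, px in pixels) - left + 1
            bitmap = [[0] * width for _ in range(height)]
            for py, px in pixels:
                bitmap[py - top][px - left] = 1
            components.append((top, left, tuple(map(tuple, bitmap))))
    return components


def mismatch(a, b):
    """Number of differing pixels, or None when the bitmap sizes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_library(components, max_errors=0):
    """Map each symbol to a library pattern and a position relative to the
    previous symbol; max_errors > 0 merges similar shapes (lossy)."""
    library, refs = [], []
    prev_top = prev_left = 0
    for top, left, bitmap in components:
        index = None
        for i, proto in enumerate(library):
            d = mismatch(proto, bitmap)
            if d is not None and d <= max_errors:
                index = i
                break
        if index is None:  # unseen shape: add it as a new library pattern
            library.append(bitmap)
            index = len(library) - 1
        refs.append((index, top - prev_top, left - prev_left))
        prev_top, prev_left = top, left
    return library, refs


def reconstruct(library, refs, height, width):
    """Paint each referenced library pattern back at its decoded position."""
    image = [[0] * width for _ in range(height)]
    top = left = 0
    for index, dy, dx in refs:
        top, left = top + dy, left + dx
        for r, row in enumerate(library[index]):
            for c, value in enumerate(row):
                if value:
                    image[top + r][left + c] = 1
    return image


page = ["##..##..",
        "##..##..",
        "........",
        "#......#"]
image = [[1 if ch == "#" else 0 for ch in row] for row in page]
library, refs = build_library(connected_components(image))
print(len(library), refs)  # → 2 [(0, 0, 0), (0, 0, 4), (1, 3, -4), (1, 0, 7)]
print(reconstruct(library, refs, 4, 8) == image)  # → True
```

With `max_errors=0` every symbol is matched exactly, so `reconstruct` returns the original bitmap; a larger threshold lets similar shapes share one library pattern and makes the scheme lossy, which loosely mirrors the trade-off between compression ratio and lossiness mentioned in the abstract.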

Url: https://api.istex.fr/document/319D7164E812A6A66B87554AB4FAC97B8586A393/fulltext/pdf
DOI: 10.1016/S0031-3203(99)00112-0

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Content-lossless document image compression based on structural analysis and pattern matching</title>
<author>
<name sortKey="Yang, Yibing" sort="Yang, Yibing" uniqKey="Yang Y" first="Yibing" last="Yang">Yibing Yang</name>
</author>
<author>
<name sortKey="Yan, Hong" sort="Yan, Hong" uniqKey="Yan H" first="Hong" last="Yan">Hong Yan</name>
</author>
<author>
<name sortKey="Yu, Donggang" sort="Yu, Donggang" uniqKey="Yu D" first="Donggang" last="Yu">Donggang Yu</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:319D7164E812A6A66B87554AB4FAC97B8586A393</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1016/S0031-3203(99)00112-0</idno>
<idno type="url">https://api.istex.fr/document/319D7164E812A6A66B87554AB4FAC97B8586A393/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000637</idno>
<idno type="wicri:Area/Istex/Curation">000629</idno>
<idno type="wicri:Area/Istex/Checkpoint">001280</idno>
<idno type="wicri:doubleKey">0031-3203:2000:Yang Y:content:lossless:document</idno>
<idno type="wicri:Area/Main/Merge">001D77</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Content-lossless document image compression based on structural analysis and pattern matching</title>
<author>
<name sortKey="Yang, Yibing" sort="Yang, Yibing" uniqKey="Yang Y" first="Yibing" last="Yang">Yibing Yang</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Electrical Engineering, The University of Sydney, NSW 2006</wicri:regionArea>
<wicri:noRegion>NSW 2006</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Australie</country>
</affiliation>
</author>
<author>
<name sortKey="Yan, Hong" sort="Yan, Hong" uniqKey="Yan H" first="Hong" last="Yan">Hong Yan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Electrical Engineering, The University of Sydney, NSW 2006</wicri:regionArea>
<wicri:noRegion>NSW 2006</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Yu, Donggang" sort="Yu, Donggang" uniqKey="Yu D" first="Donggang" last="Yu">Donggang Yu</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Electrical Engineering, The University of Sydney, NSW 2006</wicri:regionArea>
<wicri:noRegion>NSW 2006</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="1277">1277</biblScope>
<biblScope unit="page" to="1293">1293</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">319D7164E812A6A66B87554AB4FAC97B8586A393</idno>
<idno type="DOI">10.1016/S0031-3203(99)00112-0</idno>
<idno type="PII">S0031-3203(99)00112-0</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a highly efficient content-lossless document image compression scheme. The method consists of three stages. First, the image is analysed and segmented into symbols and position parameters by analysing the relation of the foreground to the background and their connectivity. Second, an initial representative symbol set is extracted from the symbols in the image and matched by direction-based bit-map analysis and matching; the final set of representative and synthetic patterns, with fewer repeated symbols, is then formed from this initial set by multi-stage structural clustering and by representative pattern derivation and synthesis. This final component set is reorganized into a compact library image. Finally, a high compression ratio is achieved by coding the relative positions of symbols and the parameters of representative patterns with adaptive arithmetic coders of different orders, and the library image with the Q-Coder. Our scheme achieves much better compression and smaller error maps than most alternative systems. Its lossiness can be reduced to a very small level through well-defined pattern derivation and synthesis, at some cost in compression ratio. The method guarantees content-lossless reconstruction under our symbol-level content-lossless criterion. It can easily be combined with soft pattern matching to extend it to a lossless mode. In addition, combining the method and its low-redundancy component library with the JBIG1 progressive mode provides content-lossless progressive transmission. The method can also handle various symbolic images, including images with nested symbols such as Chinese characters, by means of symbol segmentation based only on connectivity and position-based bit-map reconstruction.</div>
</front>
</TEI>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D77 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001D77 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:319D7164E812A6A66B87554AB4FAC97B8586A393
   |texte=   Content-lossless document image compression based on structural analysis and pattern matching
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024