OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

JBIG2 text image compression based on OCR

Internal identifier: 000332 (PascalFrancis/Corpus); previous: 000331; next: 000333

Authors: JUNQING SHANG; CHANGSONG LIU; XIAOQING DING

RBID : Pascal:07-0377973

Abstract

The JBIG2 (Joint Bi-level Image Experts Group) standard for bi-level image coding is drafted so that encoder design is left to implementers. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) that compresses bi-level images into the JBIG2 format. Processing text images with OCR yields recognition results for the characters and a confidence value for each result. A representative symbol image can be generated for similar character image blocks from the OCR results, the block sizes, and the mismatches between blocks. This symbol image can replace all the similar image blocks, achieving a high compression ratio. Experimental results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S on Latin character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S on Chinese character images. Our algorithm produces far fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
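
The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the tuple layout `(label, confidence, bitmap)`, the thresholds `min_conf` and `max_mismatch`, and the choice of the first block as the class representative are all assumptions made for illustration. Blocks are merged only when the OCR label agrees, the OCR confidence is high enough, the sizes are equal, and the pixel mismatch is small.

```python
def mismatch(a, b):
    """Fraction of differing pixels between two equal-sized bi-level bitmaps."""
    total = len(a) * len(a[0])
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total


def group_blocks(blocks, min_conf=0.9, max_mismatch=0.1):
    """Group character blocks that can share one symbol image.

    blocks: list of (label, confidence, bitmap); bitmap is a list of rows of 0/1.
    Thresholds are hypothetical values, not taken from the paper.
    """
    classes = []
    for idx, (label, conf, bmp) in enumerate(blocks):
        trusted = conf >= min_conf
        placed = False
        if trusted:
            for c in classes:
                rep = c["rep"]
                same_size = len(rep) == len(bmp) and len(rep[0]) == len(bmp[0])
                if (c["label"] == label and same_size
                        and mismatch(rep, bmp) <= max_mismatch):
                    c["members"].append(idx)
                    placed = True
                    break
        if not placed:
            # Low-confidence blocks get label None so nothing merges into them.
            classes.append({"label": label if trusted else None,
                            "rep": bmp, "members": [idx]})
    return classes


# Toy data: two identical "e" blocks, one very different block, one low-confidence block.
a = [[1, 0], [0, 1]]
c = [[1, 1], [1, 1]]
blocks = [("e", 0.95, a), ("e", 0.97, a), ("e", 0.95, c), ("e", 0.50, a)]
classes = group_blocks(blocks)
```

In this toy run the first two blocks share one class, while the dissimilar block and the low-confidence block each stay in a class of their own, which is how such a scheme would limit substitution errors.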

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0277-786X
A05       @2 6067
A08 01  1  ENG  @1 JBIG2 text image compression based on OCR
A09 01  1  ENG  @1 Document recognition and retrieval XIII : 18-19 January 2006, San Jose, California, USA
A11 01  1    @1 JUNQING SHANG
A11 02  1    @1 CHANGSONG LIU
A11 03  1    @1 XIAOQING DING
A12 01  1    @1 TAGHVA (Kazem) @9 ed.
A12 02  1    @1 LIN (Xiaofan) @9 ed.
A14 01      @1 State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University @2 Beijing, 100084 @3 CHN @Z 1 aut. @Z 2 aut. @Z 3 aut.
A18 01  1    @1 IS&T--The Society for Imaging Science and Technology @3 USA @9 org-cong.
A18 02  1    @1 Society of photo-optical instrumentation engineers @3 USA @9 org-cong.
A20       @2 60670D.1-60670D.12
A21       @1 2006
A23 01      @0 ENG
A26 01      @0 0-8194-6107-5
A43 01      @1 INIST @2 21760 @5 354000153562240120
A44       @0 0000 @1 © 2007 INIST-CNRS. All rights reserved.
A45       @0 13 ref.
A47 01  1    @0 07-0377973
A60       @1 P @2 C
A61       @0 A
A64 01  1    @0 Proceedings of SPIE, the International Society for Optical Engineering
A66 01      @0 USA
C01 01    ENG  @0 The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
C02 01  3    @0 001B00G05P
C02 02  3    @0 001B40B30S
C02 03  X    @0 001D04A04B
C02 04  X    @0 001D04A05A
C03 01  3  FRE  @0 Codage image @5 03
C03 01  3  ENG  @0 Image coding @5 03
C03 02  3  FRE  @0 Algorithme @5 23
C03 02  3  ENG  @0 Algorithms @5 23
C03 03  3  FRE  @0 Etude expérimentale @5 30
C03 03  3  ENG  @0 Experimental study @5 30
C03 04  X  FRE  @0 Qualité image @5 41
C03 04  X  ENG  @0 Image quality @5 41
C03 04  X  SPA  @0 Calidad imagen @5 41
C03 05  X  FRE  @0 Compression image @5 61
C03 05  X  ENG  @0 Image compression @5 61
C03 05  X  SPA  @0 Compresión imagen @5 61
C03 06  3  FRE  @0 Compression donnée @5 62
C03 06  3  ENG  @0 Data compression @5 62
C03 07  3  FRE  @0 Reconnaissance optique caractère @5 63
C03 07  3  ENG  @0 Optical character recognition @5 63
C03 08  3  FRE  @0 Concordance forme @5 64
C03 08  3  ENG  @0 Pattern matching @5 64
C03 09  X  FRE  @0 Milieu dissipatif @5 65
C03 09  X  ENG  @0 Lossy medium @5 65
C03 09  X  SPA  @0 Medio dispersor @5 65
C03 10  X  FRE  @0 Désadaptation @5 66
C03 10  X  ENG  @0 Mismatching @5 66
C03 10  X  SPA  @0 Desadaptación @5 66
C03 11  3  FRE  @0 Taux compression @5 67
C03 11  3  ENG  @0 Compression ratio @5 67
C03 12  X  FRE  @0 Circuit sans perte @5 68
C03 12  X  ENG  @0 Lossless circuit @5 68
C03 12  X  SPA  @0 Circuito sin pérdida @5 68
C03 13  3  FRE  @0 Mesure phase @5 69
C03 13  3  ENG  @0 Phase measurement @5 69
C03 14  3  FRE  @0 0705P @4 INC @5 83
C03 15  3  FRE  @0 4230S @4 INC @5 84
N21       @1 246
pR  
A30 01  1  ENG  @1 Document recognition and retrieval @2 13 @3 USA @4 2006
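
The field lines above follow a visible pattern: a tag, optional qualifiers (occurrence number, indicator, language code), then subfields introduced by `@` and a one-character code. The sketch below parses one such line in Python. It is based only on the layout displayed here, not on the ISO 2709 specification itself, and the function name `parse_field` is hypothetical.

```python
import re

# A subfield is "@<code> <value>", running until the next " @<code> " or end of line.
SUBFIELD = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")


def parse_field(line):
    """Split one displayed field line into (tag, qualifiers, subfields).

    Returns None for lines without subfields, such as the "pA" / "pR" markers.
    """
    head, sep, rest = line.strip().partition("@")
    parts = head.split()
    if not sep or not parts:
        return None
    tag, qualifiers = parts[0], parts[1:]
    subfields = [(code, value.strip())
                 for code, value in SUBFIELD.findall("@" + rest)]
    return tag, qualifiers, subfields


field = parse_field("C03 04  X  SPA  @0 Calidad imagen @5 41")
```

Repeated subfield codes (for example the three `@Z` entries of field A14) are kept in order as separate pairs rather than merged into a dictionary.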

Inist format (server)

NO : PASCAL 07-0377973 INIST
ET : JBIG2 text image compression based on OCR
AU : JUNQING SHANG; CHANGSONG LIU; XIAOQING DING; TAGHVA (Kazem); LIN (Xiaofan)
AF : State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University/Beijing, 100084/Chine (1 aut., 2 aut., 3 aut.)
DT : Publication en série; Congrès; Niveau analytique
SO : Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Etats-Unis; Da. 2006; Vol. 6067; 60670D.1-60670D.12; Bibl. 13 ref.
LA : Anglais
EA : The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality.
CC : 001B00G05P; 001B40B30S; 001D04A04B; 001D04A05A
FD : Codage image; Algorithme; Etude expérimentale; Qualité image; Compression image; Compression donnée; Reconnaissance optique caractère; Concordance forme; Milieu dissipatif; Désadaptation; Taux compression; Circuit sans perte; Mesure phase; 0705P; 4230S
ED : Image coding; Algorithms; Experimental study; Image quality; Image compression; Data compression; Optical character recognition; Pattern matching; Lossy medium; Mismatching; Compression ratio; Lossless circuit; Phase measurement
SD : Calidad imagen; Compresión imagen; Medio dispersor; Desadaptación; Circuito sin pérdida
LO : INIST-21760.354000153562240120
ID : 07-0377973
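
The server format above is a flat list of `CODE : value` lines, some of which hold semicolon-separated lists. A small Python sketch for loading it into a dictionary follows. Which codes hold lists (`AU`, `CC`, `FD`, `ED`, `SD`) is an assumption inferred from this record, and `parse_server_record` is a hypothetical helper name.

```python
# Codes treated as semicolon-separated lists (assumption based on this record).
LIST_FIELDS = {"AU", "CC", "FD", "ED", "SD"}


def parse_server_record(text):
    """Turn 'CODE : value' lines into a dict; list-like fields become lists."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key in LIST_FIELDS:
            record[key] = [item.strip() for item in value.split(";")]
        else:
            record[key] = value
    return record


sample = (
    "NO : PASCAL 07-0377973 INIST\n"
    "AU : JUNQING SHANG; CHANGSONG LIU; XIAOQING DING\n"
    "LA : Anglais\n"
    "\n"
    "CC : 001B00G05P; 001B40B30S"
)
record = parse_server_record(sample)
```

Fields such as `SO` also contain semicolons but are left as plain strings here, since their parts are positional rather than a list of like items.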

Links to Exploration step

Pascal:07-0377973

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">JBIG2 text image compression based on OCR</title>
<author>
<name sortKey="Junqing Shang" sort="Junqing Shang" uniqKey="Junqing Shang" last="Junqing Shang">JUNQING SHANG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">07-0377973</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0377973 INIST</idno>
<idno type="RBID">Pascal:07-0377973</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000332</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">JBIG2 text image compression based on OCR</title>
<author>
<name sortKey="Junqing Shang" sort="Junqing Shang" uniqKey="Junqing Shang" last="Junqing Shang">JUNQING SHANG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation>
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Compression ratio</term>
<term>Data compression</term>
<term>Experimental study</term>
<term>Image coding</term>
<term>Image compression</term>
<term>Image quality</term>
<term>Lossless circuit</term>
<term>Lossy medium</term>
<term>Mismatching</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Phase measurement</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Codage image</term>
<term>Algorithme</term>
<term>Etude expérimentale</term>
<term>Qualité image</term>
<term>Compression image</term>
<term>Compression donnée</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Milieu dissipatif</term>
<term>Désadaptation</term>
<term>Taux compression</term>
<term>Circuit sans perte</term>
<term>Mesure phase</term>
<term>0705P</term>
<term>4230S</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0277-786X</s0>
</fA01>
<fA05>
<s2>6067</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG">
<s1>JBIG2 text image compression based on OCR</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval XIII : 18-19 January 2006, San Jose, California, USA</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>JUNQING SHANG</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>CHANGSONG LIU</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>XIAOQING DING</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>TAGHVA (Kazem)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>LIN (Xiaofan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1">
<s1>IS&amp;T--The Society for Imaging Science and Technology</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1">
<s1>Society of photo-optical instrumentation engineers</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA20>
<s2>60670D.1-60670D.12</s2>
</fA20>
<fA21>
<s1>2006</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA26 i1="01">
<s0>0-8194-6107-5</s0>
</fA26>
<fA43 i1="01">
<s1>INIST</s1>
<s2>21760</s2>
<s5>354000153562240120</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2007 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>13 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>07-0377973</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</s0>
</fC01>
<fC02 i1="01" i2="3">
<s0>001B00G05P</s0>
</fC02>
<fC02 i1="02" i2="3">
<s0>001B40B30S</s0>
</fC02>
<fC02 i1="03" i2="X">
<s0>001D04A04B</s0>
</fC02>
<fC02 i1="04" i2="X">
<s0>001D04A05A</s0>
</fC02>
<fC03 i1="01" i2="3" l="FRE">
<s0>Codage image</s0>
<s5>03</s5>
</fC03>
<fC03 i1="01" i2="3" l="ENG">
<s0>Image coding</s0>
<s5>03</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE">
<s0>Algorithme</s0>
<s5>23</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG">
<s0>Algorithms</s0>
<s5>23</s5>
</fC03>
<fC03 i1="03" i2="3" l="FRE">
<s0>Etude expérimentale</s0>
<s5>30</s5>
</fC03>
<fC03 i1="03" i2="3" l="ENG">
<s0>Experimental study</s0>
<s5>30</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Qualité image</s0>
<s5>41</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Image quality</s0>
<s5>41</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Calidad imagen</s0>
<s5>41</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Compression image</s0>
<s5>61</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Image compression</s0>
<s5>61</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Compresión imagen</s0>
<s5>61</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE">
<s0>Compression donnée</s0>
<s5>62</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG">
<s0>Data compression</s0>
<s5>62</s5>
</fC03>
<fC03 i1="07" i2="3" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>63</s5>
</fC03>
<fC03 i1="07" i2="3" l="ENG">
<s0>Optical character recognition</s0>
<s5>63</s5>
</fC03>
<fC03 i1="08" i2="3" l="FRE">
<s0>Concordance forme</s0>
<s5>64</s5>
</fC03>
<fC03 i1="08" i2="3" l="ENG">
<s0>Pattern matching</s0>
<s5>64</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Milieu dissipatif</s0>
<s5>65</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Lossy medium</s0>
<s5>65</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Medio dispersor</s0>
<s5>65</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Désadaptation</s0>
<s5>66</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Mismatching</s0>
<s5>66</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Desadaptación</s0>
<s5>66</s5>
</fC03>
<fC03 i1="11" i2="3" l="FRE">
<s0>Taux compression</s0>
<s5>67</s5>
</fC03>
<fC03 i1="11" i2="3" l="ENG">
<s0>Compression ratio</s0>
<s5>67</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Circuit sans perte</s0>
<s5>68</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Lossless circuit</s0>
<s5>68</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Circuito sin pérdida</s0>
<s5>68</s5>
</fC03>
<fC03 i1="13" i2="3" l="FRE">
<s0>Mesure phase</s0>
<s5>69</s5>
</fC03>
<fC03 i1="13" i2="3" l="ENG">
<s0>Phase measurement</s0>
<s5>69</s5>
</fC03>
<fC03 i1="14" i2="3" l="FRE">
<s0>0705P</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="15" i2="3" l="FRE">
<s0>4230S</s0>
<s4>INC</s4>
<s5>84</s5>
</fC03>
<fN21>
<s1>246</s1>
</fN21>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval</s1>
<s2>13</s2>
<s3>USA</s3>
<s4>2006</s4>
</fA30>
</pR>
</standard>
<server>
<NO>PASCAL 07-0377973 INIST</NO>
<ET>JBIG2 text image compression based on OCR</ET>
<AU>JUNQING SHANG; CHANGSONG LIU; XIAOQING DING; TAGHVA (Kazem); LIN (Xiaofan)</AU>
<AF>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University/Beijing, 100084/Chine (1 aut., 2 aut., 3 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Etats-Unis; Da. 2006; Vol. 6067; 60670D.1-60670D.12; Bibl. 13 ref.</SO>
<LA>Anglais</LA>
<EA>The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</EA>
<CC>001B00G05P; 001B40B30S; 001D04A04B; 001D04A05A</CC>
<FD>Codage image; Algorithme; Etude expérimentale; Qualité image; Compression image; Compression donnée; Reconnaissance optique caractère; Concordance forme; Milieu dissipatif; Désadaptation; Taux compression; Circuit sans perte; Mesure phase; 0705P; 4230S</FD>
<ED>Image coding; Algorithms; Experimental study; Image quality; Image compression; Data compression; Optical character recognition; Pattern matching; Lossy medium; Mismatching; Compression ratio; Lossless circuit; Phase measurement</ED>
<SD>Calidad imagen; Compresión imagen; Medio dispersor; Desadaptación; Circuito sin pérdida</SD>
<LO>INIST-21760.354000153562240120</LO>
<ID>07-0377973</ID>
</server>
</inist>
</record>
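
For readers who want to read the XML programmatically, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It works on a shortened copy of the `<server>` element only. The full record uses an `inist:` prefix with no namespace declaration shown on this page, so a strict parser would likely need that declaration added first. The helper name `server_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the <server> element from the record above.
FRAGMENT = """<server>
<NO>PASCAL 07-0377973 INIST</NO>
<ET>JBIG2 text image compression based on OCR</ET>
<AU>JUNQING SHANG; CHANGSONG LIU; XIAOQING DING; TAGHVA (Kazem); LIN (Xiaofan)</AU>
<LA>Anglais</LA>
</server>"""


def server_fields(xml_text):
    """Map each child tag of the root element to its stripped text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


fields = server_fields(FRAGMENT)
title = fields["ET"]
contributors = [name.strip() for name in fields["AU"].split(";")]
```

The `AU` field mixes the three authors with the two volume editors, so any downstream use would need to separate them, for instance by consulting the `fA11` and `fA12` fields of the standard block.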

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000332 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000332 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:07-0377973
   |texte=   JBIG2 text image compression based on OCR
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024