JBIG2 text image compression based on OCR
Internal identifier: 000332 (PascalFrancis/Corpus)
Authors: JUNQING SHANG; CHANGSONG LIU; XIAOQING DING
Source:
- Proceedings of SPIE, the International Society for Optical Engineering [0277-786X]; 2006.
RBID : Pascal:07-0377973
French descriptors
- Pascal (Inist)
- Codage image,
Algorithme,
Etude expérimentale,
Qualité image,
Compression image,
Compression donnée,
Reconnaissance optique caractère,
Concordance forme,
Milieu dissipatif,
Désadaptation,
Taux compression,
Circuit sans perte,
Mesure phase,
0705P,
4230S.
English descriptors
- KwdEn:
- Algorithms,
Compression ratio,
Data compression,
Experimental study,
Image coding,
Image compression,
Image quality,
Lossless circuit,
Lossy medium,
Mismatching,
Optical character recognition,
Pattern matching,
Phase measurement.
Abstract
The JBIG2 (Joint Bi-level Image Experts Group) standard for bi-level image coding leaves the design of encoders to individual implementers. In JBIG2, text images are compressed using pattern-matching techniques. In this paper, we propose a lossy text image compression method based on optical character recognition (OCR) that compresses bi-level images into the JBIG2 format. Processing text images with OCR yields a recognition result for each character together with the confidence of that result. Using the OCR results, the block sizes, and the mismatches between blocks, a representative symbol image is generated for each set of similar character image blocks. This symbol image replaces all of the similar blocks, which achieves a high compression ratio. Experimental results show that our algorithm achieves improvements of 75.86% over lossless soft pattern matching (SPM) and 14.05% over lossy pattern matching and substitution (PM&S) on Latin character images, and of 37.9% over lossless SPM and 4.97% over lossy PM&S on Chinese character images. Our algorithm also produces far fewer substitution errors than lossy PM&S and therefore preserves acceptable decoded image quality.
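The abstract names three grouping criteria (the OCR result with its confidence, the block size, and the pixel mismatch between blocks) but does not give the decision rules. The Python sketch below shows one hypothetical way to combine them. The `Block` type, the thresholds `conf_min`, `size_tol`, `loose` and `strict`, the top-left alignment in `pad`, the greedy comparison against each group's first member, and the majority-vote `representative` are all illustrative assumptions, not the method described in the paper.

```python
from dataclasses import dataclass
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = black


@dataclass
class Block:
    bitmap: Bitmap
    label: str         # hypothetical: OCR recognition result for this block
    confidence: float  # hypothetical: OCR confidence in [0, 1]

    @property
    def size(self):
        return len(self.bitmap), len(self.bitmap[0])


def pad(bm: Bitmap, h: int, w: int) -> Bitmap:
    """Pad a bitmap with white pixels to h x w (top-left alignment, a simplification)."""
    rows = [row + [0] * (w - len(row)) for row in bm]
    return rows + [[0] * w for _ in range(h - len(bm))]


def mismatch(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels after padding both bitmaps to a common size."""
    h, w = max(len(a), len(b)), max(len(a[0]), len(b[0]))
    pa, pb = pad(a, h, w), pad(b, h, w)
    return sum(x != y for ra, rb in zip(pa, pb) for x, y in zip(ra, rb))


def group_blocks(blocks: List[Block], conf_min: float = 0.9, size_tol: int = 1,
                 loose: float = 0.25, strict: float = 0.05) -> List[List[Block]]:
    """Greedily assign each block to the first compatible group (hypothetical rules)."""
    groups: List[List[Block]] = []
    for blk in blocks:
        for group in groups:
            ref = group[0]  # compare against the group's first member
            (h1, w1), (h2, w2) = blk.size, ref.size
            if abs(h1 - h2) > size_tol or abs(w1 - w2) > size_tol:
                continue  # sizes too different to share one symbol
            # Confident, agreeing OCR results allow a looser pixel threshold;
            # otherwise only near-identical bitmaps may share a symbol.
            same_char = (blk.label == ref.label
                         and min(blk.confidence, ref.confidence) >= conf_min)
            limit = (loose if same_char else strict) * max(h1, h2) * max(w1, w2)
            if mismatch(blk.bitmap, ref.bitmap) <= limit:
                group.append(blk)
                break
        else:
            groups.append([blk])  # no compatible group: start a new one
    return groups


def representative(group: List[Block]) -> Bitmap:
    """Pixel-wise majority vote: black where more than half the members are black."""
    h = max(b.size[0] for b in group)
    w = max(b.size[1] for b in group)
    padded = [pad(b.bitmap, h, w) for b in group]
    return [[int(2 * sum(p[r][c] for p in padded) > len(padded)) for c in range(w)]
            for r in range(h)]


if __name__ == "__main__":
    bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    serif = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
    blocks = [Block(bar, "l", 0.95), Block(serif, "l", 0.97),
              Block(serif, "1", 0.50), Block(bar, "I", 0.99)]
    groups = group_blocks(blocks)
    print(len(groups))                         # 2
    print(representative(groups[0]) == bar)    # True
```

In a full encoder, each representative bitmap would be stored once in a JBIG2 symbol dictionary segment, and each member block would be coded in a text region segment as a reference to that symbol plus its position. Requiring a much stricter pixel threshold when the OCR results disagree or have low confidence is one way to limit the substitution errors that the abstract mentions.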
Record in standard format (ISO 2709)
See the documentation on the Inist Standard format.
pA |
A01 | 01 | 1 | | @0 0277-786X |
---|
A05 | | | | @2 6067 |
---|
A08 | 01 | 1 | ENG | @1 JBIG2 text image compression based on OCR |
---|
A09 | 01 | 1 | ENG | @1 Document recognition and retrieval XIII : 18-19 January 2006, San Jose, California, USA |
---|
A11 | 01 | 1 | | @1 JUNQING SHANG |
---|
A11 | 02 | 1 | | @1 CHANGSONG LIU |
---|
A11 | 03 | 1 | | @1 XIAOQING DING |
---|
A12 | 01 | 1 | | @1 TAGHVA (Kazem) @9 ed. |
---|
A12 | 02 | 1 | | @1 LIN (Xiaofan) @9 ed. |
---|
A14 | 01 | | | @1 State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University @2 Beijing, 100084 @3 CHN @Z 1 aut. @Z 2 aut. @Z 3 aut. |
---|
A18 | 01 | 1 | | @1 IS&T--The Society for Imaging Science and Technology @3 USA @9 org-cong. |
---|
A18 | 02 | 1 | | @1 Society of photo-optical instrumentation engineers @3 USA @9 org-cong. |
---|
A20 | | | | @2 60670D.1-60670D.12 |
---|
A21 | | | | @1 2006 |
---|
A23 | 01 | | | @0 ENG |
---|
A26 | 01 | | | @0 0-8194-6107-5 |
---|
A43 | 01 | | | @1 INIST @2 21760 @5 354000153562240120 |
---|
A44 | | | | @0 0000 @1 © 2007 INIST-CNRS. All rights reserved. |
---|
A45 | | | | @0 13 ref. |
---|
A47 | 01 | 1 | | @0 07-0377973 |
---|
A60 | | | | @1 P @2 C |
---|
A61 | | | | @0 A |
---|
A64 | 01 | 1 | | @0 Proceedings of SPIE, the International Society for Optical Engineering |
---|
A66 | 01 | | | @0 USA |
---|
C01 | 01 | | ENG | @0 The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality. |
---|
C02 | 01 | 3 | | @0 001B00G05P |
---|
C02 | 02 | 3 | | @0 001B40B30S |
---|
C02 | 03 | X | | @0 001D04A04B |
---|
C02 | 04 | X | | @0 001D04A05A |
---|
C03 | 01 | 3 | FRE | @0 Codage image @5 03 |
---|
C03 | 01 | 3 | ENG | @0 Image coding @5 03 |
---|
C03 | 02 | 3 | FRE | @0 Algorithme @5 23 |
---|
C03 | 02 | 3 | ENG | @0 Algorithms @5 23 |
---|
C03 | 03 | 3 | FRE | @0 Etude expérimentale @5 30 |
---|
C03 | 03 | 3 | ENG | @0 Experimental study @5 30 |
---|
C03 | 04 | X | FRE | @0 Qualité image @5 41 |
---|
C03 | 04 | X | ENG | @0 Image quality @5 41 |
---|
C03 | 04 | X | SPA | @0 Calidad imagen @5 41 |
---|
C03 | 05 | X | FRE | @0 Compression image @5 61 |
---|
C03 | 05 | X | ENG | @0 Image compression @5 61 |
---|
C03 | 05 | X | SPA | @0 Compresión imagen @5 61 |
---|
C03 | 06 | 3 | FRE | @0 Compression donnée @5 62 |
---|
C03 | 06 | 3 | ENG | @0 Data compression @5 62 |
---|
C03 | 07 | 3 | FRE | @0 Reconnaissance optique caractère @5 63 |
---|
C03 | 07 | 3 | ENG | @0 Optical character recognition @5 63 |
---|
C03 | 08 | 3 | FRE | @0 Concordance forme @5 64 |
---|
C03 | 08 | 3 | ENG | @0 Pattern matching @5 64 |
---|
C03 | 09 | X | FRE | @0 Milieu dissipatif @5 65 |
---|
C03 | 09 | X | ENG | @0 Lossy medium @5 65 |
---|
C03 | 09 | X | SPA | @0 Medio dispersor @5 65 |
---|
C03 | 10 | X | FRE | @0 Désadaptation @5 66 |
---|
C03 | 10 | X | ENG | @0 Mismatching @5 66 |
---|
C03 | 10 | X | SPA | @0 Desadaptación @5 66 |
---|
C03 | 11 | 3 | FRE | @0 Taux compression @5 67 |
---|
C03 | 11 | 3 | ENG | @0 Compression ratio @5 67 |
---|
C03 | 12 | X | FRE | @0 Circuit sans perte @5 68 |
---|
C03 | 12 | X | ENG | @0 Lossless circuit @5 68 |
---|
C03 | 12 | X | SPA | @0 Circuito sin pérdida @5 68 |
---|
C03 | 13 | 3 | FRE | @0 Mesure phase @5 69 |
---|
C03 | 13 | 3 | ENG | @0 Phase measurement @5 69 |
---|
C03 | 14 | 3 | FRE | @0 0705P @4 INC @5 83 |
---|
C03 | 15 | 3 | FRE | @0 4230S @4 INC @5 84 |
---|
N21 | | | | @1 246 |
---|
|
pR |
A30 | 01 | 1 | ENG | @1 Document recognition and retrieval @2 13 @3 USA @4 2006 |
---|
|
Inist format (server)
NO : | PASCAL 07-0377973 INIST |
ET : | JBIG2 text image compression based on OCR |
AU : | JUNQING SHANG; CHANGSONG LIU; XIAOQING DING; TAGHVA (Kazem); LIN (Xiaofan) |
AF : | State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University/Beijing, 100084/China (1 aut., 2 aut., 3 aut.) |
DT : | Serial publication; Conference; Analytic level |
SO : | Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; United States; Date 2006; Vol. 6067; 60670D.1-60670D.12; Bibl. 13 ref. |
LA : | English |
EA : | The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&S and thus preserves acceptable decoded image quality. |
CC : | 001B00G05P; 001B40B30S; 001D04A04B; 001D04A05A |
FD : | Codage image; Algorithme; Etude expérimentale; Qualité image; Compression image; Compression donnée; Reconnaissance optique caractère; Concordance forme; Milieu dissipatif; Désadaptation; Taux compression; Circuit sans perte; Mesure phase; 0705P; 4230S |
ED : | Image coding; Algorithms; Experimental study; Image quality; Image compression; Data compression; Optical character recognition; Pattern matching; Lossy medium; Mismatching; Compression ratio; Lossless circuit; Phase measurement |
SD : | Calidad imagen; Compresión imagen; Medio dispersor; Desadaptación; Circuito sin pérdida |
LO : | INIST-21760.354000153562240120 |
ID : | 07-0377973 |
Links to Exploration step
Pascal:07-0377973
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">JBIG2 text image compression based on OCR</title>
<author><name sortKey="Junqing Shang" sort="Junqing Shang" uniqKey="Junqing Shang" last="Junqing Shang">JUNQING SHANG</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0377973</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0377973 INIST</idno>
<idno type="RBID">Pascal:07-0377973</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000332</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">JBIG2 text image compression based on OCR</title>
<author><name sortKey="Junqing Shang" sort="Junqing Shang" uniqKey="Junqing Shang" last="Junqing Shang">JUNQING SHANG</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Compression ratio</term>
<term>Data compression</term>
<term>Experimental study</term>
<term>Image coding</term>
<term>Image compression</term>
<term>Image quality</term>
<term>Lossless circuit</term>
<term>Lossy medium</term>
<term>Mismatching</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Phase measurement</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Codage image</term>
<term>Algorithme</term>
<term>Etude expérimentale</term>
<term>Qualité image</term>
<term>Compression image</term>
<term>Compression donnée</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Milieu dissipatif</term>
<term>Désadaptation</term>
<term>Taux compression</term>
<term>Circuit sans perte</term>
<term>Mesure phase</term>
<term>0705P</term>
<term>4230S</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0277-786X</s0>
</fA01>
<fA05><s2>6067</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>JBIG2 text image compression based on OCR</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Document recognition and retrieval XIII : 18-19 January 2006, San Jose, California, USA</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>JUNQING SHANG</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>CHANGSONG LIU</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>XIAOQING DING</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>TAGHVA (Kazem)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>LIN (Xiaofan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1"><s1>IS&amp;T--The Society for Imaging Science and Technology</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1"><s1>Society of photo-optical instrumentation engineers</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA20><s2>60670D.1-60670D.12</s2>
</fA20>
<fA21><s1>2006</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA26 i1="01"><s0>0-8194-6107-5</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>21760</s2>
<s5>354000153562240120</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2007 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>13 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>07-0377973</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA64 i1="01" i2="1"><s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01"><s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</s0>
</fC01>
<fC02 i1="01" i2="3"><s0>001B00G05P</s0>
</fC02>
<fC02 i1="02" i2="3"><s0>001B40B30S</s0>
</fC02>
<fC02 i1="03" i2="X"><s0>001D04A04B</s0>
</fC02>
<fC02 i1="04" i2="X"><s0>001D04A05A</s0>
</fC02>
<fC03 i1="01" i2="3" l="FRE"><s0>Codage image</s0>
<s5>03</s5>
</fC03>
<fC03 i1="01" i2="3" l="ENG"><s0>Image coding</s0>
<s5>03</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE"><s0>Algorithme</s0>
<s5>23</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG"><s0>Algorithms</s0>
<s5>23</s5>
</fC03>
<fC03 i1="03" i2="3" l="FRE"><s0>Etude expérimentale</s0>
<s5>30</s5>
</fC03>
<fC03 i1="03" i2="3" l="ENG"><s0>Experimental study</s0>
<s5>30</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Qualité image</s0>
<s5>41</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Image quality</s0>
<s5>41</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Calidad imagen</s0>
<s5>41</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Compression image</s0>
<s5>61</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Image compression</s0>
<s5>61</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Compresión imagen</s0>
<s5>61</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE"><s0>Compression donnée</s0>
<s5>62</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG"><s0>Data compression</s0>
<s5>62</s5>
</fC03>
<fC03 i1="07" i2="3" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>63</s5>
</fC03>
<fC03 i1="07" i2="3" l="ENG"><s0>Optical character recognition</s0>
<s5>63</s5>
</fC03>
<fC03 i1="08" i2="3" l="FRE"><s0>Concordance forme</s0>
<s5>64</s5>
</fC03>
<fC03 i1="08" i2="3" l="ENG"><s0>Pattern matching</s0>
<s5>64</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Milieu dissipatif</s0>
<s5>65</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Lossy medium</s0>
<s5>65</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Medio dispersor</s0>
<s5>65</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Désadaptation</s0>
<s5>66</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Mismatching</s0>
<s5>66</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Desadaptación</s0>
<s5>66</s5>
</fC03>
<fC03 i1="11" i2="3" l="FRE"><s0>Taux compression</s0>
<s5>67</s5>
</fC03>
<fC03 i1="11" i2="3" l="ENG"><s0>Compression ratio</s0>
<s5>67</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Circuit sans perte</s0>
<s5>68</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Lossless circuit</s0>
<s5>68</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Circuito sin pérdida</s0>
<s5>68</s5>
</fC03>
<fC03 i1="13" i2="3" l="FRE"><s0>Mesure phase</s0>
<s5>69</s5>
</fC03>
<fC03 i1="13" i2="3" l="ENG"><s0>Phase measurement</s0>
<s5>69</s5>
</fC03>
<fC03 i1="14" i2="3" l="FRE"><s0>0705P</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="15" i2="3" l="FRE"><s0>4230S</s0>
<s4>INC</s4>
<s5>84</s5>
</fC03>
<fN21><s1>246</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>Document recognition and retrieval</s1>
<s2>13</s2>
<s3>USA</s3>
<s4>2006</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 07-0377973 INIST</NO>
<ET>JBIG2 text image compression based on OCR</ET>
<AU>JUNQING SHANG; CHANGSONG LIU; XIAOQING DING; TAGHVA (Kazem); LIN (Xiaofan)</AU>
<AF>State Key Laboratory of Intelligent Technology and Systems Department of Electronic Engineering, Tsinghua University/Beijing, 100084/Chine (1 aut., 2 aut., 3 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Etats-Unis; Da. 2006; Vol. 6067; 60670D.1-60670D.12; Bibl. 13 ref.</SO>
<LA>Anglais</LA>
<EA>The JBIG2 (joint bi-level image group) standard for bi-level image coding is drafted to allow encoder designs by individuals. In JBIG2, text images are compressed by pattern matching techniques. In this paper, we propose a lossy text image compression method based on OCR (optical character recognition) which compresses bi-level images into the JBIG2 format. By processing text images with OCR, we can obtain recognition results of characters and the confidence of these results. A representative symbol image could be generated for similar character image blocks by OCR results, sizes of blocks and mismatches between blocks. This symbol image could replace all the similar image blocks and thus a high compression ratio could be achieved. Experiment results show that our algorithm achieves improvements of 75.86% over lossless SPM and 14.05% over lossy PM&amp;S in Latin Character images, and 37.9% over lossless SPM and 4.97% over lossy PM&amp;S in Chinese character images. Our algorithm leads to much fewer substitution errors than previous lossy PM&amp;S and thus preserves acceptable decoded image quality.</EA>
<CC>001B00G05P; 001B40B30S; 001D04A04B; 001D04A05A</CC>
<FD>Codage image; Algorithme; Etude expérimentale; Qualité image; Compression image; Compression donnée; Reconnaissance optique caractère; Concordance forme; Milieu dissipatif; Désadaptation; Taux compression; Circuit sans perte; Mesure phase; 0705P; 4230S</FD>
<ED>Image coding; Algorithms; Experimental study; Image quality; Image compression; Data compression; Optical character recognition; Pattern matching; Lossy medium; Mismatching; Compression ratio; Lossless circuit; Phase measurement</ED>
<SD>Calidad imagen; Compresión imagen; Medio dispersor; Desadaptación; Circuito sin pérdida</SD>
<LO>INIST-21760.354000153562240120</LO>
<ID>07-0377973</ID>
</server>
</inist>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000332 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000332 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien
|wiki= Ticri/CIDE
|area= OcrV1
|flux= PascalFrancis
|étape= Corpus
|type= RBID
|clé= Pascal:07-0377973
|texte= JBIG2 text image compression based on OCR
}}
This area was generated with Dilib version V0.6.32. Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024.