Composite document analysis by means of typographic characteristics
Internal identifier: 002665 (Main/Merge)
Authors: Laurence Duffy; Frank Lebourgeois; Hubert Emptoz [France]
Source:
- Lecture Notes in Computer Science [0302-9743]; 1997.
Abstract
We present a new method for grouping letters and words into homogeneous font families that does not require explicit font recognition. This analysis, based on a pattern redundancy technique, lets us extract part of the logical information carried by the typographic features of words. Once the typographic families have been differentiated, grouped and compared, we know:
- the cardinality of each family;
- its weight, slant and size relative to the other families.
Studying how the typographic families are organised, together with their relative characteristics, allows us to classify the families according to their logical significance and thus, where possible, to formulate hypotheses about the logical meaning of each family. A comparison between the constructed families and the learned grammar then validates or corrects these hypotheses and labels the families for which no hypothesis was formulated. The main strength of the method is that every processing step depends only on the image, not on the document type or on a font database, so it can be applied to any type of document, especially complex and typographically rich ones. A further advantage is that our text markers will be used to describe the document in HTML.
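The abstract describes the pipeline only at a high level. As a rough illustration of the bookkeeping it mentions (per-family cardinality; weight, slant and size relative to the other families; a hypothesis per family; HTML output), here is a minimal Python sketch. Everything in it is an assumption, not the authors' method: the `Word` feature measures, the greedy `group_families` step (a simple stand-in, not the paper's pattern redundancy technique), treating the most populous family as body text, the thresholds in `hypothesis`, and the tag mapping are all hypothetical. The comparison against a learned grammar is not modelled; families without a hypothesis simply fall back to `<span>`.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Word:
    text: str
    weight: float  # hypothetical measure: mean stroke thickness, in pixels
    slant: float   # hypothetical measure: stroke angle from vertical, in degrees
    size: float    # hypothetical measure: x-height, in pixels


def similar(a: Word, b: Word) -> bool:
    # Assumed tolerances: 20% on weight and size, 5 degrees on slant.
    return (abs(a.weight - b.weight) <= 0.2 * max(a.weight, b.weight)
            and abs(a.size - b.size) <= 0.2 * max(a.size, b.size)
            and abs(a.slant - b.slant) <= 5.0)


def group_families(words: list[Word]) -> list[list[Word]]:
    # Greedy stand-in for the grouping step: a word joins the first family
    # whose founding word is similar, otherwise it founds a new family.
    families: list[list[Word]] = []
    for w in words:
        for fam in families:
            if similar(fam[0], w):
                fam.append(w)
                break
        else:
            families.append([w])
    return families


def describe(families: list[list[Word]]) -> list[dict]:
    # Cardinality plus weight, size and slant relative to the most populous
    # family, which this sketch assumes is the body text.
    body = max(families, key=len)
    body_weight = mean(w.weight for w in body)
    body_size = mean(w.size for w in body)
    body_slant = mean(w.slant for w in body)
    return [{
        "cardinality": len(fam),
        "rel_weight": mean(w.weight for w in fam) / body_weight,
        "rel_size": mean(w.size for w in fam) / body_size,
        "slant_diff": mean(w.slant for w in fam) - body_slant,
        "is_body": fam is body,
    } for fam in families]


def hypothesis(stats: dict) -> Optional[str]:
    # Hypothetical rules from relative characteristics to an HTML tag;
    # None means no hypothesis could be formulated for this family.
    if stats["is_body"]:
        return "p"
    if stats["rel_size"] >= 1.3:
        return "h1"
    if stats["rel_weight"] >= 1.3:
        return "strong"
    if stats["slant_diff"] >= 5.0:
        return "em"
    return None


def to_html(words: list[Word], families: list[list[Word]], stats: list[dict]) -> str:
    # Tag each word with its family's hypothesis (or <span>), then merge
    # consecutive words sharing a tag into one element, in reading order.
    tag_of = {}
    for fam, s in zip(families, stats):
        tag = hypothesis(s) or "span"
        for w in fam:
            tag_of[id(w)] = tag
    parts, run, current = [], [], None
    for w in words:
        tag = tag_of[id(w)]
        if tag != current and run:
            parts.append(f"<{current}>{' '.join(run)}</{current}>")
            run = []
        current = tag
        run.append(w.text)
    if run:
        parts.append(f"<{current}>{' '.join(run)}</{current}>")
    return "\n".join(parts)


words = [
    Word("Introduction", 3.0, 0.0, 20.0),
    Word("The", 1.5, 0.0, 10.0),
    Word("method", 1.6, 0.0, 10.5),
    Word("groups", 1.5, 0.0, 10.0),
    Word("fonts", 1.5, 12.0, 10.0),
]
families = group_families(words)
stats = describe(families)
print(to_html(words, families, stats))
```

With these made-up measurements, the large heavy word forms its own family and is tagged `h1`, the three upright words form the most populous family (`p`), and the slanted word is tagged `em`. The output is a flat sequence of elements rather than properly nested HTML, which keeps the sketch short.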
DOI: 10.1007/3-540-63791-5_14
Links to previous steps (curation, corpus, ...)
- to stream Istex, to step Corpus: 000436
- to stream Istex, to step Curation: 000429
- to stream Istex, to step Checkpoint: 001986
Links to Exploration step
ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author><name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author><name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_14</idno>
<idno type="url">https://api.istex.fr/document/332F277976CC0117A5E8758C2755BA5958D3D54F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000436</idno>
<idno type="wicri:Area/Istex/Curation">000429</idno>
<idno type="wicri:Area/Istex/Checkpoint">001986</idno>
<idno type="wicri:doubleKey">0302-9743:1997:Duffy L:composite:document:analysis</idno>
<idno type="wicri:Area/Main/Merge">002665</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author><name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author><name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1997</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<idno type="DOI">10.1007/3-540-63791-5_14</idno>
<idno type="ChapterID">14</idno>
<idno type="ChapterID">Chap14</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: We present a new method for grouping letters and words into homogeneous font families that does not require explicit font recognition. This analysis, based on a pattern redundancy technique, lets us extract part of the logical information carried by the typographic features of words. Once the typographic families have been differentiated, grouped and compared, we know the cardinality of each family and its weight, slant and size relative to the other families. Studying how the typographic families are organised, together with their relative characteristics, allows us to classify the families according to their logical significance and thus, where possible, to formulate hypotheses about the logical meaning of each family. A comparison between the constructed families and the learned grammar then validates or corrects these hypotheses and labels the families for which no hypothesis was formulated. The main strength of the method is that every processing step depends only on the image, not on the document type or on a font database, so it can be applied to any type of document, especially complex and typographically rich ones. A further advantage is that our text markers will be used to describe the document in HTML.</div>
</front>
</TEI>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002665 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002665 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F |texte= Composite document analysis by means of typographic characteristics }}
This area was generated with Dilib version V0.6.32.