Composite document analysis by means of typographic characteristics

Internal identifier: 002665 (Main/Merge); previous: 002664; next: 002666

Authors: Laurence Duffy; Frank Lebourgeois; Hubert Emptoz [France]

Source:

Lecture Notes in Computer Science [ISSN 0302-9743]; 1997.

RBID : ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F

Abstract

We have presented a new method for grouping letters and words into homogeneous font families without explicitly recognising the font. This analysis, which applies a pattern redundancy technique, lets us extract part of the logical information carried by the typographic features of words. After differentiating, grouping and comparing the typographic families, we know the cardinality of each family and its weight, slope and size relative to the other families. Studying the organisation of the typographic families and their relative characteristics allows us to classify families by logical significance and, where possible, to form hypotheses about the logical meaning of each family. A comparison between the constructed families and the learned grammar then validates or corrects these hypotheses and labels the families for which no hypothesis was formed. One advantage of the method is that each process depends only on the image; it does not depend on the document type or on a font database. The method can therefore be applied to every document type, especially complex and typographically rich documents. Another advantage is that our text markers will be used to describe the document in HTML.

Url:

https://api.istex.fr/document/332F277976CC0117A5E8758C2755BA5958D3D54F/fulltext/pdf

DOI: 10.1007/3-540-63791-5_14

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000436
to stream Istex, to step Curation: 000429
to stream Istex, to step Checkpoint: 001986

Links to Exploration step

ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author><name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author><name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_14</idno>
<idno type="url">https://api.istex.fr/document/332F277976CC0117A5E8758C2755BA5958D3D54F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000436</idno>
<idno type="wicri:Area/Istex/Curation">000429</idno>
<idno type="wicri:Area/Istex/Checkpoint">001986</idno>
<idno type="wicri:doubleKey">0302-9743:1997:Duffy L:composite:document:analysis</idno>
<idno type="wicri:Area/Main/Merge">002665</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author><name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author><name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author><name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1997</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<idno type="DOI">10.1007/3-540-63791-5_14</idno>
<idno type="ChapterID">14</idno>
<idno type="ChapterID">Chap14</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: We have just presented a new method, of regrouping letters and words in homogeneous font families which doesn't necessitate to explicitly recognise the font. This analysis, achieved with the application of one pattern redundancy technique, allows us to extract a part of the logical information which is carried by words typographic features. After having differentiated, grouped together and compared the typographic families, we'll know: - the cardinality of each family, - its grease, slope and size compared to the others families. The study of the typographic families organisation, and of their relative characteristics, will allows us to classify families according to their logical significance, and so to voice, when it will be possible, hypothesis concerning the logical signification of the families. A comparison between the constructed families and the learned grammar, will come to validate or correct the hypothesis, and to label families for which no hypothesis has been voiced. The significance of the method, we have developed, is that each process only depend on the image ; it isn't depend on the document type or on fonts data basis. So this method can be applied to every document type, specially complex and typographically rich documents. An other significance is that our text markers will be use for describing our document in HTML language</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002665 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002665 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F
   |texte=   Composite document analysis by means of typographic characteristics
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.
