Improving OCR Accuracy for Classical Critical Editions
Internal identifier: 000962 (Main/Curation); previous: 000961; next: 000963
Authors: Federico Boschetti [United States]; Matteo Romanello [United States]; Alison Babeu [United States]; David Bamman [United States]; Gregory Crane [United States]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2009.
Abstract
This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available OCR engines are now able after suitable training to deal with the polytonic Greek fonts used in 19th and 20th century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.
DOI: 10.1007/978-3-642-04346-8_17
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: go to this record in the Corpus step: 000008
- to stream Istex, to step Curation: go to this record in the Curation step: 000008
- to stream Istex, to step Checkpoint: go to this record in the Checkpoint step: 000484
- to stream Main, to step Merge: go to this record in the Merge step: 000970
Links to Exploration step
ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author><name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
</author>
<author><name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
</author>
<author><name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
</author>
<author><name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
</author>
<author><name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-04346-8_17</idno>
<idno type="url">https://api.istex.fr/document/E139A13B4800B4F0FC4DA869252849D648DB14FF/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000008</idno>
<idno type="wicri:Area/Istex/Curation">000008</idno>
<idno type="wicri:Area/Istex/Checkpoint">000484</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Boschetti F:improving:ocr:accuracy</idno>
<idno type="wicri:Area/Main/Merge">000970</idno>
<idno type="wicri:Area/Main/Curation">000962</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Improving OCR Accuracy for Classical Critical Editions</title>
<author><name sortKey="Boschetti, Federico" sort="Boschetti, Federico" uniqKey="Boschetti F" first="Federico" last="Boschetti">Federico Boschetti</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Romanello, Matteo" sort="Romanello, Matteo" uniqKey="Romanello M" first="Matteo" last="Romanello">Matteo Romanello</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Babeu, Alison" sort="Babeu, Alison" uniqKey="Babeu A" first="Alison" last="Babeu">Alison Babeu</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Bamman, David" sort="Bamman, David" uniqKey="Bamman D" first="David" last="Bamman">David Bamman</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Crane, Gregory" sort="Crane, Gregory" uniqKey="Crane G" first="Gregory" last="Crane">Gregory Crane</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Perseus Digital Library, Tufts University, Eaton 124, 02155, Medford, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">E139A13B4800B4F0FC4DA869252849D648DB14FF</idno>
<idno type="DOI">10.1007/978-3-642-04346-8_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper describes a work-flow designed to populate a digital library of ancient Greek critical editions with highly accurate OCR scanned text. While the most recently available OCR engines are now able after suitable training to deal with the polytonic Greek fonts used in 19th and 20th century editions, further improvements can also be achieved with postprocessing. In particular, the progressive multiple alignment method applied to different OCR outputs based on the same images is discussed in this paper.</div>
</front>
</TEI>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000962 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Curation/biblio.hfd -nk 000962 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Curation |type= RBID |clé= ISTEX:E139A13B4800B4F0FC4DA869252849D648DB14FF |texte= Improving OCR Accuracy for Classical Critical Editions }}
This area was generated with Dilib version V0.6.32.