The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels
Internal identifier: 000200 (Ncbi/Curation); previous: 000199; next: 000201
Authors: Robyn E. Drinkwater [United Kingdom]; Robert W. N. Cubey [United Kingdom]; Elspeth M. Haston [United Kingdom]
Source:
- PhytoKeys [ 1314-2011 ] ; 2014.
Abstract
At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinion of the digitisation staff to the different sorting options. In total 7,200 specimens were processed.
When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol which required entering data into a limited set of fields and where the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, and so a familiarity with the Collector or Country is rapidly established.
Url:
DOI: 10.3897/phytokeys.38.7168
PubMed: 25009435
PubMed Central: 4086207
Links to previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000190
- to stream Pmc, to step Curation: 000190
- to stream Pmc, to step Checkpoint: 000034
- to stream PubMed, to step Corpus: 000017
- to stream PubMed, to step Curation: 000017
- to stream PubMed, to step Checkpoint: 000017
- to stream Ncbi, to step Merge: 000200
Links to Exploration step
PMC:4086207
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels</title>
<author><name sortKey="Drinkwater, Robyn E" sort="Drinkwater, Robyn E" uniqKey="Drinkwater R" first="Robyn E." last="Drinkwater">Robyn E. Drinkwater</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Cubey, Robert W N" sort="Cubey, Robert W N" uniqKey="Cubey R" first="Robert W. N." last="Cubey">Robert W. N. Cubey</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Haston, Elspeth M" sort="Haston, Elspeth M" uniqKey="Haston E" first="Elspeth M." last="Haston">Elspeth M. Haston</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">25009435</idno>
<idno type="pmc">4086207</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086207</idno>
<idno type="RBID">PMC:4086207</idno>
<idno type="doi">10.3897/phytokeys.38.7168</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000190</idno>
<idno type="wicri:Area/Pmc/Curation">000190</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000034</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/PubMed/Corpus">000017</idno>
<idno type="wicri:Area/PubMed/Curation">000017</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000017</idno>
<idno type="wicri:Area/Ncbi/Merge">000200</idno>
<idno type="wicri:Area/Ncbi/Curation">000200</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels</title>
<author><name sortKey="Drinkwater, Robyn E" sort="Drinkwater, Robyn E" uniqKey="Drinkwater R" first="Robyn E." last="Drinkwater">Robyn E. Drinkwater</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Cubey, Robert W N" sort="Cubey, Robert W N" uniqKey="Cubey R" first="Robert W. N." last="Cubey">Robert W. N. Cubey</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Haston, Elspeth M" sort="Haston, Elspeth M" uniqKey="Haston E" first="Elspeth M." last="Haston">Elspeth M. Haston</name>
<affiliation wicri:level="1"><nlm:aff id="A1">Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR</wicri:regionArea>
<wicri:noRegion>EH3 5LR</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">PhytoKeys</title>
<idno type="ISSN">1314-2011</idno>
<idno type="eISSN">1314-2003</idno>
<imprint><date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><label>Abstract</label>
<p>At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinion of the digitisation staff to the different sorting options. In total 7,200 specimens were processed.</p>
<p>When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol which required entering data into a limited set of fields and where the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, and so a familiarity with the Collector or Country is rapidly established.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Barber, A" uniqKey="Barber A">A Barber</name>
</author>
<author><name sortKey="Lafferty, D" uniqKey="Lafferty D">D Lafferty</name>
</author>
<author><name sortKey="Landrum, Lr" uniqKey="Landrum L">LR Landrum</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Beaman, Rs" uniqKey="Beaman R">RS Beaman</name>
</author>
<author><name sortKey="Cellinese, N" uniqKey="Cellinese N">N Cellinese</name>
</author>
<author><name sortKey="Heidorn, Pb" uniqKey="Heidorn P">PB Heidorn</name>
</author>
<author><name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
<author><name sortKey="Green, Am" uniqKey="Green A">AM Green</name>
</author>
<author><name sortKey="Thiers, B" uniqKey="Thiers B">B Thiers</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bebber, Dp" uniqKey="Bebber D">DP Bebber</name>
</author>
<author><name sortKey="Carine, Ma" uniqKey="Carine M">MA Carine</name>
</author>
<author><name sortKey="Wood, Jri" uniqKey="Wood J">JRI Wood</name>
</author>
<author><name sortKey="Wortley, Ah" uniqKey="Wortley A">AH Wortley</name>
</author>
<author><name sortKey="Harris, Dj" uniqKey="Harris D">DJ Harris</name>
</author>
<author><name sortKey="Prance, Gt" uniqKey="Prance G">GT Prance</name>
</author>
<author><name sortKey="Davidse, G" uniqKey="Davidse G">G Davidse</name>
</author>
<author><name sortKey="Paige, J" uniqKey="Paige J">J Paige</name>
</author>
<author><name sortKey="Pennington, Td" uniqKey="Pennington T">TD Pennington</name>
</author>
<author><name sortKey="Robson, Nkb" uniqKey="Robson N">NKB Robson</name>
</author>
<author><name sortKey="Scotland, Rw" uniqKey="Scotland R">RW Scotland</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Berendsohn, Wg" uniqKey="Berendsohn W">WG Berendsohn</name>
</author>
<author><name sortKey="Chavan, V" uniqKey="Chavan V">V Chavan</name>
</author>
<author><name sortKey="Macklin, Ja" uniqKey="Macklin J">JA Macklin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Davis, Ph" uniqKey="Davis P">PH Davis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Elith, J" uniqKey="Elith J">J Elith</name>
</author>
<author><name sortKey="Graham, Ch" uniqKey="Graham C">CH Graham</name>
</author>
<author><name sortKey="Anderson, Rp" uniqKey="Anderson R">RP Anderson</name>
</author>
<author><name sortKey="Dudik, M" uniqKey="Dudik M">M Dudik</name>
</author>
<author><name sortKey="Ferrier, S" uniqKey="Ferrier S">S Ferrier</name>
</author>
<author><name sortKey="Guisan, A" uniqKey="Guisan A">A Guisan</name>
</author>
<author><name sortKey="Hijmans, Rj" uniqKey="Hijmans R">RJ Hijmans</name>
</author>
<author><name sortKey="Huettmann, F" uniqKey="Huettmann F">F Huettmann</name>
</author>
<author><name sortKey="Leathwick, Jr" uniqKey="Leathwick J">JR Leathwick</name>
</author>
<author><name sortKey="Lehmann, A" uniqKey="Lehmann A">A Lehmann</name>
</author>
<author><name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author><name sortKey="Lohmann, Lg" uniqKey="Lohmann L">LG Lohmann</name>
</author>
<author><name sortKey="Loiselle, Ba" uniqKey="Loiselle B">BA Loiselle</name>
</author>
<author><name sortKey="Manion, G" uniqKey="Manion G">G Manion</name>
</author>
<author><name sortKey="Moritz, C" uniqKey="Moritz C">C Moritz</name>
</author>
<author><name sortKey="Nakamura, M" uniqKey="Nakamura M">M Nakamura</name>
</author>
<author><name sortKey="Nakazawa, Y" uniqKey="Nakazawa Y">Y Nakazawa</name>
</author>
<author><name sortKey="Overton, Jmcc" uniqKey="Overton J">JMcC Overton</name>
</author>
<author><name sortKey="Peterson, At" uniqKey="Peterson A">AT Peterson</name>
</author>
<author><name sortKey="Phillips, Sj" uniqKey="Phillips S">SJ Phillips</name>
</author>
<author><name sortKey="Richardson, K" uniqKey="Richardson K">K Richardson</name>
</author>
<author><name sortKey="Scachetti Pereire, R" uniqKey="Scachetti Pereire R">R Scachetti-Pereire</name>
</author>
<author><name sortKey="Schapire, Re" uniqKey="Schapire R">RE Schapire</name>
</author>
<author><name sortKey="Sober N, J" uniqKey="Sober N J">J Soberón</name>
</author>
<author><name sortKey="Williams, S" uniqKey="Williams S">S Williams</name>
</author>
<author><name sortKey="Wisz, Ms" uniqKey="Wisz M">MS Wisz</name>
</author>
<author><name sortKey="Zimmerman, Ne" uniqKey="Zimmerman N">NE Zimmerman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hardisty, A" uniqKey="Hardisty A">A Hardisty</name>
</author>
<author><name sortKey="Roberts, D" uniqKey="Roberts D">D Roberts</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haston, E" uniqKey="Haston E">E Haston</name>
</author>
<author><name sortKey="Cubey, R" uniqKey="Cubey R">R Cubey</name>
</author>
<author><name sortKey="Harris, Dj" uniqKey="Harris D">DJ Harris</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haston, E" uniqKey="Haston E">E Haston</name>
</author>
<author><name sortKey="Cubey, R" uniqKey="Cubey R">R Cubey</name>
</author>
<author><name sortKey="Pullan, M" uniqKey="Pullan M">M Pullan</name>
</author>
<author><name sortKey="Atkins, H" uniqKey="Atkins H">H Atkins</name>
</author>
<author><name sortKey="Harris, D" uniqKey="Harris D">D Harris</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heidorn, Pb" uniqKey="Heidorn P">PB Heidorn</name>
</author>
<author><name sortKey="Wei, Q" uniqKey="Wei Q">Q Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hyam, R" uniqKey="Hyam R">R Hyam</name>
</author>
<author><name sortKey="Drinkwater, Re" uniqKey="Drinkwater R">RE Drinkwater</name>
</author>
<author><name sortKey="Harris, Dj" uniqKey="Harris D">DJ Harris</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Lafferty, D" uniqKey="Lafferty D">D Lafferty</name>
</author>
<author><name sortKey="Landrum, Lr" uniqKey="Landrum L">LR Landrum</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lavoie, C" uniqKey="Lavoie C">C Lavoie</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lees, Dc" uniqKey="Lees D">DC Lees</name>
</author>
<author><name sortKey="Lack, Hw" uniqKey="Lack H">HW Lack</name>
</author>
<author><name sortKey="Rougerie, R" uniqKey="Rougerie R">R Rougerie</name>
</author>
<author><name sortKey="Hernandez Lopez, A" uniqKey="Hernandez Lopez A">A Hernandez-Lopez</name>
</author>
<author><name sortKey="Raus, T" uniqKey="Raus T">T Raus</name>
</author>
<author><name sortKey="Avtzis, Nd" uniqKey="Avtzis N">ND Avtzis</name>
</author>
<author><name sortKey="Augustin, S" uniqKey="Augustin S">S Augustin</name>
</author>
<author><name sortKey="Lopez Vaamonde, C" uniqKey="Lopez Vaamonde C">C Lopez-Vaamonde</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Miller, Ag" uniqKey="Miller A">AG Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Moen, We" uniqKey="Moen W">WE Moen</name>
</author>
<author><name sortKey="Huang, J" uniqKey="Huang J">J Huang</name>
</author>
<author><name sortKey="Mccotter, M" uniqKey="Mccotter M">M McCotter</name>
</author>
<author><name sortKey="Neill, A" uniqKey="Neill A">A Neill</name>
</author>
<author><name sortKey="Best, J" uniqKey="Best J">J Best</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Nelson, G" uniqKey="Nelson G">G Nelson</name>
</author>
<author><name sortKey="Paul, D" uniqKey="Paul D">D Paul</name>
</author>
<author><name sortKey="Riccardi, G" uniqKey="Riccardi G">G Riccardi</name>
</author>
<author><name sortKey="Mast, Ar" uniqKey="Mast A">AR Mast</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Purves, D" uniqKey="Purves D">D Purves</name>
</author>
<author><name sortKey="Scharlemann, Jpw" uniqKey="Scharlemann J">JPW Scharlemann</name>
</author>
<author><name sortKey="Harfoot, M" uniqKey="Harfoot M">M Harfoot</name>
</author>
<author><name sortKey="Newbold, T" uniqKey="Newbold T">T Newbold</name>
</author>
<author><name sortKey="Tittensor, Dp" uniqKey="Tittensor D">DP Tittensor</name>
</author>
<author><name sortKey="Hutton, J" uniqKey="Hutton J">J Hutton</name>
</author>
<author><name sortKey="Emmott, S" uniqKey="Emmott S">S Emmott</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Tulig, M" uniqKey="Tulig M">M Tulig</name>
</author>
<author><name sortKey="Tarnowsky, N" uniqKey="Tarnowsky N">N Tarnowsky</name>
</author>
<author><name sortKey="Bevans, M" uniqKey="Bevans M">M Bevans</name>
</author>
<author><name sortKey="Kirchgessner, A" uniqKey="Kirchgessner A">A Kirchgessner</name>
</author>
<author><name sortKey="Thiers, B" uniqKey="Thiers B">B Thiers</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Ncbi/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000200 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Curation/biblio.hfd -nk 000200 | SxmlIndent | more
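Once the record has been printed by one of the commands above, its identifiers can be pulled out with a few lines of Python. The following is a minimal sketch, not part of Dilib: the function name `extract_idnos` and the `RECORD` excerpt are illustrative assumptions, with the excerpt copied from the XML shown earlier on this page. A regular expression is used because the record carries `wicri:` and `nlm:` prefixes without namespace declarations, which a strict XML parser would likely reject.

```python
import re

# Illustrative excerpt of the record (copied from the XML above); in practice
# this text would come from the output of HfdSelect.
RECORD = """<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">25009435</idno>
<idno type="pmc">4086207</idno>
<idno type="doi">10.3897/phytokeys.38.7168</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/Ncbi/Curation">000200</idno>
</publicationStmt>"""

# Matches <idno type="...">value</idno> and captures the type and the value.
IDNO = re.compile(r'<idno type="([^"]+)">([^<]*)</idno>')


def extract_idnos(xml_text):
    """Return a dict mapping each idno type to its first value in the text."""
    ids = {}
    for kind, value in IDNO.findall(xml_text):
        # Some types (e.g. wicri:source) repeat; keep the first occurrence.
        ids.setdefault(kind, value)
    return ids


ids = extract_idnos(RECORD)
print(ids["doi"])
print(ids["wicri:Area/Ncbi/Curation"])
```

The same pattern applies to the full record; only the `type` values present in the XML above are assumed to exist.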
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Ncbi |étape= Curation |type= RBID |clé= PMC:4086207 |texte= The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Curation/RBID.i -Sk "pubmed:25009435" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Curation/biblio.hfd \
       | NlmPubMed2Wicri -a OcrV1
This area was generated with Dilib version V0.6.32.