OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.

Internal identifier: 000028 (PubMed/Corpus); previous: 000027; next: 000029

Authors: Elspeth Haston; Robert Cubey; Martin Pullan; Hannah Atkins; David J. Harris

Source: ZooKeys, 2012, issue 209, pp. 93-102

RBID : pubmed:22859881

Abstract

Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow. The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.

DOI: 10.3897/zookeys.209.3121
PubMed: 22859881

Links to Exploration step

pubmed:22859881

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.</title>
<author>
<name sortKey="Haston, Elspeth" sort="Haston, Elspeth" uniqKey="Haston E" first="Elspeth" last="Haston">Elspeth Haston</name>
<affiliation>
<nlm:affiliation>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Cubey, Robert" sort="Cubey, Robert" uniqKey="Cubey R" first="Robert" last="Cubey">Robert Cubey</name>
</author>
<author>
<name sortKey="Pullan, Martin" sort="Pullan, Martin" uniqKey="Pullan M" first="Martin" last="Pullan">Martin Pullan</name>
</author>
<author>
<name sortKey="Atkins, Hannah" sort="Atkins, Hannah" uniqKey="Atkins H" first="Hannah" last="Atkins">Hannah Atkins</name>
</author>
<author>
<name sortKey="Harris, David J" sort="Harris, David J" uniqKey="Harris D" first="David J" last="Harris">David J. Harris</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2012">2012</date>
<idno type="doi">10.3897/zookeys.209.3121</idno>
<idno type="RBID">pubmed:22859881</idno>
<idno type="pmid">22859881</idno>
<idno type="wicri:Area/PubMed/Corpus">000028</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.</title>
<author>
<name sortKey="Haston, Elspeth" sort="Haston, Elspeth" uniqKey="Haston E" first="Elspeth" last="Haston">Elspeth Haston</name>
<affiliation>
<nlm:affiliation>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Cubey, Robert" sort="Cubey, Robert" uniqKey="Cubey R" first="Robert" last="Cubey">Robert Cubey</name>
</author>
<author>
<name sortKey="Pullan, Martin" sort="Pullan, Martin" uniqKey="Pullan M" first="Martin" last="Pullan">Martin Pullan</name>
</author>
<author>
<name sortKey="Atkins, Hannah" sort="Atkins, Hannah" uniqKey="Atkins H" first="Hannah" last="Atkins">Hannah Atkins</name>
</author>
<author>
<name sortKey="Harris, David J" sort="Harris, David J" uniqKey="Harris D" first="David J" last="Harris">David J. Harris</name>
</author>
</analytic>
<series>
<title level="j">ZooKeys</title>
<idno type="eISSN">1313-2970</idno>
<imprint>
<date when="2012" type="published">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow.The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
<PMID Version="1">22859881</PMID>
<DateCreated>
<Year>2012</Year>
<Month>08</Month>
<Day>03</Day>
</DateCreated>
<DateCompleted>
<Year>2012</Year>
<Month>08</Month>
<Day>31</Day>
</DateCompleted>
<DateRevised>
<Year>2013</Year>
<Month>05</Month>
<Day>30</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1313-2970</ISSN>
<JournalIssue CitedMedium="Internet">
<Issue>209</Issue>
<PubDate>
<Year>2012</Year>
</PubDate>
</JournalIssue>
<Title>ZooKeys</Title>
<ISOAbbreviation>Zookeys</ISOAbbreviation>
</Journal>
<ArticleTitle>Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.</ArticleTitle>
<Pagination>
<MedlinePgn>93-102</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.3897/zookeys.209.3121</ELocationID>
<Abstract>
<AbstractText>Digitisation programmes in many institutes frequently involve disparate and irregular funding, diverse selection criteria and scope, with different members of staff managing and operating the processes. These factors have influenced the decision at the Royal Botanic Garden Edinburgh to develop an integrated workflow for the digitisation of herbarium specimens which is modular and scalable to enable a single overall workflow to be used for all digitisation projects. This integrated workflow is comprised of three principal elements: a specimen workflow, a data workflow and an image workflow.The specimen workflow is strongly linked to curatorial processes which will impact on the prioritisation, selection and preparation of the specimens. The importance of including a conservation element within the digitisation workflow is highlighted. The data workflow includes the concept of three main categories of collection data: label data, curatorial data and supplementary data. It is shown that each category of data has its own properties which influence the timing of data capture within the workflow. Development of software has been carried out for the rapid capture of curatorial data, and optical character recognition (OCR) software is being used to increase the efficiency of capturing label data and supplementary data. The large number and size of the images has necessitated the inclusion of automated systems within the image workflow.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Haston</LastName>
<ForeName>Elspeth</ForeName>
<Initials>E</Initials>
<AffiliationInfo>
<Affiliation>Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Cubey</LastName>
<ForeName>Robert</ForeName>
<Initials>R</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Pullan</LastName>
<ForeName>Martin</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Atkins</LastName>
<ForeName>Hannah</ForeName>
<Initials>H</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Harris</LastName>
<ForeName>David J</ForeName>
<Initials>DJ</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2012</Year>
<Month>07</Month>
<Day>20</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>Bulgaria</Country>
<MedlineTA>Zookeys</MedlineTA>
<NlmUniqueID>101497933</NlmUniqueID>
<ISSNLinking>1313-2970</ISSNLinking>
</MedlineJournalInfo>
<OtherID Source="NLM">PMC3406469</OtherID>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Large-scale digitisation</Keyword>
<Keyword MajorTopicYN="N">curation</Keyword>
<Keyword MajorTopicYN="N">data entry</Keyword>
<Keyword MajorTopicYN="N">image capture</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2012</Year>
<Month>3</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2012</Year>
<Month>7</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="epublish">
<Year>2012</Year>
<Month>7</Month>
<Day>20</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2012</Year>
<Month>8</Month>
<Day>4</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2012</Year>
<Month>8</Month>
<Day>4</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2012</Year>
<Month>8</Month>
<Day>4</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="doi">10.3897/zookeys.209.3121</ArticleId>
<ArticleId IdType="pubmed">22859881</ArticleId>
<ArticleId IdType="pmc">PMC3406469</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
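
As an illustration of how the PubMed part of the record above might be read programmatically, here is a minimal Python sketch using only the standard library. It is not part of Dilib or Wicri. The constant `PUBMED_XML` and the function `summarize` are hypothetical names introduced for this example, and the embedded XML is an abridged copy of the `<pubmed>` element shown above. Only the `<pubmed>` element is parsed: the TEI part, as displayed here, uses an `nlm:` prefix with no visible namespace declaration, which a strict XML parser would reject.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element from the record above (assumption:
# only the fields used below are kept; the full record has more elements).
PUBMED_XML = """
<pubmed>
<MedlineCitation Owner="NLM" Status="PubMed-not-MEDLINE">
<PMID Version="1">22859881</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<Title>ZooKeys</Title>
</Journal>
<ArticleTitle>Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Haston</LastName><ForeName>Elspeth</ForeName></Author>
<Author ValidYN="Y"><LastName>Cubey</LastName><ForeName>Robert</ForeName></Author>
<Author ValidYN="Y"><LastName>Pullan</LastName><ForeName>Martin</ForeName></Author>
<Author ValidYN="Y"><LastName>Atkins</LastName><ForeName>Hannah</ForeName></Author>
<Author ValidYN="Y"><LastName>Harris</LastName><ForeName>David J</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="doi">10.3897/zookeys.209.3121</ArticleId>
<ArticleId IdType="pubmed">22859881</ArticleId>
<ArticleId IdType="pmc">PMC3406469</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
"""


def summarize(pubmed_xml: str) -> dict:
    """Return a small dictionary of bibliographic fields from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml.strip())
    article = root.find("MedlineCitation/Article")
    # Author names are rebuilt as "ForeName LastName".
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.findall("AuthorList/Author")
    ]
    # Map each identifier type (doi, pubmed, pmc) to its value.
    ids = {
        e.get("IdType"): e.text
        for e in root.findall("PubmedData/ArticleIdList/ArticleId")
    }
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": article.findtext("ArticleTitle"),
        "journal": article.findtext("Journal/Title"),
        "authors": authors,
        "ids": ids,
    }


if __name__ == "__main__":
    for key, value in summarize(PUBMED_XML).items():
        print(f"{key}: {value}")
```

The same function could be applied to the output of the Dilib commands below, provided the `<pubmed>` element is first isolated from the surrounding `<record>`.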

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000028 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000028 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:22859881
   |texte=   Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:22859881" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024