Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks

Internal identifier: 000418 (Pmc/Checkpoint); previous: 000417; next: 000419

Authors: Andrea Thomer [United States]; Gaurav Vaidya [United States]; Robert Guralnick [United States]; David Bloom [United States]; Laura Russell [United States]

Source: ZooKeys, issue 209, pp. 235-253 (2012)

RBID: PMC:3406479

Abstract

Part diary, part scientific record, biological field notebooks often contain details necessary to understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these recordsets were vetted, to provide valid taxon names, via a process we call “taxonomic referencing.” The result is identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.

“Compose your notes as if you were writing a letter to someone a century in the future.”

Perrine and Patton (2011)
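
The annotate-then-extract step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the template names (`taxon`, `place`, `dated`), their parameters, and the sample sentence are all hypothetical, since the abstract does not give the actual Wikisource template syntax. The output keys are Darwin Core terms (`scientificName`, `verbatimLocality`, `eventDate`, `references`); which terms the authors actually mapped to is likewise an assumption.

```python
import re

# Hypothetical annotation syntax: {{taxon|...}}, {{place|...}}, {{dated|...}}.
# The real Wikisource templates used in the paper are not given in the abstract.
TEMPLATE = re.compile(r"\{\{(taxon|place|dated)\|([^{}]*)\}\}")


def extract_observations(page_text, source_page):
    """Walk the annotations in reading order.

    Each taxon annotation becomes one record that inherits the most
    recently seen place and date, and keeps a pointer to the source page
    so the narrative context is not lost.
    """
    place = None
    date = None
    records = []
    for match in TEMPLATE.finditer(page_text):
        kind = match.group(1)
        args = [arg.strip() for arg in match.group(2).split("|")]
        if kind == "place":
            place = args[0]
        elif kind == "dated":
            date = args[0]
        else:  # kind == "taxon"
            records.append({
                "scientificName": args[0],
                "verbatimLocality": place,
                "eventDate": date,
                "references": source_page,
            })
    return records


# Invented sample text, for illustration only.
page = (
    "{{dated|1905-07-04}} Walked up the canyon near {{place|Boulder}}. "
    "Saw several {{taxon|Pica pica|magpies}} along the creek."
)
records = extract_observations(page, "Notebook 1, page 12")
for record in records:
    print(record)
```

Carrying the last-seen place and date forward is one simple way to attach context to each observation; a real notebook would need more careful scoping, for example resetting the place at each new dated entry.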


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3406479
DOI: 10.3897/zookeys.209.3247
PubMed: 22859891
PubMed Central: 3406479


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks</title>
<author>
<name sortKey="Thomer, Andrea" sort="Thomer, Andrea" uniqKey="Thomer A" first="Andrea" last="Thomer">Andrea Thomer</name>
<affiliation wicri:level="1">
<nlm:aff id="A1">University of Illinois, Urbana-Champaign, Graduate School of Library and Information Science, 501 E. Daniel Street, Champaign, Illinois, 61820, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Illinois, Urbana-Champaign, Graduate School of Library and Information Science, 501 E. Daniel Street, Champaign, Illinois, 61820</wicri:regionArea>
<wicri:noRegion>61820</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Vaidya, Gaurav" sort="Vaidya, Gaurav" uniqKey="Vaidya G" first="Gaurav" last="Vaidya">Gaurav Vaidya</name>
<affiliation wicri:level="1">
<nlm:aff id="A2">University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309</wicri:regionArea>
<wicri:noRegion>80309</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Guralnick, Robert" sort="Guralnick, Robert" uniqKey="Guralnick R" first="Robert" last="Guralnick">Robert Guralnick</name>
<affiliation wicri:level="1">
<nlm:aff id="A2">University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309</wicri:regionArea>
<wicri:noRegion>80309</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bloom, David" sort="Bloom, David" uniqKey="Bloom D" first="David" last="Bloom">David Bloom</name>
<affiliation wicri:level="1">
<nlm:aff id="A3">University of California, Berkeley, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Berkeley, California, 94705, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, Berkeley, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Berkeley, California, 94705</wicri:regionArea>
<wicri:noRegion>94705</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Russell, Laura" sort="Russell, Laura" uniqKey="Russell L" first="Laura" last="Russell">Laura Russell</name>
<affiliation wicri:level="1">
<nlm:aff id="A4">University of Kansas, KU Biodiversity Institute, 1345 Jayhawk Blvd., Room 606, Lawrence, Kansas, 66045, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Kansas, KU Biodiversity Institute, 1345 Jayhawk Blvd., Room 606, Lawrence, Kansas, 66045</wicri:regionArea>
<wicri:noRegion>66045</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22859891</idno>
<idno type="pmc">3406479</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3406479</idno>
<idno type="RBID">PMC:3406479</idno>
<idno type="doi">10.3897/zookeys.209.3247</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000670</idno>
<idno type="wicri:Area/Pmc/Curation">000670</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000418</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks</title>
<author>
<name sortKey="Thomer, Andrea" sort="Thomer, Andrea" uniqKey="Thomer A" first="Andrea" last="Thomer">Andrea Thomer</name>
<affiliation wicri:level="1">
<nlm:aff id="A1">University of Illinois, Urbana-Champaign, Graduate School of Library and Information Science, 501 E. Daniel Street, Champaign, Illinois, 61820, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Illinois, Urbana-Champaign, Graduate School of Library and Information Science, 501 E. Daniel Street, Champaign, Illinois, 61820</wicri:regionArea>
<wicri:noRegion>61820</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Vaidya, Gaurav" sort="Vaidya, Gaurav" uniqKey="Vaidya G" first="Gaurav" last="Vaidya">Gaurav Vaidya</name>
<affiliation wicri:level="1">
<nlm:aff id="A2">University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309</wicri:regionArea>
<wicri:noRegion>80309</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Guralnick, Robert" sort="Guralnick, Robert" uniqKey="Guralnick R" first="Robert" last="Guralnick">Robert Guralnick</name>
<affiliation wicri:level="1">
<nlm:aff id="A2">University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309</wicri:regionArea>
<wicri:noRegion>80309</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bloom, David" sort="Bloom, David" uniqKey="Bloom D" first="David" last="Bloom">David Bloom</name>
<affiliation wicri:level="1">
<nlm:aff id="A3">University of California, Berkeley, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Berkeley, California, 94705, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of California, Berkeley, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Berkeley, California, 94705</wicri:regionArea>
<wicri:noRegion>94705</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Russell, Laura" sort="Russell, Laura" uniqKey="Russell L" first="Laura" last="Russell">Laura Russell</name>
<affiliation wicri:level="1">
<nlm:aff id="A4">University of Kansas, KU Biodiversity Institute, 1345 Jayhawk Blvd., Room 606, Lawrence, Kansas, 66045, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>University of Kansas, KU Biodiversity Institute, 1345 Jayhawk Blvd., Room 606, Lawrence, Kansas, 66045</wicri:regionArea>
<wicri:noRegion>66045</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">ZooKeys</title>
<idno type="ISSN">1313-2989</idno>
<idno type="eISSN">1313-2970</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<label>Abstract</label>
<p>Part diary, part scientific record, biological field notebooks often contain details necessary to understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these recordsets were vetted, to provide valid taxon names,
via a process we call “taxonomic referencing.” The result is identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.</p>
<p>“Compose your notes as if you were writing a letter to someone a century in the future.”</p>
<p>
<xref ref-type="bibr" rid="B13">Perrine and Patton (2011)</xref>
</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Grinnell, J" uniqKey="Grinnell J">J Grinnell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gwinn, N" uniqKey="Gwinn N">N Gwinn</name>
</author>
<author>
<name sortKey="Rinaldo, C" uniqKey="Rinaldo C">C Rinaldo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hagedorn, G" uniqKey="Hagedorn G">G Hagedorn</name>
</author>
<author>
<name sortKey="Mietchen, D" uniqKey="Mietchen D">D Mietchen</name>
</author>
<author>
<name sortKey="Morris, Ra" uniqKey="Morris R">RA Morris</name>
</author>
<author>
<name sortKey="Agosti, D" uniqKey="Agosti D">D Agosti</name>
</author>
<author>
<name sortKey="Penev, L" uniqKey="Penev L">L Penev</name>
</author>
<author>
<name sortKey="Berendsohn, Wg" uniqKey="Berendsohn W">WG Berendsohn</name>
</author>
<author>
<name sortKey="Hobern, D" uniqKey="Hobern D">D Hobern</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heywood, Vh" uniqKey="Heywood V">VH Heywood</name>
</author>
<author>
<name sortKey="Watson, Rt" uniqKey="Watson R">RT Watson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Henderson, J" uniqKey="Henderson J">J Henderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jenkins, M" uniqKey="Jenkins M">M Jenkins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kramer, Kl" uniqKey="Kramer K">KL Kramer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lally, Am" uniqKey="Lally A">AM Lally</name>
</author>
<author>
<name sortKey="Dunford, C" uniqKey="Dunford C">C Dunford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loreau, M" uniqKey="Loreau M">M Loreau</name>
</author>
<author>
<name sortKey="Oteng Yeboah, A" uniqKey="Oteng Yeboah A">A Oteng-Yeboah</name>
</author>
<author>
<name sortKey="Arroyo, Mtk" uniqKey="Arroyo M">MTK Arroyo</name>
</author>
<author>
<name sortKey="Babin, D" uniqKey="Babin D">D Babin</name>
</author>
<author>
<name sortKey="Barbault, R" uniqKey="Barbault R">R Barbault</name>
</author>
<author>
<name sortKey="Donoghue, M" uniqKey="Donoghue M">M Donoghue</name>
</author>
<author>
<name sortKey="Gadgil, M" uniqKey="Gadgil M">M Gadgil</name>
</author>
<author>
<name sortKey="Hauser, C" uniqKey="Hauser C">C Häuser</name>
</author>
<author>
<name sortKey="Heip, C" uniqKey="Heip C">C Heip</name>
</author>
<author>
<name sortKey="Larigauderie, A" uniqKey="Larigauderie A">A Larigauderie</name>
</author>
<author>
<name sortKey="Ma, K" uniqKey="Ma K">K Ma</name>
</author>
<author>
<name sortKey="Mace, G" uniqKey="Mace G">G Mace</name>
</author>
<author>
<name sortKey="Mooney, Ha" uniqKey="Mooney H">HA Mooney</name>
</author>
<author>
<name sortKey="Perrings, C" uniqKey="Perrings C">C Perrings</name>
</author>
<author>
<name sortKey="Raven, P" uniqKey="Raven P">P Raven</name>
</author>
<author>
<name sortKey="Sarukhan, J" uniqKey="Sarukhan J">J Sarukhan</name>
</author>
<author>
<name sortKey="Schei, P" uniqKey="Schei P">P Schei</name>
</author>
<author>
<name sortKey="Scholes, Rj" uniqKey="Scholes R">RJ Scholes</name>
</author>
<author>
<name sortKey="Watson, Rt" uniqKey="Watson R">RT Watson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Millennium Ecosystem Assessment" uniqKey="Millennium Ecosystem Assessment">Millennium Ecosystem Assessment</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moritz, C" uniqKey="Moritz C">C Moritz</name>
</author>
<author>
<name sortKey="Patton, Jl" uniqKey="Patton J">JL Patton</name>
</author>
<author>
<name sortKey="Conroy, Cj" uniqKey="Conroy C">CJ Conroy</name>
</author>
<author>
<name sortKey="Parra, Jl" uniqKey="Parra J">JL Parra</name>
</author>
<author>
<name sortKey="White, Gc" uniqKey="White G">GC White</name>
</author>
<author>
<name sortKey="Beissinger, Sr" uniqKey="Beissinger S">SR Beissinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nufio, Cr" uniqKey="Nufio C">CR Nufio</name>
</author>
<author>
<name sortKey="Mcguire, Cr" uniqKey="Mcguire C">CR McGuire</name>
</author>
<author>
<name sortKey="Bowers, Md" uniqKey="Bowers M">MD Bowers</name>
</author>
<author>
<name sortKey="Guralnick, Rp" uniqKey="Guralnick R">RP Guralnick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perrine, Jd" uniqKey="Perrine J">JD Perrine</name>
</author>
<author>
<name sortKey="Patton, Jl" uniqKey="Patton J">JL Patton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Remsen, D" uniqKey="Remsen D">D Remsen</name>
</author>
<author>
<name sortKey="Knapp, S" uniqKey="Knapp S">S Knapp</name>
</author>
<author>
<name sortKey="Georgiev, T" uniqKey="Georgiev T">T Georgiev</name>
</author>
<author>
<name sortKey="Stoev, P" uniqKey="Stoev P">P Stoev</name>
</author>
<author>
<name sortKey="Penev, L" uniqKey="Penev L">L Penev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sheffield, C" uniqKey="Sheffield C">C Sheffield</name>
</author>
<author>
<name sortKey="Nakasone, S" uniqKey="Nakasone S">S Nakasone</name>
</author>
<author>
<name sortKey="Ferrante, R" uniqKey="Ferrante R">R Ferrante</name>
</author>
<author>
<name sortKey="Peters, T" uniqKey="Peters T">T Peters</name>
</author>
<author>
<name sortKey="Russell, R" uniqKey="Russell R">R Russell</name>
</author>
<author>
<name sortKey="Van Camp, A" uniqKey="Van Camp A">A Van Camp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sheffield, C" uniqKey="Sheffield C">C Sheffield</name>
</author>
<author>
<name sortKey="Nakasone, S" uniqKey="Nakasone S">S Nakasone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tingley, Mw" uniqKey="Tingley M">MW Tingley</name>
</author>
<author>
<name sortKey="Monahan, Wb" uniqKey="Monahan W">WB Monahan</name>
</author>
<author>
<name sortKey="Beissinger, Sr" uniqKey="Beissinger S">SR Beissinger</name>
</author>
<author>
<name sortKey="Moritz, C" uniqKey="Moritz C">C Moritz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wake, D" uniqKey="Wake D">D Wake</name>
</author>
<author>
<name sortKey="Vredenburg, Vt" uniqKey="Vredenburg V">VT Vredenburg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wieczorek, J" uniqKey="Wieczorek J">J Wieczorek</name>
</author>
<author>
<name sortKey="Bloom, D" uniqKey="Bloom D">D Bloom</name>
</author>
<author>
<name sortKey="Guralnick, R" uniqKey="Guralnick R">R Guralnick</name>
</author>
<author>
<name sortKey="Blum, S" uniqKey="Blum S">S Blum</name>
</author>
<author>
<name sortKey="Doring, M" uniqKey="Doring M">M Döring</name>
</author>
<author>
<name sortKey="Giovanni, R" uniqKey="Giovanni R">R Giovanni</name>
</author>
<author>
<name sortKey="Robertson, T" uniqKey="Robertson T">T Robertson</name>
</author>
<author>
<name sortKey="Vieglais, D" uniqKey="Vieglais D">D Vieglais</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Worm, B" uniqKey="Worm B">B Worm</name>
</author>
<author>
<name sortKey="Barbier, Eb" uniqKey="Barbier E">EB Barbier</name>
</author>
<author>
<name sortKey="Beaumont, N" uniqKey="Beaumont N">N Beaumont</name>
</author>
<author>
<name sortKey="Duffy, Je" uniqKey="Duffy J">JE Duffy</name>
</author>
<author>
<name sortKey="Folke, C" uniqKey="Folke C">C Folke</name>
</author>
<author>
<name sortKey="Halpern, B" uniqKey="Halpern B">B Halpern</name>
</author>
<author>
<name sortKey="Jackson, Jbc" uniqKey="Jackson J">JBC Jackson</name>
</author>
<author>
<name sortKey="Lotze, Hk" uniqKey="Lotze H">HK Lotze</name>
</author>
<author>
<name sortKey="Micheli, F" uniqKey="Micheli F">F Micheli</name>
</author>
<author>
<name sortKey="Palumbi, Sr" uniqKey="Palumbi S">SR Palumbi</name>
</author>
<author>
<name sortKey="Sala, E" uniqKey="Sala E">E Sala</name>
</author>
<author>
<name sortKey="Selkoe, Ka" uniqKey="Selkoe K">KA Selkoe</name>
</author>
<author>
<name sortKey="Stachowicz, Jj" uniqKey="Stachowicz J">JJ Stachowicz</name>
</author>
<author>
<name sortKey="Watson, R" uniqKey="Watson R">R Watson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Zookeys</journal-id>
<journal-id journal-id-type="iso-abbrev">Zookeys</journal-id>
<journal-id journal-id-type="publisher-id">ZooKeys</journal-id>
<journal-title-group>
<journal-title>ZooKeys</journal-title>
</journal-title-group>
<issn pub-type="ppub">1313-2989</issn>
<issn pub-type="epub">1313-2970</issn>
<publisher>
<publisher-name>Pensoft Publishers</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22859891</article-id>
<article-id pub-id-type="pmc">3406479</article-id>
<article-id pub-id-type="doi">10.3897/zookeys.209.3247</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Thomer</surname>
<given-names>Andrea</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vaidya</surname>
<given-names>Gaurav</given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guralnick</surname>
<given-names>Robert</given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bloom</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="A3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Russell</surname>
<given-names>Laura</given-names>
</name>
<xref ref-type="aff" rid="A4">4</xref>
</contrib>
</contrib-group>
<aff id="A1">
<label>1</label>
University of Illinois, Urbana-Champaign, Graduate School of Library and Information Science, 501 E. Daniel Street, Champaign, Illinois, 61820, USA</aff>
<aff id="A2">
<label>2</label>
University of Colorado, Boulder; University of Colorado Museum of Natural History, Henderson Building, Boulder, Colorado, 80309, USA</aff>
<aff id="A3">
<label>3</label>
University of California, Berkeley, Museum of Vertebrate Zoology, 3101 Valley Life Sciences Building, Berkeley, California, 94705, USA</aff>
<aff id="A4">
<label>4</label>
University of Kansas, KU Biodiversity Institute, 1345 Jayhawk Blvd., Room 606, Lawrence, Kansas, 66045, USA</aff>
<author-notes>
<corresp>Corresponding author: David Bloom (
<email>dabblepop@gmail.com</email>
) </corresp>
<fn fn-type="edited-by">
<p>Academic editor: Vladimir Blagoderov</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>20</day>
<month>7</month>
<year>2012</year>
</pub-date>
<issue>209</issue>
<fpage>235</fpage>
<lpage>253</lpage>
<history>
<date date-type="received">
<day>18</day>
<month>4</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>7</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Andrea Thomer, Gaurav Vaidya, Robert Guralnick, David Bloom, Laura Russell</copyright-statement>
<license license-type="creative-commons-attribution" xlink:href="http://creativecommons.org/licenses/by/3.0">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License 3.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<label>Abstract</label>
<p>Part diary, part scientific record, biological field notebooks often contain details necessary to understanding the location and environmental conditions existent during collecting events. Despite their clear value for (and recent use in) global change studies, the text-mining outputs from field notebooks have been idiosyncratic to specific research projects, and impossible to discover or re-use. Best practices and workflows for digitization, transcription, extraction, and integration with other sources are nascent or non-existent. In this paper, we demonstrate a workflow to generate structured outputs while also maintaining links to the original texts. The first step in this workflow was to place already digitized and transcribed field notebooks from the University of Colorado Museum of Natural History founder, Junius Henderson, on Wikisource, an open text transcription platform. Next, we created Wikisource templates to document places, dates, and taxa to facilitate annotation and wiki-linking. We then requested help from the public, through social media tools, to take advantage of volunteer efforts and energy. After three notebooks were fully annotated, content was converted into XML and annotations were extracted and cross-walked into Darwin Core compliant record sets. Finally, these recordsets were vetted, to provide valid taxon names,
via a process we call “taxonomic referencing.” The result is identification and mobilization of 1,068 observations from three of Henderson’s thirteen notebooks and a publishable Darwin Core record set for use in other analyses. Although challenges remain, this work demonstrates a feasible approach to unlock observations from field notebooks that enhances their discovery and interoperability without losing the narrative context from which those observations are drawn.</p>
<p>“Compose your notes as if you were writing a letter to someone a century in the future.”</p>
<p>
<xref ref-type="bibr" rid="B13">Perrine and Patton (2011)</xref>
</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>Field notes</kwd>
<kwd>notebooks</kwd>
<kwd>crowd sourcing</kwd>
<kwd>digitization</kwd>
<kwd>biodiversity</kwd>
<kwd>transcription</kwd>
<kwd>text-mining</kwd>
<kwd>Darwin Core</kwd>
<kwd>Junius Henderson</kwd>
<kwd>annotation</kwd>
<kwd>taxonomic referencing</kwd>
<kwd>natural history</kwd>
<kwd>Wikisource</kwd>
<kwd>Colorado</kwd>
<kwd>species occurrence records</kwd>
</kwd-group>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
</list>
<tree>
<country name="États-Unis">
<noRegion>
<name sortKey="Thomer, Andrea" sort="Thomer, Andrea" uniqKey="Thomer A" first="Andrea" last="Thomer">Andrea Thomer</name>
</noRegion>
<name sortKey="Bloom, David" sort="Bloom, David" uniqKey="Bloom D" first="David" last="Bloom">David Bloom</name>
<name sortKey="Guralnick, Robert" sort="Guralnick, Robert" uniqKey="Guralnick R" first="Robert" last="Guralnick">Robert Guralnick</name>
<name sortKey="Russell, Laura" sort="Russell, Laura" uniqKey="Russell L" first="Laura" last="Russell">Laura Russell</name>
<name sortKey="Vaidya, Gaurav" sort="Vaidya, Gaurav" uniqKey="Vaidya G" first="Gaurav" last="Vaidya">Gaurav Vaidya</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000418 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000418 | SxmlIndent | more
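
The commands above print the XML record shown earlier. As a rough sketch of what can be done with that output outside Dilib, the Python below reads a record and pulls out the title, authors, and identifiers. One caveat: the record as displayed uses the prefixes `wicri:`, `nlm:` and `xlink:` without declaring them, so a strict XML parser rejects it. The sketch works around this by binding the prefixes on the root element; the `wicri` and `nlm` URIs are placeholders chosen for illustration, not official namespace names, and the sample is a shortened excerpt of the record.

```python
import xml.etree.ElementTree as ET

# Placeholder bindings (assumptions): the record never declares these prefixes.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "nlm": "urn:example:nlm",
    "xlink": "http://www.w3.org/1999/xlink",
}


def parse_record(xml_text):
    """Declare the missing prefixes on <record>, then parse strictly."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(record):
    """Collect title, author names and identifiers from the TEI header."""
    header = record.find("./TEI/teiHeader/fileDesc")
    ids = {
        idno.get("type"): idno.text
        for idno in header.findall("./publicationStmt/idno")
    }
    return {
        "title": header.findtext("./titleStmt/title"),
        "authors": [n.text for n in header.findall("./titleStmt/author/name")],
        "doi": ids.get("doi"),
        "pmid": ids.get("pmid"),
    }


# Shortened excerpt of the record above (title and author list abbreviated).
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">From documents to datasets</title>
<author>
<name sortKey="Thomer, Andrea">Andrea Thomer</name>
<affiliation wicri:level="1">
<nlm:aff id="A1">University of Illinois</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="pmid">22859891</idno>
<idno type="doi">10.3897/zookeys.209.3247</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

summary = summarize(parse_record(SAMPLE))
print(summary)
```

In practice the output of `HfdSelect` would be saved to a file and read in place of `SAMPLE`.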

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:3406479
   |texte=   From documents to datasets: A MediaWiki-based method of annotating and extracting species observations in century-old field notebooks
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:22859891" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024