Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study

Internal identifier: 000076 (Pmc/Curation); previous: 000075; next: 000077

Authors: Kevin B. Read [United States]; Jerry R. Sheehan [United States]; Michael F. Huerta [United States]; Lou S. Knecht [United States]; James G. Mork [United States]; Betsy L. Humphreys [United States]

Source:

PLoS ONE [1932-6203]; 2015.

RBID : PMC:4514623

Abstract

Objective

This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on datasets that are “invisible,” that is, not deposited in a known repository.

Methods

We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.

Results

About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% whose datasets are invisible. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets per article, suggesting that approximately 200,000 to 235,000 invisible datasets were generated by NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.
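The total in the Results scales a per-article mean up to a corpus-wide count, so the reported figures can be checked against each other. The sketch below is a hypothetical back-calculation and is not taken from the paper. It divides each reported total by a mean to recover the number of articles with invisible datasets that the figures imply. The helper name `implied_article_count` is illustrative. Pairing the low total with the low mean, and the high total with the high mean, is an assumption, because the abstract does not say how the two ranges correspond.

```python
def implied_article_count(total_datasets: float, mean_per_article: float) -> float:
    """Back out the article count implied by total = articles * mean (hypothetical helper)."""
    return total_datasets / mean_per_article


# Endpoints reported in the abstract. Pairing low-with-low and high-with-high
# is an assumption; the abstract does not state the correspondence.
low = implied_article_count(200_000, 2.9)
high = implied_article_count(235_000, 3.4)

print(f"{low:,.0f} to {high:,.0f} articles")
```

Both endpoints imply roughly 69,000 articles with invisible datasets, so the two reported ranges are mutually consistent. The abstract itself does not state the underlying article count.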

Conclusion

In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.

URL:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514623

DOI: 10.1371/journal.pone.0132735
PubMed: 26207759
PubMed Central: 4514623

Links to previous steps (curation, corpus...)

To the Pmc stream, Corpus step; to this record in the Curation step: 000076

Links to Exploration step

PMC:4514623

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study</title>
<author><name sortKey="Read, Kevin B" sort="Read, Kevin B" uniqKey="Read K" first="Kevin B." last="Read">Kevin B. Read</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Medical Library, NYU Langone Medical Center, New York, New York, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Medical Library, NYU Langone Medical Center, New York, New York</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Sheehan, Jerry R" sort="Sheehan, Jerry R" uniqKey="Sheehan J" first="Jerry R." last="Sheehan">Jerry R. Sheehan</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Huerta, Michael F" sort="Huerta, Michael F" uniqKey="Huerta M" first="Michael F." last="Huerta">Michael F. Huerta</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Knecht, Lou S" sort="Knecht, Lou S" uniqKey="Knecht L" first="Lou S." last="Knecht">Lou S. Knecht</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Mork, James G" sort="Mork, James G" uniqKey="Mork J" first="James G." last="Mork">James G. Mork</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Humphreys, Betsy L" sort="Humphreys, Betsy L" uniqKey="Humphreys B" first="Betsy L." last="Humphreys">Betsy L. Humphreys</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26207759</idno>
<idno type="pmc">4514623</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514623</idno>
<idno type="RBID">PMC:4514623</idno>
<idno type="doi">10.1371/journal.pone.0132735</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000076</idno>
<idno type="wicri:Area/Pmc/Curation">000076</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study</title>
<author><name sortKey="Read, Kevin B" sort="Read, Kevin B" uniqKey="Read K" first="Kevin B." last="Read">Kevin B. Read</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Medical Library, NYU Langone Medical Center, New York, New York, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Medical Library, NYU Langone Medical Center, New York, New York</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Sheehan, Jerry R" sort="Sheehan, Jerry R" uniqKey="Sheehan J" first="Jerry R." last="Sheehan">Jerry R. Sheehan</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Huerta, Michael F" sort="Huerta, Michael F" uniqKey="Huerta M" first="Michael F." last="Huerta">Michael F. Huerta</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Knecht, Lou S" sort="Knecht, Lou S" uniqKey="Knecht L" first="Lou S." last="Knecht">Lou S. Knecht</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Mork, James G" sort="Mork, James G" uniqKey="Mork J" first="James G." last="Mork">James G. Mork</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Humphreys, Betsy L" sort="Humphreys, Betsy L" uniqKey="Humphreys B" first="Betsy L." last="Humphreys">Betsy L. Humphreys</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>National Library of Medicine, National Institutes of Health, Bethesda, Maryland</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec id="sec001"><title>Objective</title>
<p>This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository.</p>
</sec>
<sec id="sec002"><title>Methods</title>
<p>We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.</p>
</sec>
<sec id="sec003"><title>Results</title>
<p>About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% that are invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.</p>
</sec>
<sec id="sec004"><title>Conclusion</title>
<p>In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Chan, Aw" uniqKey="Chan A">AW Chan</name>
</author>
<author><name sortKey="Song, F" uniqKey="Song F">F Song</name>
</author>
<author><name sortKey="Vickers, A" uniqKey="Vickers A">A Vickers</name>
</author>
<author><name sortKey="Jefferson, T" uniqKey="Jefferson T">T Jefferson</name>
</author>
<author><name sortKey="Dickersin, K" uniqKey="Dickersin K">K Dickersin</name>
</author>
<author><name sortKey="G Tzsche, Pc" uniqKey="G Tzsche P">PC Gøtzsche</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Neveol, A" uniqKey="Neveol A">A Névéol</name>
</author>
<author><name sortKey="Wilbur, Wj" uniqKey="Wilbur W">WJ Wilbur</name>
</author>
<author><name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Margolis, R" uniqKey="Margolis R">R Margolis</name>
</author>
<author><name sortKey="Derr, L" uniqKey="Derr L">L Derr</name>
</author>
<author><name sortKey="Dunn, M" uniqKey="Dunn M">M Dunn</name>
</author>
<author><name sortKey="Huerta, M" uniqKey="Huerta M">M Huerta</name>
</author>
<author><name sortKey="Larkin, J" uniqKey="Larkin J">J Larkin</name>
</author>
<author><name sortKey="Sheehan, J" uniqKey="Sheehan J">J Sheehan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Alsheikh Ali, Aa" uniqKey="Alsheikh Ali A">AA Alsheikh-Ali</name>
</author>
<author><name sortKey="Qureshi, W" uniqKey="Qureshi W">W Qureshi</name>
</author>
<author><name sortKey="Al Mallah, Mh" uniqKey="Al Mallah M">MH Al-Mallah</name>
</author>
<author><name sortKey="Ioannidis, Jp" uniqKey="Ioannidis J">JP Ioannidis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mooney, H" uniqKey="Mooney H">H Mooney</name>
</author>
<author><name sortKey="Newton, Mp" uniqKey="Newton M">MP Newton</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Belter, Cw" uniqKey="Belter C">CW Belter</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
<author><name sortKey="Carlson, D" uniqKey="Carlson D">D Carlson</name>
</author>
<author><name sortKey="Vision, Tj" uniqKey="Vision T">TJ Vision</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ari O, A" uniqKey="Ari O A">A. Ariño</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ross, Js" uniqKey="Ross J">JS Ross</name>
</author>
<author><name sortKey="Tse, T" uniqKey="Tse T">T Tse</name>
</author>
<author><name sortKey="Zarin, Da" uniqKey="Zarin D">DA Zarin</name>
</author>
<author><name sortKey="Xu, H" uniqKey="Xu H">H Xu</name>
</author>
<author><name sortKey="Zhou, L" uniqKey="Zhou L">L Zhou</name>
</author>
<author><name sortKey="Krumholz, Hm" uniqKey="Krumholz H">HM Krumholz</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Vines, Th" uniqKey="Vines T">TH Vines</name>
</author>
<author><name sortKey="Albert, Ay" uniqKey="Albert A">AY Albert</name>
</author>
<author><name sortKey="Andrew, Rl" uniqKey="Andrew R">RL Andrew</name>
</author>
<author><name sortKey="Debarre, F" uniqKey="Debarre F">F Débarre</name>
</author>
<author><name sortKey="Bock, Dg" uniqKey="Bock D">DG Bock</name>
</author>
<author><name sortKey="Franklin, Mt" uniqKey="Franklin M">MT Franklin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Hinchliff, Ce" uniqKey="Hinchliff C">CE Hinchliff</name>
</author>
<author><name sortKey="Smith, Sa" uniqKey="Smith S">SA Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Robinson Garcia, N" uniqKey="Robinson Garcia N">N Robinson-Garcia</name>
</author>
<author><name sortKey="Jimenez Contreras, E" uniqKey="Jimenez Contreras E">E Jimenez-Contreras</name>
</author>
<author><name sortKey="Torres Salinas, D" uniqKey="Torres Salinas D">D Torres-Salinas</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Parson, Ma" uniqKey="Parson M">MA Parson</name>
</author>
<author><name sortKey="Duerr, R" uniqKey="Duerr R">R Duerr</name>
</author>
<author><name sortKey="Minster, Jb" uniqKey="Minster J">JB Minster</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Callaghan, S" uniqKey="Callaghan S">S Callaghan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lynch, C" uniqKey="Lynch C">C Lynch</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Lindberg, Da" uniqKey="Lindberg D">DA Lindberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Thoma, Gr" uniqKey="Thoma G">GR Thoma</name>
</author>
<author><name sortKey="Ford, G" uniqKey="Ford G">G Ford</name>
</author>
<author><name sortKey="Antani, S" uniqKey="Antani S">S Antani</name>
</author>
<author><name sortKey="Demner Fushman, D" uniqKey="Demner Fushman D">D Demner-Fushman</name>
</author>
<author><name sortKey="Chung, M" uniqKey="Chung M">M Chung</name>
</author>
<author><name sortKey="Simpson, M" uniqKey="Simpson M">M Simpson</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Mons, B" uniqKey="Mons B">B Mons</name>
</author>
<author><name sortKey="Van Haagen, H" uniqKey="Van Haagen H">H van Haagen</name>
</author>
<author><name sortKey="Chichester, C" uniqKey="Chichester C">C Chichester</name>
</author>
<author><name sortKey="Hoen, Pb" uniqKey="Hoen P">PB Hoen</name>
</author>
<author><name sortKey="Den Dunnen, Jt" uniqKey="Den Dunnen J">JT den Dunnen</name>
</author>
<author><name sortKey="Van Ommen, G" uniqKey="Van Ommen G">G van Ommen</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Chavan, V" uniqKey="Chavan V">V Chavan</name>
</author>
<author><name sortKey="Penev, L" uniqKey="Penev L">L Penev</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Costello, Mj" uniqKey="Costello M">MJ Costello</name>
</author>
<author><name sortKey="Michener, Wk" uniqKey="Michener W">WK Michener</name>
</author>
<author><name sortKey="Gahegan, M" uniqKey="Gahegan M">M Gahegan</name>
</author>
<author><name sortKey="Zhang, Zq" uniqKey="Zhang Z">ZQ Zhang</name>
</author>
<author><name sortKey="Bourne, Pe" uniqKey="Bourne P">PE Bourne</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rousidis, D" uniqKey="Rousidis D">D Rousidis</name>
</author>
<author><name sortKey="Garoufallou, E" uniqKey="Garoufallou E">E Garoufallou</name>
</author>
<author><name sortKey="Balatsoukas, P" uniqKey="Balatsoukas P">P Balatsoukas</name>
</author>
<author><name sortKey="Sicilia, Ma" uniqKey="Sicilia M">MA Sicilia</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group><journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher><publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">26207759</article-id>
<article-id pub-id-type="pmc">4514623</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0132735</article-id>
<article-id pub-id-type="publisher-id">PONE-D-15-00963</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study</article-title>
<alt-title alt-title-type="running-head">Improving Discovery and Access to NIH-Funded Data: A Preliminary Study</alt-title>
</title-group>
<contrib-group><contrib contrib-type="author" equal-contrib="yes"><name><surname>Read</surname>
<given-names>Kevin B.</given-names>
</name>
<xref ref-type="aff" rid="aff001"><sup>1</sup>
</xref>
<xref rid="cor001" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes"><name><surname>Sheehan</surname>
<given-names>Jerry R.</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes"><name><surname>Huerta</surname>
<given-names>Michael F.</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes"><name><surname>Knecht</surname>
<given-names>Lou S.</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes"><name><surname>Mork</surname>
<given-names>James G.</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" equal-contrib="yes"><name><surname>Humphreys</surname>
<given-names>Betsy L.</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author"><collab>NIH Big Data Annotator Group</collab>
<xref ref-type="aff" rid="aff003"><sup>3</sup>
</xref>
<xref ref-type="author-notes" rid="fn001"><sup>¶</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff001"><label>1</label>
<addr-line>Medical Library, NYU Langone Medical Center, New York, New York, United States of America</addr-line>
</aff>
<aff id="aff002"><label>2</label>
<addr-line>National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</aff>
<aff id="aff003"><label>3</label>
<addr-line>National Institutes of Health, Bethesda, Maryland, United States of America</addr-line>
</aff>
<contrib-group><contrib contrib-type="editor"><name><surname>Larivière</surname>
<given-names>Vincent</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1"><addr-line>Université de Montréal, CANADA</addr-line>
</aff>
<author-notes><fn fn-type="conflict" id="coi001"><p><bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001"><p>Conceived and designed the experiments: KBR JRS MFH LSK JGM BLH. Performed the experiments: KBR JRS MFH LSK JGM BLH NBDAG. Analyzed the data: KBR JRS MFH LSK JGM BLH. Wrote the paper: KBR JRS MFH LSK JGM BLH.</p>
</fn>
<fn fn-type="other" id="fn001"><p>¶ Membership of the NIH Big Data Annotator Group is listed in the Acknowledgments.</p>
</fn>
<corresp id="cor001">* E-mail: <email>kevin.read@nyumc.org</email>
</corresp>
</author-notes>
<pub-date pub-type="epub"><day>24</day>
<month>7</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection"><year>2015</year>
</pub-date>
<volume>10</volume>
<issue>7</issue>
<elocation-id>e0132735</elocation-id>
<history><date date-type="received"><day>8</day>
<month>1</month>
<year>2015</year>
</date>
<date date-type="accepted"><day>17</day>
<month>6</month>
<year>2015</year>
</date>
</history>
<permissions><license xlink:href="https://creativecommons.org/publicdomain/zero/1.0/"><license-p>This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/publicdomain/zero/1.0/">Creative Commons CC0</ext-link>
 public domain dedication</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0132735.pdf"></self-uri>
<abstract><sec id="sec001"><title>Objective</title>
<p>This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository.</p>
</sec>
<sec id="sec002"><title>Methods</title>
<p>We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.</p>
</sec>
<sec id="sec003"><title>Results</title>
<p>About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% that are invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.</p>
</sec>
<sec id="sec004"><title>Conclusion</title>
<p>In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.</p>
</sec>
</abstract>
<funding-group><funding-statement>This research was supported by the Intramural Research Program of the U.S. National Institutes of Health, National Library of Medicine (NLM) and in part by an appointment to the NLM Associate Fellowship Program sponsored by the National Library of Medicine and administered by the Oak Ridge Institute for Science and Education.</funding-statement>
</funding-group>
<counts><fig-count count="5"></fig-count>
<table-count count="8"></table-count>
<page-count count="18"></page-count>
</counts>
<custom-meta-group><custom-meta id="data-availability"><meta-name>Data Availability</meta-name>
<meta-value>The data analysis file and all annotator data files are available in the Figshare repository /m9.figshare.1285515. Read K. (2015). Sizing the Problem of Improving Discovery and Access to NIH-funded Data: A Preliminary Study (Datasets). Figshare. Available: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1285515">http://dx.doi.org/10.6084/m9.figshare.1285515</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes><title>Data Availability</title>
<p>The data analysis file and all annotator data files are available in the Figshare repository /m9.figshare.1285515. Read K. (2015). Sizing the Problem of Improving Discovery and Access to NIH-funded Data: A Preliminary Study (Datasets). Figshare. Available: <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1285515">http://dx.doi.org/10.6084/m9.figshare.1285515</ext-link>
.</p>
</notes>
</front>
</pmc>
</record>
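The Dilib commands below are the site's own way to select this record. As an alternative, the identifiers in the record's `<publicationStmt>` can also be read with Python's standard-library `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions. It parses a trimmed copy of that element rather than the full record, because the full record uses the `nlm:`, `wicri:` and `xlink:` prefixes without declaring them, and ElementTree rejects it as-is. The function name `identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <publicationStmt> element from the record above.
# The full record uses undeclared namespace prefixes, so it is not parsed here.
PUBLICATION_STMT = """
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26207759</idno>
<idno type="pmc">4514623</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4514623</idno>
<idno type="RBID">PMC:4514623</idno>
<idno type="doi">10.1371/journal.pone.0132735</idno>
<date when="2015">2015</date>
</publicationStmt>
"""


def identifiers(xml_text: str) -> dict:
    """Map each <idno> element's 'type' attribute to its text (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {idno.get("type"): idno.text for idno in root.iter("idno")}


ids = identifiers(PUBLICATION_STMT)
print(ids["doi"])
```

Keying the dictionary on the `type` attribute mirrors how the record distinguishes PMID, PMC, DOI and RBID identifiers.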

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Curation

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000076 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000076 | SxmlIndent | more

To link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4514623
   |texte=   Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:26207759" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024

Cyberinfrastructure exploration server
Warning: this site is under development and was generated automatically from raw corpora; the information has therefore not been validated.
