Cyberinfrastructure Exploration Server

Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data

Internal identifier: 000642 (Pmc/Corpus); previous: 000641; next: 000643

Author: Heather A. Piwowar

Source: PLoS ONE (2011)

RBID : PMC:3135593

Abstract

Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication.

Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available.

First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available.

These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3135593
DOI: 10.1371/journal.pone.0018657
PubMed: 21765886
PubMed Central: 3135593

Links to Exploration step

PMC:3135593

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data</title>
<author>
<name sortKey="Piwowar, Heather A" sort="Piwowar, Heather A" uniqKey="Piwowar H" first="Heather A." last="Piwowar">Heather A. Piwowar</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21765886</idno>
<idno type="pmc">3135593</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3135593</idno>
<idno type="RBID">PMC:3135593</idno>
<idno type="doi">10.1371/journal.pone.0018657</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">000642</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data</title>
<author>
<name sortKey="Piwowar, Heather A" sort="Piwowar, Heather A" uniqKey="Piwowar H" first="Heather A." last="Piwowar">Heather A. Piwowar</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication.</p>
<p>Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available.</p>
<p>First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available.</p>
<p>These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccain, K" uniqKey="Mccain K">K McCain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, H" uniqKey="Piwowar H">H Piwowar</name>
</author>
<author>
<name sortKey="Chapman, W" uniqKey="Chapman W">W Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fienberg, Se" uniqKey="Fienberg S">SE Fienberg</name>
</author>
<author>
<name sortKey="Martin, Me" uniqKey="Martin M">ME Martin</name>
</author>
<author>
<name sortKey="Straf, Ml" uniqKey="Straf M">ML Straf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cech, Tr" uniqKey="Cech T">TR Cech</name>
</author>
<author>
<name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author>
<name sortKey="Eisenberg, D" uniqKey="Eisenberg D">D Eisenberg</name>
</author>
<author>
<name sortKey="Hersey, K" uniqKey="Hersey K">K Hersey</name>
</author>
<author>
<name sortKey="Holtzman, Sh" uniqKey="Holtzman S">SH Holtzman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kakazu, Kk" uniqKey="Kakazu K">KK Kakazu</name>
</author>
<author>
<name sortKey="Cheung, Lw" uniqKey="Cheung L">LW Cheung</name>
</author>
<author>
<name sortKey="Lynne, W" uniqKey="Lynne W">W Lynne</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brazma, A" uniqKey="Brazma A">A Brazma</name>
</author>
<author>
<name sortKey="Hingamp, P" uniqKey="Hingamp P">P Hingamp</name>
</author>
<author>
<name sortKey="Quackenbush, J" uniqKey="Quackenbush J">J Quackenbush</name>
</author>
<author>
<name sortKey="Sherlock, G" uniqKey="Sherlock G">G Sherlock</name>
</author>
<author>
<name sortKey="Spellman, P" uniqKey="Spellman P">P Spellman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barrett, T" uniqKey="Barrett T">T Barrett</name>
</author>
<author>
<name sortKey="Troup, D" uniqKey="Troup D">D Troup</name>
</author>
<author>
<name sortKey="Wilhite, S" uniqKey="Wilhite S">S Wilhite</name>
</author>
<author>
<name sortKey="Ledoux, P" uniqKey="Ledoux P">P Ledoux</name>
</author>
<author>
<name sortKey="Rudnev, D" uniqKey="Rudnev D">D Rudnev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Noor, Ma" uniqKey="Noor M">MA Noor</name>
</author>
<author>
<name sortKey="Zimmerman, Kj" uniqKey="Zimmerman K">KJ Zimmerman</name>
</author>
<author>
<name sortKey="Teeter, Kc" uniqKey="Teeter K">KC Teeter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ochsner, Sa" uniqKey="Ochsner S">SA Ochsner</name>
</author>
<author>
<name sortKey="Steffen, Dl" uniqKey="Steffen D">DL Steffen</name>
</author>
<author>
<name sortKey="Stoeckert, Cj" uniqKey="Stoeckert C">CJ Stoeckert</name>
</author>
<author>
<name sortKey="Mckenna, Nj" uniqKey="Mckenna N">NJ McKenna</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reidpath, Dd" uniqKey="Reidpath D">DD Reidpath</name>
</author>
<author>
<name sortKey="Allotey, Pa" uniqKey="Allotey P">PA Allotey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kyzas, P" uniqKey="Kyzas P">P Kyzas</name>
</author>
<author>
<name sortKey="Loizou, K" uniqKey="Loizou K">K Loizou</name>
</author>
<author>
<name sortKey="Ioannidis, J" uniqKey="Ioannidis J">J Ioannidis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blumenthal, D" uniqKey="Blumenthal D">D Blumenthal</name>
</author>
<author>
<name sortKey="Campbell, Eg" uniqKey="Campbell E">EG Campbell</name>
</author>
<author>
<name sortKey="Gokhale, M" uniqKey="Gokhale M">M Gokhale</name>
</author>
<author>
<name sortKey="Yucel, R" uniqKey="Yucel R">R Yucel</name>
</author>
<author>
<name sortKey="Clarridge, B" uniqKey="Clarridge B">B Clarridge</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Campbell, Eg" uniqKey="Campbell E">EG Campbell</name>
</author>
<author>
<name sortKey="Clarridge, Br" uniqKey="Clarridge B">BR Clarridge</name>
</author>
<author>
<name sortKey="Gokhale, M" uniqKey="Gokhale M">M Gokhale</name>
</author>
<author>
<name sortKey="Birenbaum, L" uniqKey="Birenbaum L">L Birenbaum</name>
</author>
<author>
<name sortKey="Hilgartner, S" uniqKey="Hilgartner S">S Hilgartner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hedstrom, M" uniqKey="Hedstrom M">M Hedstrom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ventura, B" uniqKey="Ventura B">B Ventura</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giordano, R" uniqKey="Giordano R">R Giordano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hedstrom, M" uniqKey="Hedstrom M">M Hedstrom</name>
</author>
<author>
<name sortKey="Niu, J" uniqKey="Niu J">J Niu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niu, J" uniqKey="Niu J">J Niu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lowrance, W" uniqKey="Lowrance W">W Lowrance</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, C" uniqKey="Brown C">C Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccullough, Bd" uniqKey="Mccullough B">BD McCullough</name>
</author>
<author>
<name sortKey="Mcgeary, Ka" uniqKey="Mcgeary K">KA McGeary</name>
</author>
<author>
<name sortKey="Harrison, Td" uniqKey="Harrison T">TD Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Constant, D" uniqKey="Constant D">D Constant</name>
</author>
<author>
<name sortKey="Kiesler, S" uniqKey="Kiesler S">S Kiesler</name>
</author>
<author>
<name sortKey="Sproull, L" uniqKey="Sproull L">L Sproull</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matzler, K" uniqKey="Matzler K">K Matzler</name>
</author>
<author>
<name sortKey="Renzl, B" uniqKey="Renzl B">B Renzl</name>
</author>
<author>
<name sortKey="Muller, J" uniqKey="Muller J">J Muller</name>
</author>
<author>
<name sortKey="Herting, S" uniqKey="Herting S">S Herting</name>
</author>
<author>
<name sortKey="Mooradian, T" uniqKey="Mooradian T">T Mooradian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ryu, S" uniqKey="Ryu S">S Ryu</name>
</author>
<author>
<name sortKey="Ho, Sh" uniqKey="Ho S">SH Ho</name>
</author>
<author>
<name sortKey="Han, I" uniqKey="Han I">I Han</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bitzer, J" uniqKey="Bitzer J">J Bitzer</name>
</author>
<author>
<name sortKey="Schrettl, W" uniqKey="Schrettl W">W Schrettl</name>
</author>
<author>
<name sortKey="Schroder, Pjh" uniqKey="Schroder P">PJH Schröder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, J" uniqKey="Kim J">J Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Seonghee, K" uniqKey="Seonghee K">K Seonghee</name>
</author>
<author>
<name sortKey="Boryung, J" uniqKey="Boryung J">J Boryung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warlick, S" uniqKey="Warlick S">S Warlick</name>
</author>
<author>
<name sortKey="Vaughan, K" uniqKey="Vaughan K">K Vaughan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, C" uniqKey="Lee C">C Lee</name>
</author>
<author>
<name sortKey="Dourish, P" uniqKey="Dourish P">P Dourish</name>
</author>
<author>
<name sortKey="Mark, G" uniqKey="Mark G">G Mark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuo, F" uniqKey="Kuo F">F Kuo</name>
</author>
<author>
<name sortKey="Young, M" uniqKey="Young M">M Young</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rhodes, Dr" uniqKey="Rhodes D">DR Rhodes</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Shanker, K" uniqKey="Shanker K">K Shanker</name>
</author>
<author>
<name sortKey="Deshpande, N" uniqKey="Deshpande N">N Deshpande</name>
</author>
<author>
<name sortKey="Varambally, R" uniqKey="Varambally R">R Varambally</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hrynaszkiewicz, I" uniqKey="Hrynaszkiewicz I">I Hrynaszkiewicz</name>
</author>
<author>
<name sortKey="Altman, D" uniqKey="Altman D">D Altman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parkinson, H" uniqKey="Parkinson H">H Parkinson</name>
</author>
<author>
<name sortKey="Kapushesky, M" uniqKey="Kapushesky M">M Kapushesky</name>
</author>
<author>
<name sortKey="Shojatalab, M" uniqKey="Shojatalab M">M Shojatalab</name>
</author>
<author>
<name sortKey="Abeygunawardena, N" uniqKey="Abeygunawardena N">N Abeygunawardena</name>
</author>
<author>
<name sortKey="Coulson, R" uniqKey="Coulson R">R Coulson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Brazma, A" uniqKey="Brazma A">A Brazma</name>
</author>
<author>
<name sortKey="Causton, H" uniqKey="Causton H">H Causton</name>
</author>
<author>
<name sortKey="Chervitz, S" uniqKey="Chervitz S">S Chervitz</name>
</author>
<author>
<name sortKey="Edgar, R" uniqKey="Edgar R">R Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, H" uniqKey="Piwowar H">H Piwowar</name>
</author>
<author>
<name sortKey="Chapman, W" uniqKey="Chapman W">W Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, W" uniqKey="Yu W">W Yu</name>
</author>
<author>
<name sortKey="Yesupriya, A" uniqKey="Yesupriya A">A Yesupriya</name>
</author>
<author>
<name sortKey="Wulf, A" uniqKey="Wulf A">A Wulf</name>
</author>
<author>
<name sortKey="Qu, J" uniqKey="Qu J">J Qu</name>
</author>
<author>
<name sortKey="Gwinn, M" uniqKey="Gwinn M">M Gwinn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Torvik, V" uniqKey="Torvik V">V Torvik</name>
</author>
<author>
<name sortKey="Smalheiser, Nr" uniqKey="Smalheiser N">NR Smalheiser</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bird, S" uniqKey="Bird S">S Bird</name>
</author>
<author>
<name sortKey="Loper, E" uniqKey="Loper E">E Loper</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Theus, M" uniqKey="Theus M">M Theus</name>
</author>
<author>
<name sortKey="Urbanek, S" uniqKey="Urbanek S">S Urbanek</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harrell, Fe" uniqKey="Harrell F">FE Harrell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gorsuch, Rl" uniqKey="Gorsuch R">RL Gorsuch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vickers, Aj" uniqKey="Vickers A">AJ Vickers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Siemsen, E" uniqKey="Siemsen E">E Siemsen</name>
</author>
<author>
<name sortKey="Roth, A" uniqKey="Roth A">A Roth</name>
</author>
<author>
<name sortKey="Balasubramanian, S" uniqKey="Balasubramanian S">S Balasubramanian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tucker, J" uniqKey="Tucker J">J Tucker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Malin, B" uniqKey="Malin B">B Malin</name>
</author>
<author>
<name sortKey="Karp, D" uniqKey="Karp D">D Karp</name>
</author>
<author>
<name sortKey="Scheuermann, Rh" uniqKey="Scheuermann R">RH Scheuermann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Foster, M" uniqKey="Foster M">M Foster</name>
</author>
<author>
<name sortKey="Sharp, R" uniqKey="Sharp R">R Sharp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Navarro, R" uniqKey="Navarro R">R Navarro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blumenthal, D" uniqKey="Blumenthal D">D Blumenthal</name>
</author>
<author>
<name sortKey="Campbell, E" uniqKey="Campbell E">E Campbell</name>
</author>
<author>
<name sortKey="Anderson, M" uniqKey="Anderson M">M Anderson</name>
</author>
<author>
<name sortKey="Causino, N" uniqKey="Causino N">N Causino</name>
</author>
<author>
<name sortKey="Louis, K" uniqKey="Louis K">K Louis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vogeli, C" uniqKey="Vogeli C">C Vogeli</name>
</author>
<author>
<name sortKey="Yucel, R" uniqKey="Yucel R">R Yucel</name>
</author>
<author>
<name sortKey="Bendavid, E" uniqKey="Bendavid E">E Bendavid</name>
</author>
<author>
<name sortKey="Jones, L" uniqKey="Jones L">L Jones</name>
</author>
<author>
<name sortKey="Anderson, M" uniqKey="Anderson M">M Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
<author>
<name sortKey="Chapman, Ww" uniqKey="Chapman W">WW Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hosek, Sd" uniqKey="Hosek S">SD Hosek</name>
</author>
<author>
<name sortKey="Cox, Ag" uniqKey="Cox A">AG Cox</name>
</author>
<author>
<name sortKey="Ghosh Dastidar, B" uniqKey="Ghosh Dastidar B">B Ghosh-Dastidar</name>
</author>
<author>
<name sortKey="Kofner, A" uniqKey="Kofner A">A Kofner</name>
</author>
<author>
<name sortKey="Ramphal, N" uniqKey="Ramphal N">N Ramphal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bornmann, L" uniqKey="Bornmann L">L Bornmann</name>
</author>
<author>
<name sortKey="Mutz, R" uniqKey="Mutz R">R Mutz</name>
</author>
<author>
<name sortKey="Daniel, H D" uniqKey="Daniel H">H-D Daniel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21765886</article-id>
<article-id pub-id-type="pmc">3135593</article-id>
<article-id pub-id-type="publisher-id">PONE-D-10-01931</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0018657</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Biology</subject>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Biological Data Management</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Computer Science</subject>
<subj-group>
<subject>Information Technology</subject>
<subj-group>
<subject>Databases</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Science Policy</subject>
<subj-group>
<subject>Research Assessment</subject>
<subj-group>
<subject>Bibliometrics</subject>
<subject>Research Reporting Guidelines</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Social and Behavioral Sciences</subject>
<subj-group>
<subject>Information Science</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data</article-title>
<alt-title alt-title-type="running-head">Who Shares? Who Doesn't?</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Piwowar</surname>
<given-names>Heather A.</given-names>
</name>
<xref ref-type="aff" rid="aff1"></xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
<xref ref-type="author-notes" rid="fn1">
<sup>¤</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<addr-line>Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Neylon</surname>
<given-names>Cameron</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">Science and Technology Facilities Council, United Kingdom</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>hpiwowar@gmail.com</email>
</corresp>
<fn fn-type="con">
<p>Conceived and designed the experiments: HAP. Performed the experiments: HAP. Analyzed the data: HAP. Contributed reagents/materials/analysis tools: HAP. Wrote the paper: HAP.</p>
</fn>
<fn id="fn1" fn-type="current-aff">
<label>¤</label>
<p>Current address: NESCent, The National Evolutionary Synthesis Center, Durham, North Carolina, United States of America</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2011</year>
</pub-date>
<pub-date pub-type="epub">
<day>13</day>
<month>7</month>
<year>2011</year>
</pub-date>
<volume>6</volume>
<issue>7</issue>
<elocation-id>e18657</elocation-id>
<history>
<date date-type="received">
<day>16</day>
<month>9</month>
<year>2010</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>3</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>Heather A. Piwowar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement>
<copyright-year>2011</copyright-year>
</permissions>
<abstract>
<p>Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication.</p>
<p>Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available.</p>
<p>First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available.</p>
<p>These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.</p>
</abstract>
<counts>
<page-count count="13"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Sharing and reusing primary research datasets has the potential to increase research efficiency and quality. Raw data can be used to explore related or new hypotheses, particularly when combined with other available datasets. Real data are indispensable for developing and validating study methods, analysis techniques, and software implementations. The larger scientific community also benefits: Sharing data encourages multiple perspectives, helps to identify errors, discourages fraud, is useful for training new researchers, and increases efficient use of funding and population resources by avoiding duplicate data collection.</p>
<p>Eager to realize these benefits, funders, publishers, societies, and individual research groups have developed tools, resources, and policies to encourage investigators to make their data publicly available. For example, some journals require the submission of detailed biomedical datasets to publicly available databases as a condition of publication
<xref ref-type="bibr" rid="pone.0018657-McCain1">[1]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Piwowar1">[2]</xref>
. Many funders require data sharing plans as a condition of funding: Since 2003, the National Institutes of Health (NIH) in the USA has required a data sharing plan for all large funding grants
<xref ref-type="bibr" rid="pone.0018657-National1">[3]</xref>
and has more recently introduced stronger requirements for genome-wide association studies
<xref ref-type="bibr" rid="pone.0018657-National2">[4]</xref>
. As of January 2011, the US National Science Foundation requires that data sharing plans accompany all research grant proposals
<xref ref-type="bibr" rid="pone.0018657-Nation1">[5]</xref>
. Several government whitepapers
<xref ref-type="bibr" rid="pone.0018657-Fienberg1">[6]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Cech1">[7]</xref>
and high-profile editorials
<xref ref-type="bibr" rid="pone.0018657-Time1">[8]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Got1">[9]</xref>
call for responsible data sharing and reuse. Large-scale collaborative science is increasing the need to share datasets
<xref ref-type="bibr" rid="pone.0018657-Kakazu1">[10]</xref>
, and many guidelines, tools, standards, and databases are being developed and maintained to facilitate data sharing and reuse
<xref ref-type="bibr" rid="pone.0018657-Brazma1">[12]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Barrett1">[13]</xref>
.</p>
<p>Despite these investments of time and money, we do not yet understand the impact of these initiatives. There is a well-known adage: You cannot manage what you do not measure. For those with a goal of promoting responsible data sharing, it would be helpful to evaluate the effectiveness of requirements, recommendations, and tools. When data sharing is voluntary, insights could be gained by learning which datasets are shared, on what topics, by whom, and in what locations. When policies make data sharing mandatory, monitoring is useful to understand compliance and unexpected consequences.</p>
<p>Dimensions of data sharing action and intention have been investigated by a variety of studies. Manual annotations and systematic data requests have been used to estimate the frequency of data sharing within biomedicine
<xref ref-type="bibr" rid="pone.0018657-Noor1">[14]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Ochsner1">[15]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Reidpath1">[16]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Kyzas1">[17]</xref>
, though few attempts were made to determine patterns of sharing and withholding within these samples. Blumenthal
<xref ref-type="bibr" rid="pone.0018657-Blumenthal1">[18]</xref>
, Campbell
<xref ref-type="bibr" rid="pone.0018657-Campbell1">[19]</xref>
, Hedstrom
<xref ref-type="bibr" rid="pone.0018657-Hedstrom1">[20]</xref>
, and others have used survey results to correlate self-reported instances of data sharing and withholding with self-reported attributes like industry involvement, perceived competitiveness, career productivity, and anticipated data sharing costs. Others have used surveys and interviews to analyze opinions about the effectiveness of mandates
<xref ref-type="bibr" rid="pone.0018657-Ventura1">[21]</xref>
and the value of various incentives
<xref ref-type="bibr" rid="pone.0018657-Hedstrom1">[20]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Giordano1">[22]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Hedstrom2">[23]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Niu1">[24]</xref>
. A few inventories list the data-sharing policies of funders
<xref ref-type="bibr" rid="pone.0018657-Lowrance1">[25]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-University1">[26]</xref>
and journals
<xref ref-type="bibr" rid="pone.0018657-McCain1">[1]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Brown1">[27]</xref>
, and some work has been done to correlate policy strength with outcome
<xref ref-type="bibr" rid="pone.0018657-Piwowar1">[2]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-McCullough1">[28]</xref>
. Surveys and case studies have been used to develop models of information behavior in related domains, including knowledge sharing within an organization
<xref ref-type="bibr" rid="pone.0018657-Constant1">[29]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Matzler1">[30]</xref>
, physician knowledge sharing in hospitals
<xref ref-type="bibr" rid="pone.0018657-Ryu1">[31]</xref>
, participation in open source projects
<xref ref-type="bibr" rid="pone.0018657-Bitzer1">[32]</xref>
, academic contributions to institutional archives
<xref ref-type="bibr" rid="pone.0018657-Kim1">[33]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Seonghee1">[34]</xref>
, the choice to publish in open access journals
<xref ref-type="bibr" rid="pone.0018657-Warlick1">[35]</xref>
, sharing social science datasets
<xref ref-type="bibr" rid="pone.0018657-Hedstrom1">[20]</xref>
, and participation in large-scale biomedical research collaborations
<xref ref-type="bibr" rid="pone.0018657-Lee1">[36]</xref>
.</p>
<p>Although these studies provide valuable insights and their methods facilitate investigation into an author's intentions and opinions, they have several limitations. First, associations with an investigator's intention to share data do not directly translate to associations with actually sharing data
<xref ref-type="bibr" rid="pone.0018657-Kuo1">[37]</xref>
. Second, associations that rely on self-reported data sharing and withholding likely suffer from underreporting and confounding, since people admit withholding data much less frequently than they report having experienced the data withholding of others
<xref ref-type="bibr" rid="pone.0018657-Blumenthal1">[18]</xref>
.</p>
<p>I suggest a supplemental approach for investigating research data-sharing behavior. I have collected and analyzed a large set of observed data sharing actions and associated study, investigator, journal, funding, and institutional variables. The reported analysis explores common factors behind these attributes and looks at the association between these factors and data sharing prevalence.</p>
<p>I chose to study data sharing for one particular type of data: biological gene expression microarray intensity values. Microarray studies provide a useful environment for exploring data sharing policies and behaviors. Despite being a rich resource valuable for reuse
<xref ref-type="bibr" rid="pone.0018657-Rhodes1">[38]</xref>
, microarray data are often, but not yet universally, shared. Best-practice guidelines for sharing microarray data are fairly mature
<xref ref-type="bibr" rid="pone.0018657-Brazma1">[12]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Hrynaszkiewicz1">[39]</xref>
. Two centralized databases have emerged as best-practice repositories: the Gene Expression Omnibus (GEO)
<xref ref-type="bibr" rid="pone.0018657-Barrett1">[13]</xref>
and ArrayExpress
<xref ref-type="bibr" rid="pone.0018657-Parkinson1">[40]</xref>
. Finally, high-profile letters have called for strong journal data-sharing policies
<xref ref-type="bibr" rid="pone.0018657-Ball1">[41]</xref>
, resulting in unusually strong data sharing requirements in some journals
<xref ref-type="bibr" rid="pone.0018657-Microarray1">[42]</xref>
. As such, the results here represent data sharing in an environment where it has been particularly encouraged and supported.</p>
</sec>
<sec sec-type="methods" id="s2">
<title>Methods</title>
<p>In brief, I used a full-text query to identify a set of studies in which the investigators generated gene expression microarray datasets. Best-practice data repositories were searched for associated datasets. Attributes of the studies were used to derive factors related to the investigators, journals, funding, institutions, and topic of the studies. Associations between these study factors and the frequency of public data archiving were determined through multivariate regression.</p>
<sec id="s2a">
<title>Studies for analysis</title>
<p>The set of “gene expression microarray creation” articles was identified by querying the title, abstract, and full-text of PubMed, PubMed Central, Highwire Press, Scirus, and Google Scholar with portal-specific variants of the following query:</p>
<disp-quote>
<p>
<bold>(“gene expression” [text] AND “microarray” [text] AND “cell” [text] AND “rna” [text])</bold>
</p>
<p>
<bold>AND (“rneasy” [text] OR “trizol” [text] OR “real-time pcr” [text])</bold>
</p>
<p>
<bold>NOT (“tissue microarray*” [text] OR “cpg island*” [text])</bold>
</p>
</disp-quote>
<p>Retrieved articles were mapped to PubMed identifiers whenever possible; the union of the PubMed identifiers returned by the full text portals was used as the definitive list of articles for analysis. An independent evaluation of this approach found that it identified articles that created microarray data with a precision of 90% (95% confidence interval, 86% to 93%) and a recall of 56% (52% to 61%), compared to manual identification of articles that created microarray data
<xref ref-type="bibr" rid="pone.0018657-Piwowar2">[43]</xref>
.</p>
<p>Because Google Scholar only displays the first 1000 results of a query, I was not able to view all of its hits. I tried to identify as many Google Scholar search results as possible by iteratively appending a variety of attributes to the end of the query, including various publisher names, journal title words, and years of publication, thereby retrieving distinct subsets of the results 1000 hits at a time.</p>
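<p>The partitioning strategy described above can be sketched as follows. This is only an illustration under stated assumptions: the simplified base query, the helper names, and the choice of publisher and year as the appended attributes are hypothetical, and the sketch does not contact any search portal.</p>
<preformat>
```python
from itertools import product

# Simplified stand-in for the full portal-specific query (assumption).
BASE_QUERY = '"gene expression" microarray cell rna'


def partition_queries(base_query, publishers, years):
    """Build one narrower query per (publisher, year) pair."""
    return [
        f'{base_query} "{publisher}" {year}'
        for publisher, year in product(publishers, years)
    ]


def merge_results(result_sets):
    """Union the per-query hit lists, keeping first-seen order without duplicates."""
    seen = set()
    merged = []
    for hits in result_sets:
        for hit in hits:
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged
```
</preformat>
<p>Each narrower query stays under the display cap more easily than the full query, and merging by identifier removes the overlap between subsets.</p>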
</sec>
<sec id="s2b">
<title>Data availability</title>
<p>The dependent variable in this study was whether each gene expression microarray research article had an associated dataset in a best-practice public centralized data repository. A community letter encouraging mandatory archiving in 2004
<xref ref-type="bibr" rid="pone.0018657-Ball1">[41]</xref>
identified three best-practice repositories for storing gene expression microarray data: NCBI's Gene Expression Omnibus (GEO), EBI's ArrayExpress, and Japan's CIBEX database. The first two were included in this analysis, since CIBEX was defunct until recently.</p>
<p>An earlier evaluation found that querying GEO and ArrayExpress with article PubMed identifiers located a representative 77% of all associated publicly available datasets
<xref ref-type="bibr" rid="pone.0018657-Piwowar3">[44]</xref>
. I used the same method for finding datasets associated with published articles in this study: I queried GEO for links to the PubMed identifiers in the analysis sample using the “pubmed_gds [filter]” and queried ArrayExpress by searching for each PubMed identifier in a downloaded copy of the ArrayExpress database. Articles linked from a dataset in either of these two centralized repositories were considered to have “shared their data” for the endpoint of this study, and those without such a link were considered not to have shared their data.</p>
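<p>The endpoint definition above amounts to a set-membership test. The following minimal sketch assumes the PubMed identifiers linked from each repository have already been retrieved; the function and argument names are hypothetical.</p>
<preformat>
```python
def classify_sharing(article_pmids, geo_linked_pmids, arrayexpress_linked_pmids):
    """Map each article PMID to True if either repository links a dataset to it."""
    linked = set(geo_linked_pmids) | set(arrayexpress_linked_pmids)
    return {pmid: pmid in linked for pmid in article_pmids}


def sharing_rate(classification):
    """Fraction of articles classified as having shared their data."""
    return sum(classification.values()) / len(classification)
```
</preformat>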
</sec>
<sec id="s2c">
<title>Study attributes</title>
<p>For each study article I collected 124 attributes for use as independent variables. The attributes were collected automatically from a wide variety of sources. Basic bibliometric metadata was extracted from the MEDLINE record, including journal, year of publication, number of authors, Medical Subject Heading (MeSH) terms, number of citations from PubMed Central, inclusion in PubMed subsets for cancer, whether the journal is published with an open-access model, and whether it had data-submission links from GenBank, PDB, and SwissProt.</p>
<p>ISI Journal Impact Factors and associated metrics were extracted from the 2008 ISI Journal Citation Reports. I quantified the content of journal data-sharing policies based on the “Instruction for Authors” for the most commonly occurring journals.</p>
<p>NIH grant details were extracted by cross-referencing grant numbers in the MEDLINE record with the NIH award information (
<ext-link ext-link-type="uri" xlink:href="http://report.nih.gov/award/state/state.cfm">http://report.nih.gov/award/state/state.cfm</ext-link>
). From this information I tabulated the amount of total funding received for each of the fiscal years from 2003 to 2008. I also estimated the date of renewal by identifying the most recent year in which a grant number was prefixed by a “1” or “2”, an indication that the grant is “new” or “renewed,” respectively.</p>
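<p>As an illustration of the prefix rule, the sketch below reads the leading application-type digit from a full grant number and returns the most recent fiscal year with a “1” (new) or “2” (renewed) prefix. The regular expression, the record layout, and the example grant numbers used to exercise it are assumptions made for this sketch, not the parsing code used in the study.</p>
<preformat>
```python
import re

# Leading application-type digit followed by a three-character activity
# code such as R01, U54, or UL1 (assumed layout of a full grant number).
GRANT_PREFIX = re.compile(r"^\s*(\d)\s*([A-Z][A-Z0-9]{2})")


def application_type(grant_number):
    """Return the leading type digit as a string, or None if it is absent."""
    match = GRANT_PREFIX.match(grant_number.upper())
    return match.group(1) if match else None


def latest_new_or_renewal_year(awards):
    """awards: iterable of (fiscal_year, grant_number) pairs.

    Returns the most recent year whose grant number starts with 1 or 2,
    or None when no such year exists.
    """
    years = [year for year, number in awards
             if application_type(number) in ("1", "2")]
    return max(years) if years else None
```
</preformat>
<p>Grant numbers that lack a leading digit are treated as unparseable and ignored, which mirrors the missing-data situation for incomplete or unusually formatted grant numbers.</p>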
<p>The corresponding address was parsed for institution and country, following the methods of Yu et al.
<xref ref-type="bibr" rid="pone.0018657-Yu1">[45]</xref>
. Institutions were cross-referenced to the SCImago Institutions Rankings 2009 World Report (
<ext-link ext-link-type="uri" xlink:href="http://www.scimagoir.com/">http://www.scimagoir.com/</ext-link>
) to estimate the relative degree of research output and impact of the institutions.</p>
<p>Attributes of study authors were collected for first and last authors (in biomedicine, customarily, the first and last authors make the largest contributions to a study and have the most power in publication decisions). The gender of the first and last authors was estimated using the Baby Name Guesser website (
<ext-link ext-link-type="uri" xlink:href="http://www.gpeters.com/names/baby-names.php">http://www.gpeters.com/names/baby-names.php</ext-link>
). A list of prior publications in MEDLINE was extracted from Author-ity clusters, 2009 edition
<xref ref-type="bibr" rid="pone.0018657-Torvik1">[46]</xref>
, for the first and last author of each article in this study. To limit the impact of extremely large “lumped” clusters that erroneously contain the publications of more than one actual author, I excluded prior publication lists for first or last authors in the largest 2% of clusters and instead considered these data missing. For each paper in an author's publication history with a PubMed identifier numerically less than that of the paper in question, I determined whether it had been published in an open access journal, was included in the “gene expression microarray creation” subset itself, or had reused gene expression data. I recorded the date of the earliest publication by the author and the number of citations to date that their earlier papers received in PubMed Central.</p>
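<p>The two filtering rules in this paragraph can be sketched as below. The data layout (a dictionary from author key to a list of PubMed identifiers) and the function names are hypothetical; the sketch only illustrates the “largest 2% of clusters” exclusion and the use of PubMed identifier order as a proxy for publication order.</p>
<preformat>
```python
def drop_largest_clusters(clusters, fraction=0.02):
    """Replace the publication lists of the largest clusters with None (missing).

    clusters: dict mapping an author key to a list of PubMed identifiers.
    """
    n_drop = int(len(clusters) * fraction)
    ranked = sorted(clusters, key=lambda author: len(clusters[author]), reverse=True)
    too_big = set(ranked[:n_drop])
    return {author: (None if author in too_big else pmids)
            for author, pmids in clusters.items()}


def prior_publications(author_pmids, current_pmid):
    """Keep only papers whose PubMed identifier is numerically smaller."""
    return sorted(pmid for pmid in author_pmids if current_pmid > pmid)
```
</preformat>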
<p>I attempted to estimate if the paper itself reused publicly available gene expression microarray data by looking for its inclusion in the list that GEO keeps of reuse at
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/projects/geo/info/ucitations.html">http://www.ncbi.nlm.nih.gov/projects/geo/info/ucitations.html</ext-link>
.</p>
<p>Data collection scripts were coded in Python version 2.5.2 (many libraries were used, including EUtils, BeautifulSoup, pyparsing and nltk
<xref ref-type="bibr" rid="pone.0018657-Bird1">[47]</xref>
) and SQLite version 3.4. Data collection source code is available at GitHub (
<ext-link ext-link-type="uri" xlink:href="http://github.com/hpiwowar/pypub">http://github.com/hpiwowar/pypub</ext-link>
).</p>
</sec>
<sec id="s2d">
<title>Statistical methods</title>
<p>Statistical analysis was performed in R version 2.10.1
<xref ref-type="bibr" rid="pone.0018657-R1">[48]</xref>
. P-values were two-tailed. The collected data were visually explored using Mondrian version 1.1
<xref ref-type="bibr" rid="pone.0018657-Theus1">[49]</xref>
and the Hmisc package
<xref ref-type="bibr" rid="pone.0018657-Harrell1">[50]</xref>
. I applied a square-root transformation to variables representing count data to improve their normality prior to calculating correlations.</p>
<p>To calculate variable correlations, I used the hetcor function in the polycor library. This computes polyserial correlations between pairs of numeric and ordinal variables and polychoric correlations between two ordinal variables. I modified it to calculate Pearson correlations between numeric variables using the rcorr function in the Hmisc library. I used a pairwise-complete approach to missing data and used the nearcor function in the sfsmisc library to make the correlation matrix positive definite. A correlation heatmap was produced using the gplots library.</p>
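<p>For readers unfamiliar with the pairwise-complete approach, the following sketch shows the idea for the Pearson case only, together with the square-root transformation of count variables. It is a plain Python illustration, not the R code used in the analysis; it does not cover polyserial or polychoric correlations or the positive-definite adjustment, and the function names are hypothetical.</p>
<preformat>
```python
import math


def sqrt_transform(counts):
    """Square-root transform count data; missing values (None) stay missing."""
    return [None if c is None else math.sqrt(c) for c in counts]


def pairwise_pearson(x, y):
    """Pearson correlation over the positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)
```
</preformat>
<p>Because each pair of variables may use a different subset of rows, the resulting matrix is not guaranteed to be positive definite, which is why a subsequent adjustment step is needed.</p>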
<p>I used the nFactors library to calculate and display the scree plot for correlations.</p>
<p>Since the correlation matrix was not well-behaved enough for maximum-likelihood factor analysis, first-order exploratory factor analysis was performed with the fa function in the psych library, using the minimum residual (minres) solution and a promax oblique rotation. Second-order factor analysis also used the minres solution but a varimax rotation, since I wanted these factors to be orthogonal. I computed the loadings on the original variables for the second-order factors using the method described by Gorsuch
<xref ref-type="bibr" rid="pone.0018657-Gorsuch1">[51]</xref>
.</p>
<p>Before computing the factor scores for the original dataset, missing values were imputed through Gibbs sampling with two iterations through the mice library.</p>
<p>Using this complete dataset, I computed scores for each of the datapoints on all of the first- and second-order factors using Bartlett's algorithm as extracted from the factanal function. I submitted these factor scores to a logistic regression using the lrm function in the rms package. Continuous variables were modeled as cubic splines with 4 knots using the rcs function from the rms package, and all two-way interactions were explored.</p>
<p>Finally, I performed hierarchical supervised clustering on the datapoints to learn which factors were most predictive, and then estimated the data sharing prevalence in a contingency table of these two clusters split at their medians.</p>
</sec>
</sec>
<sec id="s3">
<title>Results</title>
<p>Full-text queries for articles describing the creation of gene expression microarray datasets returned PubMed identifiers for 11,603 studies.</p>
<p>MEDLINE fields were still “in process” for 512 records, resulting in missing data for MeSH-derived variables. Impact factors were found for all but 1,001 articles. Journal policy variables were missing for 4,107 articles. The institution ranking attributes were missing for 6,185. I cross-referenced NIH grant details for 3,064 studies (some grant numbers could not be parsed, because they were incomplete or strangely formatted). I was able to determine the gender of the first and last authors, based on the forenames in the MEDLINE record, for all but 2,841 first authors and 2,790 last authors. All but 1,765 first authors and 797 last authors were found to have a publication history in the 2009 Author-ity clusters.</p>
<p>PubMed identifiers were found in the “primary citation” field of dataset records in GEO or ArrayExpress for 2,901 of the 11,603 articles in this dataset, indicating that 25% (95% confidence interval: 24% to 26%) of the studies deposited their data in GEO or ArrayExpress and completed the citation fields with the primary article PubMed identifier. This is my estimate for the prevalence of gene expression microarray data deposited into the two predominant, centralized, publicly accessible databases.</p>
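<p>The reported proportion and interval can be approximately reproduced from the two counts given above. The sketch below uses a normal-approximation (Wald) interval, which is an assumption: the text does not state which interval method was used.</p>
<preformat>
```python
import math


def wald_interval(successes, total, z=1.96):
    """Proportion with a normal-approximation confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


p, low, high = wald_interval(2901, 11603)
print(f"{p:.0%} (95% CI {low:.0%} to {high:.0%})")
```
</preformat>
<p>With 2,901 of 11,603 articles, this gives roughly 25% with an interval of about 24% to 26%, in line with the figures reported here.</p>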
<p>This data-sharing rate increased with each subsequent article publication year, as seen in
<xref ref-type="fig" rid="pone-0018657-g001">Figure 1</xref>
, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for the sensitivity of my automated method for detecting open data anywhere on the internet (about 77%
<xref ref-type="bibr" rid="pone.0018657-Piwowar3">[44]</xref>
), it could be estimated that approximately 45% (0.35/0.77) of recent gene expression studies have made their data publicly available.</p>
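<p>The sensitivity adjustment is a single division of the observed rate by the detection sensitivity. A minimal sketch, with a hypothetical function name and assuming that the 77% sensitivity applies uniformly across publication years:</p>
<preformat>
```python
def adjust_for_sensitivity(observed_rate, sensitivity):
    """Estimate the true sharing rate when only a fraction of shared datasets is detected."""
    if sensitivity > 1 or not sensitivity > 0:
        raise ValueError("sensitivity must be in the interval (0, 1]")
    return observed_rate / sensitivity


estimate = adjust_for_sensitivity(0.35, 0.77)
```
</preformat>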
<fig id="pone-0018657-g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Proportion of articles with shared datasets, by year (error bars are 95% confidence intervals of the proportions).</title>
</caption>
<graphic xlink:href="pone.0018657.g001"></graphic>
</fig>
<p>The data-sharing rate also varied across journals.
<xref ref-type="fig" rid="pone-0018657-g002">Figure 2</xref>
shows the data-sharing rate across the 50 journals with the most articles in this study. Many of the other attributes were also associated with the prevalence of data sharing in univariate analysis, as seen in
<xref ref-type="supplementary-material" rid="pone.0018657.s001">Figures S1</xref>
,
<xref ref-type="supplementary-material" rid="pone.0018657.s002">S2</xref>
,
<xref ref-type="supplementary-material" rid="pone.0018657.s003">S3</xref>
,
<xref ref-type="supplementary-material" rid="pone.0018657.s004">S4</xref>
,
<xref ref-type="supplementary-material" rid="pone.0018657.s005">S5</xref>
.</p>
<fig id="pone-0018657-g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Proportion of articles with shared datasets, by journal (error bars are 95% confidence intervals of the proportions).</title>
</caption>
<graphic xlink:href="pone.0018657.g002"></graphic>
</fig>
<sec id="s3a">
<title>First-order factors</title>
<p>I tried to use a scree plot to determine the optimal number of factors for first-order analysis. Since the scree plot did not have a clear drop-off, I experimented with a range of factor counts near the optimal coordinates index (as calculated by nScree in the nFactors R-project library) and finalized on 15 factors. The correlation matrix was not sufficiently well-behaved for maximum-likelihood factor analysis, so I used a minimum residual (minres) solution. I chose to rotate the factors with the promax oblique algorithm, because first-order factors were expected to have significant correlations with one another. The rotated first-order factors are given in
<xref ref-type="table" rid="pone-0018657-t001">Table 1</xref>
, which lists loadings larger than 0.4 or less than −0.4. Some of the loadings are greater than one. This is not unexpected since the factors are oblique and thus the loadings in the pattern matrix represent regression coefficients rather than correlations. Correlations between attributes and the first-order factors are given in the structure matrix in
<xref ref-type="supplementary-material" rid="pone.0018657.s006">Table S1</xref>
. The factors have been named based on the variables that load most heavily on them, using abbreviations for publishing in an Open Access journal (OA) and previously depositing data in the Gene Expression Omnibus (GEO) or ArrayExpress (AE) databases.</p>
<table-wrap id="pone-0018657-t001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.t001</object-id>
<label>Table 1</label>
<caption>
<title>First-order factor loadings.</title>
</caption>
<alternatives>
<graphic id="pone-0018657-t001-1" xlink:href="pone.0018657.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Large NIH grant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.97 num.post2005.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.96 num.post2005.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.92 num.post2004.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.91 num.post2004.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.91 num.post2005.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.89 num.post2006.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.89 num.post2006.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.86 num.post2004.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.85 num.post2006.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.84 num.post2003.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.84 num.post2003.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.80 num.post2003.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.74 has.U.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.71 has.P.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.58 nih.sum.avg.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.56 nih.sum.sum.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.44 nih.max.max.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Has journal policy</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.00 journal.policy.contains..geo.omnibus</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.95 journal.policy.at.least.requests.sharing.array</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.95 journal.policy.mentions.any.sharing</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.93 journal.policy.contains.word.microarray</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.91 journal.policy.requests.sharing.other.data</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.85 journal.policy.says.must.deposit</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.83 journal.policy.contains.word.arrayexpress</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.72 journal.policy.requires.microarray.accession</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.71 journal.policy.requests.accession</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.58 journal.policy.contains.word.miame.mged</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.48 journal.microarray.creating.count.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.45 journal.policy.mentions.consequences</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.42 journal.policy.general.statement</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">NOT institution NCI or intramural</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.59 pubmed.is.funded.non.us.govt</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.55 institution.is.higher.ed</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.89 institution.nci</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.86 pubmed.is.funded.nih.intramural</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.42 country.usa</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Count of R01 & other NIH grants</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.15 has.R01.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">1.14 has.R.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.89 num.grants.via.nih.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.86 nih.cumulative.years.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.82 num.grant.numbers.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.80 max.grant.duration.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.66 pubmed.is.funded.nih</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.50 nih.max.max.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.45 num.nih.is.nigms.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.44 country.usa</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.42 has.T.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.41 num.nih.is.niaid.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Journal impact</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.88 journal.5yr.impact.factor.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.88 journal.impact.factor.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.85 journal.immediacy.index.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.70 journal.policy.mentions.exceptions</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.54 journal.num.articles.2008.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.51 journal.policy.contains.word.miame.mged</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.61 journal.policy.contains.word.arrayexpress</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.48 pubmed.is.open.access</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Last author num prev pubs & first year pub</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.84 last.author.num.prev.pubs.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.74 last.author.year.first.pub.ago.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.73 last.author.num.prev.pmc.cites.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.68 last.author.num.prev.other.sharing.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.48 country.japan</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.44 last.author.num.prev.microarray.creations.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Journal policy consequences & long half-life</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.78 journal.policy.mentions.consequences</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.73 journal.cited.halflife</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.60 pubmed.is.bacteria</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.42 journal.policy.requires.microarray.accession</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.54 pubmed.is.open.access</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.45 journal.policy.general.statement</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Institution high citations & collaboration</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.76 institution.mean.norm.citation.score</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.72 institution.international.collaboration</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.64 institution.mean.norm.impact.factor</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.41 country.germany</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.67 country.china</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.61 country.korea</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.56 last.author.gender.not.found</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.43 country.japan</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">NO geo reuse & YES high institution output</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.66 institution.research.output.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.58 institution.harvard</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.46 has.K.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.42 institution.stanford</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.79 pubmed.is.geo.reuse</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.62 country.australia</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.46 institution.rank</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">NOT animals or mice</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.51 pubmed.is.humans</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.43 pubmed.is.diagnosis</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 pubmed.is.effectiveness</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.93 pubmed.is.animals</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.86 pubmed.is.mice</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Humans & cancer</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.84 pubmed.is.humans</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.75 pubmed.is.cancer</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.67 pubmed.is.cultured.cells</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.52 institution.is.medical</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.47 pubmed.is.core.clinical.journal</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.68 pubmed.is.plants</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.49 pubmed.is.fungi</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Institution is government & NOT higher ed</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.92 institution.is.govnt</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.70 country.germany</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.65 country.france</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.46 institution.international.collaboration</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.78 institution.is.higher.ed</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.56 country.canada</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.51 institution.stanford</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.42 institution.is.medical</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">NO K funding or P funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.56 has.R01.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.49 has.R.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.41 num.post2006.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.41 num.post2006.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 num.post2006.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.65 has.K.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.63 has.P.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Authors prev GEOAE sharing & OA & array creation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.83 last.author.num.prev.geoae.sharing.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.74 last.author.num.prev.microarray.creations.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.73 last.author.num.prev.oa.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.60 first.author.num.prev.geoae.sharing.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.47 first.author.num.prev.oa.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.46 first.author.num.prev.microarray.creations.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 institution.stanford</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.44 years.ago.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">First author num prev pubs & first year pub</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.83 first.author.num.prev.pubs.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.77 first.author.year.first.pub.ago.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.73 first.author.num.prev.pmc.cites.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.52 first.author.num.prev.other.sharing.tr</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>After imputing missing values, I calculated scores for each of the 15 factors for each of the 11,603 data collection studies.</p>
<p>Many of the factor scores demonstrated a correlation with frequency of data sharing in univariate analysis, as seen in
<xref ref-type="fig" rid="pone-0018657-g003">Figure 3</xref>
. Several factors seemed to have a linear relationship with data sharing across their whole range. For example, whereas the data sharing rate was relatively low for studies with the lowest scores on the factor related to the citation and collaboration rate of the corresponding author's institution (in
<xref ref-type="fig" rid="pone-0018657-g003">Figure 3</xref>
, the first row under the heading “Institution high citation & collaboration”), the data sharing rate was higher for studies that scored within the 25
<sup>th</sup>
to 50
<sup>th</sup>
percentile on that factor, higher still for studies in the third quartile, and studies from highly cited institutions, above the 75
<sup>th</sup>
percentile had a relatively high rate of data sharing. A trend in the opposite direction can be seen for the factor “Humans & cancer”: the higher a study scored on that factor, the less likely it was to have shared its data.</p>
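<p>The quartile-by-quartile comparison described above can be sketched as follows. This is a minimal illustration, not the author's script: the function name and the toy data are hypothetical, and the quartile cut points use the default method of Python's <monospace>statistics.quantiles</monospace>, which may differ slightly from the cut points used in the original analysis.</p>

```python
from statistics import quantiles


def sharing_rate_by_quartile(scores, shared):
    """Return the fraction of studies with shared data in each score quartile.

    scores: factor scores, one per study (hypothetical input)
    shared: 1 if the study shared its data, else 0
    """
    q1, q2, q3 = quantiles(scores, n=4)
    counts = [[0, 0] for _ in range(4)]  # [shared, total] for each quartile
    for score, is_shared in zip(scores, shared):
        # Number of cut points below the score gives the quartile index 0..3.
        idx = sum(score > cut for cut in (q1, q2, q3))
        counts[idx][0] += int(is_shared)
        counts[idx][1] += 1
    return [s / n if n else float("nan") for s, n in counts]


# Toy data, invented for illustration only: sharing rises with the score.
scores = list(range(1, 13))
shared = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
rates = sharing_rate_by_quartile(scores, shared)
```

<p>A steadily rising (or falling) sequence in <monospace>rates</monospace> corresponds to the roughly linear trends visible for several factors in the figure.</p>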
<fig id="pone-0018657-g003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g003</object-id>
<label>Figure 3</label>
<caption>
<title>Association between shared data and first-order factors.</title>
<p>Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis.</p>
</caption>
<graphic xlink:href="pone.0018657.g003"></graphic>
</fig>
<p>Most of these factors were significantly associated with data-sharing behavior in a multivariate logistic regression: p = 0.18 for “Large NIH grant”, p<0.05 for “No GEO reuse & YES high institution output” and “No K funding or P funding”, and p<0.005 for the other first-order factors. The increase in the odds of data sharing is illustrated in
<xref ref-type="fig" rid="pone-0018657-g004">Figure 4</xref>
as scores on each factor in the model are moved from their 25
<sup>th</sup>
percentile value to their 75
<sup>th</sup>
percentile value.</p>
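<p>For a logistic regression, the odds ratio for moving a predictor from its 25th to its 75th percentile is the exponential of the coefficient times the interquartile range. The sketch below shows that arithmetic. The coefficient, standard error, and scores are hypothetical, and the Wald-style confidence interval is an assumption; the interval method used for the figure is not stated here.</p>

```python
import math
from statistics import quantiles


def interquartile_range(scores):
    """Distance between the 75th and 25th percentile of the scores."""
    q1, _, q3 = quantiles(scores, n=4)
    return q3 - q1


def interquartile_odds_ratio(beta, scores):
    """Odds ratio for moving a factor score from its 25th to 75th percentile."""
    return math.exp(beta * interquartile_range(scores))


def interquartile_odds_ratio_ci(beta, se, scores, z=1.96):
    """Wald-style 95% interval on the same scale (an assumed method)."""
    delta = interquartile_range(scores)
    return math.exp((beta - z * se) * delta), math.exp((beta + z * se) * delta)


# Hypothetical inputs, not values from the study.
scores = list(range(1, 13))
beta, se = 0.1, 0.02
odds_ratio = interquartile_odds_ratio(beta, scores)
low, high = interquartile_odds_ratio_ci(beta, se, scores)
```

<p>An interval that excludes 1 corresponds to a factor whose association with data sharing is distinguishable from no effect at the chosen level.</p>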
<fig id="pone-0018657-g004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g004</object-id>
<label>Figure 4</label>
<caption>
<title>Odds ratios of data sharing for first-order factors, multivariate model.</title>
<p>Odds ratios are calculated as factor scores are each varied from their 25
<sup>th</sup>
percentile value to their 75
<sup>th</sup>
percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios.</p>
</caption>
<graphic xlink:href="pone.0018657.g004"></graphic>
</fig>
</sec>
<sec id="s3b">
<title>Second-order factors</title>
<p>The strong correlations between the first-order factors suggested that second-order factors might be illuminating. Scree plot analysis of the correlations between the first-order factors suggested a solution containing five second-order factors. I calculated the factors using a “varimax” rotation to find orthogonal factors. Loadings on the first-order factors are given in
<xref ref-type="table" rid="pone-0018657-t002">Table 2</xref>
.</p>
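<p>A varimax rotation seeks an orthogonal rotation of the loading matrix that maximizes the variance of the squared loadings within each factor. The sketch below uses Kaiser's classic pairwise-rotation scheme in plain Python. It is illustrative only: the software actually used for the analysis is not specified here, the toy loading matrix is invented, and this version omits the row normalization that some statistical packages apply by default.</p>

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: variance of squared loadings, summed over factors."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((s - mean) ** 2 for s in squared) / p
    return total


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Rotate a loading matrix (list of rows) by repeated planar rotations."""
    L = [list(row) for row in loadings]  # work on a copy
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui ** 2 - vi ** 2 for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of factors.
                phi = math.atan2(D - 2 * A * B / p, C - (A ** 2 - B ** 2) / p) / 4
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    x, y = row[a], row[b]
                    row[a], row[b] = x * c + y * s, -x * s + y * c
                largest = max(largest, abs(phi))
        if tol > largest:  # no pair needed a meaningful rotation
            break
    return L


# Toy loadings, invented for illustration only.
loadings = [[0.6, 0.6], [0.7, 0.5], [0.5, -0.5], [0.6, -0.6]]
rotated = varimax(loadings)
```

<p>Because each step is an orthogonal rotation, the communality of every row (its sum of squared loadings) is unchanged; only the distribution of loadings across factors shifts.</p>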
<table-wrap id="pone-0018657-t002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.t002</object-id>
<label>Table 2</label>
<caption>
<title>Second-order factor loadings, by first-order factors.</title>
</caption>
<alternatives>
<graphic id="pone-0018657-t002-2" xlink:href="pone.0018657.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Amount of NIH funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.89 Count of R01 & other NIH grants</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.49 Large NIH grant</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.55 NO K funding or P funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Cancer & humans</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.83 Humans & cancer</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">OA journal & previous GEO-AE sharing</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.59 Authors prev GEOAE sharing & OA & microarray creation</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.43 Institution high citations & collaboration</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 First author num prev pubs & first year pub</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.36 Last author num prev pubs & first year pub</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Journal impact factor and policy</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.57 Journal impact</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.51 Last author num prev pubs & first year pub</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Higher Ed in USA</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 NO geo reuse+YES high institution output</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.44 Institution is government & NOT higher ed</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>Since interactions make these second-order variables slightly difficult to interpret, I followed the method explained by Gorsuch
<xref ref-type="bibr" rid="pone.0018657-Gorsuch1">[51]</xref>
to calculate the loadings of the second-order variables directly on the original variables. The results are listed in
<xref ref-type="table" rid="pone-0018657-t003">Table 3</xref>
. I named the second-order factors based on the loadings on the original variables.</p>
<table-wrap id="pone-0018657-t003" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.t003</object-id>
<label>Table 3</label>
<caption>
<title>Second-order factor loadings, by original variables.</title>
</caption>
<alternatives>
<graphic id="pone-0018657-t003-3" xlink:href="pone.0018657.t003"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
</colgroup>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Amount of NIH funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.87 nih.cumulative.years.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.85 num.grants.via.nih.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.84 max.grant.duration.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.82 num.grant.numbers.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.80 pubmed.is.funded.nih</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.79 nih.max.max.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.70 nih.sum.avg.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.70 nih.sum.sum.dollars.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.59 has.R.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.59 num.post2003.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.58 country.usa</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.58 has.U.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.57 has.R01.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.55 num.post2003.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.53 has.T.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.53 num.post2003.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.49 num.post2004.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.45 num.post2004.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.44 has.P.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.43 num.post2004.morethan1000k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.43 num.nih.is.nci.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.35 num.post2005.morethan500k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.32 num.nih.is.nigms.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 num.post2005.morethan750k.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Cancer & humans</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.60 pubmed.is.cancer</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.59 pubmed.is.humans</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.52 pubmed.is.cultured.cells</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.43 pubmed.is.core.clinical.journal</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.39 institution.is.medical</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.58 pubmed.is.plants</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.50 pubmed.is.fungi</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.37 pubmed.is.shared.other</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.30 pubmed.is.bacteria</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">OA journal & previous GEO-AE sharing</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 first.author.num.prev.geoae.sharing.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.37 pubmed.is.open.access</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.37 first.author.num.prev.oa.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.35 last.author.num.prev.geoae.sharing.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.32 pubmed.is.effectiveness</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.32 last.author.num.prev.oa.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 pubmed.is.geo.reuse</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.38 country.japan</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Journal impact factor and policy</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.48 journal.impact.factor.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.47 jour.policy.requires.microarray.accession</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.46 jour.policy.mentions.exceptions</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.46 pubmed.num.cites.from.pmc.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.45 journal.5yr.impact.factor.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.45 jour.policy.contains.word.miame.mged</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.42 last.author.num.prev.pmc.cites.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.41 jour.policy.requests.accession</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 journal.immediacy.index.log</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.40 journal.num.articles.2008.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.39 years.ago.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.36 jour.policy.says.must.deposit</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.35 pubmed.num.cites.from.pmc.per.year</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.33 institution.mean.norm.citation.score</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.32 last.author.year.first.pub.ago.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 country.usa</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 last.author.num.prev.pubs.tr</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.31 jour.policy.contains.word.microarray</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.31 pubmed.is.open.access</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Higher Ed in USA</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.36 institution.stanford</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.36 institution.is.higher.ed</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.35 country.usa</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.35 has.R.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.33 has.R01.funding</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">0.30 institution.harvard</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">−0.37 institution.is.govnt</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<p>I then calculated factor scores for each of these second-order factors using the original attributes of the 11,603 datapoints. In univariate analysis, scores on several of the five factors showed a clear linear relationship with data sharing frequency, as illustrated in
<xref ref-type="fig" rid="pone-0018657-g005">Figure 5</xref>
.</p>
<fig id="pone-0018657-g005" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g005</object-id>
<label>Figure 5</label>
<caption>
<title>Association between shared data and second-order factors.</title>
<p>Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis.</p>
</caption>
<graphic xlink:href="pone.0018657.g005"></graphic>
</fig>
<p>All five of the second-order factors were associated with data sharing in multivariate logistic regression, p<0.001. The increase in odds of data sharing is illustrated in
<xref ref-type="fig" rid="pone-0018657-g006">Figure 6</xref>
as each factor in the model is moved from its 25
<sup>th</sup>
percentile value to its 75
<sup>th</sup>
percentile value.</p>
<fig id="pone-0018657-g006" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.g006</object-id>
<label>Figure 6</label>
<caption>
<title>Odds ratios of data sharing for second-order factors, multivariate model.</title>
<p>Odds ratios are calculated as factor scores are each varied from their 25
<sup>th</sup>
percentile value to their 75
<sup>th</sup>
percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios.</p>
</caption>
<graphic xlink:href="pone.0018657.g006"></graphic>
</fig>
<p>Finally, to understand which of these factors was most predictive of data sharing behaviour, I performed supervised hierarchical clustering using the second-order factors. Splits on “OA journal & previous GEO-AE sharing” and “Cancer & Humans” were clearly the most informative, so I simply split these two factors at their medians and looked at the data sharing prevalence. As shown in
<xref ref-type="table" rid="pone-0018657-t004">Table 4</xref>
, studies that scored high on the “OA journal & previous GEO-AE sharing” factor and low on the “Cancer & Humans” factor were almost three times as likely to share their data as “Cancer & Humans” studies published without a strong “OA journal & previous GEO-AE sharing” environment.</p>
<table-wrap id="pone-0018657-t004" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0018657.t004</object-id>
<label>Table 4</label>
<caption>
<title>Data sharing prevalence of subgroups divided at medians of two second-order factors [95% confidence intervals].</title>
</caption>
<alternatives>
<graphic id="pone-0018657-t004-4" xlink:href="pone.0018657.t004"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">number of studies with shared data/ number of studies</td>
<td align="left" rowspan="1" colspan="1">Above the median value for the factor “Cancer & Humans”</td>
<td align="left" rowspan="1" colspan="1">Below the median value for the factor “Cancer & Humans”</td>
<td align="left" rowspan="1" colspan="1">Total</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Above the median value for the factor “OA and previous GEO-AE sharing”</bold>
</td>
<td align="left" rowspan="1" colspan="1">629/2624 = 24% [22%, 26%]</td>
<td align="left" rowspan="1" colspan="1">1184/3178 = 
<bold>37% [36%, 39%]</bold>
</td>
<td align="left" rowspan="1" colspan="1">1813/5802 = 31% [30%, 32%]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Below the median value for the factor “OA and previous GEO-AE sharing”</bold>
</td>
<td align="left" rowspan="1" colspan="1">440/3178 = 
<bold>14% [13%, 15%]</bold>
</td>
<td align="left" rowspan="1" colspan="1">648/2623 = 25% [23%, 26%]</td>
<td align="left" rowspan="1" colspan="1">1088/5801 = 19% [18%, 20%]</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Total</bold>
</td>
<td align="left" rowspan="1" colspan="1">1069/5802 = 18% [17%, 19%]</td>
<td align="left" rowspan="1" colspan="1">1832/5801 = 32% [30%, 33%]</td>
<td align="left" rowspan="1" colspan="1">
<bold>2901/11603 = 25% [24%, 26%]</bold>
</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
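<p>The subgroup prevalences in the table can be recomputed from the reported counts. The sketch below uses a normal-approximation 95% confidence interval, which is an assumption: the interval method is not stated here. The dictionary keys are shorthand labels invented for this example.</p>

```python
import math


def prevalence_ci(shared, total, z=1.96):
    """Proportion with a normal-approximation confidence interval (assumed method)."""
    p = shared / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# Counts from the table: (studies with shared data, number of studies).
# Keys are shorthand: position relative to the median on each of the two factors.
cells = {
    ("oa_sharing_high", "cancer_humans_high"): (629, 2624),
    ("oa_sharing_high", "cancer_humans_low"): (1184, 3178),
    ("oa_sharing_low", "cancer_humans_high"): (440, 3178),
    ("oa_sharing_low", "cancer_humans_low"): (648, 2623),
}

table = {key: prevalence_ci(*counts) for key, counts in cells.items()}

# Ratio between the most-sharing and least-sharing subgroups.
best = table[("oa_sharing_high", "cancer_humans_low")][0]
worst = table[("oa_sharing_low", "cancer_humans_high")][0]
ratio = best / worst
```

<p>With these counts the ratio comes to roughly 2.7, which is the basis for describing the first subgroup as almost three times as likely to share.</p>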
</sec>
</sec>
<sec id="s4">
<title>Discussion</title>
<p>This study explored the association between attributes of a published experiment and the probability that its raw dataset was shared in a publicly accessible database. I found that 25% of studies that performed gene expression microarray experiments have deposited their raw research data in a primary public repository. The proportion of studies that shared their gene expression datasets increased over time, from less than 5% in early years, before mature standards and repositories, to 30%–35% in 2007–2009. This suggests that perhaps 45% of recent gene expression studies have made their data available somewhere on the internet, after accounting for datasets overlooked by the automated methods of discovery
<xref ref-type="bibr" rid="pone.0018657-Piwowar3">[44]</xref>
. This estimate is consistent with a previous manual inventory
<xref ref-type="bibr" rid="pone.0018657-Ochsner1">[15]</xref>
.</p>
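<p>The step from an observed rate of 30%–35% to an estimate of about 45% is a correction for datasets the automated search missed. A minimal sketch of that arithmetic follows. It assumes the search produces no false-positive dataset links, so that the observed rate is the true rate times the search sensitivity. The sensitivity itself is not given in this section, so the code only derives the range of values that the two reported figures imply.</p>

```python
def adjusted_rate(observed_rate, sensitivity):
    """Estimate the true sharing rate from the observed rate and search sensitivity."""
    if not (1 >= sensitivity > 0):
        raise ValueError("sensitivity must be in (0, 1]")
    return observed_rate / sensitivity


def implied_sensitivity(observed_rate, adjusted):
    """Sensitivity that would reconcile an observed rate with an adjusted estimate."""
    return observed_rate / adjusted


# Reported figures: 30%-35% observed in 2007-2009, about 45% after adjustment.
low_end = implied_sensitivity(0.30, 0.45)
high_end = implied_sensitivity(0.35, 0.45)
```

<p>Under these assumptions the two figures are mutually consistent if the automated methods found somewhere between about two thirds and about four fifths of the datasets that exist.</p>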
<p>Many factors derived from an experiment's topic, impact, funding, publishing, institutional, and authorship environments were associated with the probability of data sharing. In particular, authors publishing in an open access journal, or with a history of sharing and reusing shared gene expression microarray data, were most likely to share their data, and those studying cancer or human subjects were least likely to share.</p>
<p>It is disheartening to discover that human and cancer studies have particularly low rates of data sharing. These data are surely some of the most valuable for reuse, to confirm, refute, inform and advance bench-to-bedside translational research
<xref ref-type="bibr" rid="pone.0018657-Vickers1">[52]</xref>
. Further studies are required to understand the interplay of an investigator's motivation, opportunity, and ability to share their raw datasets
<xref ref-type="bibr" rid="pone.0018657-Siemsen1">[53]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Tucker1">[54]</xref>
. In the meantime, we can make some guesses: As is appropriate, concerns about privacy of human subjects' data undoubtedly affect a researcher's willingness and ability (perceived or actual) to share raw study data. I do not presume to recommend a proper balance between privacy and the societal benefit of data sharing, but I will emphasize that researchers should assess the degree of re-identification risk on a study-by-study basis
<xref ref-type="bibr" rid="pone.0018657-Malin1">[55]</xref>
, evaluate the risks and benefits across the wide range of stakeholder interests
<xref ref-type="bibr" rid="pone.0018657-Foster1">[56]</xref>
, and consider an ethical framework to make these difficult decisions
<xref ref-type="bibr" rid="pone.0018657-Navarro1">[57]</xref>
. Learning how to make these decisions well is difficult: it is vital that we educate and mentor both new and experienced researchers in best practices. Given the low risk of re-identification through gene expression microarray data (illustrated by its inclusion in the Open-Access Data Tier at
<ext-link ext-link-type="uri" xlink:href="http://target.cancer.gov/dataportal/access/policy.asp">http://target.cancer.gov/dataportal/access/policy.asp</ext-link>
), data-sharing rates could also be low for reasons other than privacy. Cancer researchers may perceive their field as particularly competitive, or cancer studies may have relatively strong links to industry – two attributes previously associated with data withholding
<xref ref-type="bibr" rid="pone.0018657-Blumenthal2">[58]</xref>
,
<xref ref-type="bibr" rid="pone.0018657-Vogeli1">[59]</xref>
.</p>
<p>NIH funding levels were associated with increased prevalence of data sharing, though the overall probability of sharing remains low even in well-funded studies. Data sharing was infrequent even in studies funded by grants clearly covered by the NIH Data Sharing Policy, such as those that received more than one million dollars per year and were awarded or renewed since 2006. This result is consistent with reports that the NIH Data Sharing Policy is often not taken seriously because compliance is not enforced
<xref ref-type="bibr" rid="pone.0018657-Tucker1">[54]</xref>
. It is surprising how infrequently the NIH Data Sharing Policy applies to gene expression microarray studies (19% as per a pilot to this study
<xref ref-type="bibr" rid="pone.0018657-Piwowar4">[60]</xref>
). The NIH may address these issues soon within its renewed commitment to make data more available
<xref ref-type="bibr" rid="pone.0018657-Wellcome1">[61]</xref>
.</p>
<p>I am intrigued that publishing in an open access journal, previously sharing gene expression data, and previously reusing gene expression data were associated with data sharing outcomes. More research is required to understand the drivers behind this association. Does the factor represent an attitude towards “openness” by the decision-making authors? Does the act of sharing data lower the perceived effort of sharing data again? Does it dispel fears induced by possible negative outcomes from sharing data? To what extent does recognizing the value of shared data through data reuse motivate an author to share his or her own datasets?</p>
<p>People often wonder whether the attitude towards data sharing varies with age. Although I was not able to capture author age, I did estimate the number of years since first and last authors had published their first paper. The analysis suggests that first authors with many years in the field are less likely to share data than those with fewer years of experience, but no such association was found for last authors. More work is needed to confirm this finding given the confounding factor of previous data-sharing experience.</p>
<p>Gene expression publications associated with Stanford University have a very high level of data sharing. The true level of open data archiving is actually much higher than that reflected in this study: Stanford University hosts a public microarray repository, and many articles that did not have a dataset link from GEO or ArrayExpress do mention submission to the Stanford Microarray Database. If one were looking for a community on which to model best practices for data sharing adoption, Stanford would be a great place to start.</p>
<p>Similarly,
<italic>Physiological Genomics</italic>
has very high rates of public archiving relative to other journals. Perhaps not coincidentally, to my knowledge
<italic>Physiological Genomics</italic>
is the only journal to have published an evaluation of its authors' attitudes and experiences following the adoption of new data archiving requirements for gene expression microarray data
<xref ref-type="bibr" rid="pone.0018657-Ventura1">[21]</xref>
.</p>
<p>Analyzing data sharing through bibliometric and data-mining attributes has several advantages: We can look at a very large set of studies and attributes, our results are not biased by survey response self-selection or reporting bias, and the analysis can be repeated over time with little additional effort.</p>
<p>However, this approach does suffer its own limitations. Filters for identifying microarray creation studies do not have perfect precision, so some non-data-creation studies may be included in the analysis. Because studies that do not create data will not have data deposits, their inclusion alters the composition of what I consider to be studies that create but do not share data. Furthermore, my method for detecting data deposits overlooks data deposits that are missing PubMed identifiers in GEO and ArrayExpress, so the dataset misclassifies some studies that did in fact share their data in these repositories.</p>
<p>I made decisions to facilitate analysis, such as assuming that PubMed identifiers were monotonically increasing with publication date and using the current journal data-sharing policy as a surrogate for the data-sharing policy in place when papers were published. These decisions may have introduced errors.</p>
<p>Missing data may have obscured important information. For example, articles published in journals with policies that I did not examine had a lower rate of data sharing than articles published in journals whose “Instructions to Authors” policies I did quantify. It is likely that a more comprehensive analysis of journal data-sharing policies would provide additional insight. Similarly, the information on funders was limited: I only included funding data on NIH grants. Inclusion of more funders would help us understand the general role of funder policy and funding levels.</p>
<p>Previous work
<xref ref-type="bibr" rid="pone.0018657-Blumenthal2">[58]</xref>
found that investigator gender was correlated with data withholding. It is important to look at gender in multivariate analysis since male scientists are more likely than female scientists to have large NIH grants
<xref ref-type="bibr" rid="pone.0018657-Hosek1">[62]</xref>
. Because gender did not contribute heavily to any of the derived factors in this study, additional analysis will be necessary to investigate its association with data sharing behaviour in this dataset. It should be noted that the source of gender data has limitations. The Baby Name Guesser algorithm empirically estimates gender by analyzing popular usage on the internet. Although coverage across names from diverse ethnicities seems quite good, the algorithm is relatively unsuccessful in determining the gender of Asian names. This may have confounded the gender analysis, and the “gender not found” variable might have served as an unexpected proxy for author ethnicity.</p>
<p>The Author-ity system provides accurate author publication histories: A previous evaluation on a different sample found that only 0.5% of publication histories erroneously included more than one author, and about 2% of clusters contained a partial inventory of an author's publication history due to splitting a given author across multiple clusters
<xref ref-type="bibr" rid="pone.0018657-Torvik1">[46]</xref>
. However, because the lumping does not occur randomly, my attributes based on author publication histories may have included some bias. For example, the documented tendency of Author-ity to erroneously lump common Japanese names
<xref ref-type="bibr" rid="pone.0018657-Torvik1">[46]</xref>
may have confounded the author-history variables with author-ethnicity and thereby influenced the findings on first-author age and experience.</p>
<p>In previous work I used h-index and a-index metrics to measure “author experience” for both the first and last author
<xref ref-type="bibr" rid="pone.0018657-Piwowar4">[60]</xref>
(in biomedicine, customarily, the first and last authors make the largest contributions to a study and have the most power in publication decisions). A recent paper
<xref ref-type="bibr" rid="pone.0018657-Bornmann1">[63]</xref>
suggests that a raw count of number of papers and number of citations is functionally equivalent to the h-index and a-index, so I used the raw counts in this study for computational simplicity. Reliance on citations from PubMed Central (to enable scripted data collection) meant that older studies and those published in areas less well represented in PubMed Central were characterized by an artificially low citation count.</p>
<p>The large sample of 11,603 studies captured a fairly diverse and representative subset of gene expression microarray studies, though it is possible that gene expression microarray studies missed by the full-text filter differed in significant ways from those that used mainstream vocabulary to describe their wetlab methods. Selecting a sample based on queries of non-subscription full-text content may have introduced a slight bias towards open access journals. It is worth noting that this study demonstrates the value of open access and open full-text resources for research evaluation.</p>
<p>In regression studies it is important to remember that associations do not imply causation. It is possible, for example, that receiving a high level of NIH funding and deciding to share data are not causally related, but rather result from the exposure and excitement inherent in a “hot” subfield of study.</p>
<p>Importantly, this study did not consider directed sharing, such as peer-to-peer data exchange or sharing within a defined collaboration network, and thus underestimates the amount of data sharing in all its forms. Furthermore, this study underestimated public sharing of gene expression data on the Internet. It did not recognize data listed in journal supplementary information, on lab or personal web sites, or in institutional or specialized repositories (including the well-regarded and well-populated Stanford Microarray Database). Finally, the study methods did not recognize deposits into the Gene Expression Omnibus or ArrayExpress unless the database entry was accompanied by a citation to the research paper, complete with PubMed identifier.</p>
<p>Due to these limitations, care should be taken in interpreting the estimated levels of absolute data sharing and the data-sharing status of any particular study listed in the raw data. More research is needed to attain a deep understanding of information behaviour around research data sharing, its costs and benefits to science, society and individual investigators, and what makes for effective policy.</p>
<p>That said, the results presented here argue for action. Even in a field with mature policies, repositories and standards, research data sharing levels are low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing and work to embrace the full potential of our research output.</p>
<sec id="s4a">
<title>Availability of dataset, statistical scripts, and data collection source code</title>
<p>Raw data and statistical scripts are available in the Dryad data repository at
<underline>doi:10.5061/dryad.mf1sd</underline>
<xref ref-type="bibr" rid="pone.0018657-Piwowar5">[64]</xref>
. Data collection source code is available at
<ext-link ext-link-type="uri" xlink:href="http://github.com/hpiwowar/pypub">http://github.com/hpiwowar/pypub</ext-link>
.</p>
</sec>
</sec>
<sec sec-type="supplementary-material" id="s5">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0018657.s001">
<label>Figure S1</label>
<caption>
<p>
<bold>Associations between shared data and author attribute variables.</bold>
Percentage of studies with shared data is shown for each quartile for continuous variables.</p>
<p>(EPS)</p>
</caption>
<media xlink:href="pone.0018657.s001.eps" mimetype="application" mime-subtype="postscript">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0018657.s002">
<label>Figure S2</label>
<caption>
<p>
<bold>Associations between shared data and journal attribute variables.</bold>
Percentage of studies with shared data is shown for each quartile for continuous variables.</p>
<p>(EPS)</p>
</caption>
<media xlink:href="pone.0018657.s002.eps" mimetype="application" mime-subtype="postscript">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0018657.s003">
<label>Figure S3</label>
<caption>
<p>
<bold>Associations between shared data and study attribute variables.</bold>
Percentage of studies with shared data is shown for each quartile for continuous variables.</p>
<p>(EPS)</p>
</caption>
<media xlink:href="pone.0018657.s003.eps" mimetype="application" mime-subtype="postscript">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0018657.s004">
<label>Figure S4</label>
<caption>
<p>
<bold>Associations between shared data and funding attribute variables.</bold>
Percentage of studies with shared data is shown for each quartile for continuous variables.</p>
<p>(EPS)</p>
</caption>
<media xlink:href="pone.0018657.s004.eps" mimetype="application" mime-subtype="postscript">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0018657.s005">
<label>Figure S5</label>
<caption>
<p>
<bold>Associations between shared data and country and institution attribute variables.</bold>
Percentage of studies with shared data is shown for each quartile for continuous variables.</p>
<p>(EPS)</p>
</caption>
<media xlink:href="pone.0018657.s005.eps" mimetype="application" mime-subtype="postscript">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0018657.s006">
<label>Table S1</label>
<caption>
<p>
<bold>Structure matrix with correlations between all attributes and first-order factors.</bold>
</p>
<p>(TXT)</p>
</caption>
<media xlink:href="pone.0018657.s006.txt" mimetype="text" mime-subtype="plain">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>I sincerely thank my doctoral dissertation advisor, Dr. Wendy Chapman, for her support and feedback.</p>
</ack>
<fn-group>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The author has declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure">
<p>
<bold>Funding: </bold>
This work was completed in the Department of Biomedical Informatics at the University of Pittsburgh and partially funded by an NIH National Library of Medicine training grant 5 T15 LM007059. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="pone.0018657-McCain1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCain</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>Mandating Sharing: Journal Policies in the Natural Sciences.</article-title>
<source>Science Communication</source>
<volume>16</volume>
<fpage>403</fpage>
<lpage>431</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Piwowar1">
<label>2</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Piwowar</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2008</year>
<source>A review of journal policies for sharing research data</source>
<publisher-name>ELPUB, Toronto, Canada</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-National1">
<label>3</label>
<element-citation publication-type="other">
<collab>National Institutes of Health (NIH)</collab>
<year>2003</year>
<comment>NOT-OD-03-032: Final NIH Statement on Sharing Research Data</comment>
</element-citation>
</ref>
<ref id="pone.0018657-National2">
<label>4</label>
<element-citation publication-type="other">
<collab>National Institutes of Health (NIH)</collab>
<year>2007</year>
<comment>NOT-OD-08-013: Implementation Guidance and Instructions for Applicants: Policy for Sharing of Data Obtained in NIH-Supported or Conducted Genome-Wide Association Studies (GWAS)</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Nation1">
<label>5</label>
<element-citation publication-type="other">
<collab>National Science Foundation (NSF)</collab>
<year>2011</year>
<comment>Grant Proposal Guide, Chapter II.C.2.j</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Fienberg1">
<label>6</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Fienberg</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Martin</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Straf</surname>
<given-names>ML</given-names>
</name>
</person-group>
<year>1985</year>
<source>Sharing research data</source>
<publisher-name>National Academy Press</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Cech1">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cech</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Eisenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Hersey</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Holtzman</surname>
<given-names>SH</given-names>
</name>
<etal></etal>
</person-group>
<year>2003</year>
<article-title>Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences.</article-title>
<source>Plant Physiol</source>
<volume>132</volume>
<fpage>19</fpage>
<lpage>24</lpage>
<pub-id pub-id-type="pmid">12746507</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Time1">
<label>8</label>
<element-citation publication-type="journal">
<year>2007</year>
<article-title>Time for leadership.</article-title>
<source>Nat Biotech</source>
<volume>25</volume>
<fpage>821</fpage>
<lpage>821</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Got1">
<label>9</label>
<element-citation publication-type="journal">
<year>2007</year>
<article-title>Got data?</article-title>
<source>Nat Neurosci</source>
<volume>10</volume>
<fpage>931</fpage>
<lpage>931</lpage>
<pub-id pub-id-type="pmid">17657230</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Kakazu1">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kakazu</surname>
<given-names>KK</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>LW</given-names>
</name>
<name>
<surname>Lynne</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>The Cancer Biomedical Informatics Grid (caBIG): pioneering an expansive network of information and tools for collaborative cancer research.</article-title>
<source>Hawaii Med J</source>
<volume>63</volume>
<fpage>273</fpage>
<lpage>275</lpage>
<pub-id pub-id-type="pmid">15540527</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-The1">
<label>11</label>
<element-citation publication-type="journal">
<collab>The GAIN Collaborative Research Group</collab>
<year>2007</year>
<article-title>New models of collaboration in genome-wide association studies: the Genetic Association Information Network.</article-title>
<source>Nat Genet</source>
<volume>39</volume>
<fpage>1045</fpage>
<lpage>1051</lpage>
<pub-id pub-id-type="pmid">17728769</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Brazma1">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brazma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hingamp</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Quackenbush</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sherlock</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Spellman</surname>
<given-names>P</given-names>
</name>
<etal></etal>
</person-group>
<year>2001</year>
<article-title>Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.</article-title>
<source>Nat Genet</source>
<volume>29</volume>
<fpage>365</fpage>
<lpage>371</lpage>
<pub-id pub-id-type="pmid">11726920</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Barrett1">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barrett</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Troup</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wilhite</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ledoux</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rudnev</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>NCBI GEO: mining tens of millions of expression profiles–database and tools update.</article-title>
<source>Nucleic Acids Res</source>
<volume>35</volume>
</element-citation>
</ref>
<ref id="pone.0018657-Noor1">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noor</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Zimmerman</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Teeter</surname>
<given-names>KC</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Data Sharing: How Much Doesn't Get Submitted to GenBank?</article-title>
<source>PLoS Biol</source>
<volume>4</volume>
<fpage>e228</fpage>
<pub-id pub-id-type="pmid">16822095</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Ochsner1">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ochsner</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Steffen</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Stoeckert</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>McKenna</surname>
<given-names>NJ</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Much room for improvement in deposition rates of expression microarray datasets.</article-title>
<source>Nature Methods</source>
<volume>5</volume>
<fpage>991</fpage>
<pub-id pub-id-type="pmid">19034265</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Reidpath1">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reidpath</surname>
<given-names>DD</given-names>
</name>
<name>
<surname>Allotey</surname>
<given-names>PA</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Data sharing in medical research: an empirical investigation.</article-title>
<source>Bioethics</source>
<volume>15</volume>
<fpage>125</fpage>
<lpage>134</lpage>
<pub-id pub-id-type="pmid">11697377</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Kyzas1">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kyzas</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Loizou</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ioannidis</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Selective reporting biases in cancer prognostic factor studies.</article-title>
<source>J Natl Cancer Inst</source>
<volume>97</volume>
<fpage>1043</fpage>
<lpage>1055</lpage>
<pub-id pub-id-type="pmid">16030302</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Blumenthal1">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blumenthal</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Gokhale</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yucel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Clarridge</surname>
<given-names>B</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Data withholding in genetics and the other life sciences: prevalences and predictors.</article-title>
<source>Acad Med</source>
<volume>81</volume>
<fpage>137</fpage>
<lpage>145</lpage>
<pub-id pub-id-type="pmid">16436574</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Campbell1">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Campbell</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Clarridge</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Gokhale</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Birenbaum</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hilgartner</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Data withholding in academic genetics: evidence from a national survey.</article-title>
<source>JAMA</source>
<volume>287</volume>
<fpage>473</fpage>
<lpage>480</lpage>
<pub-id pub-id-type="pmid">11798369</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Hedstrom1">
<label>20</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Hedstrom</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Producing Archive-Ready Datasets: Compliance, Incentives, and Motivation.</article-title>
<comment>IASSIST Conference 2006: Presentations</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Ventura1">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ventura</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Mandatory submission of microarray data to public repositories: how is it working?</article-title>
<source>Physiol Genomics</source>
<volume>20</volume>
<fpage>153</fpage>
<lpage>156</lpage>
<pub-id pub-id-type="pmid">15661852</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Giordano1">
<label>22</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Giordano</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>The Scientist: Secretive, Selfish, or Reticent?</article-title>
<comment>A Social Network Analysis. E-Social Science</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Hedstrom2">
<label>23</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Hedstrom</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Research Forum Presentation: Incentives to Create “Archive-Ready” Data: Implications for Archives and Records Management.</article-title>
<comment>Society of American Archivists Annual Meeting</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Niu1">
<label>24</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Niu</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Incentive study for research data sharing.</article-title>
<comment>A case study on NIJ grantees</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Lowrance1">
<label>25</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Lowrance</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2006</year>
<comment>Access to Collections of Data and Materials for Health Research: A report to the Medical Research Council and the Wellcome Trust</comment>
</element-citation>
</ref>
<ref id="pone.0018657-University1">
<label>26</label>
<element-citation publication-type="other">
<comment>University of Nottingham JULIET: Research funders' open access policies</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Brown1">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>The changing face of scientific discourse: Analysis of genomic and proteomic database usage and acceptance.</article-title>
<source>Journal of the American Society for Information Science and Technology</source>
<volume>54</volume>
<fpage>926</fpage>
<lpage>938</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-McCullough1">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCullough</surname>
<given-names>BD</given-names>
</name>
<name>
<surname>McGeary</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Harrison</surname>
<given-names>TD</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Do Economics Journal Archives Promote Replicable Research?</article-title>
<source>Canadian Journal of Economics</source>
<volume>41</volume>
<fpage>1406</fpage>
<lpage>1420</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Constant1">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Constant</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kiesler</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sproull</surname>
<given-names>L</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>What's mine is ours, or is it? A study of attitudes about information sharing.</article-title>
<source>Information Systems Research</source>
<volume>5</volume>
<fpage>400</fpage>
<lpage>421</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Matzler1">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matzler</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Renzl</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Muller</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Herting</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mooradian</surname>
<given-names>T</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Personality traits and knowledge sharing.</article-title>
<source>Journal of Economic Psychology</source>
<volume>29</volume>
<fpage>301</fpage>
<lpage>313</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Ryu1">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ryu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>SH</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>I</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Knowledge sharing behavior of physicians in hospitals.</article-title>
<source>Expert Systems With Applications</source>
<volume>25</volume>
<fpage>113</fpage>
<lpage>122</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Bitzer1">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bitzer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schrettl</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Schröder</surname>
<given-names>PJH</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Intrinsic motivation in open source software development.</article-title>
<source>Journal of Comparative Economics</source>
<volume>35</volume>
<fpage>160</fpage>
<lpage>169</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Kim1">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Motivating and Impeding Factors Affecting Faculty Contribution to Institutional Repositories.</article-title>
<source>Journal of Digital Information</source>
<volume>8</volume>
<fpage>2</fpage>
</element-citation>
</ref>
<ref id="pone.0018657-Seonghee1">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Seonghee</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Boryung</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>An analysis of faculty perceptions: Attitudes toward knowledge sharing and collaboration in an academic institution.</article-title>
<source>Library</source>
<volume>30</volume>
<fpage>282</fpage>
<lpage>290</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Warlick1">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Warlick</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Vaughan</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Factors influencing publication choice: why faculty choose open access.</article-title>
<source>Biomed Digit Libr</source>
<volume>4</volume>
<fpage>1</fpage>
<pub-id pub-id-type="pmid">17349038</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Lee1">
<label>36</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Dourish</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>G</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>The human infrastructure of cyberinfrastructure.</article-title>
<comment>Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. Banff, Canada</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Kuo1">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kuo</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>A study of the intention–action gap in knowledge sharing practices.</article-title>
<source>Journal of the American Society for Information Science and Technology</source>
<volume>59</volume>
<fpage>1224</fpage>
<lpage>1237</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Rhodes1">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rhodes</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shanker</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Deshpande</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Varambally</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2004</year>
<article-title>Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression.</article-title>
<source>Proc Natl Acad Sci U S A</source>
<volume>101</volume>
<fpage>9309</fpage>
<lpage>9314</lpage>
<pub-id pub-id-type="pmid">15184677</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Hrynaszkiewicz1">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hrynaszkiewicz</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Towards agreement on best practice for publishing raw clinical trial data.</article-title>
<source>Trials</source>
<volume>10</volume>
<fpage>17</fpage>
<pub-id pub-id-type="pmid">19296844</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Parkinson1">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parkinson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kapushesky</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Shojatalab</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Abeygunawardena</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Coulson</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>ArrayExpress–a public database of microarray experiments and gene expression profiles.</article-title>
<source>Nucleic Acids Res</source>
<volume>35</volume>
<issue>Database issue</issue>
<fpage>D747</fpage>
<lpage>D750</lpage>
<pub-id pub-id-type="pmid">17132828</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Ball1">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Brazma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Causton</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chervitz</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Edgar</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<year>2004</year>
<article-title>Submission of microarray data to public repositories.</article-title>
<source>PLoS Biol</source>
<volume>2</volume>
<fpage>e317</fpage>
<pub-id pub-id-type="pmid">15340489</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Microarray1">
<label>42</label>
<element-citation publication-type="journal">
<year>2002</year>
<article-title>Microarray standards at last.</article-title>
<source>Nature</source>
<volume>419</volume>
<fpage>323</fpage>
</element-citation>
</ref>
<ref id="pone.0018657-Piwowar2">
<label>43</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data.</article-title>
<comment>Doctoral Dissertation: University of Pittsburgh</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Piwowar3">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piwowar</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>W</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers.</article-title>
<source>J Biomed Discov Collab</source>
<volume>5</volume>
<fpage>7</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">20349403</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Yu1">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Yesupriya</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Wulf</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gwinn</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2007</year>
<article-title>An automatic method to generate domain-specific investigator networks using PubMed abstracts.</article-title>
<source>BMC medical informatics and decision making</source>
<volume>7</volume>
<fpage>17</fpage>
<pub-id pub-id-type="pmid">17584920</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Torvik1">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torvik</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Smalheiser</surname>
<given-names>NR</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Author Name Disambiguation in MEDLINE.</article-title>
<source>Transactions on Knowledge Discovery from Data</source>
<fpage>1</fpage>
<lpage>37</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Bird1">
<label>47</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Bird</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Loper</surname>
<given-names>E</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Natural Language Toolkit.</article-title>
<comment>Available:
<ext-link ext-link-type="uri" xlink:href="http://nltk.sourceforge.net/">http://nltk.sourceforge.net/</ext-link>
</comment>
</element-citation>
</ref>
<ref id="pone.0018657-R1">
<label>48</label>
<element-citation publication-type="book">
<collab>R Development Core Team</collab>
<year>2008</year>
<source>R: A Language and Environment for Statistical Computing</source>
<publisher-loc>Vienna, Austria</publisher-loc>
<publisher-name>ISBN 3-900051-07-0</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Theus1">
<label>49</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Theus</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Urbanek</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2008</year>
<source>Interactive Graphics for Data Analysis: Principles and Examples (Computer Science and Data Analysis)</source>
<publisher-name>Chapman &amp; Hall/CRC</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Harrell1">
<label>50</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Harrell</surname>
<given-names>FE</given-names>
</name>
</person-group>
<year>2001</year>
<source>Regression Modeling Strategies</source>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Gorsuch1">
<label>51</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Gorsuch</surname>
<given-names>RL</given-names>
</name>
</person-group>
<year>1983</year>
<source>Factor Analysis, Second Edition</source>
<publisher-name>Psychology Press</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Vickers1">
<label>52</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Vickers</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<year>2008 January 22</year>
<source>Cancer Data? Sorry, Can't Have It</source>
<publisher-name>The New York Times</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Siemsen1">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Siemsen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Roth</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Balasubramanian</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>How motivation, opportunity, and ability drive knowledge sharing: The constraining-factor model.</article-title>
<source>Journal of Operations Management</source>
<volume>26</volume>
<fpage>426</fpage>
<lpage>445</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Tucker1">
<label>54</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Tucker</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2009</year>
<source>Motivating Subjects: Data Sharing in Cancer Research [PhD dissertation]</source>
<publisher-name>Virginia Polytechnic Institute and State University</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Malin1">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Malin</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Karp</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Scheuermann</surname>
<given-names>RH</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research.</article-title>
<source>J Investig Med</source>
<volume>58</volume>
<fpage>11</fpage>
<lpage>18</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Foster1">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Foster</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sharp</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Share and share alike: deciding how to distribute the scientific and social benefits of genomic data.</article-title>
<source>Nat Rev Genet</source>
<volume>8</volume>
<fpage>633</fpage>
<lpage>639</lpage>
<pub-id pub-id-type="pmid">17607307</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Navarro1">
<label>57</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Navarro</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>An ethical framework for sharing patient data without consent.</article-title>
<source>Inform Prim Care</source>
<volume>16</volume>
<fpage>257</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="pmid">19192326</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Blumenthal2">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blumenthal</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Causino</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Louis</surname>
<given-names>K</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>Withholding research results in academic life science. Evidence from a national survey of faculty.</article-title>
<source>JAMA</source>
<volume>277</volume>
<fpage>1224</fpage>
<lpage>1228</lpage>
<pub-id pub-id-type="pmid">9103347</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Vogeli1">
<label>59</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vogeli</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Yucel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bendavid</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>2006</year>
<article-title>Data withholding and the next generation of scientists: results of a national survey.</article-title>
<source>Acad Med</source>
<volume>81</volume>
<fpage>128</fpage>
<lpage>136</lpage>
<pub-id pub-id-type="pmid">16436573</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Piwowar4">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>WW</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Public Sharing of Research Datasets: A Pilot Study of Associations.</article-title>
<source>Journal of Informetrics</source>
<volume>4</volume>
<fpage>148</fpage>
<lpage>156</lpage>
<pub-id pub-id-type="pmid">21339841</pub-id>
</element-citation>
</ref>
<ref id="pone.0018657-Wellcome1">
<label>61</label>
<element-citation publication-type="other">
<collab>Wellcome Trust</collab>
<year>2010</year>
<comment>Sharing research data to improve public health: full joint statement by funders of health research</comment>
</element-citation>
</ref>
<ref id="pone.0018657-Hosek1">
<label>62</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Hosek</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Ghosh-Dastidar</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kofner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ramphal</surname>
<given-names>N</given-names>
</name>
<etal></etal>
</person-group>
<year>2005</year>
<source>Gender Differences in Major Federal External Grant Programs</source>
<publisher-name>RAND Corporation</publisher-name>
</element-citation>
</ref>
<ref id="pone.0018657-Bornmann1">
<label>63</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bornmann</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Mutz</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Daniel</surname>
<given-names>H-D</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Do we need the h index and its variants in addition to standard bibliometric measures?</article-title>
<source>Journal of the American Society for Information Science and Technology</source>
<volume>60</volume>
<fpage>1286</fpage>
<lpage>1289</lpage>
</element-citation>
</ref>
<ref id="pone.0018657-Piwowar5">
<label>64</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Data From: Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data.</article-title>
<comment>Dryad Digital Repository. Available doi:10.5061/dryad.mf1sd</comment>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000642 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000642 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3135593
   |texte=   Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:21765886" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024