Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Data issues in the life sciences

Internal identifier: 000719 (Pmc/Corpus); previous: 000718; next: 000720

Authors: Anne E. Thessen; David J. Patterson

Source:

RBID : PMC:3234430

Abstract

We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the “Big New Biology”. Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies to move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.


Url:
DOI: 10.3897/zookeys.150.1766
PubMed: 22207805
PubMed Central: 3234430

Links to Exploration step

PMC:3234430

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Data issues in the life sciences</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation>
<nlm:aff id="A1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Patterson, David J" sort="Patterson, David J" uniqKey="Patterson D" first="David J." last="Patterson">David J. Patterson</name>
<affiliation>
<nlm:aff id="A1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22207805</idno>
<idno type="pmc">3234430</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3234430</idno>
<idno type="RBID">PMC:3234430</idno>
<idno type="doi">10.3897/zookeys.150.1766</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">000719</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Data issues in the life sciences</title>
<author>
<name sortKey="Thessen, Anne E" sort="Thessen, Anne E" uniqKey="Thessen A" first="Anne E." last="Thessen">Anne E. Thessen</name>
<affiliation>
<nlm:aff id="A1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Patterson, David J" sort="Patterson, David J" uniqKey="Patterson D" first="David J." last="Patterson">David J. Patterson</name>
<affiliation>
<nlm:aff id="A1">Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">ZooKeys</title>
<idno type="ISSN">1313-2989</idno>
<idno type="eISSN">1313-2970</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<label>Abstract</label>
<p>We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the “Big New Biology”. Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies to move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Ackoff, R" uniqKey="Ackoff R">R Ackoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arlinghaus, R" uniqKey="Arlinghaus R">R Arlinghaus</name>
</author>
<author>
<name sortKey="Cooke, Sj" uniqKey="Cooke S">SJ Cooke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ausubel, Jh" uniqKey="Ausubel J">JH Ausubel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bard, Jbl" uniqKey="Bard J">JBL Bard</name>
</author>
<author>
<name sortKey="Rhee, Sy" uniqKey="Rhee S">SY Rhee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bechhofer, S" uniqKey="Bechhofer S">S Bechhofer</name>
</author>
<author>
<name sortKey="Ainsworth, J" uniqKey="Ainsworth J">J Ainsworth</name>
</author>
<author>
<name sortKey="Bhagat, J" uniqKey="Bhagat J">J Bhagat</name>
</author>
<author>
<name sortKey="Buchan, I" uniqKey="Buchan I">I Buchan</name>
</author>
<author>
<name sortKey="Couch, P" uniqKey="Couch P">P Couch</name>
</author>
<author>
<name sortKey="Cruickshank, D" uniqKey="Cruickshank D">D Cruickshank</name>
</author>
<author>
<name sortKey="De Roure, D" uniqKey="De Roure D">D De Roure</name>
</author>
<author>
<name sortKey="Delderfield, M" uniqKey="Delderfield M">M Delderfield</name>
</author>
<author>
<name sortKey="Dunlop, I" uniqKey="Dunlop I">I Dunlop</name>
</author>
<author>
<name sortKey="Gamble, M" uniqKey="Gamble M">M Gamble</name>
</author>
<author>
<name sortKey="Goble, C" uniqKey="Goble C">C Goble</name>
</author>
<author>
<name sortKey="Michaelides, D" uniqKey="Michaelides D">D Michaelides</name>
</author>
<author>
<name sortKey="Missier, P" uniqKey="Missier P">P Missier</name>
</author>
<author>
<name sortKey="Owen, S" uniqKey="Owen S">S Owen</name>
</author>
<author>
<name sortKey="Newman, D" uniqKey="Newman D">D Newman</name>
</author>
<author>
<name sortKey="Sufi, S" uniqKey="Sufi S">S Sufi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berman, H" uniqKey="Berman H">H Berman</name>
</author>
<author>
<name sortKey="Henrick, K" uniqKey="Henrick K">K Henrick</name>
</author>
<author>
<name sortKey="Nakamura, H" uniqKey="Nakamura H">H Nakamura</name>
</author>
<author>
<name sortKey="Markley, Jl" uniqKey="Markley J">JL Markley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berners Lee, T" uniqKey="Berners Lee T">T Berners-Lee</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Chilton, L" uniqKey="Chilton L">L Chilton</name>
</author>
<author>
<name sortKey="Connolly, D" uniqKey="Connolly D">D Connolly</name>
</author>
<author>
<name sortKey="Dhanaraj, R" uniqKey="Dhanaraj R">R Dhanaraj</name>
</author>
<author>
<name sortKey="Hollenbach, J" uniqKey="Hollenbach J">J Hollenbach</name>
</author>
<author>
<name sortKey="Lerer, A" uniqKey="Lerer A">A Lerer</name>
</author>
<author>
<name sortKey="Sheets, D" uniqKey="Sheets D">D Sheets</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bilofsky, Hs" uniqKey="Bilofsky H">HS Bilofsky</name>
</author>
<author>
<name sortKey="Christian, B" uniqKey="Christian B">B Christian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bizer, C" uniqKey="Bizer C">C Bizer</name>
</author>
<author>
<name sortKey="Jentzsch, A" uniqKey="Jentzsch A">A Jentzsch</name>
</author>
<author>
<name sortKey="Cyganiak, R" uniqKey="Cyganiak R">R Cyganiak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Booth, D" uniqKey="Booth D">D Booth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bunin, Vd" uniqKey="Bunin V">VD Bunin</name>
</author>
<author>
<name sortKey="Ignatov, Ov" uniqKey="Ignatov O">OV Ignatov</name>
</author>
<author>
<name sortKey="Gulii, Oi" uniqKey="Gulii O">OI Gulii</name>
</author>
<author>
<name sortKey="Voloshin, Ag" uniqKey="Voloshin A">AG Voloshin</name>
</author>
<author>
<name sortKey="Dykman, La" uniqKey="Dykman L">LA Dykman</name>
</author>
<author>
<name sortKey="O Eil, D" uniqKey="O Eil D">D O’Neil</name>
</author>
<author>
<name sortKey="Ivnitskii, D" uniqKey="Ivnitskii D">D Ivnitskii</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Burton, A" uniqKey="Burton A">A Burton</name>
</author>
<author>
<name sortKey="Treloar, A" uniqKey="Treloar A">A Treloar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Campbell, Lm" uniqKey="Campbell L">LM Campbell</name>
</author>
<author>
<name sortKey="Macneill, S" uniqKey="Macneill S">S MacNeill</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chapman, Ad" uniqKey="Chapman A">AD Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chapman, Ad" uniqKey="Chapman A">AD Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chapman, Ad" uniqKey="Chapman A">AD Chapman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chavan, V" uniqKey="Chavan V">V Chavan</name>
</author>
<author>
<name sortKey="Krishnan, S" uniqKey="Krishnan S">S Krishnan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cisneros Montemayor, Am" uniqKey="Cisneros Montemayor A">AM Cisneros-Montemayor</name>
</author>
<author>
<name sortKey="Sumaila, Ur" uniqKey="Sumaila U">UR Sumaila</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Coale, Kh" uniqKey="Coale K">KH Coale</name>
</author>
<author>
<name sortKey="Johnson, Ks" uniqKey="Johnson K">KS Johnson</name>
</author>
<author>
<name sortKey="Chavez, Fp" uniqKey="Chavez F">FP Chavez</name>
</author>
<author>
<name sortKey="Buesseler, Ko" uniqKey="Buesseler K">KO Buesseler</name>
</author>
<author>
<name sortKey="Barber, Rt" uniqKey="Barber R">RT Barber</name>
</author>
<author>
<name sortKey="Brzezinski, Ma" uniqKey="Brzezinski M">MA Brzezinski</name>
</author>
<author>
<name sortKey="Cochlan, Wp" uniqKey="Cochlan W">WP Cochlan</name>
</author>
<author>
<name sortKey="Millero, Fj" uniqKey="Millero F">FJ Millero</name>
</author>
<author>
<name sortKey="Falkowski, Pg" uniqKey="Falkowski P">PG Falkowski</name>
</author>
<author>
<name sortKey="Bauer, Je" uniqKey="Bauer J">JE Bauer</name>
</author>
<author>
<name sortKey="Wanninkhof, Rh" uniqKey="Wanninkhof R">RH Wanninkhof</name>
</author>
<author>
<name sortKey="Kudela, Rm" uniqKey="Kudela R">RM Kudela</name>
</author>
<author>
<name sortKey="Altabet, Ma" uniqKey="Altabet M">MA Altabet</name>
</author>
<author>
<name sortKey="Hales, Be" uniqKey="Hales B">BE Hales</name>
</author>
<author>
<name sortKey="Takahashi, T" uniqKey="Takahashi T">T Takahashi</name>
</author>
<author>
<name sortKey="Landry, Mr" uniqKey="Landry M">MR Landry</name>
</author>
<author>
<name sortKey="Bidigare, Rr" uniqKey="Bidigare R">RR Bidigare</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Chase, Z" uniqKey="Chase Z">Z Chase</name>
</author>
<author>
<name sortKey="Strutton, Pg" uniqKey="Strutton P">PG Strutton</name>
</author>
<author>
<name sortKey="Friederich, Ge" uniqKey="Friederich G">GE Friederich</name>
</author>
<author>
<name sortKey="Gorbunov, My" uniqKey="Gorbunov M">MY Gorbunov</name>
</author>
<author>
<name sortKey="Lance, Vp" uniqKey="Lance V">VP Lance</name>
</author>
<author>
<name sortKey="Hilting, Ak" uniqKey="Hilting A">AK Hilting</name>
</author>
<author>
<name sortKey="Hiscock, Mr" uniqKey="Hiscock M">MR Hiscock</name>
</author>
<author>
<name sortKey="Demarest, M" uniqKey="Demarest M">M Demarest</name>
</author>
<author>
<name sortKey="Hiscock, Wt" uniqKey="Hiscock W">WT Hiscock</name>
</author>
<author>
<name sortKey="Sullivan, Kf" uniqKey="Sullivan K">KF Sullivan</name>
</author>
<author>
<name sortKey="Tanner, Sj" uniqKey="Tanner S">SJ Tanner</name>
</author>
<author>
<name sortKey="Gordon, Rm" uniqKey="Gordon R">RM Gordon</name>
</author>
<author>
<name sortKey="Hunter, Cn" uniqKey="Hunter C">CN Hunter</name>
</author>
<author>
<name sortKey="Elrod, Va" uniqKey="Elrod V">VA Elrod</name>
</author>
<author>
<name sortKey="Fitzwater, Se" uniqKey="Fitzwater S">SE Fitzwater</name>
</author>
<author>
<name sortKey="Jones, Jl" uniqKey="Jones J">JL Jones</name>
</author>
<author>
<name sortKey="Tozzi, S" uniqKey="Tozzi S">S Tozzi</name>
</author>
<author>
<name sortKey="Koblizek, M" uniqKey="Koblizek M">M Koblizek</name>
</author>
<author>
<name sortKey="Roberts, Ae" uniqKey="Roberts A">AE Roberts</name>
</author>
<author>
<name sortKey="Herndon, J" uniqKey="Herndon J">J Herndon</name>
</author>
<author>
<name sortKey="Brewster, J" uniqKey="Brewster J">J Brewster</name>
</author>
<author>
<name sortKey="Ladizinsky, N" uniqKey="Ladizinsky N">N Ladizinsky</name>
</author>
<author>
<name sortKey="Smith, G" uniqKey="Smith G">G Smith</name>
</author>
<author>
<name sortKey="Cooper, D" uniqKey="Cooper D">D Cooper</name>
</author>
<author>
<name sortKey="Timothy, D" uniqKey="Timothy D">D Timothy</name>
</author>
<author>
<name sortKey="Brown, Sl" uniqKey="Brown S">SL Brown</name>
</author>
<author>
<name sortKey="Selph, Ke" uniqKey="Selph K">KE Selph</name>
</author>
<author>
<name sortKey="Sheridan, Cc" uniqKey="Sheridan C">CC Sheridan</name>
</author>
<author>
<name sortKey="Twining, Bs" uniqKey="Twining B">BS Twining</name>
</author>
<author>
<name sortKey="Johnson, Zi" uniqKey="Johnson Z">ZI Johnson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Coburn, Ta" uniqKey="Coburn T">TA Coburn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cochrane, G" uniqKey="Cochrane G">G Cochrane</name>
</author>
<author>
<name sortKey="Akhtar, R" uniqKey="Akhtar R">R Akhtar</name>
</author>
<author>
<name sortKey="Bonfield, J" uniqKey="Bonfield J">J Bonfield</name>
</author>
<author>
<name sortKey="Bower, L" uniqKey="Bower L">L Bower</name>
</author>
<author>
<name sortKey="Demiralp, F" uniqKey="Demiralp F">F Demiralp</name>
</author>
<author>
<name sortKey="Faruque, N" uniqKey="Faruque N">N Faruque</name>
</author>
<author>
<name sortKey="Gibson, R" uniqKey="Gibson R">R Gibson</name>
</author>
<author>
<name sortKey="Hoad, G" uniqKey="Hoad G">G Hoad</name>
</author>
<author>
<name sortKey="Hubbard, T" uniqKey="Hubbard T">T Hubbard</name>
</author>
<author>
<name sortKey="Hunter, C" uniqKey="Hunter C">C Hunter</name>
</author>
<author>
<name sortKey="Jang, M" uniqKey="Jang M">M Jang</name>
</author>
<author>
<name sortKey="Juhos, S" uniqKey="Juhos S">S Juhos</name>
</author>
<author>
<name sortKey="Leinonen, R" uniqKey="Leinonen R">R Leinonen</name>
</author>
<author>
<name sortKey="Leonard, S" uniqKey="Leonard S">S Leonard</name>
</author>
<author>
<name sortKey="Lin, Q" uniqKey="Lin Q">Q Lin</name>
</author>
<author>
<name sortKey="Lopez, R" uniqKey="Lopez R">R Lopez</name>
</author>
<author>
<name sortKey="Lorenc, D" uniqKey="Lorenc D">D Lorenc</name>
</author>
<author>
<name sortKey="Mcwilliam, H" uniqKey="Mcwilliam H">H McWilliam</name>
</author>
<author>
<name sortKey="Mukherjee, G" uniqKey="Mukherjee G">G Mukherjee</name>
</author>
<author>
<name sortKey="Plaister, S" uniqKey="Plaister S">S Plaister</name>
</author>
<author>
<name sortKey="Radhakrishan, R" uniqKey="Radhakrishan R">R Radhakrishan</name>
</author>
<author>
<name sortKey="Robinson, S" uniqKey="Robinson S">S Robinson</name>
</author>
<author>
<name sortKey="Sobhany, S" uniqKey="Sobhany S">S Sobhany</name>
</author>
<author>
<name sortKey="Hoopen, Pt" uniqKey="Hoopen P">PT Hoopen</name>
</author>
<author>
<name sortKey="Vaughan, R" uniqKey="Vaughan R">R Vaughan</name>
</author>
<author>
<name sortKey="Zalunin, V" uniqKey="Zalunin V">V Zalunin</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Costanza, R" uniqKey="Costanza R">R Costanza</name>
</author>
<author>
<name sortKey="D Rge, R" uniqKey="D Rge R">R D’arge</name>
</author>
<author>
<name sortKey="De Groot, R" uniqKey="De Groot R">R de Groot</name>
</author>
<author>
<name sortKey="Farber, S" uniqKey="Farber S">S Farber</name>
</author>
<author>
<name sortKey="Grasso, M" uniqKey="Grasso M">M Grasso</name>
</author>
<author>
<name sortKey="Hannon, B" uniqKey="Hannon B">B Hannon</name>
</author>
<author>
<name sortKey="Limburg, K" uniqKey="Limburg K">K Limburg</name>
</author>
<author>
<name sortKey="Naeem, S" uniqKey="Naeem S">S Naeem</name>
</author>
<author>
<name sortKey="O Eill, Rv" uniqKey="O Eill R">RV O’Neill</name>
</author>
<author>
<name sortKey="Paruelo, J" uniqKey="Paruelo J">J Paruelo</name>
</author>
<author>
<name sortKey="Raskin, Rg" uniqKey="Raskin R">RG Raskin</name>
</author>
<author>
<name sortKey="Sutton, P" uniqKey="Sutton P">P Sutton</name>
</author>
<author>
<name sortKey="Van Den Belt, M" uniqKey="Van Den Belt M">M van den Belt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Costello, M" uniqKey="Costello M">M Costello</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cryer, P" uniqKey="Cryer P">P Cryer</name>
</author>
<author>
<name sortKey="Hyam, R" uniqKey="Hyam R">R Hyam</name>
</author>
<author>
<name sortKey="Miller, C" uniqKey="Miller C">C Miller</name>
</author>
<author>
<name sortKey="Nicolson, N" uniqKey="Nicolson N">N Nicolson</name>
</author>
<author>
<name sortKey=" Tuama, E" uniqKey=" Tuama E">É Ó Tuama</name>
</author>
<author>
<name sortKey="Page, R" uniqKey="Page R">R Page</name>
</author>
<author>
<name sortKey="Rees, J" uniqKey="Rees J">J Rees</name>
</author>
<author>
<name sortKey="Riccardi, G" uniqKey="Riccardi G">G Riccardi</name>
</author>
<author>
<name sortKey="Richards, K" uniqKey="Richards K">K Richards</name>
</author>
<author>
<name sortKey="White, R" uniqKey="White R">R White</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davis, Pm" uniqKey="Davis P">PM Davis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Rosnay, J" uniqKey="De Rosnay J">J De Rosnay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dittert, N" uniqKey="Dittert N">N Dittert</name>
</author>
<author>
<name sortKey="Diepenbroek, M" uniqKey="Diepenbroek M">M Diepenbroek</name>
</author>
<author>
<name sortKey="Grobe, H" uniqKey="Grobe H">H Grobe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doom, T" uniqKey="Doom T">T Doom</name>
</author>
<author>
<name sortKey="Raymer, M" uniqKey="Raymer M">M Raymer</name>
</author>
<author>
<name sortKey="Krane, D" uniqKey="Krane D">D Krane</name>
</author>
<author>
<name sortKey="Garcia, O" uniqKey="Garcia O">O Garcia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Esf European Science, Foundation" uniqKey="Esf European Science F">ESF (European Science Foundation)</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Evans, Ja" uniqKey="Evans J">JA Evans</name>
</author>
<author>
<name sortKey="Foster, Jg" uniqKey="Foster J">JG Foster</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fauchart, E" uniqKey="Fauchart E">E Fauchart</name>
</author>
<author>
<name sortKey="Von Hippel, E" uniqKey="Von Hippel E">E von Hippel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feijen, M" uniqKey="Feijen M">M Feijen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fergraus, Eh" uniqKey="Fergraus E">EH Fergraus</name>
</author>
<author>
<name sortKey="Andelman, S" uniqKey="Andelman S">S Andelman</name>
</author>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
<author>
<name sortKey="Schildhauer, M" uniqKey="Schildhauer M">M Schildhauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fox, P" uniqKey="Fox P">P Fox</name>
</author>
<author>
<name sortKey="Hendler, J" uniqKey="Hendler J">J Hendler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Froese, R" uniqKey="Froese R">R Froese</name>
</author>
<author>
<name sortKey="Lloris, D" uniqKey="Lloris D">D Lloris</name>
</author>
<author>
<name sortKey="Opitz, S" uniqKey="Opitz S">S Opitz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gargouri, Y" uniqKey="Gargouri Y">Y Gargouri</name>
</author>
<author>
<name sortKey="Hajjen, C" uniqKey="Hajjen C">C Hajjen</name>
</author>
<author>
<name sortKey="Lariviere, V" uniqKey="Lariviere V">V Larivière</name>
</author>
<author>
<name sortKey="Gingras, Y" uniqKey="Gingras Y">Y Gingras</name>
</author>
<author>
<name sortKey="Carr, L" uniqKey="Carr L">L Carr</name>
</author>
<author>
<name sortKey="Brody, T" uniqKey="Brody T">T Brody</name>
</author>
<author>
<name sortKey="Harnad, S" uniqKey="Harnad S">S Harnad</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Groth, P" uniqKey="Groth P">P Groth</name>
</author>
<author>
<name sortKey="Gibson, A" uniqKey="Gibson A">A Gibson</name>
</author>
<author>
<name sortKey="Velterop, J" uniqKey="Velterop J">J Velterop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guttmacher, Ae" uniqKey="Guttmacher A">AE Guttmacher</name>
</author>
<author>
<name sortKey="Nabel, Eg" uniqKey="Nabel E">EG Nabel</name>
</author>
<author>
<name sortKey="Collins, Fs" uniqKey="Collins F">FS Collins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gwinn, Ne" uniqKey="Gwinn N">NE Gwinn</name>
</author>
<author>
<name sortKey="Rinaldo, C" uniqKey="Rinaldo C">C Rinaldo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harnad, S" uniqKey="Harnad S">S Harnad</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harnad, S" uniqKey="Harnad S">S Harnad</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heidorn, Pb" uniqKey="Heidorn P">PB Heidorn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hey, T" uniqKey="Hey T">T Hey</name>
</author>
<author>
<name sortKey="Tansley, S" uniqKey="Tansley S">S Tansley</name>
</author>
<author>
<name sortKey="Tolle, K" uniqKey="Tolle K">K Tolle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Higgins, D" uniqKey="Higgins D">D Higgins</name>
</author>
<author>
<name sortKey="Berkley, C" uniqKey="Berkley C">C Berkley</name>
</author>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hillerkuss, D" uniqKey="Hillerkuss D">D Hillerkuss</name>
</author>
<author>
<name sortKey="Schmogrow, R" uniqKey="Schmogrow R">R Schmogrow</name>
</author>
<author>
<name sortKey="Schellinger, T" uniqKey="Schellinger T">T Schellinger</name>
</author>
<author>
<name sortKey="Jordan, M" uniqKey="Jordan M">M Jordan</name>
</author>
<author>
<name sortKey="Winter, M" uniqKey="Winter M">M Winter</name>
</author>
<author>
<name sortKey="Huber, G" uniqKey="Huber G">G Huber</name>
</author>
<author>
<name sortKey="Vallaitis, T" uniqKey="Vallaitis T">T Vallaitis</name>
</author>
<author>
<name sortKey="Bonk, R" uniqKey="Bonk R">R Bonk</name>
</author>
<author>
<name sortKey="Kleinow, P" uniqKey="Kleinow P">P Kleinow</name>
</author>
<author>
<name sortKey="Frey, F" uniqKey="Frey F">F Frey</name>
</author>
<author>
<name sortKey="Roeger, M" uniqKey="Roeger M">M Roeger</name>
</author>
<author>
<name sortKey="Koenig, S" uniqKey="Koenig S">S Koenig</name>
</author>
<author>
<name sortKey="Ludwig, A" uniqKey="Ludwig A">A Ludwig</name>
</author>
<author>
<name sortKey="Marculescu, A" uniqKey="Marculescu A">A Marculescu</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Hoh, M" uniqKey="Hoh M">M Hoh</name>
</author>
<author>
<name sortKey="Dreschmann, M" uniqKey="Dreschmann M">M Dreschmann</name>
</author>
<author>
<name sortKey="Meyer, J" uniqKey="Meyer J">J Meyer</name>
</author>
<author>
<name sortKey="Ben Ezra, S" uniqKey="Ben Ezra S">S Ben Ezra</name>
</author>
<author>
<name sortKey="Narkiss, N" uniqKey="Narkiss N">N Narkiss</name>
</author>
<author>
<name sortKey="Nebendahl, B" uniqKey="Nebendahl B">B Nebendahl</name>
</author>
<author>
<name sortKey="Parmigiani, F" uniqKey="Parmigiani F">F Parmigiani</name>
</author>
<author>
<name sortKey="Petropoulos, P" uniqKey="Petropoulos P">P Petropoulos</name>
</author>
<author>
<name sortKey="Resan, B" uniqKey="Resan B">B Resan</name>
</author>
<author>
<name sortKey="Oehler, A" uniqKey="Oehler A">A Oehler</name>
</author>
<author>
<name sortKey="Weingarten, K" uniqKey="Weingarten K">K Weingarten</name>
</author>
<author>
<name sortKey="Ellermeyer, T" uniqKey="Ellermeyer T">T Ellermeyer</name>
</author>
<author>
<name sortKey="Lutz, J" uniqKey="Lutz J">J Lutz</name>
</author>
<author>
<name sortKey="Moeller, M" uniqKey="Moeller M">M Moeller</name>
</author>
<author>
<name sortKey="Huebner, M" uniqKey="Huebner M">M Huebner</name>
</author>
<author>
<name sortKey="Becker, J" uniqKey="Becker J">J Becker</name>
</author>
<author>
<name sortKey="Koos, C" uniqKey="Koos C">C Koos</name>
</author>
<author>
<name sortKey="Freude, W" uniqKey="Freude W">W Freude</name>
</author>
<author>
<name sortKey="Leuthold, J" uniqKey="Leuthold J">J Leuthold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hopkins, Gw" uniqKey="Hopkins G">GW Hopkins</name>
</author>
<author>
<name sortKey="Freckleton, Rp" uniqKey="Freckleton R">RP Freckleton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huynh, Df" uniqKey="Huynh D">DF Huynh</name>
</author>
<author>
<name sortKey="Karger, Dr" uniqKey="Karger D">DR Karger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Innocenti, P" uniqKey="Innocenti P">P Innocenti</name>
</author>
<author>
<name sortKey="Mchugh, A" uniqKey="Mchugh A">A McHugh</name>
</author>
<author>
<name sortKey="Ross, S" uniqKey="Ross S">S Ross</name>
</author>
<author>
<name sortKey="Ruusalepp, R" uniqKey="Ruusalepp R">R Ruusalepp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Iso" uniqKey="Iso">ISO</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
<author>
<name sortKey="Berkley, C" uniqKey="Berkley C">C Berkley</name>
</author>
<author>
<name sortKey="Bojilova, J" uniqKey="Bojilova J">J Bojilova</name>
</author>
<author>
<name sortKey="Schilhauer, M" uniqKey="Schilhauer M">M Schilhauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
<author>
<name sortKey="Schildhauer, Mp" uniqKey="Schildhauer M">MP Schildhauer</name>
</author>
<author>
<name sortKey="Reichman, Oj" uniqKey="Reichman O">OJ Reichman</name>
</author>
<author>
<name sortKey="Bowers, S" uniqKey="Bowers S">S Bowers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaye, J" uniqKey="Kaye J">J Kaye</name>
</author>
<author>
<name sortKey="Heeney, C" uniqKey="Heeney C">C Heeney</name>
</author>
<author>
<name sortKey="Hawkins, N" uniqKey="Hawkins N">N Hawkins</name>
</author>
<author>
<name sortKey="De Vries, J" uniqKey="De Vries J">J de Vries</name>
</author>
<author>
<name sortKey="Boddington, P" uniqKey="Boddington P">P Boddington</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kelling, S" uniqKey="Kelling S">S Kelling</name>
</author>
<author>
<name sortKey="Hochachka, Wm" uniqKey="Hochachka W">WM Hochachka</name>
</author>
<author>
<name sortKey="Fink, D" uniqKey="Fink D">D Fink</name>
</author>
<author>
<name sortKey="Riedewald, M" uniqKey="Riedewald M">M Riedewald</name>
</author>
<author>
<name sortKey="Caruana, R" uniqKey="Caruana R">R Caruana</name>
</author>
<author>
<name sortKey="Ballard, G" uniqKey="Ballard G">G Ballard</name>
</author>
<author>
<name sortKey="Hooker, G" uniqKey="Hooker G">G Hooker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kerlinger, P" uniqKey="Kerlinger P">P Kerlinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Key Perspectives Ltd" uniqKey="Key Perspectives Ltd">Key Perspectives Ltd</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kidd, Dm" uniqKey="Kidd D">DM Kidd</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klump, J" uniqKey="Klump J">J Klump</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kobilarov, G" uniqKey="Kobilarov G">G Kobilarov</name>
</author>
<author>
<name sortKey="Dickinson, I" uniqKey="Dickinson I">I Dickinson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohnke, D" uniqKey="Kohnke D">D Kohnke</name>
</author>
<author>
<name sortKey="Costello, Mj" uniqKey="Costello M">MJ Costello</name>
</author>
<author>
<name sortKey="Crease, J" uniqKey="Crease J">J Crease</name>
</author>
<author>
<name sortKey="Folack, J" uniqKey="Folack J">J Folack</name>
</author>
<author>
<name sortKey="Martinez Guingla, R" uniqKey="Martinez Guingla R">R Martinez Guingla</name>
</author>
<author>
<name sortKey="Michida, Y" uniqKey="Michida Y">Y Michida</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lambrix, P" uniqKey="Lambrix P">P Lambrix</name>
</author>
<author>
<name sortKey="Tan, H" uniqKey="Tan H">H Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langille, Mgi" uniqKey="Langille M">MGI Langille</name>
</author>
<author>
<name sortKey="Eisen, Ja" uniqKey="Eisen J">JA Eisen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, Cp" uniqKey="Lee C">CP Lee</name>
</author>
<author>
<name sortKey="Dourish, P" uniqKey="Dourish P">P Dourish</name>
</author>
<author>
<name sortKey="Mark, G" uniqKey="Mark G">G Mark</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lynch, Ca" uniqKey="Lynch C">CA Lynch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macleod, N" uniqKey="Macleod N">N MacLeod</name>
</author>
<author>
<name sortKey="Benfield, M" uniqKey="Benfield M">M Benfield</name>
</author>
<author>
<name sortKey="Culverhouse, P" uniqKey="Culverhouse P">P Culverhouse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Madin, J" uniqKey="Madin J">J Madin</name>
</author>
<author>
<name sortKey="Bowers, S" uniqKey="Bowers S">S Bowers</name>
</author>
<author>
<name sortKey="Schildhauer, M" uniqKey="Schildhauer M">M Schildhauer</name>
</author>
<author>
<name sortKey="Krivov, S" uniqKey="Krivov S">S Krivov</name>
</author>
<author>
<name sortKey="Pennington, D" uniqKey="Pennington D">D Pennington</name>
</author>
<author>
<name sortKey="Villa, F" uniqKey="Villa F">F Villa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Madin, Js" uniqKey="Madin J">JS Madin</name>
</author>
<author>
<name sortKey="Bowers, S" uniqKey="Bowers S">S Bowers</name>
</author>
<author>
<name sortKey="Schildhauer, Sm" uniqKey="Schildhauer S">SM Schildhauer</name>
</author>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mandavilli, A" uniqKey="Mandavilli A">A Mandavilli</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marris, E" uniqKey="Marris E">E Marris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccown, F" uniqKey="Mccown F">F McCown</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author>
<name sortKey="Nelson, Ml" uniqKey="Nelson M">ML Nelson</name>
</author>
<author>
<name sortKey="Zubair, M" uniqKey="Zubair M">M Zubair</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Michener, Wk" uniqKey="Michener W">WK Michener</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mons, B" uniqKey="Mons B">B Mons</name>
</author>
<author>
<name sortKey="Velterop, J" uniqKey="Velterop J">J Velterop</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Morris, R" uniqKey="Morris R">R Morris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nas, National Academy Of Sciences" uniqKey="Nas ">(National Academy of Sciences) NAS</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="National Science Board" uniqKey="National Science Board">National Science Board</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="National Science Board" uniqKey="National Science Board">National Science Board</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Norris, M" uniqKey="Norris M">M Norris</name>
</author>
<author>
<name sortKey="Oppenheim, C" uniqKey="Oppenheim C">C Oppenheim</name>
</author>
<author>
<name sortKey="Rowland, F" uniqKey="Rowland F">F Rowland</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nsf, National Sciencefoundation" uniqKey="Nsf ">(National Science Foundation) NSF</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nsf, National Sciencefoundation" uniqKey="Nsf ">(National Science Foundation) NSF</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oecd" uniqKey="Oecd">OECD</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parse" uniqKey="Parse">PARSE</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author>
<name sortKey="Faulwetter, S" uniqKey="Faulwetter S">S Faulwetter</name>
</author>
<author>
<name sortKey="Shipunov, A" uniqKey="Shipunov A">A Shipunov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
<author>
<name sortKey="Cooper, J" uniqKey="Cooper J">J Cooper</name>
</author>
<author>
<name sortKey="Kirk, Pm" uniqKey="Kirk P">PM Kirk</name>
</author>
<author>
<name sortKey="Pyle, Rl" uniqKey="Pyle R">RL Pyle</name>
</author>
<author>
<name sortKey="Remsen, Dp" uniqKey="Remsen D">DP Remsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
<author>
<name sortKey="Day, Rs" uniqKey="Day R">RS Day</name>
</author>
<author>
<name sortKey="Fridsma, Db" uniqKey="Fridsma D">DB Fridsma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Piwowar, Ha" uniqKey="Piwowar H">HA Piwowar</name>
</author>
<author>
<name sortKey="Vision, Tj" uniqKey="Vision T">TJ Vision</name>
</author>
<author>
<name sortKey="Whitlock, Mc" uniqKey="Whitlock M">MC Whitlock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Porter, Jh" uniqKey="Porter J">JH Porter</name>
</author>
<author>
<name sortKey="Callahan, Jt" uniqKey="Callahan J">JT Callahan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pullin, As" uniqKey="Pullin A">AS Pullin</name>
</author>
<author>
<name sortKey="Salafsky, N" uniqKey="Salafsky N">N Salafsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raup, D" uniqKey="Raup D">D Raup</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reichman, Oj" uniqKey="Reichman O">OJ Reichman</name>
</author>
<author>
<name sortKey="Jones, Mb" uniqKey="Jones M">MB Jones</name>
</author>
<author>
<name sortKey="Schildauer, Mp" uniqKey="Schildauer M">MP Schildauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rhee, Sy" uniqKey="Rhee S">SY Rhee</name>
</author>
<author>
<name sortKey="Beavis, W" uniqKey="Beavis W">W Beavis</name>
</author>
<author>
<name sortKey="Berardini, Tz" uniqKey="Berardini T">TZ Berardini</name>
</author>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
<author>
<name sortKey="Dixon, D" uniqKey="Dixon D">D Dixon</name>
</author>
<author>
<name sortKey="Doyle, A" uniqKey="Doyle A">A Doyle</name>
</author>
<author>
<name sortKey="Carcia Hernandez, M" uniqKey="Carcia Hernandez M">M Carcia-Hernandez</name>
</author>
<author>
<name sortKey="Huala, E" uniqKey="Huala E">E Huala</name>
</author>
<author>
<name sortKey="Lander, G" uniqKey="Lander G">G Lander</name>
</author>
<author>
<name sortKey="Montoya, M" uniqKey="Montoya M">M Montoya</name>
</author>
<author>
<name sortKey="Miller, N" uniqKey="Miller N">N Miller</name>
</author>
<author>
<name sortKey="Mueller, La" uniqKey="Mueller L">LA Mueller</name>
</author>
<author>
<name sortKey="Mundodi, S" uniqKey="Mundodi S">S Mundodi</name>
</author>
<author>
<name sortKey="Reiser, L" uniqKey="Reiser L">L Reiser</name>
</author>
<author>
<name sortKey="Tacklind, J" uniqKey="Tacklind J">J Tacklind</name>
</author>
<author>
<name sortKey="Weems, Dc" uniqKey="Weems D">DC Weems</name>
</author>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Xu, I" uniqKey="Xu I">I Xu</name>
</author>
<author>
<name sortKey="Yoo, D" uniqKey="Yoo D">D Yoo</name>
</author>
<author>
<name sortKey="Yoonj Zhang, P" uniqKey="Yoonj Zhang P">P YoonJ, Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rin, Research Informationnetwork" uniqKey="Rin ">(Research Information Network) RIN</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rogers, Em" uniqKey="Rogers E">EM Rogers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Savage, Cj" uniqKey="Savage C">CJ Savage</name>
</author>
<author>
<name sortKey="Vickers, Aj" uniqKey="Vickers A">AJ Vickers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schofield, Pn" uniqKey="Schofield P">PN Schofield</name>
</author>
<author>
<name sortKey="Eppig, J" uniqKey="Eppig J">J Eppig</name>
</author>
<author>
<name sortKey="Huala, E" uniqKey="Huala E">E Huala</name>
</author>
<author>
<name sortKey="Hrabe De Angelis, M" uniqKey="Hrabe De Angelis M">M Hrabe de Angelis</name>
</author>
<author>
<name sortKey="Harvey, M" uniqKey="Harvey M">M Harvey</name>
</author>
<author>
<name sortKey="Davidson, D" uniqKey="Davidson D">D Davidson</name>
</author>
<author>
<name sortKey="Weaver, T" uniqKey="Weaver T">T Weaver</name>
</author>
<author>
<name sortKey="Brown, S" uniqKey="Brown S">S Brown</name>
</author>
<author>
<name sortKey="Smedley, D" uniqKey="Smedley D">D Smedley</name>
</author>
<author>
<name sortKey="Rosenthal, N" uniqKey="Rosenthal N">N Rosenthal</name>
</author>
<author>
<name sortKey="Schughart, K" uniqKey="Schughart K">K Schughart</name>
</author>
<author>
<name sortKey="Aidinis, V" uniqKey="Aidinis V">V Aidinis</name>
</author>
<author>
<name sortKey="Tocchini Valentini, G" uniqKey="Tocchini Valentini G">G Tocchini-Valentini</name>
</author>
<author>
<name sortKey="Hancock, Jm" uniqKey="Hancock J">JM Hancock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Science Staff Editorial" uniqKey="Science Staff Editorial">Science staff editorial</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shirky, C" uniqKey="Shirky C">C Shirky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Silvertown, J" uniqKey="Silvertown J">J Silvertown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sinha, Ak" uniqKey="Sinha A">AK Sinha</name>
</author>
<author>
<name sortKey="Malik, Z" uniqKey="Malik Z">Z Malik</name>
</author>
<author>
<name sortKey="Rezgui, A" uniqKey="Rezgui A">A Rezgui</name>
</author>
<author>
<name sortKey="Barnes, Cg" uniqKey="Barnes C">CG Barnes</name>
</author>
<author>
<name sortKey="Lin, K" uniqKey="Lin K">K Lin</name>
</author>
<author>
<name sortKey="Heiken, G" uniqKey="Heiken G">G Heiken</name>
</author>
<author>
<name sortKey="Thomas, Wa" uniqKey="Thomas W">WA Thomas</name>
</author>
<author>
<name sortKey="Gundersen, Lc" uniqKey="Gundersen L">LC Gundersen</name>
</author>
<author>
<name sortKey="Raskin, R" uniqKey="Raskin R">R Raskin</name>
</author>
<author>
<name sortKey="Jackson, I" uniqKey="Jackson I">I Jackson</name>
</author>
<author>
<name sortKey="Fox, P" uniqKey="Fox P">P Fox</name>
</author>
<author>
<name sortKey="Mcguinness, D" uniqKey="Mcguinness D">D McGuinness</name>
</author>
<author>
<name sortKey="Seber, D" uniqKey="Seber D">D Seber</name>
</author>
<author>
<name sortKey="Zimmerman, H" uniqKey="Zimmerman H">H Zimmerman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sirovich, L" uniqKey="Sirovich L">L Sirovich</name>
</author>
<author>
<name sortKey="Stoeckle, My" uniqKey="Stoeckle M">MY Stoeckle</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, Vs" uniqKey="Smith V">VS Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smithsonian, Institution" uniqKey="Smithsonian I">Smithsonian Institution</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stein, Ld" uniqKey="Stein L">LD Stein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Taylor, Cf" uniqKey="Taylor C">CF Taylor</name>
</author>
<author>
<name sortKey="Field, D" uniqKey="Field D">D Field</name>
</author>
<author>
<name sortKey="Sansone, Sa" uniqKey="Sansone S">SA Sansone</name>
</author>
<author>
<name sortKey="Aerts, J" uniqKey="Aerts J">J Aerts</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Binz, Pa" uniqKey="Binz P">PA Binz</name>
</author>
<author>
<name sortKey="Bogue, M" uniqKey="Bogue M">M Bogue</name>
</author>
<author>
<name sortKey="Booth, T" uniqKey="Booth T">T Booth</name>
</author>
<author>
<name sortKey="Brazma, A" uniqKey="Brazma A">A Brazma</name>
</author>
<author>
<name sortKey="Brinkman, Rr" uniqKey="Brinkman R">RR Brinkman</name>
</author>
<author>
<name sortKey="Clark, Am" uniqKey="Clark A">AM Clark</name>
</author>
<author>
<name sortKey="Deutsch, Ew" uniqKey="Deutsch E">EW Deutsch</name>
</author>
<author>
<name sortKey="Fiehn, O" uniqKey="Fiehn O">O Fiehn</name>
</author>
<author>
<name sortKey="Fostel, J" uniqKey="Fostel J">J Fostel</name>
</author>
<author>
<name sortKey="Ghazal, P" uniqKey="Ghazal P">P Ghazal</name>
</author>
<author>
<name sortKey="Gibson, F" uniqKey="Gibson F">F Gibson</name>
</author>
<author>
<name sortKey="Gray, T" uniqKey="Gray T">T Gray</name>
</author>
<author>
<name sortKey="Frimes, F" uniqKey="Frimes F">F Frimes</name>
</author>
<author>
<name sortKey="Hancock, Jm" uniqKey="Hancock J">JM Hancock</name>
</author>
<author>
<name sortKey="Hardy, Nw" uniqKey="Hardy N">NW Hardy</name>
</author>
<author>
<name sortKey="Hermjakob, H" uniqKey="Hermjakob H">H Hermjakob</name>
</author>
<author>
<name sortKey="Julian Jr, Rk" uniqKey="Julian Jr R">RK Julian Jr.</name>
</author>
<author>
<name sortKey="Kane, M" uniqKey="Kane M">M Kane</name>
</author>
<author>
<name sortKey="Kettner, C" uniqKey="Kettner C">C Kettner</name>
</author>
<author>
<name sortKey="Kinsinger, C" uniqKey="Kinsinger C">C Kinsinger</name>
</author>
<author>
<name sortKey="Kolker, E" uniqKey="Kolker E">E Kolker</name>
</author>
<author>
<name sortKey="Kuiper, M" uniqKey="Kuiper M">M Kuiper</name>
</author>
<author>
<name sortKey="Le Novere, N" uniqKey="Le Novere N">N Le Novère</name>
</author>
<author>
<name sortKey="Leebens Mack, J" uniqKey="Leebens Mack J">J Leebens-Mack</name>
</author>
<author>
<name sortKey="Lewis, Se" uniqKey="Lewis S">SE Lewis</name>
</author>
<author>
<name sortKey="Lord, P" uniqKey="Lord P">P Lord</name>
</author>
<author>
<name sortKey="Mallon, Am" uniqKey="Mallon A">AM Mallon</name>
</author>
<author>
<name sortKey="Marthandan, N" uniqKey="Marthandan N">N Marthandan</name>
</author>
<author>
<name sortKey="Masuya, H" uniqKey="Masuya H">H Masuya</name>
</author>
<author>
<name sortKey="Mcnally, R" uniqKey="Mcnally R">R McNally</name>
</author>
<author>
<name sortKey="Mehrle, A" uniqKey="Mehrle A">A Mehrle</name>
</author>
<author>
<name sortKey="Morrison, N" uniqKey="Morrison N">N Morrison</name>
</author>
<author>
<name sortKey="Orchard, S" uniqKey="Orchard S">S Orchard</name>
</author>
<author>
<name sortKey="Quackenbush, J" uniqKey="Quackenbush J">J Quackenbush</name>
</author>
<author>
<name sortKey="Reecy, Jm" uniqKey="Reecy J">JM Reecy</name>
</author>
<author>
<name sortKey="Robertson, Dg" uniqKey="Robertson D">DG Robertson</name>
</author>
<author>
<name sortKey="Rocca Serra, P" uniqKey="Rocca Serra P">P Rocca-Serra</name>
</author>
<author>
<name sortKey="Rodriguez, H" uniqKey="Rodriguez H">H Rodriguez</name>
</author>
<author>
<name sortKey="Rosenfelder, H" uniqKey="Rosenfelder H">H Rosenfelder</name>
</author>
<author>
<name sortKey="Santoyo Lopez, J" uniqKey="Santoyo Lopez J">J Santoyo-Lopez</name>
</author>
<author>
<name sortKey="Scheuermann, Rh" uniqKey="Scheuermann R">RH Scheuermann</name>
</author>
<author>
<name sortKey="Schober, D" uniqKey="Schober D">D Schober</name>
</author>
<author>
<name sortKey="Smith, B" uniqKey="Smith B">B Smith</name>
</author>
<author>
<name sortKey="Snape, J" uniqKey="Snape J">J Snape</name>
</author>
<author>
<name sortKey="Stoeckert Jr, Cj" uniqKey="Stoeckert Jr C">CJ Stoeckert Jr.</name>
</author>
<author>
<name sortKey="Tipton, K" uniqKey="Tipton K">K Tipton</name>
</author>
<author>
<name sortKey="Sterk, P" uniqKey="Sterk P">P Sterk</name>
</author>
<author>
<name sortKey="Untergasser, A" uniqKey="Untergasser A">A Untergasser</name>
</author>
<author>
<name sortKey="Vandesompele, J" uniqKey="Vandesompele J">J Vandesompele</name>
</author>
<author>
<name sortKey="Wiemann, S" uniqKey="Wiemann S">S Wiemann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teeb" uniqKey="Teeb">TEEB</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thessen, Ae" uniqKey="Thessen A">AE Thessen</name>
</author>
<author>
<name sortKey="Patterson, Dj" uniqKey="Patterson D">DJ Patterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tittensor, Dp" uniqKey="Tittensor D">DP Tittensor</name>
</author>
<author>
<name sortKey="Mora, C" uniqKey="Mora C">C Mora</name>
</author>
<author>
<name sortKey="Jetz, W" uniqKey="Jetz W">W Jetz</name>
</author>
<author>
<name sortKey="Lotze, Hk" uniqKey="Lotze H">HK Lotze</name>
</author>
<author>
<name sortKey="Ricard, D" uniqKey="Ricard D">D Ricard</name>
</author>
<author>
<name sortKey="Van Den Berghe, E" uniqKey="Van Den Berghe E">E van den Berghe</name>
</author>
<author>
<name sortKey="Worm, B" uniqKey="Worm B">B Worm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="United States Department Of Labor" uniqKey="United States Department Of Labor">United States Department of Labor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vision, Tj" uniqKey="Vision T">TJ Vision</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vollmar, A" uniqKey="Vollmar A">A Vollmar</name>
</author>
<author>
<name sortKey="Macklin, J" uniqKey="Macklin J">J Macklin</name>
</author>
<author>
<name sortKey="Ford, Ls" uniqKey="Ford L">LS Ford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Webb, Tj" uniqKey="Webb T">TJ Webb</name>
</author>
<author>
<name sortKey="Vanden Berghe, E" uniqKey="Vanden Berghe E">E Vanden Berghe</name>
</author>
<author>
<name sortKey="O Or, R" uniqKey="O Or R">R O’Dor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="White, Hc" uniqKey="White H">HC White</name>
</author>
<author>
<name sortKey="Carrier, S" uniqKey="Carrier S">S Carrier</name>
</author>
<author>
<name sortKey="Thompson, A" uniqKey="Thompson A">A Thompson</name>
</author>
<author>
<name sortKey="Greenberg, J" uniqKey="Greenberg J">J Greenberg</name>
</author>
<author>
<name sortKey="Scherle, R" uniqKey="Scherle R">R Scherle</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitlock, Mc" uniqKey="Whitlock M">MC Whitlock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitlock, Mc" uniqKey="Whitlock M">MC Whitlock</name>
</author>
<author>
<name sortKey="Mcpeek, Ma" uniqKey="Mcpeek M">MA McPeek</name>
</author>
<author>
<name sortKey="Rausher, Md" uniqKey="Rausher M">MD Rausher</name>
</author>
<author>
<name sortKey="Rieseberg, L" uniqKey="Rieseberg L">L Rieseberg</name>
</author>
<author>
<name sortKey="Moore, Aj" uniqKey="Moore A">AJ Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wren, J" uniqKey="Wren J">J Wren</name>
</author>
<author>
<name sortKey="Bateman, A" uniqKey="Bateman A">A Bateman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, M" uniqKey="Zhang M">M Zhang</name>
</author>
<author>
<name sortKey="Kihara, D" uniqKey="Kihara D">D Kihara</name>
</author>
<author>
<name sortKey="Prabhakar, S" uniqKey="Prabhakar S">S Prabhakar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ziegler, A" uniqKey="Ziegler A">A Ziegler</name>
</author>
<author>
<name sortKey="Mietchen, D" uniqKey="Mietchen D">D Mietchen</name>
</author>
<author>
<name sortKey="Faber, C" uniqKey="Faber C">C Faber</name>
</author>
<author>
<name sortKey="Von Hausen, W" uniqKey="Von Hausen W">W von Hausen</name>
</author>
<author>
<name sortKey="Schobel, C" uniqKey="Schobel C">C Schöbel</name>
</author>
<author>
<name sortKey="Sellerer, M" uniqKey="Sellerer M">M Sellerer</name>
</author>
<author>
<name sortKey="Ziegler, A" uniqKey="Ziegler A">A Ziegler</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="review-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Zookeys</journal-id>
<journal-id journal-id-type="publisher-id">ZooKeys</journal-id>
<journal-title-group>
<journal-title>ZooKeys</journal-title>
</journal-title-group>
<issn pub-type="ppub">1313-2989</issn>
<issn pub-type="epub">1313-2970</issn>
<publisher>
<publisher-name>Pensoft Publishers</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22207805</article-id>
<article-id pub-id-type="pmc">3234430</article-id>
<article-id pub-id-type="doi">10.3897/zookeys.150.1766</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Data issues in the life sciences</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Thessen</surname>
<given-names>Anne E.</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Patterson</surname>
<given-names>David J.</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
</contrib>
</contrib-group>
<aff id="A1">
<label>1</label>
Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA</aff>
<author-notes>
<corresp>Corresponding author: Anne E. Thessen (
<email>athessen@mbl.edu</email>
) </corresp>
<fn fn-type="edited-by">
<p>Academic editor: Lyubomir Penev</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2011</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>11</month>
<year>2011</year>
</pub-date>
<issue>150</issue>
<fpage>15</fpage>
<lpage>51</lpage>
<history>
<date date-type="received">
<day>7</day>
<month>7</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>9</day>
<month>8</month>
<year>2011</year>
</date>
</history>
<permissions>
<copyright-statement>Anne E. Thessen, David J. Patterson</copyright-statement>
<license license-type="creative-commons-attribution" xlink:href="http://creativecommons.org/licenses/by/3.0">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<label>Abstract</label>
<p>We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the “Big New Biology”. Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies to move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>life science</kwd>
<kwd>informatics</kwd>
<kwd>data issues</kwd>
<kwd>standards</kwd>
<kwd>incentives</kwd>
<kwd>escience</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>The urgent need to understand complex, global phenomena, the data deluge arising from new technologies, and improved data management are driving an agenda to extend the Life Sciences with more data-driven discovery dimensions (
<xref ref-type="bibr" rid="B73">National Academy of Sciences 2009</xref>
). The agenda requires new attitudes, facilities and approaches to sharing and querying existing data (
<xref ref-type="bibr" rid="B43">Hey et al. 2009</xref>
;
<xref ref-type="bibr" rid="B53">Kelling et al. 2009</xref>
). This document
addresses some of the more proximate issues that some of the Life Sciences face as they progress towards this “Big New Biology”. </p>
<p>Data-driven discovery refers to hypothesis-testing and the discovery of scientific insights through the novel management and analysis of pre-existing data. It relies on access to and reuse of data which will most likely have been generated to address other scientific problems. While still hypothesis-based, data-driven discovery contrasts with the more familiar process of scientific inquiry based on collecting new data - whether by experimentation or by making new observations. It introduces opportunities to address questions that demand a “scale” of data that cannot be acquired within a single project. It is cost-effective (
<xref ref-type="bibr" rid="B85">Piwowar et al. 2011</xref>
). Data-driven discovery is not new to biology: it is already part of exploring long-term trends and is an integral part of the molecular field, but it is not the norm in most sub-disciplines. It requires a large open pool of data across the full breadth of the Life Sciences and into adjacent disciplines. The pool will probably be virtual, with tools accessing data from many repositories. Such a pool will allow biology to join the other “Big” (= data-centric) sciences such as astronomy and high-energy particle physics (
<xref ref-type="bibr" rid="B43">Hey et al. 2009</xref>
). Access to a pool will invite “New” logic, strategies and tools (a “macroscope”) to discover those trends, associations, discontinuities, and exceptions that reveal aspects of the underlying biology which are unlikely to emerge from more reductionist approaches (
<xref ref-type="bibr" rid="B26">De Rosnay 1975</xref>
;
<xref ref-type="bibr" rid="B3">Ausubel 2009</xref>
;
<xref ref-type="bibr" rid="B73">National Academy of Sciences 2009</xref>
;
<xref ref-type="bibr" rid="B82">Patterson et al. 2010</xref>
;
<xref ref-type="bibr" rid="B99">Sirovich et al. 2010</xref>
). An additional benefit is that a pool, and the resources from which it is macerated, may reveal factors not intrinsic to biology which improve our acuity or introduce distortions into knowledge; that is, it can lead to a better understanding of scientific certainty (
<xref ref-type="bibr" rid="B30">Evans and Foster 2011</xref>
). </p>
<p>The emergence of a data-centric Big New Biology is not guaranteed. Current practices in much of the discipline are parochial, with data being generated by individuals or small teams and called upon to develop insights that are communicated in a narrative style in scientific publications. These small sciences rarely have a formal data culture: data are rarely collected with reuse in mind and may be discarded, although more recently some journals and some sub-disciplines retain publication-related subsets of data (
<xref ref-type="bibr" rid="B111">White et al. 2008</xref>
). Data sharing requires a stable and effective cyberinfrastructure and the enthusiastic participation of the scientific community (National Science Foundation 2003, 2006;
<xref ref-type="bibr" rid="B12">Burton and Treloar 2009</xref>
;
<xref ref-type="bibr" rid="B29">European Science Foundation 2006</xref>
;
<ext-link ext-link-type="uri" xlink:href="http://www.gloriad.org">http://www.gloriad.org</ext-link>
). Registries and repositories must grow to meet the challenges of making data discoverable and accessible. The emerging “Knowledge Organization Systems” (
<xref ref-type="bibr" rid="B72">Morris 2010</xref>
) need to effectively aggregate disparate data sets in part through evolving schemas that define categories of data across the Life Sciences and through ontologies that will intelligently model existing knowledge. Semantic web technologies are needed to achieve flexibility of reuse. Enhanced user interfaces with organizational, analytical and visualization tools will be needed to allow scientists to interact with the data and associated infrastructure. Most existing environments for data management are limited in scope, and need to be improved. The enthusiastic participation of professional biologists requires a readiness to make data available for
reuse, and to take advantage of new opportunities in their quest for understanding. The resulting new mesh of biological, computer and information sciences, as well as changes to current cultures, is envisioned as having the capacity to achieve the data-centric architecture capable of building new bridges among the sub-disciplines of the Life Sciences and making biology big. </p>
<p>This document reviews technical and sociological issues for biologists in the light of this futuristic vision for the Life Sciences. Many elements, such as data trust and data types, have technological and sociological components, and in such cases we have combined them for clarity.</p>
</sec>
<sec>
<title>What is meant by data</title>
<p>The term “data” is not used consistently. For some it is limited to raw data, for others the term widens to include any kind of information or process that leads to insights. We prefer to limit the term to neutral, objective, raw data that are largely independent of context, analysis or observer. As data become constrained, filtered and selected, they acquire or are assigned a meaning in the context of what they apply to. This is part of the process that transforms data into information (
<xref ref-type="bibr" rid="B1">Ackoff 1989</xref>
). There is no clear point of transition. </p>
<sec sec-type="Contextual categorization of data">
<title>Contextual categorization of data</title>
<p>The context in which biological data are acquired or generated is important to understanding how data can be appropriately reused. A context may be formed if observers select or interpret their records, because of the limitations of tools or instruments used, or because data are gathered in an unnatural setting such as an experiment or “in silico”. Individuals and technologies are selective and capture a limited subset of all available data. Data are affected by choice of instrument and analytical processes. Some context can be represented through the addition of appropriate metadata to data. We categorize the following broad types of data reflecting the context of their origins.</p>
<p>
<bold>A. Observational data</bold>
relate to an object or event actually or potentially witnessed by an agent. An agent may be a person, team, project, or initiative, and may call upon tools and instruments. Scientists need to take responsibility for adding metadata to observational data, ideally identifying the agent, date, location, and contexts such as experimental conditions (if relevant) or the equipment used. Within the Life Sciences, metadata should include taxon names, the basis for identification and/or pointers to reference (voucher) material. </p>
<p>
<bold>
<italic>1. Descriptive data</italic>
</bold>
are non-experimental data collected through observations of nature. Ideally, descriptive data can be reduced to values about a specified aspect of a taxon, system, or process. Each value will be unique, having been made at one place, at one time, by one agent. Observations
may be confirmed but not replicated, so it is important to preserve these data. Preservation often does not occur because data of this type are discarded after completion of the research narrative - the publication. The OBOE project offers a formal framework for descriptive data (
<xref ref-type="bibr" rid="B65">Madin et al. 2007a</xref>
). </p>
<p>Descriptive data can be collected by instruments or by individuals. Data collected by individuals may not represent the world completely or accurately. Mistakes can be made, such as misidentification of taxa (
<xref ref-type="bibr" rid="B64">MacLeod et al. 2010</xref>
). Researchers may be selective about the data they seek to gather, either intentionally or unintentionally, such that data sets have limited applicability. Some individuals may discard data that are not in keeping with their expectations. Few or no raw data may be recorded, such that the information may only be available in an interpreted form. Descriptive data contribute to the “long tail” of small data sets, and often are not well suited to reuse. </p>
<p>
<bold>
<italic>2. Experimental data</italic>
</bold>
are obtained when a scientist changes or constrains the conditions under which the expression of a phenomenon occurs. Experiments can be conducted across a broad range of scales - from electrophysiological investigations of sub-millisecond processes within cells (
<xref ref-type="bibr" rid="B11">Bunin et al. 2005</xref>
) to manipulations of oceanic ecosystems (
<xref ref-type="bibr" rid="B19">Coale et al. 2004</xref>
). The intent is to dissect the elements of the phenomenon by changing conditions to uncover causal relationships, or to identify variant and invariant elements of biological processes. The raw data that are produced are contextualized by the experimental framework, and may have limited or no value in other contexts. It is important for associated metadata to include information about source and storage of material before the experiment, experimental conditions, equipment, controls and treatments. </p>
<p>
<bold>B. Processed data</bold>
are obtained through a reworking, recombination, or analysis of raw data. There are two primary types. </p>
<p>
<bold>
<italic>1. Computed data</italic>
</bold>
result from a reworking of data to make them more meaningful or to normalize them. In ecology, productivity or the extent of the ecosystem are rarely measured directly. Rather, they are computed using information or data from other sources to generate measurements of the amount of carbon or mass that is generated per unit area per unit time. While computed data may be held in the same regard as raw data, choices or errors in formulae or algorithms may diminish or invalidate the data created. The raw data that were used and information on how computed data were derived (provenance) are important for reproducibility. The metadata should provide this information. Because the volume of computed data will grow as the virtual data pool expands, it will be helpful for sub-disciplines to develop appropriate protocols and advertise best practices. </p>
<p>
<bold>
<italic>2. Simulation data</italic>
</bold>
are generated by combining mathematical or computational models with raw data. Often models seek to make predictions of processes, such as the future distribution of cane toads in Australia under various
climatic projections. The proximity of predictions to subsequent observations is used to test the concepts on which the model is based and to improve the model and our associated understanding of biology. Metadata for simulation data differ markedly from those for other data types in that the date of the run, initial conditions of the model, resolution of the model output, time step, etc. are important. Rerunning the model may require preservation of initial conditions, model software, and even the operating system (
<xref ref-type="bibr" rid="B96">Shirky 2005</xref>
). Simulation data become less useful as they age and can become a storage burden. </p>
</sec>
</sec>
<sec>
<title>Sociological issues</title>
<p>As the study of human social behavior, sociology includes the study of the behavior and practices of scientists. If we are to promote a shift to a Big New Biology, we need to understand current data cultures to determine which elements favor a transformation, and which will hinder it.</p>
<sec sec-type="1. Data cultures">
<title>1. Data cultures</title>
<p>The phrase “data culture” refers to the explicit and implicit data practices and expectations that determine the destiny of data. It relates to the social conventions of acquisition, curation, preservation, sharing, and reuse of data. If the goal is to make data digital, standardized and openly accessible in a reusable format, then current data cultures provide starting points to determine the changes that will be needed before that vision can be realized. While a comprehensive survey has yet to be undertaken, it is clear that there is no single data culture for the Life Sciences (
<xref ref-type="bibr" rid="B76">Norris et al. 2008</xref>
;
<xref ref-type="bibr" rid="B36">Gargouri et al. 2010</xref>
;
<xref ref-type="bibr" rid="B55">Key Perspectives Ltd 2010</xref>
;
<xref ref-type="bibr" rid="B32">Feijen 2011</xref>
). This is unsurprising given that Life Sciences range in scope and scale from the field biologist whose data are captured in short-lived notebooks as a prelude to a narrative explanation of observations to the molecular biologist whose data are born digital in near terabyte quantities and are widely shared through global data repositories. </p>
</sec>
<sec sec-type="2. Readying data for reuse">
<title>2. Readying data for reuse</title>
<p>The preparation of data for reuse in a shared pool often involves a series of steps or stages that relate to the capture, digitization, structure, storage, curation, discoverability, access, and mobility of data. The situation with molecular data achieved by the International Nucleotide Sequence Database Collaboration comprising the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and the NCBI GenBank in the USA is exemplary (
<ext-link ext-link-type="uri" xlink:href="http://www.insdc.org/">http://www.insdc.org/</ext-link>
). Molecular data tend to be born digital, and are submitted in standard formats to centralized repositories in which they are freely available for reuse in a standard form. A rich diversity of tools, services and applications has evolved to analyze and visualize the data. </p>
<p>Yet, set in the context of Rogers' adoption curve (
<xref ref-type="bibr" rid="B92">Rogers 1983</xref>
;
<xref ref-type="fig" rid="F1">Fig. 1</xref>
), and as suggested by
<xref ref-type="bibr" rid="B41">Harnad (2010)</xref>
, the Life Sciences are, in general, closer to the early adopters stage of the transition to data sharing than are other sciences. It is still unusual for data created in most sub-disciplines to be made ready and openly available for sharing (
<xref ref-type="bibr" rid="B25">Davis 2009</xref>
). For these sub-disciplines to join Big New Biology, data practices must change to improve retention of data, their conversion to digital form and placement within schemes of widely agreed standards, and visibility and accessibility with few or no restrictions. The technical aspects of these practices are described in the technical issues section. </p>
<fig id="F1" orientation="portrait" position="float">
<label>Figure 1.</label>
<caption>
<p>Rogers' adoption curve describes the acceptance of a new technology. The Life Sciences are still in the Early Adopters phase for accepting principles of data readiness.</p>
</caption>
<graphic xlink:href="ZooKeys-150-017-g001"></graphic>
</fig>
</sec>
<sec sec-type="3. Agents">
<title>3. Agents</title>
<p>The term “agent” refers to individuals, groups or organizations - each influencing data cultures.</p>
<p>
<bold>Scientists.</bold>
As major producers and consumers of Life Sciences data, scientists are important participants in Big New Biology. Within the US there are almost 100,000 biologists (excluding agriculture and health sciences) working outside of academia (
<xref ref-type="bibr" rid="B107">United States Department of Labor</xref>
). The number within academia can be estimated from data on the approximately 2,500 colleges and universities (
<ext-link ext-link-type="uri" xlink:href="http://www.globalcomputing.com/american-universities.htm">http://www.globalcomputing.com/american-universities.htm</ext-link>
) that employ almost 300,000 academics in science and engineering, 40% of whom work in the Life Sciences (
<xref ref-type="bibr" rid="B74">National Science Board 2010a</xref>
). US research and development endeavors account for approximately one-third of the global effort (
<xref ref-type="bibr" rid="B75">National Science Board 2010b</xref>
). Consequently, changing data practices will directly or indirectly affect as many as 200,000 life scientists in the US and about half a million professionals worldwide (
<xref ref-type="bibr" rid="B80">PARSE 2009</xref>
).
</p>
<p>As personal computers and Internet access have become integral components of biological research (
<xref ref-type="bibr" rid="B102">Stein 2008</xref>
), scientists’ views and practices of data sharing have changed. Biologists are increasingly publishing data through repositories like GenBank (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genbank/">http://www.ncbi.nlm.nih.gov/genbank/</ext-link>
), their own web sites, or are participating in collaborative environments such as those that allow data to be annotated (e.g. EcoliWiki,
<ext-link ext-link-type="uri" xlink:href="http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki">http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki</ext-link>
or DNA Subway for genome annotation,
<ext-link ext-link-type="uri" xlink:href="http://dnasubway.iplantcollaborative.org/">http://dnasubway.iplantcollaborative.org/</ext-link>
) or to capture field data using services such as provided by Artportalen (
<ext-link ext-link-type="uri" xlink:href="http://www.artportalen.se/default.asp">http://www.artportalen.se/default.asp</ext-link>
) or eBird (
<ext-link ext-link-type="uri" xlink:href="http://ebird.org">http://ebird.org</ext-link>
). An increasing number of databases are providing web services to mobilize data and new tools for visualizing data (e.g. GeoPhyloBuilder,
<ext-link ext-link-type="uri" xlink:href="https://www.nescent.org/sites/evoviz/GeoPhyloBuilder">https://www.nescent.org/sites/evoviz/GeoPhyloBuilder</ext-link>
,
<xref ref-type="bibr" rid="B56">Kidd and Liu 2008</xref>
). Data processing and management pipelines such as Kepler (
<ext-link ext-link-type="uri" xlink:href="https://kepler-project.org/">https://kepler-project.org/</ext-link>
) and VisTrails (
<ext-link ext-link-type="uri" xlink:href="http://www.vistrails.org/index.php/Main_Page">http://www.vistrails.org/index.php/Main_Page</ext-link>
) are emerging. Yet, for these changes to dominate across the breadth of the discipline and influence the full life cycle of the data, researchers must feel comfortable with the design and performance of software systems (
<xref ref-type="bibr" rid="B102">Stein 2008</xref>
). There must be good dialog between the biologists and computer programmers for new tools to be adopted (
<xref ref-type="bibr" rid="B62">Lee et al. 2006</xref>
). Increasingly, biologists will need to be trained in computer and information science (
<xref ref-type="bibr" rid="B102">Stein 2008</xref>
) and include archiving machine-readable data and appropriate metadata as part of their normal workflow (
<xref ref-type="bibr" rid="B112">Whitlock 2011</xref>
). Computer scientists, software engineers, and others who produce code need to develop sensitivity to biology and biological thinking if they are to provide tools that delight life scientists. </p>
<p>Scientists, especially those associated with small science, will need to be more engaged in mobilization of data than at present (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
,
<xref ref-type="bibr" rid="B42">Heidorn 2008</xref>
,
<xref ref-type="bibr" rid="B23">Costello 2009</xref>
,
<xref ref-type="bibr" rid="B100">Smith 2009</xref>
). Many scientists do share specific data sets with close colleagues (
<xref ref-type="bibr" rid="B95">Science staff editorial 2011</xref>
), yet are insufficiently incentivized to share their data openly. In part, they perceive the risks of making data available as outweighing the rewards (
<xref ref-type="bibr" rid="B86">Porter and Callahan 1994</xref>
,
<xref ref-type="bibr" rid="B55">Key Perspectives Ltd 2010</xref>
). This is despite the fact that papers with openly available data gain more citations (
<xref ref-type="bibr" rid="B84">Piwowar et al. 2007</xref>
). While there are communal repositories for sub-disciplines other than molecular biology, such as the Global Biodiversity Information Facility and the Ocean Biogeographic Information System for occurrence data, the majority of sub-disciplines lack appropriate communal repositories. </p>
<p>
<bold>Publishers.</bold>
Publishers of scientific journals are increasingly involved in data management (
<xref ref-type="bibr" rid="B113">Whitlock et al. 2010</xref>
). Publishers may provide the same services for data that they provide for manuscripts (e.g. peer review and citability;
<xref ref-type="bibr" rid="B108">Vision 2010</xref>
). Some journals require deposition of data as a condition of publication. An example is the Joint Data Archiving Policy (JDAP,
<ext-link ext-link-type="uri" xlink:href="http://datadryad.org/jdap">http://datadryad.org/jdap</ext-link>
). JDAP has grown from its original consortium of evolution and ecology journals to include more than a dozen journals (
<xref ref-type="bibr" rid="B108">Vision 2010</xref>
). Dryad (
<ext-link ext-link-type="uri" xlink:href="http://datadryad.org/">http://datadryad.org/</ext-link>
;
<xref ref-type="bibr" rid="B111">White et al. 2008</xref>
), GenBank (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genbank/">http://www.ncbi.nlm.nih.gov/genbank/</ext-link>
;
<xref ref-type="bibr" rid="B8">Bilofsky and Christian 1988</xref>
), Protein Data Bank (
<ext-link ext-link-type="uri" xlink:href="http://www.wwpdb.org">http://www.wwpdb.org</ext-link>
;
<xref ref-type="bibr" rid="B6">Berman et al. 2006</xref>
) and TAIR (
<ext-link ext-link-type="uri" xlink:href="http://www.arabidopsis.org/">http://www.arabidopsis.org/</ext-link>
;
<xref ref-type="bibr" rid="B90">Rhee et al. 2003</xref>
) are examples of repositories that benefit from deposition requirements from publishers. Publishers historically controlled the dissemination of the
narrative. Some limit access to articles while others, such as PLoS (
<ext-link ext-link-type="uri" xlink:href="http://www.plosbiology.org/static/help.action#xmlContent">http://www.plosbiology.org/static/help.action#xmlContent</ext-link>
) and Pensoft (
<ext-link ext-link-type="uri" xlink:href="http://www.pensoft.net/journals.php">http://www.pensoft.net/journals.php</ext-link>
) have moved to an open-access model. Although some publishers (
<ext-link ext-link-type="uri" xlink:href="http://www.articleofthefuture.com/">http://www.articleofthefuture.com/</ext-link>
,
<xref ref-type="bibr" rid="B116">Ziegler et al. 2011</xref>
) are experimenting with enhanced publication to allow researchers to share data sets, illustrations and audio files, we may presume that a publisher-driven model for data sharing is likely to incur charges for access to or submission of data. Many scientists feel this is inappropriate (
<xref ref-type="bibr" rid="B55">Key Perspectives Ltd 2010</xref>
). One such model is offered by Thomson Reuters BIOSIS, which indexes more than half a million Life Sciences abstracts yearly (
<ext-link ext-link-type="uri" xlink:href="http://thomsonreuters.com/content/science/pdf/BIOSIS_Factsheet.pdf">http://thomsonreuters.com/content/science/pdf/BIOSIS_Factsheet.pdf</ext-link>
). They are compiling metadata such as organism names and Enzyme Commission numbers that can be used to discover sources, and the publisher charges for its discovery services. </p>
<p>
<bold>Funding agencies.</bold>
Funding agencies worldwide have been called upon to finance informatics research and to promote tools and digital libraries that will underpin the shift towards a Big New Biology paradigm (
<xref ref-type="bibr" rid="B43">Hey et al. 2009</xref>
;
<xref ref-type="bibr" rid="B73">National Academy of Sciences 2009</xref>
). Funding agencies are accountable to the public and to the government (e.g.
<xref ref-type="bibr" rid="B20">Coburn 2011</xref>
). Data cost money and the reuse of data represents a better return for each research dollar invested (
<xref ref-type="bibr" rid="B85">Piwowar et al. 2011</xref>
). In recognition of the importance of data sharing to their investment, funding agencies are increasingly imposing data-sharing requirements on their researchers (
<xref ref-type="table" rid="T1">Table 1</xref>
). Yet, many funding agencies, especially outside the US and Europe, do not have data policies or plans to make data available. Of those that do, many require scientists to submit data management plans as a part of their proposals. The plans are designed to explain where data will be deposited, under what terms data may be accessed, and what standards will be used. Many agencies believe in open access to data at the end of a project and have specific timelines for data release. They often acknowledge that the data provider will have a period of exclusive “right of first use” of data. </p>
<table-wrap orientation="portrait" id="T1" position="float">
<label>Table 1.</label>
<caption>
<p>List of funding agencies and characteristics of their data policies</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<th rowspan="1" colspan="1">
<bold>Funding Agency</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Country</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Policy</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Data Management Plan</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Deposit</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Standards Compliant</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Attribution</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Local Archive</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Open Source</bold>
</th>
<th rowspan="1" colspan="1">
<bold>QA/QC</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Confidentiality</bold>
</th>
<th rowspan="1" colspan="1">
<bold>IPR/Licensing</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Metadata Deposit</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Provides Data for Free</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Free Access to Publications</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Notes</bold>
</th>
</tr>
<tr>
<td rowspan="1" colspan="1">Gordon and Betty Moore Foundation</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://moore.org/docs/GBMF_Data%20Sharing%20Philosophy%20and%20Plan.pdf">http://moore.org/docs/GBMF_Data%20Sharing%20Philosophy%20and%20Plan.pdf</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Genome Canada</td>
<td rowspan="1" colspan="1">Canada</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.genomecanada.ca/medias/PDF/EN/DataReleaseandResourceSharingPolicy.pdf">http://www.genomecanada.ca/medias/PDF/EN/DataReleaseandResourceSharingPolicy.pdf</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Data must be made available no later than the publication date or the date the patent has been filed (whichever comes first) at the end of the project</td>
</tr>
<tr>
<td rowspan="1" colspan="1">National Institutes of Health</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://grants.nih.gov/grants/policy/data_sharing/">http://grants.nih.gov/grants/policy/data_sharing/</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Applies to projects requesting > $500,000; data must be released no later than the acceptance of publication of the main findings from the final data set</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Biotechnology and Biological Sciences Research Council</td>
<td rowspan="1" colspan="1">UK</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.bbsrc.ac.uk/publications/policy/data_sharing_policy.html">http://www.bbsrc.ac.uk/publications/policy/data_sharing_policy.html</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Data release no later than publication or within 3 years of generation; researchers are expected to ensure data availability for 10 years after completion of project</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Natural Environment Research Council</td>
<td rowspan="1" colspan="1">UK</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.nerc.ac.uk/research/sites/data/policy.asp">http://www.nerc.ac.uk/research/sites/data/policy.asp</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Data must be made available within 2 years from the end of data collection</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Wellcome Trust</td>
<td rowspan="1" colspan="1">UK</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm">http://www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Department of Energy</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://genomicsgtl.energy.gov/datasharing">http://genomicsgtl.energy.gov/datasharing</ext-link>
</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Requires deposit of 1) protocols 2) raw data 3) other relevant materials no later than 3 months after publication</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Chinese Academy of Sciences</td>
<td rowspan="1" colspan="1">China</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://english.cas.cn/">http://english.cas.cn/</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Requires deposit or no further funding</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Australian Research Council</td>
<td rowspan="1" colspan="1">Australia</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.arc.gov.au/default.htm">http://www.arc.gov.au/default.htm</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">No policy</td>
</tr>
<tr>
<td rowspan="1" colspan="1">National Science Foundation</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Austrian Science Fund</td>
<td rowspan="1" colspan="1">Austria</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.fwf.ac.at/en/public_relations/oai/index.html">http://www.fwf.ac.at/en/public_relations/oai/index.html</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1">Data must be available no more than 2 years after end of project</td>
</tr>
<tr>
<td rowspan="1" colspan="1">NASA</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/">http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Data can be embargoed for 2 years</td>
</tr>
<tr>
<td rowspan="1" colspan="1">NOAA</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ncdc.noaa.gov/oa/about/open-access-climate-data-policy.pdf">http://www.ncdc.noaa.gov/oa/about/open-access-climate-data-policy.pdf</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Council for Scientific and Industrial Research</td>
<td rowspan="1" colspan="1">India</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://rdpp.csir.res.in/csir_acsir/Home.aspx">http://rdpp.csir.res.in/csir_acsir/Home.aspx</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Plan being developed in 2010</td>
</tr>
<tr>
<td rowspan="1" colspan="1">North Pacific Research Board</td>
<td rowspan="1" colspan="1">US</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.nprb.org/projects/metadata.html">http://www.nprb.org/projects/metadata.html</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">×</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">Data must be transferred to NPRB by the end of the project</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Japan Science and Technology Agency</td>
<td rowspan="1" colspan="1">Japan</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.jst.go.jp/EN/index.html">http://www.jst.go.jp/EN/index.html</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">None</td>
</tr>
<tr>
<td rowspan="1" colspan="1">National Research Foundation</td>
<td rowspan="1" colspan="1">South Africa</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.nrf.ac.za/">http://www.nrf.ac.za/</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1"></td>
<td rowspan="1" colspan="1">None</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>
<bold>Governments.</bold>
The realization of a Big New Biology will require significant investment in and reorganization of technical and human infrastructure, the creation of new agencies, new policies and implementation frameworks, as well as national and transnational coordination. The scale of these developments will require governmental and intergovernmental participation. Issues that require high-level attention are illustrated by the OECD report that established GBIF (
<xref ref-type="bibr" rid="B79">OECD 1999</xref>
). GBIF now has about 60 national participants and influences national agendas. Especially relevant is the commitment to data sharing with its Suwon declaration (
<ext-link ext-link-type="uri" xlink:href="http://www2.gbif.org/SignedSUWONdeclaration_small.pdf">http://www2.gbif.org/SignedSUWONdeclaration_small.pdf</ext-link>
). This underscores the importance of data sharing to science, conservation and sustainability. INSDC, which coordinates the sharing of molecular data via the US-based NCBI GenBank, the European EMBL, and the Japanese DDBJ, is another example of an international informatics initiative in the Life Sciences (
<ext-link ext-link-type="uri" xlink:href="http://www.insdc.org/policy.html">http://www.insdc.org/policy.html</ext-link>
). </p>
<p>Several countries have established governmental digital data environments, including the data.gov environments (
<ext-link ext-link-type="uri" xlink:href="http://www.data.gov/">http://www.data.gov/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://data.australia.gov.au/">http://data.australia.gov.au/</ext-link>
, data.gov.uk), or more specialist agencies such as Conabio in Mexico (
<ext-link ext-link-type="uri" xlink:href="http://www.conabio.gob.mx/">http://www.conabio.gob.mx/</ext-link>
)
, ABRS, ERIN and ALA in Australia (
<ext-link ext-link-type="uri" xlink:href="http://www.environment.gov.au/biodiversity/abrs/">http://www.environment.gov.au/biodiversity/abrs/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.environment.gov.au/erin/">http://www.environment.gov.au/erin/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.ala.org.au/">http://www.ala.org.au/</ext-link>
), ITIS in the US (
<ext-link ext-link-type="uri" xlink:href="http://www.itis.gov/">http://www.itis.gov/</ext-link>
) or the European Environment Agency (
<ext-link ext-link-type="uri" xlink:href="http://www.eea.europa.eu/data-and-maps">http://www.eea.europa.eu/data-and-maps</ext-link>
). </p>
<p>With respect to the economics at this level, OECD, when establishing GBIF, compared the cost of the molecular informatics infrastructure (millions of dollars) against the benefits to pharmaceutical, health and agricultural businesses worth billions of dollars (
<xref ref-type="bibr" rid="B79">OECD 1999</xref>
). The costs of international cooperation on biodiversity informatics must be set against the estimated economic value of the world’s natural capital of tens of trillions (millions of millions) of dollars (
<xref ref-type="bibr" rid="B22">Costanza et al. 1997</xref>
;
<xref ref-type="bibr" rid="B104">TEEB 2010</xref>
). The OECD estimates costs of sustaining infrastructure to be 25% of the costs of generating raw data. Yet, an allocation of as little as 5% of research funding could provide billions of dollars for data preservation (
<xref ref-type="bibr" rid="B94">Schofield et al. 2010</xref>
). </p>
<p>
<bold>Universities.</bold>
With more than 20,000 universities (and institutions modeled on universities) worldwide (Webometrics Ranking of World Universities;
<ext-link ext-link-type="uri" xlink:href="http://www.webometrics.info/methodology.html">http://www.webometrics.info/methodology.html</ext-link>
), employing an estimated 5–10 million academics and associated researchers, universities form the largest research and development initiative. Collectively, Universities are a significant source of new data and, given their international communal character, will be important as consumers of the data pool. The support, infrastructure and services that Universities provide will be a major determinant of the flow and fate of data. Some environments, such as the SURF foundation (
<ext-link ext-link-type="uri" xlink:href="http://www.surffoundation.nl/en/actueel/Pages/Researchersenhancetheirpublications.aspx">http://www.surffoundation.nl/en/actueel/Pages/Researchersenhancetheirpublications.aspx</ext-link>
) seek to unite research institutes through the application of new technologies. SURF serves the Dutch context and currently emphasizes 5 disciplines; Life Sciences are not included. </p>
<p>Universities may or may not regard themselves as owners (having IP rights) of data and so may regulate access to data generated in-house or as part of collaborative projects. Universities may or may not have policies that require the retention of research data for a limited period, usually in the range of 3 to 7 years. The University of Melbourne policy is based on guidelines from the National Health and Medical Research Council/Australian Vice Chancellors’ Committee and specifies that “Data must be recorded in a durable and appropriately referenced form” for a minimum of 5 years (
<ext-link ext-link-type="uri" xlink:href="http://www.unimelb.edu.au/records/research.html">http://www.unimelb.edu.au/records/research.html</ext-link>
). The Chinese University of Hong Kong encourages researchers to deposit their data in the University Service Center upon completion of their research (
<ext-link ext-link-type="uri" xlink:href="http://www.usc.cuhk.edu.hk/Eng/SharingPolicy.aspx">http://www.usc.cuhk.edu.hk/Eng/SharingPolicy.aspx</ext-link>
). US universities are bound to comply with the requirements of OMB Circular A-110 (Uniform Administrative Requirements for grants and agreements with Institutions of Higher Education, Hospitals, and Other Non-Profit Organizations –
<ext-link ext-link-type="uri" xlink:href="http://www.whitehouse.gov/omb/circulars_a110">http://www.whitehouse.gov/omb/circulars_a110</ext-link>
). This specifies that financial records, supporting documents, statistics, and all other records produced in connection with a financial award, including laboratory data and primary data
<italic>are to be retained by the</italic>
<italic>institution</italic>
for a specified period. OMB A-110 also states “The Federal awarding agency(ies) reserve a royalty-free, nonexclusive and irrevocable right to reproduce,
publish, or otherwise use the work for Federal purposes, and to authorize others to do so.” Many universities have data policies that target administrative data and administrative agendas rather than promoting the use of data for academic purposes (e.g. “(This) University must retain research data in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy and compliance with laws and regulations governing the conduct of the research” –
<ext-link ext-link-type="uri" xlink:href="http://ora.ra.cwru.edu/University_Policy_On_Custody_Of_Research_Data.pdf">http://ora.ra.cwru.edu/University_Policy_On_Custody_Of_Research_Data.pdf</ext-link>
). As their policies improve, Universities will need to play a significant role in educating staff and students as to the value of data. They will be the focus of reshaping the skill base on which the Big New Biology will rely (
<xref ref-type="bibr" rid="B28">Doom et al. 2002</xref>
). New trans-discipline curricula will ensure that biologists gain informatics skills and that computer scientists develop sensitivity to the challenges and needs in Biology. </p>
<p>
<bold>Museums and herbaria.</bold>
Museums and herbaria play special roles within the Life Sciences. Along with libraries, they have a mandate for the long-term preservation of materials. Those materials include several billion specimens of plants, animals and fossils collected by biologists over 3 centuries (
<xref ref-type="bibr" rid="B14">Chapman 2005a</xref>
;
<xref ref-type="bibr" rid="B79">OECD 1999</xref>
;
<xref ref-type="bibr" rid="B109">Vollmar et al. 2010</xref>
). Those collections provide invaluable information as to changing distributions of species, provide access to extinct species, and inform research into defining species. They have special value in some phenomena that motivate the agenda for Big New Biology, such as distribution of invasive species, consequences of deforestation, and so on.
<xref ref-type="bibr" rid="B14">Chapman (2005a)</xref>
provides an exhaustive treatment of potential and actual value of primary biodiversity records. </p>
<p>
<bold>Citizen scientists.</bold>
Citizen scientists are non-professionals who participate in scientific activities. The appealing richness of nature, its accessibility, and our reliance on natural resources ensure that biology attracts an especially high participation by the citizenry (
<xref ref-type="bibr" rid="B97">Silvertown 2009</xref>
). The academic skills of citizen scientists cover a massive spectrum, from those with casual interests in nature or science to individuals who publish in the scientific literature. The tens of millions of birders in the US (
<xref ref-type="bibr" rid="B54">Kerlinger 1993</xref>
) translate to more than 100 million worldwide. The number of recreational fishermen in marine waters approaches that of birdwatchers (
<xref ref-type="bibr" rid="B2">Arlinghaus and Cooke 2009</xref>
;
<xref ref-type="bibr" rid="B18">Cisneros-Montemayor and Sumaila 2010</xref>
), and an estimated 500 million people have livelihoods attached to fishing (
<ext-link ext-link-type="ftp" xlink:href="ftp://ftp.fao.org/FI/brochure/climate_change/policy_brief.pdf">ftp://ftp.fao.org/FI/brochure/climate_change/policy_brief.pdf</ext-link>
). That suggests that the potential citizen scientist community exceeds 1 billion people. This remarkable pool can be called upon to add the “sightings” (occurrence of a given species at a particular location at a particular time) which can be used to monitor the changing distributions and abundances of endemic and invasive species. The Swedish ArtPortalen (
<ext-link ext-link-type="uri" xlink:href="http://www.artportalen.se/default.asp">http://www.artportalen.se/default.asp</ext-link>
) has in 10 years compiled more than 26 million sightings at a rate of about 10,000 per day, illustrating the irreplaceable role of the citizen scientist. Several mobile phone apps exist that allow naturalists to record species occurrences in the field (BirdsEye from eBird,
<ext-link ext-link-type="uri" xlink:href="http://www.getbirdseye.com/">http://www.getbirdseye.com/</ext-link>
and Observer from WildObs,
<ext-link ext-link-type="uri" xlink:href="http://wildobs.com/about/observer">http://wildobs.com/about/observer</ext-link>
).
Data on occurrences, or on the first occurrences of flowering or appearance of migratory species, can be called on to test scientific hypotheses as to the impact of climate change on the biosphere. Citizen scientists are significant monitors of endangered species – providing the first evidence that some presumed-extinct species, such as the coelacanth (
<ext-link ext-link-type="uri" xlink:href="http://www.extinctanimal.com/the_coelacanth.htm">http://www.extinctanimal.com/the_coelacanth.htm</ext-link>
), Wollemi pine (
<ext-link ext-link-type="uri" xlink:href="http://www.wolganvalley.com/pdf/wolgan-valley/en/media-centre/fact-sheets/Wolgan%20Valley%20Wollemi%20Pine%20Fact%20Sheet.pdf?1=6">http://www.wolganvalley.com/pdf/wolgan-valley/en/media-centre/fact-sheets/Wolgan%20Valley%20Wollemi%20Pine%20Fact%20Sheet.pdf?1=6</ext-link>
), ivory-billed woodpecker (
<ext-link ext-link-type="uri" xlink:href="http://www.cryptomundo.com/cryptozoo-news/ibw-rainsong/">http://www.cryptomundo.com/cryptozoo-news/ibw-rainsong/</ext-link>
), Lord Howe Island stick insect (
<ext-link ext-link-type="uri" xlink:href="http://www.kidcyber.com.au/topics/Lordhowestick.htm">http://www.kidcyber.com.au/topics/Lordhowestick.htm</ext-link>
) and mountain pygmy possum (
<ext-link ext-link-type="uri" xlink:href="http://animaldiversity.ummz.umich.edu/site/accounts/information/Burramys_parvus.html">http://animaldiversity.ummz.umich.edu/site/accounts/information/Burramys_parvus.html</ext-link>
) are still with us. </p>
<p>
<bold>Repositories.</bold>
A repository provides services for the management and dissemination of data, ideally including making data discoverable, providing access, protecting the integrity of the data, ensuring long-term preservation and migrating to new technologies (
<xref ref-type="bibr" rid="B63">Lynch 2003</xref>
). Most repositories typically handle a specific data type at a particular granularity. Thousands of repositories already exist for managing Life Sciences data and hold tens of millions of items (
<xref ref-type="table" rid="T2">Table 2</xref>
; see
<xref ref-type="bibr" rid="B51">Jones et al. 2006</xref>
, repository66.org and
<ext-link ext-link-type="uri" xlink:href="http://datacite.org/repolist">http://datacite.org/repolist</ext-link>
for more). However, it is estimated that less than 1% of ecology data is captured in this way (Reichman et al. 2011). Some sub-disciplines do not have repositories and the volume of data in some fields has led even exemplar repositories such as GenBank to question their capacity to host all data (
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/About/news/16feb2011">http://www.ncbi.nlm.nih.gov/About/news/16feb2011</ext-link>
;
<ext-link ext-link-type="uri" xlink:href="http://phylogenomics.blogspot.com/2011/06/sequenceshort-read-archive-sra-back.html">http://phylogenomics.blogspot.com/2011/06/sequenceshort-read-archive-sra-back.html</ext-link>
). </p>
<table-wrap orientation="portrait" id="T2" position="float">
<label>Table 2.</label>
<caption>
<p>Examples of repositories for Life Sciences data.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<th rowspan="1" colspan="1">
<bold>Repository</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Type of Life Sciences Data</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Location</bold>
</th>
</tr>
<tr>
<td rowspan="1" colspan="1">AlgaeBase</td>
<td rowspan="1" colspan="1">algae names and references</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.algaebase.org/">http://www.algaebase.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ArrayExpress</td>
<td rowspan="1" colspan="1">microarray</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/arrayexpress/">http://www.ebi.ac.uk/arrayexpress/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Australia National Data Service</td>
<td rowspan="1" colspan="1">general research data</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ands.org.au/">http://www.ands.org.au/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ConceptWiki</td>
<td rowspan="1" colspan="1">concepts</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://conceptwiki.org/index.php/Main%20Page">http://conceptwiki.org/index.php/Main%20Page</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CSIRO</td>
<td rowspan="1" colspan="1">fisheries catch</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.marine.csiro.au/datacentre/">http://www.marine.csiro.au/datacentre/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Data.gov</td>
<td rowspan="1" colspan="1">natural resources data</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.data.gov/">http://www.data.gov/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<named-content content-type="taxon-name">Diptera</named-content>
database </td>
<td rowspan="1" colspan="1">Dipteran information</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.sel.barc.usda.gov/diptera/biosys.htm">http://www.sel.barc.usda.gov/diptera/biosys.htm</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">EMAGE</td>
<td rowspan="1" colspan="1">gene expression</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.emouseatlas.org/emage/">http://www.emouseatlas.org/emage/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">ENA</td>
<td rowspan="1" colspan="1">gene sequences</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/ena/">http://www.ebi.ac.uk/ena/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ensembl</td>
<td rowspan="1" colspan="1">genomes</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://uswest.ensembl.org/index.html">http://uswest.ensembl.org/index.html</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Euregene</td>
<td rowspan="1" colspan="1">renal genome</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.euregene.org/">http://www.euregene.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Eurexpress</td>
<td rowspan="1" colspan="1">transcriptome</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.eurexpress.org/ee/">http://www.eurexpress.org/ee/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">EURODEER</td>
<td rowspan="1" colspan="1">movement of roe deer</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://sites.google.com/site/eurodeerproject/home">http://sites.google.com/site/eurodeerproject/home</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">FishBase</td>
<td rowspan="1" colspan="1">fish information</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.fishbase.org/">http://www.fishbase.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GBIF</td>
<td rowspan="1" colspan="1">occurrences</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.gbif.org/">http://www.gbif.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GenBank</td>
<td rowspan="1" colspan="1">gene sequences</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/genbank/">http://www.ncbi.nlm.nih.gov/genbank/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GEO</td>
<td rowspan="1" colspan="1">microarray</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/">http://www.ncbi.nlm.nih.gov/geo/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">GNI</td>
<td rowspan="1" colspan="1">names</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://gni.globalnames.org/">http://gni.globalnames.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">INBIO</td>
<td rowspan="1" colspan="1">Costa Rican biodiversity</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.inbio.ac.cr/es/default.html">http://www.inbio.ac.cr/es/default.html</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">INSPIRE</td>
<td rowspan="1" colspan="1">spatial</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://inspire.jrc.ec.europa.eu/index.cfm">http://inspire.jrc.ec.europa.eu/index.cfm</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">KEGG</td>
<td rowspan="1" colspan="1">genes</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.genome.jp/kegg/">http://www.genome.jp/kegg/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Life Sciences Data Archive NASA</td>
<td rowspan="1" colspan="1">effects of space on humans</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://lsda.jsc.nasa.gov/">http://lsda.jsc.nasa.gov/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MassBank</td>
<td rowspan="1" colspan="1">mass spectra</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.massbank.jp/index.html?lang=en">http://www.massbank.jp/index.html?lang=en</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MGI</td>
<td rowspan="1" colspan="1">mouse</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.informatics.jax.org/">http://www.informatics.jax.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">MorphBank</td>
<td rowspan="1" colspan="1">images</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.morphbank.net/">http://www.morphbank.net/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">OBIS</td>
<td rowspan="1" colspan="1">occurrences</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.iobis.org/">http://www.iobis.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">OMIM</td>
<td rowspan="1" colspan="1">human genes and phenotypes</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/omim">http://www.ncbi.nlm.nih.gov/omim</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">PDB</td>
<td rowspan="1" colspan="1">molecule structure</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.pdb.org/pdb/home/home.do">http://www.pdb.org/pdb/home/home.do</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">PRIDE</td>
<td rowspan="1" colspan="1">proteomics</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/pride/">http://www.ebi.ac.uk/pride/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">PubMed</td>
<td rowspan="1" colspan="1">citations</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/pubmed/">http://www.ncbi.nlm.nih.gov/pubmed/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Stanford Microarray Database</td>
<td rowspan="1" colspan="1">microarray</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://smd.stanford.edu/">http://smd.stanford.edu/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TAIR</td>
<td rowspan="1" colspan="1">Arabidopsis molecular biology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.arabidopsis.org/">http://www.arabidopsis.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TOPP</td>
<td rowspan="1" colspan="1">animal tagging</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.topp.org/topp_census">http://www.topp.org/topp_census</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TreeBase</td>
<td rowspan="1" colspan="1">phylogenetic trees</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.treebase.org/">http://www.treebase.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TROPICOS</td>
<td rowspan="1" colspan="1">plant specimens</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.tropicos.org/">http://www.tropicos.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">UniProt</td>
<td rowspan="1" colspan="1">protein sequence and function</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.uniprot.org/">http://www.uniprot.org/</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">WILDSPACE</td>
<td rowspan="1" colspan="1">life history information</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://wildspace.ec.gc.ca/more-e.html">http://wildspace.ec.gc.ca/more-e.html</ext-link>
</td>
</tr>
<tr>
<td rowspan="1" colspan="1">WRAM</td>
<td rowspan="1" colspan="1">wireless remote animal monitoring</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www-wram.slu.se/">http://www-wram.slu.se/</ext-link>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Repositories range in functionality from basic data stores to collaborative databases that incorporate analysis functions (WRAM, Wireless Remote Animal Monitoring, www-wram.slu.se). Some repositories host heterogeneous data sets (such as oceanographic databases –
<ext-link ext-link-type="uri" xlink:href="http://woce.nodc.noaa.gov/wdiu/">http://woce.nodc.noaa.gov/wdiu/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.nodc.noaa.gov/">http://www.nodc.noaa.gov/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.ices.dk/ocean/">http://www.ices.dk/ocean/</ext-link>
), but those that provide normalization, standardization, atomization and quality control services (see below) will facilitate the reuse of data and will play a stronger role in data-intensive science. That many older repositories are difficult to access or are not maintained (
<xref ref-type="bibr" rid="B114">Wren and Bateman 2008</xref>
) reveals the need for appropriate funding and persistence strategies. Repositories can fail as a result of policy shifts, funding instability, management issues, or technical failures (
<xref ref-type="bibr" rid="B63">Lynch 2003</xref>
). Such failures can undermine acceptance of digital scholarly work by the community at large. As data repositories become more important over time, they must be trusted to provide high quality services reliably (
<xref ref-type="bibr" rid="B94">Schofield et al. 2010</xref>
). The trustworthiness of archives can be assessed using criteria catalogues (
<xref ref-type="bibr" rid="B57">Klump 2011</xref>
) available from organizations like the Digital Curation Centre (
<xref ref-type="bibr" rid="B48">Innocenti et al. 2007</xref>
) and the International Organization for Standardization (
<xref ref-type="bibr" rid="B49">ISO 2000</xref>
). The Center for Research Libraries has assembled a list of ten principles for data repositories that addresses administrative and technical concerns (
<ext-link ext-link-type="uri" xlink:href="http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re">http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re</ext-link>
). </p>
</sec>
</sec>
<sec>
<title>Technological issues</title>
<p>The second array of challenges that need to be addressed as we move towards Big New Biology are technical issues that affect the distribution, preservation, accessibility and reuse of data.</p>
<sec sec-type="Making data accessible">
<title>Making data accessible</title>
<p>The effective reuse of data requires that an array of conditions (
<xref ref-type="fig" rid="F2">Fig. 2</xref>
) is optimized. </p>
<fig id="F2" orientation="portrait" position="float">
<label>Figure 2.</label>
<caption>
<p>A Big New Biology can only emerge with a framework that optimizes reuse. Ideally, data should be in forms that can flow from source into a common pool and can flow back out to consumers, be subject to quality control, or be enhanced through analysis to rejoin the pool as processed data.</p>
</caption>
<graphic xlink:href="ZooKeys-150-017-g002"></graphic>
</fig>
<p>
<bold>Data need to be retained.</bold>
Relatively few data acquired historically have been retained in an accessible form by scientists, projects or institutions (
<xref ref-type="bibr" rid="B87">Pullin and Salafsky 2010</xref>
). The culture of disposing of data following publication, termination of a grant, relocation or retirement of a scientist is clearly incompatible with the vision of a data-centric biology. While work practices in some areas, such as those in which data are born digital, or institutions with a strong tradition of preserving records, include data retention or their submission to a repository, much of the small biology lacks such a culture (
<xref ref-type="bibr" rid="B55">Key Perspectives Ltd 2010</xref>
). There is as yet an unresolved debate as to whether all data should be retained, or if subsets of data should be selected for retention, or if retained data should be subject to periodic review for deaccessioning. </p>
<p>
<bold>Data need to be digital.</bold>
Digitization is a prerequisite for data mobility. Considerable amounts of relevant data are not yet in a digital format (
<xref ref-type="bibr" rid="B17">Chavan and Krishnan 2001</xref>
;
<xref ref-type="bibr" rid="B109">Vollmar et al. 2010</xref>
;
<xref ref-type="bibr" rid="B94">Schofield et al. 2010</xref>
;
<xref ref-type="bibr" rid="B42">Heidorn 2008</xref>
). Non-digital formats include notes, books, photographs and micrographs, papers, and specimens. The Biodiversity Heritage Library and similar projects are now in the process of digitizing some half billion pages of biology text (
<xref ref-type="bibr" rid="B39">Gwinn and Rinaldo 2009</xref>
). Digital metadata about non-digital materials have value as they make the data discoverable and increase incentives for digitization. </p>
<p>
<bold>Data need to be structured.</bold>
Digital data may be unstructured (e.g. in the form of free text or an image) or they may be structured into categories that are represented consecutively or periodically through the use of a template, spreadsheet or database. The simple structure of a spreadsheet allows records to be represented as rows. Data occur within the cells formed by the intersection of rows and columns defined by metadata (headers). A source may mix both structured and unstructured data, such as when fields include free-form text, images, or atomic data. Unstructured data, such as the legacy data to be found in an estimated 500 million pages of text, can be improved through annotation with metadata provided by curators or generated by natural language processing tools. </p>
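<p>As a minimal sketch of the row-and-header structure described above, the following Python fragment reads a small tab-delimited table in which each row is a record and the header line supplies the metadata for each column. The column names, localities and dates are hypothetical and serve only as an illustration; the free-text "notes" field shows how a structured source may still carry unstructured content.</p>

```python
import csv
import io

# Hypothetical occurrence records; column names and values are illustrative only.
RAW = (
    "species\tlocality\tdate\tnotes\n"
    "Burramys parvus\tMt Hotham\t2010-11-03\tseen near boulder field at dusk\n"
    "Wollemia nobilis\tWollemi NP\t2010-12-14\t\n"
)

def read_records(text):
    """Parse tab-delimited text into one dict per row, keyed by the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

records = read_records(RAW)
for record in records:
    # Each cell is addressed by its row (record) and its column header (metadata).
    print(record["species"], record["date"])
```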
<p>
<bold>Data should be normalized.</bold>
Normalization brings information contained within different structures to the same format (or structure). Normalization may be as simple as consistently using one type of unit. Placing data within a template is a common first step to normalization. Normalization is a prerequisite for aggregating data. When data are structured and normalized, they can be mobilized in simple formats (tab-delimited or comma-delimited text files) or can be transformed into other structures to meet agreed-upon standards. DiGIR is an early example of a data transformation tool (
<ext-link ext-link-type="uri" xlink:href="http://digir.sourceforge.net/">http://digir.sourceforge.net/</ext-link>
). More contemporary tools, such as TAPIR or IPT from GBIF (
<ext-link ext-link-type="uri" xlink:href="http://ipt.gbif.org/">http://ipt.gbif.org/</ext-link>
) can output data in an array of normalized forms. </p>
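<p>The two simplest steps described above, using one unit consistently and mobilizing the result as tab-delimited text, can be sketched as follows. This is a minimal illustration only: the record layout, the column names, the conversion table and the specimen values are assumptions made for the example and do not correspond to DiGIR, TAPIR or IPT.</p>
<preformat>
```python
import csv
import io

# Hypothetical conversion factors to centimetres; extend as needed.
TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}


def normalize_length(value, unit):
    """Return the length in centimetres, rounded to suppress float noise."""
    return round(float(value) * TO_CM[unit.strip().lower()], 6)


def to_tab_delimited(records):
    """Write normalized records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["specimen", "tail_length_cm"])
    for specimen, value, unit in records:
        writer.writerow([specimen, normalize_length(value, unit)])
    return buffer.getvalue()


# Made-up records reported in mixed units (illustrative values only).
records = [
    ("specimen-1", "95", "mm"),
    ("specimen-2", "5.3", "cm"),
]
normalized_text = to_tab_delimited(records)
```
</preformat>
<p>Once every record carries the same unit and the same column headers, records from different sources can be appended to one file, which is the sense in which normalization is a prerequisite for aggregation.</p>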
<p>
<bold>Data should be standardized.</bold>
Standardization indicates compliance with a widely accepted mode of normalizing. Standards provide terms that define data and relationships among categories of data. Two basic types of standards that are indispensable for management of biological data are metadata and ontologies. Organizations such as TDWG develop new standards, and catalogs of standards and ontologies are available on the web (
<ext-link ext-link-type="uri" xlink:href="http://otter.oerc.ox.ac.uk/biosharing/?q=standards">http://otter.oerc.ox.ac.uk/biosharing/?q=standards</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://wg.sti2.org/semtech-onto/index.php/The_Ontology_Yellow_Pages">http://wg.sti2.org/semtech-onto/index.php/The_Ontology_Yellow_Pages</ext-link>
). </p>
<p>Metadata are terms that define data in ways that may serve different purposes, such as helping people to find data of relevance (that is, they aid the discovery of data -
<xref ref-type="bibr" rid="B70">Michener 2006</xref>
), or allowing data to be drawn together (federated). Metadata standards define how data should be named and structured, thus reducing the heterogeneity of terms. Standards may mandate the types of metadata that are appropriate for different types of data. Sets of metadata terms agreed upon by a community are referred to as controlled vocabularies; one of the most extensive bearing on the Life Sciences is the Ecological Metadata Language (EML;
<xref ref-type="bibr" rid="B33">Fegraus et al. 2005</xref>
). Scientific names are argued by some as having the potential to act as an extensive system of metadata (
<xref ref-type="bibr" rid="B82">Patterson et al. 2010</xref>
; see discussion below). </p>
<p>
By articulating what metadata should be applied and how they should be formatted, standards introduce the consistency that is needed for interoperability and machine reasoning. For example, a marine bacterial RNA sequence collected from the environment ideally might be accompanied by metadata on location (latitude, longitude, depth), environmental parameters, collection metadata (collection event, date of collection, sampling device), and an identifier for the bacterium. Without such metadata, the scope of possible queries is much reduced. Examples of minimum reporting requirements have been established by the MIBBI project (
<xref ref-type="bibr" rid="B103">Taylor et al. 2008</xref>
). Numerous metadata guides are available within Life Sciences (
<xref ref-type="table" rid="T3">Table 3</xref>
). There are software programs available to assist in the collection and organization of metadata (such as Morpho,
<ext-link ext-link-type="uri" xlink:href="http://knb.ecoinformatics.org/morphoportal.jsp">http://knb.ecoinformatics.org/morphoportal.jsp</ext-link>
,
<xref ref-type="bibr" rid="B44">Higgins et al. 2002</xref>
; Metacat,
<ext-link ext-link-type="uri" xlink:href="http://knb.ecoinformatics.org/software/metacat/">http://knb.ecoinformatics.org/software/metacat/</ext-link>
,
<xref ref-type="bibr" rid="B50">Jones et al. 2002</xref>
; MERMAid,
<ext-link ext-link-type="uri" xlink:href="http://www.ncddc.noaa.gov/metadataresource/metadata-tools">http://www.ncddc.noaa.gov/metadataresource/metadata-tools</ext-link>
). </p>
<table-wrap orientation="portrait" id="T3" position="float">
<label>Table 3.</label>
<caption>
<p>Examples of standards and their location.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<th rowspan="1" colspan="1">
<bold>Standard</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Location</bold>
</th>
<th rowspan="1" colspan="1">
<bold>Type</bold>
</th>
</tr>
<tr>
<td rowspan="1" colspan="1">ABCD</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.bgbm.org/TDWG/CODATA/Schema/default.htm">http://www.bgbm.org/TDWG/CODATA/Schema/default.htm</ext-link>
</td>
<td rowspan="1" colspan="1">Schema</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Bioontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.bioontology.org/">http://www.bioontology.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology Repository</td>
</tr>
<tr>
<td rowspan="1" colspan="1">BIRN </td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.birncommunity.org/">http://www.birncommunity.org/</ext-link>
</td>
<td rowspan="1" colspan="1"></td>
</tr>
<tr>
<td rowspan="1" colspan="1">Cardiac Electrophysiology Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://bioportal.bioontology.org/ontologies/39038">http://bioportal.bioontology.org/ontologies/39038</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">CMECS</td>
<td rowspan="1" colspan="1">Coastal and marine ecological classification standard
<ext-link ext-link-type="uri" xlink:href="http://www.csc.noaa.gov/benthic/cmecs/cmecs_doc.pdf">http://www.csc.noaa.gov/benthic/cmecs/cmecs_doc.pdf</ext-link>
</td>
<td rowspan="1" colspan="1">Vocabulary</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Comparative Data Analysis ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/apps/mediawiki/cdao/index.php?title=Main_Page">http://sourceforge.net/apps/mediawiki/cdao/index.php?title=Main_Page</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Darwin Core</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://wiki.tdwg.org/twiki/bin/view/DarwinCore/">http://wiki.tdwg.org/twiki/bin/view/DarwinCore/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Dublin Core</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://dublincore.org/">http://dublincore.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ecological Metadata Language</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://knb.ecoinformatics.org/software/eml/">http://knb.ecoinformatics.org/software/eml/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Environment Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.environmentontology.org/">http://www.environmentontology.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Evolution Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://code.google.com/p/evolution-ontology/">http://code.google.com/p/evolution-ontology/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Experimental Factor Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/efo/">http://www.ebi.ac.uk/efo/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Federal Geospatial Data Committee</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.fgdc.gov/">http://www.fgdc.gov/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Fungal Anatomy</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.yeastgenome.org/fungi/fungal_anatomy_ontology/">http://www.yeastgenome.org/fungi/fungal_anatomy_ontology/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Gene Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org/">http://www.geneontology.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Homology Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://bioportal.bioontology.org/ontologies/42117">http://bioportal.bioontology.org/ontologies/42117</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">HUPO</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.psidev.info/index.php?q=node/159">http://www.psidev.info/index.php?q=node/159</ext-link>
</td>
<td rowspan="1" colspan="1">Vocabulary</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Infectious Disease ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.infectiousdiseaseontology.org/Home.html">http://www.infectiousdiseaseontology.org/Home.html</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">International Standards Organization</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.iso.org">http://www.iso.org</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Marine Metadata Interoperability </td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://marinemetadata.org/">http://marinemetadata.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Miriam </td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/miriam/main/datatypes/">http://www.ebi.ac.uk/miriam/main/datatypes/</ext-link>
</td>
<td rowspan="1" colspan="1">Vocabulary</td>
</tr>
<tr>
<td rowspan="1" colspan="1">National Biodiversity Information Infrastructure</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.nbii.gov/portal/community/Communities/NBII_Home/">http://www.nbii.gov/portal/community/Communities/NBII_Home/</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Ontology of Microbial Phenotypes</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/microphenotypes/">http://sourceforge.net/projects/microphenotypes/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Open Biological and Biomedical Ontologies</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.obofoundry.org/">http://www.obofoundry.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology Repository</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Phenotype Quality Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://obofoundry.org/wiki/index.php/PATO:Main_Page">http://obofoundry.org/wiki/index.php/PATO:Main_Page</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Plant Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.plantontology.org/">http://www.plantontology.org/</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
<tr>
<td rowspan="1" colspan="1">SDD</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://wiki.tdwg.org/twiki/bin/view/SDD/Version1dot1">http://wiki.tdwg.org/twiki/bin/view/SDD/Version1dot1</ext-link>
</td>
<td rowspan="1" colspan="1">Schema</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Species Profile Model</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://wiki.tdwg.org/SPM">http://wiki.tdwg.org/SPM</ext-link>
</td>
<td rowspan="1" colspan="1">Schema</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Taxonomic Concept Schema</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.tdwg.org/activities/tnc/tcs-schema-repository/">http://www.tdwg.org/activities/tnc/tcs-schema-repository/</ext-link>
</td>
<td rowspan="1" colspan="1">Schema</td>
</tr>
<tr>
<td rowspan="1" colspan="1">TDWG</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="http://www.bgbm.org/TDWG/acc/Referenc.htm">http://www.bgbm.org/TDWG/acc/Referenc.htm</ext-link>
</td>
<td rowspan="1" colspan="1">Metadata</td>
</tr>
<tr>
<td rowspan="1" colspan="1">Teleost Anatomy Ontology</td>
<td rowspan="1" colspan="1">
<ext-link ext-link-type="uri" xlink:href="https://www.phenoscape.org/wiki/Teleost_Anatomy_Ontology">https://www.phenoscape.org/wiki/Teleost_Anatomy_Ontology</ext-link>
</td>
<td rowspan="1" colspan="1">Ontology</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>An ontology is a formal statement of relationships among concepts represented by metadata terms. Ontologies enable discovery of and reasoning on data through those relationships. Ontologies may use formal descriptive languages to define the relationships. Ontologies are regarded as having great promise (
<xref ref-type="bibr" rid="B66">Madin et al. 2007b</xref>
): “An ontology makes explicit knowledge that is usually diffusely embedded in notebooks, textbooks and journals or just held in academic memories, and therefore represents a formalization of the current state of a field. If ontologies are properly curated over the longer term, they will come to be seen as modern day (albeit terse) textbooks providing online and up-to-date biological expertise for their area. In another sense, they will provide the common standards needed for producing a strong biological framework for integrating data sets. Ontologies therefore provide the formal basis for an integrative approach to biology that complements the traditional deductive methodology” (
<xref ref-type="bibr" rid="B4">Bard and Rhee 2004</xref>
). </p>
<p>Ontologies are part of “Knowledge Organization Systems”. Those relating to biodiversity have been discussed by Morris (
<xref ref-type="bibr" rid="B72">Morris 2010</xref>
). Ontologies contribute to the semantic annotation of data and the artificial intelligence it enables. As an example, a simple search for information about the bird “robin” seeks to match some or all of the character string r-o-b-i-n to character strings in text within a data object or in annotations of the data object. Such a system cannot discriminate among data on American robins, European robins, Robin Reliant cars, Robin Wright Penn, or Robin the boy-superhero. However, if the query for “robin” is placed in the context of an ontology, such as one that declares that the context is the
<named-content content-type="taxon-name">Turdidae</named-content>
, an informed system is able to return only relevant results from appropriately annotated data. In addition to more precise searching, ontological structures allow the computer to perform inference, a form of artificial intelligence. For example, an ontology that establishes that Turdidae is_a bird and wing is part_of a bird allows the inference that an American robin, as a member of the Turdidae, has wings and that data on wings, flight, or migrations may be discoverable. Larger interconnected ontologies allow more complex inferences. </p>
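<p>The inference described above can be sketched with a toy set of statements. This is an illustration under stated assumptions rather than an implementation of any particular ontology language or reasoner: the term names, the triple layout and the two helper functions are hypothetical, and the statement that the American robin is_a member of the Turdidae is added here so that the chain of reasoning is complete.</p>
<preformat>
```python
# Toy ontology: each statement is (subject, relation, object).
# The american_robin statement is an assumption added for illustration.
TRIPLES = [
    ("american_robin", "is_a", "turdidae"),
    ("turdidae", "is_a", "bird"),
    ("wing", "part_of", "bird"),
]


def ancestors(term, triples):
    """Return every class reachable from term by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def inferred_parts(term, triples):
    """Return parts declared for term or for any of its is_a ancestors."""
    classes = ancestors(term, triples) | {term}
    return {s for s, r, o in triples if r == "part_of" and o in classes}


robin_parts = inferred_parts("american_robin", TRIPLES)
```
</preformat>
<p>No statement mentions robins and wings together; the link is derived by following is_a upward and then collecting part_of statements, which is why data annotated only with “wing” can become discoverable from a query about a robin.</p>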
<p>Many ontological structures are available for use in Life Sciences (
<xref ref-type="table" rid="T3">Table 3</xref>
). Some, such as the observational (
<ext-link ext-link-type="uri" xlink:href="http://marinemetadata.org/references/oboeontology">http://marinemetadata.org/references/oboeontology</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.nceas.ucsb.edu/ecoinfo">http://www.nceas.ucsb.edu/ecoinfo</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="https://sonet.ecoinformatics.org/">https://sonet.ecoinformatics.org/</ext-link>
) and
taxonomic ontologies (below), have broad applicability - the first within the field of ecoinformatics and the second to biodiversity informatics. Users can adopt existing structures or create their own using an ontology editor such as Protégé (
<ext-link ext-link-type="uri" xlink:href="http://protege.stanford.edu/">http://protege.stanford.edu/</ext-link>
) or OBOEdit (
<ext-link ext-link-type="uri" xlink:href="http://oboedit.org/">http://oboedit.org/</ext-link>
). The search engines, Swoogle (
<ext-link ext-link-type="uri" xlink:href="http://swoogle.umbc.edu/">http://swoogle.umbc.edu/</ext-link>
) and Sindice (
<ext-link ext-link-type="uri" xlink:href="http://sindice.com/">http://sindice.com/</ext-link>
), search over 10,000 ontologies and can return a list of those that contain a term of interest. Services such as these help users to determine if an existing ontology will meet their needs. Often, a user may need to use parts of existing ontologies or merge several ontologies into a single new one. Defining relationships between terms in different ontologies can be accomplished through the use of automated alignment tools such as SAMBO and KitAMO (
<xref ref-type="bibr" rid="B60">Lambrix and Tan 2008</xref>
). The development and integration of ontologies is best carried out using formal languages (such as OWL,
<ext-link ext-link-type="uri" xlink:href="http://www.w3.org/TR/owl-ref/">http://www.w3.org/TR/owl-ref/</ext-link>
) and by individuals versed in their logical foundations. The Biodiversity Information Standards (TDWG) organization (
<ext-link ext-link-type="uri" xlink:href="http://www.nhm.ac.uk/hosted_sites/tdwg/first_minutes.pdf">http://www.nhm.ac.uk/hosted_sites/tdwg/first_minutes.pdf</ext-link>
) and GBIF have been prime movers in developing organizational frameworks for biodiversity information. Unfortunately, there are competing systems of standards and not all aspects of biology have established standards. Various efforts are under way to create broad scope ontologies (
<ext-link ext-link-type="uri" xlink:href="http://www.loa-cnr.it/index.html">http://www.loa-cnr.it/index.html</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.tonesproject.org/">http://www.tonesproject.org/</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org/">http://www.geneontology.org/</ext-link>
). The promise of ontologies is as yet not fully realized as “The semantic web is littered with ontologies lacking ... data” (Joel Sachs, pers. comm.). </p>
<p>The system of latinized binomial names (such as
<italic>
<named-content content-type="taxon-name">Homo sapiens</named-content>
</italic>
) introduced for species in the mid-18th century by Linnaeus is an extensive system of potential metadata for data management in the Life Sciences. These names have been used to annotate virtually every statement about any of our current catalog of 2.2 million living and extinct forms of life (
<xref ref-type="bibr" rid="B88">Raup 1991</xref>
,
<xref ref-type="bibr" rid="B16">Chapman 2009</xref>
) until quite recently. Now they are being supplemented with molecular identifiers, but at this time they are well suited to form the basis of a names-based cyberinfrastructure for Biology (
<xref ref-type="bibr" rid="B81">Patterson et al. 2008</xref>
,
<xref ref-type="bibr" rid="B82">2010</xref>
). This approach has been used for life-wide data organization projects such as the Encyclopedia of Life (
<ext-link ext-link-type="uri" xlink:href="http://www.eol.org/">http://www.eol.org/</ext-link>
). Placement of names within hierarchical classifications offers ontological frameworks that enable data aggregation, drilling down through data sets, and browsing through data. The conversion of names into a formal ontology has been explored through projects such as ETHAN (
<ext-link ext-link-type="uri" xlink:href="http://spire.umbc.edu/ont/ethan.php">http://spire.umbc.edu/ont/ethan.php</ext-link>
). Our current understanding of biodiversity
and the system of names is maintained by a specialist group of 5,000–10,000 professional taxonomists worldwide (
<xref ref-type="bibr" rid="B46">Hopkins and Freckleton 2002</xref>
), who generally are unaware of the informatics potential of names as a near universal indexing system for biological data. The Global Names Architecture is a new global initiative that links names databases and associated services to deliver names-based services to end users (
<xref ref-type="bibr" rid="B82">Patterson et al. 2010</xref>
). </p>
<p>
<bold>Data will need to be atomized.</bold>
Atomization refers to the reduction of data to minimal semantic units and stands in contrast to complex data such as images or bodies of text. In atomized forms, data may exist as numerical values of variables (e.g. “length of tail: 5.3 cm”), binary statements (e.g. “chloroplasts: absent”), or as the association with metadata terms from agreed upon vocabularies (e.g. “part of lodicules of lower floret of pedicellate spikelet of tassel”;
<italic>
<named-content content-type="taxon-name">Zea mays</named-content>
</italic>
ontology ID ZEA:0015118,
<ext-link ext-link-type="uri" xlink:href="http://bioportal.bioontology.org/visualize/3294">http://bioportal.bioontology.org/visualize/3294</ext-link>
). Atomized data on the same subject can be brought together if the data are classified in a standard way. Atomization is necessary for machine-based analysis of data from one or more datasets. Many older data centers capture data as files (or packages of files) and the responsibility for extraction of data atoms falls to the user. This can be time consuming, suggesting that, in the future, atomization needs to occur at or near the source of raw data, becoming part of the responsibilities of the author of the data, the software in which data are logged, or data centers that can provide services to transform data sets. </p>
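<p>As a rough sketch of what atomization at the source might look like, the statements quoted above (“length of tail: 5.3 cm”, “chloroplasts: absent”) can be reduced to one record per data atom. The function name, the dictionary keys and the three kinds of atom are assumptions chosen for the example; real data would need a far more careful parser and an agreed vocabulary for the variable names.</p>
<preformat>
```python
import re

# Matches a number followed by an optional unit, e.g. "5.3 cm".
NUMERIC = re.compile(r"^([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]*)$")


def atomize(statement):
    """Split a 'variable: value' statement into a dict for one data atom."""
    variable, _, raw = statement.partition(":")
    variable = variable.strip()
    raw = raw.strip()
    match = NUMERIC.match(raw)
    if match:
        # Numerical value of a variable, with its unit kept separately.
        return {"variable": variable, "kind": "numeric",
                "value": float(match.group(1)), "unit": match.group(2) or None}
    if raw.lower() in ("present", "absent"):
        # Binary statement stored as a boolean.
        return {"variable": variable, "kind": "binary",
                "value": raw.lower() == "present", "unit": None}
    # Anything else is kept as a term to be mapped to a vocabulary later.
    return {"variable": variable, "kind": "term", "value": raw, "unit": None}


atoms = [atomize(s) for s in ("length of tail: 5.3 cm", "chloroplasts: absent")]
```
</preformat>
<p>Keeping the value and the unit in separate fields is what later allows atoms from different data sets to be compared or converted without re-reading the original text.</p>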
<p>
<bold>Data need to be published.</bold>
Projects participating in a Big New Biology will increasingly make data visible and accessible (i.e. published). Scientists may publish data by displaying them in unstructured or structured formats on local, project, or institutional web sites; or they may seek to place data in central repositories. In science generally, over three-quarters of the published data are in local repositories (
<xref ref-type="bibr" rid="B95">Science staff editorial 2011</xref>
) which can provide few guarantees of persistence (see “Data must be archived” below). In such environments, the responsibilities for discovery of data, negotiations with copyright holders and acquisition of data lie with the consumer. This is time consuming and unlikely to be done on a large scale. Publication is better served through the use of central, domain-specific repositories because they are more likely to persist, provide better services, and offer the framework around which third parties develop value-adding services. The INSDC consortium of molecular databases is a good example of this model. Only a small fraction of data are deposited in such environments (less than 10% of the science community generally -
<xref ref-type="bibr" rid="B95">Science staff editorial 2011</xref>
), with costs and absence of an organizational framework (metadata and archiving environments) being cited as reasons. </p>
<p>Publication of atomized data is essential for large scale data reuse. Data must be able to move from one computer to another in an intelligent way. As illustrated by the Global Biodiversity Information Facility (
<ext-link ext-link-type="uri" xlink:href="http://www.gbif.org/informatics/standards-and-tools/using-data/web-services/">http://www.gbif.org/informatics/standards-and-tools/using-data/web-services/</ext-link>
), scientific initiatives can add RSS feeds, web services, and APIs (Application Programming Interfaces) to their web sites to broadcast new data or to respond to requests for data. An API facilitates interaction between
computers in the same way that a user interface facilitates interactions between humans and computers. Without such services, data may need to be screen scraped from the web site, a process that is usually costly (because the solution for each site will differ) and, at worst, may require manual re-entry of data. A service-oriented approach is scalable but incurs overhead. Such services are probably best provided through community repositories that can call on appropriate domain-specific knowledge. </p>
<p>
<bold>Data must be archived.</bold>
It is preferable that data, once published, are persistent (
<xref ref-type="bibr" rid="B32">Feijen 2011</xref>
). Projects, initiatives and host institutions have little incentive to preserve data for the long term as the process incurs a cost, and repositories that emerge within projects may have limited life spans (e.g. OBIS,
<ext-link ext-link-type="uri" xlink:href="http://www.iobis.org/">http://www.iobis.org/</ext-link>
). However, data archiving can be viewed as a good investment by funding agencies (
<xref ref-type="bibr" rid="B85">Piwowar et al. 2011</xref>
). Central repositories that are not dependent on short-term funding are better positioned to archive data, making them persistent. The three global molecular databases that make up the International Nucleotide Sequence Database Collaboration provide an excellent example of how domain-specific repositories may operate. Because they are not funded through short-term projects, and because they mirror each other, such repositories guarantee the persistence of data, and empower scientists to develop projects that involve substantial analyses of shared data (
<xref ref-type="bibr" rid="B106">Tittensor et al. 2010</xref>
). Persistence can be assisted by institutions such as libraries and museums that specialize in the preservation of artifacts or by governmental intervention (the US-based National Institutes of Health support GenBank). An alternative solution to persistence is an effective business model that allows a data center to be sustained by income from services that it sells; or by providing essential services that ensure support from the community of users. Examples of commercial models include the Chemical Abstracts Service of the American Chemical Society (www.cas.org/) or Thomson Reuters’ Zoological Record (
<ext-link ext-link-type="uri" xlink:href="http://thomsonreuters.com/products_services/science/science_products/a-z/zoological_record/">http://thomsonreuters.com/products_services/science/science_products/a-z/zoological_record/</ext-link>
). </p>
<p>
<bold>Data will ideally be free and open.</bold>
Open Access, the principle of providing unconstrained access to information on the web, improves the uptake, usage, application and impact of research output (
<xref ref-type="bibr" rid="B40">Harnad 2008</xref>
). Open Access has been applied widely to the process of publication, where it is seen as an alternative to the model in which publishers act as gatekeepers. Open Access has been applied less to data, and while this extension is natural, it is not straightforward (
<xref ref-type="bibr" rid="B108">Vision 2010</xref>
). Attitudes about sharing data freely within Life Sciences vary broadly. In sub-disciplines like genomics, data sharing is the norm with some researchers sharing their data immediately via blogs or wikis (
<ext-link ext-link-type="uri" xlink:href="http://www.carlboettiger.info/research/lab-notebook">http://www.carlboettiger.info/research/lab-notebook</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://pathogenomics.bham.ac.uk/blog/">http://pathogenomics.bham.ac.uk/blog/</ext-link>
). Communities that value data sharing may have no formal recognition for such activities nor supportive technical infrastructure. Other communities have a strong sense of data ownership and are antagonistic to open data sharing. Researchers in these communities expect to be directly involved in any further analyses of their data. Databanks for these communities often require registration and/or a fee to gain access. Some data may be regarded as too sensitive to be made fully accessible (
<xref ref-type="bibr" rid="B55">Key Perspectives Ltd 2010</xref>
). </p>
<p>
<bold>Data can be trusted.</bold>
Once data are accessed, consumers may reveal errors and/or omissions. Biological data can be very dirty, especially if they were acquired without expectation that they would be shared later. Any data cleaning procedures should be documented to aid the consumer in assessing whether the source is “suitable for their purpose” (
<xref ref-type="bibr" rid="B15">Chapman 2005b</xref>
). The creation of “quality loops” allows comments to flow back to the source where data can be annotated or modified, and returned to users for renewed vetting. Webhooks (
<ext-link ext-link-type="uri" xlink:href="http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-making-databases.html">http://iphylo.blogspot.com/2011/02/web-hooks-and-openurl-making-databases.html</ext-link>
) offer a mechanism to exploit APIs to have comments returned to source. Any editing of data can lead to the undesirable outcome that variant forms of the same data may coexist. To some extent, versioning of data sets can be used to discriminate between modified datasets, but users need to cite the version used in analyses (
<xref ref-type="bibr" rid="B115">Zhang et al. 2007</xref>
). </p>
<p>
<bold>Data must be attributed.</bold>
Scientists gain credit in part through attribution. The permanent association of identifiers with open data offers a means of linking attribution to the data and of tracking reuse (
<xref ref-type="bibr" rid="B24">Cryer et al. 2009</xref>
). The association of authors’ names with data motivates contributions (or lack of credit demotivates them). Attribution favors the development of quality loops to correct errors or otherwise comment on the data. Special care is needed when attributing data resulting from the combination of two or more existing sets so that all intellectual investment is properly credited. Dryad, a JDAP partner, provides data citations through the use of DataCite DOIs with an unrestrictive Creative Commons Zero license, thus promoting clear citation and reuse of data (
<xref ref-type="bibr" rid="B108">Vision 2010</xref>
). Community norms can ensure proper attribution of CC0-licensed data (
<xref ref-type="bibr" rid="B31">Fauchart and von Hippel 2008</xref>
). The Panton Principles provide guidelines for licensing data (
<ext-link ext-link-type="uri" xlink:href="http://pantonprinciples.org/">http://pantonprinciples.org/</ext-link>
). </p>
<p>
<bold>Data can be manipulated.</bold>
A value of having large amounts of appropriately annotated data available on the web is that users can explore, in addition to search for, data. Data exploration may result from a desire to test a hypothesis. It is therefore desirable to have tools that draw data together, analyze or visualize them. Exploratory systems include: Humboldt (
<xref ref-type="bibr" rid="B58">Kobilarov and Dickinson 2008</xref>
) which operates like a faceted filter for Linked Data; Parallax which accesses data in Freebase and has the ability to interact with data on multiple web pages at once (
<xref ref-type="bibr" rid="B47">Huynh and Karger 2009</xref>
); and Microsoft Pivot (
<ext-link ext-link-type="uri" xlink:href="http://www.getpivot.com/">http://www.getpivot.com/</ext-link>
) which allows a user to interact with large amounts of data from multiple Internet sources. </p>
<p>Visualizations have the capacity to reveal patterns, discontinuities and exceptions that can inform us as to underlying biological processes, appropriateness of data sets, or consistency of experimental protocols. Visualizations can be used to display results of analyses of large data sets. Through visualizations we may help address the challenge stated by
<xref ref-type="bibr" rid="B34">Fox and Hendler (2011)</xref>
that “... many of the major scientific problems facing our world are becoming critically linked to the interdependence and interrelatedness of data from multiple instruments, fields and sources”. The absence of effective visualization is creating a bottleneck within data-intensive sciences (
<xref ref-type="bibr" rid="B34">Fox and Hendler 2011</xref>
). Solutions need to range from relatively simple low end visualizations (as wonderfully catalogued in
<ext-link ext-link-type="uri" xlink:href="http://www.visual-literacy.org/periodic_table/periodic_table.html">http://www.visual-literacy.org/periodic_table/periodic_table.html</ext-link>
) to high end tools designed for the data deluge that themselves may call on graphics and visualization standards to be pipelined into rich, complex, and flexible aids. Many Life Sciences data sets can be drawn together and visualized using the geospatial element such as with LifeMapper (
<ext-link ext-link-type="uri" xlink:href="http://www.lifemapper.org/">http://www.lifemapper.org/</ext-link>
) or by OBIS and GBIF (inter alia;
<xref ref-type="bibr" rid="B110">Webb et al. 2010</xref>
). Geospatial metadata, along with temporal, publication, and names metadata are especially valuable as integrators of diverse data sets. </p>
<p>
<bold>Data need to be registered and discoverable.</bold>
Registries index data resources to alert potential users to their availability. Search engines, the normal indexers of web-accessible materials, are not good at revealing database contents - only about half of the open data in repositories are indexed by search engines (
<xref ref-type="bibr" rid="B69">McCown et al. 2006</xref>
). Discovery is made possible by the addition of coarse grained discovery metadata. Registry functions need to expose discovery metadata to make data sets more visible. As an example, GBIF provides registry level service for biodiversity data (
<ext-link ext-link-type="uri" xlink:href="http://www.gbif.org/informatics/standards-and-tools/integrating-data/resource-discovery/">http://www.gbif.org/informatics/standards-and-tools/integrating-data/resource-discovery/</ext-link>
). Registries that cover software (
<ext-link ext-link-type="uri" xlink:href="http://en.bio-soft.net/geshi.html">http://en.bio-soft.net/geshi.html</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://www.equisetites.de/palbot/software/software.html">http://www.equisetites.de/palbot/software/software.html</ext-link>
) or web services (www.biocatalogue.org) are valuable in promoting awareness of tools for data capture, conversion and processing. Successful domain repositories, such as GenBank, have well-structured and detailed metadata that enable detailed search and enhanced discoverability. In the absence of such registries, researchers turn to peers, publications or the thousands of minor data sets available via the Internet. Under these circumstances, it is hard to know when, or if, all relevant data are found. There is a need for a broad-spectrum registry and indexing service (like a Google for data) where researchers can post pointers to their own data, search for desired data and have a means to quickly preview the results. Examples of this exist in Europe with OpenDOAR (
<ext-link ext-link-type="uri" xlink:href="http://www.opendoar.org/">http://www.opendoar.org/</ext-link>
) and in India with Database of Biological Database (
<ext-link ext-link-type="uri" xlink:href="http://www.biodbs.info/">http://www.biodbs.info/</ext-link>
), each with thousands of listings. Semantic annotation of data greatly increases discoverability, and is discussed below. </p>
</sec>
<sec sec-type="The semantic web and Big New Biology">
<title>The semantic web and Big New Biology</title>
<p>The “semantic web” has many definitions, but here we think of it as a technical framework that promotes automated sharing and reuse of data across disciplines (
<xref ref-type="bibr" rid="B13">Campbell and MacNeill 2010</xref>
). The semantic approach has advantages of being flexible, evolvable, and additive. A semantic infrastructure will lead to machine-mediated answers to more complex queries than previously possible (
<xref ref-type="bibr" rid="B102">Stein 2008</xref>
). The foundations for automated reasoning lie in the annotation of data with agreed metadata, linked through a network of ontologies, and queried using conventions (languages) such as RDF, OWL, SKOS and SPARQL (
<xref ref-type="bibr" rid="B13">Campbell and MacNeill 2010</xref>
). The mass of appropriately annotated data that can be accessed through the Internet is referred to as LOD (Linked Open Data). Through common metadata, the data can be linked to form a Linked Open Data cloud. At this time, Life Sciences makes up 9% of the triples in LOD and 51% of the links (
<xref ref-type="bibr" rid="B9">Bizer et al. 2011</xref>
). </p>
<p>Berners-Lee has promoted four guidelines for linked data (
<xref ref-type="bibr" rid="B7">Berners-Lee et al. 2006</xref>
):</p>
<p>1. The use of a standard system of Uniform Resource Identifiers (URIs) as “names” for things</p>
<p>2. The use of HTTP URIs so that the names can be looked up on the internet and the data accessed</p>
<p>3. When a URI is looked up, it should return useful information using standards (RDF, SPARQL)</p>
<p>4. Links to other URIs so that users can discover more things.</p>
<p>A URI is a type of persistent identifier made up of a string of characters that unambiguously (at least in an ideal world, see
<xref ref-type="bibr" rid="B10">Booth 2010</xref>
for discussion) represents data or metadata and can be used by machines to access the data. Different data sets can be linked when they refer to the same URIs. For example, several marine data sets could be linked because they identify the same investigator, sampling event, or location. The most useful classes of terms that are likely to serve the needs of the Life Sciences are georeferences (which can link data from the same location held in different repositories), names of taxa (the common denominator to the majority of statements about biodiversity), publications and identities of people that can be interconnected through devices such as FOAF (friend-of-a-friend) to find collaborators, relevant data, as well as to dig into the world of scientific literature, the latter being linkable through devices such as DOIs to show citation trends, influential publications, etc. (
<xref ref-type="bibr" rid="B82">Patterson et al. 2010</xref>
). </p>
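<p>The idea that data sets become linkable when they refer to the same URIs can be illustrated with a minimal sketch. It assumes two hypothetical marine data sets whose records carry URIs for location and investigator; every URI, record name and the helper link_by_uri are invented for illustration and do not come from any real repository.</p>
<preformat>
```python
from collections import defaultdict

# Two hypothetical marine data sets held in different repositories.
# Every URI below is an invented placeholder, not a real identifier.
PLANKTON_COUNTS = [
    {"record": "plankton-001",
     "location": "http://example.org/place/station-7",
     "investigator": "http://example.org/person/a-smith"},
    {"record": "plankton-002",
     "location": "http://example.org/place/station-9",
     "investigator": "http://example.org/person/b-jones"},
]
WATER_CHEMISTRY = [
    {"record": "chem-104",
     "location": "http://example.org/place/station-7",
     "investigator": "http://example.org/person/c-lee"},
    {"record": "chem-105",
     "location": "http://example.org/place/station-3",
     "investigator": "http://example.org/person/b-jones"},
]


def link_by_uri(left, right, field):
    """Pair records from two data sets that carry the same URI in `field`."""
    # Index the right-hand data set by the URI found in the chosen field.
    index = defaultdict(list)
    for row in right:
        index[row[field]].append(row)
    # Look up each left-hand record's URI and collect the matches.
    links = []
    for row in left:
        for match in index.get(row[field], []):
            links.append((row["record"], match["record"], row[field]))
    return links


if __name__ == "__main__":
    for field in ("location", "investigator"):
        for link in link_by_uri(PLANKTON_COUNTS, WATER_CHEMISTRY, field):
            print(field, link)
```
</preformat>
<p>In this sketch the link depends only on exact equality of the URI strings, which is why agreed, persistent identifiers matter: two repositories that describe the same station with different URIs would not be linked.</p>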
<p>RDF is a language that defines relationships between things. Relationships in RDF are usually made in three parts (often called triples), Entity:Attribute:Value. A machine-readable form in RDF may be a statement that “American robin:has_color:red”. Each term is ideally defined stringently by controlled vocabularies and ontologies, and each part represented within the triple as a URI. The “Value” can be a URI or a literal - the actual value. An advantage of RDF is that it allows datasets to be merged, for example TaxonConcept and Wikipedia (
<ext-link ext-link-type="uri" xlink:href="http://www.slideshare.net/pjdwi/biodiversity-informatics-on-the-semantic-web">http://www.slideshare.net/pjdwi/biodiversity-informatics-on-the-semantic-web</ext-link>
). A goal of the Linking Open Data project is to promote a data commons by registering sets in RDF. As of March 2011, the project had grown to 28 billion triples and 395 million RDF links (
<xref ref-type="bibr" rid="B9">Bizer et al. 2011</xref>
). The EU project, Linking Open Data 2, received €6.5 million to expand Linked Data by building tools and developing standards (
<ext-link ext-link-type="uri" xlink:href="http://lod2.eu/Welcome.html">http://lod2.eu/Welcome.html</ext-link>
). </p>
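<p>The Entity:Attribute:Value structure and the ease of merging can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than an RDF implementation: the namespace http://example.org/, the terms has_color, same_as and has_title, and the helpers merge and match are all hypothetical, and a real system would use an RDF library and agreed vocabularies.</p>
<preformat>
```python
from collections import namedtuple

# One triple: Entity, Attribute, Value (the Value is a URI or a literal).
Triple = namedtuple("Triple", ["entity", "attribute", "value"])

EX = "http://example.org/"  # invented namespace, for illustration only

# Two hypothetical data sets that describe overlapping things.
DATASET_A = {
    Triple(EX + "taxon/american-robin", EX + "term/has_color", "red"),
    Triple(EX + "taxon/american-robin", EX + "term/same_as",
           EX + "page/American_robin"),
}
DATASET_B = {
    Triple(EX + "page/American_robin", EX + "term/has_title",
           "American robin"),
    # The same statement as in DATASET_A; merging keeps a single copy.
    Triple(EX + "taxon/american-robin", EX + "term/has_color", "red"),
}


def merge(*datasets):
    """Merge data sets by set union; identical triples collapse into one."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def match(triples, entity=None, attribute=None, value=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    pattern = (entity, attribute, value)
    return sorted(
        t for t in triples
        if all(p is None or p == field for p, field in zip(pattern, t))
    )


if __name__ == "__main__":
    merged = merge(DATASET_A, DATASET_B)
    print(len(merged), "triples after merging")
    for triple in match(merged, entity=EX + "taxon/american-robin"):
        print(triple)
```
</preformat>
<p>Because every statement has the same three-part shape, merging needs no schema alignment step; the cost is shifted to agreeing on the URIs and vocabularies used inside the triples.</p>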
<p>Transformation of data from printed narrative or spreadsheet to semantic-web formats is a significant challenge. Based on existing ontologies, there is enough information to create 10
<sup>14</sup>
triples in biomedicine alone (
<xref ref-type="bibr" rid="B71">Mons and Velterop 2009</xref>
). At the time of writing, this quantity far exceeds the capacity of any system to process the information. </p>
<p>Life Sciences stand to benefit greatly from the advantages of linked data (
<xref ref-type="bibr" rid="B89">Reichman et al. 2011</xref>
), but need additional investment in mechanisms that ensure quality, provenance and attribution. Provenance identifies sources and, among other things, can ensure attribution and be part of quality control processes. Several software packages currently exist for tracking provenance (such as Kepler,
<ext-link ext-link-type="uri" xlink:href="https://kepler-project.org/">https://kepler-project.org/</ext-link>
; Taverna,
<ext-link ext-link-type="uri" xlink:href="http://www.taverna.org.uk/">http://www.taverna.org.uk/</ext-link>
; VisTrails,
<ext-link ext-link-type="uri" xlink:href="http://www.vistrails.org/index.php/Main_Page">http://www.vistrails.org/index.php/Main_Page</ext-link>
)
.
<xref ref-type="bibr" rid="B5">Bechhofer et al. (2010)</xref>
advocate the use of Research Objects (ROs) as a mechanism to capture additional value necessary to make the semantic web work for science. Provenance of ROs would satisfy recent calls for “open science” that argue that not only data should be open, but so should the associated methods and analyses (
<xref ref-type="bibr" rid="B89">Reichman et al. 2011</xref>
). </p>
<p>Semanticization enables nanopublication, a form of publication that extends traditional narrative publication (
<xref ref-type="bibr" rid="B37">Groth et al. 2010</xref>
) and allows attribution to be associated with the semantic web (<xref ref-type="bibr" rid="B71">Mons and Velterop 2009</xref>). Nanopublications relate to publication of triples. A uniquely identifiable triple is a statement. A triple with a statement for a subject is called an annotation and a set of annotations that refer to the same statement is called a nanopublication. The annotations add attribution and context to the statement. The concept is not widely accepted. </p>
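<p>The definitions of statement, annotation and nanopublication given above can be made concrete with a small sketch. It is only one possible reading of those definitions: the identifiers, the terms asserted_by and asserted_on, and the helper build_nanopublications are hypothetical, and the sketch stores the statement alongside its annotations purely for convenience.</p>
<preformat>
```python
from collections import defaultdict, namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/"  # invented namespace, for illustration only

# A statement is a triple that has its own unique identifier.
STATEMENTS = {
    EX + "statement/1": Triple(
        EX + "taxon/american-robin", EX + "term/has_color", "red"
    ),
}

# An annotation is a triple whose subject is a statement identifier.
ANNOTATIONS = [
    Triple(EX + "statement/1", EX + "term/asserted_by",
           EX + "person/a-smith"),
    Triple(EX + "statement/1", EX + "term/asserted_on", "2011-03-01"),
    # Refers to a statement that is not in STATEMENTS; it is skipped below.
    Triple(EX + "statement/2", EX + "term/asserted_by",
           EX + "person/b-jones"),
]


def build_nanopublications(statements, annotations):
    """Group annotations by the statement they refer to."""
    grouped = defaultdict(list)
    for annotation in annotations:
        if annotation.subject in statements:
            grouped[annotation.subject].append(annotation)
    return {
        statement_id: {
            "statement": statements[statement_id],
            "annotations": annotation_list,
        }
        for statement_id, annotation_list in grouped.items()
    }


if __name__ == "__main__":
    nanopubs = build_nanopublications(STATEMENTS, ANNOTATIONS)
    for statement_id, nanopub in nanopubs.items():
        print(statement_id, nanopub["statement"])
        for annotation in nanopub["annotations"]:
            print("  ", annotation.predicate, annotation.object)
```
</preformat>
<p>The annotations here carry who asserted the statement and when, which is the attribution and context that the nanopublication is meant to add.</p>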
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>A Big New Biology holds much promise as a means to address some large proximate scientific challenges. Macroscopic tools will enable discovery of hidden features and better descriptions of relationships within the complexity of the biosphere. Yet, to date, progress towards the vision varies enormously from the successes with high-throughput biology to virtual stasis in some small science biology. Considerable effort is needed to catalog current practices, and to define the sociological transformations that will be required to improve the likelihood of success. If the transformation is to be purposeful, then it will need general oversight, discipline-specific reviews, and a description of the actual and desirable components of the Knowledge Organizational System for Biology and their relationships. Some obvious challenges relate to standards and associated ontologies, incentivizing participation, and assembling an appropriate infrastructure and skill base.</p>
<p>
<bold>Standards and Ontologies.</bold>
Data standards bring order to the virtual data pool on which a Big New Biology will rely. While complex and finely grained metadata are needed for analyses and for the world of Linked Open Data, the first challenge is to improve the discoverability of data. This process has traditionally been supported by word-of-mouth at conferences or in publications. With standards, registries can enable users to find data sets containing information about taxa, parameters, times, processes, or places of interest. If metadata are absent or incomplete, then the data sets cannot be discovered or reused and cannot contribute to Big New Biology. </p>
<p>Automated data discovery, aggregation and analysis require more comprehensive standards than those currently available for many of the Life Sciences. Instead of a comprehensive system of standards, there is a piecemeal system of metadata, vocabularies, thesauri, ontologies, and data transfer schemas that overlap, compete, and have gaps. Greatest progress is being made outside the Life Sciences (such as georeferencing), or in high-investment areas where data are born digital (such as in genomics,
<xref ref-type="bibr" rid="B103">Taylor et al. 2008</xref>
). Given the richness of biodiversity and interactions, a comprehensive
system of standards will necessarily be extremely complex, and be costly to implement. This creates a tension: whether to promote the comprehensive annotation of data with a significant overhead that deters participation versus pursuing a more minimalistic annotation that can set a grander process in motion. As the commitment to standards is not widespread, the minimalistic approach is more likely to gain traction. The perspective that “The semantic web is littered with ontologies lacking ... data” noted above warns us against starting with complex structures. Metadata and their inter-relationships will need a framework that is designed to allow initial discipline-specific standards to become more finely grained and for the parts to merge into a dynamic grand schema. The world of Linked Open Data provides a good model for this, but given that few data are appropriately annotated, it has yet to realize its potential. </p>
<p>Two organizational frameworks for Life Sciences data are as yet under-exploited. The first is the system of georeferencing that is in use in rich applications in earth sciences, cartography, and so on. Information on occurrences of species, compiled in central databases such as GBIF and OBIS, has been and is being collected in vast quantities by a myriad of citizen scientists. Its potential is well illustrated by some large-scale applications such as the impressive charting of bird migrations (
<xref ref-type="bibr" rid="B68">Marris 2010</xref>
), meta-analyses of oceanic biota (
<xref ref-type="bibr" rid="B110">Webb et al. 2010</xref>
), or web sites that emphasize locally relevant biota (
<ext-link ext-link-type="uri" xlink:href="http://zipcodezoo.com/">http://zipcodezoo.com/</ext-link>
). Less well developed, but arguably with more potential for many sub-disciplines of the Life Sciences, is the transformation of taxonomic and phylogenetic knowledge into an information management system that uses Latin names and molecular identifiers as metadata and classifications and phylogenies as ontological frameworks for the metadata (
<xref ref-type="bibr" rid="B82">Patterson et al. 2010</xref>
). </p>
<p>
<bold>Incentives.</bold>
Despite widespread calls for scientists to make data more widely available, this has yet to happen for many sub-disciplines (
<xref ref-type="bibr" rid="B27">Dittert et al. 2001</xref>
,
<xref ref-type="bibr" rid="B40">Harnad 2008</xref>
,
<xref ref-type="bibr" rid="B67">Mandavilli 2011</xref>
,
<xref ref-type="bibr" rid="B83">Piwowar 2011</xref>
). Only about 10% of data make their way to open repositories (
<xref ref-type="bibr" rid="B93">Savage and Vickers 2009</xref>
,
<xref ref-type="bibr" rid="B95">Science staff editorial 2011</xref>
). A current impediment to data sharing is that the benefits derived are often greater for the consumer than the producer (
<xref ref-type="bibr" rid="B86">Porter and Callahan 1994</xref>
). Other reasons are the lack of resources, infrastructure, and incentives for sharing. Sociological, financial, legal and technical barriers must be overcome for communities to become directly involved in populating and maintaining data pools, a requisite for success and scalability (
<xref ref-type="bibr" rid="B32">Feijen 2011</xref>
). </p>
<p>In surveys (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
,
<xref ref-type="bibr" rid="B59">Kohnke et al. 2005</xref>
,
<xref ref-type="bibr" rid="B91">RIN 2008</xref>
,
<xref ref-type="bibr" rid="B23">Costello 2009</xref>
), scientists give the following five reasons not to share data. The first relates to intellectual property: a scientist’s funding and professional recognition rely on receipt of credit for work done. Until scientists receive credit for data publication, there will be little motivation to redirect efforts from more rewarding activities (such as exploring nature or writing papers) towards data mobilization. This problem can be solved with an infrastructure capable of creating citations for data and tracking data use (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
). The second relates to legal and confidentiality issues, as some data cannot be shared, such as data concerning people (
<xref ref-type="bibr" rid="B38">Guttmacher et al. 2009</xref>
) or location of endangered species (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
), proprietary information, or because employers or funders claim that they have copyright over data. The infrastructure must have
mechanisms to protect necessary confidentiality. Some data can be anonymised, and in the case of endangered taxa, protection can be accomplished by fuzzing data, so that exact locations or identities are obscured (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
). Thirdly, there is concern over misuse or misinterpretation of data, which, once in the literature, cannot be unpublished. This is not a new problem, but it will increase as data producers lose control and can no longer act as “gate-keepers”. Part of the solution lies in developing stringent metadata and format standards such that data are released only when there are sufficient metadata to ensure that all users understand the context and limitations of the data. Until such time, disclaimers can alert consumers about inappropriate reuse (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
,
<xref ref-type="bibr" rid="B101">Smithsonian 2011</xref>
). Fourthly, scientists are concerned that publication can expose errors in their data or weaknesses of analysis. Errors may include insufficient, inaccurate or inappropriate data encoding, metadata, or analysis. Third parties may reveal the selective or inappropriate use of data to emphasize particular arguments. Given the noisy and rich nature of biology, there can be no such thing as a perfect data set; all are incomplete. Errors or gaps uncovered by subsequent users can be dealt with openly and honestly, thereby enhancing the body of scientific data. Finally, there is the issue of sustainability. Project-based data repositories run a risk of being abandoned at the end of the funding cycle. This increases doubts that data curation activities are a good use of resources. Yet it is cheaper to curate data properly than it is to gather them again (
<xref ref-type="bibr" rid="B42">Heidorn 2008</xref>
,
<xref ref-type="bibr" rid="B85">Piwowar et al. 2011</xref>
), and some data, such as data on past distributions of species, are irreplaceable and thus priceless. From an economic perspective, persistent discipline-specific repositories are attractive. </p>
<p>There are considerable academic benefits from engaging with repositories. Scientists who share data often report increased book and/or photograph sales, increased web site hits and higher visibility for their projects (
<xref ref-type="bibr" rid="B35">Froese et al. 2003</xref>
). There is greater citation impact for open-access articles (
<xref ref-type="bibr" rid="B36">Gargouri et al. 2010</xref>
). In larger consortia, scientists (such as those studying phylogenetic relationships) who pool data are able to answer questions they could not answer if they were limited to the data that they themselves generated. Some publishers are incentivizing early data-sharing by granting an embargo to the data producers (
<xref ref-type="bibr" rid="B52">Kaye et al. 2009</xref>
) to alleviate fears of being “scooped” (
<xref ref-type="bibr" rid="B89">Reichman et al. 2011</xref>
). An emphasis on “carrots” such as these may be a much more effective means of promoting data-sharing than the “sticks” (in the form of funding agency requirements,
<xref ref-type="bibr" rid="B52">Kaye et al. 2009</xref>
;
<xref ref-type="table" rid="T1">Table 1</xref>
). </p>
<p>
<bold>Infrastructure.</bold>
Beyond the challenge of incentivizing scientists to share data, the infrastructure for a Big New Biology is incomplete. Funding agencies, like the National Science Foundation in the US, require projects to have plans for data management - a requirement that presumes data persistence. The infrastructure needed to guarantee persistence will require an investment well beyond the usual 3–5 year funding cycle into multi-decadal periods and coordination that has international dimensions. The infrastructure must include tools to capture data, policies, data standards, data identifiers, registration of discovery-level metadata, and APIs to share data (
<xref ref-type="fig" rid="F3">Fig. 3</xref>
). There is as yet no index of data-sharing services (for some initial steps see datacatalogs.org and DataCite
<ext-link ext-link-type="uri" xlink:href="http://www.datacite.org/repolist">http://www.datacite.org/repolist</ext-link>
) nor a framework in which such elements could be integrated. There is little assessment of which elements of data
plans will lead to persistence of data or their reuse. In the absence of these elements, principal investigators are left to make their own policies, use their own systems, and finance the processes. As long as the response is piecemeal, there can be no assurances of interoperability, efficiency or persistence. At this time, research scientists need to be supported by data managers and data archivists. Institutional libraries and museums are well placed to shift their agendas to include data management and the preservation of digital artifacts and so may fill this gap, providing institutional, regional or discipline-based services. It is hoped that the ongoing NSF Data Net projects can contribute significantly to the infrastructure. </p>
<p>A new technical challenge is the lack of bandwidth to distribute data from modern data-intensive technologies. The problem is illustrated by high-throughput molecular biology with tera- and petabyte-scale data sets (
<xref ref-type="bibr" rid="B21">Cochrane et al. 2009</xref>
). Proposed solutions include Bio-Mirror (
<ext-link ext-link-type="uri" xlink:href="http://www.bio-mirror.net/">http://www.bio-mirror.net/</ext-link>
) which consists of several servers holding the same data, or the Tranche Project (
<ext-link ext-link-type="uri" xlink:href="https://trancheproject.org/">https://trancheproject.org/</ext-link>
), which shares repository functions across servers. The latter has a high administrative overhead. Peer-to-peer sharing systems such as BitTorrent (
<xref ref-type="bibr" rid="B61">Langille and Eisen 2010</xref>
) overcome potential bandwidth problems by sharing data sets without a central repository. Users of BioTorrents benefit from lower bandwidth use, faster transfer times and data publication. Although terabit per second line rates are on the horizon (
<xref ref-type="bibr" rid="B45">Hillerkuss et al. 2011</xref>
), bandwidth problems are likely to persist as part of the interplay between the evolution of new data-generating instruments and the limitations of the infrastructure to make data freely available to all. We may expect to see a growth of specialist centers that will offer analysis, visualization, and data transformation services on behalf of the users.
</p>
<fig id="F3" orientation="portrait" position="float">
<label>Figure 3.</label>
<caption>
<p>Technical infrastructure needed for Big New Biology to fully emerge (based on
<xref ref-type="bibr" rid="B98">Sinha et al. 2010</xref>
).</p>
</caption>
<graphic xlink:href="ZooKeys-150-017-g003"></graphic>
</fig>
</sec>
<sec>
<title>Conclusion</title>
<p>There is growing pressure from scientists, funding agencies and governments to use new information technologies to effectively manage the increasingly vast amounts of data emerging from new technologies, to integrate these with smaller data sets, and to enhance the communal nature of science. If successful, biology will be enriched with data-intensive dimensions better suited to address large scale and trans-discipline problems. The transition requires many technical advances and cultural changes. Progress on the technical front to date clearly demonstrates that technical issues can be resolved. The process of sociological adaptation is less convincing. Some sub-disciplines (molecular domains) have embraced data-intensive dimensions, some (environmental ecology) are in transition, and others (such as taxonomy) are just beginning. A much better understanding of the existing cultures is needed before we can promote solutions that will realign the traditions of each community with the common goal of shared data use. Training environments such as Universities need to create a new cadre of scientists trained in computer sciences and biology. Other pressing challenges to data integration relate to the development of comprehensive and agreed metadata and ontologies, and to the semanticization of data so that the discipline can take advantage of the Linked Open Data cloud. The long tail of small data sets presents a special challenge - that of bringing heterogeneous data sets together. At this time, the common denominators that are likely to be effective are georeferencing, citations, and names. All require further investment. None of the elements of the transition will come quickly or cheaply, but these transformations are needed if we are to make the Life Sciences less parochial and more capable of responding to major research challenges.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The authors would like to thank Dmitry Mozzherin, David Shorthouse, Nathan Wilson, Jane Maienschein, Peter DeVries, Holly Miller, Vince Smith, Daniel Mietchen and members of the Data Conservancy Life Sciences Advisory Group (Mark Schildhauer, Bryan Heidorn, Steve Kelling, Dawn Field, Norman Morrison and Paula Mabee) for valuable comments. This work is supported by NSF award 0830976 The Data Conservancy (A digital research and curation virtual organization).</p>
<p>The topics raised here were explored during a workshop held in Woods Hole, Massachusetts attended by computer, information and biological scientists, and representatives of academia, the private sector and government. A longer “white paper” produced for the National Science Foundation Data Conservancy project is available (
<xref ref-type="bibr" rid="B105">Thessen and Patterson 2011</xref>
).</p>
</ack>
<ref-list>
<title>References
</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Ackoff</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>1989</year>
)
<article-title> From data to wisdom.</article-title>
<source>Journal of Applied Systems Analysis </source>
<volume>16</volume>
:
<fpage>3</fpage>
-
<lpage>9</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1002/9781444303179.ch3">doi: 10.1002/9781444303179.ch3</ext-link>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Arlinghaus</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cooke</surname>
<given-names>SJ</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Recreational fisheries: socioeconomic importance, conservation issues and management changes. In: Adams B (Ed) Recreational Hunting, Conservation, and Rural Livelihoods: Science and Practice. Blackwell, Oxford. </source>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Ausubel</surname>
<given-names>JH</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> A botanical macroscope. Proceedings of the National Academy of Sciences 106: 12569. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.0906757106">doi: 10.1073/pnas.0906757106</ext-link>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Bard</surname>
<given-names>JBL</given-names>
</name>
<name>
<surname>Rhee</surname>
<given-names>SY</given-names>
</name>
</person-group>
(
<year>2004</year>
)
<article-title> Ontologies in biology: design, applications and future challenges.</article-title>
<source>Nature Reviews Genetics </source>
<volume>5</volume>
:
<fpage>213</fpage>
-
<lpage>222</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrg1295">doi: 10.1038/nrg1295</ext-link>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Bechhofer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ainsworth</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bhagat</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Buchan</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Couch</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cruickshank</surname>
<given-names>D</given-names>
</name>
<name>
<surname>De Roure</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Delderfield</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dunlop</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Gamble</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Goble</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Michaelides</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Missier</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Owen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Newman</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sufi</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Why linked data is not enough for scientists. 6
<sup>th</sup>
IEEE e-Science conference. </source>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Berman</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Henrick</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Nakamura</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Markley</surname>
<given-names>JL</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<source> The worldwide protein data bank (wwPDB): ensuring a single uniform archive of PDB data. Nucleic Acids Research 35: D301–D303. </source>
doi: 10.1093/nar/gkl971</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Berners-Lee</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chilton</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Connolly</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Dhanaraj</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hollenbach</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lerer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sheets</surname>
<given-names>D</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<source> Tabulator: exploring and analyzing linked data on the semantic web. Proceedings of the 3
<sup>rd</sup>
International Semantic Web User Interaction Workshop (SWUI0), Athens, Georgia. </source>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Bilofsky</surname>
<given-names>HS</given-names>
</name>
<name>
<surname>Christian</surname>
<given-names>B</given-names>
</name>
</person-group>
(
<year>1988</year>
)
<article-title> The GenBank genetic sequence data bank.</article-title>
<source>Nucleic Acids Research </source>
<volume>16</volume>
:
<fpage>1861</fpage>
-
<lpage>1863</lpage>
doi: 10.1093/nar/16.5.1861
<pub-id pub-id-type="pmid">3353225</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Bizer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jentzsch</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cyganiak</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> State of the LOD cloud. [
<ext-link ext-link-type="uri" xlink:href="http://www4.wiwiss.fu-berlin.de/lodcloud/state/">http://www4.wiwiss.fu-berlin.de/lodcloud/state/</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Booth</surname>
<given-names>D</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Resource identity and semantic extensions: making sense of ambiguity. Semantic Technology Conference, San Francisco, USA
<ext-link ext-link-type="uri" xlink:href="http://dbooth.org/2010/ambiguity/">http://dbooth.org/2010/ambiguity/</ext-link>
. </source>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Bunin</surname>
<given-names>VD</given-names>
</name>
<name>
<surname>Ignatov</surname>
<given-names>OV</given-names>
</name>
<name>
<surname>Gulii</surname>
<given-names>OI</given-names>
</name>
<name>
<surname>Voloshin</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Dykman</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>O’Neil</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Ivnitskii</surname>
<given-names>D</given-names>
</name>
</person-group>
(
<year>2005</year>
)
<article-title> Investigation of electrophysical properties of
<italic>Listeria monocytogenes</italic>
cells during the interaction with monoclonal antibodies.</article-title>
<source>Biofizika </source>
<volume>50</volume>
:
<fpage>316</fpage>
-
<lpage>321</lpage>
<pub-id pub-id-type="pmid">15856991</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Burton</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Treloar</surname>
<given-names>A</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> Designing for discovery and re-use: the ANDS data-sharing verbs approach to service decomposition.</article-title>
<source>The International Journal of Digital Curation </source>
<volume>4</volume>
:
<fpage>44</fpage>
-
<lpage>56</lpage>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Campbell</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>MacNeill</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> The semantic web, linked and open data: a briefing paper. JISC CETIS. [
<ext-link ext-link-type="uri" xlink:href="http://wiki.cetis.ac.uk/images/1/1a/The_Semantic_Web.pdf">http://wiki.cetis.ac.uk/images/1/1a/The_Semantic_Web.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Chapman</surname>
<given-names>AD</given-names>
</name>
</person-group>
(
<year>2005a</year>
)
<source> Uses of primary species-occurrence data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. [
<ext-link ext-link-type="uri" xlink:href="http://www.niobioinformatics.in/books/Uses%20of%20Primary%20Data.pdf">http://www.niobioinformatics.in/books/Uses%20of%20Primary%20Data.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Chapman</surname>
<given-names>AD</given-names>
</name>
</person-group>
(
<year>2005b</year>
)
<source> Principles of data quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. [
<ext-link ext-link-type="uri" xlink:href="http://niobioinformatics.in/pdf/workshop/Data%20Quality.pdf">http://niobioinformatics.in/pdf/workshop/Data%20Quality.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Chapman</surname>
<given-names>AD</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Numbers of Living Species in Australia and the World, 2
<sup>nd</sup>
edition. Australian Biological Resources Study, Australia. </source>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Chavan</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Krishnan</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2001</year>
)
<source> Digitizing life: role of digital libraries in life conservation in developing world. Proceedings of the 4
<sup>th</sup>
International Conference on Asian Digital Libraries, December 10–12, 2001, Bangalore, India, 330–340. [
<ext-link ext-link-type="uri" xlink:href="http://ncsi-net.ncsi.iisc.ernet.in/gsdl/collect/icco/index/assoc/HASHe590.dir/doc.doc">http://ncsi-net.ncsi.iisc.ernet.in/gsdl/collect/icco/index/assoc/HASHe590.dir/doc.doc</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Cisneros-Montemayor</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Sumaila</surname>
<given-names>UR</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> A global estimate of benefits from ecosystem based marine recreation: Potential impacts and implications for management.</article-title>
<source>Journal of Bioeconomics</source>
<volume>12</volume>
:
<fpage>245</fpage>
-
<lpage>268</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s10818-010-9092-7">doi: 10.1007/s10818-010-9092-7</ext-link>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Coale</surname>
<given-names>KH</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Chavez</surname>
<given-names>FP</given-names>
</name>
<name>
<surname>Buesseler</surname>
<given-names>KO</given-names>
</name>
<name>
<surname>Barber</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Brzezinski</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Cochlan</surname>
<given-names>WP</given-names>
</name>
<name>
<surname>Millero</surname>
<given-names>FJ</given-names>
</name>
<name>
<surname>Falkowski</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Wanninkhof</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Kudela</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Altabet</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Hales</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Takahashi</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Landry</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Bidigare</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chase</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Strutton</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Friederich</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Gorbunov</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Lance</surname>
<given-names>VP</given-names>
</name>
<name>
<surname>Hilting</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Hiscock</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Demarest</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hiscock</surname>
<given-names>WT</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>KF</given-names>
</name>
<name>
<surname>Tanner</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Gordon</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Hunter</surname>
<given-names>CN</given-names>
</name>
<name>
<surname>Elrod</surname>
<given-names>VA</given-names>
</name>
<name>
<surname>Fitzwater</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Tozzi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Koblizek</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Roberts</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Herndon</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Brewster</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ladizinsky</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Cooper</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Timothy</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>SL</given-names>
</name>
<name>
<surname>Selph</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Sheridan</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Twining</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Johnson</surname>
<given-names>ZI</given-names>
</name>
</person-group>
(
<year>2004</year>
)
<article-title> Southern Ocean iron enrichment experiment: carbon cycling in high- and low-Si waters.</article-title>
<source>Science</source>
<volume>304</volume>
:
<fpage>408</fpage>
-
<lpage>414</lpage>
doi: 10.1126/science.1089778
<pub-id pub-id-type="pmid">15087542</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Coburn</surname>
<given-names>TA</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> The National Science Foundation: Under the microscope. A report by Tom A. Coburn, M.D., U.S. Senator, Oklahoma. </source>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Cochrane</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Akhtar</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bonfield</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bower</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Demiralp</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Faruque</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hoad</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hubbard</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Hunter</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Juhos</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Leinonen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Leonard</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lorenc</surname>
<given-names>D</given-names>
</name>
<name>
<surname>McWilliam</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mukherjee</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Plaister</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Radhakrishnan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Robinson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sobhany</surname>
<given-names>S</given-names>
</name>
<name>
<surname>ten Hoopen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Vaughan</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zalunin</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Research 37: D19–D25. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkn765">doi: 10.1093/nar/gkn765</ext-link>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Costanza</surname>
<given-names>R</given-names>
</name>
<name>
<surname>d’Arge</surname>
<given-names>R</given-names>
</name>
<name>
<surname>de Groot</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Farber</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Grasso</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hannon</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Limburg</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Naeem</surname>
<given-names>S</given-names>
</name>
<name>
<surname>O’Neill</surname>
<given-names>RV</given-names>
</name>
<name>
<surname>Paruelo</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Raskin</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>P</given-names>
</name>
<name>
<surname>van den Belt</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>1997</year>
)
<article-title> The value of the world’s ecosystem services and natural capital.</article-title>
<source>Nature </source>
<volume>387</volume>
:
<fpage>253</fpage>
-
<lpage>260</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/387253a0">doi: 10.1038/387253a0</ext-link>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Costello</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> Motivating online publication of data.</article-title>
<source>BioScience </source>
<volume>59</volume>
:
<fpage>418</fpage>
-
<lpage>426</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3525/bio.2009.59.5.9">doi: 10.3525/bio.2009.59.5.9</ext-link>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Cryer</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hyam</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Nicolson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ó Tuama</surname>
<given-names>É</given-names>
</name>
<name>
<surname>Page</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rees</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Riccardi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>K</given-names>
</name>
<name>
<surname>White</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Adoption of persistent identifiers for biodiversity informatics: Recommendations of the GBIF LSID GUID task group, 6 November 2009. [
<ext-link ext-link-type="uri" xlink:href="http://www2.gbif.org/Persistent-Identifiers.pdf">http://www2.gbif.org/Persistent-Identifiers.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Davis</surname>
<given-names>PM</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> Author-choice open access publishing in the biological and medical literature: a citation analysis.</article-title>
<source>Journal of the American Society for Information Science and Technology </source>
<volume>60</volume>
:
<fpage>3</fpage>
-
<lpage>8</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1002/asi.20965">doi: 10.1002/asi.20965</ext-link>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>De Rosnay</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>1975</year>
)
<source> Le macroscope: vers une vision globale. Seuil, Paris. </source>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Dittert</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Diepenbroek</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grobe</surname>
<given-names>H</given-names>
</name>
</person-group>
(
<year>2001</year>
)
<source> Scientific data must be made available to all. Nature 414: 393. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/35106716.">doi: 10.1038/35106716.</ext-link>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Doom</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Raymer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Krane</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>O</given-names>
</name>
</person-group>
(
<year>2002</year>
)
<source> A proposed undergraduate bioinformatics curriculum for computer scientists. ACM SIGCSE Technical Symposium on Computer Science Education (33
<sup>rd</sup>
, Covington, KY), 78–81. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/563340.563368">doi: 10.1145/563340.563368</ext-link>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>ESF (European Science Foundation)</surname>
</name>
</person-group>
(
<year>2006</year>
)
<source> Press Release: A cyberinfrastructure network for Europe. [
<ext-link ext-link-type="uri" xlink:href="http://www.esf.org/media-centre/ext-single-news/article/a-cyber-infrastructure-network-for-europe-129.html">http://www.esf.org/media-centre/ext-single-news/article/a-cyber-infrastructure-network-for-europe-129.html</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Evans</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Foster</surname>
<given-names>JG</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Metaknowledge.</article-title>
<source>Science </source>
<volume>331</volume>
:
<fpage>721</fpage>
-
<lpage>725</lpage>
doi: 10.1126/science.1201765
<pub-id pub-id-type="pmid">21311014</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Fauchart</surname>
<given-names>E</given-names>
</name>
<name>
<surname>von Hippel</surname>
<given-names>E</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Norms-based intellectual property systems: The case of French chefs.</article-title>
<source>Organization Science </source>
<volume>19</volume>
:
<fpage>187</fpage>
-
<lpage>201</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1287/orsc.1070.0314">doi: 10.1287/orsc.1070.0314</ext-link>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Feijen</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> What researchers want. SURFfoundation. [
<ext-link ext-link-type="uri" xlink:href="http://www.surffoundation.nl/en/publications">http://www.surffoundation.nl/en/publications</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Fegraus</surname>
<given-names>EH</given-names>
</name>
<name>
<surname>Andelman</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>2005</year>
)
<article-title> Maximizing the value of ecological data with structured metadata: an introduction to Ecological Metadata Language (EML) and principles for metadata creation.</article-title>
<source>Bulletin of the Ecological Society of America </source>
<volume>86</volume>
:
<fpage>158</fpage>
-
<lpage>168</lpage>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Fox</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Hendler</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Changing the equation on scientific data visualization.</article-title>
<source>Science </source>
<volume>331</volume>
:
<fpage>705</fpage>
-
<lpage>708</lpage>
doi: 10.1126/science.1197654
<pub-id pub-id-type="pmid">21311008</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Froese</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lloris</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Opitz</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2003</year>
)
<article-title> Scientific data in the public domain.</article-title>
<source>ACP-EU Fisheries Research Report </source>
<volume>14</volume>
:
<fpage>267</fpage>
-
<lpage>271</lpage>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Gargouri</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Hajjem</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Larivière</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Gingras</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Carr</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Brody</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Harnad</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Self-selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 5: e13636. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0013636">doi: 10.1371/journal.pone.0013636</ext-link>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Groth</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Velterop</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> The anatomy of a nanopublication.</article-title>
<source>Information Services & Use </source>
<volume>30</volume>
:
<fpage>51</fpage>
-
<lpage>56</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3233/ISU-2010-0613">doi: 10.3233/ISU-2010-0613</ext-link>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Guttmacher</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Nabel</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>FS</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Why data-sharing policies matter. Proceedings of the National Academy of Sciences 106: 16894. doi: 10.1073/pnas.0910378106 </source>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Gwinn</surname>
<given-names>NE</given-names>
</name>
<name>
<surname>Rinaldo</surname>
<given-names>C</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> The Biodiversity Heritage Library: sharing biodiversity literature with the world.</article-title>
<source>IFLA Journal </source>
<volume>35</volume>
:
<fpage>25</fpage>
-
<lpage>34</lpage>
doi: 10.1177/0340035208102032</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Harnad</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Waking OA’s “Slumbering Giant”: The University’s mandate to mandate open access.</article-title>
<source>New Review of Information Networking </source>
<volume>14</volume>
:
<fpage>51</fpage>
-
<lpage>68</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1080/13614570903001322">doi: 10.1080/13614570903001322</ext-link>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Harnad</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Open Access – Open Data: similarities and differences. [
<ext-link ext-link-type="uri" xlink:href="http://www.slideshare.net/oaod2010/oa-oa-self-archiving-oa-publishing-and-data-archiving">http://www.slideshare.net/oaod2010/oa-oa-self-archiving-oa-publishing-and-data-archiving</ext-link>
]</source>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Heidorn</surname>
<given-names>PB</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Shedding light on the dark data in the long tail of science.</article-title>
<source>Library Trends </source>
<volume>57</volume>
:
<fpage>280</fpage>
-
<lpage>299</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1353/lib.0.0036">doi: 10.1353/lib.0.0036</ext-link>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="book">
<person-group>
<name>
<surname>Hey</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tansley</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tolle</surname>
<given-names>K</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> The Fourth Paradigm.</source>
<publisher-name>Microsoft Research</publisher-name>
,
<publisher-loc>Redmond, WA</publisher-loc>
,
<lpage>252</lpage>
pp.</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Higgins</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Berkley</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
</person-group>
(
<year>2002</year>
)
<source> Managing heterogeneous ecological data using Morpho. 14
<sup>th</sup>
International Conference on Scientific and Statistical Database Management (SSDBM’02), 69. </source>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Hillerkuss</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Schmogrow</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Schellinger</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Jordan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Winter</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huber</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Vallaitis</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Bonk</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Kleinow</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Frey</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Roeger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Koenig</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ludwig</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Marculescu</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hoh</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dreschmann</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ben Ezra</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Narkiss</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Nebendahl</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Parmigiani</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Petropoulos</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Resan</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Oehler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Weingarten</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ellermeyer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Lutz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Moeller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huebner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Becker</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Koos</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Freude</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Leuthold</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> 26 Tbit s
<sup>−1</sup>
 line-rate super-channel transmission utilizing all-optical fast Fourier transform processing.</article-title>
<source>Nature Photonics </source>
<volume>5</volume>
:
<fpage>364</fpage>
-
<lpage>371</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nphoton.2011.74">doi: 10.1038/nphoton.2011.74</ext-link>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Hopkins</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Freckleton</surname>
<given-names>RP</given-names>
</name>
</person-group>
(
<year>2002</year>
)
<article-title> Declines in the numbers of amateur and professional taxonomists: implications for conservation.</article-title>
<source>Animal Conservation </source>
<volume>5</volume>
:
<fpage>245</fpage>
-
<lpage>249</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1017/S1367943002002299">doi: 10.1017/S1367943002002299</ext-link>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Huynh</surname>
<given-names>DF</given-names>
</name>
<name>
<surname>Karger</surname>
<given-names>DR</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Parallax and companion: set-based browsing for the data web. In: Proceedings of WWW ’09. </source>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Innocenti</surname>
<given-names>P</given-names>
</name>
<name>
<surname>McHugh</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ruusalepp</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2007</year>
)
<source> Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE) audit toolkit: DRAMBORA. International Conference on Digital Preservation (iPRES), Beijing. </source>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>ISO</surname>
</name>
</person-group>
(
<year>2000</year>
)
<source> ISO 9000:2000: quality management systems – fundamentals and vocabulary. Standard, International Organization for Standardization (ISO), Geneva, Switzerland. </source>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Berkley</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bojilova</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>2002</year>
)
<article-title> Managing scientific metadata.</article-title>
<source>IEEE Internet Computing </source>
<volume>5</volume>
:
<fpage>59</fpage>
-
<lpage>68</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/4236.957896">doi: 10.1109/4236.957896</ext-link>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Reichman</surname>
<given-names>OJ</given-names>
</name>
<name>
<surname>Bowers</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<article-title> The new bioinformatics: integrating ecological data from the gene to the biosphere.</article-title>
<source>Annual Review of Ecology, Evolution and Systematics</source>
<volume>37</volume>
:
<fpage>519</fpage>
-
<lpage>544</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1146/annurev.ecolsys.37.091305.110031">doi: 10.1146/annurev.ecolsys.37.091305.110031</ext-link>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Kaye</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Heeney</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hawkins</surname>
<given-names>N</given-names>
</name>
<name>
<surname>de Vries</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Boddington</surname>
<given-names>P</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> Data sharing in genomics – reshaping scientific practice.</article-title>
<source>Nature Reviews Genetics </source>
<volume>10</volume>
:
<fpage>331</fpage>
-
<lpage>335</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrg2573">doi: 10.1038/nrg2573</ext-link>
</mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Kelling</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hochachka</surname>
<given-names>WM</given-names>
</name>
<name>
<surname>Fink</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Riedewald</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Caruana</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ballard</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hooker</surname>
<given-names>G</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> Data-intensive science: a new paradigm for biodiversity studies.</article-title>
<source>BioScience</source>
<volume>59</volume>
:
<fpage>613</fpage>
-
<lpage>619</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1525/bio.2009.59.7.12">doi: 10.1525/bio.2009.59.7.12</ext-link>
</mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Kerlinger</surname>
<given-names>P</given-names>
</name>
</person-group>
(
<year>1993</year>
)
<source> Birding Economics and Birder Demographics Studies as Conservation Tools. In: Finch D, Stangel P (Eds) Proceedings of the Status and Management of Neotropical Migratory Birds. Rocky Mountain Forest and Range Experiment Station, Fort Collins, CO. USDA Forest Service General Technical Report RM-229, 32–38. </source>
</mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Key Perspectives Ltd</surname>
</name>
</person-group>
(
<year>2010</year>
)
<source> Data Dimensions: disciplinary differences in research data-sharing, reuse and long term viability. DCC SCARP Synthesis Report. ISSN 1759–586X. [
<ext-link ext-link-type="uri" xlink:href="http://hdl.handle.net/1842/3364">http://hdl.handle.net/1842/3364</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Kidd</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> GEOPHYLOBUILDER 1.0: an ARCGIS extension for creating “geophylogenies”.</article-title>
<source>Molecular Ecology Resources </source>
<volume>8</volume>
:
<fpage>88</fpage>
-
<lpage>91</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/j.1471-8286.2007.01925.x">doi: 10.1111/j.1471-8286.2007.01925.x</ext-link>
<pub-id pub-id-type="pmid">21585723</pub-id>
</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Klump</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> Criteria for the trustworthiness of data-centres. D-Lib Magazine vol. 17. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1045/january2011-klump">doi: 10.1045/january2011-klump</ext-link>
</mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Kobilarov</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Dickinson</surname>
<given-names>I</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<source> Humboldt: exploring linked data. In: Proceedings of the WWW ’08 Workshop on Linked Data on the Web. </source>
</mixed-citation>
</ref>
<ref id="B59">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Kohnke</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Costello</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Crease</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Folack</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Martinez Guingla</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Michida</surname>
<given-names>Y</given-names>
</name>
</person-group>
(
<year>2005</year>
)
<source> Review of the International Oceanographic Data and Information Exchange (IODE). Intergovernmental Oceanographic Commission (IOC) IOC/IODE-XVIII/18. </source>
</mixed-citation>
</ref>
<ref id="B60">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Lambrix</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>H</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Ontology alignment and merging.</article-title>
In:
<person-group>
<name>
<surname>Burger</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Baldock</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<role>Eds</role>
).
<issue-title>Anatomy Ontologies for Bioinformatics: Principles and Practice.</issue-title>
<source>Springer</source>
:
<fpage>134</fpage>
-
<lpage>149</lpage>
</mixed-citation>
</ref>
<ref id="B61">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Langille</surname>
<given-names>MGI</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>JA</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> BioTorrents: A file sharing service for scientific data. PLoS ONE 5(4): e10071. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0010071">doi: 10.1371/journal.pone.0010071</ext-link>
</mixed-citation>
</ref>
<ref id="B62">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Lee</surname>
<given-names>CP</given-names>
</name>
<name>
<surname>Dourish</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mark</surname>
<given-names>G</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<source> The human infrastructure of cyberinfrastructure. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1145/1180875.1180950">doi: 10.1145/1180875.1180950</ext-link>
</mixed-citation>
</ref>
<ref id="B63">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Lynch</surname>
<given-names>CA</given-names>
</name>
</person-group>
(
<year>2003</year>
)
<article-title> Institutional repositories: essential infrastructure for scholarship in the digital age.</article-title>
<source>portal: Libraries and the Academy </source>
<volume>3</volume>
:
<fpage>327</fpage>
-
<lpage>336</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1353/pla.2003.0039">doi: 10.1353/pla.2003.0039</ext-link>
</mixed-citation>
</ref>
<ref id="B64">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>MacLeod</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Benfield</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Culverhouse</surname>
<given-names>P</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Time to automate identification.</article-title>
<source>Nature </source>
<volume>467</volume>
:
<fpage>154</fpage>
-
<lpage>155</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/467154a">doi: 10.1038/467154a</ext-link>
<pub-id pub-id-type="pmid">20829777</pub-id>
</mixed-citation>
</ref>
<ref id="B65">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Madin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bowers</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Krivov</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Pennington</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Villa</surname>
<given-names>F</given-names>
</name>
</person-group>
(
<year>2007a</year>
)
<article-title> An ontology for describing and synthesizing observation data.</article-title>
<source>Ecological Informatics </source>
<volume>2</volume>
:
<fpage>279</fpage>
-
<lpage>296</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.ecoinf.2007.05.004">doi: 10.1016/j.ecoinf.2007.05.004</ext-link>
</mixed-citation>
</ref>
<ref id="B66">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Madin</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Bowers</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
</person-group>
(
<year>2007b</year>
)
<article-title> Advancing ecological research with ontologies.</article-title>
<source>Trends in Ecology and Evolution </source>
<volume>23</volume>
:
<fpage>159</fpage>
-
<lpage>168</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.tree.2007.11.007">doi: 10.1016/j.tree.2007.11.007</ext-link>
<pub-id pub-id-type="pmid">18289717</pub-id>
</mixed-citation>
</ref>
<ref id="B67">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Mandavilli</surname>
<given-names>A</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Trial by twitter.</article-title>
<source>Nature</source>
<volume>469</volume>
:
<fpage>286</fpage>
-
<lpage>287</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/469286a">doi: 10.1038/469286a</ext-link>
<pub-id pub-id-type="pmid">21248816</pub-id>
</mixed-citation>
</ref>
<ref id="B68">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Marris</surname>
<given-names>E</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Supercomputing for the birds. Nature 466: 807. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/466807a">doi: 10.1038/466807a</ext-link>
</mixed-citation>
</ref>
<ref id="B69">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>McCown</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Zubair</surname>
<given-names>M</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<article-title> Search engine coverage of the OAI-PMH Corpus.</article-title>
<source>IEEE Internet Computing </source>
<volume>10</volume>
:
<fpage>66</fpage>
-
<lpage>73</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/MIC.2006.41">doi: 10.1109/MIC.2006.41</ext-link>
</mixed-citation>
</ref>
<ref id="B70">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Michener</surname>
<given-names>WK</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<article-title> Meta-information concepts for ecological data management.</article-title>
<source>Ecological Informatics </source>
<volume>1</volume>
:
<fpage>3</fpage>
-
<lpage>7</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.ecoinf.2005.08.004">doi: 10.1016/j.ecoinf.2005.08.004</ext-link>
</mixed-citation>
</ref>
<ref id="B71">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Mons</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Velterop</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Nano-publication in the e-science era. In: Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009), Washington DC, USA.</source>
[
<ext-link ext-link-type="uri" xlink:href="http://www.surffoundation.nl/SiteCollectionDocuments/Nano-Publication%20-%20Mons%20-%20Velterop.pdf">http://www.surffoundation.nl/SiteCollectionDocuments/Nano-Publication%20-%20Mons%20-%20Velterop.pdf</ext-link>
] </mixed-citation>
</ref>
<ref id="B72">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Morris</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> GBIFKOS Draft White Paper v 2010_11-25-0400.</source>
[
<ext-link ext-link-type="uri" xlink:href="http://community.gbif.org/pg/file/BMorris/read/10694/gbifkos-draft-white-paper-v-2010_11250400">http://community.gbif.org/pg/file/BMorris/read/10694/gbifkos-draft-white-paper-v-2010_11250400</ext-link>
] </mixed-citation>
</ref>
<ref id="B73">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>NAS</surname>
<given-names>(National Academy of Sciences)</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> A New Biology for the 21<sup>st</sup> Century, 112 pp. </source>
</mixed-citation>
</ref>
<ref id="B74">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>National Science Board</surname>
</name>
</person-group>
(
<year>2010a</year>
)
<source> Science and Engineering Indicators 2010, Chapter 5, Academic Research and Development.</source>
[
<ext-link ext-link-type="uri" xlink:href="http://www.nsf.gov/statistics/seind10/c5/c5h.htm">http://www.nsf.gov/statistics/seind10/c5/c5h.htm</ext-link>
] </mixed-citation>
</ref>
<ref id="B75">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>National Science Board</surname>
</name>
</person-group>
(
<year>2010b</year>
)
<source> Globalization of Science and Engineering Research. [
<ext-link ext-link-type="uri" xlink:href="http://www.nsf.gov/statistics/nsb1003/">http://www.nsf.gov/statistics/nsb1003/</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B76">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Norris</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Oppenheim</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rowland</surname>
<given-names>F</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> The citation advantage of open access articles.</article-title>
<source>Journal of the American Society for Information Science and Technology </source>
<volume>59</volume>
:
<fpage>1963</fpage>
-
<lpage>1972</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pbio.0040157">doi: 10.1371/journal.pbio.0040157</ext-link>
</mixed-citation>
</ref>
<ref id="B77">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>NSF</surname>
<given-names>(National Science Foundation)</given-names>
</name>
</person-group>
(
<year>2003</year>
)
<source> Revolutionizing science and engineering through cyberinfrastructure: report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. 84 pp. [
<ext-link ext-link-type="uri" xlink:href="http://www.nsf.gov/od/oci/reports/atkins.pdf">http://www.nsf.gov/od/oci/reports/atkins.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B78">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>NSF</surname>
<given-names>(National Science Foundation)</given-names>
</name>
</person-group>
(
<year>2006</year>
)
<source> NSF’s Cyberinfrastructure Vision for 21<sup>st</sup> Century Discovery ver 5.0. NSF Cyberinfrastructure Council, 32 pp. [
<ext-link ext-link-type="uri" xlink:href="http://www.nsf.gov/od/oci/ci_v5.pdf">http://www.nsf.gov/od/oci/ci_v5.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B79">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>OECD</surname>
</name>
</person-group>
(
<year>1999</year>
)
<source> Final Report of the megascience forum working group on biological informatics. OECD, Paris. </source>
</mixed-citation>
</ref>
<ref id="B80">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>PARSE</surname>
</name>
</person-group>
(
<year>2009</year>
)
<source> PARSE.Insight: INSIGHT into issues of permanent access to the records of science in Europe. [
<ext-link ext-link-type="uri" xlink:href="http://www.parse-insight.eu/downloads/PARSE-Insight_D3-4_SurveyReport_final_hq.pdf">http://www.parse-insight.eu/downloads/PARSE-Insight_D3-4_SurveyReport_final_hq.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B81">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Patterson</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Faulwetter</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Shipunov</surname>
<given-names>A</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<source> Principles for a names-based cyberinfrastructure to serve all of biology. In: Minelli A, Bonato L, Fusco G (Eds) Updating the Linnaean Heritage: Names as Tools for Thinking About Plants and Animals, 153–163. </source>
</mixed-citation>
</ref>
<ref id="B82">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Patterson</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Cooper</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kirk</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Pyle</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Remsen</surname>
<given-names>DP</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Names are key to the Big New Biology.</article-title>
<source>Trends in Ecology and Evolution </source>
<volume>25</volume>
:
<fpage>686</fpage>
-
<lpage>691</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.tree.2010.09.004">doi: 10.1016/j.tree.2010.09.004</ext-link>
<pub-id pub-id-type="pmid">20961649</pub-id>
</mixed-citation>
</ref>
<ref id="B83">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> Who shares? Who doesn’t? Factors associated with openly archiving raw research data. PLoS ONE 6: e18657. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0018657">doi: 10.1371/journal.pone.0018657</ext-link>
</mixed-citation>
</ref>
<ref id="B84">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Fridsma</surname>
<given-names>DB</given-names>
</name>
</person-group>
(
<year>2007</year>
)
<source> Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3): e308. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0000308">doi: 10.1371/journal.pone.0000308</ext-link>
</mixed-citation>
</ref>
<ref id="B85">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Piwowar</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Vision</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Whitlock</surname>
<given-names>MC</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> Data archiving is a good investment. Nature 473: 285. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/473285a">doi: 10.1038/473285a</ext-link>
</mixed-citation>
</ref>
<ref id="B86">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Porter</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Callahan</surname>
<given-names>JT</given-names>
</name>
</person-group>
(
<year>1994</year>
)
<article-title> Circumventing a dilemma: historical approaches to data-sharing in ecological research.</article-title>
In:
<person-group>
<name>
<surname>Michener</surname>
<given-names>WK</given-names>
</name>
<name>
<surname>Brunt</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Stafford</surname>
<given-names>SG</given-names>
</name>
</person-group>
(
<role>Eds</role>
).
<issue-title>Environmental Information Management and Analysis: Ecosystem to Global Scales.</issue-title>
<source>Taylor &amp; Francis Ltd, London</source>
:
<fpage>193</fpage>
-
<lpage>202</lpage>
</mixed-citation>
</ref>
<ref id="B87">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Pullin</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Salafsky</surname>
<given-names>N</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Save the whales? Save the rainforest? Save the data! Conservation Biology 24: 915–917. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1111/j.1523-1739.2010.01537.x">doi: 10.1111/j.1523-1739.2010.01537.x</ext-link>
</mixed-citation>
</ref>
<ref id="B88">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Raup</surname>
<given-names>D</given-names>
</name>
</person-group>
(
<year>1991</year>
)
<source> Extinction: Bad Genes or Bad Luck? Norton and Co., New York. </source>
</mixed-citation>
</ref>
<ref id="B89">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Reichman</surname>
<given-names>OJ</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Schildhauer</surname>
<given-names>MP</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Challenges and opportunities to open data in ecology.</article-title>
<source>Science </source>
<volume>331</volume>
:
<fpage>703</fpage>
-
<lpage>705</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1126/science.1197962">doi: 10.1126/science.1197962</ext-link>
<pub-id pub-id-type="pmid">21311007</pub-id>
</mixed-citation>
</ref>
<ref id="B90">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Rhee</surname>
<given-names>SY</given-names>
</name>
<name>
<surname>Beavis</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Berardini</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Dixon</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Doyle</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Garcia-Hernandez</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huala</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Montoya</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Mueller</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Mundodi</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Reiser</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tacklind</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Weems</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Yoo</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>P</given-names>
</name>
</person-group>
(
<year>2003</year>
)
<article-title> The
<italic>Arabidopsis</italic>
information resource (TAIR): a model organism database providing a centralized, curated gateway to
<italic>Arabidopsis</italic>
biology, research materials and community.</article-title>
<source>Nucleic Acids Research </source>
<volume>31</volume>
:
<fpage>224</fpage>
-
<lpage>228</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/nar/gkg076">doi: 10.1093/nar/gkg076</ext-link>
<pub-id pub-id-type="pmid">12519987</pub-id>
</mixed-citation>
</ref>
<ref id="B91">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>RIN</surname>
<given-names>(Research Information Network)</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<source> To share or not to share: publication and quality assurance of research data outputs. A report commissioned by the Research Information Network. [
<ext-link ext-link-type="uri" xlink:href="http://www.rin.ac.uk/data-publication">http://www.rin.ac.uk/data-publication</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B92">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Rogers</surname>
<given-names>EM</given-names>
</name>
</person-group>
(
<year>1983</year>
)
<source> Diffusion of innovations. 3<sup>rd</sup> Edition. Free Press, New York. </source>
</mixed-citation>
</ref>
<ref id="B93">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Savage</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Vickers</surname>
<given-names>AJ</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Empirical study of data-sharing by authors publishing in PLoS journals. PLoS ONE 4: e7078. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0007078">doi: 10.1371/journal.pone.0007078</ext-link>
</mixed-citation>
</ref>
<ref id="B94">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Schofield</surname>
<given-names>PN</given-names>
</name>
<name>
<surname>Eppig</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Huala</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hrabe de Angelis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Harvey</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Weaver</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Smedley</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rosenthal</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Schughart</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Aidinis</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Tocchini-Valentini</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hancock</surname>
<given-names>JM</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Sustaining the data and bioresource commons.</article-title>
<source>Science </source>
<volume>330</volume>
:
<fpage>592</fpage>
-
<lpage>593</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1126/science.1191506">doi: 10.1126/science.1191506</ext-link>
<pub-id pub-id-type="pmid">21030633</pub-id>
</mixed-citation>
</ref>
<ref id="B95">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Science staff editorial</surname>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Challenges and opportunities.</article-title>
<source>Science </source>
<volume>331</volume>
:
<fpage>692</fpage>
-
<lpage>693</lpage>
<pub-id pub-id-type="pmid">21311002</pub-id>
</mixed-citation>
</ref>
<ref id="B96">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Shirky</surname>
<given-names>C</given-names>
</name>
</person-group>
(
<year>2005</year>
)
<source> Making digital durable. </source>
</mixed-citation>
</ref>
<ref id="B97">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Silvertown</surname>
<given-names>J</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<article-title> A new dawn for citizen science.</article-title>
<source>Trends in Ecology and Evolution </source>
<volume>24</volume>
:
<fpage>467</fpage>
-
<lpage>471</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.tree.2009.03.017">doi: 10.1016/j.tree.2009.03.017</ext-link>
<pub-id pub-id-type="pmid">19586682</pub-id>
</mixed-citation>
</ref>
<ref id="B98">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Sinha</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Malik</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Rezgui</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Barnes</surname>
<given-names>CG</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Heiken</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>WA</given-names>
</name>
<name>
<surname>Gundersen</surname>
<given-names>LC</given-names>
</name>
<name>
<surname>Raskin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Fox</surname>
<given-names>P</given-names>
</name>
<name>
<surname>McGuinness</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Seber</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zimmerman</surname>
<given-names>H</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Geoinformatics: transforming data to knowledge for geosciences.</article-title>
<source>GSA Today </source>
<volume>20</volume>
:
<fpage>4</fpage>
-
<lpage>10</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1130/GSATG85A.1">doi: 10.1130/GSATG85A.1</ext-link>
</mixed-citation>
</ref>
<ref id="B99">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Sirovich</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Stoeckle</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Structural analysis of biodiversity. PLoS ONE 5: e9266. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0009266">doi: 10.1371/journal.pone.0009266</ext-link>
</mixed-citation>
</ref>
<ref id="B100">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Smith</surname>
<given-names>VS</given-names>
</name>
</person-group>
(
<year>2009</year>
)
<source> Data publication: towards a database of everything. BMC Research Notes 2: 113. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1756-0500-2-113">doi: 10.1186/1756-0500-2-113</ext-link>
</mixed-citation>
</ref>
<ref id="B101">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Smithsonian Institution</surname>
</name>
</person-group>
(
<year>2011</year>
)
<source> Sharing Smithsonian digital scientific research data from biology. Smithsonian Institution Office of Policy and Analysis, Washington DC. [
<ext-link ext-link-type="uri" xlink:href="http://www.si.edu/opanda/docs/Rpts2011/DataSharingFinal110328.pdf">http://www.si.edu/opanda/docs/Rpts2011/DataSharingFinal110328.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B102">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Stein</surname>
<given-names>LD</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges.</article-title>
<source>Nature Reviews Genetics </source>
<volume>9</volume>
:
<fpage>678</fpage>
-
<lpage>688</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nrg2414">doi: 10.1038/nrg2414</ext-link>
</mixed-citation>
</ref>
<ref id="B103">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Taylor</surname>
<given-names>CF</given-names>
</name>
<name>
<surname>Field</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sansone</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Aerts</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Binz</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Bogue</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Booth</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Brazma</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brinkman</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Deutsch</surname>
<given-names>EW</given-names>
</name>
<name>
<surname>Fiehn</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Fostel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ghazal</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Gray</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Grimes</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hancock</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Hardy</surname>
<given-names>NW</given-names>
</name>
<name>
<surname>Hermjakob</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Julian Jr.</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Kane</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kettner</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kinsinger</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kolker</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kuiper</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Le Novère</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Leebens-Mack</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Lord</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mallon</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Marthandan</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Masuya</surname>
<given-names>H</given-names>
</name>
<name>
<surname>McNally</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mehrle</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Morrison</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Orchard</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Quackenbush</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Reecy</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Robertson</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Rocca-Serra</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rodriguez</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Rosenfelder</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Santoyo-Lopez</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Scheuermann</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Schober</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Snape</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stoeckert Jr.</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Tipton</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sterk</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Untergasser</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Vandesompele</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wiemann</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project.</article-title>
<source>Nature Biotechnology </source>
<volume>26</volume>
:
<fpage>889</fpage>
-
<lpage>896</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nbt.1411">doi: 10.1038/nbt.1411</ext-link>
</mixed-citation>
</ref>
<ref id="B104">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>TEEB</surname>
</name>
</person-group>
(
<year>2010</year>
)
<source> The economics of ecosystems and biodiversity: Mainstreaming the economics of nature: A synthesis of the approach, conclusions and recommendations of TEEB. United Nations Environment Programme. </source>
</mixed-citation>
</ref>
<ref id="B105">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Thessen</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>DJ</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> Data Issues in the Life Sciences. [
<ext-link ext-link-type="uri" xlink:href="http://dataconservancy.org/sites/default/files/Data%20Issues%20in%20the%20Life%20Sciences%20White%20Paper.pdf">http://dataconservancy.org/sites/default/files/Data%20Issues%20in%20the%20Life%20Sciences%20White%20Paper.pdf</ext-link>
] </source>
</mixed-citation>
</ref>
<ref id="B106">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Tittensor</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Mora</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jetz</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lotze</surname>
<given-names>HK</given-names>
</name>
<name>
<surname>Ricard</surname>
<given-names>D</given-names>
</name>
<name>
<surname>van den Berghe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Worm</surname>
<given-names>B</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Global patterns and predictors of marine biodiversity across taxa.</article-title>
<source>Nature </source>
<volume>466</volume>
:
<fpage>1098</fpage>
-
<lpage>1101</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nature09329">doi: 10.1038/nature09329</ext-link>
<pub-id pub-id-type="pmid">20668450</pub-id>
</mixed-citation>
</ref>
<ref id="B107">
<element-citation publication-type="other">
<person-group>
<name>
<surname>United States Department of Labor</surname>
</name>
</person-group>
<source> Occupational Outlook Handbook, 2010–11 Edition. [
<ext-link ext-link-type="uri" xlink:href="http://www.bls.gov/oco/ocos047.htm">http://www.bls.gov/oco/ocos047.htm</ext-link>
] </source>
</element-citation>
</ref>
<ref id="B108">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Vision</surname>
<given-names>TJ</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Open data and the social contract of scientific publishing.</article-title>
<source>BioScience </source>
<volume>60</volume>
:
<fpage>330</fpage>
-
<lpage>330</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1525/bio.2010.60.5.2">doi: 10.1525/bio.2010.60.5.2</ext-link>
</mixed-citation>
</ref>
<ref id="B109">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Vollmar</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Macklin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ford</surname>
<given-names>LS</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Natural history specimen digitization: challenges and concerns.</article-title>
<source>Biodiversity Informatics </source>
<volume>7</volume>
:
<fpage>93</fpage>
-
<lpage>112</lpage>
</mixed-citation>
</ref>
<ref id="B110">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Webb</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Vanden Berghe</surname>
<given-names>E</given-names>
</name>
<name>
<surname>O’Dor</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<source> Biodiversity’s big wet secret: The global distribution of marine biological records reveals chronic under-exploration of the deep pelagic ocean. PLoS ONE 5: e10223. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0010223">doi: 10.1371/journal.pone.0010223</ext-link>
</mixed-citation>
</ref>
<ref id="B111">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>White</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Carrier</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Greenberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Scherle</surname>
<given-names>R</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<source> The Dryad data repository: a Singapore framework metadata architecture in a DSpace environment. Proceedings of the International Conference on Dublin Core and Metadata Applications 157–162. </source>
</mixed-citation>
</ref>
<ref id="B112">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Whitlock</surname>
<given-names>MC</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<article-title> Data archiving in ecology and evolution: best practices.</article-title>
<source>Trends in Ecology and Evolution </source>
<volume>26</volume>
:
<fpage>61</fpage>
-
<lpage>65</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1016/j.tree.2010.11.006">doi: 10.1016/j.tree.2010.11.006</ext-link>
<pub-id pub-id-type="pmid">21159406</pub-id>
</mixed-citation>
</ref>
<ref id="B113">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Whitlock</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>McPeek</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Rausher</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Rieseberg</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>AJ</given-names>
</name>
</person-group>
(
<year>2010</year>
)
<article-title> Data archiving.</article-title>
<source>The American Naturalist </source>
<volume>175</volume>
:
<fpage>145</fpage>
-
<lpage>146</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1086/650340">doi: 10.1086/650340</ext-link>
</mixed-citation>
</ref>
<ref id="B114">
<mixed-citation publication-type="journal">
<person-group>
<name>
<surname>Wren</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
</person-group>
(
<year>2008</year>
)
<article-title> Databases, data tombs and dust in the wind.</article-title>
<source>Bioinformatics </source>
<volume>24</volume>
:
<fpage>2127</fpage>
-
<lpage>2128</lpage>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btn464">doi: 10.1093/bioinformatics/btn464</ext-link>
<pub-id pub-id-type="pmid">18819940</pub-id>
</mixed-citation>
</ref>
<ref id="B115">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Zhang</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kihara</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Prabhakar</surname>
<given-names>S</given-names>
</name>
</person-group>
(
<year>2007</year>
)
<source> Tracing lineage in multi-version scientific databases. Technical Report CSD TR 06–013, Purdue University. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1109/BIBE.2007.4375599">doi: 10.1109/BIBE.2007.4375599</ext-link>
</mixed-citation>
</ref>
<ref id="B116">
<mixed-citation publication-type="other">
<person-group>
<name>
<surname>Ziegler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Mietchen</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Faber</surname>
<given-names>C</given-names>
</name>
<name>
<surname>von Hausen</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Schöbel</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sellerer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ziegler</surname>
<given-names>A</given-names>
</name>
</person-group>
(
<year>2011</year>
)
<source> Effectively incorporating selected multimedia content into medical publications. BMC Medicine 9: 17. </source>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1741-7015-9-17">doi: 10.1186/1741-7015-9-17</ext-link>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000719 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000719 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3234430
   |texte=   Data issues in the life sciences
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22207805" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024