Cyberinfrastructure exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families

Internal identifier: 000592 (Pmc/Curation); previous: 000591; next: 000593

Authors: Shibu Yooseph [United States]; Granger Sutton [United States]; Douglas B. Rusch [United States]; Aaron L. Halpern [United States]; Shannon J. Williamson [United States]; Karin Remington [United States]; Jonathan A. Eisen [United States]; Karla B. Heidelberg [United States]; Gerard Manning [United States]; Weizhong Li [United States]; Lukasz Jaroszewski [United States]; Piotr Cieplak [United States]; Christopher S. Miller [United States]; Huiying Li [United States]; Susan T. Mashiyama [United States]; Marcin P. Joachimiak [United States]; Christopher Van Belle [United States]; John-Marc Chandonia [United States]; David A. Soergel [United States]; Yufeng Zhai [United States]; Kannan Natarajan [United States]; Shaun Lee [United States]; Benjamin J. Raphael [United States]; Vineet Bafna [United States]; Robert Friedman [United States]; Steven E. Brenner [United States]; Adam Godzik [United States]; David Eisenberg [United States]; Jack E. Dixon [United States]; Susan S. Taylor [United States]; Robert L. Strausberg [United States]; Marvin Frazier [United States]; J. Craig Venter [United States]

RBID : PMC:1821046

Abstract

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.


DOI: 10.1371/journal.pbio.0050016
PubMed: 17355171
PubMed Central: 1821046

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The
<italic>Sorcerer II</italic>
Global Ocean Sampling Expedition: Expanding the Universe of Protein Families</title>
<author>
<name sortKey="Yooseph, Shibu" sort="Yooseph, Shibu" uniqKey="Yooseph S" first="Shibu" last="Yooseph">Shibu Yooseph</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Sutton, Granger" sort="Sutton, Granger" uniqKey="Sutton G" first="Granger" last="Sutton">Granger Sutton</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rusch, Douglas B" sort="Rusch, Douglas B" uniqKey="Rusch D" first="Douglas B" last="Rusch">Douglas B. Rusch</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Halpern, Aaron L" sort="Halpern, Aaron L" uniqKey="Halpern A" first="Aaron L" last="Halpern">Aaron L. Halpern</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Williamson, Shannon J" sort="Williamson, Shannon J" uniqKey="Williamson S" first="Shannon J" last="Williamson">Shannon J. Williamson</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Remington, Karin" sort="Remington, Karin" uniqKey="Remington K" first="Karin" last="Remington">Karin Remington</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Eisen, Jonathan A" sort="Eisen, Jonathan A" uniqKey="Eisen J" first="Jonathan A" last="Eisen">Jonathan A. Eisen</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2"> University of California, Davis, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California, Davis, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Heidelberg, Karla B" sort="Heidelberg, Karla B" uniqKey="Heidelberg K" first="Karla B" last="Heidelberg">Karla B. Heidelberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Manning, Gerard" sort="Manning, Gerard" uniqKey="Manning G" first="Gerard" last="Manning">Gerard Manning</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3"> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jaroszewski, Lukasz" sort="Jaroszewski, Lukasz" uniqKey="Jaroszewski L" first="Lukasz" last="Jaroszewski">Lukasz Jaroszewski</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Cieplak, Piotr" sort="Cieplak, Piotr" uniqKey="Cieplak P" first="Piotr" last="Cieplak">Piotr Cieplak</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Miller, Christopher S" sort="Miller, Christopher S" uniqKey="Miller C" first="Christopher S" last="Miller">Christopher S. Miller</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Li, Huiying" sort="Li, Huiying" uniqKey="Li H" first="Huiying" last="Li">Huiying Li</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mashiyama, Susan T" sort="Mashiyama, Susan T" uniqKey="Mashiyama S" first="Susan T" last="Mashiyama">Susan T. Mashiyama</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Joachimiak, Marcin P" sort="Joachimiak, Marcin P" uniqKey="Joachimiak M" first="Marcin P" last="Joachimiak">Marcin P. Joachimiak</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Van Belle, Christopher" sort="Van Belle, Christopher" uniqKey="Van Belle C" first="Christopher" last="Van Belle">Christopher Van Belle</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Chandonia, John Marc" sort="Chandonia, John Marc" uniqKey="Chandonia J" first="John-Marc" last="Chandonia">John-Marc Chandonia</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff7"> Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Soergel, David A" sort="Soergel, David A" uniqKey="Soergel D" first="David A" last="Soergel">David A. Soergel</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhai, Yufeng" sort="Zhai, Yufeng" uniqKey="Zhai Y" first="Yufeng" last="Zhai">Yufeng Zhai</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3"> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Natarajan, Kannan" sort="Natarajan, Kannan" uniqKey="Natarajan K" first="Kannan" last="Natarajan">Kannan Natarajan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Lee, Shaun" sort="Lee, Shaun" uniqKey="Lee S" first="Shaun" last="Lee">Shaun Lee</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Raphael, Benjamin J" sort="Raphael, Benjamin J" uniqKey="Raphael B" first="Benjamin J" last="Raphael">Benjamin J. Raphael</name>
<affiliation wicri:level="1">
<nlm:aff id="aff9"> Brown University, Providence, Rhode Island, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Brown University, Providence, Rhode Island</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Bafna, Vineet" sort="Bafna, Vineet" uniqKey="Bafna V" first="Vineet" last="Bafna">Vineet Bafna</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Friedman, Robert" sort="Friedman, Robert" uniqKey="Friedman R" first="Robert" last="Friedman">Robert Friedman</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brenner, Steven E" sort="Brenner, Steven E" uniqKey="Brenner S" first="Steven E" last="Brenner">Steven E. Brenner</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Godzik, Adam" sort="Godzik, Adam" uniqKey="Godzik A" first="Adam" last="Godzik">Adam Godzik</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Eisenberg, David" sort="Eisenberg, David" uniqKey="Eisenberg D" first="David" last="Eisenberg">David Eisenberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Dixon, Jack E" sort="Dixon, Jack E" uniqKey="Dixon J" first="Jack E" last="Dixon">Jack E. Dixon</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Susan S" sort="Taylor, Susan S" uniqKey="Taylor S" first="Susan S" last="Taylor">Susan S. Taylor</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Strausberg, Robert L" sort="Strausberg, Robert L" uniqKey="Strausberg R" first="Robert L" last="Strausberg">Robert L. Strausberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Frazier, Marvin" sort="Frazier, Marvin" uniqKey="Frazier M" first="Marvin" last="Frazier">Marvin Frazier</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Venter, J Craig" sort="Venter, J Craig" uniqKey="Venter J" first="J. Craig" last="Venter">J. Craig Venter</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17355171</idno>
<idno type="pmc">1821046</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1821046</idno>
<idno type="RBID">PMC:1821046</idno>
<idno type="doi">10.1371/journal.pbio.0050016</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000592</idno>
<idno type="wicri:Area/Pmc/Curation">000592</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">The
<italic>Sorcerer II</italic>
Global Ocean Sampling Expedition: Expanding the Universe of Protein Families</title>
<author>
<name sortKey="Yooseph, Shibu" sort="Yooseph, Shibu" uniqKey="Yooseph S" first="Shibu" last="Yooseph">Shibu Yooseph</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Sutton, Granger" sort="Sutton, Granger" uniqKey="Sutton G" first="Granger" last="Sutton">Granger Sutton</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rusch, Douglas B" sort="Rusch, Douglas B" uniqKey="Rusch D" first="Douglas B" last="Rusch">Douglas B. Rusch</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Halpern, Aaron L" sort="Halpern, Aaron L" uniqKey="Halpern A" first="Aaron L" last="Halpern">Aaron L. Halpern</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Williamson, Shannon J" sort="Williamson, Shannon J" uniqKey="Williamson S" first="Shannon J" last="Williamson">Shannon J. Williamson</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Remington, Karin" sort="Remington, Karin" uniqKey="Remington K" first="Karin" last="Remington">Karin Remington</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Eisen, Jonathan A" sort="Eisen, Jonathan A" uniqKey="Eisen J" first="Jonathan A" last="Eisen">Jonathan A. Eisen</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff2"> University of California, Davis, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California, Davis, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Heidelberg, Karla B" sort="Heidelberg, Karla B" uniqKey="Heidelberg K" first="Karla B" last="Heidelberg">Karla B. Heidelberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Manning, Gerard" sort="Manning, Gerard" uniqKey="Manning G" first="Gerard" last="Manning">Gerard Manning</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3"> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Li, Weizhong" sort="Li, Weizhong" uniqKey="Li W" first="Weizhong" last="Li">Weizhong Li</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Jaroszewski, Lukasz" sort="Jaroszewski, Lukasz" uniqKey="Jaroszewski L" first="Lukasz" last="Jaroszewski">Lukasz Jaroszewski</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Cieplak, Piotr" sort="Cieplak, Piotr" uniqKey="Cieplak P" first="Piotr" last="Cieplak">Piotr Cieplak</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Miller, Christopher S" sort="Miller, Christopher S" uniqKey="Miller C" first="Christopher S" last="Miller">Christopher S. Miller</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Li, Huiying" sort="Li, Huiying" uniqKey="Li H" first="Huiying" last="Li">Huiying Li</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Mashiyama, Susan T" sort="Mashiyama, Susan T" uniqKey="Mashiyama S" first="Susan T" last="Mashiyama">Susan T. Mashiyama</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Joachimiak, Marcin P" sort="Joachimiak, Marcin P" uniqKey="Joachimiak M" first="Marcin P" last="Joachimiak">Marcin P. Joachimiak</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Van Belle, Christopher" sort="Van Belle, Christopher" uniqKey="Van Belle C" first="Christopher" last="Van Belle">Christopher Van Belle</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Chandonia, John Marc" sort="Chandonia, John Marc" uniqKey="Chandonia J" first="John-Marc" last="Chandonia">John-Marc Chandonia</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="aff7"> Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Soergel, David A" sort="Soergel, David A" uniqKey="Soergel D" first="David A" last="Soergel">David A. Soergel</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Zhai, Yufeng" sort="Zhai, Yufeng" uniqKey="Zhai Y" first="Yufeng" last="Zhai">Yufeng Zhai</name>
<affiliation wicri:level="1">
<nlm:aff id="aff3"> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Natarajan, Kannan" sort="Natarajan, Kannan" uniqKey="Natarajan K" first="Kannan" last="Natarajan">Kannan Natarajan</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Lee, Shaun" sort="Lee, Shaun" uniqKey="Lee S" first="Shaun" last="Lee">Shaun Lee</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Raphael, Benjamin J" sort="Raphael, Benjamin J" uniqKey="Raphael B" first="Benjamin J" last="Raphael">Benjamin J. Raphael</name>
<affiliation wicri:level="1">
<nlm:aff id="aff9"> Brown University, Providence, Rhode Island, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Brown University, Providence, Rhode Island</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Bafna, Vineet" sort="Bafna, Vineet" uniqKey="Bafna V" first="Vineet" last="Bafna">Vineet Bafna</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Friedman, Robert" sort="Friedman, Robert" uniqKey="Friedman R" first="Robert" last="Friedman">Robert Friedman</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Brenner, Steven E" sort="Brenner, Steven E" uniqKey="Brenner S" first="Steven E" last="Brenner">Steven E. Brenner</name>
<affiliation wicri:level="1">
<nlm:aff id="aff6"> University of California Berkeley, Berkeley, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Berkeley, Berkeley, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Godzik, Adam" sort="Godzik, Adam" uniqKey="Godzik A" first="Adam" last="Godzik">Adam Godzik</name>
<affiliation wicri:level="1">
<nlm:aff id="aff4"> Burnham Institute for Medical Research, La Jolla, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> Burnham Institute for Medical Research, La Jolla, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Eisenberg, David" sort="Eisenberg, David" uniqKey="Eisenberg D" first="David" last="Eisenberg">David Eisenberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff5"> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Dixon, Jack E" sort="Dixon, Jack E" uniqKey="Dixon J" first="Jack E" last="Dixon">Jack E. Dixon</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Susan S" sort="Taylor, Susan S" uniqKey="Taylor S" first="Susan S" last="Taylor">Susan S. Taylor</name>
<affiliation wicri:level="1">
<nlm:aff id="aff8"> University of California San Diego, San Diego, California, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> University of California San Diego, San Diego, California</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Strausberg, Robert L" sort="Strausberg, Robert L" uniqKey="Strausberg R" first="Robert L" last="Strausberg">Robert L. Strausberg</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Frazier, Marvin" sort="Frazier, Marvin" uniqKey="Frazier M" first="Marvin" last="Frazier">Marvin Frazier</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Venter, J Craig" sort="Venter, J Craig" uniqKey="Venter J" first="J. Craig" last="Venter">J. Craig Venter</name>
<affiliation wicri:level="1">
<nlm:aff id="aff1"> J. Craig Venter Institute, Rockville, Maryland, United States of America</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea> J. Craig Venter Institute, Rockville, Maryland</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS Biology</title>
<idno type="ISSN">1544-9173</idno>
<idno type="eISSN">1545-7885</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS Biol</journal-id>
<journal-id journal-id-type="publisher-id">pbio</journal-id>
<journal-title>PLoS Biology</journal-title>
<issn pub-type="ppub">1544-9173</issn>
<issn pub-type="epub">1545-7885</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17355171</article-id>
<article-id pub-id-type="pmc">1821046</article-id>
<article-id pub-id-type="doi">10.1371/journal.pbio.0050016</article-id>
<article-id pub-id-type="publisher-id">06-PLBI-RA-0500R3</article-id>
<article-id pub-id-type="sici">plbi-05-03-23</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline">
<subject>Computational Biology</subject>
<subject>Evolutionary Biology</subject>
<subject>Genetics and Genomics</subject>
<subject>Molecular Biology</subject>
</subj-group>
<subj-group subj-group-type="System Taxonomy">
<subject>Eubacteria</subject>
<subject>Viruses</subject>
</subj-group>
<series-title>Oceanic Metagenomics</series-title>
</article-categories>
<title-group>
<article-title>The
<italic>Sorcerer II</italic>
Global Ocean Sampling Expedition: Expanding the Universe of Protein Families</article-title>
<alt-title alt-title-type="running-head">Expanding the Protein Family Universe</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Yooseph</surname>
<given-names>Shibu</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sutton</surname>
<given-names>Granger</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rusch</surname>
<given-names>Douglas B</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Halpern</surname>
<given-names>Aaron L</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Williamson</surname>
<given-names>Shannon J</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Remington</surname>
<given-names>Karin</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Eisen</surname>
<given-names>Jonathan A</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
<xref rid="aff2" ref-type="aff">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Heidelberg</surname>
<given-names>Karla B</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Manning</surname>
<given-names>Gerard</given-names>
</name>
<xref rid="aff3" ref-type="aff">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Weizhong</given-names>
</name>
<xref rid="aff4" ref-type="aff">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jaroszewski</surname>
<given-names>Lukasz</given-names>
</name>
<xref rid="aff4" ref-type="aff">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cieplak</surname>
<given-names>Piotr</given-names>
</name>
<xref rid="aff4" ref-type="aff">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Miller</surname>
<given-names>Christopher S</given-names>
</name>
<xref rid="aff5" ref-type="aff">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Huiying</given-names>
</name>
<xref rid="aff5" ref-type="aff">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mashiyama</surname>
<given-names>Susan T</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Joachimiak</surname>
<given-names>Marcin P</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>van Belle</surname>
<given-names>Christopher</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chandonia</surname>
<given-names>John-Marc</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
<xref rid="aff7" ref-type="aff">7</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Soergel</surname>
<given-names>David A</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhai</surname>
<given-names>Yufeng</given-names>
</name>
<xref rid="aff3" ref-type="aff">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Natarajan</surname>
<given-names>Kannan</given-names>
</name>
<xref rid="aff8" ref-type="aff">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lee</surname>
<given-names>Shaun</given-names>
</name>
<xref rid="aff8" ref-type="aff">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Raphael</surname>
<given-names>Benjamin J</given-names>
</name>
<xref rid="aff9" ref-type="aff">9</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bafna</surname>
<given-names>Vineet</given-names>
</name>
<xref rid="aff8" ref-type="aff">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Friedman</surname>
<given-names>Robert</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Brenner</surname>
<given-names>Steven E</given-names>
</name>
<xref rid="aff6" ref-type="aff">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Godzik</surname>
<given-names>Adam</given-names>
</name>
<xref rid="aff4" ref-type="aff">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Eisenberg</surname>
<given-names>David</given-names>
</name>
<xref rid="aff5" ref-type="aff">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Dixon</surname>
<given-names>Jack E</given-names>
</name>
<xref rid="aff8" ref-type="aff">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Taylor</surname>
<given-names>Susan S</given-names>
</name>
<xref rid="aff8" ref-type="aff">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Strausberg</surname>
<given-names>Robert L</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Frazier</surname>
<given-names>Marvin</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Venter</surname>
<given-names>J. Craig</given-names>
</name>
<xref rid="aff1" ref-type="aff">1</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
J. Craig Venter Institute, Rockville, Maryland, United States of America</aff>
<aff id="aff2">
<label>2</label>
University of California, Davis, California, United States of America</aff>
<aff id="aff3">
<label>3</label>
Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, California, United States of America</aff>
<aff id="aff4">
<label>4</label>
Burnham Institute for Medical Research, La Jolla, California, United States of America</aff>
<aff id="aff5">
<label>5</label>
University of California Los Angeles–Department of Energy Institute for Genomics and Proteomics, Los Angeles, California, United States of America</aff>
<aff id="aff6">
<label>6</label>
University of California Berkeley, Berkeley, California, United States of America</aff>
<aff id="aff7">
<label>7</label>
Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America</aff>
<aff id="aff8">
<label>8</label>
University of California San Diego, San Diego, California, United States of America</aff>
<aff id="aff9">
<label>9</label>
Brown University, Providence, Rhode Island, United States of America</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Eddy</surname>
<given-names>Sean</given-names>
</name>
<role>Academic Editor</role>
<xref rid="edit1" ref-type="aff"></xref>
</contrib>
</contrib-group>
<aff id="edit1">Washington University St. Louis, United States of America</aff>
<author-notes>
<corresp id="cor1">* To whom correspondence should be addressed. E-mail:
<email>Shibu.Yooseph@venterinstitute.org</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>3</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>13</day>
<month>3</month>
<year>2007</year>
</pub-date>
<volume>5</volume>
<issue>3</issue>
<elocation-id>e16</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>3</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>15</day>
<month>8</month>
<year>2006</year>
</date>
</history>
<copyright-statement>
<bold>Copyright:</bold>
© 2007 Yooseph et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</copyright-statement>
<copyright-year>2007</copyright-year>
<related-article xlink:href="10.1371/journal.pbio.0050085" related-article-type="companion" xlink:title="Synopsis" vol="5" page="e85" id="N0x8eb5648N0x8eaa8a8" ext-link-type="doi">
<article-title>Untapped Bounty: Sampling the Seas to Survey Microbial Biodiversity</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050082" related-article-type="companion" xlink:title="Essay" vol="5" page="e82" id="N0x8eb5648N0x8eaa8d8" ext-link-type="doi">
<article-title>Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050075" related-article-type="companion" xlink:title="Community Page" vol="5" page="e75" id="N0x8eb5648N0x8eaa908" ext-link-type="doi">
<article-title>CAMERA: A Community Resource for Metagenomics</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050017" related-article-type="companion" xlink:title="Research Article" vol="5" page="e17" id="N0x8eb5648N0x8eaa938" ext-link-type="doi">
<article-title>Structural and Functional Diversity of the Microbial Kinome</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050083" related-article-type="companion" xlink:title="Editorial" vol="5" page="e83" id="N0x8eb5648N0x8eaa968" ext-link-type="doi">
<article-title>Global Ocean Sampling Collection</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050077" related-article-type="companion" xlink:title="Research Article" vol="5" page="e77" id="N0x8eb5648N0x8eaa998" ext-link-type="doi">
<article-title>The
<italic>Sorcerer II</italic>
Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific</article-title>
</related-article>
<related-article xlink:href="10.1371/journal.pbio.0050074" related-article-type="companion" xlink:title="Feature" vol="5" page="e74" id="N0x8eb5648N0x8eaa9c8" ext-link-type="doi">
<article-title>
<italic>Sorcerer II:</italic>
The Search for Microbial Diversity Roils the Waters</article-title>
</related-article>
<abstract>
<p>Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.</p>
</abstract>
<abstract abstract-type="summary">
<title>Author Summary</title>
<sec id="st1">
<title></title>
<p>The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the
<italic>Sorcerer II</italic>
Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature.</p>
</sec>
</abstract>
<abstract abstract-type="toc">
<p>The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.</p>
</abstract>
<counts>
<page-count count="35"></page-count>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>citation</meta-name>
<meta-value>Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. (2007) The
<italic>Sorcerer II</italic>
Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol 5(3): e16. doi:
<ext-link ext-link-type="doi" xlink:href="10.1371/journal.pbio.0050016">10.1371/journal.pbio.0050016</ext-link>
</meta-value>
</custom-meta>
<custom-meta>
<meta-name>article-logo</meta-name>
<meta-value>oceaniclogo.jpg</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000592 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000592 | SxmlIndent | more
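
Once the indented record has been saved or captured from the pipe, the PMC front matter can be read with any XML parser. The following is a minimal sketch rather than part of Dilib: it assumes Python's standard `xml.etree.ElementTree`, the inline sample is a shortened copy of the `<contrib-group>` and `<aff>` elements of this record, and the function name `list_authors` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the <pmc> block of this record
# (two of the authors, the editor, and two affiliations).
SAMPLE = """\
<pmc article-type="research-article">
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">17355171</article-id>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Yooseph</surname><given-names>Shibu</given-names></name>
          <xref rid="aff1" ref-type="aff">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Eisen</surname><given-names>Jonathan A</given-names></name>
          <xref rid="aff1" ref-type="aff">1</xref>
          <xref rid="aff2" ref-type="aff">2</xref>
        </contrib>
      </contrib-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <name><surname>Eddy</surname><given-names>Sean</given-names></name>
        </contrib>
      </contrib-group>
      <aff id="aff1"><label>1</label>J. Craig Venter Institute, Rockville, Maryland, United States of America</aff>
      <aff id="aff2"><label>2</label>University of California, Davis, California, United States of America</aff>
    </article-meta>
  </front>
</pmc>
"""


def list_authors(xml_text):
    """Return a list of (full name, [affiliation strings]) for each author."""
    root = ET.fromstring(xml_text)

    # Map affiliation ids to their text. When a <label> child is present,
    # the affiliation text is the tail that follows it.
    affs = {}
    for aff in root.iter("aff"):
        label = aff.find("label")
        text = (label.tail if label is not None else aff.text) or ""
        affs[aff.get("id")] = text.strip()

    authors = []
    for contrib in root.iter("contrib"):
        # Skip editors and any other non-author contributors.
        if contrib.get("contrib-type") != "author":
            continue
        surname = contrib.findtext("name/surname", default="")
        given = contrib.findtext("name/given-names", default="")
        aff_ids = [x.get("rid") for x in contrib.findall("xref")
                   if x.get("ref-type") == "aff"]
        # Fall back to the raw id if an affiliation is not defined.
        authors.append((f"{given} {surname}".strip(),
                        [affs.get(i, i) for i in aff_ids]))
    return authors


if __name__ == "__main__":
    for name, affiliations in list_authors(SAMPLE):
        print(name, "|", "; ".join(affiliations))
```

The full record also carries `xlink:` attributes on its `<related-article>` elements. If the extracted text does not include the matching namespace declaration, a strict parser such as this one will reject it, so the declaration may need to be added to the root element first.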

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:1821046
   |texte=   The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:17355171" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024