Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Protein structure determination by exhaustive search of Protein Data Bank derived databases.

Internal identifier: 000059 (PubMed/Checkpoint); previous: 000058; next: 000060

Authors: Ian Stokes-Rees [United States]; Piotr Sliz

Source:

RBID : pubmed:21098306

Abstract

Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

DOI: 10.1073/pnas.1012095107
PubMed: 21098306


Links toward previous steps (curation, corpus...)


Links to Exploration step

pubmed:21098306

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Protein structure determination by exhaustive search of Protein Data Bank derived databases.</title>
<author>
<name sortKey="Stokes Rees, Ian" sort="Stokes Rees, Ian" uniqKey="Stokes Rees I" first="Ian" last="Stokes-Rees">Ian Stokes-Rees</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Sliz, Piotr" sort="Sliz, Piotr" uniqKey="Sliz P" first="Piotr" last="Sliz">Piotr Sliz</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2010">2010</date>
<idno type="doi">10.1073/pnas.1012095107</idno>
<idno type="RBID">pubmed:21098306</idno>
<idno type="pmid">21098306</idno>
<idno type="wicri:Area/PubMed/Corpus">000060</idno>
<idno type="wicri:Area/PubMed/Curation">000060</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000060</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Protein structure determination by exhaustive search of Protein Data Bank derived databases.</title>
<author>
<name sortKey="Stokes Rees, Ian" sort="Stokes Rees, Ian" uniqKey="Stokes Rees I" first="Ian" last="Stokes-Rees">Ian Stokes-Rees</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Sliz, Piotr" sort="Sliz, Piotr" uniqKey="Sliz P" first="Piotr" last="Sliz">Piotr Sliz</name>
</author>
</analytic>
<series>
<title level="j">Proceedings of the National Academy of Sciences of the United States of America</title>
<idno type="eISSN">1091-6490</idno>
<imprint>
<date when="2010" type="published">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Amino Acid Sequence</term>
<term>Animals</term>
<term>Computational Biology (methods)</term>
<term>Databases, Protein</term>
<term>Mice</term>
<term>Models, Molecular</term>
<term>Molecular Sequence Data</term>
<term>Protein Conformation</term>
<term>Proteins (chemistry)</term>
<term>Proteins (genetics)</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Homology, Amino Acid</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="chemistry" xml:lang="en">
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>Proteins</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>Sequence Alignment</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Amino Acid Sequence</term>
<term>Animals</term>
<term>Databases, Protein</term>
<term>Mice</term>
<term>Models, Molecular</term>
<term>Molecular Sequence Data</term>
<term>Protein Conformation</term>
<term>Sequence Homology, Amino Acid</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aide of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">21098306</PMID>
<DateCreated>
<Year>2011</Year>
<Month>03</Month>
<Day>22</Day>
</DateCreated>
<DateCompleted>
<Year>2011</Year>
<Month>05</Month>
<Day>12</Day>
</DateCompleted>
<DateRevised>
<Year>2015</Year>
<Month>02</Month>
<Day>05</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1091-6490</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>107</Volume>
<Issue>50</Issue>
<PubDate>
<Year>2010</Year>
<Month>Dec</Month>
<Day>14</Day>
</PubDate>
</JournalIssue>
<Title>Proceedings of the National Academy of Sciences of the United States of America</Title>
<ISOAbbreviation>Proc. Natl. Acad. Sci. U.S.A.</ISOAbbreviation>
</Journal>
<ArticleTitle>Protein structure determination by exhaustive search of Protein Data Bank derived databases.</ArticleTitle>
<Pagination>
<MedlinePgn>21476-81</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1073/pnas.1012095107</ELocationID>
<Abstract>
<AbstractText>Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aide of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Stokes-Rees</LastName>
<ForeName>Ian</ForeName>
<Initials>I</Initials>
<AffiliationInfo>
<Affiliation>Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Sliz</LastName>
<ForeName>Piotr</ForeName>
<Initials>P</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>P01 GM062580</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS NIH HHS</Agency>
<Country>United States</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D052061">Research Support, N.I.H., Extramural</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2010</Year>
<Month>11</Month>
<Day>22</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Proc Natl Acad Sci U S A</MedlineTA>
<NlmUniqueID>7505876</NlmUniqueID>
<ISSNLinking>0027-8424</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D011506">Proteins</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>Nucleic Acids Res. 2000 Jan 1;28(1):235-42</RefSource>
<PMID Version="1">10592235</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr D Biol Crystallogr. 2001 Oct;57(Pt 10):1428-34</RefSource>
<PMID Version="1">11567156</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nat Struct Biol. 2003 Oct;10(10):856-63</RefSource>
<PMID Version="1">12949490</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Struct Biol. 2003 Dec;144(3):337-48</RefSource>
<PMID Version="1">14643202</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr A. 1991 Mar 1;47 ( Pt 2):110-9</RefSource>
<PMID Version="1">2025413</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Mol Biol. 1995 Apr 7;247(4):536-40</RefSource>
<PMID Version="1">7723011</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nucleic Acids Res. 2005;33(7):2302-9</RefSource>
<PMID Version="1">15849316</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr D Biol Crystallogr. 2008 Jan;64(Pt 1):119-24</RefSource>
<PMID Version="1">18094475</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr D Biol Crystallogr. 2008 Jan;64(Pt 1):125-32</RefSource>
<PMID Version="1">18094476</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr D Biol Crystallogr. 2008 Jan;64(Pt 1):133-40</RefSource>
<PMID Version="1">18094477</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nucleic Acids Res. 2008 Jan;36(Database issue):D419-25</RefSource>
<PMID Version="1">18000004</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Immunity. 2008 Feb;28(2):171-82</RefSource>
<PMID Version="1">18275829</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Mol Biol. 2008 Apr 4;377(4):1265-78</RefSource>
<PMID Version="1">18313074</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Structure. 2008 May;16(5):715-26</RefSource>
<PMID Version="1">18462676</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biol Chem. 2008 May 16;283(20):13745-52</RefSource>
<PMID Version="1">18332143</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Methods Mol Biol. 2008;426:419-35</RefSource>
<PMID Version="1">18542881</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>J Biol Chem. 2008 Oct 17;283(42):28710-20</RefSource>
<PMID Version="1">18701448</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Protein Sci. 2009 Jun;18(6):1306-15</RefSource>
<PMID Version="1">19472362</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Acta Crystallogr D Biol Crystallogr. 2010 Jan;66(Pt 1):22-5</RefSource>
<PMID Version="1">20057045</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nature. 2010 Apr 22;464(7292):1218-22</RefSource>
<PMID Version="1">20376006</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D000595">Amino Acid Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D000818">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D019295">Computational Biology</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D030562">Databases, Protein</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D051379">Mice</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D008958">Models, Molecular</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D008969">Molecular Sequence Data</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y" UI="D011487">Protein Conformation</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D011506">Proteins</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000737">chemistry</QualifierName>
<QualifierName MajorTopicYN="N" UI="Q000235">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D016415">Sequence Alignment</DescriptorName>
<QualifierName MajorTopicYN="N" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D017386">Sequence Homology, Amino Acid</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<OtherID Source="NLM">PMC3003117</OtherID>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="aheadofprint">
<Year>2010</Year>
<Month>11</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2010</Year>
<Month>11</Month>
<Day>25</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2010</Year>
<Month>11</Month>
<Day>26</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2011</Year>
<Month>5</Month>
<Day>13</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pii">1012095107</ArticleId>
<ArticleId IdType="doi">10.1073/pnas.1012095107</ArticleId>
<ArticleId IdType="pubmed">21098306</ArticleId>
<ArticleId IdType="pmc">PMC3003117</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Massachusetts</li>
</region>
</list>
<tree>
<noCountry>
<name sortKey="Sliz, Piotr" sort="Sliz, Piotr" uniqKey="Sliz P" first="Piotr" last="Sliz">Piotr Sliz</name>
</noCountry>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Stokes Rees, Ian" sort="Stokes Rees, Ian" uniqKey="Stokes Rees I" first="Ian" last="Stokes-Rees">Ian Stokes-Rees</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000059 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000059 | SxmlIndent | more
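
Once a record has been extracted, its `<pubmed>` part can also be read with a general-purpose XML parser. The following is a minimal sketch in Python, not part of Dilib: the function name `summarize_pubmed` is hypothetical, and the input is a trimmed copy of the `<pubmed>` element from the record above. It deliberately skips the `<TEI>` part, whose `wicri:` and `nlm:` prefixes are not declared in this listing and may be rejected by a namespace-aware parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <pubmed> element from the record on this page.
PUBMED_FRAGMENT = """
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">21098306</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1091-6490</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>107</Volume>
<Issue>50</Issue>
</JournalIssue>
</Journal>
<ArticleTitle>Protein structure determination by exhaustive search of Protein Data Bank derived databases.</ArticleTitle>
<Pagination>
<MedlinePgn>21476-81</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1073/pnas.1012095107</ELocationID>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize_pubmed(xml_text):
    """Return a few citation fields from a <pubmed> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    citation = root.find("MedlineCitation")
    article = citation.find("Article")
    return {
        "pmid": citation.findtext("PMID"),
        "title": article.findtext("ArticleTitle"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "volume": article.findtext("Journal/JournalIssue/Volume"),
        "issue": article.findtext("Journal/JournalIssue/Issue"),
        "pages": article.findtext("Pagination/MedlinePgn"),
    }


summary = summarize_pubmed(PUBMED_FRAGMENT)
for field, value in summary.items():
    print(f"{field}: {value}")
```

The same function should work on the complete `<pubmed>` element, since it only looks up child elements by path and ignores everything else.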

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:21098306
   |texte=   Protein structure determination by exhaustive search of Protein Data Bank derived databases.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:21098306" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024