Exploration server on the LRGP

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

Internal identifier: 000093 (Pmc/Corpus); previous: 0000929; next: 0000940



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Selection of representative protein data sets.</title>
<author>
<name sortKey="Hobohm, U" sort="Hobohm, U" uniqKey="Hobohm U" first="U." last="Hobohm">U. Hobohm</name>
</author>
<author>
<name sortKey="Scharf, M" sort="Scharf, M" uniqKey="Scharf M" first="M." last="Scharf">M. Scharf</name>
</author>
<author>
<name sortKey="Schneider, R" sort="Schneider, R" uniqKey="Schneider R" first="R." last="Schneider">R. Schneider</name>
</author>
<author>
<name sortKey="Sander, C" sort="Sander, C" uniqKey="Sander C" first="C." last="Sander">C. Sander</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">1304348</idno>
<idno type="pmc">2142204</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2142204</idno>
<idno type="RBID">PMC:2142204</idno>
<date when="1992">1992</date>
<idno type="wicri:Area/Pmc/Corpus">000093</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000093</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Selection of representative protein data sets.</title>
<author>
<name sortKey="Hobohm, U" sort="Hobohm, U" uniqKey="Hobohm U" first="U." last="Hobohm">U. Hobohm</name>
</author>
<author>
<name sortKey="Scharf, M" sort="Scharf, M" uniqKey="Scharf M" first="M." last="Scharf">M. Scharf</name>
</author>
<author>
<name sortKey="Schneider, R" sort="Schneider, R" uniqKey="Schneider R" first="R." last="Schneider">R. Schneider</name>
</author>
<author>
<name sortKey="Sander, C" sort="Sander, C" uniqKey="Sander C" first="C." last="Sander">C. Sander</name>
</author>
</analytic>
<series>
<title level="j">Protein Science : A Publication of the Protein Society</title>
<idno type="ISSN">0961-8368</idno>
<idno type="eISSN">1469-896X</idno>
<imprint>
<date when="1992">1992</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.</p>
</div>
</front>
</TEI>
<pmc article-type="research-article">
<pmc-comment>The publisher of this article does not allow downloading of the full text in XML form.</pmc-comment>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Protein Sci</journal-id>
<journal-id journal-id-type="pmc">prosci</journal-id>
<journal-title>Protein Science : A Publication of the Protein Society</journal-title>
<issn pub-type="ppub">0961-8368</issn>
<issn pub-type="epub">1469-896X</issn>
<publisher>
<publisher-name>Cold Spring Harbor Laboratory Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">1304348</article-id>
<article-id pub-id-type="pmc">2142204</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Selection of representative protein data sets.</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Hobohm</surname>
<given-names>U.</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Scharf</surname>
<given-names>M.</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schneider</surname>
<given-names>R.</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sander</surname>
<given-names>C.</given-names>
</name>
</contrib>
</contrib-group>
<aff>European Molecular Biology Laboratory, Heidelberg, Germany.</aff>
<pub-date pub-type="ppub">
<month>3</month>
<year>1992</year>
</pub-date>
<volume>1</volume>
<issue>3</issue>
<fpage>409</fpage>
<lpage>417</lpage>
<abstract>
<p>The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.</p>
</abstract>
</article-meta>
</front>
</pmc>
</record>
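The abstract in the record above describes the two selection algorithms only in words. The following is a minimal Python sketch of both, written for illustration and not taken from the paper. It makes three assumptions:

- Pairwise alignments have already been computed and summarized as a hypothetical (fraction identical, aligned length) pair.
- The cutoff quoted in the abstract is applied as a deliberately simplified fixed threshold: more than 30% identity over more than 80 aligned residues. Shorter alignments are simply treated as non-redundant here.
- The names `is_neighbor`, `build_graph`, `select_greedy` and `select_by_thinning`, and the tie-breaking rule in the thinning step, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical alignment summary: (fraction of identical residues, aligned length).
Alignment = Tuple[float, int]


def is_neighbor(aln: Alignment, min_identity: float = 0.30, min_length: int = 80) -> bool:
    """Simplified, fixed-threshold reading of the abstract's cutoff (an assumption):
    two chains are redundant if >30% identical over >80 aligned residues."""
    identity, length = aln
    return identity > min_identity and length > min_length


def build_graph(chains: List[str],
                alignments: Dict[Tuple[str, str], Alignment]) -> Dict[str, Set[str]]:
    """Undirected similarity graph: an edge joins two chains judged redundant."""
    graph: Dict[str, Set[str]] = {c: set() for c in chains}
    for (a, b), aln in alignments.items():
        if a != b and is_neighbor(aln):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def select_greedy(ranked: List[str], graph: Dict[str, Set[str]]) -> List[str]:
    """First-algorithm style: walk a list ordered by some quality property,
    keep each chain not yet excluded, and exclude all of its neighbors."""
    selected: List[str] = []
    excluded: Set[str] = set()
    for chain in ranked:
        if chain in excluded:
            continue
        selected.append(chain)
        excluded |= graph[chain]
    return selected


def select_by_thinning(graph: Dict[str, Set[str]]) -> List[str]:
    """Second-algorithm style: repeatedly drop the chain with the most remaining
    neighbors until no two remaining chains are similar."""
    remaining = {c: set(n) for c, n in graph.items()}  # copy; input is not mutated
    while remaining:
        # max() returns the first maximum, so sorting makes ties deterministic.
        worst = max(sorted(remaining), key=lambda c: len(remaining[c]))
        if not remaining[worst]:
            break  # no edges left: the remaining set is nonredundant
        for other in remaining.pop(worst):
            remaining[other].discard(worst)
    return sorted(remaining)


if __name__ == "__main__":
    # Made-up chains and alignment summaries, for illustration only.
    g = build_graph(["A", "B", "C", "D"], {
        ("A", "B"): (0.95, 120),  # near-identical: redundant
        ("A", "C"): (0.40, 150),  # above cutoff: redundant
        ("B", "C"): (0.25, 150),  # below identity cutoff
        ("C", "D"): (0.50, 60),   # alignment too short
    })
    print(select_greedy(["A", "B", "C", "D"], g))
    print(select_by_thinning(g))
```

With these made-up chains, A is similar to both B and C. Ranking A first makes `select_greedy` keep `['A', 'D']`. By contrast, `select_by_thinning` removes the hub A and keeps `['B', 'C', 'D']`. This mirrors the abstract's point that the second algorithm aims at maximizing the size of the selected set, while the first favors whichever chains rank highest on the chosen property.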

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/LrgpV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000093  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000093  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    LrgpV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 15:47:48 2017. Site generation: Wed Mar 6 23:31:34 2024