Cyberinfrastructure exploration server

SANSparallel: interactive homology search against Uniprot

Internal identifier: 000063 (Pmc/Corpus); previous: 000062; next: 000064

Authors: Panu Somervuo; Liisa Holm

RBID : PMC:4489265

Abstract

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.


DOI: 10.1093/nar/gkv317
PubMed: 25855811
PubMed Central: 4489265

Links to Exploration step

PMC:4489265

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">SANSparallel: interactive homology search against Uniprot</title>
<author>
<name sortKey="Somervuo, Panu" sort="Somervuo, Panu" uniqKey="Somervuo P" first="Panu" last="Somervuo">Panu Somervuo</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Biotechnology, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF2">Department of Biosciences, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Holm, Liisa" sort="Holm, Liisa" uniqKey="Holm L" first="Liisa" last="Holm">Liisa Holm</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Biotechnology, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF2">Department of Biosciences, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25855811</idno>
<idno type="pmc">4489265</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489265</idno>
<idno type="RBID">PMC:4489265</idno>
<idno type="doi">10.1093/nar/gkv317</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000063</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">SANSparallel: interactive homology search against Uniprot</title>
<author>
<name sortKey="Somervuo, Panu" sort="Somervuo, Panu" uniqKey="Somervuo P" first="Panu" last="Somervuo">Panu Somervuo</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Biotechnology, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF2">Department of Biosciences, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Holm, Liisa" sort="Holm, Liisa" uniqKey="Holm L" first="Liisa" last="Holm">Liisa Holm</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Biotechnology, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="AFF2">Department of Biosciences, University of Helsinki, PO Box 65, Finland</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi">http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi</ext-link>
. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, S F" uniqKey="Altschul S">S.F. Altschul</name>
</author>
<author>
<name sortKey="Madden, T L" uniqKey="Madden T">T.L. Madden</name>
</author>
<author>
<name sortKey="Sch Ffer, A A" uniqKey="Sch Ffer A">A.A. Schäffer</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J. Zhang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z. Zhang</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W. Miller</name>
</author>
<author>
<name sortKey="Lipman, D J" uniqKey="Lipman D">D.J. Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mcginnis, S" uniqKey="Mcginnis S">S. McGinnis</name>
</author>
<author>
<name sortKey="Madden, T L" uniqKey="Madden T">T.L. Madden</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Finn, R D" uniqKey="Finn R">R.D. Finn</name>
</author>
<author>
<name sortKey="Clements, J" uniqKey="Clements J">J. Clements</name>
</author>
<author>
<name sortKey="Eddy, S R" uniqKey="Eddy S">S.R. Eddy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, S" uniqKey="Sun S">S. Sun</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J. Chen</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W. Li</name>
</author>
<author>
<name sortKey="Altinatas, I" uniqKey="Altinatas I">I. Altinatas</name>
</author>
<author>
<name sortKey="Lin, A" uniqKey="Lin A">A. Lin</name>
</author>
<author>
<name sortKey="Peltier, S" uniqKey="Peltier S">S. Peltier</name>
</author>
<author>
<name sortKey="Stocks, K" uniqKey="Stocks K">K. Stocks</name>
</author>
<author>
<name sortKey="Allen, E E" uniqKey="Allen E">E.E. Allen</name>
</author>
<author>
<name sortKey="Ellisman, M" uniqKey="Ellisman M">M. Ellisman</name>
</author>
<author>
<name sortKey="Grethe, J" uniqKey="Grethe J">J. Grethe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heger, A" uniqKey="Heger A">A. Heger</name>
</author>
<author>
<name sortKey="Korpelainen, E" uniqKey="Korpelainen E">E. Korpelainen</name>
</author>
<author>
<name sortKey="Hupponen, T" uniqKey="Hupponen T">T. Hupponen</name>
</author>
<author>
<name sortKey="Mattila, K" uniqKey="Mattila K">K. Mattila</name>
</author>
<author>
<name sortKey="Ollikainen, V" uniqKey="Ollikainen V">V. Ollikainen</name>
</author>
<author>
<name sortKey="Holm, L" uniqKey="Holm L">L. Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rattei, T" uniqKey="Rattei T">T. Rattei</name>
</author>
<author>
<name sortKey="Arnold, R" uniqKey="Arnold R">R. Arnold</name>
</author>
<author>
<name sortKey="Tischler, P" uniqKey="Tischler P">P. Tischler</name>
</author>
<author>
<name sortKey="Lindner, D" uniqKey="Lindner D">D. Lindner</name>
</author>
<author>
<name sortKey="Stumpflen, V" uniqKey="Stumpflen V">V. Stümpflen</name>
</author>
<author>
<name sortKey="Mewes, H W" uniqKey="Mewes H">H.W. Mewes</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koskinen, P" uniqKey="Koskinen P">P. Koskinen</name>
</author>
<author>
<name sortKey="Holm, L" uniqKey="Holm L">L. Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kent, W J" uniqKey="Kent W">W.J. Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y. Zhao</name>
</author>
<author>
<name sortKey="Tang, H" uniqKey="Tang H">H. Tang</name>
</author>
<author>
<name sortKey="Ye, Y" uniqKey="Ye Y">Y. Ye</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hauswedell, H" uniqKey="Hauswedell H">H. Hauswedell</name>
</author>
<author>
<name sortKey="Singer, J" uniqKey="Singer J">J. Singer</name>
</author>
<author>
<name sortKey="Reinert, K" uniqKey="Reinert K">K. Reinert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kielbasa, S M" uniqKey="Kielbasa S">S.M. Kielbasa</name>
</author>
<author>
<name sortKey="Wan, R" uniqKey="Wan R">R. Wan</name>
</author>
<author>
<name sortKey="Sato, K" uniqKey="Sato K">K. Sato</name>
</author>
<author>
<name sortKey="Horton, P" uniqKey="Horton P">P. Horton</name>
</author>
<author>
<name sortKey="Frith, M C" uniqKey="Frith M">M.C. Frith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Robert C" uniqKey="Edgar R">Robert C. Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buchfink, B" uniqKey="Buchfink B">B. Buchfink</name>
</author>
<author>
<name sortKey="Xie, C" uniqKey="Xie C">C. Xie</name>
</author>
<author>
<name sortKey="Huson, D H" uniqKey="Huson D">D.H. Huson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Roytberg, M" uniqKey="Roytberg M">M. Roytberg</name>
</author>
<author>
<name sortKey="Gambin, A" uniqKey="Gambin A">A. Gambin</name>
</author>
<author>
<name sortKey="Noe, L" uniqKey="Noe L">L. Noé</name>
</author>
<author>
<name sortKey="Lasota, S" uniqKey="Lasota S">S. Lasota</name>
</author>
<author>
<name sortKey="Furletova, E" uniqKey="Furletova E">E. Furletova</name>
</author>
<author>
<name sortKey="Szczurek, E" uniqKey="Szczurek E">E. Szczurek</name>
</author>
<author>
<name sortKey="Kucherov, G" uniqKey="Kucherov G">G. Kucherov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pearson, W R" uniqKey="Pearson W">W.R. Pearson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, N P" uniqKey="Brown N">N.P. Brown</name>
</author>
<author>
<name sortKey="Leroy, C" uniqKey="Leroy C">C. Leroy</name>
</author>
<author>
<name sortKey="Sander, C" uniqKey="Sander C">C. Sander</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, T J" uniqKey="Wheeler T">T.J. Wheeler</name>
</author>
<author>
<name sortKey="Clements, J" uniqKey="Clements J">J. Clements</name>
</author>
<author>
<name sortKey="Finn, R D" uniqKey="Finn R">R.D. Finn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waterhouse, A M" uniqKey="Waterhouse A">A.M. Waterhouse</name>
</author>
<author>
<name sortKey="Procter, J B" uniqKey="Procter J">J.B. Procter</name>
</author>
<author>
<name sortKey="Martin, D M" uniqKey="Martin D">D.M. Martin</name>
</author>
<author>
<name sortKey="Clamp, M" uniqKey="Clamp M">M. Clamp</name>
</author>
<author>
<name sortKey="Barton, G J" uniqKey="Barton G">G.J. Barton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Korf, I" uniqKey="Korf I">I. Korf</name>
</author>
<author>
<name sortKey="Yandell, M" uniqKey="Yandell M">M. Yandell</name>
</author>
<author>
<name sortKey="Bedell, J" uniqKey="Bedell J">J. Bedell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holm, L" uniqKey="Holm L">L. Holm</name>
</author>
<author>
<name sortKey="Rosenstrom, P" uniqKey="Rosenstrom P">P. Rosenström</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garlant, L" uniqKey="Garlant L">L. Garlant</name>
</author>
<author>
<name sortKey="Koskinen, P" uniqKey="Koskinen P">P. Koskinen</name>
</author>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y. Liu</name>
</author>
<author>
<name sortKey="Nykyri, J" uniqKey="Nykyri J">J. Nykyri</name>
</author>
<author>
<name sortKey="Ahamed, S" uniqKey="Ahamed S">S. Ahamed</name>
</author>
<author>
<name sortKey="Rouhiainen, L" uniqKey="Rouhiainen L">L. Rouhiainen</name>
</author>
<author>
<name sortKey="Laine, P" uniqKey="Laine P">P. Laine</name>
</author>
<author>
<name sortKey="Paulin, L" uniqKey="Paulin L">L. Paulin</name>
</author>
<author>
<name sortKey="Auvinen, P" uniqKey="Auvinen P">P. Auvinen</name>
</author>
<author>
<name sortKey="Holm, L" uniqKey="Holm L">L. Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ahola, V" uniqKey="Ahola V">V. Ahola</name>
</author>
<author>
<name sortKey="Lehtonen, R" uniqKey="Lehtonen R">R. Lehtonen</name>
</author>
<author>
<name sortKey="Somervuo, P" uniqKey="Somervuo P">P. Somervuo</name>
</author>
<author>
<name sortKey="Salmela, L" uniqKey="Salmela L">L. Salmela</name>
</author>
<author>
<name sortKey="Koskinen, P" uniqKey="Koskinen P">P. Koskinen</name>
</author>
<author>
<name sortKey="Rastas, P" uniqKey="Rastas P">P. Rastas</name>
</author>
<author>
<name sortKey="V Lim Ki, N" uniqKey="V Lim Ki N">N. Välimäki</name>
</author>
<author>
<name sortKey="Paulin, L" uniqKey="Paulin L">L. Paulin</name>
</author>
<author>
<name sortKey="Kvist, J" uniqKey="Kvist J">J. Kvist</name>
</author>
<author>
<name sortKey="Wahlberg, N" uniqKey="Wahlberg N">N. Wahlberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koskinen, P" uniqKey="Koskinen P">P. Koskinen</name>
</author>
<author>
<name sortKey="Toronen, P" uniqKey="Toronen P">P. Toronen</name>
</author>
<author>
<name sortKey="Nokso Koivisto, J" uniqKey="Nokso Koivisto J">J. Nokso-Koivisto</name>
</author>
<author>
<name sortKey="Holm, L" uniqKey="Holm L">L. Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Donoghue, S I" uniqKey="O Donoghue S">S.I. O'Donoghue</name>
</author>
<author>
<name sortKey="Sabir, K S" uniqKey="Sabir K">K.S. Sabir</name>
</author>
<author>
<name sortKey="Kalemanov, M" uniqKey="Kalemanov M">M. Kalemanov</name>
</author>
<author>
<name sortKey="Stolte, C" uniqKey="Stolte C">C. Stolte</name>
</author>
<author>
<name sortKey="Wellmann, B" uniqKey="Wellmann B">B. Wellmann</name>
</author>
<author>
<name sortKey="Ho, V" uniqKey="Ho V">V. Ho</name>
</author>
<author>
<name sortKey="Roos, M" uniqKey="Roos M">M. Roos</name>
</author>
<author>
<name sortKey="Perdigao, N" uniqKey="Perdigao N">N. Perdigão</name>
</author>
<author>
<name sortKey="Buske, F A" uniqKey="Buske F">F.A. Buske</name>
</author>
<author>
<name sortKey="Heinrich, J" uniqKey="Heinrich J">J. Heinrich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rost, B" uniqKey="Rost B">B. Rost</name>
</author>
<author>
<name sortKey="Sander, C" uniqKey="Sander C">C. Sander</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="hwp">nar</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25855811</article-id>
<article-id pub-id-type="pmc">4489265</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkv317</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Web Server issue</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>SANSparallel: interactive homology search against Uniprot</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Somervuo</surname>
<given-names>Panu</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="AFF2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Holm</surname>
<given-names>Liisa</given-names>
</name>
<xref ref-type="aff" rid="AFF1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="AFF2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
<aff id="AFF1">
<label>1</label>
Institute of Biotechnology, University of Helsinki, PO Box 65, Finland</aff>
<aff id="AFF2">
<label>2</label>
Department of Biosciences, University of Helsinki, PO Box 65, Finland</aff>
</contrib-group>
<author-notes>
<corresp id="COR1">
<label>*</label>
To whom correspondence should be addressed. Tel: +358 294 191 59115; Fax: +358 294 59366; Email:
<email>liisa.holm@helsinki.fi</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>01</day>
<month>7</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>08</day>
<month>4</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>08</day>
<month>4</month>
<year>2015</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>43</volume>
<issue>Web Server issue</issue>
<fpage>W24</fpage>
<lpage>W29</lpage>
<history>
<date date-type="accepted">
<day>28</day>
<month>3</month>
<year>2015</year>
</date>
<date date-type="rev-recd">
<day>18</day>
<month>3</month>
<year>2015</year>
</date>
<date date-type="received">
<day>05</day>
<month>2</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:title="pdf" xlink:href="gkv317.pdf"></self-uri>
<abstract>
<p>Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi">http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi</ext-link>
. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.</p>
</abstract>
<counts>
<page-count count="6"></page-count>
</counts>
<custom-meta-group>
<custom-meta>
<meta-name>cover-date</meta-name>
<meta-value>1 July 2015</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="SEC1">
<title>INTRODUCTION</title>
<p>Recent years have witnessed a remarkable growth in the number of sequences. This has made database searches (
<xref rid="B1" ref-type="bibr">1</xref>
–
<xref rid="B4" ref-type="bibr">4</xref>
) take longer and longer and forced free computing services and pre-computed databases to close down or resort to crowd-sourcing (
<xref rid="B5" ref-type="bibr">5</xref>
–
<xref rid="B7" ref-type="bibr">7</xref>
). SANSparallel is a web server that takes protein sequences as input and returns an approximate set of closest sequence neighbors in the blink of an eye. At the core of our web server is a fast database search engine that only takes a fraction of a second to compare a query protein against 90 million sequences in Uniprot (
<xref rid="B8" ref-type="bibr">8</xref>
). SANSparallel is a re-implemented, improved and parallelized version of our previous suffix array neighborhood search (SANS) algorithm (
<xref rid="B9" ref-type="bibr">9</xref>
). It belongs to a new generation of fast database search programs that index the database so that short words (seeds) matching the query can be found efficiently and independently of database size (
<xref rid="B10" ref-type="bibr">10</xref>
–
<xref rid="B15" ref-type="bibr">15</xref>
). Similar sequences can then be identified by seed extension or by counting how many seeds match one database protein. Suffix arrays bring the advantage that seed length can be adapted to increase selectivity. On the other hand, spaced seeds and reduced alphabets have been introduced to increase sensitivity (
<xref rid="B16" ref-type="bibr">16</xref>
). Programs implementing these techniques are orders of magnitude faster than BLAST. However, it is hard to match BLAST's sensitivity. These approaches are very suitable for mapping problems, where the match is very close and gives a clear signal. We have found previously that the approach works reliably in protein database searches above 50% sequence identity (
<xref rid="B9" ref-type="bibr">9</xref>
). Here, we present more benchmarking and show that SANSparallel is highly competitive in comparison with recently published programs.</p>
</sec>
<sec sec-type="materials|methods" id="SEC2">
<title>MATERIALS AND METHODS</title>
<sec id="SEC2-1">
<title>System architecture</title>
<p>SANSparallel runs as a client and a server. The server holds the database in memory and performs the search. We have a separate server for each database. Client processes connect to the server and transmit the query sequence to the server and the result to the user. Multiple clients can connect to the server. Concurrent clients are served one query at a time in round-robin fashion. From the users’ perspective this means that the time it takes to process a query increases linearly with server load, but all users experience similar speed. Linearity of response times was maintained up to at least 100 concurrent clients (data not shown).</p>
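<p>The linear growth of response time under round-robin service can be illustrated with a small simulation. This is only a sketch, assuming that every query takes the same fixed service time and that each client submits a new query as soon as it receives a result; the function name <monospace>simulate</monospace> and its parameters are hypothetical and do not reflect the server's actual code.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import deque


def simulate(n_clients, rounds, service_time_ms):
    """Return per-query latencies (ms) for one server cycling over clients.

    Assumptions (illustrative only): a fixed service time per query, and
    every client resubmits immediately after receiving its result.
    """
    clock = 0
    submitted = [0] * n_clients      # submission time of each pending query
    latencies = []
    queue = deque(range(n_clients))  # round-robin order of clients
    for _ in range(rounds * n_clients):
        client = queue.popleft()
        clock += service_time_ms     # the server processes one query
        latencies.append(clock - submitted[client])
        submitted[client] = clock    # the client sends its next query
        queue.append(client)
    return latencies


# Steady-state latency (last query served) for increasing load.
for n in (1, 10, 100):
    print(n, simulate(n, rounds=3, service_time_ms=50)[-1])
```
]]></preformat>
<p>Under these assumptions, the steady-state latency equals the number of concurrent clients multiplied by the service time, which matches the linear behaviour described above.</p>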
<p>Underlying the web server is a CGI script which calls the client program with appropriate options and post-processes the database search results into the desired output format (Figure
<xref ref-type="fig" rid="F1">1</xref>
). Some processing steps use third-party software. The primary result from SANSparallel is a set of sequence-similar proteins retrieved from the database. Pairwise alignments between this set of sequences and the query sequence are generated using FASTA (
<xref rid="B17" ref-type="bibr">17</xref>
). The same program is used to output a BLAST-like report. The pairwise alignments are stacked against the query sequence, omitting insertions to generate gapped alignments. The stacked alignment can be colorized by Mview (
<xref rid="B18" ref-type="bibr">18</xref>
) or sent to Skylign (
<xref rid="B19" ref-type="bibr">19</xref>
) to generate a sequence logo. Aligned or unaligned sequences can be output in FASTA format and sent to Jalview (
<xref rid="B20" ref-type="bibr">20</xref>
) for alignment visualization and editing. Our server does not provide multiple sequence alignments as this can be very time consuming. Instead, multiple sequence alignments can be requested from Jalview Desktop's web service menu. The response of the server is immediate and no user data or results are stored on disk except for results viewed with the Jalview applet, which requires file input.</p>
<fig id="F1" orientation="portrait" position="float">
<label>Figure 1.</label>
<caption>
<p>Flowchart of the SANSparallel web server. Computations done by the web server are blue. Results sent to the user include textual outputs (green) and alignment visualizations (orange). Multiple alignment (instantiated from Jalview Desktop) and sequence logo computations utilize third party resources in the cloud (pink).</p>
</caption>
<graphic xlink:href="gkv317fig1"></graphic>
</fig>
<p>SANSparallel was developed on Linux and parallelized using Open MPI. The web server runs on a cluster of computers with 500-Gb memory and 64 cores. SANSparallel was written in Fortran using legacy code from SANS (
<xref rid="B9" ref-type="bibr">9</xref>
), socket communications in C and the CGI script in Perl. Storage of the database in memory and additional work space take about 9 bytes per amino acid.</p>
</sec>
<sec id="SEC2-2">
<title>Database search algorithm</title>
<p>SANSparallel is a re-implemented, improved and parallelized version of the suffix array neighborhood search algorithm SANS (
<xref rid="B9" ref-type="bibr">9</xref>
). Briefly, the algorithm accumulates a vote for database proteins that are found within a window of the position where a suffix of the query sequence would be inserted in the suffix array of the database. Database proteins with the highest votes are collected and, optionally, aligned and re-sorted by the alignment score. The following changes were introduced: (i) A binary search to find the suffix array insertion position replaces the original mergesort. This enables searching single query sequences instead of the original batch processing. (ii) Votes are summed over diagonal bands rather than the whole protein. This improves selectivity. A similar strategy is used in the FASTA algorithm (
<xref rid="B17" ref-type="bibr">17</xref>
). (iii) Alignments are computed by dynamic programming in a diagonal band. This replaces the original program's greedy algorithm for combining high-scoring segment pairs. E-values are computed from the alignment score using Karlin–Altschul statistics (
<xref rid="B21" ref-type="bibr">21</xref>
). (iv) There is a positive but not perfect correlation between the vote and pairwise alignment score. An option was added to keep moving down the sorted list of database proteins until the H
<sup>th</sup>
-best alignment score remains stable. This results in more closely similar hits in the output. (v) The program was parallelized using MPI (Message Passing Interface). We chose a micro parallelization strategy in order to achieve fast response times for a single query. One node is reserved for communication with the client. The other nodes are dedicated to the database search. Each node works on a section of the database. The database search nodes go into hibernation when traffic is low. Search speed increased linearly up to 8–16 nodes; above 32 nodes there was not enough work to match communication overheads (data not shown).</p>
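<p>The voting step can be illustrated with a toy sketch. It is not the authors' Fortran implementation: the function names, the window size and the three-protein database are illustrative assumptions, whole suffixes are stored explicitly (feasible only for a demonstration), and votes are summed per protein rather than over diagonal bands.</p>
<preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_left
from collections import Counter


def build_suffix_array(proteins):
    """Return sorted (suffix, protein_id) pairs for a toy database."""
    entries = []
    for pid, seq in proteins.items():
        for start in range(len(seq)):
            entries.append((seq[start:], pid))
    entries.sort()
    return entries


def vote(query, suffix_array, window=2):
    """Count how often each protein occurs near the insertion position
    of every query suffix in the sorted suffix list."""
    suffixes = [suffix for suffix, _ in suffix_array]
    votes = Counter()
    for start in range(len(query)):
        # Binary search for where this query suffix would be inserted.
        pos = bisect_left(suffixes, query[start:])
        lo = max(0, pos - window)
        hi = min(len(suffix_array), pos + window)
        for _, pid in suffix_array[lo:hi]:
            votes[pid] += 1
    return votes


# Hypothetical toy database; protA and protB differ at one position.
database = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQLSFVKSHFSRQ",
    "protC": "GSSGSSGWWPPLLNNDDEEHHC",
}
sa = build_suffix_array(database)
ranked = vote("MKTAYIAKQRQISFVKSHFSRQ", sa).most_common()
print(ranked)
```
]]></preformat>
<p>In this sketch, the proteins with the highest vote counts in <monospace>ranked</monospace> would be the candidates passed on to the optional alignment step.</p>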
</sec>
<sec id="SEC2-3">
<title>Databases</title>
<p>The Uniprot, UniRef90, UniRef50 and Swissprot databases are downloaded monthly from
<ext-link ext-link-type="ftp" xlink:href="ftp.ebi.ac.uk">ftp.ebi.ac.uk</ext-link>
. The sequences of Protein Data Bank entries are taken weekly from the Dali server (
<xref rid="B22" ref-type="bibr">22</xref>
).</p>
</sec>
<sec id="SEC2-4">
<title>Benchmark data sets</title>
<p>The server was benchmarked using the same test set and database as in (
<xref rid="B9" ref-type="bibr">9</xref>
). The test set consists of 4174 predicted proteins of
<italic>Dickeya solani</italic>
, an emerging plant pathogen (
<xref rid="B23" ref-type="bibr">23</xref>
). The reference database is Uniprot frozen in 2012, which did not yet contain
<italic>D. solani</italic>
. The reference set of TRUE hits was generated using SSEARCH (
<xref rid="B17" ref-type="bibr">17</xref>
) and an e-value cutoff of 1.0. Others have observed that implementations differ and that e-values are not directly comparable between programs (
<xref rid="B12" ref-type="bibr">12</xref>
). Therefore, the programs being evaluated were asked to output their 1000 best hits. Hits found in the reference set were counted as true positives. Most programs compute an e-value for the hits, which operationally eliminates false positives. The hits were also subdivided into bins according to the sequence identity of the pair in the reference set. The wall-clock time to process the test set was also recorded to compare speeds.</p>
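<p>The counting protocol described above can be sketched as follows. The data layout is an assumption made for illustration: <monospace>reference</monospace> maps (query, hit) pairs to the percent sequence identity of the pair in the reference set, and <monospace>predicted</monospace> maps each query to a ranked list of hits. The function name and the 10% bin width are likewise hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def count_true_positives(reference, predicted, top_n=1000, bin_width=10):
    """Count predicted hits found in the reference set, binned by identity.

    reference: dict mapping (query, hit) to percent sequence identity
    predicted: dict mapping query to a list of hits, best first
    Returns a Counter keyed by the lower edge of each identity bin.
    """
    bins = Counter()
    for query, hits in predicted.items():
        for hit in hits[:top_n]:
            identity = reference.get((query, hit))
            if identity is None:
                continue  # not in the reference set, so not a true positive
            lower = int(identity // bin_width) * bin_width
            lower = min(lower, 100 - bin_width)  # 100% goes in the top bin
            bins[lower] += 1
    return bins


# Hypothetical miniature data set.
reference = {("q1", "a"): 92.0, ("q1", "b"): 47.5, ("q2", "c"): 100.0}
predicted = {"q1": ["a", "x", "b"], "q2": ["c"]}
tp = count_true_positives(reference, predicted)
print(sorted(tp.items()))
```
]]></preformat>
<p>Lowering <monospace>top_n</monospace> mimics the comparison between top-1000 and top-500 hit lists: hits ranked below the cutoff are simply not counted.</p>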
<p>BLAST, UBLAST, LAMBDA, RAPSEARCH2 and SANSparallel are natively parallel. LAST was run with GNU parallel using blocksize 36 000. We used pre-compiled LAMBDA v0.4.7, which could not output more than 500 hits per query; this bug was fixed but a new version was not available in time for our benchmarks (Hannes Hauswedell, personal communication). All programs were 64-bit versions except UBLAST, of which only a 32-bit version is freely available. Because of the 32-bit limit, the reference database had to be split into several chunks in order to index it. BLAT also required the reference data to be split into several segments in order to work. The e-value threshold was set to 1.0 in all software where this option was available. In LAST, the score threshold was calculated to correspond to e-value 1.0 and was set accordingly. The LAST parameter –m 500 was used in order to get more hits. Otherwise, default parameters were used.</p>
</sec>
</sec>
<sec sec-type="results" id="SEC3">
<title>RESULTS</title>
<sec id="SEC3-1">
<title>Benchmarking</title>
<p>We tested SANSparallel against BLAST (
<xref rid="B1" ref-type="bibr">1</xref>
), UBLAST (
<xref rid="B14" ref-type="bibr">14</xref>
), LAMBDA (
<xref rid="B12" ref-type="bibr">12</xref>
), LAST (
<xref rid="B13" ref-type="bibr">13</xref>
), DIAMOND (
<xref rid="B15" ref-type="bibr">15</xref>
), BLAT (
<xref rid="B10" ref-type="bibr">10</xref>
) and RAPSEARCH2 (
<xref rid="B11" ref-type="bibr">11</xref>
) using the same benchmark as in (
<xref rid="B9" ref-type="bibr">9</xref>
). Four modes of SANSparallel (verifast, fast, slow and verislow) were used, which differ in the depth and speed of the search. LAMBDA outputs at most 500 hits; therefore, comparisons are shown for both 1000 hits and 500 hits. The performance of all methods is quite similar above 50% sequence identity; differences are mainly seen in the detection of remote homologs below 50% sequence identity (Figure
<xref ref-type="fig" rid="F2">2</xref>
). The sensitivity of UBLAST is closest to that of BLAST. RAPSEARCH2 and BLAT are both slower and less sensitive than at least one competing method. Some aligners have tunable parameters whereby one can arbitrarily trade speed for sensitivity. SANSparallel also gets faster when fewer hits are output (Table
<xref ref-type="table" rid="tbl1">1</xref>
). Considering both speed and sensitivity, a group of four methods emerges with small differences between them: SANSparallel fast mode, DIAMOND, LAMBDA and LAST. Fast is the default mode in the SANSparallel web server.</p>
<fig id="F2" orientation="portrait" position="float">
<label>Figure 2.</label>
<caption>
<p>Benchmark results showing the number of true positives detected in the top-1000 hits and top-500 hits binned by sequence identity.</p>
</caption>
<graphic xlink:href="gkv317fig2"></graphic>
</fig>
<table-wrap id="tbl1" orientation="portrait" position="float">
<label>Table 1.</label>
<caption>
<title>Speed comparison of database search programs: time taken to search 4174 queries of the
<italic>Dickeya solani</italic>
benchmark</title>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Program</th>
<th align="right" rowspan="1" colspan="1">Hits</th>
<th align="right" rowspan="1" colspan="1">Cores</th>
<th align="right" rowspan="1" colspan="1">Time (s)</th>
<th align="right" rowspan="1" colspan="1">Relative speed</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">verifast</td>
<td align="right" rowspan="1" colspan="1">100</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">62</td>
<td align="right" rowspan="1" colspan="1">5903</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">fast</td>
<td align="right" rowspan="1" colspan="1">100</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">65</td>
<td align="right" rowspan="1" colspan="1">5631</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verifast</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">111</td>
<td align="right" rowspan="1" colspan="1">3298</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verifast</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">170</td>
<td align="right" rowspan="1" colspan="1">2153</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">fast</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">178</td>
<td align="right" rowspan="1" colspan="1">2056</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LAMBDA</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">216</td>
<td align="right" rowspan="1" colspan="1">1695</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">slow</td>
<td align="right" rowspan="1" colspan="1">100</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">235</td>
<td align="right" rowspan="1" colspan="1">1558</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">fast</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">324</td>
<td align="right" rowspan="1" colspan="1">1130</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LAST</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16
<sup>a</sup>
</td>
<td align="right" rowspan="1" colspan="1">327</td>
<td align="right" rowspan="1" colspan="1">1119</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">slow</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">406</td>
<td align="right" rowspan="1" colspan="1">902</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">DIAMOND</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">446</td>
<td align="right" rowspan="1" colspan="1">821</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">slow</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">612</td>
<td align="right" rowspan="1" colspan="1">598</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verislow</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">624</td>
<td align="right" rowspan="1" colspan="1">587</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verislow</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">792</td>
<td align="right" rowspan="1" colspan="1">462</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verifast</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">1009</td>
<td align="right" rowspan="1" colspan="1">363</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UBLAST
<sup>b</sup>
</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16
<sup>a</sup>
</td>
<td align="right" rowspan="1" colspan="1">1310</td>
<td align="right" rowspan="1" colspan="1">279</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">RAPSEARCH2</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16</td>
<td align="right" rowspan="1" colspan="1">1469</td>
<td align="right" rowspan="1" colspan="1">249</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LAMBDA</td>
<td align="right" rowspan="1" colspan="1">500</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">2052</td>
<td align="right" rowspan="1" colspan="1">178</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LAST</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">2957</td>
<td align="right" rowspan="1" colspan="1">124</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">fast</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">3297</td>
<td align="right" rowspan="1" colspan="1">111</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">SANS
<sup>c</sup>
</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">3809</td>
<td align="right" rowspan="1" colspan="1">96</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BLAT
<sup>b</sup>
</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">4307</td>
<td align="right" rowspan="1" colspan="1">85</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">slow</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">5015</td>
<td align="right" rowspan="1" colspan="1">73</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">verislow</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">7094</td>
<td align="right" rowspan="1" colspan="1">52</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">RAPSEARCH2</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">18761</td>
<td align="right" rowspan="1" colspan="1">20</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UBLAST
<sup>b</sup>
</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">28399</td>
<td align="right" rowspan="1" colspan="1">13</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BLAST</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">16
<sup>a</sup>
</td>
<td align="right" rowspan="1" colspan="1">32149</td>
<td align="right" rowspan="1" colspan="1">11</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">BLAST</td>
<td align="right" rowspan="1" colspan="1">1000</td>
<td align="right" rowspan="1" colspan="1">1</td>
<td align="right" rowspan="1" colspan="1">366046</td>
<td align="right" rowspan="1" colspan="1">1</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="TFN001">
<p>
<sup>a</sup>
GNUparallel.</p>
</fn>
<fn id="TFN002">
<p>
<sup>b</sup>
Database split into chunks (UBLAST: 19, BLAT: 5) due to the program's size limit.</p>
</fn>
<fn id="TFN003">
<p>
<sup>c</sup>
Serial implementation (
<xref rid="B9" ref-type="bibr">9</xref>
).</p>
</fn>
</table-wrap-foot>
</table-wrap>
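<p>The Relative speed column of Table 1 appears to be the single-core BLAST time divided by each run's time. This reading is inferred from the tabulated numbers and is not stated explicitly in the text. Under that assumption, the column can be recomputed from the Time column with a short sketch (function and variable names are illustrative):</p>
<preformat preformat-type="code">
```python
# Wall-clock time of BLAST with 1000 hits on 1 core, taken from Table 1
BLAST_SINGLE_CORE_SECONDS = 366046


def relative_speed(seconds, baseline=BLAST_SINGLE_CORE_SECONDS):
    """Speed-up over the baseline run, rounded to the nearest integer.

    Assumes relative speed = baseline time / run time (an inference
    from Table 1, not a definition given in the paper).
    """
    return round(baseline / seconds)


# (program or SANSparallel mode, hits, cores) -> seconds, copied from Table 1
timings = {
    ("fast", 100, 16): 65,
    ("LAMBDA", 500, 16): 216,
    ("DIAMOND", 1000, 16): 446,
    ("BLAST", 1000, 16): 32149,
}
speedups = {run: relative_speed(seconds) for run, seconds in timings.items()}
```
</preformat>
<p>Because the published times are themselves rounded to whole seconds, a recomputed value can differ from the tabulated one in the last digit.</p>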
</sec>
<sec id="SEC3-2">
<title>User interface</title>
<sec id="SEC3-2-1">
<title>Inputs and outputs</title>
<p>The website is free and open to all, and there is no login requirement. The input to the server is one or more FASTA-formatted sequences, submitted in a single request. The user can also choose the maximum number of hits to be output (H), the database to be searched (Uniprot, UniRef90, UniRef50, Swissprot or PDB) and a search protocol. The protocols are pre-set parameter combinations: (i) Verifast mode reports the H proteins with the highest vote; no alignments are computed. (ii) Fast mode is like verifast mode but also reports alignment scores. (iii) Slow mode inspects the 2H proteins with the highest vote and sorts them by alignment score. (iv) Verislow mode maximizes accuracy when H is small: it always inspects the 4000 proteins with the highest vote and sorts them by alignment score. The vote threshold of verifast mode is set so that the false positive rate is 1–2% in our benchmark. The other modes only report hits with an e-value below 1. Figure
<xref ref-type="fig" rid="F3">3</xref>
illustrates the search result for a predicted protein from the butterfly 
<italic>Melitaea cinxia</italic>
(
<xref rid="B24" ref-type="bibr">24</xref>
), which the cgi-script generated in 51 milliseconds. The primary output of the server is a tabular report of the hits with links to different output options (Figure
<xref ref-type="fig" rid="F3">3</xref>
). For example, we generate stacked alignments that are automatically loaded to Jalview (
<xref rid="B20" ref-type="bibr">20</xref>
) for alignment editing/visualization or to Skylign (
<xref rid="B19" ref-type="bibr">19</xref>
) for drawing sequence logos. Jalview Desktop is a standalone Java application that can be downloaded from
<ext-link ext-link-type="uri" xlink:href="http://www.jalview.org/download">http://www.jalview.org/download</ext-link>
. The Jalview applet is launched from our website, which must be added to the user's list of trusted sites as instructed in the tutorial (
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#exercises">http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#exercises</ext-link>
). Skylign outputs HTML5, which works in modern web browsers.</p>
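<p>The four search protocols differ only in how many top-voted proteins are inspected and in whether they are re-ranked by alignment score. The following sketch restates that logic under simplifying assumptions: votes are given as a dictionary, the alignment step is a caller-supplied function, and the vote threshold and e-value filter are omitted. All names are hypothetical and this is not the SANSparallel implementation.</p>
<preformat preformat-type="code">
```python
def candidates_inspected(mode, h):
    """Number of top-voted proteins examined when h hits are requested."""
    if mode in ("verifast", "fast"):
        return h
    if mode == "slow":
        return 2 * h
    if mode == "verislow":
        return 4000
    raise ValueError("unknown mode: " + mode)


def select_hits(votes, align, mode, h):
    """Return up to h (protein_id, score) pairs for the given mode.

    votes: dict protein_id -> vote (hypothetical input layout)
    align: function protein_id -> alignment score (stand-in for the aligner)
    """
    n = candidates_inspected(mode, h)
    top = sorted(votes, key=votes.get, reverse=True)[:n]
    if mode == "verifast":
        return [(protein, None) for protein in top]  # no alignments computed
    scored = [(protein, align(protein)) for protein in top]
    if mode in ("slow", "verislow"):
        # re-rank the larger candidate pool by alignment score
        scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:h]


# Toy data, invented for the example
votes = {"A": 9, "B": 7, "C": 5, "D": 3}
scores = {"A": 50, "B": 120, "C": 300, "D": 10}
fast_hits = select_hits(votes, scores.get, "fast", 2)
slow_hits = select_hits(votes, scores.get, "slow", 2)
```
</preformat>
<p>In the toy data, protein C has a low vote but the best alignment score, so it is reported only by the modes that inspect more than H candidates.</p>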
<fig id="F3" orientation="portrait" position="float">
<label>Figure 3.</label>
<caption>
<p>Example output.</p>
</caption>
<graphic xlink:href="gkv317fig3"></graphic>
</fig>
</sec>
<sec id="SEC3-2-2">
<title>Programmatic access</title>
<p>SANSparallel can be used for both interactive and high-throughput analyses. All input and output options of the cgi-script can be included in the URL as explained in the web tutorial (
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#external">http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#external</ext-link>
). Thus, another web server can link to SANSparallel to retrieve information about the sequence neighbors of a particular protein. Another use of SANSparallel is in high-throughput functional annotation of proteomes or transcriptomes. For example, the web tutorial demonstrates (
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#perl">http://ekhidna2.biocenter.helsinki.fi/sans/Tutorial.html#perl</ext-link>
) how to build a simple annotation pipeline where (i) the predicted protein sequences (in FASTA format) are sent to the server, (ii) the result is parsed and filtered, (iii) the best informative hit is selected as a source of annotation of the query sequence and (iv) a summary table is generated which reports the predicted annotation of each query protein and links its sequence back to SANSparallel so that anyone interested can study the evidence for the prediction interactively. Finally, it is possible to download the client-server programs as source code (
<ext-link ext-link-type="uri" xlink:href="http://ekhidna2.biocenter.helsinki.fi/sans/download/">http://ekhidna2.biocenter.helsinki.fi/sans/download/</ext-link>
) and run the programs locally on local databases.</p>
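<p>Steps (ii) to (iv) of such a pipeline can be sketched as below. The sketch starts from hits that have already been retrieved and parsed; the request to the server is left out because its parameters are documented in the web tutorial and are not reproduced here. The hit layout, the keyword list used to decide what counts as an informative description, and the function names are all assumptions made for illustration. This sketch is in Python and is independent of the tutorial's own example.</p>
<preformat preformat-type="code">
```python
# Keywords treated as uninformative in a hit description (an assumption)
UNINFORMATIVE = ("uncharacterized", "hypothetical", "unknown")


def is_informative(description):
    """True if the description contains none of the uninformative keywords."""
    text = description.lower()
    return not any(word in text for word in UNINFORMATIVE)


def annotate(hits_by_query):
    """Build a summary table with one row per query.

    hits_by_query: dict query_id -> list of (score, description) pairs,
                   in any order (hypothetical layout of parsed results)
    Returns a list of (query_id, annotation, score) rows.
    """
    table = []
    for query, hits in hits_by_query.items():
        informative = [hit for hit in hits if is_informative(hit[1])]
        if informative:
            # best informative hit = highest score among informative hits
            score, description = max(informative, key=lambda hit: hit[0])
        else:
            score, description = None, "no informative hit"
        table.append((query, description, score))
    return table


# Toy data, invented for the example
parsed = {
    "q1": [(900, "Uncharacterized protein"), (850, "DNA gyrase subunit B")],
    "q2": [(300, "Hypothetical protein")],
}
summary = annotate(parsed)
```
</preformat>
<p>Selecting the best informative hit rather than the top hit keeps a well-annotated homolog from being hidden behind a higher-scoring but uncharacterized entry.</p>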
</sec>
</sec>
</sec>
<sec sec-type="discussion" id="SEC4">
<title>DISCUSSION</title>
<p>We have improved and parallelized the suffix array neighborhood search algorithm SANS (
<xref rid="B9" ref-type="bibr">9</xref>
). Our benchmarking results were in line with previously published comparisons identifying UBLAST as sensitive and LAST and LAMBDA as fast. SANSparallel is competitive with DIAMOND, LAST and LAMBDA. All these programs are based on similar principles but with different implementations. Benchmarking showed that they miss few hits when sequence identity is above 50% but fall behind BLAST when sequence identity gets lower (Figure
<xref ref-type="fig" rid="F2">2</xref>
). Future work will focus on improving sensitivity by increasing the sequence space coverage of the seeds. The speed of SANSparallel depends on the amount of output (Table
<xref ref-type="table" rid="tbl1">1</xref>
). LAST has no direct control over the number of hits, but this number is influenced by the -m parameter for the uniqueness of seeds in the database (
<xref rid="B13" ref-type="bibr">13</xref>
). DIAMOND (
<xref rid="B15" ref-type="bibr">15</xref>
) and LAMBDA (
<xref rid="B12" ref-type="bibr">12</xref>
) are designed for batch processing of large query sets like the original SANS algorithm (
<xref rid="B9" ref-type="bibr">9</xref>
). The SANSparallel server supports both interactive analysis of individual queries and high-throughput analysis of genomes or transcriptomes. It is simple to link to other tools, as inputs and outputs are FASTA-formatted sequences or alignments. Much can be learned by studying groups of homologous proteins instead of individual proteins. Evolutionary conservation sharpens the signal for function (
<xref rid="B25" ref-type="bibr">25</xref>
,
<xref rid="B26" ref-type="bibr">26</xref>
), secondary structure (
<xref rid="B27" ref-type="bibr">27</xref>
) and deeper homology detection (
<xref rid="B1" ref-type="bibr">1</xref>
). SANSparallel facilitates such analyses by retrieving homologs from the database and performing an alignment. It is so fast that the user can change output formats, search parameters or the database interactively. Speed opens up new ways to operate. For example, functional annotations of genomes could be updated on demand, database clustering need not store all-against-all search results on disk, and sequence similarity based data integration could be done on the fly.</p>
</sec>
</body>
<back>
<sec id="SEC5">
<title>FUNDING</title>
<p>Biocenter Finland. Funding for open access charge: Biocenter Finland.</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>S.F.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>T.L.</given-names>
</name>
<name>
<surname>Schäffer</surname>
<given-names>A.A.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>D.J.</given-names>
</name>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
<source>Nucleic Acids Res.</source>
<year>1997</year>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="pmid">9254694</pub-id>
</element-citation>
</ref>
<ref id="B2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McGinnis</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>T.L.</given-names>
</name>
</person-group>
<article-title>BLAST: at the core of a powerful and diverse set of sequence analysis tools</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>W20</fpage>
<lpage>W25</lpage>
<pub-id pub-id-type="pmid">15215342</pub-id>
</element-citation>
</ref>
<ref id="B3">
<label>3.</label>
<element-citation publication-type="journal">
<collab>Analysis Tool Web Services from the EMBL-EBI</collab>
<source>Nucleic Acids Res.</source>
<year>2013</year>
<volume>41</volume>
<fpage>W597</fpage>
<lpage>W600</lpage>
<pub-id pub-id-type="pmid">23671338</pub-id>
</element-citation>
</ref>
<ref id="B4">
<label>4.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Finn</surname>
<given-names>R.D.</given-names>
</name>
<name>
<surname>Clements</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>S.R.</given-names>
</name>
</person-group>
<article-title>HMMER web server: interactive sequence similarity searching</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>W29</fpage>
<lpage>W37</lpage>
<pub-id pub-id-type="pmid">21593126</pub-id>
</element-citation>
</ref>
<ref id="B5">
<label>5.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Altintas</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Peltier</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Stocks</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Allen</surname>
<given-names>E.E.</given-names>
</name>
<name>
<surname>Ellisman</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Grethe</surname>
<given-names>J.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource</article-title>
<source>Nucleic Acids Res.</source>
<year>2011</year>
<volume>39</volume>
<fpage>D546</fpage>
<lpage>D551</lpage>
<pub-id pub-id-type="pmid">21045053</pub-id>
</element-citation>
</ref>
<ref id="B6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heger</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Korpelainen</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Hupponen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Mattila</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Ollikainen</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Holm</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>PairsDB atlas of protein sequence space</article-title>
<source>Nucleic Acids Res.</source>
<year>2008</year>
<volume>36</volume>
<fpage>D276</fpage>
<lpage>D280</lpage>
<pub-id pub-id-type="pmid">17986464</pub-id>
</element-citation>
</ref>
<ref id="B7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rattei</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Arnold</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tischler</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Lindner</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Stümpflen</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Mewes</surname>
<given-names>H.W.</given-names>
</name>
</person-group>
<article-title>SIMAP: the similarity matrix of proteins</article-title>
<source>Nucleic Acids Res.</source>
<year>2006</year>
<volume>34</volume>
<fpage>D252</fpage>
<lpage>D256</lpage>
<pub-id pub-id-type="pmid">16381858</pub-id>
</element-citation>
</ref>
<ref id="B8">
<label>8.</label>
<element-citation publication-type="journal">
<collab>The UniProt Consortium</collab>
<article-title>UniProt: a hub for protein information</article-title>
<source>Nucleic Acids Res.</source>
<year>2015</year>
<volume>43</volume>
<fpage>D204</fpage>
<lpage>D212</lpage>
<pub-id pub-id-type="pmid">25348405</pub-id>
</element-citation>
</ref>
<ref id="B9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koskinen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Holm</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>SANS: high-throughput retrieval of protein sequences allowing 50% mismatches</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>i438</fpage>
<lpage>i443</lpage>
<pub-id pub-id-type="pmid">22962464</pub-id>
</element-citation>
</ref>
<ref id="B10">
<label>10.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kent</surname>
<given-names>W.J.</given-names>
</name>
</person-group>
<article-title>BLAT—the BLAST-like alignment tool</article-title>
<source>Genome Res.</source>
<year>2002</year>
<volume>12</volume>
<fpage>656</fpage>
<lpage>664</lpage>
<pub-id pub-id-type="pmid">11932250</pub-id>
</element-citation>
</ref>
<ref id="B11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Ye</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<article-title>RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>125</fpage>
<lpage>126</lpage>
<pub-id pub-id-type="pmid">22039206</pub-id>
</element-citation>
</ref>
<ref id="B12">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hauswedell</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Singer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Reinert</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>Lambda: the local aligner for massive biological data</article-title>
<source>Bioinformatics</source>
<year>2014</year>
<volume>30</volume>
<fpage>i349</fpage>
<lpage>i355</lpage>
<pub-id pub-id-type="pmid">25161219</pub-id>
</element-citation>
</ref>
<ref id="B13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kielbasa</surname>
<given-names>S.M.</given-names>
</name>
<name>
<surname>Wan</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Horton</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>M.C.</given-names>
</name>
</person-group>
<article-title>Adaptive seeds tame genomic sequence comparison</article-title>
<source>Genome Res.</source>
<year>2011</year>
<volume>21</volume>
<fpage>487</fpage>
<lpage>493</lpage>
<pub-id pub-id-type="pmid">21209072</pub-id>
</element-citation>
</ref>
<ref id="B14">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>R.C.</given-names>
</name>
</person-group>
<article-title>Search and clustering orders of magnitude faster than BLAST</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<fpage>2460</fpage>
<lpage>2461</lpage>
<pub-id pub-id-type="pmid">20709691</pub-id>
</element-citation>
</ref>
<ref id="B15">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buchfink</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Huson</surname>
<given-names>D.H.</given-names>
</name>
</person-group>
<article-title>Fast and sensitive protein alignment using DIAMOND</article-title>
<source>Nat. Methods</source>
<year>2014</year>
<volume>12</volume>
<fpage>59</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="pmid">25402007</pub-id>
</element-citation>
</ref>
<ref id="B16">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Roytberg</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gambin</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Noé</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Lasota</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Furletova</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Szczurek</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Kucherov</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>On subset seeds for protein alignment</article-title>
<source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source>
<year>2009</year>
<volume>6</volume>
<fpage>483</fpage>
<lpage>494</lpage>
<pub-id pub-id-type="pmid">19644175</pub-id>
</element-citation>
</ref>
<ref id="B17">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearson</surname>
<given-names>W.R.</given-names>
</name>
</person-group>
<article-title>Effective protein sequence comparison</article-title>
<source>Methods Enzymol.</source>
<year>1996</year>
<volume>266</volume>
<fpage>227</fpage>
<lpage>258</lpage>
<pub-id pub-id-type="pmid">8743688</pub-id>
</element-citation>
</ref>
<ref id="B18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>N.P.</given-names>
</name>
<name>
<surname>Leroy</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>MView: a web-compatible database search or multiple alignment viewer</article-title>
<source>Bioinformatics</source>
<year>1998</year>
<volume>14</volume>
<fpage>380</fpage>
<lpage>381</lpage>
<pub-id pub-id-type="pmid">9632837</pub-id>
</element-citation>
</ref>
<ref id="B19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wheeler</surname>
<given-names>T.J.</given-names>
</name>
<name>
<surname>Clements</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Finn</surname>
<given-names>R.D.</given-names>
</name>
</person-group>
<article-title>Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models</article-title>
<source>BMC Bioinformatics</source>
<year>2014</year>
<volume>15</volume>
<fpage>7</fpage>
<pub-id pub-id-type="pmid">24410852</pub-id>
</element-citation>
</ref>
<ref id="B20">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Waterhouse</surname>
<given-names>A.M.</given-names>
</name>
<name>
<surname>Procter</surname>
<given-names>J.B.</given-names>
</name>
<name>
<surname>Martin</surname>
<given-names>D.M.</given-names>
</name>
<name>
<surname>Clamp</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Barton</surname>
<given-names>G.J.</given-names>
</name>
</person-group>
<article-title>Jalview Version 2—a multiple sequence alignment editor and analysis workbench</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>1189</fpage>
<lpage>1191</lpage>
<pub-id pub-id-type="pmid">19151095</pub-id>
</element-citation>
</ref>
<ref id="B21">
<label>21.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Korf</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Yandell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Bedell</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2003</year>
<publisher-loc>Sebastopol, CA</publisher-loc>
<publisher-name>O'Reilly &amp; Associates</publisher-name>
<comment>ISBN-13: 978-0596002992</comment>
</element-citation>
</ref>
<ref id="B22">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holm</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Rosenström</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Dali server: conservation mapping in 3D</article-title>
<source>Nucleic Acids Res.</source>
<year>2010</year>
<volume>38</volume>
<fpage>W545</fpage>
<lpage>W549</lpage>
<pub-id pub-id-type="pmid">20457744</pub-id>
</element-citation>
</ref>
<ref id="B23">
<label>23.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garlant</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Koskinen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Nykyri</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Ahamed</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rouhiainen</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Laine</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Paulin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Auvinen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Holm</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Genome sequence of <italic>Dickeya solani</italic>, a new soft rot pathogen of potato, suggests its emergence may be related to a novel combination of non-ribosomal peptide/polyketide synthetase clusters</article-title>
<source>Diversity</source>
<year>2013</year>
<volume>5</volume>
<fpage>824</fpage>
<lpage>842</lpage>
</element-citation>
</ref>
<ref id="B24">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ahola</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Lehtonen</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Somervuo</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Salmela</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Koskinen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Rastas</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Välimäki</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Paulin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Kvist</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wahlberg</surname>
<given-names>N.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Glanville fritillary butterfly retains an ancient karyotype and reveals selective chromosomal fusions in <italic>Lepidoptera</italic></article-title>
<source>Nat. Commun.</source>
<year>2014</year>
<volume>5</volume>
<fpage>4737</fpage>
<pub-id pub-id-type="pmid">25189940</pub-id>
</element-citation>
</ref>
<ref id="B25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Koskinen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Toronen</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Nokso-Koivisto</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Holm</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>PANNZER—high-throughput functional annotation of uncharacterized proteins in an error-prone environment</article-title>
<source>Bioinformatics</source>
<year>2014</year>
<comment>doi:10.1093/bioinformatics/btu851</comment>
</element-citation>
</ref>
<ref id="B26">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>O'Donoghue</surname>
<given-names>S.I.</given-names>
</name>
<name>
<surname>Sabir</surname>
<given-names>K.S.</given-names>
</name>
<name>
<surname>Kalemanov</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Stolte</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Wellmann</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Roos</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Perdigão</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Buske</surname>
<given-names>F.A.</given-names>
</name>
<name>
<surname>Heinrich</surname>
<given-names>J.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Aquaria: simplifying discovery and insight from protein structures</article-title>
<source>Nat. Methods</source>
<year>2015</year>
<volume>12</volume>
<fpage>98</fpage>
<lpage>99</lpage>
<pub-id pub-id-type="pmid">25633501</pub-id>
</element-citation>
</ref>
<ref id="B27">
<label>27.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rost</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>Combining evolutionary information and neural networks to predict protein secondary structure</article-title>
<source>Proteins</source>
<year>1994</year>
<volume>19</volume>
<fpage>55</fpage>
<lpage>72</lpage>
<pub-id pub-id-type="pmid">8066087</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000063 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000063 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4489265
   |texte=   SANSparallel: interactive homology search against Uniprot
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:25855811" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024