Indexing Arbitrary-Length k-Mers in Sequencing Reads
Internal identifier: 001009 (Pmc/Curation)
Authors: Tomasz Kowalski [Poland]; Szymon Grabowski [Poland]; Sebastian Deorowicz [Poland]
Source:
- PLoS ONE [1932-6203]; 2015.
Abstract
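We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.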
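The abstract only names the idea behind PgSA: overlapping reads are merged into a pseudogenome and, as the name suggests, a suffix array over it answers k-mer counting and locating queries. The Python sketch below is a toy illustration under our own assumptions, not the authors' implementation. It merges reads greedily in input order using exact suffix-prefix overlaps, builds a naive suffix array, and maps each pseudogenome hit back to the reads that fully contain it. The names `ToyPseudogenomeIndex`, `max_overlap`, `locate` and `count` are hypothetical.

```python
from typing import List, Tuple


def max_overlap(tail: str, read: str) -> int:
    """Length of the longest suffix of `tail` that equals a prefix of `read`."""
    for o in range(min(len(tail), len(read)), 0, -1):
        if tail.endswith(read[:o]):
            return o
    return 0


class ToyPseudogenomeIndex:
    """Hypothetical toy index; NOT the PgSA implementation from the paper."""

    def __init__(self, reads: List[str]) -> None:
        self.reads = reads
        self.offsets: List[int] = []  # start of each read in the pseudogenome
        pg = ""
        for read in reads:
            # Greedy, input-order merge: reuse the longest exact overlap with
            # the current pseudogenome tail instead of appending the whole read.
            o = max_overlap(pg, read)
            self.offsets.append(len(pg) - o)
            pg += read[o:]
        self.pg = pg
        # Naive suffix array (O(n^2 log n)); acceptable only for tiny inputs.
        self.sa = sorted(range(len(pg)), key=lambda i: pg[i:])

    def _sa_range(self, kmer: str) -> Tuple[int, int]:
        """Half-open SA interval of the suffixes that start with `kmer`."""
        k, pg, sa = len(kmer), self.pg, self.sa
        lo, hi = 0, len(sa)
        while lo < hi:  # lower bound: first k-prefix >= kmer
            mid = (lo + hi) // 2
            if pg[sa[mid]:sa[mid] + k] < kmer:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, len(sa)
        while lo < hi:  # upper bound: first k-prefix > kmer
            mid = (lo + hi) // 2
            if pg[sa[mid]:sa[mid] + k] <= kmer:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def locate(self, kmer: str) -> List[Tuple[int, int]]:
        """Sorted (read_id, position_in_read) pairs for every occurrence."""
        k = len(kmer)
        start, end = self._sa_range(kmer)
        hits = []
        for p in self.sa[start:end]:
            # Keep a pseudogenome hit only for reads that fully cover it; a hit
            # spanning the junction of two non-overlapping reads is dropped.
            for rid, (off, read) in enumerate(zip(self.offsets, self.reads)):
                if off <= p and p + k <= off + len(read):
                    hits.append((rid, p - off))
        return sorted(hits)

    def count(self, kmer: str) -> int:
        """Number of occurrences of `kmer` across all reads."""
        return len(self.locate(kmer))


reads = ["ACGTAC", "GTACGG", "CGGTTA"]
idx = ToyPseudogenomeIndex(reads)
print(idx.pg)             # ACGTACGGTTA
print(idx.locate("TAC"))  # [(0, 3), (1, 1)]
print(idx.count("CGG"))   # 2
```

In the example, `TAC` occurs once in the pseudogenome (position 3) but is reported twice, because reads 0 and 1 overlap there. Conversely, with reads `AAAA` and `CCCC` the pseudogenome contains `AC` at the junction, which no read contains, so `locate("AC")` returns an empty list. A practical index would replace the naive suffix array and the linear scan over reads with compact structures; this sketch only shows the counting and locating logic.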
Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504488
DOI: 10.1371/journal.pone.0133198
PubMed: 26182400
PubMed Central: 4504488
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">26182400</idno>
<idno type="pmc">4504488</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4504488</idno>
<idno type="RBID">PMC:4504488</idno>
<idno type="doi">10.1371/journal.pone.0133198</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">001009</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001009</idno>
<idno type="wicri:Area/Pmc/Curation">001009</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001009</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:aff id="aff001"><addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:aff id="aff002"><addr-line>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland</addr-line>
</nlm:aff>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p>We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating <italic>k</italic>-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.</p>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Gusfield, D" uniqKey="Gusfield D">D Gusfield</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author><name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Danek, A" uniqKey="Danek A">A Danek</name>
</author>
<author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kelly, Dr" uniqKey="Kelly D">DR Kelly</name>
</author>
<author><name sortKey="Schatz, Mc" uniqKey="Schatz M">MC Schatz</name>
</author>
<author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ilie, L" uniqKey="Ilie L">L Ilie</name>
</author>
<author><name sortKey="Molnar, M" uniqKey="Molnar M">M Molnar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heo, Y" uniqKey="Heo Y">Y Heo</name>
</author>
<author><name sortKey="Wu, Xl" uniqKey="Wu X">XL Wu</name>
</author>
<author><name sortKey="Chen, D" uniqKey="Chen D">D Chen</name>
</author>
<author><name sortKey="Ma, J" uniqKey="Ma J">J Ma</name>
</author>
<author><name sortKey="Hwu, Wm" uniqKey="Hwu W">WM Hwu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author><name sortKey="Weese, D" uniqKey="Weese D">D Weese</name>
</author>
<author><name sortKey="Holtgrewe, M" uniqKey="Holtgrewe M">M Holtgrewe</name>
</author>
<author><name sortKey="Dimitrova, V" uniqKey="Dimitrova V">V Dimitrova</name>
</author>
<author><name sortKey="Niu, S" uniqKey="Niu S">S Niu</name>
</author>
<author><name sortKey="Reinert, K" uniqKey="Reinert K">K Reinert</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author><name sortKey="Kobert, K" uniqKey="Kobert K">K Kobert</name>
</author>
<author><name sortKey="Flouri, T" uniqKey="Flouri T">T Flouri</name>
</author>
<author><name sortKey="Stamatakis, A" uniqKey="Stamatakis A">A Stamatakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ames, Sk" uniqKey="Ames S">SK Ames</name>
</author>
<author><name sortKey="Hysom, Da" uniqKey="Hysom D">DA Hysom</name>
</author>
<author><name sortKey="Gardner, Sn" uniqKey="Gardner S">SN Gardner</name>
</author>
<author><name sortKey="Lloyd, Gs" uniqKey="Lloyd G">GS Lloyd</name>
</author>
<author><name sortKey="Gokhale, Mb" uniqKey="Gokhale M">MB Gokhale</name>
</author>
<author><name sortKey="Allen, Je" uniqKey="Allen J">JE Allen</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Wood, D" uniqKey="Wood D">D Wood</name>
</author>
<author><name sortKey="Salzberg, S" uniqKey="Salzberg S">S Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bazinet, Al" uniqKey="Bazinet A">AL Bazinet</name>
</author>
<author><name sortKey="Cummings, Mp" uniqKey="Cummings M">MP Cummings</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Philippe, N" uniqKey="Philippe N">N Philippe</name>
</author>
<author><name sortKey="Salson, M" uniqKey="Salson M">M Salson</name>
</author>
<author><name sortKey="Lecroq, T" uniqKey="Lecroq T">T Lecroq</name>
</author>
<author><name sortKey="Leonard, M" uniqKey="Leonard M">M Léonard</name>
</author>
<author><name sortKey="Commes, T" uniqKey="Commes T">T Commes</name>
</author>
<author><name sortKey="Rivals, E" uniqKey="Rivals E">E Rivals</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Philippe, N" uniqKey="Philippe N">N Philippe</name>
</author>
<author><name sortKey="Salson, M" uniqKey="Salson M">M Salson</name>
</author>
<author><name sortKey="Commes, T" uniqKey="Commes T">T Commes</name>
</author>
<author><name sortKey="Rivals, E" uniqKey="Rivals E">E Rivals</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
<author><name sortKey="Lavenier, D" uniqKey="Lavenier D">D Lavenier</name>
</author>
<author><name sortKey="Chikhi, R" uniqKey="Chikhi R">R Chikhi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author><name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Debudaj Grabysz, A" uniqKey="Debudaj Grabysz A">A Debudaj-Grabysz</name>
</author>
<author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schroder, J" uniqKey="Schroder J">J Schröder</name>
</author>
<author><name sortKey="Schroder, H" uniqKey="Schroder H">H Schröder</name>
</author>
<author><name sortKey="Puglisi, Sj" uniqKey="Puglisi S">SJ Puglisi</name>
</author>
<author><name sortKey="Sinha, R" uniqKey="Sinha R">R Sinha</name>
</author>
<author><name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salmela, L" uniqKey="Salmela L">L Salmela</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
<author><name sortKey="Pertea, M" uniqKey="Pertea M">M Pertea</name>
</author>
<author><name sortKey="Fahrner, Ja" uniqKey="Fahrner J">JA Fahrner</name>
</author>
<author><name sortKey="Sobreira, N" uniqKey="Sobreira N">N Sobreira</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Kurtz, S" uniqKey="Kurtz S">S Kurtz</name>
</author>
<author><name sortKey="Phillippy, A" uniqKey="Phillippy A">A Phillippy</name>
</author>
<author><name sortKey="Delcher, Al" uniqKey="Delcher A">AL Delcher</name>
</author>
<author><name sortKey="Smoot, M" uniqKey="Smoot M">M Smoot</name>
</author>
<author><name sortKey="Shumway, M" uniqKey="Shumway M">M Shumway</name>
</author>
<author><name sortKey="Antonescu, C" uniqKey="Antonescu C">C Antonescu</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Manber, U" uniqKey="Manber U">U Manber</name>
</author>
<author><name sortKey="Myers, G" uniqKey="Myers G">G Myers</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Maier, D" uniqKey="Maier D">D Maier</name>
</author>
<author><name sortKey="Storer, Ja" uniqKey="Storer J">JA Storer</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Grabowski, S" uniqKey="Grabowski S">S Grabowski</name>
</author>
<author><name sortKey="Deorowicz, S" uniqKey="Deorowicz S">S Deorowicz</name>
</author>
<author><name sortKey="Roguski, L" uniqKey="Roguski L">L Roguski</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
<front><journal-meta><journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group><journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher><publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">26182400</article-id>
<article-id pub-id-type="pmc">4504488</article-id>
<article-id pub-id-type="publisher-id">PONE-D-15-06025</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0133198</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</article-title>
<alt-title alt-title-type="running-head">Indexing Arbitrary-Length <italic>k</italic>-Mers in Sequencing Reads</alt-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Kowalski</surname>
<given-names>Tomasz</given-names>
</name>
<xref ref-type="aff" rid="aff001"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Grabowski</surname>
<given-names>Szymon</given-names>
</name>
<xref ref-type="aff" rid="aff001"><sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author"><name><surname>Deorowicz</surname>
<given-names>Sebastian</given-names>
</name>
<xref ref-type="aff" rid="aff002"><sup>2</sup>
</xref>
<xref ref-type="corresp" rid="cor001">*</xref>
</contrib>
</contrib-group>
<aff id="aff001"><label>1</label>
<addr-line>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland</addr-line>
</aff>
<aff id="aff002"><label>2</label>
<addr-line>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland</addr-line>
</aff>
<contrib-group><contrib contrib-type="editor"><name><surname>Kingsford</surname>
<given-names>Carl</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1"><addr-line>University of Maryland, UNITED STATES</addr-line>
</aff>
<author-notes><fn fn-type="COI-statement" id="coi001"><p><bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001"><p>Conceived and designed the experiments: TK SG SD. Performed the experiments: TK. Analyzed the data: TK SG SD. Wrote the paper: TK SG SD.</p>
</fn>
<corresp id="cor001">* E-mail: <email>sebastian.deorowicz@polsl.pl</email>
</corresp>
</author-notes>
<pub-date pub-type="collection"><year>2015</year>
</pub-date>
<pub-date pub-type="epub"><day>16</day>
<month>7</month>
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>7</issue>
<elocation-id>e0133198</elocation-id>
<history><date date-type="received"><day>13</day>
<month>2</month>
<year>2015</year>
</date>
<date date-type="accepted"><day>24</day>
<month>6</month>
<year>2015</year>
</date>
</history>
<permissions><copyright-statement>© 2015 Kowalski et al</copyright-statement>
<copyright-year>2015</copyright-year>
<copyright-holder>Kowalski et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0133198.pdf"></self-uri>
<abstract><p>We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating <italic>k</italic>-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments.</p>
</abstract>
<funding-group><funding-statement>This work was supported by the Polish National Science Centre under the project DEC-2012/05/B/ST6/03148. The infrastructure was supported by POIG.02.03.01-24-099/13 grant “GeCONiI - Upper Silesian Center for Computational Science and Engineering.” The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts><fig-count count="6"></fig-count>
<table-count count="7"></table-count>
<page-count count="16"></page-count>
</counts>
<custom-meta-group><custom-meta id="data-availability"><meta-name>Data Availability</meta-name>
<meta-value>All relevant data (including URLs to public repositories) are available within the paper.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes><title>Data Availability</title>
<p>All relevant data (including URLs to public repositories) are available within the paper.</p>
</notes>
</front>
</pmc>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001009 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 001009 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Pmc |étape= Curation |type= RBID |clé= PMC:4504488 |texte= Indexing Arbitrary-Length k-Mers in Sequencing Reads }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i -Sk "pubmed:26182400" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.