Indexing Arbitrary-Length k-Mers in Sequencing Reads.
Internal identifier: 001552 (PubMed/Curation)
Authors: Tomasz Kowalski [Poland]; Szymon Grabowski [Poland]; Sebastian Deorowicz [Poland]
Source:
- PloS one [1932-6203]; 2015.
French descriptors
- KwdFr :
- MESH :
English descriptors
- KwdEn :
- MESH :
- genetics : Caenorhabditis elegans, Escherichia coli.
- methods : Sequence Analysis, RNA.
- statistics & numerical data : Sequence Analysis, RNA.
- Algorithms, Animals, Datasets as Topic, Genome, High-Throughput Nucleotide Sequencing, Humans, Software.
Abstract
We propose a lightweight data structure for indexing and querying collections of NGS read data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive with existing algorithms in space use, query time, or both. The main applications of our index include variant calling, error correction, and analysis of reads from RNA-seq experiments.
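The abstract only outlines the idea behind PgSA. As a rough illustration, the Python sketch below shows how a suffix array built over a pseudogenome of overlapping reads can answer count and locate queries for k-mers of any length. It is a toy under stated assumptions, not the authors' method: reads are merged greedily in input order using their longest suffix-prefix overlap, the suffix array is built naively, and each match is mapped back to the reads that fully contain it. The class `ToyPseudogenomeIndex`, the `ReadSpan` record and the methods `locate`, `count_occurrences` and `reads_containing` are hypothetical names, not the PgSA API; the paper describes the actual, far more compact construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSpan:
    read_id: int
    start: int   # offset of the read's first base in the pseudogenome
    length: int


class ToyPseudogenomeIndex:
    """Hypothetical toy k-mer index over reads merged into a pseudogenome.

    Illustrative only; this is not the PgSA implementation.
    """

    def __init__(self, reads):
        self.pg, self.spans = self._build_pseudogenome(reads)
        # Naive suffix array construction, O(n^2 log n): fine for toy inputs only.
        self.sa = sorted(range(len(self.pg)), key=lambda i: self.pg[i:])

    @staticmethod
    def _build_pseudogenome(reads):
        """Append reads in input order, reusing the longest suffix-prefix overlap."""
        pg = ""
        spans = []
        for rid, read in enumerate(reads):
            overlap = 0
            for o in range(min(len(read), len(pg)), 0, -1):
                if pg.endswith(read[:o]):
                    overlap = o
                    break
            spans.append(ReadSpan(rid, len(pg) - overlap, len(read)))
            pg += read[overlap:]
        return pg, spans

    def _sa_range(self, kmer):
        """Half-open range of suffix-array entries whose suffix starts with kmer."""
        k = len(kmer)
        lo, hi = 0, len(self.sa)
        while lo < hi:  # lower bound: first k-prefix >= kmer
            mid = (lo + hi) // 2
            if self.pg[self.sa[mid]:self.sa[mid] + k] < kmer:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, len(self.sa)
        while lo < hi:  # upper bound: first k-prefix > kmer
            mid = (lo + hi) // 2
            if self.pg[self.sa[mid]:self.sa[mid] + k] <= kmer:
                lo = mid + 1
            else:
                hi = mid
        return first, lo

    def locate(self, kmer):
        """Sorted (read_id, offset) pairs for every occurrence of kmer in the reads."""
        if not kmer:
            raise ValueError("k-mer must be non-empty")
        k = len(kmer)
        first, last = self._sa_range(kmer)
        hits = []
        for p in self.sa[first:last]:
            for span in self.spans:
                # Keep a pseudogenome match only for reads that contain it entirely;
                # matches not contained in any single read are dropped.
                if span.start <= p and p + k <= span.start + span.length:
                    hits.append((span.read_id, p - span.start))
        return sorted(hits)

    def count_occurrences(self, kmer):
        return len(self.locate(kmer))

    def reads_containing(self, kmer):
        return sorted({rid for rid, _ in self.locate(kmer)})


if __name__ == "__main__":
    idx = ToyPseudogenomeIndex(["ACGTAC", "GTACGG", "ACGGTT"])
    print(idx.pg)                              # ACGTACGGTT
    print(idx.locate("ACG"))                   # [(0, 0), (1, 2), (2, 0)]
    print(idx.reads_containing("GTA"))         # [0, 1]
    print(idx.count_occurrences("ACGTACG"))    # 0 (spans two reads in the pseudogenome)
```

The pseudogenome can contain matches that no single read contains (here the 7-mer `ACGTACG`, which straddles reads 0 and 1), so filtering by read coverage is essential. Because a query is a binary search over sorted suffixes rather than a lookup keyed on a fixed k, the same index serves any k, which is why a suffix-array design suits arbitrary-length k-mers.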
DOI: 10.1371/journal.pone.0133198
PubMed: 26182400
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Indexing Arbitrary-Length k-Mers in Sequencing Reads.</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:26182400</idno>
<idno type="pmid">26182400</idno>
<idno type="doi">10.1371/journal.pone.0133198</idno>
<idno type="wicri:Area/PubMed/Corpus">001552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001552</idno>
<idno type="wicri:Area/PubMed/Curation">001552</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001552</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Indexing Arbitrary-Length k-Mers in Sequencing Reads.</title>
<author><name sortKey="Kowalski, Tomasz" sort="Kowalski, Tomasz" uniqKey="Kowalski T" first="Tomasz" last="Kowalski">Tomasz Kowalski</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Grabowski, Szymon" sort="Grabowski, Szymon" uniqKey="Grabowski S" first="Szymon" last="Grabowski">Szymon Grabowski</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Deorowicz, Sebastian" sort="Deorowicz, Sebastian" uniqKey="Deorowicz S" first="Sebastian" last="Deorowicz">Sebastian Deorowicz</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.</nlm:affiliation>
<country xml:lang="fr">Pologne</country>
<wicri:regionArea>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Caenorhabditis elegans (genetics)</term>
<term>Datasets as Topic</term>
<term>Escherichia coli (genetics)</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Sequence Analysis, RNA (methods)</term>
<term>Sequence Analysis, RNA (statistics &amp; numerical data)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Caenorhabditis elegans (génétique)</term>
<term>Données de la recherche comme sujet</term>
<term>Escherichia coli (génétique)</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Caenorhabditis elegans</term>
<term>Escherichia coli</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Caenorhabditis elegans</term>
<term>Escherichia coli</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" qualifier="statistics &amp; numerical data" xml:lang="en"><term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Animals</term>
<term>Datasets as Topic</term>
<term>Genome</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Humans</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ARN</term>
<term>Animaux</term>
<term>Données de la recherche comme sujet</term>
<term>Génome</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Séquençage nucléotidique à haut débit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments. </div>
</front>
</TEI>
<pubmed><MedlineCitation Status="MEDLINE" Owner="NLM"><PMID Version="1">26182400</PMID>
<DateCompleted><Year>2016</Year>
<Month>04</Month>
<Day>27</Day>
</DateCompleted>
<DateRevised><Year>2020</Year>
<Month>03</Month>
<Day>06</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection"><Journal><ISSN IssnType="Electronic">1932-6203</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>10</Volume>
<Issue>7</Issue>
<PubDate><Year>2015</Year>
</PubDate>
</JournalIssue>
<Title>PloS one</Title>
<ISOAbbreviation>PLoS ONE</ISOAbbreviation>
</Journal>
<ArticleTitle>Indexing Arbitrary-Length k-Mers in Sequencing Reads.</ArticleTitle>
<Pagination><MedlinePgn>e0133198</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pone.0133198</ELocationID>
<Abstract><AbstractText>We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis of reads from RNA-seq experiments. </AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Kowalski</LastName>
<ForeName>Tomasz</ForeName>
<Initials>T</Initials>
<AffiliationInfo><Affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Grabowski</LastName>
<ForeName>Szymon</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Poland.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Deorowicz</LastName>
<ForeName>Sebastian</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2015</Year>
<Month>07</Month>
<Day>16</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>PLoS One</MedlineTA>
<NlmUniqueID>101285081</NlmUniqueID>
<ISSNLinking>1932-6203</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName UI="D000465" MajorTopicYN="Y">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017173" MajorTopicYN="N">Caenorhabditis elegans</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D066264" MajorTopicYN="N">Datasets as Topic</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D004926" MajorTopicYN="N">Escherichia coli</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D016678" MajorTopicYN="Y">Genome</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D017423" MajorTopicYN="N">Sequence Analysis, RNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
<QualifierName UI="Q000706" MajorTopicYN="Y">statistics &amp; numerical data</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2015</Year>
<Month>02</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2015</Year>
<Month>06</Month>
<Day>24</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2015</Year>
<Month>7</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2015</Year>
<Month>7</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2016</Year>
<Month>4</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">26182400</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pone.0133198</ArticleId>
<ArticleId IdType="pii">PONE-D-15-06025</ArticleId>
<ArticleId IdType="pmc">PMC4504488</ArticleId>
</ArticleIdList>
<ReferenceList><Reference><Citation>Bioinformatics. 2013 Sep 15;29(18):2253-60</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23828782</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2013 Oct 1;29(19):2490-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23853064</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Genome Biol. 2004;5(2):R12</Citation>
<ArticleIdList><ArticleId IdType="pubmed">14759262</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2010 May 15;26(10):1284-90</Citation>
<ArticleIdList><ArticleId IdType="pubmed">20378555</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Genome Biol. 2013 Mar 28;14(3):R30</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23537109</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Hum Mutat. 2014 Mar;35(3):283-8</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24375697</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2015 May 1;31(9):1389-95</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25536966</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Genome Biol. 2014 Mar 03;15(3):R46</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24580807</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Nat Methods. 2012 Mar 04;9(4):357-9</Citation>
<ArticleIdList><ArticleId IdType="pubmed">22388286</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>BMC Bioinformatics. 2013 May 16;14:160</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23679007</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2009 Jul 15;25(14):1754-60</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19451168</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2014 Sep 1;30(17):i356-63</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25161220</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>PLoS One. 2014 Oct 07;9(10):e109384</Citation>
<ArticleIdList><ArticleId IdType="pubmed">25289699</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList><ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2014 May 15;30(10):1354-62</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24451628</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2014 Mar 1;30(5):614-20</Citation>
<ArticleIdList><ArticleId IdType="pubmed">24142950</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2009 Sep 1;25(17):2157-63</Citation>
<ArticleIdList><ArticleId IdType="pubmed">19542152</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>BMC Bioinformatics. 2012 May 10;13:92</Citation>
<ArticleIdList><ArticleId IdType="pubmed">22574964</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList><Reference><Citation>BMC Bioinformatics. 2011 Jun 17;12:242</Citation>
<ArticleIdList><ArticleId IdType="pubmed">21682852</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001552 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001552 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Curation |type= RBID |clé= pubmed:26182400 |texte= Indexing Arbitrary-Length k-Mers in Sequencing Reads. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i -Sk "pubmed:26182400" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.