MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Estimating the k-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.

Internal identifier: 000574 (PubMed/Checkpoint); previous: 000573; next: 000575

Authors: Swati C. Manekar [India]; Shailesh R. Sathe [India]

Source: Current genomics (ISSN 1389-2029), 2019

RBID : pubmed:31015787

Abstract

In bioinformatics, estimation of k-mer abundance histograms or just enumerating the number of unique k-mers and the number of singletons is desirable in many genome sequence analysis applications. The applications include predicting genome sizes, data pre-processing for de Bruijn graph assembly methods (tuning runtime parameters for analysis tools), repeat detection, sequencing coverage estimation, measuring sequencing error rates, etc. Different methods for cardinality estimation in sequencing data have been developed in recent years.
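
As a concrete illustration of the quantities named in the abstract, the following minimal Python sketch computes an exact k-mer abundance histogram for a toy set of reads, together with the number of distinct k-mers (written F0 in the Results part of the record) and the number of singletons (f1). It is an exact, in-memory baseline and not the streaming estimation performed by ntCard, KmerGenie, KmerStream or Khmer. The function name `kmer_histogram`, the toy reads, and the choice to count forward-strand k-mers only (no canonical k-mers) are assumptions made for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Exact k-mer counting (illustrative baseline, not a streaming estimator).

    Returns (histogram, f0, f1) where histogram[c] is the number of distinct
    k-mers seen exactly c times, f0 is the number of distinct k-mers and
    f1 is the number of k-mers seen exactly once (singletons).
    Assumption: only forward-strand k-mers over A, C, G, T are counted.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    histogram = Counter(counts.values())
    f0 = len(counts)
    f1 = histogram.get(1, 0)
    return dict(sorted(histogram.items())), f0, f1


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "CGTAA"]  # hypothetical input
    print(kmer_histogram(toy_reads, k=3))
    # -> ({1: 2, 2: 2, 3: 1}, 5, 2)
```

Exact counting of this kind keeps every distinct k-mer in memory, which is the cost that the estimation methods compared in the article aim to avoid on large sequencing datasets.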

DOI: 10.2174/1389202919666181026101326
PubMed: 31015787


Affiliations:


Links toward previous steps (curation, corpus...)


Links to Exploration step

pubmed:31015787

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Estimating the <i>k</i>-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.</title>
<author>
<name sortKey="Manekar, Swati C" sort="Manekar, Swati C" uniqKey="Manekar S" first="Swati C" last="Manekar">Swati C. Manekar</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur</wicri:regionArea>
<wicri:noRegion>Nagpur</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sathe, Shailesh R" sort="Sathe, Shailesh R" uniqKey="Sathe S" first="Shailesh R" last="Sathe">Shailesh R. Sathe</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur</wicri:regionArea>
<wicri:noRegion>Nagpur</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:31015787</idno>
<idno type="pmid">31015787</idno>
<idno type="doi">10.2174/1389202919666181026101326</idno>
<idno type="wicri:Area/PubMed/Corpus">000557</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000557</idno>
<idno type="wicri:Area/PubMed/Curation">000557</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000557</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000574</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000574</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Estimating the <i>k</i>-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.</title>
<author>
<name sortKey="Manekar, Swati C" sort="Manekar, Swati C" uniqKey="Manekar S" first="Swati C" last="Manekar">Swati C. Manekar</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur</wicri:regionArea>
<wicri:noRegion>Nagpur</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sathe, Shailesh R" sort="Sathe, Shailesh R" uniqKey="Sathe S" first="Shailesh R" last="Sathe">Shailesh R. Sathe</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</nlm:affiliation>
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur</wicri:regionArea>
<wicri:noRegion>Nagpur</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Current genomics</title>
<idno type="ISSN">1389-2029</idno>
<imprint>
<date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In bioinformatics, estimation of k-mer abundance histograms or just enumerating the number of unique k-mers and the number of singletons is desirable in many genome sequence analysis applications. The applications include predicting genome sizes, data pre-processing for de Bruijn graph assembly methods (tuning runtime parameters for analysis tools), repeat detection, sequencing coverage estimation, measuring sequencing error rates, etc. Different methods for cardinality estimation in sequencing data have been developed in recent years.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">31015787</PMID>
<DateRevised>
<Year>2020</Year>
<Month>02</Month>
<Day>25</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">1389-2029</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>20</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2019</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>Current genomics</Title>
<ISOAbbreviation>Curr. Genomics</ISOAbbreviation>
</Journal>
<ArticleTitle>Estimating the <i>k</i>-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.</ArticleTitle>
<Pagination>
<MedlinePgn>2-15</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.2174/1389202919666181026101326</ELocationID>
<Abstract>
<AbstractText Label="Background" NlmCategory="UNASSIGNED">In bioinformatics, estimation of k-mer abundance histograms or just enumerating the number of unique k-mers and the number of singletons is desirable in many genome sequence analysis applications. The applications include predicting genome sizes, data pre-processing for de Bruijn graph assembly methods (tuning runtime parameters for analysis tools), repeat detection, sequencing coverage estimation, measuring sequencing error rates, etc. Different methods for cardinality estimation in sequencing data have been developed in recent years.</AbstractText>
<AbstractText Label="Objective" NlmCategory="UNASSIGNED">In this article, we present a comparative assessment of the different k-mer frequency estimation programs (ntCard, KmerGenie, KmerStream and Khmer (abundance-dist-single.py and unique-kmers.py)) to assess their relative merits and demerits.</AbstractText>
<AbstractText Label="Methods" NlmCategory="UNASSIGNED">Principally, the miscounts/error-rates of these tools are analyzed by rigorous experimental analysis for a varied range of k. We also present experimental results on runtime, scalability for larger datasets, memory, CPU utilization as well as parallelism of k-mer frequency estimation methods.</AbstractText>
<AbstractText Label="Results" NlmCategory="UNASSIGNED">The results indicate that ntCard is more accurate than the other methods in estimating F0, f1 and full k-mer abundance histograms. ntCard is the fastest, but it requires more memory than KmerGenie.</AbstractText>
<AbstractText Label="Conclusion" NlmCategory="UNASSIGNED">The results of this evaluation may serve as a roadmap to potential users and practitioners of streaming algorithms for estimating k-mer coverage frequencies, to assist them in identifying an appropriate method. Such analysis of the results also helps researchers to discover remaining open research questions, effective combinations of existing techniques and possible avenues for future research.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Manekar</LastName>
<ForeName>Swati C</ForeName>
<Initials>SC</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Sathe</LastName>
<ForeName>Shailesh R</ForeName>
<Initials>SR</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, India.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United Arab Emirates</Country>
<MedlineTA>Curr Genomics</MedlineTA>
<NlmUniqueID>100960527</NlmUniqueID>
<ISSNLinking>1389-2029</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Distinct k-mers</Keyword>
<Keyword MajorTopicYN="N">Hashing</Keyword>
<Keyword MajorTopicYN="N">High-throughput sequencing</Keyword>
<Keyword MajorTopicYN="N">K-mer abundance histogram</Keyword>
<Keyword MajorTopicYN="N">Singleton k-mers</Keyword>
<Keyword MajorTopicYN="N">Streaming algorithms</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2018</Year>
<Month>07</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2018</Year>
<Month>10</Month>
<Day>05</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2018</Year>
<Month>10</Month>
<Day>24</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2019</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2019</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>4</Month>
<Day>25</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">31015787</ArticleId>
<ArticleId IdType="doi">10.2174/1389202919666181026101326</ArticleId>
<ArticleId IdType="pii">CG-20-2</ArticleId>
<ArticleId IdType="pmc">PMC6446480</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11504945</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Jan;13(1):91-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12529310</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2003 Feb 12;19(3):319-26</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12584116</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2003 Oct;13(10):2306-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12975312</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Mar 1;21(5):582-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15374857</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Jun;21 Suppl 1:i351-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15961478</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2005 Sep;1(4):e43</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16184192</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 May;18(5):821-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18349386</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2008 Dec 15;24(24):2818-24</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18952627</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2008 Oct 31;9:517</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18976482</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2009 Jun;19(6):1117-23</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19251739</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genomics. 2010 Jun;95(6):315-27</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20211242</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2010 Apr;17(4):603-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20426693</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2010;11(11):R116</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21114842</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Feb 15;27(4):479-86</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21245053</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jun 1;27(11):1455-61</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21471014</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jul 1;27(13):i137-41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21685062</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2012 Aug 14;109(33):13272-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22847406</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2013 Feb 1;29(3):308-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23202746</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2013 Mar 1;29(5):652-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23325618</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Jan 1;30(1):31-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23732276</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 May 1;30(9):1228-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24443382</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2014 Jul 25;9(7):e101271</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25062443</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Dec 1;30(23):3402-4</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25143290</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2014 Dec 15;30(24):3541-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25355787</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Biol. 2015 Jul 07;13(7):e1002195</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26151137</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>F1000Res. 2015 Sep 25;4:900</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26535114</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2016 Apr;23(4):248-55</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26982880</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2016 Nov 15;32(22):3492-3494</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27423894</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2017 Mar 31;12:9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28373894</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 May 1;33(9):1324-1330</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28453674</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 Sep 1;33(17):2759-2761</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28472236</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Manekar, Swati C" sort="Manekar, Swati C" uniqKey="Manekar S" first="Swati C" last="Manekar">Swati C. Manekar</name>
</noRegion>
<name sortKey="Sathe, Shailesh R" sort="Sathe, Shailesh R" uniqKey="Sathe S" first="Shailesh R" last="Sathe">Shailesh R. Sathe</name>
</country>
</tree>
</affiliations>
</record>
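
For readers without Dilib, the PubMed part of a record like the one above can also be read with a general-purpose XML library. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the embedded `FRAGMENT` is an abridged copy of the `<pubmed>` element shown above, and the function name `summarize` is hypothetical. Note that the full `<record>` uses `wicri:` and `nlm:` prefixes whose namespace declarations do not appear in this listing, so a strict parser is likely to reject it as shown; the sketch therefore parses only the `<pubmed>` subtree.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element from the record (illustration only).
FRAGMENT = """<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">31015787</PMID>
<Article PubModel="Print">
<ELocationID EIdType="doi" ValidYN="Y">10.2174/1389202919666181026101326</ELocationID>
</Article>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Distinct k-mers</Keyword>
<Keyword MajorTopicYN="N">Hashing</Keyword>
</KeywordList>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text):
    """Extract the PMID, DOI and keywords from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    pmid = root.findtext("MedlineCitation/PMID")
    # Select the ELocationID whose EIdType attribute is "doi".
    doi = root.findtext("MedlineCitation/Article/ELocationID[@EIdType='doi']")
    keywords = [kw.text for kw in root.iter("Keyword")]
    return {"pmid": pmid, "doi": doi, "keywords": keywords}


if __name__ == "__main__":
    print(summarize(FRAGMENT))
```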

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000574 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000574 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:31015787
   |texte=   Estimating the k-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:31015787" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021