MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Clustering of reads with alignment-free measures and quality values.

Internal identifier: 001696 (PubMed/Curation); previous: 001695; next: 001697

Authors: Matteo Comin [Italy]; Andrea Leoni [Italy]; Michele Schimd [Italy]

Source: Algorithms for molecular biology : AMB (ISSN 1748-7188), 2015

RBID : pubmed:25691913

Abstract

The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that now challenges the storage and data-processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps such as assembly and error correction. Several alignment-free measures based on k-mer counts have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%).
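
As a rough illustration of the ideas mentioned in the abstract, the sketch below counts k-mers in a read, computes a classical inner-product statistic between two count vectors (commonly called D2), and builds a quality-weighted variant of the counts. The weighting scheme is an assumption made for illustration: each k-mer occurrence is weighted by the product of the per-base probabilities of being correct, derived from Phred scores. It is not taken from this record and should not be read as the QCluster implementation. All function names are hypothetical.

```python
from collections import Counter


def phred_to_prob_correct(q):
    """Probability that a base call is correct, given its Phred score q."""
    return 1.0 - 10 ** (-q / 10.0)


def kmer_counts(read, k):
    """Plain k-mer counts of a read (overlapping windows of length k)."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def quality_weighted_kmer_counts(read, quals, k):
    """Assumed weighting: each occurrence contributes the product of the
    correctness probabilities of its k bases instead of contributing 1."""
    probs = [phred_to_prob_correct(q) for q in quals]
    weights = {}
    for i in range(len(read) - k + 1):
        w = 1.0
        for p in probs[i:i + k]:
            w *= p
        kmer = read[i:i + k]
        weights[kmer] = weights.get(kmer, 0.0) + w
    return weights


def d2(counts_a, counts_b):
    """Inner product of two k-mer count (or weight) vectors."""
    return sum(v * counts_b.get(kmer, 0) for kmer, v in counts_a.items())


if __name__ == "__main__":
    read_a, qual_a = "ACGTACG", [30, 30, 30, 10, 30, 30, 30]
    read_b, qual_b = "ACGTTCG", [30, 30, 30, 30, 30, 30, 30]
    k = 3
    print(d2(kmer_counts(read_a, k), kmer_counts(read_b, k)))
    print(d2(quality_weighted_kmer_counts(read_a, qual_a, k),
             quality_weighted_kmer_counts(read_b, qual_b, k)))
```

Under this assumed weighting, k-mers that overlap a low-quality base contribute less to the similarity score, so likely sequencing errors have less influence on how reads are grouped.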

DOI: 10.1186/s13015-014-0029-x
PubMed: 25691913


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Clustering of reads with alignment-free measures and quality values.</title>
<author>
<name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Leoni, Andrea" sort="Leoni, Andrea" uniqKey="Leoni A" first="Andrea" last="Leoni">Andrea Leoni</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Schimd, Michele" sort="Schimd, Michele" uniqKey="Schimd M" first="Michele" last="Schimd">Michele Schimd</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:25691913</idno>
<idno type="pmid">25691913</idno>
<idno type="doi">10.1186/s13015-014-0029-x</idno>
<idno type="wicri:Area/PubMed/Corpus">001696</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001696</idno>
<idno type="wicri:Area/PubMed/Curation">001696</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001696</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Clustering of reads with alignment-free measures and quality values.</title>
<author>
<name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Leoni, Andrea" sort="Leoni, Andrea" uniqKey="Leoni A" first="Andrea" last="Leoni">Andrea Leoni</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Schimd, Michele" sort="Schimd, Michele" uniqKey="Schimd M" first="Michele" last="Schimd">Michele Schimd</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</nlm:affiliation>
<country xml:lang="fr">Italie</country>
<wicri:regionArea>Department of Information Engineering, University of Padova, Padova</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint>
<date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that now challenges the storage and data-processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps such as assembly and error correction. Several alignment-free measures based on k-mer counts have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%).</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25691913</PMID>
<DateCompleted>
<Year>2015</Year>
<Month>02</Month>
<Day>18</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>11</Month>
<Day>13</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">1748-7188</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>10</Volume>
<PubDate>
<Year>2015</Year>
</PubDate>
</JournalIssue>
<Title>Algorithms for molecular biology : AMB</Title>
<ISOAbbreviation>Algorithms Mol Biol</ISOAbbreviation>
</Journal>
<ArticleTitle>Clustering of reads with alignment-free measures and quality values.</ArticleTitle>
<Pagination>
<MedlinePgn>4</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-014-0029-x</ELocationID>
<Abstract>
<AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">The data volume generated by Next-Generation Sequencing (NGS) technologies is growing at a pace that now challenges the storage and data-processing capacities of modern computer systems. In this context, an important aspect is the reduction of data complexity by collapsing redundant reads into a single cluster to improve the run time, memory requirements, and quality of post-processing steps such as assembly and error correction. Several alignment-free measures based on k-mer counts have been used to cluster reads. Quality scores produced by NGS platforms are fundamental for various analyses of NGS data, such as read mapping and error detection. Moreover, future-generation sequencing platforms will produce long reads but with a large number of erroneous bases (up to 15%).</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">In this scenario it will be fundamental to exploit quality value information within the alignment-free framework. To the best of our knowledge, this is the first study that incorporates quality value information and k-mer counts, in the context of alignment-free measures, for the comparison of read data. Based on these principles, in this paper we present a family of alignment-free measures called D^q-type. A set of experiments on simulated and real read data confirms that the new measures are superior to other classical alignment-free statistics, especially when erroneous reads are considered. Results on de novo assembly and metagenomic read classification also show that the introduction of quality values improves over standard alignment-free measures. These statistics are implemented in a software tool called QCluster (http://www.dei.unipd.it/~ciompin/main/qcluster.html).</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Comin</LastName>
<ForeName>Matteo</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Leoni</LastName>
<ForeName>Andrea</ForeName>
<Initials>A</Initials>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Schimd</LastName>
<ForeName>Michele</ForeName>
<Initials>M</Initials>
<AffiliationInfo>
<Affiliation>Department of Information Engineering, University of Padova, Padova, Italy.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2015</Year>
<Month>01</Month>
<Day>28</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Algorithms Mol Biol</MedlineTA>
<NlmUniqueID>101265088</NlmUniqueID>
<ISSNLinking>1748-7188</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Alignment-free measures</Keyword>
<Keyword MajorTopicYN="N">Reads clustering</Keyword>
<Keyword MajorTopicYN="N">Reads quality values</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2014</Year>
<Month>11</Month>
<Day>19</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2014</Year>
<Month>12</Month>
<Day>17</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>2</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2015</Year>
<Month>2</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2015</Year>
<Month>2</Month>
<Day>19</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">25691913</ArticleId>
<ArticleId IdType="doi">10.1186/s13015-014-0029-x</ArticleId>
<ArticleId IdType="pii">29</ArticleId>
<ArticleId IdType="pmc">PMC4331138</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2003 Mar 1;19(4):513-23</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12611807</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Sep 15;27(18):2502-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21810899</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Microbiol. 2008 Jun;6(6):419-30</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18475305</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 Nov;18(11):1851-8</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18714091</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 1986 Jul;83(14):5155-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">3460087</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2009 Feb 24;106(8):2677-82</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19188606</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2009 Dec;16(12):1615-34</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20001252</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2008 May;18(5):821-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18349386</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2013 Sep 08;14:268</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24011402</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 1998 Mar;8(3):186-94</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">9521922</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Proc Natl Acad Sci U S A. 2002 Oct 29;99(22):13980-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12374863</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Algorithms Mol Biol. 2012 Dec 06;7(1):34</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23216990</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2012 Aug 05;13:375</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22863213</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Evol Biol. 2007 Mar 15;7:41</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17359548</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2012 Mar 1;28(5):656-63</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22247280</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2009 Jul;19(7):1309-15</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19439514</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2013 Feb;20(2):64-79</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23383994</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008 Sep 23;9:394</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18811946</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2007 Jul 1;23(13):i249-55</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17646303</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Mol Biol. 1990 Oct 5;215(3):403-10</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2231712</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2014 May-Jun;11(3):500-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26356018</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2011 Jan;8(1):59-60</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21191376</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE/ACM Trans Comput Biol Bioinform. 2014 Jul-Aug;11(4):628-37</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26356333</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2010 Jan 18;11 Suppl 1:S16</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20122187</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2008 Sep;36(16):5221-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18684996</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2014;15 Suppl 9:S1</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25252700</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W45-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15215347</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2010 Nov;17(11):1467-90</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20973742</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Res. 2011 Jun;21(6):961-73</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20980555</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Comput Biol. 2011 Dec;18(12):1819-29</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21548811</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
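
The `<pubmed>` part of the record above can be read with a standard XML parser. The sketch below is a minimal example, assuming that part has been extracted on its own: the TEI part uses `wicri:` and `nlm:` prefixes without namespace declarations in this dump, so a strict parser would likely reject the full `<record>` as shown. The `SAMPLE` string is a trimmed copy of the fields above, and the `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <pubmed> fragment shown above (illustration only).
SAMPLE = """<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25691913</PMID>
<Article PubModel="Electronic-eCollection">
<ArticleTitle>Clustering of reads with alignment-free measures and quality values.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1186/s13015-014-0029-x</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Comin</LastName><ForeName>Matteo</ForeName></Author>
<Author ValidYN="Y"><LastName>Leoni</LastName><ForeName>Andrea</ForeName></Author>
<Author ValidYN="Y"><LastName>Schimd</LastName><ForeName>Michele</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text):
    """Extract a few bibliographic fields from a <pubmed> fragment."""
    root = ET.fromstring(xml_text)
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": root.findtext(".//ArticleTitle"),
        "doi": root.findtext(".//ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```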

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001696 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001696 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:25691913
   |texte=   Clustering of reads with alignment-free measures and quality values.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:25691913" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021