MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.

Internal identifier: 000830 (PubMed/Corpus); previous: 000829; next: 000831

Authors: Kévin Vervier; Pierre Mahé; Jean-Philippe Vert

Source: Methods in molecular biology (Clifton, N.J.) [eISSN 1940-6029]; 2018

RBID : pubmed:30030800

English descriptors

Abstract

Metagenomics is the study of microbial community diversity, especially of uncultured microorganisms, by shotgun sequencing of environmental samples. As sequencer throughput and data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotide motifs offer faster analysis times for comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for binning short sequencing reads based on their k-mer profiles. We provide a step-by-step guideline on how we trained the classification models and how they can easily be generalized to user-defined reference genomes and specific applications. We also give additional details on the effect of the algorithm's parameters on performance.
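
To make the notion of a k-mer profile concrete, the following is a minimal Python sketch of how overlapping k-mers in a single read can be counted. It is an illustration only and not MetaVW's implementation: the function name `kmer_profile`, the default `k=4`, the example read, and the choice to skip windows containing characters other than A, C, G, T are all assumptions made for this example.

```python
from collections import Counter


def kmer_profile(read: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a read (hypothetical helper, for illustration)."""
    read = read.upper()
    counts = Counter()
    # Slide a window of width k over the read, one base at a time.
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are ignored.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Toy read, not taken from any real dataset.
profile = kmer_profile("ACGTACGTNACG", k=4)
```

A profile of this kind can serve as the feature vector of a read for a classifier; how MetaVW itself encodes and learns from these features is described in the chapter, not here.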

DOI: 10.1007/978-1-4939-8561-6_2
PubMed: 30030800

Links to Exploration step

pubmed:30030800

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author>
<name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation>
<nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation>
<nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation>
<nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:30030800</idno>
<idno type="pmid">30030800</idno>
<idno type="doi">10.1007/978-1-4939-8561-6_2</idno>
<idno type="wicri:Area/PubMed/Corpus">000830</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000830</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</title>
<author>
<name sortKey="Vervier, Kevin" sort="Vervier, Kevin" uniqKey="Vervier K" first="Kévin" last="Vervier">Kévin Vervier</name>
<affiliation>
<nlm:affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Mahe, Pierre" sort="Mahe, Pierre" uniqKey="Mahe P" first="Pierre" last="Mahé">Pierre Mahé</name>
<affiliation>
<nlm:affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Vert, Jean Philippe" sort="Vert, Jean Philippe" uniqKey="Vert J" first="Jean-Philippe" last="Vert">Jean-Philippe Vert</name>
<affiliation>
<nlm:affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Methods in molecular biology (Clifton, N.J.)</title>
<idno type="eISSN">1940-6029</idno>
<imprint>
<date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Metagenomics (methods)</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Metagenomics</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Base Sequence</term>
<term>Calibration</term>
<term>Genome, Bacterial</term>
<term>Machine Learning</term>
<term>Reproducibility of Results</term>
<term>Sequence Analysis, DNA</term>
<term>Software</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Metagenomics is the study of microbial community diversity, especially the uncultured microorganisms by shotgun sequencing environmental samples. As the sequencers throughput and the data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotides motifs have faster analysis time, for comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for short sequencing reads binning, based on their k-mers profile. We provide a step-by-step guideline on how we trained the classification models and how it can easily generalize to user-defined reference genomes and specific applications. We also give additional details on what effect parameters in the algorithm have on performances.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30030800</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>03</Month>
<Day>04</Day>
</DateCompleted>
<DateRevised>
<Year>2019</Year>
<Month>03</Month>
<Day>04</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1940-6029</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>1807</Volume>
<PubDate>
<Year>2018</Year>
</PubDate>
</JournalIssue>
<Title>Methods in molecular biology (Clifton, N.J.)</Title>
<ISOAbbreviation>Methods Mol. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</ArticleTitle>
<Pagination>
<MedlinePgn>9-20</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1007/978-1-4939-8561-6_2</ELocationID>
<Abstract>
<AbstractText>Metagenomics is the study of microbial community diversity, especially the uncultured microorganisms by shotgun sequencing environmental samples. As the sequencers throughput and the data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotides motifs have faster analysis time, for comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for short sequencing reads binning, based on their k-mers profile. We provide a step-by-step guideline on how we trained the classification models and how it can easily generalize to user-defined reference genomes and specific applications. We also give additional details on what effect parameters in the algorithm have on performances.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Vervier</LastName>
<ForeName>Kévin</ForeName>
<Initials>K</Initials>
<AffiliationInfo>
<Affiliation>Department of Psychiatry, University of Iowa Hospital and Clinics, Iowa, IA, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Mahé</LastName>
<ForeName>Pierre</ForeName>
<Initials>P</Initials>
<AffiliationInfo>
<Affiliation>Bioinformatics Research Department, BioMérieux, Marcy-l'Étoile, France.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Vert</LastName>
<ForeName>Jean-Philippe</ForeName>
<Initials>JP</Initials>
<AffiliationInfo>
<Affiliation>MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Fontainebleau, France. Jean-Philippe.Vert@mines-paristech.fr.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Institut Curie, Paris Cedex, France. Jean-Philippe.Vert@mines-paristech.fr.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>INSERM U900, Paris Cedex, France. Jean-Philippe.Vert@mines-paristech.fr.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Département de Mathématiques et Applications, École Normale Supérieure, CNRS, PSL Research University, Paris, France. Jean-Philippe.Vert@mines-paristech.fr.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Methods Mol Biol</MedlineTA>
<NlmUniqueID>9214969</NlmUniqueID>
<ISSNLinking>1064-3745</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D001483" MajorTopicYN="N">Base Sequence</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D002138" MajorTopicYN="N">Calibration</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016680" MajorTopicYN="N">Genome, Bacterial</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000069550" MajorTopicYN="Y">Machine Learning</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015203" MajorTopicYN="N">Reproducibility of Results</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017422" MajorTopicYN="Y">Sequence Analysis, DNA</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012984" MajorTopicYN="Y">Software</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="Y">Binning</Keyword>
<Keyword MajorTopicYN="Y">Classification</Keyword>
<Keyword MajorTopicYN="Y">Machine learning</Keyword>
<Keyword MajorTopicYN="Y">Metagenomics</Keyword>
<Keyword MajorTopicYN="Y">Microbiology</Keyword>
<Keyword MajorTopicYN="Y">Next-generation sequencing</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2018</Year>
<Month>7</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2018</Year>
<Month>7</Month>
<Day>22</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>3</Month>
<Day>5</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">30030800</ArticleId>
<ArticleId IdType="doi">10.1007/978-1-4939-8561-6_2</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
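
As a rough sketch of how the fields of this record could be read programmatically, the Python snippet below parses a shortened copy of the `<pubmed>` subtree with the standard library. The variable names and the trimmed sample are assumptions made for this example. Only the `<pubmed>` part is used because the TEI part of the excerpt carries `nlm:` and `wicri:` prefixes without visible namespace declarations, which a strict XML parser may reject.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> subtree above (trimmed for illustration).
PUBMED_XML = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30030800</PMID>
<Article PubModel="Print">
<ArticleTitle>MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Vervier</LastName><ForeName>Kévin</ForeName></Author>
<Author ValidYN="Y"><LastName>Mahé</LastName><ForeName>Pierre</ForeName></Author>
<Author ValidYN="Y"><LastName>Vert</LastName><ForeName>Jean-Philippe</ForeName></Author>
</AuthorList>
</Article>
<MeshHeadingList>
<MeshHeading><DescriptorName UI="D000069550" MajorTopicYN="Y">Machine Learning</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName UI="D056186" MajorTopicYN="N">Metagenomics</DescriptorName></MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</pubmed>"""

root = ET.fromstring(PUBMED_XML)

# Simple path lookups relative to the <pubmed> root element.
pmid = root.findtext("MedlineCitation/PMID")
title = root.findtext("MedlineCitation/Article/ArticleTitle")

# Build "ForeName LastName" strings in document order.
authors = [
    f"{a.findtext('ForeName')} {a.findtext('LastName')}"
    for a in root.iter("Author")
]

# Keep only MeSH descriptors flagged as major topics.
major_topics = [
    d.text for d in root.iter("DescriptorName")
    if d.get("MajorTopicYN") == "Y"
]
```

The same lookups should apply to the full `<pubmed>` element once it has been extracted from the record, for instance from the output of the Dilib commands below.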

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000830 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000830 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:30030800
   |texte=   MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:30030800" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021