MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.

Internal identifier: 000631 (PubMed/Corpus); previous: 000630; next: 000632

Authors: Stephen Woloszynek; Zhengqiao Zhao; Jian Chen; Gail L. Rosen

Source: PLoS computational biology (eISSN 1553-7358), 2019

RBID : pubmed:30807567

Abstract

Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.
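
The two-stage idea described in the abstract (split each read into k-mers, embed the k-mers, then combine k-mer vectors into one vector per sequence) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract names Skip-Gram word2vec and an existing sentence embedding technique, whereas the code below substitutes random vectors for trained k-mer embeddings and a plain average for the sentence embedding. The function names, the example reads, and the values of `K` and `DIM` are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Set


def kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers (the 'words') of a nucleotide sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def toy_kmer_vectors(vocabulary: Set[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Assumption: random vectors stand in for trained word2vec k-mer embeddings."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for word in sorted(vocabulary)}


def embed_sequence(sequence: str, k: int, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of a sequence's k-mers (a simplified 'sentence' embedding)."""
    words = [w for w in kmers(sequence, k) if w in vectors]
    if not words:
        raise ValueError("sequence contains no k-mer from the vocabulary")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][d] for w in words) / len(words) for d in range(dim)]


# Hypothetical toy reads; real input would be 16S rRNA amplicon sequences.
reads = ["ACGTACGTGA", "TTGACGTACG"]
K, DIM = 4, 8  # arbitrary illustration values, not taken from the paper

vocabulary = {word for read in reads for word in kmers(read, K)}
vectors = toy_kmer_vectors(vocabulary, DIM)
embeddings = [embed_sequence(read, K, vectors) for read in reads]

print(kmers(reads[0], K))
print(len(embeddings), "sequences embedded in", len(embeddings[0]), "dimensions")
```

Each read ends up as a fixed-length numeric vector regardless of its length, which is the property that makes such embeddings usable as features for downstream classifiers or clustering.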

DOI: 10.1371/journal.pcbi.1006721
PubMed: 30807567

Links to Exploration step

pubmed:30807567

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.</title>
<author>
<name sortKey="Woloszynek, Stephen" sort="Woloszynek, Stephen" uniqKey="Woloszynek S" first="Stephen" last="Woloszynek">Stephen Woloszynek</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Zhengqiao" sort="Zhao, Zhengqiao" uniqKey="Zhao Z" first="Zhengqiao" last="Zhao">Zhengqiao Zhao</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Chen, Jian" sort="Chen, Jian" uniqKey="Chen J" first="Jian" last="Chen">Jian Chen</name>
<affiliation>
<nlm:affiliation>Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30807567</idno>
<idno type="pmid">30807567</idno>
<idno type="doi">10.1371/journal.pcbi.1006721</idno>
<idno type="wicri:Area/PubMed/Corpus">000631</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000631</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.</title>
<author>
<name sortKey="Woloszynek, Stephen" sort="Woloszynek, Stephen" uniqKey="Woloszynek S" first="Stephen" last="Woloszynek">Stephen Woloszynek</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Zhengqiao" sort="Zhao, Zhengqiao" uniqKey="Zhao Z" first="Zhengqiao" last="Zhao">Zhengqiao Zhao</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Chen, Jian" sort="Chen, Jian" uniqKey="Chen J" first="Jian" last="Chen">Jian Chen</name>
<affiliation>
<nlm:affiliation>Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, United States of America.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
<affiliation>
<nlm:affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</nlm:affiliation>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS computational biology</title>
<idno type="eISSN">1553-7358</idno>
<imprint>
<date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
<term>Computational Biology (methods)</term>
<term>High-Throughput Nucleotide Sequencing (methods)</term>
<term>Microbiota (genetics)</term>
<term>RNA, Ribosomal, 16S (genetics)</term>
<term>RNA, Ribosomal, 16S (physiology)</term>
<term>Sequence Analysis, RNA (methods)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>RNA, Ribosomal, 16S</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Microbiota</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Computational Biology</term>
<term>High-Throughput Nucleotide Sequencing</term>
<term>Sequence Analysis, RNA</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="physiology" xml:lang="en">
<term>RNA, Ribosomal, 16S</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Cluster Analysis</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30807567</PMID>
<DateCompleted>
<Year>2019</Year>
<Month>03</Month>
<Day>28</Day>
</DateCompleted>
<DateRevised>
<Year>2020</Year>
<Month>02</Month>
<Day>25</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Electronic">1553-7358</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>15</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2019</Year>
<Month>02</Month>
</PubDate>
</JournalIssue>
<Title>PLoS computational biology</Title>
<ISOAbbreviation>PLoS Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.</ArticleTitle>
<Pagination>
<MedlinePgn>e1006721</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1371/journal.pcbi.1006721</ELocationID>
<Abstract>
<AbstractText>Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space. Here, we use Skip-Gram word2vec to embed k-mers, obtained from 16S rRNA amplicon surveys, and then leverage an existing sentence embedding technique to embed all sequences belonging to specific body sites or samples. We demonstrate that these representations are meaningful, and hence the embedding space can be exploited as a form of feature extraction for exploratory analysis. We show that sequence embeddings preserve relevant information about the sequencing data such as k-mer context, sequence taxonomy, and sample class. Specifically, the sequence embedding space resolved differences among phyla, as well as differences among genera within the same family. Distances between sequence embeddings had similar qualities to distances between alignment identities, and embedding multiple sequences can be thought of as generating a consensus sequence. In addition, embeddings are versatile features that can be used for many downstream tasks, such as taxonomic and sample classification. Using sample embeddings for body site classification resulted in negligible performance loss compared to using OTU abundance data, and clustering embeddings yielded high fidelity species clusters. Lastly, the k-mer embedding space captured distinct k-mer profiles that mapped to specific regions of the 16S rRNA gene and corresponded with particular body sites. 
Together, our results show that embedding sequences results in meaningful representations that can be used for exploratory analyses or for downstream machine learning applications that require numeric data. Moreover, because the embeddings are trained in an unsupervised manner, unlabeled data can be embedded and used to bolster supervised machine learning tasks.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Woloszynek</LastName>
<ForeName>Stephen</ForeName>
<Initials>S</Initials>
<Identifier Source="ORCID">0000-0003-0568-298X</Identifier>
<AffiliationInfo>
<Affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zhao</LastName>
<ForeName>Zhengqiao</ForeName>
<Initials>Z</Initials>
<AffiliationInfo>
<Affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Chen</LastName>
<ForeName>Jian</ForeName>
<Initials>J</Initials>
<AffiliationInfo>
<Affiliation>Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, New York, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Rosen</LastName>
<ForeName>Gail L</ForeName>
<Initials>GL</Initials>
<Identifier Source="ORCID">0000-0003-1763-5750</Identifier>
<AffiliationInfo>
<Affiliation>Department of Electrical and Computer Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2019</Year>
<Month>02</Month>
<Day>26</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>PLoS Comput Biol</MedlineTA>
<NlmUniqueID>101238922</NlmUniqueID>
<ISSNLinking>1553-734X</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>0</RegistryNumber>
<NameOfSubstance UI="D012336">RNA, Ribosomal, 16S</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000465" MajorTopicYN="N">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016000" MajorTopicYN="N">Cluster Analysis</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D019295" MajorTopicYN="N">Computational Biology</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="N">High-Throughput Nucleotide Sequencing</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="N">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D064307" MajorTopicYN="N">Microbiota</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="N">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D012336" MajorTopicYN="N">RNA, Ribosomal, 16S</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
<QualifierName UI="Q000502" MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017423" MajorTopicYN="N">Sequence Analysis, RNA</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
<CoiStatement>The authors have declared that no competing interests exist.</CoiStatement>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2018</Year>
<Month>05</Month>
<Day>04</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2018</Year>
<Month>12</Month>
<Day>17</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2019</Year>
<Month>03</Month>
<Day>08</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2019</Year>
<Month>2</Month>
<Day>27</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>3</Month>
<Day>29</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2019</Year>
<Month>2</Month>
<Day>27</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">30807567</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pcbi.1006721</ArticleId>
<ArticleId IdType="pii">PCOMPBIOL-D-18-00713</ArticleId>
<ArticleId IdType="pmc">PMC6407789</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Appl Environ Microbiol. 2004 Feb;70(2):1008-16</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14766583</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Eur J Biochem. 1992 Aug 1;207(3):839-46</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">1499561</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Microbiol Methods. 2004 Dec;59(3):327-35</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15488276</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 2006 Jul;72(7):5069-72</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16820507</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Appl Environ Microbiol. 2007 Aug;73(16):5261-7</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17586664</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>IEEE Trans Neural Netw. 2008 Apr;19(4):713-22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18390314</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2010 May;7(5):335-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20383131</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Stat Softw. 2010;33(1):1-22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20808728</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Med Genomics. 2011 Mar 04;4:22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21371338</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nutr Rev. 2012 Aug;70 Suppl 1:S45-56</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">22861807</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Pharmacogenomics J. 2013 Dec;13(6):514-22</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23032991</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Med. 2012 Oct 10;4(10):77</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23050952</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Mol Metab. 2012 Aug 03;1(1-2):21-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24024115</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2014 Jan;42(Database issue):D625-32</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24198250</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2014 Mar 03;15(3):R46</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24580807</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell Host Microbe. 2014 Mar 12;15(3):382-392</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24629344</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2014 Apr 10;9(4):e94249</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24722003</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cancer J. 2014 May-Jun;20(3):225-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24855012</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2014 Jun 02;9(6):e98741</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">24887397</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Dent Res J (Isfahan). 2014 May;11(3):291-301</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25097637</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Front Microbiol. 2014 Oct 13;5:508</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25352835</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Cell. 2014 Nov 6;159(4):789-99</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25417156</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 2015 May 22;348(6237):1261359</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25999513</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2015 Oct 06;16:322</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26445311</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2015 Nov 10;10(11):e0141287</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26555596</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Sci Rep. 2015 Nov 26;5:17098</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">26606973</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Int Rev Cell Mol Biol. 2016;324:67-124</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27017007</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Methods. 2016 Jul;13(7):581-3</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27214047</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Can J Microbiol. 2016 Aug;62(8):692-703</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27314511</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PeerJ. 2016 Oct 18;4:e2584</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27781170</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Pac Symp Biocomput. 2016;22:254-265</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">27896980</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>FEMS Microbiol Ecol. 2017 Apr 1;93(4):</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28334218</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2017 May 30;18(1):283</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28558684</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>NPJ Biofilms Microbiomes. 2016 Apr 20;2:16004</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28721243</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>ISME J. 2017 Dec;11(12):2639-2643</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28731476</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Gastroenterol Hepatol. 2017 Oct;14(10):573-584</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28743984</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2017 Jul 15;33(14):i92-i101</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28881969</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nat Rev Microbiol. 2018 Mar;16(3):143-155</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">29332945</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2018 Jul 15;34(14):2371-2375</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">29506021</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J R Soc Interface. 2018 Apr;15(141):null</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">29618526</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>mSystems. 2018 May 15;3(3):null</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">29795809</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2018 Jul 1;34(13):i32-i42</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">29950008</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2018 Nov 6;19(1):799</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">30400812</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
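
As a hedged illustration of how the `<pubmed>` part of the record above could be read programmatically, the sketch below parses a shortened excerpt with Python's standard `xml.etree.ElementTree`. The excerpt copies a few elements from the record; the `summarize` function and its output keys are hypothetical. Note that, as displayed here, the `<TEI>` part uses `nlm:` and `wicri:` prefixes without visible namespace declarations, so a strict parser would likely reject the full `<record>` unless those prefixes are declared; the sketch therefore restricts itself to the `<pubmed>` subtree.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the <pubmed> element shown in the record above.
EXCERPT = """<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">30807567</PMID>
<Article PubModel="Electronic-eCollection">
<ArticleTitle>16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Woloszynek</LastName><ForeName>Stephen</ForeName></Author>
<Author ValidYN="Y"><LastName>Zhao</LastName><ForeName>Zhengqiao</ForeName></Author>
<Author ValidYN="Y"><LastName>Chen</LastName><ForeName>Jian</ForeName></Author>
<Author ValidYN="Y"><LastName>Rosen</LastName><ForeName>Gail L</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">30807567</ArticleId>
<ArticleId IdType="doi">10.1371/journal.pcbi.1006721</ArticleId>
<ArticleId IdType="pmc">PMC6407789</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>"""


def summarize(xml_text: str) -> dict:
    """Extract PMID, title, author names, and article identifiers."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in citation.iter("Author")
    ]
    # The explicit path selects only the article's own identifiers, not the
    # ArticleId elements nested under ReferenceList in the full record.
    ids = {
        element.get("IdType"): element.text
        for element in root.findall("PubmedData/ArticleIdList/ArticleId")
    }
    return {
        "pmid": citation.findtext("PMID"),
        "title": citation.findtext("Article/ArticleTitle"),
        "authors": authors,
        "ids": ids,
    }


summary = summarize(EXCERPT)
print(summary["pmid"], summary["ids"]["doi"])
print("; ".join(summary["authors"]))
```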

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000631 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000631 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:30807567
   |texte=   16S rRNA sequence embeddings: Meaningful numeric feature representations of nucleotide sequences that are convenient for downstream analyses.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:30807567" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021