MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.

Internal identifier: 000558 (PubMed/Checkpoint); previous: 000557; next: 000559

Authors: Yuansheng Liu [Australia]; Leo Yu Zhang [Australia]; Jinyan Li [Australia]

Source: Bioinformatics (Oxford, England), 2019, vol. 35, no. 22, pp. 4560-4567

RBID : pubmed:30994891

Abstract

Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach which saves the computational resources by avoiding a significant number of unnecessary matches.

DOI: 10.1093/bioinformatics/btz273
PubMed: 30994891

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.</title>
<author>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007</wicri:regionArea>
<wicri:noRegion>NSW 2007</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Leo Yu" sort="Zhang, Leo Yu" uniqKey="Zhang L" first="Leo Yu" last="Zhang">Leo Yu Zhang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Information Technology, Deakin University, VIC 3216, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>School of Information Technology, Deakin University, VIC 3216</wicri:regionArea>
<wicri:noRegion>VIC 3216</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
<affiliation wicri:level="1">
<nlm:affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007</wicri:regionArea>
<wicri:noRegion>NSW 2007</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:30994891</idno>
<idno type="pmid">30994891</idno>
<idno type="doi">10.1093/bioinformatics/btz273</idno>
<idno type="wicri:Area/PubMed/Corpus">000565</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000565</idno>
<idno type="wicri:Area/PubMed/Curation">000565</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000565</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000558</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000558</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.</title>
<author>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007</wicri:regionArea>
<wicri:noRegion>NSW 2007</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Leo Yu" sort="Zhang, Leo Yu" uniqKey="Zhang L" first="Leo Yu" last="Zhang">Leo Yu Zhang</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Information Technology, Deakin University, VIC 3216, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>School of Information Technology, Deakin University, VIC 3216</wicri:regionArea>
<wicri:noRegion>VIC 3216</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
<affiliation wicri:level="1">
<nlm:affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007</wicri:regionArea>
<wicri:noRegion>NSW 2007</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach which saves the computational resources by avoiding a significant number of unnecessary matches.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="In-Data-Review" Owner="NLM">
<PMID Version="1">30994891</PMID>
<DateRevised>
<Year>2020</Year>
<Month>01</Month>
<Day>08</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>35</Volume>
<Issue>22</Issue>
<PubDate>
<Year>2019</Year>
<Month>Nov</Month>
<Day>01</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.</ArticleTitle>
<Pagination>
<MedlinePgn>4560-4567</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/bioinformatics/btz273</ELocationID>
<Abstract>
<AbstractText Label="MOTIVATION" NlmCategory="BACKGROUND">Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach which saves the computational resources by avoiding a significant number of unnecessary matches.</AbstractText>
<AbstractText Label="RESULTS" NlmCategory="RESULTS">Under this framework, we proposed a new method to detect all MEMs from a pair of genomes. The method first performs a fixed sampling of k-mers on the query sequence, and adds these selected k-mers to a Bloom filter. Then all the k-mers of the reference sequence are tested by the Bloom filter. If a k-mer passes the test, it is inserted into a hash table for indexing. Compared with the existing methods, far fewer query k-mers are generated and far fewer k-mers are inserted into the index, which avoids unnecessary matches and leads to an efficient matching process and memory savings. Experiments on large genomes demonstrate that our method is at least 1.8 times faster than the best of the existing algorithms. This performance is mainly attributed to the key novelty of our method: the fixed k-mer sampling is conducted on the query sequence, and the index k-mers are filtered from the reference sequence via a Bloom filter.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION" NlmCategory="METHODS">https://github.com/yuansliu/bfMEM.</AbstractText>
<AbstractText Label="SUPPLEMENTARY INFORMATION" NlmCategory="BACKGROUND">Supplementary data are available at Bioinformatics online.</AbstractText>
<CopyrightInformation>© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Liu</LastName>
<ForeName>Yuansheng</ForeName>
<Initials>Y</Initials>
<AffiliationInfo>
<Affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zhang</LastName>
<ForeName>Leo Yu</ForeName>
<Initials>LY</Initials>
<AffiliationInfo>
<Affiliation>School of Information Technology, Deakin University, VIC 3216, Australia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Li</LastName>
<ForeName>Jinyan</ForeName>
<Initials>J</Initials>
<AffiliationInfo>
<Affiliation>Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2019</Year>
<Month>01</Month>
<Day>23</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2019</Year>
<Month>03</Month>
<Day>31</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2019</Year>
<Month>04</Month>
<Day>11</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2019</Year>
<Month>4</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2019</Year>
<Month>4</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2019</Year>
<Month>4</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">30994891</ArticleId>
<ArticleId IdType="pii">5474908</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btz273</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Australie</li>
</country>
</list>
<tree>
<country name="Australie">
<noRegion>
<name sortKey="Liu, Yuansheng" sort="Liu, Yuansheng" uniqKey="Liu Y" first="Yuansheng" last="Liu">Yuansheng Liu</name>
</noRegion>
<name sortKey="Li, Jinyan" sort="Li, Jinyan" uniqKey="Li J" first="Jinyan" last="Li">Jinyan Li</name>
<name sortKey="Zhang, Leo Yu" sort="Zhang, Leo Yu" uniqKey="Zhang L" first="Leo Yu" last="Zhang">Leo Yu Zhang</name>
</country>
</tree>
</affiliations>
</record>
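
Illustrative sketch (not part of the record)

The RESULTS abstract in the record above describes a three-stage pipeline: sample k-mers from the query at a fixed step, store them in a Bloom filter, then index only those reference k-mers that pass the filter. The Python sketch below is one minimal way to illustrate that idea; it is not the authors' bfMEM implementation. The names BloomFilter and find_mems, the blake2b-based hashing, the filter size, and the final lookup-and-extend stage are all assumptions made for illustration. It also assumes a sampling step of min_len - k + 1, which guarantees that every exact match of length at least min_len contains at least one sampled query k-mer.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter over strings (hypothetical helper, not from bfMEM)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting blake2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        # May return false positives, never false negatives.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def find_mems(reference, query, min_len, k):
    """Return sorted (ref_start, query_start, length) for all MEMs >= min_len."""
    step = min_len - k + 1  # assumed sampling step; requires k <= min_len
    if k < 1 or step < 1:
        raise ValueError("need 1 <= k <= min_len")

    # Stage 1: fixed sampling of query k-mers, inserted into the Bloom filter.
    sampled = range(0, len(query) - k + 1, step)
    bloom = BloomFilter()
    for j in sampled:
        bloom.add(query[j:j + k])

    # Stage 2: index only the reference k-mers that pass the Bloom filter.
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        if kmer in bloom:
            index[kmer].append(i)

    # Stage 3 (assumed): look up sampled query k-mers and extend each hit
    # to the left and right until a mismatch or a sequence end.
    mems = set()
    for j in sampled:
        for i in index.get(query[j:j + k], ()):
            r, q = i, j
            while r > 0 and q > 0 and reference[r - 1] == query[q - 1]:
                r -= 1
                q -= 1
            end_r, end_q = i + k, j + k
            while (end_r < len(reference) and end_q < len(query)
                   and reference[end_r] == query[end_q]):
                end_r += 1
                end_q += 1
            if end_r - r >= min_len:
                mems.add((r, q, end_r - r))
    return sorted(mems)


if __name__ == "__main__":
    print(find_mems("GGGACGTACCC", "TTTACGTATTT", min_len=5, k=3))
```

Bloom-filter false positives only add a few unneeded entries to the index; they cannot produce wrong matches, because the hash-table lookup compares the k-mer strings exactly.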

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000558 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000558 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:30994891
   |texte=   Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:30994891" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021