Efficient Mining Multi-mers in a Variety of Biological Sequences.
Internal identifier: 000840 (PubMed/Corpus)
Authors: Jingsong Zhang; Jianmei Guo; Ming Zhang; Xiangtian Yu; Xiaoqing Yu; Weifeng Guo; Tao Zeng; Luonan Chen
Source:
- IEEE/ACM transactions on computational biology and bioinformatics [ 1557-9964 ] ; 2018.
Abstract
Counting the occurrence frequency of each k-mer in a biological sequence is a preliminary yet important step in many bioinformatics applications. However, most k-mer counting algorithms rely on a given k to produce single-length k-mers, which is inefficient for sequence analysis for different k. Moreover, existing k-mer counters focus more on DNA and RNA sequences and less on protein ones. In practice, the analysis of k-mers in protein sequences can provide substantial biological insights into structure, function and evolution. To this end, an efficient algorithm, called MulMer (Multiple-Mer mining), is proposed to mine k-mers of various lengths, termed multi-mers, via an inverted-index technique, which is orders of magnitude faster than the conventional forward-index methods. Moreover, to the best of our knowledge, MulMer is the first algorithm able to mine multi-mers in a variety of sequences, including DNA/RNA and protein sequences.
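The abstract does not describe MulMer's internals, so the following is only a generic sketch of the idea it names: mining k-mers for a whole range of lengths with an inverted index (k-mer mapped to its list of start positions) instead of rescanning the sequence once per k. The function `mine_multimers` and its `min_count` threshold are hypothetical names introduced for illustration; this is not the authors' implementation and assumes a single in-memory sequence over any alphabet (nucleotides or amino acids).

```python
from collections import defaultdict


def mine_multimers(seq, k_min, k_max, min_count=1):
    """Hypothetical sketch: count all k-mers with k_min <= k <= k_max.

    An inverted index maps each k-mer to the start positions where it occurs.
    A (k+1)-mer can only start where its k-prefix starts, so the index for
    length k+1 is built by extending the position lists of length k rather
    than rescanning the whole sequence for every k.
    """
    if k_min < 1 or k_max < k_min:
        raise ValueError("require 1 <= k_min <= k_max")
    n = len(seq)

    # Length-1 index: every symbol (nucleotide or amino acid) with its positions.
    index = defaultdict(list)
    for i, ch in enumerate(seq):
        index[ch].append(i)

    counts = {}
    k = 1
    while True:
        if k >= k_min:
            for mer, positions in index.items():
                if len(positions) >= min_count:
                    counts[mer] = len(positions)
        if k == k_max:
            break
        # Extend each occurrence by one symbol to build the (k+1)-mer index.
        next_index = defaultdict(list)
        for mer, positions in index.items():
            # Pruning: a (k+1)-mer never occurs more often than its k-prefix.
            if len(positions) < min_count:
                continue
            for i in positions:
                if i + k < n:
                    next_index[mer + seq[i + k]].append(i)
        index = next_index
        k += 1
    return counts
```

For example, `mine_multimers("ACGTACG", 2, 3, min_count=2)` keeps only the 2-mers and 3-mers seen at least twice. The same call works unchanged on a protein string, since the index is built from whatever symbols the sequence contains.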
DOI: 10.1109/TCBB.2018.2828313
PubMed: 29993642
Links to Exploration step
pubmed:29993642
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Efficient Mining Multi-mers in a Variety of Biological Sequences.</title>
<author><name sortKey="Zhang, Jingsong" sort="Zhang, Jingsong" uniqKey="Zhang J" first="Jingsong" last="Zhang">Jingsong Zhang</name>
</author>
<author><name sortKey="Guo, Jianmei" sort="Guo, Jianmei" uniqKey="Guo J" first="Jianmei" last="Guo">Jianmei Guo</name>
</author>
<author><name sortKey="Zhang, Ming" sort="Zhang, Ming" uniqKey="Zhang M" first="Ming" last="Zhang">Ming Zhang</name>
</author>
<author><name sortKey="Yu, Xiangtian" sort="Yu, Xiangtian" uniqKey="Yu X" first="Xiangtian" last="Yu">Xiangtian Yu</name>
</author>
<author><name sortKey="Yu, Xiaoqing" sort="Yu, Xiaoqing" uniqKey="Yu X" first="Xiaoqing" last="Yu">Xiaoqing Yu</name>
</author>
<author><name sortKey="Guo, Weifeng" sort="Guo, Weifeng" uniqKey="Guo W" first="Weifeng" last="Guo">Weifeng Guo</name>
</author>
<author><name sortKey="Zeng, Tao" sort="Zeng, Tao" uniqKey="Zeng T" first="Tao" last="Zeng">Tao Zeng</name>
</author>
<author><name sortKey="Chen, Luonan" sort="Chen, Luonan" uniqKey="Chen L" first="Luonan" last="Chen">Luonan Chen</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2018">2018</date>
<idno type="RBID">pubmed:29993642</idno>
<idno type="pmid">29993642</idno>
<idno type="doi">10.1109/TCBB.2018.2828313</idno>
<idno type="wicri:Area/PubMed/Corpus">000840</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000840</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Efficient Mining Multi-mers in a Variety of Biological Sequences.</title>
<author><name sortKey="Zhang, Jingsong" sort="Zhang, Jingsong" uniqKey="Zhang J" first="Jingsong" last="Zhang">Jingsong Zhang</name>
</author>
<author><name sortKey="Guo, Jianmei" sort="Guo, Jianmei" uniqKey="Guo J" first="Jianmei" last="Guo">Jianmei Guo</name>
</author>
<author><name sortKey="Zhang, Ming" sort="Zhang, Ming" uniqKey="Zhang M" first="Ming" last="Zhang">Ming Zhang</name>
</author>
<author><name sortKey="Yu, Xiangtian" sort="Yu, Xiangtian" uniqKey="Yu X" first="Xiangtian" last="Yu">Xiangtian Yu</name>
</author>
<author><name sortKey="Yu, Xiaoqing" sort="Yu, Xiaoqing" uniqKey="Yu X" first="Xiaoqing" last="Yu">Xiaoqing Yu</name>
</author>
<author><name sortKey="Guo, Weifeng" sort="Guo, Weifeng" uniqKey="Guo W" first="Weifeng" last="Guo">Weifeng Guo</name>
</author>
<author><name sortKey="Zeng, Tao" sort="Zeng, Tao" uniqKey="Zeng T" first="Tao" last="Zeng">Tao Zeng</name>
</author>
<author><name sortKey="Chen, Luonan" sort="Chen, Luonan" uniqKey="Chen L" first="Luonan" last="Chen">Luonan Chen</name>
</author>
</analytic>
<series><title level="j">IEEE/ACM transactions on computational biology and bioinformatics</title>
<idno type="eISSN">1557-9964</idno>
<imprint><date when="2018" type="published">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Counting the occurrence frequency of each k-mer in a biological sequence is a preliminary yet important step in many bioinformatics applications. However, most k-mer counting algorithms rely on a given k to produce single-length k-mers, which is inefficient for sequence analysis for different k. Moreover, existing k-mer counters focus more on DNA and RNA sequences and less on protein ones. In practice, the analysis of k-mers in protein sequences can provide substantial biological insights into structure, function and evolution. To this end, an efficient algorithm, called MulMer (Multiple-Mer mining), is proposed to mine k-mers of various lengths, termed multi-mers, via an inverted-index technique, which is orders of magnitude faster than the conventional forward-index methods. Moreover, to the best of our knowledge, MulMer is the first algorithm able to mine multi-mers in a variety of sequences, including DNA/RNA and protein sequences.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="Publisher" Owner="NLM"><PMID Version="1">29993642</PMID>
<DateRevised><Year>2019</Year>
<Month>11</Month>
<Day>20</Day>
</DateRevised>
<Article PubModel="Print-Electronic"><Journal><ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet"><PubDate><Year>2018</Year>
<Month>Apr</Month>
<Day>19</Day>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficient Mining Multi-mers in a Variety of Biological Sequences.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1109/TCBB.2018.2828313</ELocationID>
<Abstract><AbstractText>Counting the occurrence frequency of each k-mer in a biological sequence is a preliminary yet important step in many bioinformatics applications. However, most k-mer counting algorithms rely on a given k to produce single-length k-mers, which is inefficient for sequence analysis for different k. Moreover, existing k-mer counters focus more on DNA and RNA sequences and less on protein ones. In practice, the analysis of k-mers in protein sequences can provide substantial biological insights into structure, function and evolution. To this end, an efficient algorithm, called MulMer (Multiple-Mer mining), is proposed to mine k-mers of various lengths, termed multi-mers, via an inverted-index technique, which is orders of magnitude faster than the conventional forward-index methods. Moreover, to the best of our knowledge, MulMer is the first algorithm able to mine multi-mers in a variety of sequences, including DNA/RNA and protein sequences.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Zhang</LastName>
<ForeName>Jingsong</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y"><LastName>Guo</LastName>
<ForeName>Jianmei</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y"><LastName>Zhang</LastName>
<ForeName>Ming</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y"><LastName>Yu</LastName>
<ForeName>Xiangtian</ForeName>
<Initials>X</Initials>
</Author>
<Author ValidYN="Y"><LastName>Yu</LastName>
<ForeName>Xiaoqing</ForeName>
<Initials>X</Initials>
</Author>
<Author ValidYN="Y"><LastName>Guo</LastName>
<ForeName>Weifeng</ForeName>
<Initials>W</Initials>
</Author>
<Author ValidYN="Y"><LastName>Zeng</LastName>
<ForeName>Tao</ForeName>
<Initials>T</Initials>
</Author>
<Author ValidYN="Y"><LastName>Chen</LastName>
<ForeName>Luonan</ForeName>
<Initials>L</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2018</Year>
<Month>04</Month>
<Day>19</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2018</Year>
<Month>7</Month>
<Day>12</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>aheadofprint</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">29993642</ArticleId>
<ArticleId IdType="doi">10.1109/TCBB.2018.2828313</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000840 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000840 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= PubMed |étape= Corpus |type= RBID |clé= pubmed:29993642 |texte= Efficient Mining Multi-mers in a Variety of Biological Sequences. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i -Sk "pubmed:29993642" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.