SARS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

NetNCSP: Nonoverlapping closed sequential pattern mining.

Internal identifier: 000375 (PubMed/Checkpoint); previous: 000374; next: 000376

Authors: Youxi Wu [People's Republic of China]; Changrui Zhu [People's Republic of China]; Yan Li [People's Republic of China]; Lei Guo [People's Republic of China]; Xindong Wu [People's Republic of China]

Source: Knowledge-based systems [0950-7051]; 2020

RBID: pubmed:32292248

Abstract

Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the repetition of patterns within a sequence. To solve this problem, gap-constraint SPM was proposed, which avoids finding too many useless patterns. Nonoverlapping SPM, a branch of gap-constraint SPM, requires that no two occurrences use the same sequence letter at the same position of the pattern. Nonoverlapping SPM strikes a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP), based on the Nettree structure. NetNCSP has two key steps: support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies: inheriting, predicting, and determining. These pruning strategies find redundant patterns effectively because they can predict the frequency and closeness of patterns before the candidate patterns are generated. Experimental results show that NetNCSP is not only more efficient but also discovers more closed patterns with good compressibility. Furthermore, in biological experiments, NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses have similar pattern composition but different combinations.
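
To make the nonoverlapping condition in the abstract concrete, here is a minimal Python sketch that counts nonoverlapping occurrences of a pattern under a single gap constraint. It is an illustration only and is not the Nettree-based procedure of the paper: the function name `nonoverlapping_support`, the parameters `min_gap` and `max_gap`, and the greedy leftmost search with backtracking are all assumptions made for this example, and no claim is made about matching the paper's results or complexity.

```python
def nonoverlapping_support(seq, pattern, min_gap, max_gap):
    """Count occurrences of `pattern` in `seq` such that no two counted
    occurrences use the same sequence position for the same pattern index.
    Between consecutive pattern letters, min_gap..max_gap letters are skipped.
    Illustrative greedy sketch (hypothetical helper), not the paper's algorithm."""
    m = len(pattern)
    # used[j] holds the sequence positions already matched to pattern[j].
    used = [set() for _ in range(m)]

    def extend(j, pos):
        # Try to match pattern[j] at pos, then the rest, leftmost first.
        if seq[pos] != pattern[j] or pos in used[j]:
            return None
        if j == m - 1:
            return [pos]
        first = pos + min_gap + 1
        last = min(pos + max_gap + 1, len(seq) - 1)
        for nxt in range(first, last + 1):
            tail = extend(j + 1, nxt)
            if tail is not None:
                return [pos] + tail
        return None  # backtrack: no valid continuation from pos

    count = 0
    for start in range(len(seq)):
        occurrence = extend(0, start)
        if occurrence is not None:
            for j, pos in enumerate(occurrence):
                used[j].add(pos)
            count += 1
    return count


# Position 1 of "AAA" is reused, but at different pattern indices,
# so both occurrences (0, 1) and (1, 2) are counted.
print(nonoverlapping_support("AAA", "AA", 0, 0))   # 2
print(nonoverlapping_support("ATAT", "AT", 0, 2))  # 2
```

The "AAA" case shows why this condition is weaker than forbidding any shared letter: a sequence position may serve two occurrences as long as it matches a different pattern index in each.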

DOI: 10.1016/j.knosys.2020.105812
PubMed: 32292248



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">NetNCSP: Nonoverlapping closed sequential pattern mining.</title>
<author>
<name sortKey="Wu, Youxi" sort="Wu, Youxi" uniqKey="Wu Y" first="Youxi" last="Wu">Youxi Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Changrui" sort="Zhu, Changrui" uniqKey="Zhu C" first="Changrui" last="Zhu">Changrui Zhu</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Li, Yan" sort="Li, Yan" uniqKey="Li Y" first="Yan" last="Li">Yan Li</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Economics and Management, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Economics and Management, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Guo, Lei" sort="Guo, Lei" uniqKey="Guo L" first="Lei" last="Guo">Lei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Wu, Xindong" sort="Wu, Xindong" uniqKey="Wu X" first="Xindong" last="Wu">Xindong Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:32292248</idno>
<idno type="pmid">32292248</idno>
<idno type="doi">10.1016/j.knosys.2020.105812</idno>
<idno type="wicri:Area/PubMed/Corpus">000137</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000137</idno>
<idno type="wicri:Area/PubMed/Curation">000137</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000137</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000375</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000375</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">NetNCSP: Nonoverlapping closed sequential pattern mining.</title>
<author>
<name sortKey="Wu, Youxi" sort="Wu, Youxi" uniqKey="Wu Y" first="Youxi" last="Wu">Youxi Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Changrui" sort="Zhu, Changrui" uniqKey="Zhu C" first="Changrui" last="Zhu">Changrui Zhu</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Li, Yan" sort="Li, Yan" uniqKey="Li Y" first="Yan" last="Li">Yan Li</name>
<affiliation wicri:level="1">
<nlm:affiliation>School of Economics and Management, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>School of Economics and Management, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Guo, Lei" sort="Guo, Lei" uniqKey="Guo L" first="Lei" last="Guo">Lei Guo</name>
<affiliation wicri:level="1">
<nlm:affiliation>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401</wicri:regionArea>
<placeName>
<settlement type="city">Tianjin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Wu, Xindong" sort="Wu, Xindong" uniqKey="Wu X" first="Xindong" last="Wu">Xindong Wu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084, China.</nlm:affiliation>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Knowledge-based systems</title>
<idno type="ISSN">0950-7051</idno>
<imprint>
<date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the pattern repetition in sequence. To solve this problem, gap constraint SPM was proposed and can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, means that any two occurrences cannot use the same sequence letter in the same position as the occurrences. Nonoverlapping SPM can make a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve the mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP) based on the Nettree structure. NetNCSP is equipped with two key steps, support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies, inheriting, predicting, and determining. These pruning strategies are able to find the redundant patterns effectively since the strategies can predict the frequency and closeness of the patterns before the generation of the candidate patterns. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furtherly, in biological experiments NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses are of similar pattern composition with different combinations.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">32292248</PMID>
<DateRevised>
<Year>2020</Year>
<Month>04</Month>
<Day>18</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Print">0950-7051</ISSN>
<JournalIssue CitedMedium="Print">
<PubDate>
<Year>2020</Year>
<Month>Mar</Month>
<Day>31</Day>
</PubDate>
</JournalIssue>
<Title>Knowledge-based systems</Title>
<ISOAbbreviation>Knowl Based Syst</ISOAbbreviation>
</Journal>
<ArticleTitle>NetNCSP: Nonoverlapping closed sequential pattern mining.</ArticleTitle>
<Pagination>
<MedlinePgn>105812</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.knosys.2020.105812</ELocationID>
<Abstract>
<AbstractText>Sequential pattern mining (SPM) has been applied in many fields. However, traditional SPM neglects the pattern repetition in sequence. To solve this problem, gap constraint SPM was proposed and can avoid finding too many useless patterns. Nonoverlapping SPM, as a branch of gap constraint SPM, means that any two occurrences cannot use the same sequence letter in the same position as the occurrences. Nonoverlapping SPM can make a balance between efficiency and completeness. The frequent patterns discovered by existing methods normally contain redundant patterns. To reduce redundant patterns and improve the mining performance, this paper adopts the closed pattern mining strategy and proposes a complete algorithm, named Nettree for Nonoverlapping Closed Sequential Pattern (NetNCSP) based on the Nettree structure. NetNCSP is equipped with two key steps, support calculation and closeness determination. A backtracking strategy is employed to calculate the nonoverlapping support of a pattern on the corresponding Nettree, which reduces the time complexity. This paper also proposes three kinds of pruning strategies, inheriting, predicting, and determining. These pruning strategies are able to find the redundant patterns effectively since the strategies can predict the frequency and closeness of the patterns before the generation of the candidate patterns. Experimental results show that NetNCSP is not only more efficient but can also discover more closed patterns with good compressibility. Furtherly, in biological experiments NetNCSP mines the closed patterns in SARS-CoV-2 and SARS viruses. The results show that the two viruses are of similar pattern composition with different combinations.</AbstractText>
<CopyrightInformation>© 2020 Elsevier B.V. All rights reserved.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Wu</LastName>
<ForeName>Youxi</ForeName>
<Initials>Y</Initials>
<AffiliationInfo>
<Affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Hebei Key Laboratory of Big Data Computing, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zhu</LastName>
<ForeName>Changrui</ForeName>
<Initials>C</Initials>
<AffiliationInfo>
<Affiliation>School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Li</LastName>
<ForeName>Yan</ForeName>
<Initials>Y</Initials>
<AffiliationInfo>
<Affiliation>School of Economics and Management, Hebei University of Technology, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Guo</LastName>
<ForeName>Lei</ForeName>
<Initials>L</Initials>
<AffiliationInfo>
<Affiliation>State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin 300401, China.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Wu</LastName>
<ForeName>Xindong</ForeName>
<Initials>X</Initials>
<AffiliationInfo>
<Affiliation>Mininglamp Academy of Sciences, Mininglamp Technology, Beijing 100084, China.</Affiliation>
</AffiliationInfo>
<AffiliationInfo>
<Affiliation>Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230009, China.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2020</Year>
<Month>03</Month>
<Day>31</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>Netherlands</Country>
<MedlineTA>Knowl Based Syst</MedlineTA>
<NlmUniqueID>101634338</NlmUniqueID>
<ISSNLinking>0950-7051</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">COVID-19</Keyword>
<Keyword MajorTopicYN="N">Closed pattern mining</Keyword>
<Keyword MajorTopicYN="N">Nettree</Keyword>
<Keyword MajorTopicYN="N">Nonoverlapping sequence pattern</Keyword>
<Keyword MajorTopicYN="N">Periodic wildcard gaps</Keyword>
<Keyword MajorTopicYN="N">Sequential pattern mining</Keyword>
</KeywordList>
<CoiStatement>The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.</CoiStatement>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2020</Year>
<Month>01</Month>
<Day>05</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2020</Year>
<Month>03</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2020</Year>
<Month>03</Month>
<Day>22</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2020</Year>
<Month>4</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2020</Year>
<Month>4</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2020</Year>
<Month>4</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>aheadofprint</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">32292248</ArticleId>
<ArticleId IdType="doi">10.1016/j.knosys.2020.105812</ArticleId>
<ArticleId IdType="pii">S0950-7051(20)30194-5</ArticleId>
<ArticleId IdType="pii">105812</ArticleId>
<ArticleId IdType="pmc">PMC7118609</ArticleId>
</ArticleIdList>
<pmc-dir>pmcsd</pmc-dir>
<ReferenceList>
<Reference>
<Citation>Comput Biol Med. 2013 Jun;43(5):481-92</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23566394</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>IEEE Trans Neural Netw Learn Syst. 2019 Sep;30(9):2764-2778</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">30640632</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>IEEE Trans Cybern. 2018 Oct;48(10):2809-2822</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28976327</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>J Biomed Inform. 2017 Feb;66:19-31</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28011233</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
<ReferenceList>
<Reference>
<Citation>Med Biol Eng Comput. 2018 May;56(5):749-759</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">28905236</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
<settlement>
<li>Pékin</li>
<li>Tianjin</li>
</settlement>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Wu, Youxi" sort="Wu, Youxi" uniqKey="Wu Y" first="Youxi" last="Wu">Youxi Wu</name>
</noRegion>
<name sortKey="Guo, Lei" sort="Guo, Lei" uniqKey="Guo L" first="Lei" last="Guo">Lei Guo</name>
<name sortKey="Li, Yan" sort="Li, Yan" uniqKey="Li Y" first="Yan" last="Li">Yan Li</name>
<name sortKey="Wu, Xindong" sort="Wu, Xindong" uniqKey="Wu X" first="Xindong" last="Wu">Xindong Wu</name>
<name sortKey="Zhu, Changrui" sort="Zhu, Changrui" uniqKey="Zhu C" first="Changrui" last="Zhu">Changrui Zhu</name>
</country>
</tree>
</affiliations>
</record>
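
As a rough illustration of how the record above could be read programmatically, the following Python sketch extracts a few fields from the `<pubmed>` part using the standard library. The embedded string is a trimmed, hypothetical excerpt of that element (only two authors are kept), and the helper name `summarize` is an assumption. The sketch deliberately avoids the `<TEI>` part: it uses `wicri:` and `nlm:` prefixes with no namespace declarations shown here, so a namespace-aware parser would probably reject it as is.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the <pubmed> element shown above.
PUBMED_XML = """
<pubmed>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">32292248</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<Title>Knowledge-based systems</Title>
</Journal>
<ArticleTitle>NetNCSP: Nonoverlapping closed sequential pattern mining.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.knosys.2020.105812</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Wu</LastName><ForeName>Youxi</ForeName></Author>
<Author ValidYN="Y"><LastName>Zhu</LastName><ForeName>Changrui</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize(pubmed_xml: str) -> dict:
    """Return PMID, title, journal, DOI and author names from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml.strip())
    article = root.find("MedlineCitation/Article")
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.findall("AuthorList/Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": article.findtext("ArticleTitle"),
        "journal": article.findtext("Journal/Title"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


summary = summarize(PUBMED_XML)
print(summary["pmid"], summary["doi"])
print(", ".join(summary["authors"]))
```

Reading the full record this way would first require declaring the `wicri` and `nlm` namespaces on the root element or stripping the prefixes; which of the two is appropriate depends on how the record is exported.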

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000375 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000375 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:32292248
   |texte=   NetNCSP: Nonoverlapping closed sequential pattern mining.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:32292248" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021