Building predictive models for MERS-CoV infections using data mining techniques

Internal identifier: 000E54 (Pmc/Corpus); previous: 000E53; next: 000E55

Authors: Isra Al-Turaiki; Mona Alshahrani; Tahani Almutairi

Source:

Journal of Infection and Public Health [1876-0341]; 2016.

RBID : PMC:7102847

Abstract

Summary

Background

Recently, the outbreak of MERS-CoV infections drew worldwide attention to Saudi Arabia. The novel virus belongs to the coronavirus family, which is responsible for causing mild to moderate colds. The Control and Command Center of the Saudi Ministry of Health issues a daily report on MERS-CoV infection cases. Infection with MERS-CoV can lead to fatal complications; however, little is known about this novel virus. In this paper, we apply two data mining techniques in order to better understand the stability of, and the possibility of recovery from, MERS-CoV infections.

Method

The Naive Bayes classifier and J48 decision tree algorithm were used to build our models. The dataset used consists of 1082 records of cases reported between 2013 and 2015. In order to build our prediction models, we split the dataset into two groups. The first group combined recovery and death records. A new attribute was created to indicate the record type, such that the dataset can be used to predict the recovery from MERS-CoV. The second group contained the new case records to be used to predict the stability of the infection based on the current status attribute.

Results

The resulting recovery models indicate that healthcare workers are more likely to survive. This could be due to the vaccinations that healthcare workers are required to get on a regular basis. As for the stability models using J48, two attributes were found to be important for predicting stability: symptomatic and age. Older patients are at high risk of developing MERS-CoV complications. Finally, the performance of all the models was evaluated using three measures: accuracy, precision, and recall. In general, the accuracy of the models is between 53.6% and 71.58%.

Conclusion

We believe that the performance of the prediction models can be enhanced with the use of more patient data. As future work, we plan to directly contact hospitals in Riyadh in order to collect more information related to patients with MERS-CoV infections.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102847

DOI: 10.1016/j.jiph.2016.09.007
PubMed: 27641481
PubMed Central: 7102847

Links to Exploration step

PMC:7102847

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Building predictive models for MERS-CoV infections using data mining techniques</title>
<author><name sortKey="Al Turaiki, Isra" sort="Al Turaiki, Isra" uniqKey="Al Turaiki I" first="Isra" last="Al-Turaiki">Isra Al-Turaiki</name>
</author>
<author><name sortKey="Alshahrani, Mona" sort="Alshahrani, Mona" uniqKey="Alshahrani M" first="Mona" last="Alshahrani">Mona Alshahrani</name>
</author>
<author><name sortKey="Almutairi, Tahani" sort="Almutairi, Tahani" uniqKey="Almutairi T" first="Tahani" last="Almutairi">Tahani Almutairi</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27641481</idno>
<idno type="pmc">7102847</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102847</idno>
<idno type="RBID">PMC:7102847</idno>
<idno type="doi">10.1016/j.jiph.2016.09.007</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000E54</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000E54</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Building predictive models for MERS-CoV infections using data mining techniques</title>
<author><name sortKey="Al Turaiki, Isra" sort="Al Turaiki, Isra" uniqKey="Al Turaiki I" first="Isra" last="Al-Turaiki">Isra Al-Turaiki</name>
</author>
<author><name sortKey="Alshahrani, Mona" sort="Alshahrani, Mona" uniqKey="Alshahrani M" first="Mona" last="Alshahrani">Mona Alshahrani</name>
</author>
<author><name sortKey="Almutairi, Tahani" sort="Almutairi, Tahani" uniqKey="Almutairi T" first="Tahani" last="Almutairi">Tahani Almutairi</name>
</author>
</analytic>
<series><title level="j">Journal of Infection and Public Health</title>
<idno type="ISSN">1876-0341</idno>
<idno type="eISSN">1876-035X</idno>
<imprint><date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Summary</title>
<sec><title>Background</title>
<p>Recently, the outbreak of MERS-CoV infections drew worldwide attention to Saudi Arabia. The novel virus belongs to the coronavirus family, which is responsible for causing mild to moderate colds. The Control and Command Center of the Saudi Ministry of Health issues a daily report on MERS-CoV infection cases. Infection with MERS-CoV can lead to fatal complications; however, little is known about this novel virus. In this paper, we apply two data mining techniques in order to better understand the stability of, and the possibility of recovery from, MERS-CoV infections.</p>
</sec>
<sec><title>Method</title>
<p>The Naive Bayes classifier and J48 decision tree algorithm were used to build our models. The dataset used consists of 1082 records of cases reported between 2013 and 2015. In order to build our prediction models, we split the dataset into two groups. The first group combined recovery and death records. A new attribute was created to indicate the record type, such that the dataset can be used to predict the recovery from MERS-CoV. The second group contained the new case records to be used to predict the stability of the infection based on the current status attribute.</p>
</sec>
<sec><title>Results</title>
<p>The resulting recovery models indicate that healthcare workers are more likely to survive. This could be due to the vaccinations that healthcare workers are required to get on a regular basis. As for the stability models using J48, two attributes were found to be important for predicting stability: symptomatic and age. Older patients are at high risk of developing MERS-CoV complications. Finally, the performance of all the models was evaluated using three measures: accuracy, precision, and recall. In general, the accuracy of the models is between 53.6% and 71.58%.</p>
</sec>
<sec><title>Conclusion</title>
<p>We believe that the performance of the prediction models can be enhanced with the use of more patient data. As future work, we plan to directly contact hospitals in Riyadh in order to collect more information related to patients with MERS-CoV infections.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Han, J" uniqKey="Han J">J. Han</name>
</author>
<author><name sortKey="Kamber, M" uniqKey="Kamber M">M. Kamber</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Suh, S C" uniqKey="Suh S">S.C. Suh</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ferreira, D" uniqKey="Ferreira D">D. Ferreira</name>
</author>
<author><name sortKey="Oliveira, A" uniqKey="Oliveira A">A. Oliveira</name>
</author>
<author><name sortKey="Freitas, A" uniqKey="Freitas A">A. Freitas</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Venkatalakshmi, B" uniqKey="Venkatalakshmi B">B. Venkatalakshmi</name>
</author>
<author><name sortKey="Shivsankar, M" uniqKey="Shivsankar M">M. Shivsankar</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Majali, J" uniqKey="Majali J">J. Majali</name>
</author>
<author><name sortKey="Niranjan, R" uniqKey="Niranjan R">R. Niranjan</name>
</author>
<author><name sortKey="Phatak, V" uniqKey="Phatak V">V. Phatak</name>
</author>
<author><name sortKey="Tadakhe, O" uniqKey="Tadakhe O">O. Tadakhe</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bellaachia, A" uniqKey="Bellaachia A">A. Bellaachia</name>
</author>
<author><name sortKey="Guven, E" uniqKey="Guven E">E. Guven</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Afshar, H L" uniqKey="Afshar H">H.L. Afshar</name>
</author>
<author><name sortKey="Ahmadi, M" uniqKey="Ahmadi M">M. Ahmadi</name>
</author>
<author><name sortKey="Roudbari, M" uniqKey="Roudbari M">M. Roudbari</name>
</author>
<author><name sortKey="Sadoughi, F" uniqKey="Sadoughi F">F. Sadoughi</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sandhu, R" uniqKey="Sandhu R">R. Sandhu</name>
</author>
<author><name sortKey="Sood, S K" uniqKey="Sood S">S.K. Sood</name>
</author>
<author><name sortKey="Kaur, G" uniqKey="Kaur G">G. Kaur</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Quinlan, R" uniqKey="Quinlan R">R. Quinlan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Hall, M" uniqKey="Hall M">M. Hall</name>
</author>
<author><name sortKey="Frank, E" uniqKey="Frank E">E. Frank</name>
</author>
<author><name sortKey="Holmes, G" uniqKey="Holmes G">G. Holmes</name>
</author>
<author><name sortKey="Pfahringer, B" uniqKey="Pfahringer B">B. Pfahringer</name>
</author>
<author><name sortKey="Reutemann, P" uniqKey="Reutemann P">P. Reutemann</name>
</author>
<author><name sortKey="Wit Ten, I H" uniqKey="Wit Ten I">I.H. Witten</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
  <front><journal-meta><journal-id journal-id-type="nlm-ta">J Infect Public Health</journal-id>
<journal-id journal-id-type="iso-abbrev">J Infect Public Health</journal-id>
<journal-title-group><journal-title>Journal of Infection and Public Health</journal-title>
</journal-title-group>
<issn pub-type="ppub">1876-0341</issn>
<issn pub-type="epub">1876-035X</issn>
<publisher><publisher-name>King Saud Bin Abdulaziz University for Health Sciences. Production and hosting by Elsevier Limited.</publisher-name>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">27641481</article-id>
<article-id pub-id-type="pmc">7102847</article-id>
<article-id pub-id-type="publisher-id">S1876-0341(16)30146-0</article-id>
<article-id pub-id-type="doi">10.1016/j.jiph.2016.09.007</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Article</subject>
</subj-group>
</article-categories>
<title-group><article-title>Building predictive models for MERS-CoV infections using data mining techniques</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" id="aut0005"><name><surname>Al-Turaiki</surname>
<given-names>Isra</given-names>
</name>
<email>ialturaiki@ksu.edu.sa</email>
</contrib>
<contrib contrib-type="author" id="aut0010"><name><surname>Alshahrani</surname>
<given-names>Mona</given-names>
</name>
<email>monaalshahrani@outlook.com</email>
<xref rid="cor0005" ref-type="corresp">⁎</xref>
</contrib>
<contrib contrib-type="author" id="aut0015"><name><surname>Almutairi</surname>
<given-names>Tahani</given-names>
</name>
<email>435203979@student.ksu.edu.sa</email>
</contrib>
</contrib-group>
<aff id="aff0005">Information Technology Department, College of Computer and Information Sciences, King Saud University, Saudi Arabia</aff>
<author-notes><corresp id="cor0005"><label>⁎</label>
Corresponding author. <email>monaalshahrani@outlook.com</email>
</corresp>
</author-notes>
<pub-date pub-type="pmc-release"><day>15</day>
<month>9</month>
<year>2016</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on .</pmc-comment>
      <pub-date pub-type="ppub" iso-8601-date="2016-12-01"><season>November-December</season>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub"><day>15</day>
<month>9</month>
<year>2016</year>
</pub-date>
<volume>9</volume>
<issue>6</issue>
<fpage>744</fpage>
<lpage>748</lpage>
<history><date date-type="received"><day>23</day>
<month>6</month>
<year>2016</year>
</date>
<date date-type="rev-recd"><day>20</day>
<month>7</month>
<year>2016</year>
</date>
<date date-type="accepted"><day>6</day>
<month>9</month>
<year>2016</year>
</date>
</history>
<permissions><copyright-statement>© 2016 King Saud Bin Abdulaziz University for Health Sciences. Production and hosting by Elsevier Limited.</copyright-statement>
<copyright-year>2016</copyright-year>
<copyright-holder>King Saud Bin Abdulaziz University for Health Sciences</copyright-holder>
<license><license-p>Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.</license-p>
</license>
</permissions>
<abstract id="abs0005"><title>Summary</title>
<sec><title>Background</title>
<p>Recently, the outbreak of MERS-CoV infections drew worldwide attention to Saudi Arabia. The novel virus belongs to the coronavirus family, which is responsible for causing mild to moderate colds. The Control and Command Center of the Saudi Ministry of Health issues a daily report on MERS-CoV infection cases. Infection with MERS-CoV can lead to fatal complications; however, little is known about this novel virus. In this paper, we apply two data mining techniques in order to better understand the stability of, and the possibility of recovery from, MERS-CoV infections.</p>
</sec>
<sec><title>Method</title>
<p>The Naive Bayes classifier and J48 decision tree algorithm were used to build our models. The dataset used consists of 1082 records of cases reported between 2013 and 2015. In order to build our prediction models, we split the dataset into two groups. The first group combined recovery and death records. A new attribute was created to indicate the record type, such that the dataset can be used to predict the recovery from MERS-CoV. The second group contained the new case records to be used to predict the stability of the infection based on the current status attribute.</p>
</sec>
<sec><title>Results</title>
<p>The resulting recovery models indicate that healthcare workers are more likely to survive. This could be due to the vaccinations that healthcare workers are required to get on a regular basis. As for the stability models using J48, two attributes were found to be important for predicting stability: symptomatic and age. Older patients are at high risk of developing MERS-CoV complications. Finally, the performance of all the models was evaluated using three measures: accuracy, precision, and recall. In general, the accuracy of the models is between 53.6% and 71.58%.</p>
</sec>
<sec><title>Conclusion</title>
<p>We believe that the performance of the prediction models can be enhanced with the use of more patient data. As future work, we plan to directly contact hospitals in Riyadh in order to collect more information related to patients with MERS-CoV infections.</p>
</sec>
</abstract>
<kwd-group id="kwd0005"><title>Keywords</title>
<kwd>MERS-CoV</kwd>
<kwd>Data mining</kwd>
<kwd>Decision tree</kwd>
<kwd>J48</kwd>
<kwd>Naive Bayes</kwd>
<kwd>Classification</kwd>
</kwd-group>
</article-meta>
</front>
<body><sec id="sec0005"><title>Introduction</title>
<p id="par0005">In 2012, Saudi Arabia witnessed the outbreak of a virus called Middle East Respiratory Syndrome Coronavirus (MERS-CoV). The novel virus belongs to the coronavirus family, which is responsible for causing mild to moderate colds. MERS-CoV is blamed for causing severe acute respiratory illness that leads to death in many cases. According to <xref rid="bib0060" ref-type="bibr">[1]</xref>
, MERS-CoV symptoms include cough, fever, nasal congestion, shortness of breath, and sometimes diarrhea. The virus began spreading rapidly in Saudi Arabia in 2013. Since then, the Control and Command Center of the Saudi Ministry of Health has been recording and reporting the cases. The ministry website provides daily statistics on new confirmed MERS-CoV cases, recoveries, and deaths.</p>
<p id="par0010">Infection with MERS-CoV can lead to fatal complications. Unfortunately, there is little information about how the virus spreads and how patients are affected. Data mining is the exploration of large datasets to extract hidden and previously unknown patterns and relationships <xref rid="bib0065" ref-type="bibr">[2]</xref>
. In healthcare, data mining techniques have been widely applied in different applications including: modeling health outcomes and predicting patient outcomes, evaluation of treatment effectiveness, hospital ranking, and infection control <xref rid="bib0070" ref-type="bibr">[3]</xref>
.</p>
<p id="par0015">In this paper, we build several models to predict the stability of the case and the possibility of recovery from MERS-CoV infection. The goal is to better understand which factors contribute to complications of this infection. The models are built by applying data mining techniques to the data provided by the website of the Control and Command Center of the Saudi Ministry of Health <xref rid="bib0060" ref-type="bibr">[1]</xref>
.</p>
<p id="par0020">The rest of the paper is organized as follows. In the <italic>Literature review</italic>
 section, we review related work on the applications of data mining in healthcare. The <italic>Methodology</italic>
 section describes the dataset, pre-processing steps, data mining techniques, and our experimental results. Finally, the <italic>Conclusion</italic>
 section concludes the paper with our findings.</p>
</sec>
<sec id="sec0010"><title>Literature review</title>
<p id="par0025">In this section, we highlight some of the related work in data mining applications in healthcare.</p>
<p id="par0030">Data mining has been widely used for the prognosis and diagnosis of many diseases. Ferreira et al. <xref rid="bib0075" ref-type="bibr">[4]</xref>
 used data mining to improve the diagnosis of neonatal jaundice in newborns. In their experiment, the dataset consisted of 70 variables collected for 227 healthy newborns. Many data mining techniques were applied, including J48, CART, the Naive Bayes classifier, multilayer perceptron, SMO, and simple logistic. The best predictive models were obtained by using Naive Bayes, multilayer perceptron, and simple logistic. For heart disease diagnosis, Venkatalakshmi and Shivsankar <xref rid="bib0080" ref-type="bibr">[5]</xref>
 compared the performance of a decision tree algorithm and Naive Bayes. The experimental results using a dataset of 294 records with 13 attributes showed that the performance of the two algorithms is comparable. FP-growth, association rule mining, and decision trees were used for the diagnosis and prognosis of breast cancer <xref rid="bib0085" ref-type="bibr">[6]</xref>
. The classification models were built using a dataset of 699 records and 9 attributes, and the best accuracy was achieved using decision tree induction algorithms.</p>
<p id="par0035">In terms of survivability prediction, Bellaachia et al. <xref rid="bib0090" ref-type="bibr">[7]</xref>
 used Naive Bayes, a back-propagated neural network, and the C4.5 decision tree algorithm to predict the survivability of breast cancer patients. The dataset used in the study was obtained from the Surveillance, Epidemiology, and End Results (SEER) program. Experimental results indicated that the C4.5 algorithm outperformed the other two techniques. More recently, several predictive models for breast cancer survival were developed <xref rid="bib0095" ref-type="bibr">[8]</xref>
, with the main goal of discovering important attributes that contribute to breast cancer survival. The models were based on a dataset of 657,712 records and 72 variables, also obtained from SEER. Three data mining techniques were used: Support Vector Machine (SVM), Bayes Net, and Chi-squared Automatic Interaction Detection (CHAID). The SVM model outperformed the other models in terms of accuracy, sensitivity, and specificity, and it identified ten attributes that are important indicators of breast cancer survivability. Sandhu et al. <xref rid="bib0100" ref-type="bibr">[9]</xref>
 proposed a cloud-based MERS-CoV prediction system. The system is based on Bayesian Belief Networks (BBN) for initial classification of patients. A geographic positioning system is utilized to represent patients on Google Maps. Patients classified as infected were tracked using GPS from their mobile phones. The proposed system is useful to citizens since it allows them to avoid infected areas. In addition, healthcare authorities can manage the infection problem more effectively. The BBN achieved an accuracy of 83.1% on synthetic data.</p>
</sec>
<sec id="sec0015"><title>Methodology</title>
<sec id="sec0020"><title>Dataset description and pre-processing</title>
<p id="par0040">As mentioned earlier, our dataset was obtained from the website of the Control and Command Center of Saudi Ministry of Health <xref rid="bib0060" ref-type="bibr">[1]</xref>
. We used the data on MERS-CoV infections reported between 2013 and 2015. The data was published in three separate categories: new cases, recoveries, and deaths. For all the categories, the following patient information was provided: gender, age, nationality, city, and whether or not the patient is healthcare personnel. In addition, there was more specific information for each category. The additional information is as follows:<list list-type="simple" id="lis0005"><list-item id="lsti0005"><label>•</label>
<p id="par0045">New cases: symptomatic, current status, and whether the patient had any contact with a suspected or confirmed MERS-CoV infection case.</p>
</list-item>
<list-item id="lsti0010"><label>•</label>
<p id="par0050">Recoveries and deaths: whether the patient has pre-existing diseases.</p>
</list-item>
</list>
</p>
<p id="par0055">We collected 633 new case records, 231 recovery records, and 218 death records, for a total of 1082 records. A sample of the original dataset is shown in <xref rid="fig0005" ref-type="fig">Fig. 1</xref>
. The dataset was published in different formats. Some of the records were available as text files; others were provided in image format. Thus, our first step was to prepare the data in a unified format appropriate for data mining. Information in image format was manually extracted. Records with missing or inconsistent values (e.g., a gender recorded as "adult") were excluded. The age attribute was converted to discrete values. Finally, all records were prepared in .csv format. In order to build our prediction models, we split the dataset into two groups. The first group consisted of recovery and death records. A new attribute was created to indicate the record type, such that the dataset can be used to predict the recovery from MERS-CoV. The second group contained the new case records to be used to predict the stability of the infection based on the current status attribute.<fig id="fig0005"><label>Figure 1</label>
<caption><p>Sample of the original dataset.</p>
</caption>
<graphic xlink:href="gr1_lrg"></graphic>
</fig>
</p>
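<p>The pre-processing steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the set of valid gender values, and the age bin edges are hypothetical (the paper mentions an age range of 66 to 87 but does not list its bins), and the four sample rows are invented.</p>
<preformat><![CDATA[
```python
import csv
import io

# Hypothetical column names, gender values, and age bins; the paper does not list them.
VALID_GENDERS = {"male", "female"}
AGE_BINS = [(0, 21), (22, 43), (44, 65), (66, 87), (88, 120)]  # assumed edges


def discretize_age(age):
    """Map a numeric age to a label such as '66-87' using the assumed bins."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def clean_records(rows, record_type):
    """Drop rows with missing or inconsistent values and tag the record type."""
    cleaned = []
    for row in rows:
        gender = (row.get("gender") or "").strip().lower()
        age_text = (row.get("age") or "").strip()
        if gender not in VALID_GENDERS or not age_text.isdigit():
            continue  # e.g. gender recorded as "adult", or age missing
        age_label = discretize_age(int(age_text))
        if age_label is None:
            continue
        cleaned.append({"gender": gender, "age": age_label, "record_type": record_type})
    return cleaned


# Invented sample rows, one valid and one invalid per category.
recoveries = [{"gender": "Male", "age": "70"}, {"gender": "adult", "age": "35"}]
deaths = [{"gender": "Female", "age": "54"}, {"gender": "male", "age": ""}]

# First group: recovery and death records combined, with the new record-type attribute.
group_one = clean_records(recoveries, "recovery") + clean_records(deaths, "death")

# Write the unified records as CSV (to memory, to keep the sketch self-contained).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gender", "age", "record_type"])
writer.writeheader()
writer.writerows(group_one)
csv_text = buffer.getvalue()
```
]]></preformat>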
</sec>
<sec id="sec0025"><title>Data mining</title>
<p id="par0060">Classification is a widely used technique in healthcare. Here, we build several classification models to predict the stability and recovery of MERS-CoV infection. We apply Naive Bayes and the J48 <xref rid="bib0105" ref-type="bibr">[10]</xref>
 decision tree algorithm, both of which are briefly described below.<list list-type="simple" id="lis0010"><list-item id="lsti0015"><label>•</label>
<p id="par0065">Naive Bayes classifier: a probabilistic model based on Bayes' theorem. It assumes class-conditional independence, that is, dependencies between attributes within a class are ignored. Research has shown that Naive Bayes classifiers have comparable performance to other classification algorithms such as decision trees and neural networks. In addition, they can produce highly accurate models and can deal with large datasets <xref rid="bib0065" ref-type="bibr">[2]</xref>
.</p>
</list-item>
<list-item id="lsti0020"><label>•</label>
<p id="par0070">J48 decision tree algorithm: an implementation by the WEKA project team of the well-known tree induction algorithm C4.5 <xref rid="bib0105" ref-type="bibr">[10]</xref>
. It follows a greedy, top-down approach in building the decision tree. The algorithm partitions the dataset based on the most informative attribute. At each step, the attribute with the maximum gain ratio is selected as the splitting attribute. Decision tree classification models have many advantages. They are easy to interpret and are known to have comparable accuracy to other classification models.</p>
</list-item>
</list>
</p>
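<p>To make the gain ratio criterion mentioned above more concrete, the following is a minimal sketch of how a splitting attribute could be chosen for categorical data. It is a simplification and not the J48 implementation itself, which includes further details such as pruning; the attribute names and the four toy records are invented for illustration.</p>
<preformat><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by its split information."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


def best_split(rows, attributes, target):
    """Return the attribute with the maximum gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Invented toy records; not taken from the MERS-CoV dataset.
toy_rows = [
    {"healthcare": "yes", "gender": "male", "outcome": "recovery"},
    {"healthcare": "yes", "gender": "female", "outcome": "recovery"},
    {"healthcare": "no", "gender": "male", "outcome": "death"},
    {"healthcare": "no", "gender": "female", "outcome": "death"},
]
chosen = best_split(toy_rows, ["healthcare", "gender"], "outcome")
```
]]></preformat>
<p>In this toy example the healthcare attribute separates the two outcomes perfectly, so it obtains the higher gain ratio and is chosen as the split.</p>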
</sec>
<sec id="sec0030"><title>Experimental results</title>
<p id="par0075">The WEKA platform <xref rid="bib0110" ref-type="bibr">[11]</xref>
 was used in our experiment. It is a well-known data mining software package that supports a wide range of data mining algorithms with a user-friendly graphical user interface. All models were built using 10-fold cross validation.</p>
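<p>As a rough illustration of 10-fold cross validation, the sketch below partitions record indices into ten folds and uses each fold once for testing. It is not WEKA's own procedure, which may differ (for example in how folds are stratified); the function name, the seed, and the record count of 449 (231 recovery plus 218 death records, before any exclusions) are assumptions made for this example.</p>
<preformat><![CDATA[
```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 449 is an assumed count for illustration (recovery plus death records before cleaning).
splits = list(k_fold_indices(449, k=10))
```
]]></preformat>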
<p id="par0080">Here, we discuss the obtained prediction models. In the J48 decision tree recovery model, shown in <xref rid="fig0010" ref-type="fig">Fig. 2</xref>
, the attribute healthcare personnel appears as the first splitting attribute. This indicates the importance of this information. The model can be interpreted as follows: if the patient is healthcare personnel, the model predicts recovery. However, if the patient is not healthcare personnel, then the model examines whether the patient has any pre-existing disease. If the patient suffers from other diseases, the model predicts death; otherwise, recovery is predicted. According to this model, healthcare personnel are more likely to survive MERS-CoV infections. This could be due to the vaccinations that healthcare workers are required to get on a regular basis.<fig id="fig0010"><label>Figure 2</label>
<caption><p>J48 decision tree recovery model.</p>
</caption>
<graphic xlink:href="gr2_lrg"></graphic>
</fig>
</p>
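<p>The interpretation of the recovery tree given above can be written as a small rule function. This sketch only restates the rules as described in the text; the function and parameter names are hypothetical.</p>
<preformat><![CDATA[
```python
def predict_recovery(is_healthcare_personnel, has_preexisting_disease):
    """Apply the recovery tree rules as described in the text."""
    if is_healthcare_personnel:
        return "recovery"  # first split: healthcare personnel
    if has_preexisting_disease:
        return "death"  # second split: pre-existing disease
    return "recovery"
```
]]></preformat>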
<p id="par0085"><xref rid="fig0015" ref-type="fig">Fig. 3</xref>
shows the stability model using the J48 decision tree algorithm. This model shows that the two important attributes for predicting stability are symptomatic and age. The model first checks for symptoms; if symptoms exist, then the age of the patient is examined. The status of patients between the ages of 66 and 87 is predicted as critical. From this model, we conclude that older patients are at high risk of developing MERS-CoV complications.<fig id="fig0015"><label>Figure 3</label>
<caption><p>J48 decision tree stability model.</p>
</caption>
<graphic xlink:href="gr3_lrg"></graphic>
</fig>
</p>
</sec>
<sec id="sec0035"><title>Evaluation</title>
<p id="par0090">In order to evaluate the obtained models, three performance measures were used: accuracy, precision, and recall. Accuracy refers to the percentage of correctly classified records. Precision is the percentage of records that the model correctly classified as positive out of all positive predictions. Recall measures the true-positive recognition rate, that is, the percentage of positive records that the model correctly classified as positive. These measures are calculated as follows:<disp-formula id="eq0005"><label>(1)</label>
<mml:math id="M1" altimg="si1.gif" overflow="scroll"><mml:mrow><mml:mtext>Accuracy</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>T</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="eq0010"><label>(2)</label>
<mml:math id="M2" altimg="si2.gif" overflow="scroll"><mml:mrow><mml:mtext>Precision</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
<disp-formula id="eq0015"><label>(3)</label>
<mml:math id="M3" altimg="si3.gif" overflow="scroll"><mml:mrow><mml:mtext>Recall</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac><mml:mrow><mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
</mml:mrow>
<mml:mrow><mml:mi>T</mml:mi>
<mml:mi>P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>F</mml:mi>
<mml:mi>N</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</disp-formula>
where <italic>P</italic>
 is the number of positive records. <italic>N</italic>
 is the number of negative records. <italic>TP</italic>
 is the number of records that were correctly classified as positive. <italic>TN</italic>
 is the number of records that were correctly classified as negative. <italic>FN</italic>
 is the number of records that were misclassified as negative. <italic>FP</italic> is the number of records that were misclassified as positive.</p>
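<p>Equations (1) to (3) can be computed directly from the four confusion-matrix counts, using P = TP + FN and N = TN + FP. The following is a minimal sketch; the function name and the example counts are hypothetical and are not results from this study.</p>
<preformat><![CDATA[
```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    positives = tp + fn  # P: all records that are actually positive
    negatives = tn + fp  # N: all records that are actually negative
    accuracy = (tp + tn) / (positives + negatives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, chosen only to illustrate the arithmetic.
scores = evaluate(tp=40, tn=30, fp=10, fn=20)
```
]]></preformat>
<p>With these example counts, accuracy is 70/100, precision is 40/50, and recall is 40/60.</p>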
<p id="par0095"><xref rid="tbl0005" ref-type="table">Table 1</xref>
, <xref rid="tbl0010" ref-type="table">Table 2</xref>
summarize the evaluation measures for the obtained recovery and stability models, respectively. The Naive Bayes recovery model performs better in terms of overall accuracy (71.58% versus 68%). In addition, it shows a higher recognition rate (recall) for the recovery class (60.4% versus 45.2%). However, J48 has a better recognition rate for the death class (92.2% versus 83.4%). As for the stability models, the results indicate that J48 has better overall accuracy (55.69% versus 53.63%). However, its recognition rate for the critical class is very low (15.3%). In general, we observe that the performance measures of the recovery models are higher than those of the stability models.<table-wrap position="float" id="tbl0005"><label>Table 1</label>
<caption><p>Performance evaluation for MERS-CoV recovery models.</p>
</caption>
<alt-text id="at1">Table 1</alt-text>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Model</th>
<th align="left">Accuracy</th>
<th align="left">Class</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
</tr>
</thead>
<tbody><tr><td rowspan="2" align="left">Naive Bayes</td>
<td align="char">71.58%</td>
<td align="left">Recovery</td>
<td align="char">79.4%</td>
<td align="char">60.4%</td>
</tr>
<tr><td></td>
<td align="left">Death</td>
<td align="char">66.5%</td>
<td align="char">83.4%</td>
</tr>
<tr><td colspan="5" align="left">  </td>
</tr>
<tr><td rowspan="2" align="left">J48</td>
<td align="char">68%</td>
<td align="left">Recovery</td>
<td align="char">86%</td>
<td align="char">45.2%</td>
</tr>
<tr><td></td>
<td align="left">Death</td>
<td align="char">61.3%</td>
<td align="char">92.2%</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="tbl0010"><label>Table 2</label>
<caption><p>Performance evaluation for MERS-CoV stability models.</p>
</caption>
<alt-text id="at2">Table 2</alt-text>
<table frame="hsides" rules="groups"><thead><tr><th align="left">Model</th>
<th align="left">Accuracy</th>
<th align="left">Class</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
</tr>
</thead>
<tbody><tr><td rowspan="2" align="left">Naive Bayes</td>
<td align="char">53.63%</td>
<td align="left">Stable</td>
<td align="char">56.9%</td>
<td align="char">67.5%</td>
</tr>
<tr><td></td>
<td align="left">Critical</td>
<td align="char">41.1%</td>
<td align="char">43.1%</td>
</tr>
<tr><td colspan="5" align="left">  </td>
</tr>
<tr><td rowspan="2" align="left">J48</td>
<td align="char">55.69%</td>
<td align="left">Stable</td>
<td align="char">54.9%</td>
<td align="char">89.5%</td>
</tr>
<tr><td></td>
<td align="left">Critical</td>
<td align="char">43.9%</td>
<td align="char">15.3%</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
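<p>As a minimal sketch of how measures such as those in Tables 1 and 2 are derived from confusion-matrix counts, the following Python snippet computes accuracy, precision, and recall for one class treated as positive. The class name ConfusionCounts, the function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's data or from the software used by the authors.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one class treated as positive.

    Hypothetical container used only for this illustration.
    """
    tp: int  # records correctly classified as positive
    tn: int  # records correctly classified as negative
    fp: int  # records misclassified as positive
    fn: int  # records misclassified as negative


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all records that were classified correctly."""
    total = c.tp + c.tn + c.fp + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of records predicted positive that are truly positive."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of truly positive records that were recognized."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the study's results.
    counts = ConfusionCounts(tp=60, tn=80, fp=20, fn=40)
    print(f"accuracy:  {accuracy(counts):.2%}")
    print(f"precision: {precision(counts):.2%}")
    print(f"recall:    {recall(counts):.2%}")
```

<p>In a two-class problem, swapping the roles of the positive and negative class gives the per-class precision and recall for the other class, while accuracy stays the same. This is why each model in the tables has a single accuracy value but two precision and recall values.</p>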
<p id="par0100">In this work, we used a real dataset with two classification algorithms known to produce highly accurate models. However, the performance of all the obtained models is not satisfactory for real-world application. The main limitation lies in the size of the training dataset. We believe that a larger dataset is needed in order to improve the predictions. In addition, more patient information (e.g., medical history) could be included.</p>
</sec>
</sec>
<sec id="sec0040"><title>Conclusion</title>
<p id="par0105">In this paper, we built several models to predict the stability of and recovery from MERS-CoV infections. Our models were built using the Naive Bayes and J48 decision tree classification algorithms. The decision tree recovery model indicated that patients who are healthcare personnel are more likely to survive. The age attribute was found to be important in predicting the stability of the patient: older patients, aged between 66 and 87, are more likely to suffer from critical complications. The performance of all the models was evaluated and compared. In general, the accuracy of the models is between 53.6% and 71.58%. We believe that the performance of the prediction models can be enhanced with the use of more patient data. As future work, we plan to contact hospitals in Riyadh directly in order to collect more information related to patients with MERS-CoV infections.</p>
</sec>
<sec id="sec0045"><title>Funding</title>
<p id="par0110">No funding sources.</p>
</sec>
<sec id="sec0050"><title>Competing interests</title>
<p id="par0115">None declared.</p>
</sec>
<sec id="sec0055"><title>Ethical approval</title>
<p id="par0120">Not required.</p>
</sec>
</body>
<back><ref-list id="bibl0005"><title>References</title>
<ref id="bib0060"><label>1</label>
<mixed-citation publication-type="other" id="oref0005">Coronavirus Website - Ministry of Health. URL http://www.moh.gov.sa/en/CCC/</mixed-citation>
</ref>
<ref id="bib0065"><label>2</label>
<element-citation publication-type="book" id="sbref0065"><person-group person-group-type="author"><name><surname>Han</surname>
<given-names>J.</given-names>
</name>
<name><surname>Kamber</surname>
<given-names>M.</given-names>
</name>
</person-group>
<chapter-title>Data Mining: Concepts and Techniques, 3rd Edition. The Morgan Kaufmann Series in Data Management Systems</chapter-title>
<year>2011</year>
<publisher-name>Morgan Kaufmann</publisher-name>
</element-citation>
</ref>
<ref id="bib0070"><label>3</label>
<element-citation publication-type="book" id="sbref0070"><person-group person-group-type="author"><name><surname>Suh</surname>
<given-names>S.C.</given-names>
</name>
</person-group>
<chapter-title>Practical Applications of Data Mining</chapter-title>
<year>2012</year>
<publisher-name>Jones &amp; Bartlett Publishers</publisher-name>
</element-citation>
</ref>
<ref id="bib0075"><label>4</label>
<element-citation publication-type="journal" id="sbref0075"><person-group person-group-type="author"><name><surname>Ferreira</surname>
<given-names>D.</given-names>
</name>
<name><surname>Oliveira</surname>
<given-names>A.</given-names>
</name>
<name><surname>Freitas</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Applying data mining techniques to improve diagnosis in neonatal jaundice</article-title>
<source>BMC Medical Informatics and Decision Making</source>
<volume>12</volume>
<year>2012</year>
<fpage>143</fpage>
<pub-id pub-id-type="pmid">23216895</pub-id>
</element-citation>
</ref>
<ref id="bib0080"><label>5</label>
<element-citation publication-type="journal" id="sbref0080"><person-group person-group-type="author"><name><surname>Venkatalakshmi</surname>
<given-names>B.</given-names>
</name>
<name><surname>Shivsankar</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Heart disease diagnosis using predictive data mining</article-title>
<source>International Journal of Innovative Research in Science, Engineering and Technology</source>
<volume>3</volume>
<year>2014</year>
<fpage>1873</fpage>
<lpage>1877</lpage>
</element-citation>
</ref>
<ref id="bib0085"><label>6</label>
<element-citation publication-type="journal" id="sbref0085"><person-group person-group-type="author"><name><surname>Majali</surname>
<given-names>J.</given-names>
</name>
<name><surname>Niranjan</surname>
<given-names>R.</given-names>
</name>
<name><surname>Phatak</surname>
<given-names>V.</given-names>
</name>
<name><surname>Tadakhe</surname>
<given-names>O.</given-names>
</name>
</person-group>
<article-title>Data mining techniques for diagnosis and prognosis of cancer</article-title>
<source>IJARCCE</source>
<volume>4</volume>
<issue>3</issue>
<year>2015</year>
<fpage>613</fpage>
<lpage>615</lpage>
</element-citation>
</ref>
<ref id="bib0090"><label>7</label>
<element-citation publication-type="book" id="sbref0090"><person-group person-group-type="author"><name><surname>Bellaachia</surname>
<given-names>A.</given-names>
</name>
<name><surname>Guven</surname>
<given-names>E.</given-names>
</name>
</person-group>
<chapter-title>Predicting breast cancer survivability using data mining techniques, in: Ninth Workshop on Mining Scientific and Engineering Datasets in conjunction with the Sixth SIAM International Conference on Data Mining</chapter-title>
<year>2006</year>
</element-citation>
</ref>
<ref id="bib0095"><label>8</label>
<element-citation publication-type="journal" id="sbref0095"><person-group person-group-type="author"><name><surname>Afshar</surname>
<given-names>H.L.</given-names>
</name>
<name><surname>Ahmadi</surname>
<given-names>M.</given-names>
</name>
<name><surname>Roudbari</surname>
<given-names>M.</given-names>
</name>
<name><surname>Sadoughi</surname>
<given-names>F.</given-names>
</name>
</person-group>
<article-title>Prediction of breast cancer survival through knowledge discovery in databases</article-title>
<source>Global Journal of Health Science</source>
<volume>7</volume>
<issue>4</issue>
<year>2015</year>
<fpage>392</fpage>
<pub-id pub-id-type="pmid">25946945</pub-id>
</element-citation>
</ref>
<ref id="bib0100"><label>9</label>
<element-citation publication-type="journal" id="sbref0100"><person-group person-group-type="author"><name><surname>Sandhu</surname>
<given-names>R.</given-names>
</name>
<name><surname>Sood</surname>
<given-names>S.K.</given-names>
</name>
<name><surname>Kaur</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>An intelligent system for predicting and preventing MERS-CoV infection outbreak</article-title>
<source>The Journal of Supercomputing</source>
<year>2015</year>
<fpage>1</fpage>
<lpage>24</lpage>
</element-citation>
</ref>
<ref id="bib0105"><label>10</label>
<element-citation publication-type="book" id="sbref0105"><person-group person-group-type="author"><name><surname>Quinlan</surname>
<given-names>R.</given-names>
</name>
</person-group>
<chapter-title>C4.5: Programs for Machine Learning</chapter-title>
<year>1993</year>
<publisher-name>Morgan Kaufmann Publishers</publisher-name>
<publisher-loc>San Mateo, CA</publisher-loc>
</element-citation>
</ref>
<ref id="bib0110"><label>11</label>
<element-citation publication-type="journal" id="sbref0110"><person-group person-group-type="author"><name><surname>Hall</surname>
<given-names>M.</given-names>
</name>
<name><surname>Frank</surname>
<given-names>E.</given-names>
</name>
<name><surname>Holmes</surname>
<given-names>G.</given-names>
</name>
<name><surname>Pfahringer</surname>
<given-names>B.</given-names>
</name>
<name><surname>Reutemann</surname>
<given-names>P.</given-names>
</name>
<name><surname>Witten</surname>
<given-names>I.H.</given-names>
</name>
</person-group>
<article-title>The WEKA data mining software: an update</article-title>
<source>SIGKDD Explor. Newsl</source>
<volume>11</volume>
<issue>1</issue>
<year>2009</year>
<fpage>10</fpage>
<lpage>18</lpage>
</element-citation>
</ref>
</ref-list>
<ack id="ack0005"><title>Acknowledgment</title>
<p>The project was scientifically supported by King Saud University, Deanship of Scientific Research, Research Chairs, and the Research Chair of Health Informatics and Promotion.</p>
</ack>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E54 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000E54 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:7102847
   |texte=   Building predictive models for MERS-CoV infections using data mining techniques
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:27641481" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021
