Serveur d'exploration Covid

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.
***** Acces problem to record *****\

Identifieur interne : 000698 ( Pmc/Corpus ); précédent : 0006979; suivant : 0006990 ***** probable Xml problem with record *****

Links to Exploration step


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments</title>
<author>
<name sortKey="Rizzi, Romeo" sort="Rizzi, Romeo" uniqKey="Rizzi R" first="Romeo" last="Rizzi">Romeo Rizzi</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Dipartimento di Matematica ed Informatica, University of Udine, Udine, Italy</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mahata, Pritha" sort="Mahata, Pritha" uniqKey="Mahata P" first="Pritha" last="Mahata">Pritha Mahata</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>New South Wales Rural Doctors Network, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mathieson, Luke" sort="Mathieson, Luke" uniqKey="Mathieson L" first="Luke" last="Mathieson">Luke Mathieson</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Centre for Bioinformatics, Biomarker Discovery & Information Based Medicine, The University of Newcastle, Callaghan, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Information Based Medicine Program, Hunter Medical Research Institute, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Moscato, Pablo" sort="Moscato, Pablo" uniqKey="Moscato P" first="Pablo" last="Moscato">Pablo Moscato</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Centre for Bioinformatics, Biomarker Discovery & Information Based Medicine, The University of Newcastle, Callaghan, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Information Based Medicine Program, Hunter Medical Research Institute, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21151943</idno>
<idno type="pmc">2997101</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2997101</idno>
<idno type="RBID">PMC:2997101</idno>
<idno type="doi">10.1371/journal.pone.0014067</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000698</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000698</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments</title>
<author>
<name sortKey="Rizzi, Romeo" sort="Rizzi, Romeo" uniqKey="Rizzi R" first="Romeo" last="Rizzi">Romeo Rizzi</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>Dipartimento di Matematica ed Informatica, University of Udine, Udine, Italy</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mahata, Pritha" sort="Mahata, Pritha" uniqKey="Mahata P" first="Pritha" last="Mahata">Pritha Mahata</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>New South Wales Rural Doctors Network, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mathieson, Luke" sort="Mathieson, Luke" uniqKey="Mathieson L" first="Luke" last="Mathieson">Luke Mathieson</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Centre for Bioinformatics, Biomarker Discovery & Information Based Medicine, The University of Newcastle, Callaghan, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Information Based Medicine Program, Hunter Medical Research Institute, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Moscato, Pablo" sort="Moscato, Pablo" uniqKey="Moscato P" first="Pablo" last="Moscato">Pablo Moscato</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Centre for Bioinformatics, Biomarker Discovery & Information Based Medicine, The University of Newcastle, Callaghan, Australia</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Information Based Medicine Program, Hunter Medical Research Institute, Newcastle, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the
<italic>arithmetic-harmonic cut</italic>
. We show that the problem of finding such a cut is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e001.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e002.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard but is fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets including a cancer type dataset and a corona virus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e003.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
, Graclus and N
<sc>ormalized</sc>
-C
<sc>ut</sc>
. The arithmetic-harmonic cut metric overcoming difficulties other hierarchal methods have in representing both intercluster differences and intracluster similarities.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, J" uniqKey="Shi J">J Shi</name>
</author>
<author>
<name sortKey="Malik, J" uniqKey="Malik J">J Malik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wertheimer, M" uniqKey="Wertheimer M">M Wertheimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Papadimitriou, Ch" uniqKey="Papadimitriou C">CH Papadimitriou</name>
</author>
<author>
<name sortKey="Yannakakis, M" uniqKey="Yannakakis M">M Yannakakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Hoon, Mj" uniqKey="De Hoon M">MJ de Hoon</name>
</author>
<author>
<name sortKey="Imoto, S" uniqKey="Imoto S">S Imoto</name>
</author>
<author>
<name sortKey="Nolan, J" uniqKey="Nolan J">J Nolan</name>
</author>
<author>
<name sortKey="Miyano, S" uniqKey="Miyano S">S Miyano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Z" uniqKey="Wu Z">Z Wu</name>
</author>
<author>
<name sortKey="Leahy, R" uniqKey="Leahy R">R Leahy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
<author>
<name sortKey="Siskind, Jf" uniqKey="Siskind J">JF Siskind</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, Yc" uniqKey="Wei Y">YC Wei</name>
</author>
<author>
<name sortKey="Cheng, Ck" uniqKey="Cheng C">CK Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, Sx" uniqKey="Yu S">SX Yu</name>
</author>
<author>
<name sortKey="Shi, J" uniqKey="Shi J">J Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dhillon, Is" uniqKey="Dhillon I">IS Dhillon</name>
</author>
<author>
<name sortKey="Guan, Y" uniqKey="Guan Y">Y Guan</name>
</author>
<author>
<name sortKey="Kulis, B" uniqKey="Kulis B">B Kulis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahata, P" uniqKey="Mahata P">P Mahata</name>
</author>
<author>
<name sortKey="Costa, W" uniqKey="Costa W">W Costa</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ausiello, G" uniqKey="Ausiello G">G Ausiello</name>
</author>
<author>
<name sortKey="Crescenzi, P" uniqKey="Crescenzi P">P Crescenzi</name>
</author>
<author>
<name sortKey="Gambosi, G" uniqKey="Gambosi G">G Gambosi</name>
</author>
<author>
<name sortKey="Kann, V" uniqKey="Kann V">V Kann</name>
</author>
<author>
<name sortKey="Spaccamela, Am" uniqKey="Spaccamela A">AM Spaccamela</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flum, J" uniqKey="Flum J">J Flum</name>
</author>
<author>
<name sortKey="Grohe, M" uniqKey="Grohe M">M Grohe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karp, Rm" uniqKey="Karp R">RM Karp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mahajan, M" uniqKey="Mahajan M">M Mahajan</name>
</author>
<author>
<name sortKey="Raman, V" uniqKey="Raman V">V Raman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ross, Dt" uniqKey="Ross D">DT Ross</name>
</author>
<author>
<name sortKey="Scherf, U" uniqKey="Scherf U">U Scherf</name>
</author>
<author>
<name sortKey="Eisen, Mb" uniqKey="Eisen M">MB Eisen</name>
</author>
<author>
<name sortKey="Perou, Cm" uniqKey="Perou C">CM Perou</name>
</author>
<author>
<name sortKey="Rees, C" uniqKey="Rees C">C Rees</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yap, Yl" uniqKey="Yap Y">YL Yap</name>
</author>
<author>
<name sortKey="Zhang, Xw" uniqKey="Zhang X">XW Zhang</name>
</author>
<author>
<name sortKey="Danchin, A" uniqKey="Danchin A">A Danchin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Su, Ai" uniqKey="Su A">AI Su</name>
</author>
<author>
<name sortKey="Cooke, Mp" uniqKey="Cooke M">MP Cooke</name>
</author>
<author>
<name sortKey="Ching, Ka" uniqKey="Ching K">KA ching</name>
</author>
<author>
<name sortKey="Hakak, Y" uniqKey="Hakak Y">Y Hakak</name>
</author>
<author>
<name sortKey="Walker, Jr" uniqKey="Walker J">JR Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laan, Mj" uniqKey="Laan M">MJ Laan</name>
</author>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Golub, Tr" uniqKey="Golub T">TR Golub</name>
</author>
<author>
<name sortKey="Slonim, Dk" uniqKey="Slonim D">DK Slonim</name>
</author>
<author>
<name sortKey="Tamayo, P" uniqKey="Tamayo P">P Tamayo</name>
</author>
<author>
<name sortKey="Huard, C" uniqKey="Huard C">C Huard</name>
</author>
<author>
<name sortKey="Gaasenbeek, M" uniqKey="Gaasenbeek M">M Gaasenbeek</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitley, D" uniqKey="Whitley D">D Whitley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
<author>
<name sortKey="Mendes, A" uniqKey="Mendes A">A Mendes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Merz, P" uniqKey="Merz P">P Merz</name>
</author>
<author>
<name sortKey="Freisleben, B" uniqKey="Freisleben B">B Freisleben</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Festa, P" uniqKey="Festa P">P Festa</name>
</author>
<author>
<name sortKey="Pardalos, P" uniqKey="Pardalos P">P Pardalos</name>
</author>
<author>
<name sortKey="Resende, Mgc" uniqKey="Resende M">MGC Resende</name>
</author>
<author>
<name sortKey="Ribeiro, Cc" uniqKey="Ribeiro C">CC Ribeiro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buriol, L" uniqKey="Buriol L">L Buriol</name>
</author>
<author>
<name sortKey="Franca, Pm" uniqKey="Franca P">PM Franca</name>
</author>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Berretta, R" uniqKey="Berretta R">R Berretta</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mendes, A" uniqKey="Mendes A">A Mendes</name>
</author>
<author>
<name sortKey="Cotta, C" uniqKey="Cotta C">C Cotta</name>
</author>
<author>
<name sortKey="Garcia, V" uniqKey="Garcia V">V Garcia</name>
</author>
<author>
<name sortKey="Franca, P" uniqKey="Franca P">P Franca</name>
</author>
<author>
<name sortKey="Moscato, P" uniqKey="Moscato P">P Moscato</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hansen, P" uniqKey="Hansen P">P Hansen</name>
</author>
<author>
<name sortKey="Mladenovi, N" uniqKey="Mladenovi N">N Mladenović</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">21151943</article-id>
<article-id pub-id-type="pmc">2997101</article-id>
<article-id pub-id-type="publisher-id">10-PONE-RA-21449R1</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0014067</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline">
<subject>Computer Science</subject>
<subject>Oncology</subject>
<subject>Virology</subject>
<subject>Computer Science/Applications</subject>
<subject>Computer Science/Numerical Analysis and Theoretical Computing</subject>
<subject>Mathematics/Algorithms</subject>
<subject>Molecular Biology/Bioinformatics</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments</article-title>
<alt-title alt-title-type="running-head">Arithmetic-Harmonic Cut</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Rizzi</surname>
<given-names>Romeo</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mahata</surname>
<given-names>Pritha</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mathieson</surname>
<given-names>Luke</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Moscato</surname>
<given-names>Pablo</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>Dipartimento di Matematica ed Informatica, University of Udine, Udine, Italy</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>New South Wales Rural Doctors Network, Newcastle, Australia</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Centre for Bioinformatics, Biomarker Discovery & Information Based Medicine, The University of Newcastle, Callaghan, Australia</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Information Based Medicine Program, Hunter Medical Research Institute, Newcastle, Australia</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Brusic</surname>
<given-names>Vladimir</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">Dana-Farber Cancer Institute, United States of America</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>luke.mathieson@newcastle.edu.au</email>
</corresp>
<fn fn-type="con">
<p>Conceived and designed the experiments: RR PMahata LM PMoscato. Performed the experiments: RR PMahata LM. Analyzed the data: PMahata PMoscato. Contributed reagents/materials/analysis tools: RR PMahata LM. Wrote the paper: PMahata LM.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2010</year>
</pub-date>
<pub-date pub-type="epub">
<day>2</day>
<month>12</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="ecorrected">
<day>03 </day>
<month>12</month>
<year>2010</year>
</pub-date>
<volume>5</volume>
<issue>12</issue>
<elocation-id>e14067</elocation-id>
<history>
<date date-type="received">
<day>21</day>
<month>7</month>
<year>2010</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>10</month>
<year>2010</year>
</date>
</history>
<permissions>
<copyright-statement>Rizzi et al.</copyright-statement>
<copyright-year>2010</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.</license-p>
</license>
</permissions>
<abstract>
<p>Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the
<italic>arithmetic-harmonic cut</italic>
. We show that the problem of finding such a cut is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e001.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e002.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard but is fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets including a cancer type dataset and a corona virus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e003.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
, Graclus and N
<sc>ormalized</sc>
-C
<sc>ut</sc>
. The arithmetic-harmonic cut metric overcoming difficulties other hierarchal methods have in representing both intercluster differences and intracluster similarities.</p>
</abstract>
<counts>
<page-count count="8"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>The problem of finding structure in a set of unlabeled data (the so-called
<italic>clustering problem</italic>
) appears in various domains of research including bioinformatics, machine learning, image processing and video processing. In the area of bioinformatics, clustering has become increasingly important, as finding genetic subtypes of heterogeneous diseases like breast cancer, ovarian cancer and multiple sclerosis, may be made easier by using suitable clustering methods. This work aims to facilitate this line of research by finding good clusterings of various datasets with known partitions.</p>
<p>The importance of the clustering problem in various areas has given rise to several greedy algorithms such as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e004.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
, optimization-based methods such as N
<sc>ormalized</sc>
-C
<sc>ut</sc>
and neural-net based methods amongst others.</p>
<p>In this work, we will use a top-down approach for hierarchical clustering, recursively dividing the elements in the data. In each division step, often a graph partitioning technique is used (a similar approach is used for N
<sc>ormalized</sc>
-C
<sc>ut</sc>
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
). However, many graph (bi)partitioning problems can be formulated as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e005.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard optimization problems, for which there are no polynomial-time algorithms to find the optimal solution unless
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e006.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. This is an indication of the difficulty of the clustering problem and the focus of research since the work of Wertheimer
<xref rid="pone.0014067-Wertheimer1" ref-type="bibr">[2]</xref>
. In this work, we propose a new objective function for graph bipartitioning. The motivation for finding a new objective function for graph bipartitioning is that the known bipartitioning methods produce incorrect results for some datasets. For example, two formulations for clustering by graph partitioning are M
<sc>ax</sc>
-C
<sc>ut</sc>
<xref rid="pone.0014067-Papadimitriou1" ref-type="bibr">[3]</xref>
and N
<sc>ormalized</sc>
-C
<sc>ut</sc>
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
. M
<sc>ax</sc>
-C
<sc>ut</sc>
is already known to provide incorrect results for some datasets
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
. We show that N
<sc>ormalized</sc>
-C
<sc>ut</sc>
also produces some incorrect results for some of the datasets examined.</p>
<p>To achieve a better clustering than agglomerative hierarchical clustering and existing graph partitioning formulations, our proposed objective function seeks to minimize the intra-cluster distances and at the same time it seeks to maximize the inter-cluster distances. Our objective function performs well in clustering diverse types of datasets.</p>
<p>More precisely, we pose the hierarchical clustering problem as a finite number of instances of a graph partitioning problem, called A
<sc>rithmetic</sc>
-H
<sc>armonic</sc>
C
<sc>ut</sc>
(AH-C
<sc>ut</sc>
). In the AH-C
<sc>ut</sc>
problem, we start with a distance matrix for a set of objects and compute a weighted graph in which vertices represent objects and edges are weighted by the distance between the corresponding vertices. Our objective function tries to obtain a partition where the weight of the partition is directly proportional to the sum of the weights on the edges between the two partite sets and the sum of the reciprocals of the weights on the edges inside the partite sets. When considered as an optimisation problem, the goal is to maximise the weight of the partition. The recursive application of AH-C
<sc>ut</sc>
can then be used to generate a tree-based classification of the data.</p>
<p>As noted many graph bipartioniting problems are
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e007.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard at least, so a theoretical examination of any proposed clustering problem is necessary to determine whether it constitutes a practical approach to clustering. We give such a classification of AH-C
<sc>ut</sc>
and show that although it is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e008.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard and hard to approximate, it is
<italic>fixed-parameter tractable</italic>
, and therefore still a practical method for clustering.</p>
<sec id="s1a">
<title>Related Objective Functions for Hierarchical Clustering</title>
<sec id="s1a1">
<title>The
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e009.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-Means Algorithm</title>
<p>The
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e010.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
algorithm is one of a group of algorithms called
<italic>partitioning methods</italic>
; Given
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e011.jpg" mimetype="image"></inline-graphic>
</inline-formula>
objects in a
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e012.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-dimensional metric space, we wish to find a partition of the objects into
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e013.jpg" mimetype="image"></inline-graphic>
</inline-formula>
groups, or clusters, such that the objects in a cluster are more similar to each other than to objects in different clusters. The value of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e014.jpg" mimetype="image"></inline-graphic>
</inline-formula>
may or may not be specified and a clustering criterion, typically the squared-error criterion, must be adopted. The
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e015.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
algorithm initializes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e016.jpg" mimetype="image"></inline-graphic>
</inline-formula>
clusters by arbitrarily selecting one object to represent each cluster. Each of the remaining objects are assigned to a cluster and the clustering criterion is used to calculate the cluster mean. These means are used as the new cluster points and each object is reassigned to the cluster that it is most similar to. This continues until there is no longer a change when the clusters are recalculated. However, it is well-known that depending on the initial centres of the clusters, clustering results can change significantly. We use Gene Cluster
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e017.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<xref rid="pone.0014067-deHoon1" ref-type="bibr">[4]</xref>
for comparing our method with
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e018.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
.</p>
</sec>
<sec id="s1a2">
<title>Max Cut, Ratio Cut and Average Cut</title>
<p>Graph bipartitioning algorithms are also used for clustering
<xref rid="pone.0014067-Wu1" ref-type="bibr">[5]</xref>
. Given a graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e019.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and perhaps a weighting function
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e020.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, a graph bipartitioning problem asks for a partition
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e021.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that some function on the (weights of the) edges between
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e022.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e023.jpg" mimetype="image"></inline-graphic>
</inline-formula>
satisfies the given bound, or in the case of an optimisation problem, is optimised. One of the most common formulations is essentially an
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e024.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard combinatorial problem, called W
<sc>eighted</sc>
M
<sc>ax</sc>
-C
<sc>ut</sc>
, which is a simple weighted extension of the M
<sc>ax</sc>
-C
<sc>ut</sc>
problem. If we denote the edges between
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e025.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e026.jpg" mimetype="image"></inline-graphic>
</inline-formula>
as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e027.jpg" mimetype="image"></inline-graphic>
</inline-formula>
then the function
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e028.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to be optimised in the case of W
<sc>eighted</sc>
M
<sc>ax</sc>
-C
<sc>ut</sc>
is:
<disp-formula>
<graphic xlink:href="pone.0014067.e029.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
Although good algorithms exist for W
<sc>eighted</sc>
M
<sc>ax</sc>
-C
<sc>ut</sc>
, Shi and Malik
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
and Wu and Leahy
<xref rid="pone.0014067-Wu1" ref-type="bibr">[5]</xref>
show that (Weighted) M
<sc>ax</sc>
-C
<sc>ut</sc>
's objective function favours cutting small sets of isolated nodes in the graph. Furthermore, during bipartitioning, sometimes it may also cut small groups and put two parts of the same small group into different partite sets.</p>
<p>R
<sc>atio</sc>
-C
<sc>ut</sc>
uses the objective function:
<disp-formula>
<graphic xlink:href="pone.0014067.e030.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
In this case
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e031.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is taken as a similarity metric. R
<sc>atio</sc>
-C
<sc>ut</sc>
(and its
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e032.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-way extension) has also been employed for image segmentation
<xref rid="pone.0014067-Wang1" ref-type="bibr">[6]</xref>
and circuit partitioning for hierarchical VLSI design
<xref rid="pone.0014067-Wei1" ref-type="bibr">[7]</xref>
.</p>
<p>A
<sc>verage</sc>
-C
<sc>ut</sc>
employs the following objective funtion:
<disp-formula>
<graphic xlink:href="pone.0014067.e033.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e034.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is a similarity metric, the the problem becomes a minimisation problem,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e035.jpg" mimetype="image"></inline-graphic>
</inline-formula>
expresses distance the goal is maximisation. It turns out that even using the average cut, one cannot simultaneously minimise the inter-cluster similarity while maximizing the similarity within the groups.</p>
</sec>
<sec id="s1a3">
<title>Normalized-Cut</title>
<p>In the context of image segmentation, Shi and Malik
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
introduce N
<sc>ormalized</sc>
-C
<sc>ut</sc>
. They use a similarity metric for
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e036.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, and thus N
<sc>ormalized</sc>
-C
<sc>ut</sc>
is typically expressed as a minimisation problem with the following objective function:
<disp-formula>
<graphic xlink:href="pone.0014067.e037.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
<p>It is well-known that by negating weights the M
<sc>ax</sc>
-C
<sc>ut</sc>
problem is equivalent to the corresponding M
<sc>in</sc>
-C
<sc>ut</sc>
problem where one is supposed to minimise the sum of the weights (given by some similarity measure) between the two partitions (
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e038.jpg" mimetype="image"></inline-graphic>
</inline-formula>
) of a set of vertices
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e039.jpg" mimetype="image"></inline-graphic>
</inline-formula>
in a graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e040.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. It is straightforward to see that the same argument holds in case of N
<sc>ormalized</sc>
-C
<sc>ut</sc>
as well, which allows the negation of a distance matrix to be used a similarity matrix, facilitating comparisons for datasets for which only distance matrices are available. However, Shi and Malik
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
start with a Euclidian distance matrix
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e041.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and then compute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e042.jpg" mimetype="image"></inline-graphic>
</inline-formula>
as their similarity matrix. We use both approaches and demonstrate that the performance of this algorithm varies depending on the dataset and the two similarity measures.</p>
<p>Furthermore, Shi an Malik's
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
implementation relaxes the N
<sc>ormalized</sc>
-C
<sc>ut</sc>
problem into a generalised eigen-value problem by allowing the vertices
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e043.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to take real-values, instead of taking values from just the set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e044.jpg" mimetype="image"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e045.jpg" mimetype="image"></inline-graphic>
</inline-formula>
denotes that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e046.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e047.jpg" mimetype="image"></inline-graphic>
</inline-formula>
denotes that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e048.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Then, for bipartitioning, the second smallest eigenvector of the generalized eigen system is the real-valued solution to N
<sc>ormalized</sc>
-C
<sc>ut</sc>
. Finally, they search for the splitting point as follows: first choose
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e049.jpg" mimetype="image"></inline-graphic>
</inline-formula>
equidistance splitting points, compute N
<sc>ormalized</sc>
-C
<sc>ut</sc>
's objective value for each of these splits, then choose the one for which N
<sc>ormalized</sc>
-C
<sc>ut</sc>
's objective value is the smallest. In fact, the implementation also allows
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e050.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-way N
<sc>ormalized</sc>
-C
<sc>ut</sc>
, Yu and Shi
<xref rid="pone.0014067-Yu1" ref-type="bibr">[8]</xref>
examine this further. It considers the first
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e051.jpg" mimetype="image"></inline-graphic>
</inline-formula>
eigen vectors and yields
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e052.jpg" mimetype="image"></inline-graphic>
</inline-formula>
partite sets from a discretisation step following it.</p>
<p>Notice that M
<sc>ax</sc>
-C
<sc>ut</sc>
and R
<sc>atio</sc>
-C
<sc>ut</sc>
do not cluster by intra-cluster similarity and this results in a poor clustering results for image segmentation in comparison to N
<sc>ormalized</sc>
-C
<sc>ut</sc>
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
. Therefore, among these three algorithms, we consider only N
<sc>ormalized</sc>
-C
<sc>ut</sc>
for comparison with our algorithm.</p>
</sec>
<sec id="s1a4">
<title>Graclus</title>
<p>Graclus
<xref rid="pone.0014067-Dhillon1" ref-type="bibr">[9]</xref>
implements a multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, etc. This algorithm for multilevel clustering consists of three steps: (a) iteratively merging nodes of the graph (using various criteria of merging) and creating supergraphs with fewer nodes; (b) applying some base-level clustering on the resulting supergraph; and (c) restoring the clusters of original graph iteratively from the clustering of the final supergraph. This algorithm does not use eigenvector computation, and is thus notably faster than existing implementations of normalised and ratio-cuts. However, in most of the examples shown in this paper, Shi and Malik's
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
implementation of N
<sc>ormalized</sc>
-C
<sc>ut</sc>
results in a better clustering than Graclus.</p>
</sec>
</sec>
<sec id="s1b">
<title>Outline of the Paper</title>
<p>In this paper, after introducing the problem, we first examine the approximability of AH-C
<sc>ut</sc>
. In fact, we prove that AH-C
<sc>ut</sc>
is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e053.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard (and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e054.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-complete) via a reduction from the M
<sc>ax</sc>
-C
<sc>ut</sc>
problem, which is already known to be
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e055.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard
<xref rid="pone.0014067-Papadimitriou1" ref-type="bibr">[3]</xref>
. Therefore
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e056.jpg" mimetype="image"></inline-graphic>
</inline-formula>
has no polynomial time approximation scheme unless
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e057.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. We then demonstrate that AH-C
<sc>ut</sc>
is fixed-parameter tractable via a greedy localisation algorithm. Such a complexity analysis provides an indication of what practical methods are suitable for application to the problem. In this case the complexity results indicate that there is unlikely to be a polynomial time algorithm (or even approximation), but that the exponential component of the running time is at worst only a function of a small independent parameter and therefore the problem is likely to still have effective algorithms.</p>
<p>Given the complexity result we use a meta-heuristic approach (namely, a
<italic>memetic algorithm</italic>
) for AH-C
<sc>ut</sc>
(an outline of which was presented previously
<xref rid="pone.0014067-Mahata1" ref-type="bibr">[10]</xref>
). We compare the performance of our algorithm on four diverse types of datasets and compare the results with two recent and highly regarded clustering algorithms: N
<sc>ormalized</sc>
-C
<sc>ut</sc>
; and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e058.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
. The results indicate that AH-C
<sc>ut</sc>
gives a robust and broadly useful hierarchical clustering method.</p>
</sec>
<sec id="s1c">
<title>Preliminaries</title>
<sec id="s1c1">
<title>Graph Notation and Problem Definition</title>
<p>We consider only simple, undirected graphs, which may or may not be associated with a weight function on the edges. Given a graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e059.jpg" mimetype="image"></inline-graphic>
</inline-formula>
unless otherwise specified we denote the vertex set of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e060.jpg" mimetype="image"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e061.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and the edge set of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e062.jpg" mimetype="image"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e063.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. We denote an edge between vertices
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e064.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e065.jpg" mimetype="image"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e066.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>Given a graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e067.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and two vertex sets
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e068.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e069.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we denote the set of edges with one endpoint in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e070.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and the other in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e071.jpg" mimetype="image"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e072.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. When the graph is clear from context we write
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e073.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<disp-quote>
<p>A
<sc>rithmetic</sc>
-H
<sc>armonic</sc>
C
<sc>ut</sc>
(AH-C
<sc>ut</sc>
)</p>
<p>
<italic>Instance:</italic>
A graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e074.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, two positive integers
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e075.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e076.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and a weight function
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e077.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>
<italic>Question:</italic>
Is there a partition of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e078.jpg" mimetype="image"></inline-graphic>
</inline-formula>
into two sets
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e079.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e080.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<disp-formula>
<graphic xlink:href="pone.0014067.e081.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
</disp-quote>
<p>Given a graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e082.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and two disjoint vertex sets
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e083.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e084.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, for convenience we denote
<disp-formula>
<graphic xlink:href="pone.0014067.e085.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
by
<disp-formula>
<graphic xlink:href="pone.0014067.e086.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
The optimisation verion of AH-C
<sc>ut</sc>
is identical except that we maximise the function
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e087.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</sec>
<sec id="s1c2">
<title>Approximation and Complexity</title>
<p>If a maximisation problem
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e088.jpg" mimetype="image"></inline-graphic>
</inline-formula>
with objective function
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e089.jpg" mimetype="image"></inline-graphic>
</inline-formula>
has an polynomial time algorithm which given an instance
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e090.jpg" mimetype="image"></inline-graphic>
</inline-formula>
with optimal solution
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e091.jpg" mimetype="image"></inline-graphic>
</inline-formula>
guarantees a solution
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e092.jpg" mimetype="image"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e093.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for some
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e094.jpg" mimetype="image"></inline-graphic>
</inline-formula>
then we say
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e095.jpg" mimetype="image"></inline-graphic>
</inline-formula>
has a
<italic>constant factor approximation algorithm</italic>
. If there is an algorithm that guarantees such a bound for every
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e096.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e097.jpg" mimetype="image"></inline-graphic>
</inline-formula>
has a
<italic>polynomial time approximation scheme</italic>
(
<italic>ptas</italic>
).
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e098.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is the class of all problems which have constant factor approximation algorithms. If a problem
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e099.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e100.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard, then
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e101.jpg" mimetype="image"></inline-graphic>
</inline-formula>
has no ptas unless
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e102.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>We refer to Ausiello
<italic>et al.</italic>
<xref rid="pone.0014067-Ausiello1" ref-type="bibr">[11]</xref>
for further reading.</p>
</sec>
<sec id="s1c3">
<title>Parameterized Complexity</title>
<p>A parameterized problem is a (decision) problem equipped with an additional input called the
<italic>parameter</italic>
. Typically the parameter will numeric and should be independent of the size of the instance and relatively small. A problem
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e103.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is
<italic>fixed-parameter tractable</italic>
if there is an algorithm that solves the problem in time bounded by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e104.jpg" mimetype="image"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e105.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is the parameter,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e106.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is the size of the input,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e107.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is a computable function and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e108.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is a polynomial.</p>
<p>As we do not require the parameterized notion of hardness, we refer the reader to Flum and Grohe
<xref rid="pone.0014067-Flum1" ref-type="bibr">[12]</xref>
for complete coverage.</p>
</sec>
</sec>
</sec>
<sec id="s2">
<title>Results and Discussion</title>
<sec id="s2a">
<title>The Complexity of AH-Cut</title>
<p>We first turn to theoretical results for AH-C
<sc>ut</sc>
. We show that the optimisation version of the problem is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e109.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard, and consquently that the decision version is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e110.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-complete, indicating that AH-C
<sc>ut</sc>
is not has no polynomial time algorithm, but has no polynomial time approximation scheme, under standard complexity theoretic assumptions. Under the parameterized complexity framework however we show that AH-C
<sc>ut</sc>
is fixed parameter tractable with a
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e111.jpg" mimetype="image"></inline-graphic>
</inline-formula>
time algorithm.</p>
</sec>
<sec id="s2b">
<title>NP-Completeness and APX-Hardness</title>
<p>We demonstrate the
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e112.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-completeness for AH-C
<sc>ut</sc>
via an
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e113.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hardness reduction from M
<sc>ax</sc>
-C
<sc>ut</sc>
which is known to be
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e114.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard
<xref rid="pone.0014067-Papadimitriou1" ref-type="bibr">[3]</xref>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e115.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-complete
<xref rid="pone.0014067-Karp1" ref-type="bibr">[13]</xref>
.</p>
<disp-quote>
<p>M
<sc>ax</sc>
-C
<sc>ut</sc>
</p>
<p>
<italic>Instance:</italic>
A graph
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e116.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, a positive integer
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e117.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>
<italic>Question:</italic>
Is there a set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e118.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e119.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e120.jpg" mimetype="image"></inline-graphic>
</inline-formula>
?</p>
</disp-quote>
<p>The goal of the optimisation version of M
<sc>ax</sc>
-C
<sc>ut</sc>
is to maximise
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e121.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>Let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e122.jpg" mimetype="image"></inline-graphic>
</inline-formula>
be an instance of M
<sc>ax</sc>
-C
<sc>ut</sc>
with
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e123.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e124.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(we may assume that there is at least one cycle, as the maximum cut of any forest is trivially
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e125.jpg" mimetype="image"></inline-graphic>
</inline-formula>
), we construct an instance
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e126.jpg" mimetype="image"></inline-graphic>
</inline-formula>
of AH-C
<sc>ut</sc>
where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e127.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e128.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(i.e.,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e129.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is a complete graph). The elements of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e130.jpg" mimetype="image"></inline-graphic>
</inline-formula>
are weighted as follows: if
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e131.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then we set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e132.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, if
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e133.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e134.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and for all other edges
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e135.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e136.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. We set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e137.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Clearly we can obtain
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e138.jpg" mimetype="image"></inline-graphic>
</inline-formula>
in polynomial time.</p>
<p>Before moving to the hardness proof, we first prove some auxilliary lemmas.</p>
<sec id="s2b1">
<title>Lemma 1</title>
<p>
<italic>Let </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e139.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e140.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> where </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e141.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>, then </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e142.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> where </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e143.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>.</italic>
</p>
<p>
<italic>Proof.</italic>
Consider the objective function
<disp-formula>
<graphic xlink:href="pone.0014067.e144.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
and let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e145.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e146.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>Each of the edges between
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e147.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e148.jpg" mimetype="image"></inline-graphic>
</inline-formula>
that are also in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e149.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e150.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e151.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, and all other edges that are also in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e152.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e153.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e154.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e155.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e156.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e157.jpg" mimetype="image"></inline-graphic>
</inline-formula>
are in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e158.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, the edges
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e159.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e160.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e161.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e162.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e163.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. The edges between
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e164.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e165.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e166.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e167.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Therefore
<disp-formula>
<graphic xlink:href="pone.0014067.e168.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e169.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<disp-formula>
<graphic xlink:href="pone.0014067.e170.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
</sec>
<sec id="s2b2">
<title>Lemma 2</title>
<p>
<italic>Assume </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e171.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>. Let </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e172.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> be a subset of </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e173.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e174.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>. If </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e175.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> then </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e176.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>
<italic>Proof.</italic>
Assume
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e177.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, without loss of generality (by switching
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e178.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e179.jpg" mimetype="image"></inline-graphic>
</inline-formula>
) we may assume that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e180.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e181.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e182.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e183.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e184.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e185.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e186.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we know
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e187.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, which contributes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e188.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e189.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. The edges in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e190.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e191.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e192.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. As two vertices from
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e193.jpg" mimetype="image"></inline-graphic>
</inline-formula>
are in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e194.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, the edges between those two vertices and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e195.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e196.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e197.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. The third vertex in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e198.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contributes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e199.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e200.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>One of the edges of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e201.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is not in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e202.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and thus contributes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e203.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e204.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. There are
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e205.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e206.jpg" mimetype="image"></inline-graphic>
</inline-formula>
that are not in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e207.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(and thus not in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e208.jpg" mimetype="image"></inline-graphic>
</inline-formula>
) and so contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e209.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e210.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. The edges between the two
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e211.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertices and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e212.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e213.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e214.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Finally the edges between the
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e215.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertex and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e216.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contribute
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e217.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e218.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Thus in total we have
<disp-formula>
<graphic xlink:href="pone.0014067.e219.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
and
<disp-formula>
<graphic xlink:href="pone.0014067.e220.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e221.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e222.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we have
<disp-formula>
<graphic xlink:href="pone.0014067.e223.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
As we assume that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e224.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<disp-formula>
<graphic xlink:href="pone.0014067.e225.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
</sec>
<sec id="s2b3">
<title>Lemma 3</title>
<p>
<italic>Assume </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e226.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>. Let </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e227.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> be a subset of </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e228.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e229.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> such that </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e230.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>. In polynomial time we can obtain an </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e231.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> such that </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e232.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>.</italic>
</p>
<p>
<italic>Proof.</italic>
If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e233.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we may apply the greedy algorithm of Mahajan and Raman
<xref rid="pone.0014067-Mahajan1" ref-type="bibr">[14]</xref>
which returns a set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e234.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e235.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Therefore we may take
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e236.jpg" mimetype="image"></inline-graphic>
</inline-formula>
as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e237.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and we have
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e238.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e239.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, we have that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e240.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Then by Lemma 2,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e241.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Without loss of generality we may assume that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e242.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(by switching
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e243.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e244.jpg" mimetype="image"></inline-graphic>
</inline-formula>
as necessary). Denote
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e245.jpg" mimetype="image"></inline-graphic>
</inline-formula>
by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e246.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e247.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we have
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e248.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. We also have that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e249.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. We may then observe that
<disp-formula>
<graphic xlink:href="pone.0014067.e250.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
We know that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e251.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, and that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e252.jpg" mimetype="image"></inline-graphic>
</inline-formula>
contributes 3 edges to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e253.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e254.jpg" mimetype="image"></inline-graphic>
</inline-formula>
there are at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e255.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e256.jpg" mimetype="image"></inline-graphic>
</inline-formula>
that are in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e257.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and there are at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e258.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges otherwise unnaccounted for in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e259.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Therefore:
<disp-formula>
<graphic xlink:href="pone.0014067.e260.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
As
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e261.jpg" mimetype="image"></inline-graphic>
</inline-formula>
:
<disp-formula>
<graphic xlink:href="pone.0014067.e262.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
<p>We are now prepared to prove the main theorem of this section.</p>
</sec>
<sec id="s2b4">
<title>Theorem 4</title>
<p>AH-C
<sc>ut</sc>
<italic>is </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e263.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>-hard and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e264.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>-complete.</italic>
</p>
<p>
<italic>Proof.</italic>
Assume there is a
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e265.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-approximation algorithm
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e266.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for AH-C
<sc>ut</sc>
. We show that this implies a
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e267.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-approximation algorithm for M
<sc>ax</sc>
-C
<sc>ut</sc>
. Let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e268.jpg" mimetype="image"></inline-graphic>
</inline-formula>
be an instance of M
<sc>ax</sc>
-C
<sc>ut</sc>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e269.jpg" mimetype="image"></inline-graphic>
</inline-formula>
be the corresponding instance of AH-C
<sc>ut</sc>
derived from the reduction described above. If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e270.jpg" mimetype="image"></inline-graphic>
</inline-formula>
or
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e271.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, we solve the instance by complete enumeration in constant time. Otherwise assume the optimal cut of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e272.jpg" mimetype="image"></inline-graphic>
</inline-formula>
cuts at least
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e273.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e274.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and induces the partition
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e275.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. By Lemma 1 the partition
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e276.jpg" mimetype="image"></inline-graphic>
</inline-formula>
induces a solution for AH-C
<sc>ut</sc>
such that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e277.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Algorithm
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e278.jpg" mimetype="image"></inline-graphic>
</inline-formula>
will give a solution with
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e279.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Then by Lemma 3 we have a set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e280.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<disp-formula>
<graphic xlink:href="pone.0014067.e281.jpg" mimetype="image" position="float"></graphic>
</disp-formula>
</p>
<p>It is clear that AH-C
<sc>ut</sc>
is in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e282.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Given a partition, we simply calculate
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e283.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for that partition and compare to the target value.</p>
</sec>
</sec>
<sec id="s2c">
<title>Fixed-Parameter Tractability</title>
<p>We show that AH-C
<sc>ut</sc>
is fixed-parameter tractable via a greedy localisation technique. First we compute a greedy solution as follows:</p>
<list list-type="order">
<list-item>
<p>Pick an edge
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e284.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e285.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for every
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e286.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Add
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e287.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e288.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e289.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e290.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</list-item>
<list-item>
<p>While
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e291.jpg" mimetype="image"></inline-graphic>
</inline-formula>
do</p>
<list list-type="alpha-lower">
<list-item>
<p>Pick a vertex
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e292.jpg" mimetype="image"></inline-graphic>
</inline-formula>
such that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e293.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</list-item>
<list-item>
<p>If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e294.jpg" mimetype="image"></inline-graphic>
</inline-formula>
then set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e295.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, otherwise set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e296.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</list-item>
</list>
</list-item>
</list>
<p>Note that we assume that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e297.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is connected. If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e298.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is not connected then all vertices of degree
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e299.jpg" mimetype="image"></inline-graphic>
</inline-formula>
can be discarded, and the initial selection of vertices must take an adjacent pair from each connected component, then the algorithm continues as before. After all vertices have been assigned if
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e300.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then we answer Yes. If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e301.jpg" mimetype="image"></inline-graphic>
</inline-formula>
we make the following claim:</p>
<sec id="s2c1">
<title>Lemma 5</title>
<p>
<italic>Let </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e302.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> be an instance of</italic>
AH-C
<sc>ut</sc>
<italic>and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e303.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> a partition of </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e304.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> such that </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e305.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>, then </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e306.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e307.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic>.</italic>
</p>
<p>
<italic>Proof.</italic>
Let
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e308.jpg" mimetype="image"></inline-graphic>
</inline-formula>
be such an instance and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e309.jpg" mimetype="image"></inline-graphic>
</inline-formula>
the partition.</p>
<p>If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e310.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then in particular we know that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e311.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, therefore
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e312.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and we have
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e313.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Furthermore if
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e314.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then there are at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e315.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges in
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e316.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Thus the total number of edges is at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e317.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and we have at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e318.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertices in the graph.</p>
<p>If
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e319.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then we immediately have that
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e320.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and therefore
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e321.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. The case with the most edges with both endpoints in the same partite set is then when there is only one edge between the two partite sets, therefore
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e322.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e323.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, therefore
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e324.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and there are at most
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e325.jpg" mimetype="image"></inline-graphic>
</inline-formula>
edges and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e326.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertices in the graph.</p>
<p>As the instance is bounded by a function of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e327.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, we can exhaustively search the instance in time
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e328.jpg" mimetype="image"></inline-graphic>
</inline-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e329.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
<p>This algorithm immediately gives the following result:</p>
</sec>
<sec id="s2c2">
<title>Theorem 6</title>
<p>AH-C
<sc>ut</sc>
<italic>is fixed-parameter tractable with an algorithm running in time </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e330.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> where </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e331.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> is the optimisation target value, </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e332.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> is the maximum edge weight and </italic>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e333.jpg" mimetype="image"></inline-graphic>
</inline-formula>
<italic> is the number of vertices in the input graph.</italic>
</p>
</sec>
</sec>
<sec id="s2d">
<title>AH-Cut in Practice</title>
<p>We apply our algorithm to four datasets: (i) melanoma-colon-leukemia data from National Cancer Institute, U.S
<xref rid="pone.0014067-Ross1" ref-type="bibr">[15]</xref>
(involving gene expression of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e334.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes for
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e335.jpg" mimetype="image"></inline-graphic>
</inline-formula>
samples); (ii) SARS data of Yap
<italic>et al.</italic>
<xref rid="pone.0014067-Yap1" ref-type="bibr">[16]</xref>
and (iii) tissue type data given by Su
<italic>et al.</italic>
<xref rid="pone.0014067-Su1" ref-type="bibr">[17]</xref>
(involving gene expression of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e336.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes for
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e337.jpg" mimetype="image"></inline-graphic>
</inline-formula>
tissue samples).</p>
<p>We also consider a large synthetic dataset consisting of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e338.jpg" mimetype="image"></inline-graphic>
</inline-formula>
samples and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e339.jpg" mimetype="image"></inline-graphic>
</inline-formula>
features with a known optimal solution with three clusters. Despite the size of this datasets, our algorithm finds all three clusters.</p>
<p>In each case we compare our algorithm to N
<sc>ormalized</sc>
-C
<sc>ut</sc>
and where possible to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e340.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
and Graclus. Implementation details for the memetic algorithm are given in the
<xref ref-type="sec" rid="s3">Materials and Methods</xref>
section.</p>
</sec>
<sec id="s2e">
<title>Melanoma-Colon-Leukemia data from NCI</title>
<p>For the first comparison we use a subset of the data for
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e341.jpg" mimetype="image"></inline-graphic>
</inline-formula>
cancer samples taken for the National Cancer Institute's (NCI) screening for anti-cancer drugs
<xref rid="pone.0014067-Ross1" ref-type="bibr">[15]</xref>
. The dataset consists of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e342.jpg" mimetype="image"></inline-graphic>
</inline-formula>
gene expressions of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e343.jpg" mimetype="image"></inline-graphic>
</inline-formula>
melanoma,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e344.jpg" mimetype="image"></inline-graphic>
</inline-formula>
colon tumour and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e345.jpg" mimetype="image"></inline-graphic>
</inline-formula>
leukaemia samples. The reason for taking these three sets of samples is that others (non-small cell lung cancer, breast cancer, etc.) have heterogeneous profiles and removing these gives an expected solution of three clear clusters. Laan and Pollard
<xref rid="pone.0014067-Laan1" ref-type="bibr">[18]</xref>
show that this simple dataset is already hard to cluster using agglomerative hierarchical clustering methods. Nevertheless, AH-C
<sc>ut</sc>
is able to cluster the samples of these three diseases effectively, see
<xref ref-type="fig" rid="pone-0014067-g001">Figure 1</xref>
for the whole dendrogram generated by AH-C
<sc>ut</sc>
. We use centred correlation distance as the distance metric to maintain consistency with Golub
<italic>et al.</italic>
<xref rid="pone.0014067-Golub1" ref-type="bibr">[19]</xref>
.</p>
<fig id="pone-0014067-g001" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0014067.g001</object-id>
<label>Figure 1</label>
<caption>
<title>Dendrogram generated from AH-C
<sc>ut</sc>
for the melanoma-colon-leukemia dataset.</title>
</caption>
<graphic xlink:href="pone.0014067.g001"></graphic>
</fig>
<p>Conversely, N
<sc>ormalized</sc>
-C
<sc>ut</sc>
behaves inconsistently in allocating samples to the partitions. Using the negated distance matrix as a similarity matrix and choosing two clusters, it either separates melanoma from colon and leukemia samples, or leukemia from colon and melanoma samples, or splits the leukemia sample group. Even when number of clusters is specified as
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e346.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, leukemia samples are split between different clusters. Using
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e347.jpg" mimetype="image"></inline-graphic>
</inline-formula>
as the similarity matrix, where
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e348.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is the distance matrix, gives worse results.</p>
<p>On the other hand,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e349.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
performs much better than N
<sc>ormalized</sc>
-C
<sc>ut</sc>
and successfully separates melanoma from colon and leukemia samples when
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e350.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and gives three distinct clusters of colon, melanoma and leukemia samples when
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e351.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</sec>
<sec id="s2f">
<title>SARS</title>
<p>Next we analyse Yap
<italic>et al.</italic>
's
<xref rid="pone.0014067-Yap1" ref-type="bibr">[16]</xref>
dataset for Severe Acute Respiratory Syndrome (SARS). To explore the exact origin of SARS, the genomic sequence relationships of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e352.jpg" mimetype="image"></inline-graphic>
</inline-formula>
different single-stranded RNA (ssRNA) viruses (both positive and negative strand ssRNA viruses) of various families were studied. Yap
<italic>et al.</italic>
<xref rid="pone.0014067-Yap1" ref-type="bibr">[16]</xref>
generate the tetra-nucleotide usage pattern profile for each virus from which a distance matrix based on correlation coefficients is created. We use this distance matrix for the following performance comparison of AH-C
<sc>ut</sc>
and N
<sc>ormalized</sc>
-C
<sc>ut</sc>
. See
<xref ref-type="fig" rid="pone-0014067-g002">Figure 2</xref>
for the dendrogram generated by AH-C
<sc>ut</sc>
for this dataset. It is interesting to note that SARS virus is grouped in the same subtree as other corona viruses and is closest to the Feline Corona Virus (FCoV). Notice that these are all positive strand ssRNA viruses. This group of SARS and Corona viruses also contains other viruses (Porcine epidemic diarrhoea virus (PDV), Transmissible gastroenteritis virus (TGV), Avian infectious bronchitis virus (ABV), Murine hepatitis virus (MHV)). There is also a group of positive strand ssRNA viruses, called “Outliers”, which exhibit differences in their tetra-nucleotide usage pattern from the rest. Yellow Fever Virus (YFV), Avian Encephalomyelitis Virus (AEV), Rabbit Hemorrhagic disease Virus (RHV), Equine arteritis Virus (EV1), Lactate Dehydrogenase-elevating Virus (LDV) were also identified as outliers by Yap
<italic>et al.</italic>
<xref rid="pone.0014067-Yap1" ref-type="bibr">[16]</xref>
. This group also includes other ssRNA positive strand viruses - Igbo ora virus (IOV), Bovine viral diarrheoa (BDV), Foot and mouth disease virus C (FMV) and Simina Haemorrhagic fever virus (SFV). The negative strand ssRNA viruses are clustered in two subgroups, one unmixed with the positive strand ssRNA viruses, the remainder in the group “Mixed”. The first class (called -ve strand ssRNA viruses in
<xref ref-type="fig" rid="pone-0014067-g002">Figure 2</xref>
) covers Canine Distemper Virus (CDV) and Tioman virus (TV2), Reston Ebola Virus (REV), Bovine Ephemeral Fever Virus (BFV), Hantaan Virus (HV1), Bovine Respiratory syncytial Virus (BRV), Human Respiratory syncytial Virus (HRV) and Respiratory Syncytial Virus (RSV).</p>
<fig id="pone-0014067-g002" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0014067.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Dendrogram generated by AH-C
<sc>ut</sc>
for the SARS dataset.</title>
</caption>
<graphic xlink:href="pone.0014067.g002"></graphic>
</fig>
<p>N
<sc>ormalized</sc>
-C
<sc>ut</sc>
mixes positive and negative strand ssRNA viruses even in the first partition for the majority of runs. Graclus puts corona viruses in different partitions even when the number of partition specified is
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e353.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. As Graclus requires a distance matrix of integers we scaled the dataset's distance matrix by
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e354.jpg" mimetype="image"></inline-graphic>
</inline-formula>
to obtain an integral distance matrix. As only a distance matrix was available,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e355.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
was not applicable.</p>
</sec>
<sec id="s2g">
<title>Tissue dataset</title>
<p>Next we present results of applying AH-C
<sc>ut</sc>
to Su
<italic>at al.</italic>
's
<xref rid="pone.0014067-Su1" ref-type="bibr">[17]</xref>
human tissue dataset. This dataset consists of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e356.jpg" mimetype="image"></inline-graphic>
</inline-formula>
tissue specific genes from
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e357.jpg" mimetype="image"></inline-graphic>
</inline-formula>
samples collected from
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e358.jpg" mimetype="image"></inline-graphic>
</inline-formula>
individuals. The known origin of tissue samples gives a great advantage in validating clusterings of this dataset. For brevity we will compare only the first partitioning of this dataset generated by the different algorithms. The first partition of AH-C
<sc>ut</sc>
consists of:</p>
<list list-type="bullet">
<list-item>
<p>brain related tissues;</p>
</list-item>
<list-item>
<p>eye related tissues;</p>
</list-item>
<list-item>
<p>face related tissues;</p>
</list-item>
<list-item>
<p>testis tissues;</p>
</list-item>
<list-item>
<p>others (including ovary tissue).</p>
</list-item>
</list>
<p>The second partition of AH-C
<sc>ut</sc>
consists of:</p>
<list list-type="bullet">
<list-item>
<p>bonemarrow related cells;</p>
</list-item>
<list-item>
<p>blood cells;</p>
</list-item>
<list-item>
<p>heart related cells;</p>
</list-item>
<list-item>
<p>fœtal cells;</p>
</list-item>
<list-item>
<p>others (including ovary tissue, uterine tissue and uterine corpus tissue).</p>
</list-item>
</list>
<p>The partitioning for this dataset is quite reasonable except occurrence ovary tissues in different partitions. This can be due to the possible outlier nature of ovary tissues. However, N
<sc>ormalized</sc>
-C
<sc>ut</sc>
repeatedly separates the brain related tissues and thus performs even worse. Graclus performs similarly to N
<sc>ormalized</sc>
-C
<sc>ut</sc>
on this dataset.</p>
<p>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e359.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
agrees very well with AH-C
<sc>ut</sc>
in the partitions, except that it clusters placental tissues within the second partition instead of the first and it puts the two uterine tissues in different partitions.
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e360.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
also puts the two ovary tissues in two different partitions.</p>
</sec>
<sec id="s2h">
<title>Synthetic large-sampled gene expression data</title>
<p>To test the scalability of our algorithm, we show the results of AH-C
<sc>ut</sc>
applied to a large synthetic dataset. Consider
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e361.jpg" mimetype="image"></inline-graphic>
</inline-formula>
samples of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e362.jpg" mimetype="image"></inline-graphic>
</inline-formula>
synthetic gene expression profiles corresponding to three subtypes of some disease, giving a known optimum clustering with three clusters. To generate the data, we follow Laan and Pollard's
<xref rid="pone.0014067-Laan1" ref-type="bibr">[18]</xref>
method. We sample three groups of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e363.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e364.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e365.jpg" mimetype="image"></inline-graphic>
</inline-formula>
samples respectively from three multivariate normal distributions with diagonal covariance matrices, which differed only in their mean vector. The number of samples are chosen keeping in mind that in general there are some predominant subtypes of a disease and some rarer subtypes. All genes had common standard deviation
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e366.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, which corresponds to a
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e367.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-quantile of all standard deviations in an actual data set. For the first subpopulation, the first
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e368.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes had a mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e369.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, genes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e370.jpg" mimetype="image"></inline-graphic>
</inline-formula>
had mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e371.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, and the other
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e372.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes had mean zero. Then for the second subpopulation, genes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e373.jpg" mimetype="image"></inline-graphic>
</inline-formula>
had mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e374.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, genes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e375.jpg" mimetype="image"></inline-graphic>
</inline-formula>
had mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e376.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and the other
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e377.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes had mean zero. For the third subpopulation, genes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e378.jpg" mimetype="image"></inline-graphic>
</inline-formula>
had mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e379.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, genes
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e380.jpg" mimetype="image"></inline-graphic>
</inline-formula>
had mean of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e381.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and the other
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e382.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes had mean zero. In other words, signature of the three types of cancer is related to
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e383.jpg" mimetype="image"></inline-graphic>
</inline-formula>
genes of which
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e384.jpg" mimetype="image"></inline-graphic>
</inline-formula>
are under-expressed and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e385.jpg" mimetype="image"></inline-graphic>
</inline-formula>
are over-expressed.</p>
<p>The application of AH-C
<sc>ut</sc>
on this dataset first separates the first group from the rest. A second application on the rest of the samples yields the second and third group as the two partitions.</p>
<p>When number of clusters is specified as two, N
<sc>ormalized</sc>
-C
<sc>ut</sc>
clusters the first and third subtypes together and the second subtype separately. However, specifying number of clusters as three creates a partitioning which does not correspond to the expected known grouping.</p>
<p>
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e386.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
puts the first subtype in one partition and the other two subtypes in another partition when
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e387.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, and separates three subtypes successfully when
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e388.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</sec>
<sec id="s2i">
<title>Conclusion</title>
<p>We have introduced a novel objective function for clustering based on graph partitioning. We show that the resulting problem AH-C
<sc>ut</sc>
is, unfortunately,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e389.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-complete and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e390.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-hard, but is however fixed-parameter tractable.</p>
<p>We then gave several test cases demonstrating the potential of the approach using a memetic algorithm. The performance of AH-C
<sc>ut</sc>
based clustering exceeds the performance of N
<sc>ormalized</sc>
-C
<sc>ut</sc>
based clustering across a wide variety of datasets, including large scale datasets, and notably datasets with known optimal clusterings. AH-C
<sc>ut</sc>
based clustering also has a wider applicability than
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e391.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
based clustering, and at least equal performance.</p>
<p>There are several avenues for further research from this initial exploration. The fixed-parameter tractability of AH-C
<sc>ut</sc>
promises the possibility of a practical
<italic>exact</italic>
algorithm, which would give stronger evidence of AH-C
<sc>ut</sc>
's performance, as random elements would be removed.</p>
<p>Further studies on datasets of all kinds would also be useful to explore the strengths and weaknesses of AH-C
<sc>ut</sc>
based clustering, especially in comparison to other existing methods.</p>
<p>Tangentially, the quality of the memetic algorithm solutions suggest that there may be a link between the fixed-parameter tractability and the performance of the memetic algorithm. As established by the fixed-parameter tractability of AH-C
<sc>ut</sc>
, if a simple greedy algorithm does not produce a solution with a sufficiently high objective value, then the instance size must be bounded by an relatively simple function of the parameters. Therefore it is possible that under these conditions the local search component of the memetic algorithm approximates an exhaustive search, or at least has a greater effectiveness. A definite link of this kind would be an interesting development for both parameterized complexity and memetic algorithmics, above and beyond this application.</p>
</sec>
</sec>
<sec sec-type="materials|methods" id="s3">
<title>Materials and Methods</title>
<p>The complexity analysis of the AH-C
<sc>ut</sc>
problem employ standard complexity theory techniques
<xref rid="pone.0014067-Ausiello1" ref-type="bibr">[11]</xref>
,
<xref rid="pone.0014067-Flum1" ref-type="bibr">[12]</xref>
.</p>
<p>The NCI60 cancer dataset was drawn from the NCI60 anti-cancer drug screening program data
<xref rid="pone.0014067-NCINIH1" ref-type="bibr">[20]</xref>
and the gene expression data for the cell lines was given by Ross
<italic>et al.</italic>
<xref rid="pone.0014067-Ross1" ref-type="bibr">[15]</xref>
.</p>
<p>The tissue dataset is drawn from Su
<italic>et al.</italic>
<xref rid="pone.0014067-Su1" ref-type="bibr">[17]</xref>
.</p>
<p>The synthetic dataset was generated using the methods of Laan and Pollard
<xref rid="pone.0014067-Laan1" ref-type="bibr">[18]</xref>
.</p>
<p>For the comparisons we use Gene Cluster's implementation of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e392.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-M
<sc>eans</sc>
<xref rid="pone.0014067-deHoon1" ref-type="bibr">[4]</xref>
(
<ext-link ext-link-type="uri" xlink:href="http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#ctv">http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#ctv</ext-link>
), Dhillon
<italic>et al.</italic>
's Graclus software
<xref rid="pone.0014067-Dhillon1" ref-type="bibr">[9]</xref>
(
<ext-link ext-link-type="uri" xlink:href="http://userweb.cs.utexas.edu/users/dml/Software/graclus.html">http://userweb.cs.utexas.edu/users/dml/Software/graclus.html</ext-link>
) and Shi and Malik's implementation of N
<sc>ormalized</sc>
C
<sc>ut</sc>
<xref rid="pone.0014067-Shi1" ref-type="bibr">[1]</xref>
.</p>
<p>All experiments were performed on a Dell laptop with a 1.67GHz processor and 2GB of RAM with the software written in Java.</p>
<sec id="s3a">
<title>A memetic algorithm for AH-Cut</title>
<p>We have implemented AH-C
<sc>ut</sc>
via a memetic algorithm
<xref rid="pone.0014067-Moscato1" ref-type="bibr">[21]</xref>
,
<xref rid="pone.0014067-Moscato2" ref-type="bibr">[22]</xref>
. Memetic algorithms provide a population-based approach for heuristic search in optimization problems. Broadly speaking they combine local search heuristics with crossover operators used in genetic algorithms
<xref rid="pone.0014067-Whitley1" ref-type="bibr">[23]</xref>
<xref rid="pone.0014067-Moscato4" ref-type="bibr">[25]</xref>
. The essence of our algorithm is similar to the work of Merz and Freisleben
<xref rid="pone.0014067-Merz1" ref-type="bibr">[26]</xref>
for G
<sc>raph</sc>
B
<sc>i</sc>
-
<sc>partitioning</sc>
. Differences arise from the fact that we need to remove the constraint of equal partitioning of the graph. The method consists of three main procedures: a greedy algorithm for initialization of a set of solutions for AH-C
<sc>ut</sc>
(detailed in the parameterized algorithm); a differential greedy crossover for evolution of the population; and a variable neighborhood local search, influenced by Festa
<italic>et al.</italic>
<xref rid="pone.0014067-Festa1" ref-type="bibr">[27]</xref>
, to improve the newly generated solutions.</p>
<p>We use a ternary tree for population similar to Buriol
<italic>et al.</italic>
<xref rid="pone.0014067-Buriol1" ref-type="bibr">[28]</xref>
and keep two solutions at each node of this tree. One solution is the best obtained so far at the node, called
<italic>pocket solution</italic>
and the other one is the
<italic>current</italic>
solution. Essentially, if we generate a current solution by recombination or local search which is better than the pocket solution, we swap the current solution with the pocket solution. Furthermore, each parent node of the tree must have better pocket solution than its children's pocket solutions. Similar tree structures were previously employed successfully for various combinatorially hard problems
<xref rid="pone.0014067-Buriol1" ref-type="bibr">[28]</xref>
<xref rid="pone.0014067-Mendes1" ref-type="bibr">[30]</xref>
.</p>
<sec id="s3a1">
<title>Differential Greedy Crossover</title>
<p>We allow a crossover of a parent's pocket solution with a child's current solution to ensure the diversity in the population. All vertices that are contained in the same set for both the parents, are included in the same set in the offspring. Then both sets are filled according to a greedy recombination method similar to the greedy algorithm used for the parameterized algorithm. Suppose, the parent solutions
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e393.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e394.jpg" mimetype="image"></inline-graphic>
</inline-formula>
have the partitions
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e395.jpg" mimetype="image"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e396.jpg" mimetype="image"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e397.jpg" mimetype="image"></inline-graphic>
</inline-formula>
respectively (after interchanging the sets suitably). Then the starting set
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e398.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(resp.
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e399.jpg" mimetype="image"></inline-graphic>
</inline-formula>
) for the offspring is given by the intersection
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e400.jpg" mimetype="image"></inline-graphic>
</inline-formula>
(resp.
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e401.jpg" mimetype="image"></inline-graphic>
</inline-formula>
), with the remainder of the partition calculated greedily.</p>
</sec>
<sec id="s3a2">
<title>Local Search</title>
<p>We employ a
<italic>variable-neighborhood search (VNS)</italic>
, first proposed by Hansen and Mladenovic
<xref rid="pone.0014067-Hansen1" ref-type="bibr">[31]</xref>
for a local search in the neighborhood of the new offspring. Contrary to other local search methods, VNS allows enlargement of the neighborhood structure. A
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e402.jpg" mimetype="image"></inline-graphic>
</inline-formula>
-th order neighbor of a paritition giving a solution
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e403.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for AH-C
<sc>ut</sc>
is obtained by swapping the partite set of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e404.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertices. In VNS, first local search is done starting from each neighbor
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e405.jpg" mimetype="image"></inline-graphic>
</inline-formula>
of the current solution
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e406.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. If a solution
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e407.jpg" mimetype="image"></inline-graphic>
</inline-formula>
is found which is better than
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e408.jpg" mimetype="image"></inline-graphic>
</inline-formula>
, then the search moves to the neighborhood of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e409.jpg" mimetype="image"></inline-graphic>
</inline-formula>
. Otherwise, the order
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e410.jpg" mimetype="image"></inline-graphic>
</inline-formula>
of the neighborhood is increased by one, until some stop criterion holds. We use maximum value of
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e411.jpg" mimetype="image"></inline-graphic>
</inline-formula>
for
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e412.jpg" mimetype="image"></inline-graphic>
</inline-formula>
.</p>
</sec>
<sec id="s3a3">
<title>Diversity</title>
<p>Whenever the population stagnates (fails to improve the objective value), we keep the best solution and re-initialize the rest of solutions (using the greedy algorithm with a randomised starting vertex pair) in the set and run the above process again for certain number of generations (say,
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e413.jpg" mimetype="image"></inline-graphic>
</inline-formula>
).</p>
<p>To get the optimal solution for very small sized problems (graphs containing less than
<inline-formula>
<inline-graphic xlink:href="pone.0014067.e414.jpg" mimetype="image"></inline-graphic>
</inline-formula>
vertices), we used backtracking. Notice that even though backtracking gives us an optimal solution, a greedy or memetic algorithm may not. By applying this method (backtracking, memetic or greedy algorithm depending on the number of vertices) recursively, we have at each step a graph as input, and the two subgraphs induced by each of the sets of the vertex partition as output; stopping when we arrive to a graph with just one vertex, we generate a hierarchical clustering in a top-down fashion.</p>
</sec>
</sec>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="COI-statement">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="financial-disclosure">
<p>
<bold>Funding: </bold>
We acknowledge the support of the Australian Research Council (ARC) Centre of Excellence in Bioinformatics (CE0348221), The University of Newcastle, and ARC Discovery Project DP0773279 (Application of novel exact combinatorial optimisation techniques and metaheuristic methods for problems in cancer research). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</p>
</fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="pone.0014067-Shi1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Malik</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Normalized cuts and image segmentation.</article-title>
<source>IEEE Transactions of Pattern Analysis and Machine Intelligence</source>
<volume>22</volume>
<fpage>888</fpage>
<lpage>905</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Wertheimer1">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wertheimer</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>1938</year>
<article-title>Laws of organization in perceptual forms (partial translation).</article-title>
<source>A Sourcebook of Gestalt Psychology</source>
<fpage>71</fpage>
<lpage>88</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Papadimitriou1">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Papadimitriou</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Yannakakis</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>1991</year>
<article-title>Optimization, approximation, and complexity classes.</article-title>
<source>Journal of Computer and System Sciences</source>
<volume>43</volume>
<fpage>425</fpage>
<lpage>440</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-deHoon1">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>de Hoon</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Imoto</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nolan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Miyano</surname>
<given-names>S</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Open source clustering software.</article-title>
<source>Bioinformatics</source>
<volume>20</volume>
<fpage>1453</fpage>
<lpage>1454</lpage>
<pub-id pub-id-type="pmid">14871861</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-Wu1">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Leahy</surname>
<given-names>R</given-names>
</name>
</person-group>
<year>1993</year>
<article-title>An optimal graph theoretic approach to data clustering: theory and its application to image segmentation.</article-title>
<source>IEEE Transactions of Pattern Analysis and Machine Intelligence</source>
<volume>15</volume>
<fpage>1101</fpage>
<lpage>1113</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Wang1">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Siskind</surname>
<given-names>JF</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Image segmentation with ratio cut.</article-title>
<source>IEEE Transactions of Pattern Analysis and Machine Intelligence</source>
<volume>25</volume>
<fpage>675</fpage>
<lpage>690</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Wei1">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>YC</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>CK</given-names>
</name>
</person-group>
<year>1991</year>
<article-title>Ratio cut partitioning for hierarchical designs.</article-title>
<source>IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems</source>
<volume>10</volume>
<fpage>911</fpage>
<lpage>921</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Yu1">
<label>8</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>SX</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>J</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Multiclass spectral clustering.</article-title>
<comment>In: International Conference on Computer Vision</comment>
</element-citation>
</ref>
<ref id="pone.0014067-Dhillon1">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dhillon</surname>
<given-names>IS</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kulis</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Weighted graph cuts without eigenvectors: a multilevel approach.</article-title>
<source>IEEE Transactions of Pattern Analysis and Machine Intelligence</source>
<volume>29</volume>
<fpage>1944</fpage>
<lpage>1957</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Mahata1">
<label>10</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Mahata</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Costa</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Hierarchical clustering, languages and cancer.</article-title>
<source>EvoWorkshops 2006</source>
<fpage>67</fpage>
<lpage>78</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Ausiello1">
<label>11</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Ausiello</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Crescenzi</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Gambosi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kann</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Spaccamela</surname>
<given-names>AM</given-names>
</name>
<etal></etal>
</person-group>
<year>1999</year>
<source>Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties</source>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer-Verlag</publisher-name>
</element-citation>
</ref>
<ref id="pone.0014067-Flum1">
<label>12</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Flum</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Grohe</surname>
<given-names>M</given-names>
</name>
</person-group>
<year>2006</year>
<source>Parameterized Complexity Theory</source>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="pone.0014067-Karp1">
<label>13</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Karp</surname>
<given-names>RM</given-names>
</name>
</person-group>
<year>1972</year>
<article-title>Reducibility among combinatorial problems.</article-title>
<source>Proceedings of a symposium on the Complexity of Computer Computations</source>
<publisher-loc>New York</publisher-loc>
<publisher-name>IBM T. W. Watson Research Center</publisher-name>
<fpage>85</fpage>
<lpage>104</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Mahajan1">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mahajan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Raman</surname>
<given-names>V</given-names>
</name>
</person-group>
<year>1997</year>
<article-title>Parameterizing above guaranteed values: Maxsat and maxcut.</article-title>
<source>Electronic Colloquium on Computational Complexity (ECCC)</source>
<volume>4</volume>
</element-citation>
</ref>
<ref id="pone.0014067-Ross1">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ross</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Scherf</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Eisen</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Perou</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Rees</surname>
<given-names>C</given-names>
</name>
<etal></etal>
</person-group>
<year>2000</year>
<article-title>Systematic variation in gene expression patterns in human cancer cell lines.</article-title>
<source>Nature Genetics</source>
<volume>24</volume>
<fpage>227</fpage>
<lpage>235</lpage>
<pub-id pub-id-type="pmid">10700174</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-Yap1">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yap</surname>
<given-names>YL</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>XW</given-names>
</name>
<name>
<surname>Danchin</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling.</article-title>
<source>BMC Bioinformatics</source>
<volume>4</volume>
<fpage>43</fpage>
<pub-id pub-id-type="pmid">14499005</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-Su1">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname>
<given-names>AI</given-names>
</name>
<name>
<surname>Cooke</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>ching</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Hakak</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Walker</surname>
<given-names>JR</given-names>
</name>
<etal></etal>
</person-group>
<year>2002</year>
<article-title>Large scale analysis of the human and mouse transcriptomes.</article-title>
<source>PNAS</source>
<volume>99</volume>
<fpage>4465</fpage>
<lpage>4470</lpage>
<pub-id pub-id-type="pmid">11904358</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-Laan1">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Laan</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Pollard</surname>
<given-names>KS</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>A new algorithm for hybrid clustering of gene expression data with visualization and bootstrap.</article-title>
<source>Journal of Statistical Planning and Inference</source>
<volume>117</volume>
<fpage>275</fpage>
<lpage>303</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Golub1">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Golub</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Slonim</surname>
<given-names>DK</given-names>
</name>
<name>
<surname>Tamayo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Huard</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gaasenbeek</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<year>1999</year>
<article-title>Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring.</article-title>
<source>Science</source>
<volume>286</volume>
<fpage>531</fpage>
<lpage>537</lpage>
<pub-id pub-id-type="pmid">10521349</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-NCINIH1">
<label>20</label>
<element-citation publication-type="other">
<collab>NCI/NIH</collab>
<year>2010</year>
<article-title>Developmental Theraputics Program.</article-title>
<comment>Available:
<ext-link ext-link-type="uri" xlink:href="http://dtp.nci.nih.gov/">http://dtp.nci.nih.gov/</ext-link>
</comment>
</element-citation>
</ref>
<ref id="pone.0014067-Moscato1">
<label>21</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>1989</year>
<article-title>On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms.</article-title>
<comment>Technical Report 826, Caltech Concurrent Computation Program</comment>
</element-citation>
</ref>
<ref id="pone.0014067-Moscato2">
<label>22</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>A gentle introduction to memetic algorithms.</article-title>
<person-group person-group-type="editor">
<name>
<surname>Glover</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Kochenberger</surname>
<given-names>G</given-names>
</name>
</person-group>
<source>Handbook of Metaheuristics</source>
<publisher-loc>Boston MA</publisher-loc>
<publisher-name>Kluwer Academic Publishers</publisher-name>
<fpage>105</fpage>
<lpage>144</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Whitley1">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whitley</surname>
<given-names>D</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>A genetic algorithm tutorial.</article-title>
<source>Statistics and Computing</source>
<volume>4</volume>
<fpage>65</fpage>
<lpage>85</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Moscato3">
<label>24</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Mendes</surname>
<given-names>A</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Memetic algorithms.</article-title>
<person-group person-group-type="editor">
<name>
<surname>Onwubolu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Babu</surname>
<given-names>B</given-names>
</name>
</person-group>
<source>New Optimization Techniques in Engineering</source>
<publisher-loc>Berlin</publisher-loc>
<publisher-name>Springer-Verlag</publisher-name>
<fpage>53</fpage>
<lpage>85</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Moscato4">
<label>25</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
</person-group>
<year>2006</year>
<article-title>Memetic algorithms.</article-title>
<person-group person-group-type="editor">
<name>
<surname>González</surname>
<given-names>T</given-names>
</name>
</person-group>
<source>Handbook of Approximation Algorithms and Metaheuristics</source>
<publisher-name>Taylor & Francis</publisher-name>
</element-citation>
</ref>
<ref id="pone.0014067-Merz1">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Merz</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Freisleben</surname>
<given-names>B</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning.</article-title>
<source>Evolutionary Computation</source>
<volume>8</volume>
<fpage>61</fpage>
<lpage>91</lpage>
<pub-id pub-id-type="pmid">10753231</pub-id>
</element-citation>
</ref>
<ref id="pone.0014067-Festa1">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Festa</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pardalos</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Resende</surname>
<given-names>MGC</given-names>
</name>
<name>
<surname>Ribeiro</surname>
<given-names>CC</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Randomized heuristics for the MAX-CUT problem.</article-title>
<source>Optimization Methods and Software</source>
<volume>7</volume>
<fpage>1033</fpage>
<lpage>1058</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Buriol1">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buriol</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Franca</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>A new memetic algorithm for the asymmetric traveling salesman problem.</article-title>
<source>Journal of Heuristics</source>
<volume>10</volume>
<fpage>483</fpage>
<lpage>506</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Berretta1">
<label>29</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Berretta</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Enhancing the performance of memetic algorithms by using a matching-based recombination algorithm.</article-title>
<source>Metaheuristics: computer decision-making</source>
<publisher-loc>Norwell, MA, USA</publisher-loc>
<publisher-name>Kluwer Academic Publishers</publisher-name>
<fpage>65</fpage>
<lpage>90</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Mendes1">
<label>30</label>
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Mendes</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Cotta</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Garcia</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Franca</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Moscato</surname>
<given-names>P</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Gene ordering in microarray data using parallel memetic algorithms.</article-title>
<source>ICPP Workshops</source>
<fpage>604</fpage>
<lpage>611</lpage>
</element-citation>
</ref>
<ref id="pone.0014067-Hansen1">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hansen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Mladenović</surname>
<given-names>N</given-names>
</name>
</person-group>
<year>2001</year>
<article-title>Developments of variable neighborhood search.</article-title>
<source>Essays and Surveys in Metaheuristics</source>
<fpage>415</fpage>
<lpage>439</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/CovidV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000698  | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000698  | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    CovidV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Fri Mar 27 18:14:15 2020. Site generation: Sun Jan 31 15:15:08 2021