Exploration server on relations between France and Australia

Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data

Internal identifier: 002B69 (Pmc/Corpus); previous: 002B68; next: 002B70

Authors: Saumyadipta Pyne; Sharon X. Lee; Kui Wang; Jonathan Irish; Pablo Tamayo; Marc-Danie Nazaire; Tarn Duong; Shu-Kay Ng; David Hafler; Ronald Levy; Garry P. Nolan; Jill Mesirov; Geoffrey J. McLachlan

RBID : PMC:4077578

Abstract

In biomedical applications, an experimenter encounters several potential sources of variation in data, such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers, for example from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. Computational methods can identify cell populations in individual samples, but without the ability to match them automatically across samples it is difficult to compare and characterize the populations in typical experiments, such as populations that respond to various stimulations or are distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template, which is used for registering populations across samples and for classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in an R package, EMMIX-JCM, available from http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/.


DOI: 10.1371/journal.pone.0100334
PubMed: 24983991
PubMed Central: 4077578

Links to Exploration step

PMC:4077578

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data</title>
<author>
<name sortKey="Pyne, Saumyadipta" sort="Pyne, Saumyadipta" uniqKey="Pyne S" first="Saumyadipta" last="Pyne">Saumyadipta Pyne</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, Andhra Pradesh, India</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lee, Sharon X" sort="Lee, Sharon X" uniqKey="Lee S" first="Sharon X." last="Lee">Sharon X. Lee</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Kui" sort="Wang, Kui" uniqKey="Wang K" first="Kui" last="Wang">Kui Wang</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Irish, Jonathan" sort="Irish, Jonathan" uniqKey="Irish J" first="Jonathan" last="Irish">Jonathan Irish</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Division of Oncology, Stanford Medical School, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Department of Cancer Biology, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tamayo, Pablo" sort="Tamayo, Pablo" uniqKey="Tamayo P" first="Pablo" last="Tamayo">Pablo Tamayo</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nazaire, Marc Danie" sort="Nazaire, Marc Danie" uniqKey="Nazaire M" first="Marc-Danie" last="Nazaire">Marc-Danie Nazaire</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Duong, Tarn" sort="Duong, Tarn" uniqKey="Duong T" first="Tarn" last="Duong">Tarn Duong</name>
<affiliation>
<nlm:aff id="aff7">
<addr-line>Molecular Mechanisms of Intracellular Transport, Unit Mixte de Recherche 144 Centre National de la Recherche Scientifique/Institut Curie, Paris, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ng, Shu Kay" sort="Ng, Shu Kay" uniqKey="Ng S" first="Shu-Kay" last="Ng">Shu-Kay Ng</name>
<affiliation>
<nlm:aff id="aff8">
<addr-line>School of Medicine, Griffith University, Meadowbrook, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hafler, David" sort="Hafler, David" uniqKey="Hafler D" first="David" last="Hafler">David Hafler</name>
<affiliation>
<nlm:aff id="aff9">
<addr-line>Department of Neurology, Yale School of Medicine, New Haven, Connecticut, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Levy, Ronald" sort="Levy, Ronald" uniqKey="Levy R" first="Ronald" last="Levy">Ronald Levy</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Division of Oncology, Stanford Medical School, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nolan, Garry P" sort="Nolan, Garry P" uniqKey="Nolan G" first="Garry P." last="Nolan">Garry P. Nolan</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mesirov, Jill" sort="Mesirov, Jill" uniqKey="Mesirov J" first="Jill" last="Mesirov">Jill Mesirov</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mclachlan, Geoffrey J" sort="Mclachlan, Geoffrey J" uniqKey="Mclachlan G" first="Geoffrey J." last="Mclachlan">Geoffrey J. McLachlan</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24983991</idno>
<idno type="pmc">4077578</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4077578</idno>
<idno type="RBID">PMC:4077578</idno>
<idno type="doi">10.1371/journal.pone.0100334</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">002B69</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">002B69</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data</title>
<author>
<name sortKey="Pyne, Saumyadipta" sort="Pyne, Saumyadipta" uniqKey="Pyne S" first="Saumyadipta" last="Pyne">Saumyadipta Pyne</name>
<affiliation>
<nlm:aff id="aff1">
<addr-line>CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, Andhra Pradesh, India</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lee, Sharon X" sort="Lee, Sharon X" uniqKey="Lee S" first="Sharon X." last="Lee">Sharon X. Lee</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Kui" sort="Wang, Kui" uniqKey="Wang K" first="Kui" last="Wang">Kui Wang</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Irish, Jonathan" sort="Irish, Jonathan" uniqKey="Irish J" first="Jonathan" last="Irish">Jonathan Irish</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Division of Oncology, Stanford Medical School, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff5">
<addr-line>Department of Cancer Biology, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tamayo, Pablo" sort="Tamayo, Pablo" uniqKey="Tamayo P" first="Pablo" last="Tamayo">Pablo Tamayo</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nazaire, Marc Danie" sort="Nazaire, Marc Danie" uniqKey="Nazaire M" first="Marc-Danie" last="Nazaire">Marc-Danie Nazaire</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Duong, Tarn" sort="Duong, Tarn" uniqKey="Duong T" first="Tarn" last="Duong">Tarn Duong</name>
<affiliation>
<nlm:aff id="aff7">
<addr-line>Molecular Mechanisms of Intracellular Transport, Unit Mixte de Recherche 144 Centre National de la Recherche Scientifique/Institut Curie, Paris, France</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ng, Shu Kay" sort="Ng, Shu Kay" uniqKey="Ng S" first="Shu-Kay" last="Ng">Shu-Kay Ng</name>
<affiliation>
<nlm:aff id="aff8">
<addr-line>School of Medicine, Griffith University, Meadowbrook, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hafler, David" sort="Hafler, David" uniqKey="Hafler D" first="David" last="Hafler">David Hafler</name>
<affiliation>
<nlm:aff id="aff9">
<addr-line>Department of Neurology, Yale School of Medicine, New Haven, Connecticut, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Levy, Ronald" sort="Levy, Ronald" uniqKey="Levy R" first="Ronald" last="Levy">Ronald Levy</name>
<affiliation>
<nlm:aff id="aff3">
<addr-line>Division of Oncology, Stanford Medical School, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nolan, Garry P" sort="Nolan, Garry P" uniqKey="Nolan G" first="Garry P." last="Nolan">Garry P. Nolan</name>
<affiliation>
<nlm:aff id="aff4">
<addr-line>Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mesirov, Jill" sort="Mesirov, Jill" uniqKey="Mesirov J" first="Jill" last="Mesirov">Jill Mesirov</name>
<affiliation>
<nlm:aff id="aff6">
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mclachlan, Geoffrey J" sort="Mclachlan, Geoffrey J" uniqKey="Mclachlan G" first="Geoffrey J." last="Mclachlan">Geoffrey J. McLachlan</name>
<affiliation>
<nlm:aff id="aff2">
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>In biomedical applications, an experimenter encounters several potential sources of variation in data, such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers, for example from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. Computational methods can identify cell populations in individual samples, but without the ability to match them automatically across samples it is difficult to compare and characterize the populations in typical experiments, such as populations that respond to various stimulations or are distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template, which is used for registering populations across samples and for classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in an R package, EMMIX-JCM, available from
<ext-link ext-link-type="uri" xlink:href="http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/">http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/</ext-link>
.</p>
</div>
</front>
<back>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24983991</article-id>
<article-id pub-id-type="pmc">4077578</article-id>
<article-id pub-id-type="publisher-id">PONE-D-13-55136</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0100334</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Physical Sciences</subject>
<subj-group>
<subject>Mathematics</subject>
<subj-group>
<subject>Applied Mathematics</subject>
<subj-group>
<subject>Algorithms</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Statistics (Mathematics)</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Biology and Life Sciences</subject>
<subj-group>
<subject>Biotechnology</subject>
<subj-group>
<subject>Bioengineering</subject>
<subj-group>
<subject>Biomedical Engineering</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group>
<subject>Cell Biology</subject>
<subj-group>
<subject>Cytometry</subject>
<subject>Molecular Cell Biology</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Computational Biology</subject>
<subj-group>
<subject>Population Modeling</subject>
</subj-group>
</subj-group>
<subj-group>
<subject>Population Biology</subject>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Computer and Information Sciences</subject>
<subj-group>
<subject>Computerized Simulations</subject>
</subj-group>
</subj-group>
<subj-group subj-group-type="Discipline-v2">
<subject>Research and Analysis Methods</subject>
<subj-group>
<subject>Spectrum Analysis Techniques</subject>
<subj-group>
<subject>Spectrophotometry</subject>
<subj-group>
<subject>Cytophotometry</subject>
<subj-group>
<subject>Flow Cytometry</subject>
</subj-group>
</subj-group>
</subj-group>
</subj-group>
<subj-group>
<subject>Simulation and Modeling</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Joint Modeling and Registration of Cell Populations in Cohorts of High-Dimensional Flow Cytometric Data</article-title>
<alt-title alt-title-type="running-head">Joint Modeling and Registration</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Pyne</surname>
<given-names>Saumyadipta</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lee</surname>
<given-names>Sharon X.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Kui</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Irish</surname>
<given-names>Jonathan</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tamayo</surname>
<given-names>Pablo</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nazaire</surname>
<given-names>Marc-Danie</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Duong</surname>
<given-names>Tarn</given-names>
</name>
<xref ref-type="aff" rid="aff7">
<sup>7</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ng</surname>
<given-names>Shu-Kay</given-names>
</name>
<xref ref-type="aff" rid="aff8">
<sup>8</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hafler</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="aff9">
<sup>9</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Levy</surname>
<given-names>Ronald</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nolan</surname>
<given-names>Garry P.</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mesirov</surname>
<given-names>Jill</given-names>
</name>
<xref ref-type="aff" rid="aff6">
<sup>6</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>McLachlan</surname>
<given-names>Geoffrey J.</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="corresp" rid="cor1">
<sup>*</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<label>1</label>
<addr-line>CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, Andhra Pradesh, India</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Department of Mathematics, University of Queensland, St. Lucia, Queensland, Australia</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Division of Oncology, Stanford Medical School, Stanford, California, United States of America</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California, United States of America</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>Department of Cancer Biology, Vanderbilt University, Nashville, Tennessee, United States of America</addr-line>
</aff>
<aff id="aff6">
<label>6</label>
<addr-line>Broad Institute of MIT and Harvard University, Cambridge, Massachusetts, United States of America</addr-line>
</aff>
<aff id="aff7">
<label>7</label>
<addr-line>Molecular Mechanisms of Intracellular Transport, Unit Mixte de Recherche 144 Centre National de la Recherche Scientifique/Institut Curie, Paris, France</addr-line>
</aff>
<aff id="aff8">
<label>8</label>
<addr-line>School of Medicine, Griffith University, Meadowbrook, Queensland, Australia</addr-line>
</aff>
<aff id="aff9">
<label>9</label>
<addr-line>Department of Neurology, Yale School of Medicine, New Haven, Connecticut, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Bontempi</surname>
<given-names>Gianluca</given-names>
</name>
<role>Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>Université Libre de Bruxelles, Belgium</addr-line>
</aff>
<author-notes>
<corresp id="cor1">* E-mail:
<email>g.mclachlan@uq.edu.au</email>
</corresp>
<fn fn-type="conflict">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con">
<p>Conceived and designed the experiments: SP. Performed the experiments: JI. Analyzed the data: SP KW SXL M-DN TD S-KN DH GJM. Contributed reagents/materials/analysis tools: SP KW SXL PT S-KN DH RL GPN GJM. Wrote the paper: SP KW SXL DH RL GPN JM GJM.</p>
</fn>
</author-notes>
<pub-date pub-type="collection">
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>1</day>
<month>7</month>
<year>2014</year>
</pub-date>
<volume>9</volume>
<issue>7</issue>
<elocation-id>e100334</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>12</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>5</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-year>2014</copyright-year>
<copyright-holder>Pyne et al</copyright-holder>
<license>
<license-p>This is an open-access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<p>In biomedical applications, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers such as from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples, it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template – used for registering populations across samples, and classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in an R package EMMIX-JCM, available from
<ext-link ext-link-type="uri" xlink:href="http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/">http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/</ext-link>
.</p>
</abstract>
<funding-group>
<funding-statement>GJM, S-KN, KW, and SXL acknowledge support of the Australian Research Council (ARC). Grant number DP120104327. ARC URL
<ext-link ext-link-type="uri" xlink:href="http://www.arc.gov.au/">http://www.arc.gov.au/</ext-link>
. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
</funding-group>
<counts>
<page-count count="11"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Flow cytometry is widely used for single cell interrogation of surface and intracellular protein expression by measuring fluorescence intensity of fluorophore-conjugated reagents. Recent technical advances have taken the field towards single cell proteomics
<xref rid="pone.0100334-Irish1" ref-type="bibr">[1]</xref>
and enabled highly multiparametric analysis
<xref rid="pone.0100334-Perfetto1" ref-type="bibr">[2]</xref>
and computational cytomics
<xref rid="pone.0100334-Lugli1" ref-type="bibr">[3]</xref>
. Consequently, biomedical applications are presenting new challenges to cytometric analysis. Increasingly such studies involve cohorts with large numbers of patients, replicates, and may also use multiplexing of marker staining panels for probing large signaling networks
<xref rid="pone.0100334-Krutzik1" ref-type="bibr">[4]</xref>
. Further, while typical flow experiments assayed for 4–8 features, the recent development of mass cytometry promises the ability to compare 50–100 features per cell
<xref rid="pone.0100334-Tanner1" ref-type="bibr">[5]</xref>
. Owing to multiple factors, such as variation among individuals in a cohort, simultaneous use of different stimulation conditions and panels in a given experiment, biological and technical replicates, and the highly multivariate nature of the new platforms' measurements, the resulting datasets are rich and complex. Currently, no single standard procedure exists for performing reproducible cohort-wide analysis while tackling systems-level heterogeneity and noise in multiple samples.</p>
<p>Recently, we developed a platform (FLAME) for automated analysis of high-dimensional flow data
<xref rid="pone.0100334-Pyne1" ref-type="bibr">[7]</xref>
. Each cell population (henceforth simply called population) in a sample is modeled by FLAME as a cluster of points with similar fluorescence intensities in the multi-dimensional space of markers. FLAME's heavy-tailed and asymmetric distributions are especially appropriate for flow data, since rare and interesting subpopulations tend to be represented by the tail-subpopulations that are connected to larger populations
<xref rid="pone.0100334-Kotecha1" ref-type="bibr">[8]</xref>
. Notably, the field of computational cytomics has witnessed rapid growth in the past few years, as reviewed by Lugli et al.
<xref rid="pone.0100334-Lugli1" ref-type="bibr">[3]</xref>
</p>
<p>While modeling populations in flow data remains a difficult problem, a second and even more important challenge appears when there are many samples and conditions to compare – how to efficiently match or “register” the corresponding populations across a
<italic>batch</italic>
of samples. The difficulty of this problem arises from (a) the high-dimensionality of data, which prevents visual matching of populations, (b) large cohort or batch sizes, and (c) high inter-sample variation, all of which make the manual approach challenging. Yet it is essential to determine the batch-wise correspondence among populations with automation so that we can register them i.e., identify them uniquely, in high-dimension, which enables direct quantitative comparison of samples across conditions, phenotypes or time points. Addressed with algorithmic precision and rigor, automatic registration can facilitate clinical applications with diagnostic or prognostic implications. For instance, it can be useful for monitoring of specific cellular events such as lymphocytic infiltration in tumors, immuno-profiling of patients following treatment, etc.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
,
<xref rid="pone.0100334-Oved1" ref-type="bibr">[10]</xref>
. By creating parametric models of the matched spatio-temporal profiles, we can use the estimated model parameters to accurately classify new samples as well as identify aberrant patterns (outliers).</p>
<p>A composite solution to these two complex problems – modeling each population within a sample, and registering them across samples – marks a significant improvement over FLAME and the other predominantly clustering approaches
<xref rid="pone.0100334-Lugli1" ref-type="bibr">[3]</xref>
such as flowClust
<xref rid="pone.0100334-Lo1" ref-type="bibr">[11]</xref>
and SWIFT
<xref rid="pone.0100334-Naim1" ref-type="bibr">[12]</xref>
. Currently, FLAME first models the populations separately within individual samples, and then tries to match these populations post hoc by running an external module (using Partitioning Around Medoids or PAM) on the model parameters. In our experience in running FLAME, this alignment procedure has several limitations. For instance, meta-clustering can be overly sensitive to the accuracy of the comparison results of PAM, which may be low if there is high inter-sample variation in a batch. Further, while PAM meta-clustering matches population-features only pairwise, the overall relationships among those features can be captured across all samples, i.e., in a manner more robust to inter-sample variation, using batch-level modeling as in JCM. Finally, as the whole batch was not modeled simultaneously, no overall consensus template of the batch was formed by FLAME. In that sense, FLAME and other algorithms that analyze single samples cannot determine batch characteristics systematically.</p>
<sec id="s1a">
<title>The JCM approach</title>
<p>We present a new multi-level framework called Joint Clustering and Matching (JCM) that operates on an entire batch of samples across two levels: (1) at a sample-specific “lower” level, JCM models every cell population as a cluster (i.e. a
<italic>component</italic>
of a finite mixture model of multivariate
<italic>t</italic>
or skew
<italic>t</italic>
-distributions); and simultaneously, (2) at a batch-specific “higher” level, JCM constructs a parametric
<italic>template</italic>
, which models overall characteristics of a batch. JCM achieves this by fitting a Random-Effects Model (REM) that allows every sample in a given batch to be modeled as an instance of an “original” template possibly transformed with a flexible amount of variation. In
<xref ref-type="supplementary-material" rid="pone.0100334.s009">Appendices S1</xref>
and
<xref ref-type="supplementary-material" rid="pone.0100334.s010">S2</xref>
, we describe our Expectation-Maximization (EM) algorithm for efficient fitting of the two-level JCM model, as described in (1) and (2) above. Its multi-level design gives JCM the ability to establish a direct parametric correspondence between each cell population in the batch template and its counterpart within an individual sample. Unlike FLAME, this allows JCM to explicitly tackle inter-sample variation, a common concern for flow data, and thus support both biological and clinical applications. JCM's template based mixture-model approach was described originally in our unpublished working paper
<xref rid="pone.0100334-Pyne2" ref-type="bibr">[13]</xref>
.</p>
<p>In recent years, researchers have also started multiplexing many staining panels to overcome limits on the numbers of markers that can be accurately measured together using commercial cytometers
<xref rid="pone.0100334-Krutzik1" ref-type="bibr">[4]</xref>
. While the resulting data are richer, multiplexing can also produce a large number of distinct features from every panel of markers. Currently there exists no technique for systematic integration of such features across panels into meta-features for the common underlying sample. As part of JCM analysis, we introduce a new technique to combine both univariate and multivariate JCM features across multiplexed panels to construct enriched meta-features (or
<italic>feature-sets</italic>
), and use these to improve sample classification.</p>
<p>Using simulation as well as several real-world benchmark datasets, we found that key performance attributes such as classification accuracy and running time of JCM are quite favorable compared to other methods. To illustrate the different capabilities of JCM, we applied it to two sets of experiments involving multiple markers, time points (or stimulations), staining panels, and sample classes. In addition, the accuracy of JCM is compared with FLAME and HDPGMM on a set of manually analyzed benchmark DLBCL datasets from the flowCAP contest
<xref rid="pone.0100334-Aghaeepour1" ref-type="bibr">[14]</xref>
. Here, HDPGMM denotes the hierarchical Dirichlet process Gaussian mixture model-based procedure proposed recently
<xref rid="pone.0100334-Cron1" ref-type="bibr">[15]</xref>
. The procedure provides a strategy for the alignment of cells across multiple samples by assuming the cell populations to have identical location and shape across the samples, but their weights (or proportions) may vary from sample to sample. Similar to JCM, the HDPGMM is an alternative procedure that produces a template or consensus model to represent the overall distribution of the batch of samples. However, the assumption of identical mean and covariance in the component normal distributions for all samples may be too restrictive in some cases. We also compared JCM with two other popular methods for the automated analysis of flow cytometric data, namely flowClust and SWIFT. As a model-based algorithm, flowClust also uses mixture models for density estimation and clustering, but adopts a data transformation approach to handle asymmetric clusters as an alternative to merging Gaussian mixture components (HDPGMM) or adopting a skew component distribution (FLAME and JCM). One advantage of the former approach is a potentially faster run time due to a simpler model fitting procedure. SWIFT is closely related to HDPGMM in that they are both based on merged Gaussian mixture models, but the former is also designed for scalability to larger datasets by employing weighted down-sampling to speed up model fitting. However, as these two methods do not have any explicit facility for matching the output from a series of samples, we applied them to each sample considered separately and to the single sample consisting of all 16 samples pooled into one.</p>
<p>Concerning the setting of several parameters here in our analyses, we note that it is in fact the biologist who decides the number (and types) of markers necessary for characterizing the populations of interest before the data are generated. Given the generated data, the JCM algorithm allows automated estimation of all the parameters of the fitted JCM model in an unsupervised manner, that is, with no explicit need for manual setting of the model parameters. In the first of the two sets of experiments performed to assess JCM, we applied JCM to obtain a multi-parametric characterization of different T cell subpopulations upon T cell receptor (TCR) stimulation in a time course phosphorylation experiment. This illustrates how a complex multi-class and multi-sample experiment can be systematically analyzed in a fully automated and reproducible manner to generate precise and objective profiles for every class. Importantly, it is based on a comprehensive list of rigorously estimated model parameters for each population, which is output by JCM. As illustrated by our next application, such an unsupervised, thorough approach can also reveal new or subtle expression phenotypes in specific subpopulations, which might otherwise go undetected in manual gating. In the second experiment, we applied JCM to understand differential patterns of altered B cell receptor (BCR) signaling in human follicular lymphoma (FL) tumor samples. By combining JCM features from multiplexed panels of 16 phospho-markers, we identified a novel spatio-temporal signature of BCR signaling in a specific subpopulation of the lymphoma B cells that improved the separation between two classes of patients previously reported by Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
to have markedly different survival. We also devised visual means for overlaying expression templates to capture the variation in data both within and across a batch. This highlights the capability of JCM to distinguish complex biological contexts via quantitative class-specific characteristics, which may be very useful in new studies involving large cytometric cohorts.</p>
</sec>
</sec>
<sec id="s2">
<title>Results</title>
<sec id="s2a">
<title>Spatio-temporal characterization of TCR activation</title>
<p>We analyzed phosphorylation patterns downstream of T cell receptor (TCR) activation in naïve and memory T cells across six classes of samples corresponding to six time points: 0, 1, 3, 5, 15, and 30 min originally measured by Maier et al.
<xref rid="pone.0100334-Maier1" ref-type="bibr">[16]</xref>
. In that study, human expertise played a key role in manually and visually identifying each population in every sample at every time-point, and then carefully comparing them based on selected features of chosen populations. In the process, many manual decisions were made, such as the sequence of gates applied and the selection of useful parameters for comparing the subsets across classes, and highly supervised, time-consuming operations were performed repeatedly. The results of manual gating, even on similar experiments, can therefore vary with such decisions, which in turn depend on the experience of the human expert.</p>
<p>JCM, in contrast, produced the full sequence of spatio-temporal expression phenotypes of phosphorylation in five distinct subsets of T cells, which are matched across all samples. These five populations were characterized in a fully unsupervised manner in 4-dimensional marker-space, as well as in terms of the 5
<italic>
<sup>th</sup>
</italic>
dimension of time. The model yielded a comprehensive list of matched high-dimensional parameters, not just a few pre-determined visual (i.e. 2-D) features. This list could be readily used for exploratory statistical analyses (e.g. feature selection, discriminant analysis) to accurately identify the changes in every population over time. Since the cohort was modeled as a batch by JCM, we can also compare the overall batch-templates computed for every time-point, both statistically and visually, to capture the longitudinal phenotypic trend starting from the activation of TCR up to its de-activation. Thus the JCM framework is objective, fast, quantitative and reproducible.</p>
<p>The sequence started at 0 min, prior to stimulation with an anti-CD3 antibody (baseline measurement), reached peak levels of phosphorylation at 3–5 min, and then subsided by 30 min. JCM's multi-level modeling of the time course data is illustrated in
<xref ref-type="fig" rid="pone-0100334-g001">Figure 1</xref>
(for the time point of 3 min), where each sample is modeled as an instance of the class template through an affine transformation, thus inherently aligning the cell populations across different samples. In particular, the transformation is governed by a REM (see
<xref ref-type="sec" rid="s4">
<italic>Methods</italic>
</xref>
). This allows JCM to flexibly accommodate subtle variations between the samples and facilitates interpretability of the results. The profiles of the five populations (denoted #1–5 in
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2</xref>
) were distinguished, matched across samples, summarized with templates and compared across six time-points. The overall changes summarized as high-dimensional templates for each of the successive classes can be observed in
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2</xref>
. Looking at the changes in the proportions of the five clusters (denoted by
<italic>π</italic>
<sub>1</sub>
to
<italic>π</italic>
<sub>5</sub>
; see
<xref ref-type="sec" rid="s4">
<italic>Methods</italic>
</xref>
) over the six time points, we can see from
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2</xref>
that the estimate of
<italic>π</italic>
<sub>3</sub>
is relatively constant, while the estimates of
<italic>π</italic>
<sub>1</sub>
and
<italic>π</italic>
<sub>5</sub>
are on the increase and the estimate of
<italic>π</italic>
<sub>4</sub>
is on the decrease. The overall spatio-temporal differences both within and across classes may be observed with JCM's overlay plots (
<xref ref-type="supplementary-material" rid="pone.0100334.s003">Figure S3</xref>
). Specifically, the alterations in the naïve and memory T cell populations are outlined in
<xref ref-type="supplementary-material" rid="pone.0100334.s004">Figure S4</xref>
, where a rise in the intensities of marker ZAP70 can be observed soon after stimulation and then a gradual decline over time. For details on the experiments, see
<xref ref-type="supplementary-material" rid="pone.0100334.s011">Text S1</xref>
.</p>
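<p>As a minimal illustration of the template idea described above, the following Python sketch generates the population means of each sample as a perturbed copy of a shared template, so that population h in every sample is registered to population h in the template by construction. It is not the EMMIX-JCM implementation and performs no model fitting: the function name, the made-up template values, and the Gaussian per-marker scale and shift effects are all assumptions chosen for illustration.</p>
<preformat>
```python
import random


def draw_sample_means(template_means, scale_sd=0.05, shift_sd=0.2, rng=None):
    """Return one sample's population means as a perturbed copy of the template.

    Population h keeps its index, so it stays matched to population h
    of the template without any post hoc alignment step.
    """
    rng = rng or random.Random()
    sample_means = []
    for mean in template_means:
        # Hypothetical random effects: per-marker scale a (near 1) and shift b (near 0).
        a = [rng.gauss(1.0, scale_sd) for _ in mean]
        b = [rng.gauss(0.0, shift_sd) for _ in mean]
        sample_means.append([ai * m + bi for ai, m, bi in zip(a, mean, b)])
    return sample_means


# Made-up template: 3 populations in a 2-marker space (illustrative values only).
template = [[1.0, 4.0], [3.5, 1.0], [6.0, 5.0]]
rng = random.Random(0)
batch = [draw_sample_means(template, rng=rng) for _ in range(4)]
```
</preformat>
<p>Setting both standard deviations to zero reproduces the template exactly, which corresponds to a batch with no inter-sample variation.</p>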
<fig id="pone-0100334-g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0100334.g001</object-id>
<label>Figure 1</label>
<caption>
<title>JCM model and application.</title>
<p>The multi-level model is illustrated using the samples (bottom) and the template (top) for the samples of the 3 min class, along 3 out of 4 dimensions in the TCR activation data. Actual values of the JCM parameters were used to construct the 50
<sup>th</sup>
percentile multivariate
<italic>t</italic>
density contour (ellipsoid) depicting every population. The overall class template is computed by fitting a random effects model on all the samples, which in turn are fitted with sample-specific finite mixture models of multivariate
<italic>t</italic>
's. Under the JCM framework, each sample can be described as an affine transformation of the template, where each population in a sample corresponds to its counterpart in the class template, as shown by the matched colors and labels (# 1–5).</p>
</caption>
<graphic xlink:href="pone.0100334.g001"></graphic>
</fig>
<p>Two markers in the staining panel, CD4 and CD45RA, were used for characterizing the different populations, while two other markers, SLP76 (p-Y128) and ZAP70 (p-Y292), were used to measure the intensity of phosphorylation in these subsets. As described in Maier et al.
<xref rid="pone.0100334-Maier1" ref-type="bibr">[16]</xref>
, we used the signatures
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e001.jpg"></inline-graphic>
</inline-formula>
with
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e002.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e003.jpg"></inline-graphic>
</inline-formula>
with
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e004.jpg"></inline-graphic>
</inline-formula>
to represent the primarily naïve and memory T cell subsets, respectively. Upon fitting mixtures of
<italic>t</italic>
-distributions to each of the 6 classes, an overall pattern for five matched populations emerged (indexed
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e005.jpg"></inline-graphic>
</inline-formula>
through
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e006.jpg"></inline-graphic>
</inline-formula>
in
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2A–E</xref>
). As expected, a rapid rise in the intensities of phosphorylation markers SLP76 and ZAP70, especially the latter, was observed soon after stimulation for all populations with the possible exception of
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e007.jpg"></inline-graphic>
</inline-formula>
. While both naïve (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e008.jpg"></inline-graphic>
</inline-formula>
) and memory T cell subsets (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e009.jpg"></inline-graphic>
</inline-formula>
) showed similar peak levels of phosphorylation initially (
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2C–D</xref>
), the former exhibited a faster decline with time (
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2D–E</xref>
), consistent with prior results
<xref rid="pone.0100334-Irish1" ref-type="bibr">[1]</xref>
. In fact, both
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e010.jpg"></inline-graphic>
</inline-formula>
populations (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e011.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e012.jpg"></inline-graphic>
</inline-formula>
) exhibited similar expression throughout. Upon p-CD3 (p-Y142) normalization, higher phosphorylation in memory T cells compared to naïve T cells between 5 and 15 min – as observed manually
<xref rid="pone.0100334-Maier1" ref-type="bibr">[16]</xref>
– was recapitulated with the help of JCM.</p>
</sec>
<sec id="s2b">
<title>BCR signaling feature-sets distinguish FL subclasses</title>
<p>In a recent study based on human expert analysis, Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
stratified follicular lymphoma (FL) patients into two classes with markedly different overall survival depending on the presence or absence of a Lymphoma Negative Prognostic (LNP) subset of B cells in the tumor. The LNP cells showed altered BCR signaling, and were identified by the expression of a multiplexed panel of selected phospho-markers. The multiplexing of markers, used for assaying each sample with a large set of markers (too large to be contained in a single panel) that is distributed across multiple panels, is described in detail in Irish et al. (
<xref ref-type="fig" rid="pone-0100334-g001">Fig. 1A</xref>
and Supplementary Information in
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
). The signaling based stratification of patients into
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e013.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e014.jpg"></inline-graphic>
</inline-formula>
classes is therefore of clinical significance. We used JCM for (a) automation, to systematically combine features from multi-panel data from FL patients, and (b) discrimination, to identify features that could separate the pre-defined FL patient classes as well as possible.</p>
<p>In the BCR signaling dataset, through automated analysis of multiplexed data, JCM identified a nuanced signature for signaling alterations in high-dimensional marker-space that further improved the stratification between the two FL patient classes, as described in Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
. The difference between the two classes was determined by comparing the class means using the
<italic>t</italic>
test. We analyzed 28 pre-processed patient samples for two time points, 0 min and 4 min (i.e. pre- and post-BCR stimulation, respectively). Further details of the samples and preprocessing are provided in
<xref ref-type="supplementary-material" rid="pone.0100334.s011">Text S1</xref>
.2 and
<xref ref-type="supplementary-material" rid="pone.0100334.s012">S2</xref>
. At every time-point, and for all patients, the data for each sample were available for eight multiplexed panels, each with results for four markers, including two B cell markers CD20 and BCL2 that were common to every panel. Signaling responses were measured in terms of phosphorylation of 16 phospho-proteins from the BCR signaling network. By multiplexing panels, the signaling for all these network components could be measured in every sample. Each sample's phenotype (or class label),
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e015.jpg"></inline-graphic>
</inline-formula>
(18 samples) or
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e016.jpg"></inline-graphic>
</inline-formula>
(10 samples), was assigned by human expert analysis (Supplemental Methods of Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
).</p>
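<p>The comparison of class means mentioned above can be sketched with a two-sample t statistic. The text does not state which variant of the t test was used, so Welch's unequal-variance form is assumed here; the function name and the feature values are hypothetical and serve only to show the calculation.</p>
<preformat>
```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard errors of the two group means.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Made-up values of one JCM feature in two patient classes (not from the study).
class_a = [0.82, 1.10, 0.95, 1.31, 1.04]
class_b = [0.40, 0.62, 0.55, 0.71]
t_stat, dof = welch_t(class_a, class_b)
```
</preformat>
<p>A P-value would then be obtained by referring the statistic to a t distribution with the returned degrees of freedom, a step left out here to keep the sketch free of external libraries.</p>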
<p>For both unstimulated (0 min) and stimulated (4 min) conditions, each class of patient samples was modeled with an overall template produced by the JCM procedure using two-component multivariate skew
<italic>t</italic>
-mixture models. The templates revealed the class-specific features of two lymphoma B cell populations. For convenience, let us call these two populations “mound” and “base” corresponding to higher and lower levels of stimulation respectively. These are components of the JCM mixture model that primarily represent populations in which BCR signaling is intact (i.e. non-LNP cells) as opposed to altered (LNP cells). The change between the corresponding features pre- and post-stimulation provided a kind of baseline correction to the resting level of signaling for each sample. This approach corresponds to asking whether the response of lymphoma B cells to BCR engagement was heterogeneous, but using the entire set of continuous features for exploring tumor heterogeneity rather than only median phosphorylation, the primary discretized feature in the Irish et al. study
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
.</p>
<p>We introduced a new strategy for a combined analysis of multiplexed markers probing different parts of the BCR signaling network. The JCM features of 16 phospho-markers distributed across all 8 panels were pooled to form an enhanced meta-feature, or a feature-set, that is analogous to the concept of a gene-set (GSEA
<xref rid="pone.0100334-Subramanian1" ref-type="bibr">[17]</xref>
). Thus we applied Gene Set Enrichment Analysis (GSEA
<xref rid="pone.0100334-Subramanian1" ref-type="bibr">[17]</xref>
) to every feature-set to test their abilities to distinguish between
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e017.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e018.jpg"></inline-graphic>
</inline-formula>
samples. Notably, Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
had previously discovered that the size of the LNP population could be used to distinguish FL patients into two classes with different outcomes. However, these results were based on manual demarcation of the LNP subset, and therefore based on low-dimensional gating of data. Interestingly, in our feature-set enrichment analysis, the single most significantly enriched feature-set (at
<italic>P</italic>
-value level 0.05 by Kolmogorov-Smirnov test of GSEA
<xref rid="pone.0100334-Subramanian1" ref-type="bibr">[17]</xref>
), i.e. the most distinctive meta-feature across these two patient classes, was skewness (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e019.jpg"></inline-graphic>
</inline-formula>
) of the mound at 5 min. (
<italic>P</italic>
-value 0.0144,
<italic>q</italic>
-value 0.058;
<xref ref-type="supplementary-material" rid="pone.0100334.s005">Figure S5</xref>
). Across
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e020.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e021.jpg"></inline-graphic>
</inline-formula>
classes, this spatial signature (i.e. stimulated mound skew) is distinctive both visually (
<xref ref-type="fig" rid="pone-0100334-g002">Figure 2A and 2B</xref>
) and statistically (the average of posterior log-odds ratios in
<xref ref-type="fig" rid="pone-0100334-g002">Figure 2C</xref>
, computed using Bayesian methods described in
<xref rid="pone.0100334-Tamayo1" ref-type="bibr">[18]</xref>
), particularly for markers such as p-PLCg2, p-BLNK, and p-SFK (
<xref ref-type="supplementary-material" rid="pone.0100334.s006">Figure S6</xref>
). In particular, we draw attention to
<xref ref-type="fig" rid="pone-0100334-g002">Figure 2A</xref>
, outlining the asymmetric expression of the mound in LNP
<sup>lo</sup>
samples, which contrasts with their more spherical counterparts (i.e. lower skew) in the
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e022.jpg"></inline-graphic>
</inline-formula>
samples. The distinction is in fact statistically significant even after controlling for the corresponding base (LNP)
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e023.jpg"></inline-graphic>
</inline-formula>
population sizes (e.g. for p-SFK the GLM based p-value after controlling is 0.0079).</p>
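<p>The Kolmogorov-Smirnov-type statistic behind this kind of feature-set enrichment can be sketched as a running sum over a ranked feature list. The sketch below is an unweighted version only: it is not the GSEA software, it omits the permutation step that yields P-values, and the feature names in the example are hypothetical.</p>
<preformat>
```python
def enrichment_score(ranked_features, feature_set):
    """Unweighted KS-like running-sum score of feature_set within a ranked list.

    Walking down the ranking, the sum steps up at each member of the set
    and down at each non-member; the score is the largest deviation from zero.
    """
    members = set(feature_set)
    n_hit = sum(1 for f in ranked_features if f in members)
    n_miss = len(ranked_features) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("feature set must cover some, but not all, ranked features")
    running, best = 0.0, 0.0
    for f in ranked_features:
        running += 1.0 / n_hit if f in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of features by how well they separate two classes.
ranking = ["skew_m1", "skew_m2", "mean_m1", "skew_m3", "mean_m2", "mean_m3"]
skew_set = ["skew_m1", "skew_m2", "skew_m3"]
score = enrichment_score(ranking, skew_set)
```
</preformat>
<p>A score close to 1 means the members of the set are concentrated at the top of the ranking, and a score close to -1 means they are concentrated at the bottom.</p>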
<fig id="pone-0100334-g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0100334.g002</object-id>
<label>Figure 2</label>
<caption>
<title>Distinct spatial characteristics of phospho-marker expression in samples from two classes of patients with different outcomes.</title>
<p>(A) Heatplots provide insight into the distribution of phospho-proteomic expression of p-PLCg2 and p-STAT5 (panel 4) for
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e024.jpg"></inline-graphic>
</inline-formula>
(top 2 rows) and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e025.jpg"></inline-graphic>
</inline-formula>
(bottom row) samples. The mound (high CD20 and BCL-2) populations are shown here. In contrast to the more symmetrically distributed, well-rounded
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e026.jpg"></inline-graphic>
</inline-formula>
mounds, the skewness in the
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e027.jpg"></inline-graphic>
</inline-formula>
mounds is clearly visible. (B) The stimulated mound (light brown histogram) of a
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e028.jpg"></inline-graphic>
</inline-formula>
sample is shown in contrast with the corresponding population prior to stimulation (greyish blue histogram). (C) The ability of the mound skew parameters (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e029.jpg"></inline-graphic>
</inline-formula>
) for 16 phospho-markers to distinguish samples across the
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e030.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e031.jpg"></inline-graphic>
</inline-formula>
classes (green and pink labels respectively) is shown with a heatmap based on the corresponding posterior log-odds scores. The higher the score, the darker the corresponding entry in red/blue. Each marker name and its average posterior log-odds score over all samples are marked on the sides of the heatmap.</p>
</caption>
<graphic xlink:href="pone.0100334.g002"></graphic>
</fig>
<p>The skewness, given by the parameter vector
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e032.jpg"></inline-graphic>
</inline-formula>
, of the stimulated mound in
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e033.jpg"></inline-graphic>
</inline-formula>
samples is expressed in the form of a heavy left tail (
<xref ref-type="fig" rid="pone-0100334-g002">Figure 2B</xref>
). This suggests the likely presence of a subpopulation of primarily non-LNP cells with partially altered signaling at a given time-point. Whether it is of real prognostic value needs to be tested in future studies. Our main point is that JCM's automatic feature detection can reveal new spatio-temporal states and their characteristics. State transitions can be numerically measured and monitored even if they are subtle across classes. For instance, if the alteration in BCR signaling is gradual rather than sharp, then it can be difficult to demarcate the LNP component or determine its size accurately; yet the skew feature can still be used for a nuanced understanding of the change in the same population, thus providing mechanistic insights into the biology of the system in action.</p>
</sec>
<sec id="s2c">
<title>Cell population identification and alignment across DLBCL batch samples</title>
<p>We compared JCM with two other flow analysis methods that compute cluster correspondence, namely, FLAME and the HDPGMM procedure. As with JCM, FLAME is based on mixtures of skew
<italic>t</italic>
-distributions, while HDPGMM uses mixtures of normal distributions. Note that although the HDPGMM model adopts the multivariate normal distribution for its components, it has some flexibility in handling clusters that are not normally distributed, in that it can use more than one normal distribution to model the observations in a cluster. Based on a real-world benchmark dataset from the FlowCAP1 contest
<xref rid="pone.0100334-Aghaeepour1" ref-type="bibr">[14]</xref>
, we compare the performance of JCM with several other competing procedures in cell population identification and alignment across a batch of samples. In the original dataset, 30 samples were collected from patients diagnosed with diffuse large B-cell lymphoma (DLBCL). For this illustration, we use the subset of 16 samples which were manually analyzed and were determined to have the same number of clusters. With JCM, we first created a template across the batch of 16 samples. Then the cluster membership labels given by JCM for each sample are compared with the results given by manual gating. The results are given in
<xref ref-type="table" rid="pone-0100334-t001">Table 1</xref>
, along with the corresponding results for FLAME and HDPGMM procedures.</p>
<table-wrap id="pone-0100334-t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0100334.t001</object-id>
<label>Table 1</label>
<caption>
<title>Classification error rates of three methods on DLBCL data.</title>
</caption>
<alternatives>
<graphic id="pone-0100334-t001-1" xlink:href="pone.0100334.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Sample</td>
<td align="left" rowspan="1" colspan="1">JCM</td>
<td align="left" rowspan="1" colspan="1">HDPGMM</td>
<td align="left" rowspan="1" colspan="1">FLAME</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Sa001</td>
<td align="left" rowspan="1" colspan="1">0.3045</td>
<td align="left" rowspan="1" colspan="1">0.2046</td>
<td align="left" rowspan="1" colspan="1">0.5143</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa002</td>
<td align="left" rowspan="1" colspan="1">0.0339</td>
<td align="left" rowspan="1" colspan="1">0.1044</td>
<td align="left" rowspan="1" colspan="1">0.4300</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa003</td>
<td align="left" rowspan="1" colspan="1">0.0694</td>
<td align="left" rowspan="1" colspan="1">0.0946</td>
<td align="left" rowspan="1" colspan="1">0.5931</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa004</td>
<td align="left" rowspan="1" colspan="1">0.0659</td>
<td align="left" rowspan="1" colspan="1">0.0946</td>
<td align="left" rowspan="1" colspan="1">0.5459</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa005</td>
<td align="left" rowspan="1" colspan="1">0.0089</td>
<td align="left" rowspan="1" colspan="1">0.1230</td>
<td align="left" rowspan="1" colspan="1">0.4440</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa006</td>
<td align="left" rowspan="1" colspan="1">0.2947</td>
<td align="left" rowspan="1" colspan="1">0.0611</td>
<td align="left" rowspan="1" colspan="1">0.5987</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa007</td>
<td align="left" rowspan="1" colspan="1">0.0208</td>
<td align="left" rowspan="1" colspan="1">0.0510</td>
<td align="left" rowspan="1" colspan="1">0.2584</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa008</td>
<td align="left" rowspan="1" colspan="1">0.0683</td>
<td align="left" rowspan="1" colspan="1">0.0719</td>
<td align="left" rowspan="1" colspan="1">0.3719</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa009</td>
<td align="left" rowspan="1" colspan="1">0.0249</td>
<td align="left" rowspan="1" colspan="1">0.1343</td>
<td align="left" rowspan="1" colspan="1">0.2417</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa010</td>
<td align="left" rowspan="1" colspan="1">0.0121</td>
<td align="left" rowspan="1" colspan="1">0.3828</td>
<td align="left" rowspan="1" colspan="1">0.5413</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa011</td>
<td align="left" rowspan="1" colspan="1">0.0236</td>
<td align="left" rowspan="1" colspan="1">0.4082</td>
<td align="left" rowspan="1" colspan="1">0.4792</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa012</td>
<td align="left" rowspan="1" colspan="1">0.0096</td>
<td align="left" rowspan="1" colspan="1">0.1148</td>
<td align="left" rowspan="1" colspan="1">0.2456</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa013</td>
<td align="left" rowspan="1" colspan="1">0.0326</td>
<td align="left" rowspan="1" colspan="1">0.3247</td>
<td align="left" rowspan="1" colspan="1">0.5947</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa014</td>
<td align="left" rowspan="1" colspan="1">0.0062</td>
<td align="left" rowspan="1" colspan="1">0.2959</td>
<td align="left" rowspan="1" colspan="1">0.6000</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa015</td>
<td align="left" rowspan="1" colspan="1">0.1283</td>
<td align="left" rowspan="1" colspan="1">0.4110</td>
<td align="left" rowspan="1" colspan="1">0.3927</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa016</td>
<td align="left" rowspan="1" colspan="1">0.0361</td>
<td align="left" rowspan="1" colspan="1">0.4437</td>
<td align="left" rowspan="1" colspan="1">0.5372</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AMCR</td>
<td align="left" rowspan="1" colspan="1">0.0711</td>
<td align="left" rowspan="1" colspan="1">0.2038</td>
<td align="left" rowspan="1" colspan="1">0.4618</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="nt101">
<label></label>
<p>Samples from 16 patients diagnosed with Diffuse Large B-cell Lymphoma (DLBCL) were clustered using JCM, HDPGMM, and FLAME. For both JCM and HDPGMM, a class template is computed for the entire batch of samples, while FLAME performs post hoc alignment of the results given by FLAME-I, where FLAME-I denotes the procedure with FLAME applied to each individual sample considered separately. The final row shows the average misclassification rate (AMCR) for each method. Clearly, JCM shows overall superior performance.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In 14 of the 16 samples, JCM achieved the lowest misclassification rate (MCR) among the methods. This MCR is calculated for each permutation of the cluster labels of the clustering result under consideration against the class labels given by manual expert gating and the rate reported is the minimum value over all such permutations. For reference, we have included in
<xref ref-type="supplementary-material" rid="pone.0100334.s008">Table S1</xref>
the corresponding results using the
<italic>F</italic>
-measure as reported in
<xref rid="pone.0100334-Aghaeepour1" ref-type="bibr">[14]</xref>
, which is given by the harmonic mean of precision and recall. Our discussions here will focus on the MCR, which is the standard rate used in statistics to assess the performance of classifiers and also clustering procedures in studies where the true labels are known. However, we note that the relative ranking of the methods remains similar using
<xref ref-type="supplementary-material" rid="pone.0100334.s008">Table S1</xref>
.</p>
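<p>The permutation-minimized MCR described above can be illustrated with a minimal Python sketch. It is not the EMMIX-JCM implementation: the function name, the toy labels, and the assumption that the clustering has exactly as many clusters as there are manually gated classes are ours, chosen only for illustration.</p>

```python
from itertools import permutations


def misclassification_rate(true_labels, cluster_labels):
    """Minimum fraction of mismatched cells over all relabelings of the clusters."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) != len(classes):
        raise ValueError("sketch assumes equal numbers of clusters and classes")
    n = len(true_labels)
    best = 1.0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))  # one candidate relabeling
        errors = sum(mapping[c] != t for t, c in zip(true_labels, cluster_labels))
        best = min(best, errors / n)
    return best


# Toy example (hypothetical labels): six cells, three gated classes.
gates = ["B", "B", "T", "T", "T", "other"]
found = [2, 2, 0, 0, 1, 1]
print(misclassification_rate(gates, found))  # one of six cells is mismatched
```

<p>Enumerating all permutations is practical only for a small number of clusters. For larger numbers, the same minimum could be obtained by solving a linear assignment problem on the confusion matrix, for example with scipy.optimize.linear_sum_assignment.</p>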
<p>JCM's average MCR of 0.0711 is well below the average rates of 0.2038 and 0.4618 for HDPGMM and FLAME, respectively. It can be observed from
<xref ref-type="table" rid="pone-0100334-t001">Table 1</xref>
that JCM had a lower MCR than FLAME for all 16 samples, and also in 14 of the 16 samples when compared to HDPGMM. For the two samples, Sa001 and Sa006, on which it does not have the lowest MCR, its performance is well below what it is for the other 14 samples. Given the presence of these two samples with atypically high MCRs, we computed the median MCR of JCM for these 16 samples. It was only 0.0333, being just under half the average MCR. As mentioned in the introduction, FLAME adopts a single-sample based approach to the analysis of multiple samples, and so it does have its limitations in registering the individual results across the samples. This is clearly evident in
<xref ref-type="table" rid="pone-0100334-t001">Table 1</xref>
, where the MCR for FLAME is quite high relative to JCM and HDPGMM, which analyse the samples simultaneously.</p>
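<p>The per-sample comparisons and the median quoted above can be recomputed directly from the entries of Table 1. The following Python sketch does so; the helper name is ours, and because the table entries are rounded to four decimals, a mean recomputed this way may differ from the reported AMCR in the last decimal place.</p>

```python
from statistics import mean, median

# Per-sample MCRs copied from Table 1 (Sa001 to Sa016).
jcm = [0.3045, 0.0339, 0.0694, 0.0659, 0.0089, 0.2947, 0.0208, 0.0683,
       0.0249, 0.0121, 0.0236, 0.0096, 0.0326, 0.0062, 0.1283, 0.0361]
hdpgmm = [0.2046, 0.1044, 0.0946, 0.0946, 0.1230, 0.0611, 0.0510, 0.0719,
          0.1343, 0.3828, 0.4082, 0.1148, 0.3247, 0.2959, 0.4110, 0.4437]
flame = [0.5143, 0.4300, 0.5931, 0.5459, 0.4440, 0.5987, 0.2584, 0.3719,
         0.2417, 0.5413, 0.4792, 0.2456, 0.5947, 0.6000, 0.3927, 0.5372]


def wins(ours, theirs):
    """Number of samples on which `ours` has the strictly lower MCR."""
    return sum(t > o for o, t in zip(ours, theirs))


jcm_median = median(jcm)  # midpoint of the 8th and 9th smallest values
jcm_mean = mean(jcm)
print("JCM lower than HDPGMM on", wins(jcm, hdpgmm), "of", len(jcm), "samples")
print("JCM lower than FLAME on", wins(jcm, flame), "of", len(jcm), "samples")
print("JCM median MCR:", jcm_median, "mean MCR:", jcm_mean)
```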
<p>We have also listed in
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
the MCR for each of the 16 samples clustered according to FLAME-I and FLAME-P, where FLAME-I denotes the procedure with FLAME applied to each individual sample considered separately and FLAME-P denotes FLAME based on the single sample formed by pooling the 16 samples together. If there were little inter-sample variation, then one would expect FLAME-P to be similar or even superior in performance to JCM. But it can be seen from
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
that JCM has a lower MCR than FLAME-P for all but three samples, which include the aforementioned two samples (Sa001 and Sa006) on which JCM performs poorly. The MCR for JCM is also lower than that for FLAME-I for all but three samples (apart from Sa001 and Sa006). For these three samples, the difference between the MCR for JCM and FLAME-I is less than 0.001.</p>
<table-wrap id="pone-0100334-t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0100334.t002</object-id>
<label>Table 2</label>
<caption>
<title>Classification error rates of various methods on DLBCL data.</title>
</caption>
<alternatives>
<graphic id="pone-0100334-t002-2" xlink:href="pone.0100334.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
<col align="center" span="1"></col>
</colgroup>
<thead>
<tr>
<td align="left" rowspan="1" colspan="1">Sample</td>
<td align="left" rowspan="1" colspan="1">JCM</td>
<td align="left" rowspan="1" colspan="1">FLAME-I</td>
<td align="left" rowspan="1" colspan="1">flowClust-I</td>
<td align="left" rowspan="1" colspan="1">SWIFT-I</td>
<td align="left" rowspan="1" colspan="1">FLAME-P</td>
<td align="left" rowspan="1" colspan="1">flowClust-P</td>
<td align="left" rowspan="1" colspan="1">SWIFT-P</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Sa001</td>
<td align="left" rowspan="1" colspan="1">0.3045</td>
<td align="left" rowspan="1" colspan="1">0.3039</td>
<td align="left" rowspan="1" colspan="1">0.3070</td>
<td align="left" rowspan="1" colspan="1">0.5368</td>
<td align="left" rowspan="1" colspan="1">0.1666</td>
<td align="left" rowspan="1" colspan="1">0.2187</td>
<td align="left" rowspan="1" colspan="1">0.3039</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa002</td>
<td align="left" rowspan="1" colspan="1">0.0339</td>
<td align="left" rowspan="1" colspan="1">0.3394</td>
<td align="left" rowspan="1" colspan="1">0.0388</td>
<td align="left" rowspan="1" colspan="1">0.1526</td>
<td align="left" rowspan="1" colspan="1">0.2146</td>
<td align="left" rowspan="1" colspan="1">0.4096</td>
<td align="left" rowspan="1" colspan="1">0.3060</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa003</td>
<td align="left" rowspan="1" colspan="1">0.0694</td>
<td align="left" rowspan="1" colspan="1">0.0753</td>
<td align="left" rowspan="1" colspan="1">0.0588</td>
<td align="left" rowspan="1" colspan="1">0.4500</td>
<td align="left" rowspan="1" colspan="1">0.1790</td>
<td align="left" rowspan="1" colspan="1">0.3194</td>
<td align="left" rowspan="1" colspan="1">0.2204</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa004</td>
<td align="left" rowspan="1" colspan="1">0.0659</td>
<td align="left" rowspan="1" colspan="1">0.0687</td>
<td align="left" rowspan="1" colspan="1">0.0682</td>
<td align="left" rowspan="1" colspan="1">0.5506</td>
<td align="left" rowspan="1" colspan="1">0.1227</td>
<td align="left" rowspan="1" colspan="1">0.1661</td>
<td align="left" rowspan="1" colspan="1">0.3038</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa005</td>
<td align="left" rowspan="1" colspan="1">0.0089</td>
<td align="left" rowspan="1" colspan="1">0.1631</td>
<td align="left" rowspan="1" colspan="1">0.1868</td>
<td align="left" rowspan="1" colspan="1">0.4521</td>
<td align="left" rowspan="1" colspan="1">0.1415</td>
<td align="left" rowspan="1" colspan="1">0.0752</td>
<td align="left" rowspan="1" colspan="1">0.1220</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa006</td>
<td align="left" rowspan="1" colspan="1">0.2947</td>
<td align="left" rowspan="1" colspan="1">0.2670</td>
<td align="left" rowspan="1" colspan="1">0.1150</td>
<td align="left" rowspan="1" colspan="1">0.3612</td>
<td align="left" rowspan="1" colspan="1">0.0809</td>
<td align="left" rowspan="1" colspan="1">0.3773</td>
<td align="left" rowspan="1" colspan="1">0.1869</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa007</td>
<td align="left" rowspan="1" colspan="1">0.0208</td>
<td align="left" rowspan="1" colspan="1">0.0211</td>
<td align="left" rowspan="1" colspan="1">0.0217</td>
<td align="left" rowspan="1" colspan="1">0.2580</td>
<td align="left" rowspan="1" colspan="1">0.0943</td>
<td align="left" rowspan="1" colspan="1">0.0569</td>
<td align="left" rowspan="1" colspan="1">0.0438</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa008</td>
<td align="left" rowspan="1" colspan="1">0.0683</td>
<td align="left" rowspan="1" colspan="1">0.0678</td>
<td align="left" rowspan="1" colspan="1">0.0997</td>
<td align="left" rowspan="1" colspan="1">0.1911</td>
<td align="left" rowspan="1" colspan="1">0.0852</td>
<td align="left" rowspan="1" colspan="1">0.1045</td>
<td align="left" rowspan="1" colspan="1">0.3560</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa009</td>
<td align="left" rowspan="1" colspan="1">0.0249</td>
<td align="left" rowspan="1" colspan="1">0.3191</td>
<td align="left" rowspan="1" colspan="1">0.0891</td>
<td align="left" rowspan="1" colspan="1">0.2508</td>
<td align="left" rowspan="1" colspan="1">0.0487</td>
<td align="left" rowspan="1" colspan="1">0.0302</td>
<td align="left" rowspan="1" colspan="1">0.0186</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa010</td>
<td align="left" rowspan="1" colspan="1">0.0121</td>
<td align="left" rowspan="1" colspan="1">0.0575</td>
<td align="left" rowspan="1" colspan="1">0.0111</td>
<td align="left" rowspan="1" colspan="1">0.5353</td>
<td align="left" rowspan="1" colspan="1">0.0628</td>
<td align="left" rowspan="1" colspan="1">0.0471</td>
<td align="left" rowspan="1" colspan="1">0.0757</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa011</td>
<td align="left" rowspan="1" colspan="1">0.0236</td>
<td align="left" rowspan="1" colspan="1">0.0248</td>
<td align="left" rowspan="1" colspan="1">0.0248</td>
<td align="left" rowspan="1" colspan="1">0.1627</td>
<td align="left" rowspan="1" colspan="1">0.0240</td>
<td align="left" rowspan="1" colspan="1">0.1660</td>
<td align="left" rowspan="1" colspan="1">0.1004</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa012</td>
<td align="left" rowspan="1" colspan="1">0.0096</td>
<td align="left" rowspan="1" colspan="1">0.3919</td>
<td align="left" rowspan="1" colspan="1">0.4613</td>
<td align="left" rowspan="1" colspan="1">0.2170</td>
<td align="left" rowspan="1" colspan="1">0.0421</td>
<td align="left" rowspan="1" colspan="1">0.0299</td>
<td align="left" rowspan="1" colspan="1">0.0188</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa013</td>
<td align="left" rowspan="1" colspan="1">0.0326</td>
<td align="left" rowspan="1" colspan="1">0.0324</td>
<td align="left" rowspan="1" colspan="1">0.0355</td>
<td align="left" rowspan="1" colspan="1">0.5936</td>
<td align="left" rowspan="1" colspan="1">0.0796</td>
<td align="left" rowspan="1" colspan="1">0.0500</td>
<td align="left" rowspan="1" colspan="1">0.0581</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa014</td>
<td align="left" rowspan="1" colspan="1">0.0062</td>
<td align="left" rowspan="1" colspan="1">0.0065</td>
<td align="left" rowspan="1" colspan="1">0.0083</td>
<td align="left" rowspan="1" colspan="1">0.5612</td>
<td align="left" rowspan="1" colspan="1">0.0857</td>
<td align="left" rowspan="1" colspan="1">0.0159</td>
<td align="left" rowspan="1" colspan="1">0.0373</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa015</td>
<td align="left" rowspan="1" colspan="1">0.1283</td>
<td align="left" rowspan="1" colspan="1">0.1274</td>
<td align="left" rowspan="1" colspan="1">0.1317</td>
<td align="left" rowspan="1" colspan="1">0.5896</td>
<td align="left" rowspan="1" colspan="1">0.1093</td>
<td align="left" rowspan="1" colspan="1">0.1077</td>
<td align="left" rowspan="1" colspan="1">0.0947</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Sa016</td>
<td align="left" rowspan="1" colspan="1">0.0361</td>
<td align="left" rowspan="1" colspan="1">0.0554</td>
<td align="left" rowspan="1" colspan="1">0.1832</td>
<td align="left" rowspan="1" colspan="1">0.4502</td>
<td align="left" rowspan="1" colspan="1">0.0524</td>
<td align="left" rowspan="1" colspan="1">0.0535</td>
<td align="left" rowspan="1" colspan="1">0.0803</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">AMCR</td>
<td align="left" rowspan="1" colspan="1">0.0711</td>
<td align="left" rowspan="1" colspan="1">0.1451</td>
<td align="left" rowspan="1" colspan="1">0.1151</td>
<td align="left" rowspan="1" colspan="1">0.3946</td>
<td align="left" rowspan="1" colspan="1">0.1128</td>
<td align="left" rowspan="1" colspan="1">0.1393</td>
<td align="left" rowspan="1" colspan="1">0.1454</td>
</tr>
</tbody>
</table>
</alternatives>
<table-wrap-foot>
<fn id="nt102">
<label></label>
<p>Misclassification rate (MCR) for JCM, FLAME, flowClust and SWIFT on the 16 samples from the DLBCL dataset (see also
<xref ref-type="table" rid="pone-0100334-t001">Table 1</xref>
). The latter three methods were applied to each individual sample separately (denoted with suffix -I), and also based on a pooling approach (denoted with suffix -P). The final row shows the average misclassification rate (AMCR) for each method.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>For comparative purposes, we have also included in
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
the corresponding MCR for these 16 samples clustered according to two other methods in flow cytometry, SWIFT and flowClust. As these two methods do not have any explicit facility for matching the output from a series of samples, we reported the MCR for SWIFT-I and flowClust-I corresponding to SWIFT and flowClust applied individually to each sample and for SWIFT-P and flowClust-P corresponding to SWIFT and flowClust based on the pooled sample. It can be seen from
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
that FLAME-I and flowClust-I perform similarly on most of the 16 samples, as do FLAME-P and flowClust-P. For example, FLAME-I has a lower MCR than flowClust-I in 9 of the 16 samples, with one tie between the two. The flowClust method fits mixtures of
<italic>t</italic>
-distributions after first applying a Box-Cox transformation. We note that if the transformation is sample-specific, then this approach of first transforming each sample considered separately makes it difficult to compare the differences between the fitted distributions for a series of samples corresponding, for example, to different patients or to the one patient monitored over a series of time points. Concerning the SWIFT procedure, it can be seen from
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
that SWIFT-I has a higher MCR than FLAME-I and flowClust-I for most of the samples. However, the average MCR (AMCR) for SWIFT-P is much closer to that for FLAME-P and flowClust-P. Indeed, SWIFT-P has a lower MCR than JCM for three of the samples, including the two samples for which FLAME-I and FLAME-P were performing better than JCM. On comparing flowClust and SWIFT with JCM, it can be observed from
<xref ref-type="table" rid="pone-0100334-t002">Table 2</xref>
that JCM had a lower MCR than SWIFT-I for all samples, and in 13 and 14 of the 16 samples compared to flowClust-I and flowClust-P, respectively. Overall, JCM is clearly favoured by both MCR and the
<italic>F</italic>
-measure in this dataset, as evidenced by it being ranked first or second in 13 of the 16 samples among the five methods based on both MCR and the
<italic>F</italic>
-measure.</p>
</sec>
</sec>
<sec id="s3">
<title>Discussion</title>
<p>High-dimensional computational analysis of flow data is receiving increasing attention with the rapid rise in the number of markers that can be used to probe each cell in parallel
<xref rid="pone.0100334-Lugli1" ref-type="bibr">[3]</xref>
,
<xref rid="pone.0100334-Bendall1" ref-type="bibr">[6]</xref>
. By mirroring the perception of a flow sample as a mixture of cell populations, finite mixtures of Gaussians have long been an attractive modeling mechanism
<xref rid="pone.0100334-Demers1" ref-type="bibr">[19]</xref>
. Recently, robust mixture models with multivariate
<italic>t</italic>
and skew
<italic>t</italic>
distributions were introduced for analyzing flow data with non-Gaussian features such as outliers, heavy-tailed densities, and asymmetric shapes
<xref rid="pone.0100334-Pyne1" ref-type="bibr">[7]</xref>
,
<xref rid="pone.0100334-Lo2" ref-type="bibr">[20]</xref>
–
<xref rid="pone.0100334-Baudry1" ref-type="bibr">[22]</xref>
. In addition to modeling the cell populations, Pyne et al.
<xref rid="pone.0100334-Pyne1" ref-type="bibr">[7]</xref>
also highlighted the importance of registering them across samples. Recent studies have noted that for re-structuring of cell populations, the optimal algorithmic strategy is to do so in conjunction with population modeling
<xref rid="pone.0100334-Lo2" ref-type="bibr">[20]</xref>
,
<xref rid="pone.0100334-Baudry1" ref-type="bibr">[22]</xref>
.</p>
<p>The key contribution of JCM is its joint approach, which addresses two challenges with a single composite model. It is a two-level framework for simultaneous mixture modeling and registration of populations in an entire batch of flow samples. This allows JCM to meet a key need of cytomics: reproducible analysis of data from many samples and conditions simultaneously. Notably, in the field of pattern recognition, the alignment of images and curves in lower-dimensional spaces has emerged as an active area of research in recent years
<xref rid="pone.0100334-Liu1" ref-type="bibr">[23]</xref>
–
<xref rid="pone.0100334-McLachlan1" ref-type="bibr">[25]</xref>
. Thus, JCM provides an important extension from Gaussian mixture regression models
<xref rid="pone.0100334-Gaffney1" ref-type="bibr">[24]</xref>
to multivariate
<italic>t</italic>
- and skew
<italic>t</italic>
-models, which can be fitted via the EM algorithm. This algorithm is an effective generic technique for parameter estimation
<xref rid="pone.0100334-McLachlan1" ref-type="bibr">[25]</xref>
, and we have extended it to the JCM-specific setting (
<xref ref-type="supplementary-material" rid="pone.0100334.s009">Appendices S1</xref>
and
<xref ref-type="supplementary-material" rid="pone.0100334.s010">S2</xref>
). Thus the JCM framework is objective, fast, quantitative and reproducible.</p>
<p>As demonstrated in the previous section, automated population registration of JCM marks a significant technical improvement over FLAME. Unlike the post-hoc meta-clustering approach of FLAME, matching of populations by JCM is intrinsic to its modeling strategy. It is achieved by fitting a random-effects model (REM), a meta-analytic approach for estimating the mean of a distribution of effects
<xref rid="pone.0100334-Ng1" ref-type="bibr">[26]</xref>
. Rare past usage of REM in cytomics was limited to measuring variability of very specific features, e.g., CD4 expression
<xref rid="pone.0100334-Aghaeepour1" ref-type="bibr">[14]</xref>
. JCM is perhaps the first framework that incorporates REM for comprehensive batch characterization in flow data analysis (
<xref ref-type="fig" rid="pone-0100334-g001">Figure 1</xref>
). In particular, our REM uses affine transformation parameters to explicitly learn relationships among all populations in a batch, even in the presence of varying amounts of cross-sample variation. In theory, were JCM reduced to its lower level, i.e., restricted to clustering a single input sample, it would be equivalent to FLAME clustering. FLAME was ranked by rigorous benchmarking and expert analysis among the top-performing unsupervised algorithms at FlowCAP1, a recent international contest on flow analysis organized at the NIH
<xref rid="pone.0100334-Aghaeepour1" ref-type="bibr">[14]</xref>
. This signifies that JCM has much greater potential with its more flexible approach compared to FLAME.</p>
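<p>To make the role of affine transformation parameters concrete, the following Python sketch maps a template component to a sample-specific component through a per-marker scaling and shift. This is a simplified illustration under our own assumptions (a location and covariance only, diagonal scaling, hypothetical numbers); the exact JCM parameterization is the one given in the appendices, not this one.</p>

```python
def affine_component(mean, cov, scale, shift):
    """Map a template component through x -> scale * x + shift, marker by marker."""
    d = len(mean)
    # The location is scaled and shifted.
    new_mean = [scale[i] * mean[i] + shift[i] for i in range(d)]
    # The covariance is scaled on both sides; the shift does not affect it.
    new_cov = [[scale[i] * cov[i][j] * scale[j] for j in range(d)] for i in range(d)]
    return new_mean, new_cov


# Hypothetical two-marker template component and sample-specific parameters.
template_mean = [2.0, 5.0]
template_cov = [[1.0, 0.3], [0.3, 0.5]]
sample_mean, sample_cov = affine_component(
    template_mean, template_cov, scale=[1.1, 0.9], shift=[0.4, -0.2])
print(sample_mean)
print(sample_cov)
```

<p>In this simplified picture, the template holds the shared structure while the scale and shift values record how an individual sample departs from it.</p>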
<p>A technical advantage of JCM's REM-based registration is that it accounts for the populations' scaling and shifting transformations without explicitly “correcting” them. Some programs may shift populations in order to apply a common gate or filter on an entire cohort, without considering inter-sample variation. However, for precise modeling of the populations, we want to identify those spatio-temporally distinctive high-dimensional features that may actually be characteristic of each individual sample's phenotype. While we do not want to homogenize population features by aligning them, we do want to register the populations, as they appear in high-dimensional space, with precision and rigor. This makes registration more challenging than just matching (as in FLAME meta-clustering
<xref rid="pone.0100334-Pyne1" ref-type="bibr">[7]</xref>
) or alignment (as in channel normalization
<xref rid="pone.0100334-Hahne1" ref-type="bibr">[27]</xref>
). In fact, we compared the performances of JCM and FLAME meta-clustering on benchmark data and, as shown in
<xref ref-type="table" rid="pone-0100334-t001">Table 1</xref>
, JCM with its use of a template keeps classification error rates low in the face of increasing inter-sample variation in batches derived from real cytometric cohorts.</p>
<p>Perhaps the most attractive feature of REM is an overall consensus template that emerges from connecting both levels of the JCM model (
<xref ref-type="fig" rid="pone-0100334-g001">Figure 1</xref>
). Thereby JCM establishes a direct parametric correspondence between each population in the batch's template and its counterpart within every sample. Further, the template allows JCM to capture across-sample inter-relationships that may exist among populations and are useful for accurate registration. For instance, if a certain population A usually appears in between two populations B and C, then it is useful to learn about such relative positioning of A even if its actual location varies from sample to sample. This makes JCM more robust to common transformations (such as shifting or scaling of populations, to which these relationships are generally invariant) compared to FLAME meta-clustering, which can handle only limited variation in actual locations. Thus the JCM template provides a “ground truth”, while the REM transformation parameters quantify each individual instance's deviation from that reference structure. From a classification standpoint, given that the JCM templates are defined by parametric distributions, they allow direct statistical comparison of batches, which could represent, say, different subclasses of patients or successive longitudinal observations. We also present overlay plots for visual comparison of overall batch-structures along every dimension both within and across different classes in
<xref ref-type="supplementary-material" rid="pone.0100334.s003">Figures S3</xref>
and
<xref ref-type="supplementary-material" rid="pone.0100334.s004">S4</xref>
. Moreover, any new patient sample can be easily classified with the group that has the most similar template (as determined by, say, Kullback-Leibler distance). Finally, a JCM template provides the user with a visually convenient yet parametrically precise “snapshot” summarizing a cohort's overall population structure. Studies of large cohorts, such as for finding associations between genotypes and immuno-phenotypes in human populations, can be performed systematically with our two-level approach. Thus large population-wide immune cytome databases can be created.</p>
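<p>The idea of assigning a new sample to the group with the most similar template can be sketched as follows. As a deliberate simplification, each template is summarized here by a single Gaussian with diagonal covariance, for which the Kullback-Leibler divergence has a closed form; actual JCM templates are mixtures of skew
<italic>t</italic>
-distributions, for which the divergence would have to be approximated numerically. All names and numbers below are hypothetical.</p>

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariances, in nats."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def nearest_template(sample, templates):
    """Name of the template with the smallest KL divergence from the sample."""
    return min(templates, key=lambda name: kl_diag_gauss(*sample, *templates[name]))


# Hypothetical class templates: (per-marker means, per-marker variances).
templates = {
    "class_A": ([2.0, 5.0], [1.0, 0.5]),
    "class_B": ([4.0, 3.0], [0.8, 0.9]),
}
new_sample = ([2.3, 4.6], [1.1, 0.6])
print(nearest_template(new_sample, templates))
```

<p>Note that the Kullback-Leibler divergence is not symmetric, so the direction in which it is computed (sample relative to template, as here, or the reverse) is itself a modeling choice.</p>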
<p>Parametric characterization of cohorts in terms of their high-dimensional spatio-temporal features can reveal complex and dynamic biological contexts and present them for further investigation. Dissecting and monitoring the parameters of individual cellular species as they evolve over time — such as our time course profiling of TCR stimulation (
<xref ref-type="supplementary-material" rid="pone.0100334.s002">Figure S2</xref>
) — could be useful in many biomedical applications. The JCM models supporting asymmetric and heavy-tailed distributions of events are uniquely suited for detecting features that appear dynamically as hard-to-separate transitional features, such as asymmetric or tail subpopulations
<xref rid="pone.0100334-Kotecha1" ref-type="bibr">[8]</xref>
, that are otherwise difficult to distinguish via automation. Further, by pooling features across staining panels that are multiplexed, JCM can detect complex biological contexts involving multiple markers from a signaling pathway or network
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
, which is a new application in computational cytomics.</p>
<p>JCM can serve as a practical framework that is suitable for clinical applications. Here, its main objective is to learn the specific target populations' parameters for large numbers of samples precisely and quickly. Yet, in clinical applications, the modeling must also be robust enough to allow a reliable parameter-driven classification of patient samples. This is of particular concern for flow data, which may contain high inter-sample variation due to the presence of complex, biologically interesting subpopulations, along with noise, within the target pool of primary cells. In the BCR signaling dataset, through automated analysis of multiplexed data, JCM identified a nuanced signature for signaling alterations in high-dimensional marker-space that further improved the stratification between the two FL patient classes, as described in Irish et al.
<xref rid="pone.0100334-Irish2" ref-type="bibr">[9]</xref>
. Explicit detection of variation by REM is useful for batch characterization, QA/QC, as well as downstream analysis.</p>
<p>Moreover, JCM produces an array of insightful plots. For instance, the overlay plot can reveal within-class variation along any dimension (
<xref ref-type="supplementary-material" rid="pone.0100334.s003">Figure S3</xref>
), while the intensity heatplots take advantage of REM to allow monitoring of spatio-temporal changes in individual populations that are matched across the cohort (
<xref ref-type="fig" rid="pone-0100334-g002">Figure 2</xref>
). Another attractive practical feature of JCM is its representation of output in the form of a generic feature-by-sample matrix, which can be analyzed with common bioinformatic pipelines. Thus, here we used the well-known GSEA algorithm
<xref rid="pone.0100334-Subramanian1" ref-type="bibr">[17]</xref>
to create a new technique for combining JCM features into enriched meta-features across multiplexed staining panels. The simple new technique may become highly effective as more multiplexed staining data begin to appear
<xref rid="pone.0100334-Krutzik1" ref-type="bibr">[4]</xref>
.</p>
<p>By accounting for sample-specific variation, REM in essence also performs cohort-wide meta-analysis. Indeed, the JCM framework can be further generalized to include an even higher level of parameterization for representing class-specific information such as time points or patient subtype (including clinico-pathological variables, genotypes, etc.). This makes JCM well suited for integrative cytomics, such as large population immunome studies. In fact, our simulations show that besides being efficient in batch mode analysis, JCM is also robust to both class size and the amount of inter-sample variation (
<xref ref-type="supplementary-material" rid="pone.0100334.s007">Figure S7</xref>
). In particular, we conducted an extensive set of simulation studies to determine the performance of JCM under different settings, including Simulations A to D reported in
<xref ref-type="supplementary-material" rid="pone.0100334.s007">Figure S7</xref>
which focus on the performance of JCM with different sample sizes and different numbers of markers, populations, and samples (in a cohort), respectively. The simulations show that the running time scales linearly with the number of samples, the number of observations per sample, and the number of clusters. For instance, the running time for JCM modeling of our phosphorylation data averaged 33.7 sec per sample on a standard desktop PC (again using only a single-threaded implementation of JCM). This contrasts sharply with the hours of manual analysis performed over weeks by multiple researchers in the original study. With increasing multi-parameterization and multiplexing of cytometric data, JCM can facilitate automated, quantitative, scalable and objective investigation of complex hypotheses about different conditions and cohorts of biomedical interest.</p>
</sec>
<sec sec-type="methods" id="s4">
<title>Methods</title>
<p>The JCM workflow and the details of the models and methods are described below and continued in
<xref ref-type="supplementary-material" rid="pone.0100334.s012">Text S2</xref>
.</p>
<sec id="s4a">
<title>Overview of JCM</title>
<p>JCM is run in the following sequence of steps (flowchart in
<xref ref-type="supplementary-material" rid="pone.0100334.s001">Figure S1</xref>
) –</p>
<list list-type="order">
<list-item>
<p>Obtain the expression matrices from an input batch of preprocessed samples.</p>
</list-item>
<list-item>
<p>Fit a two-level model (as illustrated in
<xref ref-type="fig" rid="pone-0100334-g001">Figure 1</xref>
) to these data such that —</p>
<list list-type="order">
<list-item>
<p>
<bold>(2a)</bold>
an overall parametric template for the batch is constructed by modeling the affine transformations that may exist among the corresponding populations across samples, and simultaneously</p>
</list-item>
<list-item>
<p>
<bold>(2b)</bold>
every sample is modeled with its own mixture of skewed and heavy-tailed multivariate probability distributions, which characterizes the high-dimensional populations while registering them using the batch template.</p>
</list-item>
</list>
</list-item>
<list-item>
<p>Output files are produced containing the fitted models for the batch template and all samples – in formats suitable for visualization and downstream analysis programs. Overlay plots are produced for visual comparison of all class-templates.</p>
</list-item>
</list>
<p>There are two options for constructing the parametric models with JCM: the default using mixtures of multivariate skew
<italic>t</italic>
-distributions and its symmetric counterpart using a mixture of multivariate
<italic>t</italic>
-distributions.</p>
</sec>
<sec id="s4b">
<title>Mixtures of multivariate
<italic>t</italic>
- and skew
<italic>t</italic>
-distributions</title>
<p>A two-level model is fitted to an input batch or class
<italic>C</italic>
of
<italic>m</italic>
samples where each sample is represented by its own
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e034.jpg"></inline-graphic>
</inline-formula>
expression matrix, where
<italic>k</italic>
indexes the sample
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e035.jpg"></inline-graphic>
</inline-formula>
. The problem is to simultaneously (a) model all
<italic>m</italic>
samples in a batch while (b) creating a
<italic>p</italic>
-dimensional template of
<italic>g</italic>
components for matching the corresponding populations across all samples. Below we describe the JCM model, for both symmetric and asymmetric components, which are fitted with the JCM-specific EM algorithm for maximum likelihood (ML) estimation as described in detail in
<xref ref-type="supplementary-material" rid="pone.0100334.s009">Appendices S1</xref>
and
<xref ref-type="supplementary-material" rid="pone.0100334.s010">S2</xref>
.</p>
<p>Let
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e036.jpg"></inline-graphic>
</inline-formula>
denote a
<italic>p</italic>
-dimensional vector containing the values of the
<italic>p</italic>
markers in a sample. Then JCM provides a method for constructing a template density of
<italic>y</italic>
for a class of
<italic>m</italic>
samples, where we let
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e037.jpg"></inline-graphic>
</inline-formula>
denote the data observed in the
<italic>k</italic>
th sample (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e038.jpg"></inline-graphic>
</inline-formula>
). For the construction of the template density, we use a mixture of
<italic>g</italic>
component distributions, where the latter are members of the
<italic>t</italic>
-family of distributions
<xref rid="pone.0100334-McLachlan2" ref-type="bibr">[28]</xref>
or of a skew-extension of this family
<xref rid="pone.0100334-Pyne1" ref-type="bibr">[7]</xref>
. In order to define these component distributions, we consider first the
<italic>g</italic>
-component normal mixture density, which can be expressed as
<disp-formula id="pone.0100334.e039">
<graphic xlink:href="pone.0100334.e039.jpg" position="anchor" orientation="portrait"></graphic>
<label>(1)</label>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e040.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e041.jpg"></inline-graphic>
</inline-formula>
denotes the
<italic>p</italic>
-variate normal density with mean
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e042.jpg"></inline-graphic>
</inline-formula>
and covariance matrix
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e043.jpg"></inline-graphic>
</inline-formula>
;
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e044.jpg"></inline-graphic>
</inline-formula>
denote the mixing proportions which are non-negative and sum to one. The optimal value of
<italic>g</italic>
can be specified directly by the user. Alternatively, it can be determined in an unsupervised manner by the Bayesian Information Criterion (BIC); see
<xref ref-type="supplementary-material" rid="pone.0100334.s012">Text S2</xref>
. The vector
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e045.jpg"></inline-graphic>
</inline-formula>
denotes the elements of
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e046.jpg"></inline-graphic>
</inline-formula>
and the elements of
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e047.jpg"></inline-graphic>
</inline-formula>
known a priori to be distinct. The vector of unknown parameters is given by
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e048.jpg"></inline-graphic>
</inline-formula>
, where the superscript
<italic>T</italic>
denotes vector transpose. In (1),
<italic>f</italic>
is being used generically to denote a density function.</p>
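<p>As an illustration of the normal mixture density in (1) and of BIC-based selection of g, the following minimal sketch evaluates the mixture log-density and a BIC value. It is not taken from EMMIX-JCM. The function names are hypothetical, the parameter count assumes unrestricted component covariance matrices, and the sign convention (smaller BIC is better) is an assumption that may differ from the one used in Text S2.</p>
<preformat>
```python
import numpy as np


def normal_logpdf(y, mu, cov):
    """Log-density of a p-variate normal, evaluated row-wise on y (n x p)."""
    y = np.atleast_2d(np.asarray(y, float))
    p = y.shape[1]
    diff = y - np.asarray(mu, float)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)


def mixture_logpdf(y, pis, mus, covs):
    """Log of the g-component normal mixture density, via log-sum-exp."""
    comp = np.stack([
        np.log(pi) + normal_logpdf(y, mu, cov)
        for pi, mu, cov in zip(pis, mus, covs)
    ])
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))


def bic(loglik, g, p, n):
    """BIC with d free parameters: mixing proportions, means, covariances."""
    d = (g - 1) + g * p + g * p * (p + 1) // 2
    return -2.0 * loglik + d * np.log(n)
```
</preformat>
<p>Under these assumptions, one would fit the model for several candidate values of g and keep the value with the smallest BIC.</p>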
<p>In the present context, where the data have heavier tails than the normal distribution or the parameter estimates are affected by atypical observations (outliers), the fitting of mixtures of multivariate
<italic>t</italic>
-distributions provides a more robust approach to the fitting of normal mixture models
<xref rid="pone.0100334-McLachlan2" ref-type="bibr">[28]</xref>
. The
<italic>t</italic>
-component density with location parameter
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e049.jpg"></inline-graphic>
</inline-formula>
, positive-definite scale matrix
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e050.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e051.jpg"></inline-graphic>
</inline-formula>
degrees of freedom is given by
<disp-formula id="pone.0100334.e052">
<graphic xlink:href="pone.0100334.e052.jpg" position="anchor" orientation="portrait"></graphic>
<label>(2)</label>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e053.jpg"></inline-graphic>
</inline-formula>
denotes the Mahalanobis squared distance between
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e054.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e055.jpg"></inline-graphic>
</inline-formula>
(with
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e056.jpg"></inline-graphic>
</inline-formula>
as the scale matrix), and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e057.jpg"></inline-graphic>
</inline-formula>
denotes the Gamma function. The parameter
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e058.jpg"></inline-graphic>
</inline-formula>
acts as a robustness tuning parameter, which can be inferred from the data by computing its maximum likelihood estimate.</p>
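<p>The multivariate t-component density can be evaluated numerically as in the following minimal sketch. It assumes the standard form of the p-variate t-density written in terms of the Mahalanobis squared distance and the Gamma function, as described for (2); the function name mvt_logpdf is hypothetical and the code is not part of EMMIX-JCM.</p>
<preformat>
```python
import math

import numpy as np


def mvt_logpdf(y, mu, sigma, nu):
    """Log-density of a p-variate t-distribution at a single point y.

    mu is the location, sigma the positive-definite scale matrix, and nu the
    degrees of freedom. Small nu gives heavy tails; large nu approaches the
    normal density.
    """
    y = np.asarray(y, float)
    mu = np.asarray(mu, float)
    sigma = np.asarray(sigma, float)
    p = mu.size
    diff = y - mu
    # Mahalanobis squared distance between y and mu with sigma as scale matrix
    maha = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return (
        math.lgamma((nu + p) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * p * math.log(math.pi * nu)
        - 0.5 * logdet
        - 0.5 * (nu + p) * math.log1p(maha / nu)
    )
```
</preformat>
<p>Working on the log scale avoids numerical underflow for observations far from the component location, which is where the heavy tails matter most.</p>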
<p>In order to reliably model the clusters that are not elliptically symmetric but are skewed, we shall adopt component densities that are a skewed version of the
<italic>t</italic>
-distribution. Over the years, a number of proposals have been put forward with increasing level of generality for a skew form of the
<italic>t</italic>
-distribution. We shall adopt the version proposed by Sahu et al.
<xref rid="pone.0100334-Sahu1" ref-type="bibr">[29]</xref>
, which is quite general. Accordingly, we let
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e059.jpg"></inline-graphic>
</inline-formula>
be a diagonal matrix with diagonal elements given by the vector
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e060.jpg"></inline-graphic>
</inline-formula>
of skewness parameters. Suppose that conditional on a gamma random variable
<italic>w</italic>
and membership of the
<italic>h</italic>
th component, the joint distribution of the random vectors
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e061.jpg"></inline-graphic>
</inline-formula>
and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e062.jpg"></inline-graphic>
</inline-formula>
is given by
<disp-formula id="pone.0100334.e063">
<graphic xlink:href="pone.0100334.e063.jpg" position="anchor" orientation="portrait"></graphic>
<label>(3)</label>
</disp-formula>
where
<italic>w</italic>
is distributed according to the gamma
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e064.jpg"></inline-graphic>
</inline-formula>
distribution. In the above,
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e065.jpg"></inline-graphic>
</inline-formula>
denotes the
<italic>p</italic>
-dimensional null vector,
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e066.jpg"></inline-graphic>
</inline-formula>
denotes the
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e067.jpg"></inline-graphic>
</inline-formula>
null matrix, and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e068.jpg"></inline-graphic>
</inline-formula>
denotes the
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e069.jpg"></inline-graphic>
</inline-formula>
identity matrix.</p>
<p>Then
<disp-formula id="pone.0100334.e070">
<graphic xlink:href="pone.0100334.e070.jpg" position="anchor" orientation="portrait"></graphic>
<label>(4)</label>
</disp-formula>
defines a
<italic>p</italic>
-dimensional multivariate skew
<italic>t</italic>
-distribution with location
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e071.jpg"></inline-graphic>
</inline-formula>
, scale matrix
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e072.jpg"></inline-graphic>
</inline-formula>
, skew (diagonal) matrix
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e073.jpg"></inline-graphic>
</inline-formula>
, and
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e074.jpg"></inline-graphic>
</inline-formula>
degrees of freedom. Here
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e075.jpg"></inline-graphic>
</inline-formula>
denotes the vector whose
<italic>i</italic>
th element is equal to the magnitude of the
<italic>i</italic>
th element of the vector
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e076.jpg"></inline-graphic>
</inline-formula>
. The density of
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e076a.jpg"></inline-graphic>
</inline-formula>
can be expressed as
<disp-formula id="pone.0100334.e077">
<graphic xlink:href="pone.0100334.e077.jpg" position="anchor" orientation="portrait"></graphic>
<label>(5)</label>
</disp-formula>
where
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e078.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e079.jpg"></inline-graphic>
</inline-formula>
,
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e080.jpg"></inline-graphic>
</inline-formula>
. In (5),
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e081.jpg"></inline-graphic>
</inline-formula>
denotes the
<italic>p</italic>
-variate
<italic>t</italic>
-density with location
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e082.jpg"></inline-graphic>
</inline-formula>
, scale matrix
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e083.jpg"></inline-graphic>
</inline-formula>
, and degrees of freedom
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e084.jpg"></inline-graphic>
</inline-formula>
, and
<italic>T
<sub>p</sub>
</italic>
denotes its (
<italic>p</italic>
-variate) distribution function.</p>
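<p>To give a feel for the skew t-distribution, the following minimal sketch draws random vectors from it. It assumes the stochastic representation commonly used for the Sahu et al. form: conditional on a gamma variable w with shape and rate both equal to half the degrees of freedom, two independent normal vectors are drawn with covariances equal to the identity matrix and the scale matrix, each divided by w, and the observation is the location plus the skewness parameters times the element-wise magnitude of the first vector plus the second vector. This reading of (3) and (4) is an assumption, and the function name sample_skew_t is hypothetical.</p>
<preformat>
```python
import numpy as np


def sample_skew_t(n, mu, sigma, delta, nu, seed=0):
    """Draw n vectors from a p-variate skew t-distribution (assumed form)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, float)
    delta = np.asarray(delta, float)  # diagonal of the skew matrix
    p = mu.size
    # Gamma with shape nu/2 and rate nu/2, i.e. scale 2/nu in NumPy terms
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    scale = 1.0 / np.sqrt(w)[:, None]
    u0 = rng.standard_normal((n, p)) * scale
    u1 = rng.multivariate_normal(np.zeros(p), sigma, size=n) * scale
    # Location + skewness * |U0| + U1, with |.| taken element-wise
    return mu + delta * np.abs(u0) + u1
```
</preformat>
<p>Setting all skewness parameters to zero in this sketch recovers draws from the symmetric multivariate t-distribution.</p>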
</sec>
<sec id="s4c">
<title>Multi-level modeling</title>
<p>We represented the class template by fitting the
<italic>g</italic>
-component mixture model in (1) to all the
<italic>m</italic>
samples considered simultaneously, using (2) to represent the
<italic>t</italic>
-component densities in the symmetric case and (5) in the case of skewed
<italic>t</italic>
-component densities. If there were no inter-sample variation, then we could proceed to fit the same
<italic>t</italic>
- or skew
<italic>t</italic>
-mixture model to all the
<italic>m</italic>
samples observed. Here, however, the second level of the JCM model allows for inter-sample variation based on the concept of random effects, which is often used for combining data from batches containing different amounts of variation. We do so by introducing random-effects terms and using them to specify how the sample-specific component distributions vary from those in the
<italic>t</italic>
- or skew
<italic>t</italic>
-mixture model representing the template.</p>
<p>Let
<italic>y
<sub>ijk</sub>
</italic>
denote the measurement on the
<italic>i</italic>
th variable for the
<italic>j</italic>
th observation in the
<italic>k</italic>
th sample (
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e085.jpg"></inline-graphic>
</inline-formula>
). Then conditional on its membership of the
<italic>h</italic>
th component of the mixture model and conditional on the random-effects terms, we specify the distribution of
<italic>y
<sub>ijk</sub>
</italic>
as
<disp-formula id="pone.0100334.e086">
<graphic xlink:href="pone.0100334.e086.jpg" position="anchor" orientation="portrait"></graphic>
<label>(6)</label>
</disp-formula>
where
<italic>e
<sub>hijk</sub>
</italic>
is the error term and where
<italic>a
<sub>hik</sub>
</italic>
and
<italic>b
<sub>hik</sub>
</italic>
are random-effects terms with
<disp-formula id="pone.0100334.e087">
<graphic xlink:href="pone.0100334.e087.jpg" position="anchor" orientation="portrait"></graphic>
<label>(7)</label>
</disp-formula>
Here
<inline-formula>
<inline-graphic xlink:href="pone.0100334.e088.jpg"></inline-graphic>
</inline-formula>
is the
<italic>h</italic>
th component mean of the
<italic>i</italic>
th variable in the
<italic>g</italic>
-component mixture model representing the template for class
<italic>C</italic>
. The terms
<italic>e
<sub>hijk</sub>
</italic>
,
<italic>a
<sub>hik</sub>
</italic>
and
<italic>b
<sub>hik</sub>
</italic>
are taken to be independent and this independence assumption extends over all variables and all samples. The sample-specific terms,
<italic>a
<sub>hik</sub>
</italic>
and
<italic>b
<sub>hik</sub>
</italic>
, allow for scaling and translation, respectively, of the sample-component means relative to the component means of the template. Estimation of the random-effects model (6) can be performed using the JCM-specific implementation of the EM algorithm described in detail in
<xref ref-type="supplementary-material" rid="pone.0100334.s009">Appendices S1</xref>
and
<xref ref-type="supplementary-material" rid="pone.0100334.s010">S2</xref>
.</p>
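<p>The role of the random-effects terms can be illustrated by simulating sample-specific component means from a template. This minimal sketch assumes that each sample-component mean is the template mean multiplied by a scaling term and shifted by a translation term, with the scaling terms drawn independently from a normal distribution centered at one and the translation terms from a normal distribution centered at zero. These distributional choices, the argument names sd_a and sd_b, and the function name are assumptions made for illustration and are not read from (6) and (7).</p>
<preformat>
```python
import numpy as np


def sample_component_means(template_means, m, sd_a, sd_b, seed=0):
    """Simulate sample-specific component means for m samples.

    template_means has shape (g, p): g components by p markers.
    Returns an array of shape (m, g, p).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(template_means, float)
    # Scaling terms centered at 1 and translation terms centered at 0,
    # drawn independently for every sample, component and marker
    a = rng.normal(1.0, sd_a, size=(m,) + mu.shape)
    b = rng.normal(0.0, sd_b, size=(m,) + mu.shape)
    return a * mu + b
```
</preformat>
<p>With both standard deviations set to zero, every sample reproduces the template exactly, which corresponds to the case of no inter-sample variation.</p>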
</sec>
</sec>
<sec sec-type="supplementary-material" id="s5">
<title>Supporting Information</title>
<supplementary-material content-type="local-data" id="pone.0100334.s001">
<label>Figure S1</label>
<caption>
<p>
<bold>The workflow of JCM.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s001.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s002">
<label>Figure S2</label>
<caption>
<p>
<bold>Spatio-temporal characterization of populations using JCM class templates.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s002.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s003">
<label>Figure S3</label>
<caption>
<p>
<bold>Overlay plot for capturing variation within a class.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s003.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s004">
<label>Figure S4</label>
<caption>
<p>
<bold>Spatio-temporal profiling of populations representing naïve and memory T cells.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s004.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s005">
<label>Figure S5</label>
<caption>
<p>
<bold>Enrichment of cross-panel meta-features.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s005.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s006">
<label>Figure S6</label>
<caption>
<p>
<bold>Differences in mound skewness.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s006.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s007">
<label>Figure S7</label>
<caption>
<p>
<bold>Running time analysis of JCM.</bold>
</p>
<p>(TIF)</p>
</caption>
<media xlink:href="pone.0100334.s007.tif">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s008">
<label>Table S1</label>
<caption>
<p>
<bold>The </bold>
<bold>
<italic>F</italic>
</bold>
<bold>-measure values of various methods on DLBCL data.</bold>
</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0100334.s008.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s009">
<label>Appendix S1</label>
<caption>
<p>
<bold>The JCM-MT Model.</bold>
</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0100334.s009.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s010">
<label>Appendix S2</label>
<caption>
<p>
<bold>The JCM-MST Model.</bold>
</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0100334.s010.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s011">
<label>Text S1</label>
<caption>
<p>
<bold>Data and experiments.</bold>
</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0100334.s011.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="pone.0100334.s012">
<label>Text S2</label>
<caption>
<p>
<bold>The JCM workflow.</bold>
</p>
<p>(PDF)</p>
</caption>
<media xlink:href="pone.0100334.s012.pdf">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<p>This study was partially supported by a grant from the Australian Research Council.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0100334-Irish1">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Irish</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Kotecha</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Nolan</surname>
<given-names>GP</given-names>
</name>
(
<year>2006</year>
)
<article-title>Mapping normal and cancer cell signalling networks: towards single-cell proteomics</article-title>
.
<source>Nat Rev Cancer</source>
<volume>6</volume>
:
<fpage>146</fpage>
<lpage>155</lpage>
<pub-id pub-id-type="pmid">16491074</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Perfetto1">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Perfetto</surname>
<given-names>SP</given-names>
</name>
,
<name>
<surname>Chattopadhyay</surname>
<given-names>PK</given-names>
</name>
,
<name>
<surname>Roederer</surname>
<given-names>M</given-names>
</name>
(
<year>2004</year>
)
<article-title>Seventeen-colour flow cytometry: unravelling the immune system</article-title>
.
<source>Nat Rev Immunol</source>
<volume>4</volume>
:
<fpage>648</fpage>
<lpage>655</lpage>
<pub-id pub-id-type="pmid">15286731</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Lugli1">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lugli</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Roederer</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Cossarizza</surname>
<given-names>A</given-names>
</name>
(
<year>2010</year>
)
<article-title>Data analysis in flow cytometry: the future just started</article-title>
.
<source>Cytometry A</source>
<volume>77</volume>
:
<fpage>705</fpage>
<lpage>713</lpage>
<pub-id pub-id-type="pmid">20583274</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Krutzik1">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Krutzik</surname>
<given-names>PO</given-names>
</name>
,
<name>
<surname>Nolan</surname>
<given-names>GP</given-names>
</name>
(
<year>2006</year>
)
<article-title>Fluorescent cell barcoding in flow cytometry allows high-throughput drug screening and signaling profiling</article-title>
.
<source>Nat Methods</source>
<volume>3</volume>
:
<fpage>361</fpage>
<lpage>368</lpage>
<pub-id pub-id-type="pmid">16628206</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Tanner1">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tanner</surname>
<given-names>SD</given-names>
</name>
,
<name>
<surname>Bandura</surname>
<given-names>DR</given-names>
</name>
,
<name>
<surname>Ornatsky</surname>
<given-names>O</given-names>
</name>
,
<name>
<surname>Baranov</surname>
<given-names>VI</given-names>
</name>
,
<name>
<surname>Nitz</surname>
<given-names>M</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>Flow cytometer with mass spectrometer detection for massively multiplexed single-cell biomarker assay</article-title>
.
<source>Pure Appl Chem</source>
<volume>80</volume>
:
<fpage>2627</fpage>
<lpage>2641</lpage>
</mixed-citation>
</ref>
<ref id="pone.0100334-Bendall1">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bendall</surname>
<given-names>SC</given-names>
</name>
,
<name>
<surname>Simonds</surname>
<given-names>EF</given-names>
</name>
,
<name>
<surname>Qiu</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Amir</surname>
<given-names>EaD</given-names>
</name>
,
<name>
<surname>Krutzik</surname>
<given-names>PO</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum</article-title>
.
<source>Science</source>
<volume>332</volume>
:
<fpage>687</fpage>
<lpage>696</lpage>
<pub-id pub-id-type="pmid">21551058</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Pyne1">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pyne</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Hu</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Rossin</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>TI</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Automated high-dimensional flow cytometric data analysis</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>106</volume>
:
<fpage>8519</fpage>
<lpage>8524</lpage>
<pub-id pub-id-type="pmid">19443687</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Kotecha1">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kotecha</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Flores</surname>
<given-names>NJ</given-names>
</name>
,
<name>
<surname>Irish</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Simonds</surname>
<given-names>EF</given-names>
</name>
,
<name>
<surname>Sakai</surname>
<given-names>DS</given-names>
</name>
,
<etal>et al</etal>
(
<year>2008</year>
)
<article-title>Single-cell profiling identifies aberrant STAT5 activation in myeloid malignancies with specific clinical and biologic correlates</article-title>
.
<source>Cancer Cell</source>
<volume>14</volume>
:
<fpage>335</fpage>
<lpage>343</lpage>
<pub-id pub-id-type="pmid">18835035</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Irish2">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Irish</surname>
<given-names>JM</given-names>
</name>
,
<name>
<surname>Myklebust</surname>
<given-names>JH</given-names>
</name>
,
<name>
<surname>Alizadeh</surname>
<given-names>AA</given-names>
</name>
,
<name>
<surname>Houot</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Sharman</surname>
<given-names>JP</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>B-cell signaling networks reveal a negative prognostic human lymphoma cell subset that emerges during tumor progression</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>107</volume>
:
<fpage>12747</fpage>
<lpage>1275</lpage>
<pub-id pub-id-type="pmid">20543139</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Oved1">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Oved</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Eden</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Akerman</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Noy</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Wolchinsky</surname>
<given-names>R</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>Predicting and controlling the reactivity of immune cell populations against cancer</article-title>
.
<source>Mol Syst Biol</source>
<volume>5</volume>
:
<fpage>265</fpage>
<pub-id pub-id-type="pmid">19401677</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Lo1">
<label>11</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lo</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Hahne</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Brinkman</surname>
<given-names>RR</given-names>
</name>
,
<name>
<surname>Gottardo</surname>
<given-names>R</given-names>
</name>
(
<year>2009</year>
)
<article-title>flowClust: a Bioconductor package for automated gating of flow cytometry data</article-title>
.
<source>BMC Bioinformatics</source>
<volume>10</volume>
:
<fpage>145</fpage>
<pub-id pub-id-type="pmid">19442304</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Naim1">
<label>12</label>
<mixed-citation publication-type="book">Naim I, Datta S, Sharma G, Cavenaugh JS, Mosmann TR (2010) SWIFT: Scalable weighted iterative sampling for flow cytometry clustering. In: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP). pp. 509–512.</mixed-citation>
</ref>
<ref id="pone.0100334-Pyne2">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pyne</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Irish</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Tamayo</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Nazaire</surname>
<given-names>MD</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data</article-title>
.
<source>Preprint arXiv</source>
<fpage>1305.7344</fpage>
</mixed-citation>
</ref>
<ref id="pone.0100334-Aghaeepour1">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Aghaeepour</surname>
<given-names>N</given-names>
</name>
,
<name>
<surname>Finak</surname>
<given-names>G</given-names>
</name>
,
<collab>The FLOWCAP Consortium</collab>
,
<collab>The DREAM Consortium</collab>
,
<name>
<surname>Hoos</surname>
<given-names>H</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>Critical assessment of automated flow cytometry analysis techniques</article-title>
.
<source>Nature Methods</source>
<volume>10</volume>
:
<fpage>228</fpage>
<lpage>238</lpage>
<pub-id pub-id-type="pmid">23396282</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Cron1">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Cron</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Gouttefangeas</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Frelinger</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Lin</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Singh</surname>
<given-names>SK</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples</article-title>
.
<source>PLoS Computational Biology</source>
<volume>9</volume>
</mixed-citation>
</ref>
<ref id="pone.0100334-Maier1">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Maier</surname>
<given-names>LM</given-names>
</name>
,
<name>
<surname>Anderson</surname>
<given-names>DE</given-names>
</name>
,
<name>
<surname>De Jager</surname>
<given-names>PL</given-names>
</name>
,
<name>
<surname>Wicker</surname>
<given-names>LS</given-names>
</name>
,
<name>
<surname>Hafler</surname>
<given-names>DA</given-names>
</name>
(
<year>2007</year>
)
<article-title>Allelic variant in CTLA4 alters T cell phosphorylation patterns</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>104</volume>
:
<fpage>18607</fpage>
<lpage>18612</lpage>
<pub-id pub-id-type="pmid">18000051</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Subramanian1">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Subramanian</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Tamayo</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Mootha</surname>
<given-names>VK</given-names>
</name>
,
<name>
<surname>Mukherjee</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Ebert</surname>
<given-names>BL</given-names>
</name>
,
<etal>et al</etal>
(
<year>2005</year>
)
<article-title>Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
<volume>102</volume>
:
<fpage>15545</fpage>
<lpage>15550</lpage>
<pub-id pub-id-type="pmid">16199517</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Tamayo1">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tamayo</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Cho</surname>
<given-names>YJ</given-names>
</name>
,
<name>
<surname>Tsherniak</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Greulich</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Ambrogio</surname>
<given-names>L</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>Predicting relapse in patients with medulloblastoma by integrating evidence from clinical and genomic features</article-title>
.
<source>J Clin Oncol</source>
<volume>29</volume>
:
<fpage>1415</fpage>
<lpage>1423</lpage>
<pub-id pub-id-type="pmid">21357789</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Demers1">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Demers</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Kim</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Legendre</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Legendre</surname>
<given-names>L</given-names>
</name>
(
<year>1992</year>
)
<article-title>Analyzing multivariate flow cytometric data in aquatic sciences</article-title>
.
<source>Cytometry</source>
<volume>13</volume>
:
<fpage>291</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="pmid">1576894</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Lo2">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lo</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Gottardo</surname>
<given-names>R</given-names>
</name>
(
<year>2009</year>
)
<article-title>Automated gating of flow cytometry data via robust model-based clustering</article-title>
.
<source>Cytometry A</source>
<volume>73</volume>
:
<fpage>321</fpage>
<lpage>332</lpage>
<pub-id pub-id-type="pmid">18307272</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-FrhwirthSchnatter1">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Frühwirth-Schnatter</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Pyne</surname>
<given-names>S</given-names>
</name>
(
<year>2010</year>
)
<article-title>Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions</article-title>
.
<source>Biostatistics</source>
<volume>11</volume>
:
<fpage>317</fpage>
<lpage>336</lpage>
<pub-id pub-id-type="pmid">20110247</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Baudry1">
<label>22</label>
<mixed-citation publication-type="journal">
<name>
<surname>Baudry</surname>
<given-names>JP</given-names>
</name>
,
<name>
<surname>Raftery</surname>
<given-names>AE</given-names>
</name>
,
<name>
<surname>Celeux</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Lo</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Gottardo</surname>
<given-names>R</given-names>
</name>
(
<year>2010</year>
)
<article-title>Combining mixture components for clustering</article-title>
.
<source>Journal of Computational and Graphical Statistics</source>
<volume>19</volume>
:
<fpage>332</fpage>
<lpage>353</lpage>
</mixed-citation>
</ref>
<ref id="pone.0100334-Liu1">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Yang</surname>
<given-names>M</given-names>
</name>
(
<year>2009</year>
)
<article-title>Simultaneous curve registration and clustering for functional data</article-title>
.
<source>Computational Statistics and Data Analysis</source>
<volume>53</volume>
:
<fpage>1361</fpage>
<lpage>1376</lpage>
</mixed-citation>
</ref>
<ref id="pone.0100334-Gaffney1">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Gaffney</surname>
<given-names>SJ</given-names>
</name>
,
<name>
<surname>Robertson</surname>
<given-names>AW</given-names>
</name>
,
<name>
<surname>Smyth</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Camargo</surname>
<given-names>SJ</given-names>
</name>
,
<name>
<surname>Ghil</surname>
<given-names>M</given-names>
</name>
(
<year>2007</year>
)
<article-title>Probabilistic clustering of extratropical cyclones using regression mixture models</article-title>
.
<source>Climate Dynamics</source>
<volume>29</volume>
:
<fpage>423</fpage>
<lpage>440</lpage>
</mixed-citation>
</ref>
<ref id="pone.0100334-McLachlan1">
<label>25</label>
<mixed-citation publication-type="book">McLachlan GJ, Krishnan T (2008) The EM Algorithm and Extensions. Hoboken, N.J.: Wiley-Interscience, 2nd edition.</mixed-citation>
</ref>
<ref id="pone.0100334-Ng1">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ng</surname>
<given-names>SK</given-names>
</name>
,
<name>
<surname>McLachlan</surname>
<given-names>GJ</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Ben-Tovim</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Ng</surname>
<given-names>SW</given-names>
</name>
(
<year>2006</year>
)
<article-title>A mixture model with random-effects components for clustering correlated gene-expression profiles</article-title>
.
<source>Bioinformatics</source>
<volume>22</volume>
:
<fpage>1745</fpage>
<lpage>1752</lpage>
<pub-id pub-id-type="pmid">16675467</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-Hahne1">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hahne</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Khodabakhshi</surname>
<given-names>AH</given-names>
</name>
,
<name>
<surname>Bashashati</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Wong</surname>
<given-names>CJ</given-names>
</name>
,
<name>
<surname>Gascoyne</surname>
<given-names>RD</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Per-channel basis normalization methods for flow cytometry data</article-title>
.
<source>Cytometry A</source>
<volume>77</volume>
:
<fpage>121</fpage>
<lpage>131</lpage>
<pub-id pub-id-type="pmid">19899135</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0100334-McLachlan2">
<label>28</label>
<mixed-citation publication-type="book">McLachlan GJ, Peel D (1998) Robust cluster analysis via mixtures of multivariate t-distributions. In: Lecture Notes in Computer Science. Springer-Verlag, pp. 658–666.</mixed-citation>
</ref>
<ref id="pone.0100334-Sahu1">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sahu</surname>
<given-names>SK</given-names>
</name>
,
<name>
<surname>Dey</surname>
<given-names>DK</given-names>
</name>
,
<name>
<surname>Branco</surname>
<given-names>MD</given-names>
</name>
(
<year>2003</year>
)
<article-title>A new class of multivariate skew distributions with applications to Bayesian regression models</article-title>
.
<source>Can J Stat</source>
<volume>31</volume>
:
<fpage>129</fpage>
<lpage>150</lpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>
