Exploration server on relations between France and Australia

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms

Internal identifier: 000A48 (Pmc/Corpus); previous: 000A47; next: 000A49

Authors: Florian Rohart; Aida Eslami; Nicholas Matigian; Stéphanie Bougeard; Kim-Anh Lê Cao

Source: BMC Bioinformatics (2017)

RBID : PMC:5327533

Abstract

Background

Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure; unwanted systematic variation removal techniques are applied prior to classification methods.

Results

To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, MINT, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures.
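
The idea of handling study-specific systematic variation inside the same analysis that finds discriminative genes can be illustrated with a small sketch. The abstract does not describe the MINT algorithm itself, so the code below is not MINT and is not the mixOmics API. It assumes, purely for illustration, that the systematic variation is a constant per-study offset, removes it by centering each gene within each study, and then takes a PLS-DA-style first loading vector on the pooled data. The helper names `center_within_study` and `first_discriminant_direction` and the simulated data are hypothetical.

```python
import numpy as np

def center_within_study(X, study):
    """Subtract each study's own per-gene mean (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    study = np.asarray(study)
    Xc = np.empty_like(X)
    for s in np.unique(study):
        rows = study == s
        Xc[rows] = X[rows] - X[rows].mean(axis=0)
    return Xc

def first_discriminant_direction(Xc, y):
    """Leading left singular vector of Xc^T Y, with Y a centered class-indicator matrix."""
    y = np.asarray(y)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    Y -= Y.mean(axis=0)
    u, _, _ = np.linalg.svd(Xc.T @ Y, full_matrices=False)
    return u[:, 0]

# Simulated data (an assumption, not taken from the article):
# two studies of 40 samples each, two classes balanced within each study.
rng = np.random.default_rng(0)
n_per_study, n_genes = 40, 50
y = np.tile([0, 1], n_per_study)
study = np.repeat(["A", "B"], n_per_study)
X = rng.normal(size=(2 * n_per_study, n_genes))
X[y == 1, 0] += 2.0       # gene 0 carries the class signal
X[study == "B"] += 5.0    # study B has a platform-wide offset

Xc = center_within_study(X, study)
w = first_discriminant_direction(Xc, y)
top_gene = int(np.argmax(np.abs(w)))  # index of the gene with the largest weight
```

In this toy setting the study offset is removed before the loading vector is computed, so the largest weight should fall on the gene that carries the class signal rather than being driven by the platform difference. Real batch effects are rarely a pure offset, which is part of why a dedicated method is needed.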

Conclusions

MINT is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. MINT is computationally fast as part of the mixOmics R CRAN package, available at http://www.mixOmics.org/mixMINT/ and http://cran.r-project.org/web/packages/mixOmics/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1553-8) contains supplementary material, which is available to authorized users.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5327533
DOI: 10.1186/s12859-017-1553-8
PubMed: 28241739
PubMed Central: 5327533

Links to Exploration step

PMC:5327533

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms</title>
<author>
<name sortKey="Rohart, Florian" sort="Rohart, Florian" uniqKey="Rohart F" first="Florian" last="Rohart">Florian Rohart</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eslami, Aida" sort="Eslami, Aida" uniqKey="Eslami A" first="Aida" last="Eslami">Aida Eslami</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2288 9830</institution-id>
<institution-id institution-id-type="GRID">grid.17091.3e</institution-id>
<institution>Centre for Heart Lung Innovation,</institution>
<institution>University of British Columbia,</institution>
</institution-wrap>
Vancouver, BC V6Z 1Y6 Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Matigian, Nicholas" sort="Matigian, Nicholas" uniqKey="Matigian N" first="Nicholas" last="Matigian">Nicholas Matigian</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bougeard, Stephanie" sort="Bougeard, Stephanie" uniqKey="Bougeard S" first="Stéphanie" last="Bougeard">Stéphanie Bougeard</name>
<affiliation>
<nlm:aff id="Aff3">French agency for food, environmental and occupational health safety (Anses), Department of Epidemiology, Ploufragan, 22440 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le Cao, Kim Anh" sort="Le Cao, Kim Anh" uniqKey="Le Cao K" first="Kim-Anh" last="Lê Cao">Kim-Anh Lê Cao</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28241739</idno>
<idno type="pmc">5327533</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5327533</idno>
<idno type="RBID">PMC:5327533</idno>
<idno type="doi">10.1186/s12859-017-1553-8</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000A48</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000A48</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms</title>
<author>
<name sortKey="Rohart, Florian" sort="Rohart, Florian" uniqKey="Rohart F" first="Florian" last="Rohart">Florian Rohart</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eslami, Aida" sort="Eslami, Aida" uniqKey="Eslami A" first="Aida" last="Eslami">Aida Eslami</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2288 9830</institution-id>
<institution-id institution-id-type="GRID">grid.17091.3e</institution-id>
<institution>Centre for Heart Lung Innovation,</institution>
<institution>University of British Columbia,</institution>
</institution-wrap>
Vancouver, BC V6Z 1Y6 Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Matigian, Nicholas" sort="Matigian, Nicholas" uniqKey="Matigian N" first="Nicholas" last="Matigian">Nicholas Matigian</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bougeard, Stephanie" sort="Bougeard, Stephanie" uniqKey="Bougeard S" first="Stéphanie" last="Bougeard">Stéphanie Bougeard</name>
<affiliation>
<nlm:aff id="Aff3">French agency for food, environmental and occupational health safety (Anses), Department of Epidemiology, Ploufragan, 22440 France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le Cao, Kim Anh" sort="Le Cao, Kim Anh" uniqKey="Le Cao K" first="Kim-Anh" last="Lê Cao">Kim-Anh Lê Cao</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure; unwanted systematic variation removal techniques are applied prior to classification methods.</p>
</sec>
<sec>
<title>Results</title>
<p>To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, <italic>MINT</italic>, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures.</p>
</sec>
<sec>
<title>Conclusions</title>
<p><italic>MINT</italic> is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. <italic>MINT</italic> is computationally fast as part of the mixOmics R CRAN package, available at <ext-link ext-link-type="uri" xlink:href="http://www.mixOmics.org/mixMINT/">http://www.mixOmics.org/mixMINT/</ext-link> and <ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/mixOmics/">http://cran.r-project.org/web/packages/mixOmics/</ext-link>.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-017-1553-8) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Pihur, V" uniqKey="Pihur V">V Pihur</name>
</author>
<author>
<name sortKey="Datta, S" uniqKey="Datta S">S Datta</name>
</author>
<author>
<name sortKey="Datta, S" uniqKey="Datta S">S Datta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, S" uniqKey="Kim S">S Kim</name>
</author>
<author>
<name sortKey="Lin, C W" uniqKey="Lin C">C-W Lin</name>
</author>
<author>
<name sortKey="Tseng, Gc" uniqKey="Tseng G">GC Tseng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lazar, C" uniqKey="Lazar C">C Lazar</name>
</author>
<author>
<name sortKey="Meganck, S" uniqKey="Meganck S">S Meganck</name>
</author>
<author>
<name sortKey="Taminau, J" uniqKey="Taminau J">J Taminau</name>
</author>
<author>
<name sortKey="Steenhoff, D" uniqKey="Steenhoff D">D Steenhoff</name>
</author>
<author>
<name sortKey="Coletta, A" uniqKey="Coletta A">A Coletta</name>
</author>
<author>
<name sortKey="Molter, C" uniqKey="Molter C">C Molter</name>
</author>
<author>
<name sortKey="Weiss Solis, Dy" uniqKey="Weiss Solis D">DY Weiss-Solis</name>
</author>
<author>
<name sortKey="Duque, R" uniqKey="Duque R">R Duque</name>
</author>
<author>
<name sortKey="Bersini, H" uniqKey="Bersini H">H Bersini</name>
</author>
<author>
<name sortKey="Nowe, A" uniqKey="Nowe A">A Nowé</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gagnon Bartsch, Ja" uniqKey="Gagnon Bartsch J">JA Gagnon-Bartsch</name>
</author>
<author>
<name sortKey="Speed, Tp" uniqKey="Speed T">TP Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shi, L" uniqKey="Shi L">L Shi</name>
</author>
<author>
<name sortKey="Reid, Lh" uniqKey="Reid L">LH Reid</name>
</author>
<author>
<name sortKey="Jones, Wd" uniqKey="Jones W">WD Jones</name>
</author>
<author>
<name sortKey="Shippy, R" uniqKey="Shippy R">R Shippy</name>
</author>
<author>
<name sortKey="Warrington, Ja" uniqKey="Warrington J">JA Warrington</name>
</author>
<author>
<name sortKey="Baker, Sc" uniqKey="Baker S">SC Baker</name>
</author>
<author>
<name sortKey="Collins, Pj" uniqKey="Collins P">PJ Collins</name>
</author>
<author>
<name sortKey="De Longueville, F" uniqKey="De Longueville F">F De Longueville</name>
</author>
<author>
<name sortKey="Kawasaki, Es" uniqKey="Kawasaki E">ES Kawasaki</name>
</author>
<author>
<name sortKey="Lee, Ky" uniqKey="Lee K">KY Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Su, Z" uniqKey="Su Z">Z Su</name>
</author>
<author>
<name sortKey="Labaj, P" uniqKey="Labaj P">P Labaj</name>
</author>
<author>
<name sortKey="Li, S" uniqKey="Li S">S Li</name>
</author>
<author>
<name sortKey="Thierry Mieg, J" uniqKey="Thierry Mieg J">J Thierry-Mieg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, W" uniqKey="Johnson W">W Johnson</name>
</author>
<author>
<name sortKey="Li, C" uniqKey="Li C">C Li</name>
</author>
<author>
<name sortKey="Rabinovic, A" uniqKey="Rabinovic A">A Rabinovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hornung, R" uniqKey="Hornung R">R Hornung</name>
</author>
<author>
<name sortKey="Boulesteix, Al" uniqKey="Boulesteix A">AL Boulesteix</name>
</author>
<author>
<name sortKey="Causeur, D" uniqKey="Causeur D">D Causeur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sims, Ah" uniqKey="Sims A">AH Sims</name>
</author>
<author>
<name sortKey="Smethurst, Gj" uniqKey="Smethurst G">GJ Smethurst</name>
</author>
<author>
<name sortKey="Hey, Y" uniqKey="Hey Y">Y Hey</name>
</author>
<author>
<name sortKey="Okoniewski, Mj" uniqKey="Okoniewski M">MJ Okoniewski</name>
</author>
<author>
<name sortKey="Pepper, Sd" uniqKey="Pepper S">SD Pepper</name>
</author>
<author>
<name sortKey="Howell, A" uniqKey="Howell A">A Howell</name>
</author>
<author>
<name sortKey="Miller, Cj" uniqKey="Miller C">CJ Miller</name>
</author>
<author>
<name sortKey="Clarke, Rb" uniqKey="Clarke R">RB Clarke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Listgarten, J" uniqKey="Listgarten J">J Listgarten</name>
</author>
<author>
<name sortKey="Kadie, C" uniqKey="Kadie C">C Kadie</name>
</author>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author>
<name sortKey="Heckerman, D" uniqKey="Heckerman D">D Heckerman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Cao, Ka" uniqKey="Le Cao K">KA Lê Cao</name>
</author>
<author>
<name sortKey="Rohart, F" uniqKey="Rohart F">F Rohart</name>
</author>
<author>
<name sortKey="Mchugh, L" uniqKey="Mchugh L">L McHugh</name>
</author>
<author>
<name sortKey="Korn, O" uniqKey="Korn O">O Korn</name>
</author>
<author>
<name sortKey="Wells, Ca" uniqKey="Wells C">CA Wells</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Breiman, L" uniqKey="Breiman L">L Breiman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dudoit, S" uniqKey="Dudoit S">S Dudoit</name>
</author>
<author>
<name sortKey="Fridlyand, J" uniqKey="Fridlyand J">J Fridlyand</name>
</author>
<author>
<name sortKey="Speed, Tp" uniqKey="Speed T">TP Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guyon, I" uniqKey="Guyon I">I Guyon</name>
</author>
<author>
<name sortKey="Weston, J" uniqKey="Weston J">J Weston</name>
</author>
<author>
<name sortKey="Barnhill, S" uniqKey="Barnhill S">S Barnhill</name>
</author>
<author>
<name sortKey="Vapnik, V" uniqKey="Vapnik V">V Vapnik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Diaz Uriarte, R" uniqKey="Diaz Uriarte R">R Díaz-Uriarte</name>
</author>
<author>
<name sortKey="De Andres, Sa" uniqKey="De Andres S">SA De Andres</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sowa, Jp" uniqKey="Sowa J">JP Sowa</name>
</author>
<author>
<name sortKey="Atmaca, O" uniqKey="Atmaca O">Ö Atmaca</name>
</author>
<author>
<name sortKey="Kahraman, A" uniqKey="Kahraman A">A Kahraman</name>
</author>
<author>
<name sortKey="Schlattjan, M" uniqKey="Schlattjan M">M Schlattjan</name>
</author>
<author>
<name sortKey="Lindner, M" uniqKey="Lindner M">M Lindner</name>
</author>
<author>
<name sortKey="Sydor, S" uniqKey="Sydor S">S Sydor</name>
</author>
<author>
<name sortKey="Scherbaum, N" uniqKey="Scherbaum N">N Scherbaum</name>
</author>
<author>
<name sortKey="Lackner, K" uniqKey="Lackner K">K Lackner</name>
</author>
<author>
<name sortKey="Gerken, G" uniqKey="Gerken G">G Gerken</name>
</author>
<author>
<name sortKey="Heider, D" uniqKey="Heider D">D Heider</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barker, M" uniqKey="Barker M">M Barker</name>
</author>
<author>
<name sortKey="Rayens, W" uniqKey="Rayens W">W Rayens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Cao, Ka" uniqKey="Le Cao K">KA Lê Cao</name>
</author>
<author>
<name sortKey="Boitard, S" uniqKey="Boitard S">S Boitard</name>
</author>
<author>
<name sortKey="Besse, P" uniqKey="Besse P">P Besse</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hughey, Jj" uniqKey="Hughey J">JJ Hughey</name>
</author>
<author>
<name sortKey="Butte, Aj" uniqKey="Butte A">AJ Butte</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parker, Js" uniqKey="Parker J">JS Parker</name>
</author>
<author>
<name sortKey="Mullins, M" uniqKey="Mullins M">M Mullins</name>
</author>
<author>
<name sortKey="Cheang, Mc" uniqKey="Cheang M">MC Cheang</name>
</author>
<author>
<name sortKey="Leung, S" uniqKey="Leung S">S Leung</name>
</author>
<author>
<name sortKey="Voduc, D" uniqKey="Voduc D">D Voduc</name>
</author>
<author>
<name sortKey="Vickery, T" uniqKey="Vickery T">T Vickery</name>
</author>
<author>
<name sortKey="Davies, S" uniqKey="Davies S">S Davies</name>
</author>
<author>
<name sortKey="Fauron, C" uniqKey="Fauron C">C Fauron</name>
</author>
<author>
<name sortKey="He, X" uniqKey="He X">X He</name>
</author>
<author>
<name sortKey="Hu, Z" uniqKey="Hu Z">Z Hu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohart, F" uniqKey="Rohart F">F Rohart</name>
</author>
<author>
<name sortKey="Mason, Ea" uniqKey="Mason E">EA Mason</name>
</author>
<author>
<name sortKey="Matigian, N" uniqKey="Matigian N">N Matigian</name>
</author>
<author>
<name sortKey="Mosbergen, R" uniqKey="Mosbergen R">R Mosbergen</name>
</author>
<author>
<name sortKey="Korn, O" uniqKey="Korn O">O Korn</name>
</author>
<author>
<name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
<author>
<name sortKey="Butcher, S" uniqKey="Butcher S">S Butcher</name>
</author>
<author>
<name sortKey="Patel, J" uniqKey="Patel J">J Patel</name>
</author>
<author>
<name sortKey="Atkinson, K" uniqKey="Atkinson K">K Atkinson</name>
</author>
<author>
<name sortKey="Khosrotehrani, K" uniqKey="Khosrotehrani K">K Khosrotehrani</name>
</author>
<author>
<name sortKey="Fisk, Nm" uniqKey="Fisk N">NM Fisk</name>
</author>
<author>
<name sortKey="Le Cao, K" uniqKey="Le Cao K">K Lê Cao</name>
</author>
<author>
<name sortKey="Wells, Ca" uniqKey="Wells C">CA Wells</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eslami, A" uniqKey="Eslami A">A Eslami</name>
</author>
<author>
<name sortKey="Qannari, Em" uniqKey="Qannari E">EM Qannari</name>
</author>
<author>
<name sortKey="Kohler, A" uniqKey="Kohler A">A Kohler</name>
</author>
<author>
<name sortKey="Bougeard, S" uniqKey="Bougeard S">S Bougeard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eslami, A" uniqKey="Eslami A">A Eslami</name>
</author>
<author>
<name sortKey="Qannari, Em" uniqKey="Qannari E">EM Qannari</name>
</author>
<author>
<name sortKey="Kohler, A" uniqKey="Kohler A">A Kohler</name>
</author>
<author>
<name sortKey="Bougeard, S" uniqKey="Bougeard S">S Bougeard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tenenhaus, M" uniqKey="Tenenhaus M">M Tenenhaus</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bilic, J" uniqKey="Bilic J">J Bilic</name>
</author>
<author>
<name sortKey="Belmonte, Jci" uniqKey="Belmonte J">JCI Belmonte</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chin, Mh" uniqKey="Chin M">MH Chin</name>
</author>
<author>
<name sortKey="Mason, Mj" uniqKey="Mason M">MJ Mason</name>
</author>
<author>
<name sortKey="Xie, W" uniqKey="Xie W">W Xie</name>
</author>
<author>
<name sortKey="Volinia, S" uniqKey="Volinia S">S Volinia</name>
</author>
<author>
<name sortKey="Singer, M" uniqKey="Singer M">M Singer</name>
</author>
<author>
<name sortKey="Peterson, C" uniqKey="Peterson C">C Peterson</name>
</author>
<author>
<name sortKey="Ambartsumyan, G" uniqKey="Ambartsumyan G">G Ambartsumyan</name>
</author>
<author>
<name sortKey="Aimiuwu, O" uniqKey="Aimiuwu O">O Aimiuwu</name>
</author>
<author>
<name sortKey="Richter, L" uniqKey="Richter L">L Richter</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Newman, Am" uniqKey="Newman A">AM Newman</name>
</author>
<author>
<name sortKey="Cooper, Jb" uniqKey="Cooper J">JB Cooper</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wells, Ca" uniqKey="Wells C">CA Wells</name>
</author>
<author>
<name sortKey="Mosbergen, R" uniqKey="Mosbergen R">R Mosbergen</name>
</author>
<author>
<name sortKey="Korn, O" uniqKey="Korn O">O Korn</name>
</author>
<author>
<name sortKey="Choi, J" uniqKey="Choi J">J Choi</name>
</author>
<author>
<name sortKey="Seidenman, N" uniqKey="Seidenman N">N Seidenman</name>
</author>
<author>
<name sortKey="Matigian, Na" uniqKey="Matigian N">NA Matigian</name>
</author>
<author>
<name sortKey="Vitale, Am" uniqKey="Vitale A">AM Vitale</name>
</author>
<author>
<name sortKey="Shepherd, J" uniqKey="Shepherd J">J Shepherd</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bolstad, Bm" uniqKey="Bolstad B">BM Bolstad</name>
</author>
<author>
<name sortKey="Irizarry, Ra" uniqKey="Irizarry R">RA Irizarry</name>
</author>
<author>
<name sortKey="Astrand, M" uniqKey="Astrand M">M Åstrand</name>
</author>
<author>
<name sortKey="Speed, Tp" uniqKey="Speed T">TP Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Curtis, C" uniqKey="Curtis C">C Curtis</name>
</author>
<author>
<name sortKey="Shah, Sp" uniqKey="Shah S">SP Shah</name>
</author>
<author>
<name sortKey="Chin, Sf" uniqKey="Chin S">SF Chin</name>
</author>
<author>
<name sortKey="Turashvili, G" uniqKey="Turashvili G">G Turashvili</name>
</author>
<author>
<name sortKey="Rueda, Om" uniqKey="Rueda O">OM Rueda</name>
</author>
<author>
<name sortKey="Dunning, Mj" uniqKey="Dunning M">MJ Dunning</name>
</author>
<author>
<name sortKey="Speed, D" uniqKey="Speed D">D Speed</name>
</author>
<author>
<name sortKey="Lynch, Ag" uniqKey="Lynch A">AG Lynch</name>
</author>
<author>
<name sortKey="Samarajiwa, S" uniqKey="Samarajiwa S">S Samarajiwa</name>
</author>
<author>
<name sortKey="Yuan, Y" uniqKey="Yuan Y">Y Yuan</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Whitcomb, Bw" uniqKey="Whitcomb B">BW Whitcomb</name>
</author>
<author>
<name sortKey="Perkins, Nj" uniqKey="Perkins N">NJ Perkins</name>
</author>
<author>
<name sortKey="Albert, Ps" uniqKey="Albert P">PS Albert</name>
</author>
<author>
<name sortKey="Schisterman, Ef" uniqKey="Schisterman E">EF Schisterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rohart, F" uniqKey="Rohart F">F Rohart</name>
</author>
<author>
<name sortKey="San Cristobal, M" uniqKey="San Cristobal M">M San Cristobal</name>
</author>
<author>
<name sortKey="Laurent, B" uniqKey="Laurent B">B Laurent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benjamini, Y" uniqKey="Benjamini Y">Y Benjamini</name>
</author>
<author>
<name sortKey="Hochberg, Y" uniqKey="Hochberg Y">Y Hochberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Vodyanik, Ma" uniqKey="Vodyanik M">MA Vodyanik</name>
</author>
<author>
<name sortKey="Smuga Otto, K" uniqKey="Smuga Otto K">K Smuga-Otto</name>
</author>
<author>
<name sortKey="Antosiewicz Bourget, J" uniqKey="Antosiewicz Bourget J">J Antosiewicz-Bourget</name>
</author>
<author>
<name sortKey="Frane, Jl" uniqKey="Frane J">JL Frane</name>
</author>
<author>
<name sortKey="Tian, S" uniqKey="Tian S">S Tian</name>
</author>
<author>
<name sortKey="Nie, J" uniqKey="Nie J">J Nie</name>
</author>
<author>
<name sortKey="Jonsdottir, Ga" uniqKey="Jonsdottir G">GA Jonsdottir</name>
</author>
<author>
<name sortKey="Ruotti, V" uniqKey="Ruotti V">V Ruotti</name>
</author>
<author>
<name sortKey="Stewart, R" uniqKey="Stewart R">R Stewart</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tsialikas, J" uniqKey="Tsialikas J">J Tsialikas</name>
</author>
<author>
<name sortKey="Romer Seibert, J" uniqKey="Romer Seibert J">J Romer-Seibert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krivega, M" uniqKey="Krivega M">M Krivega</name>
</author>
<author>
<name sortKey="Geens, M" uniqKey="Geens M">M Geens</name>
</author>
<author>
<name sortKey="Van De Velde, H" uniqKey="Van De Velde H">H Van de Velde</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kouros Mehr, H" uniqKey="Kouros Mehr H">H Kouros-Mehr</name>
</author>
<author>
<name sortKey="Slorach, Em" uniqKey="Slorach E">EM Slorach</name>
</author>
<author>
<name sortKey="Sternlicht, Md" uniqKey="Sternlicht M">MD Sternlicht</name>
</author>
<author>
<name sortKey="Werb, Z" uniqKey="Werb Z">Z Werb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Asselin Labat, Ml" uniqKey="Asselin Labat M">ML Asselin-Labat</name>
</author>
<author>
<name sortKey="Sutherland, Kd" uniqKey="Sutherland K">KD Sutherland</name>
</author>
<author>
<name sortKey="Barker, H" uniqKey="Barker H">H Barker</name>
</author>
<author>
<name sortKey="Thomas, R" uniqKey="Thomas R">R Thomas</name>
</author>
<author>
<name sortKey="Shackleton, M" uniqKey="Shackleton M">M Shackleton</name>
</author>
<author>
<name sortKey="Forrest, Nc" uniqKey="Forrest N">NC Forrest</name>
</author>
<author>
<name sortKey="Hartley, L" uniqKey="Hartley L">L Hartley</name>
</author>
<author>
<name sortKey="Robb, L" uniqKey="Robb L">L Robb</name>
</author>
<author>
<name sortKey="Grosveld, Fg" uniqKey="Grosveld F">FG Grosveld</name>
</author>
<author>
<name sortKey="Van Der Wees, J" uniqKey="Van Der Wees J">J van der Wees</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiang, Yz" uniqKey="Jiang Y">YZ Jiang</name>
</author>
<author>
<name sortKey="Yu, Kd" uniqKey="Yu K">KD Yu</name>
</author>
<author>
<name sortKey="Zuo, Wj" uniqKey="Zuo W">WJ Zuo</name>
</author>
<author>
<name sortKey="Peng, Wt" uniqKey="Peng W">WT Peng</name>
</author>
<author>
<name sortKey="Shao, Zm" uniqKey="Shao Z">ZM Shao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccleskey, Bc" uniqKey="Mccleskey B">BC McCleskey</name>
</author>
<author>
<name sortKey="Penedo, Tl" uniqKey="Penedo T">TL Penedo</name>
</author>
<author>
<name sortKey="Zhang, K" uniqKey="Zhang K">K Zhang</name>
</author>
<author>
<name sortKey="Hameed, O" uniqKey="Hameed O">O Hameed</name>
</author>
<author>
<name sortKey="Siegal, Gp" uniqKey="Siegal G">GP Siegal</name>
</author>
<author>
<name sortKey="Wei, S" uniqKey="Wei S">S Wei</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vargova, K" uniqKey="Vargova K">K Vargova</name>
</author>
<author>
<name sortKey="Curik, N" uniqKey="Curik N">N Curik</name>
</author>
<author>
<name sortKey="Burda, P" uniqKey="Burda P">P Burda</name>
</author>
<author>
<name sortKey="Basova, P" uniqKey="Basova P">P Basova</name>
</author>
<author>
<name sortKey="Kulvait, V" uniqKey="Kulvait V">V Kulvait</name>
</author>
<author>
<name sortKey="Pospisil, V" uniqKey="Pospisil V">V Pospisil</name>
</author>
<author>
<name sortKey="Savvulidi, F" uniqKey="Savvulidi F">F Savvulidi</name>
</author>
<author>
<name sortKey="Kokavec, J" uniqKey="Kokavec J">J Kokavec</name>
</author>
<author>
<name sortKey="Necas, E" uniqKey="Necas E">E Necas</name>
</author>
<author>
<name sortKey="Berkova, A" uniqKey="Berkova A">A Berkova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khan, Fh" uniqKey="Khan F">FH Khan</name>
</author>
<author>
<name sortKey="Pandian, V" uniqKey="Pandian V">V Pandian</name>
</author>
<author>
<name sortKey="Ramraj, S" uniqKey="Ramraj S">S Ramraj</name>
</author>
<author>
<name sortKey="Aravindan, S" uniqKey="Aravindan S">S Aravindan</name>
</author>
<author>
<name sortKey="Herman, Ts" uniqKey="Herman T">TS Herman</name>
</author>
<author>
<name sortKey="Aravindan, N" uniqKey="Aravindan N">N Aravindan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, X" uniqKey="Chen X">X Chen</name>
</author>
<author>
<name sortKey="Iliopoulos, D" uniqKey="Iliopoulos D">D Iliopoulos</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
<author>
<name sortKey="Tang, Q" uniqKey="Tang Q">Q Tang</name>
</author>
<author>
<name sortKey="Greenblatt, Mb" uniqKey="Greenblatt M">MB Greenblatt</name>
</author>
<author>
<name sortKey="Hatziapostolou, M" uniqKey="Hatziapostolou M">M Hatziapostolou</name>
</author>
<author>
<name sortKey="Lim, E" uniqKey="Lim E">E Lim</name>
</author>
<author>
<name sortKey="Tam, Wl" uniqKey="Tam W">WL Tam</name>
</author>
<author>
<name sortKey="Ni, M" uniqKey="Ni M">M Ni</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garczyk, S" uniqKey="Garczyk S">S Garczyk</name>
</author>
<author>
<name sortKey="Von Stillfried, S" uniqKey="Von Stillfried S">S von Stillfried</name>
</author>
<author>
<name sortKey="Antonopoulos, W" uniqKey="Antonopoulos W">W Antonopoulos</name>
</author>
<author>
<name sortKey="Hartmann, A" uniqKey="Hartmann A">A Hartmann</name>
</author>
<author>
<name sortKey="Schrauder, Mg" uniqKey="Schrauder M">MG Schrauder</name>
</author>
<author>
<name sortKey="Fasching, Pa" uniqKey="Fasching P">PA Fasching</name>
</author>
<author>
<name sortKey="Anzeneder, T" uniqKey="Anzeneder T">T Anzeneder</name>
</author>
<author>
<name sortKey="Tannapfel, A" uniqKey="Tannapfel A">A Tannapfel</name>
</author>
<author>
<name sortKey="Ergonenc, Y" uniqKey="Ergonenc Y">Y Ergönenc</name>
</author>
<author>
<name sortKey="Knuchel, R" uniqKey="Knuchel R">R Knüchel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yamamoto Ibusuki, M" uniqKey="Yamamoto Ibusuki M">M Yamamoto-Ibusuki</name>
</author>
<author>
<name sortKey="Yamamoto, Y" uniqKey="Yamamoto Y">Y Yamamoto</name>
</author>
<author>
<name sortKey="Fujiwara, S" uniqKey="Fujiwara S">S Fujiwara</name>
</author>
<author>
<name sortKey="Sueta, A" uniqKey="Sueta A">A Sueta</name>
</author>
<author>
<name sortKey="Yamamoto, S" uniqKey="Yamamoto S">S Yamamoto</name>
</author>
<author>
<name sortKey="Hayashi, M" uniqKey="Hayashi M">M Hayashi</name>
</author>
<author>
<name sortKey="Tomiguchi, M" uniqKey="Tomiguchi M">M Tomiguchi</name>
</author>
<author>
<name sortKey="Takeshita, T" uniqKey="Takeshita T">T Takeshita</name>
</author>
<author>
<name sortKey="Iwase, H" uniqKey="Iwase H">H Iwase</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="May, Fe" uniqKey="May F">FE May</name>
</author>
<author>
<name sortKey="Westley, Br" uniqKey="Westley B">BR Westley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andres, Sa" uniqKey="Andres S">SA Andres</name>
</author>
<author>
<name sortKey="Brock, Gn" uniqKey="Brock G">GN Brock</name>
</author>
<author>
<name sortKey="Wittliff, Jl" uniqKey="Wittliff J">JL Wittliff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andres, Sa" uniqKey="Andres S">SA Andres</name>
</author>
<author>
<name sortKey="Smolenkova, Ia" uniqKey="Smolenkova I">IA Smolenkova</name>
</author>
<author>
<name sortKey="Wittliff, Jl" uniqKey="Wittliff J">JL Wittliff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parris, Tz" uniqKey="Parris T">TZ Parris</name>
</author>
<author>
<name sortKey="Danielsson, A" uniqKey="Danielsson A">A Danielsson</name>
</author>
<author>
<name sortKey="Nemes, S" uniqKey="Nemes S">S Nemes</name>
</author>
<author>
<name sortKey="Kovacs, A" uniqKey="Kovacs A">A Kovács</name>
</author>
<author>
<name sortKey="Delle, U" uniqKey="Delle U">U Delle</name>
</author>
<author>
<name sortKey="Fallenius, G" uniqKey="Fallenius G">G Fallenius</name>
</author>
<author>
<name sortKey="Mollerstrom, E" uniqKey="Mollerstrom E">E Möllerström</name>
</author>
<author>
<name sortKey="Karlsson, P" uniqKey="Karlsson P">P Karlsson</name>
</author>
<author>
<name sortKey="Helou, K" uniqKey="Helou K">K Helou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lefevre, L" uniqKey="Lefevre L">L Lefevre</name>
</author>
<author>
<name sortKey="Omeiri, H" uniqKey="Omeiri H">H Omeiri</name>
</author>
<author>
<name sortKey="Drougat, L" uniqKey="Drougat L">L Drougat</name>
</author>
<author>
<name sortKey="Hantel, C" uniqKey="Hantel C">C Hantel</name>
</author>
<author>
<name sortKey="Giraud, M" uniqKey="Giraud M">M Giraud</name>
</author>
<author>
<name sortKey="Val, P" uniqKey="Val P">P Val</name>
</author>
<author>
<name sortKey="Rodriguez, S" uniqKey="Rodriguez S">S Rodriguez</name>
</author>
<author>
<name sortKey="Perlemoine, K" uniqKey="Perlemoine K">K Perlemoine</name>
</author>
<author>
<name sortKey="Blugeon, C" uniqKey="Blugeon C">C Blugeon</name>
</author>
<author>
<name sortKey="Beuschlein, F" uniqKey="Beuschlein F">F Beuschlein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosner, Mh" uniqKey="Rosner M">MH Rosner</name>
</author>
<author>
<name sortKey="Vigano, Ma" uniqKey="Vigano M">MA Vigano</name>
</author>
<author>
<name sortKey="Ozato, K" uniqKey="Ozato K">K Ozato</name>
</author>
<author>
<name sortKey="Timmons, Pm" uniqKey="Timmons P">PM Timmons</name>
</author>
<author>
<name sortKey="Poirie, F" uniqKey="Poirie F">F Poirie</name>
</author>
<author>
<name sortKey="Rigby, Pw" uniqKey="Rigby P">PW Rigby</name>
</author>
<author>
<name sortKey="Staudt, Lm" uniqKey="Staudt L">LM Staudt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scholer, Hr" uniqKey="Scholer H">HR Schöler</name>
</author>
<author>
<name sortKey="Ruppert, S" uniqKey="Ruppert S">S Ruppert</name>
</author>
<author>
<name sortKey="Suzuki, N" uniqKey="Suzuki N">N Suzuki</name>
</author>
<author>
<name sortKey="Chowdhury, K" uniqKey="Chowdhury K">K Chowdhury</name>
</author>
<author>
<name sortKey="Gruss, P" uniqKey="Gruss P">P Gruss</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niwa, H" uniqKey="Niwa H">H Niwa</name>
</author>
<author>
<name sortKey="Miyazaki, J I" uniqKey="Miyazaki J">J-i Miyazaki</name>
</author>
<author>
<name sortKey="Smith, Ag" uniqKey="Smith A">AG Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matin, Mm" uniqKey="Matin M">MM Matin</name>
</author>
<author>
<name sortKey="Walsh, Jr" uniqKey="Walsh J">JR Walsh</name>
</author>
<author>
<name sortKey="Gokhale, Pj" uniqKey="Gokhale P">PJ Gokhale</name>
</author>
<author>
<name sortKey="Draper, Js" uniqKey="Draper J">JS Draper</name>
</author>
<author>
<name sortKey="Bahrami, Ar" uniqKey="Bahrami A">AR Bahrami</name>
</author>
<author>
<name sortKey="Morton, I" uniqKey="Morton I">I Morton</name>
</author>
<author>
<name sortKey="Moore, Hd" uniqKey="Moore H">HD Moore</name>
</author>
<author>
<name sortKey="Andrews, Pw" uniqKey="Andrews P">PW Andrews</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bock, C" uniqKey="Bock C">C Bock</name>
</author>
<author>
<name sortKey="Kiskinis, E" uniqKey="Kiskinis E">E Kiskinis</name>
</author>
<author>
<name sortKey="Verstappen, G" uniqKey="Verstappen G">G Verstappen</name>
</author>
<author>
<name sortKey="Gu, H" uniqKey="Gu H">H Gu</name>
</author>
<author>
<name sortKey="Boulting, G" uniqKey="Boulting G">G Boulting</name>
</author>
<author>
<name sortKey="Smith, Zd" uniqKey="Smith Z">ZD Smith</name>
</author>
<author>
<name sortKey="Ziller, M" uniqKey="Ziller M">M Ziller</name>
</author>
<author>
<name sortKey="Croft, Gf" uniqKey="Croft G">GF Croft</name>
</author>
<author>
<name sortKey="Amoroso, Mw" uniqKey="Amoroso M">MW Amoroso</name>
</author>
<author>
<name sortKey="Oakley, Dh" uniqKey="Oakley D">DH Oakley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Briggs, Ja" uniqKey="Briggs J">JA Briggs</name>
</author>
<author>
<name sortKey="Sun, J" uniqKey="Sun J">J Sun</name>
</author>
<author>
<name sortKey="Shepherd, J" uniqKey="Shepherd J">J Shepherd</name>
</author>
<author>
<name sortKey="Ovchinnikov, Da" uniqKey="Ovchinnikov D">DA Ovchinnikov</name>
</author>
<author>
<name sortKey="Chung, Tl" uniqKey="Chung T">TL Chung</name>
</author>
<author>
<name sortKey="Nayler, Sp" uniqKey="Nayler S">SP Nayler</name>
</author>
<author>
<name sortKey="Kao, Lp" uniqKey="Kao L">LP Kao</name>
</author>
<author>
<name sortKey="Morrow, Ca" uniqKey="Morrow C">CA Morrow</name>
</author>
<author>
<name sortKey="Thakar, Ny" uniqKey="Thakar N">NY Thakar</name>
</author>
<author>
<name sortKey="Soo, Sy" uniqKey="Soo S">SY Soo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chung, Hc" uniqKey="Chung H">HC Chung</name>
</author>
<author>
<name sortKey="Lin, Rc" uniqKey="Lin R">RC Lin</name>
</author>
<author>
<name sortKey="Logan, Gj" uniqKey="Logan G">GJ Logan</name>
</author>
<author>
<name sortKey="Alexander, Ie" uniqKey="Alexander I">IE Alexander</name>
</author>
<author>
<name sortKey="Sachdev, Ps" uniqKey="Sachdev P">PS Sachdev</name>
</author>
<author>
<name sortKey="Sidhu, Ks" uniqKey="Sidhu K">KS Sidhu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ebert, Ad" uniqKey="Ebert A">AD Ebert</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Rose, Ff" uniqKey="Rose F">FF Rose</name>
</author>
<author>
<name sortKey="Mattis, Vb" uniqKey="Mattis V">VB Mattis</name>
</author>
<author>
<name sortKey="Lorson, Cl" uniqKey="Lorson C">CL Lorson</name>
</author>
<author>
<name sortKey="Thomson, Ja" uniqKey="Thomson J">JA Thomson</name>
</author>
<author>
<name sortKey="Svendsen, Cn" uniqKey="Svendsen C">CN Svendsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guenther, Mg" uniqKey="Guenther M">MG Guenther</name>
</author>
<author>
<name sortKey="Frampton, Gm" uniqKey="Frampton G">GM Frampton</name>
</author>
<author>
<name sortKey="Soldner, F" uniqKey="Soldner F">F Soldner</name>
</author>
<author>
<name sortKey="Hockemeyer, D" uniqKey="Hockemeyer D">D Hockemeyer</name>
</author>
<author>
<name sortKey="Mitalipova, M" uniqKey="Mitalipova M">M Mitalipova</name>
</author>
<author>
<name sortKey="Jaenisch, R" uniqKey="Jaenisch R">R Jaenisch</name>
</author>
<author>
<name sortKey="Young, Ra" uniqKey="Young R">RA Young</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maherali, N" uniqKey="Maherali N">N Maherali</name>
</author>
<author>
<name sortKey="Ahfeldt, T" uniqKey="Ahfeldt T">T Ahfeldt</name>
</author>
<author>
<name sortKey="Rigamonti, A" uniqKey="Rigamonti A">A Rigamonti</name>
</author>
<author>
<name sortKey="Utikal, J" uniqKey="Utikal J">J Utikal</name>
</author>
<author>
<name sortKey="Cowan, C" uniqKey="Cowan C">C Cowan</name>
</author>
<author>
<name sortKey="Hochedlinger, K" uniqKey="Hochedlinger K">K Hochedlinger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marchetto, Mc" uniqKey="Marchetto M">MC Marchetto</name>
</author>
<author>
<name sortKey="Carromeu, C" uniqKey="Carromeu C">C Carromeu</name>
</author>
<author>
<name sortKey="Acab, A" uniqKey="Acab A">A Acab</name>
</author>
<author>
<name sortKey="Yu, D" uniqKey="Yu D">D Yu</name>
</author>
<author>
<name sortKey="Yeo, Gw" uniqKey="Yeo G">GW Yeo</name>
</author>
<author>
<name sortKey="Mu, Y" uniqKey="Mu Y">Y Mu</name>
</author>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
<author>
<name sortKey="Gage, Fh" uniqKey="Gage F">FH Gage</name>
</author>
<author>
<name sortKey="Muotri, Ar" uniqKey="Muotri A">AR Muotri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Takahashi, K" uniqKey="Takahashi K">K Takahashi</name>
</author>
<author>
<name sortKey="Tanabe, K" uniqKey="Tanabe K">K Tanabe</name>
</author>
<author>
<name sortKey="Ohnuki, M" uniqKey="Ohnuki M">M Ohnuki</name>
</author>
<author>
<name sortKey="Narita, M" uniqKey="Narita M">M Narita</name>
</author>
<author>
<name sortKey="Sasaki, A" uniqKey="Sasaki A">A Sasaki</name>
</author>
<author>
<name sortKey="Yamamoto, M" uniqKey="Yamamoto M">M Yamamoto</name>
</author>
<author>
<name sortKey="Nakamura, M" uniqKey="Nakamura M">M Nakamura</name>
</author>
<author>
<name sortKey="Sutou, K" uniqKey="Sutou K">K Sutou</name>
</author>
<author>
<name sortKey="Osafune, K" uniqKey="Osafune K">K Osafune</name>
</author>
<author>
<name sortKey="Yamanaka, S" uniqKey="Yamanaka S">S Yamanaka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andrade, Ln" uniqKey="Andrade L">LN Andrade</name>
</author>
<author>
<name sortKey="Nathanson, Jl" uniqKey="Nathanson J">JL Nathanson</name>
</author>
<author>
<name sortKey="Yeo, Gw" uniqKey="Yeo G">GW Yeo</name>
</author>
<author>
<name sortKey="Menck, Cfm" uniqKey="Menck C">CFM Menck</name>
</author>
<author>
<name sortKey="Muotri, Ar" uniqKey="Muotri A">AR Muotri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hu, K" uniqKey="Hu K">K Hu</name>
</author>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Suknuntha, K" uniqKey="Suknuntha K">K Suknuntha</name>
</author>
<author>
<name sortKey="Tian, S" uniqKey="Tian S">S Tian</name>
</author>
<author>
<name sortKey="Montgomery, K" uniqKey="Montgomery K">K Montgomery</name>
</author>
<author>
<name sortKey="Choi, Kd" uniqKey="Choi K">KD Choi</name>
</author>
<author>
<name sortKey="Stewart, R" uniqKey="Stewart R">R Stewart</name>
</author>
<author>
<name sortKey="Thomson, Ja" uniqKey="Thomson J">JA Thomson</name>
</author>
<author>
<name sortKey="Slukvin, Ii" uniqKey="Slukvin I">II Slukvin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author>
<name sortKey="Kim, Ch" uniqKey="Kim C">CH Kim</name>
</author>
<author>
<name sortKey="Moon, Ji" uniqKey="Moon J">JI Moon</name>
</author>
<author>
<name sortKey="Chung, Yg" uniqKey="Chung Y">YG Chung</name>
</author>
<author>
<name sortKey="Chang, My" uniqKey="Chang M">MY Chang</name>
</author>
<author>
<name sortKey="Han, Bs" uniqKey="Han B">BS Han</name>
</author>
<author>
<name sortKey="Ko, S" uniqKey="Ko S">S Ko</name>
</author>
<author>
<name sortKey="Yang, E" uniqKey="Yang E">E Yang</name>
</author>
<author>
<name sortKey="Cha, Ky" uniqKey="Cha K">KY Cha</name>
</author>
<author>
<name sortKey="Lanza, R" uniqKey="Lanza R">R Lanza</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Loewer, S" uniqKey="Loewer S">S Loewer</name>
</author>
<author>
<name sortKey="Cabili, Mn" uniqKey="Cabili M">MN Cabili</name>
</author>
<author>
<name sortKey="Guttman, M" uniqKey="Guttman M">M Guttman</name>
</author>
<author>
<name sortKey="Loh, Yh" uniqKey="Loh Y">YH Loh</name>
</author>
<author>
<name sortKey="Thomas, K" uniqKey="Thomas K">K Thomas</name>
</author>
<author>
<name sortKey="Park, Ih" uniqKey="Park I">IH Park</name>
</author>
<author>
<name sortKey="Garber, M" uniqKey="Garber M">M Garber</name>
</author>
<author>
<name sortKey="Curran, M" uniqKey="Curran M">M Curran</name>
</author>
<author>
<name sortKey="Onder, T" uniqKey="Onder T">T Onder</name>
</author>
<author>
<name sortKey="Agarwal, S" uniqKey="Agarwal S">S Agarwal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Si Tayeb, K" uniqKey="Si Tayeb K">K Si-Tayeb</name>
</author>
<author>
<name sortKey="Noto, Fk" uniqKey="Noto F">FK Noto</name>
</author>
<author>
<name sortKey="Nagaoka, M" uniqKey="Nagaoka M">M Nagaoka</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Battle, Ma" uniqKey="Battle M">MA Battle</name>
</author>
<author>
<name sortKey="Duris, C" uniqKey="Duris C">C Duris</name>
</author>
<author>
<name sortKey="North, Pe" uniqKey="North P">PE North</name>
</author>
<author>
<name sortKey="Dalton, S" uniqKey="Dalton S">S Dalton</name>
</author>
<author>
<name sortKey="Duncan, Sa" uniqKey="Duncan S">SA Duncan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vitale, Am" uniqKey="Vitale A">AM Vitale</name>
</author>
<author>
<name sortKey="Matigian, Na" uniqKey="Matigian N">NA Matigian</name>
</author>
<author>
<name sortKey="Ravishankar, S" uniqKey="Ravishankar S">S Ravishankar</name>
</author>
<author>
<name sortKey="Bellette, B" uniqKey="Bellette B">B Bellette</name>
</author>
<author>
<name sortKey="Wood, Sa" uniqKey="Wood S">SA Wood</name>
</author>
<author>
<name sortKey="Wolvetang, Ej" uniqKey="Wolvetang E">EJ Wolvetang</name>
</author>
<author>
<name sortKey="Mackay Sim, A" uniqKey="Mackay Sim A">A Mackay-Sim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, J" uniqKey="Yu J">J Yu</name>
</author>
<author>
<name sortKey="Hu, K" uniqKey="Hu K">K Hu</name>
</author>
<author>
<name sortKey="Smuga Otto, K" uniqKey="Smuga Otto K">K Smuga-Otto</name>
</author>
<author>
<name sortKey="Tian, S" uniqKey="Tian S">S Tian</name>
</author>
<author>
<name sortKey="Stewart, R" uniqKey="Stewart R">R Stewart</name>
</author>
<author>
<name sortKey="Slukvin, Ii" uniqKey="Slukvin I">II Slukvin</name>
</author>
<author>
<name sortKey="Thomson, Ja" uniqKey="Thomson J">JA Thomson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28241739</article-id>
<article-id pub-id-type="pmc">5327533</article-id>
<article-id pub-id-type="publisher-id">1553</article-id>
<article-id pub-id-type="doi">10.1186/s12859-017-1553-8</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Rohart</surname>
<given-names>Florian</given-names>
</name>
<address>
<email>f.rohart@uq.edu.au</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Eslami</surname>
<given-names>Aida</given-names>
</name>
<address>
<email>Aida.Eslami@hli.ubc.ca</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Matigian</surname>
<given-names>Nicholas</given-names>
</name>
<address>
<email>n.matigian@uq.edu.au</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bougeard</surname>
<given-names>Stéphanie</given-names>
</name>
<address>
<email>Stephanie.BOUGEARD@anses.fr</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Lê Cao</surname>
<given-names>Kim-Anh</given-names>
</name>
<address>
<email>k.lecao@uq.edu.au</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9320 7537</institution-id>
<institution-id institution-id-type="GRID">grid.1003.2</institution-id>
<institution>The University of Queensland Diamantina Institute,</institution>
<institution>The University of Queensland, Translational Research Institute,</institution>
</institution-wrap>
Brisbane, 4102 QLD Australia</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2288 9830</institution-id>
<institution-id institution-id-type="GRID">grid.17091.3e</institution-id>
<institution>Centre for Heart Lung Innovation,</institution>
<institution>University of British Columbia,</institution>
</institution-wrap>
Vancouver, BC V6Z 1Y6 Canada</aff>
<aff id="Aff3">
<label>3</label>
French agency for food, environmental and occupational health safety (Anses), Department of Epidemiology, Ploufragan, 22440 France</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>27</day>
<month>2</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>27</day>
<month>2</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>18</volume>
<elocation-id>128</elocation-id>
<history>
<date date-type="received">
<day>23</day>
<month>9</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>2</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2017</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure: techniques that remove unwanted systematic variation are applied before classification methods.</p>
</sec>
<sec>
<title>Results</title>
<p>To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method,
<italic>MINT</italic>
, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>
<italic>MINT</italic>
is a powerful approach and the first of its kind to perform integrative classification in a single step by combining multiple independent studies.
<italic>MINT</italic>
is computationally fast and is available as part of the mixOmics R package on CRAN, at
<ext-link ext-link-type="uri" xlink:href="http://www.mixOmics.org/mixMINT/">http://www.mixOmics.org/mixMINT/</ext-link>
and
<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/web/packages/mixOmics/">http://cran.r-project.org/web/packages/mixOmics/</ext-link>
.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-017-1553-8) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Integration</kwd>
<kwd>Multivariate</kwd>
<kwd>Classification</kwd>
<kwd>Transcriptome analysis</kwd>
<kwd>Algorithm</kwd>
<kwd>Partial least squares</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000923</institution-id>
<institution>Australian Research Council</institution>
</institution-wrap>
</funding-source>
<award-id>DP130100777</award-id>
<principal-award-recipient>
<name>
<surname>Rohart</surname>
<given-names>Florian</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000947</institution-id>
<institution>Australian Cancer Research Foundation</institution>
</institution-wrap>
</funding-source>
</award-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100000925</institution-id>
<institution>National Health and Medical Research Council</institution>
</institution-wrap>
</funding-source>
<award-id>APP1087415</award-id>
<principal-award-recipient>
<name>
<surname>Lê Cao</surname>
<given-names>Kim-Anh</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2017</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>High-throughput technologies, based on microarray and RNA-sequencing, are now being used to identify biomarkers or gene signatures that distinguish disease subgroups, predict cell phenotypes or classify responses to therapeutic drugs. However, few of these findings are reproduced when assessed in subsequent studies and even fewer lead to clinical applications [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR2">2</xref>
]. The poor reproducibility of identified gene signatures is most likely a consequence of high-dimensional data, in which the number of genes or transcripts being analysed is very high (often several thousands) relative to a comparatively small sample size (fewer than 20 samples).</p>
<p>One way to increase sample size is to combine raw data from independent experiments in an integrative analysis. This would improve both the statistical power of the analysis and the reproducibility of the gene signatures that are identified [
<xref ref-type="bibr" rid="CR3">3</xref>
]. However, integrating transcriptomic studies with the aim of classifying biological samples based on an outcome of interest (integrative classification) has a number of challenges. Transcriptomic studies often differ from each other in a number of ways, such as in their experimental protocols or in the technological platform used. These differences can lead to so-called ‘batch-effects’, or systematic variation across studies, which is an important source of confounding [
<xref ref-type="bibr" rid="CR4">4</xref>
]. Technological platform, in particular, has been shown to be an important confounder that affects the reproducibility of transcriptomic studies [
<xref ref-type="bibr" rid="CR5">5</xref>
]. In the MicroArray Quality Control (MAQC) project, poor overlap of differentially expressed genes was observed across different microarray platforms (∼ 60%), with low concordance observed between microarray and RNA-seq technologies specifically [
<xref ref-type="bibr" rid="CR6">6</xref>
]. Therefore, these confounding factors and sources of systematic variation must be accounted for when combining independent studies, so that genuine biological variation can be identified.</p>
<p>The common approach to integrative classification is sequential. A first step consists of removing batch effects by applying, for instance, ComBat [
<xref ref-type="bibr" rid="CR7">7</xref>
], FAbatch [
<xref ref-type="bibr" rid="CR8">8</xref>
], Batch Mean-Centering [
<xref ref-type="bibr" rid="CR9">9</xref>
], LMM-EH-PS [
<xref ref-type="bibr" rid="CR10">10</xref>
], RUV-2 [
<xref ref-type="bibr" rid="CR4">4</xref>
] or YuGene [
<xref ref-type="bibr" rid="CR11">11</xref>
]. A second step fits a statistical model to classify biological samples and predict the class membership of new samples. A range of classification methods also exists for these purposes, including machine learning approaches (e.g. random forests [
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
] or Support Vector Machines [
<xref ref-type="bibr" rid="CR14">14</xref>
–
<xref ref-type="bibr" rid="CR16">16</xref>
]) as well as multivariate linear approaches (Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLSDA) [
<xref ref-type="bibr" rid="CR17">17</xref>
], or sparse PLSDA [
<xref ref-type="bibr" rid="CR18">18</xref>
]).</p>
<p>The major pitfall of the sequential approach is the risk of over-optimistic results from overfitting to the training set. This leads to signatures that cannot be reproduced on test sets. Moreover, most proposed classification models have not been objectively validated on an external and independent test set. Thus, spurious conclusions can be generated when using these methods, leading to limited potential for translating results into reliable clinical tools [
<xref ref-type="bibr" rid="CR2">2</xref>
]. For instance, most classification methods require the choice of a parameter (e.g. sparsity), which is usually optimised with cross-validation (data are divided into k subsets or ‘folds’ and each fold is used once as an internal test set). Unless the removal of batch effects is performed independently on each fold, the folds are not independent and this leads to over-optimistic classification accuracy on the internal test sets. Hence, batch removal methods must be used with caution. For instance, ComBat cannot remove unwanted variation in an independent test set alone, as it requires the test set to be normalised together with the learning set, in a transductive rather than inductive approach [
<xref ref-type="bibr" rid="CR19">19</xref>
]. This is a clear example where overfitting and over-optimistic results can be an issue, even when a test set is considered.</p>
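<p>The fold-wise correction described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' implementation: Batch Mean-Centering stands in for the batch-effect removal step, a nearest-centroid rule stands in for the classifier, and the function names batch_mean_center, nearest_centroid_predict and cv_accuracy are hypothetical. The sketch also assumes that each held-out fold contains several samples from each study, since a single sample would be centred to zero.</p>
<preformat>
```python
import numpy as np


def batch_mean_center(X, study):
    """Centre every variable within each study (Batch Mean-Centering)."""
    X = np.asarray(X, dtype=float)
    study = np.asarray(study)
    Xc = np.empty_like(X)
    for s in np.unique(study):
        rows = study == s
        Xc[rows] = X[rows] - X[rows].mean(axis=0)
    return Xc


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every class centroid.
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def cv_accuracy(X, y, study, n_folds=5, seed=0):
    """k-fold cross-validation with the batch correction redone inside each fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    study = np.asarray(study)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Assumption of this sketch: training and held-out samples are corrected
        # separately, so no statistic computed on the held-out samples reaches
        # the training data.
        X_train = batch_mean_center(X[train_mask], study[train_mask])
        X_test = batch_mean_center(X[test_idx], study[test_idx])
        pred = nearest_centroid_predict(X_train, y[train_mask], X_test)
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)
```
</preformat>
<p>Under these assumptions, calling cv_accuracy(X, y, study) with an N × P expression array, a vector of class labels and a vector of study labels returns the proportion of held-out samples that are correctly classified. Centring the full matrix once before splitting would instead let the held-out samples contribute to the study means used on the training data, which is the dependence between folds discussed above.</p>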
<p>To address existing limitations of current data integration approaches and the poor reproducibility of results, we propose a novel Multivariate INTegrative method,
<italic>MINT</italic>
.
<italic>MINT</italic>
is the first approach of its kind that integrates independent data sets while
<italic>simultaneously</italic>
accounting for unwanted (study) variation, classifying samples and identifying key discriminant variables.
<italic>MINT</italic>
predicts the class of new samples from external studies, which enables a direct assessment of its performance. It also provides insightful graphical outputs to improve interpretation and inspect each study during the integration process.</p>
<p>We validated MINT in a subset of the MAQC project, which was carefully designed to enable assessment of unwanted systematic variation. We then combined microarray and RNA-seq experiments to classify samples from three human cell types (human Fibroblasts (Fib), human Embryonic Stem Cells (hESC) and human induced Pluripotent Stem Cells (hiPSC)) and from four classes of breast cancer (subtype
<italic>Basal, HER2, Luminal A</italic>
and
<italic>Luminal B</italic>
). We use these datasets to demonstrate the reproducibility of gene signatures identified by
<italic>MINT</italic>
.</p>
</sec>
<sec id="Sec2">
<title>Methods</title>
<p>We use the following notations. Let
<italic>X</italic>
denote a data matrix of size
<italic>N</italic>
observations (rows) ×
<italic>P</italic>
variables (e.g. gene expression levels, in columns) and
<italic>Y</italic>
a dummy matrix indicating the class membership of each sample, of size
<italic>N</italic>
observations (rows) ×
<italic>K</italic>
outcome categories (columns). We assume that the data are partitioned into
<italic>M</italic>
groups corresponding to each independent study
<italic>m</italic>
: {(
<italic>X</italic>
<sup>(1)</sup>
,
<italic>Y</italic>
<sup>(1)</sup>
),…,(
<italic>X</italic>
<sup>(
<italic>M</italic>
)</sup>
,
<italic>Y</italic>
<sup>(
<italic>M</italic>
)</sup>
)} so that
<inline-formula id="IEq1">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sum _{m=1}^{M} n_{m}=N$\end{document}</tex-math>
<mml:math id="M2">
<mml:munderover>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>N</mml:mi>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
, where
<italic>n</italic>
<sub>
<italic>m</italic>
</sub>
is the number of samples in group
<italic>m</italic>
, see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S1. Each variable from the data set
<italic>X</italic>
<sup>(
<italic>m</italic>
)</sup>
and
<italic>Y</italic>
<sup>(
<italic>m</italic>
)</sup>
is centered and has unit variance. We write
<italic>X</italic>
and
<italic>Y</italic>
for the concatenations of all
<italic>X</italic>
<sup>(
<italic>m</italic>
)</sup>
and
<italic>Y</italic>
<sup>(
<italic>m</italic>
)</sup>
, respectively. Note that if an internal known batch effect is present in a study, this study should be split according to that batch effect factor into several sub-studies considered as independent. For
<inline-formula id="IEq2">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$n\in \mathbb {N}$\end{document}</tex-math>
<mml:math id="M4">
<mml:mi>n</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>ℕ</mml:mi>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
, we denote for all
<inline-formula id="IEq3">
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$a\in \mathbb {R}^{n}$\end{document}</tex-math>
<mml:math id="M6">
<mml:mi>a</mml:mi>
<mml:mo>∈</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>ℝ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq3.gif"></inline-graphic>
</alternatives>
</inline-formula>
its
<italic>ℓ</italic>
<sup>1</sup>
norm
<inline-formula id="IEq4">
<alternatives>
<tex-math id="M7">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$||a||_{1}=\sum _{1}^{n}|a_{j}|$\end{document}</tex-math>
<mml:math id="M8">
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq4.gif"></inline-graphic>
</alternatives>
</inline-formula>
and its
<italic>ℓ</italic>
<sup>2</sup>
norm
<inline-formula id="IEq5">
<alternatives>
<tex-math id="M9">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$||a||_{2}=\left (\sum _{1}^{n}a_{j}^{2}\right)^{1/2}$\end{document}</tex-math>
<mml:math id="M10">
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:munderover>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msubsup>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq5.gif"></inline-graphic>
</alternatives>
</inline-formula>
and |
<italic>a</italic>
|
<sub>+</sub>
the positive part of
<italic>a</italic>
. For any matrix we denote by
<sup>⊤</sup>
its transpose.</p>
<sec id="Sec3">
<title>PLS-based classification methods to combine independent studies</title>
<p>PLS approaches have been extended to classify samples
<italic>Y</italic>
from a data matrix
<italic>X</italic>
by maximising a formula based on their covariance. Specifically, latent components are built based on the original
<italic>X</italic>
variables to summarise the information and reduce the dimension of the data while discriminating the Y outcome. Samples are then projected into a smaller space spanned by the latent components. We first detail the classical PLS-DA approach and then describe mgPLS, a PLS-based model we previously developed to model a group (study) structure in
<italic>X</italic>
.</p>
<p>
<bold>PLS-DA</bold>
Partial Least Squares Discriminant Analysis [
<xref ref-type="bibr" rid="CR17">17</xref>
] is an extension of PLS for a classification framework in which
<italic>Y</italic>
is a dummy matrix indicating sample class membership. In our study, we applied PLS-DA as an integrative approach by naively concatenating all studies. Briefly, PLS-DA is an iterative method that constructs
<italic>H</italic>
successive artificial (latent) components
<italic>t</italic>
<sub>
<italic>h</italic>
</sub>
=
<italic>X</italic>
<sub>
<italic>h</italic>
</sub>
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
and
<italic>u</italic>
<sub>
<italic>h</italic>
</sub>
=
<italic>Y</italic>
<sub>
<italic>h</italic>
</sub>
<italic>b</italic>
<sub>
<italic>h</italic>
</sub>
for
<italic>h</italic>
=1,…,
<italic>H</italic>
, where the
<italic>h</italic>
<sup>
<italic>th</italic>
</sup>
component
<italic>t</italic>
<sub>
<italic>h</italic>
</sub>
(respectively
<italic>u</italic>
<sub>
<italic>h</italic>
</sub>
) is a linear combination of the
<italic>X</italic>
(
<italic>Y</italic>
) variables.
<italic>H</italic>
denotes the dimension of the PLS-DA model. The weight coefficient vector
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
(
<italic>b</italic>
<sub>
<italic>h</italic>
</sub>
) is the loading vector that indicates the
<italic>importance</italic>
of each variable to define the component. For each dimension
<italic>h</italic>
=1,…,
<italic>H</italic>
PLS-DA seeks to maximize
<disp-formula id="Equ1">
<label>1</label>
<alternatives>
<tex-math id="M11">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \underset{||a_{h}||_{2} = ||b_{h}||_{2} =1}{\max }cov(X_{h} a_{h}, Y_{h} b_{h}), $$ \end{document}</tex-math>
<mml:math id="M12">
<mml:munder>
<mml:mrow>
<mml:mo>max</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mtext mathvariant="italic">cov</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2017_1553_Article_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<italic>X</italic>
<sub>
<italic>h</italic>
</sub>
,
<italic>Y</italic>
<sub>
<italic>h</italic>
</sub>
are residual matrices (obtained through a
<italic>deflation step</italic>
, as detailed in [
<xref ref-type="bibr" rid="CR18">18</xref>
]). The PLS-DA algorithm is described in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Supplemental Material S1. The PLS-DA model assigns to each sample
<italic>i</italic>
a pair of
<italic>H</italic>
scores
<inline-formula id="IEq6">
<alternatives>
<tex-math id="M13">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$(t_{h}^{i}, u_{h}^{i})$\end{document}</tex-math>
<mml:math id="M14">
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq6.gif"></inline-graphic>
</alternatives>
</inline-formula>
which effectively represents the projection of that sample into the
<italic>X</italic>
- or
<italic>Y</italic>
- space spanned by those PLS components. As
<italic>H</italic>
≪
<italic>P</italic>
, the projection space is small, allowing for dimension reduction as well as insightful sample plot representation (e.g. graphical outputs in “
<xref rid="Sec13" ref-type="sec">Results</xref>
” section). While PLS-DA ignores the data group structure inherent to each independent study, it can give satisfactory results when the between-group variance is smaller than the within-group variance or when combined with extensive data subsampling to account for systematic variation across platforms [
<xref ref-type="bibr" rid="CR21">21</xref>
].</p>
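<p>As an illustration of criterion (1) for the first component (h = 1, before any deflation), a minimal NumPy sketch is given below. It relies on the standard fact that, for column-centered X and Y, cov(Xa, Yb) is proportional to a⊤X⊤Yb, so the unit-norm maximisers are the leading singular vectors of X⊤Y. The function name, the toy data and the use of the sample variance for scaling are assumptions made for illustration; this is not the implementation used in the study.</p>
<preformat>
```python
import numpy as np

def first_plsda_loadings(X, Y):
    """Return unit-norm loadings (a, b) maximising cov(Xa, Yb) for centered X, Y."""
    # For centered columns, cov(Xa, Yb) is proportional to a' (X'Y) b,
    # so the maximisers are the leading singular vectors of X'Y.
    U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, 0], Vt[0, :], s[0]

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)            # N = 30 samples, K = 3 classes
Y = np.eye(3)[labels]                        # dummy (one-hot) matrix, N x K
X = rng.normal(size=(30, 8)) + 2.0 * Y @ rng.normal(size=(3, 8))

# Center and scale every column to mean 0 and unit (sample) variance.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

a, b, top_singular_value = first_plsda_loadings(X, Y)
t, u = X @ a, Y @ b                          # latent components t_1 and u_1
```
</preformat>
<p>Subsequent components would be obtained by repeating the same step on the deflated matrices, which this sketch does not cover.</p>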
<p>
<bold>mgPLS</bold>
Multi-group PLS is an extension of the PLS framework we recently proposed to model grouped data [
<xref ref-type="bibr" rid="CR22">22</xref>
,
<xref ref-type="bibr" rid="CR23">23</xref>
], which is relevant for our particular case where the groups represent independent studies. In mgPLS, the PLS-components of each group are constrained to be built from the same loading vectors in
<italic>X</italic>
and
<italic>Y</italic>
. These
<italic>global</italic>
loading vectors thus allow the samples from each group or study to be projected in the same common space spanned by the PLS-components. We extended the original unsupervised approach to a supervised approach by using a dummy matrix
<italic>Y</italic>
as in PLS-DA to classify samples while modelling the group structure. For each dimension
<italic>h</italic>
=1,…,
<italic>H</italic>
mgPLS-DA seeks to maximize
<disp-formula id="Equ2">
<label>2</label>
<alternatives>
<tex-math id="M15">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \underset{||a_{h}||_{2} = ||b_{h}||_{2} =1}{\max }\sum_{m=1}^{M} n_{m} cov\left(X^{(m)}_{h}a_{h}, Y^{(m)}_{h} b_{h}\right), $$ \end{document}</tex-math>
<mml:math id="M16">
<mml:munder>
<mml:mrow>
<mml:mo>max</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:munderover>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext mathvariant="italic">cov</mml:mtext>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2017_1553_Article_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
and
<italic>b</italic>
<sub>
<italic>h</italic>
</sub>
are the global loading vectors common to all groups,
<inline-formula id="IEq7">
<alternatives>
<tex-math id="M17">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$t_{h}^{(m)}=X_{h}^{(m)}a_{h}$\end{document}</tex-math>
<mml:math id="M18">
<mml:msubsup>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq7.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<inline-formula id="IEq8">
<alternatives>
<tex-math id="M19">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$u_{h}^{(m)}=Y_{h}^{(m)}b_{h}$\end{document}</tex-math>
<mml:math id="M20">
<mml:msubsup>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq8.gif"></inline-graphic>
</alternatives>
</inline-formula>
are the group-specific (partial) PLS-components, and
<inline-formula id="IEq9">
<alternatives>
<tex-math id="M21">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$X_{h}^{(m)}$\end{document}</tex-math>
<mml:math id="M22">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq9.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<inline-formula id="IEq10">
<alternatives>
<tex-math id="M23">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$ Y_{h}^{(m)}$\end{document}</tex-math>
<mml:math id="M24">
<mml:msubsup>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq10.gif"></inline-graphic>
</alternatives>
</inline-formula>
are the residual (deflated) matrices. The global loading vectors (
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
,
<italic>b</italic>
<sub>
<italic>h</italic>
</sub>
) and global components (
<italic>t</italic>
<sub>
<italic>h</italic>
</sub>
=
<italic>X</italic>
<sub>
<italic>h</italic>
</sub>
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
,
<italic>u</italic>
<sub>
<italic>h</italic>
</sub>
=
<italic>Y</italic>
<sub>
<italic>h</italic>
</sub>
<italic>b</italic>
<sub>
<italic>h</italic>
</sub>
) enable the assessment of overall classification accuracy, while the group-specific loadings and components provide powerful graphical outputs for each study that is integrated in the analysis. Global and group-specific components and loadings are represented in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S2. The next development, described below, is to include internal variable selection in mgPLS-DA for high-dimensional data sets.</p>
</sec>
<sec id="Sec6">
<title>
<italic>MINT</italic>
</title>
<p>Our novel multivariate integrative method
<italic>MINT</italic>
<italic>simultaneously</italic>
integrates independent studies and selects the most discriminant variables to classify samples and predict the class of new samples. MINT seeks a common projection space for all studies that is defined on a small subset of discriminative variables and that displays an analogous discrimination of the samples across studies. The identified variables share common information across all studies and therefore represent a reproducible signature that helps characterise biological systems.
<italic>MINT</italic>
further extends mgPLS-DA by including a
<italic>ℓ</italic>
<sup>1</sup>
-penalisation on the global loading vector
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
to perform variable selection. For each dimension
<italic>h</italic>
=1,…,
<italic>H</italic>
the
<italic>MINT</italic>
algorithm seeks to maximize
<disp-formula id="Equ3">
<label>3</label>
<alternatives>
<tex-math id="M25">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ \underset{||a_{h}||_{2} = ||b_{h}||_{2} =1}{\max }\sum_{m=1}^{M} n_{m} cov(X_{h}^{(m)}a_{h}, Y_{h}^{(m)}b_{h}) - \lambda_{h}||a_{h}||_{1}, $$ \end{document}</tex-math>
<mml:math id="M26">
<mml:munder>
<mml:mrow>
<mml:mo>max</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:munderover>
<mml:mrow>
<mml:mo>∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext mathvariant="italic">cov</mml:mtext>
<mml:mo>(</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>Y</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>λ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
</mml:math>
<graphic xlink:href="12859_2017_1553_Article_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>where in addition to the notations from Eq. (
<xref rid="Equ2" ref-type="">2</xref>
),
<italic>λ</italic>
<sub>
<italic>h</italic>
</sub>
is a non-negative parameter that controls the amount of shrinkage on the global loading vector
<italic>a</italic>
<sub>
<italic>h</italic>
</sub>
and thus the number of non-zero weights. Similarly to Lasso [
<xref ref-type="bibr" rid="CR24">24</xref>
] or sparse PLS-DA [
<xref ref-type="bibr" rid="CR18">18</xref>
], the added
<italic>ℓ</italic>
<sup>1</sup>
penalisation in
<italic>MINT</italic>
improves interpretability of the PLS-components that are now defined only on a set of selected biomarkers from
<italic>X</italic>
(with non-zero weight) that are identified in the linear combination
<inline-formula id="IEq11">
<alternatives>
<tex-math id="M27">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$X_{h}^{(m)}a_{h}$\end{document}</tex-math>
<mml:math id="M28">
<mml:msubsup>
<mml:mrow>
<mml:mi>X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12859_2017_1553_Article_IEq11.gif"></inline-graphic>
</alternatives>
</inline-formula>
. The
<italic>ℓ</italic>
<sup>1</sup>
penalisation is effectively solved in the
<italic>MINT</italic>
algorithm using soft-thresholding (see pseudo Algorithm 1).</p>
<p>
<graphic xlink:href="12859_2017_1553_Figa_HTML.gif" id="MO1"></graphic>
</p>
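<p>To make the role of soft-thresholding concrete, the following NumPy sketch computes one sparse global loading vector shared by all studies. It is written in the spirit of sparse PLS and is not a transcription of Algorithm 1. It assumes that each study has already been centered and scaled, it pools the per-study cross-products X<sup>(m)⊤</sup>Y<sup>(m)</sup>, and it parameterises the penalty by a hypothetical argument <italic>keep</italic> (the number of variables given a non-zero weight) rather than by λ<sub>h</sub> directly. All function names and the toy data are illustrative assumptions.</p>
<preformat>
```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: sign(x) * (|x| - lam)_+."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_global_loading(X_list, Y_list, keep, n_iter=200, tol=1e-10):
    """Sketch of one sparse global component for studies m = 1..M.

    X_list, Y_list: per-study matrices, each centered and scaled beforehand.
    keep: number of variables given a non-zero weight (stands in for lambda_h);
          it must be smaller than the number of variables.
    """
    # Pooled cross-product: sum over studies of X(m)' Y(m).
    M = sum(Xm.T @ Ym for Xm, Ym in zip(X_list, Y_list))
    # Initialise with the unpenalised solution (leading singular vectors).
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    a, b = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        z = M @ b
        # Threshold at the (keep+1)-th largest |z| so that `keep` weights survive.
        lam = np.sort(np.abs(z))[::-1][keep]
        a_new = soft_threshold(z, lam)
        a_new = a_new / np.linalg.norm(a_new)
        b_new = M.T @ a_new
        b_new = b_new / np.linalg.norm(b_new)
        delta = np.linalg.norm(a_new - a) + np.linalg.norm(b_new - b)
        a, b = a_new, b_new
        if tol > delta:
            break
    return a, b

def scale(Z):
    """Center each column and scale it to unit (sample) variance."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
X_list, Y_list = [], []
for shift in (0.0, 5.0):                       # two studies with different offsets
    labels = np.repeat([0, 1], 10)
    Ym = np.eye(2)[labels]
    Xm = rng.normal(size=(20, 6)) + shift
    Xm[:, :2] += 3.0 * labels[:, None]         # only variables 0 and 1 discriminate
    X_list.append(scale(Xm))
    Y_list.append(scale(Ym))

a, b = sparse_global_loading(X_list, Y_list, keep=2)
partial_components = [Xm @ a for Xm in X_list]  # t^(m) = X^(m) a, one per study
```
</preformat>
<p>Because the loading vector is shared, the partial components of every study live in the same projection space, and the variables with non-zero weight form the candidate signature.</p>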
<p>In addition to the integrative classification framework, MINT was extended to an integrative regression framework (multiple multivariate regression, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Supplemental Material S2).</p>
</sec>
<sec id="Sec7">
<title>Class prediction and parameters tuning with
<italic>MINT</italic>
</title>
<p>MINT centers and scales each study from the training set, so that each variable has mean 0 and variance 1 within each study, as in any PLS method. Therefore, a similar pre-processing needs to be applied to test sets. If a test sample belongs to a study that is part of the training set, then we apply the scaling coefficients estimated on that training study. This is required so that MINT applied to a single study provides the same results as PLS. If the test study is completely independent, then it is centered and scaled separately.</p>
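<p>The two scaling cases can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, study labels are plain strings, and the sample standard deviation (ddof = 1) is used for scaling.</p>
<preformat>
```python
import numpy as np

def fit_study_scaling(X, study):
    """Per-study column means and standard deviations of the training set."""
    coefs = {}
    for s in np.unique(study):
        Xs = X[study == s]
        coefs[str(s)] = (Xs.mean(axis=0), Xs.std(axis=0, ddof=1))
    return coefs

def scale_training(X, study, coefs):
    """Center and scale each training sample with its own study's coefficients."""
    X_scaled = np.empty_like(X, dtype=float)
    for s, (mean, sd) in coefs.items():
        rows = study == s
        X_scaled[rows] = (X[rows] - mean) / sd
    return X_scaled

def scale_test(X_test, coefs, study_name=None):
    """Reuse stored coefficients for a known study; otherwise treat the
    test set as an independent study and scale it on its own."""
    if study_name in coefs:
        mean, sd = coefs[study_name]
    else:
        mean, sd = X_test.mean(axis=0), X_test.std(axis=0, ddof=1)
    return (X_test - mean) / sd

rng = np.random.default_rng(2)
study = np.array(["A"] * 5 + ["B"] * 7)
X = rng.normal(size=(12, 3)) + np.where(study == "A", 0.0, 10.0)[:, None]

coefs = fit_study_scaling(X, study)
X_train = scale_training(X, study, coefs)

raw_A = rng.normal(size=(2, 3))                 # new samples from training study A
new_from_A = scale_test(raw_A, coefs, study_name="A")
raw_external = rng.normal(size=(6, 3)) - 4.0    # completely independent study
external = scale_test(raw_external, coefs)
```
</preformat>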
<p>After scaling the test samples, the prediction framework of PLS is used to estimate the dummy matrix
<italic>Y</italic>
<sub>
<italic>test</italic>
</sub>
of an independent test set
<italic>X</italic>
<sub>
<italic>test</italic>
</sub>
[
<xref ref-type="bibr" rid="CR25">25</xref>
], where each row in
<italic>Y</italic>
<sub>
<italic>test</italic>
</sub>
sums to 1, and each column represents a class of the outcome. A class membership is assigned (predicted) to each test sample by using the maximal distance, as described in [
<xref ref-type="bibr" rid="CR18">18</xref>
]. This consists of assigning the class with the maximal positive value in
<italic>Y</italic>
<sub>
<italic>test</italic>
</sub>
.</p>
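<p>As a small illustration of the maximal distance rule, the sketch below assigns each test sample to the class with the largest predicted value. The predicted matrix is made up for the example and the function name is hypothetical.</p>
<preformat>
```python
import numpy as np

def predict_max_distance(Y_pred, class_names):
    """Assign each test sample the class with the largest predicted value."""
    return [class_names[k] for k in np.argmax(Y_pred, axis=1)]

# Hypothetical predicted dummy matrix for three test samples; rows sum to 1.
Y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.4, 0.5, 0.1]])
predicted = predict_max_distance(Y_pred, ["Fib", "hESC", "hiPSC"])
```
</preformat>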
<p>The main parameter to tune in MINT is the penalty
<italic>λ</italic>
<sub>
<italic>h</italic>
</sub>
for each PLS-component
<italic>h</italic>
, usually chosen by cross-validation (CV). In practice, the parameter
<italic>λ</italic>
<sub>
<italic>h</italic>
</sub>
can equivalently be replaced by the number of variables to select on each component, which is our preferred user-friendly option. The assessment criterion in the CV can be based on the proportion of misclassified samples, the proportion of false or true positives, or, as in our case, the balanced error rate (BER). BER is calculated as the average, over classes, of the proportion of wrongly classified samples in each class, and therefore gives more weight to classes with few samples. We consider BER to be a more objective performance measure than the overall misclassification error rate when dealing with unbalanced classes.
<italic>MINT</italic>
tuning is computationally efficient as it takes advantage of the group data structure in the integrative study. We used “Leave-One-Group-Out Cross-Validation” (LOGOCV), which consists of performing CV in which each group or study
<italic>m</italic>
is left out exactly once, for
<italic>m</italic>
=1,…,
<italic>M</italic>
. LOGOCV reflects the realistic scenario in which prediction is performed on independent external studies based on a reproducible signature identified on the training set. Finally, the total number of components
<italic>H</italic>
in
<italic>MINT</italic>
is set to
<italic>K</italic>
−1, where
<italic>K</italic>
is the number of classes, as in PLS-DA and
<italic>ℓ</italic>
<sup>1</sup>
penalised PLS-DA models [
<xref ref-type="bibr" rid="CR18">18</xref>
].</p>
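<p>The balanced error rate and the LOGOCV splitting scheme described above can be sketched as follows. The function names and toy labels are assumptions for illustration; in an actual tuning loop, a model would be fitted on the training indices of each split and evaluated on the left-out study.</p>
<preformat>
```python
import numpy as np

def balanced_error_rate(y_true, y_pred):
    """Average over classes of the per-class misclassification proportion."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == k] != k) for k in np.unique(y_true)]
    return float(np.mean(per_class))

def logocv_splits(study):
    """Yield (left_out_study, train_idx, test_idx), leaving each study out once."""
    study = np.asarray(study)
    for s in np.unique(study):
        yield str(s), np.flatnonzero(study != s), np.flatnonzero(study == s)

# Unbalanced toy example: predicting the majority class everywhere gives an
# overall error of 0.2, but a BER of (0 + 1) / 2 = 0.5.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
overall_error = float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))
ber = balanced_error_rate(y_true, y_pred)

study = ["s1", "s1", "s2", "s3", "s3", "s3"]
splits = list(logocv_splits(study))             # one fold per study
```
</preformat>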
</sec>
<sec id="Sec8">
<title>Case studies</title>
<p>We demonstrate the ability of
<italic>MINT</italic>
to identify the true positive genes on the MAQC project, then highlight the strong properties of our method to combine independent data sets in order to identify reproducible and predictive gene signatures on two other biological studies.</p>
<p>
<bold>The MicroArray quality control (MAQC) project.</bold>
The extensive MAQC project focused on assessing the reproducibility of microarray technologies in a controlled environment [
<xref ref-type="bibr" rid="CR5">5</xref>
]. Two reference RNA samples, Universal Human Reference (UHR) and Human Brain Reference (HBR), and two mixtures of the original samples were considered. Technical replicates were obtained from three different array platforms (Illumina, AffyHuGene and AffyPrime) for each of the four biological samples A (100% UHR), B (100% HBR), C (75% UHR, 25% HBR) and D (25% UHR, 75% HBR). Data were downloaded from Gene Expression Omnibus (GEO), accession GSE56457. In this study, we focused on identifying biomarkers that discriminate A vs. B and C vs. D. The experimental design is referenced in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1.</p>
<p>
<bold>Stem cells.</bold>
We integrated 15 transcriptomics microarray datasets to classify three types of human cells: human Fibroblasts (Fib), human Embryonic Stem Cells (hESC) and human induced Pluripotent Stem Cells (hiPSC). As there exists a biological hierarchy among these three cell types, two sub-classification problems are of interest in our analysis, which we will address simultaneously with
<italic>MINT</italic>
. On the one hand, differences between pluripotent (hiPSC and hESC) and non-pluripotent cells (Fib) are well-characterised and are expected to contribute to the main biological variation. Our first level of analysis will therefore benchmark
<italic>MINT</italic>
against the gold standard in the field. On the other hand, hiPSC are genetically reprogrammed to behave like hESC and both cell types are commonly assumed to be alike. However, differences have been reported in the literature [
<xref ref-type="bibr" rid="CR26">26</xref>–
<xref ref-type="bibr" rid="CR28">28</xref>
], justifying the second and more challenging level of classification analysis between hiPSC and hESC. We used the cell type annotations of the 342 samples as provided by the authors of the 15 studies.</p>
<p>The stem cell dataset provides an excellent showcase study to benchmark
<italic>MINT</italic>
against existing statistical methods to solve a rather ambitious classification problem.</p>
<p>Each of the 15 studies was assigned to either a training or test set. Platforms uniquely represented were assigned to the training set and studies with only one sample in one class were assigned to the test set. Remaining studies were randomly assigned to training or test set. Eventually, the training set included eight datasets (210 samples) derived on five commercial platforms and the independent test set included the remaining seven datasets (132 samples) derived on three platforms (Table
<xref rid="Tab1" ref-type="table">1</xref>
).
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Stem cells experimental design</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Experiment</th>
<th align="left">Platform</th>
<th align="left">Fib</th>
<th align="left">hESC</th>
<th align="left">hiPSC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Bock</td>
<td align="left">Affymetrix HT-HG-U133A</td>
<td align="left">6</td>
<td align="left">20</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Briggs</td>
<td align="left">Illumina HumanHT-12 V4</td>
<td align="left">18</td>
<td align="left">3</td>
<td align="left">30</td>
</tr>
<tr>
<td align="left">Chung</td>
<td align="left">Affymetrix HuGene-1.0-ST V1</td>
<td align="left">3</td>
<td align="left">8</td>
<td align="left">10</td>
</tr>
<tr>
<td align="left">Ebert</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">2</td>
<td align="left">5</td>
<td align="left">3</td>
</tr>
<tr>
<td align="left">Guenther</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">2</td>
<td align="left">17</td>
<td align="left">20</td>
</tr>
<tr>
<td align="left">Maherali</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">3</td>
<td align="left">3</td>
<td align="left">15</td>
</tr>
<tr>
<td align="left">Marchetto</td>
<td align="left">Affymetrix HuGene-1.0-ST V1</td>
<td align="left">6</td>
<td align="left">3</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Takahashi</td>
<td align="left">Agilent SurePrint G3 GE 8x60K</td>
<td align="left">3</td>
<td align="left">3</td>
<td align="left">3</td>
</tr>
<tr>
<td align="left">Total training set</td>
<td align="left">5 platforms</td>
<td align="left">43</td>
<td align="left">62</td>
<td align="left">105</td>
</tr>
<tr>
<td align="left">Andrade</td>
<td align="left">Affymetrix HuGene-1.0-ST V1</td>
<td align="left">3</td>
<td align="left">6</td>
<td align="left">15</td>
</tr>
<tr>
<td align="left">Hu</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">1</td>
<td align="left">5</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Kim</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">1</td>
<td align="left">1</td>
<td align="left">3</td>
</tr>
<tr>
<td align="left">Loewer</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">4</td>
<td align="left">2</td>
<td align="left">7</td>
</tr>
<tr>
<td align="left">Si-Tayeb</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">3</td>
<td align="left">6</td>
<td align="left">6</td>
</tr>
<tr>
<td align="left">Vitale</td>
<td align="left">Illumina HumanHT-12 V4</td>
<td align="left">8</td>
<td align="left">3</td>
<td align="left">18</td>
</tr>
<tr>
<td align="left">Yu</td>
<td align="left">Affymetrix HG-U133 Plus2</td>
<td align="left">2</td>
<td align="left">10</td>
<td align="left">16</td>
</tr>
<tr>
<td align="left">Total test set</td>
<td align="left">3 platforms</td>
<td align="left">22</td>
<td align="left">33</td>
<td align="left">77</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>A total of 15 studies were analysed, including three human cell types, human Fibroblasts (Fib), human Embryonic Stem Cells (hESC) and human induced Pluripotent Stem Cells (hiPSC) across five different types of microarray platforms. Eight studies from five microarray platforms were considered as a training set [
<xref ref-type="bibr" rid="CR57">57</xref>–
<xref ref-type="bibr" rid="CR64">64</xref>
] and seven independent studies from three of the five platforms were considered as a test set [
<xref ref-type="bibr" rid="CR65">65</xref>–
<xref ref-type="bibr" rid="CR71">71</xref>
]</p>
</table-wrap-foot>
</table-wrap>
</p>
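<p>The assignment rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually run for this study: the function name assign_studies, the precedence given to the platform rule, the fair coin used for the remaining studies and the seed are all assumptions, so the random part is not expected to reproduce the split shown in Table 1.</p>
<preformat>
```python
import random
from collections import Counter

# (study, platform, (Fib, hESC, hiPSC) sample counts), copied from Table 1.
STUDIES = [
    ("Bock", "Affymetrix HT-HG-U133A", (6, 20, 12)),
    ("Briggs", "Illumina HumanHT-12 V4", (18, 3, 30)),
    ("Chung", "Affymetrix HuGene-1.0-ST V1", (3, 8, 10)),
    ("Ebert", "Affymetrix HG-U133 Plus2", (2, 5, 3)),
    ("Guenther", "Affymetrix HG-U133 Plus2", (2, 17, 20)),
    ("Maherali", "Affymetrix HG-U133 Plus2", (3, 3, 15)),
    ("Marchetto", "Affymetrix HuGene-1.0-ST V1", (6, 3, 12)),
    ("Takahashi", "Agilent SurePrint G3 GE 8x60K", (3, 3, 3)),
    ("Andrade", "Affymetrix HuGene-1.0-ST V1", (3, 6, 15)),
    ("Hu", "Affymetrix HG-U133 Plus2", (1, 5, 12)),
    ("Kim", "Affymetrix HG-U133 Plus2", (1, 1, 3)),
    ("Loewer", "Affymetrix HG-U133 Plus2", (4, 2, 7)),
    ("Si-Tayeb", "Affymetrix HG-U133 Plus2", (3, 6, 6)),
    ("Vitale", "Illumina HumanHT-12 V4", (8, 3, 18)),
    ("Yu", "Affymetrix HG-U133 Plus2", (2, 10, 16)),
]


def assign_studies(studies, seed=0):
    """Map each study name to 'training' or 'test' (hypothetical helper)."""
    platform_counts = Counter(platform for _, platform, _ in studies)
    rng = random.Random(seed)  # assumption: any seeded generator would do
    assignment = {}
    for name, platform, counts in studies:
        if platform_counts[platform] == 1:
            # Platform represented by a single study: keep it for training.
            assignment[name] = "training"
        elif min(counts) == 1:
            # Some class has a single sample: send the study to the test set.
            assignment[name] = "test"
        else:
            # Assumption: remaining studies are split with a fair coin.
            assignment[name] = rng.choice(["training", "test"])
    return assignment
```
</preformat>
<p>With the counts of Table 1, the two deterministic rules place Bock and Takahashi in the training set and Hu and Kim in the test set; the other eleven studies depend on the random draw.</p>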
<p>The pre-processed files were downloaded from the
<ext-link ext-link-type="uri" xlink:href="http://www.stemformatics.org">http://www.stemformatics.org</ext-link>
collaborative platform [
<xref ref-type="bibr" rid="CR29">29</xref>
]. Each dataset was background corrected, log2 transformed, YuGene normalised and mapped from probe IDs to Ensembl gene IDs as previously described in [
<xref ref-type="bibr" rid="CR11">11</xref>
], resulting in 13,313 unique Ensembl gene identifiers. Where a dataset contained multiple probes for the same Ensembl gene ID, the most highly expressed probe was chosen to represent that gene in that dataset. The choice of YuGene normalisation was motivated by the need to normalise each sample independently rather than as part of a whole study (as with existing methods such as ComBat [
<xref ref-type="bibr" rid="CR7">7</xref>
], quantile normalisation (RMA [
<xref ref-type="bibr" rid="CR30">30</xref>
])), to effectively limit over-fitting during the CV evaluation process.</p>
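<p>The probe-to-gene step above can be illustrated as follows. The sketch assumes that the "most highly expressed" probe is the one with the highest mean expression across the samples of a dataset, which the text does not specify; the function name collapse_probes and the identifiers in the usage note are hypothetical.</p>
<preformat>
```python
from statistics import mean


def collapse_probes(expression, probe_to_gene):
    """Keep one probe per gene within a single dataset.

    expression: dict mapping probe ID to a list of values (one per sample).
    probe_to_gene: dict mapping probe ID to a gene ID.
    Returns a dict mapping gene ID to the values of the retained probe.
    """
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probes without a gene mapping are dropped
        score = mean(values)  # assumption: rank probes by mean expression
        if gene not in best or score > best[gene][0]:
            best[gene] = (score, values)
    return {gene: values for gene, (_, values) in best.items()}
```
</preformat>
<p>For example, if probes p1 and p2 both map to gene G1 and p2 has the higher mean, only the values of p2 are kept for G1.</p>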
<p>
<bold>Breast cancer.</bold>
We combined whole-genome gene-expression data from two cohorts from the Molecular Taxonomy of Breast Cancer International Consortium project (METABRIC, [
<xref ref-type="bibr" rid="CR31">31</xref>
]) and two cohorts from the Cancer Genome Atlas (TCGA, [
<xref ref-type="bibr" rid="CR32">32</xref>
]) to classify the intrinsic subtypes
<italic>Basal, HER2, Luminal A</italic>
and
<italic>Luminal B</italic>
, as defined by the PAM50 signature [
<xref ref-type="bibr" rid="CR20">20</xref>
]. The METABRIC cohorts data were made available upon request, and were processed by [
<xref ref-type="bibr" rid="CR31">31</xref>
]. The TCGA cohorts comprise gene-expression data from RNA-seq and microarray platforms. RNA-seq data were normalised using RNA-Seq by Expectation Maximisation (RSEM) and provided as percentile-ranked gene-level transcription estimates. The microarray data were processed as described in [
<xref ref-type="bibr" rid="CR32">32</xref>
].</p>
<p>The training set consisted of three cohorts (TCGA RNA-seq and both METABRIC microarray studies), including the expression levels of 15,803 genes on 2,814 samples; the test set included the TCGA microarray cohort with 254 samples (Table
<xref rid="Tab2" ref-type="table">2</xref>
). Two analyses were conducted, which either included or discarded the PAM50 genes from the data. The first analysis aimed at recovering the PAM50 genes used to classify the samples. The second analysis was performed on 15,755 genes and aimed at identifying an alternative signature to the PAM50.
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Experimental design of four breast cancer cohorts including 4 cancer subtypes:
<italic>Basal, HER2, Luminal A</italic>
(LumA) and
<italic>Luminal B</italic>
(LumB)</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Experiment</th>
<th align="left">Platform</th>
<th align="left">Basal</th>
<th align="left">HER2</th>
<th align="left">LumA</th>
<th align="left">LumB</th>
</tr>
</thead>
<tbody>
<tr>
<td align="justify">METABRIC Discovery</td>
<td align="justify">Illumina HT-12 v3</td>
<td align="left">118</td>
<td align="left">87</td>
<td align="left">466</td>
<td align="left">268</td>
</tr>
<tr>
<td align="justify">METABRIC Validation</td>
<td align="justify">Illumina HT-12 v3</td>
<td align="left">213</td>
<td align="left">153</td>
<td align="left">255</td>
<td align="left">224</td>
</tr>
<tr>
<td align="justify">TCGA RNA-seq</td>
<td align="justify">Illumina HiSeq 2000</td>
<td align="left">188</td>
<td align="left">80</td>
<td align="left">549</td>
<td align="left">213</td>
</tr>
<tr>
<td align="justify">Total training set</td>
<td align="justify">2 platforms</td>
<td align="left">519</td>
<td align="left">320</td>
<td align="left">1270</td>
<td align="left">705</td>
</tr>
<tr>
<td align="justify">TCGA microarray</td>
<td align="justify">Agilent custom 244K</td>
<td align="left">57</td>
<td align="left">31</td>
<td align="left">99</td>
<td align="left">67</td>
</tr>
<tr>
<td align="justify">Total test set</td>
<td align="justify">1 platform</td>
<td align="left">57</td>
<td align="left">31</td>
<td align="left">99</td>
<td align="left">67</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
</sec>
<sec id="Sec12">
<title>Performance comparison with sequential classification approaches</title>
<p>We compared
<italic>MINT</italic>
with sequential approaches that combine batch-effect removal approaches with classification methods. As a reference, classification methods were also used on their own on a naive concatenation of all studies. Batch-effect removal methods included Batch Mean-Centering (BMC, [
<xref ref-type="bibr" rid="CR9">9</xref>
]), ComBat [
<xref ref-type="bibr" rid="CR7">7</xref>
], linear models (LM) or linear mixed models (LMM), and classification methods included PLS-DA, sPLS-DA [
<xref ref-type="bibr" rid="CR18">18</xref>
], mgPLS [
<xref ref-type="bibr" rid="CR22">22</xref>
,
<xref ref-type="bibr" rid="CR23">23</xref>
] and Random forests (RF [
<xref ref-type="bibr" rid="CR12">12</xref>
]). For LM and LMM, linear models were fitted on each gene and the residuals were extracted as a batch-corrected gene expression [
<xref ref-type="bibr" rid="CR33">33</xref>
,
<xref ref-type="bibr" rid="CR34">34</xref>
]. The study effect was set as a fixed effect with LM or as a random effect with LMM. No sample outcome (e.g. cell-type) was included.</p>
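<p>As an illustration of the LM correction described above, the sketch below handles a single gene. It relies on the fact that, under ordinary least squares with study as the only (categorical) covariate, the fitted value of a sample is the mean of its study, so the residual is the value minus that study mean. The function name lm_study_residuals is hypothetical, and the LMM variant, which would shrink the study effects, is not shown.</p>
<preformat>
```python
from collections import defaultdict
from statistics import mean


def lm_study_residuals(values, studies):
    """Residuals of one gene after fitting study as a fixed effect.

    values: expression of the gene, one value per sample.
    studies: study label of each sample, in the same order.
    """
    by_study = defaultdict(list)
    for value, study in zip(values, studies):
        by_study[study].append(value)
    # OLS fitted value for a one-factor model = mean of the sample's study.
    study_mean = {study: mean(vals) for study, vals in by_study.items()}
    return [value - study_mean[study] for value, study in zip(values, studies)]
```
</preformat>
<p>Applying the function gene by gene yields the batch-corrected matrix; the sample outcome is deliberately left out of the model, as stated above.</p>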
<p>Predictions with ComBat-normalised data were obtained as described in [
<xref ref-type="bibr" rid="CR19">19</xref>
]. In this study, we did not include methods that require extra information (such as control genes with RUV-2 [
<xref ref-type="bibr" rid="CR4">4</xref>
]) or methods that are not widely available to the community, such as LMM-EH [
<xref ref-type="bibr" rid="CR10">10</xref>
]. Classification methods were chosen so as to simultaneously discriminate all classes. With the exception of sPLS-DA, none of those methods performs internal variable selection. The multivariate methods PLS-DA, mgPLS and sPLS-DA were run on
<italic>K</italic>
−1 components; sPLS-DA was tuned using 5-fold CV on each component. All classification methods were combined with batch-removal methods, with the exception of mgPLS, which already includes a study structure in the model.</p>
<p>MINT and PLS-DA-like approaches use a prediction threshold based on distances (see “
<xref rid="Sec7" ref-type="sec">Class prediction and parameters tuning with
<italic>MINT</italic>
</xref>
” section) that optimally determines class membership of test samples, and as such do not require receiver operating characteristic (ROC) curves or area under the curve (AUC) performance measures. In addition, those measures are limited to binary classification and therefore do not apply to our multi-class stem cell and breast cancer studies. Instead, we use the Balanced Error Rate (BER) to evaluate the classification and prediction performance of the methods when class sample sizes are unbalanced (“
<xref rid="Sec6" ref-type="sec">
<italic>MINT</italic>
</xref>
” section). Classification accuracies for each class were also reported.</p>
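<p>For readers unfamiliar with the measure, the sketch below computes a balanced error rate under its usual definition, the average over classes of the proportion of misclassified samples in each class, expressed in percent. This is an assumed definition given for illustration; the function name balanced_error_rate is hypothetical.</p>
<preformat>
```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Average per-class error rate, in percent."""
    class_size = Counter(y_true)
    class_errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    per_class = [class_errors[c] / class_size[c] for c in class_size]
    return 100.0 * sum(per_class) / len(per_class)
```
</preformat>
<p>With four classes, a classifier that assigns every sample to the same class is entirely correct on one class and entirely wrong on the other three, giving a BER of 75 whatever the class sizes; overall accuracy would hide this when one class dominates.</p>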
</sec>
</sec>
<sec id="Sec13" sec-type="results">
<title>Results</title>
<sec id="Sec14">
<title>Validation of the
<italic>MINT</italic>
approach to identify signatures agnostic to batch effect</title>
<p>The MAQC project processed technical replicates of four well-characterised biological samples A, B, C and D across three platforms. Thus, we assumed that genes that are differentially expressed (DEG) in every single platform are true positives. We primarily focused on identifying biomarkers that discriminate C vs. D, and report the results of A vs. B in the Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Supplemental Material S3, Figure S3. Differential expression analysis of C vs. D was conducted on each of the three microarray platforms using ANOVA, showing an overlap of 1385 DEG (FDR <10
<sup>−3</sup>
[
<xref ref-type="bibr" rid="CR35">35</xref>
]), which we considered as true positives. This corresponded to 62.6% of all DEG for Illumina, 30.5% for AffyHuGene and 21.0% for AffyPrime (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S4). We observed that conducting a differential analysis on the concatenated data from the three microarray platforms without accounting for batch effects resulted in 691 DEG, of which only 56% (387) were true positive genes. This implies that the remaining 44% (304) of these genes were false positives, and hence were not differentially expressed in at least one study. The high percentage of false positives was explained by a Principal Component Analysis (PCA) sample plot that showed samples clustering by platform (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S4), which confirmed that the major source of variation in the combined data was attributed to platforms rather than cell types.</p>
<p>MINT selected a single gene, BCAS1, to discriminate the two biological classes C and D. BCAS1 was a true positive gene, as part of the common DEG, and was ranked 1 for Illumina, 158 for AffyPrime and 1182 for AffyHuGene. Since the biological samples C and D are very different, the selection of one single gene by MINT was not surprising. To further investigate the performance of MINT, we expanded the number of genes selected by MINT, by decreasing its sparsity parameter (see
<xref rid="Sec2" ref-type="sec">Methods</xref>
), and compared the overlap between this larger MINT signature and the true positive genes. We observed an overlap of 100% for a MINT signature of size 100, and an overlap of 89% for a signature of size 1385, which is the number of common DEG identified previously. The high percentage of true positives selected by MINT demonstrates its ability to identify a signature agnostic to batch effects.</p>
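<p>The percentages reported in this section are proportions of a selected gene list that fall inside the reference set of common DEG. A minimal sketch of that calculation is given below; the function name true_positive_percent is hypothetical and the gene identifiers used to exercise it are placeholders, not MAQC genes.</p>
<preformat>
```python
def true_positive_percent(signature, reference):
    """Percentage of the selected genes that belong to the reference set."""
    selected = set(signature)
    if not selected:
        raise ValueError("the signature is empty")
    return 100.0 * len(selected.intersection(reference)) / len(selected)
```
</preformat>
<p>For instance, a list of 691 genes of which 387 belong to the reference set gives 387/691, i.e. about 56%, leaving 304 genes (about 44%) outside it.</p>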
</sec>
<sec id="Sec15">
<title>Limitations of common meta-analysis and integrative approaches</title>
<p>A meta-analysis of eight stem cell studies, each including three cell types (Table
<xref rid="Tab1" ref-type="table">1</xref>
, stem cell training set), highlighted a small overlap of DEG lists obtained from the analysis of each separate study (FDR <10
<sup>−5</sup>
, ANOVA, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S2). Indeed, the Takahashi study with only 24 DEG limited the overlap between all eight studies to only 5 DEG. This represents a major limitation of merging pre-analysed gene lists as the concordance between DEG lists decreases when the number of studies increases.</p>
<p>One alternative to meta-analysis is to perform an integrative analysis by concatenating all eight studies. Similarly to the MAQC analysis, we first observed that the major source of variation in the combined data was attributed to study rather than cell type (Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">a</xref>
). PLS-DA was applied to discriminate the samples according to their cell types, and it showed a strong study variation (Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">b</xref>
), despite being a supervised analysis. Compared to unsupervised PCA (Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">a</xref>
), the study effect was reduced for the fibroblast cells, but was still present for the similar cell types hESC and hiPSC. We reached similar conclusions when analysing the breast cancer data (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Supplemental Material S4, Figure S5).
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Stem cell study.
<bold>a</bold>
PCA on the concatenated data: a greater study variation than a cell type variation is observed.
<bold>b</bold>
PLSDA on the concatenated data clustered Fibroblasts only.
<bold>c</bold>
<italic>MINT</italic>
sample plot shows that each cell type is well clustered,
<bold>d</bold>
<italic>MINT</italic>
performance: BER and classification accuracy for each cell type and each study</p>
</caption>
<graphic xlink:href="12859_2017_1553_Fig1_HTML" id="MO2"></graphic>
</fig>
</p>
</sec>
<sec id="Sec16">
<title>
<italic>MINT</italic>
outperforms state-of-the-art methods</title>
<p>We compared the classification accuracy of
<italic>MINT</italic>
to sequential methods where batch removal methods were applied prior to classification methods. In both stem cell and breast cancer studies, MINT led to the best accuracy on the training set and the best reproducibility of the classification model on the test set (lowest Balanced Error Rate, BER, Fig.
<xref rid="Fig2" ref-type="fig">2</xref>
, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figures S6 and S7). In addition, MINT consistently ranked first as the best performing method, followed by ComBat+sPLSDA with an average rank of 4.5 (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S8).
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Classification accuracy for both training and test set for the stem cells and breast cancer studies (excluding PAM50 genes). The classification Balanced Error Rates (BER) are reported for all sixteen methods compared with MINT (
<italic>in black</italic>
)</p>
</caption>
<graphic xlink:href="12859_2017_1553_Fig2_HTML" id="MO3"></graphic>
</fig>
</p>
<p>On the stem cell data, we found that fibroblasts were the easiest to classify for all methods, including those that do not accommodate unwanted variation (PLS-DA, sPLS-DA and RF, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S6). Classifying hiPSC vs. hESC proved more challenging for all methods, leading to a substantially lower classification accuracy than fibroblasts.</p>
<p>The analysis of the breast cancer data (excluding PAM50 genes) showed that methods that do not accommodate unwanted variation were able to correctly classify most of the samples from the training set, but failed to discriminate the four subtypes on the external test set: all samples were predicted as
<italic>LumB</italic>
with PLS-DA and sPLS-DA, or
<italic>Basal</italic>
with RF (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S7). Thus, RF gave a satisfactory performance on the training set (BER =18.5), but a poor performance on the test set (BER =75).</p>
<p>Additionally, we observed that the biomarker selection process substantially improved classification accuracy. On the stem cell data, LM+sPLSDA and
<italic>MINT</italic>
outperformed their non sparse counterparts LM+PLSDA and mgPLS (Fig.
<xref rid="Fig2" ref-type="fig">2</xref>
, BER of 9.8 and 7.1 vs. 20.8 and 11.9, respectively).</p>
<p>Finally,
<italic>MINT</italic>
was largely superior in terms of computational efficiency. The training step on the stem cell data, which includes 210 samples and 13,313 genes, was run in 1 s, compared to 8 s with the second best performing method ComBat+sPLS-DA (2013 MacBook Pro 2.6 GHz, 16 GB memory). The popular method ComBat took 7.1
<italic>s</italic>
to run, and sPLS-DA 0.9
<italic>s</italic>
. The training step on the breast cancer data, which includes 2814 samples and 15,755 genes, was run in 37
<italic>s</italic>
for MINT and 71.5
<italic>s</italic>
for ComBat(30.8
<italic>s</italic>
)+sPLS-DA(40.6
<italic>s</italic>
).</p>
</sec>
<sec id="Sec17">
<title>Study-specific outputs with
<italic>MINT</italic>
</title>
<p>One of the main challenges when combining independent studies is to assess the concordance between studies. During the integration procedure, MINT provides not only a performance assessment for each individual study, but also insightful graphical outputs that are study-specific and can serve as a quality control step to detect outlier studies. One particular example is the Takahashi study from the stem cell data, whose poor performance (Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">d</xref>
) was further confirmed on the study-specific outputs (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S9). Of note, this study was the only one generated through Agilent technology and its sample size only accounted for 4.3% of the training set.</p>
<p>The sample plots from each individual breast cancer data set showed the strong ability of MINT to discriminate the breast cancer subtypes while integrating data sets generated from disparate transcriptomics platforms, microarrays and RNA-sequencing (Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
<xref rid="Fig3" ref-type="fig">a</xref>–
<xref rid="Fig3" ref-type="fig">c</xref>
). Those data sets were all differently pre-processed, and yet MINT was able to model an overall agreement between all studies;
<italic>MINT</italic>
successfully built a space based on a handful of genes in which samples from each study are discriminated in a homogeneous manner.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>
<italic>MINT</italic>
study-specific sample plots showing the projection of samples from
<bold>a</bold>
METABRIC Discovery,
<bold>b</bold>
METABRIC Validation and
<bold>c</bold>
TCGA-RNA-seq experiments, in the same subspace spanned by the first two MINT components. The same subspace is also used to plot the (
<bold>d</bold>
) overall (integrated) data.
<bold>e</bold>
Balanced Error Rate and classification accuracy for each study and breast cancer subtype from the MINT analysis</p>
</caption>
<graphic xlink:href="12859_2017_1553_Fig3_HTML" id="MO4"></graphic>
</fig>
</p>
</sec>
<sec id="Sec18">
<title>
<italic>MINT</italic>
gene signature identified promising biomarkers</title>
<p>MINT is a multivariate approach that builds successive components to discriminate all categories (classes) indicated in an outcome variable. On the stem cell data,
<italic>MINT</italic>
selected 2 and 15 genes on the first two components respectively (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S3). The first component clearly segregated the non-pluripotent cells (fibroblasts) from the two pluripotent cell types (hiPSC and hESC) (Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">c</xref>
,
<xref rid="Fig1" ref-type="fig">d</xref>
). The two pluripotent cell types were subsequently separated on component two, with some expected overlap given the similarities between hiPSC and hESC. The two genes selected by MINT on component 1 were LIN28A and CAR, both of which were found to be relevant in the literature. Indeed, LIN28A was shown to be highly expressed in ESCs compared to fibroblasts [
<xref ref-type="bibr" rid="CR36">36</xref>
,
<xref ref-type="bibr" rid="CR37">37</xref>
] and CAR has been associated with pluripotency [
<xref ref-type="bibr" rid="CR38">38</xref>
]. Finally, despite the high heterogeneity of hiPSC cells included in this study, MINT gave a high accuracy for hESC and hiPSC on independent test sets (93.9% and 77.9% respectively, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S6), suggesting that the 15 genes selected by MINT on component 2 have a high potential to explain the differences between those cell types (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S3).</p>
<p>On the breast cancer study, we performed two analyses which either included or discarded the PAM50 genes that were used to define the four cancer subtypes
<italic>Basal, HER2, Luminal A</italic>
and
<italic>Luminal B</italic>
[
<xref ref-type="bibr" rid="CR20">20</xref>
]. In the first analysis, we aimed to assess the ability of MINT to specifically identify the PAM50 key driver genes.
<italic>MINT</italic>
successfully recovered 37 of the 48 PAM50 genes present in the data (77%) on the first three components (7, 20 and 10 respectively). The overall signature included 30, 572 and 636 genes on each component (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S4), i.e. 7.8
<italic>%</italic>
of the total number of genes in the data. The performance of
<italic>MINT</italic>
(BER of 17.8 on the training set and 11.6 on the test set) was superior to that of a PLS-DA performed on the PAM50 genes only (BER of 20.8 on the training set and a very high 75 on the test set). This result shows that the genes selected by MINT offer a complementary characterisation to the PAM50 genes.</p>
<p>In the second analysis, we aimed to provide an alternative signature to the PAM50 genes by omitting them from the analysis.
<italic>MINT</italic>
identified 11, 272 and 253 genes on the first three components respectively (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S5 and Figure S10). The genes selected on the first component gradually differentiated
<italic>Basal, HER2</italic>
and
<italic>Luminal A/B</italic>
, while the second component genes further differentiated
<italic>Luminal A</italic>
from
<italic>Luminal B</italic>
(Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
<xref rid="Fig3" ref-type="fig">d</xref>
). The classification performance was similar in each study (Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
<xref rid="Fig3" ref-type="fig">e</xref>
), highlighting an excellent reproducibility of the biomarker signature across cohorts and platforms.</p>
<p>Among the 11 genes selected by MINT on the first component, GATA3 is a transcription factor that regulates luminal epithelial cell differentiation in the mammary glands [
<xref ref-type="bibr" rid="CR39">39</xref>
,
<xref ref-type="bibr" rid="CR40">40</xref>
]; it was found to be implicated in luminal types of breast cancer [
<xref ref-type="bibr" rid="CR41">41</xref>
] and was recently investigated for its prognostic significance [
<xref ref-type="bibr" rid="CR42">42</xref>
]. The MYB protein plays an essential role in haematopoiesis and has been associated with carcinogenesis [
<xref ref-type="bibr" rid="CR43">43</xref>
,
<xref ref-type="bibr" rid="CR44">44</xref>
]. Other genes present in our
<italic>MINT</italic>
gene signature include XBP1 [
<xref ref-type="bibr" rid="CR45">45</xref>
], AGR3 [
<xref ref-type="bibr" rid="CR46">46</xref>
], CCDC170 [
<xref ref-type="bibr" rid="CR47">47</xref>
] and TFF3 [
<xref ref-type="bibr" rid="CR48">48</xref>
] that were reported as being associated with breast cancer. The remaining genes have not been widely associated with breast cancer. For instance, TBC1D9 has been described as overexpressed in cancer patients [
<xref ref-type="bibr" rid="CR49">49</xref>
,
<xref ref-type="bibr" rid="CR50">50</xref>
]. DNALI1 was first identified for its role in breast cancer in [
<xref ref-type="bibr" rid="CR51">51</xref>
] but no further investigation has been reported. Although AFF3 has never been associated with breast cancer, it was recently proposed to play a pivotal role in adrenocortical carcinoma [
<xref ref-type="bibr" rid="CR52">52</xref>
]. It is worth noting that these 11 genes were all included in the 30 genes previously selected when the PAM50 genes were included, and are therefore valuable candidates to complement the PAM50 gene signature as well as to further characterise breast cancer subtypes.</p>
</sec>
</sec>
<sec id="Sec19" sec-type="discussion">
<title>Discussion</title>
<p>There is a growing need in the biological and computational community for tools that can integrate data from different microarray platforms with the aim of classifying samples (integrative classification). Although several efficient methods have been proposed to address the unwanted systematic variation when integrating data [
<xref ref-type="bibr" rid="CR4">4</xref>
,
<xref ref-type="bibr" rid="CR7">7</xref>
,
<xref ref-type="bibr" rid="CR9">9</xref>–
<xref ref-type="bibr" rid="CR11">11</xref>
], these are usually applied as a pre-processing step before performing classification. Such a sequential approach may lead to overfitting and over-optimistic results due to the use of transductive modelling (such as prediction based on ComBat-normalised data [
<xref ref-type="bibr" rid="CR19">19</xref>
]) and the use of a test set that is normalised or pre-processed with the training set. To address this crucial issue, we proposed a new Multivariate INTegrative method, MINT, that simultaneously corrects for batch effects, classifies samples and selects the most discriminant biomarkers across studies.</p>
<p>MINT seeks to identify a common projection space for all studies that is defined on a small subset of discriminative variables and that displays an analogous discrimination of the samples across studies. Therefore, MINT provides sample plots and classification performance specific to each study (Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
). Among the compared methods, MINT was found to be the fastest and most accurate method to integrate and classify data from different microarray and RNA-seq platforms.</p>
<p>Integrative approaches such as MINT are essential when combining multiple studies of complex data to limit spurious conclusions from any downstream analysis. Current methods showed a high proportion of false positives (44% on MAQC data) and exhibited very poor prediction accuracy (PLS-DA, sPLS-DA and RF, Fig.
<xref rid="Fig2" ref-type="fig">2</xref>
). For instance, RF was ranked second only to MINT on the breast cancer learning set, but it was ranked as the worst method on the test set. This reflects the absence of controlling for batch effects in these methods and supports the argument that assessing the presence of batch effects is a key preliminary step. Failure to do so, as shown in our study, can result in poor reproducibility of results in subsequent studies, and this would not be detected without an independent test set.</p>
<p>We assessed the ability of
<italic>MINT</italic>
to identify relevant gene signatures that are reproducible and platform-agnostic. MINT successfully integrated data from the MAQC project by selecting true positive genes that were also differentially expressed in each experiment. We also assessed MINT’s capabilities in analysing stem cell and breast cancer data. In these studies, MINT displayed the highest classification accuracy in the training sets and the highest prediction accuracy in the testing sets, when compared to sixteen sequential procedures (Fig.
<xref rid="Fig2" ref-type="fig">2</xref>
). These results suggest that, in addition to being highly predictive, the discriminant variables identified by MINT are also of strong biological relevance.</p>
<p>In the stem cell data, MINT identified two genes, LIN28A and CAR, to discriminate non-pluripotent cells (fibroblasts) from pluripotent cells (hiPSC and hESC). Pluripotency is well-documented in the literature and OCT4 is currently the main known marker for undifferentiated cells [
<xref ref-type="bibr" rid="CR53">53</xref>–
<xref ref-type="bibr" rid="CR56">56</xref>
]. However, MINT did not select OCT4 on the first component but instead identified two markers, LIN28A and CAR, that were ranked higher than OCT4 in the DEG list obtained on the concatenated data (see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figures S11 and S12). While the results from MINT still supported OCT4 as a marker of pluripotency, our analysis suggests that LIN28A and CAR are stronger reproducible markers of differentiated cells, and could therefore be superior as substitutes for, or complements to, OCT4. Experimental validation would be required to further assess the potential of LIN28A or CAR as efficient markers.</p>
<p>Several important issues require consideration when dealing with the general task of integrating data. First and foremost, sample classification is crucial and needs to be well defined. This had to be addressed in our analyses of the stem cell and breast cancer studies, which were generated by multiple research groups on different microarray and RNA-seq platforms. For instance, the breast cancer subtype classification relied on the PAM50 intrinsic classifier proposed by [
<xref ref-type="bibr" rid="CR20">20</xref>
], which we admit is still controversial in the literature [
<xref ref-type="bibr" rid="CR31">31</xref>
]. Similarly, the biological definition of hiPSC differs across research groups [
<xref ref-type="bibr" rid="CR26">26</xref>
,
<xref ref-type="bibr" rid="CR28">28</xref>
], which results in poor reproducibility among experiments and makes the integration of stem cell studies challenging [
<xref ref-type="bibr" rid="CR21">21</xref>
].</p>
<p>The expertise and exhaustive screening required to annotate samples homogeneously hinder data integration, and because annotation takes place upstream of the statistical analysis, data integration approaches, including MINT, cannot address it.</p>
<p>A second issue in the general process of integrating data sets from different sources is data access and normalisation. Raw data are often unavailable, so the data sets being integrated have frequently each been normalised differently, as was the case with the breast cancer data in our study. Despite this limitation, MINT produced satisfactory results in that study. We were able to overcome this issue for the stem cell data by using the Stemformatics resource [
<xref ref-type="bibr" rid="CR29">29</xref>
] where we had direct access to homogeneously pre-processed data (background correction, log2- and YuGene-transformed [
<xref ref-type="bibr" rid="CR11">11</xref>
]). In general, variation in the normalisation processes of different data sets produces unwanted variation between studies, and we recommend avoiding it where possible.</p>
<p>A final important issue in data integration involves accounting for both between-study differences and platform effects. When samples cluster by study and studies cluster by platform, the experimental platform, and not the study, is the largest source of variation (e.g. 75% of the variance in the breast cancer data, Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Figure S5). Indeed, there are inherent differences between commercial platforms that greatly magnify unwanted variability, as was discussed by [
<xref ref-type="bibr" rid="CR5">5</xref>
] on the MAQC project. As platform information and study effects are nested,
<italic>MINT</italic>
and other data integration methods disregard the platform information and focus on the study effect only, since each study is assumed to have been run on a single platform. MINT successfully integrated microarray and RNA-seq data, which suggests that such an approach is likely to be sufficient in most scenarios.</p>
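<p>As an illustration of the nesting assumption described above, the following minimal Python sketch checks that every study was run on exactly one platform. It is not part of MINT or mixOmics; the function names and the sample annotations are hypothetical and are used only to make the idea concrete.</p>
<preformat><![CDATA[
```python
from collections import defaultdict


def platforms_per_study(study, platform):
    """Map each study label to the set of platforms its samples were run on."""
    seen = defaultdict(set)
    for s, p in zip(study, platform):
        seen[s].add(p)
    return dict(seen)


def studies_nested_in_platforms(study, platform):
    """Return True when every study was run on exactly one platform."""
    return all(len(p) == 1 for p in platforms_per_study(study, platform).values())


# Hypothetical annotations: two microarray studies and one RNA-seq study.
study = ["s1", "s1", "s2", "s2", "s3", "s3"]
platform = ["microarray"] * 4 + ["RNA-seq"] * 2
nested = studies_nested_in_platforms(study, platform)
```
]]></preformat>
<p>When such a check holds, the study label already determines the platform, which is consistent with modelling the study effect only.</p>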
<p>When applying
<italic>MINT</italic>
, additional considerations need to be taken into account. To reduce unwanted systematic variation, the method centers and scales each study as an initial step, similarly to BMC [
<xref ref-type="bibr" rid="CR9">9</xref>
]. Therefore, only studies with a sample size >3 can be included, whether in a training or a test set. In addition, all outcome categories need to be represented in each study. Indeed, neither MINT nor any other classification method can perform satisfactorily in the extreme case where each study contains only one outcome category, because the outcome and the study effect cannot be distinguished in that case.</p>
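<p>The two requirements and the initial per-study centering and scaling can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the mixOmics implementation: the function names, the constant MIN_STUDY_SIZE and the toy data are hypothetical, and the choice to only center a variable that is constant within a study is ours.</p>
<preformat><![CDATA[
```python
from collections import defaultdict
from statistics import mean, stdev

MIN_STUDY_SIZE = 4  # "sample size > 3", as stated in the text


def check_studies(study, outcome):
    """Raise ValueError if a study is too small or lacks an outcome category."""
    all_classes = set(outcome)
    by_study = defaultdict(list)
    for s, y in zip(study, outcome):
        by_study[s].append(y)
    for s, ys in by_study.items():
        if len(ys) < MIN_STUDY_SIZE:
            raise ValueError(f"study {s!r} has only {len(ys)} samples")
        missing = all_classes - set(ys)
        if missing:
            raise ValueError(f"study {s!r} lacks outcome(s) {sorted(missing)}")


def scale_within_study(X, study):
    """Center and scale each variable (column) separately within each study."""
    rows_by_study = defaultdict(list)
    for i, s in enumerate(study):
        rows_by_study[s].append(i)
    n_var = len(X[0])
    Z = [[0.0] * n_var for _ in X]
    for rows in rows_by_study.values():
        for j in range(n_var):
            col = [X[i][j] for i in rows]
            m, sd = mean(col), stdev(col)
            for i in rows:
                # Assumption: a variable constant within a study is only centered.
                Z[i][j] = (X[i][j] - m) / sd if sd > 0 else 0.0
    return Z


# Toy data: two studies measuring the same two variables on different scales.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0],
     [101.0, 5.0], [102.0, 6.0], [103.0, 7.0], [104.0, 8.0]]
study = ["A"] * 4 + ["B"] * 4
outcome = ["Fib", "hESC", "Fib", "hESC"] * 2
check_studies(study, outcome)
Z = scale_within_study(X, study)
```
]]></preformat>
<p>After this step the two toy studies share the same location and spread for every variable, so the large offset between them no longer dominates. If each study contained a single outcome category, check_studies would reject the input, because the study and outcome labels would then be indistinguishable.</p>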
</sec>
<sec id="Sec20" sec-type="conclusion">
<title>Conclusion</title>
<p>We introduced MINT, a novel Multivariate INTegrative method, that is the first approach to integrate independent transcriptomics studies from different microarray and RNA-seq platforms by
<italic>simultaneously</italic>
correcting for batch effects, classifying samples and identifying key discriminant variables. We first validated the ability of MINT to select true positive genes when integrating the MAQC data across different platforms. MINT was then compared to sixteen sequential approaches and was shown to be the fastest and most accurate method to discriminate and predict three human cell types (human fibroblasts, human embryonic stem cells and human induced pluripotent stem cells) and four subtypes of breast cancer (Basal, HER2, Luminal A and Luminal B). The gene signatures identified by MINT contained existing and novel biomarkers that were strong candidates for improved characterisation of the phenotype of interest. In conclusion, MINT enables reliable integration and analysis of independent genomic data sets, outperforms existing sequential methods, and identifies reproducible genetic predictors across data sets. MINT is available through the mixMINT module in the mixOmics R package.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Additional file</title>
<sec id="Sec21">
<p>
<supplementary-material content-type="local-data" id="MOESM1">
<media xlink:href="12859_2017_1553_MOESM1_ESM.pdf">
<label>Additional file 1</label>
<caption>
<p>Supplementary material. This pdf document contains supplementary methods and all supplementary Figures and Tables. Specifically, it provides the PLS-algorithm, the extension of
<italic>MINT</italic>
in a regression framework, the application to the MAQC data (A vs B), the meta-analysis of the breast cancer data, the classification accuracy of the tested methods on the stem cells and breast cancer data, and details on the signature genes identified by
<italic>MINT</italic>
on the stem cells and breast cancer data. (PDF 4403 kb)</p>
</caption>
</media>
</supplementary-material>
</p>
</sec>
</sec>
</body>
<back>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>BER</term>
<def>
<p>Balanced error rate</p>
</def>
</def-item>
<def-item>
<term>DEG</term>
<def>
<p>Differentially expressed gene</p>
</def>
</def-item>
<def-item>
<term>FDR</term>
<def>
<p>False discovery rate</p>
</def>
</def-item>
<def-item>
<term>Fib</term>
<def>
<p>Fibroblast</p>
</def>
</def-item>
<def-item>
<term>hESC</term>
<def>
<p>Human embryonic stem cells</p>
</def>
</def-item>
<def-item>
<term>hiPSC</term>
<def>
<p>Human induced pluripotent stem cells</p>
</def>
</def-item>
<def-item>
<term>LM</term>
<def>
<p>Linear model</p>
</def>
</def-item>
<def-item>
<term>LMM</term>
<def>
<p>Linear mixed model</p>
</def>
</def-item>
<def-item>
<term>MAQC</term>
<def>
<p>MicroArray quality control</p>
</def>
</def-item>
<def-item>
<term>MINT</term>
<def>
<p>Multivariate integrative method</p>
</def>
</def-item>
<def-item>
<term>sPLS-DA</term>
<def>
<p>Sparse partial least squares discriminant analysis</p>
</def>
</def-item>
<def-item>
<term>RF</term>
<def>
<p>Random forest</p>
</def>
</def-item>
</def-list>
</glossary>
<ack>
<p>The authors would like to thank Marie-Joe Brion, University of Queensland Diamantina Institute, for her careful proofreading and suggestions.</p>
<sec id="d29e3010">
<title>Funding</title>
<p>This project was partly funded by the ARC Discovery grant project DP130100777 and the Australian Cancer Research Foundation for the Diamantina Individualised Oncology Care Centre at the University of Queensland Diamantina Institute (FR), and the National Health and Medical Research Council (NHMRC) Career Development fellowship APP1087415 (KALC).</p>
<p>The funding bodies played no role in the design of the study or in the collection, analysis, and interpretation of data.</p>
</sec>
<sec id="d29e3017">
<title>Availability of data and materials</title>
<p>The MicroArray Quality Control (MAQC) project data are available from the Gene Expression Omnibus (GEO) - GSE56457.</p>
<p>The stem cell raw data are available from GEO and the pre-processed data are available from the Stemformatics platform (
<ext-link ext-link-type="uri" xlink:href="http://www.stemformatics.org">http://www.stemformatics.org</ext-link>
).</p>
<p>The breast cancer data were obtained from the Molecular Taxonomy of Breast Cancer International Consortium project (METABRIC, [
<xref ref-type="bibr" rid="CR31">31</xref>
], upon request) and from the Cancer Genome Atlas (TCGA, [
<xref ref-type="bibr" rid="CR32">32</xref>
]). The MINT R scripts and functions are publicly available in the mixOmics R package (
<ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package=mixOmics">https://cran.r-project.org/package=mixOmics</ext-link>
), with tutorials on
<ext-link ext-link-type="uri" xlink:href="http://www.mixOmics.org/mixMINT">http://www.mixOmics.org/mixMINT</ext-link>
.</p>
</sec>
<sec id="d29e3047">
<title>Authors’ contributions</title>
<p>FR developed and implemented the MINT method and analysed the stem cell and breast cancer data; NM analysed the MAQC data; KALC supervised all statistical analyses. AE and SB contributed to the early stage of the project to set up the analysis plan. The manuscript was primarily written by FR with editorial advice from AE, NM, SB and KALC. All authors read and approved the final manuscript.</p>
</sec>
<sec id="d29e3052">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec id="d29e3057">
<title>Consent for publication</title>
<p>Not applicable.</p>
</sec>
<sec id="d29e3062">
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</sec>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pihur</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Datta</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Datta</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Finding common genes in multiple cancer types through meta-analysis of microarray experiments: A rank aggregation approach</article-title>
<source>Genomics</source>
<year>2008</year>
<volume>92</volume>
<issue>6</issue>
<fpage>400</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="doi">10.1016/j.ygeno.2008.05.003</pub-id>
<pub-id pub-id-type="pmid">18565726</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C-W</given-names>
</name>
<name>
<surname>Tseng</surname>
<given-names>GC</given-names>
</name>
</person-group>
<article-title>MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis</article-title>
<source>Bioinformatics</source>
<year>2016</year>
<volume>32</volume>
<fpage>1966</fpage>
<lpage>173</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btw115</pub-id>
<pub-id pub-id-type="pmid">27153719</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lazar</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Meganck</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Taminau</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Steenhoff</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Coletta</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Molter</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Weiss-Solis</surname>
<given-names>DY</given-names>
</name>
<name>
<surname>Duque</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bersini</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Nowé</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Batch effect removal methods for microarray gene expression data integration: a survey</article-title>
<source>Brief Bioinform</source>
<year>2012</year>
<volume>14</volume>
<issue>4</issue>
<fpage>469</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="doi">10.1093/bib/bbs037</pub-id>
<pub-id pub-id-type="pmid">22851511</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gagnon-Bartsch</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Speed</surname>
<given-names>TP</given-names>
</name>
</person-group>
<article-title>Using control genes to correct for unwanted variation in microarray data</article-title>
<source>Biostatistics</source>
<year>2012</year>
<volume>13</volume>
<issue>3</issue>
<fpage>539</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxr034</pub-id>
<pub-id pub-id-type="pmid">22101192</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Reid</surname>
<given-names>LH</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>WD</given-names>
</name>
<name>
<surname>Shippy</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Warrington</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>SC</given-names>
</name>
<name>
<surname>Collins</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>De Longueville</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Kawasaki</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>KY</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements</article-title>
<source>Nat Biotechnol</source>
<year>2006</year>
<volume>24</volume>
<issue>9</issue>
<fpage>1151</fpage>
<lpage>61</lpage>
<pub-id pub-id-type="doi">10.1038/nbt1239</pub-id>
<pub-id pub-id-type="pmid">16964229</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Labaj</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Thierry-Mieg</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium</article-title>
<source>Nat Biotechnol</source>
<year>2014</year>
<volume>32</volume>
<issue>9</issue>
<fpage>903</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.2957</pub-id>
<pub-id pub-id-type="pmid">25150838</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Johnson</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Rabinovic</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Adjusting batch effects in microarray expression data using empirical Bayes methods</article-title>
<source>Biostatistics</source>
<year>2007</year>
<volume>8</volume>
<issue>1</issue>
<fpage>118</fpage>
<lpage>27</lpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxj037</pub-id>
<pub-id pub-id-type="pmid">16632515</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hornung</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Boulesteix</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Causeur</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment</article-title>
<source>BMC Bioinforma</source>
<year>2016</year>
<volume>17</volume>
<issue>1</issue>
<fpage>1</fpage>
<pub-id pub-id-type="doi">10.1186/s12859-015-0870-z</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sims</surname>
<given-names>AH</given-names>
</name>
<name>
<surname>Smethurst</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Hey</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Okoniewski</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Pepper</surname>
<given-names>SD</given-names>
</name>
<name>
<surname>Howell</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>CJ</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>RB</given-names>
</name>
</person-group>
<article-title>The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets–improving meta-analysis and prediction of prognosis</article-title>
<source>BMC Med Genomics</source>
<year>2008</year>
<volume>1</volume>
<issue>1</issue>
<fpage>42</fpage>
<pub-id pub-id-type="doi">10.1186/1755-8794-1-42</pub-id>
<pub-id pub-id-type="pmid">18803878</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Listgarten</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kadie</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Schadt</surname>
<given-names>EE</given-names>
</name>
<name>
<surname>Heckerman</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Correction for hidden confounders in the genetic analysis of gene expression</article-title>
<source>Proc Natl Acad Sci USA</source>
<year>2010</year>
<volume>107</volume>
<issue>38</issue>
<fpage>16465</fpage>
<lpage>70</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1002425107</pub-id>
<pub-id pub-id-type="pmid">20810919</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lê Cao</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Rohart</surname>
<given-names>F</given-names>
</name>
<name>
<surname>McHugh</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Korn</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>YuGene: A simple approach to scale gene expression data derived from different platforms for integrated analyses</article-title>
<source>Genomics</source>
<year>2014</year>
<volume>103</volume>
<fpage>239</fpage>
<lpage>51</lpage>
<pub-id pub-id-type="doi">10.1016/j.ygeno.2014.03.001</pub-id>
<pub-id pub-id-type="pmid">24667244</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Breiman</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Random forests</article-title>
<source>Mach Learn</source>
<year>2001</year>
<volume>45</volume>
<issue>1</issue>
<fpage>5</fpage>
<lpage>32</lpage>
<pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dudoit</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fridlyand</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Speed</surname>
<given-names>TP</given-names>
</name>
</person-group>
<article-title>Comparison of discrimination methods for the classification of tumors using gene expression data</article-title>
<source>J Am Stat Assoc</source>
<year>2002</year>
<volume>97</volume>
<issue>457</issue>
<fpage>77</fpage>
<lpage>87</lpage>
<pub-id pub-id-type="doi">10.1198/016214502753479248</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guyon</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Weston</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Barnhill</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Vapnik</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Gene selection for cancer classification using support vector machines</article-title>
<source>Mach Learn</source>
<year>2002</year>
<volume>46</volume>
<issue>1-3</issue>
<fpage>389</fpage>
<lpage>422</lpage>
<pub-id pub-id-type="doi">10.1023/A:1012487302797</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Díaz-Uriarte</surname>
<given-names>R</given-names>
</name>
<name>
<surname>De Andres</surname>
<given-names>SA</given-names>
</name>
</person-group>
<article-title>Gene selection and classification of microarray data using random forest</article-title>
<source>BMC Bioinforma</source>
<year>2006</year>
<volume>7</volume>
<issue>1</issue>
<fpage>1</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-7-3</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sowa</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Atmaca</surname>
<given-names>Ö</given-names>
</name>
<name>
<surname>Kahraman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Schlattjan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lindner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sydor</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Scherbaum</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lackner</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Gerken</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Heider</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Non-invasive separation of alcoholic and non-alcoholic liver disease with predictive modeling</article-title>
<source>PLoS ONE</source>
<year>2014</year>
<volume>9</volume>
<issue>7</issue>
<fpage>101444</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0101444</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rayens</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Partial least squares for discrimination</article-title>
<source>J Chemom</source>
<year>2003</year>
<volume>17</volume>
<issue>3</issue>
<fpage>166</fpage>
<lpage>73</lpage>
<pub-id pub-id-type="doi">10.1002/cem.785</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lê Cao</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Boitard</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Besse</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems</article-title>
<source>BMC Bioinforma</source>
<year>2011</year>
<volume>12</volume>
<fpage>253</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-253</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hughey</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Butte</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>Robust meta-analysis of gene expression using the elastic net</article-title>
<source>Nucleic Acids Res</source>
<year>2015</year>
<volume>43</volume>
<issue>12</issue>
<fpage>79</fpage>
<pub-id pub-id-type="doi">10.1093/nar/gkv229</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parker</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Mullins</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Cheang</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Voduc</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Vickery</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Davies</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Fauron</surname>
<given-names>C</given-names>
</name>
<name>
<surname>He</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>Z</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Supervised risk predictor of breast cancer based on intrinsic subtypes</article-title>
<source>J Clin Oncol</source>
<year>2009</year>
<volume>27</volume>
<issue>8</issue>
<fpage>1160</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1200/JCO.2008.18.1370</pub-id>
<pub-id pub-id-type="pmid">19204204</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohart</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Matigian</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Mosbergen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Korn</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Butcher</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Atkinson</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Khosrotehrani</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Fisk</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Lê Cao</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Wells</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>A molecular classification of human mesenchymal stromal cells</article-title>
<source>PeerJ</source>
<year>2016</year>
<volume>4</volume>
<fpage>1845</fpage>
<pub-id pub-id-type="doi">10.7717/peerj.1845</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Eslami</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Qannari</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Kohler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bougeard</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Multi-group PLS regression: application to epidemiology</article-title>
<source>New Perspectives in Partial Least Squares and Related Methods</source>
<year>2013</year>
<publisher-loc>New York</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="CR23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eslami</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Qannari</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Kohler</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bougeard</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Algorithms for multi-group PLS</article-title>
<source>J Chemom</source>
<year>2014</year>
<volume>28</volume>
<issue>3</issue>
<fpage>192</fpage>
<lpage>201</lpage>
<pub-id pub-id-type="doi">10.1002/cem.2593</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Regression shrinkage and selection via the lasso</article-title>
<source>J R Stat Soc Ser B Stat Methodol</source>
<year>1996</year>
<volume>58</volume>
<issue>1</issue>
<fpage>267</fpage>
<lpage>88</lpage>
</element-citation>
</ref>
<ref id="CR25">
<label>25</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Tenenhaus</surname>
<given-names>M</given-names>
</name>
</person-group>
<source>La Régression PLS: Théorie et Pratique</source>
<year>1998</year>
<publisher-loc>Paris</publisher-loc>
<publisher-name>Editions Technip</publisher-name>
</element-citation>
</ref>
<ref id="CR26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bilic</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Belmonte</surname>
<given-names>JCI</given-names>
</name>
</person-group>
<article-title>Concise review: Induced pluripotent stem cells versus embryonic stem cells: close enough or yet too far apart?</article-title>
<source>Stem Cells</source>
<year>2012</year>
<volume>30</volume>
<issue>1</issue>
<fpage>33</fpage>
<lpage>41</lpage>
<pub-id pub-id-type="doi">10.1002/stem.700</pub-id>
<pub-id pub-id-type="pmid">22213481</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chin</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Mason</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Volinia</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Singer</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ambartsumyan</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Aimiuwu</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Richter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures</article-title>
<source>Cell Stem Cell</source>
<year>2009</year>
<volume>5</volume>
<issue>1</issue>
<fpage>111</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="doi">10.1016/j.stem.2009.06.008</pub-id>
<pub-id pub-id-type="pmid">19570518</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Newman</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Cooper</surname>
<given-names>JB</given-names>
</name>
</person-group>
<article-title>Lab-specific gene expression signatures in pluripotent stem cells</article-title>
<source>Cell Stem Cell</source>
<year>2010</year>
<volume>7</volume>
<issue>2</issue>
<fpage>258</fpage>
<lpage>62</lpage>
<pub-id pub-id-type="doi">10.1016/j.stem.2010.06.016</pub-id>
<pub-id pub-id-type="pmid">20682451</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wells</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Mosbergen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Korn</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Seidenman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Matigian</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Vitale</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Shepherd</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Stemformatics: visualisation and sharing of stem cell gene expression</article-title>
<source>Stem Cell Res</source>
<year>2013</year>
<volume>10</volume>
<issue>3</issue>
<fpage>387</fpage>
<lpage>95</lpage>
<pub-id pub-id-type="doi">10.1016/j.scr.2012.12.003</pub-id>
<pub-id pub-id-type="pmid">23466562</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bolstad</surname>
<given-names>BM</given-names>
</name>
<name>
<surname>Irizarry</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Åstrand</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Speed</surname>
<given-names>TP</given-names>
</name>
</person-group>
<article-title>A comparison of normalization methods for high density oligonucleotide array data based on variance and bias</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<issue>2</issue>
<fpage>185</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/19.2.185</pub-id>
<pub-id pub-id-type="pmid">12538238</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Curtis</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Chin</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Turashvili</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rueda</surname>
<given-names>OM</given-names>
</name>
<name>
<surname>Dunning</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Speed</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lynch</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Samarajiwa</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups</article-title>
<source>Nature</source>
<year>2012</year>
<volume>486</volume>
<issue>7403</issue>
<fpage>346</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="pmid">22522925</pub-id>
</element-citation>
</ref>
<ref id="CR32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<collab>Cancer Genome Atlas Network</collab>
</person-group>
<article-title>Comprehensive molecular portraits of human breast tumours</article-title>
<source>Nature</source>
<year>2012</year>
<volume>490</volume>
<issue>7418</issue>
<fpage>61</fpage>
<lpage>70</lpage>
<pub-id pub-id-type="doi">10.1038/nature11412</pub-id>
<pub-id pub-id-type="pmid">23000897</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Whitcomb</surname>
<given-names>BW</given-names>
</name>
<name>
<surname>Perkins</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Albert</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Schisterman</surname>
<given-names>EF</given-names>
</name>
</person-group>
<article-title>Treatment of batch in the detection, calibration, and quantification of immunoassays in large-scale epidemiologic studies</article-title>
<source>Epidemiology (Cambridge)</source>
<year>2010</year>
<volume>21</volume>
<issue>Suppl 4</issue>
<fpage>44</fpage>
<pub-id pub-id-type="doi">10.1097/EDE.0b013e3181dceac2</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rohart</surname>
<given-names>F</given-names>
</name>
<name>
<surname>San Cristobal</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Laurent</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Selection of fixed effects in high dimensional linear mixed models using a multicycle ECM algorithm</article-title>
<source>Comput Stat Data Anal</source>
<year>2014</year>
<volume>80</volume>
<fpage>209</fpage>
<lpage>22</lpage>
<pub-id pub-id-type="doi">10.1016/j.csda.2014.06.022</pub-id>
</element-citation>
</ref>
<ref id="CR35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benjamini</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Hochberg</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>
<source>J R Stat Soc Ser B Stat Methodol</source>
<year>1995</year>
<volume>57</volume>
<issue>1</issue>
<fpage>289</fpage>
<lpage>300</lpage>
</element-citation>
</ref>
<ref id="CR36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Vodyanik</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Smuga-Otto</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Antosiewicz-Bourget</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Frane</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nie</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jonsdottir</surname>
<given-names>GA</given-names>
</name>
<name>
<surname>Ruotti</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Induced pluripotent stem cell lines derived from human somatic cells</article-title>
<source>Science</source>
<year>2007</year>
<volume>318</volume>
<issue>5858</issue>
<fpage>1917</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1126/science.1151526</pub-id>
<pub-id pub-id-type="pmid">18029452</pub-id>
</element-citation>
</ref>
<ref id="CR37">
<label>37</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tsialikas</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Romer-Seibert</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>LIN28: roles and regulation in development and beyond</article-title>
<source>Development</source>
<year>2015</year>
<volume>142</volume>
<issue>14</issue>
<fpage>2397</fpage>
<lpage>404</lpage>
<pub-id pub-id-type="doi">10.1242/dev.117580</pub-id>
<pub-id pub-id-type="pmid">26199409</pub-id>
</element-citation>
</ref>
<ref id="CR38">
<label>38</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krivega</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Geens</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Van de Velde</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>CAR expression in human embryos and hESC illustrates its role in pluripotency and tight junctions</article-title>
<source>Reproduction</source>
<year>2014</year>
<volume>148</volume>
<issue>5</issue>
<fpage>531</fpage>
<lpage>44</lpage>
<pub-id pub-id-type="doi">10.1530/REP-14-0253</pub-id>
<pub-id pub-id-type="pmid">25118298</pub-id>
</element-citation>
</ref>
<ref id="CR39">
<label>39</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kouros-Mehr</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Slorach</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Sternlicht</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Werb</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>GATA-3 maintains the differentiation of the luminal cell fate in the mammary gland</article-title>
<source>Cell</source>
<year>2006</year>
<volume>127</volume>
<issue>5</issue>
<fpage>1041</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2006.09.048</pub-id>
<pub-id pub-id-type="pmid">17129787</pub-id>
</element-citation>
</ref>
<ref id="CR40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Asselin-Labat</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Sutherland</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Barker</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Shackleton</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Forrest</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Hartley</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Robb</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Grosveld</surname>
<given-names>FG</given-names>
</name>
<name>
<surname>van der Wees</surname>
<given-names>J</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gata-3 is an essential regulator of mammary-gland morphogenesis and luminal-cell differentiation</article-title>
<source>Nat Cell Biol</source>
<year>2007</year>
<volume>9</volume>
<issue>2</issue>
<fpage>201</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1038/ncb1530</pub-id>
<pub-id pub-id-type="pmid">17187062</pub-id>
</element-citation>
</ref>
<ref id="CR41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>YZ</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Zuo</surname>
<given-names>WJ</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>WT</given-names>
</name>
<name>
<surname>Shao</surname>
<given-names>ZM</given-names>
</name>
</person-group>
<article-title>GATA3 mutations define a unique subtype of luminal-like breast cancer with improved survival</article-title>
<source>Cancer</source>
<year>2014</year>
<volume>120</volume>
<issue>9</issue>
<fpage>1329</fpage>
<lpage>37</lpage>
<pub-id pub-id-type="doi">10.1002/cncr.28566</pub-id>
<pub-id pub-id-type="pmid">24477928</pub-id>
</element-citation>
</ref>
<ref id="CR42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCleskey</surname>
<given-names>BC</given-names>
</name>
<name>
<surname>Penedo</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Hameed</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Siegal</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>GATA3 expression in advanced breast cancer: prognostic value and organ-specific relapse</article-title>
<source>Am J Clin Pathol</source>
<year>2015</year>
<volume>144</volume>
<issue>5</issue>
<fpage>756</fpage>
<lpage>63</lpage>
<pub-id pub-id-type="doi">10.1309/AJCP5MMR1FJVVTPK</pub-id>
<pub-id pub-id-type="pmid">26486740</pub-id>
</element-citation>
</ref>
<ref id="CR43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vargova</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Curik</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Burda</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Basova</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Kulvait</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Pospisil</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Savvulidi</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Kokavec</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Necas</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Berkova</surname>
<given-names>A</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MYB transcriptionally regulates the miR-155 host gene in chronic lymphocytic leukemia</article-title>
<source>Blood</source>
<year>2011</year>
<volume>117</volume>
<issue>14</issue>
<fpage>3816</fpage>
<lpage>825</lpage>
<pub-id pub-id-type="doi">10.1182/blood-2010-05-285064</pub-id>
<pub-id pub-id-type="pmid">21296997</pub-id>
</element-citation>
</ref>
<ref id="CR44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khan</surname>
<given-names>FH</given-names>
</name>
<name>
<surname>Pandian</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ramraj</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Aravindan</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Herman</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Aravindan</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>Reorganization of metastamirs in the evolution of metastatic aggressive neuroblastoma cells</article-title>
<source>BMC Genomics</source>
<year>2015</year>
<volume>16</volume>
<issue>1</issue>
<fpage>1</fpage>
<pub-id pub-id-type="doi">10.1186/s12864-015-1642-x</pub-id>
<pub-id pub-id-type="pmid">25553907</pub-id>
</element-citation>
</ref>
<ref id="CR45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Iliopoulos</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Greenblatt</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Hatziapostolou</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Tam</surname>
<given-names>WL</given-names>
</name>
<name>
<surname>Ni</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>XBP1 promotes triple-negative breast cancer by controlling the HIF1α pathway</article-title>
<source>Nature</source>
<year>2014</year>
<volume>508</volume>
<issue>7494</issue>
<fpage>103</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1038/nature13119</pub-id>
<pub-id pub-id-type="pmid">24670641</pub-id>
</element-citation>
</ref>
<ref id="CR46">
<label>46</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garczyk</surname>
<given-names>S</given-names>
</name>
<name>
<surname>von Stillfried</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Antonopoulos</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Hartmann</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Schrauder</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Fasching</surname>
<given-names>PA</given-names>
</name>
<name>
<surname>Anzeneder</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tannapfel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ergönenc</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Knüchel</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>AGR3 in breast cancer: prognostic impact and suitable serum-based biomarker for early cancer detection</article-title>
<source>PLoS ONE</source>
<year>2015</year>
<volume>10</volume>
<issue>4</issue>
<fpage>e0122106</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0122106</pub-id>
</element-citation>
</ref>
<ref id="CR47">
<label>47</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yamamoto-Ibusuki</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yamamoto</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Fujiwara</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sueta</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yamamoto</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hayashi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tomiguchi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Takeshita</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Iwase</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>C6orf97-ESR1 breast cancer susceptibility locus: influence on progression and survival in breast cancer patients</article-title>
<source>Eur J Hum Genet</source>
<year>2015</year>
<volume>23</volume>
<issue>7</issue>
<fpage>949</fpage>
<lpage>56</lpage>
<pub-id pub-id-type="doi">10.1038/ejhg.2014.219</pub-id>
<pub-id pub-id-type="pmid">25370037</pub-id>
</element-citation>
</ref>
<ref id="CR48">
<label>48</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>May</surname>
<given-names>FE</given-names>
</name>
<name>
<surname>Westley</surname>
<given-names>BR</given-names>
</name>
</person-group>
<article-title>TFF3 is a valuable predictive biomarker of endocrine response in metastatic breast cancer</article-title>
<source>Endocr Relat Cancer</source>
<year>2015</year>
<volume>22</volume>
<issue>3</issue>
<fpage>465</fpage>
<lpage>79</lpage>
<pub-id pub-id-type="doi">10.1530/ERC-15-0129</pub-id>
<pub-id pub-id-type="pmid">25900183</pub-id>
</element-citation>
</ref>
<ref id="CR49">
<label>49</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andres</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Brock</surname>
<given-names>GN</given-names>
</name>
<name>
<surname>Wittliff</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>Interrogating differences in expression of targeted gene sets to predict breast cancer outcome</article-title>
<source>BMC Cancer</source>
<year>2013</year>
<volume>13</volume>
<issue>1</issue>
<fpage>1</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2407-13-326</pub-id>
<pub-id pub-id-type="pmid">23282137</pub-id>
</element-citation>
</ref>
<ref id="CR50">
<label>50</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andres</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Smolenkova</surname>
<given-names>IA</given-names>
</name>
<name>
<surname>Wittliff</surname>
<given-names>JL</given-names>
</name>
</person-group>
<article-title>Gender-associated expression of tumor markers and a small gene set in breast carcinoma</article-title>
<source>Breast</source>
<year>2014</year>
<volume>23</volume>
<issue>3</issue>
<fpage>226</fpage>
<lpage>33</lpage>
<pub-id pub-id-type="doi">10.1016/j.breast.2014.02.007</pub-id>
<pub-id pub-id-type="pmid">24656773</pub-id>
</element-citation>
</ref>
<ref id="CR51">
<label>51</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Parris</surname>
<given-names>TZ</given-names>
</name>
<name>
<surname>Danielsson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nemes</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Kovács</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Delle</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Fallenius</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Möllerström</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Karlsson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Helou</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Clinical implications of gene dosage and gene expression patterns in diploid breast carcinoma</article-title>
<source>Clin Cancer Res</source>
<year>2010</year>
<volume>16</volume>
<issue>15</issue>
<fpage>3860</fpage>
<lpage>874</lpage>
<pub-id pub-id-type="doi">10.1158/1078-0432.CCR-10-0889</pub-id>
<pub-id pub-id-type="pmid">20551037</pub-id>
</element-citation>
</ref>
<ref id="CR52">
<label>52</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lefevre</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Omeiri</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Drougat</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hantel</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Giraud</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Val</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Rodriguez</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Perlemoine</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Blugeon</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Beuschlein</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Combined transcriptome studies identify AFF3 as a mediator of the oncogenic effects of <italic>β</italic>-catenin in adrenocortical carcinoma</article-title>
<source>Oncogenesis</source>
<year>2015</year>
<volume>4</volume>
<issue>7</issue>
<fpage>161</fpage>
<pub-id pub-id-type="doi">10.1038/oncsis.2015.20</pub-id>
</element-citation>
</ref>
<ref id="CR53">
<label>53</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rosner</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Vigano</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Ozato</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Timmons</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Poirier</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Rigby</surname>
<given-names>PW</given-names>
</name>
<name>
<surname>Staudt</surname>
<given-names>LM</given-names>
</name>
</person-group>
<article-title>A POU-domain transcription factor in early stem cells and germ cells of the mammalian embryo</article-title>
<source>Nature</source>
<year>1990</year>
<volume>345</volume>
<issue>6277</issue>
<fpage>686</fpage>
<lpage>92</lpage>
<pub-id pub-id-type="doi">10.1038/345686a0</pub-id>
<pub-id pub-id-type="pmid">1972777</pub-id>
</element-citation>
</ref>
<ref id="CR54">
<label>54</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schöler</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Ruppert</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Suzuki</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chowdhury</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Gruss</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>New type of POU domain in germ line-specific protein Oct-4</article-title>
<source>Nature</source>
<year>1990</year>
<volume>344</volume>
<issue>6265</issue>
<fpage>435</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1038/344435a0</pub-id>
<pub-id pub-id-type="pmid">1690859</pub-id>
</element-citation>
</ref>
<ref id="CR55">
<label>55</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Niwa</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Miyazaki</surname>
<given-names>J-i</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>AG</given-names>
</name>
</person-group>
<article-title>Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells</article-title>
<source>Nat Genet</source>
<year>2000</year>
<volume>24</volume>
<issue>4</issue>
<fpage>372</fpage>
<lpage>6</lpage>
<pub-id pub-id-type="doi">10.1038/74199</pub-id>
<pub-id pub-id-type="pmid">10742100</pub-id>
</element-citation>
</ref>
<ref id="CR56">
<label>56</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matin</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Gokhale</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Draper</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Bahrami</surname>
<given-names>AR</given-names>
</name>
<name>
<surname>Morton</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>HD</given-names>
</name>
<name>
<surname>Andrews</surname>
<given-names>PW</given-names>
</name>
</person-group>
<article-title>Specific knockdown of Oct4 and <italic>β</italic>2-microglobulin expression by RNA interference in human embryonic stem cells and embryonic carcinoma cells</article-title>
<source>Stem Cells</source>
<year>2004</year>
<volume>22</volume>
<issue>5</issue>
<fpage>659</fpage>
<lpage>68</lpage>
<pub-id pub-id-type="doi">10.1634/stemcells.22-5-659</pub-id>
<pub-id pub-id-type="pmid">15342930</pub-id>
</element-citation>
</ref>
<ref id="CR57">
<label>57</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bock</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Kiskinis</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Verstappen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Boulting</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>ZD</given-names>
</name>
<name>
<surname>Ziller</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Croft</surname>
<given-names>GF</given-names>
</name>
<name>
<surname>Amoroso</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Oakley</surname>
<given-names>DH</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Reference maps of human ES and iPS cell variation enable high-throughput characterization of pluripotent cell lines</article-title>
<source>Cell</source>
<year>2011</year>
<volume>144</volume>
<issue>3</issue>
<fpage>439</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2010.12.032</pub-id>
<pub-id pub-id-type="pmid">21295703</pub-id>
</element-citation>
</ref>
<ref id="CR58">
<label>58</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Briggs</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shepherd</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ovchinnikov</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Nayler</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Kao</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Morrow</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Thakar</surname>
<given-names>NY</given-names>
</name>
<name>
<surname>Soo</surname>
<given-names>SY</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Integration-free induced pluripotent stem cells model genetic and neural developmental features of Down syndrome etiology</article-title>
<source>Stem Cells</source>
<year>2013</year>
<volume>31</volume>
<issue>3</issue>
<fpage>467</fpage>
<lpage>78</lpage>
<pub-id pub-id-type="doi">10.1002/stem.1297</pub-id>
<pub-id pub-id-type="pmid">23225669</pub-id>
</element-citation>
</ref>
<ref id="CR59">
<label>59</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chung</surname>
<given-names>HC</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Logan</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Alexander</surname>
<given-names>IE</given-names>
</name>
<name>
<surname>Sachdev</surname>
<given-names>PS</given-names>
</name>
<name>
<surname>Sidhu</surname>
<given-names>KS</given-names>
</name>
</person-group>
<article-title>Human induced pluripotent stem cells derived under feeder-free conditions display unique cell cycle and DNA replication gene profiles</article-title>
<source>Stem Cells Dev</source>
<year>2011</year>
<volume>21</volume>
<issue>2</issue>
<fpage>206</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="doi">10.1089/scd.2010.0440</pub-id>
<pub-id pub-id-type="pmid">21506733</pub-id>
</element-citation>
</ref>
<ref id="CR60">
<label>60</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ebert</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rose</surname>
<given-names>FF</given-names>
</name>
<name>
<surname>Mattis</surname>
<given-names>VB</given-names>
</name>
<name>
<surname>Lorson</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Svendsen</surname>
<given-names>CN</given-names>
</name>
</person-group>
<article-title>Induced pluripotent stem cells from a spinal muscular atrophy patient</article-title>
<source>Nature</source>
<year>2009</year>
<volume>457</volume>
<issue>7227</issue>
<fpage>277</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="doi">10.1038/nature07677</pub-id>
<pub-id pub-id-type="pmid">19098894</pub-id>
</element-citation>
</ref>
<ref id="CR61">
<label>61</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guenther</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Frampton</surname>
<given-names>GM</given-names>
</name>
<name>
<surname>Soldner</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Hockemeyer</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Mitalipova</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Jaenisch</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>RA</given-names>
</name>
</person-group>
<article-title>Chromatin structure and gene expression programs of human embryonic and induced pluripotent stem cells</article-title>
<source>Cell Stem Cell</source>
<year>2010</year>
<volume>7</volume>
<issue>2</issue>
<fpage>249</fpage>
<lpage>57</lpage>
<pub-id pub-id-type="doi">10.1016/j.stem.2010.06.015</pub-id>
<pub-id pub-id-type="pmid">20682450</pub-id>
</element-citation>
</ref>
<ref id="CR62">
<label>62</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Maherali</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ahfeldt</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rigamonti</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Utikal</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Cowan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hochedlinger</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>A high-efficiency system for the generation and study of human induced pluripotent stem cells</article-title>
<source>Cell Stem Cell</source>
<year>2008</year>
<volume>3</volume>
<issue>3</issue>
<fpage>340</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1016/j.stem.2008.08.003</pub-id>
<pub-id pub-id-type="pmid">18786420</pub-id>
</element-citation>
</ref>
<ref id="CR63">
<label>63</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marchetto</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Carromeu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Acab</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Yeo</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Mu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gage</surname>
<given-names>FH</given-names>
</name>
<name>
<surname>Muotri</surname>
<given-names>AR</given-names>
</name>
</person-group>
<article-title>A model for neural development and treatment of Rett syndrome using human induced pluripotent stem cells</article-title>
<source>Cell</source>
<year>2010</year>
<volume>143</volume>
<issue>4</issue>
<fpage>527</fpage>
<lpage>39</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2010.10.016</pub-id>
<pub-id pub-id-type="pmid">21074045</pub-id>
</element-citation>
</ref>
<ref id="CR64">
<label>64</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Takahashi</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tanabe</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Ohnuki</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Narita</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sasaki</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yamamoto</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nakamura</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sutou</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Osafune</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Yamanaka</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Induction of pluripotency in human somatic cells via a transient state resembling primitive streak-like mesendoderm</article-title>
<source>Nat Commun</source>
<year>2014</year>
<volume>5</volume>
<fpage>3678</fpage>
<pub-id pub-id-type="pmid">24759836</pub-id>
</element-citation>
</ref>
<ref id="CR65">
<label>65</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andrade</surname>
<given-names>LN</given-names>
</name>
<name>
<surname>Nathanson</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Yeo</surname>
<given-names>GW</given-names>
</name>
<name>
<surname>Menck</surname>
<given-names>CFM</given-names>
</name>
<name>
<surname>Muotri</surname>
<given-names>AR</given-names>
</name>
</person-group>
<article-title>Evidence for premature aging due to oxidative stress in iPSCs from Cockayne syndrome</article-title>
<source>Hum Mol Genet</source>
<year>2012</year>
<volume>21</volume>
<issue>17</issue>
<fpage>3825</fpage>
<lpage>34</lpage>
<pub-id pub-id-type="doi">10.1093/hmg/dds211</pub-id>
<pub-id pub-id-type="pmid">22661500</pub-id>
</element-citation>
</ref>
<ref id="CR66">
<label>66</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hu</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Suknuntha</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Montgomery</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Slukvin</surname>
<given-names>II</given-names>
</name>
</person-group>
<article-title>Efficient generation of transgene-free induced pluripotent stem cells from normal and neoplastic bone marrow and cord blood mononuclear cells</article-title>
<source>Blood</source>
<year>2011</year>
<volume>117</volume>
<issue>14</issue>
<fpage>e109</fpage>
<lpage>19</lpage>
<pub-id pub-id-type="doi">10.1182/blood-2010-07-298331</pub-id>
</element-citation>
</ref>
<ref id="CR67">
<label>67</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Moon</surname>
<given-names>JI</given-names>
</name>
<name>
<surname>Chung</surname>
<given-names>YG</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>BS</given-names>
</name>
<name>
<surname>Ko</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Cha</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Lanza</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Generation of human induced pluripotent stem cells by direct delivery of reprogramming proteins</article-title>
<source>Cell Stem Cell</source>
<year>2009</year>
<volume>4</volume>
<issue>6</issue>
<fpage>472</fpage>
<pub-id pub-id-type="doi">10.1016/j.stem.2009.05.005</pub-id>
<pub-id pub-id-type="pmid">19481515</pub-id>
</element-citation>
</ref>
<ref id="CR68">
<label>68</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loewer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cabili</surname>
<given-names>MN</given-names>
</name>
<name>
<surname>Guttman</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Loh</surname>
<given-names>YH</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Garber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Curran</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Onder</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Agarwal</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells</article-title>
<source>Nat Genet</source>
<year>2010</year>
<volume>42</volume>
<issue>12</issue>
<fpage>1113</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1038/ng.710</pub-id>
<pub-id pub-id-type="pmid">21057500</pub-id>
</element-citation>
</ref>
<ref id="CR69">
<label>69</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Si-Tayeb</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Noto</surname>
<given-names>FK</given-names>
</name>
<name>
<surname>Nagaoka</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Battle</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Duris</surname>
<given-names>C</given-names>
</name>
<name>
<surname>North</surname>
<given-names>PE</given-names>
</name>
<name>
<surname>Dalton</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Duncan</surname>
<given-names>SA</given-names>
</name>
</person-group>
<article-title>Highly efficient generation of human hepatocyte-like cells from induced pluripotent stem cells</article-title>
<source>Hepatology</source>
<year>2010</year>
<volume>51</volume>
<issue>1</issue>
<fpage>297</fpage>
<lpage>305</lpage>
<pub-id pub-id-type="doi">10.1002/hep.23354</pub-id>
<pub-id pub-id-type="pmid">19998274</pub-id>
</element-citation>
</ref>
<ref id="CR70">
<label>70</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vitale</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Matigian</surname>
<given-names>NA</given-names>
</name>
<name>
<surname>Ravishankar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bellette</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wood</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Wolvetang</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Mackay-Sim</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Variability in the generation of induced pluripotent stem cells: importance for disease modeling</article-title>
<source>Stem Cells Transl Med</source>
<year>2012</year>
<volume>1</volume>
<issue>9</issue>
<fpage>641</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="doi">10.5966/sctm.2012-0043</pub-id>
<pub-id pub-id-type="pmid">23197870</pub-id>
</element-citation>
</ref>
<ref id="CR71">
<label>71</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Smuga-Otto</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stewart</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Slukvin</surname>
<given-names>II</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>Human induced pluripotent stem cells free of vector and transgene sequences</article-title>
<source>Science</source>
<year>2009</year>
<volume>324</volume>
<issue>5928</issue>
<fpage>797</fpage>
<lpage>801</lpage>
<pub-id pub-id-type="doi">10.1126/science.1172482</pub-id>
<pub-id pub-id-type="pmid">19325077</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A48 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000A48 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:5327533
   |texte=   MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:28241739" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024