Exploration server on the relations between France and Australia

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information is therefore not validated.

Internal identifier: 0022749 (Pmc/Corpus); previous: 0022748; next: 0022750

Links to Exploration step


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets</title>
<author>
<name sortKey="Yao, Fangzhou" sort="Yao, Fangzhou" uniqKey="Yao F" first="Fangzhou" last="Yao">Fangzhou Yao</name>
<affiliation>
<nlm:aff id="I1">Shanghai University of Finance and Economics, Shanghai, P.R. China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Coquery, Jeff" sort="Coquery, Jeff" uniqKey="Coquery J" first="Jeff" last="Coquery">Jeff Coquery</name>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Sup'Biotech, Villejuif, F-94800, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le Cao, Kim Anh" sort="Le Cao, Kim Anh" uniqKey="Le Cao K" first="Kim-Anh" last="Lê Cao">Kim-Anh Lê Cao</name>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22305354</idno>
<idno type="pmc">3298499</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298499</idno>
<idno type="RBID">PMC:3298499</idno>
<idno type="doi">10.1186/1471-2105-13-24</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">002274</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">002274</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets</title>
<author>
<name sortKey="Yao, Fangzhou" sort="Yao, Fangzhou" uniqKey="Yao F" first="Fangzhou" last="Yao">Fangzhou Yao</name>
<affiliation>
<nlm:aff id="I1">Shanghai University of Finance and Economics, Shanghai, P.R. China</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Coquery, Jeff" sort="Coquery, Jeff" uniqKey="Coquery J" first="Jeff" last="Coquery">Jeff Coquery</name>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="I3">Sup'Biotech, Villejuif, F-94800, France</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le Cao, Kim Anh" sort="Le Cao, Kim Anh" uniqKey="Le Cao K" first="Kim-Anh" last="Lê Cao">Kim-Anh Lê Cao</name>
<affiliation>
<nlm:aff id="I2">Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>A key question when analyzing high throughput data is whether the information provided by the measured biological entities (for example, gene or metabolite expression) is related to the experimental conditions or, rather, to some interfering signals, such as experimental bias or artefacts. Visualization tools are therefore useful to better understand the underlying structure of the data in a 'blind' (unsupervised) way. A well-established technique to do so is Principal Component Analysis (PCA). PCA is particularly powerful if the biological question is related to the highest variance in the data. Independent Component Analysis (ICA) has been proposed as an alternative to PCA, as it optimizes an independence condition to give more meaningful components. However, neither PCA nor ICA can overcome both the high dimensionality and the noisy characteristics of biological data.</p>
</sec>
<sec>
<title>Results</title>
<p>We propose Independent Principal Component Analysis (IPCA), which combines the advantages of both PCA and ICA. It uses ICA as a denoising process on the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data. The result is a better clustering of the biological samples on graphical representations. In addition, a sparse version (sIPCA) is proposed that performs an internal variable selection to identify biologically relevant features.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>On simulation studies and real data sets, we showed that IPCA offers a better visualization of the data than ICA, and with a smaller number of components than PCA. Furthermore, a preliminary investigation of the list of genes selected with sIPCA demonstrates that the approach is able to highlight relevant genes in the data with respect to the biological experiment.</p>
<p>IPCA and sIPCA are both implemented in the R package mixOmics, dedicated to the analysis and exploration of high dimensional biological data sets, and on the mixOmics web interface.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Jolliffe, I" uniqKey="Jolliffe I">I Jolliffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, S" uniqKey="Lee S">S Lee</name>
</author>
<author>
<name sortKey="Batzoglou, S" uniqKey="Batzoglou S">S Batzoglou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Purdom, E" uniqKey="Purdom E">E Purdom</name>
</author>
<author>
<name sortKey="Holmes, S" uniqKey="Holmes S">S Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, D" uniqKey="Huang D">D Huang</name>
</author>
<author>
<name sortKey="Zheng, C" uniqKey="Zheng C">C Zheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Engreitz, J" uniqKey="Engreitz J">J Engreitz</name>
</author>
<author>
<name sortKey="Daigle, B" uniqKey="Daigle B">B Daigle</name>
</author>
<author>
<name sortKey="Marshall, J" uniqKey="Marshall J">J Marshall</name>
</author>
<author>
<name sortKey="Altman, R" uniqKey="Altman R">R Altman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scholz, M" uniqKey="Scholz M">M Scholz</name>
</author>
<author>
<name sortKey="Gatzek, S" uniqKey="Gatzek S">S Gatzek</name>
</author>
<author>
<name sortKey="Sterling, A" uniqKey="Sterling A">A Sterling</name>
</author>
<author>
<name sortKey="Fiehn, O" uniqKey="Fiehn O">O Fiehn</name>
</author>
<author>
<name sortKey="Selbig, J" uniqKey="Selbig J">J Selbig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Frigyesi, A" uniqKey="Frigyesi A">A Frigyesi</name>
</author>
<author>
<name sortKey="Veerla, S" uniqKey="Veerla S">S Veerla</name>
</author>
<author>
<name sortKey="Lindgren, D" uniqKey="Lindgren D">D Lindgren</name>
</author>
<author>
<name sortKey="Hoglund, M" uniqKey="Hoglund M">M Höglund</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Comon, P" uniqKey="Comon P">P Comon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hyv Rinen, A" uniqKey="Hyv Rinen A">A Hyvärinen</name>
</author>
<author>
<name sortKey="Oja, E" uniqKey="Oja E">E Oja</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hyv Rinen, A" uniqKey="Hyv Rinen A">A Hyvärinen</name>
</author>
<author>
<name sortKey="Karhunen, J" uniqKey="Karhunen J">J Karhunen</name>
</author>
<author>
<name sortKey="Oja, E" uniqKey="Oja E">E Oja</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liebermeister, W" uniqKey="Liebermeister W">W Liebermeister</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wienkoop, S" uniqKey="Wienkoop S">S Wienkoop</name>
</author>
<author>
<name sortKey="Morgenthal, K" uniqKey="Morgenthal K">K Morgenthal</name>
</author>
<author>
<name sortKey="Wolschin, F" uniqKey="Wolschin F">F Wolschin</name>
</author>
<author>
<name sortKey="Scholz, M" uniqKey="Scholz M">M Scholz</name>
</author>
<author>
<name sortKey="Selbig, J" uniqKey="Selbig J">J Selbig</name>
</author>
<author>
<name sortKey="Weckwerth, W" uniqKey="Weckwerth W">W Weckwerth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rousseau, R" uniqKey="Rousseau R">R Rousseau</name>
</author>
<author>
<name sortKey="Govaerts, B" uniqKey="Govaerts B">B Govaerts</name>
</author>
<author>
<name sortKey="Verleysen, M" uniqKey="Verleysen M">M Verleysen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kong, W" uniqKey="Kong W">W Kong</name>
</author>
<author>
<name sortKey="Vanderburg, C" uniqKey="Vanderburg C">C Vanderburg</name>
</author>
<author>
<name sortKey="Gunshin, H" uniqKey="Gunshin H">H Gunshin</name>
</author>
<author>
<name sortKey="Rogers, J" uniqKey="Rogers J">J Rogers</name>
</author>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Teschendorff, A" uniqKey="Teschendorff A">A Teschendorff</name>
</author>
<author>
<name sortKey="Journee, M" uniqKey="Journee M">M Journée</name>
</author>
<author>
<name sortKey="Absil, P" uniqKey="Absil P">P Absil</name>
</author>
<author>
<name sortKey="Sepulchre, R" uniqKey="Sepulchre R">R Sepulchre</name>
</author>
<author>
<name sortKey="Caldas, C" uniqKey="Caldas C">C Caldas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jolliffe, I" uniqKey="Jolliffe I">I Jolliffe</name>
</author>
<author>
<name sortKey="Trendafilov, N" uniqKey="Trendafilov N">N Trendafilov</name>
</author>
<author>
<name sortKey="Uddin, M" uniqKey="Uddin M">M Uddin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Donoho, D" uniqKey="Donoho D">D Donoho</name>
</author>
<author>
<name sortKey="Johnstone, I" uniqKey="Johnstone I">I Johnstone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shen, H" uniqKey="Shen H">H Shen</name>
</author>
<author>
<name sortKey="Huang, Jz" uniqKey="Huang J">JZ Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Davies, D" uniqKey="Davies D">D Davies</name>
</author>
<author>
<name sortKey="Bouldin, D" uniqKey="Bouldin D">D Bouldin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bushel, P" uniqKey="Bushel P">P Bushel</name>
</author>
<author>
<name sortKey="Wolfinger, Rd" uniqKey="Wolfinger R">RD Wolfinger</name>
</author>
<author>
<name sortKey="Gibson, G" uniqKey="Gibson G">G Gibson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Singh, D" uniqKey="Singh D">D Singh</name>
</author>
<author>
<name sortKey="Febbo, P" uniqKey="Febbo P">P Febbo</name>
</author>
<author>
<name sortKey="Ross, K" uniqKey="Ross K">K Ross</name>
</author>
<author>
<name sortKey="Jackson, D" uniqKey="Jackson D">D Jackson</name>
</author>
<author>
<name sortKey="Manola, J" uniqKey="Manola J">J Manola</name>
</author>
<author>
<name sortKey="Ladd, C" uniqKey="Ladd C">C Ladd</name>
</author>
<author>
<name sortKey="Tamayo, P" uniqKey="Tamayo P">P Tamayo</name>
</author>
<author>
<name sortKey="Renshaw, A" uniqKey="Renshaw A">A Renshaw</name>
</author>
<author>
<name sortKey="D Amico, A" uniqKey="D Amico A">A D'Amico</name>
</author>
<author>
<name sortKey="Richie, J" uniqKey="Richie J">J Richie</name>
</author>
<author>
<name sortKey="Lander, E" uniqKey="Lander E">E Lander</name>
</author>
<author>
<name sortKey="Loda, M" uniqKey="Loda M">M Loda</name>
</author>
<author>
<name sortKey="Kantoff, P" uniqKey="Kantoff P">P Kantoff</name>
</author>
<author>
<name sortKey="Golub, T" uniqKey="Golub T">T Golub</name>
</author>
<author>
<name sortKey="Sellers, W" uniqKey="Sellers W">W Sellers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Villas Boas, S" uniqKey="Villas Boas S">S Villas-Boâs</name>
</author>
<author>
<name sortKey="Moxley, J" uniqKey="Moxley J">J Moxley</name>
</author>
<author>
<name sortKey=" Kesson, M" uniqKey=" Kesson M">M Åkesson</name>
</author>
<author>
<name sortKey="Stephanopoulos, G" uniqKey="Stephanopoulos G">G Stephanopoulos</name>
</author>
<author>
<name sortKey="Nielsen, J" uniqKey="Nielsen J">J Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cangelosi, R" uniqKey="Cangelosi R">R Cangelosi</name>
</author>
<author>
<name sortKey="Goriely, A" uniqKey="Goriely A">A Goriely</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bezdek, J" uniqKey="Bezdek J">J Bezdek</name>
</author>
<author>
<name sortKey="Pal, N" uniqKey="Pal N">N Pal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bartlett, M" uniqKey="Bartlett M">M Bartlett</name>
</author>
<author>
<name sortKey="Movellan, J" uniqKey="Movellan J">J Movellan</name>
</author>
<author>
<name sortKey="Sejnowski, T" uniqKey="Sejnowski T">T Sejnowski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, C" uniqKey="Ball C">C Ball</name>
</author>
<author>
<name sortKey="Blake, J" uniqKey="Blake J">J Blake</name>
</author>
<author>
<name sortKey="Botstein, D" uniqKey="Botstein D">D Botstein</name>
</author>
<author>
<name sortKey="Butler, H" uniqKey="Butler H">H Butler</name>
</author>
<author>
<name sortKey="Cherry, J" uniqKey="Cherry J">J Cherry</name>
</author>
<author>
<name sortKey="Davis, A" uniqKey="Davis A">A Davis</name>
</author>
<author>
<name sortKey="Dolinski, K" uniqKey="Dolinski K">K Dolinski</name>
</author>
<author>
<name sortKey="Dwight, S" uniqKey="Dwight S">S Dwight</name>
</author>
<author>
<name sortKey="Eppig, J" uniqKey="Eppig J">J Eppig</name>
</author>
<author>
<name sortKey="Midori, A" uniqKey="Midori A">A Midori</name>
</author>
<author>
<name sortKey="Hill, D" uniqKey="Hill D">D Hill</name>
</author>
<author>
<name sortKey="Issel Tarver, L" uniqKey="Issel Tarver L">L Issel-Tarver</name>
</author>
<author>
<name sortKey="Kasarskis, A" uniqKey="Kasarskis A">A Kasarskis</name>
</author>
<author>
<name sortKey="Lewis, S" uniqKey="Lewis S">S Lewis</name>
</author>
<author>
<name sortKey="Matese, J" uniqKey="Matese J">J Matese</name>
</author>
<author>
<name sortKey="Richardson, J" uniqKey="Richardson J">J Richardson</name>
</author>
<author>
<name sortKey="Ringwald, M" uniqKey="Ringwald M">M Ringwald</name>
</author>
<author>
<name sortKey="Rubin, G" uniqKey="Rubin G">G Rubin</name>
</author>
<author>
<name sortKey="Sherlock, G" uniqKey="Sherlock G">G Sherlock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bauer, I" uniqKey="Bauer I">I Bauer</name>
</author>
<author>
<name sortKey="Vollmar, B" uniqKey="Vollmar B">B Vollmar</name>
</author>
<author>
<name sortKey="Jaeschke, H" uniqKey="Jaeschke H">H Jaeschke</name>
</author>
<author>
<name sortKey="Rensing, H" uniqKey="Rensing H">H Rensing</name>
</author>
<author>
<name sortKey="Kraemer, T" uniqKey="Kraemer T">T Kraemer</name>
</author>
<author>
<name sortKey="Larsen, R" uniqKey="Larsen R">R Larsen</name>
</author>
<author>
<name sortKey="Bauer, M" uniqKey="Bauer M">M Bauer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamadeh, H" uniqKey="Hamadeh H">H Hamadeh</name>
</author>
<author>
<name sortKey="Bushel, P" uniqKey="Bushel P">P Bushel</name>
</author>
<author>
<name sortKey="Jayadev, S" uniqKey="Jayadev S">S Jayadev</name>
</author>
<author>
<name sortKey="Disorbo, O" uniqKey="Disorbo O">O DiSorbo</name>
</author>
<author>
<name sortKey="Bennett, L" uniqKey="Bennett L">L Bennett</name>
</author>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Tennant, R" uniqKey="Tennant R">R Tennant</name>
</author>
<author>
<name sortKey="Stoll, R" uniqKey="Stoll R">R Stoll</name>
</author>
<author>
<name sortKey="Barrett, J" uniqKey="Barrett J">J Barrett</name>
</author>
<author>
<name sortKey="Paules, R" uniqKey="Paules R">R Paules</name>
</author>
<author>
<name sortKey="Blanchard, K" uniqKey="Blanchard K">K Blanchard</name>
</author>
<author>
<name sortKey="Afshari, C" uniqKey="Afshari C">C Afshari</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heijne, W" uniqKey="Heijne W">W Heijne</name>
</author>
<author>
<name sortKey="Slitt, A" uniqKey="Slitt A">A Slitt</name>
</author>
<author>
<name sortKey="Van Bladeren, P" uniqKey="Van Bladeren P">P Van Bladeren</name>
</author>
<author>
<name sortKey="Groten, J" uniqKey="Groten J">J Groten</name>
</author>
<author>
<name sortKey="Klaassen, C" uniqKey="Klaassen C">C Klaassen</name>
</author>
<author>
<name sortKey="Stierum, R" uniqKey="Stierum R">R Stierum</name>
</author>
<author>
<name sortKey="Van Ommen, B" uniqKey="Van Ommen B">B Van Ommen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heinloth, A" uniqKey="Heinloth A">A Heinloth</name>
</author>
<author>
<name sortKey="Irwin, R" uniqKey="Irwin R">R Irwin</name>
</author>
<author>
<name sortKey="Boorman, G" uniqKey="Boorman G">G Boorman</name>
</author>
<author>
<name sortKey="Nettesheim, P" uniqKey="Nettesheim P">P Nettesheim</name>
</author>
<author>
<name sortKey="Fannin, R" uniqKey="Fannin R">R Fannin</name>
</author>
<author>
<name sortKey="Sieber, S" uniqKey="Sieber S">S Sieber</name>
</author>
<author>
<name sortKey="Snell, M" uniqKey="Snell M">M Snell</name>
</author>
<author>
<name sortKey="Tucker, C" uniqKey="Tucker C">C Tucker</name>
</author>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Travlos, G" uniqKey="Travlos G">G Travlos</name>
</author>
<author>
<name sortKey="Vansant, G" uniqKey="Vansant G">G Vansant</name>
</author>
<author>
<name sortKey="Blackshear, P" uniqKey="Blackshear P">P Blackshear</name>
</author>
<author>
<name sortKey="Tennant, R" uniqKey="Tennant R">R Tennant</name>
</author>
<author>
<name sortKey="Cunningham, M" uniqKey="Cunningham M">M Cunningham</name>
</author>
<author>
<name sortKey="Paules, R" uniqKey="Paules R">R Paules</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waring, J" uniqKey="Waring J">J Waring</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wormser, U" uniqKey="Wormser U">U Wormser</name>
</author>
<author>
<name sortKey="Calp, D" uniqKey="Calp D">D Calp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flaherty, K" uniqKey="Flaherty K">K Flaherty</name>
</author>
<author>
<name sortKey="Deluca Flaherty, C" uniqKey="Deluca Flaherty C">C DeLuca-Flaherty</name>
</author>
<author>
<name sortKey="Mckay, D" uniqKey="Mckay D">D McKay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tavaria, M" uniqKey="Tavaria M">M Tavaria</name>
</author>
<author>
<name sortKey="Gabriele, T" uniqKey="Gabriele T">T Gabriele</name>
</author>
<author>
<name sortKey="Kola, I" uniqKey="Kola I">I Kola</name>
</author>
<author>
<name sortKey="Anderson, R" uniqKey="Anderson R">R Anderson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Panaretou, B" uniqKey="Panaretou B">B Panaretou</name>
</author>
<author>
<name sortKey="Siligardi, G" uniqKey="Siligardi G">G Siligardi</name>
</author>
<author>
<name sortKey="Meyer, P" uniqKey="Meyer P">P Meyer</name>
</author>
<author>
<name sortKey="Maloney, A" uniqKey="Maloney A">A Maloney</name>
</author>
<author>
<name sortKey="Sullivan, J" uniqKey="Sullivan J">J Sullivan</name>
</author>
<author>
<name sortKey="Singh, S" uniqKey="Singh S">S Singh</name>
</author>
<author>
<name sortKey="Millson, S" uniqKey="Millson S">S Millson</name>
</author>
<author>
<name sortKey="Clarke, P" uniqKey="Clarke P">P Clarke</name>
</author>
<author>
<name sortKey="Naaby Hansen, S" uniqKey="Naaby Hansen S">S Naaby-Hansen</name>
</author>
<author>
<name sortKey="Stein, R" uniqKey="Stein R">R Stein</name>
</author>
<author>
<name sortKey="Cramer, R" uniqKey="Cramer R">R Cramer</name>
</author>
<author>
<name sortKey="Mollapour, M" uniqKey="Mollapour M">M Mollapour</name>
</author>
<author>
<name sortKey="Workman, P" uniqKey="Workman P">P Workman</name>
</author>
<author>
<name sortKey="Piper, P" uniqKey="Piper P">P Piper</name>
</author>
<author>
<name sortKey="Pearl, L" uniqKey="Pearl L">L Pearl</name>
</author>
<author>
<name sortKey="Prodromou, C" uniqKey="Prodromou C">C Prodromou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Cao, Ka" uniqKey="Le Cao K">KA Lê Cao</name>
</author>
<author>
<name sortKey="Gonzalez, I" uniqKey="Gonzalez I">I González</name>
</author>
<author>
<name sortKey="Dejean, S" uniqKey="Dejean S">S Déjean</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bach, F" uniqKey="Bach F">F Bach</name>
</author>
<author>
<name sortKey="Jordan, M" uniqKey="Jordan M">M Jordan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hastie, T" uniqKey="Hastie T">T Hastie</name>
</author>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Himberg, J" uniqKey="Himberg J">J Himberg</name>
</author>
<author>
<name sortKey="Hyvarinen, A" uniqKey="Hyvarinen A">A Hyvarinen</name>
</author>
<author>
<name sortKey="Esposito, F" uniqKey="Esposito F">F Esposito</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zou, H" uniqKey="Zou H">H Zou</name>
</author>
<author>
<name sortKey="Hastie, T" uniqKey="Hastie T">T Hastie</name>
</author>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Witten, D" uniqKey="Witten D">D Witten</name>
</author>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
<author>
<name sortKey="Hastie, T" uniqKey="Hastie T">T Hastie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tibshirani, R" uniqKey="Tibshirani R">R Tibshirani</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22305354</article-id>
<article-id pub-id-type="pmc">3298499</article-id>
<article-id pub-id-type="publisher-id">1471-2105-13-24</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-13-24</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="A1">
<name>
<surname>Yao</surname>
<given-names>Fangzhou</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<xref ref-type="aff" rid="I2">2</xref>
<email>fang.yao1@uqconnect.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Coquery</surname>
<given-names>Jeff</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<xref ref-type="aff" rid="I3">3</xref>
<email>j.coquery@uq.edu.au</email>
</contrib>
<contrib contrib-type="author" corresp="yes" id="A3">
<name>
<surname>Lê Cao</surname>
<given-names>Kim-Anh</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>k.lecao@uq.edu.au</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Shanghai University of Finance and Economics, Shanghai, P.R. China</aff>
<aff id="I2">
<label>2</label>
Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, QLD 4072, Australia</aff>
<aff id="I3">
<label>3</label>
Sup'Biotech, Villejuif, F-94800, France</aff>
<pub-date pub-type="collection">
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>3</day>
<month>2</month>
<year>2012</year>
</pub-date>
<volume>13</volume>
<fpage>24</fpage>
<lpage>24</lpage>
<history>
<date date-type="received">
<day>5</day>
<month>9</month>
<year>2011</year>
</date>
<date date-type="accepted">
<day>3</day>
<month>2</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright ©2012 Yao et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2012</copyright-year>
<copyright-holder>Yao et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.biomedcentral.com/1471-2105/13/24"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>A key question when analyzing high throughput data is whether the information provided by the measured biological entities (for example, gene or metabolite expression) is related to the experimental conditions or, rather, to some interfering signals, such as experimental bias or artefacts. Visualization tools are therefore useful to better understand the underlying structure of the data in a 'blind' (unsupervised) way. A well-established technique to do so is Principal Component Analysis (PCA). PCA is particularly powerful if the biological question is related to the highest variance in the data. Independent Component Analysis (ICA) has been proposed as an alternative to PCA, as it optimizes an independence condition to give more meaningful components. However, neither PCA nor ICA can overcome both the high dimensionality and the noisy characteristics of biological data.</p>
</sec>
<sec>
<title>Results</title>
<p>We propose Independent Principal Component Analysis (IPCA), which combines the advantages of both PCA and ICA. It uses ICA as a denoising process on the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data. The result is a better clustering of the biological samples on graphical representations. In addition, a sparse version (sIPCA) is proposed that performs an internal variable selection to identify biologically relevant features.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>On simulation studies and real data sets, we showed that IPCA offers a better visualization of the data than ICA, and with a smaller number of components than PCA. Furthermore, a preliminary investigation of the list of genes selected with sIPCA demonstrates that the approach is able to highlight relevant genes in the data with respect to the biological experiment.</p>
<p>IPCA and sIPCA are both implemented in the R package mixOmics, dedicated to the analysis and exploration of high dimensional biological data sets, and on the mixOmics web interface.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>With the development of high throughput technologies such as microarrays and next-generation sequencing, the exploration of high throughput data sets has become a necessity to unveil the relevant information contained in the data. Efficient exploratory tools are therefore needed, not only to assess the quality of the data, but also to give a comprehensive overview of the system, extract significant information and cope with the high dimensionality. Indeed, many statistical approaches fail or perform poorly for two main reasons: the number of samples (or observations) is much smaller than the number of variables (the biological entities that are measured), and the data are extremely noisy.</p>
<p>In this study, we are interested in the application of unsupervised approaches to discover novel biological mechanisms and reveal insightful patterns while reducing the dimension in the data. Amongst the different categories of unsupervised approaches (clustering, model-based and projection methods), we are specifically interested in projection-based methods, which linearly decompose the data into components with a desired property. These exploratory approaches project the data into a new subspace spanned by the components. They allow dimension reduction without loss of essential information and visualization of the data in a smaller subspace.</p>
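<p>As a rough illustration of a projection-based method (a sketch on toy random data, not the paper's data sets), the decomposition and projection steps can be written in a few lines of Python:</p>

```python
# Hedged sketch: toy data standing in for a real expression matrix,
# with rows = biological samples and columns = measured variables.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 500))        # 10 samples, 500 variables
X = X - X.mean(axis=0)                # centre each variable

# Any projection method supplies component vectors spanning the new
# subspace; here we take two orthonormal directions from the SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:2]                   # shape (2, 500)

# Projecting the samples reduces 500 dimensions to 2 for visualization.
scores = X @ components.T             # shape (10, 2)
print(scores.shape)
```

<p>The resulting scores matrix is what sample plots in such methods display: each sample becomes a point in the low-dimensional subspace, so no essential information is lost as long as the components capture the relevant structure.</p>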
<p>Principal component analysis (PCA) [
<xref ref-type="bibr" rid="B1">1</xref>
] is a classical tool to reduce the dimension of expression data, to visualize the similarities between the biological samples, and to filter noise. It is often used as a pre-processing step for subsequent analyses. PCA projects the data into a new space spanned by the principal components (PCs), which are uncorrelated and orthogonal. The PCs can successfully extract relevant information from the data. Through sample and variable representations, they can reveal experimental characteristics, as well as artefacts or bias. Sometimes, however, PCA can fail to accurately reflect our knowledge of biology, for the following reasons: a) PCA assumes that gene expression follows a multivariate normal distribution, while recent studies have demonstrated that microarray gene expression measurements instead follow a super-Gaussian distribution [
<xref ref-type="bibr" rid="B2">2</xref>
-
<xref ref-type="bibr" rid="B5">5</xref>
], b) PCA decomposes the data based on the maximization of its variance. In some cases, the biological question may not be related to the highest variance in the data [
<xref ref-type="bibr" rid="B6">6</xref>
].</p>
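<p>Caveat b) can be seen directly in a minimal PCA computed via the singular value decomposition (a toy sketch on random data, not the paper's experiments): the components are mutually orthogonal and ranked purely by explained variance, so a biological signal carried by a low-variance direction is relegated to later components.</p>

```python
# Minimal PCA via SVD on toy data (illustration only).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 100))               # 20 samples, 100 variables
Xc = X - X.mean(axis=0)                      # centre the variables

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                           # principal component scores
explained_var = s**2 / (Xc.shape[0] - 1)     # variance per component

# Scores on distinct PCs are uncorrelated, and the components come out
# ordered by decreasing explained variance: PCA's ranking criterion.
print(abs(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]) < 1e-8)
assert np.all(np.diff(explained_var) <= 1e-12)
```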
<p>A more plausible assumption of the underlying distribution of high-throughput biological data is that feature measurements following Gaussian distributions represent noise - most genes conform to this distribution as they are not expected to change at a given physiological or pathological transition [
<xref ref-type="bibr" rid="B7">7</xref>
]. Recently, an alternative approach called Independent Component Analysis (ICA) [
<xref ref-type="bibr" rid="B8">8</xref>
-
<xref ref-type="bibr" rid="B10">10</xref>
] has been introduced to analyze microarray and metabolomics data [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
-
<xref ref-type="bibr" rid="B13">13</xref>
]. In contrast to PCA, ICA identifies non-Gaussian components which are modelled as a linear combination of the biological features. These components are statistically independent, i.e. there is no overlapping information between the components. ICA therefore involves higher-order statistics, while PCA constrains the components to be mutually orthogonal, which only involves second-order statistics [
<xref ref-type="bibr" rid="B14">14</xref>
]. As a result, PCA and ICA often select different subspaces onto which the data are projected. Since ICA performs blind source separation, it can reduce the effects of noise or artefacts in the signal, as noise is usually generated from independent sources [
<xref ref-type="bibr" rid="B10">10</xref>
]. In the recent literature, it has been shown that the independent components from ICA were better at separating different biological groups than the principal components from PCA [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
-
<xref ref-type="bibr" rid="B7">7</xref>
]. However, although ICA has been found to be a successful alternative to PCA, it faces limitations related to instability, the choice of the number of components to extract, and high dimensionality. As ICA is a stochastic algorithm, it must be run several times and the results averaged in order to obtain robust estimates [
<xref ref-type="bibr" rid="B5">5</xref>
]. Choosing the number of independent components to extract remains a hard, outstanding problem. The convention has been to use a fixed number of components [
<xref ref-type="bibr" rid="B2">2</xref>
]. However, ICA does not order its components by 'relevance'. Therefore, some authors proposed to order them either with respect to their kurtosis values [
<xref ref-type="bibr" rid="B9">9</xref>
], or with respect to their l
<sub>2 </sub>
norm [
<xref ref-type="bibr" rid="B2">2</xref>
], or by using Bayesian frameworks to select the number of components [
<xref ref-type="bibr" rid="B15">15</xref>
]. In the case of high dimensional data sets, PCA is often applied as a pre-processing step to reduce the number of dimensions [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
]. In that particular case, ICA is applied to a subset of the data summarized by a small number of principal components from PCA.</p>
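<p>The blind source separation property mentioned above can be illustrated with a minimal kurtosis-based FastICA sketch (deflation scheme). This is an illustrative toy implementation under simplifying assumptions, not the reference FastICA algorithm cited in this paper:</p>

```python
import numpy as np

def fastica_deflation(X, n_components, n_iter=200, seed=0):
    """Toy kurtosis-based FastICA with deflation.

    X: (n_samples, n_mixtures) observed mixed signals.
    Returns the estimated independent components, one per column.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # whitening: rotate and rescale so the covariance becomes identity
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ E @ np.diag(d ** -0.5)
    W = np.zeros((n_components, Z.shape[1]))
    for i in range(n_components):
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = Z @ w
            # fixed-point update maximizing |kurtosis| (g(u) = u^3)
            w_new = (Z * (u ** 3)[:, None]).mean(axis=0) - 3.0 * w
            w_new -= W[:i].T @ (W[:i] @ w_new)   # deflation step
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < 1e-10
            w = w_new
            if converged:
                break
        W[i] = w
    return Z @ W.T

# demo: unmix one uniform (sub-Gaussian) and one Laplacian
# (super-Gaussian) source from two linear mixtures
rng = np.random.default_rng(1)
S = np.column_stack([rng.uniform(-1, 1, 4000), rng.laplace(size=4000)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])           # mixing matrix
S_hat = fastica_deflation(S @ A.T, n_components=2)
```

<p>On these two artificially mixed non-Gaussian sources, the recovered components match the original sources up to sign and permutation, which is the ambiguity inherent to ICA.</p>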
<p>In this paper, we propose to use ICA as a denoising step for PCA, since ICA is good at separating mixed signals, i.e. noise from signal. The aim is to generate denoised loading vectors. These vectors are crucial in PCA or ICA, as each of them indicates the weights assigned to the biological features in the linear combination that leads to the component. The goal is therefore to obtain independent components that better reflect the underlying biology of a study and achieve better dimension reduction than PCA or ICA.</p>
<p>Independent Principal Component Analysis (IPCA) makes the assumption that biologically meaningful components can be obtained if most of the noise has been removed from the associated loading vectors.</p>
<p>In IPCA, PCA is used as a pre-processing step to reduce the dimension of the data and to generate the loading vectors. The FastICA algorithm [
<xref ref-type="bibr" rid="B9">9</xref>
] is then applied to the previously obtained PCA loading vectors, which subsequently generate the Independent Principal Components (IPC). We use the kurtosis measure of the loading vectors to order the IPCs. We also propose a sparse variant with a built-in variable selection procedure, obtained by applying soft-thresholding to the independent loading vectors [
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B17">17</xref>
] (sIPCA).</p>
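<p>The three steps just described (PCA loadings, FastICA on those loadings, kurtosis ordering) can be sketched in plain Python. This is an assumed reading of the pipeline for illustration only; the authors provide their own implementation, and details such as the FastICA variant and the whitening of the loading matrix are simplifications here:</p>

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis, used to rank the independent loading vectors."""
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

def ipca(X, n_components=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # step 1: PCA loading vectors from the SVD of the centred data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                    # (p, k) PCA loadings
    # whiten the loading matrix (its p rows act as ICA "samples")
    Vc = V - V.mean(axis=0)
    d, E = np.linalg.eigh((Vc.T @ Vc) / Vc.shape[0])
    Z = Vc @ E @ np.diag(d ** -0.5) @ E.T
    # step 2: symmetric FastICA with the cubic (kurtosis) nonlinearity
    W = rng.normal(size=(n_components, n_components))
    for _ in range(n_iter):
        Y = Z @ W.T
        W = (Y ** 3).T @ Z / Z.shape[0] - 3.0 * W
        dw, Ew = np.linalg.eigh(W @ W.T)
        W = Ew @ np.diag(dw ** -0.5) @ Ew.T @ W  # symmetric decorrelation
    S = Z @ W.T                                # independent loading vectors
    # step 3: order by decreasing kurtosis, normalize, and project
    order = np.argsort([-kurtosis(S[:, j]) for j in range(n_components)])
    S = S[:, order] / np.linalg.norm(S[:, order], axis=0)
    return Xc @ S, S                           # IPCs and their loadings

# demo on data with one sparse, heavy-tailed (super-Gaussian) loading
rng = np.random.default_rng(1)
v = np.zeros(200)
v[:20] = rng.laplace(scale=5.0, size=20)
t = rng.normal(size=(50, 1))
X = t * v[None, :] + rng.normal(scale=0.5, size=(50, 200))
ipcs, loadings = ipca(X, n_components=2)
```

<p>The returned components are sorted so that the most non-Gaussian (highest-kurtosis) loading vector comes first, mirroring the ordering rule used in the paper.</p>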
<p>In the 'Results and Discussion' Section, we first compare the classical PCA and ICA methodologies to IPCA in a simulation study. On three real biological data sets (microarray and metabolomics), we demonstrate the satisfactory sample clustering abilities of IPCA. We then illustrate the usefulness of variable selection with sIPCA and compare it with the results obtained from the sparse PCA of [
<xref ref-type="bibr" rid="B18">18</xref>
]. In the 'Methods' Section, we present the PCA, ICA and IPCA methodologies and describe how to perform variable selection with sIPCA.</p>
</sec>
<sec>
<title>Results and Discussion</title>
<p>We first performed a simulation study in which the loading vectors follow a Gaussian or super-Gaussian distribution. On three real data sets, we compared the kurtosis values of the loading vectors as a way of measuring their non-Gaussianity and ordering the IPCs. The sample clustering ability of each approach was assessed using the Davies-Bouldin index [
<xref ref-type="bibr" rid="B19">19</xref>
]. Finally, the variable selections performed by sIPCA and sPCA are compared on a simulated data set as well as on the Liver Toxicity data set.</p>
<sec>
<title>Simulation study</title>
<p>In order to understand the benefits of IPCA compared to PCA or ICA, we simulated 5000 data sets of size
<italic>n </italic>
= 50 samples and
<italic>p </italic>
= 500 variables from a multivariate normal distribution with a pre-specified variance-covariance matrix described in the 'Methods' Section. Two cases were tested.</p>
<p>1. Gaussian case. The first two eigenvectors
<bold>v</bold>
<sub>1 </sub>
and
<bold>v</bold>
<sub>2</sub>
, both of length 500, follow a Gaussian distribution.</p>
<p>2. Super-Gaussian case. In this case the first two eigenvectors follow a mixture of Laplacian and uniform distributions:</p>
<p>
<disp-formula>
<mml:math id="M1" name="1471-2105-13-24-i1" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">~</mml:mo>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>25</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-op"></mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>50</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center"></mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">and</mml:mtext>
</mml:mstyle>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">~</mml:mo>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>25</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>301</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-op"></mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>350</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center"></mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
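<p>The super-Gaussian eigenvectors above can be generated directly. The sketch below assumes that L(0, 25) denotes a Laplacian with location 0 and scale parameter 25 (the exact parameterization is an assumption for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
p = 500

def supergaussian_eigenvector(heavy_idx):
    """Uniform U(0, 1) background with Laplacian-distributed weights
    on a block of 50 variables (scale 25 assumed for L(0, 25))."""
    v = rng.uniform(0.0, 1.0, size=p)
    v[heavy_idx] = rng.laplace(loc=0.0, scale=25.0, size=len(heavy_idx))
    return v

v1 = supergaussian_eigenvector(np.arange(0, 50))      # variables 1..50
v2 = supergaussian_eigenvector(np.arange(300, 350))   # variables 301..350
```

<p>Because a small block of variables carries heavy-tailed Laplacian weights, each simulated eigenvector is strongly super-Gaussian, which is what the kurtosis measure detects.</p>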
<p>Table
<xref ref-type="table" rid="T1">1</xref>
records the median of the angles between the simulated (known) eigenvectors and the loading vectors estimated by the three approaches. PCA gave similar results in both simulation cases and was able to estimate the loading vectors well, while ICA performed poorly in both cases. IPCA performed quite poorly in the Gaussian case, but outperformed PCA in the super-Gaussian case.</p>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Simulation study: median angle between the simulated (known) eigenvectors and the estimated loading vectors, under Gaussian or super-Gaussian distributions.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center">Method</th>
<th align="center" colspan="2">Gaussian</th>
<th align="center" colspan="2">super-Gaussian</th>
</tr>
<tr>
<th></th>
<th colspan="4">
<hr></hr>
</th>
</tr>
<tr>
<th></th>
<th align="left">v
<sub>1</sub>
</th>
<th align="left">v
<sub>2</sub>
</th>
<th align="left">v
<sub>1</sub>
</th>
<th align="left">v
<sub>2</sub>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">PCA</td>
<td align="left">20.48</td>
<td align="left">21.61</td>
<td align="left">20.47</td>
<td align="left">21.62</td>
</tr>
<tr>
<td align="center">ICA</td>
<td align="left">85.70</td>
<td align="left">84.39</td>
<td align="left">82.13</td>
<td align="left">77.77</td>
</tr>
<tr>
<td align="center">IPCA</td>
<td align="left">70.05</td>
<td align="left">69.72</td>
<td align="left">12.46</td>
<td align="left">14.08</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Table
<xref ref-type="table" rid="T2">2</xref>
displays the kurtosis values of the first 5 loading vectors. In IPCA the components are ordered with respect to the kurtosis values of their associated loading vectors, while in the FastICA algorithm the components are ordered with respect to the kurtosis values of the independent components. In the super-Gaussian case, these results show that the kurtosis value is a good post hoc indicator of the number of components to choose, as a sudden drop in the values corresponds to irrelevant dimensions (from the third component onwards). Low kurtosis values in the Gaussian case indicate that the non-Gaussianity of the loading vectors cannot be maximized, and that the assumption of IPCA (i.e. that a small number of genes contribute heavily to the observed biological process) is not met.</p>
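<p>The contrast that the kurtosis measure captures can be checked numerically (a generic sketch, not the authors' code): the excess kurtosis of Gaussian draws is close to zero, while a Laplacian (super-Gaussian) sample scores well above it.</p>

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3: ~0 for Gaussian data,
    positive for super-Gaussian (heavy-tailed) data such as Laplacian."""
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

rng = np.random.default_rng(42)
gaussian = rng.normal(size=100_000)    # excess kurtosis near 0
laplacian = rng.laplace(size=100_000)  # excess kurtosis near 3
```

<p>A sudden drop of this statistic toward zero across successive loading vectors therefore signals that the remaining dimensions carry only Gaussian noise.</p>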
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>Mean value of the kurtosis measure of the first 5 loading vectors in the simulation study for PCA, ICA and IPCA.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th></th>
<th align="left">PCA</th>
<th align="left">ICA</th>
<th align="left">IPCA</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Gaussian case</td>
<td align="left">loading 1</td>
<td align="left">-0.007</td>
<td align="left">-0.015</td>
<td align="left">0.54</td>
</tr>
<tr>
<td></td>
<td align="left">loading 2</td>
<td align="left">-0.009</td>
<td align="left">-0.013</td>
<td align="left">0.21</td>
</tr>
<tr>
<td></td>
<td align="left">loading 3</td>
<td align="left">-0.012</td>
<td align="left">-0.013</td>
<td align="left">-0.01</td>
</tr>
<tr>
<td></td>
<td align="left">loading 4</td>
<td align="left">-0.011</td>
<td align="left">-0.013</td>
<td align="left">-0.20</td>
</tr>
<tr>
<td></td>
<td align="left">loading 5</td>
<td align="left">-0.015</td>
<td align="left">-0.015</td>
<td align="left">-0.41</td>
</tr>
<tr>
<td colspan="5">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">super-Gaussian case</td>
<td align="left">loading 1</td>
<td align="left">34.75</td>
<td align="left">0.28</td>
<td align="left">52.58</td>
</tr>
<tr>
<td></td>
<td align="left">loading 2</td>
<td align="left">34.16</td>
<td align="left">0.43</td>
<td align="left">33.81</td>
</tr>
<tr>
<td></td>
<td align="left">loading 3</td>
<td align="left">-0.01</td>
<td align="left">0.42</td>
<td align="left">0.27</td>
</tr>
<tr>
<td></td>
<td align="left">loading 4</td>
<td align="left">-0.01</td>
<td align="left">0.44</td>
<td align="left">-0.02</td>
</tr>
<tr>
<td></td>
<td align="left">loading 5</td>
<td align="left">-0.02</td>
<td align="left">0.47</td>
<td align="left">-0.25</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Tables
<xref ref-type="table" rid="T1">1</xref>
and
<xref ref-type="table" rid="T2">2</xref>
seem to suggest that ICA performs poorly in both the Gaussian and super-Gaussian cases, even though we would expect the contrary in the super-Gaussian case. In the high dimensional case, PCA is used as a pre-processing step in the ICA algorithm. It is likely that this step affects the ICA input matrix and that the ICA assumptions are no longer met. The performance of ICA therefore seems to be largely affected by the high number of variables.</p>
<p>PCA gave satisfactory results in both cases. In the super-Gaussian case, PCA is even able to recover some of the super-Gaussian distribution of the loading vectors. However, IPCA is able to recover the loading structure better than PCA in the super-Gaussian case (angles are smaller in Table
<xref ref-type="table" rid="T1">1</xref>
and kurtosis value is much higher for the first loading for IPCA). Depending on the (unknown) nature of the data set to be analyzed, it is therefore advisable to assess both approaches.</p>
</sec>
<sec>
<title>Application to real data sets</title>
<sec>
<title>Liver Toxicity study</title>
<p>In this study, 64 male rats were exposed to non-toxic (50 or 150 mg/kg), moderately toxic (1500 mg/kg) or severely toxic (2000 mg/kg) doses of acetaminophen (paracetamol) in a controlled experiment [
<xref ref-type="bibr" rid="B20">20</xref>
]. In this paper, we considered 50 and 150 mg/kg as low doses, and 1500 and 2000 mg/kg as high doses. Necropsies were performed at 6, 18, 24 and 48 hours after exposure and the mRNA from the liver was extracted. The microarray data are arranged in a matrix of 64 samples and 3116 transcripts.</p>
</sec>
<sec>
<title>Prostate cancer study</title>
<p>This study investigated whether gene expression differences could distinguish between common clinical and pathological features of prostate cancer. Expression profiles were derived from 52 prostate tumors and from 50 non-tumor prostate samples (referred to as normal) using oligonucleotide microarrays containing probes for approximately 12,600 genes and ESTs. After preprocessing, the expression of 6033 genes remains (see [
<xref ref-type="bibr" rid="B21">21</xref>
]) and 101 samples, since one normal sample was suspected to be an outlier and was removed from the analysis.</p>
</sec>
<sec>
<title>Yeast metabolomic study</title>
<p>In this study, two Saccharomyces cerevisiae strains, wild-type (WT) and mutant (MT), were grown in batch cultures under two different environmental conditions, aerobic (AER) and anaerobic (ANA), in standard mineral media with glucose as the sole carbon source. After normalization and preprocessing, the metabolomic data consist of 37 metabolites and 55 samples, which include 13 MT-AER, 14 MT-ANA, 15 WT-AER and 13 WT-ANA samples (see [
<xref ref-type="bibr" rid="B22">22</xref>
] for more details).</p>
</sec>
<sec>
<title>Choosing the number of components with the kurtosis measure</title>
<p>As mentioned by [
<xref ref-type="bibr" rid="B5">5</xref>
], one major limitation of ICA is the specification and the choice of the number of components to extract. In PCA, the cumulative percentage of explained variance is a popular criterion to choose the number of principal components, since they are ordered by decreasing explained variance [
<xref ref-type="bibr" rid="B1">1</xref>
]. For the case of high dimensionality, many alternative ad hoc stopping rules have been proposed without, however, leading to a consensus (see [
<xref ref-type="bibr" rid="B23">23</xref>
] for a thorough review). In Liver Toxicity, the first 3 principal components explained 63% of the total variance; in Yeast, the first 2 principal components explained 85% of the total variance. For Prostate, which contains a very large number of variables, the first 3 components only explain 51% of the total variance (7 principal components would be necessary to explain more than 60%). However, from a visualization perspective, choosing more than 3 components would make interpretation difficult.</p>
<p>The kurtosis values of the loading vectors from PCA, ICA and IPCA are displayed in Table
<xref ref-type="table" rid="T3">3</xref>
. These values, as well as their ordering, differ from one approach to another. In IPCA, the kurtosis value of the associated loading vector gives a good indication of the ability of a component to separate the clusters, since we are interested in extracting signals from non-Gaussian distributions. The first 2, 1 and 2 components seem sufficient in Liver Toxicity, Prostate and Yeast, respectively, to extract relevant information with IPCA, as discussed further below.</p>
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>Kurtosis measures of the loading vectors for PCA, ICA and IPCA.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Dataset</th>
<th></th>
<th align="left">PCA</th>
<th align="left">ICA</th>
<th align="left">IPCA</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Liver Toxicity study</td>
<td align="left">loading 1</td>
<td align="left">6.588</td>
<td align="left">7.697</td>
<td align="left">9.700</td>
</tr>
<tr>
<td></td>
<td align="left">loading 2</td>
<td align="left">1.912</td>
<td align="left">2.737</td>
<td align="left">6.982</td>
</tr>
<tr>
<td></td>
<td align="left">loading 3</td>
<td align="left">6.958</td>
<td align="left">4.799</td>
<td align="left">0.672</td>
</tr>
<tr>
<td colspan="5">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Prostate cancer study</td>
<td align="left">loading 1</td>
<td align="left">-1.527</td>
<td align="left">-0.553</td>
<td align="left">1.513</td>
</tr>
<tr>
<td></td>
<td align="left">loading 2</td>
<td align="left">-0.561</td>
<td align="left">0.723</td>
<td align="left">-0.249</td>
</tr>
<tr>
<td></td>
<td align="left">loading 3</td>
<td align="left">1.176</td>
<td align="left">1.640</td>
<td align="left">-1.509</td>
</tr>
<tr>
<td colspan="5">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Yeast metabolomic study</td>
<td align="left">loading 1</td>
<td align="left">4.532</td>
<td align="left">0.274</td>
<td align="left">1.551</td>
</tr>
<tr>
<td></td>
<td align="left">loading 2</td>
<td align="left">12.261</td>
<td align="left">-0.758</td>
<td align="left">1.437</td>
</tr>
<tr>
<td></td>
<td align="left">loading 3</td>
<td align="left">4.147</td>
<td align="left">1.677</td>
<td align="left">-0.475</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Sample representation</title>
<p>The samples in each data set were projected in the new subspace spanned by the PCA, ICA or IPCA components (Figure
<xref ref-type="fig" rid="F1">1</xref>
,
<xref ref-type="fig" rid="F2">2</xref>
and
<xref ref-type="fig" rid="F3">3</xref>
). This kind of graphical output gives better insight into the biological study, as it reveals the similarities shared between samples. Comparing the different graphics allows one to visualize how each method partitions the samples in a way that reflects the internal structure of the data, and how it extracts the relevant information to represent each sample. One would expect samples belonging to the same biological group, or undergoing the same biological treatment, to be clustered together and separated from the other groups.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>
<bold>Liver Toxicity study: Sample representation</bold>
. Sample representation using the first two components from PCA, ICA and IPCA approaches.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-1"></graphic>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>
<bold>Prostate cancer study: sample representation</bold>
. Sample representation using the first two or three components from PCA, ICA and IPCA approaches.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-2"></graphic>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>
<bold>Yeast metabolomic study: sample representation</bold>
. Sample representation using the first two or three components from PCA, ICA and IPCA approaches.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-3"></graphic>
</fig>
<p>In Liver Toxicity, IPCA tended to better cluster the low doses together, compared to PCA or ICA (Figure
<xref ref-type="fig" rid="F1">1</xref>
). In Prostate (Figure
<xref ref-type="fig" rid="F2">2</xref>
), PCA graphical representations showed interesting patterns. Neither the first nor the second PCA component was relevant for separating the two groups. Instead, it was the third component that gave more insight into the expected biological characteristics of the samples. It is likely that PCA first attempts to maximize the variance of noisy signals, which have a Gaussian distribution, before finding the right direction to better differentiate the sample classes. For IPCA, the first component alone seemed sufficient to separate the classes (as indicated by the kurtosis value of its associated loading vector in Table
<xref ref-type="table" rid="T3">3</xref>
), while two components were necessary for ICA to achieve a satisfying clustering. For the Yeast study (Figure
<xref ref-type="fig" rid="F3">3</xref>
), even though the first 2 principal components explained 85% of the total variance, it seemed that 3 components were necessary to separate WT from MT in the AER samples with PCA, whereas 2 components were sufficient with ICA and IPCA. For all approaches, the WT and MT samples of the ANA group remain mixed and seem to share strong biological similarities.</p>
</sec>
<sec>
<title>Cluster validation</title>
<p>In order to compare how well the different methods perform on a data set, various indices have been proposed in the literature to measure the similarities between clusters [
<xref ref-type="bibr" rid="B24">24</xref>
]. We used the Davies-Bouldin index [
<xref ref-type="bibr" rid="B19">19</xref>
] (see 'Methods' section). This index has both a statistical and geometric rationale, and looks for compact and well-separated clusters. The main purpose is to check whether the different approaches can distinguish between the known biological conditions or treatments on the basis of the expression data. The approach that gives the smallest index is considered the best clustering method based on this criterion. The results are displayed in Table
<xref ref-type="table" rid="T4">4</xref>
for a choice of 2 or 3 components. On the Liver Toxicity study, the Davies-Bouldin index indicated that IPCA outperformed the other approaches using 2 components. When choosing 3 components, all approaches gave similar results. On Prostate, ICA slightly outperformed IPCA for 2 components and gave similar performances for 3 components. PCA seemed clearly limited by the large number of noisy variables and was not able to provide a satisfying clustering of the samples. ICA gave good clustering performance on the Yeast data set for 2 components, followed by PCA and IPCA. It is probable that there is very little noise in this small data set.</p>
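<p>The Davies-Bouldin index used above can be sketched as follows. This is a standard formulation of the index (cluster scatter over centroid separation); details such as the choice of distance are assumptions here and may differ from the 'Methods' Section:</p>

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: the average, over clusters, of the worst
    ratio (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of
    cluster i's samples to their centroid c_i. Lower values indicate
    compact, well-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scat = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ks, cents)])
    total = 0.0
    for i in range(len(ks)):
        total += max((scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)

# demo: well-separated groups score lower than overlapping ones
rng = np.random.default_rng(0)
labels = np.array([0] * 30 + [1] * 30)
separated = np.vstack([rng.normal(size=(30, 2)),
                       rng.normal(size=(30, 2)) + 10.0])
overlapping = np.vstack([rng.normal(size=(30, 2)),
                         rng.normal(size=(30, 2)) + 1.0])
```

<p>Here the labels come from the known biological groups, so the index measures how well each projection separates those groups rather than how a clustering algorithm performs.</p>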
<table-wrap id="T4" position="float">
<label>Table 4</label>
<caption>
<p>Davies Bouldin index for PCA, ICA and IPCA on the three data sets.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Dataset</th>
<th align="left"># of components</th>
<th align="left">PCA</th>
<th align="left">ICA</th>
<th align="left">IPCA</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Liver Toxicity study</td>
<td align="left">2 components</td>
<td align="left">1.809</td>
<td align="left">1.923</td>
<td align="left">1.242</td>
</tr>
<tr>
<td align="left">Liver Toxicity study</td>
<td align="left">3 components</td>
<td align="left">1.523</td>
<td align="left">1.578</td>
<td align="left">1.525</td>
</tr>
<tr>
<td colspan="5">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Prostate cancer study</td>
<td align="left">2 components</td>
<td align="left">4.117</td>
<td align="left">1.679</td>
<td align="left">1.782</td>
</tr>
<tr>
<td align="left">Prostate cancer study</td>
<td align="left">3 components</td>
<td align="left">3.312</td>
<td align="left">2.316</td>
<td align="left">2.315</td>
</tr>
<tr>
<td colspan="5">
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Yeast metabolomic study</td>
<td align="left">2 components</td>
<td align="left">1.894</td>
<td align="left">1.788</td>
<td align="left">2.338</td>
</tr>
<tr>
<td align="left">Yeast metabolomic study</td>
<td align="left">3 components</td>
<td align="left">2.119</td>
<td align="left">2.139</td>
<td align="left">2.037</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In fact, the Davies-Bouldin index indicated that for large data sets (Liver Toxicity and Prostate), IPCA performs best with a smaller number of components than PCA: it is able to highlight relevant information in a very small number of dimensions.</p>
</sec>
<sec>
<title>Variable selection</title>
<p>We first performed a simulation study to assess whether sIPCA could identify relevant variables. We then applied sIPCA to the Liver Toxicity study. In both cases, we compared sIPCA with the sparse PCA approach (sPCA-rSVD-soft from [
<xref ref-type="bibr" rid="B18">18</xref>
]) that we will subsequently call 'sPCA'.</p>
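<p>The soft-thresholding that produces sparse loading vectors in sIPCA can be sketched as follows. The rule used here for deriving the threshold from the requested degree of sparsity (keep the largest coefficients in absolute value, ignoring ties) is an assumption for illustration; the exact procedure is given in the 'Methods' Section:</p>

```python
import numpy as np

def soft_threshold(v, n_keep):
    """Soft-threshold a loading vector so that only the n_keep largest
    coefficients (in absolute value) stay non-zero; the survivors are
    also shrunk toward zero by the threshold. Assumes no ties at the
    cutoff."""
    a = np.abs(v)
    if n_keep >= v.size:
        return v.copy()
    lam = np.sort(a)[::-1][n_keep]          # (n_keep + 1)-th largest |v_k|
    return np.sign(v) * np.maximum(a - lam, 0.0)

v = np.array([3.0, -2.0, 0.5, 0.1, -1.0])
sparse_v = soft_threshold(v, n_keep=2)      # only the two largest survive
```

<p>Setting the degree of sparsity thus amounts to fixing how many variables are selected on each component, which is the input parameter used in the comparisons below.</p>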
</sec>
</sec>
<sec>
<title>Simulated example</title>
<p>Using the simulation framework described in the 'Methods' Section, we considered two cases:</p>
<p>1. Gaussian case. The two sparse simulated eigenvectors followed a Gaussian distribution:</p>
<p>
<disp-formula>
<mml:math id="M2" name="1471-2105-13-24-i2" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">~</mml:mo>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-op"></mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>50</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center"></mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">and</mml:mtext>
</mml:mstyle>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>301</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-op"></mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>350</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center"></mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>2. Super-Gaussian case. In this case, we have</p>
<p>
<disp-formula>
<mml:math id="M3" name="1471-2105-13-24-i3" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">~</mml:mo>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>25</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace width="2.77695pt" class="tmspace"></mml:mspace>
<mml:mo class="MathClass-op"></mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mspace width="2.77695pt" class="tmspace"></mml:mspace>
<mml:mn>50</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center"></mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">and</mml:mtext>
</mml:mstyle>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:mspace width="1em" class="quad"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>v</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="{">
<mml:mrow>
<mml:mtable equalrows="false" columnlines="none" equalcolumns="false" class="array">
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">~</mml:mo>
<mml:mi>L</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>25</mml:mn>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mi>k</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>301</mml:mn>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mo class="MathClass-op">&#x2026;</mml:mo>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mn>350</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd class="array" columnalign="center">
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd class="array" columnalign="center">
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">otherwise</mml:mtext>
</mml:mstyle>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Each eigenvector has 50 non-zero variables, and the coefficients in the loading vectors associated with these non-zero variables follow a Gaussian or super-Gaussian distribution. sPCA and sIPCA were then applied to each generated data set. Both approaches require the degree of sparsity on each component as an input parameter, which was set to 50. One can imagine that each eigenvector describes a particular biological process to which 50 genes contribute heavily or very heavily. Table
<xref ref-type="table" rid="T5">5</xref>
displays the correct identification rate of each loading vector estimated by sPCA and sIPCA. Given this non-trivial setting, both approaches identified the important variables very well, especially on the first dimension, where sPCA slightly outperformed sIPCA. On the second dimension, however, the performances of sPCA and sIPCA differ: sPCA fails to separate the two sparse signals and tended to select variables from both dimensions in the second loading vector. In contrast, and especially in the super-Gaussian case, sIPCA is able to identify each sparse eigenvector signal separately, i.e. each simulated biological process. sPCA performed better in the Gaussian than in the super-Gaussian case, whereas sIPCA performed almost equally well in both cases.</p>
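<p>The simulation scheme above can be sketched numerically. The following minimal numpy sketch (written in Python rather than the paper's R; the sample size, score distributions and noise level are our own assumptions, not taken from the paper) generates one super-Gaussian data set with two disjoint blocks of 50 Laplace-distributed loadings:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 500

# Sparse loading vectors: 50 non-zero Laplace-distributed weights each,
# on disjoint index blocks (two non-overlapping simulated biological processes).
v1 = np.zeros(p)
v2 = np.zeros(p)
v1[:50] = rng.laplace(loc=0, scale=25, size=50)     # super-Gaussian case
v2[300:350] = rng.laplace(loc=0, scale=25, size=50)

# Latent sample scores and noisy data matrix X = u1 v1' + u2 v2' + E
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
X = np.outer(u1, v1) + np.outer(u2, v2) + rng.normal(scale=1.0, size=(n, p))
```

<p>Replacing the Laplace draws with Gaussian draws gives the Gaussian case; the identification rate is then the fraction of the 50 true non-zero indices recovered in each estimated sparse loading vector.</p>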
<table-wrap id="T5" position="float">
<label>Table 5</label>
<caption>
<p>Simulation study: average percentage of correctly identified non-zero loadings (standard deviation) when 50 variables are selected on each dimension (each loading vector).</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Method</th>
<th align="center" colspan="2">Gaussian</th>
<th align="center" colspan="2">super-Gaussian</th>
</tr>
<tr>
<th></th>
<th colspan="4">
<hr></hr>
</th>
</tr>
<tr>
<th></th>
<th align="left">v
<sub>1</sub>
</th>
<th align="left">v
<sub>2</sub>
</th>
<th align="left">v
<sub>1</sub>
</th>
<th align="left">v
<sub>2</sub>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">sPCA</td>
<td align="left">90.30% (3.5)</td>
<td align="left">72.5% (11.6)</td>
<td align="left">85.44% (4.3)</td>
<td align="left">68.22% (10.6)</td>
</tr>
<tr>
<td align="left">sIPCA</td>
<td align="left">86.7% (8.3)</td>
<td align="left">87.7% (8.1)</td>
<td align="left">80.80% (8.6)</td>
<td align="left">82.30% (8.4)</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Real example with Liver Toxicity study</title>
<sec>
<title>Choosing the number of genes to select</title>
<p>Figure
<xref ref-type="fig" rid="F4">4</xref>
displays the Davies Bouldin index for various gene selection sizes. sIPCA clearly outperformed sPCA. In order to compare the biological relevance of the two gene selections, a selection size of 50 genes per dimension, for 2 dimensions, was arbitrarily chosen for the following analysis. Even if not optimal from the index perspective, this choice was mostly guided by the number of subsequent annotated genes that could be analyzed in the biological interpretation. For each approach, the gene lists of different sizes are embedded into each other, and a compromise has to be made to obtain a sufficient but not too large list of genes to be interpreted.</p>
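<p>For readers who wish to reproduce this cluster-validity criterion, a minimal numpy implementation of the Davies Bouldin index is sketched below (the function name and toy usage are ours; the paper computes the index on the components obtained after each gene selection size):</p>

```python
import numpy as np

def davies_bouldin(Z, labels):
    """Davies Bouldin index: lower values indicate tighter,
    better-separated clusters of the rows of Z."""
    ks = np.unique(labels)
    cents = np.array([Z[labels == k].mean(axis=0) for k in ks])
    # average within-cluster distance to the centroid
    s = np.array([np.linalg.norm(Z[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    # for each cluster, worst ratio of within-scatter to between-centroid distance
    ratios = [max((s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i)
              for i in range(len(ks))]
    return sum(ratios) / len(ks)
```

<p>On well-separated groups the index is small and it grows as the groups overlap, which is why a lower curve in Figure 4 indicates a better sample clustering for a given selection size.</p>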
<fig id="F4" position="float">
<label>Figure 4</label>
<caption>
<p>
<bold>Liver Toxicity study: Davies Bouldin index for sIPCA and sPCA</bold>
. Comparison of the Davies Bouldin index for sIPCA and sPCA with respect to the number of variables selected on 2 components.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-4"></graphic>
</fig>
</sec>
<sec>
<title>Comparison of the sparse loading vectors</title>
<p>The first and second sparse loading vectors for both sPCA and sIPCA are plotted in Figure
<xref ref-type="fig" rid="F5">5</xref>
(absolute values). In the first dimension, the loading vectors of the two sparse approaches are very similar (correlation of 0.98), a fact that was already indicated in the above simulation study. Both approaches select the same variables. On the second dimension, however, the sparse loading vectors differ (correlation of 0.28), as IPCA (similar to ICA) leads to a not necessarily orthogonal basis, which may reconstruct the data better than PCA in the presence of noise, and is sensitive to higher-order statistics in the data rather than the covariance matrix only [
<xref ref-type="bibr" rid="B25">25</xref>
]. This explains why sPCA and sIPCA give different subspaces.</p>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption>
<p>
<bold>Liver Toxicity study: sparse loading vectors</bold>
. Comparison of the first two sparse loading vectors generated by sIPCA and sPCA.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-5"></graphic>
</fig>
</sec>
<sec>
<title>Sample representation</title>
<p>The PCs and IPCs are displayed in Figure
<xref ref-type="fig" rid="F6">6</xref>
. Since most of the noisy variables were removed, sPCA seemed to give a better clustering of the low doses compared to Figure
<xref ref-type="fig" rid="F1">1</xref>
. sIPCA and IPCA remain similar, which shows that IPCA is well able to separate the noise from the biologically relevant signal.</p>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption>
<p>
<bold>Liver Toxicity study: sample representation with sparse variants</bold>
. Sample representation using the first two principal components of sPCA and sIPCA approaches when 50 variables are selected on each dimension.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-6"></graphic>
</fig>
</sec>
<sec>
<title>Biological relevance of the selected genes</title>
<p>We have seen that the independent principal components indicate relevant biological similarities between the samples. We next assessed whether these selected genes were relevant to the biological study. The genes selected with either sIPCA or sPCA were further investigated using the GeneGo software [
<xref ref-type="bibr" rid="B26">26</xref>
], which can output pathways, process networks, Gene Ontology (GO) processes and molecular functions.</p>
<p>We decided to focus only on the first two dimensions as they were sufficient to obtain a satisfactory clustering of the samples (see previous results). We therefore analyzed the two lists of 50 genes selected with either sIPCA or sPCA for each of these two dimensions. Amongst these 50 genes, between 33 and 39 genes were annotated and recognized by the software.</p>
<sec>
<title>Genes selected on dimension 1</title>
<p>Both methods selected genes previously highlighted in the literature as having functions in detoxification and redox regulation in response to oxidative stress: 2 cytochrome P450 genes (1) and heme oxygenase 1 were selected by sIPCA (sPCA) on the first dimension (see Additional files
<xref ref-type="supplementary-material" rid="S1">1</xref>
and
<xref ref-type="supplementary-material" rid="S2">2</xref>
). The expression of these genes has been found to be altered in biological pathways perturbed subsequent to incipient toxicity [
<xref ref-type="bibr" rid="B27">27</xref>
-
<xref ref-type="bibr" rid="B32">32</xref>
]. These genes were also previously selected with other statistical approaches applied to the same study [
<xref ref-type="bibr" rid="B20">20</xref>
].</p>
<p>A Gene Ontology enrichment analysis for each list of genes was performed. GO terms significantly enriched included biological processes related to response to unfolded proteins, protein refolding and protein stimulus, as well as response to chemical stimulus and organic substance (Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
). Although very similar, the sPCA gene list highlighted slightly more genes related to these GO terms than the sIPCA gene selection. The GO molecular functions related to these genes were, however, more enriched with sIPCA: heme and unfolded protein binding as well as oxidoreductase activity (Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
).</p>
</sec>
<sec>
<title>Genes selected on dimension 2</title>
<p>The gene lists from dimension two not only highlighted response to unfolded protein and to organic substance, but also cellular carbohydrate biosynthetic process, triglyceride, acylglycerol and neutral lipid metabolic processes, as well as catabolic processes and gluconeogenesis. For this dimension, however, it is sIPCA that selected more of the relevant genes enriching these terms (Additional file
<xref ref-type="supplementary-material" rid="S5">5</xref>
).</p>
<p>In terms of pathways, both approaches selected HSP70 and HSP90 genes. The HSP70 gene encodes a member of the heat shock protein 70 family. These proteins play a role in cell proliferation and stress response, which explains the presence of pathways such as response to oxidative stress [
<xref ref-type="bibr" rid="B33">33</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
] (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
]. The HSP90 proteins are highly conserved molecular chaperones that have key roles in signal transduction, protein folding and protein degradation. They play an important role in folding newly synthesized proteins and in stabilizing and refolding denatured proteins after stress [
<xref ref-type="bibr" rid="B35">35</xref>
].</p>
</sec>
<sec>
<title>Summary</title>
<p>This preliminary analysis demonstrates the ability of sIPCA and sPCA to select genes that are relevant to the biological study. These genes, ranked as 'important' by both approaches, participate in the determination of the components, which are linear combinations of the original variables. Therefore, the expression of these selected genes not only helps to cluster the samples according to the different treatments or biological conditions, but also has a biologically relevant meaning for the system under study.</p>
</sec>
</sec>
</sec>
</sec>
<sec>
<title>Conclusions</title>
<p>We have developed a variant of PCA called IPCA that combines the advantages of both PCA and ICA. IPCA assumes that biologically meaningful components can be obtained if most noise has been removed from the associated loading vectors. By identifying non-Gaussian loading vectors from the biological data, it better reflects the internal structure of the data compared to PCA and ICA. On simulated data sets, we showed that IPCA outperformed PCA and ICA in the super-Gaussian case, and that the kurtosis value of the loading vectors can be used to choose the number of independent principal components. On real data sets, we assessed the cluster validity using the Davies Bouldin index and showed that in high dimensional cases, IPCA could summarize the information of the data better or with a smaller number of components than PCA or ICA.</p>
<p>We also introduced sIPCA, which allows an internal variable selection procedure. By applying a soft-thresholding penalization on the independent loading vectors, sparse loading vectors are obtained which enable variable selection. We have shown that sIPCA can correctly identify most of the important variables in a simulation study. For one data set, the genes selected with sIPCA and sPCA were further investigated to assess whether the two approaches were able to select genes relevant to the system under study: with these gene selections, relevant GO terms, molecular functions and pathways were highlighted. This analysis demonstrated the ability of such approaches to unravel biologically relevant information. The expression of these selected genes is also decisive for clustering the samples according to their biological conditions.</p>
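<p>As an illustration of the soft-thresholding penalization mentioned above, here is a minimal numpy sketch that shrinks a loading vector so that a chosen number of weights remain non-zero (the mapping from the degree of sparsity to the threshold value is our own simplifying assumption, not the paper's exact procedure):</p>

```python
import numpy as np

def soft_threshold(v, n_keep):
    """Soft-threshold a loading vector so that only the n_keep
    largest-magnitude weights stay non-zero."""
    if n_keep >= v.size:
        return v.copy()
    # threshold = magnitude of the (n_keep + 1)-th largest coefficient
    lam = np.sort(np.abs(v))[::-1][n_keep]
    # shrink every coefficient towards zero by lam, clipping at zero
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([0.9, -0.05, 0.4, 0.01, -0.6])
sv = soft_threshold(v, 2)   # only the two largest weights survive
```

<p>Unlike hard thresholding, the surviving weights are also shrunk by the threshold value, which is what makes the resulting sparse loading vectors stable.</p>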
<p>We believe that the (s)IPCA approach can be useful, not only to improve data visualization and reveal experimental characteristics, but also to identify biologically relevant variables. IPCA and sIPCA are implemented in the R package mixOmics [
<xref ref-type="bibr" rid="B36">36</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
] and its associated web-interface
<ext-link ext-link-type="uri" xlink:href="http://mixomics.qfab.org">http://mixomics.qfab.org</ext-link>
.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Principal Component Analysis (PCA)</title>
<p>PCA is a classical dimension reduction and feature extraction tool in exploratory analysis, and has been used in a wide range of fields. There exist different ways of solving PCA. The most computationally efficient algorithm uses Singular Value Decomposition (SVD): suppose
<bold>X </bold>
is a centered
<italic>n </italic>
×
<italic>p </italic>
matrix (the mean of each column has been subtracted), where
<italic>n </italic>
is the number of samples (or observations) and
<italic>p </italic>
is the number of variables or biological entities that are measured. Then the SVD of data matrix
<bold>X </bold>
can be defined as</p>
<p>
<disp-formula id="bmcM1">
<label>(1)</label>
<mml:math id="M4" name="1471-2105-13-24-i4" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">UDV</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<bold>U </bold>
is an
<italic>n </italic>
×
<italic>p </italic>
matrix whose columns are uncorrelated (i.e.
<bold>U
<sup>T</sup>
U = I
<sub>P</sub>
</bold>
),
<bold>V </bold>
is a
<italic>p </italic>
×
<italic>p </italic>
orthogonal matrix (i.e.
<bold>V
<sup>T</sup>
V </bold>
=
<bold>I
<sub>P</sub>
</bold>
), and
<bold>D </bold>
is a
<italic>p </italic>
×
<italic>p </italic>
diagonal matrix with diagonal elements
<italic>d
<sub>j</sub>
</italic>
. We denote
<bold>u</bold>
<italic>
<sub>j </sub>
</italic>
the columns of U and
<bold>v</bold>
<italic>
<sub>j </sub>
</italic>
the columns of
<bold>V</bold>
. Then
<bold>u</bold>
<italic>
<sub>j</sub>
d
<sub>j </sub>
</italic>
is the
<italic>jth principal component </italic>
(PC) and
<bold>v</bold>
<italic>
<sub>j </sub>
</italic>
is the corresponding
<italic>loading vector </italic>
[
<xref ref-type="bibr" rid="B1">1</xref>
]. The PCs are linear combinations of the original variables, and the loading vectors indicate the weights assigned to each of the variables in the linear combination. The first PC accounts for the maximal amount of the total variance. Similarly, the
<italic>jth </italic>
(
<italic>j </italic>
= 2,...,
<italic>p</italic>
) PC can explain the maximal amount of variance that is not accounted by the previous
<italic>j </italic>
- 1 PCs. Therefore, most of the information contained in
<bold>X </bold>
can be reduced to a few PCs. Plotting the PCs enables a visual representation of the samples projected in the subspace spanned by the PCs. We can expect that the samples belonging to the same biological group, or undergoing the same biological treatment, would be clustered together and separated from the other groups.</p>
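<p>The SVD-based computation of the PCs described in this section can be sketched in a few lines of numpy (the toy dimensions are arbitrary; in the paper's notation, the j-th PC is the column u_j scaled by the singular value d_j):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
X = X - X.mean(axis=0)               # centre each column, as PCA requires

# Thin SVD: X = U D V^T; the columns of V are the loading vectors
U, d, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U * d                           # j-th PC is u_j * d_j

# proportion of total variance carried by each PC, in decreasing order
var_explained = d**2 / np.sum(d**2)
```

<p>The PCs are mutually uncorrelated (pcs.T @ pcs is diagonal) and projecting them back on the loading vectors, pcs @ Vt, reconstructs the centred data exactly.</p>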
<sec>
<title>Limitation of PCA</title>
<p>Sometimes, however, PCA may not be able to extract relevant information and may therefore provide meaningless principal components that do not describe experimental characteristics. The reason is that its linear transformation involves only second-order statistics (i.e. it yields mutually uncorrelated PCs), which might not be appropriate for biological data. PCA assumes that gene expression data have Gaussian signals, while it has been demonstrated that many gene expression data in fact have 'super-Gaussian' signals [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
].</p>
</sec>
</sec>
<sec>
<title>Independent Component Analysis (ICA)</title>
<p>Independent Component Analysis (ICA) was first proposed by [
<xref ref-type="bibr" rid="B8">8</xref>
]. ICA can reduce the effects of noise or artefacts in the data as it aims at separating a mixture of signals into their different sources. By assuming non-Gaussian signal distributions, ICA models observations as linear combinations of variables, or components, which are chosen to be as statistically independent as possible (i.e. the different components represent different, non-overlapping information). ICA therefore involves higher-order statistics [
<xref ref-type="bibr" rid="B14">14</xref>
]. In fact, ICA attempts to recover statistically independent signals from the observations of an unknown linear mixture. Several algorithms such as FastICA, Kernel ICA [
<xref ref-type="bibr" rid="B38">38</xref>
] and ProDenICA [
<xref ref-type="bibr" rid="B39">39</xref>
] were proposed to estimate the independent components. The FastICA algorithm maximizes non-Gaussianity of each component, while Kernel ICA and ProDenICA minimize mutual information between components. In this article, we used the FastICA algorithm.</p>
<p>Let
<bold>X </bold>
(
<italic>n </italic>
×
<italic>p</italic>
) be the centered data matrix and
<bold>S </bold>
(
<italic>n </italic>
×
<italic>p</italic>
) the matrix containing the independent components (IC). We can solve the ICA problem by introducing a mixing matrix
<bold>A </bold>
of size
<italic>n </italic>
×
<italic>n</italic>
:</p>
<p>
<disp-formula id="bmcM2">
<label>(2)</label>
<mml:math id="M5" name="1471-2105-13-24-i5" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi mathvariant="bold">A</mml:mi>
<mml:mi mathvariant="bold">S</mml:mi>
<mml:mi>.</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The mixing matrix
<bold>A </bold>
indicates how the independent components of
<bold>S </bold>
are linearly combined to construct
<bold>X</bold>
. If we rearrange the equation above, we get</p>
<p>
<disp-formula id="bmcM3">
<label>(3)</label>
<mml:math id="M6" name="1471-2105-13-24-i6" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<bold>W </bold>
(
<italic>n </italic>
×
<italic>n</italic>
) is the unmixing matrix that describes the inverse process of mixing the ICs. If we assume that
<bold>A </bold>
is a square and orthonormal matrix, then
<bold>W </bold>
is simply the transpose of
<bold>A</bold>
. In practice, it is very useful to whiten the data matrix
<bold>X</bold>
, i.e., to obtain Cov(
<bold>X</bold>
) =
<bold>I</bold>
. This allows the mixing matrix
<bold>A </bold>
to be orthogonal: Cov(
<bold>AS</bold>
) =
<bold>I </bold>
and
<bold>SS
<sup>T </sup>
</bold>
=
<bold>I </bold>
imply
<bold>AA
<sup>T </sup>
</bold>
=
<bold>I</bold>
. The orthogonality of the matrix also enables fewer parameters to be estimated. In the FastICA algorithm, PCA is used as a pre-processing step to whiten the data matrix. If we rearrange (1), we therefore obtain</p>
<p>
<disp-formula id="bmcM4">
<label>(4)</label>
<mml:math id="M7" name="1471-2105-13-24-i7" overflow="scroll">
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo class="MathClass-bin">-</mml:mo>
<mml:mstyle mathvariant="bold">
<mml:mi>1</mml:mi>
</mml:mstyle>
</mml:mrow>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>since the columns of
<bold>V </bold>
are orthonormal. The rows of
<bold>U
<sup>T </sup>
</bold>
are uncorrelated and have zero mean. To complete the whitening step, we can multiply
<bold>U
<sup>T </sup>
</bold>
by
<inline-formula>
<mml:math id="M8" name="1471-2105-13-24-i8" overflow="scroll">
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mo class="MathClass-bin">-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:math>
</inline-formula>
, so that the rows of
<bold>U
<sup>T </sup>
</bold>
have unit variance. Then let
<inline-formula>
<mml:math id="M9" name="1471-2105-13-24-i9" overflow="scroll">
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>
be the whitened PCs
<inline-formula>
<mml:math id="M10" name="1471-2105-13-24-i10" overflow="scroll">
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:msqrt>
<mml:mrow>
<mml:mstyle mathvariant="bold">
<mml:mi>n</mml:mi>
</mml:mstyle>
<mml:mo class="MathClass-bin">-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msqrt>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
. The ICs are estimated through the following equation:</p>
<p>
<disp-formula id="bmcM5">
<label>(5)</label>
<mml:math id="M11" name="1471-2105-13-24-i11" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mstyle mathvariant="bold">
<mml:mi>W</mml:mi>
</mml:mstyle>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
<mml:mi>.</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>ICA assumes that Gaussian distributions represent noise, and therefore aims at identifying non-Gaussian components in the sample space that are as independent as possible. Recent studies have observed that the signal distributions of microarray data are typically super-Gaussian, since only a small number of genes contribute heavily to a specific biological process [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B5">5</xref>
].</p>
<p>Two classical quantitative measures of Gaussianity are kurtosis and negentropy.</p>
<p>• Kurtosis, also called the fourth-order cumulant, is defined as</p>
<p>
<disp-formula id="bmcM6">
<label>(6)</label>
<mml:math id="M12" name="1471-2105-13-24-i12" overflow="scroll">
<mml:mrow>
<mml:mi>K</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">{</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
<mml:mo class="MathClass-close">}</mml:mo>
</mml:mrow>
<mml:mo class="MathClass-bin">-</mml:mo>
<mml:mn>3</mml:mn>
<mml:mi>.</mml:mi>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
is the <italic>i</italic>th row of
<bold>S</bold>
, which has zero mean and unit variance,
<italic>i </italic>
= 1...
<italic>n</italic>
. The kurtosis value equals zero if
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
has a Gaussian probability density function (pdf), is positive if
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
has a spiky pdf (super-Gaussian, i.e. the pdf is relatively large at zero) and is negative if
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
has a flat pdf (sub-Gaussian, i.e. the pdf is rather constant near zero). We are interested in the spiky and flat pdfs (i.e. non-Gaussian pdfs) since non-Gaussianity is regarded as a proxy for independence [
<xref ref-type="bibr" rid="B9">9</xref>
]. Note that although kurtosis is both computationally and theoretically simple, it can be very sensitive to outliers. The authors in [
<xref ref-type="bibr" rid="B6">6</xref>
] proposed to order the ICs based on their kurtosis value.</p>
<p>• In the FastICA algorithm, negentropy is used as it is an excellent measure of non-Gaussianity. Negentropy equals zero if
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
is Gaussian and is positive if
<bold>s</bold>
<italic>
<sub>i </sub>
</italic>
is non-Gaussian. It is not only easy to compute, but also very robust [
<xref ref-type="bibr" rid="B9">9</xref>
]. However, this measure does not distinguish between super-Gaussianity and sub-Gaussianity.</p>
<sec>
<title>Limitation of ICA</title>
<p>Similar to PCA, ICA also suffers from high dimensionality, which sometimes leads to the inability of the ICs to reflect the (biologically expected) internal structure of the data. Furthermore, since ICA is a stochastic algorithm, it faces the problem of convergence to local optima, leading to slightly different ICs when re-analyzing the same data [
<xref ref-type="bibr" rid="B40">40</xref>
].</p>
</sec>
</sec>
<sec>
<title>Independent Principal Component Analysis (IPCA)</title>
<p>To reduce noise and better reflect the internal structure of the data generated by the biological experiment, we propose a new approach called Independent Principal Component Analysis (IPCA). Rather than denoising the data or the PCs directly, as is done in ICA, we propose instead to reduce the noise in the loading vectors. Recall that the PCs, which are then used to visualize the samples and how they cluster together, are a linear combination of the original variables weighted by their elements in the corresponding loading vectors. Thus, we obtain denoised PCs by using ICA to denoise the associated loading vectors.</p>
<p>We make the assumption that in a biological system, different variables (biological entities, such as genes and metabolites) have different levels of expression or abundance depending on the biological conditions. Therefore, only a few variables contribute to a biological process. These relevant variables should have important weights in the loading vectors while other irrelevant or noisy variables should have very small weights. In fact, once the loading vectors are denoised, we expect them to have a super-Gaussian distribution (as opposed to a Gaussian distribution when noise is included, see Figure
<xref ref-type="fig" rid="F7">7</xref>
for the plot of a typical super-Gaussian and a Gaussian distribution). Maximizing the non-Gaussianity of the loading vectors will thus remove most of the noise. IPCA is described below and summarized in Table
<xref ref-type="table" rid="T6">6</xref>
.</p>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption>
<p>
<bold>Super-Gaussian vs. Gaussian distribution</bold>
. A super-Gaussian distribution (Laplace distribution for example) has a more spiky peak and a longer tail than a Gaussian distribution. The distribution of a noiseless loading vector is similar to a super-Gaussian distribution. If a large amount of noise exists in the loading vectors, its distribution will tend towards a Gaussian distribution.</p>
</caption>
<graphic xlink:href="1471-2105-13-24-7"></graphic>
</fig>
<table-wrap id="T6" position="float">
<label>Table 6</label>
<caption>
<p>Summary of the IPCA algorithm.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left">
<bold>Algorithm </bold>
Principal Component Analysis with Independent loadings (IPCA)</td>
</tr>
<tr>
<td>
<hr></hr>
</td>
</tr>
<tr>
<td align="left">1. Implement SVD on the centered data matrix
<bold>X </bold>
to generate the whitened loading vectors
<bold>V</bold>
, and choose the number of components
<italic>m </italic>
to reduce the dimension.</td>
</tr>
<tr>
<td align="left">2. Implement FastICA on the loading vectors
<bold>V </bold>
and obtain the independent loading vectors
<bold>S
<sup>T</sup>
</bold>
.</td>
</tr>
<tr>
<td align="left">3. Project the centered data matrix
<bold>X </bold>
on the
<italic>m </italic>
independent loading vectors
<bold>s</bold>
<italic>
<sub>j </sub>
</italic>
and get the Independent PCs
<inline-formula>
<mml:math id="M13" name="1471-2105-13-24-i21" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">u</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:mi>j</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mi>.</mml:mi>
<mml:mi>.</mml:mi>
<mml:mi>.</mml:mi>
<mml:mi>m</mml:mi>
</mml:math>
</inline-formula>
.</td>
</tr>
<tr>
<td align="left">4. Order the IPCs by the kurtosis value of their corresponding independent loading vectors.</td>
</tr>
</tbody>
</table>
</table-wrap>
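<p>The four steps of the algorithm can be sketched in numpy. The inner loop below is a minimal symmetric fixed-point FastICA with a tanh nonlinearity; the iteration count, seed and rescaling of the loading vectors are our own simplifications, whereas the paper relies on the full FastICA algorithm:</p>

```python
import numpy as np

def fastica(V, n_iter=200, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity) on whitened rows of V (m x p)."""
    m, p = V.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(m, m))
    for _ in range(n_iter):
        G = np.tanh(W @ V)                              # current source estimates
        W_new = (G @ V.T) / p - np.diag((1 - G**2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W_new)                 # symmetric decorrelation:
        W = u @ vt                                      # W <- (W W^T)^(-1/2) W
    return W

def ipca(X, m=2):
    X = X - X.mean(axis=0)
    # Step 1: SVD of X; the rows of Vt are the loading vectors
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = np.sqrt(X.shape[1]) * Vt[:m]                    # rescale rows to ~unit variance
    # Step 2: FastICA on the loading vectors -> independent loading vectors S
    S = fastica(V) @ V
    # Step 3: project X on the independent loading vectors -> the IPCs
    ipcs = X @ S.T
    # Step 4: order the IPCs by the kurtosis of their independent loading vectors
    Z = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
    order = np.argsort((Z**4).mean(axis=1) - 3)[::-1]
    return ipcs[:, order], S[order]
```

<p>The symmetric decorrelation keeps the rows of S mutually orthogonal, and ordering by kurtosis puts the most super-Gaussian (least noisy) loading vectors first, as step 4 requires.</p>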
<sec>
<title>Extract the loading vectors from PCA</title>
<p>PCA is applied to the
<bold>X </bold>
(
<italic>n </italic>
×
<italic>p</italic>
) centered data matrix using SVD to extract the loading vectors:</p>
<p>
<disp-formula id="bmcM7">
<label>(7)</label>
<mml:math id="M14" name="1471-2105-13-24-i13" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi mathvariant="bold">U</mml:mi>
<mml:mi mathvariant="bold">D</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">V</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where the columns of
<bold>V </bold>
contain the loading vectors. Since the mean of each loading vector is very close to zero, these vectors are approximately whitened and the FastICA algorithm can be applied to them directly.</p>
</sec>
<sec>
<title>Dimension reduction</title>
<p>Dimension reduction enables a clearer interpretation without the computational burden. Therefore, only a small number of loading vectors, or, equivalently, a small number of PCs is needed to summarize most of the relevant information. However, there is no globally accepted criterion on how to choose the number of PCs to keep. We have shown that the kurtosis value of the independent loading vectors gives a post hoc indication of the number of independent principal components to be chosen (see 'Results and Discussion' Section). We have experimentally observed that 2 or 3 components were sufficient to highlight meaningful characteristics of the data and to discard much of the noise or irrelevant information.</p>
</sec>
<sec>
<title>Apply ICA on the loading vectors</title>
<p>The non-Gaussianity of the loading vectors can be maximized using equation (5):</p>
<p>
<disp-formula id="bmcM8">
<label>(8)</label>
<mml:math id="M15" name="1471-2105-13-24-i14" overflow="scroll">
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi mathvariant="bold">W</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">V</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<inline-formula>
<mml:math id="M16" name="1471-2105-13-24-i15" overflow="scroll">
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">V</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>
is the (
<italic>p </italic>
×
<italic>m</italic>
) matrix containing the m chosen loading vectors,
<bold>W </bold>
is the (
<italic>m </italic>
×
<italic>m</italic>
) unmixing matrix and
<bold>S </bold>
is the (
<italic>m </italic>
×
<italic>p</italic>
) matrix whose rows are the independent loading vectors. The new independent principal components (IPCs) are obtained by projecting
<bold>X </bold>
on
<bold>S
<sup>T</sup>
</bold>
:</p>
<p>
<disp-formula id="bmcM9">
<label>(9)</label>
<mml:math id="M17" name="1471-2105-13-24-i16" overflow="scroll">
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi mathvariant="bold">X</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="bold">S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="bold">T</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
</disp-formula>
</p>
<p>where
<inline-formula>
<mml:math id="M18" name="1471-2105-13-24-i17" overflow="scroll">
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">U</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">˜</mml:mo>
</mml:mover>
</mml:math>
</inline-formula>
is a (
<italic>n </italic>
×
<italic>m</italic>
) matrix whose columns contain the IPCs.</p>
</sec>
<sec>
<title>Ordering the IPCs</title>
<p>Recall that ICA provides unordered components and that the kurtosis measure indicates how close to Gaussian a pdf is. [
<xref ref-type="bibr" rid="B6">6</xref>
] recently proposed to use the kurtosis measure of the ICs to order them. In IPCA, we propose instead to order the IPCs according to the kurtosis value of the
<italic>m </italic>
independent loading vectors
<bold>s</bold>
<italic>
<sub>j </sub>
</italic>
(
<italic>j </italic>
= 1...
<italic>m</italic>
), as we are mainly interested in loading vectors with a spiky pdf, indicated by a large kurtosis value.</p>
</sec>
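<p>Steps (8), (9) and the kurtosis-based ordering can be chained together as follows. This is a hedged sketch: the symmetric FastICA below uses the cube nonlinearity with a fixed iteration count rather than the full algorithm, and the Gaussian test matrix only exercises the mechanics (on such data the recovered directions carry no special meaning):</p>

```python
import numpy as np

def decorrelate(W):
    # Symmetric decorrelation: W = (W W^T)^(-1/2) W, keeping the rows orthonormal
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ u.T @ W

def fastica_symmetric(Z, n_iter=200, seed=0):
    """Minimal symmetric FastICA with the cube nonlinearity g(u) = u**3.
    Z: (m, p) approximately whitened rows. Returns the (m, m) unmixing matrix W."""
    m, p = Z.shape
    W = decorrelate(np.random.default_rng(seed).normal(size=(m, m)))
    for _ in range(n_iter):
        WZ = W @ Z
        # Fixed-point update E{z g(w^T z)} - E{g'(w^T z)} w, with E{g'} = 3 when whitened
        W = decorrelate((WZ ** 3) @ Z.T / p - 3.0 * W)
    return W

def kurtosis(v):
    v = (v - v.mean()) / v.std()
    return (v ** 4).mean() - 3.0

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 500))
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)

m = 3                                   # number of retained loading vectors
Ztil = Vt[:m] * np.sqrt(X.shape[1])     # rescale unit-norm loadings to ~unit variance
W = fastica_symmetric(Ztil)
S = W @ Vt[:m]                          # equation (8): independent loading vectors (rows)
IPC = X @ S.T                           # equation (9): the IPCs, one column per component

# Order components by decreasing kurtosis of the independent loading vectors
order = np.argsort([-kurtosis(s) for s in S])
S, IPC = S[order], IPC[:, order]
print(IPC.shape)
```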
</sec>
<sec>
<title>Sparse IPCA (sIPCA)</title>
<p>As in PCA and ICA, the elements of the loading vectors in IPCA indicate which variables are important or relevant for determining the principal components. Therefore, obtaining
<italic>sparse </italic>
loading vectors enables variable selection to identify important variables of potential biological relevance, as well as to remove noisy variables when calculating the IPCs.</p>
<p>Various sparse PCA approaches have been proposed in the literature: SPCA [
<xref ref-type="bibr" rid="B41">41</xref>
], sPCA-rSVD [
<xref ref-type="bibr" rid="B18">18</xref>
], SPC [
<xref ref-type="bibr" rid="B42">42</xref>
]. In these approaches, the loading vectors are penalized using the Lasso [
<xref ref-type="bibr" rid="B43">43</xref>
] to perform an internal variable selection. In fact, all these sparse PCA variants can be approximately solved by using soft-thresholding [
<xref ref-type="bibr" rid="B17">17</xref>
]. Our sparse IPCA therefore directly implements soft-thresholding on the independent loading vector
<bold>s</bold>
<italic>
<sub>j </sub>
</italic>
to select the variables:</p>
<p>
<disp-formula id="bmcM10">
<label>(10)</label>
<mml:math id="M19" name="1471-2105-13-24-i18" overflow="scroll">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>ŝ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mi>s</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>n</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-rel">|</mml:mo>
<mml:mo class="MathClass-bin">-</mml:mo>
<mml:mi>γ</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mrow>
<mml:mo class="MathClass-bin">+</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<italic>γ </italic>
is the threshold and is applied on each element
<italic>k </italic>
of the loading vector
<bold>s</bold>
<italic>
<sub>j </sub>
</italic>
(
<italic>k </italic>
= 1...
<italic>p, j </italic>
= 1...
<italic>m</italic>
) so as to obtain the sparse loading vector
<inline-formula>
<mml:math id="M20" name="1471-2105-13-24-i19" overflow="scroll">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi mathvariant="bold">s</mml:mi>
</mml:mrow>
<mml:mo class="MathClass-op">^</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
. The variables whose original weights are smaller than the threshold
<italic>γ </italic>
will be penalized to have zero weights. A classical method to choose
<italic>γ </italic>
is cross-validation. In practice, however,
<italic>γ </italic>
is replaced by the degree of sparsity (i.e., the number of non-zero elements in each loading vector; see the following paragraph). In this way, the user directly controls how many variables are selected, which also saves computational time.</p>
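<p>Equation (10) and the sparsity-degree parametrization can be sketched in a few lines. The example vector and the keep-3 choice are assumptions for illustration, and ties in the absolute weights would need a tie-breaking rule:</p>

```python
import numpy as np

def soft_threshold(s, gamma):
    """Equation (10): keep the sign, shrink the magnitude by gamma, zero the rest."""
    return np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)

def threshold_for_sparsity(s, n_nonzero):
    """Translate a requested degree of sparsity into a threshold gamma:
    the (n_nonzero + 1)-th largest absolute weight is shrunk exactly to zero."""
    if n_nonzero >= s.size:
        return 0.0
    return np.sort(np.abs(s))[::-1][n_nonzero]

s_j = np.array([0.9, -0.05, 0.4, 0.02, -0.6])   # a toy independent loading vector
gamma = threshold_for_sparsity(s_j, 3)          # keep 3 variables
s_hat = soft_threshold(s_j, gamma)
print(s_hat)                                    # 3 non-zero weights remain
```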
</sec>
<sec>
<title>Using (s)IPCA</title>
<p>IPCA and sIPCA are implemented in the R package mixOmics, which is dedicated to the analysis of large biological data sets [
<xref ref-type="bibr" rid="B36">36</xref>
,
<xref ref-type="bibr" rid="B37">37</xref>
]. Both approaches are straightforward to use: the user inputs the data set and chooses the number of components to keep (usually a small value). For the sparse version, the number of variables to select on each sIPCA dimension must also be given. The number of components can be reconsidered afterwards by inspecting the kurtosis values of the loading vectors: a sudden drop in these values indicates how many components suffice to explain most of the information in the data.</p>
<p>The number of variables to select is still an open issue (as pinpointed by many authors working on sparse approaches, [
<xref ref-type="bibr" rid="B18">18</xref>
]), as such studies are often limited by the number of samples. Tuning the number of variables to select therefore relies mostly on the biological question: a gene selection that is optimal but too short may not suffice for a comprehensive biological interpretation, while a selection that is too large may be difficult to validate experimentally.</p>
<p>In our example, for the sake of simplicity, we have set the same number of variables to select on each dimension.</p>
</sec>
<sec>
<title>Simulation studies</title>
<p>In the different simulation studies, we used the following framework (previously proposed by [
<xref ref-type="bibr" rid="B18">18</xref>
]).
<bold>Σ </bold>
is the variance-covariance matrix of size 500 × 500, whose first two normalized eigenvectors
<bold>v</bold>
<sub>1 </sub>
and
<bold>v</bold>
<sub>2</sub>
, both of length 500, are simulated for different cases described in the 'Results and Discussion' Section. The other eigenvectors were drawn from
<italic>U</italic>
[0, 1]. A Gram-Schmidt orthogonalization method was applied to obtain the orthogonal matrix
<bold>V </bold>
whose columns contain
<bold>v</bold>
<sub>1 </sub>
and
<bold>v</bold>
<sub>2 </sub>
and the other eigenvectors. To make the first two eigenvectors dominate, the first two eigenvalues were set to
<italic>c</italic>
<sub>1 </sub>
= 400,
<italic>c</italic>
<sub>2 </sub>
= 300 and
<italic>c
<sub>k </sub>
</italic>
= 1 for
<italic>k </italic>
= 3,..., 500. Let
<bold>C </bold>
=
<italic>diag</italic>
{
<italic>c</italic>
<sub>1</sub>
,...,
<italic>c</italic>
<sub>500</sub>
} the eigenvalue matrix, then
<bold>Σ </bold>
=
<bold>VCV
<sup>T</sup>
</bold>
. The data are then generated from a multivariate normal distribution N(
<bold>0</bold>
,
<bold>Σ</bold>
), with
<italic>n </italic>
= 50 samples and
<italic>p </italic>
= 500 variables.</p>
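<p>The framework can be reproduced approximately as follows. The sparse patterns for the first two eigenvectors are placeholders (the actual cases are defined in the 'Results and Discussion' Section), and QR factorization stands in for explicit Gram-Schmidt (it may flip the signs of the leading columns):</p>

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 500, 50

# First two normalized eigenvectors (placeholder patterns for this sketch)
v1 = np.zeros(p); v1[:10] = 1.0;   v1 /= np.linalg.norm(v1)
v2 = np.zeros(p); v2[10:20] = 1.0; v2 /= np.linalg.norm(v2)

# Remaining directions drawn from U[0, 1]; QR orthogonalizes the columns
# (equivalent to Gram-Schmidt up to column signs)
M = np.column_stack([v1, v2, rng.uniform(size=(p, p - 2))])
V, _ = np.linalg.qr(M)

# Dominant eigenvalues c1 = 400, c2 = 300, all others 1
c = np.ones(p); c[0], c[1] = 400.0, 300.0
Sigma = V @ np.diag(c) @ V.T            # Sigma = V C V^T

Xsim = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
print(Xsim.shape)                       # (n, p) = (50, 500)
```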
</sec>
<sec>
<title>Davies-Bouldin index</title>
<p>The Davies-Bouldin measure is an index of crisp cluster validity [
<xref ref-type="bibr" rid="B19">19</xref>
]. This index compares the within-cluster scatter with the between-cluster separation. It was chosen in this study because of its statistical and geometric rationale. The Davies-Bouldin index is defined as</p>
<p>
<disp-formula>
<mml:math id="M21" name="1471-2105-13-24-i20" overflow="scroll">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:mfrac>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mo mathsize="big">∑</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo class="MathClass-rel">=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>K</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:munder class="msub">
<mml:mrow>
<mml:mstyle class="text">
<mml:mtext class="textsf" mathvariant="sans-serif">max</mml:mtext>
</mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo class="MathClass-rel">≠</mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:munder>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-bin">+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mrow>
<mml:mo class="MathClass-open">(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo class="MathClass-punc">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo class="MathClass-close">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
<mml:mo class="MathClass-punc">,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where
<italic>c
<sub>i </sub>
</italic>
is the centroid of cluster
<italic>i</italic>
, and
<italic>σ
<sub>i </sub>
</italic>
is the average distance of all elements in cluster
<italic>i </italic>
to centroid
<italic>c
<sub>i </sub>
</italic>
and
<italic>d</italic>
(
<italic>c
<sub>i</sub>
, c
<sub>j</sub>
</italic>
) is the distance between the two centroids,
<italic>K </italic>
is the number of known biological conditions or treatments. Depending on the number of components chosen, we applied a 2- or 3-norm distance. Geometrically speaking, we seek to minimize the within-cluster scatter (the numerator) while maximizing the between-cluster separation (the denominator). Therefore, for a given number of components, the approach that gives the lowest index has the best clustering ability.</p>
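<p>As a sketch, the index can be computed directly from this definition. The Euclidean distance and the synthetic two-cluster data below are assumptions for the example:</p>

```python
import numpy as np

def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values mean tighter, better-separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    # sigma_i: average distance of the members of cluster i to its centroid c_i
    sigma = np.array([np.linalg.norm(points[labels == k] - c, axis=1).mean()
                      for k, c in zip(ids, centroids)])
    K = len(ids)
    total = 0.0
    for i in range(K):
        # max over j != i of (sigma_i + sigma_j) / d(c_i, c_j)
        total += max((sigma[i] + sigma[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(K) if j != i)
    return total / K

# Two well-separated clusters in 2 dimensions give a small index
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                 rng.normal(5.0, 0.1, size=(20, 2))])
lab = np.array([0] * 20 + [1] * 20)
print(davies_bouldin(pts, lab))
```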
</sec>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<title>Authors' contributions</title>
<p>FY performed the statistical analysis, wrote the R functions and drafted the manuscript. KALC participated in the design of the manuscript and helped draft it. JC participated in the implementation of the R functions and implemented IPCA in the web interface. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>List of genes from sIPCA</bold>
. List of genes and gene titles selected by sIPCA on each dimension in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S1.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2</title>
<p>
<bold>List of genes from sPCA</bold>
. List of genes and gene titles selected by sPCA on each dimension in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S2.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>GeneGo analysis</bold>
. Comparison of the GO processes for the genes selected on dimension 1 with sIPCA and sPCA in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S3.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4</title>
<p>
<bold>GeneGo analysis</bold>
. Comparison of the GO molecular functions for the genes selected on dimension 1 with sIPCA and sPCA in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S4.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5</title>
<p>
<bold>GeneGo analysis</bold>
. Comparison of the GO processes for the genes selected on dimension 2 with sIPCA and sPCA in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S5.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6</title>
<p>
<bold>GeneGo analysis</bold>
. Comparison of the GeneGo pathway maps for the genes selected on dimension 1 with sIPCA and sPCA in the Liver Toxicity study.</p>
</caption>
<media xlink:href="1471-2105-13-24-S6.XLS" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>We would like to thank Dr Thibault Jombart (Imperial College) for his useful advice. This work was supported, in part, by the Wound Management Innovation CRC (established and supported under the Australian Government's Cooperative Research Centres Program).</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="book">
<name>
<surname>Jolliffe</surname>
<given-names>I</given-names>
</name>
<source>Principal Component Analysis</source>
<year>2002</year>
<edition>second</edition>
<publisher-name>Springer, New York</publisher-name>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<name>
<surname>Lee</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Batzoglou</surname>
<given-names>S</given-names>
</name>
<article-title>Application of independent component analysis to microarrays</article-title>
<source>Genome Biology</source>
<year>2003</year>
<volume>4</volume>
<issue>11</issue>
<fpage>R76</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2003-4-11-r76</pub-id>
<pub-id pub-id-type="pmid">14611662</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Purdom</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Holmes</surname>
<given-names>S</given-names>
</name>
<article-title>Error distribution for gene expression data</article-title>
<source>Statistical applications in genetics and molecular biology</source>
<year>2005</year>
<volume>4</volume>
<fpage>16</fpage>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>Huang</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Zheng</surname>
<given-names>C</given-names>
</name>
<article-title>Independent component analysis-based penalized discriminant method for tumor classification using gene expression data</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<issue>15</issue>
<fpage>1855</fpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btl190</pub-id>
<pub-id pub-id-type="pmid">16709589</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Engreitz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Daigle</surname>
<given-names>B</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Marshall</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Altman</surname>
<given-names>R</given-names>
</name>
<article-title>Independent component analysis: Mining microarray data for fundamental human gene expression modules</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2010</year>
<volume>43</volume>
<fpage>932</fpage>
<lpage>944</lpage>
<pub-id pub-id-type="doi">10.1016/j.jbi.2010.07.001</pub-id>
<pub-id pub-id-type="pmid">20619355</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Scholz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gatzek</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sterling</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fiehn</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Selbig</surname>
<given-names>J</given-names>
</name>
<article-title>Metabolite fingerprinting: detecting biological features by independent component analysis</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<issue>15</issue>
<fpage>2447</fpage>
<lpage>2454</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bth270</pub-id>
<pub-id pub-id-type="pmid">15087312</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Frigyesi</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Veerla</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lindgren</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Höglund</surname>
<given-names>M</given-names>
</name>
<article-title>Independent component analysis reveals new and biologically significant structures in micro array data</article-title>
<source>BMC bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>290</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-7-290</pub-id>
<pub-id pub-id-type="pmid">16762055</pub-id>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Comon</surname>
<given-names>P</given-names>
</name>
<article-title>Independent component analysis, a new concept?</article-title>
<source>Signal Process</source>
<year>1994</year>
<volume>36</volume>
<fpage>287</fpage>
<lpage>314</lpage>
<pub-id pub-id-type="doi">10.1016/0165-1684(94)90029-9</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Hyvärinen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Oja</surname>
<given-names>E</given-names>
</name>
<article-title>Independent Component Analysis: Algorithms and Applications</article-title>
<source>Neural Networks</source>
<year>2000</year>
<volume>13</volume>
<issue>4-5</issue>
<fpage>411</fpage>
<lpage>430</lpage>
<pub-id pub-id-type="doi">10.1016/S0893-6080(00)00026-5</pub-id>
<pub-id pub-id-type="pmid">10946390</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="book">
<name>
<surname>Hyvärinen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Karhunen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Oja</surname>
<given-names>E</given-names>
</name>
<source>Independent Component Analysis</source>
<year>2001</year>
<publisher-name>John Wiley &amp; Sons</publisher-name>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Liebermeister</surname>
<given-names>W</given-names>
</name>
<article-title>Linear modes of gene expression determined by independent component analysis</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<fpage>51</fpage>
<lpage>60</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/18.1.51</pub-id>
<pub-id pub-id-type="pmid">11836211</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Wienkoop</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Morgenthal</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Wolschin</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Scholz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Selbig</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Weckwerth</surname>
<given-names>W</given-names>
</name>
<article-title>Integration of Metabolomic and Proteomic Phenotypes</article-title>
<source>Molecular &amp; Cellular Proteomics</source>
<year>2008</year>
<volume>7</volume>
<fpage>1725</fpage>
<lpage>1736</lpage>
<pub-id pub-id-type="doi">10.1074/mcp.M700273-MCP200</pub-id>
<pub-id pub-id-type="pmid">18445580</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="other">
<name>
<surname>Rousseau</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Govaerts</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Verleysen</surname>
<given-names>M</given-names>
</name>
<article-title>Combination of Independent Component Analysis and statistical modelling for the identification of metabonomic biomarkers in H-NMR spectroscopy</article-title>
<source>Tech rep, Université Catholique de Louvain and Université Paris I</source>
<year>2009</year>
<pub-id pub-id-type="pmid">21426953</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Kong</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Vanderburg</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Gunshin</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Rogers</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
<article-title>A review of independent component analysis application to microarray gene expression data</article-title>
<source>BioTechniques</source>
<year>2008</year>
<volume>45</volume>
<issue>5</issue>
<fpage>501</fpage>
<pub-id pub-id-type="doi">10.2144/000112950</pub-id>
<pub-id pub-id-type="pmid">19007336</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<name>
<surname>Teschendorff</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Journée</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Absil</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sepulchre</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Caldas</surname>
<given-names>C</given-names>
</name>
<article-title>Elucidating the altered transcriptional programs in breast cancer using independent component analysis</article-title>
<source>PLoS computational biology</source>
<year>2007</year>
<volume>3</volume>
<issue>8</issue>
<fpage>e161</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0030161</pub-id>
<pub-id pub-id-type="pmid">17708679</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Jolliffe</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Trendafilov</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Uddin</surname>
<given-names>M</given-names>
</name>
<article-title>A modified principal component technique based on the lasso</article-title>
<source>Journal of Computational and Graphical Statistics</source>
<year>2003</year>
<volume>12</volume>
<fpage>531</fpage>
<lpage>547</lpage>
<pub-id pub-id-type="doi">10.1198/1061860032148</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Donoho</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Johnstone</surname>
<given-names>I</given-names>
</name>
<article-title>Ideal spatial adaptation by wavelet shrinkage</article-title>
<source>Biometrika</source>
<year>1994</year>
<volume>81</volume>
<fpage>425</fpage>
<lpage>455</lpage>
<pub-id pub-id-type="doi">10.1093/biomet/81.3.425</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Shen</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>JZ</given-names>
</name>
<article-title>Sparse Principal Component Analysis via Regularized Low Rank Matrix Approximation</article-title>
<source>Journal of Multivariate Analysis</source>
<year>2008</year>
<volume>99</volume>
<fpage>1015</fpage>
<lpage>1034</lpage>
<pub-id pub-id-type="doi">10.1016/j.jmva.2007.06.007</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="other">
<name>
<surname>Davies</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Bouldin</surname>
<given-names>D</given-names>
</name>
<article-title>A cluster separation measure</article-title>
<source>Pattern Analysis and Machine Intelligence, IEEE Transactions on</source>
<year>1979</year>
<issue>2</issue>
<fpage>224</fpage>
<lpage>227</lpage>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<name>
<surname>Bushel</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Wolfinger</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Gibson</surname>
<given-names>G</given-names>
</name>
<article-title>Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes</article-title>
<source>BMC Systems Biology</source>
<year>2007</year>
<volume>1</volume>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Singh</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Febbo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ross</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Jackson</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Manola</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ladd</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Tamayo</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Renshaw</surname>
<given-names>A</given-names>
</name>
<name>
<surname>D'Amico</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Richie</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Loda</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kantoff</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Golub</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sellers</surname>
<given-names>W</given-names>
</name>
<article-title>Gene expression correlates of clinical prostate cancer behavior</article-title>
<source>Cancer cell</source>
<year>2002</year>
<volume>1</volume>
<issue>2</issue>
<fpage>203</fpage>
<lpage>209</lpage>
<pub-id pub-id-type="doi">10.1016/S1535-6108(02)00030-2</pub-id>
<pub-id pub-id-type="pmid">12086878</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Villas-Boâs</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Moxley</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Åkesson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stephanopoulos</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>J</given-names>
</name>
<article-title>High-throughput metabolic state analysis: the missing link in integrated functional genomics</article-title>
<source>Biochemical Journal</source>
<year>2005</year>
<volume>388</volume>
<fpage>669</fpage>
<lpage>677</lpage>
<pub-id pub-id-type="doi">10.1042/BJ20041162</pub-id>
<pub-id pub-id-type="pmid">15667247</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<name>
<surname>Cangelosi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Goriely</surname>
<given-names>A</given-names>
</name>
<article-title>Component retention in principal component analysis with application to cDNA microarray data</article-title>
<source>Biology Direct</source>
<year>2007</year>
<volume>2</volume>
<issue>2</issue>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<name>
<surname>Bezdek</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pal</surname>
<given-names>N</given-names>
</name>
<article-title>Some new indexes of cluster validity</article-title>
<source>Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on</source>
<year>1998</year>
<volume>28</volume>
<issue>3</issue>
<fpage>301</fpage>
<lpage>315</lpage>
<pub-id pub-id-type="doi">10.1109/3477.678624</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<name>
<surname>Bartlett</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Movellan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sejnowski</surname>
<given-names>T</given-names>
</name>
<article-title>Face recognition by independent component analysis</article-title>
<source>Neural Networks, IEEE Transactions on</source>
<year>2002</year>
<volume>13</volume>
<issue>6</issue>
<fpage>1450</fpage>
<lpage>1464</lpage>
<pub-id pub-id-type="doi">10.1109/TNN.2002.804287</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cherry</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Dolinski</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dwight</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Eppig</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Midori</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Issel-Tarver</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Kasarskis</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Matese</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Richardson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ringwald</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Sherlock</surname>
<given-names>G</given-names>
</name>
<article-title>Gene Ontology: tool for the unification of biology</article-title>
<source>Nature Genetics</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="doi">10.1038/75556</pub-id>
<pub-id pub-id-type="pmid">10802651</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<name>
<surname>Bauer</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Vollmar</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Jaeschke</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Rensing</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kraemer</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Larsen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>M</given-names>
</name>
<article-title>Transcriptional activation of heme oxygenase-1 and its functional significance in acetaminophen-induced hepatitis and hepatocellular injury in the rat</article-title>
<source>Journal of hepatology</source>
<year>2000</year>
<volume>33</volume>
<issue>3</issue>
<fpage>395</fpage>
<lpage>406</lpage>
<pub-id pub-id-type="doi">10.1016/S0168-8278(00)80275-5</pub-id>
<pub-id pub-id-type="pmid">11019995</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<name>
<surname>Hamadeh</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bushel</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Jayadev</surname>
<given-names>S</given-names>
</name>
<name>
<surname>DiSorbo</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Bennett</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tennant</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Stoll</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Barrett</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Paules</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Blanchard</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Afshari</surname>
<given-names>C</given-names>
</name>
<article-title>Prediction of compound signature using high density gene expression profiling</article-title>
<source>Toxicological Sciences</source>
<year>2002</year>
<volume>67</volume>
<issue>2</issue>
<fpage>232</fpage>
<pub-id pub-id-type="doi">10.1093/toxsci/67.2.232</pub-id>
<pub-id pub-id-type="pmid">12011482</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Heijne</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Slitt</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Van Bladeren</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Groten</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Klaassen</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Stierum</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Van Ommen</surname>
<given-names>B</given-names>
</name>
<article-title>Bromobenzene-induced hepatotoxicity at the transcriptome level</article-title>
<source>Toxicological Sciences</source>
<year>2004</year>
<volume>79</volume>
<issue>2</issue>
<fpage>411</fpage>
<pub-id pub-id-type="doi">10.1093/toxsci/kfh128</pub-id>
<pub-id pub-id-type="pmid">15056800</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Heinloth</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Irwin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Boorman</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Nettesheim</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fannin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Sieber</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Snell</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tucker</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Travlos</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Vansant</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Blackshear</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tennant</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cunningham</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Paules</surname>
<given-names>R</given-names>
</name>
<article-title>Gene expression profiling of rat livers reveals indicators of potential adverse effects</article-title>
<source>Toxicological Sciences</source>
<year>2004</year>
<volume>80</volume>
<fpage>193</fpage>
<pub-id pub-id-type="doi">10.1093/toxsci/kfh145</pub-id>
<pub-id pub-id-type="pmid">15084756</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Waring</surname>
<given-names>J</given-names>
</name>
<article-title>Development of a DNA microarray for toxicology based on hepatotoxin-regulated sequences</article-title>
<source>Environmental health perspectives</source>
<year>2003</year>
<volume>111</volume>
<issue>6</issue>
<fpage>863</fpage>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<name>
<surname>Wormser</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Calp</surname>
<given-names>D</given-names>
</name>
<article-title>Increased levels of hepatic metallothionein in rat and mouse after injection of acetaminophen</article-title>
<source>Toxicology</source>
<year>1988</year>
<volume>53</volume>
<issue>2-3</issue>
<fpage>323</fpage>
<lpage>329</lpage>
<pub-id pub-id-type="doi">10.1016/0300-483X(88)90224-7</pub-id>
<pub-id pub-id-type="pmid">3212790</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Flaherty</surname>
<given-names>K</given-names>
</name>
<name>
<surname>DeLuca-Flaherty</surname>
<given-names>C</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>D</given-names>
</name>
<article-title>Three-dimensional structure of the ATPase fragment of a 70 K heat-shock cognate protein</article-title>
<source>Nature</source>
<year>1990</year>
<volume>346</volume>
<issue>6285</issue>
<fpage>623</fpage>
<pub-id pub-id-type="doi">10.1038/346623a0</pub-id>
<pub-id pub-id-type="pmid">2143562</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<name>
<surname>Tavaria</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gabriele</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Kola</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>R</given-names>
</name>
<article-title>A hitchhiker's guide to the human Hsp70 family</article-title>
<source>Cell Stress &amp; Chaperones</source>
<year>1996</year>
<volume>1</volume>
<fpage>23</fpage>
<pub-id pub-id-type="doi">10.1379/1466-1268(1996)001&lt;0023:AHSGTT&gt;2.3.CO;2</pub-id>
<pub-id pub-id-type="pmid">9222585</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<name>
<surname>Panaretou</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Siligardi</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Maloney</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Millson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Clarke</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Naaby-Hansen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Stein</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cramer</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mollapour</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Workman</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Piper</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pearl</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Prodromou</surname>
<given-names>C</given-names>
</name>
<article-title>Activation of the ATPase activity of hsp90 by the stress-regulated cochaperone aha1</article-title>
<source>Molecular cell</source>
<year>2002</year>
<volume>10</volume>
<issue>6</issue>
<fpage>1307</fpage>
<lpage>1318</lpage>
<pub-id pub-id-type="doi">10.1016/S1097-2765(02)00785-2</pub-id>
<pub-id pub-id-type="pmid">12504007</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<name>
<surname>Lê Cao</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>González</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Déjean</surname>
<given-names>S</given-names>
</name>
<article-title>integrOmics: an R package to unravel relationships between two omics data sets</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<issue>21</issue>
<fpage>2855</fpage>
<lpage>2856</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp515</pub-id>
<pub-id pub-id-type="pmid">19706745</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="other">
<article-title>mixOmics</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.math.univ-toulouse.fr/~biostat/mixOmics">http://www.math.univ-toulouse.fr/~biostat/mixOmics</ext-link>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Bach</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Jordan</surname>
<given-names>M</given-names>
</name>
<article-title>Kernel Independent Component Analysis</article-title>
<source>Journal of Machine Learning Research</source>
<year>2002</year>
<volume>3</volume>
<fpage>1</fpage>
<lpage>48</lpage>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="other">
<name>
<surname>Hastie</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<article-title>Independent Components Analysis through Product Density Estimation</article-title>
<year>2002</year>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Himberg</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hyvarinen</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Esposito</surname>
<given-names>F</given-names>
</name>
<article-title>Validating the independent components of neuroimaging time series via clustering and visualization</article-title>
<source>Neuroimage</source>
<year>2004</year>
<volume>22</volume>
<issue>3</issue>
<fpage>1214</fpage>
<lpage>1222</lpage>
<pub-id pub-id-type="doi">10.1016/j.neuroimage.2004.03.027</pub-id>
<pub-id pub-id-type="pmid">15219593</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Zou</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Hastie</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<article-title>Sparse Principal Component Analysis</article-title>
<source>J Comput Graph Statist</source>
<year>2006</year>
<volume>15</volume>
<issue>2</issue>
<fpage>265</fpage>
<lpage>286</lpage>
<pub-id pub-id-type="doi">10.1198/106186006X113430</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Witten</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hastie</surname>
<given-names>T</given-names>
</name>
<article-title>A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis</article-title>
<source>Biostatistics</source>
<year>2009</year>
<volume>10</volume>
<issue>3</issue>
<fpage>515</fpage>
<pub-id pub-id-type="doi">10.1093/biostatistics/kxp008</pub-id>
<pub-id pub-id-type="pmid">19377034</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Tibshirani</surname>
<given-names>R</given-names>
</name>
<article-title>Regression shrinkage and selection via the lasso</article-title>
<source>Journal of the Royal Statistical Society, Series B</source>
<year>1996</year>
<volume>58</volume>
<fpage>267</fpage>
<lpage>288</lpage>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0022749 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0022749 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024