Exploration server on relations between France and Australia

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Internal identifier: 001092 (Pmc/Checkpoint); previous: 001091; next: 001093

Authors: Laurent Jacob, Johann A. Gagnon-Bartsch, Terence P. Speed

Source: Biostatistics (Oxford, England), 17(1):16-28, January 2016 (published online 17 August 2015)

RBID : PMC:4679071

Abstract

When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to the study of an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
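
As a rough illustration of the control-gene idea in the abstract, the sketch below estimates unwanted factors from a set of negative control genes and regresses them out of every gene. It is not the RUVnormalize API and is not claimed to be any of the paper's estimators: the function name naive_ruv2, the argument names, the choice of an SVD followed by ordinary least squares, and the synthetic data are all assumptions made for illustration. It also ignores the caveat raised in the abstract, namely that if the unwanted factors are correlated with the factor of interest, this kind of plain regression can remove the signal as well.

```python
import numpy as np


def naive_ruv2(Y, control_idx, k):
    """Hypothetical helper: remove k unwanted factors estimated from control genes.

    Y           : (samples x genes) log-expression matrix.
    control_idx : column indices of genes assumed unaffected by the factor of interest.
    k           : assumed number of unwanted factors.
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y[:, control_idx]
    # The top-k left singular vectors of the control-gene block (scaled by
    # their singular values) serve as the estimated unwanted factors W.
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W = U[:, :k] * s[:k]
    # Least-squares fit of every gene on W gives the loadings alpha (k x genes).
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    # Subtract the fitted unwanted variation from all genes.
    return Y - W @ alpha, W


# Synthetic example (all values are made up): one hidden batch factor plus noise.
rng = np.random.default_rng(0)
n_samples, n_genes, n_controls = 40, 200, 50
batch = rng.integers(0, 2, size=(n_samples, 1)).astype(float)
alpha_true = rng.normal(size=(1, n_genes))
Y = batch @ alpha_true + 0.1 * rng.normal(size=(n_samples, n_genes))

corrected, W = naive_ruv2(Y, np.arange(n_controls), k=1)
```

By construction, the corrected matrix is orthogonal to the estimated factors W; whether W captures only unwanted variation depends entirely on the quality of the control genes and on the choice of k.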


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4679071
DOI: 10.1093/biostatistics/kxv026
PubMed: 26286812
PubMed Central: 4679071


Affiliations:


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed</title>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26286812</idno>
<idno type="pmc">4679071</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4679071</idno>
<idno type="RBID">PMC:4679071</idno>
<idno type="doi">10.1093/biostatistics/kxv026</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000251</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000251</idno>
<idno type="wicri:Area/Pmc/Curation">000251</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000251</idno>
<idno type="wicri:Area/Pmc/Checkpoint">001092</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">001092</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed</title>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
</author>
</analytic>
<series>
<title level="j">Biostatistics (Oxford, England)</title>
<idno type="ISSN">1465-4644</idno>
<idno type="eISSN">1468-4357</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to the study of an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package
<monospace>RUVnormalize</monospace>.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Alter, O" uniqKey="Alter O">O. Alter</name>
</author>
<author>
<name sortKey="Brown, P O" uniqKey="Brown P">P. O. Brown</name>
</author>
<author>
<name sortKey="Botstein, D" uniqKey="Botstein D">D. Botstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benito, M" uniqKey="Benito M">M. Benito</name>
</author>
<author>
<name sortKey="Parker, J" uniqKey="Parker J">J. Parker</name>
</author>
<author>
<name sortKey="Du, Q" uniqKey="Du Q">Q. Du</name>
</author>
<author>
<name sortKey="Wu, J" uniqKey="Wu J">J. Wu</name>
</author>
<author>
<name sortKey="Xiang, D" uniqKey="Xiang D">D. Xiang</name>
</author>
<author>
<name sortKey="Perou, C M" uniqKey="Perou C">C. M. Perou</name>
</author>
<author>
<name sortKey="Marron, J S" uniqKey="Marron J">J. S. Marron</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bolstad, B M" uniqKey="Bolstad B">B. M. Bolstad</name>
</author>
<author>
<name sortKey="Irizarry, R A" uniqKey="Irizarry R">R. A. Irizarry</name>
</author>
<author>
<name sortKey="Astr, M" uniqKey="Astr M">M. Åstrand</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Livera, A M" uniqKey="De Livera A">A. M. De Livera</name>
</author>
<author>
<name sortKey="Sysi Aho, M" uniqKey="Sysi Aho M">M. Sysi-Aho</name>
</author>
<author>
<name sortKey="Jacob, L" uniqKey="Jacob L">L. Jacob</name>
</author>
<author>
<name sortKey="Gagnon Bartsch, J A" uniqKey="Gagnon Bartsch J">J. A. Gagnon-Bartsch</name>
</author>
<author>
<name sortKey="Castillo, S" uniqKey="Castillo S">S. Castillo</name>
</author>
<author>
<name sortKey="Simpson, J A" uniqKey="Simpson J">J. A. Simpson</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Freedman, D" uniqKey="Freedman D">D. Freedman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gagnon Bartsch, J" uniqKey="Gagnon Bartsch J">J. Gagnon-Bartsch</name>
</author>
<author>
<name sortKey="Jacob, L" uniqKey="Jacob L">L. Jacob</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gagnon Bartsch, J A" uniqKey="Gagnon Bartsch J">J. A. Gagnon-Bartsch</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hotelling, H" uniqKey="Hotelling H">H. Hotelling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jacob, L" uniqKey="Jacob L">L. Jacob</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jacob, L" uniqKey="Jacob L">L. Jacob</name>
</author>
<author>
<name sortKey="Van Den Akker, J" uniqKey="Van Den Akker J">J. Van Den Akker</name>
</author>
<author>
<name sortKey="Witteveen, A" uniqKey="Witteveen A">A. Witteveen</name>
</author>
<author>
<name sortKey="Goosens, I" uniqKey="Goosens I">I. Goosens</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
<author>
<name sortKey="Glas, A" uniqKey="Glas A">A. Glas</name>
</author>
<author>
<name sortKey="Veer, L V" uniqKey="Veer L">L. V. Veer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Johnson, W E" uniqKey="Johnson W">W. E. Johnson</name>
</author>
<author>
<name sortKey="Li, C" uniqKey="Li C">C. Li</name>
</author>
<author>
<name sortKey="Rabinovic, A" uniqKey="Rabinovic A">A. Rabinovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kang, H M" uniqKey="Kang H">H. M. Kang</name>
</author>
<author>
<name sortKey="Ye, C" uniqKey="Ye C">C. Ye</name>
</author>
<author>
<name sortKey="Eskin, E" uniqKey="Eskin E">E. Eskin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leek, J T" uniqKey="Leek J">J. T. Leek</name>
</author>
<author>
<name sortKey="Storey, J D" uniqKey="Storey J">J. D. Storey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leek, J T" uniqKey="Leek J">J. T. Leek</name>
</author>
<author>
<name sortKey="Storey, J D" uniqKey="Storey J">J. D. Storey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Listgarten, J" uniqKey="Listgarten J">J. Listgarten</name>
</author>
<author>
<name sortKey="Kadie, C" uniqKey="Kadie C">C. Kadie</name>
</author>
<author>
<name sortKey="Schadt, E E" uniqKey="Schadt E">E. E. Schadt</name>
</author>
<author>
<name sortKey="Heckerman, D" uniqKey="Heckerman D">D. Heckerman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mairal, J" uniqKey="Mairal J">J. Mairal</name>
</author>
<author>
<name sortKey="Bach, F" uniqKey="Bach F">F. Bach</name>
</author>
<author>
<name sortKey="Ponce, J" uniqKey="Ponce J">J. Ponce</name>
</author>
<author>
<name sortKey="Sapiro, G" uniqKey="Sapiro G">G. Sapiro</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Risso, D" uniqKey="Risso D">D. Risso</name>
</author>
<author>
<name sortKey="Ngai, J" uniqKey="Ngai J">J. Ngai</name>
</author>
<author>
<name sortKey="Speed, T P" uniqKey="Speed T">T. P. Speed</name>
</author>
<author>
<name sortKey="Dudoit, S" uniqKey="Dudoit S">S. Dudoit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vawter, M P And Others" uniqKey="Vawter M">M. P. Vawter and others</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Biostatistics</journal-id>
<journal-id journal-id-type="iso-abbrev">Biostatistics</journal-id>
<journal-id journal-id-type="publisher-id">biosts</journal-id>
<journal-id journal-id-type="hwp">biosts</journal-id>
<journal-title-group>
<journal-title>Biostatistics (Oxford, England)</journal-title>
</journal-title-group>
<issn pub-type="ppub">1465-4644</issn>
<issn pub-type="epub">1468-4357</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26286812</article-id>
<article-id pub-id-type="pmc">4679071</article-id>
<article-id pub-id-type="doi">10.1093/biostatistics/kxv026</article-id>
<article-id pub-id-type="publisher-id">kxv026</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Jacob</surname>
<given-names>Laurent</given-names>
</name>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<aff>
<addr-line>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Lyon, France</addr-line>
</aff>
</contrib-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Gagnon-Bartsch</surname>
<given-names>Johann A.</given-names>
</name>
</contrib>
<aff>
<addr-line>Department of Statistics, University of California, Berkeley, CA 94720, USA</addr-line>
</aff>
</contrib-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Speed</surname>
<given-names>Terence P.</given-names>
</name>
</contrib>
<aff>
<addr-line>Department of Statistics, University of California, Berkeley, CA 94720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052, Australia</addr-line>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor1">
<label>*</label>
To whom correspondence should be addressed.
<email>laurent.jacob@univ-lyon1.fr</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>1</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub">
<day>17</day>
<month>8</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>17</day>
<month>8</month>
<year>2015</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>17</volume>
<issue>1</issue>
<fpage>16</fpage>
<lpage>28</lpage>
<history>
<date date-type="received">
<day>26</day>
<month>11</month>
<year>2014</year>
</date>
<date date-type="rev-recd">
<day>18</day>
<month>6</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>6</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2015. Published by Oxford University Press.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="kxv026.pdf"></self-uri>
<abstract>
<p>When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to the study of an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package
<monospace>RUVnormalize</monospace>.</p>
</abstract>
<kwd-group>
<kwd>Batch effect</kwd>
<kwd>Control genes</kwd>
<kwd>Gene expression</kwd>
<kwd>Normalization</kwd>
<kwd>Replicate samples</kwd>
</kwd-group>
<funding-group>
<award-group id="funding-1">
<award-id>SU2C-AACR-DT0409</award-id>
</award-group>
<award-group id="funding-2">
<funding-source>Australian National Health and Medical Research Council Program</funding-source>
<award-id>APP1054618</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
</pmc>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001092 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 001092 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:4679071
   |texte=   Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:26286812" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024