Exploration server on relations between France and Australia

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.

Internal identifier: 002122 (PubMed/Checkpoint); previous: 002121; next: 002123

Authors: Laurent Jacob [France]; Johann A. Gagnon-Bartsch [United States]; Terence P. Speed [Australia]

Source: Biostatistics (Oxford, England), 2016, vol. 17, no. 1, pp. 16-28

RBID : pubmed:26286812

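English descriptors

Data Interpretation, Statistical; Gene Expression (genetics); Genetic Variation (genetics); Humans; Microarray Analysis (methods)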

Abstract

When dealing with large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to studying an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
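
The idea of estimating unwanted variation from negative control genes can be illustrated with a minimal NumPy sketch. This is a simplified variant written for illustration only; it is not the RUVnormalize interface, and the function name `remove_unwanted_variation`, its parameters, and the toy data are all assumptions. It estimates the unwanted factors from the control-gene submatrix by SVD and regresses them out of every gene. As the abstract warns, such a direct regression can also remove signal of interest when that signal is correlated with the unwanted factors.

```python
import numpy as np


def remove_unwanted_variation(Y, control_idx, k):
    """Hypothetical helper: regress out k factors estimated from control genes.

    Y           : (samples x genes) expression matrix
    control_idx : column indices of the assumed negative control genes
    k           : assumed number of unwanted factors
    """
    Y = np.asarray(Y, dtype=float)
    # Control genes are assumed unaffected by the factor of interest, so the
    # leading left singular vectors of their submatrix estimate the unwanted
    # factors W (one column per factor).
    U, s, _ = np.linalg.svd(Y[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]
    # Least-squares fit of every gene on W, then subtract the fitted part.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ alpha, W


# Toy data (all values invented for illustration).
rng = np.random.default_rng(0)
n_samples, n_genes, n_controls = 12, 30, 10
control_idx = np.arange(n_controls)

batch = rng.normal(size=(n_samples, 1))            # unwanted factor
batch_loadings = rng.normal(size=(1, n_genes))     # affects all genes
signal = rng.normal(size=(n_samples, 1))           # factor of interest
signal_loadings = np.zeros((1, n_genes))           # zero on control genes
signal_loadings[:, n_controls:] = rng.normal(size=(1, n_genes - n_controls))
noise = 0.01 * rng.normal(size=(n_samples, n_genes))

Y = batch @ batch_loadings + signal @ signal_loadings + noise
corrected, W = remove_unwanted_variation(Y, control_idx, k=1)
```

In this toy setting the corrected control genes should contain little beyond noise, and the corrected matrix is orthogonal to the estimated factor by construction of the least-squares fit.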

DOI: 10.1093/biostatistics/kxv026
PubMed: 26286812


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.</title>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
<affiliation wicri:level="1">
<nlm:affiliation>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR, 5558 Lyon, France laurent.jacob@univ-lyon1.fr.</nlm:affiliation>
<country wicri:rule="url">France</country>
<wicri:regionArea>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR, 5558 Lyon</wicri:regionArea>
<wicri:noRegion>5558 Lyon</wicri:noRegion>
<placeName>
<settlement type="city">Lyon</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Gagnon Bartsch, Johann A" sort="Gagnon Bartsch, Johann A" uniqKey="Gagnon Bartsch J" first="Johann A" last="Gagnon-Bartsch">Johann A. Gagnon-Bartsch</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Statistics, University of California, Berkeley, CA 974720</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Speed, Terence P" sort="Speed, Terence P" uniqKey="Speed T" first="Terence P" last="Speed">Terence P. Speed</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Statistics, University of California, Berkeley, CA 974720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052</wicri:regionArea>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:26286812</idno>
<idno type="pmid">26286812</idno>
<idno type="doi">10.1093/biostatistics/kxv026</idno>
<idno type="wicri:Area/PubMed/Corpus">002560</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">002560</idno>
<idno type="wicri:Area/PubMed/Curation">002492</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">002492</idno>
<idno type="wicri:Area/PubMed/Checkpoint">002492</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">002492</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.</title>
<author>
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
<affiliation wicri:level="1">
<nlm:affiliation>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR, 5558 Lyon, France laurent.jacob@univ-lyon1.fr.</nlm:affiliation>
<country wicri:rule="url">France</country>
<wicri:regionArea>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR, 5558 Lyon</wicri:regionArea>
<wicri:noRegion>5558 Lyon</wicri:noRegion>
<placeName>
<settlement type="city">Lyon</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Gagnon Bartsch, Johann A" sort="Gagnon Bartsch, Johann A" uniqKey="Gagnon Bartsch J" first="Johann A" last="Gagnon-Bartsch">Johann A. Gagnon-Bartsch</name>
<affiliation wicri:level="2">
<nlm:affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Statistics, University of California, Berkeley, CA 974720</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Speed, Terence P" sort="Speed, Terence P" uniqKey="Speed T" first="Terence P" last="Speed">Terence P. Speed</name>
<affiliation wicri:level="1">
<nlm:affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052, Australia.</nlm:affiliation>
<country xml:lang="fr">Australie</country>
<wicri:regionArea>Department of Statistics, University of California, Berkeley, CA 974720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052</wicri:regionArea>
<placeName>
<settlement type="city">Melbourne</settlement>
<region type="état">Victoria (État)</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Biostatistics (Oxford, England)</title>
<idno type="eISSN">1468-4357</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Data Interpretation, Statistical</term>
<term>Gene Expression (genetics)</term>
<term>Genetic Variation (genetics)</term>
<term>Humans</term>
<term>Microarray Analysis (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Analyse sur microréseau ()</term>
<term>Expression des gènes (génétique)</term>
<term>Humains</term>
<term>Interprétation statistique de données</term>
<term>Variation génétique (génétique)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Gene Expression</term>
<term>Genetic Variation</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Expression des gènes</term>
<term>Variation génétique</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Microarray Analysis</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Data Interpretation, Statistical</term>
<term>Humans</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Analyse sur microréseau</term>
<term>Humains</term>
<term>Interprétation statistique de données</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">When dealing with large scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset--as opposed to the study of an observed factor of interest--taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the bioconductor package RUVnormalize.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26286812</PMID>
<DateCreated>
<Year>2015</Year>
<Month>12</Month>
<Day>16</Day>
</DateCreated>
<DateCompleted>
<Year>2016</Year>
<Month>09</Month>
<Day>27</Day>
</DateCompleted>
<DateRevised>
<Year>2015</Year>
<Month>12</Month>
<Day>16</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1468-4357</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>17</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2016</Year>
<Month>Jan</Month>
</PubDate>
</JournalIssue>
<Title>Biostatistics (Oxford, England)</Title>
<ISOAbbreviation>Biostatistics</ISOAbbreviation>
</Journal>
<ArticleTitle>Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.</ArticleTitle>
<Pagination>
<MedlinePgn>16-28</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1093/biostatistics/kxv026</ELocationID>
<Abstract>
<AbstractText>When dealing with large scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset--as opposed to the study of an observed factor of interest--taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the bioconductor package RUVnormalize.</AbstractText>
<CopyrightInformation>© The Author 2015. Published by Oxford University Press.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Jacob</LastName>
<ForeName>Laurent</ForeName>
<Initials>L</Initials>
<AffiliationInfo>
<Affiliation>Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR, 5558 Lyon, France laurent.jacob@univ-lyon1.fr.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Gagnon-Bartsch</LastName>
<ForeName>Johann A</ForeName>
<Initials>JA</Initials>
<AffiliationInfo>
<Affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Speed</LastName>
<ForeName>Terence P</ForeName>
<Initials>TP</Initials>
<AffiliationInfo>
<Affiliation>Department of Statistics, University of California, Berkeley, CA 974720, USA and Division of Bioinformatics, Walter and Eliza Hall Institute of Medical Research, Melbourne 3052, Australia.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2015</Year>
<Month>08</Month>
<Day>17</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Biostatistics</MedlineTA>
<NlmUniqueID>100897327</NlmUniqueID>
<ISSNLinking>1465-4644</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2010 Sep 21;107(38):16465-70</RefSource>
<PMID Version="1">20810919</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Biostatistics. 2007 Jan;8(1):118-27</RefSource>
<PMID Version="1">16632515</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nat Biotechnol. 2014 Sep;32(9):896-902</RefSource>
<PMID Version="1">25150836</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Anal Chem. 2015 Apr 7;87(7):3606-15</RefSource>
<PMID Version="1">25692814</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10101-6</RefSource>
<PMID Version="1">10963673</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2003 Jan 22;19(2):185-93</RefSource>
<PMID Version="1">12538238</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Genetics. 2008 Dec;180(4):1909-25</RefSource>
<PMID Version="1">18791227</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Neuropsychopharmacology. 2004 Feb;29(2):373-84</RefSource>
<PMID Version="1">14583743</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Biostatistics. 2012 Jul;13(3):539-52</RefSource>
<PMID Version="1">22101192</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>PLoS Genet. 2007 Sep;3(9):1724-35</RefSource>
<PMID Version="1">17907809</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Nature. 2008 Oct 23;455(7216):1061-8</RefSource>
<PMID Version="1">18772890</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Proc Natl Acad Sci U S A. 2008 Dec 2;105(48):18718-23</RefSource>
<PMID Version="1">19033188</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites">
<RefSource>Bioinformatics. 2004 Jan 1;20(1):105-14</RefSource>
<PMID Version="1">14693816</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D003627" MajorTopicYN="Y">Data Interpretation, Statistical</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015870" MajorTopicYN="N">Gene Expression</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D014644" MajorTopicYN="N">Genetic Variation</DescriptorName>
<QualifierName UI="Q000235" MajorTopicYN="Y">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D046228" MajorTopicYN="N">Microarray Analysis</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">Batch effect</Keyword>
<Keyword MajorTopicYN="N">Control genes</Keyword>
<Keyword MajorTopicYN="N">Gene expression</Keyword>
<Keyword MajorTopicYN="N">Normalization</Keyword>
<Keyword MajorTopicYN="N">Replicate samples</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2014</Year>
<Month>11</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2015</Year>
<Month>06</Month>
<Day>25</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>8</Month>
<Day>20</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2015</Year>
<Month>8</Month>
<Day>20</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>9</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">26286812</ArticleId>
<ArticleId IdType="pii">kxv026</ArticleId>
<ArticleId IdType="doi">10.1093/biostatistics/kxv026</ArticleId>
<ArticleId IdType="pmc">PMC4679071</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations>
<list>
<country>
<li>Australie</li>
<li>France</li>
<li>États-Unis</li>
</country>
<region>
<li>Auvergne-Rhône-Alpes</li>
<li>Californie</li>
<li>Rhône-Alpes</li>
<li>Victoria (État)</li>
</region>
<settlement>
<li>Lyon</li>
<li>Melbourne</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Auvergne-Rhône-Alpes">
<name sortKey="Jacob, Laurent" sort="Jacob, Laurent" uniqKey="Jacob L" first="Laurent" last="Jacob">Laurent Jacob</name>
</region>
</country>
<country name="États-Unis">
<region name="Californie">
<name sortKey="Gagnon Bartsch, Johann A" sort="Gagnon Bartsch, Johann A" uniqKey="Gagnon Bartsch J" first="Johann A" last="Gagnon-Bartsch">Johann A. Gagnon-Bartsch</name>
</region>
</country>
<country name="Australie">
<region name="Victoria (État)">
<name sortKey="Speed, Terence P" sort="Speed, Terence P" uniqKey="Speed T" first="Terence P" last="Speed">Terence P. Speed</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/PubMed/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002122 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 002122 | SxmlIndent | more
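
For readers without a Dilib installation, the `<pubmed>` part of the record can also be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on an abridged copy of that element; the helper name `summarize` and the shape of the returned dictionary are assumptions made for illustration. Note that the TEI part of the record as shown uses the `wicri:` and `nlm:` prefixes without namespace declarations, so a namespace-aware parser would likely reject the full `<record>`; only the `<pubmed>` element is handled here.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <pubmed> element from the record above.
PUBMED_XML = """
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">26286812</PMID>
<Article PubModel="Print-Electronic">
<Journal>
<JournalIssue CitedMedium="Internet">
<Volume>17</Volume>
<Issue>1</Issue>
</JournalIssue>
<Title>Biostatistics (Oxford, England)</Title>
</Journal>
<Pagination>
<MedlinePgn>16-28</MedlinePgn>
</Pagination>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Jacob</LastName><ForeName>Laurent</ForeName></Author>
<Author ValidYN="Y"><LastName>Gagnon-Bartsch</LastName><ForeName>Johann A</ForeName></Author>
<Author ValidYN="Y"><LastName>Speed</LastName><ForeName>Terence P</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
<PubmedData>
<ArticleIdList>
<ArticleId IdType="pubmed">26286812</ArticleId>
<ArticleId IdType="doi">10.1093/biostatistics/kxv026</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
"""


def summarize(xml_text):
    """Hypothetical helper: pull a few bibliographic fields from a <pubmed> element."""
    root = ET.fromstring(xml_text.strip())
    article = root.find("MedlineCitation/Article")
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in article.iter("Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "journal": article.findtext("Journal/Title"),
        "pages": article.findtext("Pagination/MedlinePgn"),
        "authors": authors,
        # Select the identifier whose IdType attribute is "doi".
        "doi": root.findtext("PubmedData/ArticleIdList/ArticleId[@IdType='doi']"),
    }


record = summarize(PUBMED_XML)
```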

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Wicri/Asie
   |area=    AustralieFrV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:26286812
   |texte=   Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:26286812" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a AustralieFrV1 

This area was generated with Dilib version V0.6.33.
Data generation: Tue Dec 5 10:43:12 2017. Site generation: Tue Mar 5 14:07:20 2024