MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

GeneCodeq: quality score compression and improved genotyping using a Bayesian framework.

Internal identifier: 001067 (PubMed/Curation); previous: 001066; next: 001068

Authors: Daniel L. Greenfield [United Kingdom]; Oliver Stegle [United Kingdom]; Alban Rrustemi [United Kingdom]

Source: Bioinformatics (Oxford, England) [1367-4811]; 2016.

RBID : pubmed:27354700

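French descriptors: Compression de données, Génotype, Séquençage nucléotidique à haut débit, Théorème de Bayes

English descriptors: Bayes Theorem, Data Compression (methods), Genotype, High-Throughput Nucleotide Sequencing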

Abstract

The exponential reduction in the cost of genome sequencing has resulted in a rapid growth of genomic data. Most of the entropy of short read data lies not in the sequence of read bases themselves but in their quality scores, i.e. the confidence measurement that each base has been sequenced correctly. Lossless compression methods are now close to their theoretical limits and hence there is a need for lossy methods that further reduce the complexity of these data without impacting downstream analyses.

DOI: 10.1093/bioinformatics/btw385
PubMed: 27354700

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">GeneCodeq: quality score compression and improved genotyping using a Bayesian framework.</title>
<author>
<name sortKey="Greenfield, Daniel L" sort="Greenfield, Daniel L" uniqKey="Greenfield D" first="Daniel L" last="Greenfield">Daniel L. Greenfield</name>
<affiliation wicri:level="1">
<nlm:affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Stegle, Oliver" sort="Stegle, Oliver" uniqKey="Stegle O" first="Oliver" last="Stegle">Oliver Stegle</name>
<affiliation wicri:level="1">
<nlm:affiliation>European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SQ, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SQ</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rrustemi, Alban" sort="Rrustemi, Alban" uniqKey="Rrustemi A" first="Alban" last="Rrustemi">Alban Rrustemi</name>
<affiliation wicri:level="1">
<nlm:affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:27354700</idno>
<idno type="pmid">27354700</idno>
<idno type="doi">10.1093/bioinformatics/btw385</idno>
<idno type="wicri:Area/PubMed/Corpus">001067</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001067</idno>
<idno type="wicri:Area/PubMed/Curation">001067</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001067</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">GeneCodeq: quality score compression and improved genotyping using a Bayesian framework.</title>
<author>
<name sortKey="Greenfield, Daniel L" sort="Greenfield, Daniel L" uniqKey="Greenfield D" first="Daniel L" last="Greenfield">Daniel L. Greenfield</name>
<affiliation wicri:level="1">
<nlm:affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Stegle, Oliver" sort="Stegle, Oliver" uniqKey="Stegle O" first="Oliver" last="Stegle">Oliver Stegle</name>
<affiliation wicri:level="1">
<nlm:affiliation>European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SQ, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SQ</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rrustemi, Alban" sort="Rrustemi, Alban" uniqKey="Rrustemi A" first="Alban" last="Rrustemi">Alban Rrustemi</name>
<affiliation wicri:level="1">
<nlm:affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</nlm:affiliation>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics (Oxford, England)</title>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Bayes Theorem</term>
<term>Data Compression (methods)</term>
<term>Genotype</term>
<term>High-Throughput Nucleotide Sequencing</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Compression de données ()</term>
<term>Génotype</term>
<term>Séquençage nucléotidique à haut débit</term>
<term>Théorème de Bayes</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Data Compression</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Bayes Theorem</term>
<term>Genotype</term>
<term>High-Throughput Nucleotide Sequencing</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Compression de données</term>
<term>Génotype</term>
<term>Séquençage nucléotidique à haut débit</term>
<term>Théorème de Bayes</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The exponential reduction in the cost of genome sequencing has resulted in a rapid growth of genomic data. Most of the entropy of short read data lies not in the sequence of read bases themselves but in their quality scores, i.e. the confidence measurement that each base has been sequenced correctly. Lossless compression methods are now close to their theoretical limits and hence there is a need for lossy methods that further reduce the complexity of these data without impacting downstream analyses.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" IndexingMethod="Curated" Owner="NLM">
<PMID Version="1">27354700</PMID>
<DateCompleted>
<Year>2017</Year>
<Month>08</Month>
<Day>15</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>12</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1367-4811</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>32</Volume>
<Issue>20</Issue>
<PubDate>
<Year>2016</Year>
<Month>10</Month>
<Day>15</Day>
</PubDate>
</JournalIssue>
<Title>Bioinformatics (Oxford, England)</Title>
<ISOAbbreviation>Bioinformatics</ISOAbbreviation>
</Journal>
<ArticleTitle>GeneCodeq: quality score compression and improved genotyping using a Bayesian framework.</ArticleTitle>
<Pagination>
<MedlinePgn>3124-3132</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText Label="MOTIVATION">The exponential reduction in the cost of genome sequencing has resulted in a rapid growth of genomic data. Most of the entropy of short read data lies not in the sequence of read bases themselves but in their quality scores, i.e. the confidence measurement that each base has been sequenced correctly. Lossless compression methods are now close to their theoretical limits and hence there is a need for lossy methods that further reduce the complexity of these data without impacting downstream analyses.</AbstractText>
<AbstractText Label="RESULTS">We propose GeneCodeq, a Bayesian method inspired by coding theory that adjusts quality scores to improve their compressibility without adversely impacting genotyping accuracy. Our model leverages a corpus of k-mers to reduce the entropy of the quality scores and thereby improve the compressibility of these data (in FASTQ or SAM/BAM/CRAM files), resulting in compression ratios that significantly exceed those of other methods. Our approach can also be combined with existing lossy compression schemes to further reduce entropy, and it allows the user to specify a reference panel of expected sequence variations to improve the model accuracy. In addition to extensive empirical evaluation, we also derive novel theoretical insights that explain the empirical performance and pitfalls of corpus-based quality score compression schemes in general. Finally, we show that as a positive side effect of compression, the model can lead to improved genotyping accuracy.</AbstractText>
<AbstractText Label="AVAILABILITY AND IMPLEMENTATION">GeneCodeq is available at: github.com/genecodeq/eval. Contact: dan@petagene.com. Supplementary information: Supplementary data are available at Bioinformatics online.</AbstractText>
<CopyrightInformation>© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Greenfield</LastName>
<ForeName>Daniel L</ForeName>
<Initials>DL</Initials>
<AffiliationInfo>
<Affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Stegle</LastName>
<ForeName>Oliver</ForeName>
<Initials>O</Initials>
<AffiliationInfo>
<Affiliation>European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SQ, UK.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Rrustemi</LastName>
<ForeName>Alban</ForeName>
<Initials>A</Initials>
<AffiliationInfo>
<Affiliation>PetaGene, Ideaspace, 3 Charles Babbage Rd, Cambridge CB3 0GT, UK.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>06</Month>
<Day>26</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>England</Country>
<MedlineTA>Bioinformatics</MedlineTA>
<NlmUniqueID>9808944</NlmUniqueID>
<ISSNLinking>1367-4803</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D001499" MajorTopicYN="N">Bayes Theorem</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D044962" MajorTopicYN="N">Data Compression</DescriptorName>
<QualifierName UI="Q000379" MajorTopicYN="Y">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D005838" MajorTopicYN="N">Genotype</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D059014" MajorTopicYN="Y">High-Throughput Nucleotide Sequencing</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2015</Year>
<Month>10</Month>
<Day>08</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>06</Month>
<Day>15</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>6</Month>
<Day>30</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2017</Year>
<Month>8</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>6</Month>
<Day>30</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27354700</ArticleId>
<ArticleId IdType="pii">btw385</ArticleId>
<ArticleId IdType="doi">10.1093/bioinformatics/btw385</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001067 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001067 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:27354700
   |texte=   GeneCodeq: quality score compression and improved genotyping using a Bayesian framework.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:27354700" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021