MERS exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.

Internal identifier: 000E54 (Ncbi/Checkpoint); previous: 000E53; next: 000E55

Authors: Loni Philip Tabb [United States]; Wei Zhao; Jingyu Huang; Gail L. Rosen

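Source: Journal of computational biology : a journal of computational molecular cell biology (eISSN 1557-8666), 2014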

RBID : pubmed:25075627

Abstract

Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., <7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.

DOI: 10.1089/cmb.2014.0108
PubMed: 25075627
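
The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the negative binomial is fitted by the method of moments rather than by maximum likelihood, and the zero-inflated variants are omitted. It counts every possible n-mer over the alphabet ACGT (so nullomers appear as zeros) and compares the log-likelihood of a Poisson fit with that of a negative binomial fit.

```python
from collections import Counter
from itertools import product
from math import lgamma, log


def nmer_counts(sequence, n):
    """Count every overlapping n-mer over the alphabet ACGT, including zeros."""
    observed = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Enumerate all 4**n possible words so that absent words (nullomers) get 0.
    return [observed.get("".join(word), 0) for word in product("ACGT", repeat=n)]


def poisson_loglik(counts):
    """Log-likelihood under a Poisson model fitted by its MLE (the sample mean)."""
    lam = sum(counts) / len(counts)
    return sum(k * log(lam) - lam - lgamma(k + 1) for k in counts)


def negbin_loglik(counts):
    """Log-likelihood under a negative binomial fitted by the method of moments.

    Returns None when the sample is not overdispersed (variance <= mean),
    because the moment estimate of the size parameter is then undefined.
    """
    m = sum(counts) / len(counts)
    var = sum((k - m) ** 2 for k in counts) / (len(counts) - 1)
    if var <= m:
        return None
    r = m * m / (var - m)  # size (dispersion) parameter
    p = r / (r + m)        # success probability
    return sum(
        lgamma(k + r) - lgamma(r) - lgamma(k + 1) + r * log(p) + k * log(1 - p)
        for k in counts
    )


if __name__ == "__main__":
    # Toy sequence (an assumption for illustration, not data from the study).
    toy = "A" * 20 + "C" * 5 + "GT"
    counts = nmer_counts(toy, 2)
    print("nullomers:", counts.count(0))
    print("Poisson log-likelihood:", poisson_loglik(counts))
    print("negative binomial log-likelihood:", negbin_loglik(counts))
```

A higher log-likelihood for the negative binomial on overdispersed counts is consistent with the abstract's conclusion, although a real comparison would use maximum-likelihood fits and a criterion that penalizes the extra parameter.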


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</title>
<author>
<name sortKey="Tabb, Loni Philip" sort="Tabb, Loni Philip" uniqKey="Tabb L" first="Loni Philip" last="Tabb">Loni Philip Tabb</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Epidemiology &amp; Biostatistics, Drexel University , Philadelphia, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Epidemiology &amp; Biostatistics, Drexel University , Philadelphia</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Wei" sort="Zhao, Wei" uniqKey="Zhao W" first="Wei" last="Zhao">Wei Zhao</name>
</author>
<author>
<name sortKey="Huang, Jingyu" sort="Huang, Jingyu" uniqKey="Huang J" first="Jingyu" last="Huang">Jingyu Huang</name>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25075627</idno>
<idno type="pmid">25075627</idno>
<idno type="doi">10.1089/cmb.2014.0108</idno>
<idno type="wicri:Area/PubMed/Corpus">001894</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001894</idno>
<idno type="wicri:Area/PubMed/Curation">001894</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001894</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001972</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001972</idno>
<idno type="wicri:Area/Ncbi/Merge">000E54</idno>
<idno type="wicri:Area/Ncbi/Curation">000E54</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000E54</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</title>
<author>
<name sortKey="Tabb, Loni Philip" sort="Tabb, Loni Philip" uniqKey="Tabb L" first="Loni Philip" last="Tabb">Loni Philip Tabb</name>
<affiliation wicri:level="2">
<nlm:affiliation>1 Department of Epidemiology &amp; Biostatistics, Drexel University , Philadelphia, Pennsylvania.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>1 Department of Epidemiology &amp; Biostatistics, Drexel University , Philadelphia</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Wei" sort="Zhao, Wei" uniqKey="Zhao W" first="Wei" last="Zhao">Wei Zhao</name>
</author>
<author>
<name sortKey="Huang, Jingyu" sort="Huang, Jingyu" uniqKey="Huang J" first="Jingyu" last="Huang">Jingyu Huang</name>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Binomial Distribution</term>
<term>Genome</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Poisson Distribution</term>
<term>Prokaryotic Cells</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Cellules procaryotes</term>
<term>Génome</term>
<term>Loi binomiale</term>
<term>Loi de Poisson</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Binomial Distribution</term>
<term>Genome</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Poisson Distribution</term>
<term>Prokaryotic Cells</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Cellules procaryotes</term>
<term>Génome</term>
<term>Loi binomiale</term>
<term>Loi de Poisson</term>
<term>Modèles génétiques</term>
<term>Modèles statistiques</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., &lt;7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
</region>
</list>
<tree>
<noCountry>
<name sortKey="Huang, Jingyu" sort="Huang, Jingyu" uniqKey="Huang J" first="Jingyu" last="Huang">Jingyu Huang</name>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
<name sortKey="Zhao, Wei" sort="Zhao, Wei" uniqKey="Zhao W" first="Wei" last="Zhao">Wei Zhao</name>
</noCountry>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Tabb, Loni Philip" sort="Tabb, Loni Philip" uniqKey="Tabb L" first="Loni Philip" last="Tabb">Loni Philip Tabb</name>
</region>
</country>
</tree>
</affiliations>
</record>
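
As a rough illustration of how a record like the one above might be read programmatically, the following sketch uses Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the function name `summarize_record` is hypothetical, the namespace URIs bound to the `wicri:` and `nlm:` prefixes are placeholders (the record itself does not declare them), and any literal `&` or `<` inside text content is assumed to be escaped as `&amp;` or `&lt;`.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (an assumption): the record uses the wicri: and
# nlm: prefixes without declaring them, so a strict parser needs some binding.
ASSUMED_NAMESPACES = 'xmlns:wicri="urn:example:wicri" xmlns:nlm="urn:example:nlm"'


def summarize_record(xml_text):
    """Return the title, author names, and DOI found in a <record> element."""
    # Bind the undeclared prefixes on the root element before parsing.
    patched = xml_text.replace("<record>", "<record " + ASSUMED_NAMESPACES + ">", 1)
    root = ET.fromstring(patched)
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }
```

Applied to the record shown here, this would be expected to yield the article title, the four author names, and the DOI listed under `publicationStmt`.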

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000E54 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd -nk 000E54 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:25075627
   |texte=   Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Checkpoint/RBID.i   -Sk "pubmed:25075627" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021