MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.

Internal identifier: 001894 (PubMed/Corpus); previous: 001893; next: 001895

Authors: Loni Philip Tabb; Wei Zhao; Jingyu Huang; Gail L. Rosen

Source:

RBID : pubmed:25075627

English descriptors

Abstract

Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., <7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.

DOI: 10.1089/cmb.2014.0108
PubMed: 25075627
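
The abstract above turns on two quantities: how many n-mers never occur in a genome (nullomers) and whether the n-mer counts are more variable than a Poisson model allows (overdispersion). The following is a minimal sketch of how both could be measured for one sequence. It is not the authors' method: the function names `nmer_counts` and `summarize` are hypothetical, the alphabet is assumed to be A, C, G, T (windows containing other symbols are ignored), and the variance-to-mean ratio is used here only as a simple diagnostic, whereas the study compared fitted Poisson, negative binomial, and zero-inflated models.

```python
import math
from collections import Counter
from itertools import product


def nmer_counts(sequence, n):
    """Return a count for every possible n-mer over ACGT, zeros included."""
    sequence = sequence.upper()
    # Count overlapping windows of length n along the sequence.
    observed = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Enumerate all 4**n words so that nullomers appear with a count of 0.
    words = ("".join(letters) for letters in product("ACGT", repeat=n))
    return {word: observed.get(word, 0) for word in words}


def summarize(counts):
    """Summarize nullomers and dispersion for a dict of n-mer counts."""
    values = list(counts.values())
    total_words = len(values)
    mean = sum(values) / total_words
    # Sample variance of the counts across all possible words.
    variance = sum((v - mean) ** 2 for v in values) / (total_words - 1)
    return {
        "nullomers": sum(1 for v in values if v == 0),
        # A Poisson model with this mean predicts exp(-mean) zeros per word.
        "expected_zeros_poisson": total_words * math.exp(-mean),
        "mean": mean,
        "variance": variance,
        # Poisson implies a ratio near 1; larger values suggest overdispersion.
        "dispersion_index": variance / mean if mean > 0 else float("nan"),
    }


if __name__ == "__main__":
    toy_counts = nmer_counts("ACGTACGTAC", 2)  # toy sequence, not real data
    print(summarize(toy_counts))
```

When the observed number of nullomers clearly exceeds `expected_zeros_poisson`, or the dispersion index is well above 1, a Poisson model is a poor description of the counts. That is the situation in which the abstract recommends considering the negative binomial or a zero-inflated alternative.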

Links to Exploration step

pubmed:25075627

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</title>
<author>
<name sortKey="Tabb, Loni Philip" sort="Tabb, Loni Philip" uniqKey="Tabb L" first="Loni Philip" last="Tabb">Loni Philip Tabb</name>
<affiliation>
<nlm:affiliation>1 Department of Epidemiology &amp; Biostatistics, Drexel University, Philadelphia, Pennsylvania.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Wei" sort="Zhao, Wei" uniqKey="Zhao W" first="Wei" last="Zhao">Wei Zhao</name>
</author>
<author>
<name sortKey="Huang, Jingyu" sort="Huang, Jingyu" uniqKey="Huang J" first="Jingyu" last="Huang">Jingyu Huang</name>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25075627</idno>
<idno type="pmid">25075627</idno>
<idno type="doi">10.1089/cmb.2014.0108</idno>
<idno type="wicri:Area/PubMed/Corpus">001894</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001894</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</title>
<author>
<name sortKey="Tabb, Loni Philip" sort="Tabb, Loni Philip" uniqKey="Tabb L" first="Loni Philip" last="Tabb">Loni Philip Tabb</name>
<affiliation>
<nlm:affiliation>1 Department of Epidemiology &amp; Biostatistics, Drexel University, Philadelphia, Pennsylvania.</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Wei" sort="Zhao, Wei" uniqKey="Zhao W" first="Wei" last="Zhao">Wei Zhao</name>
</author>
<author>
<name sortKey="Huang, Jingyu" sort="Huang, Jingyu" uniqKey="Huang J" first="Jingyu" last="Huang">Jingyu Huang</name>
</author>
<author>
<name sortKey="Rosen, Gail L" sort="Rosen, Gail L" uniqKey="Rosen G" first="Gail L" last="Rosen">Gail L. Rosen</name>
</author>
</analytic>
<series>
<title level="j">Journal of computational biology : a journal of computational molecular cell biology</title>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Binomial Distribution</term>
<term>Genome</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Poisson Distribution</term>
<term>Prokaryotic Cells</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Binomial Distribution</term>
<term>Genome</term>
<term>Models, Genetic</term>
<term>Models, Statistical</term>
<term>Poisson Distribution</term>
<term>Prokaryotic Cells</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., &lt;7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">25075627</PMID>
<DateCompleted>
<Year>2015</Year>
<Month>06</Month>
<Day>23</Day>
</DateCompleted>
<DateRevised>
<Year>2014</Year>
<Month>09</Month>
<Day>24</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1557-8666</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>21</Volume>
<Issue>10</Issue>
<PubDate>
<Year>2014</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Journal of computational biology : a journal of computational molecular cell biology</Title>
<ISOAbbreviation>J. Comput. Biol.</ISOAbbreviation>
</Journal>
<ArticleTitle>Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</ArticleTitle>
<Pagination>
<MedlinePgn>732-40</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1089/cmb.2014.0108</ELocationID>
<Abstract>
<AbstractText>Characterizing the empirical distribution of the frequency of n-mers is a vital step in understanding the entire genome. This will allow for researchers to examine how complex the genome really is, and move beyond simple, traditional modeling frameworks that are often biased in the presence of abundant and/or extremely rare words. We hypothesize that models based on the negative binomial distribution and its zero-inflated counterpart will characterize the n-mer distributions of genomes better than the Poisson. Our study examined the empirical distribution of the frequency of n-mers (6 ≤ n ≤ 11) in 2,199 genomes. We considered four distributions: Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial (ZINB). The number of genomes that have nullomers in 6-, 7-, and 8-mers was 150, 602 and 2,012, respectively, whereas all of the genomes for the 9-, 10-, and 11-mers had nullomers. In each n-mer considered, the negative binomial model performed the best for at least 93% of the 2,199 genomes; however, a small percentage (i.e., &lt;7%) of the genomes did prefer the ZINB. The negative binomial and zero-inflation distributions extend the traditional Poisson setting and are more flexible in handling overdispersion that can be caused by an increase in nullomers. In an effort to characterize the distribution of the frequency of n-mers, researchers should also consider other discrete distributions that are more flexible and adjust for possible overdispersion.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Tabb</LastName>
<ForeName>Loni Philip</ForeName>
<Initials>LP</Initials>
<AffiliationInfo>
<Affiliation>1 Department of Epidemiology &amp; Biostatistics, Drexel University, Philadelphia, Pennsylvania.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Zhao</LastName>
<ForeName>Wei</ForeName>
<Initials>W</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Huang</LastName>
<ForeName>Jingyu</ForeName>
<Initials>J</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Rosen</LastName>
<ForeName>Gail L</ForeName>
<Initials>GL</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013486">Research Support, U.S. Gov't, Non-P.H.S.</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2014</Year>
<Month>07</Month>
<Day>30</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Comput Biol</MedlineTA>
<NlmUniqueID>9433358</NlmUniqueID>
<ISSNLinking>1066-5277</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D016010" MajorTopicYN="Y">Binomial Distribution</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016678" MajorTopicYN="Y">Genome</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008957" MajorTopicYN="N">Models, Genetic</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D015233" MajorTopicYN="N">Models, Statistical</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D016012" MajorTopicYN="N">Poisson Distribution</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D011387" MajorTopicYN="Y">Prokaryotic Cells</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">DNA</Keyword>
<Keyword MajorTopicYN="N">genome analysis</Keyword>
<Keyword MajorTopicYN="N">probability</Keyword>
<Keyword MajorTopicYN="N">statistics</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2014</Year>
<Month>7</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2014</Year>
<Month>7</Month>
<Day>31</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2015</Year>
<Month>6</Month>
<Day>24</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">25075627</ArticleId>
<ArticleId IdType="doi">10.1089/cmb.2014.0108</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>
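
For readers who want to consume such a record programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It rests on stated assumptions: it parses only a shortened copy of the `<pubmed>` part of the record above, embedded as a string. The `<TEI>` part as displayed uses `nlm:` and `wicri:` prefixes with no visible namespace declarations, which a strict parser may reject. The names `FRAGMENT` and `summarize_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pubmed> element from the record shown above.
FRAGMENT = """\
<pubmed>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">25075627</PMID>
<Article PubModel="Print-Electronic">
<ArticleTitle>Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Tabb</LastName><ForeName>Loni Philip</ForeName></Author>
<Author ValidYN="Y"><LastName>Zhao</LastName><ForeName>Wei</ForeName></Author>
<Author ValidYN="Y"><LastName>Huang</LastName><ForeName>Jingyu</ForeName></Author>
<Author ValidYN="Y"><LastName>Rosen</LastName><ForeName>Gail L</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize_record(xml_text):
    """Extract the PMID, title, and author names from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    citation = root.find("MedlineCitation")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in citation.iterfind("Article/AuthorList/Author")
    ]
    return {
        "pmid": citation.findtext("PMID"),
        "title": citation.findtext("Article/ArticleTitle"),
        "authors": authors,
    }


if __name__ == "__main__":
    print(summarize_record(FRAGMENT))
```

The same lookups should apply to the full `<pubmed>` element, since the sketch relies only on element names that appear in the record.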

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001894 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 001894 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:25075627
   |texte=   Characterizing the empirical distribution of prokaryotic genome n-mers in the presence of nullomers.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:25075627" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021