Exploration server on telematics

A genomic distance for assembly comparison based on compressed maximal exact matches.

Internal identifier: 000245 (PubMed/Corpus); previous: 000244; next: 000246

Authors: S P Garcia; J M O S. Rodrigues; S. Santos; D. Pratas; V. Afreixo; C A C. Bastos; P J S G. Ferreira; A J Pinho

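Source: IEEE/ACM transactions on computational biology and bioinformatics, vol. 10, no. 3, pp. 793-8 (2013 May-Jun)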

RBID : pubmed:25594089

Abstract

Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance proposed here is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences, estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly on the final content of maximal exact matches and on the proposed genomic distance.
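
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract estimates compressibility with multiple finite-context models, whereas this sketch swaps in the general-purpose zlib compressor, and the brute-force match finder is only practical for short strings. The function names (maximal_exact_matches, compressed_size, ncd) and the min_len parameter are hypothetical, and how the paper combines the two steps is not stated in the abstract.

```python
import zlib


def maximal_exact_matches(a: str, b: str, min_len: int = 3):
    """Return (i, j, length) triples with a[i:i+length] == b[j:j+length]
    that cannot be extended to the left or to the right.

    Brute-force illustration only; real tools use suffix-based indexes.
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Left-maximal: the preceding characters differ or do not exist.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of a sequence,
            # which makes the match right-maximal by construction.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    print(maximal_exact_matches("ACGTACGT", "TTACGTAA", min_len=4))
    print(ncd(b"ACGT" * 500, b"ACGT" * 500))
```

With a real compressor the distance is only approximate: values near 0 suggest that one sequence adds little information given the other, while values near 1 suggest the sequences share little structure that the compressor can exploit.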

PubMed: 25594089

Links to Exploration step

pubmed:25594089

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A genomic distance for assembly comparison based on compressed maximal exact matches.</title>
<author>
<name sortKey="Garcia, S P" sort="Garcia, S P" uniqKey="Garcia S" first="S P" last="Garcia">S P Garcia</name>
<affiliation>
<nlm:affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rodrigues, J M O S" sort="Rodrigues, J M O S" uniqKey="Rodrigues J" first="J M O S" last="Rodrigues">J M O S. Rodrigues</name>
</author>
<author>
<name sortKey="Santos, S" sort="Santos, S" uniqKey="Santos S" first="S" last="Santos">S. Santos</name>
</author>
<author>
<name sortKey="Pratas, D" sort="Pratas, D" uniqKey="Pratas D" first="D" last="Pratas">D. Pratas</name>
</author>
<author>
<name sortKey="Afreixo, V" sort="Afreixo, V" uniqKey="Afreixo V" first="V" last="Afreixo">V. Afreixo</name>
</author>
<author>
<name sortKey="Bastos, C A C" sort="Bastos, C A C" uniqKey="Bastos C" first="C A C" last="Bastos">C A C. Bastos</name>
</author>
<author>
<name sortKey="Ferreira, P J S G" sort="Ferreira, P J S G" uniqKey="Ferreira P" first="P J S G" last="Ferreira">P J S G. Ferreira</name>
</author>
<author>
<name sortKey="Pinho, A J" sort="Pinho, A J" uniqKey="Pinho A" first="A J" last="Pinho">A J Pinho</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="????">
<PubDate>
<MedlineDate>2013 May-Jun</MedlineDate>
</PubDate>
</date>
<idno type="RBID">pubmed:25594089</idno>
<idno type="pmid">25594089</idno>
<idno type="wicri:Area/PubMed/Corpus">000245</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000245</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">A genomic distance for assembly comparison based on compressed maximal exact matches.</title>
<author>
<name sortKey="Garcia, S P" sort="Garcia, S P" uniqKey="Garcia S" first="S P" last="Garcia">S P Garcia</name>
<affiliation>
<nlm:affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</nlm:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Rodrigues, J M O S" sort="Rodrigues, J M O S" uniqKey="Rodrigues J" first="J M O S" last="Rodrigues">J M O S. Rodrigues</name>
</author>
<author>
<name sortKey="Santos, S" sort="Santos, S" uniqKey="Santos S" first="S" last="Santos">S. Santos</name>
</author>
<author>
<name sortKey="Pratas, D" sort="Pratas, D" uniqKey="Pratas D" first="D" last="Pratas">D. Pratas</name>
</author>
<author>
<name sortKey="Afreixo, V" sort="Afreixo, V" uniqKey="Afreixo V" first="V" last="Afreixo">V. Afreixo</name>
</author>
<author>
<name sortKey="Bastos, C A C" sort="Bastos, C A C" uniqKey="Bastos C" first="C A C" last="Bastos">C A C. Bastos</name>
</author>
<author>
<name sortKey="Ferreira, P J S G" sort="Ferreira, P J S G" uniqKey="Ferreira P" first="P J S G" last="Ferreira">P J S G. Ferreira</name>
</author>
<author>
<name sortKey="Pinho, A J" sort="Pinho, A J" uniqKey="Pinho A" first="A J" last="Pinho">A J Pinho</name>
</author>
</analytic>
<series>
<title level="j">IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM</title>
<idno type="eISSN">1557-9964</idno>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genome, Human (genetics)</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Genome, Human</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Humans</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance here proposed is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as, the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly in the final content of maximal exact matches and the genomic distance here proposed.</div>
</front>
</TEI>
<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">25594089</PMID>
<DateCreated>
<Year>2015</Year>
<Month>01</Month>
<Day>15</Day>
</DateCreated>
<DateCompleted>
<Year>2015</Year>
<Month>07</Month>
<Day>27</Day>
</DateCompleted>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>10</Volume>
<Issue>3</Issue>
<PubDate>
<MedlineDate>2013 May-Jun</MedlineDate>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>A genomic distance for assembly comparison based on compressed maximal exact matches.</ArticleTitle>
<Pagination>
<MedlinePgn>793-8</MedlinePgn>
</Pagination>
<Abstract>
<AbstractText>Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance here proposed is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as, the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly in the final content of maximal exact matches and the genomic distance here proposed.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Garcia</LastName>
<ForeName>S P</ForeName>
<Initials>SP</Initials>
<AffiliationInfo>
<Affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Rodrigues</LastName>
<ForeName>J M O S</ForeName>
<Initials>JM</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Santos</LastName>
<ForeName>S</ForeName>
<Initials>S</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Pratas</LastName>
<ForeName>D</ForeName>
<Initials>D</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Afreixo</LastName>
<ForeName>V</ForeName>
<Initials>V</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Bastos</LastName>
<ForeName>C A C</ForeName>
<Initials>CA</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Ferreira</LastName>
<ForeName>P J S G</ForeName>
<Initials>PJ</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Pinho</LastName>
<ForeName>A J</ForeName>
<Initials>AJ</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D000465">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D015894">Genome, Human</DescriptorName>
<QualifierName MajorTopicYN="N" UI="Q000235">genetics</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D023281">Genomics</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D006801">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N" UI="D017422">Sequence Analysis, DNA</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="entrez">
<Year>2015</Year>
<Month>1</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2013</Year>
<Month>5</Month>
<Day>1</Day>
<Hour>0</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2015</Year>
<Month>7</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">25594089</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/PubMed/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000245 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd -nk 000245 | SxmlIndent | more
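
Outside a Dilib installation, the <pubmed> part of the record can also be read with a general-purpose XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of that part; the RECORD constant and the summarize function are hypothetical names introduced for illustration. Note that the <TEI> part of the record, as shown on this page, uses the nlm: and wicri: prefixes without namespace declarations, so a namespace-aware parser is likely to reject the full record unless those declarations are supplied.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <pubmed> part of the record (two of the eight authors).
RECORD = """<pubmed>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">25594089</PMID>
<Article PubModel="Print">
<ArticleTitle>A genomic distance for assembly comparison based on compressed maximal exact matches.</ArticleTitle>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Garcia</LastName><ForeName>S P</ForeName></Author>
<Author ValidYN="Y"><LastName>Pinho</LastName><ForeName>A J</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(xml_text: str) -> dict:
    """Extract the PMID, title, and author names from a <pubmed> element."""
    root = ET.fromstring(xml_text)
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        "title": root.findtext("MedlineCitation/Article/ArticleTitle"),
        "authors": authors,
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```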

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    TelematiV1
   |flux=    PubMed
   |étape=   Corpus
   |type=    RBID
   |clé=     pubmed:25594089
   |texte=   A genomic distance for assembly comparison based on compressed maximal exact matches.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Corpus/RBID.i   -Sk "pubmed:25594089" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a TelematiV1 

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Thu Nov 2 16:09:04 2017. Site generation: Sun Mar 10 16:42:28 2024