A genomic distance for assembly comparison based on compressed maximal exact matches.
Internal identifier: 000B11 (Ncbi/Merge)
Authors: S P Garcia [Portugal]; J M O S. Rodrigues; S. Santos; D. Pratas; V. Afreixo; C A C. Bastos; P J S G. Ferreira; A J Pinho
Source:
- IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM [ 1557-9964 ]
French descriptors
- KwdFr :
- MESH :
- génétique : Génome humain.
- Algorithmes, Analyse de séquence d'ADN, Génomique, Humains.
English descriptors
- KwdEn :
- MESH :
- genetics : Genome, Human.
- methods : Genomics, Sequence Analysis, DNA.
- Algorithms, Humans.
Abstract
Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance proposed here is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly on the final content of maximal exact matches and on the proposed genomic distance.
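The normalized compression distance mentioned in the abstract is commonly written as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size of its argument. The following is a minimal sketch of that formula only, not the authors' method: it assumes zlib as a stand-in compressor (the paper estimates compressibility with multiple finite-context models), and the function names and the synthetic random sequences are hypothetical, chosen purely for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of x and y
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic DNA-like sequences (assumption: illustrative data, not real assemblies).
rng = random.Random(0)
seq_a = bytes(rng.choices(b"ACGT", k=2000))
seq_b = bytes(rng.choices(b"ACGT", k=2000))

print("NCD(a, a) =", round(ncd(seq_a, seq_a), 3))  # identical inputs
print("NCD(a, b) =", round(ncd(seq_a, seq_b), 3))  # unrelated inputs
```

With a general-purpose compressor, identical inputs should give a value near 0 and unrelated inputs a value near 1. zlib's 32 KB window makes it unsuitable for genome-scale inputs, which is one reason a dedicated DNA model would be needed in practice.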
PubMed: 25594089
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 000245
- to stream PubMed, to step Curation: 000245
- to stream PubMed, to step Checkpoint: 000245
Links to Exploration step
pubmed:25594089
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A genomic distance for assembly comparison based on compressed maximal exact matches.</title>
<author><name sortKey="Garcia, S P" sort="Garcia, S P" uniqKey="Garcia S" first="S P" last="Garcia">S P Garcia</name>
<affiliation wicri:level="1"><nlm:affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</nlm:affiliation>
<country xml:lang="fr">Portugal</country>
<wicri:regionArea>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro</wicri:regionArea>
<wicri:noRegion>3810-193 Aveiro</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Rodrigues, J M O S" sort="Rodrigues, J M O S" uniqKey="Rodrigues J" first="J M O S" last="Rodrigues">J M O S. Rodrigues</name>
</author>
<author><name sortKey="Santos, S" sort="Santos, S" uniqKey="Santos S" first="S" last="Santos">S. Santos</name>
</author>
<author><name sortKey="Pratas, D" sort="Pratas, D" uniqKey="Pratas D" first="D" last="Pratas">D. Pratas</name>
</author>
<author><name sortKey="Afreixo, V" sort="Afreixo, V" uniqKey="Afreixo V" first="V" last="Afreixo">V. Afreixo</name>
</author>
<author><name sortKey="Bastos, C A C" sort="Bastos, C A C" uniqKey="Bastos C" first="C A C" last="Bastos">C A C. Bastos</name>
</author>
<author><name sortKey="Ferreira, P J S G" sort="Ferreira, P J S G" uniqKey="Ferreira P" first="P J S G" last="Ferreira">P J S G. Ferreira</name>
</author>
<author><name sortKey="Pinho, A J" sort="Pinho, A J" uniqKey="Pinho A" first="A J" last="Pinho">A J Pinho</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="????"><PubDate><MedlineDate>2013 May-Jun</MedlineDate>
</PubDate>
</date>
<idno type="RBID">pubmed:25594089</idno>
<idno type="pmid">25594089</idno>
<idno type="wicri:Area/PubMed/Corpus">000245</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000245</idno>
<idno type="wicri:Area/PubMed/Curation">000245</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000245</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000245</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000245</idno>
<idno type="wicri:Area/Ncbi/Merge">000B11</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">A genomic distance for assembly comparison based on compressed maximal exact matches.</title>
<author><name sortKey="Garcia, S P" sort="Garcia, S P" uniqKey="Garcia S" first="S P" last="Garcia">S P Garcia</name>
<affiliation wicri:level="1"><nlm:affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</nlm:affiliation>
<country xml:lang="fr">Portugal</country>
<wicri:regionArea>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro</wicri:regionArea>
<wicri:noRegion>3810-193 Aveiro</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Rodrigues, J M O S" sort="Rodrigues, J M O S" uniqKey="Rodrigues J" first="J M O S" last="Rodrigues">J M O S. Rodrigues</name>
</author>
<author><name sortKey="Santos, S" sort="Santos, S" uniqKey="Santos S" first="S" last="Santos">S. Santos</name>
</author>
<author><name sortKey="Pratas, D" sort="Pratas, D" uniqKey="Pratas D" first="D" last="Pratas">D. Pratas</name>
</author>
<author><name sortKey="Afreixo, V" sort="Afreixo, V" uniqKey="Afreixo V" first="V" last="Afreixo">V. Afreixo</name>
</author>
<author><name sortKey="Bastos, C A C" sort="Bastos, C A C" uniqKey="Bastos C" first="C A C" last="Bastos">C A C. Bastos</name>
</author>
<author><name sortKey="Ferreira, P J S G" sort="Ferreira, P J S G" uniqKey="Ferreira P" first="P J S G" last="Ferreira">P J S G. Ferreira</name>
</author>
<author><name sortKey="Pinho, A J" sort="Pinho, A J" uniqKey="Pinho A" first="A J" last="Pinho">A J Pinho</name>
</author>
</analytic>
<series><title level="j">IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM</title>
<idno type="eISSN">1557-9964</idno>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genome, Human (genetics)</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Sequence Analysis, DNA (methods)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Génome humain (génétique)</term>
<term>Génomique ()</term>
<term>Humains</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Genome, Human</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Génome humain</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Humans</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génomique</term>
<term>Humains</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance here proposed is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as, the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly in the final content of maximal exact matches and the genomic distance here proposed.</div>
</front>
</TEI>
<pubmed><MedlineCitation Owner="NLM" Status="MEDLINE"><PMID Version="1">25594089</PMID>
<DateCreated><Year>2015</Year>
<Month>01</Month>
<Day>15</Day>
</DateCreated>
<DateCompleted><Year>2015</Year>
<Month>07</Month>
<Day>27</Day>
</DateCompleted>
<Article PubModel="Print"><Journal><ISSN IssnType="Electronic">1557-9964</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>10</Volume>
<Issue>3</Issue>
<PubDate><MedlineDate>2013 May-Jun</MedlineDate>
</PubDate>
</JournalIssue>
<Title>IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM</Title>
<ISOAbbreviation>IEEE/ACM Trans Comput Biol Bioinform</ISOAbbreviation>
</Journal>
<ArticleTitle>A genomic distance for assembly comparison based on compressed maximal exact matches.</ArticleTitle>
<Pagination><MedlinePgn>793-8</MedlinePgn>
</Pagination>
<Abstract><AbstractText>Genome assemblies are typically compared with respect to their contiguity, coverage, and accuracy. We propose a genome-wide, alignment-free genomic distance based on compressed maximal exact matches and suggest adding it to the benchmark of commonly used assembly quality metrics. Maximal exact matches are perfect repeats, without gaps or misspellings, which cannot be further extended to either their left- or right-end side without loss of similarity. The genomic distance here proposed is based on the normalized compression distance, an information-theoretic measure of the relative compressibility of two sequences estimated using multiple finite-context models. This measure exposes similarities between the sequences, as well as, the nesting structure underlying the assembly of larger maximal exact matches from smaller ones. We use four human genome assemblies for illustration and discuss the impact of genome sequencing and assembly in the final content of maximal exact matches and the genomic distance here proposed.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Garcia</LastName>
<ForeName>S P</ForeName>
<Initials>SP</Initials>
<AffiliationInfo><Affiliation>Signal Processing Laboratory, Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal. spgarcia@ua.pt</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Rodrigues</LastName>
<ForeName>J M O S</ForeName>
<Initials>JM</Initials>
</Author>
<Author ValidYN="Y"><LastName>Santos</LastName>
<ForeName>S</ForeName>
<Initials>S</Initials>
</Author>
<Author ValidYN="Y"><LastName>Pratas</LastName>
<ForeName>D</ForeName>
<Initials>D</Initials>
</Author>
<Author ValidYN="Y"><LastName>Afreixo</LastName>
<ForeName>V</ForeName>
<Initials>V</Initials>
</Author>
<Author ValidYN="Y"><LastName>Bastos</LastName>
<ForeName>C A C</ForeName>
<Initials>CA</Initials>
</Author>
<Author ValidYN="Y"><LastName>Ferreira</LastName>
<ForeName>P J S G</ForeName>
<Initials>PJ</Initials>
</Author>
<Author ValidYN="Y"><LastName>Pinho</LastName>
<ForeName>A J</ForeName>
<Initials>AJ</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
<PublicationType UI="D013485">Research Support, Non-U.S. Gov't</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo><Country>United States</Country>
<MedlineTA>IEEE/ACM Trans Comput Biol Bioinform</MedlineTA>
<NlmUniqueID>101196755</NlmUniqueID>
<ISSNLinking>1545-5963</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<MeshHeadingList><MeshHeading><DescriptorName MajorTopicYN="N" UI="D000465">Algorithms</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="N" UI="D015894">Genome, Human</DescriptorName>
<QualifierName MajorTopicYN="N" UI="Q000235">genetics</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="N" UI="D023281">Genomics</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="N" UI="D006801">Humans</DescriptorName>
</MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="N" UI="D017422">Sequence Analysis, DNA</DescriptorName>
<QualifierName MajorTopicYN="Y" UI="Q000379">methods</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="entrez"><Year>2015</Year>
<Month>1</Month>
<Day>17</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2013</Year>
<Month>5</Month>
<Day>1</Day>
<Hour>0</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2015</Year>
<Month>7</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">25594089</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations><list><country><li>Portugal</li>
</country>
</list>
<tree><noCountry><name sortKey="Afreixo, V" sort="Afreixo, V" uniqKey="Afreixo V" first="V" last="Afreixo">V. Afreixo</name>
<name sortKey="Bastos, C A C" sort="Bastos, C A C" uniqKey="Bastos C" first="C A C" last="Bastos">C A C. Bastos</name>
<name sortKey="Ferreira, P J S G" sort="Ferreira, P J S G" uniqKey="Ferreira P" first="P J S G" last="Ferreira">P J S G. Ferreira</name>
<name sortKey="Pinho, A J" sort="Pinho, A J" uniqKey="Pinho A" first="A J" last="Pinho">A J Pinho</name>
<name sortKey="Pratas, D" sort="Pratas, D" uniqKey="Pratas D" first="D" last="Pratas">D. Pratas</name>
<name sortKey="Rodrigues, J M O S" sort="Rodrigues, J M O S" uniqKey="Rodrigues J" first="J M O S" last="Rodrigues">J M O S. Rodrigues</name>
<name sortKey="Santos, S" sort="Santos, S" uniqKey="Santos S" first="S" last="Santos">S. Santos</name>
</noCountry>
<country name="Portugal"><noRegion><name sortKey="Garcia, S P" sort="Garcia, S P" uniqKey="Garcia S" first="S P" last="Garcia">S P Garcia</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/TelematiV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B11 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 000B11 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= TelematiV1 |flux= Ncbi |étape= Merge |type= RBID |clé= pubmed:25594089 |texte= A genomic distance for assembly comparison based on compressed maximal exact matches. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i -Sk "pubmed:25594089" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd \
  | NlmPubMed2Wicri -a TelematiV1
This area was generated with Dilib version V0.6.31.