Serveur d'exploration sur la visibilité du Havre

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

From Indexing Data Structures to de Bruijn Graphs

Identifieur interne : 000279 ( France/Analysis ); précédent : 000278; suivant : 000280

From Indexing Data Structures to de Bruijn Graphs

Auteurs : Bastien Cazaux [France] ; Thierry Lecroq [France] ; Eric Rivals [France]

Source :

RBID : Hal:lirmm-00950983

English descriptors

Abstract

New technologies have tremendously increased sequencing throughput compared to traditional techniques, thereby complicating DNA assembly. Hence, assembly programs resort to de Bruijn graphs (dBG) of $k$-mers of short reads to compute a set of long contigs, each being a putative segment of the sequenced molecule. Other types of DNA sequence analysis, as well as preprocessing of the reads for assembly, use classical data structures to index all substrings of the reads. It is thus interesting to exhibit algorithms that directly build a de Bruijn graph of order $k$ from a pre-existing index, and especially a contracted version of the de Bruijn graph, where non branching paths are condensed into single nodes. Here, we formalise the relationship between suffix trees/arrays and dBGs, and exhibit linear time algorithms for constructing the full or contracted de Bruijn graphs. Finally, we provide hints explaining why this bridge between indexes and dBGs enables to dynamically update the order $k$ of the graph.

Url:


Affiliations:


Links toward previous steps (curation, corpus...)


Links to Exploration step

Hal:lirmm-00950983

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">From Indexing Data Structures to de Bruijn Graphs</title>
<author>
<name sortKey="Cazaux, Bastien" sort="Cazaux, Bastien" uniqKey="Cazaux B" first="Bastien" last="Cazaux">Bastien Cazaux</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-388224" status="VALID">
<orgName>Méthodes et Algorithmes pour la Bioinformatique</orgName>
<orgName type="acronym">MAB</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.lirmm.fr/recherche/equipes/mab</ref>
</desc>
<listRelation>
<relation active="#struct-181" type="direct"></relation>
<relation name="UMR5506" active="#struct-441569" type="indirect"></relation>
<relation name="UMR5506" active="#struct-410122" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-181" type="direct">
<org type="laboratory" xml:id="struct-181" status="VALID">
<idno type="RNSR">199111950H</idno>
<orgName>Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier</orgName>
<orgName type="acronym">LIRMM</orgName>
<date type="start">1995</date>
<desc>
<address>
<addrLine>CC 477, 161 rue Ada, 34095 Montpellier Cedex 5</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.lirmm.fr</ref>
</desc>
<listRelation>
<relation name="UMR5506" active="#struct-441569" type="direct"></relation>
<relation name="UMR5506" active="#struct-410122" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR5506" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR5506" active="#struct-410122" type="indirect">
<org type="institution" xml:id="struct-410122" status="VALID">
<orgName>Université de Montpellier</orgName>
<orgName type="acronym">UM</orgName>
<desc>
<address>
<addrLine>163 rue Auguste Broussonnet - 34090 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.umontpellier.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Lecroq, Thierry" sort="Lecroq, Thierry" uniqKey="Lecroq T" first="Thierry" last="Lecroq">Thierry Lecroq</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author>
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-213159" status="VALID">
<orgName>Institut de Biologie Computationnelle</orgName>
<orgName type="acronym">IBC</orgName>
<desc>
<address>
<addrLine>95 rue de la Galéra, 34095 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ibc-montpellier.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-92114" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-11574" type="direct"></relation>
<relation active="#struct-441569" type="direct"></relation>
<relation active="#struct-410122" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-92114" type="direct">
<org type="institution" xml:id="struct-92114" status="VALID">
<orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="direct">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-11574" type="direct">
<org type="institution" xml:id="struct-11574" status="VALID">
<orgName>Centre de Coopération Internationale en Recherche Agronomique pour le Développement</orgName>
<orgName type="acronym">CIRAD</orgName>
<desc>
<address>
<addrLine>42, rue Scheffer 75116 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.cirad.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-410122" type="direct">
<org type="institution" xml:id="struct-410122" status="VALID">
<orgName>Université de Montpellier</orgName>
<orgName type="acronym">UM</orgName>
<desc>
<address>
<addrLine>163 rue Auguste Broussonnet - 34090 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.umontpellier.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:lirmm-00950983</idno>
<idno type="halId">lirmm-00950983</idno>
<idno type="halUri">https://hal-lirmm.ccsd.cnrs.fr/lirmm-00950983</idno>
<idno type="url">https://hal-lirmm.ccsd.cnrs.fr/lirmm-00950983</idno>
<date when="2014-02-20">2014-02-20</date>
<idno type="wicri:Area/Hal/Corpus">000135</idno>
<idno type="wicri:Area/Hal/Curation">000135</idno>
<idno type="wicri:Area/Hal/Checkpoint">000167</idno>
<idno type="wicri:Area/Main/Merge">000292</idno>
<idno type="wicri:Area/Main/Curation">000291</idno>
<idno type="wicri:Area/Main/Exploration">000291</idno>
<idno type="wicri:Area/France/Extraction">000279</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">From Indexing Data Structures to de Bruijn Graphs</title>
<author>
<name sortKey="Cazaux, Bastien" sort="Cazaux, Bastien" uniqKey="Cazaux B" first="Bastien" last="Cazaux">Bastien Cazaux</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-388224" status="VALID">
<orgName>Méthodes et Algorithmes pour la Bioinformatique</orgName>
<orgName type="acronym">MAB</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.lirmm.fr/recherche/equipes/mab</ref>
</desc>
<listRelation>
<relation active="#struct-181" type="direct"></relation>
<relation name="UMR5506" active="#struct-441569" type="indirect"></relation>
<relation name="UMR5506" active="#struct-410122" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-181" type="direct">
<org type="laboratory" xml:id="struct-181" status="VALID">
<idno type="RNSR">199111950H</idno>
<orgName>Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier</orgName>
<orgName type="acronym">LIRMM</orgName>
<date type="start">1995</date>
<desc>
<address>
<addrLine>CC 477, 161 rue Ada, 34095 Montpellier Cedex 5</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.lirmm.fr</ref>
</desc>
<listRelation>
<relation name="UMR5506" active="#struct-441569" type="direct"></relation>
<relation name="UMR5506" active="#struct-410122" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR5506" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR5506" active="#struct-410122" type="indirect">
<org type="institution" xml:id="struct-410122" status="VALID">
<orgName>Université de Montpellier</orgName>
<orgName type="acronym">UM</orgName>
<desc>
<address>
<addrLine>163 rue Auguste Broussonnet - 34090 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.umontpellier.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Lecroq, Thierry" sort="Lecroq, Thierry" uniqKey="Lecroq T" first="Thierry" last="Lecroq">Thierry Lecroq</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author>
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-213159" status="VALID">
<orgName>Institut de Biologie Computationnelle</orgName>
<orgName type="acronym">IBC</orgName>
<desc>
<address>
<addrLine>95 rue de la Galéra, 34095 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ibc-montpellier.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-92114" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-11574" type="direct"></relation>
<relation active="#struct-441569" type="direct"></relation>
<relation active="#struct-410122" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-92114" type="direct">
<org type="institution" xml:id="struct-92114" status="VALID">
<orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="direct">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-11574" type="direct">
<org type="institution" xml:id="struct-11574" status="VALID">
<orgName>Centre de Coopération Internationale en Recherche Agronomique pour le Développement</orgName>
<orgName type="acronym">CIRAD</orgName>
<desc>
<address>
<addrLine>42, rue Scheffer 75116 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.cirad.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-410122" type="direct">
<org type="institution" xml:id="struct-410122" status="VALID">
<orgName>Université de Montpellier</orgName>
<orgName type="acronym">UM</orgName>
<desc>
<address>
<addrLine>163 rue Auguste Broussonnet - 34090 Montpellier</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.umontpellier.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>assembly</term>
<term>construction algorithms</term>
<term>contracted de Bruijn graph</term>
<term>dynamic update</term>
<term>index data structures</term>
<term>k-mer</term>
<term>overlap</term>
<term>suffix array</term>
<term>suffix tree</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">New technologies have tremendously increased sequencing throughput compared to traditional techniques, thereby complicating DNA assembly. Hence, assembly programs resort to de Bruijn graphs (dBG) of $k$-mers of short reads to compute a set of long contigs, each being a putative segment of the sequenced molecule. Other types of DNA sequence analysis, as well as preprocessing of the reads for assembly, use classical data structures to index all substrings of the reads. It is thus interesting to exhibit algorithms that directly build a de Bruijn graph of order $k$ from a pre-existing index, and especially a contracted version of the de Bruijn graph, where non branching paths are condensed into single nodes. Here, we formalise the relationship between suffix trees/arrays and dBGs, and exhibit linear time algorithms for constructing the full or contracted de Bruijn graphs. Finally, we provide hints explaining why this bridge between indexes and dBGs enables to dynamically update the order $k$ of the graph.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Haute-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement>
<li>Le Havre</li>
<li>Rouen</li>
</settlement>
<orgName>
<li>Université de Rouen</li>
<li>Université du Havre</li>
</orgName>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Cazaux, Bastien" sort="Cazaux, Bastien" uniqKey="Cazaux B" first="Bastien" last="Cazaux">Bastien Cazaux</name>
</noRegion>
<name sortKey="Lecroq, Thierry" sort="Lecroq, Thierry" uniqKey="Lecroq T" first="Thierry" last="Lecroq">Thierry Lecroq</name>
<name sortKey="Rivals, Eric" sort="Rivals, Eric" uniqKey="Rivals E" first="Eric" last="Rivals">Eric Rivals</name>
</country>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/France/explor/LeHavreV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000279 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000279 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Wicri/France
   |area=    LeHavreV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     Hal:lirmm-00950983
   |texte=   From Indexing Data Structures to de Bruijn Graphs
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 14:37:02 2016. Site generation: Tue Mar 5 08:25:07 2024