MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.

Internal identifier: 001258 (PubMed/Curation); previous: 001257; next: 001259

Authors: Ramin Karimi [Hungary]; Andras Hajdu [Hungary]

Source: Evolutionary bioinformatics online [ISSN 1176-9343]; 2016; 12: 73-85

RBID : pubmed:26884678

Abstract

Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.
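
The k-mer frequency and signature idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the HTSFinder or GkmerG code, and it does not use Hadoop or MapReduce: the function names (`kmers`, `kmer_counts`, `signatures`) are hypothetical, and the definitions used here are assumptions, since the abstract does not spell them out. A "unique" signature is taken to be a k-mer found in exactly one target genome and in no nontarget genome; a "common" signature is taken to be a k-mer found in every target genome and in no nontarget genome.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_counts(genomes, k):
    """Total frequency of each k-mer over a list of sequences.

    In a MapReduce setting this would be a WordCount-style job
    (emit (k-mer, 1) pairs, then sum per key); here it is done in memory.
    """
    counts = Counter()
    for seq in genomes:
        counts.update(kmers(seq, k))
    return counts


def signatures(target, nontarget, k):
    """Return (unique, common) k-mer sets under the assumed definitions."""
    nontarget_kmers = set(kmer_counts(nontarget, k))
    per_genome = [set(kmers(seq, k)) for seq in target]

    # Number of target genomes in which each k-mer occurs at least once.
    genome_hits = Counter()
    for genome_kmers in per_genome:
        genome_hits.update(genome_kmers)

    # Discard anything that also appears in the nontarget database.
    candidates = set(genome_hits) - nontarget_kmers
    unique = {m for m in candidates if genome_hits[m] == 1}
    common = {m for m in candidates if genome_hits[m] == len(per_genome)}
    return unique, common


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    unique, common = signatures(["ACGTAC", "ACGTTT"], ["TTTGGG"], k=3)
    print("unique:", sorted(unique))
    print("common:", sorted(common))
```

The in-memory sets keep the sketch short; for databases the size of the bacterial genomes mentioned in the abstract, the counting step is exactly what would need to be distributed.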

DOI: 10.4137/EBO.S35545
PubMed: 26884678

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.</title>
<author>
<name sortKey="Karimi, Ramin" sort="Karimi, Ramin" uniqKey="Karimi R" first="Ramin" last="Karimi">Ramin Karimi</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.</nlm:affiliation>
<country xml:lang="fr">Hongrie</country>
<wicri:regionArea>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hajdu, Andras" sort="Hajdu, Andras" uniqKey="Hajdu A" first="Andras" last="Hajdu">Andras Hajdu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.; Bioinformatics Research Group, University of Debrecen, Debrecen, Hungary.</nlm:affiliation>
<country xml:lang="fr">Hongrie</country>
<wicri:regionArea>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.; Bioinformatics Research Group, University of Debrecen, Debrecen</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:26884678</idno>
<idno type="pmid">26884678</idno>
<idno type="doi">10.4137/EBO.S35545</idno>
<idno type="wicri:Area/PubMed/Corpus">001258</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001258</idno>
<idno type="wicri:Area/PubMed/Curation">001258</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001258</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.</title>
<author>
<name sortKey="Karimi, Ramin" sort="Karimi, Ramin" uniqKey="Karimi R" first="Ramin" last="Karimi">Ramin Karimi</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.</nlm:affiliation>
<country xml:lang="fr">Hongrie</country>
<wicri:regionArea>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Hajdu, Andras" sort="Hajdu, Andras" uniqKey="Hajdu A" first="Andras" last="Hajdu">Andras Hajdu</name>
<affiliation wicri:level="1">
<nlm:affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.; Bioinformatics Research Group, University of Debrecen, Debrecen, Hungary.</nlm:affiliation>
<country xml:lang="fr">Hongrie</country>
<wicri:regionArea>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.; Bioinformatics Research Group, University of Debrecen, Debrecen</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Evolutionary bioinformatics online</title>
<idno type="ISSN">1176-9343</idno>
<imprint>
<date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. </div>
</front>
</TEI>
<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">26884678</PMID>
<DateCompleted>
<Year>2016</Year>
<Month>02</Month>
<Day>17</Day>
</DateCompleted>
<DateRevised>
<Year>2019</Year>
<Month>08</Month>
<Day>30</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Print">1176-9343</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>12</Volume>
<PubDate>
<Year>2016</Year>
</PubDate>
</JournalIssue>
<Title>Evolutionary bioinformatics online</Title>
<ISOAbbreviation>Evol. Bioinform. Online</ISOAbbreviation>
</Journal>
<ArticleTitle>HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.</ArticleTitle>
<Pagination>
<MedlinePgn>73-85</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.4137/EBO.S35545</ELocationID>
<Abstract>
<AbstractText>Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. Like an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline HTSFinder (high-throughput signature finder) with a corresponding k-mer generator GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into the ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genome. A considerable number of detected unique and common DNA signatures of the target database bring the opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. </AbstractText>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Karimi</LastName>
<ForeName>Ramin</ForeName>
<Initials>R</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Hajdu</LastName>
<ForeName>Andras</ForeName>
<Initials>A</Initials>
<AffiliationInfo>
<Affiliation>Faculty of Informatics, Department of Computer Graphics and Image Processing, University of Debrecen, Debrecen, Hungary.; Bioinformatics Research Group, University of Debrecen, Debrecen, Hungary.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>02</Month>
<Day>10</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Evol Bioinform Online</MedlineTA>
<NlmUniqueID>101256319</NlmUniqueID>
<ISSNLinking>1176-9343</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">DNA signature</Keyword>
<Keyword MajorTopicYN="N">Hadoop</Keyword>
<Keyword MajorTopicYN="N">Hive</Keyword>
<Keyword MajorTopicYN="N">MapReduce</Keyword>
<Keyword MajorTopicYN="N">WordCount</Keyword>
<Keyword MajorTopicYN="N">k-mers</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2015</Year>
<Month>09</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised">
<Year>2015</Year>
<Month>11</Month>
<Day>05</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2015</Year>
<Month>12</Month>
<Day>05</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>2</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>2</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>2</Month>
<Day>18</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">26884678</ArticleId>
<ArticleId IdType="doi">10.4137/EBO.S35545</ArticleId>
<ArticleId IdType="pii">ebo-12-2016-073</ArticleId>
<ArticleId IdType="pmc">PMC4750899</ArticleId>
</ArticleIdList>
<ReferenceList>
<Reference>
<Citation>Bioinformatics. 2001 Nov;17(11):1067-76</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">11724738</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2008 Apr 10;9:185</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18402679</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Clin Microbiol. 2003 May;41(5):2068-79</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12734250</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2004 Feb 25;32(4):1363-71</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14985472</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2004 Sep 1;20(13):2101-12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15059835</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS One. 2013;8(2):e57923</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">23460914</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2014 Oct 05;15:339</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">25282047</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Genomics. 2008 Oct 21;9:496</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">18940003</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2002 Oct;18(10):1340-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12376378</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Bacteriol. 1994 Dec;176(24):7694-702</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">8002595</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Jun 1;27(11):1546-54</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21471017</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2011 Mar 15;27(6):764-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">21217122</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2005 Apr 15;21(8):1365-70</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15572465</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Diagn Microbiol Infect Dis. 2006 May;55(1):37-45</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16546342</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W611-5</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15980547</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2002 Aug 1;30(15):3481-9</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12140334</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Bioinformatics. 2007 Jan 1;23(1):5-13</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">17068088</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Genome Biol. 2004;5(2):R12</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">14759262</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2009 Jul;37(Web Server issue):W229-34</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">19417071</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>PLoS Comput Biol. 2010 Feb 26;6(2):e1000667</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20195499</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Nucleic Acids Res. 2003 Jun 15;31(12):3057-62</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">12799432</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>J Microbiol Methods. 2006 Jun;65(3):390-403</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">16216356</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Syst Appl Microbiol. 2004 Mar;27(2):175-85</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">15046306</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2010 Jun 23;11:340</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20573238</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>BMC Bioinformatics. 2010 Mar 16;11:132</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">20230647</ArticleId>
</ArticleIdList>
</Reference>
<Reference>
<Citation>Science. 1991 Jun 21;252(5013):1651-6</Citation>
<ArticleIdList>
<ArticleId IdType="pubmed">2047873</ArticleId>
</ArticleIdList>
</Reference>
</ReferenceList>
</PubmedData>
</pubmed>
</record>
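
As a hedged illustration of how the record above might be read programmatically, the sketch below extracts a few fields from the `<pubmed>` part with Python's standard `xml.etree.ElementTree`. It deliberately handles only the `<pubmed>` subtree: as displayed here, the TEI part uses the `wicri:` and `nlm:` prefixes without declaring them, so a namespace-aware parser would likely reject the full record unless those declarations are supplied. The `summarize` function and the trimmed `SAMPLE` string are illustrative assumptions, not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <pubmed> element from the record above.
SAMPLE = """<pubmed>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">26884678</PMID>
<Article PubModel="Electronic-eCollection">
<ArticleTitle>HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.4137/EBO.S35545</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Karimi</LastName><ForeName>Ramin</ForeName></Author>
<Author ValidYN="Y"><LastName>Hajdu</LastName><ForeName>Andras</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>"""


def summarize(pubmed_xml):
    """Return PMID, DOI, title and author names from a <pubmed> element."""
    root = ET.fromstring(pubmed_xml)
    authors = [
        f"{a.findtext('ForeName')} {a.findtext('LastName')}"
        for a in root.iter("Author")
    ]
    return {
        "pmid": root.findtext("MedlineCitation/PMID"),
        # The attribute predicate selects the DOI among possible ELocationIDs.
        "doi": root.findtext(".//ELocationID[@EIdType='doi']"),
        "title": root.findtext(".//ArticleTitle"),
        "authors": authors,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```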

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/PubMed/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001258 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd -nk 001258 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    PubMed
   |étape=   Curation
   |type=    RBID
   |clé=     pubmed:26884678
   |texte=   HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Curation/RBID.i   -Sk "pubmed:26884678" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021