Exploration server for the orange tree

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

Internal identifier: 000559 (Pmc/Checkpoint); previous: 000558; next: 000560

Authors: Christopher Leela Biji; Manu K. Madhu; Vineetha Vishnu; Satheesh Kumar K; Achuthsankar S. Nair

Source: Bioinformation, vol. 11, no. 5 (2015), pp. 267-271

RBID : PMC:4464544

Abstract

Storing big data is a challenge in the post-genome era, so high-performance computing solutions are needed for managing large genomic data. This report describes a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression reduces the on-disk "footprint" of large volumes of sequence data, which supports more efficient archiving. The approach was applied to 21 eukaryotic genomes selected by stratified sampling. The method achieves an average 6-fold reduction in disk space and compresses three times faster than COMRAD.
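
As a rough illustration of the kind of step that can be spread across processes, the sketch below counts fixed-length substrings over disjoint ranges of start positions and merges the partial counts, keeping only substrings that reach a threshold frequency. It is not the COMRAD or COMRADMPI code: the function names, the parameters `length`, `threshold` and `workers`, and the serial loop standing in for message-passing workers are all assumptions made for illustration.

```python
from collections import Counter


def count_substrings(seq, length, start, stop):
    """Count substrings of `length` characters that begin in [start, stop)."""
    return Counter(seq[i:i + length] for i in range(start, stop))


def frequent_substrings(seq, length=2, threshold=2, workers=4):
    """Merge per-range counts and keep substrings seen at least `threshold` times."""
    n_starts = max(len(seq) - length + 1, 0)   # number of valid start positions
    step = max(-(-n_starts // workers), 1)     # ceiling division, at least 1
    total = Counter()
    # Each range would go to one worker; here the ranges are processed in turn.
    for start in range(0, n_starts, step):
        total += count_substrings(seq, length, start, min(start + step, n_starts))
    return {s: c for s, c in total.items() if c >= threshold}


if __name__ == "__main__":
    print(frequent_substrings("ACGTACGA", length=2, threshold=2, workers=3))
```

Because the work is split by start position rather than by cutting the string, substrings that straddle a range boundary are still counted exactly once, so the merged result should not depend on the number of workers.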

Availability

The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/


Url:
DOI: 10.6026/97320630011267
PubMed: 26124572
PubMed Central: 4464544


Affiliations:


Links toward previous steps (curation, corpus...)


Links to Exploration step

PMC:4464544

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Compression of Large genomic datasets using COMRAD on Parallel Computing Platform</title>
<author>
<name sortKey="Biji, Christopher Leela" sort="Biji, Christopher Leela" uniqKey="Biji C" first="Christopher Leela" last="Biji">Christopher Leela Biji</name>
<affiliation>
<nlm:aff id="A1">Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Madhu, Manu K" sort="Madhu, Manu K" uniqKey="Madhu M" first="Manu K" last="Madhu">Manu K. Madhu</name>
<affiliation>
<nlm:aff id="A2">School of Computer Science, Mahathma Gandhi University, Kottayam</nlm:aff>
<wicri:noCountry code="subfield">Kottayam</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Vishnu, Vineetha" sort="Vishnu, Vineetha" uniqKey="Vishnu V" first="Vineetha" last="Vishnu">Vineetha Vishnu</name>
<affiliation>
<nlm:aff id="A3">Infosys Technologies, Trivandrum</nlm:aff>
<wicri:noCountry code="subfield">Trivandrum</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="K, Satheesh Kumar" sort="K, Satheesh Kumar" uniqKey="K S" first="Satheesh Kumar" last="K">Satheesh Kumar K</name>
<affiliation>
<nlm:aff id="A4">Department of Future Studies, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Nair, Achuthsankar S" sort="Nair, Achuthsankar S" uniqKey="Nair A" first="Achuthsankar S" last="Nair">Achuthsankar S. Nair</name>
<affiliation>
<nlm:aff id="A1">Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26124572</idno>
<idno type="pmc">4464544</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4464544</idno>
<idno type="RBID">PMC:4464544</idno>
<idno type="doi">10.6026/97320630011267</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000039</idno>
<idno type="wicri:Area/Pmc/Curation">000039</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000559</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Compression of Large genomic datasets using COMRAD on Parallel Computing Platform</title>
<author>
<name sortKey="Biji, Christopher Leela" sort="Biji, Christopher Leela" uniqKey="Biji C" first="Christopher Leela" last="Biji">Christopher Leela Biji</name>
<affiliation>
<nlm:aff id="A1">Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Madhu, Manu K" sort="Madhu, Manu K" uniqKey="Madhu M" first="Manu K" last="Madhu">Manu K. Madhu</name>
<affiliation>
<nlm:aff id="A2">School of Computer Science, Mahathma Gandhi University, Kottayam</nlm:aff>
<wicri:noCountry code="subfield">Kottayam</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Vishnu, Vineetha" sort="Vishnu, Vineetha" uniqKey="Vishnu V" first="Vineetha" last="Vishnu">Vineetha Vishnu</name>
<affiliation>
<nlm:aff id="A3">Infosys Technologies, Trivandrum</nlm:aff>
<wicri:noCountry code="subfield">Trivandrum</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="K, Satheesh Kumar" sort="K, Satheesh Kumar" uniqKey="K S" first="Satheesh Kumar" last="K">Satheesh Kumar K</name>
<affiliation>
<nlm:aff id="A4">Department of Future Studies, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Nair, Achuthsankar S" sort="Nair, Achuthsankar S" uniqKey="Nair A" first="Achuthsankar S" last="Nair">Achuthsankar S. Nair</name>
<affiliation>
<nlm:aff id="A1">Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram</nlm:aff>
<wicri:noCountry code="subfield">Thiruvananthapuram</wicri:noCountry>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformation</title>
<idno type="ISSN">0973-8894</idno>
<idno type="eISSN">0973-2063</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Storing big data is a challenge in the post-genome era, so high-performance computing solutions are needed for managing large genomic data. This report describes a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression reduces the on-disk "footprint" of large volumes of sequence data, which supports more efficient archiving. The approach was applied to 21 eukaryotic genomes selected by stratified sampling. The method achieves an average 6-fold reduction in disk space and compresses three times faster than COMRAD.</p>
<sec id="sb1e">
<title>Availability</title>
<p>The source code is written in C using message-passing libraries and is available at
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/comradmpi/files/COMRADMPI/">https://sourceforge.net/projects/comradmpi/files/COMRADMPI/</ext-link>
</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="O Driscoll, A" uniqKey="O Driscoll A">A O'Driscoll</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marx, V" uniqKey="Marx V">V Marx</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Koboldt, Dc" uniqKey="Koboldt D">DC Koboldt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rozov, R" uniqKey="Rozov R">R Rozov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hsi Yang Fritz, M" uniqKey="Hsi Yang Fritz M">M Hsi-Yang Fritz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Kb" uniqKey="Li K">KB Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pinho, Aj" uniqKey="Pinho A">AJ Pinho</name>
</author>
<author>
<name sortKey="Pratas, D" uniqKey="Pratas D">D Pratas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kuruppu, S" uniqKey="Kuruppu S">S Kuruppu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haubold, B" uniqKey="Haubold B">B Haubold</name>
</author>
<author>
<name sortKey="Wiehe, T" uniqKey="Wiehe T">T Wiehe</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformation</journal-id>
<journal-id journal-id-type="iso-abbrev">Bioinformation</journal-id>
<journal-id journal-id-type="publisher-id">Bioinformation</journal-id>
<journal-title-group>
<journal-title>Bioinformation</journal-title>
</journal-title-group>
<issn pub-type="ppub">0973-8894</issn>
<issn pub-type="epub">0973-2063</issn>
<publisher>
<publisher-name>Biomedical Informatics</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26124572</article-id>
<article-id pub-id-type="pmc">4464544</article-id>
<article-id pub-id-type="publisher-id">97320630011267</article-id>
<article-id pub-id-type="doi">10.6026/97320630011267</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Compression of Large genomic datasets using COMRAD on Parallel Computing Platform</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Biji</surname>
<given-names>Christopher Leela</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Madhu</surname>
<given-names>Manu K</given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vishnu</surname>
<given-names>Vineetha</given-names>
</name>
<xref ref-type="aff" rid="A3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>K</surname>
<given-names>Satheesh Kumar</given-names>
</name>
<xref ref-type="aff" rid="A4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vijayakumar</surname>
<given-names></given-names>
</name>
<xref ref-type="aff" rid="A2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nair</surname>
<given-names>Achuthsankar S</given-names>
</name>
<xref ref-type="aff" rid="A1">1</xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
<aff id="A1">
<label>1</label>
Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram</aff>
<aff id="A2">
<label>2</label>
School of Computer Science, Mahathma Gandhi University, Kottayam</aff>
<aff id="A3">
<label>3</label>
Infosys Technologies, Trivandrum</aff>
<aff id="A4">
<label>4</label>
Department of Future Studies, University of Kerala, Thiruvananthapuram</aff>
</contrib-group>
<author-notes>
<corresp id="COR1">
<label>*</label>
Achuthsankar:
<email>sankar.achuth@gmail.com</email>
</corresp>
</author-notes>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>5</month>
<year>2015</year>
</pub-date>
<volume>11</volume>
<issue>5</issue>
<fpage>267</fpage>
<lpage>271</lpage>
<history>
<date date-type="received">
<day>04</day>
<month>5</month>
<year>2015</year>
</date>
<date date-type="rev-recd">
<day>06</day>
<month>5</month>
<year>2015</year>
</date>
<date date-type="accepted">
<day>06</day>
<month>5</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>© 2015 Biomedical Informatics</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access">
<license-p>This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.</license-p>
</license>
</permissions>
<abstract>
<p>Storing big data is a challenge in the post-genome era, so high-performance computing solutions are needed for managing large genomic data. This report describes a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression reduces the on-disk "footprint" of large volumes of sequence data, which supports more efficient archiving. The approach was applied to 21 eukaryotic genomes selected by stratified sampling. The method achieves an average 6-fold reduction in disk space and compresses three times faster than COMRAD.</p>
<sec id="sb1e">
<title>Availability</title>
<p>The source code is written in C using message-passing libraries and is available at
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/comradmpi/files/COMRADMPI/">https://sourceforge.net/projects/comradmpi/files/COMRADMPI/</ext-link>
</p>
</sec>
</abstract>
<kwd-group>
<kwd>Genome compression</kwd>
<kwd>Sequence analysis</kwd>
<kwd>Parallel Computing</kwd>
<kwd>Big data storage</kwd>
<kwd>Genome Analysis</kwd>
</kwd-group>
</article-meta>
</front>
<floats-group>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption>
<p>Flow Chart of Compression of Large Genome Dataset using COMRAD on Parallel Computing Platform</p>
</caption>
<graphic xlink:href="97320630011267F1"></graphic>
</fig>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption>
<p>An example showing code book generation for an input string with string length L=2 and threshold frequency F=2</p>
</caption>
<graphic xlink:href="97320630011267F2"></graphic>
</fig>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption>
<p>Compression time and speedups for Compression of Large Genomic Datasets using COMRAD on Parallel Computing Platform for selected sample data set with size in MB as a function of number of processors.</p>
</caption>
<graphic xlink:href="97320630011267F3"></graphic>
</fig>
</floats-group>
</pmc>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Biji, Christopher Leela" sort="Biji, Christopher Leela" uniqKey="Biji C" first="Christopher Leela" last="Biji">Christopher Leela Biji</name>
<name sortKey="K, Satheesh Kumar" sort="K, Satheesh Kumar" uniqKey="K S" first="Satheesh Kumar" last="K">Satheesh Kumar K</name>
<name sortKey="Madhu, Manu K" sort="Madhu, Manu K" uniqKey="Madhu M" first="Manu K" last="Madhu">Manu K. Madhu</name>
<name sortKey="Nair, Achuthsankar S" sort="Nair, Achuthsankar S" uniqKey="Nair A" first="Achuthsankar S" last="Nair">Achuthsankar S. Nair</name>
<name sortKey="Vishnu, Vineetha" sort="Vishnu, Vineetha" uniqKey="Vishnu V" first="Vineetha" last="Vishnu">Vineetha Vishnu</name>
</noCountry>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Bois/explor/OrangerV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000559 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000559 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Bois
   |area=    OrangerV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:4464544
   |texte=   Compression of Large genomic datasets using COMRAD on Parallel Computing Platform
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:26124572" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OrangerV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 17:11:04 2016. Site generation: Wed Mar 6 18:18:32 2024