Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

KAnalyze: a fast versatile pipelined K-mer toolkit

Identifieur interne : 000B11 ( Pmc/Corpus ); précédent : 000B10; suivant : 000B12

KAnalyze: a fast versatile pipelined K-mer toolkit

Auteurs : Peter Audano ; Fredrik Vannberg

Source :

RBID : PMC:4080738

Abstract

Motivation: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.

Results: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.

Availability and implementation: KAnalyze is available on SourceForge:

https://sourceforge.net/projects/kanalyze/

Contact:fredrik.vannberg@biology.gatech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Url:
DOI: 10.1093/bioinformatics/btu152
PubMed: 24642064
PubMed Central: 4080738

Links to Exploration step

PMC:4080738

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">KAnalyze: a fast versatile pipelined K-mer toolkit</title>
<author>
<name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
</author>
<author>
<name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24642064</idno>
<idno type="pmc">4080738</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4080738</idno>
<idno type="RBID">PMC:4080738</idno>
<idno type="doi">10.1093/bioinformatics/btu152</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Pmc/Corpus">000B11</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B11</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">KAnalyze: a fast versatile pipelined K-mer toolkit</title>
<author>
<name sortKey="Audano, Peter" sort="Audano, Peter" uniqKey="Audano P" first="Peter" last="Audano">Peter Audano</name>
</author>
<author>
<name sortKey="Vannberg, Fredrik" sort="Vannberg, Fredrik" uniqKey="Vannberg F" first="Fredrik" last="Vannberg">Fredrik Vannberg</name>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>
<bold>Motivation</bold>
: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.</p>
<p>
<bold>Results</bold>
: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.</p>
<p>
<bold>Availability and implementation</bold>
: KAnalyze is available on SourceForge:</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/kanalyze/">https://sourceforge.net/projects/kanalyze/</ext-link>
</p>
<p>
<bold>Contact:</bold>
<email>fredrik.vannberg@biology.gatech.edu</email>
</p>
<p>
<bold>Supplementary information</bold>
:
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Knuth, D" uniqKey="Knuth D">D Knuth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marcais, G" uniqKey="Marcais G">G Marçais</name>
</author>
<author>
<name sortKey="Kingsford, C" uniqKey="Kingsford C">C Kingsford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Newburger, De" uniqKey="Newburger D">DE Newburger</name>
</author>
<author>
<name sortKey="Bulyk, Ml" uniqKey="Bulyk M">ML Bulyk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nordstrom, Kjv" uniqKey="Nordstrom K">KJV Nordström</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rizk, G" uniqKey="Rizk G">G Rizk</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, G" uniqKey="Wilson G">G Wilson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-id journal-id-type="hwp">bioinfo</journal-id>
<journal-title-group>
<journal-title>Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1367-4811</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24642064</article-id>
<article-id pub-id-type="pmc">4080738</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/btu152</article-id>
<article-id pub-id-type="publisher-id">btu152</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Applications Notes</subject>
<subj-group subj-group-type="heading">
<subject>Sequence Analysis</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>KAnalyze: a fast versatile pipelined K-mer toolkit</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Audano</surname>
<given-names>Peter</given-names>
</name>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vannberg</surname>
<given-names>Fredrik</given-names>
</name>
<xref ref-type="corresp" rid="btu152-COR1">*</xref>
</contrib>
<aff>School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA</aff>
</contrib-group>
<author-notes>
<corresp id="btu152-COR1">* To whom correspondence should be addressed.</corresp>
<fn id="btu152-FN1">
<p>Associate Editor: Alfonso Valencia</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<day>15</day>
<month>7</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>18</day>
<month>3</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>18</day>
<month>3</month>
<year>2014</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>30</volume>
<issue>14</issue>
<fpage>2070</fpage>
<lpage>2072</lpage>
<history>
<date date-type="received">
<day>2</day>
<month>12</month>
<year>2013</year>
</date>
<date date-type="rev-recd">
<day>24</day>
<month>1</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>12</day>
<month>3</month>
<year>2014</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author 2014. Published by Oxford University Press.</copyright-statement>
<copyright-year>2014</copyright-year>
<license license-type="creative-commons" xlink:href="http://creativecommons.org/licenses/by/3.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>
<bold>Motivation</bold>
: Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API), a graphical interface or contain features that make it extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.</p>
<p>
<bold>Results</bold>
: As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.</p>
<p>
<bold>Availability and implementation</bold>
: KAnalyze is available on SourceForge:</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/kanalyze/">https://sourceforge.net/projects/kanalyze/</ext-link>
</p>
<p>
<bold>Contact:</bold>
<email>fredrik.vannberg@biology.gatech.edu</email>
</p>
<p>
<bold>Supplementary information</bold>
:
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary data</ext-link>
are available at
<italic>Bioinformatics</italic>
online.</p>
</abstract>
<counts>
<page-count count="3"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec>
<title>1 INTRODUCTION</title>
<p>K-merizing sequence data is a necessary step for many bioinformatics applications. K-mer-based approaches are used to assemble reads, detect repeats, estimate read depth, identify protein binding sites (
<xref rid="btu152-B2" ref-type="bibr">Newburger and Bulyk, 2009</xref>
), find mutations in sequencing data (
<xref rid="btu152-B3" ref-type="bibr">Nordström
<italic>et al.</italic>
, 2013</xref>
) and perform a variety of other tasks.</p>
<p>As new applications are created, it is important to have reliable software for generating k-mers. If developers choose to rewrite k-mer code, there is an additional risk of introducing bugs that can affect results. This problem is compounded when algorithms become more complex, such as counting k-mers in large datasets with limited memory. The time required to develop and to test a fast algorithm becomes prohibitive. Existing tools often lack features that make them more available to new applications. Few have an API or document return codes.</p>
<p>We created KAnalyze as a fast reusable k-mer toolkit capable of running on multiple platforms. It is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted pipelines. The count module has a graphical mode for desktop use.</p>
<p>Because it is designed for longevity, the project is organized, documented and tested. The source code includes unit tests to quickly verify accuracy as the code changes. We ran tests on several datasets and compared the results with other k-mer software, including a Perl pipeline we built for verifying results. Throughout the design process, the best practices for scientific computing were observed (
<xref rid="btu152-B5" ref-type="bibr">Wilson
<italic>et al.</italic>
, 2014</xref>
). KAnalyze makes both speed and accuracy available to k-mer applications.</p>
</sec>
<sec>
<title>2 METHODS</title>
<sec id="SEC2.1">
<title>2.1 Pipelined components and modules</title>
<p>KAnalyze is organized as a set of modules and components. Modules are pipelines where each step is implemented as a component. Components may be shared among modules. The pipeline runs in parallel with each component passing intermediate results to the next. Input is sent to the first step, and output is written by the last step. Each command line mode executes a single module.</p>
<p>Bounded, synchronized memory queues allow rapid exchange of data through the pipeline. To reduce lock overhead incurred by each queue operation, most components send batches of data elements. By passing intermediate results through memory, disk I/O overhead is avoided.</p>
</sec>
<sec id="SEC2.2">
<title>2.2 API and CLI</title>
<p>The API is fully annotated with Javadoc comments for every class, method and field. No method throws any exception without declaring and documenting the conditions under which the exception is thrown. Every constructor and method comment states how null arguments are handled. The web pages generated from the Javadoc comments are available to API developers, and the KAnalyze manual describes how to extend the API.</p>
<p>KAnalyze uses Java's Reflection API to dynamically load some classes. The CLI uses the first command-line argument to find the desired module class. The file reader component uses the file type, such as FASTA or FASTQ, to find the appropriate reader class. As a result, new modules and readers can be added to KAnalyze without modifying existing code.</p>
<p>The CLI is both user-friendly and easily integrated into scripted pipelines. When a program completes, it sends a code back to its caller. Zero is returned to the caller only when the program executes without error. Different types of errors have a specific return codes, and each is documented in the KAnalyze manual. Constants for each return code are defined in the API.</p>
</sec>
<sec id="SEC2.3">
<title>2.3 Count module algorithm</title>
<p>The KAnalyze count module counts k-mers in large and small datasets. It works efficiently for datasets too large to fit into memory. Counting takes place in two steps over two components. The split component writes sorted subsets of data to disk, and the merge component accumulates counts from each subset. Split and merge operations can be performed in multiple steps, which allows counting to take place in a distributed environment.</p>
<p>The split component reads k-mers into a memory array until it is full. The array is sorted using Java's Arrays.sort() method, which implements a dual-pivot quicksort algorithm. K-mers are counted by traversing the sorted array. Each k-mer and its count are written to disk, and the location of the output file is sent to the merge component. The memory array is then filled with the next set of k-mers, and a new file of k-mer counts is created. The process repeats until all k-mers have been written.</p>
<p>The merge component reads k-mers and their counts from each file and sums the counts for each k-mer. To avoid loading entire files at once, each file has a small buffer of k-mers. As the files are sorted by k-mer, this module efficiently accumulates k-mer counts and writes a sorted output file.</p>
<p>This modified external merge sort algorithm (
<xref rid="btu152-B6" ref-type="bibr">Knuth, 1998</xref>
) efficiently counts all k-mers with limited memory and sorts k-mers as they are processed.</p>
</sec>
</sec>
<sec>
<title>3 SOFTWARE TEST RESULTS</title>
<p>We tested KAnalyze performance and accuracy on two public datasets. We obtained human chromosome 1 (Chr1) from UCSC and a randomly chosen dataset from The 1000 Genomes Project Pilot Project 3 (gene region targeted), NA18580. The hg19 Chr1 sequence is a single fully assembled 249 Mb (megabase) sequence, and NA18580 is a set of 1.5 million sequence reads totaling 453 Mb. See
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary Section 3.6</ext-link>
for links to these datasets.</p>
<p>We tested KAnalyze 0.9.3, Jellyfish 1.1.10 (
<xref rid="btu152-B1" ref-type="bibr">Marçais and Kingsford, 2011</xref>
), DSK 1.5280 (
<xref rid="btu152-B4" ref-type="bibr">Rizk et al., 2013</xref>
) and a Perl pipeline we developed for verifying accuracy. These were the latest versions available when testing began. We used Jellyfish hash size 100000000 (10
<sup>8</sup>
) because it yielded the best performance results. Tests were run on a 12 core machine (2 × Intel Xeon E5-2620) with 32 GB of RAM (DDR3-1600), RAID-6 over SATA drives (3 GB/s, 72K RPM) and CentOS 6.4 (minimal install).</p>
<p>Each pipeline was run in triplicate over both datasets. The run time of each step of each pipeline was recorded with the Linux utility time. The reported time is the average (mean) over all three runs. See
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary Section 3</ext-link>
for individual run times.
<xref ref-type="fig" rid="btu152-F1">Figure 1</xref>
shows the final results.
<fig id="btu152-F1" position="float">
<label>Fig. 1.</label>
<caption>
<p>31-mer performance with KAnalyze count, Jellyfish, DSK and a Perl script implementation over two datasets, NA18580 (1000 Genomes) and Chr1 (hg19)</p>
</caption>
<graphic xlink:href="btu152f1p"></graphic>
</fig>
</p>
<p>Memory usage for the NA18580 dataset was determined by recording the maximum RSS (non-swapped physical memory used) in 0.1 s interval with the Linux command ps. For pipelines with multiple steps, we recorded the maximum memory usage of all steps. Actual memory usage was 1.58 GB (KAnalyze count), 2.18 GB (Jellyfish), 0.03 GB (DSK) and 1.96 GB (Perl pipeline). The memory test was done separately from the performance tests.</p>
<p>To test scalability, we obtained HG01889 from the Human Genome Project. This dataset contains 71.95 Gb over 988 million reads. For Jellyfish, we uncompressed the files, which took 1.06 h. In three attempts, we could not get Jellyfish to complete a run on this data in 24 h (see
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary Section 3.5</ext-link>
). In one attempt, we allowed Jellyfish to use 17 threads, which we determined to be optimal on NA18580. KAnalyze counted k-mers in 14.65 h using 2 GB of memory and default settings in one run. To see how KAnalyze scales in a high-performance setting, we allowed it to use more memory, additional threads and we read directly from the gzipped fastq files. In two tests, KAnalyze counted all k-mers in an average of 3.35 h with 26.01 GB of memory.</p>
<p>For each test, we produced a tab delimited file of k-mers and their counts sorted by k-mer. The KAnalyze sort module (Section 2.3) produces this format. The Perl pipeline produces this format using with Perl scripts with Linux utilities sort and uniq. Jellyfish results were converted from their FASTA representation to a tab-delimited file with a Perl script, and then sorted with Linux sort. DSK produces an unsorted tab-delimited file, which we sorted with Linux sort. The average time to convert and sort results was 901 s for Jellyfish and 507 s for DSK. This time is not shown in
<xref ref-type="fig" rid="btu152-F1">Figure 1</xref>
.</p>
<p>The SHA1 checksum on the sorted output files was recorded. For each dataset, KAnalyze produced results consistent with Jellyfish and the Perl pipeline. DSK k-mer counts did not agree with the other methods (See
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.oxfordjournals.org/lookup/suppl/doi:10.1093/bioinformatics/btu152/-/DC1">Supplementary Section 3.4</ext-link>
). We obtained the same results running KAnalyze on A Windows computer (Windows 7) and an Apple computer (OS 10.8.5).</p>
</sec>
<sec>
<title>4 CONCLUSION</title>
<p>KAnalyze offers an extensible API and a complete CLI for k-mer processing tools. These interfaces allow KAnalyze to be integrated directly into Java programs via the API, or into pipelines of any language via the CLI. For desktop users, a graphical interface is included for the count module.</p>
<p>With carefully chosen algorithms and data structures, KAnalyze can perform at a level commensurate with programs compiled to native code. Through extensive testing, we are confident that it produces accurate results.</p>
<p>KAnalyze is designed to survive years of maintenance and feature additions. The source is distributed under the GNU Lesser GPL to restrict its usage as little as possible. We encourage others to contribute to the KAnalyze project.</p>
<p>
<italic>Funding</italic>
:
<funding-source>Georgia Institute of Technology</funding-source>
provided financial support through a startup grant to the Vannberg Lab.</p>
<p>
<italic>Conflict of Interest</italic>
: none declared.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_30_14_2070__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="pdf" xlink:href="supp_btu152_KAnalyze_Supplementary.pdf"></media>
</supplementary-material>
</sec>
</body>
<back>
<ref-list>
<title>REFERENCES</title>
<ref id="btu152-B6">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Knuth</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>The Art of Computer Programming</article-title>
<source>Sorting and Searching</source>
<year>1998</year>
<volume>Vol. 3</volume>
<edition>2nd edn</edition>
<publisher-name>Addison-Wesley</publisher-name>
<fpage>248</fpage>
<lpage>379</lpage>
</element-citation>
</ref>
<ref id="btu152-B1">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marçais</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kingsford</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>A fast, lock-free approach for efficient parallel counting of occurrences of k-mers</article-title>
<source>Bioinformatics</source>
<year>2011</year>
<volume>27</volume>
<fpage>764</fpage>
<lpage>770</lpage>
<pub-id pub-id-type="pmid">21217122</pub-id>
</element-citation>
</ref>
<ref id="btu152-B2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Newburger</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Bulyk</surname>
<given-names>ML</given-names>
</name>
</person-group>
<article-title>UniPROBE: an online database of protein binding microarray data on protein-DNA interactions</article-title>
<source>Nucleic Acids Res.</source>
<year>2009</year>
<volume>37</volume>
<fpage>D77</fpage>
<lpage>D82</lpage>
<pub-id pub-id-type="pmid">18842628</pub-id>
</element-citation>
</ref>
<ref id="btu152-B3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nordström</surname>
<given-names>KJV</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Mutation identification by direct comparison of whole-genome sequencing data from mutant and wild-type individuals using k-mers</article-title>
<source>Nat. Biotechnol.</source>
<year>2013</year>
<volume>31</volume>
<fpage>325</fpage>
<lpage>330</lpage>
<pub-id pub-id-type="pmid">23475072</pub-id>
</element-citation>
</ref>
<ref id="btu152-B4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rizk</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>DSK: k-mer counting with very low memory usage</article-title>
<source>Bioinformatics</source>
<year>2013</year>
<volume>29</volume>
<fpage>652</fpage>
<lpage>653</lpage>
<pub-id pub-id-type="pmid">23325618</pub-id>
</element-citation>
</ref>
<ref id="btu152-B5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilson</surname>
<given-names>G</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Best practices for scientific computing</article-title>
<source>PLoS Biol.</source>
<year>2014</year>
<volume>12</volume>
<fpage>e1001745</fpage>
<pub-id pub-id-type="pmid">24415924</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B11 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000B11 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4080738
   |texte=   KAnalyze: a fast versatile pipelined K-mer toolkit
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24642064" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021