MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The khmer software package: enabling efficient nucleotide sequence analysis

Internal identifier: 000C67 (Pmc/Corpus); previous: 000C66; next: 000C68

Authors: Michael R. Crusoe; Hussien F. Alameldin; Sherine Awad; Elmar Boucher; Adam Caldwell; Reed Cartwright; Amanda Charbonneau; Bede Constantinides; Greg Edvenson; Scott Fay; Jacob Fenton; Thomas Fenzl; Jordan Fish; Leonor Garcia-Gutierrez; Phillip Garland; Jonathan Gluck; Iván González; Sarah Guermond; Jiarong Guo; Aditi Gupta; Joshua R. Herr; Adina Howe; Alex Hyer; Andreas Härpfer; Luiz Irber; Rhys Kidd; David Lin; Justin Lippi; Tamer Mansour; Pamela Mca'Nulty; Eric Mcdonald; Jessica Mizzi; Kevin D. Murray; Joshua R. Nahum; Kaben Nanlohy; Alexander Johan Nederbragt; Humberto Ortiz-Zuazaga; Jeramia Ory; Jason Pell; Charles Pepe-Ranney; Zachary N. Russ; Erich Schwarz; Camille Scott; Josiah Seaman; Scott Sievert; Jared Simpson; Connor T. Skennerton; James Spencer; Ramakrishnan Srinivasan; Daniel Standage; James A. Stapleton; Susan R. Steinman; Joe Stein; Benjamin Taylor; Will Trimble; Heather L. Wiencko; Michael Wright; Brian Wyss; Qingpeng Zhang; En Zyme; C. Titus Brown

RBID : PMC:4608353

Abstract

The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
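
To make the idea of a probabilistic k-mer counting data structure concrete, here is a minimal sketch in the style of a Count-Min Sketch. It is an illustration only and does not use khmer's actual implementation or API; the class name `KmerCountSketch`, its methods, the table sizes, and the choice of hash function are all assumptions made for this example. It also ignores reverse complements.

```python
import hashlib


class KmerCountSketch:
    """Approximate k-mer counter (hypothetical class, not part of khmer)."""

    def __init__(self, k, table_sizes):
        self.k = k
        # One counter table per size; using different sizes makes it unlikely
        # that two k-mers collide in every table at once.
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # Hash the k-mer once, then reduce the value modulo each table size.
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        return [value % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def get(self, kmer):
        # Collisions can only inflate a counter, so the minimum across the
        # tables is the tightest estimate and never undercounts.
        return min(
            table[slot] for table, slot in zip(self.tables, self._slots(kmer))
        )

    def consume(self, sequence):
        # Slide a window of width k over the sequence and count each k-mer.
        for i in range(len(sequence) - self.k + 1):
            self.add(sequence[i:i + self.k])


sketch = KmerCountSketch(k=4, table_sizes=[1009, 1013, 1019])
sketch.consume("ACGTACGTAC")
estimate = sketch.get("ACGT")
```

Here `estimate` is at least 2, the true number of occurrences of ACGT in the example sequence. Hash collisions can raise the estimate but never lower it, which is the trade-off that lets memory stay fixed regardless of how many distinct k-mers are seen.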


DOI: 10.12688/f1000research.6924.1
PubMed: 26535114
PubMed Central: 4608353

Links to Exploration step

PMC:4608353

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The khmer software package: enabling efficient nucleotide sequence analysis</title>
<author>
<name sortKey="Crusoe, Michael R" sort="Crusoe, Michael R" uniqKey="Crusoe M" first="Michael R." last="Crusoe">Michael R. Crusoe</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alameldin, Hussien F" sort="Alameldin, Hussien F" uniqKey="Alameldin H" first="Hussien F." last="Alameldin">Hussien F. Alameldin</name>
<affiliation>
<nlm:aff id="a2">Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Awad, Sherine" sort="Awad, Sherine" uniqKey="Awad S" first="Sherine" last="Awad">Sherine Awad</name>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Elmar" sort="Boucher, Elmar" uniqKey="Boucher E" first="Elmar" last="Boucher">Elmar Boucher</name>
<affiliation>
<nlm:aff id="a4">Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Caldwell, Adam" sort="Caldwell, Adam" uniqKey="Caldwell A" first="Adam" last="Caldwell">Adam Caldwell</name>
<affiliation>
<nlm:aff id="a5">Biology Department, San Jose State University, San Jose, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cartwright, Reed" sort="Cartwright, Reed" uniqKey="Cartwright R" first="Reed" last="Cartwright">Reed Cartwright</name>
<affiliation>
<nlm:aff id="a6">School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, AZ, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Charbonneau, Amanda" sort="Charbonneau, Amanda" uniqKey="Charbonneau A" first="Amanda" last="Charbonneau">Amanda Charbonneau</name>
<affiliation>
<nlm:aff id="a7">Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Constantinides, Bede" sort="Constantinides, Bede" uniqKey="Constantinides B" first="Bede" last="Constantinides">Bede Constantinides</name>
<affiliation>
<nlm:aff id="a8">Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Edvenson, Greg" sort="Edvenson, Greg" uniqKey="Edvenson G" first="Greg" last="Edvenson">Greg Edvenson</name>
<affiliation>
<nlm:aff id="a9">Micron Technology, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fay, Scott" sort="Fay, Scott" uniqKey="Fay S" first="Scott" last="Fay">Scott Fay</name>
<affiliation>
<nlm:aff id="a10">Invitae, San Francisco, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fenton, Jacob" sort="Fenton, Jacob" uniqKey="Fenton J" first="Jacob" last="Fenton">Jacob Fenton</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fenzl, Thomas" sort="Fenzl, Thomas" uniqKey="Fenzl T" first="Thomas" last="Fenzl">Thomas Fenzl</name>
<affiliation>
<nlm:aff id="a12">Independent Researcher, Munich, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fish, Jordan" sort="Fish, Jordan" uniqKey="Fish J" first="Jordan" last="Fish">Jordan Fish</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garcia Gutierrez, Leonor" sort="Garcia Gutierrez, Leonor" uniqKey="Garcia Gutierrez L" first="Leonor" last="Garcia-Gutierrez">Leonor Garcia-Gutierrez</name>
<affiliation>
<nlm:aff id="a13">Mathematics Institute, University of Warwick, Warwick, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garland, Phillip" sort="Garland, Phillip" uniqKey="Garland P" first="Phillip" last="Garland">Phillip Garland</name>
<affiliation>
<nlm:aff id="a14">Eastlake Data, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gluck, Jonathan" sort="Gluck, Jonathan" uniqKey="Gluck J" first="Jonathan" last="Gluck">Jonathan Gluck</name>
<affiliation>
<nlm:aff id="a15">Graduate Program, University of Maryland, College Park, MD, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gonzalez, Ivan" sort="Gonzalez, Ivan" uniqKey="Gonzalez I" first="Iván" last="González">Iván González</name>
<affiliation>
<nlm:aff id="a16">Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guermond, Sarah" sort="Guermond, Sarah" uniqKey="Guermond S" first="Sarah" last="Guermond">Sarah Guermond</name>
<affiliation>
<nlm:aff id="a17">Independent Researcher, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guo, Jiarong" sort="Guo, Jiarong" uniqKey="Guo J" first="Jiarong" last="Guo">Jiarong Guo</name>
<affiliation>
<nlm:aff id="a18">Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gupta, Aditi" sort="Gupta, Aditi" uniqKey="Gupta A" first="Aditi" last="Gupta">Aditi Gupta</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Herr, Joshua R" sort="Herr, Joshua R" uniqKey="Herr J" first="Joshua R." last="Herr">Joshua R. Herr</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Howe, Adina" sort="Howe, Adina" uniqKey="Howe A" first="Adina" last="Howe">Adina Howe</name>
<affiliation>
<nlm:aff id="a19">Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hyer, Alex" sort="Hyer, Alex" uniqKey="Hyer A" first="Alex" last="Hyer">Alex Hyer</name>
<affiliation>
<nlm:aff id="a20">Department of Biology, University of Utah, Salt Lake City, UT, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="H Rpfer, Andreas" sort="H Rpfer, Andreas" uniqKey="H Rpfer A" first="Andreas" last="Härpfer">Andreas Härpfer</name>
<affiliation>
<nlm:aff id="a21">ConSol* Software GmbH, München, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Irber, Luiz" sort="Irber, Luiz" uniqKey="Irber L" first="Luiz" last="Irber">Luiz Irber</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kidd, Rhys" sort="Kidd, Rhys" uniqKey="Kidd R" first="Rhys" last="Kidd">Rhys Kidd</name>
<affiliation>
<nlm:aff id="a22">Independent Researcher, Sydney, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lin, David" sort="Lin, David" uniqKey="Lin D" first="David" last="Lin">David Lin</name>
<affiliation>
<nlm:aff id="a23">Verdematics, Fremont, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lippi, Justin" sort="Lippi, Justin" uniqKey="Lippi J" first="Justin" last="Lippi">Justin Lippi</name>
<affiliation>
<nlm:aff id="a24">Independent Researcher, San Francisco, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mansour, Tamer" sort="Mansour, Tamer" uniqKey="Mansour T" first="Tamer" last="Mansour">Tamer Mansour</name>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a25">Clinical Pathology, Mansoura University, Mansoura, Egypt</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mca Nulty, Pamela" sort="Mca Nulty, Pamela" uniqKey="Mca Nulty P" first="Pamela" last="Mca'Nulty">Pamela Mca'Nulty</name>
<affiliation>
<nlm:aff id="a26">Addgene, Cambridge, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mcdonald, Eric" sort="Mcdonald, Eric" uniqKey="Mcdonald E" first="Eric" last="Mcdonald">Eric Mcdonald</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mizzi, Jessica" sort="Mizzi, Jessica" uniqKey="Mizzi J" first="Jessica" last="Mizzi">Jessica Mizzi</name>
<affiliation>
<nlm:aff id="a27">Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Murray, Kevin D" sort="Murray, Kevin D" uniqKey="Murray K" first="Kevin D." last="Murray">Kevin D. Murray</name>
<affiliation>
<nlm:aff id="a28">ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, ACT, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nahum, Joshua R" sort="Nahum, Joshua R" uniqKey="Nahum J" first="Joshua R." last="Nahum">Joshua R. Nahum</name>
<affiliation>
<nlm:aff id="a29">BEACON Center, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nanlohy, Kaben" sort="Nanlohy, Kaben" uniqKey="Nanlohy K" first="Kaben" last="Nanlohy">Kaben Nanlohy</name>
<affiliation>
<nlm:aff id="a30">Independent Researcher, New Orleans, LA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nederbragt, Alexander Johan" sort="Nederbragt, Alexander Johan" uniqKey="Nederbragt A" first="Alexander Johan" last="Nederbragt">Alexander Johan Nederbragt</name>
<affiliation>
<nlm:aff id="a31">Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ortiz Zuazaga, Humberto" sort="Ortiz Zuazaga, Humberto" uniqKey="Ortiz Zuazaga H" first="Humberto" last="Ortiz-Zuazaga">Humberto Ortiz-Zuazaga</name>
<affiliation>
<nlm:aff id="a32">Department of Computer Science, Rio Piedras Campus, University of Puerto Rico, San Juan, Puerto Rico</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ory, Jeramia" sort="Ory, Jeramia" uniqKey="Ory J" first="Jeramia" last="Ory">Jeramia Ory</name>
<affiliation>
<nlm:aff id="a33">Biochemistry, St. Louis College of Pharmacy, St. Louis, MO, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pell, Jason" sort="Pell, Jason" uniqKey="Pell J" first="Jason" last="Pell">Jason Pell</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pepe Ranney, Charles" sort="Pepe Ranney, Charles" uniqKey="Pepe Ranney C" first="Charles" last="Pepe-Ranney">Charles Pepe-Ranney</name>
<affiliation>
<nlm:aff id="a34">Crop and Soil Sciences, Cornell University, Ithaca, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Russ, Zachary N" sort="Russ, Zachary N" uniqKey="Russ Z" first="Zachary N." last="Russ">Zachary N. Russ</name>
<affiliation>
<nlm:aff id="a35">Department of Bioengineering, UC Berkeley, Berkeley, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schwarz, Erich" sort="Schwarz, Erich" uniqKey="Schwarz E" first="Erich" last="Schwarz">Erich Schwarz</name>
<affiliation>
<nlm:aff id="a36">Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Scott, Camille" sort="Scott, Camille" uniqKey="Scott C" first="Camille" last="Scott">Camille Scott</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seaman, Josiah" sort="Seaman, Josiah" uniqKey="Seaman J" first="Josiah" last="Seaman">Josiah Seaman</name>
<affiliation>
<nlm:aff id="a37">Data Visualization, Newline Technical Innovations, Windsor, CO, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sievert, Scott" sort="Sievert, Scott" uniqKey="Sievert S" first="Scott" last="Sievert">Scott Sievert</name>
<affiliation>
<nlm:aff id="a38">Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Simpson, Jared" sort="Simpson, Jared" uniqKey="Simpson J" first="Jared" last="Simpson">Jared Simpson</name>
<affiliation>
<nlm:aff id="a39">Ontario Institute for Cancer Research, Toronto, ON, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a40">Computer Science, University of Toronto, Toronto, ON, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Skennerton, Connor T" sort="Skennerton, Connor T" uniqKey="Skennerton C" first="Connor T." last="Skennerton">Connor T. Skennerton</name>
<affiliation>
<nlm:aff id="a41">Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Spencer, James" sort="Spencer, James" uniqKey="Spencer J" first="James" last="Spencer">James Spencer</name>
<affiliation>
<nlm:aff id="a42">Dept of Physics and Dept of Materials, Imperial College London, London, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Srinivasan, Ramakrishnan" sort="Srinivasan, Ramakrishnan" uniqKey="Srinivasan R" first="Ramakrishnan" last="Srinivasan">Ramakrishnan Srinivasan</name>
<affiliation>
<nlm:aff id="a43">Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Standage, Daniel" sort="Standage, Daniel" uniqKey="Standage D" first="Daniel" last="Standage">Daniel Standage</name>
<affiliation>
<nlm:aff id="a44">Department of Biology, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a45">Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stapleton, James A" sort="Stapleton, James A" uniqKey="Stapleton J" first="James A." last="Stapleton">James A. Stapleton</name>
<affiliation>
<nlm:aff id="a46">Chemical Engineering &amp; Materials Science, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Steinman, Susan R" sort="Steinman, Susan R" uniqKey="Steinman S" first="Susan R." last="Steinman">Susan R. Steinman</name>
<affiliation>
<nlm:aff id="a47">The New York Eye and Ear Infirmary of Mount Sinai, New York, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Joe" sort="Stein, Joe" uniqKey="Stein J" first="Joe" last="Stein">Joe Stein</name>
<affiliation>
<nlm:aff id="a48">Independent Researcher, Providence, RI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Benjamin" sort="Taylor, Benjamin" uniqKey="Taylor B" first="Benjamin" last="Taylor">Benjamin Taylor</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Trimble, Will" sort="Trimble, Will" uniqKey="Trimble W" first="Will" last="Trimble">Will Trimble</name>
<affiliation>
<nlm:aff id="a49">Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wiencko, Heather L" sort="Wiencko, Heather L" uniqKey="Wiencko H" first="Heather L." last="Wiencko">Heather L. Wiencko</name>
<affiliation>
<nlm:aff id="a50">Department of Genetics, Smurfit Institute, Trinity College Dublin, Dublin, Ireland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wright, Michael" sort="Wright, Michael" uniqKey="Wright M" first="Michael" last="Wright">Michael Wright</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wyss, Brian" sort="Wyss, Brian" uniqKey="Wyss B" first="Brian" last="Wyss">Brian Wyss</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Qingpeng" sort="Zhang, Qingpeng" uniqKey="Zhang Q" first="Qingpeng" last="Zhang">Qingpeng Zhang</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zyme, En" sort="Zyme, En" uniqKey="Zyme E" first="En" last="Zyme">En Zyme</name>
<affiliation>
<nlm:aff id="a51">Independent Researcher, Boston, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brown, C Titus" sort="Brown, C Titus" uniqKey="Brown C" first="C. Titus" last="Brown">C. Titus Brown</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">26535114</idno>
<idno type="pmc">4608353</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608353</idno>
<idno type="RBID">PMC:4608353</idno>
<idno type="doi">10.12688/f1000research.6924.1</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000C67</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000C67</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">The khmer software package: enabling efficient nucleotide sequence analysis</title>
<author>
<name sortKey="Crusoe, Michael R" sort="Crusoe, Michael R" uniqKey="Crusoe M" first="Michael R." last="Crusoe">Michael R. Crusoe</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Alameldin, Hussien F" sort="Alameldin, Hussien F" uniqKey="Alameldin H" first="Hussien F." last="Alameldin">Hussien F. Alameldin</name>
<affiliation>
<nlm:aff id="a2">Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Awad, Sherine" sort="Awad, Sherine" uniqKey="Awad S" first="Sherine" last="Awad">Sherine Awad</name>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boucher, Elmar" sort="Boucher, Elmar" uniqKey="Boucher E" first="Elmar" last="Boucher">Elmar Boucher</name>
<affiliation>
<nlm:aff id="a4">Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Caldwell, Adam" sort="Caldwell, Adam" uniqKey="Caldwell A" first="Adam" last="Caldwell">Adam Caldwell</name>
<affiliation>
<nlm:aff id="a5">Biology Department, San Jose State University, San Jose, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cartwright, Reed" sort="Cartwright, Reed" uniqKey="Cartwright R" first="Reed" last="Cartwright">Reed Cartwright</name>
<affiliation>
<nlm:aff id="a6">School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, AZ, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Charbonneau, Amanda" sort="Charbonneau, Amanda" uniqKey="Charbonneau A" first="Amanda" last="Charbonneau">Amanda Charbonneau</name>
<affiliation>
<nlm:aff id="a7">Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Constantinides, Bede" sort="Constantinides, Bede" uniqKey="Constantinides B" first="Bede" last="Constantinides">Bede Constantinides</name>
<affiliation>
<nlm:aff id="a8">Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Edvenson, Greg" sort="Edvenson, Greg" uniqKey="Edvenson G" first="Greg" last="Edvenson">Greg Edvenson</name>
<affiliation>
<nlm:aff id="a9">Micron Technology, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fay, Scott" sort="Fay, Scott" uniqKey="Fay S" first="Scott" last="Fay">Scott Fay</name>
<affiliation>
<nlm:aff id="a10">Invitae, San Francisco, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fenton, Jacob" sort="Fenton, Jacob" uniqKey="Fenton J" first="Jacob" last="Fenton">Jacob Fenton</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fenzl, Thomas" sort="Fenzl, Thomas" uniqKey="Fenzl T" first="Thomas" last="Fenzl">Thomas Fenzl</name>
<affiliation>
<nlm:aff id="a12">Independent Researcher, Munich, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fish, Jordan" sort="Fish, Jordan" uniqKey="Fish J" first="Jordan" last="Fish">Jordan Fish</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garcia Gutierrez, Leonor" sort="Garcia Gutierrez, Leonor" uniqKey="Garcia Gutierrez L" first="Leonor" last="Garcia-Gutierrez">Leonor Garcia-Gutierrez</name>
<affiliation>
<nlm:aff id="a13">Mathematics Institute, University of Warwick, Warwick, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garland, Phillip" sort="Garland, Phillip" uniqKey="Garland P" first="Phillip" last="Garland">Phillip Garland</name>
<affiliation>
<nlm:aff id="a14">Eastlake Data, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gluck, Jonathan" sort="Gluck, Jonathan" uniqKey="Gluck J" first="Jonathan" last="Gluck">Jonathan Gluck</name>
<affiliation>
<nlm:aff id="a15">Graduate Program, University of Maryland, College Park, MD, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gonzalez, Ivan" sort="Gonzalez, Ivan" uniqKey="Gonzalez I" first="Iván" last="González">Iván González</name>
<affiliation>
<nlm:aff id="a16">Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guermond, Sarah" sort="Guermond, Sarah" uniqKey="Guermond S" first="Sarah" last="Guermond">Sarah Guermond</name>
<affiliation>
<nlm:aff id="a17">Independent Researcher, Seattle, WA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Guo, Jiarong" sort="Guo, Jiarong" uniqKey="Guo J" first="Jiarong" last="Guo">Jiarong Guo</name>
<affiliation>
<nlm:aff id="a18">Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Gupta, Aditi" sort="Gupta, Aditi" uniqKey="Gupta A" first="Aditi" last="Gupta">Aditi Gupta</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Herr, Joshua R" sort="Herr, Joshua R" uniqKey="Herr J" first="Joshua R." last="Herr">Joshua R. Herr</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Howe, Adina" sort="Howe, Adina" uniqKey="Howe A" first="Adina" last="Howe">Adina Howe</name>
<affiliation>
<nlm:aff id="a19">Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hyer, Alex" sort="Hyer, Alex" uniqKey="Hyer A" first="Alex" last="Hyer">Alex Hyer</name>
<affiliation>
<nlm:aff id="a20">Department of Biology, University of Utah, Salt Lake City, UT, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="H Rpfer, Andreas" sort="H Rpfer, Andreas" uniqKey="H Rpfer A" first="Andreas" last="Härpfer">Andreas Härpfer</name>
<affiliation>
<nlm:aff id="a21">ConSol* Software GmbH, München, Germany</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Irber, Luiz" sort="Irber, Luiz" uniqKey="Irber L" first="Luiz" last="Irber">Luiz Irber</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kidd, Rhys" sort="Kidd, Rhys" uniqKey="Kidd R" first="Rhys" last="Kidd">Rhys Kidd</name>
<affiliation>
<nlm:aff id="a22">Independent Researcher, Sydney, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lin, David" sort="Lin, David" uniqKey="Lin D" first="David" last="Lin">David Lin</name>
<affiliation>
<nlm:aff id="a23">Verdematics, Fremont, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lippi, Justin" sort="Lippi, Justin" uniqKey="Lippi J" first="Justin" last="Lippi">Justin Lippi</name>
<affiliation>
<nlm:aff id="a24">Independent Researcher, San Francisco, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mansour, Tamer" sort="Mansour, Tamer" uniqKey="Mansour T" first="Tamer" last="Mansour">Tamer Mansour</name>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a25">Clinical Pathology, Mansoura University, Mansoura, Egypt</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mca Nulty, Pamela" sort="Mca Nulty, Pamela" uniqKey="Mca Nulty P" first="Pamela" last="Mca'Nulty">Pamela Mca'Nulty</name>
<affiliation>
<nlm:aff id="a26">Addgene, Cambridge, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mcdonald, Eric" sort="Mcdonald, Eric" uniqKey="Mcdonald E" first="Eric" last="Mcdonald">Eric Mcdonald</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mizzi, Jessica" sort="Mizzi, Jessica" uniqKey="Mizzi J" first="Jessica" last="Mizzi">Jessica Mizzi</name>
<affiliation>
<nlm:aff id="a27">Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Murray, Kevin D" sort="Murray, Kevin D" uniqKey="Murray K" first="Kevin D." last="Murray">Kevin D. Murray</name>
<affiliation>
<nlm:aff id="a28">ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, ACT, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nahum, Joshua R" sort="Nahum, Joshua R" uniqKey="Nahum J" first="Joshua R." last="Nahum">Joshua R. Nahum</name>
<affiliation>
<nlm:aff id="a29">BEACON Center, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nanlohy, Kaben" sort="Nanlohy, Kaben" uniqKey="Nanlohy K" first="Kaben" last="Nanlohy">Kaben Nanlohy</name>
<affiliation>
<nlm:aff id="a30">Independent Researcher, New Orleans, LA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nederbragt, Alexander Johan" sort="Nederbragt, Alexander Johan" uniqKey="Nederbragt A" first="Alexander Johan" last="Nederbragt">Alexander Johan Nederbragt</name>
<affiliation>
<nlm:aff id="a31">Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ortiz Zuazaga, Humberto" sort="Ortiz Zuazaga, Humberto" uniqKey="Ortiz Zuazaga H" first="Humberto" last="Ortiz-Zuazaga">Humberto Ortiz-Zuazaga</name>
<affiliation>
<nlm:aff id="a32">Department of Computer Science, Rio Piedras Campus, University of Puerto Rico, San Juan, Puerto Rico</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ory, Jeramia" sort="Ory, Jeramia" uniqKey="Ory J" first="Jeramia" last="Ory">Jeramia Ory</name>
<affiliation>
<nlm:aff id="a33">Biochemistry, St. Louis College of Pharmacy, St. Louis, MO, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pell, Jason" sort="Pell, Jason" uniqKey="Pell J" first="Jason" last="Pell">Jason Pell</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Pepe Ranney, Charles" sort="Pepe Ranney, Charles" uniqKey="Pepe Ranney C" first="Charles" last="Pepe-Ranney">Charles Pepe-Ranney</name>
<affiliation>
<nlm:aff id="a34">Crop and Soil Sciences, Cornell University, Ithaca, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Russ, Zachary N" sort="Russ, Zachary N" uniqKey="Russ Z" first="Zachary N." last="Russ">Zachary N. Russ</name>
<affiliation>
<nlm:aff id="a35">Department of Bioengineering, UC Berkeley, Berkeley, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Schwarz, Erich" sort="Schwarz, Erich" uniqKey="Schwarz E" first="Erich" last="Schwarz">Erich Schwarz</name>
<affiliation>
<nlm:aff id="a36">Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Scott, Camille" sort="Scott, Camille" uniqKey="Scott C" first="Camille" last="Scott">Camille Scott</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Seaman, Josiah" sort="Seaman, Josiah" uniqKey="Seaman J" first="Josiah" last="Seaman">Josiah Seaman</name>
<affiliation>
<nlm:aff id="a37">Data Visualization, Newline Technical Innovations, Windsor, CO, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sievert, Scott" sort="Sievert, Scott" uniqKey="Sievert S" first="Scott" last="Sievert">Scott Sievert</name>
<affiliation>
<nlm:aff id="a38">Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Simpson, Jared" sort="Simpson, Jared" uniqKey="Simpson J" first="Jared" last="Simpson">Jared Simpson</name>
<affiliation>
<nlm:aff id="a39">Ontario Institute for Cancer Research, Toronto, ON, Canada</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a40">Computer Science, University of Toronto, Toronto, ON, Canada</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Skennerton, Connor T" sort="Skennerton, Connor T" uniqKey="Skennerton C" first="Connor T." last="Skennerton">Connor T. Skennerton</name>
<affiliation>
<nlm:aff id="a41">Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Spencer, James" sort="Spencer, James" uniqKey="Spencer J" first="James" last="Spencer">James Spencer</name>
<affiliation>
<nlm:aff id="a42">Dept of Physics and Dept of Materials, Imperial College London, London, UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Srinivasan, Ramakrishnan" sort="Srinivasan, Ramakrishnan" uniqKey="Srinivasan R" first="Ramakrishnan" last="Srinivasan">Ramakrishnan Srinivasan</name>
<affiliation>
<nlm:aff id="a43">Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Standage, Daniel" sort="Standage, Daniel" uniqKey="Standage D" first="Daniel" last="Standage">Daniel Standage</name>
<affiliation>
<nlm:aff id="a44">Department of Biology, Indiana University, Bloomington, IN, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a45">Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stapleton, James A" sort="Stapleton, James A" uniqKey="Stapleton J" first="James A." last="Stapleton">James A. Stapleton</name>
<affiliation>
<nlm:aff id="a46">Chemical Engineering &amp; Materials Science, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Steinman, Susan R" sort="Steinman, Susan R" uniqKey="Steinman S" first="Susan R." last="Steinman">Susan R. Steinman</name>
<affiliation>
<nlm:aff id="a47">The New York Eye and Ear Infirmary of Mount Sinai, New York, NY, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Stein, Joe" sort="Stein, Joe" uniqKey="Stein J" first="Joe" last="Stein">Joe Stein</name>
<affiliation>
<nlm:aff id="a48">Independent Researcher, Providence, RI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Taylor, Benjamin" sort="Taylor, Benjamin" uniqKey="Taylor B" first="Benjamin" last="Taylor">Benjamin Taylor</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Trimble, Will" sort="Trimble, Will" uniqKey="Trimble W" first="Will" last="Trimble">Will Trimble</name>
<affiliation>
<nlm:aff id="a49">Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wiencko, Heather L" sort="Wiencko, Heather L" uniqKey="Wiencko H" first="Heather L." last="Wiencko">Heather L. Wiencko</name>
<affiliation>
<nlm:aff id="a50">Department of Genetics, Smurfit Institute, Trinity College Dublin, Dublin, Ireland</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wright, Michael" sort="Wright, Michael" uniqKey="Wright M" first="Michael" last="Wright">Michael Wright</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wyss, Brian" sort="Wyss, Brian" uniqKey="Wyss B" first="Brian" last="Wyss">Brian Wyss</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Qingpeng" sort="Zhang, Qingpeng" uniqKey="Zhang Q" first="Qingpeng" last="Zhang">Qingpeng Zhang</name>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zyme, En" sort="Zyme, En" uniqKey="Zyme E" first="En" last="Zyme">En Zyme</name>
<affiliation>
<nlm:aff id="a51">Independent Researcher, Boston, MA, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Brown, C Titus" sort="Brown, C Titus" uniqKey="Brown C" first="C. Titus" last="Brown">C. Titus Brown</name>
<affiliation>
<nlm:aff id="a1">Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a3">Population Health and Reproduction, University of California, Davis, Davis, CA, USA</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="a11">Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">F1000Research</title>
<idno type="eISSN">2046-1402</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at 
<ext-link ext-link-type="uri" xlink:href="https://github.com/dib-lab/khmer/">https://github.com/dib-lab/khmer/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
<author>
<name sortKey="Pell, J" uniqKey="Pell J">J Pell</name>
</author>
<author>
<name sortKey="Canino Koning, R" uniqKey="Canino Koning R">R Canino-Koning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pell, J" uniqKey="Pell J">J Pell</name>
</author>
<author>
<name sortKey="Hintze, A" uniqKey="Hintze A">A Hintze</name>
</author>
<author>
<name sortKey="Canino Koning, R" uniqKey="Canino Koning R">R Canino-Koning</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
<author>
<name sortKey="Howe, A" uniqKey="Howe A">A Howe</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q Zhang</name>
</author>
<author>
<name sortKey="Awad, S" uniqKey="Awad S">S Awad</name>
</author>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Doring, A" uniqKey="Doring A">A Döring</name>
</author>
<author>
<name sortKey="Weese, D" uniqKey="Weese D">D Weese</name>
</author>
<author>
<name sortKey="Rausch, T" uniqKey="Rausch T">T Rausch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crusoe, Mr" uniqKey="Crusoe M">MR Crusoe</name>
</author>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
<author>
<name sortKey="Crusoe, Mr" uniqKey="Crusoe M">MR Crusoe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lowe, Ek" uniqKey="Lowe E">EK Lowe</name>
</author>
<author>
<name sortKey="Swalla, Bj" uniqKey="Swalla B">BJ Swalla</name>
</author>
<author>
<name sortKey="Brown, Ct" uniqKey="Brown C">CT Brown</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Preston Werner, T" uniqKey="Preston Werner T">T Preston-Werner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hcm" uniqKey="Leung H">HCM Leung</name>
</author>
<author>
<name sortKey="Yiu, Sm" uniqKey="Yiu S">SM Yiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author>
<name sortKey="Papanicolaou, A" uniqKey="Papanicolaou A">A Papanicolaou</name>
</author>
<author>
<name sortKey="Yassour, M" uniqKey="Yassour M">M Yassour</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
<author>
<name sortKey="Nurk, S" uniqKey="Nurk S">S Nurk</name>
</author>
<author>
<name sortKey="Antipov, D" uniqKey="Antipov D">D Antipov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flajolet, P" uniqKey="Flajolet P">P Flajolet</name>
</author>
<author>
<name sortKey="Fusy, E" uniqKey="Fusy E">E Fusy</name>
</author>
<author>
<name sortKey="Gandouet, O" uniqKey="Gandouet O">O Gandouet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Howe, Ac" uniqKey="Howe A">AC Howe</name>
</author>
<author>
<name sortKey="Jansson, Jk" uniqKey="Jansson J">JK Jansson</name>
</author>
<author>
<name sortKey="Malfatti, Sa" uniqKey="Malfatti S">SA Malfatti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crusoe, Mr" uniqKey="Crusoe M">MR Crusoe</name>
</author>
<author>
<name sortKey="Alameldin, Hf" uniqKey="Alameldin H">HF Alameldin</name>
</author>
<author>
<name sortKey="Awad, S" uniqKey="Awad S">S Awad</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">F1000Res</journal-id>
<journal-id journal-id-type="iso-abbrev">F1000Res</journal-id>
<journal-id journal-id-type="pmc">F1000Research</journal-id>
<journal-title-group>
<journal-title>F1000Research</journal-title>
</journal-title-group>
<issn pub-type="epub">2046-1402</issn>
<publisher>
<publisher-name>F1000Research</publisher-name>
<publisher-loc>London, UK</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">26535114</article-id>
<article-id pub-id-type="pmc">4608353</article-id>
<article-id pub-id-type="doi">10.12688/f1000research.6924.1</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Software Tool Article</subject>
</subj-group>
<subj-group>
<subject>Articles</subject>
<subj-group>
<subject>Bioinformatics</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>The khmer software package: enabling efficient nucleotide sequence analysis</article-title>
<fn-group content-type="pub-status">
<fn>
<p>[version 1; referees: 2 approved]</p>
</fn>
</fn-group>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Crusoe</surname>
<given-names>Michael R.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Alameldin</surname>
<given-names>Hussien F.</given-names>
</name>
<xref ref-type="aff" rid="a2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Awad</surname>
<given-names>Sherine</given-names>
</name>
<xref ref-type="aff" rid="a3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Boucher</surname>
<given-names>Elmar</given-names>
</name>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Caldwell</surname>
<given-names>Adam</given-names>
</name>
<xref ref-type="aff" rid="a5">5</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cartwright</surname>
<given-names>Reed</given-names>
</name>
<xref ref-type="aff" rid="a6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Charbonneau</surname>
<given-names>Amanda</given-names>
</name>
<xref ref-type="aff" rid="a7">7</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Constantinides</surname>
<given-names>Bede</given-names>
</name>
<xref ref-type="aff" rid="a8">8</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Edvenson</surname>
<given-names>Greg</given-names>
</name>
<xref ref-type="aff" rid="a9">9</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fay</surname>
<given-names>Scott</given-names>
</name>
<xref ref-type="aff" rid="a10">10</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fenton</surname>
<given-names>Jacob</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fenzl</surname>
<given-names>Thomas</given-names>
</name>
<xref ref-type="aff" rid="a12">12</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Fish</surname>
<given-names>Jordan</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Garcia-Gutierrez</surname>
<given-names>Leonor</given-names>
</name>
<xref ref-type="aff" rid="a13">13</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Garland</surname>
<given-names>Phillip</given-names>
</name>
<xref ref-type="aff" rid="a14">14</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gluck</surname>
<given-names>Jonathan</given-names>
</name>
<xref ref-type="aff" rid="a15">15</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>González</surname>
<given-names>Iván</given-names>
</name>
<xref ref-type="aff" rid="a16">16</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guermond</surname>
<given-names>Sarah</given-names>
</name>
<xref ref-type="aff" rid="a17">17</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Guo</surname>
<given-names>Jiarong</given-names>
</name>
<xref ref-type="aff" rid="a18">18</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Gupta</surname>
<given-names>Aditi</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Herr</surname>
<given-names>Joshua R.</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Howe</surname>
<given-names>Adina</given-names>
</name>
<xref ref-type="aff" rid="a19">19</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hyer</surname>
<given-names>Alex</given-names>
</name>
<xref ref-type="aff" rid="a20">20</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Härpfer</surname>
<given-names>Andreas</given-names>
</name>
<xref ref-type="aff" rid="a21">21</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Irber</surname>
<given-names>Luiz</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kidd</surname>
<given-names>Rhys</given-names>
</name>
<xref ref-type="aff" rid="a22">22</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lin</surname>
<given-names>David</given-names>
</name>
<xref ref-type="aff" rid="a23">23</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lippi</surname>
<given-names>Justin</given-names>
</name>
<xref ref-type="aff" rid="a24">24</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mansour</surname>
<given-names>Tamer</given-names>
</name>
<xref ref-type="aff" rid="a3">3</xref>
<xref ref-type="aff" rid="a25">25</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>McA'Nulty</surname>
<given-names>Pamela</given-names>
</name>
<xref ref-type="aff" rid="a26">26</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>McDonald</surname>
<given-names>Eric</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mizzi</surname>
<given-names>Jessica</given-names>
</name>
<xref ref-type="aff" rid="a27">27</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Murray</surname>
<given-names>Kevin D.</given-names>
</name>
<xref ref-type="aff" rid="a28">28</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nahum</surname>
<given-names>Joshua R.</given-names>
</name>
<xref ref-type="aff" rid="a29">29</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nanlohy</surname>
<given-names>Kaben</given-names>
</name>
<xref ref-type="aff" rid="a30">30</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nederbragt</surname>
<given-names>Alexander Johan</given-names>
</name>
<xref ref-type="aff" rid="a31">31</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ortiz-Zuazaga</surname>
<given-names>Humberto</given-names>
</name>
<xref ref-type="aff" rid="a32">32</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ory</surname>
<given-names>Jeramia</given-names>
</name>
<xref ref-type="aff" rid="a33">33</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pell</surname>
<given-names>Jason</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pepe-Ranney</surname>
<given-names>Charles</given-names>
</name>
<xref ref-type="aff" rid="a34">34</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Russ</surname>
<given-names>Zachary N.</given-names>
</name>
<xref ref-type="aff" rid="a35">35</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Schwarz</surname>
<given-names>Erich</given-names>
</name>
<xref ref-type="aff" rid="a36">36</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Scott</surname>
<given-names>Camille</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Seaman</surname>
<given-names>Josiah</given-names>
</name>
<xref ref-type="aff" rid="a37">37</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sievert</surname>
<given-names>Scott</given-names>
</name>
<xref ref-type="aff" rid="a38">38</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Simpson</surname>
<given-names>Jared</given-names>
</name>
<xref ref-type="aff" rid="a39">39</xref>
<xref ref-type="aff" rid="a40">40</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Skennerton</surname>
<given-names>Connor T.</given-names>
</name>
<xref ref-type="aff" rid="a41">41</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Spencer</surname>
<given-names>James</given-names>
</name>
<xref ref-type="aff" rid="a42">42</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Srinivasan</surname>
<given-names>Ramakrishnan</given-names>
</name>
<xref ref-type="aff" rid="a43">43</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Standage</surname>
<given-names>Daniel</given-names>
</name>
<xref ref-type="aff" rid="a44">44</xref>
<xref ref-type="aff" rid="a45">45</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Stapleton</surname>
<given-names>James A.</given-names>
</name>
<xref ref-type="aff" rid="a46">46</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Steinman</surname>
<given-names>Susan R.</given-names>
</name>
<xref ref-type="aff" rid="a47">47</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Stein</surname>
<given-names>Joe</given-names>
</name>
<xref ref-type="aff" rid="a48">48</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Taylor</surname>
<given-names>Benjamin</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Trimble</surname>
<given-names>Will</given-names>
</name>
<xref ref-type="aff" rid="a49">49</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wiencko</surname>
<given-names>Heather L.</given-names>
</name>
<xref ref-type="aff" rid="a50">50</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wright</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wyss</surname>
<given-names>Brian</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Qingpeng</given-names>
</name>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>zyme</surname>
<given-names>en</given-names>
</name>
<xref ref-type="aff" rid="a51">51</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Brown</surname>
<given-names>C. Titus</given-names>
</name>
<xref ref-type="corresp" rid="c1">a</xref>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a3">3</xref>
<xref ref-type="aff" rid="a11">11</xref>
</contrib>
<aff id="a1">
<label>1</label>
Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a2">
<label>2</label>
Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a3">
<label>3</label>
Population Health and Reproduction, University of California, Davis, Davis, CA, USA</aff>
<aff id="a4">
<label>4</label>
Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA</aff>
<aff id="a5">
<label>5</label>
Biology Department, San Jose State University, San Jose, CA, USA</aff>
<aff id="a6">
<label>6</label>
School of Life Sciences and The Biodesign Institute, Arizona State University, Tempe, AZ, USA</aff>
<aff id="a7">
<label>7</label>
Genetics, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a8">
<label>8</label>
Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, UK</aff>
<aff id="a9">
<label>9</label>
Micron Technology, Seattle, WA, USA</aff>
<aff id="a10">
<label>10</label>
Invitae, San Francisco, CA, USA</aff>
<aff id="a11">
<label>11</label>
Computer Science and Engineering, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a12">
<label>12</label>
Independent Researcher, Munich, Germany</aff>
<aff id="a13">
<label>13</label>
Mathematics Institute, University of Warwick, Warwick, UK</aff>
<aff id="a14">
<label>14</label>
Eastlake Data, Seattle, WA, USA</aff>
<aff id="a15">
<label>15</label>
Graduate Program, University of Maryland, College Park, MD, USA</aff>
<aff id="a16">
<label>16</label>
Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA</aff>
<aff id="a17">
<label>17</label>
Independent Researcher, Seattle, WA, USA</aff>
<aff id="a18">
<label>18</label>
Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a19">
<label>19</label>
Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, IA, USA</aff>
<aff id="a20">
<label>20</label>
Department of Biology, University of Utah, Salt Lake City, UT, USA</aff>
<aff id="a21">
<label>21</label>
ConSol* Software GmbH, Munchen, Germany</aff>
<aff id="a22">
<label>22</label>
Independent Researcher, Sydney, Australia</aff>
<aff id="a23">
<label>23</label>
Verdematics, Fremont, CA, USA</aff>
<aff id="a24">
<label>24</label>
Independent Researcher, San Francisco, CA, USA</aff>
<aff id="a25">
<label>25</label>
Clinical Pathology, Mansoura University, Mansoura, Egypt</aff>
<aff id="a26">
<label>26</label>
Addgene, Cambridge, MA, USA</aff>
<aff id="a27">
<label>27</label>
Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a28">
<label>28</label>
ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, ACT, Australia</aff>
<aff id="a29">
<label>29</label>
BEACON Center, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a30">
<label>30</label>
Independent Researcher, New Orleans, LA, USA</aff>
<aff id="a31">
<label>31</label>
Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University of Oslo, Oslo, Norway</aff>
<aff id="a32">
<label>32</label>
Department of Computer Science, Rio Piedras Campus, University of Puerto Rico, San Juan, Puerto Rico</aff>
<aff id="a33">
<label>33</label>
Biochemistry, St. Louis College of Pharmacy, St. Louis, MO, USA</aff>
<aff id="a34">
<label>34</label>
Crop and Soil Sciences, Cornell University, Ithaca, NY, USA</aff>
<aff id="a35">
<label>35</label>
Department of Bioengineering, UC Berkeley, Berkeley, CA, USA</aff>
<aff id="a36">
<label>36</label>
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA</aff>
<aff id="a37">
<label>37</label>
Data Visualization, Newline Technical Innovations, Windsor, CO, USA</aff>
<aff id="a38">
<label>38</label>
Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA</aff>
<aff id="a39">
<label>39</label>
Ontario Institute for Cancer Research, Toronto, ON, Canada</aff>
<aff id="a40">
<label>40</label>
Computer Science, University of Toronto, Toronto, ON, Canada</aff>
<aff id="a41">
<label>41</label>
Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA</aff>
<aff id="a42">
<label>42</label>
Dept of Physics and Dept of Materials, Imperial College London, London, UK</aff>
<aff id="a43">
<label>43</label>
Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA</aff>
<aff id="a44">
<label>44</label>
Department of Biology, Indiana University, Bloomington, IN, USA</aff>
<aff id="a45">
<label>45</label>
Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA, USA</aff>
<aff id="a46">
<label>46</label>
Chemical Engineering &amp; Materials Science, Michigan State University, East Lansing, MI, USA</aff>
<aff id="a47">
<label>47</label>
The New York Eye and Ear Infirmary of Mount Sinai, New York, NY, USA</aff>
<aff id="a48">
<label>48</label>
Independent Researcher, Providence, RI, USA</aff>
<aff id="a49">
<label>49</label>
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA</aff>
<aff id="a50">
<label>50</label>
Department of Genetics, Smurfit Institute, Trinity College Dublin, Dublin, Ireland</aff>
<aff id="a51">
<label>51</label>
Independent Researcher, Boston, MA, USA</aff>
</contrib-group>
<author-notes>
<corresp id="c1">
<label>a</label>
<email xlink:href="mailto:titus@idyll.org">titus@idyll.org</email>
</corresp>
<fn fn-type="con">
<p>CTB is the primary investigator for the khmer software package. MRC is the lead software developer from July 2013 onwards. Many significant components of khmer have their own paper describing them (see “Use Cases”, above). The remaining authors each have one or more Git commits in their name.</p>
</fn>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>25</day>
<month>9</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>4</volume>
<elocation-id>900</elocation-id>
<history>
<date date-type="accepted">
<day>25</day>
<month>9</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: © 2015 Crusoe MR et al.</copyright-statement>
<copyright-year>2015</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="f1000research-4-7456.pdf"></self-uri>
<abstract>
<p>The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at 
<ext-link ext-link-type="uri" xlink:href="https://github.com/dib-lab/khmer/">https://github.com/dib-lab/khmer/</ext-link>
.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>bioinformatics</kwd>
<kwd>dna sequencing analysis</kwd>
<kwd>k-mer</kwd>
<kwd>kmer</kwd>
<kwd>khmer</kwd>
<kwd>online</kwd>
<kwd>low-memory</kwd>
<kwd>streaming</kwd>
</kwd-group>
<funding-group>
<award-group id="fund-1">
<funding-source>USDA NIFA</funding-source>
<award-id>2010-65205-20361</award-id>
</award-group>
<award-group id="fund-2">
<funding-source>National Institutes of Health</funding-source>
<award-id>R01HG007513</award-id>
</award-group>
<award-group id="fund-3">
<funding-source>Gordon and Betty Moore Foundation</funding-source>
<award-id>GBMF4551</award-id>
</award-group>
<funding-statement>khmer development has largely been supported by AFRI Competitive Grant no. 2010-65205-20361 from the USDA NIFA, and is now funded by the National Human Genome Research Institute of the National Institutes of Health under Award Number R01HG007513, as well as by the Gordon and Betty Moore Foundation under Award Number GBMF4551, all to CTB.</funding-statement>
<funding-statement>
<italic>I confirm that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>DNA words of a fixed length k, or “k-mers”, are a common abstraction in DNA sequence analysis that enables alignment-free sequence analysis and comparison. With the advent of second-generation sequencing and the widespread adoption of De Bruijn graph-based assemblers, k-mers have become even more widely used in recent years. However, the dramatically increased rate of sequence data generation from Illumina sequencers continues to challenge the basic data structures and algorithms for k-mer storage and manipulation. This has led to the development of a wide range of data structures and algorithms that explore possible improvements to k-mer-based approaches.</p>
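<p>To make the k-mer abstraction concrete, the following minimal Python sketch splits a sequence into its overlapping k-mers and compares two sequences by the k-mers they share, without any alignment. The function names (kmers, shared_kmer_fraction) and the choice of Jaccard similarity are assumptions made for illustration; they are not part of the khmer API.</p>
<preformat>
```python
def kmers(sequence, k):
    """Yield every overlapping k-mer of `sequence`, from left to right.

    Hypothetical helper for illustration; not a khmer function.
    """
    if k >= 1:
        # range() is empty when k exceeds the sequence length
        for start in range(len(sequence) - k + 1):
            yield sequence[start:start + k]


def shared_kmer_fraction(seq_a, seq_b, k):
    """Jaccard similarity of the k-mer sets of two sequences.

    An alignment-free comparison: only k-mer membership is used.
    """
    set_a = set(kmers(seq_a, k))
    set_b = set(kmers(seq_b, k))
    union = set_a.union(set_b)
    if not union:
        return 0.0
    return len(set_a.intersection(set_b)) / len(union)
```
</preformat>
<p>For example, the 3-mers of ACGTA are ACG, CGT and GTA, and ACGTA and ACGTT share two of their four distinct 3-mers, giving a similarity of 0.5.</p>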
<p>Here we present version 2.0 of the khmer software package, a high-performance library implementing memory- and time-efficient algorithms for the manipulation and analysis of short-read data sets. khmer contains reference implementations of several approaches, including a probabilistic k-mer counter based on the CountMin Sketch
<sup>
<xref rid="ref-1" ref-type="bibr">1</xref>
</sup>
, a compressible De Bruijn graph representation built on top of Bloom filters
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
, a streaming lossy compression approach for short-read data sets termed “digital normalization”
<sup>
<xref rid="ref-3" ref-type="bibr">3</xref>
</sup>
, and a generalized semi-streaming approach for k-mer spectral analysis of variable-coverage shotgun sequencing data sets
<sup>
<xref rid="ref-4" ref-type="bibr">4</xref>
</sup>
.</p>
<p>khmer is both research software and a software product for users: it has been used in the development of novel data structures and algorithms, and it is also immediately useful for certain kinds of data analysis (discussed below). We continue to develop research extensions while maintaining existing functionality.</p>
<p>The khmer software consists of a core library implemented in C++, a CPython library wrapper implemented in C, and a set of Python “driver” scripts that make use of the library to perform various sequence analysis tasks. The software is currently developed on GitHub under
<ext-link ext-link-type="uri" xlink:href="https://github.com/dib-lab/khmer">https://github.com/dib-lab/khmer</ext-link>
, and it is released under the BSD License. Automated tests achieve greater than 87% statement coverage, measured on both the C++ and Python code; the tests are primarily executed at the Python level.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>Implementation</title>
<p>The core k-mer counting data structures and graph traversal code are implemented in C++, and then wrapped for Python in hand-written C code, for a total of 10.5k lines of C/C++ code. The command-line API and all of the tests are written in 13.7k lines of Python code. C++ FASTQ and FASTA parsers came from the SeqAn library
<sup>
<xref rid="ref-5" ref-type="bibr">5</xref>
</sup>
.</p>
<p>Documentation is written in reStructuredText, compiled with Sphinx, and hosted on ReadTheDocs.org.</p>
<p>We develop khmer on github.com as a community open source project focused on sustainable software development
<sup>
<xref rid="ref-6" ref-type="bibr">6</xref>
</sup>
, and encourage contributions of any kind. As an outcome of several community events, we have comprehensive documentation on contributing to khmer at
<ext-link ext-link-type="uri" xlink:href="https://khmer.readthedocs.org/en/latest/dev/">https://khmer.readthedocs.org/en/latest/dev/</ext-link>
<sup>
<xref rid="ref-7" ref-type="bibr">7</xref>
</sup>
. Most development decisions are discussed and documented publicly as they happen.</p>
</sec>
<sec>
<title>Operation</title>
<p>khmer is primarily developed on Linux for Python 2.7 and 64-bit processors, and several core developers use Mac OS X. The project is tested regularly using the Jenkins continuous integration system running on Ubuntu 14.04 LTS and Mac OS X 10.10; the current development branch is also tested under Python 3.3, 3.4, and 3.5. Releases are tested against many Linux distributions, including Red Hat Enterprise Linux, Debian, Fedora, and Ubuntu. khmer should work on most UNIX derivatives with little modification. Windows is explicitly not supported.</p>
<p>Memory requirements for using khmer vary with the complexity of data and are user configurable. Several core data structures can trade memory for false positives, and we have explored these details in several papers, most notably Pell
<italic>et al.</italic>
2012
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
and Zhang
<italic>et al.</italic>
2014
<sup>
<xref rid="ref-1" ref-type="bibr">1</xref>
</sup>
. For example, most single organism mRNAseq data sets can be processed in under 16 GB of RAM
<sup>
<xref rid="ref-3" ref-type="bibr">3</xref>
,
<xref rid="ref-8" ref-type="bibr">8</xref>
</sup>
, while memory requirements for metagenome data sets may vary from dozens of gigabytes to terabytes of RAM.</p>
<p>The user interface for khmer is via the command line. The command line interface consists of approximately 25 Python scripts; they are documented at
<ext-link ext-link-type="uri" xlink:href="http://khmer.readthedocs.org/">http://khmer.readthedocs.org/</ext-link>
under
<ext-link ext-link-type="uri" xlink:href="http://khmer.readthedocs.org/en/v2.0/user/scripts.html">User Documentation</ext-link>
. Changes to the interface are managed with semantic versioning
<sup>
<xref rid="ref-9" ref-type="bibr">9</xref>
</sup>
, which guarantees command-line compatibility between releases with the same major version.</p>
<p>khmer also has an unstable developer interface via its Python and C++ libraries, on which the command line scripts are built.</p>
</sec>
</sec>
<sec>
<title>Use cases</title>
<p>khmer has several complementary feature sets, all centered on short-read manipulation and filtering. The most common use of khmer is for preprocessing short-read Illumina data sets prior to
<italic>de novo</italic>
sequence assembly, with the goals of decreasing compute requirements for the assembler as well as potentially improving the assembly results.</p>
<sec>
<title>Prefiltering sequence data for
<italic>de novo</italic>
assembly with digital normalization</title>
<p>We provide an implementation of a novel streaming “lossy compression” algorithm in khmer that performs abundance normalization of shotgun sequence data. This “digital normalization” algorithm eliminates redundant short reads while retaining sufficient information to generate a contig assembly
<sup>
<xref rid="ref-3" ref-type="bibr">3</xref>
</sup>
. The algorithm takes advantage of the online k-mer counting functionality in khmer to estimate per-read coverage as reads are examined; reads can then be accepted as novel or rejected as redundant. This is a form of error reduction, because the net effect is to decrease not only the total number of reads considered for assembly, but also the total number of errors considered by the assembler. Digital normalization results in a decrease of the amount of memory needed for
<italic>de novo</italic>
assembly of high-coverage data sets with little to no change in the assembled contigs.</p>
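<p>The accept-or-reject decision described above can be illustrated with a minimal Python sketch. It is not khmer's implementation: an exact in-memory counter stands in for the probabilistic k-mer counter, reverse complements are ignored, reads shorter than k are simply skipped, and the function names and the default values for k and the coverage cutoff are chosen only for illustration.</p>
<preformat>
```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if its median k-mer count is still below cutoff.

    Assumption: exact counts in a Counter replace khmer's probabilistic
    counter, so this sketch does not reproduce its memory behaviour.
    """
    counts = Counter()
    kept = []
    for read in reads:
        words = list(kmers(read, k))
        if not words:
            continue  # read shorter than k: nothing to estimate from
        if median(counts[w] for w in words) >= cutoff:
            continue  # estimated coverage already reached: redundant read
        counts.update(words)  # only accepted reads contribute to the counts
        kept.append(read)
    return kept
```
</preformat>
<p>In this sketch, ten copies of the same read with a cutoff of 3 yield three retained copies, while a read made of previously unseen k-mers is always retained.</p>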
<p>Digital normalization is implemented in the script
<monospace>normalize-by-median.py</monospace>
. This script takes as input a list of FASTA or FASTQ files, which it then filters by abundance as described above; see
<xref rid="ref-3" ref-type="bibr">3</xref>
for details. The output of the digital normalization script is a downsampled set of reads, with no modifications to the individual reads. The three key parameters for the script are the k-mer size, the desired coverage level, and the amount of memory to be used for k-mer counting. The interaction between these three parameters and the filtering process is complex and depends on the data set being processed, but higher coverage levels and longer k-mer sizes result in less data being removed. Lower memory allocation increases the rate at which reads are removed due to erroneous estimates of their abundance, but this process is very robust in practice
<sup>
<xref rid="ref-1" ref-type="bibr">1</xref>
</sup>
.</p>
<p>The output of
<monospace>normalize-by-median.py</monospace>
can be assembled using a
<italic>de novo</italic>
assembler such as Velvet
<sup>
<xref rid="ref-10" ref-type="bibr">10</xref>
</sup>
, IDBA
<sup>
<xref rid="ref-11" ref-type="bibr">11</xref>
</sup>
, Trinity
<sup>
<xref rid="ref-12" ref-type="bibr">12</xref>
</sup>
or SPAdes
<sup>
<xref rid="ref-13" ref-type="bibr">13</xref>
</sup>
.</p>
</sec>
<sec>
<title>K-mer counting and read trimming</title>
<p>Using a memory-efficient CountMin Sketch data structure, khmer provides an interface for online counting of k-mers in streams of reads. The basic functionality includes calculating the k-mer frequency spectrum in sequence data sets and trimming reads at low-abundance k-mers. This functionality is explored and benchmarked in
<sup>
<xref rid="ref-1" ref-type="bibr">1</xref>
</sup>
.</p>
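<p>As a rough illustration of the idea behind a CountMin Sketch, the following Python sketch counts k-mers in a fixed number of fixed-width tables. It is a simplified stand-in rather than khmer's C++ data structure: the class and function names, the table dimensions, and the use of salted BLAKE2 hashes to pick one slot per table are all assumptions made for this example, and reverse complements are not merged.</p>
<preformat>
```python
import hashlib


class CountMinSketch:
    """Count items in fixed memory; estimates may be too high, never too low."""

    def __init__(self, width=1000, depth=4):
        self.width = width
        self.depth = depth  # must stay below 256 for the one-byte salt used here
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One slot per table, derived from a differently salted hash per row.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item):
        # Collisions can only inflate a slot, so the minimum is the best estimate.
        return min(self.tables[row][col] for row, col in self._slots(item))


def count_kmers(reads, k, sketch):
    """Stream over the reads once, adding every k-mer to the sketch."""
    for read in reads:
        for i in range(len(read) - k + 1):
            sketch.add(read[i:i + k])
```
</preformat>
<p>Making the tables narrower saves memory at the price of more collisions and therefore more overestimated counts, which is the memory versus false-positive trade-off discussed under Operation.</p>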
<p>Basic read trimming is performed by the script
<monospace>filter-abund.py</monospace>
, which takes as arguments a k-mer countgraph (created by khmer’s
<monospace>load-into-counting.py</monospace>
script) and one or more sequence data files. The script examines each sequence to find k-mers below the given abundance cutoff, and truncates the sequence at the first such k-mer. This truncates reads at the location of substitution errors produced by the sequencing process. When processing sequences from variable coverage data sets,
<monospace>filter-abund.py</monospace>
can also be configured to ignore reads that have low estimated abundance.</p>
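<p>The trimming rule can be sketched in a few lines of Python. The function name, the <monospace>get_count</monospace> callback, and the exact cut position (just before the last base of the first low-abundance k-mer) are assumptions of this illustration and may differ in detail from what <monospace>filter-abund.py</monospace> does.</p>
<preformat>
```python
def trim_at_low_abundance(seq, k, get_count, cutoff=2):
    """Return seq truncated at the first k-mer whose count is below cutoff.

    get_count is any callable mapping a k-mer string to its abundance,
    for example a lookup into a previously built counting structure.
    """
    for i in range(len(seq) - k + 1):
        if cutoff > get_count(seq[i:i + k]):
            # Keep everything up to, but not including, the last base
            # of the first low-abundance k-mer.
            return seq[:i + k - 1]
    return seq  # every k-mer met the cutoff: leave the read untouched
```
</preformat>
<p>A substitution error typically creates k-mers that occur only once, so the first k-mer overlapping the error falls below the cutoff and the read is cut just before the erroneous base.</p>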
<p>K-mer abundance distributions can be calculated using the script
<monospace>abundance-dist.py</monospace>
, which takes as arguments a k-mer countgraph, a sequence data file, and an output filename. This script determines the abundance of each distinct k-mer in the data file according to the k-mer countgraph, and summarizes the abundances in a histogram output.</p>
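<p>The histogram computation can be sketched as follows. This is an illustration under stated assumptions rather than the code of <monospace>abundance-dist.py</monospace>: distinct k-mers are tracked in an exact Python set, and the counting structure is represented by a hypothetical <monospace>get_count</monospace> callback.</p>
<preformat>
```python
from collections import Counter


def abundance_histogram(reads, k, get_count):
    """Map abundance -> number of distinct k-mers observed with that abundance."""
    seen = set()
    histogram = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if word not in seen:  # tally each distinct k-mer exactly once
                seen.add(word)
                histogram[get_count(word)] += 1
    return dict(sorted(histogram.items()))
```
</preformat>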
<p>We recently extended digital normalization to provide a generalized semi-streaming approach for k-mer spectral analysis
<sup>
<xref rid="ref-4" ref-type="bibr">4</xref>
</sup>
. Here, we examine read coverage on a per-locus basis in the De Bruijn graph and, once a particular locus has sufficient coverage, call errors or trim bases for all following reads belonging to that graph locus. The approach is “semi-streaming”
<sup>
<xref rid="ref-4" ref-type="bibr">4</xref>
</sup>
because some reads must be examined twice. This semi-streaming approach enables few-pass analysis of high-coverage data sets. Moreover, the approach also makes it possible to apply k-mer spectral analysis to data sets with uneven coverage such as metagenomes, transcriptomes, and whole-genome amplified samples.</p>
<p>Because our core data structure sizes are preallocated based on estimates of the unique k-mer content of the data, we also provide fast and low-memory k-mer cardinality estimation via the script
<monospace>unique-kmers.py</monospace>
. This script uses the HyperLogLog algorithm to provide a probabilistic estimate of the number of unique k-mers in a data set, with a bounded expected relative error
<sup>
<xref rid="ref-14" ref-type="bibr">14</xref>
</sup>
. A manuscript on this implementation is in progress (Irber and Brown, unpublished).</p>
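<p>To illustrate how HyperLogLog estimates cardinality in small, fixed memory, the following Python sketch applies the basic algorithm to k-mers. It is a textbook-style approximation and not the implementation behind <monospace>unique-kmers.py</monospace>: the function name, the BLAKE2 hash, the register count, and the omission of reverse-complement handling are assumptions made for this example.</p>
<preformat>
```python
import hashlib
import math


def estimate_unique_kmers(reads, k, bits=12):
    """Estimate the number of distinct k-mers using 2**bits small registers."""
    m = 2 ** bits
    registers = [0] * m
    for read in reads:
        for i in range(len(read) - k + 1):
            digest = hashlib.blake2b(read[i:i + k].encode(), digest_size=8).digest()
            h = int.from_bytes(digest, "big")
            bucket = h % m   # low bits choose a register
            rest = h // m    # remaining 64 - bits bits
            # rank = 1-based position of the lowest set bit in rest
            rank = (64 - bits) + 1 if rest == 0 else (rest ^ (rest - 1)).bit_length()
            registers[bucket] = max(registers[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if zeros and 2.5 * m >= raw:
        return m * math.log(m / zeros)  # small-range (linear counting) correction
    return raw
```
</preformat>
<p>With 4096 registers the expected relative standard error of the basic algorithm is about 1.04 divided by the square root of 4096, or roughly 1.6%, regardless of how many k-mers are streamed through.</p>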
</sec>
<sec>
<title>Partitioning reads into disconnected assembly graphs</title>
<p>We have also built a De Bruijn graph representation on top of a Bloom filter, and implemented this in khmer. The primary use for this so far has been to enable memory-efficient
<italic>graph partitioning</italic>
, in which reads contributing to disconnected subgraphs are placed into different files. This can lead to an approximately 20-fold decrease in the amount of memory needed for metagenome assembly
<sup>
<xref rid="ref-2" ref-type="bibr">2</xref>
</sup>
, and may also separate reads into species-specific bins
<sup>
<xref rid="ref-15" ref-type="bibr">15</xref>
</sup>
.</p>
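<p>The effect of partitioning can be illustrated with a simplified Python sketch that groups reads with a union-find structure. It is not khmer's algorithm: k-mers are stored exactly in a dictionary instead of a Bloom filter, and two reads are joined only when they share a k-mer, whereas a De Bruijn graph also links k-mers that overlap by k-1 bases. The function name is hypothetical.</p>
<preformat>
```python
def partition_reads(reads, k):
    """Group reads so that reads sharing any k-mer land in the same partition."""
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_owner = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            owner = first_owner.setdefault(read[i:i + k], idx)
            parent[find(idx)] = find(owner)  # merge the two partitions

    groups = {}
    for idx, read in enumerate(reads):
        groups.setdefault(find(idx), []).append(read)
    return list(groups.values())
```
</preformat>
<p>Each returned group could then be written to its own file and assembled independently, which is where the memory saving for the assembler comes from.</p>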
</sec>
<sec>
<title>Reformatting collections of short reads</title>
<p>In support of the streaming nature of this project, our preferred paired-read format is with pairs interleaved in a single file. As an extension of this, we automatically support a “broken-paired” read format where orphaned reads and pairs coexist in a single file. This enables single input/output streaming connections between tools, while leaving our tools compatible with fully paired read files as well as files containing only orphaned reads.</p>
<p>For converting to and from this format, we supply the scripts
<monospace>extract-paired-reads.py</monospace>
,
<monospace>interleave-reads.py</monospace>
, and
<monospace>split-paired-reads.py</monospace>
to respectively extract fully paired reads from sequence files, interleave two files containing read pairs, and split an interleaved file into two files containing read pairs.</p>
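<p>The interleaved and “broken-paired” layouts can be made concrete with a short Python sketch. It operates on in-memory (name, sequence) tuples rather than FASTA or FASTQ files, and it assumes that mates are adjacent and named with “/1” and “/2” suffixes; the function names and this naming convention are assumptions of the example, not a description of the khmer scripts.</p>
<preformat>
```python
def interleave(left, right):
    """Merge two equally ordered record streams into one: L1, R1, L2, R2, ..."""
    for first, second in zip(left, right):
        yield first
        yield second


def split_broken_paired(records):
    """Separate an interleaved, possibly broken-paired stream.

    Records are (name, sequence) tuples; mates are assumed to be adjacent
    and named 'x/1' and 'x/2'. Returns (pairs, orphans).
    """
    pairs, orphans = [], []
    pending = None  # last record still waiting for its mate
    for record in records:
        name = record[0]
        if pending and pending[0].endswith("/1") and name == pending[0][:-2] + "/2":
            pairs.append((pending, record))
            pending = None
        else:
            if pending:
                orphans.append(pending)  # previous record never found a mate
            pending = record
    if pending:
        orphans.append(pending)
    return pairs, orphans
```
</preformat>
<p>Because pairing is decided from adjacent records only, a tool written this way needs a single input stream and constant memory, which is the motivation for the format.</p>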
<p>In addition, we supply several utility scripts that we use in our own work. These include
<monospace>sample-reads-randomly.py</monospace>
for performing reservoir sampling of reads and
<monospace>readstats.py</monospace>
for summarizing sequence files.</p>
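<p>Reservoir sampling draws a fixed-size uniform sample from a stream whose length is not known in advance. A minimal Python sketch of the classic single-pass algorithm follows; the function name and parameters are illustrative and do not mirror the interface of <monospace>sample-reads-randomly.py</monospace>.</p>
<preformat>
```python
import random


def reservoir_sample(records, n, seed=None):
    """Return n records chosen uniformly at random from a stream."""
    rng = random.Random(seed)
    sample = []
    for seen, record in enumerate(records, start=1):
        if n >= seen:
            sample.append(record)       # fill the reservoir first
        else:
            slot = rng.randrange(seen)  # uniform integer in [0, seen)
            if n > slot:
                sample[slot] = record   # replace with probability n / seen
    return sample
```
</preformat>
<p>If the stream holds fewer than n records, all of them are returned; otherwise memory use stays proportional to n no matter how many reads pass through.</p>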
</sec>
</sec>
<sec>
<title>Summary</title>
<p>The khmer project is an increasingly mature open source scientific software project that provides several efficient data structures and algorithms for analyzing short-read nucleotide sequencing data. khmer emphasizes online analysis, low memory data structures and streaming algorithms. khmer continues to be useful for both advancing bioinformatics research and analyzing biological data.</p>
</sec>
<sec>
<title>Software availability</title>
<sec>
<title>Software available from</title>
<p>
<ext-link ext-link-type="uri" xlink:href="https://khmer.readthedocs.org/en/v2.0/">https://khmer.readthedocs.org/en/v2.0/</ext-link>
</p>
</sec>
<sec>
<title>Link to source code</title>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/dib-lab/khmer/releases/tag/v2.0">https://github.com/dib-lab/khmer/releases/tag/v2.0</ext-link>
</p>
</sec>
<sec>
<title>Link to archived source code as at time of publication</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.31258">http://dx.doi.org/10.5281/zenodo.31258</ext-link>
<sup>
<xref rid="ref-16" ref-type="bibr">16</xref>
</sup>
</p>
</sec>
<sec>
<title>Software license</title>
<p>Michael Crusoe: Copyright: 2010–2015, Michigan State University. Copyright: 2015, The Regents of the University of California. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
<list list-type="bullet">
<list-item>
<p>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</p>
</list-item>
<list-item>
<p>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</p>
</list-item>
<list-item>
<p>Neither the name of the Michigan State University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.</p>
</list-item>
</list>
</p>
<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<ref id="ref-1">
<label>1</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Pell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Canino-Koning</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>These are not the k-mers you are looking for: Efficient online k-mer counting using a probabilistic data structure.</article-title>
<source>
<italic>PLoS One.</italic>
</source>
<year>2014</year>
;
<volume>9</volume>
(
<issue>7</issue>
):
<fpage>e101271</fpage>
.
<pub-id pub-id-type="doi">10.1371/journal.pone.0101271</pub-id>
<pmc-comment>4111482</pmc-comment>
<pub-id pub-id-type="pmid">25062443</pub-id>
</mixed-citation>
</ref>
<ref id="ref-2">
<label>2</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hintze</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Canino-Koning</surname>
<given-names>R</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Scaling metagenome sequence assembly with probabilistic de Bruijn graphs.</article-title>
<source>
<italic>Proc Natl Acad Sci U S A.</italic>
</source>
<year>2012</year>
;
<volume>109</volume>
(
<issue>33</issue>
):
<fpage>13272</fpage>
<lpage>7</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.1121464109</pub-id>
<pmc-comment>3421212</pmc-comment>
<pub-id pub-id-type="pmid">22847406</pub-id>
</mixed-citation>
</ref>
<ref id="ref-3">
<label>3</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Howe</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>A reference-free algorithm for computational normalization of shotgun sequencing data</article-title>
. arXiv preprint.
<year>2012</year>
<ext-link ext-link-type="uri" xlink:href="http://arxiv.org/pdf/1203.4802v2.pdf">Reference Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-4">
<label>4</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Awad</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
</person-group>
:
<article-title>Crossing the streams: a framework for streaming analysis of short DNA sequencing reads.</article-title>
<source>
<italic>PeerJ PrePrints.</italic>
</source>
<year>2015</year>
;
<volume>3</volume>
:
<fpage>e1100</fpage>
<pub-id pub-id-type="doi">10.7287/peerj.preprints.890v1</pub-id>
</mixed-citation>
</ref>
<ref id="ref-5">
<label>5</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Döring</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Weese</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rausch</surname>
<given-names>T</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>SeqAn an efficient, generic C++ library for sequence analysis.</article-title>
<source>
<italic>BMC Bioinformatics.</italic>
</source>
<year>2008</year>
;
<volume>9</volume>
(
<issue>1</issue>
):
<fpage>11</fpage>
.
<pub-id pub-id-type="doi">10.1186/1471-2105-9-11</pub-id>
<pmc-comment>2246154</pmc-comment>
<pub-id pub-id-type="pmid">18184432</pub-id>
</mixed-citation>
</ref>
<ref id="ref-6">
<label>6</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crusoe</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
</person-group>
:
<article-title>Walking the talk: adopting and adapting sustainable scientific software development processes in a small biology lab.</article-title>
<source>
<italic>figshare.</italic>
</source>
<year>2013</year>
<pub-id pub-id-type="doi">10.6084/m9.figshare.791567</pub-id>
</mixed-citation>
</ref>
<ref id="ref-7">
<label>7</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
<name>
<surname>Crusoe</surname>
<given-names>MR</given-names>
</name>
</person-group>
:
<article-title>Channeling community contributions to scientific software: a sprint experience.</article-title>
<source>
<italic>figshare.</italic>
</source>
<year>2014</year>
<pub-id pub-id-type="doi">10.6084/m9.figshare.1112541</pub-id>
</mixed-citation>
</ref>
<ref id="ref-8">
<label>8</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lowe</surname>
<given-names>EK</given-names>
</name>
<name>
<surname>Swalla</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>CT</given-names>
</name>
</person-group>
:
<article-title>Evaluating a lightweight transcriptome assembly pipeline on two closely related ascidian species.</article-title>
<source>
<italic>PeerJ Preprints.</italic>
</source>
<year>2014</year>
;
<volume>2</volume>
<pub-id pub-id-type="doi">10.7287/peerj.preprints.505v1</pub-id>
</mixed-citation>
</ref>
<ref id="ref-9">
<label>9</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Preston-Werner</surname>
<given-names>T</given-names>
</name>
</person-group>
:
<article-title>Semantic versioning 2.0.0</article-title>
.
<year>2015</year>
[Online; accessed 3-August-2015].
<ext-link ext-link-type="uri" xlink:href="http://semver.org/spec/v2.0.0.html">Reference Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-10">
<label>10</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zerbino</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
:
<article-title>Velvet: algorithms for
<italic>de novo</italic>
short read assembly using de Bruijn graphs.</article-title>
<source>
<italic>Genome Res.</italic>
</source>
<year>2008</year>
;
<volume>18</volume>
(
<issue>5</issue>
):
<fpage>821</fpage>
<lpage>9</lpage>
.
<pub-id pub-id-type="doi">10.1101/gr.074492.107</pub-id>
<pmc-comment>2336801</pmc-comment>
<pub-id pub-id-type="pmid">18349386</pub-id>
</mixed-citation>
</ref>
<ref id="ref-11">
<label>11</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peng</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Leung</surname>
<given-names>HCM</given-names>
</name>
<name>
<surname>Yiu</surname>
<given-names>SM</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>IDBA–a practical iterative de Bruijn graph
<italic>de novo</italic>
assembler</article-title>
. In
<italic>Research in Computational Molecular Biology</italic>
<year>2010</year>
;
<fpage>426</fpage>
<lpage>440</lpage>
.
<pub-id pub-id-type="doi">10.1007/978-3-642-12683-3_28</pub-id>
</mixed-citation>
</ref>
<ref id="ref-12">
<label>12</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Papanicolaou</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yassour</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>
<italic>De novo</italic>
transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.</article-title>
<source>
<italic>Nat Protoc.</italic>
</source>
<year>2013</year>
;
<volume>8</volume>
(
<issue>8</issue>
):
<fpage>1494</fpage>
<lpage>512</lpage>
.
<pub-id pub-id-type="doi">10.1038/nprot.2013.084</pub-id>
<pmc-comment>3875132</pmc-comment>
<pub-id pub-id-type="pmid">23845962</pub-id>
</mixed-citation>
</ref>
<ref id="ref-13">
<label>13</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bankevich</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nurk</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Antipov</surname>
<given-names>D</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.</article-title>
<source>
<italic>J Comput Biol.</italic>
</source>
<year>2012</year>
;
<volume>19</volume>
(
<issue>5</issue>
):
<fpage>455</fpage>
<lpage>477</lpage>
.
<pub-id pub-id-type="doi">10.1089/cmb.2012.0021</pub-id>
<pmc-comment>3342519</pmc-comment>
<pub-id pub-id-type="pmid">22506599</pub-id>
</mixed-citation>
</ref>
<ref id="ref-14">
<label>14</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Flajolet</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Fusy</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Gandouet</surname>
<given-names>O</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm.</article-title>
<source>
<italic>DMTCS Proceedings.</italic>
</source>
<year>2008</year>
; (
<issue>1</issue>
).
<ext-link ext-link-type="uri" xlink:href="http://www.dmtcs.org/dmtcs-ojs/index.php/proceedings/article/viewArticle/914">Reference Source</ext-link>
</mixed-citation>
</ref>
<ref id="ref-15">
<label>15</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Howe</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Jansson</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Malfatti</surname>
<given-names>SA</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>Tackling soil diversity with the assembly of large, complex metagenomes.</article-title>
<source>
<italic>Proc Natl Acad Sci U S A.</italic>
</source>
<year>2014</year>
;
<volume>111</volume>
(
<issue>13</issue>
):
<fpage>4904</fpage>
<lpage>9</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.1402564111</pub-id>
<pmc-comment>3977251</pmc-comment>
<pub-id pub-id-type="pmid">24632729</pub-id>
</mixed-citation>
</ref>
<ref id="ref-16">
<label>16</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crusoe</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Alameldin</surname>
<given-names>HF</given-names>
</name>
<name>
<surname>Awad</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
:
<article-title>The khmer project v2.0.</article-title>
<source>
<italic>Zenodo.</italic>
</source>
<year>2015</year>
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.31258">Data Source</ext-link>
</mixed-citation>
</ref>
</ref-list>
</back>
<sub-article id="report10508" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.7456.r10508</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Patro</surname>
<given-names>Rob</given-names>
</name>
<xref ref-type="aff" rid="r10508a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r10508a1">
<label>1</label>
Computer Science Department, Stony Brook University, Stony Brook, NY, USA</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>8</day>
<month>10</month>
<year>2015</year>
</pub-date>
<related-article id="d35e1925" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.6924.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>This paper describes version 2 of the khmer software suite.  The software is developed to provide both a set of directly usable tools (e.g. normalize-by-median for digital normalization) as well as an experimental framework for developers looking to design new algorithms and methods.  It has proven very useful on both of these fronts.  The repository is highly watched and starred on GitHub, the developers are very responsive (see more below), and both the senior author's group and other researchers seem to be leveraging this framework to build new tools and algorithms.</p>
<p>The paper itself does a good job of describing the software at a high level, including the overall design and goals.  I would have appreciated slightly more detail about the motivation behind the design decisions, and the tradeoffs they entail (e.g. Why have a Python front-end? Why use hand-written binding code rather than a binding generator, like SWIG, that would allow interfaces to other languages as well?).  I understand that a comprehensive description is not feasible in a manuscript of this length.  It would be very interesting to know, however, the cost paid for using the high-level interface rather than the C++ library directly.  When the underlying computation is trivial, simply having to iterate over an enormous number of things in Python could add non-trivial overhead.  Despite these desiderata, I find that the paper is generally well written and does a good job of describing what a new user might want to know about khmer, and so I approve of this manuscript.</p>
<p>Like Daniel, I also downloaded and built the software using the instructions provided in the ReadTheDocs documentation.  The process was simple, and worked well, with the exception of a minor glitch running the tests.  After debugging the cause of the problem, I posted an issue to the GitHub repository, and received a response in less than a day.  I bring this up because, while not an aspect of the paper itself, good developer support is crucial to the long-term survival and utility of a software package — khmer seems to have this.</p>
<p>This brings me to my final point, about the (currently) controversial authorship policy on this paper, which is ancillary to the quality of the paper (and software) itself.  At this point,
<italic>I must reserve judgement</italic>
on whether I think the authorship policy adopted by this paper is "good" or "bad" (for science, the community, etc.).  Incidentally, this is a dichotomy that does not capture the subtlety or importance of this issue well.  In the manuscript, the authors state "We develop khmer on github.com as a community open source project focused on sustainable software development, and encourage contributions of any kind."  Thus, contributions to khmer are of a potentially wide variety in character (and also, I believe, not simply related to improving or maintaining the code).  Those who contribute to the design, improve the usability, work on documentation, support new and existing users, and develop and propagate best practices are all contributing something valuable to the khmer software "ecosystem".  It is unreasonable to expect a piece of software that is ~25k lines of code (and growing) to be actively developed, maintained, and supported by only a small contingent of people, many of whom may be graduate students soon to graduate and move on.  Thus, if we are interested in the long-term viability and quality of such software, we must adopt a system of credit that values and recognizes a variety of different types of contribution.  On the other hand, I do share the concern that, in the midst of the current authorship system, bestowing that recognition in the form of authorship may have the adverse effect of diminishing the public perception of the very credit one is trying to grant.  Perhaps there is a solution 
<italic>along</italic>
 the lines adopted by this paper, or perhaps something drastically different needs to be considered.</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
</body>
</sub-article>
<sub-article id="report10513" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.7456.r10513</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Katz</surname>
<given-names>Daniel</given-names>
</name>
<xref ref-type="aff" rid="r10513a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r10513a1">
<label>1</label>
Computation Institute, University of Chicago, Chicago, IL, USA</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>6</day>
<month>10</month>
<year>2015</year>
</pub-date>
<related-article id="d35e1986" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.6924.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>Regarding the paper, it is a fairly straightforward description of a software package, containing everything such a paper should have: a description of the goals, the implemented methods, the hardware and software dependencies (systems on which the software has been tested), some guidance on usage, pointers to the software and documentation, and references.</p>
<p>Regarding the software, I downloaded and built it; the build seemed to work, apart from a fair number of warnings.  I was not able to successfully test the software, however, due to issues in 
<ext-link ext-link-type="uri" xlink:href="https://khmer.readthedocs.org/en/v2.0/user/install.html#run-the-tests">https://khmer.readthedocs.org/en/v2.0/user/install.html#run-the-tests</ext-link>.
 Does this mean I should not approve the article?  Or should I ask the authors for help in understanding the error and hold off on submitting this report?</p>
<p>I would have liked to choose "Approved with reservations" as the status of this review, but my reservations are with the F1000 system for this type of paper, not with this specific paper. In fairness to the authors, and given the lack of clarity about what I should be doing as a reviewer of a software paper, I approve this paper based on its quality as a good description of the software, not on the quality of the software (and related documentation) itself.</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
</body>
<sub-article id="comment1643" article-type="response">
<front-stub>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Katz</surname>
<given-names>Daniel S.</given-names>
</name>
<aff>University of Chicago, USA</aff>
</contrib>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
none</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>6</day>
<month>10</month>
<year>2015</year>
</pub-date>
</front-stub>
<body>
<p>In addition to my report, regarding software papers in general under F1000, I believe that much more should be required from their reviewers, and what 
<bold>is required</bold>
 should be made clear.  Software journals (e.g., Ubiquity Press's Journal of Open Research Software, Elsevier's SoftwareX) have specific statements of what a reviewer should do, which say a lot about the expected quality of the review.  For JORS, this is defined on a web page (
<ext-link ext-link-type="uri" xlink:href="http://openresearchsoftware.metajnl.com/about/editorialPolicies/">http://openresearchsoftware.metajnl.com/about/editorialPolicies/</ext-link>
).  For SoftwareX, the criteria are not on the web (as far as I know) but are embedded in the review form/process, and are roughly equivalent.</p>
</body>
</sub-article>
</sub-article>
<sub-article id="report10514" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.7456.r10514</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Birney</surname>
<given-names>Ewan</given-names>
</name>
<xref ref-type="aff" rid="r10514a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r10514a1">
<label>1</label>
European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>5</day>
<month>10</month>
<year>2015</year>
</pub-date>
<related-article id="d35e2076" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.6924.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve-with-reservations</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>This is an update of a widely used tool, khmer, which is based on Bloom filters and is in broad use in the technical community around de Bruijn graphs and short reads. It is a good update, provides a link to the code, and is sensibly written with tests. I have no concerns about the scientific aspects of this paper.</p>
<p>I do find that the author list takes a concept to the extreme, and I don't think it is sensible to have an anonymised author (en zyme) on the list, with in effect no way to attribute this work to a person. Science's openness in publication is also about attribution. Although I understand Titus' consistency in having all git committers as authors, I think it is sensible to make a distinction for substantial/scientific changes, the category into which the vast majority of the authors fall. Acknowledgements are precisely there to handle the other cases.</p>
<p>I believe it is uncontroversial to trim the author list appropriately and to use the acknowledgements for anonymous improvements (which happen regularly in science) and small details (again, a commonplace practice).</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
</body>
</sub-article>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C67 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000C67 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4608353
   |texte=   The khmer software package: enabling efficient nucleotide sequence analysis
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26535114" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021