Barcode identification for single cell genomics
Internal identifier: 000278 (Pmc/Curation); previous: 000277; next: 000279
Authors: Akshay Tambe [United States]; Lior Pachter [United States]
Source:
- BMC Bioinformatics [1471-2105]; 2019.
Abstract
Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes.
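Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime which is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers.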
We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users.
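The observation behind circularization can be illustrated with a small sketch. This is not Sircel's implementation: the function names (`circular_kmers`, `linear_kmers`, `shared_kmers`), the 12-base barcode, and the choice of k = 8 are all assumptions made for illustration. It only shows that a single mismatch can destroy every linear k-mer when k is large relative to the barcode length, while some circular k-mers remain error-free.

```python
def circular_kmers(barcode: str, k: int) -> list[str]:
    """Return all k-mers of the barcode treated as a circular sequence."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and len(barcode)")
    # Appending the first k-1 bases lets a sliding window wrap around the end.
    extended = barcode + barcode[: k - 1]
    return [extended[i : i + k] for i in range(len(barcode))]


def linear_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of the barcode without wrapping around."""
    return [barcode[i : i + k] for i in range(len(barcode) - k + 1)]


def shared_kmers(true_barcode: str, observed: str, k: int, circular: bool) -> int:
    """Count distinct k-mers common to the true and the observed barcode."""
    kmers = circular_kmers if circular else linear_kmers
    return len(set(kmers(true_barcode, k)) & set(kmers(observed, k)))


# Hypothetical 12-base barcode and a read of it with one mismatch (C -> A at index 6).
true_barcode = "ACGTAGCTTGCA"
observed = "ACGTAGATTGCA"

print(shared_kmers(true_barcode, observed, k=8, circular=False))  # 0
print(shared_kmers(true_barcode, observed, k=8, circular=True))   # 4
```

With k = 8 and a 12-base barcode, every one of the five linear windows covers the erroneous base, so none survives. The circular version has twelve windows, four of which avoid the error and could still contribute to a shared de Bruijn graph built across reads.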
DOI: 10.1186/s12859-019-2612-0
PubMed: 30654736
PubMed Central: 6337828
Links to previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: To go to this record in the Curation step: 000278
Links to Exploration step
PMC:6337828
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Barcode identification for single cell genomics</title>
<author><name sortKey="Tambe, Akshay" sort="Tambe, Akshay" uniqKey="Tambe A" first="Akshay" last="Tambe">Akshay Tambe</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Division of Biology and Biological Engineering,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>116 Kerckhoff Laboratory, Pasadena</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Pachter, Lior" sort="Pachter, Lior" uniqKey="Pachter L" first="Lior" last="Pachter">Lior Pachter</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Departments of Biology and Computing &amp; Mathematical Sciences,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>116 Kerckhoff Laboratory, Pasadena</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">30654736</idno>
<idno type="pmc">6337828</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337828</idno>
<idno type="RBID">PMC:6337828</idno>
<idno type="doi">10.1186/s12859-019-2612-0</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000278</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000278</idno>
<idno type="wicri:Area/Pmc/Curation">000278</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000278</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Barcode identification for single cell genomics</title>
<author><name sortKey="Tambe, Akshay" sort="Tambe, Akshay" uniqKey="Tambe A" first="Akshay" last="Tambe">Akshay Tambe</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Division of Biology and Biological Engineering,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>116 Kerckhoff Laboratory, Pasadena</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Pachter, Lior" sort="Pachter, Lior" uniqKey="Pachter L" first="Lior" last="Pachter">Lior Pachter</name>
<affiliation wicri:level="2"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Departments of Biology and Computing &amp; Mathematical Sciences,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>116 Kerckhoff Laboratory, Pasadena</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p id="Par1">Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes.</p>
</sec>
<sec><title>Results</title>
<p id="Par2">Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of <italic>k</italic>
is large relative to the length of the barcode sequence, a regime which is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers.</p>
</sec>
<sec><title>Conclusion</title>
<p id="Par3">We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article"><pmc-dir>properties open_access</pmc-dir>
<front><journal-meta><journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group><journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher><publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta><article-id pub-id-type="pmid">30654736</article-id>
<article-id pub-id-type="pmc">6337828</article-id>
<article-id pub-id-type="publisher-id">2612</article-id>
<article-id pub-id-type="doi">10.1186/s12859-019-2612-0</article-id>
<article-categories><subj-group subj-group-type="heading"><subject>Software</subject>
</subj-group>
</article-categories>
<title-group><article-title>Barcode identification for single cell genomics</article-title>
</title-group>
<contrib-group><contrib contrib-type="author"><name><surname>Tambe</surname>
<given-names>Akshay</given-names>
</name>
<address><email>akshay.tambe@caltech.edu</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes"><name><surname>Pachter</surname>
<given-names>Lior</given-names>
</name>
<address><email>lpachter@caltech.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1"><label>1</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Division of Biology and Biological Engineering,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</aff>
<aff id="Aff2"><label>2</label>
<institution-wrap><institution-id institution-id-type="ISNI">0000000107068890</institution-id>
<institution-id institution-id-type="GRID">grid.20861.3d</institution-id>
<institution>Departments of Biology and Computing &amp; Mathematical Sciences,</institution>
<institution>California Institute of Technology,</institution>
</institution-wrap>
116 Kerckhoff Laboratory, Pasadena, CA 91125 USA</aff>
</contrib-group>
<pub-date pub-type="epub"><day>17</day>
<month>1</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release"><day>17</day>
<month>1</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection"><year>2019</year>
</pub-date>
<volume>20</volume>
<elocation-id>32</elocation-id>
<history><date date-type="received"><day>23</day>
<month>5</month>
<year>2017</year>
</date>
<date date-type="accepted"><day>7</day>
<month>1</month>
<year>2019</year>
</date>
</history>
<permissions><copyright-statement>© The Author(s). 2019</copyright-statement>
<license license-type="OpenAccess"><license-p><bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1"><sec><title>Background</title>
<p id="Par1">Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes.</p>
</sec>
<sec><title>Results</title>
<p id="Par2">Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of <italic>k</italic>
is large relative to the length of the barcode sequence, a regime which is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers.</p>
</sec>
<sec><title>Conclusion</title>
<p id="Par3">We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.</p>
</sec>
<sec><title>Electronic supplementary material</title>
<p>The online version of this article (10.1186/s12859-019-2612-0) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en"><title>Keywords</title>
<kwd>Single-cell</kwd>
<kwd>Barcodes</kwd>
<kwd>Barcode identification</kwd>
<kwd>de Bruijn graph</kwd>
<kwd>Circularization</kwd>
<kwd>K-mer counting</kwd>
</kwd-group>
<funding-group><award-group><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/100000002</institution-id>
<institution>National Institutes of Health</institution>
</institution-wrap>
</funding-source>
</award-group>
</funding-group>
<custom-meta-group><custom-meta><meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2019</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000278 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000278 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Pmc |étape= Curation |type= RBID |clé= PMC:6337828 |texte= Barcode identification for single cell genomics }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i -Sk "pubmed:30654736" \
| HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd \
| NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.