MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Positional bias in variant calls against draft reference assemblies

Internal identifier: 000746 (Pmc/Checkpoint); previous: 000745; next: 000747

Authors: Roman V. Briskine; Kentaro K. Shimizu [Japan]

RBID : PMC:5368935

Abstract

Background

Whole-genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant-calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis.

Results

In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it, because it changes the variants’ relative positions and alters read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements.
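The positional analysis described above amounts to a histogram of variant counts indexed by distance from a sequence end. The Python sketch below is illustrative only and is not the authors' pipeline. It assumes variants are given as (contig, 1-based POS) pairs, as in a VCF, together with a table of contig or scaffold lengths, and it measures distance from the nearer of the two ends, which may differ from how positions were tabulated in the study. The function names and the neighbour-median peak score are hypothetical.

```python
from collections import Counter
from statistics import median


def end_distance(pos: int, contig_len: int) -> int:
    """1-based distance of a variant from the nearer end of its sequence.

    Assumption: pos is a 1-based coordinate (VCF POS), so the first and
    last bases of a contig both have distance 1.
    """
    return min(pos, contig_len - pos + 1)


def positional_histogram(variants, contig_lengths):
    """Count variants by distance from the nearest contig/scaffold end.

    variants: iterable of (contig_name, pos) pairs with 1-based pos.
    contig_lengths: dict mapping contig_name -> sequence length.
    """
    counts = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        if not 1 <= pos <= length:
            raise ValueError(f"{contig}:{pos} lies outside a sequence of length {length}")
        counts[end_distance(pos, length)] += 1
    return counts


def peak_ratio(counts, offset, flank=5):
    """Hypothetical peak score: the count at `offset` divided by the median
    count of the `flank` positions on either side (excluding `offset`)."""
    if flank < 1:
        raise ValueError("flank must be at least 1")
    neighbours = [
        counts.get(offset + d, 0)
        for d in range(-flank, flank + 1)
        if d != 0 and offset + d >= 1
    ]
    background = median(neighbours)
    observed = counts.get(offset, 0)
    if background == 0:
        # No local background: any variants at `offset` are an unbounded excess.
        return float("inf") if observed else 0.0
    return observed / background


if __name__ == "__main__":
    # Toy, made-up input: one 200 bp contig with three variant calls.
    lengths = {"ctg1": 200}
    calls = [("ctg1", 31), ("ctg1", 170), ("ctg1", 100)]
    print(positional_histogram(calls, lengths))  # prints Counter({31: 2, 100: 1})
```

For an assembly built with a known k-mer size and read length, one would inspect the peak score at and around those offsets; since the abstract only says the peaks are *related to* these lengths, scanning a small window around each value is safer than checking a single position.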

Conclusions

Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, which potentially affects many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. We recommend reducing the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
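Filtering out variants that fall in repetitive elements is an interval-overlap test. The sketch below is a minimal illustration under stated assumptions, not the filtering procedure used in the study: repeat annotations are assumed to be BED-style (contig, start, end) records in 0-based half-open coordinates, for example exported from a repeat-masking tool, and variants are (contig, 1-based POS) pairs. All function names are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def build_index(repeats):
    """Index repeats per contig as sorted, disjoint (starts, ends) lists.

    repeats: iterable of (contig, start, end), 0-based half-open (BED-style).
    """
    by_contig = {}
    for contig, start, end in repeats:
        by_contig.setdefault(contig, []).append((start, end))
    index = {}
    for contig, ivs in by_contig.items():
        merged = merge_intervals(ivs)
        index[contig] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_repeat(index, contig, pos):
    """True if the 1-based position `pos` lies inside an indexed repeat."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    zero_based = pos - 1  # convert VCF POS to the BED coordinate system
    # Last interval whose start is <= zero_based; intervals are disjoint.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_repeats(variants, index):
    """Keep only (contig, pos) variants that fall outside all repeats."""
    return [(c, p) for c, p in variants if not in_repeat(index, c, p)]
```

Merging the intervals first keeps them disjoint and sorted, so a single binary search per variant is enough; this matters for draft assemblies with many thousands of small contigs and repeat annotations.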

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-3637-2) contains supplementary material, which is available to authorized users.


Url:
DOI: 10.1186/s12864-017-3637-2
PubMed: 28351369
PubMed Central: 5368935


Links to Exploration step

PMC:5368935

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Positional bias in variant calls against draft reference assemblies</title>
<author>
<name sortKey="Briskine, Roman V" sort="Briskine, Roman V" uniqKey="Briskine R" first="Roman V." last="Briskine">Roman V. Briskine</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 0650</institution-id>
<institution-id institution-id-type="GRID">grid.7400.3</institution-id>
<institution>Department of Evolutionary Biology and Environmental Studies,</institution>
<institution>University of Zurich,</institution>
</institution-wrap>
Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">Functional Genomics Center Zurich, Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shimizu, Kentaro K" sort="Shimizu, Kentaro K" uniqKey="Shimizu K" first="Kentaro K." last="Shimizu">Kentaro K. Shimizu</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 0650</institution-id>
<institution-id institution-id-type="GRID">grid.7400.3</institution-id>
<institution>Department of Evolutionary Biology and Environmental Studies,</institution>
<institution>University of Zurich,</institution>
</institution-wrap>
Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1033 6139</institution-id>
<institution-id institution-id-type="GRID">grid.268441.d</institution-id>
<institution>Kihara Institute for Biological Research,</institution>
<institution>Yokohama City University,</institution>
</institution-wrap>
641-12 Maioka, Totsuka-ward, Yokohama, 244-0813 Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>641-12 Maioka, Totsuka-ward, Yokohama</wicri:regionArea>
<wicri:noRegion>Yokohama</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28351369</idno>
<idno type="pmc">5368935</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5368935</idno>
<idno type="RBID">PMC:5368935</idno>
<idno type="doi">10.1186/s12864-017-3637-2</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000295</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000295</idno>
<idno type="wicri:Area/Pmc/Curation">000295</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000295</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000746</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000746</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Positional bias in variant calls against draft reference assemblies</title>
<author>
<name sortKey="Briskine, Roman V" sort="Briskine, Roman V" uniqKey="Briskine R" first="Roman V." last="Briskine">Roman V. Briskine</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 0650</institution-id>
<institution-id institution-id-type="GRID">grid.7400.3</institution-id>
<institution>Department of Evolutionary Biology and Environmental Studies,</institution>
<institution>University of Zurich,</institution>
</institution-wrap>
Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
<affiliation>
<nlm:aff id="Aff2">Functional Genomics Center Zurich, Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Shimizu, Kentaro K" sort="Shimizu, Kentaro K" uniqKey="Shimizu K" first="Kentaro K." last="Shimizu">Kentaro K. Shimizu</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 0650</institution-id>
<institution-id institution-id-type="GRID">grid.7400.3</institution-id>
<institution>Department of Evolutionary Biology and Environmental Studies,</institution>
<institution>University of Zurich,</institution>
</institution-wrap>
Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</nlm:aff>
<wicri:noCountry code="subfield">CH-8057 Switzerland</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1033 6139</institution-id>
<institution-id institution-id-type="GRID">grid.268441.d</institution-id>
<institution>Kihara Institute for Biological Research,</institution>
<institution>Yokohama City University,</institution>
</institution-wrap>
641-12 Maioka, Totsuka-ward, Yokohama, 244-0813 Japan</nlm:aff>
<country xml:lang="fr">Japon</country>
<wicri:regionArea>641-12 Maioka, Totsuka-ward, Yokohama</wicri:regionArea>
<wicri:noRegion>Yokohama</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Whole-genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant-calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis.</p>
</sec>
<sec>
<title>Results</title>
<p>In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it, because it changes the variants’ relative positions and alters read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, which potentially affects many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. We recommend reducing the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-017-3637-2) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28351369</article-id>
<article-id pub-id-type="pmc">5368935</article-id>
<article-id pub-id-type="publisher-id">3637</article-id>
<article-id pub-id-type="doi">10.1186/s12864-017-3637-2</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methodology Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Positional bias in variant calls against draft reference assemblies</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-6831-3914</contrib-id>
<name>
<surname>Briskine</surname>
<given-names>Roman V.</given-names>
</name>
<address>
<email>roman.briskine@ieu.uzh.ch</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Shimizu</surname>
<given-names>Kentaro K.</given-names>
</name>
<address>
<email>kentaro.shimizu@ieu.uzh.ch</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 0650</institution-id>
<institution-id institution-id-type="GRID">grid.7400.3</institution-id>
<institution>Department of Evolutionary Biology and Environmental Studies,</institution>
<institution>University of Zurich,</institution>
</institution-wrap>
Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</aff>
<aff id="Aff2">
<label>2</label>
Functional Genomics Center Zurich, Winterthurerstrasse 190, Zurich, CH-8057 Switzerland</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1033 6139</institution-id>
<institution-id institution-id-type="GRID">grid.268441.d</institution-id>
<institution>Kihara Institute for Biological Research,</institution>
<institution>Yokohama City University,</institution>
</institution-wrap>
641-12 Maioka, Totsuka-ward, Yokohama, 244-0813 Japan</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>28</day>
<month>3</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>28</day>
<month>3</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>18</volume>
<elocation-id>263</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>6</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>3</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2017</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Whole-genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant-calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis.</p>
</sec>
<sec>
<title>Results</title>
<p>In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it, because it changes the variants’ relative positions and alters read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, which potentially affects many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. We recommend reducing the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-017-3637-2) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Resequencing</kwd>
<kwd>Variants</kwd>
<kwd>Polymorphisms</kwd>
<kwd>SNPs</kwd>
<kwd>Positional bias</kwd>
<kwd>Draft reference genome</kwd>
<kwd>Repetitive elements</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001711</institution-id>
<institution>Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung</institution>
</institution-wrap>
</funding-source>
</award-group>
<award-group>
<funding-source>
<institution>Seventh Framework Programme - PLANT FELLOWS</institution>
</funding-source>
</award-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100006447</institution-id>
<institution>Universität Zürich</institution>
</institution-wrap>
</funding-source>
</award-group>
<award-group>
<funding-source>
<institution-wrap>
<institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100001700</institution-id>
<institution>Ministry of Education, Culture, Sports, Science, and Technology</institution>
</institution-wrap>
</funding-source>
</award-group>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2017</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Briskine, Roman V" sort="Briskine, Roman V" uniqKey="Briskine R" first="Roman V." last="Briskine">Roman V. Briskine</name>
</noCountry>
<country name="Japon">
<noRegion>
<name sortKey="Shimizu, Kentaro K" sort="Shimizu, Kentaro K" uniqKey="Shimizu K" first="Kentaro K." last="Shimizu">Kentaro K. Shimizu</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000746 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000746 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:5368935
   |texte=   Positional bias in variant calls against draft reference assemblies
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:28351369" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021