Cyberinfrastructure Exploration Server

Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats

Internal identifier: 000567 (Pmc/Corpus); previous: 000566; next: 000568

Authors: Arkarachai Fungtammasan; Marta Tomaszkiewicz; Rebeca Campos-Sánchez; Kristin A. Eckert; Michael Degiorgio; Kateryna D. Makova

Source: Molecular Biology and Evolution (2016)

RBID : PMC:5026258

Abstract

Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5026258
DOI: 10.1093/molbev/msw139
PubMed: 27413049
PubMed Central: 5026258

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats</title>
<author>
<name sortKey="Fungtammasan, Arkarachai" sort="Fungtammasan, Arkarachai" uniqKey="Fungtammasan A" first="Arkarachai" last="Fungtammasan">Arkarachai Fungtammasan</name>
<affiliation>
<nlm:aff id="msw139-aff1">Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff4">Huck Institute of Genome Sciences, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tomaszkiewicz, Marta" sort="Tomaszkiewicz, Marta" uniqKey="Tomaszkiewicz M" first="Marta" last="Tomaszkiewicz">Marta Tomaszkiewicz</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Campos Sanchez, Rebeca" sort="Campos Sanchez, Rebeca" uniqKey="Campos Sanchez R" first="Rebeca" last="Campos-Sánchez">Rebeca Campos-Sánchez</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eckert, Kristin A" sort="Eckert, Kristin A" uniqKey="Eckert K" first="Kristin A." last="Eckert">Kristin A. Eckert</name>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff5">Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Degiorgio, Michael" sort="Degiorgio, Michael" uniqKey="Degiorgio M" first="Michael" last="Degiorgio">Michael Degiorgio</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff6">Institute for CyberScience, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Makova, Kateryna D" sort="Makova, Kateryna D" uniqKey="Makova K" first="Kateryna D." last="Makova">Kateryna D. Makova</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff4">Huck Institute of Genome Sciences, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27413049</idno>
<idno type="pmc">5026258</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5026258</idno>
<idno type="RBID">PMC:5026258</idno>
<idno type="doi">10.1093/molbev/msw139</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000567</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats</title>
<author>
<name sortKey="Fungtammasan, Arkarachai" sort="Fungtammasan, Arkarachai" uniqKey="Fungtammasan A" first="Arkarachai" last="Fungtammasan">Arkarachai Fungtammasan</name>
<affiliation>
<nlm:aff id="msw139-aff1">Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff4">Huck Institute of Genome Sciences, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tomaszkiewicz, Marta" sort="Tomaszkiewicz, Marta" uniqKey="Tomaszkiewicz M" first="Marta" last="Tomaszkiewicz">Marta Tomaszkiewicz</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Campos Sanchez, Rebeca" sort="Campos Sanchez, Rebeca" uniqKey="Campos Sanchez R" first="Rebeca" last="Campos-Sánchez">Rebeca Campos-Sánchez</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Eckert, Kristin A" sort="Eckert, Kristin A" uniqKey="Eckert K" first="Kristin A." last="Eckert">Kristin A. Eckert</name>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff5">Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Degiorgio, Michael" sort="Degiorgio, Michael" uniqKey="Degiorgio M" first="Michael" last="Degiorgio">Michael Degiorgio</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff6">Institute for CyberScience, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Makova, Kateryna D" sort="Makova, Kateryna D" uniqKey="Makova K" first="Kateryna D." last="Makova">Kateryna D. Makova</name>
<affiliation>
<nlm:aff id="msw139-aff2">Department of Biology, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff3">Center for Medical Genomics, Pennsylvania State University</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="msw139-aff4">Huck Institute of Genome Sciences, Pennsylvania State University</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Molecular Biology and Evolution</title>
<idno type="ISSN">0737-4038</idno>
<idno type="eISSN">1537-1719</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a
<italic>Caenorhabditis elegans</italic>
data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Abdul Muneer, Pm" uniqKey="Abdul Muneer P">PM. Abdul-Muneer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amos, W" uniqKey="Amos W">W. Amos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ananda, G" uniqKey="Ananda G">G Ananda</name>
</author>
<author>
<name sortKey="Chiaromonte, F" uniqKey="Chiaromonte F">F Chiaromonte</name>
</author>
<author>
<name sortKey="Makova, Kd" uniqKey="Makova K">KD. Makova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ananda, G" uniqKey="Ananda G">G Ananda</name>
</author>
<author>
<name sortKey="Walsh, E" uniqKey="Walsh E">E Walsh</name>
</author>
<author>
<name sortKey="Jacob, Kd" uniqKey="Jacob K">KD Jacob</name>
</author>
<author>
<name sortKey="Krasilnikova, M" uniqKey="Krasilnikova M">M Krasilnikova</name>
</author>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA Eckert</name>
</author>
<author>
<name sortKey="Chiaromonte, F" uniqKey="Chiaromonte F">F Chiaromonte</name>
</author>
<author>
<name sortKey="Makova, Kd" uniqKey="Makova K">KD. Makova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Arezi, B" uniqKey="Arezi B">B Arezi</name>
</author>
<author>
<name sortKey="Hogrefe, Hh" uniqKey="Hogrefe H">HH. Hogrefe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bahn, Jh" uniqKey="Bahn J">JH Bahn</name>
</author>
<author>
<name sortKey="Lee, J H" uniqKey="Lee J">J-H Lee</name>
</author>
<author>
<name sortKey="Li, G" uniqKey="Li G">G Li</name>
</author>
<author>
<name sortKey="Greer, C" uniqKey="Greer C">C Greer</name>
</author>
<author>
<name sortKey="Peng, G" uniqKey="Peng G">G Peng</name>
</author>
<author>
<name sortKey="Xiao, X" uniqKey="Xiao X">X. Xiao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baptiste, Ba" uniqKey="Baptiste B">BA Baptiste</name>
</author>
<author>
<name sortKey="Ananda, G" uniqKey="Ananda G">G Ananda</name>
</author>
<author>
<name sortKey="Strubczewski, N" uniqKey="Strubczewski N">N Strubczewski</name>
</author>
<author>
<name sortKey="Lutzkanin, A" uniqKey="Lutzkanin A">A Lutzkanin</name>
</author>
<author>
<name sortKey="Khoo, Sj" uniqKey="Khoo S">SJ Khoo</name>
</author>
<author>
<name sortKey="Srikanth, A" uniqKey="Srikanth A">A Srikanth</name>
</author>
<author>
<name sortKey="Kim, N" uniqKey="Kim N">N Kim</name>
</author>
<author>
<name sortKey="Makova, Kd" uniqKey="Makova K">KD Makova</name>
</author>
<author>
<name sortKey="Krasilnikova, Mm" uniqKey="Krasilnikova M">MM Krasilnikova</name>
</author>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA. Eckert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baptiste, Ba" uniqKey="Baptiste B">BA Baptiste</name>
</author>
<author>
<name sortKey="Jacob, Kd" uniqKey="Jacob K">KD Jacob</name>
</author>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA. Eckert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Barrioluengo, V" uniqKey="Barrioluengo V">V Barrioluengo</name>
</author>
<author>
<name sortKey="Alvarez, M" uniqKey="Alvarez M">M Alvarez</name>
</author>
<author>
<name sortKey="Barbieri, D" uniqKey="Barbieri D">D Barbieri</name>
</author>
<author>
<name sortKey="Menendez Arias, L" uniqKey="Menendez Arias L">L. Menéndez-Arias</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bass, Bl" uniqKey="Bass B">BL. Bass</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bass, B" uniqKey="Bass B">B Bass</name>
</author>
<author>
<name sortKey="Hundley, H" uniqKey="Hundley H">H Hundley</name>
</author>
<author>
<name sortKey="Li, Jb" uniqKey="Li J">JB Li</name>
</author>
<author>
<name sortKey="Peng, Z" uniqKey="Peng Z">Z Peng</name>
</author>
<author>
<name sortKey="Pickrell, J" uniqKey="Pickrell J">J Pickrell</name>
</author>
<author>
<name sortKey="Xiao, Xg" uniqKey="Xiao X">XG Xiao</name>
</author>
<author>
<name sortKey="Yang, L" uniqKey="Yang L">L. Yang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blank, A" uniqKey="Blank A">A Blank</name>
</author>
<author>
<name sortKey="Gallant, Ja" uniqKey="Gallant J">JA Gallant</name>
</author>
<author>
<name sortKey="Burgess, Rr" uniqKey="Burgess R">RR Burgess</name>
</author>
<author>
<name sortKey="Loeb, La" uniqKey="Loeb L">LA. Loeb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blankenberg, D" uniqKey="Blankenberg D">D Blankenberg</name>
</author>
<author>
<name sortKey="Kuster, Gv" uniqKey="Kuster G">GV Kuster</name>
</author>
<author>
<name sortKey="Bouvier, E" uniqKey="Bouvier E">E Bouvier</name>
</author>
<author>
<name sortKey="Baker, D" uniqKey="Baker D">D Baker</name>
</author>
<author>
<name sortKey="Afgan, E" uniqKey="Afgan E">E Afgan</name>
</author>
<author>
<name sortKey="Stoler, N" uniqKey="Stoler N">N Stoler</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
<author>
<name sortKey="Nekrutenko, A" uniqKey="Nekrutenko A">A. Nekrutenko</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blankenberg, D" uniqKey="Blankenberg D">D Blankenberg</name>
</author>
<author>
<name sortKey="Von Kuster, G" uniqKey="Von Kuster G">G Von Kuster</name>
</author>
<author>
<name sortKey="Coraor, N" uniqKey="Coraor N">N Coraor</name>
</author>
<author>
<name sortKey="Ananda, G" uniqKey="Ananda G">G Ananda</name>
</author>
<author>
<name sortKey="Lazarus, R" uniqKey="Lazarus R">R Lazarus</name>
</author>
<author>
<name sortKey="Mangan, M" uniqKey="Mangan M">M Mangan</name>
</author>
<author>
<name sortKey="Nekrutenko, A" uniqKey="Nekrutenko A">A Nekrutenko</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J. Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boby, T" uniqKey="Boby T">T Boby</name>
</author>
<author>
<name sortKey="Patch, A M" uniqKey="Patch A">A-M Patch</name>
</author>
<author>
<name sortKey="Aves, Sj" uniqKey="Aves S">SJ. Aves</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Borel, C" uniqKey="Borel C">C Borel</name>
</author>
<author>
<name sortKey="Ferreira, Pg" uniqKey="Ferreira P">PG Ferreira</name>
</author>
<author>
<name sortKey="Santoni, F" uniqKey="Santoni F">F Santoni</name>
</author>
<author>
<name sortKey="Delaneau, O" uniqKey="Delaneau O">O Delaneau</name>
</author>
<author>
<name sortKey="Fort, A" uniqKey="Fort A">A Fort</name>
</author>
<author>
<name sortKey="Popadin, Ky" uniqKey="Popadin K">KY Popadin</name>
</author>
<author>
<name sortKey="Garieri, M" uniqKey="Garieri M">M Garieri</name>
</author>
<author>
<name sortKey="Falconnet, E" uniqKey="Falconnet E">E Falconnet</name>
</author>
<author>
<name sortKey="Ribaux, P" uniqKey="Ribaux P">P Ribaux</name>
</author>
<author>
<name sortKey="Guipponi, M" uniqKey="Guipponi M">M Guipponi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brais, B" uniqKey="Brais B">B Brais</name>
</author>
<author>
<name sortKey="Bouchard, Jp" uniqKey="Bouchard J">JP Bouchard</name>
</author>
<author>
<name sortKey="Xie, Yg" uniqKey="Xie Y">YG Xie</name>
</author>
<author>
<name sortKey="Rochefort, Dl" uniqKey="Rochefort D">DL Rochefort</name>
</author>
<author>
<name sortKey="Chretien, N" uniqKey="Chretien N">N Chrétien</name>
</author>
<author>
<name sortKey="Tome, Fm" uniqKey="Tome F">FM Tomé</name>
</author>
<author>
<name sortKey="Lafreniere, Rg" uniqKey="Lafreniere R">RG Lafrenière</name>
</author>
<author>
<name sortKey="Rommens, Jm" uniqKey="Rommens J">JM Rommens</name>
</author>
<author>
<name sortKey="Uyama, E" uniqKey="Uyama E">E Uyama</name>
</author>
<author>
<name sortKey="Nohira, O" uniqKey="Nohira O">O Nohira</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Byrd, R" uniqKey="Byrd R">R Byrd</name>
</author>
<author>
<name sortKey="Lu, P" uniqKey="Lu P">P Lu</name>
</author>
<author>
<name sortKey="Nocedal, J" uniqKey="Nocedal J">J Nocedal</name>
</author>
<author>
<name sortKey="Zhu, C" uniqKey="Zhu C">C. Zhu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Castel, Al" uniqKey="Castel A">AL Castel</name>
</author>
<author>
<name sortKey="Cleary, Jd" uniqKey="Cleary J">JD Cleary</name>
</author>
<author>
<name sortKey="Pearson, Ce" uniqKey="Pearson C">CE. Pearson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Mias, Gi" uniqKey="Mias G">GI Mias</name>
</author>
<author>
<name sortKey="Li Pook Than, J" uniqKey="Li Pook Than J">J Li-Pook-Than</name>
</author>
<author>
<name sortKey="Jiang, L" uniqKey="Jiang L">L Jiang</name>
</author>
<author>
<name sortKey="Lam, Hyk" uniqKey="Lam H">HYK Lam</name>
</author>
<author>
<name sortKey="Chen, R" uniqKey="Chen R">R Chen</name>
</author>
<author>
<name sortKey="Miriami, E" uniqKey="Miriami E">E Miriami</name>
</author>
<author>
<name sortKey="Karczewski, Kj" uniqKey="Karczewski K">KJ Karczewski</name>
</author>
<author>
<name sortKey="Hariharan, M" uniqKey="Hariharan M">M Hariharan</name>
</author>
<author>
<name sortKey="Dewey, Fe" uniqKey="Dewey F">FE Dewey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Conesa, A" uniqKey="Conesa A">A Conesa</name>
</author>
<author>
<name sortKey="Madrigal, P" uniqKey="Madrigal P">P Madrigal</name>
</author>
<author>
<name sortKey="Tarazona, S" uniqKey="Tarazona S">S Tarazona</name>
</author>
<author>
<name sortKey="Gomez Cabrero, D" uniqKey="Gomez Cabrero D">D Gomez-Cabrero</name>
</author>
<author>
<name sortKey="Cervera, A" uniqKey="Cervera A">A Cervera</name>
</author>
<author>
<name sortKey="Mcpherson, A" uniqKey="Mcpherson A">A McPherson</name>
</author>
<author>
<name sortKey="Szcze Niak, Mw" uniqKey="Szcze Niak M">MW Szcześniak</name>
</author>
<author>
<name sortKey="Gaffney, Dj" uniqKey="Gaffney D">DJ Gaffney</name>
</author>
<author>
<name sortKey="Elo, Ll" uniqKey="Elo L">LL Elo</name>
</author>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corneille, S" uniqKey="Corneille S">S Corneille</name>
</author>
<author>
<name sortKey="Lutz, K" uniqKey="Lutz K">K Lutz</name>
</author>
<author>
<name sortKey="Maliga, P" uniqKey="Maliga P">P. Maliga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Danecek, P" uniqKey="Danecek P">P Danecek</name>
</author>
<author>
<name sortKey="Nell Ker, C" uniqKey="Nell Ker C">C Nellåker</name>
</author>
<author>
<name sortKey="Mcintyre, Re" uniqKey="Mcintyre R">RE McIntyre</name>
</author>
<author>
<name sortKey="Buendia Buendia, Je" uniqKey="Buendia Buendia J">JE Buendia-Buendia</name>
</author>
<author>
<name sortKey="Bumpstead, S" uniqKey="Bumpstead S">S Bumpstead</name>
</author>
<author>
<name sortKey="Ponting, Cp" uniqKey="Ponting C">CP Ponting</name>
</author>
<author>
<name sortKey="Flint, J" uniqKey="Flint J">J Flint</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Keane, Tm" uniqKey="Keane T">TM Keane</name>
</author>
<author>
<name sortKey="Adams, Dj" uniqKey="Adams D">DJ. Adams</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dare, Jt" uniqKey="Dare J">JT DaRe</name>
</author>
<author>
<name sortKey="Vasta, V" uniqKey="Vasta V">V Vasta</name>
</author>
<author>
<name sortKey="Penn, J" uniqKey="Penn J">J Penn</name>
</author>
<author>
<name sortKey="Tran, N Tb" uniqKey="Tran N">N-TB Tran</name>
</author>
<author>
<name sortKey="Hahn, Sh" uniqKey="Hahn S">SH. Hahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Wit, P" uniqKey="De Wit P">P De Wit</name>
</author>
<author>
<name sortKey="Pespeni, Mh" uniqKey="Pespeni M">MH Pespeni</name>
</author>
<author>
<name sortKey="Ladner, Jt" uniqKey="Ladner J">JT Ladner</name>
</author>
<author>
<name sortKey="Barshis, Dj" uniqKey="Barshis D">DJ Barshis</name>
</author>
<author>
<name sortKey="Seneca, F" uniqKey="Seneca F">F Seneca</name>
</author>
<author>
<name sortKey="Jaris, H" uniqKey="Jaris H">H Jaris</name>
</author>
<author>
<name sortKey="Therkildsen, No" uniqKey="Therkildsen N">NO Therkildsen</name>
</author>
<author>
<name sortKey="Morikawa, M" uniqKey="Morikawa M">M Morikawa</name>
</author>
<author>
<name sortKey="Palumbi, Sr" uniqKey="Palumbi S">SR. Palumbi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Denver, Dr" uniqKey="Denver D">DR Denver</name>
</author>
<author>
<name sortKey="Morris, K" uniqKey="Morris K">K Morris</name>
</author>
<author>
<name sortKey="Kewalramani, A" uniqKey="Kewalramani A">A Kewalramani</name>
</author>
<author>
<name sortKey="Harris, Ke" uniqKey="Harris K">KE Harris</name>
</author>
<author>
<name sortKey="Chow, A" uniqKey="Chow A">A Chow</name>
</author>
<author>
<name sortKey="Estes, S" uniqKey="Estes S">S Estes</name>
</author>
<author>
<name sortKey="Lynch, M" uniqKey="Lynch M">M Lynch</name>
</author>
<author>
<name sortKey="Thomas, Wk" uniqKey="Thomas W">WK. Thomas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dobin, A" uniqKey="Dobin A">A Dobin</name>
</author>
<author>
<name sortKey="Davis, Ca" uniqKey="Davis C">CA Davis</name>
</author>
<author>
<name sortKey="Schlesinger, F" uniqKey="Schlesinger F">F Schlesinger</name>
</author>
<author>
<name sortKey="Drenkow, J" uniqKey="Drenkow J">J Drenkow</name>
</author>
<author>
<name sortKey="Zaleski, C" uniqKey="Zaleski C">C Zaleski</name>
</author>
<author>
<name sortKey="Jha, S" uniqKey="Jha S">S Jha</name>
</author>
<author>
<name sortKey="Batut, P" uniqKey="Batut P">P Batut</name>
</author>
<author>
<name sortKey="Chaisson, M" uniqKey="Chaisson M">M Chaisson</name>
</author>
<author>
<name sortKey="Gingeras, Tr" uniqKey="Gingeras T">TR. Gingeras</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Drake, Jw" uniqKey="Drake J">JW Drake</name>
</author>
<author>
<name sortKey="Charlesworth, B" uniqKey="Charlesworth B">B Charlesworth</name>
</author>
<author>
<name sortKey="Charlesworth, D" uniqKey="Charlesworth D">D Charlesworth</name>
</author>
<author>
<name sortKey="Crow, Jf" uniqKey="Crow J">JF. Crow</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA Eckert</name>
</author>
<author>
<name sortKey="Kunkel, Ta" uniqKey="Kunkel T">TA. Kunkel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA Eckert</name>
</author>
<author>
<name sortKey="Kunkel, Ta" uniqKey="Kunkel T">TA. Kunkel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ellegren, H" uniqKey="Ellegren H">H. Ellegren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ellegren, H" uniqKey="Ellegren H">H. Ellegren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elowitz, Mb" uniqKey="Elowitz M">MB Elowitz</name>
</author>
<author>
<name sortKey="Levine, Aj" uniqKey="Levine A">AJ Levine</name>
</author>
<author>
<name sortKey="Siggia, Ed" uniqKey="Siggia E">ED Siggia</name>
</author>
<author>
<name sortKey="Swain, Ps" uniqKey="Swain P">PS. Swain</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feng, C" uniqKey="Feng C">C Feng</name>
</author>
<author>
<name sortKey="Chen, M" uniqKey="Chen M">M Chen</name>
</author>
<author>
<name sortKey="Xu, C" uniqKey="Xu C">C Xu</name>
</author>
<author>
<name sortKey="Bai, L" uniqKey="Bai L">L Bai</name>
</author>
<author>
<name sortKey="Yin, X" uniqKey="Yin X">X Yin</name>
</author>
<author>
<name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
<author>
<name sortKey="Allan, Ac" uniqKey="Allan A">AC Allan</name>
</author>
<author>
<name sortKey="Ferguson, Ib" uniqKey="Ferguson I">IB Ferguson</name>
</author>
<author>
<name sortKey="Chen, K" uniqKey="Chen K">K. Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fungtammasan, A" uniqKey="Fungtammasan A">A Fungtammasan</name>
</author>
<author>
<name sortKey="Ananda, G" uniqKey="Ananda G">G Ananda</name>
</author>
<author>
<name sortKey="Hile, Se" uniqKey="Hile S">SE Hile</name>
</author>
<author>
<name sortKey="Su, Ms W" uniqKey="Su M">MS-W Su</name>
</author>
<author>
<name sortKey="Sun, C" uniqKey="Sun C">C Sun</name>
</author>
<author>
<name sortKey="Harris, R" uniqKey="Harris R">R Harris</name>
</author>
<author>
<name sortKey="Medvedev, P" uniqKey="Medvedev P">P Medvedev</name>
</author>
<author>
<name sortKey="Eckert, K" uniqKey="Eckert K">K Eckert</name>
</author>
<author>
<name sortKey="Makova, Kd" uniqKey="Makova K">KD. Makova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garber, M" uniqKey="Garber M">M Garber</name>
</author>
<author>
<name sortKey="Grabherr, Mg" uniqKey="Grabherr M">MG Grabherr</name>
</author>
<author>
<name sortKey="Guttman, M" uniqKey="Guttman M">M Guttman</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C. Trapnell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garrett, S" uniqKey="Garrett S">S Garrett</name>
</author>
<author>
<name sortKey="Rosenthal, Jjc" uniqKey="Rosenthal J">JJC. Rosenthal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garrett, S" uniqKey="Garrett S">S Garrett</name>
</author>
<author>
<name sortKey="Rosenthal, Jjc" uniqKey="Rosenthal J">JJC. Rosenthal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gayral, P" uniqKey="Gayral P">P Gayral</name>
</author>
<author>
<name sortKey="Melo Ferreira, J" uniqKey="Melo Ferreira J">J Melo-Ferreira</name>
</author>
<author>
<name sortKey="Glemin, S" uniqKey="Glemin S">S Glémin</name>
</author>
<author>
<name sortKey="Bierne, N" uniqKey="Bierne N">N Bierne</name>
</author>
<author>
<name sortKey="Carneiro, M" uniqKey="Carneiro M">M Carneiro</name>
</author>
<author>
<name sortKey="Nabholz, B" uniqKey="Nabholz B">B Nabholz</name>
</author>
<author>
<name sortKey="Lourenco, Jm" uniqKey="Lourenco J">JM Lourenco</name>
</author>
<author>
<name sortKey="Alves, Pc" uniqKey="Alves P">PC Alves</name>
</author>
<author>
<name sortKey="Ballenghien, M" uniqKey="Ballenghien M">M Ballenghien</name>
</author>
<author>
<name sortKey="Faivre, N" uniqKey="Faivre N">N Faivre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giardine, B" uniqKey="Giardine B">B Giardine</name>
</author>
<author>
<name sortKey="Riemer, C" uniqKey="Riemer C">C Riemer</name>
</author>
<author>
<name sortKey="Hardison, Rc" uniqKey="Hardison R">RC Hardison</name>
</author>
<author>
<name sortKey="Burhans, R" uniqKey="Burhans R">R Burhans</name>
</author>
<author>
<name sortKey="Elnitski, L" uniqKey="Elnitski L">L Elnitski</name>
</author>
<author>
<name sortKey="Shah, P" uniqKey="Shah P">P Shah</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Blankenberg, D" uniqKey="Blankenberg D">D Blankenberg</name>
</author>
<author>
<name sortKey="Albert, I" uniqKey="Albert I">I Albert</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goecks, J" uniqKey="Goecks J">J Goecks</name>
</author>
<author>
<name sortKey="Nekrutenko, A" uniqKey="Nekrutenko A">A Nekrutenko</name>
</author>
<author>
<name sortKey="Taylor, J" uniqKey="Taylor J">J. Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gommans, Wm" uniqKey="Gommans W">WM Gommans</name>
</author>
<author>
<name sortKey="Mullen, Sp" uniqKey="Mullen S">SP Mullen</name>
</author>
<author>
<name sortKey="Maas, S" uniqKey="Maas S">S Maas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gout, J F" uniqKey="Gout J">J-F Gout</name>
</author>
<author>
<name sortKey="Thomas, Wk" uniqKey="Thomas W">WK Thomas</name>
</author>
<author>
<name sortKey="Smith, Z" uniqKey="Smith Z">Z Smith</name>
</author>
<author>
<name sortKey="Okamoto, K" uniqKey="Okamoto K">K Okamoto</name>
</author>
<author>
<name sortKey="Lynch, M" uniqKey="Lynch M">M Lynch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gowen, Cm" uniqKey="Gowen C">CM Gowen</name>
</author>
<author>
<name sortKey="Fong, Ss" uniqKey="Fong S">SS Fong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Griffin, Hr" uniqKey="Griffin H">HR Griffin</name>
</author>
<author>
<name sortKey="Pyle, A" uniqKey="Pyle A">A Pyle</name>
</author>
<author>
<name sortKey="Blakely, El" uniqKey="Blakely E">EL Blakely</name>
</author>
<author>
<name sortKey="Alston, Cl" uniqKey="Alston C">CL Alston</name>
</author>
<author>
<name sortKey="Duff, J" uniqKey="Duff J">J Duff</name>
</author>
<author>
<name sortKey="Hudson, G" uniqKey="Hudson G">G Hudson</name>
</author>
<author>
<name sortKey="Horvath, R" uniqKey="Horvath R">R Horvath</name>
</author>
<author>
<name sortKey="Wilson, Ij" uniqKey="Wilson I">IJ Wilson</name>
</author>
<author>
<name sortKey="Santibanez Koref, M" uniqKey="Santibanez Koref M">M Santibanez-Koref</name>
</author>
<author>
<name sortKey="Taylor, Rw" uniqKey="Taylor R">RW Taylor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ardlie, Kg" uniqKey="Ardlie K">KG Ardlie</name>
</author>
<author>
<name sortKey="Deluca, Ds" uniqKey="Deluca D">DS Deluca</name>
</author>
<author>
<name sortKey="Segre, Av" uniqKey="Segre A">AV Segrè</name>
</author>
<author>
<name sortKey="Sullivan, Tj" uniqKey="Sullivan T">TJ Sullivan</name>
</author>
<author>
<name sortKey="Young, Tr" uniqKey="Young T">TR Young</name>
</author>
<author>
<name sortKey="Gelfand, Et" uniqKey="Gelfand E">ET Gelfand</name>
</author>
<author>
<name sortKey="Trowbridge, Ca" uniqKey="Trowbridge C">CA Trowbridge</name>
</author>
<author>
<name sortKey="Maller, Jb" uniqKey="Maller J">JB Maller</name>
</author>
<author>
<name sortKey="Tukiainen, T" uniqKey="Tukiainen T">T Tukiainen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gu, T" uniqKey="Gu T">T Gu</name>
</author>
<author>
<name sortKey="Gatti, Dm" uniqKey="Gatti D">DM Gatti</name>
</author>
<author>
<name sortKey="Srivastava, A" uniqKey="Srivastava A">A Srivastava</name>
</author>
<author>
<name sortKey="Snyder, Em" uniqKey="Snyder E">EM Snyder</name>
</author>
<author>
<name sortKey="Raghupathy, N" uniqKey="Raghupathy N">N Raghupathy</name>
</author>
<author>
<name sortKey="Simecek, P" uniqKey="Simecek P">P Simecek</name>
</author>
<author>
<name sortKey="Svenson, Kl" uniqKey="Svenson K">KL Svenson</name>
</author>
<author>
<name sortKey="Dotu, I" uniqKey="Dotu I">I Dotu</name>
</author>
<author>
<name sortKey="Chuang, Jh" uniqKey="Chuang J">JH Chuang</name>
</author>
<author>
<name sortKey="Keller, Mp" uniqKey="Keller M">MP Keller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Li, C I" uniqKey="Li C">C-I Li</name>
</author>
<author>
<name sortKey="Shyr, Y" uniqKey="Shyr Y">Y Shyr</name>
</author>
<author>
<name sortKey="Samuels, Dc" uniqKey="Samuels D">DC Samuels</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gupta, Pk" uniqKey="Gupta P">PK Gupta</name>
</author>
<author>
<name sortKey="Varshney, Rk" uniqKey="Varshney R">RK Varshney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gymrek, M" uniqKey="Gymrek M">M Gymrek</name>
</author>
<author>
<name sortKey="Golan, D" uniqKey="Golan D">D Golan</name>
</author>
<author>
<name sortKey="Rosset, S" uniqKey="Rosset S">S Rosset</name>
</author>
<author>
<name sortKey="Erlich, Y" uniqKey="Erlich Y">Y Erlich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ho, M R" uniqKey="Ho M">M-R Ho</name>
</author>
<author>
<name sortKey="Tsai, K W" uniqKey="Tsai K">K-W Tsai</name>
</author>
<author>
<name sortKey="Chen, C" uniqKey="Chen C">C Chen</name>
</author>
<author>
<name sortKey="Lin, W" uniqKey="Lin W">W Lin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ibrahim, Me" uniqKey="Ibrahim M">ME Ibrahim</name>
</author>
<author>
<name sortKey="Mahdi, Ma" uniqKey="Mahdi M">MA Mahdi</name>
</author>
<author>
<name sortKey="Bereir, Re" uniqKey="Bereir R">RE Bereir</name>
</author>
<author>
<name sortKey="Giha, Rs" uniqKey="Giha R">RS Giha</name>
</author>
<author>
<name sortKey="Wasunna, C" uniqKey="Wasunna C">C Wasunna</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ji, J" uniqKey="Ji J">J Ji</name>
</author>
<author>
<name sortKey="Loeb, La" uniqKey="Loeb L">LA Loeb</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaufmann, Bb" uniqKey="Kaufmann B">BB Kaufmann</name>
</author>
<author>
<name sortKey="Van Oudenaarden, A" uniqKey="Van Oudenaarden A">A van Oudenaarden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kelkar, Yd" uniqKey="Kelkar Y">YD Kelkar</name>
</author>
<author>
<name sortKey="Strubczewski, N" uniqKey="Strubczewski N">N Strubczewski</name>
</author>
<author>
<name sortKey="Hile, Se" uniqKey="Hile S">SE Hile</name>
</author>
<author>
<name sortKey="Chiaromonte, F" uniqKey="Chiaromonte F">F Chiaromonte</name>
</author>
<author>
<name sortKey="Eckert, Ka" uniqKey="Eckert K">KA Eckert</name>
</author>
<author>
<name sortKey="Makova, Kd" uniqKey="Makova K">KD Makova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khatri, P" uniqKey="Khatri P">P Khatri</name>
</author>
<author>
<name sortKey="Sirota, M" uniqKey="Sirota M">M Sirota</name>
</author>
<author>
<name sortKey="Butte, Aj" uniqKey="Butte A">AJ Butte</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author>
<name sortKey="Pertea, G" uniqKey="Pertea G">G Pertea</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pimentel, H" uniqKey="Pimentel H">H Pimentel</name>
</author>
<author>
<name sortKey="Kelley, R" uniqKey="Kelley R">R Kelley</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kimura, M" uniqKey="Kimura M">M Kimura</name>
</author>
<author>
<name sortKey="Ohta, T" uniqKey="Ohta T">T Ohta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kleinman, Cl" uniqKey="Kleinman C">CL Kleinman</name>
</author>
<author>
<name sortKey="Majewski, J" uniqKey="Majewski J">J Majewski</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Knippa, K" uniqKey="Knippa K">K Knippa</name>
</author>
<author>
<name sortKey="Peterson, Do" uniqKey="Peterson D">DO Peterson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kong, A" uniqKey="Kong A">A Kong</name>
</author>
<author>
<name sortKey="Frigge, Ml" uniqKey="Frigge M">ML Frigge</name>
</author>
<author>
<name sortKey="Masson, G" uniqKey="Masson G">G Masson</name>
</author>
<author>
<name sortKey="Besenbacher, S" uniqKey="Besenbacher S">S Besenbacher</name>
</author>
<author>
<name sortKey="Sulem, P" uniqKey="Sulem P">P Sulem</name>
</author>
<author>
<name sortKey="Magnusson, G" uniqKey="Magnusson G">G Magnusson</name>
</author>
<author>
<name sortKey="Gudjonsson, Sa" uniqKey="Gudjonsson S">SA Gudjonsson</name>
</author>
<author>
<name sortKey="Sigurdsson, A" uniqKey="Sigurdsson A">A Sigurdsson</name>
</author>
<author>
<name sortKey="Jonasdottir, A" uniqKey="Jonasdottir A">A Jonasdottir</name>
</author>
<author>
<name sortKey="Jonasdottir, A" uniqKey="Jonasdottir A">A Jonasdottir</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, D" uniqKey="Lee D">D Lee</name>
</author>
<author>
<name sortKey="Smallbone, K" uniqKey="Smallbone K">K Smallbone</name>
</author>
<author>
<name sortKey="Dunn, Wb" uniqKey="Dunn W">WB Dunn</name>
</author>
<author>
<name sortKey="Murabito, E" uniqKey="Murabito E">E Murabito</name>
</author>
<author>
<name sortKey="Winder, Cl" uniqKey="Winder C">CL Winder</name>
</author>
<author>
<name sortKey="Kell, Db" uniqKey="Kell D">DB Kell</name>
</author>
<author>
<name sortKey="Mendes, P" uniqKey="Mendes P">P Mendes</name>
</author>
<author>
<name sortKey="Swainston, N" uniqKey="Swainston N">N Swainston</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Legendre, M" uniqKey="Legendre M">M Legendre</name>
</author>
<author>
<name sortKey="Pochet, N" uniqKey="Pochet N">N Pochet</name>
</author>
<author>
<name sortKey="Pak, T" uniqKey="Pak T">T Pak</name>
</author>
<author>
<name sortKey="Verstrepen, Kj" uniqKey="Verstrepen K">KJ Verstrepen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Leung, D" uniqKey="Leung D">D Leung</name>
</author>
<author>
<name sortKey="Jung, I" uniqKey="Jung I">I Jung</name>
</author>
<author>
<name sortKey="Rajagopal, N" uniqKey="Rajagopal N">N Rajagopal</name>
</author>
<author>
<name sortKey="Schmitt, A" uniqKey="Schmitt A">A Schmitt</name>
</author>
<author>
<name sortKey="Selvaraj, S" uniqKey="Selvaraj S">S Selvaraj</name>
</author>
<author>
<name sortKey="Lee, Ay" uniqKey="Lee A">AY Lee</name>
</author>
<author>
<name sortKey="Yen, C A" uniqKey="Yen C">C-A Yen</name>
</author>
<author>
<name sortKey="Lin, S" uniqKey="Lin S">S Lin</name>
</author>
<author>
<name sortKey="Lin, Y" uniqKey="Lin Y">Y Lin</name>
</author>
<author>
<name sortKey="Qiu, Y" uniqKey="Qiu Y">Y Qiu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Jb" uniqKey="Li J">JB Li</name>
</author>
<author>
<name sortKey="Levanon, Ey" uniqKey="Levanon E">EY Levanon</name>
</author>
<author>
<name sortKey="Yoon, J K" uniqKey="Yoon J">J-K Yoon</name>
</author>
<author>
<name sortKey="Aach, J" uniqKey="Aach J">J Aach</name>
</author>
<author>
<name sortKey="Xie, B" uniqKey="Xie B">B Xie</name>
</author>
<author>
<name sortKey="Leproust, E" uniqKey="Leproust E">E LeProust</name>
</author>
<author>
<name sortKey="Zhang, K" uniqKey="Zhang K">K Zhang</name>
</author>
<author>
<name sortKey="Gao, Y" uniqKey="Gao Y">Y Gao</name>
</author>
<author>
<name sortKey="Church, Gm" uniqKey="Church G">GM Church</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
<author>
<name sortKey="Wang, Ix" uniqKey="Wang I">IX Wang</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Bruzel, A" uniqKey="Bruzel A">A Bruzel</name>
</author>
<author>
<name sortKey="Richards, Al" uniqKey="Richards A">AL Richards</name>
</author>
<author>
<name sortKey="Toung, Jm" uniqKey="Toung J">JM Toung</name>
</author>
<author>
<name sortKey="Cheung, Vg" uniqKey="Cheung V">VG Cheung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lin, W" uniqKey="Lin W">W Lin</name>
</author>
<author>
<name sortKey="Piskol, R" uniqKey="Piskol R">R Piskol</name>
</author>
<author>
<name sortKey="Tan, Mh" uniqKey="Tan M">MH Tan</name>
</author>
<author>
<name sortKey="Li, Jb" uniqKey="Li J">JB Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macaulay, Ic" uniqKey="Macaulay I">IC Macaulay</name>
</author>
<author>
<name sortKey="Haerty, W" uniqKey="Haerty W">W Haerty</name>
</author>
<author>
<name sortKey="Kumar, P" uniqKey="Kumar P">P Kumar</name>
</author>
<author>
<name sortKey="Li, Yi" uniqKey="Li Y">YI Li</name>
</author>
<author>
<name sortKey="Hu, Tx" uniqKey="Hu T">TX Hu</name>
</author>
<author>
<name sortKey="Teng, Mj" uniqKey="Teng M">MJ Teng</name>
</author>
<author>
<name sortKey="Goolam, M" uniqKey="Goolam M">M Goolam</name>
</author>
<author>
<name sortKey="Saurat, N" uniqKey="Saurat N">N Saurat</name>
</author>
<author>
<name sortKey="Coupland, P" uniqKey="Coupland P">P Coupland</name>
</author>
<author>
<name sortKey="Shirley, Lm" uniqKey="Shirley L">LM Shirley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Madsen, Be" uniqKey="Madsen B">BE Madsen</name>
</author>
<author>
<name sortKey="Villesen, P" uniqKey="Villesen P">P Villesen</name>
</author>
<author>
<name sortKey="Wiuf, C" uniqKey="Wiuf C">C Wiuf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Malouf, R" uniqKey="Malouf R">R Malouf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccarthy, Dj" uniqKey="Mccarthy D">DJ McCarthy</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Smyth, Gk" uniqKey="Smyth G">GK Smyth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Miah, G" uniqKey="Miah G">G Miah</name>
</author>
<author>
<name sortKey="Rafii, My" uniqKey="Rafii M">MY Rafii</name>
</author>
<author>
<name sortKey="Ismail, Mr" uniqKey="Ismail M">MR Ismail</name>
</author>
<author>
<name sortKey="Puteh, Ab" uniqKey="Puteh A">AB Puteh</name>
</author>
<author>
<name sortKey="Rahim, Ha" uniqKey="Rahim H">HA Rahim</name>
</author>
<author>
<name sortKey="Islam, Kn" uniqKey="Islam K">KN Islam</name>
</author>
<author>
<name sortKey="Latif, Ma" uniqKey="Latif M">MA Latif</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ninio, J" uniqKey="Ninio J">J Ninio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O Huallachain, M" uniqKey="O Uallachain M">M O’Huallachain</name>
</author>
<author>
<name sortKey="Karczewski, Kj" uniqKey="Karczewski K">KJ Karczewski</name>
</author>
<author>
<name sortKey="Weissman, Sm" uniqKey="Weissman S">SM Weissman</name>
</author>
<author>
<name sortKey="Urban, Ae" uniqKey="Urban A">AE Urban</name>
</author>
<author>
<name sortKey="Snyder, Mp" uniqKey="Snyder M">MP Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oshlack, A" uniqKey="Oshlack A">A Oshlack</name>
</author>
<author>
<name sortKey="Robinson, Md" uniqKey="Robinson M">MD Robinson</name>
</author>
<author>
<name sortKey="Young, Md" uniqKey="Young M">MD Young</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ozbudak, Em" uniqKey="Ozbudak E">EM Ozbudak</name>
</author>
<author>
<name sortKey="Thattai, M" uniqKey="Thattai M">M Thattai</name>
</author>
<author>
<name sortKey="Kurtser, I" uniqKey="Kurtser I">I Kurtser</name>
</author>
<author>
<name sortKey="Grossman, Ad" uniqKey="Grossman A">AD Grossman</name>
</author>
<author>
<name sortKey="Van Oudenaarden, A" uniqKey="Van Oudenaarden A">A van Oudenaarden</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ozsolak, F" uniqKey="Ozsolak F">F Ozsolak</name>
</author>
<author>
<name sortKey="Milos, Pm" uniqKey="Milos P">PM Milos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Park, E" uniqKey="Park E">E Park</name>
</author>
<author>
<name sortKey="Williams, B" uniqKey="Williams B">B Williams</name>
</author>
<author>
<name sortKey="Wold, Bj" uniqKey="Wold B">BJ Wold</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pearson, Ce" uniqKey="Pearson C">CE Pearson</name>
</author>
<author>
<name sortKey="Edamura, Kn" uniqKey="Edamura K">KN Edamura</name>
</author>
<author>
<name sortKey="Cleary, Jd" uniqKey="Cleary J">JD Cleary</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Z" uniqKey="Peng Z">Z Peng</name>
</author>
<author>
<name sortKey="Cheng, Y" uniqKey="Cheng Y">Y Cheng</name>
</author>
<author>
<name sortKey="Tan, Bc M" uniqKey="Tan B">BC-M Tan</name>
</author>
<author>
<name sortKey="Kang, L" uniqKey="Kang L">L Kang</name>
</author>
<author>
<name sortKey="Tian, Z" uniqKey="Tian Z">Z Tian</name>
</author>
<author>
<name sortKey="Zhu, Y" uniqKey="Zhu Y">Y Zhu</name>
</author>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W Zhang</name>
</author>
<author>
<name sortKey="Liang, Y" uniqKey="Liang Y">Y Liang</name>
</author>
<author>
<name sortKey="Hu, X" uniqKey="Hu X">X Hu</name>
</author>
<author>
<name sortKey="Tan, X" uniqKey="Tan X">X Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Perez, Jd" uniqKey="Perez J">JD Perez</name>
</author>
<author>
<name sortKey="Rubinstein, Nd" uniqKey="Rubinstein N">ND Rubinstein</name>
</author>
<author>
<name sortKey="Fernandez, De" uniqKey="Fernandez D">DE Fernandez</name>
</author>
<author>
<name sortKey="Santoro, Sw" uniqKey="Santoro S">SW Santoro</name>
</author>
<author>
<name sortKey="Needleman, La" uniqKey="Needleman L">LA Needleman</name>
</author>
<author>
<name sortKey="Ho Shing, O" uniqKey="Ho Shing O">O Ho-Shing</name>
</author>
<author>
<name sortKey="Choi, Jj" uniqKey="Choi J">JJ Choi</name>
</author>
<author>
<name sortKey="Zirlinger, M" uniqKey="Zirlinger M">M Zirlinger</name>
</author>
<author>
<name sortKey="Chen, S K" uniqKey="Chen S">S-K Chen</name>
</author>
<author>
<name sortKey="Liu, Js" uniqKey="Liu J">JS Liu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pickrell, Jk" uniqKey="Pickrell J">JK Pickrell</name>
</author>
<author>
<name sortKey="Gilad, Y" uniqKey="Gilad Y">Y Gilad</name>
</author>
<author>
<name sortKey="Pritchard, Jk" uniqKey="Pritchard J">JK Pritchard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Quail, Ma" uniqKey="Quail M">MA Quail</name>
</author>
<author>
<name sortKey="Smith, M" uniqKey="Smith M">M Smith</name>
</author>
<author>
<name sortKey="Coupland, P" uniqKey="Coupland P">P Coupland</name>
</author>
<author>
<name sortKey="Otto, Td" uniqKey="Otto T">TD Otto</name>
</author>
<author>
<name sortKey="Harris, Sr" uniqKey="Harris S">SR Harris</name>
</author>
<author>
<name sortKey="Connor, Tr" uniqKey="Connor T">TR Connor</name>
</author>
<author>
<name sortKey="Bertoni, A" uniqKey="Bertoni A">A Bertoni</name>
</author>
<author>
<name sortKey="Swerdlow, Hp" uniqKey="Swerdlow H">HP Swerdlow</name>
</author>
<author>
<name sortKey="Gu, Y" uniqKey="Gu Y">Y Gu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramaswami, G" uniqKey="Ramaswami G">G Ramaswami</name>
</author>
<author>
<name sortKey="Lin, W" uniqKey="Lin W">W Lin</name>
</author>
<author>
<name sortKey="Piskol, R" uniqKey="Piskol R">R Piskol</name>
</author>
<author>
<name sortKey="Tan, Mh" uniqKey="Tan M">MH Tan</name>
</author>
<author>
<name sortKey="Davis, C" uniqKey="Davis C">C Davis</name>
</author>
<author>
<name sortKey="Li, Jb" uniqKey="Li J">JB Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramaswami, G" uniqKey="Ramaswami G">G Ramaswami</name>
</author>
<author>
<name sortKey="Zhang, R" uniqKey="Zhang R">R Zhang</name>
</author>
<author>
<name sortKey="Piskol, R" uniqKey="Piskol R">R Piskol</name>
</author>
<author>
<name sortKey="Keegan, Lp" uniqKey="Keegan L">LP Keegan</name>
</author>
<author>
<name sortKey="Deng, P" uniqKey="Deng P">P Deng</name>
</author>
<author>
<name sortKey="O Connell, Ma" uniqKey="O Onnell M">MA O’Connell</name>
</author>
<author>
<name sortKey="Li, Jb" uniqKey="Li J">JB Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raser, Jm" uniqKey="Raser J">JM Raser</name>
</author>
<author>
<name sortKey="O Shea, Ek" uniqKey="O Hea E">EK O’Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rieder, Le" uniqKey="Rieder L">LE Rieder</name>
</author>
<author>
<name sortKey="Savva, Ya" uniqKey="Savva Y">YA Savva</name>
</author>
<author>
<name sortKey="Reyna, Ma" uniqKey="Reyna M">MA Reyna</name>
</author>
<author>
<name sortKey="Chang, Y J" uniqKey="Chang Y">Y-J Chang</name>
</author>
<author>
<name sortKey="Dorsky, Js" uniqKey="Dorsky J">JS Dorsky</name>
</author>
<author>
<name sortKey="Rezaei, A" uniqKey="Rezaei A">A Rezaei</name>
</author>
<author>
<name sortKey="Reenan, Ra" uniqKey="Reenan R">RA Reenan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Di Rienzo, A" uniqKey="Rienzo A">A Di Rienzo</name>
</author>
<author>
<name sortKey="Donnelly, P" uniqKey="Donnelly P">P Donnelly</name>
</author>
<author>
<name sortKey="Toomajian, C" uniqKey="Toomajian C">C Toomajian</name>
</author>
<author>
<name sortKey="Sisk, B" uniqKey="Sisk B">B Sisk</name>
</author>
<author>
<name sortKey="Hill, A" uniqKey="Hill A">A Hill</name>
</author>
<author>
<name sortKey="Petzl Erler, Ml" uniqKey="Petzl Erler M">ML Petzl-Erler</name>
</author>
<author>
<name sortKey="Haines, Gk" uniqKey="Haines G">GK Haines</name>
</author>
<author>
<name sortKey="Barch, Dh" uniqKey="Barch D">DH Barch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ross, Mg" uniqKey="Ross M">MG Ross</name>
</author>
<author>
<name sortKey="Russ, C" uniqKey="Russ C">C Russ</name>
</author>
<author>
<name sortKey="Costello, M" uniqKey="Costello M">M Costello</name>
</author>
<author>
<name sortKey="Hollinger, A" uniqKey="Hollinger A">A Hollinger</name>
</author>
<author>
<name sortKey="Lennon, Nj" uniqKey="Lennon N">NJ Lennon</name>
</author>
<author>
<name sortKey="Hegarty, R" uniqKey="Hegarty R">R Hegarty</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Jaffe, Db" uniqKey="Jaffe D">DB Jaffe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sainudiin, R" uniqKey="Sainudiin R">R Sainudiin</name>
</author>
<author>
<name sortKey="Durrett, Rt" uniqKey="Durrett R">RT Durrett</name>
</author>
<author>
<name sortKey="Aquadro, Cf" uniqKey="Aquadro C">CF Aquadro</name>
</author>
<author>
<name sortKey="Nielsen, R" uniqKey="Nielsen R">R Nielsen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saliba, A E" uniqKey="Saliba A">A-E Saliba</name>
</author>
<author>
<name sortKey="Westermann, Aj" uniqKey="Westermann A">AJ Westermann</name>
</author>
<author>
<name sortKey="Gorski, Sa" uniqKey="Gorski S">SA Gorski</name>
</author>
<author>
<name sortKey="Vogel, J" uniqKey="Vogel J">J Vogel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Samuels, Dc" uniqKey="Samuels D">DC Samuels</name>
</author>
<author>
<name sortKey="Han, L" uniqKey="Han L">L Han</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Quanghu, S" uniqKey="Quanghu S">S Quanghu</name>
</author>
<author>
<name sortKey="Clark, Ta" uniqKey="Clark T">TA Clark</name>
</author>
<author>
<name sortKey="Shyr, Y" uniqKey="Shyr Y">Y Shyr</name>
</author>
<author>
<name sortKey="Guo, Y" uniqKey="Guo Y">Y Guo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schaub, M" uniqKey="Schaub M">M Schaub</name>
</author>
<author>
<name sortKey="Keller, W" uniqKey="Keller W">W Keller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schrider, Dr" uniqKey="Schrider D">DR Schrider</name>
</author>
<author>
<name sortKey="Gout, J F" uniqKey="Gout J">J-F Gout</name>
</author>
<author>
<name sortKey="Hahn, Mw" uniqKey="Hahn M">MW Hahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Strathern, J" uniqKey="Strathern J">J Strathern</name>
</author>
<author>
<name sortKey="Malagon, F" uniqKey="Malagon F">F Malagon</name>
</author>
<author>
<name sortKey="Irvin, J" uniqKey="Irvin J">J Irvin</name>
</author>
<author>
<name sortKey="Gotte, D" uniqKey="Gotte D">D Gotte</name>
</author>
<author>
<name sortKey="Shafer, B" uniqKey="Shafer B">B Shafer</name>
</author>
<author>
<name sortKey="Kireeva, M" uniqKey="Kireeva M">M Kireeva</name>
</author>
<author>
<name sortKey="Lubkowska, L" uniqKey="Lubkowska L">L Lubkowska</name>
</author>
<author>
<name sortKey="Jin, Dj" uniqKey="Jin D">DJ Jin</name>
</author>
<author>
<name sortKey="Kashlev, M" uniqKey="Kashlev M">M Kashlev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Strathern, Jn" uniqKey="Strathern J">JN Strathern</name>
</author>
<author>
<name sortKey="Jin, Dj" uniqKey="Jin D">DJ Jin</name>
</author>
<author>
<name sortKey="Court, Dl" uniqKey="Court D">DL Court</name>
</author>
<author>
<name sortKey="Kashlev, M" uniqKey="Kashlev M">M Kashlev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Subramanian, S" uniqKey="Subramanian S">S Subramanian</name>
</author>
<author>
<name sortKey="Mishra, R" uniqKey="Mishra R">R Mishra</name>
</author>
<author>
<name sortKey="Singh, L" uniqKey="Singh L">L Singh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, Jx" uniqKey="Sun J">JX Sun</name>
</author>
<author>
<name sortKey="Helgason, A" uniqKey="Helgason A">A Helgason</name>
</author>
<author>
<name sortKey="Masson, G" uniqKey="Masson G">G Masson</name>
</author>
<author>
<name sortKey="Ebenesersdottir, Ss" uniqKey="Ebenesersd Ttir S">SS Ebenesersdóttir</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Mallick, S" uniqKey="Mallick S">S Mallick</name>
</author>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
<author>
<name sortKey="Patterson, N" uniqKey="Patterson N">N Patterson</name>
</author>
<author>
<name sortKey="Kong, A" uniqKey="Kong A">A Kong</name>
</author>
<author>
<name sortKey="Reich, D" uniqKey="Reich D">D Reich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sunnucks, P" uniqKey="Sunnucks P">P Sunnucks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Hendrickson, Dg" uniqKey="Hendrickson D">DG Hendrickson</name>
</author>
<author>
<name sortKey="Sauvageau, M" uniqKey="Sauvageau M">M Sauvageau</name>
</author>
<author>
<name sortKey="Goff, L" uniqKey="Goff L">L Goff</name>
</author>
<author>
<name sortKey="Rinn, Jl" uniqKey="Rinn J">JL Rinn</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Traverse, Cc" uniqKey="Traverse C">CC Traverse</name>
</author>
<author>
<name sortKey="Ochman, H" uniqKey="Ochman H">H Ochman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vigouroux, Y" uniqKey="Vigouroux Y">Y Vigouroux</name>
</author>
<author>
<name sortKey="Jaqueth, Js" uniqKey="Jaqueth J">JS Jaqueth</name>
</author>
<author>
<name sortKey="Matsuoka, Y" uniqKey="Matsuoka Y">Y Matsuoka</name>
</author>
<author>
<name sortKey="Smith, Os" uniqKey="Smith O">OS Smith</name>
</author>
<author>
<name sortKey="Beavis, Wd" uniqKey="Beavis W">WD Beavis</name>
</author>
<author>
<name sortKey="Smith, Jsc" uniqKey="Smith J">JSC Smith</name>
</author>
<author>
<name sortKey="Doebley, J" uniqKey="Doebley J">J Doebley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Valdes, Am" uniqKey="Valdes A">AM Valdes</name>
</author>
<author>
<name sortKey="Slatkin, M" uniqKey="Slatkin M">M Slatkin</name>
</author>
<author>
<name sortKey="Freimer, Nb" uniqKey="Freimer N">NB Freimer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wedekind, Je" uniqKey="Wedekind J">JE Wedekind</name>
</author>
<author>
<name sortKey="Dance, Gsc" uniqKey="Dance G">GSC Dance</name>
</author>
<author>
<name sortKey="Sowden, Mp" uniqKey="Sowden M">MP Sowden</name>
</author>
<author>
<name sortKey="Smith, Hc" uniqKey="Smith H">HC Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilhelm, Bt" uniqKey="Wilhelm B">BT Wilhelm</name>
</author>
<author>
<name sortKey="Landry, J R" uniqKey="Landry J">J-R Landry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wright, Jm" uniqKey="Wright J">JM Wright</name>
</author>
<author>
<name sortKey="Bentzen, P" uniqKey="Bentzen P">P Bentzen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, G" uniqKey="Xu G">G Xu</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, G" uniqKey="Xu G">G Xu</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, Yn" uniqKey="Zhou Y">YN Zhou</name>
</author>
<author>
<name sortKey="Lubkowska, L" uniqKey="Lubkowska L">L Lubkowska</name>
</author>
<author>
<name sortKey="Hui, M" uniqKey="Hui M">M Hui</name>
</author>
<author>
<name sortKey="Court, C" uniqKey="Court C">C Court</name>
</author>
<author>
<name sortKey="Chen, S" uniqKey="Chen S">S Chen</name>
</author>
<author>
<name sortKey="Court, Dl" uniqKey="Court D">DL Court</name>
</author>
<author>
<name sortKey="Strathern, J" uniqKey="Strathern J">J Strathern</name>
</author>
<author>
<name sortKey="Jin, Dj" uniqKey="Jin D">DJ Jin</name>
</author>
<author>
<name sortKey="Kashlev, M" uniqKey="Kashlev M">M Kashlev</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Mol Biol Evol</journal-id>
<journal-id journal-id-type="iso-abbrev">Mol. Biol. Evol</journal-id>
<journal-id journal-id-type="publisher-id">molbev</journal-id>
<journal-id journal-id-type="hwp">molbiolevol</journal-id>
<journal-title-group>
<journal-title>Molecular Biology and Evolution</journal-title>
</journal-title-group>
<issn pub-type="ppub">0737-4038</issn>
<issn pub-type="epub">1537-1719</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27413049</article-id>
<article-id pub-id-type="pmc">5026258</article-id>
<article-id pub-id-type="doi">10.1093/molbev/msw139</article-id>
<article-id pub-id-type="publisher-id">msw139</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Fungtammasan</surname>
<given-names>Arkarachai</given-names>
</name>
<xref ref-type="aff" rid="msw139-aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tomaszkiewicz</surname>
<given-names>Marta</given-names>
</name>
<xref ref-type="aff" rid="msw139-aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Campos-Sánchez</surname>
<given-names>Rebeca</given-names>
</name>
<xref ref-type="author-notes" rid="msw139-FM1">
<sup></sup>
</xref>
<xref ref-type="aff" rid="msw139-aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Eckert</surname>
<given-names>Kristin A.</given-names>
</name>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>DeGiorgio</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="corresp" rid="msw139-cor1">*</xref>
<xref ref-type="aff" rid="msw139-aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff6">
<sup>6</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Makova</surname>
<given-names>Kateryna D.</given-names>
</name>
<xref ref-type="corresp" rid="msw139-cor1">*</xref>
<xref ref-type="aff" rid="msw139-aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="msw139-aff4">
<sup>4</sup>
</xref>
</contrib>
<aff id="msw139-aff1">
<sup>1</sup>
Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University</aff>
<aff id="msw139-aff2">
<sup>2</sup>
Department of Biology, Pennsylvania State University</aff>
<aff id="msw139-aff3">
<sup>3</sup>
Center for Medical Genomics, Pennsylvania State University</aff>
<aff id="msw139-aff4">
<sup>4</sup>
Huck Institute of Genome Sciences, Pennsylvania State University</aff>
<aff id="msw139-aff5">
<sup>5</sup>
Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine</aff>
<aff id="msw139-aff6">
<sup>6</sup>
Institute for CyberScience, Pennsylvania State University</aff>
</contrib-group>
<author-notes>
<fn id="msw139-FM1">
<p>
<sup></sup>
Present address: Centro De Investigación En Biología Celular Y Molecular, Universidad De Costa Rica, San José, Costa Rica</p>
</fn>
<corresp id="msw139-cor1">*
<bold>Corresponding author:</bold>
E-mail:
<email>kdm16@psu.edu</email>
;
<email>mxd60@psu.edu</email>
.</corresp>
<fn id="msw139-FM2">
<p>
<bold>Associate editor:</bold>
Claus Wilke</p>
</fn>
</author-notes>
<pub-date pub-type="ppub">
<month>10</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub">
<day>12</day>
<month>7</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>12</day>
<month>7</month>
<year>2016</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days. </pmc-comment>
<volume>33</volume>
<issue>10</issue>
<fpage>2744</fpage>
<lpage>2758</lpage>
<permissions>
<copyright-statement>© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.</copyright-statement>
<copyright-year>2016</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/" license-type="creative-commons">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a
<italic>Caenorhabditis elegans</italic>
data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>microsatellites</kwd>
<kwd>tandem repeats</kwd>
<kwd>RNA sequencing</kwd>
<kwd>RNA–DNA differences</kwd>
<kwd>transcription errors</kwd>
<kwd>reverse transcription errors</kwd>
<kwd>sequencing errors</kwd>
<kwd>error correction model</kwd>
</kwd-group>
<counts>
<page-count count="15"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Transcription transfers genetic information from DNA to RNA, and multiple types of transcripts (e.g., transfer RNA, ribosomal RNA, and messenger RNA) have critical functions in the cell. Therefore, modifications or errors that occur in transcripts can lead to phenotypic variation among tissues and individuals. RNA–DNA differences (RDDs) are created by specific enzymatic machinery leading to RNA editing (
<xref rid="msw139-B9" ref-type="bibr">Bass 2002</xref>
;
<xref rid="msw139-B89" ref-type="bibr">Schaub and Keller 2002</xref>
), or arise as RNA polymerase errors during transcription (
<xref rid="msw139-B11" ref-type="bibr">Blank et al. 1986</xref>
;
<xref rid="msw139-B69" ref-type="bibr">Ninio 1991</xref>
;
<xref rid="msw139-B91" ref-type="bibr">Strathern et al. 2012</xref>
,
<xref rid="msw139-B90" ref-type="bibr">2013</xref>
;
<xref rid="msw139-B57" ref-type="bibr">Knippa and Peterson 2013</xref>
;
<xref rid="msw139-B303" ref-type="bibr">Zhou et al. 2013</xref>
). RDDs increase the variability of transcripts and proteins. Note that RDDs can contribute to inherited variation in the sense that the enzymatic machinery responsible for RNA editing is genetically encoded (
<xref rid="msw139-B45" ref-type="bibr">Gu et al. 2016</xref>
). Several loci undergo RNA editing consistently in a large number of species (
<xref rid="msw139-B21" ref-type="bibr">Corneille et al. 2000</xref>
;
<xref rid="msw139-B50" ref-type="bibr">Ibrahim et al. 2008</xref>
;
<xref rid="msw139-B22" ref-type="bibr">Danecek et al. 2012</xref>
). In comparison with mutations, RDDs have a lower evolutionary cost because an organism with RDDs can achieve higher phenotypic plasticity while retaining wild-type alleles (
<xref rid="msw139-B40" ref-type="bibr">Gommans et al. 2009</xref>
). As RDDs can enhance the adaptability of an organism to the environment, some level of RDDs is expected to be beneficial (
<xref rid="msw139-B35" ref-type="bibr">Garrett and Rosenthal 2012a</xref>
,
<xref rid="msw139-B36" ref-type="bibr">2012b</xref>
;
<xref rid="msw139-B84" ref-type="bibr">Rieder et al. 2015</xref>
). However, a recent large-scale comparative genomics study found that, although some sites undergoing RNA editing might be under selective constraint (
<xref rid="msw139-B302" ref-type="bibr">Xu and Zhang 2015</xref>
), the majority of them do not have the characteristics of beneficial modifications (
<xref rid="msw139-B301" ref-type="bibr">Xu and Zhang 2014</xref>
).</p>
<p>With the availability of next-generation sequencing (NGS) data from whole genomes and transcriptomes of many different species, RDDs have been extensively studied, particularly with respect to base-substitution RNA editing (
<xref rid="msw139-B61" ref-type="bibr">Li et al. 2009</xref>
,
<xref rid="msw139-B62" ref-type="bibr">2011</xref>
;
<xref rid="msw139-B5" ref-type="bibr">Bahn et al. 2012</xref>
;
<xref rid="msw139-B10" ref-type="bibr">Bass et al. 2012</xref>
;
<xref rid="msw139-B74" ref-type="bibr">Pachter 2012</xref>
;
<xref rid="msw139-B108" ref-type="bibr">Park et al. 2012</xref>
;
<xref rid="msw139-B76" ref-type="bibr">Peng et al. 2012</xref>
;
<xref rid="msw139-B81" ref-type="bibr">Ramaswami et al. 2012</xref>
,
<xref rid="msw139-B82" ref-type="bibr">2013</xref>
). Moreover, RDDs arising from RNA editing were demonstrated in biochemical experiments. For example, adenosine deaminases can transform adenosine into inosine (which is read as guanine by a sequencing instrument) (
<xref rid="msw139-B9" ref-type="bibr">Bass 2002</xref>
;
<xref rid="msw139-B89" ref-type="bibr">Schaub and Keller 2002</xref>
), and apolipoprotein B mRNA editing enzymes can change cytosine into uracil (
<xref rid="msw139-B99" ref-type="bibr">Wedekind et al. 2003</xref>
). Other types of base-substitution RDDs have also been reported (
<xref rid="msw139-B62" ref-type="bibr">Li et al. 2011</xref>
). However, the extent to which technical and methodological errors contribute to RDDs is unclear (
<xref rid="msw139-B10" ref-type="bibr">Bass et al. 2012</xref>
;
<xref rid="msw139-B74" ref-type="bibr">Pachter 2012</xref>
). Some studies identified an excess of RDD sites toward the termini of NGS reads (
<xref rid="msw139-B62" ref-type="bibr">Li et al. 2011</xref>
;
<xref rid="msw139-B104" ref-type="bibr">Kleinman and Majewski 2012</xref>
;
<xref rid="msw139-B78" ref-type="bibr">Pickrell et al. 2012</xref>
), a location known to have high sequencing error rates (
<xref rid="msw139-B104" ref-type="bibr">Kleinman and Majewski 2012</xref>
;
<xref rid="msw139-B78" ref-type="bibr">Pickrell et al. 2012</xref>
), or in duplicated regions of the genome (
<xref rid="msw139-B62" ref-type="bibr">Li et al. 2011</xref>
;
<xref rid="msw139-B81" ref-type="bibr">Ramaswami et al. 2012</xref>
), in which RDDs can arise from a misalignment of paralogs (
<xref rid="msw139-B114" ref-type="bibr">Schrider et al. 2011</xref>
;
<xref rid="msw139-B104" ref-type="bibr">Kleinman and Majewski 2012</xref>
;
<xref rid="msw139-B63" ref-type="bibr">Lin et al. 2012</xref>
). Besides RNA editing, transcription errors leading to base-substitution RDDs were also studied and were found to exhibit similar rates across different types of transcripts and growth states of bacteria (
<xref rid="msw139-B111" ref-type="bibr">Traverse and Ochman 2016</xref>
).</p>
<p>RDDs in the form of insertions and deletions, particularly at short tandem repeats (STRs), have been less studied than base-substitution RDDs. Indeed, RNA editing that expands or contracts STRs is yet to be demonstrated. Transcription errors at STRs have been shown both in vitro and in vivo (
<xref rid="msw139-B91" ref-type="bibr">Strathern et al. 2012</xref>
,
<xref rid="msw139-B90" ref-type="bibr">2013</xref>
;
<xref rid="msw139-B303" ref-type="bibr">Zhou et al. 2013</xref>
). Although the frequency with which such errors occur has not been evaluated quantitatively, possessing some fraction of malfunctioning RNA is expected to have a smaller effect on the fitness of an organism in comparison with malfunctioning DNA, which could affect the cells and body throughout a lifetime and is transmitted to daughter cells in the case of a germ-line mutation.</p>
<p>STRs, which after a certain number of repeats are also called microsatellites (
<xref rid="msw139-B53" ref-type="bibr">Kelkar et al. 2010</xref>
;
<xref rid="msw139-B3" ref-type="bibr">Ananda et al. 2013</xref>
), exhibit high mutation rates due to polymerase slippage (
<xref rid="msw139-B27" ref-type="bibr">Drake et al. 1998</xref>
;
<xref rid="msw139-B29" ref-type="bibr">Ellegren 2000</xref>
;
<xref rid="msw139-B96" ref-type="bibr">Vigouroux et al. 2002</xref>
;
<xref rid="msw139-B30" ref-type="bibr">Ellegren 2004</xref>
;
<xref rid="msw139-B6" ref-type="bibr">Baptiste et al. 2013</xref>
,
<xref rid="msw139-B7" ref-type="bibr">2015</xref>
). They are particularly important for understanding disease susceptibility, as mutations at STRs are implicated in over 40 neurological disorders (
<xref rid="msw139-B14" ref-type="bibr">Boby et al. 2005</xref>
;
<xref rid="msw139-B75" ref-type="bibr">Pearson et al. 2005</xref>
;
<xref rid="msw139-B18" ref-type="bibr">Castel et al. 2010</xref>
), and more than 30% of human genes contain one or more STRs in their exonic regions (
<xref rid="msw139-B106" ref-type="bibr">Legendre et al. 2007</xref>
). All classes of long STRs have been found to be overrepresented in disease-associated genes (
<xref rid="msw139-B65" ref-type="bibr">Madsen et al. 2008</xref>
), and some relatively short STRs have also been implicated in disease. For example, a (CGC)
<sub>
<italic>n</italic>
</sub>
repeat number change from
<italic>n</italic>
= 11 to
<italic>n</italic>
= 12 in the
<italic>PABPN1</italic>
gene can cause oculopharyngeal muscular dystrophy (
<xref rid="msw139-B16" ref-type="bibr">Brais 1998</xref>
). As RDDs at a locus with a wild-type allele can result in a transcript that mimics a transcript from a disease-causing allele, STR RDDs may have pathological consequences. Thus, estimating RDD rates at STRs is critical for understanding the fidelity of transcription, and for estimating the probability of disease occurrence as a consequence of transcript alteration. If STR RDD rates prove to be high, this could substantially alter how diseases caused by STR mutations are diagnosed in medical genomics.</p>
<p>Detecting RDDs at STRs is challenging for a number of reasons. First, conventional short-read mapping approaches favor alignments to the reference allele (
<xref rid="msw139-B48" ref-type="bibr">Gymrek et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
) and, as a result, the transcription error rates can be underestimated. Second, short-read sequencing at STRs is error-prone (
<xref rid="msw139-B86" ref-type="bibr">Ross et al. 2013</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
) and sequencing errors can be misinterpreted as transcription errors. These two limitations can be alleviated with the use of the STR-FM pipeline, which incorporates flank-based mapping and utilizes previously estimated STR sequencing error rates (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). Third, the profile and rates of reverse transcription (RT) errors at STRs are unknown. If these rates are high, then they can greatly affect the estimation of RDDs. Thus, it is crucial to consider RT errors in STR RDD studies. Fourth, STRs are highly mutable and exhibit substantial somatic and inter-individual genetic variation (
<xref rid="msw139-B70" ref-type="bibr">O’Huallachain et al. 2012</xref>
). This somatic variation can lead to STR length variation among tissues. Therefore, to accurately detect STR RDDs, it is necessary to study DNA and RNA from the same tissue of the same individual.</p>
<p>Recently, the barcoded RNA sequencing technique (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) was proposed as an approach for studying RDD and RT errors. In this technique, each RNA molecule is tagged with a unique barcode, which makes it possible to trace all subsequent cDNA molecules and sequencing reads. In combination with several rounds of cDNA library construction from the same set of barcoded RNA, the consensus cDNA and RNA sequences can be generated, and the RDD and RT error rates can be estimated based on the proportion of incongruent reads.</p>
<p>Although barcoded RNA sequencing is a powerful technique for estimating RDD and RT error rates, an alternative approach would still be useful. On the one hand, there is a need to estimate these rates from existing data sets that were not processed with RNA barcoding. Such data sets are abundant and would allow reliable estimation of RT error and RDD rates at STRs, whose analysis requires ample data because of the flank-based mapping (
<xref rid="msw139-B48" ref-type="bibr">Gymrek et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). The existing barcoded RNA data sets are currently of limited scale (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
), and generating larger data sets is expensive. On the other hand, batch effects are inevitable, and different library preparation procedures can greatly affect RT error rates (
<xref rid="msw139-B79" ref-type="bibr">Quail et al. 2012</xref>
). A novel method to estimate RDD and RT error rates that is compatible with the standard method of RNA sequencing would be indispensable for correcting for batch effects.</p>
<p>To estimate RT error and RDD rates at STRs, we developed a maximum-likelihood estimator (MLE) that utilizes sequencing data from replicate cDNA libraries. Our method can be employed with conventional RNA sequencing procedures, and as such represents an attractive alternative to barcoded RNA sequencing. Using our method, we addressed three questions. First, what are the levels of RDDs and of RT errors at STRs, and do they exhibit contraction or expansion biases? To address this question, we generated DNA and RNA sequencing data from the same tissue of the same individual to eliminate the effects of somatic genetic variation, and simultaneously estimated RT error and RDD rates at STRs. Second, what are the precision and accuracy of our estimates? To assess these properties, we validated the estimated rates with a replicated trial and compared them with those obtained from the published barcoded RNA sequencing data (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
). Finally, what are the RT error and RDD rates compared with the germ-line mutation rates and sequencing error rates at STRs? To evaluate these levels, we contrasted the RT error and RDD rates estimated here with published germ-line mutation rates and sequencing error rates (
<xref rid="msw139-B93" ref-type="bibr">Sun et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
).</p>
</sec>
<sec sec-type="results">
<title>Results</title>
<sec>
<title>Experimental Design</title>
<p>To study the RT error and RDD rates at STRs, we designed the following experiment (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
). We isolated genomic DNA and total RNA from the same sample (orangutan testis of a single individual). The genomic DNA was sequenced using two different library preparation protocols, PCR-containing and PCR-free (see Materials and Methods section for details), allowing us to test for genotype congruence between the two libraries (see “Genotyping STRs Using the DNA Sequencing Data” in Results). Total RNA was divided into two aliquots that were used to construct two separate RNA-seq libraries. Each of these two libraries was sequenced in two separate batches. Such an experiment, ideally, should allow one to differentiate between RDDs (such differences from the DNA sequence should be present in both RNA-seq libraries) and RT errors (such variants should be present in only one of the two RNA-seq libraries but in both sequencing batches). However, empirical data frequently have missing information at some loci due to limited sampling, which can distort results: if a deviant STR variant is not sampled in one cDNA library, then an RT error can be incorrectly inferred instead of an RDD. For instance, if one-tenth of the RNA molecules at a locus were modified from (A)
<sub>6</sub>
to (A)
<sub>7</sub>
due to RDD, then we should expect to observe (A)
<sub>7</sub>
in both replicated cDNA libraries sequenced. However, if (A)
<sub>7</sub>
was not sampled in one library, then we will observe (A)
<sub>7</sub>
only in the other library, thereby misclassifying this situation as an RT error. Therefore, we developed a full likelihood method that permits sampling errors in the likelihood calculation to avoid error misclassifications.
<fig id="msw139-F1" orientation="portrait" position="float">
<label>Fig. 1</label>
<caption>
<p>A schematic representation of the experimental design.</p>
</caption>
<graphic xlink:href="msw139f1p"></graphic>
</fig>
</p>
<p>The rationale behind the method lies in the correlation of variants observed between cDNA libraries. RDDs lead to correlated shifts in the distribution of variants between cDNA libraries, whereas RT errors lead to independent shifts between these distributions. For example, suppose at a locus the repeat number in the DNA is
<italic>D</italic>
, and we observe variants with repeat lengths
<italic>D</italic>
− 1 at high frequency in one cDNA library and
<italic>D</italic>
+ 1 at high frequency in another cDNA library. Such a scenario is likely to have occurred due to substantial RT errors at this locus, as a large portion of the distribution is different between the libraries. However, now suppose that we instead observe variants with repeat length
<italic>D</italic>
+ 1 at high frequency in both cDNA libraries. Such a scenario could have occurred through either RDDs or RT errors, although it is less probable under RT errors than under RDDs because RT errors would have to produce the same shift independently in each library; this uncertainty is accounted for within our likelihood method. Finally, suppose that we instead observe variants with repeat length
<italic>D</italic>
+ 2 at high frequency in both cDNA libraries. Such a scenario is likely to have occurred due to both substantial RDDs and RT errors, because at each step the stepwise mutation model only permits a change in the STR repeat length by one unit. By taking the likelihood across independent loci, we accumulate evidence for the prevalence of each scenario and directly account for the uncertainty by modeling the unobserved states (RNA and actual cDNA).</p>
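<p>The contrast between correlated and independent shifts can be illustrated with a minimal simulation sketch. It is not part of the published pipeline: the function names, the shared RNA pool of fixed size, and the omission of sequencing errors and read sampling are all simplifying assumptions made here for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def step(length, rate, p_expand, rng):
    """Change a repeat length by at most one unit (stepwise mutation model)."""
    if rng.random() < rate:
        return length + 1 if rng.random() < p_expand else length - 1
    return length


def simulate_locus(dna_len, n_rna, rdd_rate, rdd_exp, rt_rate, rt_exp, rng):
    """Return (rna, lib1, lib2) repeat lengths for one homozygous locus.

    Hypothetical helper: both cDNA libraries copy the same RNA pool.
    """
    # RDDs act once, at the RNA stage, so both libraries inherit them.
    rna = [step(dna_len, rdd_rate, rdd_exp, rng) for _ in range(n_rna)]
    # RT errors are drawn separately for each cDNA library.
    lib1 = [step(r, rt_rate, rt_exp, rng) for r in rna]
    lib2 = [step(r, rt_rate, rt_exp, rng) for r in rna]
    return rna, lib1, lib2


if __name__ == "__main__":
    rng = random.Random(0)
    # RDDs only: every deviation is shared by the two libraries.
    _, lib1, lib2 = simulate_locus(7, 1000, 0.05, 0.8, 0.0, 0.5, rng)
    print("RDD only, libraries identical:", lib1 == lib2)
    # RT errors only: deviations arise independently in each library.
    _, lib1, lib2 = simulate_locus(7, 1000, 0.0, 0.5, 0.05, 0.8, rng)
    shared = sum(a != 7 and a == b for a, b in zip(lib1, lib2))
    print("RT only, molecules deviating identically in both:", shared)
```
]]></preformat>
<p>With the RT error rate set to zero, the two simulated libraries are identical by construction, whereas with the RDD rate set to zero a deviation appears in both libraries only when two independent RT errors coincide. A repeat length two units away from the DNA length requires a change at both stages.</p>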
</sec>
<sec>
<title>Genotyping STRs Using the DNA Sequencing Data</title>
<p>Sequencing of the PCR-containing and PCR-free genomic DNA libraries resulted in estimated genome-wide mean sequencing depths of 6.7× (267 million reads) and 1.8× (73 million reads), respectively. We employed our previously published software, STR-FM (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
), to locate STRs in DNA sequencing reads. Namely, STRs with at least five mono-, three di-, three tri-, and three tetranucleotide repeats were detected in reads from each sequenced library (see Materials and Methods). After mapping such reads to the orangutan reference genome, we utilized published sequencing error rates (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
) to genotype STRs at each locus. To estimate genotyping accuracy, we used loci for which we could derive genotypes from both libraries. For them, the genotypes from the PCR-free library were compared with those from the PCR-containing library. This comparison resulted in a 99.86% genotype concordance (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online), a higher concordance than that achieved in previous studies (
<xref rid="msw139-B48" ref-type="bibr">Gymrek et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). After removing discordant genotypes, we merged the data from the two libraries and limited our analysis to homozygous loci (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) to reduce the complexity of the maximum-likelihood estimation (also see “Samples, DNA Sequencing, and Genotyping” in Materials and Methods). Homozygous loci constitute 99.5% of our data. After additional filtering (see “Samples, DNA Sequencing, and Genotyping” in Materials and Methods), we retained 5,582,009 mono-, 2,768,451 di-, 309,546 tri-, and 78,454 tetranucleotide STR-containing loci.</p>
</sec>
<sec>
<title>STR Profiling of RNA</title>
<p>For the RNA-seq data, we generated a total of 56.6, 55.3, 39.7, and 38.9 million paired-end reads for library 1 batch A, library 2 batch A, library 1 batch B, and library 2 batch B, respectively. These sequencing depths are higher than those recommended by the best practice guidelines for gene expression studies of species with a reference genome (
<xref rid="msw139-B103" ref-type="bibr">ENCODE 2011</xref>
;
<xref rid="msw139-B20" ref-type="bibr">Conesa et al. 2016</xref>
). Although our MLE does not require that sequencing depth be balanced among cDNA libraries, we chose to balance the sequencing depths to avoid any unforeseen biases. To do so, we downsampled library 1 batch A to 55.3 million reads and library 1 batch B to 38.9 million reads, so that the two libraries sequenced in the same batch had equal numbers of reads.</p>
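<p>Depth balancing of this kind can be sketched as uniform sampling of read pairs without replacement. The tool actually used for downsampling is not stated in this section, so the function below, its name, and its in-memory list representation of read pairs are assumptions for illustration only.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def downsample_pairs(read_pairs, target, seed=0):
    """Return `target` read pairs drawn uniformly without replacement.

    Each element of `read_pairs` is one (mate1, mate2) tuple, so mates
    are always kept or discarded together.
    """
    if target >= len(read_pairs):
        return list(read_pairs)
    rng = random.Random(seed)
    # Sample indices, then sort them to preserve the original read order.
    keep = sorted(rng.sample(range(len(read_pairs)), target))
    return [read_pairs[i] for i in keep]


if __name__ == "__main__":
    # Toy stand-ins for two libraries sequenced in the same batch.
    library1 = [("r%d/1" % i, "r%d/2" % i) for i in range(566)]
    library2 = [("q%d/1" % i, "q%d/2" % i) for i in range(553)]
    balanced1 = downsample_pairs(library1, len(library2))
    print(len(balanced1), len(library2))
```
]]></preformat>
<p>Fixing the seed makes the subsample reproducible. For real data sets of tens of millions of reads, a streaming implementation would be preferable to holding all pairs in memory.</p>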
<p>To profile STRs in the RNA-seq data, we followed the same procedure as that used for DNA data. Briefly, STR-containing RNA-seq reads were mapped to the reference genome, and reads with uniquely mapping flanking sequences (20 bp upstream and 20 bp downstream from an STR) were retained. This procedure resulted in a length profile (the collection of repeat lengths from the reads mapping to a locus) for each STR locus. Each RNA-seq library and each sequencing batch was analyzed separately (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
).</p>
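<p>A length profile, as defined above, can be represented as a count of observed repeat lengths per locus, kept separately for each library and batch. The sketch below is illustrative and does not reproduce STR-FM; the tuple layout of the input and the locus identifiers are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def length_profiles(observations):
    """Map each (library, batch, locus) key to a Counter of repeat lengths.

    `observations` is an iterable of (library, batch, locus, repeat_length)
    tuples, one per retained read (hypothetical input layout).
    """
    profiles = defaultdict(Counter)
    for library, batch, locus, repeat_len in observations:
        profiles[(library, batch, locus)][repeat_len] += 1
    return dict(profiles)


if __name__ == "__main__":
    reads = [
        (1, "A", "chr1:1000", 7),
        (1, "A", "chr1:1000", 7),
        (1, "A", "chr1:1000", 8),
        (2, "A", "chr1:1000", 7),
    ]
    for key, profile in sorted(length_profiles(reads).items()):
        print(key, dict(profile))
```
]]></preformat>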
<p>We focused our analysis on STRs with the (A/T)
<sub>
<italic>n</italic>
</sub>
motif ((A/U)
<sub>
<italic>n</italic>
</sub>
for RNA). We call this motif (A)
<sub>
<italic>n</italic>
</sub>
for brevity. Most analyses were performed in the range of (A)
<sub>5</sub>
–(A)
<sub>10</sub>
because of the high abundance of STRs with this repeat number (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) (
<xref rid="msw139-B92" ref-type="bibr">Subramanian et al. 2003</xref>
), and due to their high propensity to polymerase slippage (
<xref rid="msw139-B29" ref-type="bibr">Ellegren 2000</xref>
,
<xref rid="msw139-B30" ref-type="bibr">2004</xref>
;
<xref rid="msw139-B101" ref-type="bibr">Ananda et al. 2011</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). Other motifs are discussed in the “Estimation of STR RDD and RT Error Rates Using MLE” subsection of Results. Overall, in the RNA-seq data, the number of loci with the (A)
<sub>
<italic>n</italic>
</sub>
motif decreased as the STR length increased (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S6</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online), which is expected based on the distribution of STRs in the genome (
<xref rid="msw139-B25" ref-type="bibr">Denver et al. 2004</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). For each STR length, the (A)
<sub>
<italic>n</italic>
</sub>
-containing loci with low expression level (proxied by the number of RNA-seq reads per locus) were considerably more prevalent than the loci with high expression level (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Thus, most (A)
<sub>
<italic>n</italic>
</sub>
-containing loci in our data set were short and had low expression levels.</p>
</sec>
<sec>
<title>An MLE to Estimate RT Error and RDD Parameters</title>
<p>To estimate the RT error and RDD rates and their expansion probabilities, we developed an MLE that jointly infers this set of parameters by maximizing the likelihood of observing a given set of sequenced STR length profiles. The model parameterizes only the expansion probabilities for RT errors and for RDDs; the corresponding contraction probabilities are one minus the expansion probability in each case. The model requires one DNA data set and a minimum of two replicated RNA-seq data sets from the same sample. For the observed read data that originated from the same STR motif and length (e.g., (A)
<sub>7</sub>
), our method calculates the likelihood of the data being generated from all possible combinations of RNA forms and all possible combinations of cDNA forms given the set of four parameters (RT error rate, RT expansion probability, RDD rate, and RDD expansion probability). By identifying the parameter set that results in the highest likelihood value, our model makes use of the replicated cDNA library structure (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
) to enhance our ability to distinguish between RT errors and RDDs.</p>
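<p>To make the structure of such a likelihood concrete, the following sketch treats a deliberately reduced case. It assumes a single RNA molecule per locus (a bin size of one), one observed cDNA repeat length per library, and no sequencing error. The published estimator sums over all combinations of RNA and cDNA forms and incorporates sequencing error rates, which this sketch does not attempt to reproduce; all function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def step_prob(src, dst, rate, p_expand):
    """P(dst | src) when a length stays the same or changes by one unit."""
    diff = dst - src
    if diff == 0:
        return 1.0 - rate
    if diff == 1:
        return rate * p_expand
    if diff == -1:
        return rate * (1.0 - p_expand)
    return 0.0


def pair_likelihood(c1, c2, dna_len, rdd_rate, rdd_exp, rt_rate, rt_exp):
    """P(c1, c2 | DNA length), summing over the unobserved RNA length."""
    total = 0.0
    for rna in (dna_len - 1, dna_len, dna_len + 1):
        p_rna = step_prob(dna_len, rna, rdd_rate, rdd_exp)
        # The two libraries share the RNA state but have independent RT steps.
        total += (p_rna
                  * step_prob(rna, c1, rt_rate, rt_exp)
                  * step_prob(rna, c2, rt_rate, rt_exp))
    return total


def log_likelihood(loci, rdd_rate, rdd_exp, rt_rate, rt_exp):
    """Sum log-likelihoods over independent loci given as (dna_len, c1, c2).

    Assumes every observation has nonzero probability under the parameters.
    """
    return sum(
        math.log(pair_likelihood(c1, c2, d, rdd_rate, rdd_exp, rt_rate, rt_exp))
        for d, c1, c2 in loci
    )


if __name__ == "__main__":
    # Toy data: 10% of loci show the same +1 shift in both libraries.
    loci = [(7, 7, 7)] * 90 + [(7, 8, 8)] * 10
    print("RDD-dominated:", log_likelihood(loci, 0.1, 1.0, 0.001, 0.5))
    print("RT-dominated: ", log_likelihood(loci, 0.001, 0.5, 0.1, 1.0))
```
]]></preformat>
<p>On the toy data, in which the same shift recurs in both libraries, the RDD-dominated parameter set attains the higher log-likelihood, because an RT-only explanation requires two independent errors to coincide. A full estimator would maximize this quantity over the four parameters rather than compare two fixed settings.</p>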
</sec>
<sec>
<title>Performance of MLE</title>
<p>To evaluate the ability of the MLE to infer the four parameters of interest, we conducted simulations using several sets of model parameters, numbers of loci, and bin sizes. The bin size is the number of sampled molecules at the RNA or cDNA stages, which determines the set of possible distinct STR length distributions for RNA and cDNA. This bin size affects the sampling process of each cDNA library from an RNA sample and each RNA sample from the DNA sample. Small bin sizes will yield a high sampling error, leading to distortions in the distribution of RNA or cDNA STR forms relative to the distribution expected under the stepwise mutation model. The results of the simulations indicate that the MLE can estimate all four parameters with a high level of precision and accuracy (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3, S4, and S7–S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online), provided certain conditions are met. First, the chosen bin size
<italic>M</italic>
must be close to the number of reads per locus of the RNA-seq data, proxying gene expression level (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3, S4, and S7–S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Although the optimal combination between bin size and the number of RNA-seq reads per locus varies among parameter sets (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3, S4, and S7–S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online), the MLE performs reasonably well when the number of RNA-seq reads per locus is between
<italic>M</italic>
and 2
<italic>M.</italic>
For example, the estimated RT error and RDD rates for a bin size of 2 are the most accurate when the simulated data were generated using three molecules of RNA and three molecules of cDNA (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3 and S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Second, the number of loci must be at least the inverse of the error rate being estimated. The higher the number of loci, the more accurate the estimates. When both conditions are met, the true parameters fall within the range spanned by 95% of the estimated parameters, and the median estimates deviate from the true parameters by less than 10% (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3, S4, and S7–S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online).</p>
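<p>The two conditions above can be written as a small check. This is a minimal sketch with a hypothetical function name; it encodes only the rule of thumb stated in the text (reads per locus between M and 2M, and a number of loci at least the inverse of the error rate), not any part of the estimator.</p>

```python
def simulation_conditions_met(bin_size, reads_per_locus, n_loci, error_rate):
    """Return (reads_ok, loci_ok) for the two rules of thumb in the text.

    reads_ok: reads per locus lies between M and 2M for bin size M.
    loci_ok:  number of loci is at least 1 / error_rate.
    """
    reads_ok = (reads_per_locus >= bin_size
                and 2 * bin_size >= reads_per_locus)
    # n_loci >= 1 / error_rate, rewritten to avoid dividing by a tiny rate.
    loci_ok = n_loci * error_rate >= 1.0
    return reads_ok, loci_ok


if __name__ == "__main__":
    # Bin size 2 with three reads per locus, as in the simulated example.
    print(simulation_conditions_met(2, 3, 10000, 1e-3))
```

<p>For a rate on the order of 10<sup>−3</sup>, for instance, the second condition calls for at least about 1,000 loci.</p>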
</sec>
<sec>
<title>Lumping MLE</title>
<p>Because the optimal bin size for MLE increases with the number of RNA-seq reads per locus (expression level), it is computationally challenging to estimate RT error and RDD rates from loci expressed at high levels. Therefore, we developed an approximation to the MLE, which we call the
<italic>lumping MLE</italic>
, that substantially reduces the number of calculations in the likelihood (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary text S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) as compared with that in our original, or “
<italic>full, MLE</italic>
” (see “MLE Formulation” in Materials and Methods). We validated this method using loci expressed at low levels and compared its results with those obtained using the full MLE. The parameter estimates and their corresponding 95% confidence intervals of the same data sets are strikingly similar between the full and lumping MLE methods (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S3</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). For example, at a bin size of 5, both the full MLE and the lumping MLE can estimate RDD rates for the data with ten RNA-seq reads per locus with less than 5% error. Because the lumping MLE performs similarly to the full MLE but is applicable to a larger range of expression levels, we use the lumping MLE to estimate RT error and RDD rates for STR loci with six or more RNA-seq reads per locus.</p>
</sec>
<sec>
<title>Estimation of the STR RT Error and RDD Rates Using MLE</title>
<p>Using the full MLE and the bin size of 2, we first estimated the RT error and RDD rates, as well as RT error and RDD expansion probabilities, at (A)
<sub>
<italic>n</italic>
</sub>
-containing loci expressed at low levels (i.e., with three to five RNA-seq reads per locus). The exceptionally low RT error rate for repeat (A)
<sub>5</sub>
was most likely due to our detection threshold for mononucleotide STRs—we only collected such STRs starting from five repeats and thus could not observe RT errors (and RDDs) that changed (A)
<sub>5</sub>
to (A)
<sub>4</sub>
. As the repeat number increased from 6 to 9, the RT error rates increased exponentially from 2.1 × 10
<sup>−4</sup>
to 7.7 × 10
<sup>−2</sup>
(
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
A;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Our estimates of RT error rates had narrow 95% confidence intervals and were highly similar between the two sequencing batches (blue and red lines in
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
A). The RT errors from (A)
<sub>6</sub>
to (A)
<sub>9</sub>
exhibited an expansion bias (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
D). The expansion bias decreased as the repeat number increased (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
D and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online); the 95% confidence intervals also widened because the number of loci evaluated decreased (second column in
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Similar to the pattern observed for RT errors, RDD rates increased with STR length (
<xref ref-type="table" rid="msw139-T1">table 1</xref>
). However, the RDD rates were substantially lower than the RT error rates (
<xref ref-type="table" rid="msw139-T1">table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online,
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
A). For example, the average RT error rate between the two batches at (A)
<sub>8</sub>
was 3.7 × 10
<sup>−2</sup>
, whereas the average RDD rate at the same repeat number was 3.1 × 10
<sup>−3</sup>
. Because RDD rates were rather low, we could not estimate them for several repeat numbers (as we lacked sufficient data to detect such low rates), and the 95% confidence intervals for those that we could estimate were wide (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). The same is true for our estimates of RDD expansion probability (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online).
<fig id="msw139-F2" orientation="portrait" position="float">
<label>Fig. 2</label>
<caption>
<p>A comparison of RT error rates and RT expansion probabilities as a function of repeat number for motif (A)
<sub>
<italic>n</italic>
</sub>
between sequencing batches A (blue) and B (red). (
<italic>A</italic>
) RT error rates for the bin size of 2; (
<italic>B</italic>
) RT error rates for the bin size of 5; (
<italic>C</italic>
) RT error rates for the bin size of 40; (
<italic>D</italic>
) RT expansion probabilities for the bin size of 2; (
<italic>E</italic>
) RT expansion probabilities for the bin size of 5; (
<italic>F</italic>
) RT expansion probabilities for the bin size of 40. Repeat numbers between 5 and 10 were chosen due to their high abundance. Median values across 100 empirical bootstrap replicates (bootstrapped across loci) are plotted with open circles, whereas point estimates are plotted with stars. Solid lines connect the median bootstrap estimates. The 95% confidence intervals were calculated from the 100 bootstrap replicates. Each estimate was based on five sets of random initial parameters to minimize the possibility of reaching local maxima, and the set of parameters that had the maximal likelihood was taken as the estimate for a given bootstrap replicate. The estimations for the bin size of 2 were performed using the full MLE, whereas the estimations for the bin sizes of 5 and 40 were performed using the lumping MLE. The number of loci analyzed for each bin size is listed in
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary tables S4 and S5</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online.</p>
</caption>
<graphic xlink:href="msw139f2p"></graphic>
</fig>
<table-wrap id="msw139-T1" orientation="portrait" position="float">
<label>Table 1</label>
<caption>
<p>RDD Rates for the (A)
<sub>
<italic>n</italic>
</sub>
Motif.</p>
</caption>
<table frame="hsides" rules="groups">
<thead align="left">
<tr>
<th rowspan="2" colspan="1"></th>
<th align="center" colspan="2" rowspan="1">
<bold>Bin = 2; 3–5 RNA-seq Reads</bold>
</th>
<th align="center" colspan="2" rowspan="1">
<bold>Bin = 5; 6–16 RNA-seq Reads</bold>
</th>
<th align="center" colspan="2" rowspan="1">
<bold>Bin = 40; 49–102 RNA-seq Reads</bold>
</th>
</tr>
<tr>
<th rowspan="1" colspan="1">Batch A</th>
<th rowspan="1" colspan="1">Batch B</th>
<th rowspan="1" colspan="1">Batch A</th>
<th rowspan="1" colspan="1">Batch B</th>
<th rowspan="1" colspan="1">Batch A</th>
<th rowspan="1" colspan="1">Batch B</th>
</tr>
</thead>
<tbody align="left">
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>5</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, &lt;1.0e-9]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 2.76e-4]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 3.29e-9]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 6.86e-5]</td>
<td align="char" char="." rowspan="1" colspan="1">1.87e-4 [&lt;1.0e-9, 5.70e-4]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 4.29e-9]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>6</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">1.87e-3 [2.45e-4, 3.38e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">5.13e-4 [&lt;1.0e-9, 2.01e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">6.45e-4 [&lt;1.0e-9, &lt;2.26e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">6.80e-4 [&lt;1.0e-9, 2.17e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">1.89e-3 [&lt;1.0e-9, 3.42e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">1.81e-3 [&lt;1.0e-9, 4.97e-3]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>7</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 2.28e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">2.59e-3 [&lt;1.0e-9, 7.59e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">3.26e-3 [&lt;1.0e-9, 2.34e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">3.9e-3 [&lt;1.0e-9, 2.83e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">7.36e-4 [&lt;1.0e-9, 5.36e-3]</td>
<td align="char" char="." rowspan="1" colspan="1">5.90e-3 [&lt;1.0e-9, 1.33e-2]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>8</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">3.57e-3 [&lt;1.0e-9, 1.48e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">2.68e-3 [&lt;1.0e-9, 1.72e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">7.53e-3 [3.80e-3, &lt;1.0e-9]</td>
<td align="char" char="." rowspan="1" colspan="1">7.46e-3 [&lt;1.0e-9, 1.14e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">3.76e-3 [&lt;1.0e-9, 1.36e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 8.12e-3]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>9</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">8.94e-3 [&lt;1.0e-9, 2.28e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, &lt;1.0e-9]</td>
<td align="char" char="." rowspan="1" colspan="1">2.40e-2 [&lt;1.0e-9, 1.90e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">1.57e-2 [&lt;1.0e-9, 1.67e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">2.33e-2 [&lt;1.0e-9, 7.34e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 1.49e-2]</td>
</tr>
<tr>
<td rowspan="1" colspan="1">
<bold>(A)
<sub>10</sub>
</bold>
</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 6.55e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">1.15e-2 [&lt;1.0e-9, 8.22e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">7.52e-2 [&lt;1.0e-9, 6.85e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">4.95e-3 [&lt;1.0e-9, 3.54e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">7.54e-3 [&lt;1.0e-9, 5.93e-2]</td>
<td align="char" char="." rowspan="1" colspan="1">&lt;1.0e-9 [&lt;1.0e-9, 1.18e-2]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="msw139-TF1">
<p>N
<sc>ote</sc>
.—In each cell, the number outside the brackets is the point estimation, whereas the numbers inside the brackets are the 95% confidence intervals.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</p>
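<p>The reported rates can be turned into two quick back-of-the-envelope figures. The sketch below assumes a constant fold increase per added repeat between (A)<sub>6</sub> and (A)<sub>9</sub> (a pure exponential, which is a simplification) and takes the (A)<sub>8</sub> RDD values from the bin size 2 columns of table 1; the variable names are illustrative.</p>

```python
# RT error rates for the bin size of 2, as quoted in the text.
rt_rate_a6 = 2.1e-4
rt_rate_a9 = 7.7e-2

# Under a pure exponential, the rate is multiplied by the same factor
# for each of the three steps from repeat number 6 to 9.
steps = 9 - 6
fold_per_repeat = (rt_rate_a9 / rt_rate_a6) ** (1.0 / steps)

# RDD point estimates at (A)8 for batches A and B (table 1, bin size 2).
rdd_a8 = (3.57e-3 + 2.68e-3) / 2
rt_a8 = 3.7e-2  # average RT error rate at (A)8 quoted in the text
rt_to_rdd_ratio = rt_a8 / rdd_a8

if __name__ == "__main__":
    print(f"fold increase per added repeat: {fold_per_repeat:.2f}")
    print(f"RT error rate / RDD rate at (A)8: {rt_to_rdd_ratio:.1f}")
```

<p>Under these assumptions the RT error rate grows roughly sevenfold per added repeat, and the RT error rate at (A)<sub>8</sub> exceeds the RDD rate by roughly one order of magnitude.</p>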
<p>To confirm that our estimates were not affected by data selection or sequencing artifacts at loci expressed at low levels, we repeated the analysis with different bin sizes and ranges of expression level. We applied the lumping MLE to the data with the numbers of RNA-seq reads ranging from 6 to 16 (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S5</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) using the bin size of 5 (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
B and E), and from 49 to 102 using the bin size of 40 (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
C and F). These ranges of RNA-seq read numbers do not overlap with the one used for the bin size of 2 (see “Estimation of RDD and RT Errors Using the MLE” in Materials and Methods) and thus provide an opportunity to estimate the parameters independently, but for the same sequencing batches. The resulting estimates of RT error rates and of RT error expansion probability were strikingly similar to those calculated based on the smaller number of RNA-seq reads and the bin size of 2 (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
A–F). The RDD rates estimated using three different bin sizes all increase with repeat number; however, their more detailed comparison is challenging because of wide confidence intervals (
<xref ref-type="table" rid="msw139-T1">table 1</xref>
).</p>
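<p>The confidence intervals in fig. 2 and table 1 come from bootstrapping across loci. A minimal sketch of a percentile bootstrap of that kind is given below; the function name and the simple mean estimator are placeholders, since in the actual analysis the statistic recomputed on each resample is the MLE of the four parameters.</p>

```python
import random
import statistics


def bootstrap_ci(per_locus_values, estimator, n_replicates=100, seed=0):
    """Percentile bootstrap across loci.

    Resamples loci with replacement, recomputes `estimator` on each
    resample, and returns (lower, median, upper) where lower and upper
    approximate the 2.5th and 97.5th percentiles of the replicates.
    """
    rng = random.Random(seed)
    n = len(per_locus_values)
    estimates = sorted(
        estimator([rng.choice(per_locus_values) for _ in range(n)])
        for _ in range(n_replicates)
    )
    lower = estimates[int(0.025 * (n_replicates - 1))]
    upper = estimates[int(round(0.975 * (n_replicates - 1)))]
    return lower, statistics.median(estimates), upper


def mean(values):
    """Placeholder estimator standing in for the MLE."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical per-locus indicators: 1 = deviant read observed.
    toy = [0] * 90 + [1] * 10
    print(bootstrap_ci(toy, mean))
```

<p>With few informative loci, most resamples contain no deviant reads at all, which is consistent with the many lower bounds reported at the detection floor in table 1.</p>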
<p>Our MLE can be applied to more than two replicated RNA sequencing data sets (see
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary text S3</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online, for the equation). For example, we simultaneously analyzed cDNA libraries 1 and 2 from both batches A and B (four sequencing data sets) using the lumping MLE with a bin size of 5. The estimated RT error and RDD rates (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S11</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) are similar to those obtained after analyzing batches A and B separately (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
).</p>
<p>We also attempted to estimate RT error and RDD rates for other STR motifs. However, the numbers of loci were insufficient to estimate these rates accurately (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). For example, the next most abundant group of STRs in our data after the (A)
<sub>
<italic>n</italic>
</sub>
-containing STRs were (AC)
<sub>
<italic>n</italic>
</sub>
- and (AG)
<sub>
<italic>n</italic>
</sub>
-containing STRs (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Among them, we identified only 5,142 loci with three to five RNA-seq reads per locus in batch A (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S6</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). For such loci (combined for these two motifs), we only detected one deviant STR form at the consensus repeat number of 4 (one locus contained two reads of (AG)
<sub>3</sub>
), and inferred an RDD rate of 0 and an RT error rate of 5.84 × 10
<sup>−4</sup>
(95% confidence interval from &lt;1.0 × 10
<sup>−9</sup>
to 1.76 × 10
<sup>−3</sup>
) (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S7</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). The RT error expansion probability was inferred to be 0, indicating a contraction bias; however, the 95% confidence interval was wide (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S7</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). We conclude that we presently lack sufficient data to accurately evaluate the RT error and RDD rates at STRs other than (A)
<sub>
<italic>n</italic>
</sub>
.</p>
</sec>
<sec>
<title>RDD and RT Error Rates Estimation Using Barcoded RNA Sequencing</title>
<p>To validate the MLE, we analyzed publicly available
<italic>C. elegans</italic>
barcoded RNA data (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) and evaluated RT error and RDD rates with an independent method, that is, barcoded RNA sequencing. According to this method, RNA molecules are tagged, allowing a direct inference of RDD rates by tracing cDNA molecules and sequencing reads that originated from the same RNA molecule (i.e., from the same “family”;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S5</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). In the barcoded RNA data, using the modified STR-FM (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
), we detected a total of 9,074,690 STR-containing cDNA reads (5,574,030 mono-, 2,455,300 di-, 1,018,578 tri-, and 26,782 tetranucleotide-containing cDNA reads), based on which we inferred a total of 949,826 STR-containing cDNA molecules (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S8</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Because most of the cDNA families were present in just one of the three RNA-seq libraries (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
), we could infer STR lengths in only 7,922 STR-containing RNA molecules (with 4,596 mono-, 1,376 di-, 1,948 tri-, and two tetranucleotides;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S9</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). No errors were detected among the reads for each RNA molecule, allowing us to estimate only the maximal RDD rate as one divided by the number of loci for a specific motif (e.g., 1/3,549 for (A)
<sub>5</sub>
;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S9</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online).</p>
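<p>The maximal RDD rate used here is simple arithmetic: with no RDD observed among n loci, the rate is bounded as if exactly one event had been seen. A minimal sketch follows, with a hypothetical function name.</p>

```python
def max_rate_when_no_events(n_loci):
    """Upper bound on a per-locus rate when zero events were observed.

    Follows the convention in the text: one divided by the number of
    loci examined for a given motif and repeat number.
    """
    if not n_loci > 0:
        raise ValueError("n_loci must be positive")
    return 1.0 / n_loci


if __name__ == "__main__":
    # (A)5 example from the text: 3,549 loci.
    print(f"{max_rate_when_no_events(3549):.2e}")
```

<p>For the (A)<sub>5</sub> example this gives a bound of about 2.8 × 10<sup>−4</sup>; the bound loosens as the number of loci shrinks, which is why it is less informative for longer, rarer repeats.</p>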
<p>For this data set, the barcoded RNA was also reverse transcribed (and sequenced) independently three times, allowing one to infer RT error rates. We found that all RT errors occurred at the (A)
<sub>
<italic>n</italic>
</sub>
-containing motif and that the RT error rates increased with increasing repeat number (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). The 12 erroneous reads stemmed from RNA families with only two reads, and so we could not immediately determine whether these were expansion or contraction errors. However, based on the consensus repeat numbers of all reads mapped to these loci, we concluded that the RT errors had a preference toward expansions (eight expansions vs. four contractions;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S10</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online).</p>
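<p>When a family has only two discordant reads, the direction of the error cannot be read from the family alone, so each read is compared against the consensus repeat number of all reads at the locus. The sketch below shows that comparison; the function name and the input values are hypothetical, not taken from the data.</p>

```python
from collections import Counter


def classify_rt_errors(families):
    """Count expansions and contractions relative to a locus consensus.

    `families` is an iterable of (consensus_repeat_number, reads), where
    `reads` lists the repeat numbers observed in one RNA family.
    Reads equal to the consensus are treated as error-free.
    """
    counts = Counter()
    for consensus, reads in families:
        for repeat_number in reads:
            if repeat_number > consensus:
                counts["expansion"] += 1
            elif consensus > repeat_number:
                counts["contraction"] += 1
    return counts


if __name__ == "__main__":
    # Three made-up two-read families at loci with consensus 8, 7, and 9.
    toy_families = [(8, [8, 9]), (7, [6, 7]), (9, [9, 10])]
    print(classify_rt_errors(toy_families))
```

<p>Applied to the 12 erroneous reads described above, a tally of this kind yields the reported eight expansions and four contractions.</p>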
<p>Notably, the point estimates of RT error rates were remarkably concordant between the
<italic>C. elegans</italic>
data set (where they were inferred with the RNA barcoded approach) and orangutan data set (where they were inferred with the MLE approach;
<xref ref-type="fig" rid="msw139-F3">fig. 3</xref>
), even though the
<italic>C. elegans</italic>
data were more limited in scale and thus the estimates from them had wide confidence intervals. This concordance is particularly notable given that the rates were inferred by two different methods and from two independent data sets generated in two different laboratories (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
). Indeed, the RT error rates estimated from the orangutan data using MLE and from the
<italic>C. elegans</italic>
data using barcoded RNA increased with increasing repeat numbers and their confidence intervals overlapped (
<xref ref-type="fig" rid="msw139-F3">fig. 3</xref>
). The maximal RDD rate for
<italic>C. elegans</italic>
(
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S9</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) appears to be higher than but overall is comparable to the RDD estimates for orangutan (
<xref ref-type="table" rid="msw139-T1">table 1</xref>
and
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Because we can only estimate the maximal RDD rate from the
<italic>C. elegans</italic>
data (as one over the number of studied loci), we cannot rigorously compare it with the RDD rate obtained from the orangutan data.
<fig id="msw139-F3" orientation="portrait" position="float">
<label>Fig. 3</label>
<caption>
<p>A comparison of RT error rates estimated using the full MLE (orangutan data) versus barcoded RNA sequencing (
<italic>Caenorhabditis elegans</italic>
data). The 95% confidence intervals for the rates estimated with the full MLE were generated from 100 empirical bootstrap replicates (bootstrapped across loci), whereas the 95% confidence intervals for the barcoded RNA sequencing were generated from 1,000 bootstrap replicates of inferred cDNA molecules with at least two cDNA molecules in that family. The lower bounds of the RT error rate confidence intervals for the barcoded RNA sequencing are zero and thus are outside the plotting area.</p>
</caption>
<graphic xlink:href="msw139f3p"></graphic>
</fig>
</p>
<p>To additionally test the performance of our MLE, we applied the lumping MLE to the
<italic>C. elegans</italic>
data (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) after removing the barcodes, and estimated the RT error rates for this data set (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S12</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). The resulting RT error rate estimates for the
<italic>C. elegans</italic>
data set were strikingly similar between the lumping MLE and barcoded RNA methods. This validates the use of the MLE method for reliable RT error rate estimation.</p>
</sec>
</sec>
<sec sec-type="discussion">
<title>Discussion</title>
<sec>
<title>Application of Estimated RT Error Rates</title>
<p>Knowing RT error rates can be instrumental in functional genomics analysis where RNA-sequencing data are one of the most important sources of information. RNA-sequencing data have been used to study differences in gene expression among tissues (
<xref rid="msw139-B44" ref-type="bibr">GTEx Consortium et al. 2015</xref>
), between samples of healthy and diseased individuals, and among organisms inhabiting various environments (
<xref rid="msw139-B98" ref-type="bibr">Wang et al. 2009</xref>
;
<xref rid="msw139-B100" ref-type="bibr">Wilhelm and Landry 2009</xref>
;
<xref rid="msw139-B71" ref-type="bibr">Oshlack et al. 2010</xref>
;
<xref rid="msw139-B105" ref-type="bibr">Garber et al. 2011</xref>
;
<xref rid="msw139-B73" ref-type="bibr">Ozsolak and Milos 2011</xref>
;
<xref rid="msw139-B67" ref-type="bibr">McCarthy et al. 2012</xref>
). Such data can also be utilized to study biological pathways (
<xref rid="msw139-B33" ref-type="bibr">Feng et al. 2012</xref>
;
<xref rid="msw139-B54" ref-type="bibr">Khatri et al. 2012</xref>
;
<xref rid="msw139-B94" ref-type="bibr">Trapnell et al. 2013</xref>
), metabolic flux (
<xref rid="msw139-B42" ref-type="bibr">Gowen and Fong 2010</xref>
;
<xref rid="msw139-B59" ref-type="bibr">Lee et al. 2012</xref>
), and individual health (
<xref rid="msw139-B19" ref-type="bibr">Chen et al. 2012</xref>
). Without estimating RT error rates, it is challenging to accurately quantify expression levels for STR-containing genes. For example, if the RT error rate is high, then a large number of STRs in cDNA will vary in length, thereby reducing their mappability to the reference genome, which can lead to an underestimation of expression levels of genes containing STRs. Removing STR-containing regions can alleviate this problem, but will lead to an underestimation of true variation at the level of RNA.</p>
<p>Applications of RT error rates are not limited to functional genomics. STRs have been widely used as markers in population genomics due to their high polymorphism level (
<xref rid="msw139-B300" ref-type="bibr">Wright and Bentzen 1994</xref>
;
<xref rid="msw139-B47" ref-type="bibr">Gupta and Varshney 2000</xref>
;
<xref rid="msw139-B109" ref-type="bibr">Sunnucks 2000</xref>
;
<xref rid="msw139-B68" ref-type="bibr">Miah et al. 2013</xref>
;
<xref rid="msw139-B1" ref-type="bibr">Abdul-Muneer 2014</xref>
). According to a recent study (
<xref rid="msw139-B37" ref-type="bibr">Gayral et al. 2013</xref>
), RNA-sequencing can be applied to study population genomics of nonmodel organisms without a reference genome (
<xref rid="msw139-B24" ref-type="bibr">De Wit et al. 2012</xref>
;
<xref rid="msw139-B37" ref-type="bibr">Gayral et al. 2013</xref>
). (For model organisms with reference genomes, exome sequencing data are usually used as an alternative [
<xref rid="msw139-B23" ref-type="bibr">DaRe et al. 2013</xref>
;
<xref rid="msw139-B46" ref-type="bibr">Guo et al. 2013</xref>
;
<xref rid="msw139-B88" ref-type="bibr">Samuels et al. 2013</xref>
;
<xref rid="msw139-B43" ref-type="bibr">Griffin et al. 2014</xref>
]). It is crucial, however, to take RT errors into account in order to distinguish genetic variation from technical errors.</p>
</sec>
<sec>
<title>Relative Rates and Patterns of RDD, RT Errors, and Mutations at STRs</title>
<p>The RDD rates obtained here provide the first opportunity to understand the propensity of STRs to increase in repeat number not only at the level of DNA but also at the level of RNA. For (A)
<sub>
<italic>n</italic>
</sub>
-containing STRs, we found that the RDD and RT error rates increase exponentially with repeat number—a pattern similar to that previously identified for germ-line mutations (
<xref rid="msw139-B93" ref-type="bibr">Sun et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
) and sequencing errors (
<xref rid="msw139-B86" ref-type="bibr">Ross et al. 2013</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). This similarity can be explained by the increased propensity of polymerase slippage with an increase in the STR repeat number. Moreover, we inferred the RT error rates to be higher than the RDD rates, which were higher than the sequencing error rates (with the minimal Phred sequencing quality of 20), which, in turn, were higher than the germ-line mutation rates for STRs (
<xref ref-type="fig" rid="msw139-F4">fig. 4</xref>
) (
<xref rid="msw139-B58" ref-type="bibr">Kong et al. 2012</xref>
;
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). For mononucleotides with repeat numbers of 6 and 7, these differences were approximately 1 order of magnitude in size. For SNPs, the RT error rate (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) is also higher than the RDD rate (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
; Traverse and Ochman 2016). Critically, the level of technical errors is higher than the level of biological errors. Therefore, accurate inferences of germ-line mutations must consider sequencing errors, and accurate estimations of RDD rates must consider RT errors.
<fig id="msw139-F4" orientation="portrait" position="float">
<label>Fig. 4</label>
<caption>
<p>A comparison among STR RT error rates (this study), STR RDD rates (this study), STR germ-line mutation rates (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
), STR sequencing error rates (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
), base-substitution germ-line mutation rates (
<xref rid="msw139-B58" ref-type="bibr">Kong et al. 2012</xref>
), and base-substitution RDD rates (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
[lower line], Traverse and Ochman 2016 [upper line]).</p>
</caption>
<graphic xlink:href="msw139f4p"></graphic>
</fig>
</p>
<p>Regarding technical errors, the most commonly used reverse transcriptase in molecular biology applications is the Moloney murine leukemia virus RT (MMLV-RT). The MMLV-RT enzyme has an in vitro error rate of 1/29,000 nucleotides synthesized using an RNA template, and 1/37,000 nucleotides using a DNA template, as determined using a genetic reporter assay (
<xref rid="msw139-B51" ref-type="bibr">Ji and Loeb 1992</xref>
). Although the majority of MMLV-RT errors are base substitutions, a mutational hotspot of one-base indels within an (A)
<sub>4</sub>
sequence has been reported (
<xref rid="msw139-B8" ref-type="bibr">Barrioluengo et al. 2011</xref>
). Many protocols for generating cDNA, including the Illumina TruSeq RNA library preparation, use a modified version of MMLV-RT known as Superscript II reverse transcriptase. The Superscript II RT has improved thermostability but reduced fidelity, with an error rate of 1/15,000 nucleotides synthesized using a DNA template (
<xref rid="msw139-B4" ref-type="bibr">Azeri and Hogrefe 2007</xref>
). These reported MMLV-RT error rates are of similar magnitude to error rates measured for the Taq polymerase (∼1/10,000–1/50,000) for proofreading-proficient thermostable polymerases, measured using the same in vitro assay (
<xref rid="msw139-B28" ref-type="bibr">Eckert and Kunkel 1990</xref>
,
<xref rid="msw139-B102" ref-type="bibr">1991</xref>
). Therefore, cDNA synthesis and sequencing error rates are not expected to vary substantially, unless a thermostable DNA polymerase with a highly efficient proofreading activity is used. Importantly, the extent to which the accuracy of MMLV-RT, Taq or proofreading-proficient thermostable polymerases will vary when copying longer STRs remains to be determined.</p>
</sec>
<sec>
<title>The Reliability of the Estimates</title>
<p>The RT error rates we obtained were congruent between two independent methods—MLE and barcoded RNA sequencing—and between two independent data sets—the one obtained from an orangutan sample and the one obtained from the
<italic>C. elegans</italic>
sample—produced in two separate laboratories. This concordance suggests that our estimates are reliable.</p>
<p>We followed several procedures to control for technical errors that could distort RT error and RDD rates. We used the same tissue sample for both DNA and RNA sequencing to avoid confounding by somatic variation among tissues. We genotyped the sample with two separate library preparation techniques and tested for genotype concordance to verify the genotype calls. We used a flank-based mapping approach during read mapping to avoid bias in STR-length profiling (
<xref rid="msw139-B48" ref-type="bibr">Gymrek et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). This flank-based mapping approach also removed STRs adjacent to the read termini, which were shown to exhibit high sequencing error rates (
<xref rid="msw139-B104" ref-type="bibr">Kleinman and Majewski 2012</xref>
;
<xref rid="msw139-B78" ref-type="bibr">Pickrell et al. 2012</xref>
). Finally, we included scaffolds not mapped to particular chromosomes, and removed potentially duplicated regions that were missing from the reference genome. These procedures have been demonstrated to reduce false genetic variation observed in RNA sequencing data (
<xref rid="msw139-B76" ref-type="bibr">Peng et al. 2012</xref>
), as paralogous variants could be mistaken for STR variation when an incomplete reference genome is used (
<xref rid="msw139-B49" ref-type="bibr">Ho et al. 2011</xref>
;
<xref rid="msw139-B10" ref-type="bibr">Bass et al. 2012</xref>
;
<xref rid="msw139-B76" ref-type="bibr">Peng et al. 2012</xref>
).</p>
<p>Several factors may still affect our RT error and RDD rate estimates. First, the RNA expression levels at the same locus vary even among cells from the same tissue. Such variation is stochastic (
<xref rid="msw139-B31" ref-type="bibr">Elowitz et al. 2002</xref>
;
<xref rid="msw139-B72" ref-type="bibr">Ozbudak et al. 2002</xref>
;
<xref rid="msw139-B83" ref-type="bibr">Raser and O’Shea 2005</xref>
;
<xref rid="msw139-B52" ref-type="bibr">Kaufmann and van Oudenaarden 2007</xref>
), and our measurement represents the mean expression level among cells in a tissue. The variation in expression level among cells in tissues could lead to improper matching between the bin size of the MLE and the expression level, which might bias our error rate estimates. One solution to alleviate this limitation in the future would be to implement single-cell RNA sequencing or G&amp;T-seq (simultaneous DNA- and RNA-sequencing at a single-cell level) (
<xref rid="msw139-B113" ref-type="bibr">Saliba et al. 2014</xref>
;
<xref rid="msw139-B64" ref-type="bibr">Macaulay et al. 2015</xref>
), which would enable us to estimate the number of RNA molecules expressed from a specific locus. Note that to use single-cell RNA sequencing data, the sampling from RNA to cDNA and from cDNA to sequencing reads would need to be modeled as multivariate hypergeometric sampling. This is because only a small number of RNA and cDNA molecules would be present, so the sampling process could no longer be approximated by sampling with replacement, as in multinomial sampling. Despite the uncertainty in expression level, our estimates of RT error rates agree well with those obtained using barcoded RNA sequencing, which does not rely on expression level information. This comparison provides an independent validation of our approach and supports the credibility of the estimates obtained.</p>
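<p>The contrast between the two sampling schemes can be illustrated with a minimal sketch. The pool of ten RNA molecules and the draw of three molecules below are hypothetical values chosen only for illustration, and the function names are not part of our software.</p>
<preformat>
```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of `counts` when molecules are sampled with replacement."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))

def mv_hypergeom_pmf(counts, pool):
    """Probability of `counts` when molecules are sampled without replacement from `pool`."""
    ways = prod(comb(k, c) for k, c in zip(pool, counts))
    return ways / comb(sum(pool), sum(counts))

# Hypothetical pool: numbers of RNA molecules with D - 1, D, and D + 1 repeats.
pool = (1, 8, 1)
probs = tuple(k / sum(pool) for k in pool)
draw = (0, 3, 0)  # three sampled molecules, all with the DNA-encoded length

p_with_replacement = multinomial_pmf(draw, probs)
p_without_replacement = mv_hypergeom_pmf(draw, pool)
print(p_with_replacement, p_without_replacement)
```
</preformat>
<p>With so few molecules in the pool, the two probabilities differ noticeably, which is why the multinomial approximation would not be appropriate for single-cell data.</p>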
<p>Second, the MLE may have reported a suboptimal solution if local maxima exist. However, we do not believe our estimates were distorted by this potential limitation because 1) we considered five sets of initial parameters for each bootstrap replicate of the loci, and the initial parameters for the RDD and RT error rates were randomized on a log scale to cover the large search space of possible error rates; 2) in most cases, all five sets of initial parameters converged to the same solution, which suggests that our likelihood surface may not contain many local maxima; and 3) although error rates were analyzed independently for each STR length, the estimated error rates increased exponentially with STR length, as expected based on the known STR sequencing error and mutation patterns (
<xref rid="msw139-B93" ref-type="bibr">Sun et al. 2012</xref>
;
<xref rid="msw139-B86" ref-type="bibr">Ross et al. 2013</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). Also, the estimation from loci with different numbers of RNA-seq reads and sequencing batches yielded similar results (
<xref ref-type="fig" rid="msw139-F2">fig. 2</xref>
A–F).</p>
</sec>
<sec sec-type="conclusions">
<title>Conclusions and Future Directions</title>
<p>In this study, we provide the first model-based method to estimate the rates of RT errors and RDDs at STRs from RNA sequencing of replicated cDNA libraries. This method can be applied to existing RNA data sets with replicated cDNA libraries and a known genotype (or existing DNA-sequencing data). The merit of our approach is that it does not require significant changes to established, general RNA-sequencing procedures. Also, our approach allows one to use a large number of STR loci throughout the genome, thus reducing the contextual bias due to sequence composition around STRs. Therefore, the MLE provides a suitable alternative to the barcoded RNA sequencing approach for estimating batched RT errors (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
). The currently available barcoded RNA-sequencing data are insufficient in scale to detect RDD events given the requirement that the entire STR and sufficient flanking regions need to be embedded in the reads.</p>
<p>Future studies should evaluate RDD rates at STRs with more precision. Unlike RT error rates, which depend on the enzyme used for RT during library preparation, RDD rates might differ among species, as they depend on species-specific biology. Moreover, both RT error and RDD rates should be evaluated for STRs other than (A)
<sub>
<italic>n</italic>
</sub>
from a larger data set. The MLE method we developed can be used for this purpose.</p>
<p>A minority of repeats in our data set are heterozygous (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Therefore, including them would not have substantially changed our estimates, whereas analyzing them is computationally challenging for our model. The increased genetic polymorphism of STRs at heterozygous loci has been controversial in terms of both observations and mechanistic explanations (
<xref rid="msw139-B2" ref-type="bibr">Amos 2016</xref>
). Nevertheless, it will be interesting to analyze RDDs at heterozygous STR loci in future studies.</p>
<p>Another important area for future studies is the impact of RDDs on disease-causing STRs. As RDDs can modify transcripts and protein products, they could alter the phenotype and disease manifestation. Interestingly, the classification of repeat numbers for disease-causing STRs into normal, pre-mutation, and disease-causing categories relies on the correlation between genotype and phenotype. It is possible that, although a genotype has a non-disease repeat number, an RDD can create a disease-causing repeat in the RNA originating from the same locus. Future analyses of RDD rates at disease-causing STRs are needed to establish the validity of such a mechanism.</p>
</sec>
</sec>
<sec sec-type="materials|methods">
<title>Materials and Methods</title>
<sec>
<title>Samples, DNA Sequencing, and Genotyping</title>
<p>Using the DNeasy Blood and Tissue Kit (Qiagen), we extracted genomic DNA from testis of a Bornean orangutan (
<italic>Pongo pygmaeus pygmaeus</italic>
; ID 1991-0051, Smithsonian Institution). Polymerase chain reaction (PCR)-containing and PCR-free libraries with insert size of 250–280 bp were constructed with the TruSeq DNA LT Sample Preparation Kit (Illumina) and the TruSeq DNA PCR-Free LT Sample Preparation Kit (Illumina), respectively, following the manufacturer’s protocol. The libraries were sequenced with 150 bp × 150 bp paired-end reads on the HiSeq 2500 (Illumina).</p>
<p>The STR length arrays were profiled with STR-FM (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). Briefly, STRs with at least five repeats for mononucleotides and at least three repeats for di-, tri-, and tetranucleotides were detected in sequencing reads. We retained reads that had flanking regions of at least 20 bp on each side of an STR and a Phred quality score of at least 20 throughout the STR and its flanking regions. Flanking regions of STRs were mapped to the Sumatran orangutan (ponAbe2) reference genome with the Burrows-Wheeler Aligner (BWA) (
<xref rid="msw139-B107" ref-type="bibr">Li and Durbin 2009</xref>
). We retained STRs for which both 20-bp flanking sequences mapped uniquely to the reference genome sequence. Random genomic scaffolds, that is, the ones that have not been assigned to specific chromosomes, were also used in mapping to avoid false unique mapping of some reads. STRs located closer than 10 bp to other STRs of the same class (e.g., mononucleotides) were discarded to minimize the effect of nearby STR loci on error estimation. Sequencing reads from PCR-containing and PCR-free libraries were processed separately.</p>
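<p>The detection and flank criteria described above can be sketched as follows. This is a simplified illustration rather than the STR-FM implementation: it considers only perfect repeats, ignores base qualities and mapping, and all names in it are hypothetical.</p>
<preformat>
```python
import re

# Minimum repeat numbers quoted in the text, keyed by motif length in bp.
MIN_REPEATS = {1: 5, 2: 3, 3: 3, 4: 3}
MIN_FLANK = 20  # required flank length (bp) on each side of the STR

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    return motif not in (motif + motif)[1:-1]

def find_strs(read):
    """Return (start, end, motif, repeat_number) for perfect STRs with adequate flanks."""
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(read):
            motif = m.group(1)
            left_flank = m.start()
            right_flank = len(read) - m.end()
            if is_primitive(motif) and min(left_flank, right_flank) >= MIN_FLANK:
                repeat_number = (m.end() - m.start()) // motif_len
                hits.append((m.start(), m.end(), motif, repeat_number))
    return hits

# Hypothetical read: two 20-bp nonrepetitive flanks around an (AG)6 repeat.
left = "ACGTTGCAATCGGATCCTAC"
right = "TCCGATTGCAACGTAGGCTA"
read = left + "AG" * 6 + right
print(find_strs(read))
```
</preformat>
<p>The primitive-motif check prevents an (AG) repeat from also being reported as an (AGAG) tetranucleotide repeat.</p>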
<p>For each library, the identified STR loci were genotyped using STR-FM and utilizing previously estimated sequencing error rates (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). In this step, we retained loci for which the probabilities of the most likely homozygous and the most likely heterozygous genotypes differed by at least one order of magnitude, because such loci are highly likely to be genotyped correctly (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). The STR genotypes from both PCR-containing and PCR-free libraries were then compared. Discordant genotypes were removed, and the remaining genotyped loci from the two libraries were then combined to represent the STR length of DNA at each locus (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). We limited the subsequent analysis to homozygous loci because 1) they represent the majority of our data (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary table S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) and 2) heterozygous loci can display biased expression between the two alleles (
<xref rid="msw139-B15" ref-type="bibr">Borel et al. 2015</xref>
;
<xref rid="msw139-B60" ref-type="bibr">Leung et al. 2015</xref>
;
<xref rid="msw139-B77" ref-type="bibr">Perez et al. 2015</xref>
). Additionally, the use of one allele per locus simplified the model used in our MLE by reducing the number of expected STR RNA forms and their derived error forms (see “MLE Formulation” in Materials and Methods). Finally, the homologous regions between human (assembly version hg19) and orangutan (assembly version ponAbe2) genomes that had a high score in human self-alignment assembly version GRCh38 (
<ext-link ext-link-type="uri" xlink:href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsSelf/hg38.hg38.net.gz">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsSelf/hg38.hg38.net.gz</ext-link>
, last accessed July 16, 2016) were removed. Conversion between hg19 and GRCh38 was performed with the lift-over tool in Galaxy (
<xref rid="msw139-B38" ref-type="bibr">Giardine et al. 2005</xref>
;
<xref rid="msw139-B13" ref-type="bibr">Blankenberg et al. 2010</xref>
,
<xref rid="msw139-B12" ref-type="bibr">2014</xref>
;
<xref rid="msw139-B39" ref-type="bibr">Goecks et al. 2010</xref>
). This removal was performed to exclude the regions in the orangutan genome that might be paralogous and might have been collapsed in the reference assembly.</p>
</sec>
<sec>
<title>Replicated cDNA Construction, Sequencing, and Profiling</title>
<p>Total RNA was extracted from the same Bornean orangutan testis sample that was used for genomic DNA sequencing, with the RNeasy Mini kit protocol (Qiagen). The extracted RNA was divided into two aliquots that were utilized to generate two separate sequencing libraries (libraries 1 and 2) using the TruSeq RNA sample preparation kit (Illumina) with the stranded protocol (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
). Each of the two resulting libraries was sequenced twice in two separate batches (A and B) with 150 bp × 150 bp paired-end reads on the HiSeq 2500 (Illumina).</p>
<p>To profile STRs in RNA, we followed the same procedure as that employed for DNA, except that we did not run the genotyping model. BWA (Li and Durbin 2009) was also employed for mapping the RNA-seq reads in our analysis to 1) minimize differences between the current procedure and the procedure used to estimate RNA sequencing errors in previous studies that also used BWA (e.g.,
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
), 2) guard against biases that may result from applying a different algorithm for mapping RNA than for DNA, and 3) be conservative, as our preliminary results demonstrated that most STR loci that can be uniquely mapped with BWA can also be mapped with Tophat (
<xref rid="msw139-B95" ref-type="bibr">Trapnell et al. 2009</xref>
;
<xref rid="msw139-B55" ref-type="bibr">Kim et al. 2013</xref>
) and STAR (
<xref rid="msw139-B26" ref-type="bibr">Dobin et al. 2013</xref>
), whereas the opposite was not true (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). Each RNA-seq library and each sequencing batch was analyzed separately (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
). As a result, we conducted identical analyses on four different library–batch combinations.</p>
</sec>
<sec>
<title>MLE Formulation</title>
<p>We formulated the MLE to infer four parameters—RDD rate, RDD expansion probability, RT error rate, and RT expansion probability—that maximize the probability of observed data (maximum-likelihood estimation of the parameters) in STRs obtained from RNA-seq. The model includes an expansion probability for RDD,
<italic>p</italic>
<sub>RDD</sub>
, and the contraction probability can be computed as 1 –
<italic>p</italic>
<sub>RDD</sub>
. The same is true for an expansion probability for RT errors,
<italic>p</italic>
<sub>RT</sub>
. The model used in the MLE is based on the following key assumptions:
<list list-type="order">
<list-item>
<p>All loci are independent.</p>
</list-item>
<list-item>
<p>The error rates and the expansion probabilities for both RT errors and RDDs are identical for all DNA loci with the same STR motif and repeat number.</p>
</list-item>
<list-item>
<p>Both RT errors and RDDs follow the stepwise mutation model (
<xref rid="msw139-B56" ref-type="bibr">Kimura and Ohta 1978</xref>
;
<xref rid="msw139-B112" ref-type="bibr">Valdes et al. 1993</xref>
;
<xref rid="msw139-B85" ref-type="bibr">Di Rienzo et al. 1998</xref>
;
<xref rid="msw139-B87" ref-type="bibr">Sainudiin et al. 2004</xref>
) that only allows expansion or contraction by one repeat unit after a single round of each process (i.e., transcription or RT). Thus, starting from DNA (e.g., (AG)
<sub>6</sub>
), there are three possible STR forms for RNA (e.g., (AG)
<sub>5</sub>
, (AG)
<sub>6</sub>
, and (AG)
<sub>7</sub>
), and five possible STR forms for cDNA (e.g., (AG)
<sub>4</sub>
, (AG)
<sub>5</sub>
, (AG)
<sub>6</sub>
, (AG)
<sub>7</sub>
, and (AG)
<sub>8</sub>
). We denote the number of possible STR forms at a given stage (RNA or cDNA) as
<italic>K</italic>
.</p>
</list-item>
<list-item>
<p>The model uses a fixed bin size (denoted by
<italic>M</italic>
), which represents the number of sampled RNA or cDNA molecules after transcription or RT, respectively. This finite bin size
<italic>M</italic>
permits alterations in the expected distribution of STR forms in a given stage (RNA or cDNA) by conditioning on the number of STRs of a given form passed on from the previous stage. For example, suppose that at the RNA stage the relative proportions of STR forms for four, five, and six repeats are 0.1, 0.5, and 0.4, respectively. Based on the previous point, five possible cDNA forms are expected—those with three, four, five, six, and seven repeats. If we sample only a small number
<italic>M</italic>
of STRs to be passed from the RNA to the cDNA stage, then it is likely that the STR form with three repeats will not be represented in the cDNA stage. However, if
<italic>M</italic>
is sufficiently large, then the probability of observing all possible forms at the cDNA stage is high. This sampling permits different cDNA libraries for the same RNA sample to be correlated, as the STR forms observed in these libraries are conditional on the RNA STR forms that they share, and allows us to take advantage of the structure of our experimental design (
<xref ref-type="fig" rid="msw139-F1">fig. 1</xref>
). We use this bin size to generate all possible compositions of STR length for RNA and cDNA, depending on expression level proxied by the number of RNA-seq reads (see below). For example, for DNA with an STR of (AG)
<sub>6</sub>
and the bin size of 2, there are six possible compositions of RNA forms (i.e., (AG)
<sub>5</sub>
(AG)
<sub>5</sub>
, (AG)
<sub>5</sub>
(AG)
<sub>6</sub>
, (AG)
<sub>5</sub>
(AG)
<sub>7</sub>
, (AG)
<sub>6</sub>
(AG)
<sub>6</sub>
, (AG)
<sub>6</sub>
(AG)
<sub>7</sub>
, and (AG)
<sub>7</sub>
(AG)
<sub>7</sub>
) and 15 possible compositions of cDNA forms (i.e., (AG)
<sub>4</sub>
(AG)
<sub>4</sub>
, (AG)
<sub>4</sub>
(AG)
<sub>5</sub>
, … , and (AG)
<sub>8</sub>
(AG)
<sub>8</sub>
). Considering all possible compositions allows us to calculate the probability of changes from DNA to RNA, and from RNA to cDNA, which permits the derivation of the distribution of STR forms at the RNA and the cDNA stages. With this formulation, the likelihood function at locus
<italic>j</italic>
can be represented as
<disp-formula id="E1">
<mml:math id="EQ1">
<mml:mrow>
<mml:mi mathvariant="script">L</mml:mi>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mtext>θ; data </mml:mtext>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mtext>data </mml:mtext>
<mml:mi>j</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mi>r</mml:mi>
</mml:munder>
<mml:mrow>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mo stretchy="true">(</mml:mo>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn>
<mml:mo>|</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>θ</mml:mi>
</mml:mrow>
<mml:mo stretchy="true">)</mml:mo>
</mml:mrow>
<mml:mi>P</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>r</mml:mi>
<mml:mo>|</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
</list-item>
</list>
where
<inline-formula id="IE1">
<mml:math id="IEQ1">
<mml:mi>θ</mml:mi>
<mml:mi></mml:mi>
</mml:math>
</inline-formula>
is a vector of model parameters;
<italic>c</italic>
1 and
<italic>c</italic>
2 are vectors of the numbers of STRs at each STR form in cDNA libraries 1 and 2, respectively; and
<italic>r</italic>
is a vector of the numbers of STRs at the RNA stage. The log likelihood of the data at all
<italic>L</italic>
loci is then
<disp-formula id="E2">
<mml:math id="EQ2">
<mml:mrow>
<mml:mi>ℓ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>θ; data 1, 2, </mml:mtext>
<mml:mo>…</mml:mo>
<mml:mtext>, </mml:mtext>
<mml:mi>L</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mi>L</mml:mi>
</mml:munderover>
<mml:mrow>
<mml:mi>log</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mo stretchy="false">[</mml:mo>
<mml:mi mathvariant="script">L</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mtext>θ; data </mml:mtext>
<mml:mi>j</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mstyle>
</mml:mrow>
</mml:math>
</disp-formula>
Note that for a bin size of
<italic>M</italic>
at a stage with
<italic>K</italic>
possible STR forms, the number of compositions is
<italic>M </italic>
+
<italic> K</italic>
− 1 choose
<italic>K</italic>
− 1, which grows quickly as the bin size
<italic>M</italic>
increases. See
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary text S1</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online, for the derivation of our MLE. Note that the bin size incorporates expression level into the model. In practical terms, and assuming RNA sequencing was performed at high depth to capture the vast majority of unique transcripts, expression level is proxied by the number of RNA-seq reads for each locus.</p>
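<p>The composition counts quoted above can be checked with a short sketch that enumerates the compositions for the (AG)<sub>6</sub> example with a bin size of 2. The function names are hypothetical and are used only for illustration.</p>
<preformat>
```python
from itertools import combinations_with_replacement
from math import comb

def n_compositions(bin_size, n_forms):
    """Number of compositions: (M + K - 1) choose (K - 1)."""
    return comb(bin_size + n_forms - 1, n_forms - 1)

def enumerate_compositions(forms, bin_size):
    """All unordered ways to fill `bin_size` molecules with the given STR forms."""
    return list(combinations_with_replacement(forms, bin_size))

rna_forms = [5, 6, 7]          # repeat numbers reachable from (AG)6 after transcription
cdna_forms = [4, 5, 6, 7, 8]   # repeat numbers reachable after RT

rna_compositions = enumerate_compositions(rna_forms, 2)
cdna_compositions = enumerate_compositions(cdna_forms, 2)
print(len(rna_compositions), len(cdna_compositions))
```
</preformat>
<p>The enumerated lists have the same sizes as the closed-form count, which makes the rapid growth with <italic>M</italic> easy to explore for other bin sizes.</p>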
<p>We calculated the probability that the observed data are generated from all possible distributions of RNA and cDNA STR lengths for a given number of sampled molecules
<italic>M</italic>
under the stepwise mutation model (assumptions 3 and 4), to ensure that the estimation is not distorted by an incorrect inference of RNA and cDNA STR profiles. In the transition from cDNA to sequencing reads, we incorporated the sequencing error rates estimated by
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. (2015)</xref>
.</p>
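<p>As an illustration of the stepwise mutation model in assumption 3, the sketch below propagates a single molecule from DNA to RNA to cDNA. It is a simplification that ignores the finite bin size and sequencing errors, and it assumes that a rate is the per-molecule probability of a one-unit length change and that the expansion probability splits that change into expansion versus contraction. The parameter values are hypothetical.</p>
<preformat>
```python
def step(dist, rate, p_expand):
    """Apply one stepwise round (transcription or RT) to a length distribution."""
    out = {}
    moves = ((-1, rate * (1 - p_expand)), (0, 1 - rate), (1, rate * p_expand))
    for length, prob in dist.items():
        for delta, weight in moves:
            out[length + delta] = out.get(length + delta, 0.0) + prob * weight
    return out

def form_distributions(dna_len, rdd_rate, p_rdd, rt_rate, p_rt):
    """Per-molecule distributions of repeat number at the RNA and cDNA stages."""
    rna = step({dna_len: 1.0}, rdd_rate, p_rdd)
    cdna = step(rna, rt_rate, p_rt)
    return rna, cdna

# Hypothetical parameter values for a DNA genotype of six repeats.
rna, cdna = form_distributions(6, rdd_rate=0.01, p_rdd=0.7, rt_rate=0.05, p_rt=0.8)
for length in sorted(cdna):
    print(length, cdna[length])
```
</preformat>
<p>Three RNA forms and five cDNA forms arise, as in the (AG)<sub>6</sub> example of assumption 3.</p>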
<p>The MLE was implemented in R (
<xref rid="msw139-B110" ref-type="bibr">R Development Core Team</xref>
). We chose the L-BFGS-B (Limited-memory Broyden–Fletcher–Goldfarb–Shanno with box constraints) method (
<xref rid="msw139-B17" ref-type="bibr">Byrd et al. 1995</xref>
;
<xref rid="msw139-B66" ref-type="bibr">Malouf 2002</xref>
) from the “optim” function for parameter searching. The box constraints (parameter limits) were set from 10
<sup>−9</sup>
to 0.5 for the RT error and RDD rates, and from 0 to 1 for the expansion probabilities. We used the lower bound of 10
<sup>−9</sup>
as it is several orders of magnitude lower than known STR germ-line mutation rates (10
<sup>−5</sup>
–10
<sup>−2</sup>
;
<xref rid="msw139-B93" ref-type="bibr">Sun et al. 2012</xref>
;
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). The upper bound of 0.5 corresponds to half of the reads being erroneous. For the expansion probability, a value of 0 indicates all contractions, 0.5 indicates an equal ratio between expansions and contractions, and 1 indicates all expansions.</p>
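<p>The box constraints, together with initial rates randomized on a log scale, can be sketched as follows. The sketch is in Python for illustration only and does not reproduce our R code. The parameter names are hypothetical, and drawing the initial expansion probabilities uniformly is an assumption.</p>
<preformat>
```python
import math
import random

# Box constraints quoted in the text: (lower, upper) for each of the four parameters.
BOUNDS = {
    "rdd_rate": (1e-9, 0.5),
    "rt_rate": (1e-9, 0.5),
    "p_rdd": (0.0, 1.0),
    "p_rt": (0.0, 1.0),
}

def clip_to_box(params):
    """Project a parameter set back onto the box constraints."""
    return {k: min(max(v, BOUNDS[k][0]), BOUNDS[k][1]) for k, v in params.items()}

def random_start(rng):
    """Draw one initial parameter set inside the box constraints."""
    start = {}
    for name, (lower, upper) in BOUNDS.items():
        if name.endswith("_rate"):
            # Log-uniform draw, so every order of magnitude is equally likely.
            start[name] = 10 ** rng.uniform(math.log10(lower), math.log10(upper))
        else:
            # Assumption: expansion probabilities are drawn uniformly on [0, 1].
            start[name] = rng.uniform(lower, upper)
    return clip_to_box(start)

rng = random.Random(0)
starts = [random_start(rng) for _ in range(5)]
for s in starts:
    print(s)
```
</preformat>
<p>Each of these starting points would then be passed to a box-constrained optimizer such as L-BFGS-B.</p>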
</sec>
<sec>
<title>Lumping MLE</title>
<p>Due to the computationally intensive nature of the algorithm when bin size
<italic>M</italic>
is large, we also considered a reduced form of the model that lumps the two cDNA forms with smallest repeat number into one class and the two forms with the largest repeat number into another class. That is, in our original model there are five cDNA forms with repeat numbers
<italic>D − 2, D − 1, D, D + 1</italic>
, and
<italic>D + 2</italic>
, where
<italic>D</italic>
is the DNA STR length. In this modified approach, we lump forms with length
<italic>D − 2</italic>
and
<italic>D − 1</italic>
into a single form
<italic>(D − 1)lump</italic>
and lump forms
<italic>D + 1</italic>
and
<italic>D + 2</italic>
into a single form
<italic>(D + 1)lump</italic>
. This formulation reduces the complexity of the calculation as we now have
<italic>K = 3</italic>
forms instead of
<italic>K = 5</italic>
forms at the cDNA stage, and this substantially reduces the number of compositions needed to be evaluated from
<italic>M + 5 − 1</italic>
choose 5 − 1 to
<italic>M + 3 − 1</italic>
choose 3 − 1, thereby permitting consideration of larger bin sizes for a fixed amount of computing time. The probability of state
<italic>(D − 1)lump</italic>
is the sum of the probabilities of states
<italic>D − 2</italic>
and
<italic>D − 1</italic>
, and the probability of state
<italic>(D + 1)lump</italic>
is the sum of the probabilities of states
<italic>D + 1</italic>
and
<italic>D + 2</italic>
. As an example, if the genotype is (A)
<sub>8</sub>
, then based on the stepwise mutation model there are three possible forms of RNA ((A)
<sub>7</sub>
, (A)
<sub>8</sub>
, (A)
<sub>9</sub>
) and five possible forms of cDNA ((A)
<sub>6</sub>
, (A)
<sub>7</sub>
, (A)
<sub>8</sub>
, (A)
<sub>9</sub>
, (A)
<sub>10</sub>
). We lump the probabilities of (A)
<sub>6</sub>
and (A)
<sub>7</sub>
and those of (A)
<sub>9</sub>
with (A)
<sub>10</sub>
to reduce the complexity of the calculation. We will refer to this algorithm as “lumping MLE.” Its full description can be found in
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary text S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online. The full and lumping MLE were implemented in R and the resulting software, STR-RNA-MLE, can be downloaded from
<ext-link ext-link-type="uri" xlink:href="https://github.com/Arkarachai/str-rna-mle">https://github.com/Arkarachai/str-rna-mle</ext-link>
.</p>
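<p>The lumping step and the resulting reduction in the number of compositions can be sketched as follows. The probability vector is a hypothetical example for an (A)<sub>8</sub> genotype, and the function names are not those used in STR-RNA-MLE.</p>
<preformat>
```python
from math import comb

def lump_probabilities(p5):
    """Collapse probabilities of forms D-2, D-1, D, D+1, D+2 into three lumped states."""
    d_minus_2, d_minus_1, d, d_plus_1, d_plus_2 = p5
    return (d_minus_2 + d_minus_1, d, d_plus_1 + d_plus_2)

def n_compositions(bin_size, n_forms):
    """Number of compositions: (M + K - 1) choose (K - 1)."""
    return comb(bin_size + n_forms - 1, n_forms - 1)

# Hypothetical cDNA form probabilities for (A)6, (A)7, (A)8, (A)9, (A)10.
p5 = (0.001, 0.02, 0.9, 0.07, 0.009)
p3 = lump_probabilities(p5)

bin_size = 40
full_count = n_compositions(bin_size, 5)
lumped_count = n_compositions(bin_size, 3)
print(p3, full_count, lumped_count)
```
</preformat>
<p>For a bin size of 40, the number of cDNA compositions drops by more than two orders of magnitude under lumping.</p>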
</sec>
<sec sec-type="methods">
<title>MLE Method Evaluation</title>
<p>To test the ability of our method to estimate its four parameters, we performed simulations to generate random STR length profiles based on fixed RT error and RDD rates (0.01 or 0.05) and expansion probabilities (0.3, 0.7, or 0.8), the number of studied loci (10, 100, 1,000, or 10,000), two replicated cDNA libraries, and the number of RNA and cDNA molecules (ranging from 2 to 17 molecules). We then employed our MLE to infer the parameters for each simulation set using bin sizes of 2, 3, and 5. We generated 100 replicate data sets for a given parameter set, and estimated the parameters for each replicate. The 95% confidence interval for each estimated parameter was calculated from the average of the second and third lowest inferred values to obtain the lower bound, and the average of the second and third highest inferred values to obtain the upper bound, from a set of 100 replicates. We also tested the lumping MLE by comparing the estimated parameters from the data with the number of RNA and cDNA molecules set at 6 and 10, RT error and RDD rates set at 0.1, expansion probabilities of RT errors and RDDs of 0.8, 1,000 loci, and two cDNA libraries using both our standard MLE model and lumping MLE model at a bin size equal to 5.</p>
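<p>The confidence interval rule described above can be written as a short sketch. The function name is hypothetical, and the check that exactly 100 replicates are supplied reflects the design used here rather than a general requirement.</p>
<preformat>
```python
def replicate_interval(values):
    """95% interval from 100 replicate estimates, using the averaging rule in the text."""
    if len(values) != 100:
        raise ValueError("the rule is defined here for exactly 100 replicates")
    ordered = sorted(values)
    lower = (ordered[1] + ordered[2]) / 2      # second and third lowest values
    upper = (ordered[-2] + ordered[-3]) / 2    # second and third highest values
    return lower, upper

# Hypothetical replicate estimates: the integers 1 to 100 in arbitrary order.
estimates = list(range(100, 0, -1))
print(replicate_interval(estimates))
```
</preformat>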
</sec>
<sec>
<title>Estimation of RDD and RT Errors Using the MLE from the Orangutan Data</title>
<p>For each batch of RNA sequencing, the RNA profiling data of replicated libraries were paired with the DNA genotypes at the same loci. Each batch of replicated sequencing data was analyzed separately. We selected subsets of data with an appropriate number of RNA-seq reads to analyze specific bin sizes. Initially, we chose a bin size of 2 and analyzed a subset of data with a mean of three to five RNA-seq reads per locus, requiring a minimum of two reads. The loci within this range of RNA-seq reads were chosen because 1) 65% of the STR-containing loci in our data set have low expression levels (fewer than five reads per locus;
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S2</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online) and 2) their estimated rates of RT errors and RDDs are less than 2-fold different from the expected values based on our simulations (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary figs. S3 and S4</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). To ensure that our estimated rates are valid for the loci with higher expression levels, we used the lumping MLE model to analyze 1) a subset of data with 6–16 RNA-seq reads per locus, using a bin size of 5, and 2) a subset of data with 49–102 RNA-seq reads per locus, using a bin size of 40. For each sequencing batch, each STR length, and each binning of the data, we generated 100 bootstrap replicates in which the loci had the same DNA length and STR length profiles as in our RNA sequencing data from the two replicated cDNA libraries. We analyzed each bootstrap replicate with the MLE, starting from five random initial sets of the four parameters of interest to reduce the risk of converging on a local maximum, and chose the parameter estimates with the highest likelihood. We then calculated the 95% confidence interval for each of the four model parameters from the estimates obtained for the 100 bootstrap replicates, taking the average of the second and third lowest inferred values as the lower bound and the average of the second and third highest inferred values as the upper bound.</p>
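<p>The multiple-start strategy described above can be sketched generically as follows. This is an illustrative Python outline under stated assumptions, not the R implementation in STR-RNA-MLE: the names best_of_random_starts and toy_fit are hypothetical, and toy_fit is a stand-in that merely scores its starting point rather than running a real likelihood optimization.</p>
<preformat>
```python
import random


def best_of_random_starts(fit, n_starts=5, n_params=4, seed=0):
    """Run `fit` from several random starting points and keep the best result.

    `fit` is a caller-supplied function that takes a list of initial
    parameter values in (0, 1) and returns a tuple
    (estimated_parameters, log_likelihood).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = [rng.uniform(0.001, 0.999) for _ in range(n_params)]
        params, loglik = fit(start)
        # Keep the run whose final log-likelihood is highest so far.
        if best is None or loglik > best[1]:
            best = (params, loglik)
    return best


def toy_fit(start):
    """Stand-in for an optimizer: returns the start and a toy score."""
    return start, -sum((p - 0.3) ** 2 for p in start)


params, loglik = best_of_random_starts(toy_fit, n_starts=5, seed=1)
print(params, loglik)
```
</preformat>
<p>In a real analysis, the stand-in would be replaced by a routine that maximizes the model likelihood from the given starting values; only the selection of the highest-likelihood run is shown here.</p>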
</sec>
<sec>
<title>Analysis of the Barcoded mRNA Sequencing Data</title>
<p>To verify the RT error and RDD rates estimated with the MLE, we evaluated them using the publicly available barcoded RNA sequencing data (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) to which we applied the same stringent filtering parameters (see “Replicated cDNA Construction, Sequencing, and Profiling” in Materials and Methods). The published RNA sequencing data are derived from three different strains of
<italic>Caenorhabditis elegans</italic>
(
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
)
.
In this data set, the RNA extracted from each strain was barcoded and then reverse transcribed sequentially three times, and each product of RT was sequenced separately. We traced all sequencing reads that belonged to the same original RNA molecule (referred to as a “family”) using the shared barcode and the shared starting genomic mapping coordinate (
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">supplementary fig. S5</ext-link>
,
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary Material</ext-link>
online). We mapped flanking regions of the STR-containing sequencing reads to the
<italic>C. elegans</italic>
genome assembly version Ce10 using BWA (Li and Durbin 2009), and applied a modified STR-FM pipeline (
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. 2015</xref>
). To reduce sequencing errors, at least two STR-containing reads from the same family were used to infer one STR-containing molecule at the cDNA step for each library (note that if two reads mapping to the same locus had STR lengths of 10 and 11, we did not infer cDNA state for this locus in this library). Ideally, these two STR-containing reads should come from overlapping paired-end reads (
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
). However, because we required each STR and 20 bp of its upstream and downstream flanking regions to be located within a read, we found only six pairs of reads that came from overlapping paired-end reads. Therefore, to infer the STR length of a cDNA molecule, instead of using overlapping paired-end reads, we considered all reads from the same family in a library, even though some of them might have been PCR duplicates. Next, to infer STR lengths of RNA molecules and RT errors, we utilized cDNA STRs from the same family present in at least two cDNA libraries. Finally, to infer RDDs, we collected all inferred RNA molecules that mapped to the same STR locus.</p>
<p>The rates of RT errors and RDDs were calculated from the proportion of reads with incongruent STR length per locus. For example, if in a cDNA library STR reads with lengths of 10, 10, and 11 bp belonged to the same family, then we inferred the consensus RNA to have an STR length of 10 bp, two cDNA reads with no RT errors, and one cDNA read with a 1-bp expansion RT error. If we observed only two cDNA reads that differed from each other, then we used the consensus length (the unambiguous majority length of STR reads mapping to that locus regardless of the family or library) to polarize the direction of an error. For example, if two cDNA reads in a certain family had an STR with lengths of 10 and 11 bp at the same locus, and the most common cDNA STR length for all the families at this STR locus was 10 bp, then we inferred that the STR of 11 bp was erroneous. Our error estimation did not include any cDNA families or RNA molecules for which we could infer only one cDNA molecule, or one RNA molecule, at a certain locus, as errors could not be inferred in such cases.</p>
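<p>The two worked examples above can be expressed as a short Python sketch. It is a simplified illustration under stated assumptions, not the pipeline used in the study: the function names unambiguous_majority and classify_family are hypothetical, inputs are plain lists of STR lengths in bp, and the filtering, mapping, and library-level bookkeeping are omitted.</p>
<preformat>
```python
from collections import Counter


def unambiguous_majority(lengths):
    """Return the single most common length, or None if the top count is tied."""
    counts = Counter(lengths).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def classify_family(family_lengths, locus_lengths):
    """Return (consensus, offsets) for one family, or None if undecidable.

    offsets[i] is the observed length minus the consensus: 0 means no
    inferred error, a positive value an expansion, a negative value a
    contraction.
    """
    if 2 > len(family_lengths):
        return None  # a single molecule cannot reveal an error
    consensus = unambiguous_majority(family_lengths)
    if consensus is None:
        # Tie within the family: polarize with the locus-wide majority.
        consensus = unambiguous_majority(locus_lengths)
    if consensus is None:
        return None
    return consensus, [length - consensus for length in family_lengths]


# Lengths 10, 10, 11 in one family: consensus 10, one 1-bp expansion.
print(classify_family([10, 10, 11], [10, 10, 11]))
# Lengths 10 and 11 with a locus-wide majority of 10: 11 is the error.
print(classify_family([10, 11], [10, 10, 10, 11]))
```
</preformat>
<p>Returning None for ties that the locus-wide majority cannot resolve mirrors the exclusion of cases in which an error cannot be inferred.</p>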
<p>As an alternative method of comparison, the RNA sequencing STR length profile of
<italic>C. elegans</italic>
(
<xref rid="msw139-B41" ref-type="bibr">Gout et al. 2013</xref>
) was also used to estimate the RDD and RT error rates with the MLE, without utilizing the barcode information. For each of the three
<italic>C. elegans</italic>
strains, the two (out of three) replicated cDNA libraries with the highest sequencing depths were chosen, and the data were processed exactly as for the orangutan data above (with
<italic>M</italic>
= 2). To infer cDNA molecules, we employed the sequencing error rates from
<xref rid="msw139-B34" ref-type="bibr">Fungtammasan et al. (2015)</xref>
instead of using the information from barcoded RNA sequencing reads.</p>
</sec>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>
<ext-link ext-link-type="uri" xlink:href="http://mbe.oxfordjournals.org/lookup/suppl/doi:10.1093/molbev/msw139/-/DC1">Supplementary texts S1–S3, figures S1–S12, and tables S1–S10</ext-link>
are available at
<italic>Molecular Biology and Evolution</italic>
online (
<ext-link ext-link-type="uri" xlink:href="http://www.mbe.oxfordjournals.org/">http://www.mbe.oxfordjournals.org/</ext-link>
).</p>
<supplementary-material id="PMC_1" content-type="local-data">
<caption>
<title>Supplementary Data</title>
</caption>
<media mimetype="text" mime-subtype="html" xlink:href="supp_33_10_2744__index.html"></media>
<media xlink:role="associated-file" mimetype="application" mime-subtype="x-zip-compressed" xlink:href="supp_msw139_suppl_data.zip"></media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The authors thank Francesca Chiaromonte, Paul Medvedev, Guruprasad Ananda, Rahulsimham Vegesna, Wilfried Guiblet, Suzanne Hile, Akshay Kakumanu, Pimpajee Navakulsirinat, Samarth Rangavittal, Monika Cechova, and Boris Rebolledo-Jaramillo for their suggestions on data analysis and methods; Jean-François Gout for his assistance in utilizing the barcoded mRNA data; Dan Mishmar for his encouragement to conduct this study; the Genomics Core Facility at the Huck Institutes of the Life Sciences for their help with sequencing; and the Smithsonian Institute for providing tissues. This work was supported in part by the NIH grant
<award-id>R01-GM087472</award-id>
to K.A.E. and K.D.M., the NSF grant
<award-id>DBI-0965596</award-id>
to K.D.M., the Penn State Clinical and Translational Sciences Institute, the NSF instrumentation grant
<award-id>OCI-0821527</award-id>,
the USDA-AFRI graduate fellowship to A.F., the Pennsylvania Department of Health using Tobacco CURE Funds Penn State Clinical and Translational Sciences Institute, and startup funds from the
<funding-source>Pennsylvania State University Eberly College of Science</funding-source>
to M.D. The Department of Health specifically disclaims responsibility for any analyses, interpretations, or conclusions. Portions of this research were conducted with Advanced CyberInfrastructure computational resources provided by the Institute for CyberScience at The Pennsylvania State University.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="msw139-B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Abdul-Muneer</surname>
<given-names>PM.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Application of microsatellite markers in conservation genetics and fisheries management: recent advances in population structure analysis and conservation strategies</article-title>
.
<source>Genet Res Int</source>
. 2014:
<fpage>e691759</fpage>
.</mixed-citation>
</ref>
<ref id="msw139-B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amos</surname>
<given-names>W.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Heterozygosity increases microsatellite mutation rate</article-title>
.
<source>Biol Lett</source>
.
<volume>1291</volume>
:
<fpage>20150929</fpage>
.
<pub-id pub-id-type="pmid">26740567</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B101">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ananda</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chiaromonte</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Makova</surname>
<given-names>KD.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>A genome-wide view of mutation rate co-variation using multivariate analyses</article-title>
.
<source>Genome Biol</source>
.
<volume>12</volume>
:
<fpage>R27</fpage>
.
<pub-id pub-id-type="pmid">21426544</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ananda</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Walsh</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Jacob</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Krasilnikova</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Chiaromonte</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Makova</surname>
<given-names>KD.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Distinct mutational behaviors differentiate short tandem repeats from microsatellites in the human genome</article-title>
.
<source>Genome Biol Evol</source>
.
<volume>5</volume>
:
<fpage>606</fpage>
<lpage>620</lpage>
.
<pub-id pub-id-type="pmid">23241442</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Arezi</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hogrefe</surname>
<given-names>HH.</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Escherichia coli DNA polymerase III epsilon subunit increases Moloney murine leukemia virus reverse transcriptase fidelity and accuracy of RT-PCR procedures</article-title>
.
<source>Anal Biochem</source>
.
<volume>360</volume>
:
<fpage>84</fpage>
<lpage>91</lpage>
.
<pub-id pub-id-type="pmid">17107651</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bahn</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>J-H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Greer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>X.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Accurate identification of A-to-I RNA editing in human by transcriptome sequencing</article-title>
.
<source>Genome Res</source>
.
<volume>22</volume>
:
<fpage>142</fpage>
<lpage>150</lpage>
.
<pub-id pub-id-type="pmid">21960545</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baptiste</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Ananda</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Strubczewski</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lutzkanin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Khoo</surname>
<given-names>SJ</given-names>
</name>
<name>
<surname>Srikanth</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Makova</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Krasilnikova</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>KA.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Mature microsatellites: mechanisms underlying dinucleotide microsatellite mutational biases in human cells</article-title>
.
<source>G3 (Bethesda)</source>
.
<volume>3</volume>
:
<fpage>451</fpage>
<lpage>463</lpage>
.
<pub-id pub-id-type="pmid">23450065</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B7">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Baptiste</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Jacob</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>KA.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Genetic evidence that both dNTP-stabilized and strand slippage mechanisms may dictate DNA polymerase errors within mononucleotide microsatellites</article-title>
.
<source>DNA Repair</source>
<volume>29</volume>
:
<fpage>91</fpage>
<lpage>100</lpage>
.
<pub-id pub-id-type="pmid">25758780</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barrioluengo</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Alvarez</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barbieri</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Menéndez-Arias</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Thermostable HIV-1 group O reverse transcriptase variants with the same fidelity as murine leukaemia virus reverse transcriptase</article-title>
.
<source>Biochem J</source>
.
<volume>436</volume>
:
<fpage>599</fpage>
<lpage>607</lpage>
.
<pub-id pub-id-type="pmid">21446917</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bass</surname>
<given-names>BL.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>RNA editing by adenosine deaminases that act on RNA</article-title>
.
<source>Annu Rev Biochem</source>
.
<volume>71</volume>
:
<fpage>817</fpage>
<lpage>846</lpage>
.
<pub-id pub-id-type="pmid">12045112</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bass</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hundley</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Peng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Pickrell</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>XG</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>The difficult calls in RNA editing</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>30</volume>
:
<fpage>1207</fpage>
<lpage>1209</lpage>
.
<pub-id pub-id-type="pmid">23222792</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B11">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blank</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gallant</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Burgess</surname>
<given-names>RR</given-names>
</name>
<name>
<surname>Loeb</surname>
<given-names>LA.</given-names>
</name>
</person-group>
<year>1986</year>
<article-title>An RNA polymerase mutant with reduced accuracy of chain elongation</article-title>
.
<source>Biochemistry</source>
<volume>25</volume>
:
<fpage>5920</fpage>
<lpage>5928</lpage>
.
<pub-id pub-id-type="pmid">3098280</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blankenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kuster</surname>
<given-names>GV</given-names>
</name>
<name>
<surname>Bouvier</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Baker</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Afgan</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stoler</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nekrutenko</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Dissemination of scientific software with Galaxy ToolShed</article-title>
.
<source>Genome Biol</source>
.
<volume>15</volume>
:
<fpage>403</fpage>
.
<pub-id pub-id-type="pmid">25001293</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Blankenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Von Kuster</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Coraor</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Ananda</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lazarus</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mangan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nekrutenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Galaxy: a web-based genome analysis tool for experimentalists</article-title>
.
<source>Curr Protoc Mol Biol</source>
.
Chapter 19:Unit 19.10.1–21.</mixed-citation>
</ref>
<ref id="msw139-B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boby</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Patch</surname>
<given-names>A-M</given-names>
</name>
<name>
<surname>Aves</surname>
<given-names>SJ.</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>TRbase: a database relating tandem repeats to disease genes for the human genome</article-title>
.
<source>Bioinformatics</source>
<volume>21</volume>
:
<fpage>811</fpage>
<lpage>816</lpage>
.
<pub-id pub-id-type="pmid">15479712</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Borel</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ferreira</surname>
<given-names>PG</given-names>
</name>
<name>
<surname>Santoni</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Delaneau</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Fort</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Popadin</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Garieri</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Falconnet</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Ribaux</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Guipponi</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
<article-title>Biased allelic expression in human primary fibroblast single cells</article-title>
.
<source>Am J Hum Genet</source>
.
<volume>96</volume>
:
<fpage>70</fpage>
<lpage>80</lpage>
.
<pub-id pub-id-type="pmid">25557783</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brais</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Bouchard</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>YG</given-names>
</name>
<name>
<surname>Rochefort</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Chrétien</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Tomé</surname>
<given-names>FM</given-names>
</name>
<name>
<surname>Lafrenière</surname>
<given-names>RG</given-names>
</name>
<name>
<surname>Rommens</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Uyama</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Nohira</surname>
<given-names>O</given-names>
</name>
</person-group>
,
<etal></etal>
<year>1998</year>
<article-title>Short GCG expansions in the PABP2 gene cause oculopharyngeal muscular dystrophy</article-title>
.
<source>Nat Genet</source>
.
<volume>18</volume>
:
<fpage>164</fpage>
<lpage>167</lpage>
.
<pub-id pub-id-type="pmid">9462747</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Byrd</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Nocedal</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>1995</year>
<article-title>A limited memory algorithm for bound constrained optimization</article-title>
.
<source>SIAM J Sci Comput</source>
.
<volume>16</volume>
:
<fpage>1190</fpage>
<lpage>1208</lpage>
.</mixed-citation>
</ref>
<ref id="msw139-B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castel</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Cleary</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Pearson</surname>
<given-names>CE.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Repeat instability as the basis for human diseases and as a potential target for therapy</article-title>
.
<source>Nat Rev Mol Cell Biol</source>
.
<volume>11</volume>
:
<fpage>165</fpage>
<lpage>170</lpage>
.
<pub-id pub-id-type="pmid">20177394</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Mias</surname>
<given-names>GI</given-names>
</name>
<name>
<surname>Li-Pook-Than</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Lam</surname>
<given-names>HYK</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Miriami</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Karczewski</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Hariharan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Dewey</surname>
<given-names>FE</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2012</year>
<article-title>Personal omics profiling reveals dynamic molecular and medical phenotypes</article-title>
.
<source>Cell</source>
<volume>148</volume>
:
<fpage>1293</fpage>
<lpage>1307</lpage>
.
<pub-id pub-id-type="pmid">22424236</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conesa</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Madrigal</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tarazona</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gomez-Cabrero</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cervera</surname>
<given-names>A</given-names>
</name>
<name>
<surname>McPherson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Szcześniak</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Gaffney</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Elo</surname>
<given-names>LL</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2016</year>
<article-title>A survey of best practices for RNA-seq data analysis</article-title>
.
<source>Genome Biol</source>
.
<volume>17</volume>
:
<fpage>13</fpage>
.
<pub-id pub-id-type="pmid">26813401</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corneille</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lutz</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Maliga</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Conservation of RNA editing between rice and maize plastids: are most editing events dispensable?</article-title>
<source>Mol Gen Genet</source>
.
<volume>264</volume>
:
<fpage>419</fpage>
<lpage>424</lpage>
.
<pub-id pub-id-type="pmid">11129045</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Danecek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Nellåker</surname>
<given-names>C</given-names>
</name>
<name>
<surname>McIntyre</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Buendia-Buendia</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Bumpstead</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ponting</surname>
<given-names>CP</given-names>
</name>
<name>
<surname>Flint</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Keane</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>DJ.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>High levels of RNA-editing site conservation amongst 15 laboratory mouse strains</article-title>
.
<source>Genome Biol</source>
.
<volume>13</volume>
:
<fpage>26</fpage>
.
<pub-id pub-id-type="pmid">22524474</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DaRe</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Vasta</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Penn</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tran</surname>
<given-names>N-TB</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>SH.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Targeted exome sequencing for mitochondrial disorders reveals high genetic heterogeneity</article-title>
.
<source>BMC Med Genet</source>
.
<volume>14</volume>
:
<fpage>118</fpage>
.
<pub-id pub-id-type="pmid">24215330</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>De Wit</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Pespeni</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Ladner</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Barshis</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Seneca</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Jaris</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Therkildsen</surname>
<given-names>NO</given-names>
</name>
<name>
<surname>Morikawa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Palumbi</surname>
<given-names>SR.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>The simple fool’s guide to population genomics via RNA-Seq: an introduction to high-throughput sequencing data analysis</article-title>
.
<source>Mol Ecol Resour</source>
.
<volume>12</volume>
:
<fpage>1058</fpage>
<lpage>1067</lpage>
.
<pub-id pub-id-type="pmid">22931062</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Denver</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Kewalramani</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>KE</given-names>
</name>
<name>
<surname>Chow</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Estes</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lynch</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>WK.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Abundance, distribution, and mutation rates of homopolymeric nucleotide runs in the genome of Caenorhabditis elegans</article-title>
.
<source>J Mol Evol</source>
.
<volume>58</volume>
:
<fpage>584</fpage>
<lpage>595</lpage>
.
<pub-id pub-id-type="pmid">15170261</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dobin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Schlesinger</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Drenkow</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zaleski</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jha</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Batut</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Chaisson</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gingeras</surname>
<given-names>TR.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>STAR: ultrafast universal RNA-seq aligner</article-title>
.
<source>Bioinformatics</source>
<volume>29</volume>
:
<fpage>15</fpage>
<lpage>21</lpage>
.
<pub-id pub-id-type="pmid">23104886</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Drake</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Charlesworth</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Charlesworth</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Crow</surname>
<given-names>JF.</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Rates of spontaneous mutation</article-title>
.
<source>Genetics</source>
<volume>148</volume>
:
<fpage>1667</fpage>
<lpage>1686</lpage>
.
<pub-id pub-id-type="pmid">9560386</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eckert</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Kunkel</surname>
<given-names>TA.</given-names>
</name>
</person-group>
<year>1990</year>
<article-title>High fidelity DNA synthesis by the Thermus aquaticus DNA polymerase</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>18</volume>
:
<fpage>3739</fpage>
<lpage>3744</lpage>
.
<pub-id pub-id-type="pmid">2374708</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B102">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eckert</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Kunkel</surname>
<given-names>TA.</given-names>
</name>
</person-group>
<year>1991</year>
<article-title>DNA polymerase fidelity and the polymerase chain reaction</article-title>
.
<source>PCR Methods Appl</source>
.
<volume>1</volume>
:
<fpage>17</fpage>
<lpage>24</lpage>
.
<pub-id pub-id-type="pmid">1842916</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ellegren</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Microsatellite mutations in the germline: implications for evolutionary inference</article-title>
.
<source>Trends Genet</source>
.
<volume>16</volume>
:
<fpage>551</fpage>
<lpage>558</lpage>
.
<pub-id pub-id-type="pmid">11102705</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ellegren</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Microsatellites: simple sequences with complex evolution</article-title>
.
<source>Nat Rev Genet</source>
.
<volume>5</volume>
:
<fpage>435</fpage>
<lpage>445</lpage>
.
<pub-id pub-id-type="pmid">15153996</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elowitz</surname>
<given-names>MB</given-names>
</name>
<name>
<surname>Levine</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Siggia</surname>
<given-names>ED</given-names>
</name>
<name>
<surname>Swain</surname>
<given-names>PS.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Stochastic gene expression in a single cell</article-title>
.
<source>Science</source>
<volume>297</volume>
:
<fpage>1183</fpage>
<lpage>1186</lpage>
.
<pub-id pub-id-type="pmid">12183631</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B103">
<mixed-citation publication-type="other">
<collab>Encyclopedia of DNA Elements</collab>
. Experiment Guidelines – ENCODE. [cited 2016 July 16] Available from:
<ext-link ext-link-type="uri" xlink:href="https://www.encodeproject.org/about/experiment-guidelines/">https://www.encodeproject.org/about/experiment-guidelines/</ext-link>
.</mixed-citation>
</ref>
<ref id="msw139-B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Feng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bai</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Allan</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Ferguson</surname>
<given-names>IB</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>K.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Transcriptomic analysis of Chinese bayberry (Myrica rubra) fruit development and ripening using RNA-Seq</article-title>
.
<source>BMC Genomics</source>
<volume>13</volume>
:
<fpage>19</fpage>
.
<pub-id pub-id-type="pmid">22244270</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fungtammasan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Ananda</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Hile</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Su</surname>
<given-names>MS-W</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Medvedev</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Makova</surname>
<given-names>KD.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Accurate typing of short tandem repeats from genome-wide sequencing data and its applications</article-title>
.
<source>Genome Res</source>
.
<volume>25</volume>
:
<fpage>736</fpage>
<lpage>749</lpage>
.
<pub-id pub-id-type="pmid">25823460</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B105">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garber</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grabherr</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Guttman</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Computational methods for transcriptome annotation and quantification using RNA-seq</article-title>
.
<source>Nat Methods</source>
.
<volume>8</volume>
:
<fpage>469</fpage>
<lpage>477</lpage>
.
<pub-id pub-id-type="pmid">21623353</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garrett</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rosenthal</surname>
<given-names>JJC.</given-names>
</name>
</person-group>
<year>2012a</year>
<article-title>A role for A-to-I RNA editing in temperature adaptation</article-title>
.
<source>Physiology</source>
<volume>27</volume>
:
<fpage>362</fpage>
<lpage>369</lpage>
.</mixed-citation>
</ref>
<ref id="msw139-B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Garrett</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rosenthal</surname>
<given-names>JJC.</given-names>
</name>
</person-group>
<year>2012b</year>
<article-title>RNA editing underlies temperature adaptation in K+ channels from polar octopuses</article-title>
.
<source>Science</source>
<volume>335</volume>
:
<fpage>848</fpage>
<lpage>851</lpage>
.
<pub-id pub-id-type="pmid">22223739</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gayral</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Melo-Ferreira</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Glémin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bierne</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Carneiro</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Nabholz</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Lourenco</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Alves</surname>
<given-names>PC</given-names>
</name>
<name>
<surname>Ballenghien</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Faivre</surname>
<given-names>N</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2013</year>
<article-title>Reference-free population genomics from next-generation transcriptome data and the vertebrate–invertebrate gap</article-title>
.
<source>PLoS Genet</source>
.
<volume>9</volume>
:
<fpage>e1003457</fpage>
.
<pub-id pub-id-type="pmid">23593039</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Giardine</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Riemer</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hardison</surname>
<given-names>RC</given-names>
</name>
<name>
<surname>Burhans</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Elnitski</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Shah</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Blankenberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Albert</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2005</year>
<article-title>Galaxy: a platform for interactive large-scale genome analysis</article-title>
.
<source>Genome Res</source>
.
<volume>15</volume>
:
<fpage>1451</fpage>
<lpage>1455</lpage>
.
<pub-id pub-id-type="pmid">16169926</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Goecks</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Nekrutenko</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences</article-title>
.
<source>Genome Biol</source>
.
<volume>11</volume>
:
<fpage>R86</fpage>
.
<pub-id pub-id-type="pmid">20738864</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B40">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gommans</surname>
<given-names>WM</given-names>
</name>
<name>
<surname>Mullen</surname>
<given-names>SP</given-names>
</name>
<name>
<surname>Maas</surname>
<given-names>S.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>RNA editing: a driving force for adaptive evolution?</article-title>
<source>BioEssays</source>
<volume>31</volume>
:
<fpage>1137</fpage>
<lpage>1145</lpage>
.
<pub-id pub-id-type="pmid">19708020</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B41">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gout</surname>
<given-names>J-F</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>WK</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Okamoto</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lynch</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Large-scale detection of in vivo transcription errors</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<volume>110</volume>
:
<fpage>18584</fpage>
<lpage>18589</lpage>
.
<pub-id pub-id-type="pmid">24167253</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B42">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gowen</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Fong</surname>
<given-names>SS.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>Genome-scale metabolic model integrated with RNAseq data to identify metabolic states of Clostridium thermocellum</article-title>
.
<source>Biotechnol J</source>
.
<volume>5</volume>
:
<fpage>759</fpage>
<lpage>767</lpage>
.
<pub-id pub-id-type="pmid">20665646</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B43">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffin</surname>
<given-names>HR</given-names>
</name>
<name>
<surname>Pyle</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Blakely</surname>
<given-names>EL</given-names>
</name>
<name>
<surname>Alston</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Duff</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hudson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Horvath</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>IJ</given-names>
</name>
<name>
<surname>Santibanez-Koref</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Taylor</surname>
<given-names>RW</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2014</year>
<article-title>Accurate mitochondrial DNA sequencing using off-target reads provides a single test to identify pathogenic point mutations</article-title>
.
<source>Genet Med</source>
.
<volume>16</volume>
:
<fpage>962</fpage>
<lpage>971</lpage>
.
<pub-id pub-id-type="pmid">24901348</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B44">
<mixed-citation publication-type="journal">
<collab>GTEx Consortium</collab>
,
<person-group person-group-type="author">
<name>
<surname>Ardlie</surname>
<given-names>KG</given-names>
</name>
<name>
<surname>Deluca</surname>
<given-names>DS</given-names>
</name>
<name>
<surname>Segrè</surname>
<given-names>AV</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>TJ</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Gelfand</surname>
<given-names>ET</given-names>
</name>
<name>
<surname>Trowbridge</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Maller</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Tukiainen</surname>
<given-names>T</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
<article-title>The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans</article-title>
.
<source>Science</source>
<volume>348</volume>
:
<fpage>648</fpage>
<lpage>660</lpage>
.
<pub-id pub-id-type="pmid">25954001</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B45">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Gatti</surname>
<given-names>DM</given-names>
</name>
<name>
<surname>Srivastava</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Raghupathy</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Simecek</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Svenson</surname>
<given-names>KL</given-names>
</name>
<name>
<surname>Dotu</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Chuang</surname>
<given-names>JH</given-names>
</name>
<name>
<surname>Keller</surname>
<given-names>MP</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2016</year>
<article-title>Genetic architectures of quantitative variation in RNA editing pathways</article-title>
.
<source>Genetics</source>
<volume>202</volume>
:
<fpage>787</fpage>
<lpage>798</lpage>
.
<pub-id pub-id-type="pmid">26614740</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B46">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guo</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>C-I</given-names>
</name>
<name>
<surname>Shyr</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Samuels</surname>
<given-names>DC.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis</article-title>
.
<source>Bioinformatics</source>
<volume>29</volume>
:
<fpage>1210</fpage>
<lpage>1211</lpage>
.
<pub-id pub-id-type="pmid">23471301</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B47">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gupta</surname>
<given-names>PK</given-names>
</name>
<name>
<surname>Varshney</surname>
<given-names>RK.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat</article-title>
.
<source>Euphytica</source>
<volume>113</volume>
:
<fpage>163</fpage>
<lpage>185</lpage>
.</mixed-citation>
</ref>
<ref id="msw139-B48">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gymrek</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Golan</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Rosset</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Erlich</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>lobSTR: a short tandem repeat profiler for personal genomes</article-title>
.
<source>Genome Res</source>
.
<volume>22</volume>
:
<fpage>1154</fpage>
<lpage>1162</lpage>
.
<pub-id pub-id-type="pmid">22522390</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B49">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ho</surname>
<given-names>M-R</given-names>
</name>
<name>
<surname>Tsai</surname>
<given-names>K-W</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>W.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>dbDNV: a resource of duplicated gene nucleotide variants in human genome</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>39</volume>
:
<fpage>D920</fpage>
<lpage>D925</lpage>
.
<pub-id pub-id-type="pmid">21097891</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B50">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ibrahim</surname>
<given-names>ME</given-names>
</name>
<name>
<surname>Mahdi</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bereir</surname>
<given-names>RE</given-names>
</name>
<name>
<surname>Giha</surname>
<given-names>RS</given-names>
</name>
<name>
<surname>Wasunna</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Evolutionary conservation of RNA editing in the genus Leishmania</article-title>
.
<source>Infect Genet Evol</source>
.
<volume>8</volume>
:
<fpage>378</fpage>
<lpage>380</lpage>
.
<pub-id pub-id-type="pmid">18378193</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B51">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ji</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Loeb</surname>
<given-names>LA.</given-names>
</name>
</person-group>
<year>1992</year>
<article-title>Fidelity of HIV-1 reverse transcriptase copying RNA in vitro</article-title>
.
<source>Biochemistry</source>
<volume>31</volume>
:
<fpage>954</fpage>
<lpage>958</lpage>
.
<pub-id pub-id-type="pmid">1370910</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B52">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kaufmann</surname>
<given-names>BB</given-names>
</name>
<name>
<surname>van Oudenaarden</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Stochastic gene expression: from single molecules to the proteome</article-title>
.
<source>Curr Opin Genet Dev</source>
.
<volume>17</volume>
:
<fpage>107</fpage>
<lpage>112</lpage>
.
<pub-id pub-id-type="pmid">17317149</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B53">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kelkar</surname>
<given-names>YD</given-names>
</name>
<name>
<surname>Strubczewski</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Hile</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Chiaromonte</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Eckert</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Makova</surname>
<given-names>KD.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>What is a microsatellite: a computational and experimental definition based upon repeat mutational behavior at A/T and GT/AC repeats</article-title>
.
<source>Genome Biol Evol</source>
.
<volume>2</volume>
:
<fpage>620</fpage>
<lpage>635</lpage>
.
<pub-id pub-id-type="pmid">20668018</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B54">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khatri</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sirota</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Butte</surname>
<given-names>AJ.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Ten years of pathway analysis: current approaches and outstanding challenges</article-title>
.
<source>PLoS Comput Biol</source>
.
<volume>8</volume>
:
<fpage>e1002375</fpage>
.
<pub-id pub-id-type="pmid">22383865</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B55">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Pertea</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pimentel</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Kelley</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions</article-title>
.
<source>Genome Biol</source>
.
<volume>14</volume>
:
<fpage>R36</fpage>
.
<pub-id pub-id-type="pmid">23618408</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B56">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kimura</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ohta</surname>
<given-names>T.</given-names>
</name>
</person-group>
<year>1978</year>
<article-title>Stepwise mutation model and distribution of allelic frequencies in a finite population</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<volume>75</volume>
:
<fpage>2868</fpage>
<lpage>2872</lpage>
.
<pub-id pub-id-type="pmid">275857</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B104">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kleinman</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Majewski</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Comment on “Widespread RNA and DNA sequence differences in the human transcriptome.”</article-title>
<source>Science</source>
<volume>335</volume>
:
<fpage>1302c</fpage>
.
<pub-id pub-id-type="pmid">22422962</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B57">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Knippa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Peterson</surname>
<given-names>DO.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Fidelity of RNA Polymerase II transcription: role of Rbp9 in error detection and proofreading</article-title>
.
<source>Biochemistry</source>
<volume>52</volume>
:
<fpage>7807</fpage>
<lpage>7817</lpage>
.
<pub-id pub-id-type="pmid">24099331</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B58">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kong</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Frigge</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Masson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Besenbacher</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Sulem</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Magnusson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Gudjonsson</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Sigurdsson</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Jonasdottir</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Jonasdottir</surname>
<given-names>A</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2012</year>
<article-title>Rate of de novo mutations and the importance of father’s age to disease risk</article-title>
.
<source>Nature</source>
<volume>488</volume>
:
<fpage>471</fpage>
<lpage>475</lpage>
.
<pub-id pub-id-type="pmid">22914163</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B59">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Smallbone</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dunn</surname>
<given-names>WB</given-names>
</name>
<name>
<surname>Murabito</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Winder</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Kell</surname>
<given-names>DB</given-names>
</name>
<name>
<surname>Mendes</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Swainston</surname>
<given-names>N.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Improving metabolic flux predictions using absolute gene expression data</article-title>
.
<source>BMC Syst Biol</source>
.
<volume>6</volume>
:
<fpage>73</fpage>
.
<pub-id pub-id-type="pmid">22713172</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B106">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Legendre</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Pochet</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pak</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Verstrepen</surname>
<given-names>KJ.</given-names>
</name>
</person-group>
<year>2007</year>
<article-title>Sequence-based estimation of minisatellite and microsatellite repeat variability</article-title>
.
<source>Genome Res</source>
.
<volume>17</volume>
:
<fpage>1787</fpage>
<lpage>1796</lpage>
.
<pub-id pub-id-type="pmid">17978285</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B60">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Leung</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Jung</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Rajagopal</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Schmitt</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Selvaraj</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>AY</given-names>
</name>
<name>
<surname>Yen</surname>
<given-names>C-A</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Qiu</surname>
<given-names>Y</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
<article-title>Integrative analysis of haplotype-resolved epigenomes across human tissues</article-title>
.
<source>Nature</source>
<volume>518</volume>
:
<fpage>350</fpage>
<lpage>354</lpage>
.
<pub-id pub-id-type="pmid">25693566</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B61">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Levanon</surname>
<given-names>EY</given-names>
</name>
<name>
<surname>Yoon</surname>
<given-names>J-K</given-names>
</name>
<name>
<surname>Aach</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>B</given-names>
</name>
<name>
<surname>LeProust</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Gao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Church</surname>
<given-names>GM.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing</article-title>
.
<source>Science</source>
<volume>324</volume>
:
<fpage>1210</fpage>
<lpage>1213</lpage>
.
<pub-id pub-id-type="pmid">19478186</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B107">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:
<fpage>1754</fpage>
<lpage>1760</lpage>
.</mixed-citation>
</ref>
<ref id="msw139-B62">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>IX</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Bruzel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Toung</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>VG.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Widespread RNA and DNA sequence differences in the human transcriptome</article-title>
.
<source>Science</source>
<volume>333</volume>
:
<fpage>53</fpage>
<lpage>58</lpage>
.
<pub-id pub-id-type="pmid">21596952</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B63">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Piskol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>JB.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Comment on “Widespread RNA and DNA sequence differences in the human transcriptome.”</article-title>
<source>Science</source>
<volume>335</volume>
:
<fpage>1302e</fpage>
.
<pub-id pub-id-type="pmid">22422964</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B64">
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Macaulay</surname>
<given-names>IC</given-names>
</name>
<name>
<surname>Haerty</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>YI</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>TX</given-names>
</name>
<name>
<surname>Teng</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Goolam</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Saurat</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Coupland</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Shirley</surname>
<given-names>LM</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
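<article-title>G&amp;T-seq: parallel sequencing of single-cell genomes and transcriptomes</article-title>
.
<source>Nat Methods</source>
.
<volume>12</volume>
:
<fpage>519</fpage>
<lpage>522</lpage>
.</mixed-citation>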
</ref>
<ref id="msw139-B65">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madsen</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Villesen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Wiuf</surname>
<given-names>C.</given-names>
</name>
</person-group>
<year>2008</year>
<article-title>Short tandem repeats in human exons: a target for disease mutations</article-title>
.
<source>BMC Genomics</source>
<volume>9</volume>
:
<fpage>410</fpage>
.
<pub-id pub-id-type="pmid">18789129</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B66">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Malouf</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2002</year>
<chapter-title>A comparison of algorithms for maximum entropy parameter estimation.</chapter-title>
In:
<source>Proceedings of the 6th Conference on Natural Language Learning—Volume 20. COLING-02.</source>
<publisher-loc>Stroudsburg, PA</publisher-loc>
:
<publisher-name>Association for Computational Linguistics</publisher-name>
p.
<fpage>1</fpage>
<lpage>7</lpage>
. [cited 2016 July 16]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3115/1118853.1118871">http://dx.doi.org/10.3115/1118853.1118871</ext-link>
</mixed-citation>
</ref>
<ref id="msw139-B67">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>McCarthy</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Smyth</surname>
<given-names>GK.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>40</volume>
:
<fpage>4288</fpage>
<lpage>4297</lpage>
.
<pub-id pub-id-type="pmid">22287627</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B68">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Miah</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Rafii</surname>
<given-names>MY</given-names>
</name>
<name>
<surname>Ismail</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Puteh</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Rahim</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Islam</surname>
<given-names>KN</given-names>
</name>
<name>
<surname>Latif</surname>
<given-names>MA.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>A review of microsatellite markers and their applications in rice breeding programs to improve blast disease resistance</article-title>
.
<source>Int J Mol Sci</source>
.
<volume>14</volume>
:
<fpage>22499</fpage>
<lpage>22528</lpage>
.
<pub-id pub-id-type="pmid">24240810</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B69">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ninio</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>1991</year>
<article-title>Connections between translation, transcription and replication error-rates</article-title>
.
<source>Biochimie</source>
<volume>73</volume>
:
<fpage>1517</fpage>
<lpage>1523</lpage>
.
<pub-id pub-id-type="pmid">1805967</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B70">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>O’Huallachain</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Karczewski</surname>
<given-names>KJ</given-names>
</name>
<name>
<surname>Weissman</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Urban</surname>
<given-names>AE</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>MP.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Extensive genetic variation in somatic human tissues</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<volume>109</volume>
:
<fpage>18018</fpage>
<lpage>18023</lpage>
.
<pub-id pub-id-type="pmid">23043118</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B71">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Oshlack</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Robinson</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Young</surname>
<given-names>MD.</given-names>
</name>
</person-group>
<year>2010</year>
<article-title>From RNA-seq reads to differential expression results</article-title>
.
<source>Genome Biol</source>
.
<volume>11</volume>
:
<fpage>220</fpage>
.
<pub-id pub-id-type="pmid">21176179</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B72">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ozbudak</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Thattai</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Kurtser</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Grossman</surname>
<given-names>AD</given-names>
</name>
<name>
<surname>van Oudenaarden</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Regulation of noise in the expression of a single gene</article-title>
.
<source>Nat Genet</source>
.
<volume>31</volume>
:
<fpage>69</fpage>
<lpage>73</lpage>
.
<pub-id pub-id-type="pmid">11967532</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B73">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ozsolak</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Milos</surname>
<given-names>PM.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>RNA sequencing: advances, challenges and opportunities</article-title>
.
<source>Nat Rev Genet</source>
.
<volume>12</volume>
:
<fpage>87</fpage>
<lpage>98</lpage>
.
<pub-id pub-id-type="pmid">21191423</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B74">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pachter</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>A closer look at RNA editing</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>30</volume>
:
<fpage>246</fpage>
<lpage>247</lpage>
.
<pub-id pub-id-type="pmid">22398619</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B108">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Mortazavi</surname>
<given-names>A.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>RNA editing in the human ENCODE RNA-seq data</article-title>
.
<source>Genome Res</source>
.
<volume>22</volume>
:
<fpage>1626</fpage>
<lpage>1633</lpage>
.
<pub-id pub-id-type="pmid">22955975</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B75">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pearson</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Edamura</surname>
<given-names>KN</given-names>
</name>
<name>
<surname>Cleary</surname>
<given-names>JD.</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Repeat instability: mechanisms of dynamic mutations</article-title>
.
<source>Nat Rev Genet</source>
.
<volume>6</volume>
:
<fpage>729</fpage>
<lpage>742</lpage>
.
<pub-id pub-id-type="pmid">16205713</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B76">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>BC-M</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Tian</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>X</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2012</year>
<article-title>Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>30</volume>
:
<fpage>253</fpage>
<lpage>260</lpage>
.
<pub-id pub-id-type="pmid">22327324</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B77">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Perez</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>Rubinstein</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Fernandez</surname>
<given-names>DE</given-names>
</name>
<name>
<surname>Santoro</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Needleman</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Ho-Shing</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Choi</surname>
<given-names>JJ</given-names>
</name>
<name>
<surname>Zirlinger</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S-K</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>JS</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2015</year>
<article-title>Quantitative and functional interrogation of parent-of-origin allelic expression biases in the brain</article-title>
.
<source>Elife</source>
<volume>4</volume>
:
<fpage>e07860</fpage>
.
<pub-id pub-id-type="pmid">26140685</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B78">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pickrell</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Gilad</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Pritchard</surname>
<given-names>JK.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Comment on "Widespread RNA and DNA sequence differences in the human transcriptome."</article-title>
<source>Science</source>
<volume>335</volume>
:
<fpage>1302d</fpage>
.
<pub-id pub-id-type="pmid">22422963</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B79">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Quail</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Coupland</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Otto</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Harris</surname>
<given-names>SR</given-names>
</name>
<name>
<surname>Connor</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Bertoni</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Swerdlow</surname>
<given-names>HP</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers</article-title>
.
<source>BMC Genomics</source>
<volume>13</volume>
:
<fpage>341</fpage>
.
<pub-id pub-id-type="pmid">22827831</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B110">
<mixed-citation publication-type="book">
<collab>R Development Core Team</collab>
.
<source>R: A Language and Environment for Statistical Computing</source>
.
<publisher-name>R Foundation for Statistical Computing</publisher-name>
,
<publisher-loc>Vienna, Austria</publisher-loc>
Available from:
<ext-link ext-link-type="uri" xlink:href="https://www.R-project.org">https://www.R-project.org</ext-link>
</mixed-citation>
</ref>
<ref id="msw139-B81">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramaswami</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Piskol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>JB.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Accurate identification of human Alu and non-Alu RNA editing sites</article-title>
.
<source>Nat Methods</source>
.
<volume>9</volume>
:
<fpage>579</fpage>
<lpage>581</lpage>
.
<pub-id pub-id-type="pmid">22484847</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B82">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ramaswami</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Piskol</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Keegan</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>P</given-names>
</name>
<name>
<surname>O’Connell</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>JB.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Identifying RNA editing sites using RNA sequencing data alone</article-title>
.
<source>Nat Methods</source>
.
<volume>10</volume>
:
<fpage>128</fpage>
<lpage>132</lpage>
.
<pub-id pub-id-type="pmid">23291724</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B83">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Raser</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>O’Shea</surname>
<given-names>EK.</given-names>
</name>
</person-group>
<year>2005</year>
<article-title>Noise in gene expression: origins, consequences, and control</article-title>
.
<source>Science</source>
<volume>309</volume>
:
<fpage>2010</fpage>
<lpage>2013</lpage>
.
<pub-id pub-id-type="pmid">16179466</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B84">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rieder</surname>
<given-names>LE</given-names>
</name>
<name>
<surname>Savva</surname>
<given-names>YA</given-names>
</name>
<name>
<surname>Reyna</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Chang</surname>
<given-names>Y-J</given-names>
</name>
<name>
<surname>Dorsky</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Rezaei</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reenan</surname>
<given-names>RA.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>Dynamic response of RNA editing to temperature in Drosophila</article-title>
.
<source>BMC Biol</source>
.
<volume>13</volume>
:
<fpage>1</fpage>
.
<pub-id pub-id-type="pmid">25555396</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B85">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Di Rienzo</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Donnelly</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Toomajian</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Sisk</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Hill</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Petzl-Erler</surname>
<given-names>ML</given-names>
</name>
<name>
<surname>Haines</surname>
<given-names>GK</given-names>
</name>
<name>
<surname>Barch</surname>
<given-names>DH.</given-names>
</name>
</person-group>
<year>1998</year>
<article-title>Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories</article-title>
.
<source>Genetics</source>
<volume>148</volume>
:
<fpage>1269</fpage>
<lpage>1284</lpage>
.
<pub-id pub-id-type="pmid">9539441</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B86">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ross</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Russ</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Costello</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hollinger</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lennon</surname>
<given-names>NJ</given-names>
</name>
<name>
<surname>Hegarty</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Jaffe</surname>
<given-names>DB.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Characterizing and measuring bias in sequence data</article-title>
.
<source>Genome Biol</source>
.
<volume>14</volume>
:
<fpage>R51</fpage>
.
<pub-id pub-id-type="pmid">23718773</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B87">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sainudiin</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Durrett</surname>
<given-names>RT</given-names>
</name>
<name>
<surname>Aquadro</surname>
<given-names>CF</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>R.</given-names>
</name>
</person-group>
<year>2004</year>
<article-title>Microsatellite mutation models</article-title>
.
<source>Genetics</source>
<volume>168</volume>
:
<fpage>383</fpage>
<lpage>395</lpage>
.
<pub-id pub-id-type="pmid">15454551</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B113">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saliba</surname>
<given-names>A-E</given-names>
</name>
<name>
<surname>Westermann</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>Gorski</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Vogel</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Single-cell RNA-seq: advances and future challenges</article-title>
.
<source>Nucleic Acids Res</source>
.
<volume>42</volume>
:
<fpage>8845</fpage>
<lpage>8860</lpage>
.
<pub-id pub-id-type="pmid">25053837</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B88">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Samuels</surname>
<given-names>DC</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Sheng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>TA</given-names>
</name>
<name>
<surname>Shyr</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Finding the lost treasures in exome sequencing data</article-title>
.
<source>Trends Genet</source>
.
<volume>29</volume>
:
<fpage>593</fpage>
<lpage>599</lpage>
.
<pub-id pub-id-type="pmid">23972387</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B89">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schaub</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Keller</surname>
<given-names>W.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>RNA editing by adenosine deaminases generates RNA and protein diversity</article-title>
.
<source>Biochimie</source>
<volume>84</volume>
:
<fpage>791</fpage>
<lpage>803</lpage>
.
<pub-id pub-id-type="pmid">12457566</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B114">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schrider</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Gout</surname>
<given-names>J-F</given-names>
</name>
<name>
<surname>Hahn</surname>
<given-names>MW.</given-names>
</name>
</person-group>
<year>2011</year>
<article-title>Very Few RNA and DNA Sequence Differences in the Human Transcriptome</article-title>
.
<source>PLoS ONE</source>
<volume>6</volume>
:
<fpage>e25842</fpage>
.
<pub-id pub-id-type="pmid">22022455</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B90">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Strathern</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Malagon</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Irvin</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Gotte</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Shafer</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Kireeva</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lubkowska</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Kashlev</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>The fidelity of transcription: RPB1 (RPO21) mutations that increase transcriptional slippage in S. cerevisiae</article-title>
.
<source>J Biol Chem</source>
.
<volume>288</volume>
:
<fpage>2689</fpage>
<lpage>2699</lpage>
.
<pub-id pub-id-type="pmid">23223234</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B91">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Strathern</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Court</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Kashlev</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2012</year>
<article-title>Isolation and characterization of transcription fidelity mutants</article-title>
.
<source>Biochim Biophys Acta</source>
.
<volume>1819</volume>
:
<fpage>694</fpage>
<lpage>699</lpage>
.
<pub-id pub-id-type="pmid">22366339</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B92">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Subramanian</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Mishra</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions</article-title>
.
<source>Genome Biol</source>
.
<volume>4</volume>
:
<fpage>R13</fpage>
.
<pub-id pub-id-type="pmid">12620123</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B93">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname>
<given-names>JX</given-names>
</name>
<name>
<surname>Helgason</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Masson</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Ebenesersdóttir</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Mallick</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Gnerre</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Patterson</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Kong</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reich</surname>
<given-names>D</given-names>
</name>
</person-group>
,
<etal></etal>
<year>2012</year>
<article-title>A direct characterization of human mutation based on microsatellites</article-title>
.
<source>Nat Genet</source>
.
<volume>44</volume>
:
<fpage>1161</fpage>
<lpage>1165</lpage>
.
<pub-id pub-id-type="pmid">22922873</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B109">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sunnucks</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>2000</year>
<article-title>Efficient genetic markers for population biology</article-title>
.
<source>Trends Ecol Evol</source>
.
<volume>15</volume>
:
<fpage>199</fpage>
<lpage>203</lpage>
.
<pub-id pub-id-type="pmid">10782134</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B94">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Hendrickson</surname>
<given-names>DG</given-names>
</name>
<name>
<surname>Sauvageau</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Goff</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Rinn</surname>
<given-names>JL</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Differential analysis of gene regulation at transcript resolution with RNA-seq</article-title>
.
<source>Nat Biotechnol</source>
.
<volume>31</volume>
:
<fpage>46</fpage>
<lpage>53</lpage>
.
<pub-id pub-id-type="pmid">23222703</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B95">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>SL.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>TopHat: discovering splice junctions with RNA-Seq</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:
<fpage>1105</fpage>
<lpage>1111</lpage>
.
<pub-id pub-id-type="pmid">19289445</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B111">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Traverse</surname>
<given-names>CC</given-names>
</name>
<name>
<surname>Ochman</surname>
<given-names>H.</given-names>
</name>
</person-group>
<year>2016</year>
<article-title>Conserved rates and patterns of transcription errors across bacterial growth states and lifestyles</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
. 201525329.</mixed-citation>
</ref>
<ref id="msw139-B96">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Vigouroux</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Jaqueth</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Matsuoka</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>OS</given-names>
</name>
<name>
<surname>Beavis</surname>
<given-names>WD</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>JSC</given-names>
</name>
<name>
<surname>Doebley</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2002</year>
<article-title>Rate and pattern of mutation at microsatellite loci in maize</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>19</volume>
:
<fpage>1251</fpage>
<lpage>1260</lpage>
.
<pub-id pub-id-type="pmid">12140237</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B112">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Valdes</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Slatkin</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Freimer</surname>
<given-names>NB.</given-names>
</name>
</person-group>
<year>1993</year>
<article-title>Allele Frequencies at Microsatellite Loci: The Stepwise Mutation Model Revisited</article-title>
.
<source>Genetics</source>
<volume>133</volume>
:
<fpage>737</fpage>
<lpage>749</lpage>
.
<pub-id pub-id-type="pmid">8454213</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B98">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>RNA-Seq: a revolutionary tool for transcriptomics</article-title>
.
<source>Nat Rev Genet</source>
.
<volume>10</volume>
:
<fpage>57</fpage>
<lpage>63</lpage>
.
<pub-id pub-id-type="pmid">19015660</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B99">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wedekind</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Dance</surname>
<given-names>GSC</given-names>
</name>
<name>
<surname>Sowden</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>HC.</given-names>
</name>
</person-group>
<year>2003</year>
<article-title>Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business</article-title>
.
<source>Trends Genet</source>
.
<volume>19</volume>
:
<fpage>207</fpage>
<lpage>216</lpage>
.
<pub-id pub-id-type="pmid">12683974</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B100">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wilhelm</surname>
<given-names>BT</given-names>
</name>
<name>
<surname>Landry</surname>
<given-names>J-R.</given-names>
</name>
</person-group>
<year>2009</year>
<article-title>RNA-Seq—quantitative measurement of expression through massively parallel RNA-sequencing</article-title>
.
<source>Methods</source>
<volume>48</volume>
:
<fpage>249</fpage>
<lpage>257</lpage>
.
<pub-id pub-id-type="pmid">19336255</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B300">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wright</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Bentzen</surname>
<given-names>P.</given-names>
</name>
</person-group>
<year>1994</year>
<article-title>Microsatellites: genetic markers for the future</article-title>
.
<source>Rev Fish Biol Fish</source>
.
<volume>4</volume>
:
<fpage>384</fpage>
<lpage>388</lpage>
.</mixed-citation>
</ref>
<ref id="msw139-B301">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2014</year>
<article-title>Human coding RNA editing is generally nonadaptive</article-title>
.
<source>Proc Natl Acad Sci U S A</source>
.
<volume>111</volume>
:
<fpage>3769</fpage>
<lpage>3774</lpage>
.
<pub-id pub-id-type="pmid">24567376</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B302">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
</person-group>
<year>2015</year>
<article-title>In search of beneficial coding RNA editing</article-title>
.
<source>Mol Biol Evol</source>
.
<volume>32</volume>
:
<fpage>536</fpage>
<lpage>541</lpage>
.
<pub-id pub-id-type="pmid">25392343</pub-id>
</mixed-citation>
</ref>
<ref id="msw139-B303">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>YN</given-names>
</name>
<name>
<surname>Lubkowska</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Hui</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Court</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Court</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Strathern</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Kashlev</surname>
<given-names>M.</given-names>
</name>
</person-group>
<year>2013</year>
<article-title>Isolation and characterization of RNA polymerase rpoB mutations that alter transcription slippage during elongation in Escherichia coli</article-title>
.
<source>J Biol Chem</source>
.
<volume>288</volume>
:
<fpage>2700</fpage>
<lpage>2710</lpage>
.
<pub-id pub-id-type="pmid">23223236</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000567 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000567 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:5026258
   |texte=   Reverse Transcription Errors and RNA–DNA Differences at Short Tandem Repeats
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:27413049" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024