MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

Internal identifier: 000C71 (Pmc/Corpus)



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Efficient Mining of Variants From Trios for Ventricular Septal Defect Association Study</title>
<author>
<name sortKey="Jiang, Peng" sort="Jiang, Peng" uniqKey="Jiang P" first="Peng" last="Jiang">Peng Jiang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hu, Yaofei" sort="Hu, Yaofei" uniqKey="Hu Y" first="Yaofei" last="Hu">Yaofei Hu</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Yiqi" sort="Wang, Yiqi" uniqKey="Wang Y" first="Yiqi" last="Wang">Yiqi Wang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Jin" sort="Zhang, Jin" uniqKey="Zhang J" first="Jin" last="Zhang">Jin Zhang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Qinghong" sort="Zhu, Qinghong" uniqKey="Zhu Q" first="Qinghong" last="Zhu">Qinghong Zhu</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bai, Lin" sort="Bai, Lin" uniqKey="Bai L" first="Lin" last="Bai">Lin Bai</name>
<affiliation>
<nlm:aff id="aff2">
<institution>School of Computing and Electronic Information, Guangxi University</institution>
,
<addr-line>Nanning</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tong, Qiang" sort="Tong, Qiang" uniqKey="Tong Q" first="Qiang" last="Tong">Qiang Tong</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Tao" sort="Li, Tao" uniqKey="Li T" first="Tao" last="Li">Tao Li</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Liang" sort="Zhao, Liang" uniqKey="Zhao L" first="Liang" last="Zhao">Liang Zhao</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<institution>School of Computing and Electronic Information, Guangxi University</institution>
,
<addr-line>Nanning</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">31440271</idno>
<idno type="pmc">6694746</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6694746</idno>
<idno type="RBID">PMC:6694746</idno>
<idno type="doi">10.3389/fgene.2019.00670</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000C71</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000C71</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Efficient Mining of Variants From Trios for Ventricular Septal Defect Association Study</title>
<author>
<name sortKey="Jiang, Peng" sort="Jiang, Peng" uniqKey="Jiang P" first="Peng" last="Jiang">Peng Jiang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hu, Yaofei" sort="Hu, Yaofei" uniqKey="Hu Y" first="Yaofei" last="Hu">Yaofei Hu</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wang, Yiqi" sort="Wang, Yiqi" uniqKey="Wang Y" first="Yiqi" last="Wang">Yiqi Wang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Jin" sort="Zhang, Jin" uniqKey="Zhang J" first="Jin" last="Zhang">Jin Zhang</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Qinghong" sort="Zhu, Qinghong" uniqKey="Zhu Q" first="Qinghong" last="Zhu">Qinghong Zhu</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bai, Lin" sort="Bai, Lin" uniqKey="Bai L" first="Lin" last="Bai">Lin Bai</name>
<affiliation>
<nlm:aff id="aff2">
<institution>School of Computing and Electronic Information, Guangxi University</institution>
,
<addr-line>Nanning</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tong, Qiang" sort="Tong, Qiang" uniqKey="Tong Q" first="Qiang" last="Tong">Qiang Tong</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Tao" sort="Li, Tao" uniqKey="Li T" first="Tao" last="Li">Tao Li</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhao, Liang" sort="Zhao, Liang" uniqKey="Zhao L" first="Liang" last="Zhao">Liang Zhao</name>
<affiliation>
<nlm:aff id="aff1">
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2">
<institution>School of Computing and Electronic Information, Guangxi University</institution>
,
<addr-line>Nanning</addr-line>
,
<country>China</country>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Frontiers in Genetics</title>
<idno type="eISSN">1664-8021</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Ventricular septal defect (VSD) is a fatal congenital heart disease with severe consequences for affected infants. Early diagnosis, particularly through genetic variants, plays an important role. Existing panel-based approaches to variant mining suffer from a shortage of large panels, costly sequencing, and missed rare variants. Although trio-based methods alleviate these limitations to some extent, they are agnostic to novel mutations and computationally intensive. To address these limitations, we propose a novel variant mining algorithm for trio-based sequencing data and apply it to a VSD trio to identify associated mutations. Our approach starts by filtering irrelevant
<italic>k</italic>
-mers from the sequences of a trio
<italic>via</italic>
a newly conceived coupled Bloom Filter, then corrects sequencing errors using a statistical approach and extends the retained
<italic>k</italic>
-mers into long sequences. These extended sequences are used as input for variant detection. The obtained variants are then comprehensively analyzed against existing databases to mine VSD-related mutations. Experiments show that, compared with single-sequence-based approaches, our trio-based algorithm narrows down the candidate coding genes and lncRNAs by about 10-fold and 5-fold, respectively. Our algorithm is also 10 times faster and uses two orders of magnitude less memory than the existing state-of-the-art approach. By applying our approach to a VSD trio, we identify a previously unreported gene (CD80), a combination of two genes (MYBPC3 and TRDN), and a lncRNA (NONHSAT096266.2), all of which are highly likely to be VSD-related.</p>
</div>
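<!--
The trio filtering step can be illustrated with a minimal Python sketch. The abstract does not specify how the coupled Bloom Filter works, so this assumes one hypothetical interpretation. The parents' k-mers are loaded into a single Bloom filter. A child k-mer is kept only if it occurs at least min_count times in the child's reads and is absent from the parental filter. The count threshold is a crude stand-in for the statistical error correction, which is not described here. The names BloomFilter, kmers, child_specific_kmers, min_count, size and num_hashes are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator, Set


class BloomFilter:
    """Plain Bloom filter: a byte array of 0/1 flags and num_hashes salted SHA-256 positions."""

    def __init__(self, size: int, num_hashes: int) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive each bit position from SHA-256 of "seed:item".
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # May return True for an item never added (false positive),
        # but never returns False for an item that was added.
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(read: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def child_specific_kmers(child_reads: Iterable[str], parent_reads: Iterable[str], k: int,
                         min_count: int = 2, size: int = 1 << 20, num_hashes: int = 3) -> Set[str]:
    """Hypothetical trio filter: keep solid child k-mers that are absent from both parents."""
    parents = BloomFilter(size, num_hashes)
    for read in parent_reads:
        for kmer in kmers(read, k):
            parents.add(kmer)
    # Count child k-mers; low counts are treated as likely sequencing errors.
    counts = Counter(kmer for read in child_reads for kmer in kmers(read, k))
    return {kmer for kmer, n in counts.items() if n >= min_count and kmer not in parents}


if __name__ == "__main__":
    parent_reads = ["ACGTACGTAC", "ACGTACGTAC"]  # toy reads pooled from both parents
    child_reads = ["ACGTTCGTAC", "ACGTTCGTAC", "ACGTACGAAC"]
    print(sorted(child_specific_kmers(child_reads, parent_reads, k=4)))
```

In this toy trio, the child carries an A>T substitution at position 4. The retained 4-mers are therefore exactly the four windows that overlap that position. The k-mers unique to the third child read, which contains a singleton error, are dropped by the count threshold. A Bloom filter has no false negatives, so a parental k-mer is never wrongly kept. A false positive can only drop a genuine child-specific k-mer, so the filter size trades memory against sensitivity.
-->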
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Adzhubei, I A" uniqKey="Adzhubei I">I. A. Adzhubei</name>
</author>
<author>
<name sortKey="Schmidt, S" uniqKey="Schmidt S">S. Schmidt</name>
</author>
<author>
<name sortKey="Peshkin, L" uniqKey="Peshkin L">L. Peshkin</name>
</author>
<author>
<name sortKey="Ramensky, V E" uniqKey="Ramensky V">V. E. Ramensky</name>
</author>
<author>
<name sortKey="Gerasimova, A" uniqKey="Gerasimova A">A. Gerasimova</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P. Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Andrey, G" uniqKey="Andrey G">G. Andrey</name>
</author>
<author>
<name sortKey="Mundlos, S" uniqKey="Mundlos S">S. Mundlos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Becker, K G" uniqKey="Becker K">K. G. Becker</name>
</author>
<author>
<name sortKey="Barnes, K C" uniqKey="Barnes K">K. C. Barnes</name>
</author>
<author>
<name sortKey="Bright, T J" uniqKey="Bright T">T. J. Bright</name>
</author>
<author>
<name sortKey="Wang, S A" uniqKey="Wang S">S. A. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bloom, B H" uniqKey="Bloom B">B. H. Bloom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chang, Z" uniqKey="Chang Z">Z. Chang</name>
</author>
<author>
<name sortKey="Zhang, Q" uniqKey="Zhang Q">Q. Zhang</name>
</author>
<author>
<name sortKey="Feng, Q" uniqKey="Feng Q">Q. Feng</name>
</author>
<author>
<name sortKey="Xu, J" uniqKey="Xu J">J. Xu</name>
</author>
<author>
<name sortKey="Teng, T" uniqKey="Teng T">T. Teng</name>
</author>
<author>
<name sortKey="Luan, Q" uniqKey="Luan Q">Q. Luan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Depristo, M" uniqKey="Depristo M">M. DePristo</name>
</author>
<author>
<name sortKey="Banks, E" uniqKey="Banks E">E. Banks</name>
</author>
<author>
<name sortKey="Poplin, R" uniqKey="Poplin R">R. Poplin</name>
</author>
<author>
<name sortKey="Garimella, K" uniqKey="Garimella K">K. Garimella</name>
</author>
<author>
<name sortKey="Maguire, J" uniqKey="Maguire J">J. Maguire</name>
</author>
<author>
<name sortKey="Hartl, C" uniqKey="Hartl C">C. Hartl</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hamosh, A" uniqKey="Hamosh A">A. Hamosh</name>
</author>
<author>
<name sortKey="Scott, A F" uniqKey="Scott A">A. F. Scott</name>
</author>
<author>
<name sortKey="Amberger, J S" uniqKey="Amberger J">J. S. Amberger</name>
</author>
<author>
<name sortKey="Bocchini, C A" uniqKey="Bocchini C">C. A. Bocchini</name>
</author>
<author>
<name sortKey="Mckusick, V A" uniqKey="Mckusick V">V. A. McKusick</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, D W" uniqKey="Huang D">D. W. Huang</name>
</author>
<author>
<name sortKey="Sherman, B T" uniqKey="Sherman B">B. T. Sherman</name>
</author>
<author>
<name sortKey="Lempicki, R A" uniqKey="Lempicki R">R. A. Lempicki</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiang, P" uniqKey="Jiang P">P. Jiang</name>
</author>
<author>
<name sortKey="Luo, J" uniqKey="Luo J">J. Luo</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y. Wang</name>
</author>
<author>
<name sortKey="Deng, P" uniqKey="Deng P">P. Deng</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B. Schmidt</name>
</author>
<author>
<name sortKey="Tang, X" uniqKey="Tang X">X. Tang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jin, S C" uniqKey="Jin S">S. C. Jin</name>
</author>
<author>
<name sortKey="Homsy, J" uniqKey="Homsy J">J. Homsy</name>
</author>
<author>
<name sortKey="Zaidi, S" uniqKey="Zaidi S">S. Zaidi</name>
</author>
<author>
<name sortKey="Lu, Q" uniqKey="Lu Q">Q. Lu</name>
</author>
<author>
<name sortKey="Morton, S" uniqKey="Morton S">S. Morton</name>
</author>
<author>
<name sortKey="Depalma, S R" uniqKey="Depalma S">S. R. DePalma</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jin, Z B" uniqKey="Jin Z">Z.-B. Jin</name>
</author>
<author>
<name sortKey="Wu, J" uniqKey="Wu J">J. Wu</name>
</author>
<author>
<name sortKey="Huang, X F" uniqKey="Huang X">X.-F. Huang</name>
</author>
<author>
<name sortKey="Feng, C Y" uniqKey="Feng C">C.-Y. Feng</name>
</author>
<author>
<name sortKey="Cai, X B" uniqKey="Cai X">X.-B. Cai</name>
</author>
<author>
<name sortKey="Mao, J Y" uniqKey="Mao J">J.-Y. Mao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kallikourdis, M" uniqKey="Kallikourdis M">M. Kallikourdis</name>
</author>
<author>
<name sortKey="Martini, E" uniqKey="Martini E">E. Martini</name>
</author>
<author>
<name sortKey="Carullo, P" uniqKey="Carullo P">P. Carullo</name>
</author>
<author>
<name sortKey="Sardi, C" uniqKey="Sardi C">C. Sardi</name>
</author>
<author>
<name sortKey="Roselli, G" uniqKey="Roselli G">G. Roselli</name>
</author>
<author>
<name sortKey="Greco, C M" uniqKey="Greco C">C. M. Greco</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M. Kanehisa</name>
</author>
<author>
<name sortKey="Sato, Y" uniqKey="Sato Y">Y. Sato</name>
</author>
<author>
<name sortKey="Furumichi, M" uniqKey="Furumichi M">M. Furumichi</name>
</author>
<author>
<name sortKey="Morishima, K" uniqKey="Morishima K">K. Morishima</name>
</author>
<author>
<name sortKey="Tanabe, M" uniqKey="Tanabe M">M. Tanabe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lek, M" uniqKey="Lek M">M. Lek</name>
</author>
<author>
<name sortKey="Karczewski, K J" uniqKey="Karczewski K">K. J. Karczewski</name>
</author>
<author>
<name sortKey="Minikel, E V" uniqKey="Minikel E">E. V. Minikel</name>
</author>
<author>
<name sortKey="Samocha, K E" uniqKey="Samocha K">K. E. Samocha</name>
</author>
<author>
<name sortKey="Banks, E" uniqKey="Banks E">E. Banks</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T. Fennell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R. Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B. Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A. Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T. Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J. Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N. Homer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y. Liu</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J. Wang</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J. Li</name>
</author>
<author>
<name sortKey="Wang, R" uniqKey="Wang R">R. Wang</name>
</author>
<author>
<name sortKey="Tharakan, B" uniqKey="Tharakan B">B. Tharakan</name>
</author>
<author>
<name sortKey="Zhang, S L" uniqKey="Zhang S">S. L. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macarthur, J" uniqKey="Macarthur J">J. MacArthur</name>
</author>
<author>
<name sortKey="Bowler, E" uniqKey="Bowler E">E. Bowler</name>
</author>
<author>
<name sortKey="Cerezo, M" uniqKey="Cerezo M">M. Cerezo</name>
</author>
<author>
<name sortKey="Gil, L" uniqKey="Gil L">L. Gil</name>
</author>
<author>
<name sortKey="Hall, P" uniqKey="Hall P">P. Hall</name>
</author>
<author>
<name sortKey="Hastings, E" uniqKey="Hastings E">E. Hastings</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marchese, F P" uniqKey="Marchese F">F. P. Marchese</name>
</author>
<author>
<name sortKey="Raimondi, I" uniqKey="Raimondi I">I. Raimondi</name>
</author>
<author>
<name sortKey="Huarte, M" uniqKey="Huarte M">M. Huarte</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marouli, E" uniqKey="Marouli E">E. Marouli</name>
</author>
<author>
<name sortKey="Graff, M" uniqKey="Graff M">M. Graff</name>
</author>
<author>
<name sortKey="Medina Gomez, C" uniqKey="Medina Gomez C">C. Medina-Gomez</name>
</author>
<author>
<name sortKey="Lo, K S" uniqKey="Lo K">K. S. Lo</name>
</author>
<author>
<name sortKey="Wood, A R" uniqKey="Wood A">A. R. Wood</name>
</author>
<author>
<name sortKey="Kjaer, T R" uniqKey="Kjaer T">T. R. Kjaer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A. Mortazavi</name>
</author>
<author>
<name sortKey="Williams, B A" uniqKey="Williams B">B. A. Williams</name>
</author>
<author>
<name sortKey="Mccue, K" uniqKey="Mccue K">K. McCue</name>
</author>
<author>
<name sortKey="Schaeffer, L" uniqKey="Schaeffer L">L. Schaeffer</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B. Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peach, R J" uniqKey="Peach R">R. J. Peach</name>
</author>
<author>
<name sortKey="Bajorath, J" uniqKey="Bajorath J">J. Bajorath</name>
</author>
<author>
<name sortKey="Naemura, J" uniqKey="Naemura J">J. Naemura</name>
</author>
<author>
<name sortKey="Leytze, G" uniqKey="Leytze G">G. Leytze</name>
</author>
<author>
<name sortKey="Greene, J" uniqKey="Greene J">J. Greene</name>
</author>
<author>
<name sortKey="Aruffo, A" uniqKey="Aruffo A">A. Aruffo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Razin, S V" uniqKey="Razin S">S. V. Razin</name>
</author>
<author>
<name sortKey="Gavrilov, A A" uniqKey="Gavrilov A">A. A. Gavrilov</name>
</author>
<author>
<name sortKey="Loudinkova, E S" uniqKey="Loudinkova E">E. S. Loudinkova</name>
</author>
<author>
<name sortKey="Larovaia, O V" uniqKey="Larovaia O">O. V. Larovaia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Serpytis, P" uniqKey="Serpytis P">P. Serpytis</name>
</author>
<author>
<name sortKey="Karvelyte, N" uniqKey="Karvelyte N">N. Karvelyte</name>
</author>
<author>
<name sortKey="Serpytis, R" uniqKey="Serpytis R">R. Serpytis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sim, N L" uniqKey="Sim N">N.-L. Sim</name>
</author>
<author>
<name sortKey="Kumar, P" uniqKey="Kumar P">P. Kumar</name>
</author>
<author>
<name sortKey="Hu, J" uniqKey="Hu J">J. Hu</name>
</author>
<author>
<name sortKey="Henikoff, S" uniqKey="Henikoff S">S. Henikoff</name>
</author>
<author>
<name sortKey="Schneider, G" uniqKey="Schneider G">G. Schneider</name>
</author>
<author>
<name sortKey="Ng, P C" uniqKey="Ng P">P. C. Ng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spicer, D E" uniqKey="Spicer D">D. E. Spicer</name>
</author>
<author>
<name sortKey="Hsu, H H" uniqKey="Hsu H">H. H. Hsu</name>
</author>
<author>
<name sortKey="Co Vu, J" uniqKey="Co Vu J">J. Co-Vu</name>
</author>
<author>
<name sortKey="Anderson, R H" uniqKey="Anderson R">R. H. Anderson</name>
</author>
<author>
<name sortKey="Fricker, F J" uniqKey="Fricker F">F. J. Fricker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stenson, P D" uniqKey="Stenson P">P. D. Stenson</name>
</author>
<author>
<name sortKey="Mort, M" uniqKey="Mort M">M. Mort</name>
</author>
<author>
<name sortKey="Ball, E V" uniqKey="Ball E">E. V. Ball</name>
</author>
<author>
<name sortKey="Evans, K" uniqKey="Evans K">K. Evans</name>
</author>
<author>
<name sortKey="Hayden, M" uniqKey="Hayden M">M. Hayden</name>
</author>
<author>
<name sortKey="Heywood, S" uniqKey="Heywood S">S. Heywood</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Szklarczyk, D" uniqKey="Szklarczyk D">D. Szklarczyk</name>
</author>
<author>
<name sortKey="Morris, J H" uniqKey="Morris J">J. H. Morris</name>
</author>
<author>
<name sortKey="Cook, H" uniqKey="Cook H">H. Cook</name>
</author>
<author>
<name sortKey="Kuhn, M" uniqKey="Kuhn M">M. Kuhn</name>
</author>
<author>
<name sortKey="Wyder, S" uniqKey="Wyder S">S. Wyder</name>
</author>
<author>
<name sortKey="Simonovic, M" uniqKey="Simonovic M">M. Simonovic</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Auton, A" uniqKey="Auton A">A. Auton</name>
</author>
<author>
<name sortKey="Abecasis, G R" uniqKey="Abecasis G">G. R. Abecasis</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Volders, P J" uniqKey="Volders P">P.-J. Volders</name>
</author>
<author>
<name sortKey="Anckaert, J" uniqKey="Anckaert J">J. Anckaert</name>
</author>
<author>
<name sortKey="Verheggen, K" uniqKey="Verheggen K">K. Verheggen</name>
</author>
<author>
<name sortKey="Nuytens, J" uniqKey="Nuytens J">J. Nuytens</name>
</author>
<author>
<name sortKey="Martens, L" uniqKey="Martens L">L. Martens</name>
</author>
<author>
<name sortKey="Mestdagh, P" uniqKey="Mestdagh P">P. Mestdagh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, K" uniqKey="Wang K">K. Wang</name>
</author>
<author>
<name sortKey="Li, M" uniqKey="Li M">M. Li</name>
</author>
<author>
<name sortKey="Hakonarson, H" uniqKey="Hakonarson H">H. Hakonarson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wei, Q" uniqKey="Wei Q">Q. Wei</name>
</author>
<author>
<name sortKey="Zhan, X" uniqKey="Zhan X">X. Zhan</name>
</author>
<author>
<name sortKey="Zhong, X" uniqKey="Zhong X">X. Zhong</name>
</author>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y. Liu</name>
</author>
<author>
<name sortKey="Han, Y" uniqKey="Han Y">Y. Han</name>
</author>
<author>
<name sortKey="Chen, W" uniqKey="Chen W">W. Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wood, A R" uniqKey="Wood A">A. R. Wood</name>
</author>
<author>
<name sortKey="Esko, T" uniqKey="Esko T">T. Esko</name>
</author>
<author>
<name sortKey="Yang, J" uniqKey="Yang J">J. Yang</name>
</author>
<author>
<name sortKey="Vedantam, S" uniqKey="Vedantam S">S. Vedantam</name>
</author>
<author>
<name sortKey="Pers, T H" uniqKey="Pers T">T. H. Pers</name>
</author>
<author>
<name sortKey="Gustafsson, S" uniqKey="Gustafsson S">S. Gustafsson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, L" uniqKey="Zhao L">L. Zhao</name>
</author>
<author>
<name sortKey="Chen, Q" uniqKey="Chen Q">Q. Chen</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W. Li</name>
</author>
<author>
<name sortKey="Jiang, P" uniqKey="Jiang P">P. Jiang</name>
</author>
<author>
<name sortKey="Wong, L" uniqKey="Wong L">L. Wong</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, L" uniqKey="Zhao L">L. Zhao</name>
</author>
<author>
<name sortKey="Xie, J" uniqKey="Xie J">J. Xie</name>
</author>
<author>
<name sortKey="Bai, L" uniqKey="Bai L">L. Bai</name>
</author>
<author>
<name sortKey="Chen, W" uniqKey="Chen W">W. Chen</name>
</author>
<author>
<name sortKey="Wang, M" uniqKey="Wang M">M. Wang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z. Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y. Zhao</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
<author>
<name sortKey="Fang, S" uniqKey="Fang S">S. Fang</name>
</author>
<author>
<name sortKey="Kang, Y" uniqKey="Kang Y">Y. Kang</name>
</author>
<author>
<name sortKey="Wu, W" uniqKey="Wu W">W. Wu</name>
</author>
<author>
<name sortKey="Hao, Y" uniqKey="Hao Y">Y. Hao</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Front Genet</journal-id>
<journal-id journal-id-type="iso-abbrev">Front Genet</journal-id>
<journal-id journal-id-type="publisher-id">Front. Genet.</journal-id>
<journal-title-group>
<journal-title>Frontiers in Genetics</journal-title>
</journal-title-group>
<issn pub-type="epub">1664-8021</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">31440271</article-id>
<article-id pub-id-type="pmc">6694746</article-id>
<article-id pub-id-type="doi">10.3389/fgene.2019.00670</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Genetics</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Efficient Mining of Variants From Trios for Ventricular Septal Defect Association Study</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Jiang</surname>
<given-names>Peng</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn003">
<sup>†</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/676692"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hu</surname>
<given-names>Yaofei</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn003">
<sup>†</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775866/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wang</surname>
<given-names>Yiqi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775822/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Jin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhu</surname>
<given-names>Qinghong</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775833/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bai</surname>
<given-names>Lin</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775820/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tong</surname>
<given-names>Qiang</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775831/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Tao</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/775827/overview"></uri>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zhao</surname>
<given-names>Liang</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:type="simple" xlink:href="https://loop.frontiersin.org/people/661735"></uri>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine</institution>
,
<addr-line>Shiyan</addr-line>
,
<country>China</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>School of Computing and Electronic Information, Guangxi University</institution>
,
<addr-line>Nanning</addr-line>
,
<country>China</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Tao Zeng, Shanghai Institutes for Biological Sciences (CAS), China</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Wenting Liu, Genome Institute of Singapore, Singapore; Zhenhua Li, National University of Singapore, Singapore</p>
</fn>
<corresp id="fn001">*Correspondence: Qiang Tong,
<email xlink:href="mailto:tttrrrxxx@163.com" xlink:type="simple">tttrrrxxx@163.com</email>
; Tao Li,
<email xlink:href="mailto:317371983@qq.com" xlink:type="simple">317371983@qq.com</email>
; Liang Zhao,
<email xlink:href="mailto:s080011@e.ntu.edu.sg" xlink:type="simple">s080011@e.ntu.edu.sg</email>
</corresp>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics</p>
</fn>
<fn fn-type="other" id="fn003">
<p>†These authors share first authorship</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>8</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>670</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>12</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>27</day>
<month>6</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2019 Jiang, Hu, Wang, Zhang, Zhu, Bai, Tong, Li and Zhao</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Jiang, Hu, Wang, Zhang, Zhu, Bai, Tong, Li and Zhao</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</license-p>
</license>
</permissions>
<abstract>
<p>Ventricular septal defect (VSD) is a fatal congenital heart disease with severe consequences for affected infants. Early diagnosis, particularly through genetic variants, plays an important role. Existing panel-based approaches to variant mining suffer from the difficulty of assembling large panels, costly sequencing, and missed rare variants. Although a trio-based method alleviates these limitations to some extent, it is agnostic to novel mutations and computationally intensive. Considering these limitations, we propose a novel variant mining algorithm for trio-based sequencing data and apply it to a VSD trio to identify associated mutations. Our approach starts by filtering out irrelevant
<italic>k</italic>
-mers from the sequences of a trio
<italic>via</italic>
a newly conceived coupled Bloom Filter, then corrects sequencing errors using a statistical approach, and extends the kept
<italic>k</italic>
-mers into long sequences. These extended sequences are used as input for variant calling. The obtained variants are then comprehensively analyzed against existing databases to mine VSD-related mutations. Experiments show that our trio-based algorithm narrows down candidate coding genes and lncRNAs by about 10-fold and 5-fold, respectively, compared with single sequence-based approaches. Meanwhile, our algorithm is 10 times faster and uses two orders of magnitude less memory than the existing state-of-the-art approach. By applying our approach to a VSD trio, we identify a previously unreported gene, CD80; a combination of two genes, MYBPC3 and TRDN; and a lncRNA, NONHSAT096266.2, all of which are highly likely to be VSD-related.</p>
</abstract>
<kwd-group>
<kwd>trio-sequencing</kwd>
<kwd>k-mer filtering</kwd>
<kwd>variant calling</kwd>
<kwd>ventricular septal defect</kwd>
<kwd>association study</kwd>
<kwd>long non-coding RNA</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source id="cn001">National Natural Science Foundation of China
<named-content content-type="fundref-id">10.13039/501100001809</named-content>
</funding-source>
</award-group>
</funding-group>
<counts>
<fig-count count="3"></fig-count>
<table-count count="6"></table-count>
<equation-count count="1"></equation-count>
<ref-count count="39"></ref-count>
<page-count count="11"></page-count>
<word-count count="5575"></word-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>Introduction</title>
<p>Ventricular septal defect (VSD) is a major type of congenital heart disease (CHD), accounting for about 20% of all CHD cases (
<xref rid="B28" ref-type="bibr">Spicer et al., 2014</xref>
). With conservative treatment, mortality is around 90% to 95%, whereas with surgery it falls to 19% to 60% (
<xref rid="B26" ref-type="bibr">Serpytis et al., 2015</xref>
). VSD is often diagnosed at a late stage because infants cannot communicate their symptoms; this creates a need for early diagnosis, particularly through genetic variants.</p>
<p>Mining genetic variants and associating them with diseases is an active research area, through which thousands of disease-associated variants have been identified (
<xref rid="B10" ref-type="bibr">International HapMap Consortium et al., 2010</xref>
;
<xref rid="B31" ref-type="bibr">The 1000 Genomes Project Consortium et al., 2015</xref>
). Such findings are usually obtained by first assembling a panel of hundreds to thousands of patients diagnosed with the same disease, whose genetic material is then extracted and sequenced; disease-associated variants are subsequently mined through a series of analytic procedures. Using this protocol, 89,251 single-nucleotide polymorphism (SNP)-trait associations have been pinpointed according to the genome-wide association study (GWAS) Catalog (
<xref rid="B20" ref-type="bibr">MacArthur et al., 2017</xref>
), including more than 400 CHD-related genes (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
). Although association studies are fruitful and promising, several issues limit their applicability. First, panel-based association studies identify only common variants; rare variants are overlooked because they rarely reach statistical significance. Large numbers of samples, i.e., hundreds or even thousands of cases, must therefore be collected. Second, almost all existing studies mine a one-to-one correspondence between genes and diseases rather than a many-to-many one, which is considerably more challenging. Unfortunately, the majority of diseases are caused by mutations in many genes. For instance, more than 400 genes have been found to be associated with CHD (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
), more than 700 genes are involved in adult height (
<xref rid="B36" ref-type="bibr">Wood et al., 2014</xref>
), and even more have since been reported (
<xref rid="B22" ref-type="bibr">Marouli et al., 2017</xref>
). Third, obtaining the whole DNA (deoxyribonucleic acid) sequence of a sample is costly. Although ever-increasing throughput and decreasing cost have made whole-genome sequencing possible for general research, a single genome still costs a few hundred to a thousand dollars. Trio-based sequencing has emerged to partially overcome these limitations of single sequencing (SS) data.</p>
<p>A trio typically consists of two parents and one child. The trio-based approach is effective for identifying disease-associated genes based on the basic rules of inheritance, and it can pinpoint
<italic>de novo</italic>
mutations without a large panel. Various studies have used trio-sequencing (TS) to identify disease-associated genes. For instance, trio-based exome sequencing has been used to identify
<italic>de novo</italic>
mutations in early-onset high myopia (
<xref rid="B13" ref-type="bibr">Jin et al., 2017b</xref>
), and ∼440 CHD-related genes have been discovered based on 2,645 trios (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
). The typical procedure for identifying variants from trios is mapping-calling-filtering, i.e., mapping all sequences of each individual in a trio to a reference genome, calling variants from the mapped sequences, and filtering out variants shared by members of the trio. This protocol is inefficient for identifying
<italic>de novo</italic>
mutations in the child: a large portion of the sequences contributes nothing to variant calling, yet these sequences are processed throughout the whole pipeline for every sample in the trio. To solve this problem, we propose a novel approach for calling
<italic>de novo</italic>
variants from a trio and apply it to identify VSD-related genetic variants, including coding genes and long non-coding RNAs (lncRNAs).</p>
<p>Our approach starts from a trio consisting of a child diagnosed with VSD and healthy parents. Next,
<italic>k</italic>
-mers (
<italic>k</italic>
-length consecutive bases from a genomic sequence) unique to the child are fished out through a newly proposed counted
<italic>k</italic>
-mer-encoding algorithm. This is followed by sequencing error correction and
<italic>k</italic>
-mer extension before mapping to a reference genome. Finally, variants are called and analyzed against existing databases to mine VSD-related coding genes and lncRNAs.</p>
</sec>
<sec id="s2">
<title>Methods</title>
<p>Our approach is composed of two major parts: TS-based variant mining and VSD-related variant filtering.</p>
<sec id="s2_1">
<title>Variant Mining</title>
<p>Unlike the conventional mapping-calling-filtering approach to variant identification, e.g., SAMtools (
<xref rid="B18" ref-type="bibr">Li et al., 2009</xref>
) and GATK (
<xref rid="B6" ref-type="bibr">DePristo et al., 2011</xref>
), we conceive a computationally efficient algorithm for identifying
<italic>de novo</italic>
variants from a trio. Our approach contains three steps:
<italic>k</italic>
-mer filtering (including error correction),
<italic>k</italic>
-mer extension, and variant identification. Details are shown below.</p>
<sec id="s2_1_1">
<title>
<italic>k</italic>
-mer Filtering</title>
<p>Let a trio be
<italic>R
<sub>f</sub>
</italic>
,
<italic>R
<sub>m</sub>
</italic>
,
<italic>R
<sub>c</sub>
</italic>
, representing the reads of the father (
<italic>R
<sub>f</sub>
</italic>
), the reads of the mother (
<italic>R
<sub>m</sub>
</italic>
), and the reads of the child (
<italic>R
<sub>c</sub>
</italic>
, suppose only one child is available); the set of
<italic>k</italic>
-mers contained in a sample is
<italic>K
<sub>f</sub>
</italic>
,
<italic>K
<sub>m</sub>
</italic>
, and
<italic>K
<sub>c</sub>
</italic>
for father, mother, and child, respectively. Here, each
<italic>k</italic>
-mer carries its count (the number of times it appears in the sequenced data), i.e., a
<italic>k</italic>
-mer, say κ, is a tuple (
<italic>s</italic>
<sub>κ</sub>
,
<italic>f</italic>
<sub>κ</sub>
), where
<italic>s</italic>
<sub>κ</sub>
is the
<italic>k</italic>
-length string of κ, and
<italic>f</italic>
<sub>κ</sub>
is its count. To fish out
<italic>de novo</italic>
mutations from
<italic>K
<sub>c</sub>
</italic>
, we go through all the
<italic>k</italic>
-mers of
<italic>K
<sub>c</sub>
</italic>
and check them against
<italic>K
<sub>f</sub>
</italic>
and
<italic>K
<sub>m</sub>
</italic>
. If the count ratio of a
<italic>k</italic>
-mer between each parent and the child is less than a threshold (τ
<sub>0</sub>
), the
<italic>k</italic>
-mer is kept as a variant-containing candidate.</p>
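<p>As an illustration of the ratio test only, the following Python sketch applies it to plain dictionaries that map each <italic>k</italic>-mer string to its count, standing in for the coupled Bloom Filters described next. The helper name filter_child_kmers is hypothetical, a <italic>k</italic>-mer missing from a parent is assumed to have count 0, and τ<sub>0</sub> is left to the caller because its value is not fixed here.</p>

```python
from typing import Dict


def filter_child_kmers(
    child: Dict[str, int],
    father: Dict[str, int],
    mother: Dict[str, int],
    tau0: float,
) -> Dict[str, int]:
    """Keep child k-mers whose father/child and mother/child count ratios
    are both below tau0 (hypothetical helper; dict lookups stand in for
    the coupled Bloom Filter)."""
    kept = {}
    for kmer, child_count in child.items():
        ratio_f = father.get(kmer, 0) / child_count
        ratio_m = mother.get(kmer, 0) / child_count
        # Keep the k-mer only when BOTH ratios fall below tau0.
        if tau0 > ratio_f and tau0 > ratio_m:
            kept[kmer] = child_count
    return kept


if __name__ == "__main__":
    child = {"AAA": 10, "CCC": 10, "GGG": 8}
    father = {"AAA": 9, "GGG": 1}
    mother = {"AAA": 11, "CCC": 12}
    print(filter_child_kmers(child, father, mother, tau0=0.2))  # {'GGG': 8}
```

<p>In the toy data, GGG is kept because it is almost absent from both parents, whereas CCC is rejected: it is missing from the father but abundant in the mother, so only one of the two ratios falls below τ<sub>0</sub>.</p>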
<p>Filtering out the large number of
<italic>k</italic>
-mers shared between
<italic>K
<sub>f</sub>
</italic>
/
<italic>K
<sub>m</sub>
</italic>
and
<italic>K
<sub>c</sub>
</italic>
seems trivial. However, the
<italic>k</italic>
-mers obtained from whole human genome sequencing reads are usually too numerous to fit into main memory, let alone the
<italic>k</italic>
-mers of all three samples together. For instance, the 31-mers with a count larger than one from the HapMap sample NA12878 (
<uri xlink:type="simple" xlink:href="https://www.ncbi.nlm.nih.gov/sra/ERR091571/">https://www.ncbi.nlm.nih.gov/sra/ERR091571/</uri>
) occupy 90 Gb of disk space. To solve this problem, we have designed a novel coupled Bloom Filter-based algorithm that achieves a high memory-saving ratio and good retrieval efficiency (
<xref rid="B11" ref-type="bibr">Jiang et al., 2019</xref>
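). Let
<italic>K</italic>
denote the
<italic>k</italic>
-mer set of any one sample, and let
<italic>f</italic>
<sub>max</sub>
be the maximum count in
<italic>K</italic>
, which can be represented in binary by
<italic>h</italic>
bits, i.e.,
<italic>h</italic>
= ⌈log
<sub>2</sub>
(
<italic>f</italic>
<sub>max</sub>
+ 1)⌉. We take the following steps to represent
<italic>K</italic>
:</p>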
<list list-type="order">
<list-item>
<p>Create a hash function vector
<italic>H</italic>
containing
<italic>h</italic>
hash functions, say 〈
<italic>H</italic>
<sub>0</sub>
(·),
<italic>H</italic>
<sub>1</sub>
(·), ⋯,
<italic>H</italic>
<sub>
<italic>h</italic>
−1</sub>
(·)〉.</p>
</list-item>
<list-item>
<p>Allocate a coupled Bloom Filter
<italic>B</italic>
= (
<italic>B</italic>
<sup>+</sup>
,
<italic>B</italic>
<sup>−</sup>
) having
<italic>m</italic>
bits.
<italic>m</italic>
is computed as
<inline-formula>
<mml:math id="M1">
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
</mml:math>
</inline-formula>
with the target false-positive rate
<italic>p</italic>
and the number of
<italic>k</italic>
-mers
<italic>n</italic>
= |
<italic>K</italic>
|;
<italic>cf.</italic>
<xref rid="B4" ref-type="bibr">Bloom (1970)</xref>
.</p>
</list-item>
<list-item>
<p>For a
<italic>k</italic>
-mer κ in
<italic>K</italic>
, set the corresponding bits of
<italic>B</italic>
indexed by
<italic>H</italic>
as
<disp-formula>
<mml:math id="M2">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>B</mml:mi>
<mml:mo>+</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>B</mml:mi>
<mml:mo>−</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>H</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mtext>Binary</mml:mtext>
<mml:mtext> </mml:mtext>
<mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>f</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>h</mml:mi>
</mml:msup>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>]</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>}</mml:mo>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>where Binary(
<italic>f</italic>
<sub>κ</sub>
)
<italic>
<sup>h</sup>
</italic>
is the
<italic>h</italic>
-bit binary representation of
<italic>f</italic>
<sub>κ</sub>
, and Binary(
<italic>f</italic>
<sub>κ</sub>
)
<italic>
<sup>h</sup>
</italic>
[
<italic>i</italic>
] returns the value of its
<italic>i</italic>
th bit.</p>
</list-item>
<list-item>
<p>Repeat Step 3 above until all
<italic>k</italic>
-mers are inserted.</p>
</list-item>
</list>
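<p>The four steps above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the implementation of <xref rid="B11" ref-type="bibr">Jiang et al. (2019)</xref>: the hash functions (SHA-256 of the string "i:kmer" reduced modulo <italic>m</italic>), the least-significant-bit-first ordering of Binary(<italic>f</italic><sub>κ</sub>)<italic><sup>h</sup></italic>, and the decoding rule are all assumptions made for the example.</p>

```python
import hashlib
import math


def bloom_size(n: int, p: float) -> int:
    """Number of bits m = -n ln(p) / (ln 2)^2 for n items and false-positive rate p."""
    return math.ceil(-n * math.log(p) / math.log(2) ** 2)


def count_bits(f_max: int) -> int:
    """h: number of bits needed to write the largest count f_max in binary."""
    return max(1, f_max.bit_length())


class CoupledBloomFilter:
    """Sketch of B = (B+, B-): B+ marks presence, B- holds one count bit per hash.

    Assumptions (not taken from the paper): H_i hashes "i:kmer" with SHA-256
    modulo m, and bit i is the i-th least significant bit of the count.
    """

    def __init__(self, m: int, h: int) -> None:
        self.m = m
        self.h = h
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.b_plus = bytearray(m)
        self.b_minus = bytearray(m)

    def _index(self, i: int, kmer: str) -> int:
        digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, kmer: str, count: int) -> None:
        """Step 3: B+[H_i(s)] = 1 and B-[H_i(s)] = bit i of the count."""
        for i in range(self.h):
            j = self._index(i, kmer)
            self.b_plus[j] = 1
            self.b_minus[j] = (count >> i) % 2

    def decode(self, kmer: str) -> int:
        """Return the stored count, or 0 when any B+ bit is unset (k-mer absent)."""
        count = 0
        for i in range(self.h):
            j = self._index(i, kmer)
            if not self.b_plus[j]:
                return 0
            count += self.b_minus[j] * 2 ** i
        return count


if __name__ == "__main__":
    counts = {"ACGTA": 13, "TTTTT": 5}
    h = count_bits(max(counts.values()))  # 4 bits suffice for 13
    bf = CoupledBloomFilter(m=2 ** 20, h=h)  # oversized so collisions are unlikely
    for kmer, f in counts.items():
        bf.insert(kmer, f)
    print(bf.decode("ACGTA"), bf.decode("TTTTT"), bf.decode("GGGGG"))
```

<p>Decoding returns 0 as soon as one <italic>B</italic><sup>+</sup> bit is unset. A false positive, or a collision that overwrites a <italic>B</italic><sup>−</sup> bit shared with another <italic>k</italic>-mer, can still yield a wrong count, which is why <italic>m</italic> is sized from the target false-positive rate <italic>p</italic>.</p>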
<p>Following these steps,
<italic>K
<sub>f</sub>
</italic>
,
<italic>K
<sub>m</sub>
</italic>
, and
<italic>K
<sub>c</sub>
</italic>
can be stored compactly in
<italic>B
<sub>f</sub>
</italic>
,
<italic>B
<sub>m</sub>
</italic>
, and
<italic>B
<sub>c</sub>
</italic>
, respectively; more details are given in
<xref rid="B11" ref-type="bibr">Jiang et al. (2019)</xref>
.</p>
<p>With this encoding, we are able to hold
<italic>K
<sub>f</sub>
</italic>
,
<italic>K
<sub>m</sub>
</italic>
, and
<italic>K
<sub>c</sub>
</italic>
in memory simultaneously and compute the count ratio of a
<italic>k</italic>
-mer between a parent and the child efficiently. Note that retrieving a
<italic>k</italic>
-mer from a coupled Bloom Filter is fast because it requires only hash operations, each taking O(1) time.</p>
<p>Owing to sequencing errors and bias,
<italic>k</italic>
-mers are error-prone. To mitigate the impact of errors on variant identification, we perform error correction before further analysis (
<xref rid="B38" ref-type="bibr">Zhao et al., 2018</xref>
). For a
<italic>k</italic>
-mer κ in
<italic>K
<sub>x</sub>
</italic>
, we search for its neighbors
<italic>N</italic>
<sub>κ</sub>
in
<italic>B
<sub>x</sub>
</italic>
, where
<italic>x</italic>
∈{
<italic>f</italic>
,
<italic>m</italic>
,
<italic>c</italic>
}. A neighbor of κ is a
<italic>k</italic>
-mer at edit distance 1 from κ. Then, a
<italic>z</italic>
-score
<italic>z</italic>
<sub>κ</sub>
is calculated from
<inline-formula>
<mml:math id="M3">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>κ</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msub>
<mml:mo>∪</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>κ</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
, where
<italic>z</italic>
<sub>κ</sub>
= (
<italic>f</italic>
<sub>κ</sub>
− μ)/σ, μ is the mean count of the
<italic>k</italic>
-mers in
<inline-formula>
<mml:math id="M4">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>N</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>κ</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, and σ is the standard deviation of their counts. We consider κ to be error-free when
<italic>z</italic>
<sub>κ</sub>
>
<italic>z</italic>
<sub>0</sub>
and
<italic>f</italic>
<sub>κ</sub>
>
<italic>f</italic>
<sub>0</sub>
. In this study,
<italic>z</italic>
<sub>0</sub>
= 0.8 and
<italic>f</italic>
<sub>0</sub>
= 4. More details are presented in
<xref rid="B38" ref-type="bibr">Zhao et al. (2018)</xref>
.</p>
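<p>A minimal Python sketch of this error test, under stated assumptions: counts are read from a dictionary rather than from <italic>B<sub>x</sub></italic>, only single-substitution neighbors are enumerated (a subset of the edit-distance-1 neighbors defined above), σ is taken as the population standard deviation, and a <italic>k</italic>-mer without any neighbor, for which the <italic>z</italic>-score is undefined, falls back to the count test alone. The function names are hypothetical.</p>

```python
import statistics
from typing import Dict, Iterator

BASES = "ACGT"


def substitution_neighbours(kmer: str) -> Iterator[str]:
    """Yield every k-mer that differs from kmer by exactly one substitution."""
    for pos, base in enumerate(kmer):
        for alt in BASES:
            if alt != base:
                yield kmer[:pos] + alt + kmer[pos + 1:]


def is_error_free(kmer: str, counts: Dict[str, int],
                  z0: float = 0.8, f0: int = 4) -> bool:
    """z-score test on N'_k (neighbours plus the k-mer itself); z0, f0 as in the text."""
    f = counts.get(kmer, 0)
    group = [f] + [counts[n] for n in substitution_neighbours(kmer) if n in counts]
    sigma = statistics.pstdev(group)  # population SD: an assumption
    if sigma == 0:
        # No neighbour present (or all counts equal): z is undefined,
        # so this sketch falls back to the count threshold alone.
        return f > f0
    z = (f - statistics.fmean(group)) / sigma
    return z > z0 and f > f0


if __name__ == "__main__":
    counts = {"AAAA": 20, "AAAC": 1, "GGGG": 10}
    for kmer in counts:
        print(kmer, is_error_free(kmer, counts))
```

<p>With the toy counts, AAAA (count 20) is kept and its low-count neighbor AAAC (count 1) is flagged as a likely error, since the two receive <italic>z</italic>-scores of 1.0 and −1.0, respectively.</p>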
<table-wrap id="A1" position="float">
<label>Algorithm 1</label>
<caption>
<p>
<italic>k</italic>
-mer filtering.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" rowspan="1" colspan="1">
<bold>Data</bold>
: (
<italic>K
<sub>f</sub>
</italic>
,
<italic>K
<sub>m</sub>
</italic>
,
<italic>K
<sub>c</sub>
</italic>
),
<italic>k</italic>
-mers of a trio;
<italic>H</italic>
, a hash function vector
<break></break>
<bold>Result</bold>
:
<inline-formula>
<mml:math id="M5">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, mutation-containing
<italic>k</italic>
-mers of a child
<break></break>
<bold>begin</bold>
<break></break>
<bold>for</bold>
<italic>x</italic>
in {
<italic>f</italic>
,
<italic>m</italic>
,
<italic>c</italic>
}
<bold>do</bold>
<break></break>
  
<italic>B
<sub>x</sub>
</italic>
= Encoding(
<italic>K
<sub>x</sub>
</italic>
,
<italic>H</italic>
)
<break></break>
<bold>for</bold>
κ in
<italic>K
<sub>c</sub>
</italic>
<bold>do</bold>
<break></break>
  
<italic>ν
<sub>f</sub>
</italic>
← Decoding(
<italic>B
<sub>f</sub>
</italic>
,
<italic>H</italic>
,
<italic>s</italic>
<sub>κ</sub>
)
<break></break>
  
<italic>ν
<sub>m</sub>
</italic>
← Decoding(
<italic>B
<sub>m</sub>
</italic>
,
<italic>H</italic>
,
<italic>s</italic>
<sub>κ</sub>
)
<break></break>
  
<italic>ν
<sub>c</sub>
</italic>
← Decoding(
<italic>B
<sub>c</sub>
</italic>
,
<italic>H</italic>
,
<italic>s</italic>
<sub>κ</sub>
)
<break></break>
  
<bold>if</bold>
<italic>ν
<sub>f</sub>
</italic>
/
<italic>ν
<sub>c</sub>
</italic>
< τ
<sub>0</sub>
<bold>and</bold>
<italic>ν
<sub>m</sub>
</italic>
/
<italic>ν
<sub>c</sub>
</italic>
< τ
<sub>0</sub>
<bold>then</bold>
<break></break>
<inline-formula>
<mml:math id="M6">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
← Correction(
<italic>B
<sub>c</sub>
</italic>
,
<italic>H</italic>
,
<inline-formula>
<mml:math id="M7">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
)
<break></break>
<bold>return</bold>
<inline-formula>
<mml:math id="M8">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<break></break>
//Details of
<italic>Encoding</italic>
,
<italic>Decoding</italic>
and
<italic>Correction</italic>
are shown in Appendix A.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_1_2">
<title>
<italic>k</italic>
-mer Extension</title>
<p>A
<italic>k</italic>
-mer is usually not long enough to uniquely map to a specific location of a reference genome. Hence, extending a
<italic>k</italic>
-mer into a longer sequence is necessary before mapping. To this end, we take a candidate variant-containing
<italic>k</italic>
-mer as a seed and elongate it on both sides. For right-hand extension, one base at a time is appended to the right of the current string
<italic>s</italic>
, i.e.,
<italic>s</italic>
′ =
<italic>s</italic>
·
<italic>x</italic>
,
<italic>x</italic>
∈{
<italic>A</italic>
,
<italic>C</italic>
,
<italic>G</italic>
,
<italic>T</italic>
}, and the
<italic>k</italic>
-length suffix of
<italic>s</italic>
′, i.e.,
<inline-formula>
<mml:math id="M9">
<mml:mrow>
<mml:mtext>suffix</mml:mtext>
<mml:msub>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mi>k</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>:</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, where
<italic>l</italic>
= |
<italic>s</italic>
′|, is checked against
<italic>B
<sub>c</sub>
</italic>
. If the suffix is absent, another base is tried instead; if all four alternatives fail, the extension terminates. Left-hand extension works in the same way but in the opposite direction. An extension also terminates when the length limit is reached or when more than one base yields a valid suffix, i.e., at a branch. We set the length limit to 1,000 in this study. Extension details are shown in Algorithm 2.</p>
<table-wrap id="A2" position="float">
<label>Algorithm 2</label>
<caption>
<p>
<italic>k</italic>
-mer extension.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" rowspan="1" colspan="1">
<bold>Data</bold>
:
<italic>B
<sub>c</sub>
</italic>
, child
<italic>k</italic>
-mers;
<italic>H</italic>
, a hash function vector;
<inline-formula>
<mml:math id="M10">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
, kept
<italic>k</italic>
-mers; maxLen: maximum length
<break></break>
<bold>Result</bold>
:
<italic>S</italic>
, set of variant-containing sequences
<break></break>
<bold>begin</bold>
<break></break>
<bold>for</bold>
κ in
<inline-formula>
<mml:math id="M11">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
<bold>do</bold>
<break></break>
  hasBranch ← 0
<break></break>
  
<italic>s</italic>
′ ←
<italic>s</italic>
<sub>κ</sub>
<break></break>
  
<bold>repeat</bold>
<break></break>
   
<italic>c</italic>
← 0,
<italic>e</italic>
← ‘‘
<break></break>
   
<bold>for</bold>
<italic>x in</italic>
{
<italic>A, C, G, T</italic>
}
<bold>do</bold>
<break></break>
    
<italic>s</italic>
″ ← suffix(
<italic>s</italic>
′,
<italic>k</italic>
− 1) ·
<italic>x</italic>
<break></break>
    val ← Decoding(
<italic>B
<sub>c</sub>
</italic>
,
<italic>H</italic>
,
<italic>s</italic>
″)
<break></break>
    
<bold>if</bold>
val >
<italic>0</italic>
<bold>then</bold>
<break></break>
     
<italic>c</italic>
←
<italic>c</italic>
+ 1,
<italic>e</italic>
←
<italic>x</italic>
<break></break>
   
<bold>if</bold>
<italic>c</italic>
≠ 1
<bold>then</bold>
<break></break>
    hasBranch ← 1
<break></break>
   
<bold>else</bold>
<break></break>
    
<italic>s</italic>
′ ←
<italic>s</italic>
′ ·
<italic>e</italic>
<break></break>
  
<bold>until</bold>
<italic>hasBranch</italic>
<bold>or</bold>
|
<italic>s</italic>
′| >
<italic>maxLen</italic>
<break></break>
  hasBranch ← 0
<break></break>
  
<bold>repeat</bold>
<break></break>
   
<italic>c</italic>
← 0,
<italic>e</italic>
← ‘‘
<break></break>
   
<bold>for</bold>
<italic>x in</italic>
{
<italic>A, C, G, T</italic>
}
<bold>do</bold>
<break></break>
    
<italic>s</italic>
″ ←
<italic>x</italic>
· prefix(
<italic>s</italic>
′,
<italic>k</italic>
− 1)
<break></break>
    val ← Decoding(
<italic>B
<sub>c</sub>
</italic>
,
<italic>H</italic>
,
<italic>s</italic>
″)
<break></break>
    
<bold>if</bold>
val >
<italic>0</italic>
<bold>then</bold>
<break></break>
     
<italic>c</italic>
←
<italic>c</italic>
+ 1,
<italic>e</italic>
←
<italic>x</italic>
<break></break>
   
<bold>if</bold>
<italic>c</italic>
≠ 1
<bold>then</bold>
<break></break>
    hasBranch ← 1
<break></break>
   
<bold>else</bold>
<break></break>
    
<italic>s</italic>
′ ←
<italic>e</italic>
·
<italic>s</italic>
′
<break></break>
  
<bold>until</bold>
<italic>hasBranch</italic>
<bold>or</bold>
|
<italic>s</italic>
′| > maxLen
<break></break>
  
<inline-formula>
<mml:math id="M12">
<mml:mrow>
<mml:mtext>S</mml:mtext>
<mml:mo>←</mml:mo>
<mml:mtext>S</mml:mtext>
<mml:mo>∪</mml:mo>
<mml:mo>{</mml:mo>
<mml:msup>
<mml:mi>s</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
<break></break>
<bold>return</bold>
<italic>S</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
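<p>Algorithm 2 can be sketched in Python as below, assuming a set of child <italic>k</italic>-mers in place of <italic>B<sub>c</sub></italic> and Decoding. Following the text, the sketch stops a direction both at a branch (more than one base extends) and at a dead end (no base extends); it also stops once the sequence reaches maxLen rather than after exceeding it. The names extend_seed and _unique_base are hypothetical.</p>

```python
from typing import List, Optional, Set, Tuple

BASES = "ACGT"


def _unique_base(candidates: List[Tuple[str, str]], kmers: Set[str]) -> Optional[str]:
    """Return the base whose candidate k-mer is present, if exactly one is."""
    hits = [base for base, kmer in candidates if kmer in kmers]
    return hits[0] if len(hits) == 1 else None


def extend_seed(seed: str, kmers: Set[str], max_len: int = 1000) -> str:
    """Extend a k-mer seed right, then left, until a branch, a dead end,
    or max_len is reached (k = len(seed), assumed to be at least 2)."""
    k = len(seed)
    s = seed
    # Right-hand extension: try suffix(s, k - 1) + x for each base x.
    while max_len > len(s):
        x = _unique_base([(b, s[len(s) - k + 1:] + b) for b in BASES], kmers)
        if x is None:  # zero or several candidates: stop this direction
            break
        s = s + x
    # Left-hand extension: try x + prefix(s, k - 1) for each base x.
    while max_len > len(s):
        x = _unique_base([(b, b + s[:k - 1]) for b in BASES], kmers)
        if x is None:
            break
        s = x + s
    return s


if __name__ == "__main__":
    kmers = {"ACGT", "CGTT", "GTTG", "TTGC", "TGCA"}
    print(extend_seed("GTTG", kmers))
    print(extend_seed("GTTG", kmers | {"TTGA"}))
```

<p>For example, with the 4-mers ACGT, CGTT, GTTG, TTGC, and TGCA, the seed GTTG is extended to ACGTTGCA; adding TTGA creates a branch after GTTG, so the right-hand extension stops there and the result is ACGTTG.</p>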
</sec>
<sec id="s2_1_3">
<title>Variant Identification</title>
<p>All extended
<italic>k</italic>
-mers are mapped to GRCh38/hg38 by BWA (
<xref rid="B17" ref-type="bibr">Li and Durbin, 2009</xref>
), and variants as well as their positions are pinpointed using SAMtools (
<xref rid="B18" ref-type="bibr">Li et al., 2009</xref>
) and the GATK best practices (
<xref rid="B6" ref-type="bibr">DePristo et al., 2011</xref>
). Unlike most existing approaches, which use read coverage to filter out low-confidence variants, we use the previously identified variant-containing
<italic>k</italic>
-mers (from the first step, with counts included) to refine the obtained variants. More precisely, a variant is kept if it satisfies both of the following criteria: 1) a
<italic>k</italic>
-mer containing the variant (formed jointly by the reference genome and the variant) can be found in the set of kept
<italic>k</italic>
-mers obtained in the first step; and 2) the summed count of all
<italic>k</italic>
-mers covering the variant is at least 3. Note that the first criterion is necessary because extended
<italic>k</italic>
-mers may introduce additional variants that are not unique to the child.</p>
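<p>The two criteria can be illustrated for a single-nucleotide variant with the Python sketch below. It is a hypothetical helper: it assumes a reference window with a 0-based variant position, handles substitutions only (indels would change the length of the haplotype), and sums counts over the kept <italic>k</italic>-mers that cover the variant, which is one reading of criterion 2.</p>

```python
from typing import Dict


def supports_snv(ref: str, pos: int, alt: str, kept: Dict[str, int],
                 k: int, min_total: int = 3) -> bool:
    """Check the two k-mer criteria for a single-nucleotide variant.

    ref is a reference window around the variant, pos is 0-based, and
    kept maps variant-containing k-mers from the first step to counts.
    """
    hap = ref[:pos] + alt + ref[pos + 1:]  # reference with the variant applied
    first = max(0, pos - k + 1)
    last = min(pos, len(hap) - k)
    covering = [hap[i:i + k] for i in range(first, last + 1)]
    counts = [kept[w] for w in covering if w in kept]
    # Criterion 1: at least one covering k-mer was kept.
    # Criterion 2: their summed count is at least min_total (3 in the text).
    return len(counts) > 0 and sum(counts) >= min_total


if __name__ == "__main__":
    kept = {"GTTC": 2, "TCGT": 1}
    print(supports_snv("ACGTACGTAC", 4, "T", kept, k=4))
```

<p>Here the variant A&gt;T at position 4 is covered by the 4-mers CGTT, GTTC, TTCG, and TCGT; two of them were kept, with a summed count of 3, so the variant passes both criteria.</p>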
</sec>
</sec>
<sec id="s2_2">
<title>Variant Filtering</title>
<p>We focus on VSD-related variants; thus, the variants obtained in the previous step are filtered to fish out those related to VSD. Two types of variants are considered, viz., those in coding regions and those in non-coding regions. Among affected non-coding genes, we pay special attention to long non-coding RNAs.</p>
<sec id="s2_2_1">
<title>Identifying VSD-Related Variants</title>
<p>Only a tiny portion of the variants obtained from GATK could be VSD-related. To obtain these variants, we first filter out low-quality calls using GATK built-in filters with the following thresholds: “QD < 2.0,” “QUAL < 30.0,” “SOR > 3.0,” “FS > 60.0,” “MQ < 40.0,” “MQRankSum < -12.5,” and “ReadPosRankSum < -8.0” for SNPs, and “QD < 2.0,” “QUAL < 30.0,” “FS > 200.0,” and “ReadPosRankSum < -20.0” for indels (insertions and deletions). Next, ANNOVAR (
<xref rid="B34" ref-type="bibr">Wang et al., 2010</xref>
) is used to filter out variants present in known populations with a minor allele frequency (MAF) of at least 0.01. The reference databases used in this stage are phase 3 of the 1000 Genomes Project (
<xref rid="B31" ref-type="bibr">The 1000 Genomes Project Consortium et al., 2015</xref>
), ExAC (
<xref rid="B16" ref-type="bibr">Lek et al., 2016</xref>
), ESP (
<xref rid="B7" ref-type="bibr">Exome Variant Server, 2019</xref>
), and gnomAD (
<xref rid="B16" ref-type="bibr">Lek et al., 2016</xref>
). That is, a variant that appears in any of these databases with an MAF no less than 0.01 is excluded.</p>
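<p>The thresholds above can be written down as data, as in the following Python sketch. It does not call GATK or ANNOVAR; it assumes each variant's annotations are available as a dictionary keyed by the names used above (QD, QUAL, and so on) and its population MAFs as a dictionary keyed by database name. Treating a missing annotation or MAF as passing is an assumption of the sketch, and the function names are hypothetical.</p>

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

Rule = Tuple[str, Callable[[float, float], bool], float]

# A record fails when compare(value, threshold) is true, e.g. QD below 2.0.
SNP_RULES: List[Rule] = [
    ("QD", operator.lt, 2.0),
    ("QUAL", operator.lt, 30.0),
    ("SOR", operator.gt, 3.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_RULES: List[Rule] = [
    ("QD", operator.lt, 2.0),
    ("QUAL", operator.lt, 30.0),
    ("FS", operator.gt, 200.0),
    ("ReadPosRankSum", operator.lt, -20.0),
]


def passes_hard_filters(info: Dict[str, float], is_indel: bool) -> bool:
    """True unless some annotation trips its threshold; missing ones are skipped."""
    rules = INDEL_RULES if is_indel else SNP_RULES
    for key, compare, threshold in rules:
        value = info.get(key)
        if value is not None and compare(value, threshold):
            return False
    return True


def passes_maf(db_mafs: Dict[str, Optional[float]], cutoff: float = 0.01) -> bool:
    """Exclude a variant whose MAF is at least cutoff in any reference database."""
    return all(maf is None or cutoff > maf for maf in db_mafs.values())
```

<p>Keeping the SNP and indel rules as separate tables makes the differing FS and ReadPosRankSum cut-offs explicit: an indel with FS = 100 passes, whereas a SNP with the same value fails.</p>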
<p>After filtering, we use DAVID (
<xref rid="B9" ref-type="bibr">Huang et al., 2009</xref>
) to analyze the functions of the genes affected by the remaining variants. These variants are also cross-checked against Gene Ontology (GO) (
<xref rid="B32" ref-type="bibr">The Gene Ontology Consortium, 2017</xref>
), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (
<xref rid="B15" ref-type="bibr">Kanehisa et al., 2018</xref>
), Online Mendelian Inheritance in Man (OMIM) (
<xref rid="B8" ref-type="bibr">Hamosh et al., 2005</xref>
), and the Human Gene Mutation Database (HGMD) (
<xref rid="B29" ref-type="bibr">Stenson et al., 2017</xref>
). Functions and pathways of coding genes can be readily obtained with DAVID, whereas those of non-coding transcripts cannot be obtained directly. Hence, we handle them separately; see below.</p>
</sec>
<sec id="s2_2_2">
<title>Fishing Out Coding Genes</title>
<p>From the results generated by ANNOVAR, we select variants with the consequences “Nonsense_Mutation,” “Frame_Shift_Ins,” “Frame_Shift_Del,” “Translation_Start_Site,” “Splice_Site,” “In_Frame_Ins,” “In_Frame_Del,” and “Missense_Mutation.” Variants with a SIFT score (
<xref rid="B27" ref-type="bibr">Sim et al., 2012</xref>
) larger than 0.05 and a PolyPhen-2 score (
<xref rid="B1" ref-type="bibr">Adzhubei et al., 2010</xref>
) smaller than 0.446 are further filtered out. The remaining genes are input into DAVID for gene-disease association analysis, gene-annotation enrichment analysis, pathway mapping, and so on, and genes related to cardiovascular diseases are fished out. Genes related to neurodegenerative diseases are also retained, because many studies have shown that these two disease classes are closely related (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
).</p>
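<p>A Python sketch of the consequence and pathogenicity filter follows. The consequence labels and the SIFT and PolyPhen-2 cut-offs are those listed above; the function name and the decision to keep variants lacking either score (for example, frameshifts, which SIFT and PolyPhen-2 typically do not score) are assumptions.</p>

```python
from typing import Optional

KEPT_CONSEQUENCES = {
    "Nonsense_Mutation", "Frame_Shift_Ins", "Frame_Shift_Del",
    "Translation_Start_Site", "Splice_Site", "In_Frame_Ins",
    "In_Frame_Del", "Missense_Mutation",
}


def keep_coding_variant(consequence: str, sift: Optional[float],
                        polyphen: Optional[float]) -> bool:
    """Keep a listed consequence unless SIFT > 0.05 and PolyPhen-2 is below 0.446."""
    if consequence not in KEPT_CONSEQUENCES:
        return False
    # Both tools predict a benign effect: filter the variant out.
    if sift is not None and polyphen is not None:
        if sift > 0.05 and 0.446 > polyphen:
            return False
    # Missing scores keep the variant: an assumption of this sketch.
    return True
```

<p>Because the two score conditions are combined with "and", a missense variant judged tolerated by SIFT but damaging by PolyPhen-2 is still kept.</p>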
<p>The pinpointed genes are also checked against GO, OMIM, HGMD, and KEGG to verify their functions where available.</p>
</sec>
<sec id="s2_2_3">
<title>Pinpointing Out lncRNAs</title>
<p>A lncRNA is not translated into protein; however, it plays roles in transcriptional regulation, post-transcriptional regulation, epigenetic regulation, aging, and so on (
<xref rid="B21" ref-type="bibr">Marchese et al., 2017</xref>
). Hence, mutations in lncRNAs may affect downstream products. To identify VSD-related variant-containing lncRNAs in the child, we extend the regions of the VSD-related genes (listed in
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
) upstream and downstream by 100, 200, 500, and 1,000 bp. A variant is considered a candidate if it lies within an extended region and overlaps a lncRNA annotated in LNCipedia (
<xref rid="B33" ref-type="bibr">Volders et al., 2018</xref>
) or NONCODE (
<xref rid="B39" ref-type="bibr">Zhao et al., 2016</xref>
). Note that this protocol identifies VSD-related lncRNAs indirectly, by genomic proximity, rather than directly. The rationale is that regulatory elements in close proximity usually act together (
<xref rid="B25" ref-type="bibr">Razin et al., 2013</xref>
;
<xref rid="B2" ref-type="bibr">Andrey and Mundlos, 2017</xref>
).</p>
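<p>The proximity rule can be sketched in Python as below. The interval representation (inclusive coordinates), the inclusion of the gene body itself in the extended region, the function names, and the toy gene and lncRNA coordinates are assumptions for illustration; real intervals would come from the gene list of <xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref> and from LNCipedia or NONCODE.</p>

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def _within(x: int, lo: int, hi: int) -> bool:
    return x >= lo and hi >= x


def is_lncrna_candidate(chrom: str, pos: int, genes: List[Interval],
                        lncrnas: List[Interval], flank: int) -> bool:
    """True if pos lies within flank bp of a VSD-related gene and inside a lncRNA."""
    near_gene = any(
        g.chrom == chrom and _within(pos, g.start - flank, g.end + flank)
        for g in genes
    )
    in_lncrna = any(
        rna.chrom == chrom and _within(pos, rna.start, rna.end)
        for rna in lncrnas
    )
    return near_gene and in_lncrna


if __name__ == "__main__":
    genes = [Interval("chr1", 1000, 2000)]
    lncrnas = [Interval("chr1", 900, 980), Interval("chr1", 2120, 2300)]
    for flank in (100, 200, 500, 1000):
        print(flank, is_lncrna_candidate("chr1", 2150, genes, lncrnas, flank))
```

<p>With these toy intervals, the variant at position 2150 is reported as a candidate for flanks of 200, 500, and 1,000 bp, but not for 100 bp, because it lies 150 bp downstream of the gene.</p>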
<p>The functions of the identified lncRNAs are explored using LNCipedia and NONCODE.</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="s3">
<title>Results</title>
<sec id="s3_1">
<title>Data Preparation</title>
<p>A trio consisting of a 3-year-old boy diagnosed with typical VSD and his healthy parents is collected. The DNA of each individual is extracted from 5 ml of venous blood and sequenced on an Illumina HiSeq X Ten platform at 30× coverage with a read length of 151 bp. All three samples are sequenced in one batch. In total, 356,781,358 paired-end reads are obtained from the child, and 368,280,232 and 330,790,178 paired-end reads from his father and mother, respectively.</p>
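<p>As a sanity check on the stated 30× coverage, the read counts can be converted into a raw sequencing depth. This sketch assumes that each reported paired-end read is a pair of 151-bp mates and that the genome size is about 3.1 Gb, and it counts every sequenced base before duplicate removal and quality filtering; under these assumptions the three samples come out at roughly 32× to 36×.</p>

```python
# Raw depth = total sequenced bases / genome size.
# ASSUMPTIONS: each count is a read pair (2 x 151 bp) and the genome is about 3.1 Gb;
# duplicates and low-quality reads, which are removed later, are still counted here.
GENOME_SIZE_BP = 3.1e9
READ_LENGTH_BP = 151


def raw_coverage(read_pairs: int, read_length: int = READ_LENGTH_BP,
                 genome_size: float = GENOME_SIZE_BP) -> float:
    return read_pairs * 2 * read_length / genome_size


if __name__ == "__main__":
    read_pairs = {"child": 356_781_358, "father": 368_280_232, "mother": 330_790_178}
    for sample, pairs in read_pairs.items():
        print(f"{sample}: {raw_coverage(pairs):.1f}x")
```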
<p>Written informed consent is obtained from the child's parents before sample collection.</p>
</sec>
<sec id="s3_2">
<title>TS-Based Variants</title>
<p>Based on the TS data, we obtained 2,585,348 variants by using GATK (
<xref rid="B6" ref-type="bibr">DePristo et al., 2011</xref>
) with default settings. These variants are further divided into two types, protein-coding and non-coding. For the non-coding variants, we focus on the regions transcribed into lncRNAs. Variants associated with cardiovascular diseases and those associated with neurodegenerative diseases are both explored, because the two disease categories usually occur together (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
). Details are shown below.</p>
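<p>Assuming the variants were annotated with ANNOVAR's table_annovar.pl, whose multianno output includes a Func.refGene column, the coding/non-coding split could look like the sketch below. Counting exonic and splicing variants as coding and everything else as non-coding is an assumption, and the demo rows are invented.</p>

```python
import csv
import io
from typing import Dict, List, Tuple

# ASSUMPTION: exonic and splicing variants count as "protein coding";
# every other Func.refGene class (intronic, UTR, ncRNA, intergenic, ...) is non-coding.
CODING_CLASSES = {"exonic", "splicing"}


def split_by_region(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    coding, noncoding = [], []
    for row in rows:
        classes = set(row["Func.refGene"].split(";"))  # e.g. "exonic;splicing"
        (coding if classes.intersection(CODING_CLASSES) else noncoding).append(row)
    return coding, noncoding


# Invented rows in the tab-separated layout of an ANNOVAR multianno table (subset of columns).
DEMO = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\n"
    "chr1\t1000\t1000\tC\tT\texonic\tGENE_A\n"
    "chr1\t2000\t2000\tG\tA\tintronic\tGENE_B\n"
    "chr2\t3000\t3000\tA\tG\texonic;splicing\tGENE_C\n"
    "chr3\t4000\t4000\tT\tC\tncRNA_exonic\tLNC_D\n"
    "chr4\t5000\t5000\tG\tT\tintergenic\tGENE_E;GENE_F\n"
)

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(DEMO), delimiter="\t"))
    coding, noncoding = split_by_region(rows)
    print([r["Gene.refGene"] for r in coding])     # ['GENE_A', 'GENE_C']
    print([r["Gene.refGene"] for r in noncoding])  # ['GENE_B', 'LNC_D', 'GENE_E;GENE_F']
```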
<sec id="s3_2_1">
<title>Coding Genes Related to VSD</title>
<p>Of the 2,585,348 variants, 193 in exonic regions and 6 in splicing regions pass the filtering criteria applied with ANNOVAR (
<xref rid="B34" ref-type="bibr">Wang et al., 2010</xref>
). The 193 exonic variants are associated with 61 unique genes, whereas the 6 splicing variants involve 8 genes; more details are given in the
<xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary File</bold>
</xref>
.</p>
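<p>Collapsing variants to genes explains why 193 exonic variants map to 61 genes while 6 splicing variants touch 8: several variants can fall in one gene, and one splicing variant can be annotated with more than one gene. A minimal sketch, assuming gene fields that may list several symbols separated by ';' or ',' (as ANNOVAR's Gene.refGene column can); the demo records are invented.</p>

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def genes_per_region(variants: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each region class (e.g. 'exonic', 'splicing') to its set of unique genes."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for region, gene_field in variants:
        for symbol in re.split(r"[;,]", gene_field):  # ASSUMED separators
            if symbol:
                genes[region].add(symbol)
    return dict(genes)


if __name__ == "__main__":
    demo = [  # invented (region, gene field) pairs
        ("exonic", "GENE_A"), ("exonic", "GENE_A"), ("exonic", "GENE_B"),
        ("splicing", "GENE_C;GENE_D"),
    ]
    for region, genes in sorted(genes_per_region(demo).items()):
        print(region, len(genes), sorted(genes))
```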
<p>Taking the 61 genes as input, we identify 14 genes related to cardiovascular diseases, namely RASA1, CNOT2, MICALCL, MDFIC, PRDM7, ATXN1, CSGALNACT1, DYSF, GJB2, KRT35, MUC16, P2RX6, ZNF618, and CD80, and 5 genes related to neurodegenerative diseases, namely ATXN1, EPB41L1, PNPLA6, SYN2, and ERO1B (see
<xref ref-type="fig" rid="f1">
<bold>Figure 1</bold>
</xref>
and
<xref rid="T1" ref-type="table">
<bold>Table 1</bold>
</xref>
). Among the 18 unique genes (ATXN1 appears in both categories), 5 have been confirmed by both OMIM and GAD (Genetic Association Database) (
<xref rid="B3" ref-type="bibr">Becker et al., 2004</xref>
), of which 2 (RASA1 and ATXN1) are related to cardiovascular disease and 4 (ATXN1, EPB41L1, PNPLA6, and SYN2) to neurodegenerative disease; the rest appear in only one database. Compared with the 457 VSD-related genes listed in
<xref rid="B12" ref-type="bibr">Jin et al. (2017a)</xref>
, we found that MUC16 is common to both sets.</p>
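<p>The gene counts in this paragraph follow from simple set operations on the names listed above; the check below uses only those names.</p>

```python
# Gene symbols exactly as listed in the text above.
CARDIO = {"RASA1", "CNOT2", "MICALCL", "MDFIC", "PRDM7", "ATXN1", "CSGALNACT1",
          "DYSF", "GJB2", "KRT35", "MUC16", "P2RX6", "ZNF618", "CD80"}
NEURO = {"ATXN1", "EPB41L1", "PNPLA6", "SYN2", "ERO1B"}
# Genes confirmed by both OMIM and GAD, split by category as in the text.
CONFIRMED_CARDIO = {"RASA1", "ATXN1"}
CONFIRMED_NEURO = {"ATXN1", "EPB41L1", "PNPLA6", "SYN2"}

all_genes = CARDIO | NEURO
shared = CARDIO.intersection(NEURO)
confirmed = CONFIRMED_CARDIO | CONFIRMED_NEURO

print(len(CARDIO), len(NEURO), len(all_genes), sorted(shared), len(confirmed))
# 14 5 18 ['ATXN1'] 5
```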
<fig id="f1" position="float">
<label>Figure 1</label>
<caption>
<p>Variant-containing coding genes obtained from the trio that are associated with cardiovascular and neurodegenerative diseases. Panel
<bold>(A)</bold>
shows the 18 genes assigned to the two categories [generated using the STRING database (
<xref rid="B30" ref-type="bibr">Szklarczyk et al., 2017</xref>
)], panel
<bold>(B)</bold>
presents the connections between CD80 and CHD-related genes, and panel
<bold>(C)</bold>
illustrates the 3D structure of the mutated CD80 (PDB ID: 1I8L) discovered in this study. The 18 genes are retrieved from the OMIM and GAD databases using DAVID; genes identified by OMIM are shaded by a polygon. Note that all the genes identified by OMIM have also been confirmed by GAD.</p>
</caption>
<graphic xlink:href="fgene-10-00670-g001"></graphic>
</fig>
<table-wrap id="T1" position="float">
<label>Table 1</label>
<caption>
<p>Details of the 18 variant-containing coding genes identified from the trio.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" rowspan="1" colspan="1">Disease</th>
<th valign="top" rowspan="1" colspan="1">Gene</th>
<th valign="top" rowspan="1" colspan="1">Chr.</th>
<th valign="top" rowspan="1" colspan="1">Pos.</th>
<th valign="top" rowspan="1" colspan="1">Reference</th>
<th valign="top" rowspan="1" colspan="1">Alt.</th>
<th valign="top" rowspan="1" colspan="1">Variant
<sup>a</sup>
</th>
<th valign="top" rowspan="1" colspan="1">|Transcript|
<sup>b</sup>
</th>
<th valign="top" rowspan="1" colspan="1">Coverage
<sup>c</sup>
</th>
<th valign="top" rowspan="1" colspan="1">MAF
<sup>d</sup>
</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">DYSF</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">71665193</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">14</td>
<td valign="top" rowspan="1" colspan="1">36</td>
<td valign="top" rowspan="1" colspan="1">0.52</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">MUC16</td>
<td valign="top" rowspan="1" colspan="1">19</td>
<td valign="top" rowspan="1" colspan="1">8888863</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">4</td>
<td valign="top" rowspan="1" colspan="1">55</td>
<td valign="top" rowspan="1" colspan="1">0.52</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">P2RX6</td>
<td valign="top" rowspan="1" colspan="1">22</td>
<td valign="top" rowspan="1" colspan="1">21023596</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">NM</td>
<td valign="top" rowspan="1" colspan="1">4</td>
<td valign="top" rowspan="1" colspan="1">33</td>
<td valign="top" rowspan="1" colspan="1">0.54</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">ZNF618</td>
<td valign="top" rowspan="1" colspan="1">9</td>
<td valign="top" rowspan="1" colspan="1">114050108</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">4</td>
<td valign="top" rowspan="1" colspan="1">35</td>
<td valign="top" rowspan="1" colspan="1">0.57</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">CD80</td>
<td valign="top" rowspan="1" colspan="1">3</td>
<td valign="top" rowspan="1" colspan="1">119537362</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">FSD</td>
<td valign="top" rowspan="1" colspan="1">3</td>
<td valign="top" rowspan="1" colspan="1">39</td>
<td valign="top" rowspan="1" colspan="1">0.56</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">CNOT2</td>
<td valign="top" rowspan="1" colspan="1">12</td>
<td valign="top" rowspan="1" colspan="1">70353914</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">IFD</td>
<td valign="top" rowspan="1" colspan="1">3</td>
<td valign="top" rowspan="1" colspan="1">15</td>
<td valign="top" rowspan="1" colspan="1">0.40</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">Cardiovascular</td>
<td valign="top" rowspan="1" colspan="1">ATXN1</td>
<td valign="top" rowspan="1" colspan="1">6</td>
<td valign="top" rowspan="1" colspan="1">16327634</td>
<td valign="top" rowspan="1" colspan="1">TGCTGC</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">IFD</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">16</td>
<td valign="top" rowspan="1" colspan="1">0.31</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">RASA1</td>
<td valign="top" rowspan="1" colspan="1">5</td>
<td valign="top" rowspan="1" colspan="1">87383769</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">21</td>
<td valign="top" rowspan="1" colspan="1">0.57</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">MDFIC</td>
<td valign="top" rowspan="1" colspan="1">7</td>
<td valign="top" rowspan="1" colspan="1">114922989</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">18</td>
<td valign="top" rowspan="1" colspan="1">0.66</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">CSGALNACT1</td>
<td valign="top" rowspan="1" colspan="1">8</td>
<td valign="top" rowspan="1" colspan="1">19458429</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">27</td>
<td valign="top" rowspan="1" colspan="1">0.62</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">PRDM7</td>
<td valign="top" rowspan="1" colspan="1">16</td>
<td valign="top" rowspan="1" colspan="1">90058384</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">FSD</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">37</td>
<td valign="top" rowspan="1" colspan="1">0.64</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">MICALCL</td>
<td valign="top" rowspan="1" colspan="1">11</td>
<td valign="top" rowspan="1" colspan="1">12294797</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">CTCCTC</td>
<td valign="top" rowspan="1" colspan="1">IFI</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">11</td>
<td valign="top" rowspan="1" colspan="1">0.36</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">GJB2</td>
<td valign="top" rowspan="1" colspan="1">13</td>
<td valign="top" rowspan="1" colspan="1">20189347</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">FSD</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">33</td>
<td valign="top" rowspan="1" colspan="1">0.39</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">KRT35</td>
<td valign="top" rowspan="1" colspan="1">17</td>
<td valign="top" rowspan="1" colspan="1">41477614</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">31</td>
<td valign="top" rowspan="1" colspan="1">0.41</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">Neurodegenerative</td>
<td valign="top" rowspan="1" colspan="1">PNPLA6</td>
<td valign="top" rowspan="1" colspan="1">19</td>
<td valign="top" rowspan="1" colspan="1">7556658</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">5</td>
<td valign="top" rowspan="1" colspan="1">26</td>
<td valign="top" rowspan="1" colspan="1">0.46</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">EPB41L1</td>
<td valign="top" rowspan="1" colspan="1">20</td>
<td valign="top" rowspan="1" colspan="1">36209768</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">4</td>
<td valign="top" rowspan="1" colspan="1">35</td>
<td valign="top" rowspan="1" colspan="1">0.62</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">SYN2</td>
<td valign="top" rowspan="1" colspan="1">3</td>
<td valign="top" rowspan="1" colspan="1">12004751</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">GCCCGCGCCGCA</td>
<td valign="top" rowspan="1" colspan="1">IFI</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">6</td>
<td valign="top" rowspan="1" colspan="1">0.33</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">ATXN1</td>
<td valign="top" rowspan="1" colspan="1">6</td>
<td valign="top" rowspan="1" colspan="1">16327634</td>
<td valign="top" rowspan="1" colspan="1">TGCTGC</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">IFD</td>
<td valign="top" rowspan="1" colspan="1">2</td>
<td valign="top" rowspan="1" colspan="1">20</td>
<td valign="top" rowspan="1" colspan="1">0.31</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1">ERO1B</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">236235819</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">MM</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">21</td>
<td valign="top" rowspan="1" colspan="1">0.52</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a</sup>
Variant classes include MM (missense mutation), NM (nonsense mutation), FSD (frame shift deletion), IFD (in-frame deletion), and IFI (in-frame insertion);
<sup>b</sup>
affected number of transcripts,
<sup>c</sup>
reads coverage, and
<sup>d</sup>
mutant allele frequency.</p>
</table-wrap-foot>
</table-wrap>
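<p>For simple indels inside a coding exon, the frame classes in the Table 1 footnote follow from allele lengths: a length change divisible by 3 preserves the reading frame. A minimal sketch, assuming the Table 1 convention that an empty Alt. (or Reference) cell denotes a deletion (or insertion); the label FSI is hypothetical, since Table 1 lists no frame shift insertion. Annotation tools derive these classes from transcript models, so this length rule is only a consistency check.</p>

```python
def indel_class(ref: str, alt: str) -> str:
    """Frame class of a simple coding indel; an empty allele plays the role of '-'."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("REF and ALT have the same length: not an indel")
    in_frame = delta % 3 == 0  # a length change divisible by 3 keeps the reading frame
    if delta > 0:
        return "IFI" if in_frame else "FSI"  # FSI is hypothetical: Table 1 has no such row
    return "IFD" if in_frame else "FSD"


if __name__ == "__main__":
    # (Reference, Alt.) pairs copied from Table 1
    for gene, ref, alt in [("CD80", "T", ""), ("ATXN1", "TGCTGC", ""),
                           ("SYN2", "", "GCCCGCGCCGCA")]:
        print(gene, indel_class(ref, alt))
```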
<p>We also performed a pathway enrichment test on the identified genes; however, no significant cardiovascular-related pathway was identified. Only five of the genes overlap with genes annotated in KEGG.</p>
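<p>The text does not say which enrichment test was used (DAVID itself reports an EASE score, a modified Fisher's exact test). As a generic illustration only, a one-sided hypergeometric test for over-representation of one pathway among the candidate genes can be written with the standard library; all counts in the demo are hypothetical.</p>

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn genes).

    The terms sum over overlaps i = k .. min(K, n); math.comb returns 0 when the
    second argument exceeds the first, so impossible terms vanish.
    """
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical: 20,000 background genes, a 100-gene pathway,
    # 61 candidate genes of which 2 fall in the pathway.
    print(f"{hypergeom_sf(2, 20_000, 100, 61):.3g}")
```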
<p>After a careful literature review of the 14 cardiovascular disease-related genes, we found that 13 of them have literature support for an association with CHD (VSD is the most common type of CHD), whereas CD80 has no explicit support. Hence, we made a further effort to explore the possible roles of CD80 in VSD development.</p>
<p>CD80 is well known for providing the co-stimulatory signal necessary for T-cell activation and survival, and it is found on dendritic cells, activated B cells, and monocytes (
<xref rid="B24" ref-type="bibr">Peach et al., 1995</xref>
). To our knowledge, no study has shown a direct relation between CD80 and CHD. However,
<xref rid="B14" ref-type="bibr">Kallikourdis et al. (2017)</xref>
have reported that the T-cell costimulation complex involving CD80 contributes to heart failure, suggesting that mutated CD80 may have an impact on heart defects. Hence, we explored the role of CD80 in VSD computationally.</p>
<p>To examine whether the mutated CD80 has a connection with VSD as shown in this study, we retrieved all cardiovascular-related genes from GO, OMIM, and HGMD, and built connections between these genes and CD80 by using the STRING database (
<xref rid="B30" ref-type="bibr">Szklarczyk et al., 2017</xref>
). The results show that 31 genes from GO and 18 genes from OMIM are connected with CD80 through protein associations (including known interactions, predicted interactions, co-expression, etc.). Because 8 genes appear in both lists, 41 unique genes are connected with CD80 in total. Details are shown in
<xref rid="T2" ref-type="table">
<bold>Table 2</bold>
</xref>
. Among these 41 genes, 7 have known interactions with CD80 (experimentally determined or curated from databases, shown in italic in
<xref rid="T2" ref-type="table">
<bold>Table 2</bold>
</xref>
; see
<xref ref-type="fig" rid="f1">
<bold>Figure 1B</bold>
</xref>
). Among these genes, AKT1, PDPK1, CDC42, AKT3, and PIK3CA have concrete evidence linking them to congenital heart disease, and two of them (AKT1 and CDC42) are explicitly associated with VSD (
<xref rid="B5" ref-type="bibr">Chang et al., 2010</xref>
;
<xref rid="B19" ref-type="bibr">Liu et al., 2017</xref>
).</p>
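<p>The total of 41 follows from inclusion-exclusion over the two lists in Table 2: 31 + 18 - 8 genes shared by GO and OMIM. A short check using the gene symbols transcribed from the table:</p>

```python
# Gene lists transcribed from Table 2 (CD80-associated, cardiovascular-related genes).
GO_GENES = {
    "AKT1", "PDPK1", "CDC42", "PIK3R3", "AKT3", "PIK3CA", "PIK3CB", "TGFB1",
    "PTEN", "IL8", "CXCL10", "IL10", "CCL2", "CCR2", "THY1", "ERBB2",
    "PIK3CG", "TLR3", "IL1B", "CD34", "ANPEP", "STAT3", "FASLG", "VEGFA",
    "JUN", "IL18", "NRP1", "IL6", "STAT1", "CD40", "CXCR3",
}
OMIM_GENES = {
    "AKT2", "ICAM1", "ITIH4", "PIK3CG", "CD36", "CD40LG", "NRP1", "IL10",
    "IL6", "CD40", "SCARB1", "INS", "IFNA1", "LMNA", "IL4", "VEGFA",
    "IL18", "PTEN",
}

shared = GO_GENES.intersection(OMIM_GENES)
union = GO_GENES | OMIM_GENES

print(len(GO_GENES), len(OMIM_GENES), len(shared), len(union))  # 31 18 8 41
print(sorted(shared))  # ['CD40', 'IL10', 'IL18', 'IL6', 'NRP1', 'PIK3CG', 'PTEN', 'VEGFA']
```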
<table-wrap id="T2" position="float">
<label>Table 2</label>
<caption>
<p>CD80 interacting genes in GO and OMIM that are associated with cardiovascular diseases.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" rowspan="4" colspan="1">CD80-GO</td>
<td valign="top" rowspan="1" colspan="1">
<italic>AKT1</italic>
</td>
<td valign="top" rowspan="1" colspan="1">
<italic>PDPK1</italic>
</td>
<td valign="top" rowspan="1" colspan="1">
<italic>CDC42</italic>
</td>
<td valign="top" rowspan="1" colspan="1">
<italic>PIK3R3</italic>
</td>
<td valign="top" rowspan="1" colspan="1">
<italic>AKT3</italic>
</td>
<td valign="top" rowspan="1" colspan="1">
<italic>PIK3CA</italic>
</td>
<td valign="top" rowspan="1" colspan="1">PIK3CB</td>
<td valign="top" rowspan="1" colspan="1">TGFB1</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">PTEN</td>
<td valign="top" rowspan="1" colspan="1">IL8</td>
<td valign="top" rowspan="1" colspan="1">CXCL10</td>
<td valign="top" rowspan="1" colspan="1">IL10</td>
<td valign="top" rowspan="1" colspan="1">CCL2</td>
<td valign="top" rowspan="1" colspan="1">CCR2</td>
<td valign="top" rowspan="1" colspan="1">THY1</td>
<td valign="top" rowspan="1" colspan="1">ERBB2</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">PIK3CG</td>
<td valign="top" rowspan="1" colspan="1">TLR3</td>
<td valign="top" rowspan="1" colspan="1">IL1B</td>
<td valign="top" rowspan="1" colspan="1">CD34</td>
<td valign="top" rowspan="1" colspan="1">ANPEP</td>
<td valign="top" rowspan="1" colspan="1">STAT3</td>
<td valign="top" rowspan="1" colspan="1">FASLG</td>
<td valign="top" rowspan="1" colspan="1">VEGFA</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">JUN</td>
<td valign="top" rowspan="1" colspan="1">IL18</td>
<td valign="top" rowspan="1" colspan="1">NRP1</td>
<td valign="top" rowspan="1" colspan="1">IL6</td>
<td valign="top" rowspan="1" colspan="1">STAT1</td>
<td valign="top" rowspan="1" colspan="1">CD40</td>
<td valign="top" rowspan="1" colspan="1">CXCR3</td>
<td valign="top" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td valign="top" rowspan="3" colspan="1">CD80-OMIM</td>
<td valign="top" rowspan="1" colspan="1">AKT2</td>
<td valign="top" rowspan="1" colspan="1">ICAM1</td>
<td valign="top" rowspan="1" colspan="1">ITIH4</td>
<td valign="top" rowspan="1" colspan="1">PIK3CG</td>
<td valign="top" rowspan="1" colspan="1">CD36</td>
<td valign="top" rowspan="1" colspan="1">CD40LG</td>
<td valign="top" rowspan="1" colspan="1">NRP1</td>
<td valign="top" rowspan="1" colspan="1">IL10</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">IL6</td>
<td valign="top" rowspan="1" colspan="1">CD40</td>
<td valign="top" rowspan="1" colspan="1">SCARB1</td>
<td valign="top" rowspan="1" colspan="1">INS</td>
<td valign="top" rowspan="1" colspan="1">IFNA1</td>
<td valign="top" rowspan="1" colspan="1">LMNA</td>
<td valign="top" rowspan="1" colspan="1">IL4</td>
<td valign="top" rowspan="1" colspan="1">VEGFA</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">IL18</td>
<td valign="top" rowspan="1" colspan="1">PTEN</td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1"></td>
<td valign="top" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Genes in italic have experimentally determined interactions with CD80, whereas the interactions of the rest are computationally predicted.</p>
</table-wrap-foot>
</table-wrap>
<p>The mutated CD80 in this study carries a “T” deletion on the reverse strand at chr3:119537362, which causes a frame shift at position 159 of the translated protein (cf.
<xref ref-type="fig" rid="f1">
<bold>Figure 1C</bold>
</xref>
). As a result, the protein is likely no longer able to insert into the membrane of a cardiac myocyte; see
<xref ref-type="fig" rid="f1">
<bold>Figure 1C</bold>
</xref>
. Therefore, the downstream pathway may be affected.</p>
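<p>A single-base deletion typically alters every codon from the one containing the deleted base onward, so for a deletion at CDS position p the protein usually first changes at residue floor((p - 1) / 3) + 1. The toy sketch below illustrates this with the standard genetic code and an invented coding sequence; it does not model the CD80 transcript, and for a gene on the reverse strand the genomic coordinates and alleles would first have to be converted to transcript orientation.</p>

```python
from itertools import product

# Standard genetic code: codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate in frame 1 and stop at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_changed_residue(cds: str, deleted_pos: int) -> int:
    """1-based residue where the protein first differs after deleting the
    base at 1-based CDS position deleted_pos."""
    mutant = cds[:deleted_pos - 1] + cds[deleted_pos:]
    ref, alt = translate(cds), translate(mutant)
    for i, (a, b) in enumerate(zip(ref, alt), start=1):
        if a != b:
            return i
    return min(len(ref), len(alt)) + 1  # one protein is a prefix of the other


if __name__ == "__main__":
    cds = "ATGGCTGCTGCTAAAGGCTTTTAA"  # invented CDS: M A A A K G F stop
    print(translate(cds))                  # MAAAKGF
    print(first_changed_residue(cds, 10))  # 4, i.e. (10 - 1) // 3 + 1
```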
<p>Regarding the eight genes affected by the six splicing-region variants, we found only one (MROH5) that is related to cardiovascular disease.</p>
</sec>
<sec id="s3_2_2">
<title>LncRNAs Related to VSD</title>
<p>In addition to extracting candidate VSD-related genes directly from the variants, we use known VSD-associated genes as seeds and then pinpoint lncRNAs that have variants near these seeds. A set of 457 known VSD-related genes is obtained from
<xref rid="B12" ref-type="bibr">Jin et al. (2017a)</xref>
, whereas the whole set of lncRNAs is retrieved from LNCipedia and NONCODE. A variant-containing lncRNA is considered VSD-related if it lies within a certain distance of a known VSD-related gene. Instead of a single distance, we use four distances: 100, 200, 500, and 1,000 bp.</p>
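<p>The seed-and-distance rule can also be phrased as a nearest-gene query: compute each variant's distance to the closest known VSD-related gene once, then count how many fall within each cutoff. This sketch uses the variant position as a proxy for the lncRNA's location, assumes sorted, non-overlapping gene intervals on one chromosome, and treats counts at larger distances as cumulative, which matches the non-decreasing numbers reported in this section but is not stated explicitly. All coordinates are invented.</p>

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, on a single chromosome


def distance_to_interval(pos: int, iv: Interval) -> int:
    """0 inside the interval, otherwise the gap (bp) to its nearer end."""
    start, end = iv
    if start > pos:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def nearest_gene_distance(pos: int, genes: List[Interval], starts: List[int]) -> int:
    """ASSUMES genes are non-overlapping and sorted by start, so only the last gene
    starting at or before pos and the first gene starting after it need checking."""
    i = bisect_right(starts, pos)
    return min(distance_to_interval(pos, g) for g in genes[max(0, i - 1): i + 1])


def counts_by_threshold(positions: Iterable[int], genes: List[Interval],
                        thresholds=(100, 200, 500, 1_000)) -> Dict[int, int]:
    """Cumulative counts: a site within 100 bp is also counted at 200, 500 and 1,000 bp."""
    starts = [s for s, _ in genes]
    dists = [nearest_gene_distance(p, genes, starts) for p in positions]
    return {t: sum(1 for d in dists if t >= d) for t in thresholds}


if __name__ == "__main__":
    genes = [(1_000, 2_000), (5_000, 6_000)]  # invented gene intervals
    variant_positions = [950, 2_150, 2_400, 4_100, 5_500]
    print(counts_by_threshold(variant_positions, genes))  # {100: 2, 200: 3, 500: 4, 1000: 5}
```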
<p>We identified 6, 7, 27, and 49 lncRNAs from LNCipedia at distances of 100, 200, 500, and 1,000 bp, respectively, and 6, 8, 32, and 57 lncRNAs when checking against NONCODE. Details are shown in the
<xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary File</bold>
</xref>
. To examine whether these lncRNAs may affect VSD, we studied their expression in different tissues, particularly in the heart. Among all the lncRNAs (from both LNCipedia and NONCODE), 29 are expressed in the heart; in particular, NONHSAT096266.2 (NONCODE ID) is highly and uniquely expressed in the heart, with an FPKM of 13.97. Interestingly, this lncRNA is very close to NFXL1, which has been identified as a VSD-associated gene (
<xref rid="B12" ref-type="bibr">Jin et al., 2017a</xref>
). See
<xref rid="T3" ref-type="table">
<bold>Table 3</bold>
</xref>
.</p>
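<p>Selecting heart-expressed lncRNAs and ranking them, as in Table 3, can be sketched as a filter on FPKM followed by a sort. The cutoff for "present in the heart" is not stated, so FPKM above 0 is assumed here (the SS-based analysis uses FPKM above 1 for "highly expressed"). The values are taken from Table 3 and assumed to be heart FPKM, plus one invented entry.</p>

```python
from typing import Dict, List, Tuple

# FPKM values for four NONCODE IDs, copied from Table 3; the last entry is invented.
HEART_FPKM: Dict[str, float] = {
    "NONHSAT096266.2": 13.97,
    "NONHSAT232531.1": 2.08,
    "NONHSAT001273.2": 1.99,
    "NONHSAT246250.1": 0.22,
    "LNC_NOT_IN_HEART": 0.0,  # hypothetical lncRNA with no heart expression
}


def rank_heart_expressed(fpkm: Dict[str, float], min_fpkm: float = 0.0,
                         top: int = 10) -> List[Tuple[str, float]]:
    """Keep lncRNAs with FPKM above min_fpkm and return the top highest."""
    expressed = [(name, value) for name, value in fpkm.items() if value > min_fpkm]
    return sorted(expressed, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, value in rank_heart_expressed(HEART_FPKM, top=3):
        print(name, value)
```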
<table-wrap id="T3" position="float">
<label>Table 3</label>
<caption>
<p>Top 10 lncRNAs identified from the trio that are potentially related to VSD.</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th valign="top" rowspan="1" colspan="1">NONCODE ID</th>
<th valign="top" rowspan="1" colspan="1">Chr.</th>
<th valign="top" rowspan="1" colspan="1">Pos.</th>
<th valign="top" rowspan="1" colspan="1">Ref.</th>
<th valign="top" rowspan="1" colspan="1">Alt.</th>
<th valign="top" rowspan="1" colspan="1">FPKM</th>
<th valign="top" rowspan="1" colspan="1">Distance (bp)</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT096266.2</td>
<td valign="top" rowspan="1" colspan="1">4</td>
<td valign="top" rowspan="1" colspan="1">47846448</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">13.97</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT232531.1</td>
<td valign="top" rowspan="1" colspan="1">12</td>
<td valign="top" rowspan="1" colspan="1">131923913</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">2.08</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT001273.2</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">19074132</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">1.99</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT180457.1</td>
<td valign="top" rowspan="1" colspan="1">19</td>
<td valign="top" rowspan="1" colspan="1">44259481</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">1.54</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT235401.1</td>
<td valign="top" rowspan="1" colspan="1">15</td>
<td valign="top" rowspan="1" colspan="1">66701900</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">1.49</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT244710.1</td>
<td valign="top" rowspan="1" colspan="1">22</td>
<td valign="top" rowspan="1" colspan="1">19179167</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">1.15</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT010771.2</td>
<td valign="top" rowspan="1" colspan="1">1</td>
<td valign="top" rowspan="1" colspan="1">247332298</td>
<td valign="top" rowspan="1" colspan="1">AC</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">0.89</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT022678.2</td>
<td valign="top" rowspan="1" colspan="1">11</td>
<td valign="top" rowspan="1" colspan="1">71452973</td>
<td valign="top" rowspan="1" colspan="1">T</td>
<td valign="top" rowspan="1" colspan="1">C</td>
<td valign="top" rowspan="1" colspan="1">0.37</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT229850.1</td>
<td valign="top" rowspan="1" colspan="1">11</td>
<td valign="top" rowspan="1" colspan="1">64779243</td>
<td valign="top" rowspan="1" colspan="1">GAAAAAA</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">0.32</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
<tr>
<td valign="top" rowspan="1" colspan="1">NONHSAT246250.1</td>
<td valign="top" rowspan="1" colspan="1">3</td>
<td valign="top" rowspan="1" colspan="1">9396926</td>
<td valign="top" rowspan="1" colspan="1">G</td>
<td valign="top" rowspan="1" colspan="1">A</td>
<td valign="top" rowspan="1" colspan="1">0.22</td>
<td valign="top" rowspan="1" colspan="1">1000</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>FPKM: fragments per kilobase of exon per million reads mapped (
<xref rid="B23" ref-type="bibr">Mortazavi et al., 2008</xref>
).</p>
</table-wrap-foot>
</table-wrap>
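<p>The FPKM definition in the Table 3 footnote normalizes a fragment count by transcript length in kilobases and by library size in millions of mapped fragments. A minimal sketch with invented numbers:</p>

```python
def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    FPKM = fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)
         = fragments * 1e9 / (exon_length_bp * total_mapped_fragments)
    """
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Invented numbers: 500 fragments on a 2 kb transcript in a 20-million-fragment library.
    print(fpkm(500, 2_000, 20_000_000))  # 12.5
```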
</sec>
<sec id="s3_2_3">
<title>Results Comparison</title>
<p>We use TrioDeNovo (
<xref rid="B35" ref-type="bibr">Wei et al., 2015</xref>
) with default settings (depth of coverage equals 5) to call
<italic>de novo</italic>
variants from the trio and compare the results with those of our approach. TrioDeNovo identifies 79,082 variants in 357 genes. After filtering out common variants with a MAF threshold of 0.01, 51 variants in 25 genes remain. Of these 25 genes, 21 overlap with our findings. For lncRNAs, no variant is found within 1,000 bp of the known VSD-related genes.
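<p>The MAF filter and the overlap count can be sketched as below. Here MAF is taken to be a population minor allele frequency (not the read-level mutant allele frequency of Table 1); whether 0.01 is a strict cutoff is not stated, so this sketch keeps variants with MAF strictly below 0.01 and treats a missing population frequency as rare. Both choices, and the demo records, are assumptions.</p>

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class DeNovoCall:
    gene: str
    maf: Optional[float]  # population minor allele frequency; None if unknown


def rare_genes(calls: Iterable[DeNovoCall], max_maf: float = 0.01) -> Set[str]:
    """Genes hit by at least one call with MAF strictly below max_maf (missing MAF counts as rare)."""
    return {c.gene for c in calls if c.maf is None or max_maf > c.maf}


if __name__ == "__main__":
    calls = [DeNovoCall("GENE_A", None), DeNovoCall("GENE_B", 0.002), DeNovoCall("GENE_C", 0.25)]
    our_genes = {"GENE_A", "GENE_D"}  # hypothetical genes from the TS-based pipeline
    rare = rare_genes(calls)
    print(sorted(rare), sorted(rare.intersection(our_genes)))  # ['GENE_A', 'GENE_B'] ['GENE_A']
```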
</sec>
</sec>
<sec id="s3_3">
<title>SS-Based Variants</title>
<p>Intuitively, a TS-based approach can substantially narrow down the candidate genes; however, it is hard to estimate the extent of this improvement. Hence, we also conducted experiments on the sequences of the VSD sample (the child) only, using the same protocols as in the TS-based experiments.</p>
<p>Based on the single-sequencing (SS) data, we obtained 4,826,899 variants by using GATK (
<xref rid="B6" ref-type="bibr">DePristo et al., 2011</xref>
). As in the trio-sequencing (TS) data analysis, we divide them into protein-coding and non-coding variants.</p>
<sec id="s3_3_1">
<title>Coding Genes Related to VSD</title>
<p>After annotation and filtering with ANNOVAR, we obtained 1,552 variants in 436 genes. Among these genes, 424 have exonic variants and the other 12 have splicing variants. Among the 424 genes, DAVID (based on OMIM) identified MYBPC3 (Chr11:47342683, C/T, missense mutation, p.G507R) and TRDN (Chr6:123571021, C/A, missense mutation, p.S45I), both of which are closely related to the ventricles. More details are shown in the
<xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary File</bold>
</xref>
. Unfortunately, these two genes cannot be identified from the trio. Further investigation showed that the MYBPC3 variant is inherited from the father, whereas the TRDN variant is inherited from the mother. Given that both parents are healthy but the child has VSD, we speculate that the combination of mutated MYBPC3 and TRDN may contribute noticeably to VSD. Among the 12 genes with variants in splicing regions, we found no cardiovascular- or neurodegenerative-related genes.</p>
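<p>Why the two inherited variants escape a trio-based filter can be illustrated with a naive genotype check that labels each child allele by its parental origin; a filter that keeps only de novo alleles would discard the paternal and maternal cases. This is a simplified sketch: the genotypes are invented, and likelihood-based callers such as TrioDeNovo model sequencing and genotyping errors rather than comparing hard genotype calls.</p>

```python
from typing import Tuple

Genotype = Tuple[str, str]  # two alleles, e.g. ("C", "T")


def allele_origin(child: Genotype, father: Genotype, mother: Genotype, alt: str) -> str:
    """Naive origin of the child's ALT allele, ignoring genotyping errors."""
    if alt not in child:
        return "absent in child"
    in_father, in_mother = alt in father, alt in mother
    if in_father and in_mother:
        return "either parent"
    if in_father:
        return "paternal"
    if in_mother:
        return "maternal"
    return "de novo"


if __name__ == "__main__":
    # Invented genotypes shaped like the two cases in the text, plus a de novo case.
    print(allele_origin(("C", "T"), ("C", "T"), ("C", "C"), "T"))  # paternal
    print(allele_origin(("C", "A"), ("C", "C"), ("C", "A"), "A"))  # maternal
    print(allele_origin(("G", "A"), ("G", "G"), ("G", "G"), "A"))  # de novo
```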
<p>Genes identified by GAD are excluded from the child-only analysis, because more than 60 such genes are found and the significance of the relations between these variants and VSD can hardly be determined.</p>
</sec>
<sec id="s3_3_2">
<title>LncRNAs Related to VSD</title>
<p>As in the TS-based lncRNA identification, we carried out the same experiments on the VSD patient only. Whereas the SS-based data yield about 10 times as many coding-gene candidates as the TS-based data, they yield about five times as many lncRNA variants.</p>
<p>The numbers of lncRNAs with variants close to VSD-related genes are 37, 60, 129, and 197 for distances of 100, 200, 500, and 1,000 bp, respectively. Among all these lncRNAs, only 97 are expressed in heart cells, of which 23 are highly expressed (FPKM larger than 1). Details are shown in the
<xref ref-type="supplementary-material" rid="SM1">
<bold>Supplementary File</bold>
</xref>
. For instance, NONHSAT181468.1 (Chr2:27217145, CT/C), which has the highest FPKM of the identified lncRNAs, is highly expressed in the heart, with an FPKM of 31.7. This lncRNA lies within the first intron of SLC5A6, which has been confirmed as a VSD-associated gene.</p>
</sec>
</sec>
<sec id="s3_4">
<title>Variants Profile of the Trio</title>
<p>We use the ratio of
<italic>k</italic>
-mer counts between the parents and the child to reflect their genetic variation.
<xref ref-type="fig" rid="f2">
<bold>Figure 2</bold>
</xref>
shows the detailed ratio distribution. Only a small portion of
<italic>k</italic>
-mers have a ratio of 0 (see
<xref ref-type="fig" rid="f2">
<bold>Figure 2A, C</bold>
</xref>
). That is, among the
<italic>k</italic>
-mers of the child, only 0.442% contain
<italic>de novo</italic>
mutations relative to his father, and 0.438% relative to his mother. Combining both parents, 0.43% of the child's
<italic>k</italic>
-mers are unique.</p>
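<p>The k-mer ratio can be sketched as follows. The exact definition used in this study is not given here, so this sketch assumes ratio = parent count / child count for each k-mer seen in the child, counts forward-strand k-mers only (real k-mer counters usually merge a k-mer with its reverse complement), and reports fractions over distinct k-mers; the reads are invented.</p>

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer on the forward strand of each read (no canonicalization)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_ratios(child: Counter, parent: Counter) -> Dict[str, float]:
    """ASSUMED ratio: parent count / child count, for each k-mer seen in the child."""
    return {kmer: parent[kmer] / n for kmer, n in child.items()}


def candidate_kmers(ratios: Dict[str, float], max_ratio: float = 0.3) -> Set[str]:
    """Child k-mers absent from the parent (ratio 0) or rare in it (ratio below max_ratio)."""
    return {kmer for kmer, r in ratios.items() if max_ratio > r}


if __name__ == "__main__":
    child = count_kmers(["ACGTACGTGA"], k=4)   # invented reads
    father = count_kmers(["ACGTACGTCA"], k=4)
    ratios = kmer_ratios(child, father)
    zero = [kmer for kmer, r in ratios.items() if r == 0]
    print(sorted(candidate_kmers(ratios)))  # ['CGTG', 'GTGA']
    print(f"{len(zero) / len(ratios):.3f} of distinct child k-mers have ratio 0")
```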
<fig id="f2" position="float">
<label>Figure 2</label>
<caption>
<p>Variants profile of the trio. Panels
<bold>(A)</bold>
and
<bold>(C)</bold>
show the distribution of the
<italic>k</italic>
-mer count ratio between the parents and the child, panel
<bold>(B)</bold>
shows the ratio distribution of the
<italic>k</italic>
-mers with a count less than 20, and panel
<bold>(D)</bold>
shows the overall distribution of the ratio against the count. Note that the ratios in panel
<bold>(B)</bold>
are grouped into bins,
<italic>viz</italic>
.,
<italic>r</italic>
= 0 (denoted as “0”), 0 <
<italic>r</italic>
≤ 0.1 (denoted as “0.1”), 0.1 <
<italic>r</italic>
≤ 0.2 (denoted as “0.2”), and 0.2 <
<italic>r</italic>
≤ 0.3 (denoted as “0.3”), where
<italic>r</italic>
is the count ratio; counts are shown on a log scale.</p>
</caption>
<graphic xlink:href="fgene-10-00670-g002"></graphic>
</fig>
<p>The count ratio of
<italic>k</italic>
-mers may be affected by sequencing errors. To alleviate this impact, in addition to the
<italic>k</italic>
-mers with a ratio of 0, we also include those with a small ratio (less than 0.3). Generally, the number of
<italic>k</italic>
-mers with a ratio of 0 is four to five orders of magnitude larger than the number of those with a small non-zero ratio (see
<xref ref-type="fig" rid="f2">
<bold>Figure 2C</bold>
</xref>
). If these additional
<italic>k</italic>
-mers contain mutations, they will be identified during the downstream variant calling.</p>
<p>Unlike the count distribution of all
<italic>k</italic>
-mers, which is approximately normal, the counts of
<italic>k</italic>
-mers carrying mutations follow a Poisson distribution (see
<xref ref-type="fig" rid="f2">
<bold>Figure 2D</bold>
</xref>
). The
<italic>k</italic>
-mers with a count smaller than 20 account for 97.97% of all
<italic>k</italic>
-mers with a ratio less than 0.3. The distribution breakdown of these
<italic>k</italic>
-mers is shown in
<xref ref-type="fig" rid="f2">
<bold>Figure 2B</bold>
</xref>
.</p>
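The binning used in Figure 2B can be expressed as a small helper. This sketch assumes that each k-mer is summarized as a hypothetical `(child_count, ratio)` pair. It also assumes that the count threshold of 20 applies to the child's count. k-mers with a ratio above 0.3 are simply ignored.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def ratio_bin(r: float) -> Optional[str]:
    """Map a count ratio to the bin labels of Figure 2B; None if r > 0.3."""
    if r == 0:
        return "0"
    for upper, label in ((0.1, "0.1"), (0.2, "0.2"), (0.3, "0.3")):
        if r <= upper:
            return label
    return None


def bin_low_count(records: Iterable[Tuple[int, float]], max_count: int = 20) -> Counter:
    """Tally (child_count, ratio) pairs with a count below max_count into ratio bins."""
    tally = Counter()
    for count, ratio in records:
        if count < max_count:
            label = ratio_bin(ratio)
            if label is not None:
                tally[label] += 1
    return tally
```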
</sec>
<sec id="s3_5">
<title>Run-Time Analysis</title>
<p>Our experiments are conducted on a computer with 128 GB of RAM and two Intel Xeon E5-2683 v4 CPUs (32 cores in total), running CentOS 7.0. Throughout the experiments, we use 24 threads by default where applicable.</p>
<p>Unlike existing approaches that filter out irrelevant variants from trios after mapping, e.g., TrioDeNovo (
<xref rid="B35" ref-type="bibr">Wei et al., 2015</xref>
), we conduct filtering before mapping. This change is not trivial, because the input data are usually very large. For instance, the input size of the VSD sample used in this study is 242 GB in FASTQ format, and the total size for the trio is over 700 GB. To address this, we have designed a novel coupled Bloom filter-based
<italic>k</italic>
-mer encoding algorithm, which achieves a compression ratio of 12 under default settings. That is, a typical set of
<italic>k</italic>
-mers obtained from a human genome (usually around 120 GB) can be compressed into 10 GB. Using this approach, we are able to handle a trio within main memory.</p>
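The sketch below is a plain Bloom filter (Bloom, 1970) for k-mer membership. It is included only to illustrate why Bloom filters save memory. It is not the coupled Bloom filter-based encoding described above, which also encodes counts. The SHA-256 double hashing and the sizing formulas are standard textbook choices, assumed here for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter over k-mer strings: no false negatives, tunable false positives."""

    def __init__(self, n_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and h = (m / n) ln 2 hash functions.
        self.m = max(8, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.h = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive h bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        a = int.from_bytes(digest[:8], "little")
        b = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(a + i * b) % self.m for i in range(self.h)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

At a 1% false-positive rate this uses about 9.6 bits per k-mer regardless of k. That is the kind of saving that makes in-memory processing of a whole trio feasible.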
<p>Experiments show that the total memory used to encode counted
<italic>k</italic>
-mers obtained from the trio is 31.7 GB. Based on the encoded
<italic>k</italic>
-mers and their counts, we calculate the count ratio of all
<italic>k</italic>
-mers between the parents and the child. Mathematically, suppose the count of a
<italic>k</italic>
-mer κ from the child is
<inline-formula>
<mml:math id="M13">
<mml:mrow>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
, and its counts are
<inline-formula>
<mml:math id="M14">
<mml:mrow>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M15">
<mml:mrow>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
in his father and mother, respectively. Then, the count ratio is
<inline-formula>
<mml:math id="M16">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>κ</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>f</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mrow>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:math>
</inline-formula>
between his father and himself. Analogously, the count ratio between his mother and himself is
<inline-formula>
<mml:math id="M17">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>κ</mml:mi>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
<mml:mo>/</mml:mo>
<mml:msubsup>
<mml:mi>f</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
. If both
<inline-formula>
<mml:math id="M18">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
and
<inline-formula>
<mml:math id="M19">
<mml:mrow>
<mml:msubsup>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo>/</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mi>κ</mml:mi>
</mml:msubsup>
</mml:mrow>
</mml:math>
</inline-formula>
are smaller than the threshold
<italic>r</italic>
<sub>0</sub>
, κ is kept, where
<italic>r</italic>
<sub>0</sub>
is set to 0.3 in this study. In our experiments,
<italic>k</italic>
-mer counting takes 129 min,
<italic>k</italic>
-mer encoding takes 175 min, and
<italic>k</italic>
-mer filtering takes 20.3 s. As a result, 3.9% of the
<italic>k</italic>
-mers are left for further analysis.</p>
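The filtering rule follows directly from the definitions of the two count ratios. The sketch below keeps the counts in ordinary dictionaries rather than in the encoded structure. The function names are illustrative. A k-mer missing from a parent is treated as having count 0, so its ratio is 0 and it passes the filter.

```python
from typing import Dict, List


def ratio(parent_count: int, child_count: int) -> float:
    """Count ratio f_parent / f_child for a k-mer present in the child (f_child > 0)."""
    return parent_count / child_count


def filter_child_kmers(child: Dict[str, int], father: Dict[str, int],
                       mother: Dict[str, int], r0: float = 0.3) -> List[str]:
    """Keep child k-mers whose father/child and mother/child ratios are both below r0."""
    kept = []
    for kmer, f_c in child.items():
        r_f = ratio(father.get(kmer, 0), f_c)
        r_m = ratio(mother.get(kmer, 0), f_c)
        if r_f < r0 and r_m < r0:
            kept.append(kmer)
    return kept
```

A k-mer that is common in the child but rare in both parents survives, which is the signature expected for a *de novo* variant.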
<p>Because of sequencing errors, we perform error correction on the remaining
<italic>k</italic>
-mers (
<xref rid="B37" ref-type="bibr">Zhao et al., 2017</xref>
). Correcting 93.7% of the errors in the kept
<italic>k</italic>
-mers takes 1.7 s and 0.12 GB of RAM. As a result, 293.2M
<italic>k</italic>
-mers are left for variant identification.</p>
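The error correction itself follows Zhao et al. (2017) and is not reproduced here. For readers unfamiliar with k-mer-based correction, the sketch below shows a different, generic k-mer spectrum technique. A k-mer whose count falls below a hypothetical solidity threshold is replaced by its single-substitution neighbour that is solid, but only if exactly one such neighbour exists. The threshold value and the function name are assumptions.

```python
from typing import Dict, Optional

BASES = "ACGT"


def correct_kmer(kmer: str, counts: Dict[str, int], solid: int = 3) -> Optional[str]:
    """Return kmer if it is solid; otherwise its unique solid one-substitution
    neighbour; otherwise None (no unambiguous correction exists)."""
    if counts.get(kmer, 0) >= solid:
        return kmer
    candidates = []
    for i, orig in enumerate(kmer):
        for b in BASES:
            if b != orig:
                neighbour = kmer[:i] + b + kmer[i + 1:]
                if counts.get(neighbour, 0) >= solid:
                    candidates.append(neighbour)
    # Correct only when exactly one solid neighbour exists; ambiguous cases are left alone.
    return candidates[0] if len(candidates) == 1 else None
```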
<p>Before mapping the variant-containing
<italic>k</italic>
-mers to a reference genome, we also perform
<italic>k</italic>
-mer extension to avoid the multi-mapping problem caused by short input sequences such as
<italic>k</italic>
-mers. An extension takes a
<italic>k</italic>
-mer as a seed and extends it on both sides based on the reads that contain the
<italic>k</italic>
-mer. Finally, we map the extended sequences to the reference genome GRCh38/hg38
<italic>via</italic>
BWA (
<xref rid="B17" ref-type="bibr">Li and Durbin, 2009</xref>
), which takes 52 min. This is followed by joint variant calling with SAMtools (
<xref rid="B18" ref-type="bibr">Li et al., 2009</xref>
) and GATK (
<xref rid="B6" ref-type="bibr">DePristo et al., 2011</xref>
), which takes 50 min.</p>
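The seed extension step could look roughly like the sketch below. The text does not say how disagreements among reads are resolved. This version therefore assumes a per-position majority vote over the reads that contain the seed, and uses only the first occurrence of the seed in each read. Extension stops at the first position that no read covers. `flank` is a hypothetical parameter.

```python
from collections import Counter
from typing import Iterable


def extend_seed(seed: str, reads: Iterable[str], flank: int = 50) -> str:
    """Extend a k-mer seed on both sides by majority vote over reads containing it."""
    left_cols = [Counter() for _ in range(flank)]   # left_cols[j]: base j+1 positions left of the seed
    right_cols = [Counter() for _ in range(flank)]  # right_cols[j]: base j+1 positions right of the seed
    k = len(seed)
    for read in reads:
        start = read.find(seed)  # first occurrence only, for simplicity
        if start < 0:
            continue
        for j in range(min(flank, start)):
            left_cols[j][read[start - 1 - j]] += 1
        end = start + k
        for j in range(min(flank, len(read) - end)):
            right_cols[j][read[end + j]] += 1

    def consensus(cols):
        # Walk outward from the seed and stop at the first uncovered position.
        out = []
        for col in cols:
            if not col:
                break
            out.append(col.most_common(1)[0][0])
        return out

    left = consensus(left_cols)
    right = consensus(right_cols)
    return "".join(reversed(left)) + seed + "".join(right)
```

Longer extended sequences are less likely to align equally well to several genomic loci. That is the motivation the text gives for extending before running BWA.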
<p>TrioDeNovo takes 572 min to obtain a sorted SAM file from a raw FASTQ file and 8,179 min to merge the files and generate the final VCF file using GATK and TrioDeNovo. TrioDeNovo is therefore about 10 times slower than our approach: (8,179 + 572 × 3)/((129 + 175) × 3 + 102) ≈ 9.8. In addition, the maximum RAM required by our approach is two orders of magnitude smaller.</p>
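The speed-up figure can be reproduced from the reported timings. The sketch makes three assumptions. First, the per-sample steps (FASTQ-to-SAM for TrioDeNovo; k-mer counting and encoding for the proposed approach) are repeated for each of the three trio members. Second, the constant 102 min is the sum of the 52 min mapping step and the 50 min variant-calling step. Third, the sub-minute filtering and error-correction times are ignored.

```python
# Wall-clock times in minutes, as reported in the text.
TRIODENOVO_MAP_PER_SAMPLE = 572   # raw FASTQ -> sorted SAM, per sample (assumed per trio member)
TRIODENOVO_MERGE_CALL = 8179      # merging + final VCF (GATK + TrioDeNovo)
KMER_COUNT_PER_SAMPLE = 129       # k-mer counting
KMER_ENCODE_PER_SAMPLE = 175      # k-mer encoding
MAP_AND_CALL = 52 + 50            # BWA mapping + SAMtools/GATK calling (assumed to give the 102)


def speedup(n_samples: int = 3) -> float:
    """Ratio of TrioDeNovo's total run time to that of the k-mer-based pipeline."""
    triodenovo = TRIODENOVO_MERGE_CALL + TRIODENOVO_MAP_PER_SAMPLE * n_samples
    kmer_based = (KMER_COUNT_PER_SAMPLE + KMER_ENCODE_PER_SAMPLE) * n_samples + MAP_AND_CALL
    return triodenovo / kmer_based


print(f"{speedup():.2f}")  # 9895 / 1014, printed as 9.76
```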
</sec>
</sec>
<sec sec-type="conclusions" id="s4">
<title>Conclusions</title>
<p>As the most common CHD, VSD affects a noteworthy portion of newborns and leads to high mortality. Unveiling its biological mechanism, particularly the underlying genetic variants, is essential for both early diagnosis and clinical treatment. Existing approaches to mining genetic variants rely on large panels, which are costly and make sample collection challenging. They are also prone to overlooking rare variants and struggle to handle multiple variants. We designed a novel algorithm for identifying variants from a trio and associating them with VSD. Experiments show that the trio-sequencing-based approach narrows down VSD-related candidates by about 10 times in coding genes and 5 times in lncRNAs; meanwhile, our approach is about 10 times faster than an existing state-of-the-art approach. Applying our method to a VSD trio, we identify 14 coding genes closely correlated with cardiovascular diseases and 5 coding genes associated with neurodegenerative diseases. Among them, CD80 has not been reported before. More promisingly, the results show that the combination of MYBPC3 and TRDN is highly likely to be VSD-related. Analysis of lncRNAs shows that six lncRNAs within 1,000 bp of VSD-related genes are highly expressed in the heart, particularly NONHSAT096266.2, which has an FPKM score of 13.97 and is uniquely expressed in the heart.</p>
</sec>
<sec id="s5">
<title>Author Contributions</title>
<p>LZ conceived the algorithm, designed the experiments, and wrote the manuscript. PJ participated in program coding. PJ, YH, YW, JZ, LB, QT, and TL participated in data analysis. All authors read and approved the final manuscript.</p>
</sec>
<sec sec-type="funding-information" id="s6">
<title>Funding</title>
<p>This study is collectively supported by the Free Exploration Fund of Hubei University of Medicine (FDFR201805), the National Natural Science Foundation of China (31501070), the Natural Science Foundation of Hubei (2017CFB137) and Guangxi (2016GXNSFCA380006, 2018GXNSFAA281275 and 2018GXNSFAA138085), the Scientific Research Fund of GuangXi University (XGZ150316), and Taihe Hospital (2016JZ11).</p>
</sec>
<sec id="s7">
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<sec sec-type="supplementary-material" id="s8">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at:
<ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fgene.2019.00670/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fgene.2019.00670/full#supplementary-material</ext-link>
</p>
<supplementary-material content-type="local-data" id="SM1">
<media xlink:href="Table_1.xlsx">
<caption>
<p>Click here for additional data file.</p>
</caption>
</media>
</supplementary-material>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Adzhubei</surname>
<given-names>I. A.</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Peshkin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ramensky</surname>
<given-names>V. E.</given-names>
</name>
<name>
<surname>Gerasimova</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2010</year>
).
<article-title>A method and server for predicting damaging missense mutations</article-title>
.
<source>Nat. Methods</source>
<volume>7</volume>
,
<fpage>248</fpage>
<lpage>249</lpage>
.
<pub-id pub-id-type="doi">10.1038/nmeth0410-248</pub-id>
<pub-id pub-id-type="pmid">20354512</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Andrey</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Mundlos</surname>
<given-names>S.</given-names>
</name>
</person-group>
(
<year>2017</year>
).
<article-title>The three-dimensional genome: regulating gene expression during pluripotency and development</article-title>
.
<source>Development</source>
<volume>144</volume>
,
<fpage>3646</fpage>
<lpage>3658</lpage>
.
<pub-id pub-id-type="doi">10.1242/dev.148304</pub-id>
<pub-id pub-id-type="pmid">29042476</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Becker</surname>
<given-names>K. G.</given-names>
</name>
<name>
<surname>Barnes</surname>
<given-names>K. C.</given-names>
</name>
<name>
<surname>Bright</surname>
<given-names>T. J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S. A.</given-names>
</name>
</person-group>
(
<year>2004</year>
).
<article-title>The genetic association database</article-title>
.
<source>Nat. Genet.</source>
<volume>36</volume>
,
<fpage>431</fpage>
<lpage>432</lpage>
.
<pub-id pub-id-type="doi">10.1038/ng0504-431</pub-id>
<pub-id pub-id-type="pmid">15118671</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bloom</surname>
<given-names>B. H.</given-names>
</name>
</person-group>
(
<year>1970</year>
).
<article-title>Space/time trade-offs in hash coding with allowable errors</article-title>
.
<source>Commun. ACM</source>
<volume>13</volume>
,
<fpage>422</fpage>
<lpage>426</lpage>
.
<pub-id pub-id-type="doi">10.1145/362686.362692</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Teng</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Luan</surname>
<given-names>Q.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2010</year>
).
<article-title>Deletion of Akt1 causes heart defects and abnormal cardiomyocyte proliferation</article-title>
.
<source>Dev. Biol.</source>
<volume>347</volume>
,
<fpage>384</fpage>
<lpage>391</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.ydbio.2010.08.033</pub-id>
<pub-id pub-id-type="pmid">20816796</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>DePristo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Banks</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Poplin</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Garimella</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Maguire</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hartl</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2011</year>
).
<article-title>A framework for variation discovery and genotyping using next-generation DNA sequencing data</article-title>
.
<source>Nat. Genet.</source>
<volume>43</volume>
,
<fpage>491</fpage>
<lpage>498</lpage>
.
<pub-id pub-id-type="doi">10.1038/ng.806</pub-id>
<pub-id pub-id-type="pmid">21478889</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<collab>Exome Variant Server</collab>
</person-group>
(
<year>2019</year>
).
<article-title>NHLBI GO Exome Sequencing Project (ESP)</article-title>
.
<uri xlink:type="simple" xlink:href="http://evs.gs.washington.edu/EVS/">http://evs.gs.washington.edu/EVS/</uri>
,
<publisher-loc>Seattle, WA</publisher-loc>
.</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hamosh</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Scott</surname>
<given-names>A. F.</given-names>
</name>
<name>
<surname>Amberger</surname>
<given-names>J. S.</given-names>
</name>
<name>
<surname>Bocchini</surname>
<given-names>C. A.</given-names>
</name>
<name>
<surname>McKusick</surname>
<given-names>V. A.</given-names>
</name>
</person-group>
(
<year>2005</year>
).
<article-title>Online mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>33</volume>
,
<fpage>D514</fpage>
<lpage>D517</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gki033</pub-id>
<pub-id pub-id-type="pmid">15608251</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>D. W.</given-names>
</name>
<name>
<surname>Sherman</surname>
<given-names>B. T.</given-names>
</name>
<name>
<surname>Lempicki</surname>
<given-names>R. A.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources</article-title>
.
<source>Nat. Protoc.</source>
<volume>4</volume>
,
<fpage>44</fpage>
<lpage>57</lpage>
.
<pub-id pub-id-type="doi">10.1038/nprot.2008.211</pub-id>
<pub-id pub-id-type="pmid">19131956</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<collab>The International HapMap 3 Consortium</collab>
</person-group>
(
<year>2010</year>
).
<article-title>Integrating common and rare genetic variation in diverse human populations</article-title>
.
<source>Nature</source>
<volume>467</volume>
,
<fpage>52</fpage>
<lpage>58</lpage>
.
<pub-id pub-id-type="doi">10.1038/nature09298</pub-id>
<pub-id pub-id-type="pmid">20811451</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Deng</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Schmidt</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Tang</surname>
<given-names>X.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2019</year>
).
<article-title>kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers</article-title>
.
<source>Bioinformatics</source>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btz299</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>S. C.</given-names>
</name>
<name>
<surname>Homsy</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Zaidi</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Morton</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>DePalma</surname>
<given-names>S. R.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
a).
<article-title>Contribution of rare inherited and
<italic>de novo</italic>
variants in 2,871 congenital heart disease probands</article-title>
.
<source>Nat. Genet.</source>
<volume>49</volume>
,
<fpage>1593</fpage>
.
<pub-id pub-id-type="doi">10.1038/ng.3970</pub-id>
<pub-id pub-id-type="pmid">28991257</pub-id>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jin</surname>
<given-names>Z.-B.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X.-F.</given-names>
</name>
<name>
<surname>Feng</surname>
<given-names>C.-Y.</given-names>
</name>
<name>
<surname>Cai</surname>
<given-names>X.-B.</given-names>
</name>
<name>
<surname>Mao</surname>
<given-names>J.-Y.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
b).
<article-title>Trio-based exome sequencing arrests
<italic>de novo</italic>
mutations in early-onset high myopia</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A.</source>
<volume>114</volume>
,
<fpage>4219</fpage>
<lpage>4224</lpage>
.
<pub-id pub-id-type="doi">10.1073/pnas.1615970114</pub-id>
<pub-id pub-id-type="pmid">28373534</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kallikourdis</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Martini</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Carullo</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sardi</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Roselli</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Greco</surname>
<given-names>C. M.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>T cell costimulation blockade blunts pressure overload-induced heart failure</article-title>
.
<source>Nat. Commun.</source>
<volume>8</volume>
,
<elocation-id>14680</elocation-id>
.
<pub-id pub-id-type="doi">10.1038/ncomms14680</pub-id>
<pub-id pub-id-type="pmid">28262700</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Furumichi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Morishima</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Tanabe</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<year>2018</year>
).
<article-title>New approach for understanding genome variations in KEGG</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>47</volume>
(
<issue>D1</issue>
),
<fpage>D590</fpage>
<lpage>D595</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gky962</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lek</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Karczewski</surname>
<given-names>K. J.</given-names>
</name>
<name>
<surname>Minikel</surname>
<given-names>E. V.</given-names>
</name>
<name>
<surname>Samocha</surname>
<given-names>K. E.</given-names>
</name>
<name>
<surname>Banks</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2016</year>
).
<article-title>Analysis of protein-coding genetic variation in 60,706 humans</article-title>
.
<source>Nature</source>
<volume>536</volume>
,
<fpage>285</fpage>
<lpage>291</lpage>
.
<pub-id pub-id-type="doi">10.1038/nature19057</pub-id>
<pub-id pub-id-type="pmid">27535533</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Durbin</surname>
<given-names>R.</given-names>
</name>
</person-group>
(
<year>2009</year>
).
<article-title>Fast and accurate short read alignment with Burrows-Wheeler transform</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
,
<fpage>1754</fpage>
<lpage>1760</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp324</pub-id>
<pub-id pub-id-type="pmid">19451168</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Handsaker</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Wysoker</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fennell</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ruan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Homer</surname>
<given-names>N.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2009</year>
).
<article-title>The sequence alignment/map format and SAMtools</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
,
<fpage>2078</fpage>
<lpage>2079</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
<pub-id pub-id-type="pmid">19505943</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Tharakan</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>S. L.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>Deletion of Cdc42 in embryonic cardiomyocytes results in right ventricle hypoplasia</article-title>
.
<source>Clin. Transl. Med.</source>
<volume>6</volume>
,
<fpage>40</fpage>
.
<pub-id pub-id-type="doi">10.1186/s40169-017-0171-4</pub-id>
<pub-id pub-id-type="pmid">29101495</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>MacArthur</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bowler</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Cerezo</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Gil</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Hall</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hastings</surname>
<given-names>E.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog)</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>45</volume>
,
<fpage>D896</fpage>
<lpage>D901</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkw1133</pub-id>
<pub-id pub-id-type="pmid">27899670</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marchese</surname>
<given-names>F. P.</given-names>
</name>
<name>
<surname>Raimondi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Huarte</surname>
<given-names>M.</given-names>
</name>
</person-group>
(
<year>2017</year>
).
<article-title>The multidimensional mechanisms of long noncoding RNA function</article-title>
.
<source>Genome Biol.</source>
<volume>18</volume>
,
<fpage>206</fpage>
.
<pub-id pub-id-type="doi">10.1186/s13059-017-1348-2</pub-id>
<pub-id pub-id-type="pmid">29084573</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marouli</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Graff</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Medina-Gomez</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Lo</surname>
<given-names>K. S.</given-names>
</name>
<name>
<surname>Wood</surname>
<given-names>A. R.</given-names>
</name>
<name>
<surname>Kjaer</surname>
<given-names>T. R.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>Rare and low-frequency coding variants alter human adult height</article-title>
.
<source>Nature</source>
<volume>542</volume>
,
<fpage>186</fpage>
<lpage>190</lpage>
.
<pub-id pub-id-type="doi">10.1038/nature21039</pub-id>
<pub-id pub-id-type="pmid">28146470</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mortazavi</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>B. A.</given-names>
</name>
<name>
<surname>McCue</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Schaeffer</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wold</surname>
<given-names>B.</given-names>
</name>
</person-group>
(
<year>2008</year>
).
<article-title>Mapping and quantifying mammalian transcriptomes by RNA-Seq</article-title>
.
<source>Nat. Methods</source>
<volume>5</volume>
,
<fpage>621</fpage>
<lpage>628</lpage>
.
<pub-id pub-id-type="doi">10.1038/nmeth.1226</pub-id>
<pub-id pub-id-type="pmid">18516045</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Peach</surname>
<given-names>R. J.</given-names>
</name>
<name>
<surname>Bajorath</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Naemura</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Leytze</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Greene</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Aruffo</surname>
<given-names>A.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>1995</year>
).
<article-title>Both extracellular immunoglobin-like domains of CD80 contain residues critical for binding T cell surface receptors CTLA-4 and CD28</article-title>
.
<source>J. Biol. Chem.</source>
<volume>270</volume>
,
<fpage>21181</fpage>
<lpage>21187</lpage>
.
<pub-id pub-id-type="doi">10.1074/jbc.270.36.21181</pub-id>
<pub-id pub-id-type="pmid">7545666</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Razin</surname>
<given-names>S. V.</given-names>
</name>
<name>
<surname>Gavrilov</surname>
<given-names>A. A.</given-names>
</name>
<name>
<surname>Ioudinkova</surname>
<given-names>E. S.</given-names>
</name>
<name>
<surname>Iarovaia</surname>
<given-names>O. V.</given-names>
</name>
</person-group>
(
<year>2013</year>
).
<article-title>Communication of genome regulatory elements in a folded chromosome</article-title>
.
<source>FEBS Lett.</source>
<volume>587</volume>
,
<fpage>1840</fpage>
<lpage>1847</lpage>
.
<pub-id pub-id-type="doi">10.1016/j.febslet.2013.04.027</pub-id>
<pub-id pub-id-type="pmid">23651551</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Serpytis</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Karvelyte</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Serpytis</surname>
<given-names>R.</given-names>
</name>
</person-group>
(
<year>2015</year>
).
<article-title>Post-infarction ventricular septal defect: risk factors and early outcomes</article-title>
.
<source>Hellenic J. Cardiol.</source>
<volume>56</volume>
,
<fpage>66</fpage>
<lpage>71</lpage>
.
<pub-id pub-id-type="pmid">25701974</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sim</surname>
<given-names>N.-L.</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Hu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Henikoff</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Schneider</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Ng</surname>
<given-names>P. C.</given-names>
</name>
</person-group>
(
<year>2012</year>
).
<article-title>SIFT web server: predicting effects of amino acid substitutions on proteins</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>40</volume>
,
<fpage>W452</fpage>
<lpage>W457</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gks539</pub-id>
<pub-id pub-id-type="pmid">22689647</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Spicer</surname>
<given-names>D. E.</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>H. H.</given-names>
</name>
<name>
<surname>Co-Vu</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Anderson</surname>
<given-names>R. H.</given-names>
</name>
<name>
<surname>Fricker</surname>
<given-names>F. J.</given-names>
</name>
</person-group>
(
<year>2014</year>
).
<article-title>Ventricular septal defect</article-title>
.
<source>Orphanet J. Rare Dis.</source>
<volume>9</volume>
,
<elocation-id>144</elocation-id>
.
<pub-id pub-id-type="doi">10.1186/s13023-014-0144-2</pub-id>
<pub-id pub-id-type="pmid">25523232</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stenson</surname>
<given-names>P. D.</given-names>
</name>
<name>
<surname>Mort</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>E. V.</given-names>
</name>
<name>
<surname>Evans</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Hayden</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Heywood</surname>
<given-names>S.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies</article-title>
.
<source>Hum. Genet.</source>
<volume>136</volume>
,
<fpage>665</fpage>
<lpage>677</lpage>
.
<pub-id pub-id-type="doi">10.1007/s00439-017-1779-6</pub-id>
<pub-id pub-id-type="pmid">28349240</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Szklarczyk</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Morris</surname>
<given-names>J. H.</given-names>
</name>
<name>
<surname>Cook</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kuhn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Wyder</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Simonovic</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2017</year>
).
<article-title>The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>45</volume>
,
<fpage>D362</fpage>
<lpage>D368</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkw937</pub-id>
<pub-id pub-id-type="pmid">27924014</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<collab>The 1000 Genomes Project Consortium</collab>
<name>
<surname>Auton</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Abecasis</surname>
<given-names>G. R.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2015</year>
).
<article-title>A global reference for human genetic variation</article-title>
.
<source>Nature</source>
<volume>526</volume>
,
<fpage>68</fpage>
<lpage>74</lpage>
.
<pub-id pub-id-type="doi">10.1038/nature15393</pub-id>
<pub-id pub-id-type="pmid">26432245</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<collab>The Gene Ontology Consortium</collab>
</person-group>
(
<year>2017</year>
).
<article-title>Expansion of the Gene Ontology knowledgebase and resources</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>45</volume>
,
<fpage>D331</fpage>
<lpage>D338</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkw1108</pub-id>
<pub-id pub-id-type="pmid">27899567</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Volders</surname>
<given-names>P.-J.</given-names>
</name>
<name>
<surname>Anckaert</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Verheggen</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Nuytens</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Martens</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Mestdagh</surname>
<given-names>P.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2018</year>
).
<article-title>LNCipedia 5: towards a reference set of human long non-coding RNAs</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>47</volume>
(
<issue>D1</issue>
),
<fpage>D135</fpage>
<lpage>D139</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gky1031</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hakonarson</surname>
<given-names>H.</given-names>
</name>
</person-group>
(
<year>2010</year>
).
<article-title>ANNOVAR: functional annotation of genetic variants from next-generation sequencing data</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>38</volume>
,
<fpage>e164</fpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkq603</pub-id>
<pub-id pub-id-type="pmid">20601685</pub-id>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wei</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Zhan</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Zhong</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2015</year>
).
<article-title>A Bayesian framework for
<italic>de novo</italic>
mutation calling in parents-offspring trios</article-title>
.
<source>Bioinformatics</source>
<volume>31</volume>
,
<fpage>1375</fpage>
<lpage>1381</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btu839</pub-id>
<pub-id pub-id-type="pmid">25535243</pub-id>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wood</surname>
<given-names>A. R.</given-names>
</name>
<name>
<surname>Esko</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Vedantam</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pers</surname>
<given-names>T. H.</given-names>
</name>
<name>
<surname>Gustafsson</surname>
<given-names>S.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2014</year>
).
<article-title>Defining the role of common variation in the genomic and biological architecture of adult human height</article-title>
.
<source>Nat. Genet.</source>
<volume>46</volume>
,
<fpage>1173</fpage>
<lpage>1186</lpage>
.
<pub-id pub-id-type="doi">10.1038/ng.3097</pub-id>
<pub-id pub-id-type="pmid">25282103</pub-id>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>J.</given-names>
</name>
</person-group>
(
<year>2017</year>
).
<article-title>MapReduce for accurate error correction of next-generation sequencing data</article-title>
.
<source>Bioinformatics</source>
<volume>33</volume>
,
<fpage>3844</fpage>
<lpage>3851</lpage>
.
<pub-id pub-id-type="doi">10.1093/bioinformatics/btx089</pub-id>
<pub-id pub-id-type="pmid">28205674</pub-id>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Xie</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bai</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2018</year>
).
<article-title>Mining statistically-solid k-mers for accurate NGS error correction</article-title>
.
<source>BMC Genomics</source>
<volume>19</volume>
,
<fpage>912</fpage>
.
<pub-id pub-id-type="doi">10.1186/s12864-018-5272-y</pub-id>
<pub-id pub-id-type="pmid">30598110</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhao</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kang</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Hao</surname>
<given-names>Y.</given-names>
</name>
<etal></etal>
</person-group>
(
<year>2016</year>
).
<article-title>NONCODE 2016: an informative and valuable data source of long non-coding RNAs</article-title>
.
<source>Nucleic Acids Res.</source>
<volume>44</volume>
,
<fpage>D203</fpage>
<lpage>D208</lpage>
.
<pub-id pub-id-type="doi">10.1093/nar/gkv1252</pub-id>
<pub-id pub-id-type="pmid">26586799</pub-id>
</mixed-citation>
</ref>
</ref-list>
<app-group>
<app id="app1">
<title>Appendix A</title>
<p>The procedures for
<italic>Encoding</italic>
,
<italic>Decoding</italic>
and error
<italic>Correction</italic>
are shown below.</p>
<table-wrap id="d35e5361" position="anchor">
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td valign="top" rowspan="1" colspan="1">
<bold>Function</bold>
<italic>Encoding</italic>
(
<italic>K</italic>
,
<italic>H</italic>
):
<break></break>
 B = Ø,
<italic>f
<sub>max</sub>
</italic>
← max (
<italic>F</italic>
(
<italic>K</italic>
)),
<italic>h</italic>
← |Binary (
<italic>f
<sub>max</sub>
</italic>
)|
<break></break>
<bold>while</bold>
<italic>K ≠ </italic>
Ø
<bold>do</bold>
<break></break>
  initialize new
<italic>B
<sup>+</sup>
</italic>
,
<italic>B
<sup>−</sup>
</italic>
and
<italic>K</italic>
ʹ
<break></break>
  
<bold>for</bold>
κ in
<italic>K</italic>
<bold>do</bold>
<break></break>
   flag ← False, freq ← Binary(
<italic>f</italic>
<sub>κ</sub>
)
<break></break>
<bold>*</bold>
  Roll-back point
<break></break>
   
<bold>for</bold>
<italic>i</italic>
← 0 to
<italic>h</italic>
– 1
<bold>do</bold>
<break></break>
    j ← H
<sub>i</sub>
(
<italic>s</italic>
<sub>κ</sub>
)
<break></break>
    
<bold>if</bold>
<italic>B
<sup>+</sup>
</italic>
[
<italic>j</italic>
] == 1
<bold>and</bold>
<italic>B</italic>
<sup>−</sup>
[
<italic>j</italic>
] ≠ freq[i]
<bold>then</bold>
<break></break>
     flag ← True
<break></break>
     
<italic>K</italic>
ʹ ←
<italic>K</italic>
ʹ ∪{κ}
<break></break>
    
<bold>else</bold>
<break></break>
     
<italic>B</italic>
<sup>+</sup>
[
<italic>j</italic>
] ← 1,
<italic>B</italic>
<sup>−</sup>
[
<italic>j</italic>
] ← freq[i]
<break></break>
    
<bold>if</bold>
<italic>flag</italic>
==
<italic>True</italic>
<bold>then</bold>
<break></break>
     roll back
<italic>B</italic>
to the point *
<break></break>
     break;
<break></break>
  
<italic>B</italic>
←
<italic>B</italic>
∪ {(
<italic>B</italic>
<sup>+</sup>
,
<italic>B</italic>
<sup>−</sup>
)},
<italic>K</italic>
←
<italic>K</italic>
ʹ
<break></break>
<bold>return</bold>
<italic>B</italic>
<break></break>
<bold>Function</bold>
<italic>Decoding</italic>
<italic>(B, H, s
<sub>κ</sub>
</italic>
<italic>)</italic>
<bold>:</bold>
<break></break>
<italic>h</italic>
← |
<italic>H</italic>
|
<break></break>
<bold>for</bold>
<italic>i</italic>
← 0 to
<italic>h</italic>
– 1
<bold>do</bold>
<break></break>
  
<bold>if</bold>
<italic>B</italic>
<sup>+</sup>
[
<italic>H
<sub>i</sub>
</italic>
(
<italic>s
<sub>κ</sub>
</italic>
)] ==
<italic>0</italic>
<bold>then</bold>
<break></break>
   
<bold>return</bold>
False
<break></break>
<bold>for</bold>
<italic>i</italic>
← 0 to
<italic>h</italic>
– 1
<bold>do</bold>
<break></break>
  
<italic>b
<sub>i</sub>
</italic>
←
<italic>B</italic>
<sup>−</sup>
[
<italic>H
<sub>i</sub>
</italic>
(
<italic>s
<sub>κ</sub>
</italic>
)]
<break></break>
  val ← Denary (
<italic>b</italic>
<sub>0</sub>
<italic>b</italic>
<sub>1</sub>
…
<italic>b</italic>
<sub>
<italic>h</italic>
 – 1</sub>
)
<break></break>
<bold>return</bold>
val
<break></break>
<bold>Function</bold>
<italic>Correction</italic>
(<italic>B</italic>, <italic>H</italic>, <italic>K</italic>ʹ)
<bold>:</bold>
<break></break>
<bold>for</bold>
κ in
<italic>K</italic>
ʹ
<bold>do</bold>
<break></break>
  
<italic>ν</italic>
<sub>κ</sub>
← Decoding (
<italic>B</italic>
,
<italic>H</italic>
,
<italic>s</italic>
<sub>κ</sub>
)
<break></break>
  
<italic>N</italic>
<sub>κ</sub>
← {
<italic>ν</italic>
<sub>κ</sub>
}
<break></break>
  
<bold>for</bold>
<italic>i</italic>
← 1 to
<italic>k</italic>
<bold>do</bold>
<break></break>
   
<bold>for</bold>
<italic>x</italic>
in {
<italic>A, C, G, T</italic>
}
<bold>do</bold>
<break></break>
    
<bold>if</bold>
<italic>s</italic>
<sub>κ</sub>
[
<italic>i</italic>
] ≠
<italic>x</italic>
<bold>then</bold>
<break></break>
     
<italic>s</italic>
<sub>κʹ</sub>
←
<italic>s</italic>
<sub>κ</sub>
[1:(<italic>i</italic> – 1)]
·
<italic>x</italic>
·
<italic>s</italic>
<sub>κ</sub>
[(<italic>i</italic> + 1):<italic>k</italic>]
<break></break>
     
<italic>ν</italic>
<sub>κʹ</sub>
← Decoding (
<italic>B</italic>
,
<italic>H</italic>
,
<italic>s</italic>
<sub>κʹ</sub>
)
<break></break>
  
<italic>z</italic>
<sub>κ</sub>
← (
<italic>ν</italic>
<sub>κ</sub>
– mean (
<italic>N</italic>
<sub>κ</sub>
))/std(
<italic>N</italic>
<sub>κ</sub>
)
<break></break>
  
<bold>if</bold>
<bold>not</bold>
<italic>z</italic>
<sub>κ</sub>
>
<italic>z</italic>
<sub>0</sub>
<bold>and</bold>
<italic>ν</italic>
<sub>κ</sub>
&lt;
<italic>f</italic>
<sub>0</sub>
<bold>then</bold>
<break></break>
  
<inline-formula>
<mml:math id="M20">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>←</mml:mo>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
<mml:mo>∪</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>κ</mml:mi>
<mml:mo>}</mml:mo>
</mml:mrow>
</mml:math>
</inline-formula>
<break></break>
<bold>return</bold>
<inline-formula>
<mml:math id="M21">
<mml:mrow>
<mml:msub>
<mml:msup>
<mml:mi>K</mml:mi>
<mml:mo>′</mml:mo>
</mml:msup>
<mml:mi>c</mml:mi>
</mml:msub>
</mml:mrow>
</mml:math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</app>
</app-group>
</back>
</pmc>
</record>
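A minimal Python sketch of the Appendix A procedures follows. It is offered as one possible reading, not as the authors' implementation. Every name in it (`make_hashes`, `encode`, `decode_layer`, `decode`, `correct`, the segment size `m`) is hypothetical, and several choices are assumptions. The hash family H_i is blake2b-based and partitioned so that each bit index i has its own segment, which guarantees that a k-mer never conflicts with itself. `None` stands in for the pseudocode's `False`, and a k-mer absent from every layer counts as 0. `decode` scans the layers in order. Each substitution neighbour's count is added to N_κ, because the printed pseudocode initialises N_κ but does not show it growing. The standard deviation is the population one. Finally, `not z_κ > z_0 and ν_κ < f_0` is read with `not` binding only to the first comparison.

```python
import hashlib
import statistics


def make_hashes(h, m):
    """Hypothetical hash family H_0..H_{h-1}: H_i maps a k-mer into segment i
    of an array with h * m slots, so different i never share a slot."""
    def make(i):
        def H(s):
            d = hashlib.blake2b(f"{i}:{s}".encode(), digest_size=8).digest()
            return i * m + int.from_bytes(d, "big") % m
        return H
    return [make(i) for i in range(h)]


def encode(counts, m, hash_factory=make_hashes):
    """Encoding(K, H): pack k-mer counts into layers of (B+, B-) bit lists."""
    h = max(counts.values()).bit_length()        # h <- |Binary(f_max)|
    hashes = hash_factory(h, m)
    layers, pending = [], dict(counts)
    while pending:                               # while K is not empty
        plus, minus, deferred = [0] * (h * m), [0] * (h * m), {}
        for kmer, f in pending.items():
            freq = format(f, f"0{h}b")            # Binary(f_k), padded to h bits
            written = []                          # roll-back point *: slots newly set
            for i, H in enumerate(hashes):
                j = H(kmer)
                if plus[j] == 1 and minus[j] != int(freq[i]):
                    for p in written:             # conflict: undo this k-mer's writes
                        plus[p] = minus[p] = 0
                    deferred[kmer] = f            # K' <- K' U {k}
                    break
                if plus[j] == 0:                  # an equal bit already set is a no-op
                    plus[j], minus[j] = 1, int(freq[i])
                    written.append(j)
        if len(deferred) == len(pending):         # guard against an endless loop
            raise ValueError("no k-mer fit in a fresh layer; H_i must hit distinct slots")
        layers.append((plus, minus))              # B <- B U {(B+, B-)}
        pending = deferred                        # K <- K'
    return layers, hashes


def decode_layer(layer, hashes, kmer):
    """Decoding(B, H, s_k) on one layer; None plays the role of 'return False'."""
    plus, minus = layer
    slots = [H(kmer) for H in hashes]
    if any(plus[j] == 0 for j in slots):
        return None
    return int("".join(str(minus[j]) for j in slots), 2)  # Denary(b_0 ... b_{h-1})


def decode(layers, hashes, kmer):
    """Assumption: first layer that holds the k-mer wins; absent k-mers count 0."""
    for layer in layers:
        v = decode_layer(layer, hashes, kmer)
        if v is not None:
            return v
    return 0


def correct(layers, hashes, candidates, z0, f0):
    """Correction(B, H, K'): collect the k-mers that meet the test into K'_c."""
    selected = set()
    for kmer in candidates:
        v = decode(layers, hashes, kmer)
        neighbourhood = [v]                       # N_k <- {v_k}
        for i in range(len(kmer)):
            for x in "ACGT":
                if kmer[i] != x:                  # one-substitution neighbour
                    nb = kmer[:i] + x + kmer[i + 1:]
                    neighbourhood.append(decode(layers, hashes, nb))  # assumed N_k update
        sd = statistics.pstdev(neighbourhood)
        z = (v - statistics.mean(neighbourhood)) / sd if sd > 0 else 0.0
        if not z > z0 and v < f0:                 # read as (not z > z0) and (v < f0)
            selected.add(kmer)
    return selected


if __name__ == "__main__":
    counts = {"ACGTA": 40, "ACGTT": 2, "CCGTA": 1}  # toy counts, not real data
    layers, hashes = encode(counts, m=256)
    print(len(layers), [decode(layers, hashes, k) for k in counts])
    print(correct(layers, hashes, counts, z0=1.0, f0=5))
```

One caveat applies to this layer-scanning `decode`. A k-mer deferred from layer l can still find all of its presence bits set in layer l by other k-mers, and it will then read a wrong count there, much like a Bloom-filter false positive. The appendix does not say how layers are searched, so treat that part of the sketch as a guess.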

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000C71  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000C71  | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021