MERS exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding

Internal identifier: 000910 (Pmc/Checkpoint); previous: 000909; next: 000911

Authors: Xu Min [People's Republic of China]; Wanwen Zeng [People's Republic of China]; Ning Chen [People's Republic of China]; Ting Chen [People's Republic of China, United States]; Rui Jiang [People's Republic of China]

Source: Bioinformatics, 2017

RBID : PMC:5870572

Abstract

Motivation

Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning.

Results

We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility.
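The first step described above (splitting sequences into k-mers and collecting their co-occurrence statistics) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the value of `k`, the stride of 1 and the symmetric context `window` are all assumptions made for the example, and the embedding training that would consume these counts is not shown.

```python
from collections import Counter


def split_kmers(seq, k=6, stride=1):
    """Return the k-mers of seq in order (k and stride are illustrative choices)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def cooccurrence_counts(kmers, window=2):
    """Count unordered k-mer pairs that lie within `window` positions of each other."""
    counts = Counter()
    for i, center in enumerate(kmers):
        # Look only forward; sorting the pair makes the count symmetric.
        for j in range(i + 1, min(i + window + 1, len(kmers))):
            pair = tuple(sorted((center, kmers[j])))
            counts[pair] += 1
    return counts


# Toy sequence, hypothetical parameters.
kmers = split_kmers("ACGTAC", k=3)
counts = cooccurrence_counts(kmers, window=1)
```

Counts accumulated this way over a whole corpus of sequences form the kind of co-occurrence matrix from which an unsupervised method could learn one vector per k-mer.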

Availability and implementation

The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm.

Supplementary information

Supplementary materials are available at Bioinformatics online.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870572
DOI: 10.1093/bioinformatics/btx234
PubMed: 28881969
PubMed Central: 5870572


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Chromatin accessibility prediction via convolutional long short-term memory networks with
<italic>k</italic>
-mer embedding</title>
<author>
<name sortKey="Min, Xu" sort="Min, Xu" uniqKey="Min X" first="Xu" last="Min">Xu Min</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zeng, Wanwen" sort="Zeng, Wanwen" uniqKey="Zeng W" first="Wanwen" last="Zeng">Wanwen Zeng</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff3">Department of Automation, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Automation, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chen, Ning" sort="Chen, Ning" uniqKey="Chen N" first="Ning" last="Chen">Ning Chen</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chen, Ting" sort="Chen, Ting" uniqKey="Chen T" first="Ting" last="Chen">Ting Chen</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<nlm:aff id="btx234-aff4">Program in Computational Biology and Bioinformatics, University of Southern California, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Program in Computational Biology and Bioinformatics, University of Southern California, CA</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
<settlement type="city">Los Angeles</settlement>
</placeName>
<orgName type="university">Université de Californie du Sud</orgName>
</affiliation>
</author>
<author>
<name sortKey="Jiang, Rui" sort="Jiang, Rui" uniqKey="Jiang R" first="Rui" last="Jiang">Rui Jiang</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff3">Department of Automation, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Automation, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28881969</idno>
<idno type="pmc">5870572</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870572</idno>
<idno type="RBID">PMC:5870572</idno>
<idno type="doi">10.1093/bioinformatics/btx234</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000B18</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B18</idno>
<idno type="wicri:Area/Pmc/Curation">000B18</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B18</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000910</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000910</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Chromatin accessibility prediction via convolutional long short-term memory networks with
<italic>k</italic>
-mer embedding</title>
<author>
<name sortKey="Min, Xu" sort="Min, Xu" uniqKey="Min X" first="Xu" last="Min">Xu Min</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zeng, Wanwen" sort="Zeng, Wanwen" uniqKey="Zeng W" first="Wanwen" last="Zeng">Wanwen Zeng</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff3">Department of Automation, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Automation, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chen, Ning" sort="Chen, Ning" uniqKey="Chen N" first="Ning" last="Chen">Ning Chen</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chen, Ting" sort="Chen, Ting" uniqKey="Chen T" first="Ting" last="Chen">Ting Chen</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff2">Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="4">
<nlm:aff id="btx234-aff4">Program in Computational Biology and Bioinformatics, University of Southern California, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Program in Computational Biology and Bioinformatics, University of Southern California, CA</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
<settlement type="city">Los Angeles</settlement>
</placeName>
<orgName type="university">Université de Californie du Sud</orgName>
</affiliation>
</author>
<author>
<name sortKey="Jiang, Rui" sort="Jiang, Rui" uniqKey="Jiang R" first="Rui" last="Jiang">Rui Jiang</name>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff1">MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<nlm:aff id="btx234-aff3">Department of Automation, Tsinghua University, Beijing, China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Automation, Tsinghua University, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<sec id="SA1">
<title>Motivation</title>
<p>Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted
<italic>k</italic>
-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful
<italic>k</italic>
-mer co-occurrence information with recent advances in deep learning.</p>
</sec>
<sec id="SA2">
<title>Results</title>
<p>We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with
<italic>k</italic>
-mer embedding. We first split DNA sequences into
<italic>k</italic>
-mers and pre-train
<italic>k</italic>
-mer embedding vectors based on the co-occurrence matrix of
<italic>k</italic>
-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that
<italic>k</italic>
-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility.</p>
</sec>
<sec id="SA3">
<title>Availability and implementation</title>
<p>The source code can be downloaded from
<ext-link ext-link-type="uri" xlink:href="https://github.com/minxueric/ismb2017_lstm">https://github.com/minxueric/ismb2017_lstm</ext-link>
.</p>
</sec>
<sec id="SA4">
<title>Supplementary information</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary materials</xref>
are available at
<italic>Bioinformatics</italic>
online.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Alipanahi, B" uniqKey="Alipanahi B">B. Alipanahi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bengio, Y" uniqKey="Bengio Y">Y. Bengio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bengio, Y" uniqKey="Bengio Y">Y. Bengio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chollet, F" uniqKey="Chollet F">F. Chollet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Consortium, E P" uniqKey="Consortium E">E.P. Consortium</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crawford, G E" uniqKey="Crawford G">G.E. Crawford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Duchi, J" uniqKey="Duchi J">J. Duchi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghandi, M" uniqKey="Ghandi M">M. Ghandi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harris, Z S" uniqKey="Harris Z">Z.S. Harris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="He, K" uniqKey="He K">K. He</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hinton, G E" uniqKey="Hinton G">G.E. Hinton</name>
</author>
<author>
<name sortKey="Salakhutdinov, R R" uniqKey="Salakhutdinov R">R.R. Salakhutdinov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hochreiter, S" uniqKey="Hochreiter S">S. Hochreiter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hochreiter, S" uniqKey="Hochreiter S">S. Hochreiter</name>
</author>
<author>
<name sortKey="Schmidhuber, J" uniqKey="Schmidhuber J">J. Schmidhuber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="John, S" uniqKey="John S">S. John</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kelley, D R" uniqKey="Kelley D">D.R. Kelley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Y" uniqKey="Kim Y">Y. Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Krizhevsky, A" uniqKey="Krizhevsky A">A. Krizhevsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le, Q V" uniqKey="Le Q">Q.V. Le</name>
</author>
<author>
<name sortKey="Mikolov, T" uniqKey="Mikolov T">T. Mikolov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, D" uniqKey="Lee D">D. Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luong, M T" uniqKey="Luong M">M.-T. Luong</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maaten, L V D" uniqKey="Maaten L">L. v d. Maaten</name>
</author>
<author>
<name sortKey="Hinton, G" uniqKey="Hinton G">G. Hinton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mikolov, T" uniqKey="Mikolov T">T. Mikolov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Min, X" uniqKey="Min X">X. Min</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niwa, H" uniqKey="Niwa H">H. Niwa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pennington, J" uniqKey="Pennington J">J. Pennington</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="S Nderby, S K" uniqKey="S Nderby S">S.K. Sønderby</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tai, K S" uniqKey="Tai K">K.S. Tai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tieleman, T" uniqKey="Tieleman T">T. Tieleman</name>
</author>
<author>
<name sortKey="Hinton, G" uniqKey="Hinton G">G. Hinton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vierstra, J" uniqKey="Vierstra J">J. Vierstra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, D R" uniqKey="Wilson D">D.R. Wilson</name>
</author>
<author>
<name sortKey="Martinez, T R" uniqKey="Martinez T">T.R. Martinez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zeng, H" uniqKey="Zeng H">H. Zeng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, J" uniqKey="Zhou J">J. Zhou</name>
</author>
<author>
<name sortKey="Troyanskaya, O G" uniqKey="Troyanskaya O">O.G. Troyanskaya</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">Bioinformatics</journal-id>
<journal-id journal-id-type="publisher-id">bioinformatics</journal-id>
<journal-title-group>
<journal-title>Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="ppub">1367-4803</issn>
<issn pub-type="epub">1367-4811</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28881969</article-id>
<article-id pub-id-type="pmc">5870572</article-id>
<article-id pub-id-type="doi">10.1093/bioinformatics/btx234</article-id>
<article-id pub-id-type="publisher-id">btx234</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017</subject>
<subj-group subj-group-type="category-toc-heading">
<subject>Hitseq</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Chromatin accessibility prediction via convolutional long short-term memory networks with
<italic>k</italic>
-mer embedding</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Min</surname>
<given-names>Xu</given-names>
</name>
<xref ref-type="aff" rid="btx234-aff1">1</xref>
<xref ref-type="aff" rid="btx234-aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Zeng</surname>
<given-names>Wanwen</given-names>
</name>
<xref ref-type="aff" rid="btx234-aff1">1</xref>
<xref ref-type="aff" rid="btx234-aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Ning</given-names>
</name>
<xref ref-type="aff" rid="btx234-aff1">1</xref>
<xref ref-type="aff" rid="btx234-aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chen</surname>
<given-names>Ting</given-names>
</name>
<xref ref-type="aff" rid="btx234-aff1">1</xref>
<xref ref-type="aff" rid="btx234-aff2">2</xref>
<xref ref-type="aff" rid="btx234-aff4">4</xref>
<xref ref-type="corresp" rid="btx234-cor1"></xref>
<pmc-comment>tingchen@tsinghua.edu.cn</pmc-comment>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Jiang</surname>
<given-names>Rui</given-names>
</name>
<xref ref-type="aff" rid="btx234-aff1">1</xref>
<xref ref-type="aff" rid="btx234-aff3">3</xref>
<xref ref-type="corresp" rid="btx234-cor1"></xref>
<pmc-comment>ruijiang@tsinghua.edu.cn</pmc-comment>
</contrib>
</contrib-group>
<aff id="btx234-aff1">
<label>1</label>
MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing, China</aff>
<aff id="btx234-aff2">
<label>2</label>
Department of Computer Science and Technology, State Key Lab of Intelligent Technology and Systems, Tsinghua University, Beijing, China</aff>
<aff id="btx234-aff3">
<label>3</label>
Department of Automation, Tsinghua University, Beijing, China</aff>
<aff id="btx234-aff4">
<label>4</label>
Program in Computational Biology and Bioinformatics, University of Southern California, CA, USA</aff>
<author-notes>
<corresp id="btx234-cor1">To whom correspondence should be addressed. Email:
<email>tingchen@tsinghua.edu.cn</email>
or
<email>ruijiang@tsinghua.edu.cn</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>15</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2017-07-12">
<day>12</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>12</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>33</volume>
<issue>14</issue>
<fpage>i92</fpage>
<lpage>i101</lpage>
<permissions>
<copyright-statement>© The Author 2017. Published by Oxford University Press.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="cc-by-nc" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>
), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com</license-p>
</license>
</permissions>
<self-uri xlink:href="btx234.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<sec id="SA1">
<title>Motivation</title>
<p>Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted
<italic>k</italic>
-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful
<italic>k</italic>
-mer co-occurrence information with recent advances in deep learning.</p>
</sec>
<sec id="SA2">
<title>Results</title>
<p>We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with
<italic>k</italic>
-mer embedding. We first split DNA sequences into
<italic>k</italic>
-mers and pre-train
<italic>k</italic>
-mer embedding vectors based on the co-occurrence matrix of
<italic>k</italic>
-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that
<italic>k</italic>
-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility.</p>
</sec>
<sec id="SA3">
<title>Availability and implementation</title>
<p>The source code can be downloaded from
<ext-link ext-link-type="uri" xlink:href="https://github.com/minxueric/ismb2017_lstm">https://github.com/minxueric/ismb2017_lstm</ext-link>
.</p>
</sec>
<sec id="SA4">
<title>Supplementary information</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary materials</xref>
are available at
<italic>Bioinformatics</italic>
online.</p>
</sec>
</abstract>
<funding-group>
<award-group award-type="grant">
<funding-source>
<named-content content-type="funder-name">National Natural Science Foundation of China</named-content>
<named-content content-type="funder-identifier">10.13039/501100001809</named-content>
</funding-source>
<award-id>61573207</award-id>
<award-id>61175002</award-id>
<award-id>71101010</award-id>
<award-id>61673241</award-id>
<award-id>61561146396</award-id>
</award-group>
</funding-group>
<counts>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
</pmc>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
</region>
<settlement>
<li>Los Angeles</li>
<li>Pékin</li>
</settlement>
<orgName>
<li>Université de Californie du Sud</li>
</orgName>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Min, Xu" sort="Min, Xu" uniqKey="Min X" first="Xu" last="Min">Xu Min</name>
</noRegion>
<name sortKey="Chen, Ning" sort="Chen, Ning" uniqKey="Chen N" first="Ning" last="Chen">Ning Chen</name>
<name sortKey="Chen, Ning" sort="Chen, Ning" uniqKey="Chen N" first="Ning" last="Chen">Ning Chen</name>
<name sortKey="Chen, Ting" sort="Chen, Ting" uniqKey="Chen T" first="Ting" last="Chen">Ting Chen</name>
<name sortKey="Chen, Ting" sort="Chen, Ting" uniqKey="Chen T" first="Ting" last="Chen">Ting Chen</name>
<name sortKey="Jiang, Rui" sort="Jiang, Rui" uniqKey="Jiang R" first="Rui" last="Jiang">Rui Jiang</name>
<name sortKey="Jiang, Rui" sort="Jiang, Rui" uniqKey="Jiang R" first="Rui" last="Jiang">Rui Jiang</name>
<name sortKey="Min, Xu" sort="Min, Xu" uniqKey="Min X" first="Xu" last="Min">Xu Min</name>
<name sortKey="Zeng, Wanwen" sort="Zeng, Wanwen" uniqKey="Zeng W" first="Wanwen" last="Zeng">Wanwen Zeng</name>
<name sortKey="Zeng, Wanwen" sort="Zeng, Wanwen" uniqKey="Zeng W" first="Wanwen" last="Zeng">Wanwen Zeng</name>
</country>
<country name="États-Unis">
<region name="Californie">
<name sortKey="Chen, Ting" sort="Chen, Ting" uniqKey="Chen T" first="Ting" last="Chen">Ting Chen</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000910 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd -nk 000910 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Checkpoint
   |type=    RBID
   |clé=     PMC:5870572
   |texte=   Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Checkpoint/RBID.i   -Sk "pubmed:28881969" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021