Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data

Internal identifier: 000073 (Pmc/Corpus); previous: 000072; next: 000074

Authors: Jilong Li; Jie Hou; Lin Sun; Jordan Maximillian Wilkins; Yuan Lu; Chad E. Niederhuth; Benjamin Ryan Merideth; Thomas P. Mawhinney; Valeri V. Mossine; C. Michael Greenlief; John C. Walker; William R. Folk; Mark Hannink; Dennis B. Lubahn; James A. Birchler; Jianlin Cheng

Source: PLoS ONE, 2015

RBID : PMC:4406561

Abstract

RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes in gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from human, mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html.
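As an illustration of the second and third steps (calculating gene expression values and identifying differentially expressed genes), the following is a minimal sketch only, not RNAMiner's implementation. It assumes RPKM as the expression measure and a simple log2 fold-change cutoff as a stand-in for a statistical test; the abstract does not specify either. The gene names, read counts, pseudocount, and threshold are hypothetical.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A; the pseudocount avoids division by zero."""
    return math.log2((expr_b + pseudocount) / (expr_a + pseudocount))


def differential_genes(genes, total_a, total_b, threshold=1.0):
    """Return genes whose absolute log2 fold change reaches the (assumed) threshold."""
    hits = {}
    for name, (length_bp, reads_a, reads_b) in genes.items():
        lfc = log2_fold_change(
            rpkm(reads_a, length_bp, total_a),
            rpkm(reads_b, length_bp, total_b),
        )
        if abs(lfc) >= threshold:
            hits[name] = lfc
    return hits


# Hypothetical toy data: gene -> (length in bp, reads in control, reads in treatment)
toy_genes = {
    "geneA": (2000, 400, 1600),
    "geneB": (1000, 500, 500),
}

hits = differential_genes(toy_genes, total_a=1_000_000, total_b=1_000_000)
for gene, lfc in hits.items():
    print(f"{gene}\tlog2FC={lfc:.2f}")
```

A fold-change cutoff alone ignores replicate variance, so a real analysis would normally rely on a dedicated differential expression tool with a proper statistical model.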


Url:
DOI: 10.1371/journal.pone.0125000
PubMed: 25902288
PubMed Central: 4406561

Links to Exploration step

PMC:4406561

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data</title>
<author>
<name sortKey="Li, Jilong" sort="Li, Jilong" uniqKey="Li J" first="Jilong" last="Li">Jilong Li</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hou, Jie" sort="Hou, Jie" uniqKey="Hou J" first="Jie" last="Hou">Jie Hou</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sun, Lin" sort="Sun, Lin" uniqKey="Sun L" first="Lin" last="Sun">Lin Sun</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wilkins, Jordan Maximillian" sort="Wilkins, Jordan Maximillian" uniqKey="Wilkins J" first="Jordan Maximillian" last="Wilkins">Jordan Maximillian Wilkins</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lu, Yuan" sort="Lu, Yuan" uniqKey="Lu Y" first="Yuan" last="Lu">Yuan Lu</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Niederhuth, Chad E" sort="Niederhuth, Chad E" uniqKey="Niederhuth C" first="Chad E." last="Niederhuth">Chad E. Niederhuth</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Merideth, Benjamin Ryan" sort="Merideth, Benjamin Ryan" uniqKey="Merideth B" first="Benjamin Ryan" last="Merideth">Benjamin Ryan Merideth</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mawhinney, Thomas P" sort="Mawhinney, Thomas P" uniqKey="Mawhinney T" first="Thomas P." last="Mawhinney">Thomas P. Mawhinney</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mossine, Valeri V" sort="Mossine, Valeri V" uniqKey="Mossine V" first="Valeri V." last="Mossine">Valeri V. Mossine</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Greenlief, C Michael" sort="Greenlief, C Michael" uniqKey="Greenlief C" first="C. Michael" last="Greenlief">C. Michael Greenlief</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Walker, John C" sort="Walker, John C" uniqKey="Walker J" first="John C." last="Walker">John C. Walker</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Folk, William R" sort="Folk, William R" uniqKey="Folk W" first="William R." last="Folk">William R. Folk</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hannink, Mark" sort="Hannink, Mark" uniqKey="Hannink M" first="Mark" last="Hannink">Mark Hannink</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lubahn, Dennis B" sort="Lubahn, Dennis B" uniqKey="Lubahn D" first="Dennis B." last="Lubahn">Dennis B. Lubahn</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Birchler, James A" sort="Birchler, James A" uniqKey="Birchler J" first="James A." last="Birchler">James A. Birchler</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheng, Jianlin" sort="Cheng, Jianlin" uniqKey="Cheng J" first="Jianlin" last="Cheng">Jianlin Cheng</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff006">
<addr-line>Informatics Institute, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff007">
<addr-line>C. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">25902288</idno>
<idno type="pmc">4406561</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406561</idno>
<idno type="RBID">PMC:4406561</idno>
<idno type="doi">10.1371/journal.pone.0125000</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Pmc/Corpus">000073</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data</title>
<author>
<name sortKey="Li, Jilong" sort="Li, Jilong" uniqKey="Li J" first="Jilong" last="Li">Jilong Li</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hou, Jie" sort="Hou, Jie" uniqKey="Hou J" first="Jie" last="Hou">Jie Hou</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Sun, Lin" sort="Sun, Lin" uniqKey="Sun L" first="Lin" last="Sun">Lin Sun</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Wilkins, Jordan Maximillian" sort="Wilkins, Jordan Maximillian" uniqKey="Wilkins J" first="Jordan Maximillian" last="Wilkins">Jordan Maximillian Wilkins</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lu, Yuan" sort="Lu, Yuan" uniqKey="Lu Y" first="Yuan" last="Lu">Yuan Lu</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Niederhuth, Chad E" sort="Niederhuth, Chad E" uniqKey="Niederhuth C" first="Chad E." last="Niederhuth">Chad E. Niederhuth</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Merideth, Benjamin Ryan" sort="Merideth, Benjamin Ryan" uniqKey="Merideth B" first="Benjamin Ryan" last="Merideth">Benjamin Ryan Merideth</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mawhinney, Thomas P" sort="Mawhinney, Thomas P" uniqKey="Mawhinney T" first="Thomas P." last="Mawhinney">Thomas P. Mawhinney</name>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mossine, Valeri V" sort="Mossine, Valeri V" uniqKey="Mossine V" first="Valeri V." last="Mossine">Valeri V. Mossine</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Greenlief, C Michael" sort="Greenlief, C Michael" uniqKey="Greenlief C" first="C. Michael" last="Greenlief">C. Michael Greenlief</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff005">
<addr-line>Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Walker, John C" sort="Walker, John C" uniqKey="Walker J" first="John C." last="Walker">John C. Walker</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Folk, William R" sort="Folk, William R" uniqKey="Folk W" first="William R." last="Folk">William R. Folk</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Hannink, Mark" sort="Hannink, Mark" uniqKey="Hannink M" first="Mark" last="Hannink">Mark Hannink</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lubahn, Dennis B" sort="Lubahn, Dennis B" uniqKey="Lubahn D" first="Dennis B." last="Lubahn">Dennis B. Lubahn</name>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff004">
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Birchler, James A" sort="Birchler, James A" uniqKey="Birchler J" first="James A." last="Birchler">James A. Birchler</name>
<affiliation>
<nlm:aff id="aff003">
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Cheng, Jianlin" sort="Cheng, Jianlin" uniqKey="Cheng J" first="Jianlin" last="Cheng">Jianlin Cheng</name>
<affiliation>
<nlm:aff id="aff001">
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff002">
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff006">
<addr-line>Informatics Institute, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff007">
<addr-line>C. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint>
<date when="2015">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes in gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from
<italic>human</italic>
,
<italic>mouse</italic>
,
<italic>Arabidopsis thaliana</italic>
, and
<italic>Drosophila melanogaster</italic>
cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at
<ext-link ext-link-type="uri" xlink:href="http://calla.rnet.missouri.edu/rnaminer/index.html">http://calla.rnet.missouri.edu/rnaminer/index.html</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Fang, Z" uniqKey="Fang Z">Z Fang</name>
</author>
<author>
<name sortKey="Martin, J" uniqKey="Martin J">J Martin</name>
</author>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soneson, C" uniqKey="Soneson C">C Soneson</name>
</author>
<author>
<name sortKey="Delorenzi, M" uniqKey="Delorenzi M">M Delorenzi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Williams, Ba" uniqKey="Williams B">BA Williams</name>
</author>
<author>
<name sortKey="Mccue, K" uniqKey="Mccue K">K McCue</name>
</author>
<author>
<name sortKey="Schaeffer, L" uniqKey="Schaeffer L">L Schaeffer</name>
</author>
<author>
<name sortKey="Wold, B" uniqKey="Wold B">B Wold</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, G" uniqKey="Chen G">G Chen</name>
</author>
<author>
<name sortKey="Wang, C" uniqKey="Wang C">C Wang</name>
</author>
<author>
<name sortKey="Shi, T" uniqKey="Shi T">T Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oshlack, A" uniqKey="Oshlack A">A Oshlack</name>
</author>
<author>
<name sortKey="Robinson, Md" uniqKey="Robinson M">MD Robinson</name>
</author>
<author>
<name sortKey="Young, Md" uniqKey="Young M">MD Young</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L Wang</name>
</author>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Li</name>
</author>
<author>
<name sortKey="Brutnell, Tp" uniqKey="Brutnell T">TP Brutnell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kvam, Vm" uniqKey="Kvam V">VM Kvam</name>
</author>
<author>
<name sortKey="Liu, P" uniqKey="Liu P">P Liu</name>
</author>
<author>
<name sortKey="Si, Y" uniqKey="Si Y">Y Si</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, D" uniqKey="Kim D">D Kim</name>
</author>
<author>
<name sortKey="Pertea, G" uniqKey="Pertea G">G Pertea</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pimentel, H" uniqKey="Pimentel H">H Pimentel</name>
</author>
<author>
<name sortKey="Kelley, R" uniqKey="Kelley R">R Kelley</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Williams, Ba" uniqKey="Williams B">BA Williams</name>
</author>
<author>
<name sortKey="Pertea, G" uniqKey="Pertea G">G Pertea</name>
</author>
<author>
<name sortKey="Mortazavi, A" uniqKey="Mortazavi A">A Mortazavi</name>
</author>
<author>
<name sortKey="Kwan, G" uniqKey="Kwan G">G Kwan</name>
</author>
<author>
<name sortKey="Van Baren, Mj" uniqKey="Van Baren M">MJ van Baren</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Robertson, G" uniqKey="Robertson G">G Robertson</name>
</author>
<author>
<name sortKey="Schein, J" uniqKey="Schein J">J Schein</name>
</author>
<author>
<name sortKey="Chiu, R" uniqKey="Chiu R">R Chiu</name>
</author>
<author>
<name sortKey="Corbett, R" uniqKey="Corbett R">R Corbett</name>
</author>
<author>
<name sortKey="Field, M" uniqKey="Field M">M Field</name>
</author>
<author>
<name sortKey="Jackman, Sd" uniqKey="Jackman S">SD Jackman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Anders, S" uniqKey="Anders S">S Anders</name>
</author>
<author>
<name sortKey="Huber, W" uniqKey="Huber W">W Huber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, L" uniqKey="Sun L">L Sun</name>
</author>
<author>
<name sortKey="Fernandez, Hr" uniqKey="Fernandez H">HR Fernandez</name>
</author>
<author>
<name sortKey="Donohue, Rc" uniqKey="Donohue R">RC Donohue</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
<author>
<name sortKey="Birchler, Ja" uniqKey="Birchler J">JA Birchler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, L" uniqKey="Sun L">L Sun</name>
</author>
<author>
<name sortKey="Johnson, Af" uniqKey="Johnson A">AF Johnson</name>
</author>
<author>
<name sortKey="Donohue, Rc" uniqKey="Donohue R">RC Donohue</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
<author>
<name sortKey="Birchler, Ja" uniqKey="Birchler J">JA Birchler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sun, L" uniqKey="Sun L">L Sun</name>
</author>
<author>
<name sortKey="Johnson, Af" uniqKey="Johnson A">AF Johnson</name>
</author>
<author>
<name sortKey="Li, J" uniqKey="Li J">J Li</name>
</author>
<author>
<name sortKey="Lambdin, As" uniqKey="Lambdin A">AS Lambdin</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
<author>
<name sortKey="Birchler, Ja" uniqKey="Birchler J">JA Birchler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Cao, R" uniqKey="Cao R">R Cao</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Zhang, Xc" uniqKey="Zhang X">XC Zhang</name>
</author>
<author>
<name sortKey="Le, Mh" uniqKey="Le M">MH Le</name>
</author>
<author>
<name sortKey="Xu, D" uniqKey="Xu D">D Xu</name>
</author>
<author>
<name sortKey="Stacey, G" uniqKey="Stacey G">G Stacey</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, M" uniqKey="Zhu M">M Zhu</name>
</author>
<author>
<name sortKey="Deng, X" uniqKey="Deng X">X Deng</name>
</author>
<author>
<name sortKey="Joshi, T" uniqKey="Joshi T">T Joshi</name>
</author>
<author>
<name sortKey="Xu, D" uniqKey="Xu D">D Xu</name>
</author>
<author>
<name sortKey="Stacey, G" uniqKey="Stacey G">G Stacey</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhu, M" uniqKey="Zhu M">M Zhu</name>
</author>
<author>
<name sortKey="Dahmen, Jl" uniqKey="Dahmen J">JL Dahmen</name>
</author>
<author>
<name sortKey="Stacey, G" uniqKey="Stacey G">G Stacey</name>
</author>
<author>
<name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Giardine, B" uniqKey="Giardine B">B Giardine</name>
</author>
<author>
<name sortKey="Riemer, C" uniqKey="Riemer C">C Riemer</name>
</author>
<author>
<name sortKey="Hardison, Rc" uniqKey="Hardison R">RC Hardison</name>
</author>
<author>
<name sortKey="Burhans, R" uniqKey="Burhans R">R Burhans</name>
</author>
<author>
<name sortKey="Elnitski, L" uniqKey="Elnitski L">L Elnitski</name>
</author>
<author>
<name sortKey="Shah, P" uniqKey="Shah P">P Shah</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goff, Sa" uniqKey="Goff S">SA Goff</name>
</author>
<author>
<name sortKey="Vaughn, M" uniqKey="Vaughn M">M Vaughn</name>
</author>
<author>
<name sortKey="Mckay, S" uniqKey="Mckay S">S McKay</name>
</author>
<author>
<name sortKey="Lyons, E" uniqKey="Lyons E">E Lyons</name>
</author>
<author>
<name sortKey="Stapleton, Ae" uniqKey="Stapleton A">AE Stapleton</name>
</author>
<author>
<name sortKey="Gessler, D" uniqKey="Gessler D">D Gessler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pachter, L" uniqKey="Pachter L">L Pachter</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Td" uniqKey="Wu T">TD Wu</name>
</author>
<author>
<name sortKey="Nacu, S" uniqKey="Nacu S">S Nacu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karolchik, D" uniqKey="Karolchik D">D Karolchik</name>
</author>
<author>
<name sortKey="Baertsch, R" uniqKey="Baertsch R">R Baertsch</name>
</author>
<author>
<name sortKey="Diekhans, M" uniqKey="Diekhans M">M Diekhans</name>
</author>
<author>
<name sortKey="Furey, Ts" uniqKey="Furey T">TS Furey</name>
</author>
<author>
<name sortKey="Hinrichs, A" uniqKey="Hinrichs A">A Hinrichs</name>
</author>
<author>
<name sortKey="Lu, Yt" uniqKey="Lu Y">YT Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pruitt, Kd" uniqKey="Pruitt K">KD Pruitt</name>
</author>
<author>
<name sortKey="Tatusova, T" uniqKey="Tatusova T">T Tatusova</name>
</author>
<author>
<name sortKey="Maglott, Dr" uniqKey="Maglott D">DR Maglott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Radivojac, P" uniqKey="Radivojac P">P Radivojac</name>
</author>
<author>
<name sortKey="Clark, Wt" uniqKey="Clark W">WT Clark</name>
</author>
<author>
<name sortKey="Oron, Tr" uniqKey="Oron T">TR Oron</name>
</author>
<author>
<name sortKey="Schnoes, Am" uniqKey="Schnoes A">AM Schnoes</name>
</author>
<author>
<name sortKey="Wittkop, T" uniqKey="Wittkop T">T Wittkop</name>
</author>
<author>
<name sortKey="Sokolov, A" uniqKey="Sokolov A">A Sokolov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author>
<name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author>
<name sortKey="Schaffer, Aa" uniqKey="Schaffer A">AA Schäffer</name>
</author>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author>
<name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soding, J" uniqKey="Soding J">J Soding</name>
</author>
<author>
<name sortKey="Biegert, A" uniqKey="Biegert A">A Biegert</name>
</author>
<author>
<name sortKey="Lupas, A" uniqKey="Lupas A">A Lupas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ashburner, M" uniqKey="Ashburner M">M Ashburner</name>
</author>
<author>
<name sortKey="Ball, Ca" uniqKey="Ball C">CA Ball</name>
</author>
<author>
<name sortKey="Blake, Ja" uniqKey="Blake J">JA Blake</name>
</author>
<author>
<name sortKey="Botstein, D" uniqKey="Botstein D">D Botstein</name>
</author>
<author>
<name sortKey="Butler, H" uniqKey="Butler H">H Butler</name>
</author>
<author>
<name sortKey="Cherry, Jm" uniqKey="Cherry J">JM Cherry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boeckmann, B" uniqKey="Boeckmann B">B Boeckmann</name>
</author>
<author>
<name sortKey="Bairoch, A" uniqKey="Bairoch A">A Bairoch</name>
</author>
<author>
<name sortKey="Apweiler, R" uniqKey="Apweiler R">R Apweiler</name>
</author>
<author>
<name sortKey="Blatter, Mc" uniqKey="Blatter M">MC Blatter</name>
</author>
<author>
<name sortKey="Estreicher, A" uniqKey="Estreicher A">A Estreicher</name>
</author>
<author>
<name sortKey="Gasteiger, E" uniqKey="Gasteiger E">E Gasteiger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bateman, A" uniqKey="Bateman A">A Bateman</name>
</author>
<author>
<name sortKey="Coin, L" uniqKey="Coin L">L Coin</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
<author>
<name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
<author>
<name sortKey="Hollich, V" uniqKey="Hollich V">V Hollich</name>
</author>
<author>
<name sortKey="Griffiths Jones, S" uniqKey="Griffiths Jones S">S Griffiths-Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Agresti, A" uniqKey="Agresti A">A Agresti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Agresti, A" uniqKey="Agresti A">A Agresti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fisher, Ra" uniqKey="Fisher R">RA Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fisher, Ra" uniqKey="Fisher R">RA Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fisher, Ra" uniqKey="Fisher R">RA Fisher</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mehta, Cr" uniqKey="Mehta C">CR Mehta</name>
</author>
<author>
<name sortKey="Patel, Nr" uniqKey="Patel N">NR Patel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Clarkson, Db" uniqKey="Clarkson D">DB Clarkson</name>
</author>
<author>
<name sortKey="Fan, Y" uniqKey="Fan Y">Y Fan</name>
</author>
<author>
<name sortKey="Joe, H" uniqKey="Joe H">H Joe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patefield, Wm" uniqKey="Patefield W">WM Patefield</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Langmead, B" uniqKey="Langmead B">B Langmead</name>
</author>
<author>
<name sortKey="Trapnell, C" uniqKey="Trapnell C">C Trapnell</name>
</author>
<author>
<name sortKey="Pop, M" uniqKey="Pop M">M Pop</name>
</author>
<author>
<name sortKey="Salzberg, Sl" uniqKey="Salzberg S">SL Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lunter, G" uniqKey="Lunter G">G Lunter</name>
</author>
<author>
<name sortKey="Goodson, M" uniqKey="Goodson M">M Goodson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hardcastle, Tj" uniqKey="Hardcastle T">TJ Hardcastle</name>
</author>
<author>
<name sortKey="Kelly, Ka" uniqKey="Kelly K">KA Kelly</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van De Wiel, Ma" uniqKey="Van De Wiel M">MA Van De Wiel</name>
</author>
<author>
<name sortKey="Leday, Gg" uniqKey="Leday G">GG Leday</name>
</author>
<author>
<name sortKey="Pardo, L" uniqKey="Pardo L">L Pardo</name>
</author>
<author>
<name sortKey="Rue, H" uniqKey="Rue H">H Rue</name>
</author>
<author>
<name sortKey="Van Der Vaart, Aw" uniqKey="Van Der Vaart A">AW Van Der Vaart</name>
</author>
<author>
<name sortKey="Van Wieringen, Wn" uniqKey="Van Wieringen W">WN Van Wieringen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tarazona, S" uniqKey="Tarazona S">S Tarazona</name>
</author>
<author>
<name sortKey="Garcia Alcalde, F" uniqKey="Garcia Alcalde F">F García-Alcalde</name>
</author>
<author>
<name sortKey="Dopazo, J" uniqKey="Dopazo J">J Dopazo</name>
</author>
<author>
<name sortKey="Ferrer, A" uniqKey="Ferrer A">A Ferrer</name>
</author>
<author>
<name sortKey="Conesa, A" uniqKey="Conesa A">A Conesa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Foster, Ba" uniqKey="Foster B">BA Foster</name>
</author>
<author>
<name sortKey="Gingrich, Jr" uniqKey="Gingrich J">JR Gingrich</name>
</author>
<author>
<name sortKey="Kwon, Ed" uniqKey="Kwon E">ED Kwon</name>
</author>
<author>
<name sortKey="Madias, C" uniqKey="Madias C">C Madias</name>
</author>
<author>
<name sortKey="Greenberg, Nm" uniqKey="Greenberg N">NM Greenberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mossine, Vv" uniqKey="Mossine V">VV Mossine</name>
</author>
<author>
<name sortKey="Mawhinney, Tp" uniqKey="Mawhinney T">TP Mawhinney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Niederhuth, Ce" uniqKey="Niederhuth C">CE Niederhuth</name>
</author>
<author>
<name sortKey="Patharkar, Or" uniqKey="Patharkar O">OR Patharkar</name>
</author>
<author>
<name sortKey="Walker, Jc" uniqKey="Walker J">JC Walker</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">PLoS One</journal-id>
<journal-id journal-id-type="iso-abbrev">PLoS ONE</journal-id>
<journal-id journal-id-type="publisher-id">plos</journal-id>
<journal-id journal-id-type="pmc">plosone</journal-id>
<journal-title-group>
<journal-title>PLoS ONE</journal-title>
</journal-title-group>
<issn pub-type="epub">1932-6203</issn>
<publisher>
<publisher-name>Public Library of Science</publisher-name>
<publisher-loc>San Francisco, CA USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">25902288</article-id>
<article-id pub-id-type="pmc">4406561</article-id>
<article-id pub-id-type="doi">10.1371/journal.pone.0125000</article-id>
<article-id pub-id-type="publisher-id">PONE-D-14-38827</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data</article-title>
<alt-title alt-title-type="running-head">Mining Large RNA-Seq Data</alt-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Jilong</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hou</surname>
<given-names>Jie</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Sun</surname>
<given-names>Lin</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wilkins</surname>
<given-names>Jordan Maximillian</given-names>
</name>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lu</surname>
<given-names>Yuan</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Niederhuth</surname>
<given-names>Chad E.</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Merideth</surname>
<given-names>Benjamin Ryan</given-names>
</name>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mawhinney</surname>
<given-names>Thomas P.</given-names>
</name>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Mossine</surname>
<given-names>Valeri V.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Greenlief</surname>
<given-names>C. Michael</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff005">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Walker</surname>
<given-names>John C.</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Folk</surname>
<given-names>William R.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Hannink</surname>
<given-names>Mark</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Lubahn</surname>
<given-names>Dennis B.</given-names>
</name>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff004">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Birchler</surname>
<given-names>James A.</given-names>
</name>
<xref ref-type="aff" rid="aff003">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Cheng</surname>
<given-names>Jianlin</given-names>
</name>
<xref ref-type="aff" rid="aff001">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff002">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff006">
<sup>6</sup>
</xref>
<xref ref-type="aff" rid="aff007">
<sup>7</sup>
</xref>
<xref rid="cor001" ref-type="corresp">*</xref>
</contrib>
</contrib-group>
<aff id="aff001">
<label>1</label>
<addr-line>Computer Science Department, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff002">
<label>2</label>
<addr-line>MU Botanical Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff003">
<label>3</label>
<addr-line>Division of Biological Sciences, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff004">
<label>4</label>
<addr-line>Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff005">
<label>5</label>
<addr-line>Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff006">
<label>6</label>
<addr-line>Informatics Institute, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<aff id="aff007">
<label>7</label>
<addr-line>C. Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America</addr-line>
</aff>
<contrib-group>
<contrib contrib-type="editor">
<name>
<surname>Provart</surname>
<given-names>Nicholas James</given-names>
</name>
<role>Academic Editor</role>
<xref ref-type="aff" rid="edit1"></xref>
</contrib>
</contrib-group>
<aff id="edit1">
<addr-line>University of Toronto, CANADA</addr-line>
</aff>
<author-notes>
<fn fn-type="conflict" id="coi001">
<p>
<bold>Competing Interests: </bold>
The authors have declared that no competing interests exist.</p>
</fn>
<fn fn-type="con" id="contrib001">
<p>Conceived and designed the experiments: JC JL. Performed the experiments: JC JL. Analyzed the data: JC JL. Contributed reagents/materials/analysis tools: JL LS JMW YL CEN BRM TPM VVM CMG JCW WRF MH DBL JAB JC. Wrote the paper: JL JC. Web service construction: JL JH.</p>
</fn>
<corresp id="cor001">* E-mail:
<email>chengji@missouri.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>22</day>
<month>4</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="collection">
<year>2015</year>
</pub-date>
<volume>10</volume>
<issue>4</issue>
<elocation-id>e0125000</elocation-id>
<history>
<date date-type="received">
<day>28</day>
<month>8</month>
<year>2014</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>3</month>
<year>2015</year>
</date>
</history>
<permissions>
<copyright-year>2015</copyright-year>
<copyright-holder>Li et al</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="pone.0125000.pdf"></self-uri>
<abstract>
<p>RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes in gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from
<italic>Human</italic>
,
<italic>Mouse</italic>
,
<italic>Arabidopsis thaliana</italic>
, and
<italic>Drosophila melanogaster</italic>
cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at
<ext-link ext-link-type="uri" xlink:href="http://calla.rnet.missouri.edu/rnaminer/index.html">http://calla.rnet.missouri.edu/rnaminer/index.html</ext-link>
.</p>
</abstract>
<funding-group>
<funding-statement>The work was made possible by an NIH Botanical Center grant (P50AT006273) from the National Center for Complementary and Alternative Medicines (NCCAM), the Office of Dietary Supplements (ODS), and the National Cancer Institute (NCI), a NIH R01 grant (R01GM093123), and a NSF CAREER grant (DBI1149224). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NCCAM, ODS, NCI, the National Institutes of Health, or the National Science Foundation.</funding-statement>
</funding-group>
<counts>
<fig-count count="23"></fig-count>
<table-count count="9"></table-count>
<page-count count="30"></page-count>
</counts>
<custom-meta-group>
<custom-meta id="data-availability">
<meta-name>Data Availability</meta-name>
<meta-value>All of the six data sets used in this work are publicly available. The third, fourth, and fifth data sets have been deposited into the Gene Expression Omnibus (GEO) database (accession numbers: GSE41570, GSE41679, and GSE35288, respectively). The first, second, and sixth data sets are publicly available at
<ext-link ext-link-type="uri" xlink:href="http://calla.rnet.missouri.edu/botanical/RNAMiner.html">http://calla.rnet.missouri.edu/botanical/RNAMiner.html</ext-link>
.</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
<notes>
<title>Data Availability</title>
<p>All of the six data sets used in this work are publicly available. The third, fourth, and fifth data sets have been deposited into the Gene Expression Omnibus (GEO) database (accession numbers: GSE41570, GSE41679, and GSE35288, respectively). The first, second, and sixth data sets are publicly available at
<ext-link ext-link-type="uri" xlink:href="http://calla.rnet.missouri.edu/botanical/RNAMiner.html">http://calla.rnet.missouri.edu/botanical/RNAMiner.html</ext-link>
.</p>
</notes>
</front>
<body>
<sec sec-type="intro" id="sec001">
<title>Introduction</title>
<p>Transcriptome analysis is essential for determining the relationship between the information encoded in a genome, its expression, and phenotypic variation [
<xref rid="pone.0125000.ref001" ref-type="bibr">1</xref>
,
<xref rid="pone.0125000.ref002" ref-type="bibr">2</xref>
]. Next-generation sequencing (NGS) of RNAs (RNA-Seq) has emerged as a powerful approach for transcriptome analysis [
<xref rid="pone.0125000.ref003" ref-type="bibr">3</xref>
,
<xref rid="pone.0125000.ref004" ref-type="bibr">4</xref>
] that has many advantages over microarray technologies [
<xref rid="pone.0125000.ref005" ref-type="bibr">5</xref>
,
<xref rid="pone.0125000.ref006" ref-type="bibr">6</xref>
,
<xref rid="pone.0125000.ref007" ref-type="bibr">7</xref>
].</p>
<p>An RNA-Seq experiment typically generates hundreds of millions of short reads that are mapped to reference genomes and counted as a measure of expression [
<xref rid="pone.0125000.ref005" ref-type="bibr">5</xref>
]. Mining the gigabytes or even terabytes of raw RNA-Seq data is an essential but challenging step in the analysis.</p>
<p>In order to address these challenges, RNAMiner has been developed to convert gigabytes of raw RNA-Seq data into kilobytes of valuable biological knowledge through a five-step data mining and knowledge discovery process. RNAMiner integrates both public tools (e.g., TopHat2 [
<xref rid="pone.0125000.ref008" ref-type="bibr">8</xref>
], Bowtie2 [
<xref rid="pone.0125000.ref009" ref-type="bibr">9</xref>
], Cufflinks [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], HTSeq [
<xref rid="pone.0125000.ref011" ref-type="bibr">11</xref>
], edgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
], and DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
]) with our in-house tools (MULTICOM-MAP [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
,
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
,
<xref rid="pone.0125000.ref016" ref-type="bibr">16</xref>
]) to preprocess data and identify differentially expressed genes in the first three steps. In the last two steps, RNAMiner uses our in-house tools MULTICOM-PDCN [
<xref rid="pone.0125000.ref017" ref-type="bibr">17</xref>
,
<xref rid="pone.0125000.ref018" ref-type="bibr">18</xref>
] and MULTICOM-GNET [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
] to predict the functions and the gene regulatory networks of differentially expressed genes, respectively.</p>
<p>As proof of principle, we have applied the RNAMiner protocol to RNA-Seq data generated from
<italic>Human</italic>
,
<italic>Mouse</italic>
,
<italic>Arabidopsis thaliana</italic>
, and
<italic>Drosophila melanogaster</italic>
cells. The data mining process successfully produced valuable biological knowledge, such as differentially expressed genes, cohesive functional gene groups, and novel hypothetical gene regulatory networks, while reducing the size of the initial data set more than a thousand-fold.</p>
</sec>
<sec sec-type="materials|methods" id="sec002">
<title>Methods</title>
<p>Some RNA-Seq data analysis pipelines (e.g. Galaxy [
<xref rid="pone.0125000.ref021" ref-type="bibr">21</xref>
], KBase [
<xref rid="pone.0125000.ref022" ref-type="bibr">22</xref>
], iPlant [
<xref rid="pone.0125000.ref023" ref-type="bibr">23</xref>
]) provide users with a convenient and free platform for RNA-Seq data analysis by combing public tools, such as TopHat [
<xref rid="pone.0125000.ref024" ref-type="bibr">24</xref>
], Bowtie [
<xref rid="pone.0125000.ref025" ref-type="bibr">25</xref>
], Cufflinks [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], Cuffmerge [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], and Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
]. Like these pipelines, RNAMiner combines public tools such as TopHat2 [
<xref rid="pone.0125000.ref008" ref-type="bibr">8</xref>
], Bowtie2 [
<xref rid="pone.0125000.ref009" ref-type="bibr">9</xref>
], Cufflinks [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], and it is free to use. However, RNAMiner differs from other pipelines in several ways. First, it integrates more tools, such as HTSeq [
<xref rid="pone.0125000.ref011" ref-type="bibr">11</xref>
], edgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
], DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
], and our in-house MULTICOM-MAP [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
,
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
,
<xref rid="pone.0125000.ref016" ref-type="bibr">16</xref>
], to calculate gene expression values and identify differentially expressed genes. For example, RNAMiner uses Cuffdiff, edgeR, and DESeq2 to identify differentially expressed genes based on TopHat mapping results and gene expression values calculated by HTSeq and MULTICOM-MAP. It generates up to five distinct lists and one consensus list of differentially expressed genes, and combining the tools in this way usually produces more accurate results. Second, RNAMiner predicts functions of differentially expressed genes and builds gene regulatory networks by integrating our in-house tools MULTICOM-PDCN [
<xref rid="pone.0125000.ref017" ref-type="bibr">17</xref>
,
<xref rid="pone.0125000.ref018" ref-type="bibr">18</xref>
] and MULTICOM-GNET [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
]. These analyses provide additional biological information that other pipelines (e.g. Galaxy and iPlant) do not offer. Another software package, KBase, contains a service to predict gene functions, but the service only provides GO annotation for plant genomes. Third, because it requires neither user registration nor the selection of many parameters, RNAMiner is easier to use than other pipelines. Instead of running each tool separately, users can run all the tools integrated in RNAMiner in one run and download the results generated by all of them from the RNAMiner web site.</p>
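<p>As an illustration only, the following Python sketch shows one way a consensus list could be derived from several lists of differentially expressed genes. The voting rule (a gene is kept if it appears in at least min_support lists), the function name consensus_genes, the gene identifiers, and the pairing of tools given in the comments are assumptions made for this example; they are not taken from the RNAMiner implementation.</p>
<preformat>
```python
from collections import Counter


def consensus_genes(gene_lists, min_support):
    """Return the genes reported by at least `min_support` of the input lists.

    This majority-style vote is an assumed rule used for illustration,
    not the documented RNAMiner procedure.
    """
    votes = Counter()
    for genes in gene_lists:
        # Use a set so a gene repeated inside one list is counted once.
        votes.update(set(genes))
    return sorted(gene for gene, count in votes.items() if count >= min_support)


# Hypothetical gene identifiers; the tool pairing in each comment is assumed.
gene_lists = [
    ["geneA", "geneB", "geneC"],  # e.g. Cuffdiff
    ["geneA", "geneC"],           # e.g. edgeR on HTSeq counts
    ["geneA", "geneD"],           # e.g. edgeR on MULTICOM-MAP counts
    ["geneA", "geneC", "geneE"],  # e.g. DESeq2 on HTSeq counts
    ["geneB", "geneC"],           # e.g. DESeq2 on MULTICOM-MAP counts
]

consensus = consensus_genes(gene_lists, min_support=3)
print(consensus)
```
</preformat>
<p>Raising min_support makes the consensus list stricter, whereas setting it to 1 returns the union of all lists.</p>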
<p>The five data analysis steps of the RNAMiner protocol (
<xref rid="pone.0125000.g001" ref-type="fig">Fig 1</xref>
) are described individually in sub-sections below. Tables
<xref rid="pone.0125000.t001" ref-type="table">1</xref>
and
<xref rid="pone.0125000.t002" ref-type="table">2</xref>
list the versions and the parameters of all the public tools used in RNAMiner and describe the meanings of the parameters.</p>
<fig id="pone.0125000.g001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g001</object-id>
<label>Fig 1</label>
<caption>
<title>The RNAMiner protocol for big transcriptomics data analysis.</title>
<p>Five blue boxes denote five data analysis steps, i.e. mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. The tools used in each step are listed inside each box. The external input information is represented by brown boxes and the final output information is represented by green boxes. The information flow between these components is denoted by arrows.</p>
</caption>
<graphic xlink:href="pone.0125000.g001"></graphic>
</fig>
<table-wrap id="pone.0125000.t001" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t001</object-id>
<label>Table 1</label>
<caption>
<title>The versions of the public tools used in RNAMiner.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t001g" xlink:href="pone.0125000.t001"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Tool</th>
<th align="left" rowspan="1" colspan="1">Version</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">TopHat2</td>
<td align="left" rowspan="1" colspan="1">2.0.6</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Bowtie2</td>
<td align="left" rowspan="1" colspan="1">2.1.0</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Cufflinks</td>
<td align="left" rowspan="1" colspan="1">2.2.1</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">HTSeq</td>
<td align="left" rowspan="1" colspan="1">0.5.3p7</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">edgeR</td>
<td align="left" rowspan="1" colspan="1">3.4.2</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">DESeq2</td>
<td align="left" rowspan="1" colspan="1">1.2.10</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t002</object-id>
<label>Table 2</label>
<caption>
<title>The parameters of the public tools used in RNAMiner, the parameter values, and the descriptions.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t002g" xlink:href="pone.0125000.t002"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Tool</th>
<th align="left" rowspan="1" colspan="1">Parameter</th>
<th align="left" rowspan="1" colspan="1">Value</th>
<th align="left" rowspan="1" colspan="1">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>TopHat2</bold>
</td>
<td align="left" rowspan="1" colspan="1">--read-mismatches</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">The maximum number of mismatched nucleotides between a read and a reference allowed for a valid mapping.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--read-gap-length</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">The maximum number of gaps in the alignment between a read and a reference genome allowed for a valid mapping.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--splice-mismatches</td>
<td align="left" rowspan="1" colspan="1">0</td>
<td align="left" rowspan="1" colspan="1">The maximum number of mismatches allowed in the "anchor" region of a spliced alignment.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--segment-mismatches</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">Read segments are mapped independently, allowing up to this number of mismatches in each segment alignment.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--segment-length</td>
<td align="left" rowspan="1" colspan="1">25</td>
<td align="left" rowspan="1" colspan="1">A read is cut into segments each having at least this length. These segments are mapped independently.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Bowtie2</bold>
</td>
<td align="left" rowspan="1" colspan="1">--end-to-end</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">In this mode, Bowtie2 requires the entire read to be aligned from one end to the other, without any trimming (or "soft clipping") of characters from either end. Local alignment is not used in Bowtie2.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--sensitive</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">This option generally balances speed, sensitivity and accuracy.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Cufflinks</bold>
</td>
<td align="left" rowspan="1" colspan="1">--frag-len-mean</td>
<td align="left" rowspan="1" colspan="1">200</td>
<td align="left" rowspan="1" colspan="1">The expected (mean) fragment length.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--frag-len-std-dev</td>
<td align="left" rowspan="1" colspan="1">80</td>
<td align="left" rowspan="1" colspan="1">The standard deviation of fragment lengths.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--min-isoform-fraction</td>
<td align="left" rowspan="1" colspan="1">0.10</td>
<td align="left" rowspan="1" colspan="1">Suppress isoform transcripts below this abundance level.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--pre-mrna-fraction</td>
<td align="left" rowspan="1" colspan="1">0.15</td>
<td align="left" rowspan="1" colspan="1">Suppress intra-intronic transcripts below this level.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--max-intron-length</td>
<td align="left" rowspan="1" colspan="1">300000</td>
<td align="left" rowspan="1" colspan="1">Ignore alignments with gaps longer than this.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Cuffdiff</bold>
</td>
<td align="left" rowspan="1" colspan="1">--min-alignment-count</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">Minimum number of alignments in a locus for testing.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--FDR</td>
<td align="left" rowspan="1" colspan="1">0.05</td>
<td align="left" rowspan="1" colspan="1">The maximum false discovery rate allowed after statistical correction.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--frag-len-mean</td>
<td align="left" rowspan="1" colspan="1">200</td>
<td align="left" rowspan="1" colspan="1">The expected (mean) fragment length.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">--frag-len-std-dev</td>
<td align="left" rowspan="1" colspan="1">80</td>
<td align="left" rowspan="1" colspan="1">The standard deviation of fragment lengths.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HTSeq</bold>
</td>
<td align="left" rowspan="1" colspan="1">-a</td>
<td align="left" rowspan="1" colspan="1">10</td>
<td align="left" rowspan="1" colspan="1">Skip all reads with alignment quality lower than this minimum value.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">-t</td>
<td align="left" rowspan="1" colspan="1">exon</td>
<td align="left" rowspan="1" colspan="1">Feature type (3rd column in GFF file) to be used. All the features of other types are ignored.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">-i</td>
<td align="left" rowspan="1" colspan="1">gene_id</td>
<td align="left" rowspan="1" colspan="1">GFF attribute to be used as feature ID.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">-m</td>
<td align="left" rowspan="1" colspan="1">union</td>
<td align="left" rowspan="1" colspan="1">Mode to handle reads overlapping more than one feature.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>DESeq2</bold>
</td>
<td align="left" rowspan="1" colspan="1">test</td>
<td align="left" rowspan="1" colspan="1">LRT</td>
<td align="left" rowspan="1" colspan="1">Use the likelihood ratio test on the difference in deviance between a full and reduced model formula (nbinomLRT).</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">fitType</td>
<td align="left" rowspan="1" colspan="1">parametric</td>
<td align="left" rowspan="1" colspan="1">The type of fitting of dispersions to the mean intensity. Parametric: fit a dispersion-mean relation via a robust gamma-family GLM.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>edgeR</bold>
</td>
<td align="left" rowspan="1" colspan="1">pair</td>
<td align="left" rowspan="1" colspan="1">NULL</td>
<td align="left" rowspan="1" colspan="1">First two levels of object$samples$group (a factor) are used in comparison.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">dispersion</td>
<td align="left" rowspan="1" colspan="1">NULL</td>
<td align="left" rowspan="1" colspan="1">Either the common or tagwise dispersion estimates from the DGEList object will be used, according to the value of common.disp.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">common.disp</td>
<td align="left" rowspan="1" colspan="1">TRUE</td>
<td align="left" rowspan="1" colspan="1">Testing carried out by common dispersion for each tag/gene.</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<sec id="sec003">
<title>1. Mapping RNA-Seq reads to a reference genome</title>
<p>We use two public tools, TopHat2 [
<xref rid="pone.0125000.ref008" ref-type="bibr">8</xref>
] and Bowtie2 [
<xref rid="pone.0125000.ref009" ref-type="bibr">9</xref>
], to map RNA-Seq reads to reference genomes in the UCSC genome browser [
<xref rid="pone.0125000.ref026" ref-type="bibr">26</xref>
] in conjunction with the RefSeq genome reference annotations [
<xref rid="pone.0125000.ref027" ref-type="bibr">27</xref>
]. The workflow of mapping RNA-Seq reads to a reference genome and calculating gene expression values is illustrated in
<xref rid="pone.0125000.g002" ref-type="fig">Fig 2</xref>
. It is worth noting that, since the RefSeq genome reference annotations contain information about some small non-coding RNAs, the reads of these non-coding RNAs are mapped and counted in addition to those of regular protein-coding mRNAs. MULTICOM-MAP [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
,
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
,
<xref rid="pone.0125000.ref016" ref-type="bibr">16</xref>
] is used to remove reads mapped to multiple locations in a reference genome from the mapping data in BAM/SAM format [
<xref rid="pone.0125000.ref028" ref-type="bibr">28</xref>
] generated by TopHat2 and Bowtie2. Only reads mapped to a unique location on the genome are retained to calculate the read counts of the genes. We use MULTICOM-MAP to analyze the mapping results to obtain baseline information, such as the total number of reads, the number of reads mapped to a unique location, and the number of reads mapped to multiple locations. This mapping process can generally reduce the size of datasets by
<italic>several</italic>
orders of magnitude.</p>
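<p>The unique-versus-multiple bookkeeping described above can be sketched as follows. This is a minimal illustration, not the MULTICOM-MAP implementation: it assumes single-end reads and assumes the aligner writes the optional SAM tag NH (number of reported alignments) for each record. The function name mapping_summary and the toy records are hypothetical.</p>
<preformat>
```python
# Sketch only: classify aligned reads as uniquely or multiply mapped
# using the optional SAM tag NH (number of reported alignments).
# The function name and the toy records below are hypothetical.

def mapping_summary(sam_lines):
    """Return (unique_read_names, multi_read_names) for SAM text lines."""
    unique, multi = set(), set()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag // 4 % 2 == 1:            # flag bit 0x4 set: read is unmapped
            continue
        hits = 1                          # assumption: unique when NH is absent
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                hits = int(tag[5:])
        (unique if hits == 1 else multi).add(name)
    return unique, multi


toy_records = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t256\tchr2\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
unique_reads, multi_reads = mapping_summary(toy_records)
print(len(unique_reads), "unique;", len(multi_reads), "multi-mapped")
```
</preformat>
<p>Only the reads in the first set would be carried forward to read counting; the sizes of both sets give the baseline mapping statistics.</p>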
<fig id="pone.0125000.g002" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g002</object-id>
<label>Fig 2</label>
<caption>
<title>The workflow of mapping RNA-Seq reads to a reference genome and calculating gene expression values.</title>
<p>The blue boxes denote the tools (TopHat2, Bowtie2, MULTICOM-MAP, HTSeq, Cufflinks) used in the steps of mapping RNA-Seq reads to a reference genome and calculating gene expression values. The external input information is represented by brown boxes and the output information is represented by green boxes. The information flow between these components is denoted by arrows.</p>
</caption>
<graphic xlink:href="pone.0125000.g002"></graphic>
</fig>
</sec>
<sec id="sec004">
<title>2. Calculating gene expression values</title>
<p>For RNAMiner, MULTICOM-MAP [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
,
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
,
<xref rid="pone.0125000.ref016" ref-type="bibr">16</xref>
] and two public tools, HTSeq [
<xref rid="pone.0125000.ref011" ref-type="bibr">11</xref>
] and Cufflinks [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], are used to calculate gene expression values according to the genome mapping output and the RefSeq genome reference annotation [
<xref rid="pone.0125000.ref027" ref-type="bibr">27</xref>
]. MULTICOM-MAP and HTSeq produce raw read counts, while Cufflinks generates normalized values in terms of FPKM, i.e., fragments per kilobase of exon model per million mapped fragments. The normalized gene expression values generated by Cufflinks are used to identify differentially expressed genes in the next step. The read counts generated by MULTICOM-MAP and HTSeq are fed separately into two R Bioconductor packages, edgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
] and DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
], to identify differentially expressed genes. The normalized gene expression values (RPKM, reads per kilobase of exon model per million mapped reads) produced by MULTICOM-MAP are used to construct gene regulatory networks in the last step. Cufflinks, MULTICOM-MAP, and HTSeq are largely complementary and differ mainly in how they handle reads mapped to exons shared by multiple isoforms of a gene. Cufflinks distributes the count of such reads among the isoforms in proportion to the estimated probability that the reads were derived from each isoform. In contrast, MULTICOM-MAP assigns the total count of such reads to every isoform, while HTSeq discards the reads without counting them for any isoform. This analysis step generates the overall expression profile of most genes in a transcriptome and can reduce the size of the data from Step 1 by about a thousand-fold, from gigabytes to several megabytes.</p>
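<p>As an illustration of the quantities discussed in this step, the sketch below computes RPKM from its definition and mimics, in a deliberately simplified way, the three policies for reads on exons shared by several isoforms. The function names, the policy labels, and the toy numbers are hypothetical; the code paraphrases the description in the text and is not taken from Cufflinks, MULTICOM-MAP, or HTSeq.</p>
<preformat>
```python
# Sketch only: RPKM from its definition, plus a simplified view of how
# reads on shared exons could be credited to isoforms under three policies.

def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def distribute_shared_reads(n_shared, isoform_probs, policy):
    """Credit n_shared reads to isoforms under a named (hypothetical) policy."""
    if policy == "proportional":   # Cufflinks-like: split by estimated probability
        return {iso: n_shared * p for iso, p in isoform_probs.items()}
    if policy == "full":           # MULTICOM-MAP-like: full count to every isoform
        return {iso: float(n_shared) for iso in isoform_probs}
    if policy == "discard":        # HTSeq-like: shared reads are not counted
        return {iso: 0.0 for iso in isoform_probs}
    raise ValueError("unknown policy: " + policy)


# 500 reads on a 2,000 bp exon model with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)

probs = {"isoform_1": 0.75, "isoform_2": 0.25}
shared = {
    policy: distribute_shared_reads(100, probs, policy)
    for policy in ("proportional", "full", "discard")
}
print(example_rpkm)
print(shared)
```
</preformat>
<p>FPKM follows the same formula with fragments in place of reads.</p>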
</sec>
<sec id="sec005">
<title>3. Identifying differentially expressed genes</title>
<p>We use Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
] and two R Bioconductor packages, edgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
] and DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
] to identify differentially expressed genes separately (see
<xref rid="pone.0125000.g003" ref-type="fig">Fig 3</xref>
for the workflow). EdgeR and DESeq2 identify differentially expressed genes based on the raw read counts calculated by MULTICOM-MAP and HTSeq, resulting in four lists of differentially expressed genes (i.e., edgeR+MULTICOM-MAP, edgeR+HTSeq, DESeq2+MULTICOM-MAP, and DESeq2+HTSeq). In contrast, Cuffdiff identifies differentially expressed genes directly from the genome mapping outputs containing only reads mapped to a unique location on the genome, resulting in one list of differentially expressed genes. Cuffdiff, edgeR, and DESeq2 adjust p-values for multiple testing using the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR) [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
,
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
,
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
]. The p-value (or q-value) cut-off is usually set to 0.05. From the five lists of differentially expressed genes generated by Cuffdiff, edgeR+MULTICOM-MAP, DESeq2+MULTICOM-MAP, edgeR+HTSeq, and DESeq2+HTSeq, a consensus list is generated as the final output; it usually consists of the genes that appear in at least three of the five lists. The significantly differentially expressed genes identified by RNAMiner can, for example, serve as targets for new biological experiments. This analysis step can generally reduce the size of the data from the previous step by a couple of orders of magnitude, condensing the data set to several hundred kilobytes.</p>
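<p>The consensus rule described above (a gene is kept when at least three of the five lists contain it) can be sketched as a simple vote count. The function name consensus_genes, the min_support parameter, and the gene names are hypothetical placeholders, not part of RNAMiner.</p>
<preformat>
```python
# Sketch only: majority-style consensus over five lists of
# differentially expressed genes (toy gene names).
from collections import Counter


def consensus_genes(gene_lists, min_support=3):
    """Return genes reported by at least min_support of the lists."""
    votes = Counter(g for genes in gene_lists.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_support)


toy_lists = {
    "Cuffdiff": ["geneA", "geneB", "geneC"],
    "edgeR+MULTICOM-MAP": ["geneA", "geneB"],
    "edgeR+HTSeq": ["geneA", "geneD"],
    "DESeq2+MULTICOM-MAP": ["geneB", "geneC"],
    "DESeq2+HTSeq": ["geneC", "geneD"],
}
consensus = consensus_genes(toy_lists)
print(consensus)
```
</preformat>
<p>In this toy input, geneD is supported by only two lists and is therefore excluded.</p>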
<fig id="pone.0125000.g003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g003</object-id>
<label>Fig 3</label>
<caption>
<title>The workflow of identifying differentially expressed genes.</title>
<p>The blue boxes denote the tools (edgeR, DESeq2, Cuffdiff) used in the step of identifying differentially expressed genes. The external input information is represented by brown boxes and the output information is represented by green boxes. The information flow between these components is denoted by arrows.</p>
</caption>
<graphic xlink:href="pone.0125000.g003"></graphic>
</fig>
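<p>For readers unfamiliar with the Benjamini-Hochberg adjustment used in this step, a minimal pure-Python sketch is given here. It is a generic illustration of the procedure, not code from Cuffdiff, edgeR, or DESeq2; the function name and the example p-values are hypothetical.</p>
<preformat>
```python
# Sketch only: Benjamini-Hochberg adjusted p-values (q-values).

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position                      # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
qvalues = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(qvalues) if 0.05 >= q]
print([round(q, 4) for q in qvalues], significant)
```
</preformat>
<p>With a cut-off of 0.05, all four toy tests remain significant after adjustment.</p>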
</sec>
<sec id="sec006">
<title>4. Predicting gene functions</title>
<p>We use MULTICOM-PDCN [
<xref rid="pone.0125000.ref017" ref-type="bibr">17</xref>
,
<xref rid="pone.0125000.ref018" ref-type="bibr">18</xref>
], a protein function prediction method ranked among the top methods in the 2011–2012 Critical Assessment of Function Annotation (CAFA) [
<xref rid="pone.0125000.ref029" ref-type="bibr">29</xref>
], to predict functions of differentially expressed genes (see
<xref rid="pone.0125000.g004" ref-type="fig">Fig 4</xref>
for the workflow). MULTICOM-PDCN integrates sequence-profile and profile-profile alignment methods (PSI-BLAST [
<xref rid="pone.0125000.ref030" ref-type="bibr">30</xref>
] and HHSearch [
<xref rid="pone.0125000.ref031" ref-type="bibr">31</xref>
]) with protein function databases such as the Gene Ontology database [
<xref rid="pone.0125000.ref032" ref-type="bibr">32</xref>
], the Swiss-Prot database [
<xref rid="pone.0125000.ref033" ref-type="bibr">33</xref>
], and the Pfam database [
<xref rid="pone.0125000.ref034" ref-type="bibr">34</xref>
], to predict functions of proteins in Gene Ontology [
<xref rid="pone.0125000.ref032" ref-type="bibr">32</xref>
] terms in three categories: biological process, molecular function, and cellular component. MULTICOM-PDCN also provides some statistical information about the predicted functions, such as the number of differentially expressed genes predicted to have each function. We then use the Cochran-Mantel-Haenszel test, as implemented in the R function mantelhaen.test [
<xref rid="pone.0125000.ref035" ref-type="bibr">35</xref>
,
<xref rid="pone.0125000.ref036" ref-type="bibr">36</xref>
] to check whether the predicted function terms are suitable for Fisher’s exact test, which is then used to identify the significantly enriched GO function terms. A p-value from the Cochran-Mantel-Haenszel test lower than 0.05 rejects the null hypothesis that the two nominal variables (e.g., two function terms) are conditionally independent in each stratum [
<xref rid="pone.0125000.ref035" ref-type="bibr">35</xref>
,
<xref rid="pone.0125000.ref036" ref-type="bibr">36</xref>
]. We then calculate a p-value of enrichment for each predicted function using the R function fisher.test [
<xref rid="pone.0125000.ref035" ref-type="bibr">35</xref>
,
<xref rid="pone.0125000.ref036" ref-type="bibr">36</xref>
,
<xref rid="pone.0125000.ref037" ref-type="bibr">37</xref>
,
<xref rid="pone.0125000.ref038" ref-type="bibr">38</xref>
,
<xref rid="pone.0125000.ref039" ref-type="bibr">39</xref>
,
<xref rid="pone.0125000.ref040" ref-type="bibr">40</xref>
,
<xref rid="pone.0125000.ref041" ref-type="bibr">41</xref>
,
<xref rid="pone.0125000.ref042" ref-type="bibr">42</xref>
] and sort the predicted functions by p-value in ascending order, from the most significant to the least significant. The list of the most significantly enriched functions provides an overview of the biological processes that are differentially perturbed between the two biological conditions. Although the physical size of the data and knowledge generated in this step is comparable to that of the previous step, the differentially expressed genes are now organized from three functional perspectives: biological process, molecular function, and cellular component.</p>
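<p>The enrichment ranking described above can be sketched with a one-sided Fisher's exact test written directly from the hypergeometric distribution. This is an illustration under stated assumptions: the text does not say whether a one-sided or two-sided test is used, so the one-sided (over-representation) form is assumed here, and the function name, term labels, and counts are hypothetical.</p>
<preformat>
```python
# Sketch only: one-sided Fisher's exact test for term enrichment,
# computed from the hypergeometric tail probability.
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k), where X counts annotated genes among n selected genes,
    K genes carry the term, and N genes are in the background."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy setting: 20 background genes, 4 differentially expressed genes.
N_BACKGROUND, N_SELECTED = 20, 4
# term -> (selected genes with the term, background genes with the term)
toy_terms = {"term_A": (3, 5), "term_B": (1, 5)}

pvalues = {
    term: enrichment_pvalue(k, N_SELECTED, K, N_BACKGROUND)
    for term, (k, K) in toy_terms.items()
}
ranked = sorted(pvalues, key=pvalues.get)   # ascending: most significant first
print(ranked, {t: round(p, 4) for t, p in pvalues.items()})
```
</preformat>
<p>Sorting by ascending p-value reproduces the ordering used in the text, from the most to the least significantly enriched term.</p>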
<fig id="pone.0125000.g004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g004</object-id>
<label>Fig 4</label>
<caption>
<title>The workflow of predicting gene functions.</title>
<p>The blue box denotes the tool used in the step of predicting gene functions. The tools PSI-BLAST and HHSearch, which are used within MULTICOM-PDCN, are listed inside the blue box. The external input information is represented by brown boxes and the output information is represented by green boxes. The information flow between these components is denoted by arrows.</p>
</caption>
<graphic xlink:href="pone.0125000.g004"></graphic>
</fig>
</sec>
<sec id="sec007">
<title>5. Constructing gene regulatory networks</title>
<p>We use MULTICOM-GNET [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
] to construct gene regulatory networks based on differentially expressed genes and transcription factors in a genome (see
<xref rid="pone.0125000.g005" ref-type="fig">Fig 5</xref>
for the workflow). MULTICOM-GNET first clusters differentially expressed genes with similar expression patterns into functional clusters using the K-means clustering algorithm. Second, it builds a binary decision tree to represent potential regulatory relationships between several selected transcription factors (TFs) and the genes in each cluster. Third, it re-assigns differentially expressed genes to the clusters whose gene regulatory trees best explain the expression patterns of the genes. The last two steps are repeated until the likelihood of the gene expression data is maximized [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
]. We also use an R network analysis and visualization package, “igraph” [
<xref rid="pone.0125000.ref043" ref-type="bibr">43</xref>
] to visualize gene regulatory networks by linking together the regulatory relationships between and within all the gene regulatory modules predicted by MULTICOM-GNET. The regulatory network construction step provides a comprehensive view of the underlying mechanisms controlling the expression of a transcriptome and can significantly reduce the size of the data. The hundreds of kilobytes of biological network data provide a system-level view of the cellular systems, which can be more readily used to generate valuable hypotheses for biological experiments.</p>
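<p>Only the first stage of the procedure above, K-means clustering of expression profiles, is sketched here; the decision-tree construction and likelihood-based re-assignment are not shown. The code is a generic Lloyd-style K-means, not the MULTICOM-GNET implementation. Seeding the centroids with the first k profiles is an assumption made to keep the example deterministic, and the toy profiles are hypothetical.</p>
<preformat>
```python
# Sketch only: plain K-means (Lloyd's algorithm) on expression profiles.

def kmeans(profiles, k, n_iter=100):
    """Cluster equal-length numeric profiles; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]   # deterministic seeding (assumption)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:                  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, labels) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


toy_profiles = [
    [1.0, 1.1, 0.9],   # low-expression pattern
    [5.0, 5.2, 4.9],   # high-expression pattern
    [1.2, 0.9, 1.0],
    [4.8, 5.1, 5.0],
]
labels, centroids = kmeans(toy_profiles, k=2)
print(labels)
```
</preformat>
<p>Genes sharing a label would form one candidate cluster for the subsequent regulatory-tree step.</p>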
<fig id="pone.0125000.g005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g005</object-id>
<label>Fig 5</label>
<caption>
<title>The workflow of constructing gene regulatory networks.</title>
<p>The blue boxes denote the methods used by MULTICOM-GNET in constructing gene regulatory networks. The external input information is represented by brown boxes and the output information is represented by green boxes. The information flow between these components is denoted by arrows.</p>
</caption>
<graphic xlink:href="pone.0125000.g005"></graphic>
</fig>
<p>For replicates from RNA-Seq experiments, RNAMiner maps the reads of each replicate to the reference genome and calculates gene expression values separately. The gene expression values of the replicates of two samples are combined into a profile (i.e., a vector of the expression values of a gene in each replicate of each condition), which is input into edgeR and DESeq2 to identify differentially expressed genes. Additionally, the TopHat mapping results of the replicates of two samples are input into Cuffdiff to identify differentially expressed genes. EdgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
], DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
], and Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
] handle the replicates by modeling the variance (dispersion) in counts across the replicates as a function of the mean count of the replicates. EdgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
] estimates the variance by conditional maximum likelihood conditioned on the total count for the gene. DESeq2 [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
] uses a flexible and mean-dependent local regression to estimate the variance between the replicates by pooling genes with similar expression levels to enhance the variance estimation. Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
] estimates the variance based on a negative binomial model and uses a t-test to calculate the test statistics. Cuffdiff can fit a model for each condition with replicates or use a global model for all conditions together.</p>
<p>Before calling a tool to do data analysis, RNAMiner checks whether the data are appropriate for the tool. For example, MULTICOM-GNET [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
] is not applied if there are no transcription factors among the differentially expressed genes, because MULTICOM-GNET needs at least one transcription factor to build gene regulatory networks. As another example, in some special datasets, overexpression under some treatments in some regions of the genome in one condition leads to very large read counts for some genes in that condition and to dramatic differences in gene expression between the two conditions. This violates the assumption of edgeR’s normalization method [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
] that the majority of the genes should have similar expression levels. Calculating a normalization factor across all loci is therefore difficult for such data. RNAMiner checks this assumption and does not call edgeR if it is violated.</p>
</sec>
</sec>
<sec id="sec008">
<title>Evaluation and Discussion</title>
<p>We tested the RNAMiner protocol on six sets of RNA-Seq data generated from
<italic>Human</italic>
,
<italic>Mouse</italic>
,
<italic>Arabidopsis thaliana</italic>
and
<italic>Drosophila melanogaster</italic>
cells in order to evaluate its effectiveness. Details of the six sets of RNA-Seq data, such as organisms, biological conditions, and experimental settings, are reported in
<xref rid="pone.0125000.t003" ref-type="table">Table 3</xref>
. The results of each of the five analysis steps are described and discussed below.</p>
<table-wrap id="pone.0125000.t003" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t003</object-id>
<label>Table 3</label>
<caption>
<title>The organisms, conditions, replicate numbers, and descriptions of the six sets of RNA-Seq data.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t003g" xlink:href="pone.0125000.t003"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Data set</th>
<th align="left" rowspan="1" colspan="1">Organism</th>
<th align="left" rowspan="1" colspan="1">Conditions</th>
<th align="left" rowspan="1" colspan="1">Replicates</th>
<th align="left" rowspan="1" colspan="1">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>First</bold>
</td>
<td align="left" rowspan="1" colspan="1">Mouse</td>
<td align="left" rowspan="1" colspan="1">Control, two botanicals (Lessertia frutescens and Sambucus nigra), and Nrf2 activator CDDO (2-cyano-3,12-dioxooleana-1,9-dien-28-oic acid)</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">Two mouse fibroblast cell lines: mutant (PGAM5 knock-out) mouse and wild-type mouse.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Second</bold>
</td>
<td align="left" rowspan="1" colspan="1">Mouse</td>
<td align="left" rowspan="1" colspan="1">FruHis (0, 1, 2, 4, 8, 16 mM) in the absence (samples 2A, 2B, 2C, 2D, 2E, 2F) or presence (samples 2G, 2H, 3A, 3B, 3C, 3D) of 4 μM lycopene</td>
<td align="left" rowspan="1" colspan="1">No</td>
<td align="left" rowspan="1" colspan="1">Murine prostate adenocarcinoma cells TRAMP-C2 [
<xref rid="pone.0125000.ref049" ref-type="bibr">49</xref>
] treated in vitro with a novel antioxidant FruHis from tomato [
<xref rid="pone.0125000.ref050" ref-type="bibr">50</xref>
].</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Third</bold>
</td>
<td align="left" rowspan="1" colspan="1">Drosophila melanogaster</td>
<td align="left" rowspan="1" colspan="1">CF (Control Female), CM (Control Male), HMF (H83M2 Female), and HMM (H83M2 Male)</td>
<td align="left" rowspan="1" colspan="1">Three biological replicates</td>
<td align="left" rowspan="1" colspan="1">H83M2 is a transgenic stock number, which ectopically expresses MSL2 protein in females [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
].</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Fourth</bold>
</td>
<td align="left" rowspan="1" colspan="1">Drosophila melanogaster</td>
<td align="left" rowspan="1" colspan="1">CF (Control Female), CM (Control Male), and mF (Meta Female)</td>
<td align="left" rowspan="1" colspan="1">Three biological replicates</td>
<td align="left" rowspan="1" colspan="1">Meta female is the triple X female [
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
].</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Fifth</bold>
</td>
<td align="left" rowspan="1" colspan="1">Arabidopsis thaliana</td>
<td align="left" rowspan="1" colspan="1">Columbia wild-type and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 mutants</td>
<td align="left" rowspan="1" colspan="1">Three biological replicates</td>
<td align="left" rowspan="1" colspan="1">There were two fastq files for each sample. One was solexa 1.3 quals output, and the other was Casava 1.8 output. The result files Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 were generated from solexa 1.3 quals outputs and the result files Col_qtrim and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim were generated from Casava 1.8 outputs [
<xref rid="pone.0125000.ref051" ref-type="bibr">51</xref>
].</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Sixth</bold>
</td>
<td align="left" rowspan="1" colspan="1">Human</td>
<td align="left" rowspan="1" colspan="1">1Sfesrrb, 2pc3, 3DY131, and 4ctrl</td>
<td align="left" rowspan="1" colspan="1">Two biological replicates</td>
<td align="left" rowspan="1" colspan="1">Samples 1Sfesrrb and 2pc3 were from Experiment 1, and samples 3DY131 and 4ctrl were from Experiment 2. 2pc3 was the control for Experiment 1, and 1Sfesrrb was forced to express human Estrogen Related Receptor beta (esrrb), confirmed by real-time PCR. 4ctrl was the control for Experiment 2. In 3DY131, the Esrrb agonist DY131 was added to the culture of DU145 cells at a concentration of 3 μM for 12 hours.</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<sec id="sec009">
<title>1. Results of mapping RNA-Seq reads to a reference genome</title>
<p>RNAMiner used TopHat2 [
<xref rid="pone.0125000.ref008" ref-type="bibr">8</xref>
] and Bowtie [
<xref rid="pone.0125000.ref044" ref-type="bibr">44</xref>
] to map RNA-Seq reads in the first and second data sets to the
<italic>Mouse</italic>
reference genome (mm9) in the UCSC genome browser [
<xref rid="pone.0125000.ref026" ref-type="bibr">26</xref>
] in conjunction with the RefSeq genome reference annotation (mm9) [
<xref rid="pone.0125000.ref027" ref-type="bibr">27</xref>
], map RNA-Seq reads in the third and fourth data sets to the
<italic>Drosophila melanogaster</italic>
reference genome (dm3) in the UCSC genome browser [
<xref rid="pone.0125000.ref026" ref-type="bibr">26</xref>
] in conjunction with the RefSeq genome reference annotation (dm3) [
<xref rid="pone.0125000.ref027" ref-type="bibr">27</xref>
], map RNA-Seq reads in the fifth data set to the
<italic>Arabidopsis thaliana</italic>
reference genome (
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/">ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/</ext-link>
) in conjunction with the
<italic>Arabidopsis thaliana</italic>
genome reference annotation (
<ext-link ext-link-type="uri" xlink:href="ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/">ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_gff3/</ext-link>
), and used TopHat2 [
<xref rid="pone.0125000.ref008" ref-type="bibr">8</xref>
] and Bowtie2 [
<xref rid="pone.0125000.ref009" ref-type="bibr">9</xref>
] to map RNA-Seq reads in the sixth data set to the
<italic>Homo sapiens</italic>
reference genome (hg19) in the UCSC genome browser [
<xref rid="pone.0125000.ref026" ref-type="bibr">26</xref>
] in conjunction with the RefSeq genome reference annotation (hg19) [
<xref rid="pone.0125000.ref027" ref-type="bibr">27</xref>
]. Tables
<xref rid="pone.0125000.t004" ref-type="table">4</xref>
–
<xref rid="pone.0125000.t009" ref-type="table">9</xref>
show the mapping statistics of the six sets of RNA-Seq data. Overall, more than 70% of reads were mapped to the genome successfully. In particular, a very high mapping rate (~97%) was reached on the sixth data set. These mapping success rates were within a reasonable range, suggesting good data quality and a correct mapping process. This read-mapping process reduced the size of the data by several orders of magnitude.</p>
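<p>As a worked check of how the mapping tables can be read, the sketch below recomputes the mapped percentage for the first row of Table 4 (Mutant, Control). The formula, reads mapped to unique sites plus reads mapped to multiple sites, divided by the total number of reads, is inferred from the reported numbers rather than stated in the text, and the function and key names are hypothetical.</p>
<preformat>
```python
# Sketch only: recompute the "percentage of reads mapped to genome"
# for one row of Table 4 under an inferred formula.

def mapped_percentage(total, unique, multiple):
    """Percentage of reads with at least one reported alignment."""
    return 100.0 * (unique + multiple) / total


# Table 4, sample "Mutant, Control".
row = {
    "total": 22053527,
    "unique": 14120075,
    "multiple": 2577992,
    "failed": 5330066,
    "filtered": 25394,
}

# The four categories partition the total number of reads in this row.
parts_sum = row["unique"] + row["multiple"] + row["failed"] + row["filtered"]
pct = round(mapped_percentage(row["total"], row["unique"], row["multiple"]), 2)
print(parts_sum == row["total"], pct)
```
</preformat>
<p>The recomputed value agrees with the 75.72% reported for this sample.</p>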
<table-wrap id="pone.0125000.t004" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t004</object-id>
<label>Table 4</label>
<caption>
<title>Mapping statistics of the first set of RNA-Seq data of mouse.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t004g" xlink:href="pone.0125000.t004"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Mutant, Control</bold>
</td>
<td align="left" rowspan="1" colspan="1">22,053,527</td>
<td align="left" rowspan="1" colspan="1">14,120,075</td>
<td align="left" rowspan="1" colspan="1">2,577,992</td>
<td align="left" rowspan="1" colspan="1">5,330,066</td>
<td align="left" rowspan="1" colspan="1">25,394</td>
<td align="char" char="." rowspan="1" colspan="1">75.72%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Wild-type, Control</bold>
</td>
<td align="left" rowspan="1" colspan="1">29,483,443</td>
<td align="left" rowspan="1" colspan="1">19,560,525</td>
<td align="left" rowspan="1" colspan="1">3,077,978</td>
<td align="left" rowspan="1" colspan="1">6,809,626</td>
<td align="left" rowspan="1" colspan="1">35,314</td>
<td align="char" char="." rowspan="1" colspan="1">76.78%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Mutant, CDDO</bold>
</td>
<td align="left" rowspan="1" colspan="1">16,050,830</td>
<td align="left" rowspan="1" colspan="1">10,500,068</td>
<td align="left" rowspan="1" colspan="1">1,832,101</td>
<td align="left" rowspan="1" colspan="1">3,699,982</td>
<td align="left" rowspan="1" colspan="1">18,679</td>
<td align="char" char="." rowspan="1" colspan="1">76.83%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Wild-type, CDDO</bold>
</td>
<td align="left" rowspan="1" colspan="1">26,643,277</td>
<td align="left" rowspan="1" colspan="1">17,185,336</td>
<td align="left" rowspan="1" colspan="1">2,840,999</td>
<td align="left" rowspan="1" colspan="1">6,585,400</td>
<td align="left" rowspan="1" colspan="1">31,542</td>
<td align="char" char="." rowspan="1" colspan="1">75.16%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Mutant, Sutherlandia</bold>
</td>
<td align="left" rowspan="1" colspan="1">37,321,607</td>
<td align="left" rowspan="1" colspan="1">23,776,732</td>
<td align="left" rowspan="1" colspan="1">3,690,121</td>
<td align="left" rowspan="1" colspan="1">9,813,200</td>
<td align="left" rowspan="1" colspan="1">41,554</td>
<td align="char" char="." rowspan="1" colspan="1">73.60%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Wild-type, Sutherlandia</bold>
</td>
<td align="left" rowspan="1" colspan="1">27,678,509</td>
<td align="left" rowspan="1" colspan="1">18,150,349</td>
<td align="left" rowspan="1" colspan="1">2,717,683</td>
<td align="left" rowspan="1" colspan="1">6,777,541</td>
<td align="left" rowspan="1" colspan="1">32,936</td>
<td align="char" char="." rowspan="1" colspan="1">75.39%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Mutant, Elderberry</bold>
</td>
<td align="left" rowspan="1" colspan="1">25,750,508</td>
<td align="left" rowspan="1" colspan="1">17,155,488</td>
<td align="left" rowspan="1" colspan="1">2,631,750</td>
<td align="left" rowspan="1" colspan="1">5,932,836</td>
<td align="left" rowspan="1" colspan="1">30,434</td>
<td align="char" char="." rowspan="1" colspan="1">76.84%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Wild-type, Elderberry</bold>
</td>
<td align="left" rowspan="1" colspan="1">24,036,293</td>
<td align="left" rowspan="1" colspan="1">15,882,208</td>
<td align="left" rowspan="1" colspan="1">2,386,226</td>
<td align="left" rowspan="1" colspan="1">5,738,437</td>
<td align="left" rowspan="1" colspan="1">29,422</td>
<td align="char" char="." rowspan="1" colspan="1">76.00%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t005" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t005</object-id>
<label>Table 5</label>
<caption>
<title>Mapping statistics of the second set of RNA-Seq data of mouse.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t005g" xlink:href="pone.0125000.t005"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2A</bold>
</td>
<td align="left" rowspan="1" colspan="1">12,390,167</td>
<td align="left" rowspan="1" colspan="1">9,016,108</td>
<td align="left" rowspan="1" colspan="1">1,513,122</td>
<td align="left" rowspan="1" colspan="1">1,849,425</td>
<td align="left" rowspan="1" colspan="1">11,512</td>
<td align="char" char="." rowspan="1" colspan="1">84.98%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2B</bold>
</td>
<td align="left" rowspan="1" colspan="1">11,760,788</td>
<td align="left" rowspan="1" colspan="1">8,220,731</td>
<td align="left" rowspan="1" colspan="1">1,445,292</td>
<td align="left" rowspan="1" colspan="1">2,083,834</td>
<td align="left" rowspan="1" colspan="1">10,931</td>
<td align="char" char="." rowspan="1" colspan="1">82.19%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2C</bold>
</td>
<td align="left" rowspan="1" colspan="1">9,481,395</td>
<td align="left" rowspan="1" colspan="1">6,892,027</td>
<td align="left" rowspan="1" colspan="1">1,178,753</td>
<td align="left" rowspan="1" colspan="1">1,402,253</td>
<td align="left" rowspan="1" colspan="1">8,362</td>
<td align="char" char="." rowspan="1" colspan="1">85.12%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2D</bold>
</td>
<td align="left" rowspan="1" colspan="1">19,450,682</td>
<td align="left" rowspan="1" colspan="1">13,849,985</td>
<td align="left" rowspan="1" colspan="1">2,406,684</td>
<td align="left" rowspan="1" colspan="1">3,176,235</td>
<td align="left" rowspan="1" colspan="1">17,778</td>
<td align="char" char="." rowspan="1" colspan="1">83.58%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2E</bold>
</td>
<td align="left" rowspan="1" colspan="1">11,743,452</td>
<td align="left" rowspan="1" colspan="1">8,381,763</td>
<td align="left" rowspan="1" colspan="1">1,418,645</td>
<td align="left" rowspan="1" colspan="1">1,932,480</td>
<td align="left" rowspan="1" colspan="1">10,564</td>
<td align="char" char="." rowspan="1" colspan="1">83.45%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2F</bold>
</td>
<td align="left" rowspan="1" colspan="1">12,104,053</td>
<td align="left" rowspan="1" colspan="1">8,692,100</td>
<td align="left" rowspan="1" colspan="1">1,510,391</td>
<td align="left" rowspan="1" colspan="1">1,890,846</td>
<td align="left" rowspan="1" colspan="1">10,716</td>
<td align="char" char="." rowspan="1" colspan="1">84.29%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2G</bold>
</td>
<td align="left" rowspan="1" colspan="1">13,301,646</td>
<td align="left" rowspan="1" colspan="1">9,427,606</td>
<td align="left" rowspan="1" colspan="1">1,642,257</td>
<td align="left" rowspan="1" colspan="1">2,219,819</td>
<td align="left" rowspan="1" colspan="1">11,964</td>
<td align="char" char="." rowspan="1" colspan="1">83.22%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2H</bold>
</td>
<td align="left" rowspan="1" colspan="1">15,766,959</td>
<td align="left" rowspan="1" colspan="1">11,158,652</td>
<td align="left" rowspan="1" colspan="1">1,950,974</td>
<td align="left" rowspan="1" colspan="1">2,643,055</td>
<td align="left" rowspan="1" colspan="1">14,278</td>
<td align="char" char="." rowspan="1" colspan="1">83.15%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3A</bold>
</td>
<td align="left" rowspan="1" colspan="1">22,688,673</td>
<td align="left" rowspan="1" colspan="1">16,126,579</td>
<td align="left" rowspan="1" colspan="1">2,773,025</td>
<td align="left" rowspan="1" colspan="1">3,768,846</td>
<td align="left" rowspan="1" colspan="1">20,223</td>
<td align="char" char="." rowspan="1" colspan="1">83.30%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3B</bold>
</td>
<td align="left" rowspan="1" colspan="1">20,352,253</td>
<td align="left" rowspan="1" colspan="1">14,506,676</td>
<td align="left" rowspan="1" colspan="1">2,503,008</td>
<td align="left" rowspan="1" colspan="1">3,324,616</td>
<td align="left" rowspan="1" colspan="1">17,953</td>
<td align="char" char="." rowspan="1" colspan="1">83.58%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3C</bold>
</td>
<td align="left" rowspan="1" colspan="1">20,301,445</td>
<td align="left" rowspan="1" colspan="1">14,486,410</td>
<td align="left" rowspan="1" colspan="1">2,401,849</td>
<td align="left" rowspan="1" colspan="1">3,394,815</td>
<td align="left" rowspan="1" colspan="1">18,371</td>
<td align="char" char="." rowspan="1" colspan="1">83.19%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3D</bold>
</td>
<td align="left" rowspan="1" colspan="1">14,985,494</td>
<td align="left" rowspan="1" colspan="1">10,729,926</td>
<td align="left" rowspan="1" colspan="1">1,876,610</td>
<td align="left" rowspan="1" colspan="1">2,365,106</td>
<td align="left" rowspan="1" colspan="1">13,852</td>
<td align="char" char="." rowspan="1" colspan="1">84.12%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t006</object-id>
<label>Table 6</label>
<caption>
<title>Mapping statistics of the third set of RNA-Seq data of Drosophila melanogaster.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t006g" xlink:href="pone.0125000.t006"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF1</bold>
</td>
<td align="left" rowspan="1" colspan="1">51,144,998</td>
<td align="left" rowspan="1" colspan="1">45,977,130</td>
<td align="left" rowspan="1" colspan="1">900,427</td>
<td align="left" rowspan="1" colspan="1">4,241,077</td>
<td align="left" rowspan="1" colspan="1">26,364</td>
<td align="char" char="." rowspan="1" colspan="1">91.66%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF2</bold>
</td>
<td align="left" rowspan="1" colspan="1">81,302,211</td>
<td align="left" rowspan="1" colspan="1">74,246,307</td>
<td align="left" rowspan="1" colspan="1">1,415,608</td>
<td align="left" rowspan="1" colspan="1">5,590,921</td>
<td align="left" rowspan="1" colspan="1">49,375</td>
<td align="char" char="." rowspan="1" colspan="1">93.06%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF3</bold>
</td>
<td align="left" rowspan="1" colspan="1">123,512,038</td>
<td align="left" rowspan="1" colspan="1">108,573,161</td>
<td align="left" rowspan="1" colspan="1">1,797,884</td>
<td align="left" rowspan="1" colspan="1">13,075,375</td>
<td align="left" rowspan="1" colspan="1">65,618</td>
<td align="char" char="." rowspan="1" colspan="1">89.36%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM1</bold>
</td>
<td align="left" rowspan="1" colspan="1">77,424,855</td>
<td align="left" rowspan="1" colspan="1">70,221,478</td>
<td align="left" rowspan="1" colspan="1">1,520,820</td>
<td align="left" rowspan="1" colspan="1">5,642,533</td>
<td align="left" rowspan="1" colspan="1">40,024</td>
<td align="char" char="." rowspan="1" colspan="1">92.66%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM2</bold>
</td>
<td align="left" rowspan="1" colspan="1">61,946,818</td>
<td align="left" rowspan="1" colspan="1">55,327,486</td>
<td align="left" rowspan="1" colspan="1">1,103,757</td>
<td align="left" rowspan="1" colspan="1">5,477,117</td>
<td align="left" rowspan="1" colspan="1">38,458</td>
<td align="char" char="." rowspan="1" colspan="1">91.10%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM3</bold>
</td>
<td align="left" rowspan="1" colspan="1">69,294,584</td>
<td align="left" rowspan="1" colspan="1">61,985,415</td>
<td align="left" rowspan="1" colspan="1">1,518,519</td>
<td align="left" rowspan="1" colspan="1">5,750,241</td>
<td align="left" rowspan="1" colspan="1">40,409</td>
<td align="char" char="." rowspan="1" colspan="1">91.64%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMF1</bold>
</td>
<td align="left" rowspan="1" colspan="1">85,587,833</td>
<td align="left" rowspan="1" colspan="1">78,370,743</td>
<td align="left" rowspan="1" colspan="1">1,323,717</td>
<td align="left" rowspan="1" colspan="1">5,849,024</td>
<td align="left" rowspan="1" colspan="1">44,349</td>
<td align="char" char="." rowspan="1" colspan="1">93.11%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMF2</bold>
</td>
<td align="left" rowspan="1" colspan="1">44,339,865</td>
<td align="left" rowspan="1" colspan="1">39,020,383</td>
<td align="left" rowspan="1" colspan="1">701,810</td>
<td align="left" rowspan="1" colspan="1">4,593,648</td>
<td align="left" rowspan="1" colspan="1">24,024</td>
<td align="char" char="." rowspan="1" colspan="1">89.59%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMF3</bold>
</td>
<td align="left" rowspan="1" colspan="1">75,974,183</td>
<td align="left" rowspan="1" colspan="1">68,654,562</td>
<td align="left" rowspan="1" colspan="1">1,302,500</td>
<td align="left" rowspan="1" colspan="1">5,973,758</td>
<td align="left" rowspan="1" colspan="1">43,363</td>
<td align="char" char="." rowspan="1" colspan="1">92.08%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMM1</bold>
</td>
<td align="left" rowspan="1" colspan="1">74,429,022</td>
<td align="left" rowspan="1" colspan="1">67,318,010</td>
<td align="left" rowspan="1" colspan="1">1,682,459</td>
<td align="left" rowspan="1" colspan="1">5,382,557</td>
<td align="left" rowspan="1" colspan="1">45,996</td>
<td align="char" char="." rowspan="1" colspan="1">92.71%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMM2</bold>
</td>
<td align="left" rowspan="1" colspan="1">68,985,281</td>
<td align="left" rowspan="1" colspan="1">61,450,714</td>
<td align="left" rowspan="1" colspan="1">1,302,852</td>
<td align="left" rowspan="1" colspan="1">6,195,229</td>
<td align="left" rowspan="1" colspan="1">36,486</td>
<td align="char" char="." rowspan="1" colspan="1">90.97%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>HMM3</bold>
</td>
<td align="left" rowspan="1" colspan="1">76,796,015</td>
<td align="left" rowspan="1" colspan="1">69,056,721</td>
<td align="left" rowspan="1" colspan="1">1,578,337</td>
<td align="left" rowspan="1" colspan="1">6,116,669</td>
<td align="left" rowspan="1" colspan="1">44,288</td>
<td align="char" char="." rowspan="1" colspan="1">91.98%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t007</object-id>
<label>Table 7</label>
<caption>
<title>Mapping statistics of the fourth set of RNA-Seq data of Drosophila melanogaster.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t007g" xlink:href="pone.0125000.t007"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF1</bold>
</td>
<td align="left" rowspan="1" colspan="1">83,480,611</td>
<td align="left" rowspan="1" colspan="1">56,708,379</td>
<td align="left" rowspan="1" colspan="1">2,163,167</td>
<td align="left" rowspan="1" colspan="1">24,557,803</td>
<td align="left" rowspan="1" colspan="1">51,262</td>
<td align="char" char="." rowspan="1" colspan="1">70.52%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF2</bold>
</td>
<td align="left" rowspan="1" colspan="1">56,660,705</td>
<td align="left" rowspan="1" colspan="1">42,714,627</td>
<td align="left" rowspan="1" colspan="1">1,398,576</td>
<td align="left" rowspan="1" colspan="1">12,513,930</td>
<td align="left" rowspan="1" colspan="1">33,572</td>
<td align="char" char="." rowspan="1" colspan="1">77.86%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CF3</bold>
</td>
<td align="left" rowspan="1" colspan="1">67,314,472</td>
<td align="left" rowspan="1" colspan="1">50,492,765</td>
<td align="left" rowspan="1" colspan="1">1,681,644</td>
<td align="left" rowspan="1" colspan="1">15,100,773</td>
<td align="left" rowspan="1" colspan="1">39,290</td>
<td align="char" char="." rowspan="1" colspan="1">77.51%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM1</bold>
</td>
<td align="left" rowspan="1" colspan="1">50,000,247</td>
<td align="left" rowspan="1" colspan="1">38,470,206</td>
<td align="left" rowspan="1" colspan="1">1,206,401</td>
<td align="left" rowspan="1" colspan="1">10,291,223</td>
<td align="left" rowspan="1" colspan="1">32,417</td>
<td align="char" char="." rowspan="1" colspan="1">79.35%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM2</bold>
</td>
<td align="left" rowspan="1" colspan="1">70,869,571</td>
<td align="left" rowspan="1" colspan="1">53,559,942</td>
<td align="left" rowspan="1" colspan="1">1,657,406</td>
<td align="left" rowspan="1" colspan="1">15,608,111</td>
<td align="left" rowspan="1" colspan="1">44,112</td>
<td align="char" char="." rowspan="1" colspan="1">77.91%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>CM3</bold>
</td>
<td align="left" rowspan="1" colspan="1">68,530,284</td>
<td align="left" rowspan="1" colspan="1">51,799,947</td>
<td align="left" rowspan="1" colspan="1">1,627,155</td>
<td align="left" rowspan="1" colspan="1">15,061,138</td>
<td align="left" rowspan="1" colspan="1">42,044</td>
<td align="char" char="." rowspan="1" colspan="1">77.96%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>mF1</bold>
</td>
<td align="left" rowspan="1" colspan="1">78,004,841</td>
<td align="left" rowspan="1" colspan="1">61,015,420</td>
<td align="left" rowspan="1" colspan="1">2,721,937</td>
<td align="left" rowspan="1" colspan="1">14,235,541</td>
<td align="left" rowspan="1" colspan="1">31,943</td>
<td align="char" char="." rowspan="1" colspan="1">81.71%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>mF2</bold>
</td>
<td align="left" rowspan="1" colspan="1">51,629,214</td>
<td align="left" rowspan="1" colspan="1">40,273,082</td>
<td align="left" rowspan="1" colspan="1">1,573,248</td>
<td align="left" rowspan="1" colspan="1">9,745,829</td>
<td align="left" rowspan="1" colspan="1">37,055</td>
<td align="char" char="." rowspan="1" colspan="1">81.05%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>mF3</bold>
</td>
<td align="left" rowspan="1" colspan="1">75,882,842</td>
<td align="left" rowspan="1" colspan="1">59,657,154</td>
<td align="left" rowspan="1" colspan="1">2,740,131</td>
<td align="left" rowspan="1" colspan="1">13,454,327</td>
<td align="left" rowspan="1" colspan="1">31,230</td>
<td align="char" char="." rowspan="1" colspan="1">82.23%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t008</object-id>
<label>Table 8</label>
<caption>
<title>Mapping statistics of the fifth set of RNA-Seq data of Arabidopsis thaliana.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t008g" xlink:href="pone.0125000.t008"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_1</bold>
</td>
<td align="left" rowspan="1" colspan="1">27,725,818</td>
<td align="left" rowspan="1" colspan="1">24,853,210</td>
<td align="left" rowspan="1" colspan="1">973,400</td>
<td align="left" rowspan="1" colspan="1">1,897,394</td>
<td align="left" rowspan="1" colspan="1">1,814</td>
<td align="char" char="." rowspan="1" colspan="1">93.15%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_2</bold>
</td>
<td align="left" rowspan="1" colspan="1">34,323,205</td>
<td align="left" rowspan="1" colspan="1">30,712,426</td>
<td align="left" rowspan="1" colspan="1">1,319,275</td>
<td align="left" rowspan="1" colspan="1">2,289,425</td>
<td align="left" rowspan="1" colspan="1">2,079</td>
<td align="char" char="." rowspan="1" colspan="1">93.32%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_3</bold>
</td>
<td align="left" rowspan="1" colspan="1">27,486,337</td>
<td align="left" rowspan="1" colspan="1">24,759,189</td>
<td align="left" rowspan="1" colspan="1">836,816</td>
<td align="left" rowspan="1" colspan="1">1,888,591</td>
<td align="left" rowspan="1" colspan="1">1,741</td>
<td align="char" char="." rowspan="1" colspan="1">93.12%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_1_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,555,221</td>
<td align="left" rowspan="1" colspan="1">15,965,329</td>
<td align="left" rowspan="1" colspan="1">636,099</td>
<td align="left" rowspan="1" colspan="1">953,687</td>
<td align="left" rowspan="1" colspan="1">106</td>
<td align="char" char="." rowspan="1" colspan="1">94.57%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_2_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">22,064,711</td>
<td align="left" rowspan="1" colspan="1">20,041,732</td>
<td align="left" rowspan="1" colspan="1">876,453</td>
<td align="left" rowspan="1" colspan="1">1,146,446</td>
<td align="left" rowspan="1" colspan="1">80</td>
<td align="char" char="." rowspan="1" colspan="1">94.80%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>Col_3_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,459,673</td>
<td align="left" rowspan="1" colspan="1">15,956,994</td>
<td align="left" rowspan="1" colspan="1">548,720</td>
<td align="left" rowspan="1" colspan="1">953,897</td>
<td align="left" rowspan="1" colspan="1">62</td>
<td align="char" char="." rowspan="1" colspan="1">94.54%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_1</bold>
</td>
<td align="left" rowspan="1" colspan="1">26,356,053</td>
<td align="left" rowspan="1" colspan="1">23,676,670</td>
<td align="left" rowspan="1" colspan="1">886,816</td>
<td align="left" rowspan="1" colspan="1">1,790,930</td>
<td align="left" rowspan="1" colspan="1">1,637</td>
<td align="char" char="." rowspan="1" colspan="1">93.20%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_2</bold>
</td>
<td align="left" rowspan="1" colspan="1">20,998,406</td>
<td align="left" rowspan="1" colspan="1">18,793,308</td>
<td align="left" rowspan="1" colspan="1">727,901</td>
<td align="left" rowspan="1" colspan="1">1,475,840</td>
<td align="left" rowspan="1" colspan="1">1,357</td>
<td align="char" char="." rowspan="1" colspan="1">92.97%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_3</bold>
</td>
<td align="left" rowspan="1" colspan="1">28,372,647</td>
<td align="left" rowspan="1" colspan="1">25,669,013</td>
<td align="left" rowspan="1" colspan="1">914,982</td>
<td align="left" rowspan="1" colspan="1">1,786,946</td>
<td align="left" rowspan="1" colspan="1">1,706</td>
<td align="char" char="." rowspan="1" colspan="1">93.70%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_1_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">16,641,162</td>
<td align="left" rowspan="1" colspan="1">15,168,566</td>
<td align="left" rowspan="1" colspan="1">578,226</td>
<td align="left" rowspan="1" colspan="1">894,325</td>
<td align="left" rowspan="1" colspan="1">45</td>
<td align="char" char="." rowspan="1" colspan="1">94.63%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_2_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">13,167,066</td>
<td align="left" rowspan="1" colspan="1">11,963,242</td>
<td align="left" rowspan="1" colspan="1">473,323</td>
<td align="left" rowspan="1" colspan="1">730,435</td>
<td align="left" rowspan="1" colspan="1">66</td>
<td align="char" char="." rowspan="1" colspan="1">94.45%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_3_qtrim</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,927,078</td>
<td align="left" rowspan="1" colspan="1">16,467,776</td>
<td align="left" rowspan="1" colspan="1">597,035</td>
<td align="left" rowspan="1" colspan="1">862,219</td>
<td align="left" rowspan="1" colspan="1">48</td>
<td align="char" char="." rowspan="1" colspan="1">95.19%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
<table-wrap id="pone.0125000.t009" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.t009</object-id>
<label>Table 9</label>
<caption>
<title>Mapping statistics of the sixth set of RNA-Seq data of human.</title>
</caption>
<alternatives>
<graphic id="pone.0125000.t009g" xlink:href="pone.0125000.t009"></graphic>
<table frame="hsides" rules="groups">
<colgroup span="1">
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
<col align="left" valign="middle" span="1"></col>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Samples</th>
<th align="left" rowspan="1" colspan="1"># reads</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to unique sites</th>
<th align="left" rowspan="1" colspan="1"># reads mapped to multiple sites</th>
<th align="left" rowspan="1" colspan="1"># reads failed to map</th>
<th align="left" rowspan="1" colspan="1"># filtered reads</th>
<th align="left" rowspan="1" colspan="1">Percentage of reads mapped to genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>1Sfesrrb-2</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,448,758</td>
<td align="left" rowspan="1" colspan="1">16,392,693</td>
<td align="left" rowspan="1" colspan="1">692,905</td>
<td align="left" rowspan="1" colspan="1">363,022</td>
<td align="left" rowspan="1" colspan="1">138</td>
<td align="char" char="." rowspan="1" colspan="1">97.92%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>1Sfesrrb-3</bold>
</td>
<td align="left" rowspan="1" colspan="1">16,228,533</td>
<td align="left" rowspan="1" colspan="1">15,239,649</td>
<td align="left" rowspan="1" colspan="1">662,064</td>
<td align="left" rowspan="1" colspan="1">326,704</td>
<td align="left" rowspan="1" colspan="1">116</td>
<td align="char" char="." rowspan="1" colspan="1">97.99%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2pc3-1</bold>
</td>
<td align="left" rowspan="1" colspan="1">15,582,276</td>
<td align="left" rowspan="1" colspan="1">14,641,199</td>
<td align="left" rowspan="1" colspan="1">626,397</td>
<td align="left" rowspan="1" colspan="1">314,544</td>
<td align="left" rowspan="1" colspan="1">136</td>
<td align="char" char="." rowspan="1" colspan="1">97.98%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>2pc3-3</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,066,953</td>
<td align="left" rowspan="1" colspan="1">16,009,327</td>
<td align="left" rowspan="1" colspan="1">707,445</td>
<td align="left" rowspan="1" colspan="1">350,042</td>
<td align="left" rowspan="1" colspan="1">139</td>
<td align="char" char="." rowspan="1" colspan="1">97.95%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3DY131-1</bold>
</td>
<td align="left" rowspan="1" colspan="1">17,130,579</td>
<td align="left" rowspan="1" colspan="1">15,966,495</td>
<td align="left" rowspan="1" colspan="1">806,969</td>
<td align="left" rowspan="1" colspan="1">357,004</td>
<td align="left" rowspan="1" colspan="1">111</td>
<td align="char" char="." rowspan="1" colspan="1">97.92%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>3DY131-2</bold>
</td>
<td align="left" rowspan="1" colspan="1">15,500,204</td>
<td align="left" rowspan="1" colspan="1">14,623,868</td>
<td align="left" rowspan="1" colspan="1">576,060</td>
<td align="left" rowspan="1" colspan="1">300,168</td>
<td align="left" rowspan="1" colspan="1">108</td>
<td align="char" char="." rowspan="1" colspan="1">98.06%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>4ctrl-1</bold>
</td>
<td align="left" rowspan="1" colspan="1">19,117,412</td>
<td align="left" rowspan="1" colspan="1">17,885,236</td>
<td align="left" rowspan="1" colspan="1">858,147</td>
<td align="left" rowspan="1" colspan="1">373,922</td>
<td align="left" rowspan="1" colspan="1">107</td>
<td align="char" char="." rowspan="1" colspan="1">98.04%</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>4ctrl-2</bold>
</td>
<td align="left" rowspan="1" colspan="1">16,269,465</td>
<td align="left" rowspan="1" colspan="1">15,280,726</td>
<td align="left" rowspan="1" colspan="1">663,813</td>
<td align="left" rowspan="1" colspan="1">324,810</td>
<td align="left" rowspan="1" colspan="1">116</td>
<td align="char" char="." rowspan="1" colspan="1">98.00%</td>
</tr>
</tbody>
</table>
</alternatives>
</table-wrap>
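<p>The columns of the mapping tables can be related by simple arithmetic. The Python sketch below is an illustration only and is not part of RNAMiner; the function name mapping_summary is hypothetical. It assumes that the four read categories sum to the total number of reads and that the reported percentage is the share of reads mapped to unique or multiple sites, which is consistent with the 2A row of the second data set used here.</p>
<preformat>
```python
def mapping_summary(unique, multiple, failed, filtered):
    """Return (total reads, percentage of reads mapped to the genome).

    Assumption: total = unique + multiple + failed + filtered, and
    mapped percentage = 100 * (unique + multiple) / total.
    """
    total = unique + multiple + failed + filtered
    mapped_pct = 100.0 * (unique + multiple) / total
    return total, round(mapped_pct, 2)


# Row 2A of the second data set, copied from the table above.
total_2a, pct_2a = mapping_summary(9016108, 1513122, 1849425, 11512)
print(total_2a, pct_2a)
```
</preformat>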
</sec>
<sec id="sec010">
<title>2. Gene expression values calculated from the reads mapping data</title>
<p>RNAMiner removed reads that mapped to multiple locations on a reference genome from the mapping data. The gene expression values were calculated by Cufflinks [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], MULTICOM-MAP [
<xref rid="pone.0125000.ref014" ref-type="bibr">14</xref>
,
<xref rid="pone.0125000.ref015" ref-type="bibr">15</xref>
,
<xref rid="pone.0125000.ref016" ref-type="bibr">16</xref>
], and HTSeq [
<xref rid="pone.0125000.ref011" ref-type="bibr">11</xref>
] on the remaining RNA-Seq reads, which mapped to unique locations on the genome. Compiling read mappings into gene expression values generates an overall profile of the expression levels of most genes in a transcriptome, and it reduced the size of the dataset about one thousand-fold (i.e., from gigabytes to megabytes) in our experiments. The compilation process transforms the raw data into meaningful expression profiles of genes. For example, three gene expression plots for the comparisons between Control and each treatment in mutant mouse in the first data set are shown in
<xref rid="pone.0125000.g006" ref-type="fig">Fig 6</xref>
, and two gene expression plots for the comparisons between 2A and 2B and between 2A and 3D in the second data set are shown in
<xref rid="pone.0125000.g007" ref-type="fig">Fig 7</xref>
. In these figures, gene expression values calculated by MULTICOM-MAP were used, and the range of these values was constrained to [0, 100] while keeping the original ratios to make the figures readable. Points that lie far from the diagonal are usually candidates for differentially expressed genes.</p>
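<p>One plausible reading of constraining values to [0, 100] while keeping the original ratios is sketched below in Python. This is an assumption rather than RNAMiner's documented procedure, and the function name cap_pair is hypothetical: when either value of a gene's (control, treatment) pair exceeds the limit, both values are multiplied by the same factor, so the ratio between them is unchanged.</p>
<preformat>
```python
def cap_pair(control, treatment, limit=100.0):
    """Scale a pair of expression values so neither exceeds limit.

    Both values are multiplied by the same factor, so the
    treatment/control ratio is preserved. Assumed reading of the
    plotting step, not the authors' confirmed implementation.
    """
    peak = max(control, treatment)
    if peak > limit:
        factor = limit / peak
        return control * factor, treatment * factor
    return control, treatment


# Hypothetical values: a 4:1 ratio stays 4:1 after capping.
scaled = cap_pair(400.0, 100.0)
unchanged = cap_pair(10.0, 20.0)
print(scaled, unchanged)
```
</preformat>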
<fig id="pone.0125000.g006" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g006</object-id>
<label>Fig 6</label>
<caption>
<title>Three gene expression plots in the first data set.</title>
<p>These plots are for comparisons between Control and each treatment in mutant mouse. The x-axis represents Control and the y-axis represents CDDO treatment in A, Sutherlandia treatment in B, and Elderberry treatment in C. We used gene expression values calculated by MULTICOM-MAP to make the plots. The range of these values was constrained to [0, 100] while keeping the original ratios in order to make these figures readable.</p>
</caption>
<graphic xlink:href="pone.0125000.g006"></graphic>
</fig>
<fig id="pone.0125000.g007" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g007</object-id>
<label>Fig 7</label>
<caption>
<title>Two gene expression plots in the second data set.</title>
<p>These plots are for the comparisons between 2A and 2B and between 2A and 3D. The x-axis represents 2A (Control, no FruHis, no Lycopene) and the y-axis represents 2B (FruHis = 1, no Lycopene) in A and 3D (FruHis = 16, with Lycopene) in B. We used gene expression values calculated by MULTICOM-MAP to make the plots. The range of these values was constrained to [0, 100] while keeping the original ratios in order to make these figures readable.</p>
</caption>
<graphic xlink:href="pone.0125000.g007"></graphic>
</fig>
<p>MULTICOM-MAP and HTSeq were used to calculate the raw read counts in the third and fourth sets of data. The counts were normalized by dividing them by the total number of uniquely mapped reads in the replicate. The normalized count of a gene was an indicator of the relative expression level of the gene in the replicate. The normalized counts of a gene in multiple replicates of a sample were further averaged and used as the measure of the relative expression level of the gene in the sample.
<xref rid="pone.0125000.g008" ref-type="fig">Fig 8</xref>
shows one gene expression plot for the comparison between CF (Control Female) and CM (Control Male) in the third data set. In this figure, gene expression values were calculated by MULTICOM-MAP, and the values were transformed by log
<sub>2</sub>
in order to make the figure readable. Two gene expression plots for the comparison between Col (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant) in the fifth data set are illustrated in
<xref rid="pone.0125000.g009" ref-type="fig">Fig 9</xref>
, and two gene expression plots for the comparison between 2pc3 and 1Sfesrrb in the sixth data set are illustrated in
<xref rid="pone.0125000.g010" ref-type="fig">Fig 10</xref>
. The left plot in each figure was generated from all the genes, and the right one was generated from differentially expressed genes. The gene expression values were calculated by MULTICOM-MAP and transformed by log
<sub>2</sub>
. According to the two plots in Figs
<xref rid="pone.0125000.g009" ref-type="fig">9</xref>
and
<xref rid="pone.0125000.g010" ref-type="fig">10</xref>
, the distribution of expression values of differentially expressed genes is quite different from that of the rest of the genes.</p>
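<p>The count normalization described above (divide each raw count by the replicate's total number of uniquely mapped reads, then average over replicates) can be sketched in Python as follows. This is a minimal illustration, not RNAMiner's code: the function names and the toy counts are hypothetical, and the sketch assumes every replicate reports the same set of genes.</p>
<preformat>
```python
def normalize_counts(raw_counts, unique_mapped_total):
    """Divide each gene's raw read count by the replicate's uniquely mapped read total."""
    return {gene: count / unique_mapped_total for gene, count in raw_counts.items()}


def sample_expression(replicates):
    """Average the normalized counts of each gene over the replicates of a sample.

    replicates: list of (raw_counts dict, unique_mapped_total) pairs.
    Assumes all replicates contain the same genes.
    """
    normalized = [normalize_counts(counts, total) for counts, total in replicates]
    genes = normalized[0].keys()
    return {gene: sum(rep[gene] for rep in normalized) / len(normalized) for gene in genes}


# Hypothetical toy numbers, not taken from the paper.
replicates = [
    ({"geneA": 50, "geneB": 0}, 1000),
    ({"geneA": 150, "geneB": 20}, 2000),
]
expression = sample_expression(replicates)
print(expression)
```
</preformat>
<p>Averaging after normalization keeps a deeply sequenced replicate from dominating the sample-level value.</p>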
<fig id="pone.0125000.g008" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g008</object-id>
<label>Fig 8</label>
<caption>
<title>One gene expression plot in the third data set.</title>
<p>The plot is for the comparison between CF and CM. The x-axis represents CF and the y-axis represents CM. We used gene expression values calculated by MULTICOM-MAP to make the plot. The raw counts were transformed by log
<sub>2</sub>
in order to make the figure readable.</p>
</caption>
<graphic xlink:href="pone.0125000.g008"></graphic>
</fig>
<fig id="pone.0125000.g009" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g009</object-id>
<label>Fig 9</label>
<caption>
<title>Two gene expression plots in the fifth data set.</title>
<p>These plots are for the comparison between Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3. The x-axis represents Col (Wild-Type) and the y-axis represents
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant). The left plot visualizes the expression values of all the genes, and the right one displays the expression values of differentially expressed genes. The gene expression values were calculated by MULTICOM-MAP. The raw counts were transformed by log
<sub>2</sub>
.</p>
</caption>
<graphic xlink:href="pone.0125000.g009"></graphic>
</fig>
<fig id="pone.0125000.g010" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g010</object-id>
<label>Fig 10</label>
<caption>
<title>Two gene expression plots in the sixth data set.</title>
<p>These plots are for the comparison between 2pc3 and 1Sfesrrb. The x-axis represents 2pc3 and the y-axis represents 1Sfesrrb. The left plot visualizes the expression values of all the genes, and the right one displays the expression values of differentially expressed genes. The gene expression values were calculated by MULTICOM-MAP. The raw counts were transformed by log
<sub>2</sub>
.</p>
</caption>
<graphic xlink:href="pone.0125000.g010"></graphic>
</fig>
</sec>
<sec id="sec011">
<title>3. Differentially expressed genes identified from the RNA-Seq data</title>
<p>RNAMiner identified differentially expressed genes between control and each treatment using Cuffdiff [
<xref rid="pone.0125000.ref010" ref-type="bibr">10</xref>
], edgeR [
<xref rid="pone.0125000.ref012" ref-type="bibr">12</xref>
], and DESeq [
<xref rid="pone.0125000.ref013" ref-type="bibr">13</xref>
]. The p-value threshold was set to 0.05 to select differentially expressed genes. For example, the numbers of differentially expressed genes for each comparison, and their overlaps, in both mutant and wild-type mouse in the first data set are shown in
<xref rid="pone.0125000.g011" ref-type="fig">Fig 11</xref>
. The number of differentially expressed genes for each comparison in the second data set is shown in
<xref rid="pone.0125000.g012" ref-type="fig">Fig 12</xref>
. These differentially expressed genes were derived from the overlaps of three sets of differentially expressed genes separately identified by Cuffdiff, MULTICOM-MAP+edgeR, and MULTICOM-MAP+DESeq. As shown in
<xref rid="pone.0125000.g012" ref-type="fig">Fig 12</xref>
, the number of differentially expressed genes increased with FruHis concentration, both in the absence and in the presence of 4 μM lycopene.</p>
<fig id="pone.0125000.g011" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g011</object-id>
<label>Fig 11</label>
<caption>
<title>The number of differentially expressed genes in the first data set.</title>
<p>These numbers were calculated for different pairs of comparisons between Control and each treatment, and their overlaps in both mutant mouse and wild-type mouse cells. The differentially expressed genes in each comparison were derived from the overlaps of three sets of differentially expressed genes generated by Cuffdiff, MULTICOM-MAP+edgeR, and MULTICOM-MAP+DESeq.</p>
</caption>
<graphic xlink:href="pone.0125000.g011"></graphic>
</fig>
<fig id="pone.0125000.g012" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g012</object-id>
<label>Fig 12</label>
<caption>
<title>The number of differentially expressed genes for each pairwise comparison in the second data set.</title>
<p>The differentially expressed genes in each comparison were derived from the overlaps of three sets of differentially expressed genes generated by Cuffdiff, MULTICOM-MAP+edgeR, and MULTICOM-MAP+DESeq.</p>
</caption>
<graphic xlink:href="pone.0125000.g012"></graphic>
</fig>
<p>The number of differentially expressed genes for two comparisons between Col (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant), between Col_qtrim (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim (mutant), and their overlaps in the fifth data set are shown in
<xref rid="pone.0125000.g013" ref-type="fig">Fig 13</xref>
. These differentially expressed genes were derived from the overlaps of three sets of differentially expressed genes generated separately by Cuffdiff, MULTICOM-MAP+edgeR, and MULTICOM-MAP+DESeq. We also identified differentially expressed genes for two comparisons in the sixth data set, between 2pc3 and 1Sfesrrb and between 4ctrl and 3DY131, using edgeR on read counts calculated by MULTICOM-MAP. edgeR identified 6,210 differentially expressed genes for the comparison between 2pc3 and 1Sfesrrb, and 590 for the comparison between 4ctrl and 3DY131. On the RNAMiner web service, users can choose different p-value (or q-value) thresholds to obtain a number of differentially expressed genes that suits their needs. In addition to generating testable biological hypotheses (e.g. gene targets for experimental testing), differential gene expression analysis generally reduces the size of the data about twofold, shifting the focus from almost all the genes in a genome to the small portion of genes most relevant to the biological experiment.</p>
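<p>The overlap of the three sets of differentially expressed genes can be sketched as a set intersection. This is only an illustration under stated assumptions: the function names and the toy p-values are hypothetical, and treating a gene as significant when its p-value is strictly below the threshold is an assumption, since the text only states that the threshold was 0.05.</p>

```python
def significant_genes(p_values, threshold=0.05):
    """Return the genes whose p-value (or q-value) is below the threshold."""
    return {gene for gene, p in p_values.items() if threshold > p}


def consensus_genes(results_by_tool, threshold=0.05):
    """Intersect the significant gene sets reported by every tool."""
    gene_sets = [significant_genes(p, threshold) for p in results_by_tool.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Toy p-values (made up); the keys name the three approaches used in the text.
results = {
    "Cuffdiff": {"g1": 0.001, "g2": 0.20, "g3": 0.03, "g4": 0.04},
    "MULTICOM-MAP+edgeR": {"g1": 0.010, "g2": 0.01, "g3": 0.04, "g4": 0.30},
    "MULTICOM-MAP+DESeq": {"g1": 0.020, "g2": 0.02, "g3": 0.01},
}
print(sorted(consensus_genes(results)))  # ['g1', 'g3']
```

<p>Requiring agreement among all three approaches trades sensitivity for specificity; lowering the threshold shrinks the consensus set further.</p>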
<fig id="pone.0125000.g013" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g013</object-id>
<label>Fig 13</label>
<caption>
<title>The number of differentially expressed genes in the fifth data set.</title>
<p>These numbers were calculated for two comparisons between Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3, between Col_qtrim and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim, and their overlap. The differentially expressed genes in each comparison were derived from the overlaps of three sets of differentially expressed genes generated by Cuffdiff, MULTICOM-MAP+edgeR, and MULTICOM-MAP+DESeq.</p>
</caption>
<graphic xlink:href="pone.0125000.g013"></graphic>
</fig>
</sec>
<sec id="sec012">
<title>4. Predicted functions of differentially expressed genes</title>
<p>RNAMiner predicted functions of differentially expressed genes using MULTICOM-PDCN. The predicted function terms were ranked by their significance of enrichment among the differentially expressed genes. For example,
<xref rid="pone.0125000.g014" ref-type="fig">Fig 14</xref>
shows the top 10 most significantly enriched biological process functions for the comparison between Control and CDDO in both mutant and wild-type mouse in the first data set. The two comparisons share 5 biological processes among their top 10. The top 10 biological process functions for two comparisons in the second data set, between 2A and 2F (FruHis = 16 without Lycopene) and between 2A and 3D (FruHis = 16 with Lycopene), are shown in
<xref rid="pone.0125000.g015" ref-type="fig">Fig 15</xref>
. The two comparisons have 3 common biological processes in the top 10 biological processes. The top 10 biological process functions for the comparison between Col (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant) in the fifth data set are reported in
<xref rid="pone.0125000.g016" ref-type="fig">Fig 16</xref>
, together with those for the comparison between Col_qtrim and <italic>hae</italic>-3 <italic>hsl</italic>2-3_qtrim; the two comparisons share 8 biological processes among their top 10. In these figures, the number beside each bar is the p-value of the enrichment of the corresponding predicted function.</p>
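<p>The figure captions state that the enrichment p-values were calculated by Fisher’s exact test. A minimal sketch of such a test for one function term is given below. It is not the MULTICOM-PDCN code: the function name, the one-sided (over-representation) form of the test, and the toy numbers are assumptions made for illustration.</p>

```python
from math import comb


def enrichment_p_value(de_with_term, de_total, genome_with_term, genome_total):
    """One-sided Fisher's exact test for over-representation of a function term.

    Returns P(X >= de_with_term), where X follows the hypergeometric distribution
    obtained by drawing de_total genes from genome_total genes, of which
    genome_with_term are annotated with the term.
    """
    upper = min(de_total, genome_with_term)
    denominator = comb(genome_total, de_total)
    numerator = sum(
        comb(genome_with_term, k) * comb(genome_total - genome_with_term, de_total - k)
        for k in range(de_with_term, upper + 1)
    )
    return numerator / denominator


# Toy example: 4 differentially expressed genes out of 20 genes in total.
# Each entry maps a hypothetical term to (DE genes with term, all genes with term).
terms = {"term_a": (3, 5), "term_b": (1, 10)}
p_values = {
    name: enrichment_p_value(hits, 4, annotated, 20)
    for name, (hits, annotated) in terms.items()
}
ranked = sorted(p_values, key=p_values.get)  # most significant term first
print(ranked, p_values)
```

<p>Sorting the terms by p-value reproduces the kind of ranking shown in the figures, with the most significantly enriched function first.</p>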
<fig id="pone.0125000.g014" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g014</object-id>
<label>Fig 14</label>
<caption>
<title>Top 10 biological process functions in the first data set.</title>
<p>These functions were predicted for the comparison between Control and CDDO in both mutant mouse and wild-type mouse cells. Red bars denote the p-values of the top 10 predicted functions for wild-type mouse and blue bars denote the p-values of the top 10 predicted functions for mutant mouse. The number beside each bar is the significance of enrichment (p-value) of the predicted function. The p-value was calculated by Fisher’s exact test. The function names of wild-type mouse and mutant mouse are listed on the left separated by “/”. The two comparisons have five common biological processes among the top 10 biological processes.</p>
</caption>
<graphic xlink:href="pone.0125000.g014"></graphic>
</fig>
<fig id="pone.0125000.g015" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g015</object-id>
<label>Fig 15</label>
<caption>
<title>Top 10 biological process functions in the second data set.</title>
<p>These functions were predicted for two comparisons: between 2A (Control, no FruHis, no Lycopene) and 2F (FruHis = 16 without Lycopene), and between 2A and 3D (FruHis = 16 with Lycopene). Red bars denote the p-values of the top 10 predicted functions for the comparison between 2A and 3D and blue bars denote the p-values of the top 10 predicted functions for the comparison between 2A and 2F. The number beside each bar is the significance of enrichment (p-value) of the predicted function. The p-value was calculated by Fisher’s exact test. The function names of the two comparisons (2A versus 3D and 2A versus 2F) are listed on the left separated by “/”. The two comparisons have three common biological processes among the top 10 biological processes.</p>
</caption>
<graphic xlink:href="pone.0125000.g015"></graphic>
</fig>
<fig id="pone.0125000.g016" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g016</object-id>
<label>Fig 16</label>
<caption>
<title>Top 10 biological process functions in the fifth data set.</title>
<p>These functions were predicted for two comparisons between Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3, between Col_qtrim and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim. Red bars denote the p-values of the top 10 predicted functions for the comparison between Col_qtrim and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim and blue bars denote the p-values of the top 10 predicted functions for the comparison between Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3. The number beside each bar is the significance of enrichment (p-value) of the predicted function. The p-value was calculated by Fisher’s exact test. The function names of the two comparisons between Col_qtrim and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3_qtrim, between Col and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 are listed on the left separated by “/”. The two comparisons have 8 common biological processes among the top 10 biological processes.</p>
</caption>
<graphic xlink:href="pone.0125000.g016"></graphic>
</fig>
<p>Although the gene function analysis step does not substantially reduce the physical size of the data, it logically summarizes hundreds of differentially expressed genes into a small number (i.e., tens) of biological processes activated or deactivated in the biological experiment, which sheds light on the potential biological mechanism relevant to the experiment.</p>
</sec>
<sec id="sec013">
<title>5. Constructed gene regulatory networks</title>
<p>RNAMiner used MULTICOM-GNET [
<xref rid="pone.0125000.ref019" ref-type="bibr">19</xref>
,
<xref rid="pone.0125000.ref020" ref-type="bibr">20</xref>
] to construct gene regulatory networks based on differentially expressed genes and transcription factors. For example, a repression gene regulatory module with expression correlation 0.85 in mutant mouse in the first data set is illustrated in
<xref rid="pone.0125000.g017" ref-type="fig">Fig 17</xref>
. This module consisted of 21 differentially expressed genes. Three transcription factors, Tgfb1i1, Htatip2, and Jun, were predicted to collaboratively regulate this group of genes. An activation gene regulatory module for the comparison between Col (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant) with expression correlation 0.85 in the fifth data set is shown in
<xref rid="pone.0125000.g018" ref-type="fig">Fig 18</xref>
. This module consisted of 35 differentially expressed genes. Four transcription factors, AT3G59580, AT1G56650, AT1G28050, and AT1G52890, were predicted to collaboratively regulate this group of genes.</p>
<fig id="pone.0125000.g017" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g017</object-id>
<label>Fig 17</label>
<caption>
<title>One repression gene regulatory module in mutant mouse cells in the first data set.</title>
<p>The expression correlation score of the module was 0.85. The decision tree at the top middle illustrates how three putative transcription factors (Tgfb1i1, Htatip2, Jun) may collaboratively regulate the cluster of co-expressed genes at the bottom middle, where each row denotes a gene listed in the bottom left box and each column denotes one of four biological conditions (i.e. Control, CDDO, Sutherlandia, and Elderberry). Gene expression levels are represented by colors ranging from lowest (green) to highest (red). The expression of the genes in the cluster under each condition is predicted to be regulated according to the expression levels of the transcription factors listed on top of the condition. For example, under Sutherlandia treatment, the relatively low expression of Tgfb1i1 and the medium expression of Jun caused the repression of the group of genes.</p>
</caption>
<graphic xlink:href="pone.0125000.g017"></graphic>
</fig>
<fig id="pone.0125000.g018" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g018</object-id>
<label>Fig 18</label>
<caption>
<title>One activation gene regulatory module in the fifth data set.</title>
<p>The gene regulatory module was constructed for the comparison between Col (Wild-Type) and
<italic>hae</italic>
-3
<italic>hsl</italic>
2-3 (mutant), and the expression correlation score of the module was 0.85. The decision tree on the middle top illustrates how four putative transcription factors (AT3G59580, AT1G28050, AT1G52890, AT1G56650) may collaboratively regulate the cluster of co-expressed genes in the middle bottom, where each row denotes a gene listed in the bottom left box and each column denotes one of six biological replicates of two samples (i.e. Col and
<italic>hae-3 hsl2-3</italic>
). The levels of gene expression values were represented by different colors ranging from lowest (green) to highest (red). The expression of the genes in the cluster under each sample is predicted to be regulated according to the expression levels of transcription factors listed on top of the condition. For example, under the first replicate of Col, the low expression of AT3G59580 and the low expression of AT1G28050 caused the repression of the group of genes.</p>
</caption>
<graphic xlink:href="pone.0125000.g018"></graphic>
</fig>
<p>RNAMiner also used an R package “igraph” [
<xref rid="pone.0125000.ref043" ref-type="bibr">43</xref>
] to visualize gene regulatory networks by linking the regulatory relationships between and within all the gene regulatory modules predicted by MULTICOM-GNET together.
<xref rid="pone.0125000.g019" ref-type="fig">Fig 19</xref>
shows a gene regulatory network representing the regulatory relationships of the top 10 gene regulatory modules ranked by expression correlation scores in the first data set. There are 14 transcription factors (red nodes), 338 genes (blue nodes), and 1,280 edges (regulatory relationships) in the network.</p>
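<p>How regulatory modules can be merged into one network of transcription factor nodes, gene nodes, and edges is sketched below. RNAMiner itself used the R package igraph for this; the plain-Python version here is a substitute chosen for illustration, and the function name, the toy module contents, and the rule that every transcription factor of a module is linked to every gene of that module are assumptions.</p>

```python
def build_network(modules):
    """Collect transcription-factor-to-gene edges from a list of modules.

    Each module is a dict with "tfs" and "genes" lists; every transcription
    factor in a module is linked to every gene in that module (an assumption).
    """
    edges = set()
    for module in modules:
        for tf in module["tfs"]:
            for gene in module["genes"]:
                edges.add((tf, gene))
    tfs = {tf for tf, _ in edges}
    # A regulator that is also a target is counted as a transcription factor node.
    genes = {gene for _, gene in edges} - tfs
    return tfs, genes, edges


# Toy modules with hypothetical identifiers.
modules = [
    {"tfs": ["TF1", "TF2"], "genes": ["geneA", "geneB"]},
    {"tfs": ["TF1"], "genes": ["geneB", "geneC"]},
]
tfs, genes, edges = build_network(modules)
print(len(tfs), len(genes), len(edges))  # 2 3 5
```

<p>Storing edges in a set removes the duplicate relationship that arises when two modules share both a transcription factor and a target gene.</p>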
<fig id="pone.0125000.g019" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g019</object-id>
<label>Fig 19</label>
<caption>
<title>A visualized global gene regulatory network on the first dataset.</title>
<p>The network includes all the gene regulatory relationships between and within the top 10 gene regulatory modules ranked by expression correlation scores in the first dataset. Blue nodes represent target genes, and red nodes represent transcription factors that regulate the target genes. Each edge represents a regulatory relationship between a transcription factor and a gene.</p>
</caption>
<graphic xlink:href="pone.0125000.g019"></graphic>
</fig>
<p>The gene regulatory network reconstruction step condenses hundreds of differentially expressed genes and their expression data into dozens of gene regulatory modules, which may reveal the underlying biological mechanism controlling expression in the biological experiment. The network modules provide not only a human-comprehensible interpretation of the gene expression levels, but also the important transcription factors and their target genes, which are valuable for generating hypotheses for new biological experiments.</p>
</sec>
</sec>
<sec id="sec014">
<title>Use of the RNAMiner Web Service</title>
<p>The RNAMiner web service (
<xref rid="pone.0125000.g020" ref-type="fig">Fig 20</xref>
) is available at
<ext-link ext-link-type="uri" xlink:href="http://calla.rnet.missouri.edu/rnaminer/index.html">http://calla.rnet.missouri.edu/rnaminer/index.html</ext-link>
. Users can submit requests on the home page and receive an email with a link to the data analysis results.</p>
<fig id="pone.0125000.g020" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g020</object-id>
<label>Fig 20</label>
<caption>
<title>The home page of the RNAMiner web service.</title>
<p>Users can submit requests on the home page of the RNAMiner web service, and can also learn how to use RNAMiner, find contact information, and download the test data by clicking the navigation buttons on the left.</p>
</caption>
<graphic xlink:href="pone.0125000.g020"></graphic>
</fig>
<sec id="sec015">
<title>1. Submit a request</title>
<list list-type="order">
<list-item>
<p>Prepare RNA-Seq reads files (.fastq). The RNAMiner web service accepts the “.fastq.gz” and “.fastq.tar.gz” formats.</p>
</list-item>
<list-item>
<p>Choose the analysis categories. Each category depends on the results of the preceding categories, so if a category is chosen, the preceding categories are executed automatically. For example, if “predicting gene functions” is chosen, the first three categories will be executed automatically.</p>
</list-item>
<list-item>
<p>Choose the species. RNAMiner can analyze RNA-Seq data on four species:
<italic>Human</italic>
,
<italic>Mouse</italic>
,
<italic>Drosophila melanogaster</italic>
, and
<italic>Arabidopsis thaliana</italic>
.</p>
</list-item>
<list-item>
<p>Choose the criterion for identifying differentially expressed genes: p-value or q-value.</p>
</list-item>
<list-item>
<p>Set the p-value or q-value threshold for identifying differentially expressed genes. The value should be between 0 and 1. The default value is 0.05.</p>
</list-item>
<list-item>
<p>Input email address. An email with a link to the data analysis results will be sent to this email address when the data analysis is finished.</p>
</list-item>
<list-item>
<p>Input sample names.</p>
</list-item>
<list-item>
<p>Upload reads files. The last three categories require users to upload reads files for both samples. Users can upload more than one reads file for each sample.</p>
</list-item>
<list-item>
<p>Click “Submit”.</p>
</list-item>
</list>
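<p>The constraints in the submission steps above can be sketched as a small validation routine. This is not the code of the RNAMiner web service: the names <italic>categories_to_run</italic> and <italic>validate_request</italic>, the wording of the messages, and the inclusive treatment of the bounds 0 and 1 are assumptions made for illustration. The category names follow the five steps listed in the abstract.</p>

```python
CATEGORIES = [
    "mapping RNA-Seq reads to a reference genome",
    "calculating gene expression values",
    "identifying differentially expressed genes",
    "predicting gene functions",
    "constructing gene regulatory networks",
]
SPECIES = {"Human", "Mouse", "Drosophila melanogaster", "Arabidopsis thaliana"}
ACCEPTED_SUFFIXES = (".fastq.gz", ".fastq.tar.gz")


def categories_to_run(chosen):
    """Return the chosen category preceded by every category it depends on."""
    return CATEGORIES[: CATEGORIES.index(chosen) + 1]


def validate_request(species, criterion, threshold=0.05, files=()):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if species not in SPECIES:
        problems.append(f"unsupported species: {species}")
    if criterion not in ("p-value", "q-value"):
        problems.append(f"criterion must be p-value or q-value, got: {criterion}")
    if not (threshold >= 0.0 and 1.0 >= threshold):
        problems.append(f"threshold must be between 0 and 1, got: {threshold}")
    for name in files:
        if not name.endswith(ACCEPTED_SUFFIXES):
            problems.append(f"unsupported file format: {name}")
    return problems


print(categories_to_run("predicting gene functions"))
print(validate_request("Mouse", "q-value", 0.05, ["control.fastq.gz", "treated.fastq"]))
```

<p>Choosing “predicting gene functions” returns that category together with the three categories before it, which matches the example given in step 2.</p>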
<p>After a request is submitted successfully, a web page (
<xref rid="pone.0125000.g021" ref-type="fig">Fig 21</xref>
) will be shown stating that the data are being processed. A user whose request is still running or waiting in the queue cannot submit another request.</p>
<fig id="pone.0125000.g021" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g021</object-id>
<label>Fig 21</label>
<caption>
<title>One web page showing the successful submission of one request.</title>
<p>After a request is submitted successfully, a web page informs users that the data are being processed.</p>
</caption>
<graphic xlink:href="pone.0125000.g021"></graphic>
</fig>
</sec>
<sec id="sec016">
<title>2. Receive the results</title>
<p>When the data analysis is finished, users will receive an email with a link to one web page (
<xref rid="pone.0125000.g022" ref-type="fig">Fig 22</xref>
) with the data analysis information and a result link. The result page (
<xref rid="pone.0125000.g023" ref-type="fig">Fig 23</xref>
) will be shown by clicking the result link. Users can view and download the analysis data on the result page.</p>
<fig id="pone.0125000.g022" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g022</object-id>
<label>Fig 22</label>
<caption>
<title>One web page with data analysis information and a result link.</title>
<p>After the data analysis is finished, users will receive an email with a link to one web page with data analysis information and a result link. On this page, users can check the data analysis information and go to the result page.</p>
</caption>
<graphic xlink:href="pone.0125000.g022"></graphic>
</fig>
<fig id="pone.0125000.g023" orientation="portrait" position="float">
<object-id pub-id-type="doi">10.1371/journal.pone.0125000.g023</object-id>
<label>Fig 23</label>
<caption>
<title>One web page with data analysis results.</title>
<p>Users can view and download the data analysis results for each analysis category on this web page.</p>
</caption>
<graphic xlink:href="pone.0125000.g023"></graphic>
</fig>
<p>The time RNAMiner needs to analyze a set of RNA-Seq data depends on the size of the data, the number of reads files in the data set, and the number of jobs in the waiting queue. Normally RNAMiner finishes an analysis in several hours, but it takes longer when many jobs are waiting in the queue. Our server cannot handle too many jobs at the same time because of CPU and storage limitations.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="sec017">
<title>Conclusions</title>
<p>The RNAMiner protocol and pipeline can progressively reduce the size of large datasets to produce valuable and comprehensible biological knowledge of manageable size, ranging from gene expression values and differentially expressed genes to gene function predictions and gene regulatory networks. The test results on six RNA-Seq datasets of four different species help demonstrate its utility and versatility.</p>
<p>In order to further improve the quality of RNA-Seq data analysis, additional tools can be plugged into the RNAMiner protocol. In the future, we will add a high-speed RNA mapping tool—Gsnap [
<xref rid="pone.0125000.ref025" ref-type="bibr">25</xref>
] and a high-accuracy RNA mapping tool—Stampy [
<xref rid="pone.0125000.ref045" ref-type="bibr">45</xref>
] into the pipeline to map RNA reads to reference genomes. For identifying differentially expressed genes, we will include baySeq [
<xref rid="pone.0125000.ref046" ref-type="bibr">46</xref>
], ShrinkSeq [
<xref rid="pone.0125000.ref047" ref-type="bibr">47</xref>
], and NOISeq [
<xref rid="pone.0125000.ref048" ref-type="bibr">48</xref>
] into the pipeline to better handle various sources of noise in RNA-Seq data. Furthermore, we will include an in-house tool for constructing biological networks from a group of co-expressed genes, in order to reconstruct metabolic networks and signal transduction networks for gene clusters identified by the RNAMiner protocol. Moreover, we will add the capability of analyzing the function of non-coding small RNAs to RNAMiner and use this information during the reconstruction of biological networks. The new improvements will be incorporated into the RNAMiner web service for the community to use.</p>
</sec>
</body>
<back>
<ack>
<p>We would like to thank Kishore Banala for helping develop the web service.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="pone.0125000.ref001">
<label>1</label>
<mixed-citation publication-type="journal">
<name>
<surname>Fang</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Martin</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
(
<year>2012</year>
)
<article-title>Statistical methods for identifying differentially expressed genes in RNA-Seq experiments</article-title>
.
<source>Cell &amp; Bioscience</source>
<volume>2</volume>
:
<fpage>26</fpage>
.
<pub-id pub-id-type="pmid">22849430</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref002">
<label>2</label>
<mixed-citation publication-type="journal">
<name>
<surname>Soneson</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Delorenzi</surname>
<given-names>M</given-names>
</name>
(
<year>2013</year>
)
<article-title>A comparison of methods for differential expression analysis of RNA-seq data</article-title>
.
<source>BMC bioinformatics</source>
<volume>14</volume>
:
<fpage>91</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-14-91">10.1186/1471-2105-14-91</ext-link>
</comment>
<pub-id pub-id-type="pmid">23497356</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref003">
<label>3</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>BA</given-names>
</name>
,
<name>
<surname>McCue</surname>
<given-names>K</given-names>
</name>
,
<name>
<surname>Schaeffer</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Wold</surname>
<given-names>B</given-names>
</name>
(
<year>2008</year>
)
<article-title>Mapping and quantifying mammalian transcriptomes by RNA-Seq</article-title>
.
<source>Nature methods</source>
<volume>5</volume>
:
<fpage>621</fpage>
<lpage>628</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nmeth.1226">10.1038/nmeth.1226</ext-link>
</comment>
<pub-id pub-id-type="pmid">18516045</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref004">
<label>4</label>
<mixed-citation publication-type="journal">
<name>
<surname>Chen</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Wang</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Shi</surname>
<given-names>T</given-names>
</name>
(
<year>2011</year>
)
<article-title>Overview of available methods for diverse RNA-Seq data analyses</article-title>
.
<source>Science China Life Sciences</source>
<volume>54</volume>
:
<fpage>1121</fpage>
<lpage>1128</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1007/s11427-011-4255-x">10.1007/s11427-011-4255-x</ext-link>
</comment>
<pub-id pub-id-type="pmid">22227904</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref005">
<label>5</label>
<mixed-citation publication-type="journal">
<name>
<surname>Oshlack</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Robinson</surname>
<given-names>MD</given-names>
</name>
,
<name>
<surname>Young</surname>
<given-names>MD</given-names>
</name>
(
<year>2010</year>
)
<article-title>From RNA-seq reads to differential expression results</article-title>
.
<source>Genome Biol</source>
<volume>11</volume>
:
<fpage>220</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/gb-2010-11-12-220">10.1186/gb-2010-11-12-220</ext-link>
</comment>
<pub-id pub-id-type="pmid">21176179</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref006">
<label>6</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Brutnell</surname>
<given-names>TP</given-names>
</name>
(
<year>2010</year>
)
<article-title>Exploring plant transcriptomes using ultra high-throughput sequencing</article-title>
.
<source>Briefings in Functional Genomics</source>
<volume>9</volume>
:
<fpage>118</fpage>
<lpage>128</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bfgp/elp057">10.1093/bfgp/elp057</ext-link>
</comment>
<pub-id pub-id-type="pmid">20130067</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref007">
<label>7</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kvam</surname>
<given-names>VM</given-names>
</name>
,
<name>
<surname>Liu</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Si</surname>
<given-names>Y</given-names>
</name>
(
<year>2012</year>
)
<article-title>A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data</article-title>
.
<source>American journal of botany</source>
<volume>99</volume>
:
<fpage>248</fpage>
<lpage>256</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3732/ajb.1100340">10.3732/ajb.1100340</ext-link>
</comment>
<pub-id pub-id-type="pmid">22268221</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref008">
<label>8</label>
<mixed-citation publication-type="journal">
<name>
<surname>Kim</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Pertea</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Pimentel</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Kelley</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
(
<year>2013</year>
)
<article-title>TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions</article-title>
.
<source>Genome Biol</source>
<volume>14</volume>
:
<fpage>R36</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/gb-2013-14-4-r36">10.1186/gb-2013-14-4-r36</ext-link>
</comment>
<pub-id pub-id-type="pmid">23618408</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref009">
<label>9</label>
<mixed-citation publication-type="journal">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
(
<year>2012</year>
)
<article-title>Fast gapped-read alignment with Bowtie 2</article-title>
.
<source>Nature methods</source>
<volume>9</volume>
:
<fpage>357</fpage>
<lpage>359</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nmeth.1923">10.1038/nmeth.1923</ext-link>
</comment>
<pub-id pub-id-type="pmid">22388286</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref010">
<label>10</label>
<mixed-citation publication-type="journal">
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Williams</surname>
<given-names>BA</given-names>
</name>
,
<name>
<surname>Pertea</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Mortazavi</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Kwan</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>van Baren</surname>
<given-names>MJ</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation</article-title>
.
<source>Nature biotechnology</source>
<volume>28</volume>
:
<fpage>511</fpage>
<lpage>515</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nbt.1621">10.1038/nbt.1621</ext-link>
</comment>
<pub-id pub-id-type="pmid">20436464</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref011">
<label>11</label>
<mixed-citation publication-type="other">Anders S (2010) HTSeq: Analysing high-throughput sequencing data with Python.
<ext-link ext-link-type="uri" xlink:href="http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html">http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref012">
<label>12</label>
<mixed-citation publication-type="journal">
<name>
<surname>Robertson</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Schein</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Chiu</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Corbett</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Field</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Jackman</surname>
<given-names>SD</given-names>
</name>
,
<etal>et al</etal>
(
<year>2010</year>
)
<article-title>De novo assembly and analysis of RNA-seq data</article-title>
.
<source>Nature methods</source>
<volume>7</volume>
:
<fpage>909</fpage>
<lpage>912</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nmeth.1517">10.1038/nmeth.1517</ext-link>
</comment>
<pub-id pub-id-type="pmid">20935650</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref013">
<label>13</label>
<mixed-citation publication-type="journal">
<name>
<surname>Anders</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Huber</surname>
<given-names>W</given-names>
</name>
(
<year>2010</year>
)
<article-title>Differential expression analysis for sequence count data</article-title>
.
<source>Genome Biology</source>
<volume>11</volume>
:
<fpage>R106</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/gb-2010-11-10-r106">10.1186/gb-2010-11-10-r106</ext-link>
</comment>
<pub-id pub-id-type="pmid">20979621</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref014">
<label>14</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sun</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Fernandez</surname>
<given-names>HR</given-names>
</name>
,
<name>
<surname>Donohue</surname>
<given-names>RC</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Birchler</surname>
<given-names>JA</given-names>
</name>
(
<year>2013</year>
)
<article-title>Male-specific lethal complex in Drosophila counteracts histone acetylation and does not mediate dosage compensation</article-title>
.
<source>Proceedings of the National Academy of Sciences</source>
<volume>110</volume>
:
<fpage>E808</fpage>
<lpage>E817</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.1222542110">10.1073/pnas.1222542110</ext-link>
</comment>
<pub-id pub-id-type="pmid">23382189</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref015">
<label>15</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sun</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Johnson</surname>
<given-names>AF</given-names>
</name>
,
<name>
<surname>Donohue</surname>
<given-names>RC</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Birchler</surname>
<given-names>JA</given-names>
</name>
(
<year>2013</year>
)
<article-title>Dosage compensation and inverse effects in triple X metafemales of Drosophila</article-title>
.
<source>Proceedings of the National Academy of Sciences</source>
<volume>110</volume>
:
<fpage>7383</fpage>
<lpage>7388</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.1305638110">10.1073/pnas.1305638110</ext-link>
</comment>
<pub-id pub-id-type="pmid">23589863</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref016">
<label>16</label>
<mixed-citation publication-type="journal">
<name>
<surname>Sun</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Johnson</surname>
<given-names>AF</given-names>
</name>
,
<name>
<surname>Li</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Lambdin</surname>
<given-names>AS</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Birchler</surname>
<given-names>JA</given-names>
</name>
(
<year>2013</year>
)
<article-title>Differential effect of aneuploidy on the X chromosome and genes with sex-biased expression in Drosophila</article-title>
.
<source>Proceedings of the National Academy of Sciences</source>
<volume>110</volume>
:
<fpage>16514</fpage>
<lpage>16519</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1073/pnas.1316041110">10.1073/pnas.1316041110</ext-link>
</comment>
<pub-id pub-id-type="pmid">24062456</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref017">
<label>17</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Cao</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
(
<year>2013</year>
)
<article-title>Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks</article-title>
.
<source>BMC bioinformatics</source>
<volume>14</volume>
:
<fpage>S3</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-14-S16-S3">10.1186/1471-2105-14-S16-S3</ext-link>
</comment>
<pub-id pub-id-type="pmid">24564553</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref018">
<label>18</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>XC</given-names>
</name>
,
<name>
<surname>Le</surname>
<given-names>MH</given-names>
</name>
,
<name>
<surname>Xu</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Stacey</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
(
<year>2011</year>
)
<article-title>A protein domain co-occurrence network approach for predicting protein function and inferring species phylogeny</article-title>
.
<source>PloS one</source>
<volume>6</volume>
:
<fpage>e17906</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1371/journal.pone.0017906">10.1371/journal.pone.0017906</ext-link>
</comment>
<pub-id pub-id-type="pmid">21455299</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref019">
<label>19</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhu</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Deng</surname>
<given-names>X</given-names>
</name>
,
<name>
<surname>Joshi</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Xu</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Stacey</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
(
<year>2012</year>
)
<article-title>Reconstructing differentially co-expressed gene modules and regulatory networks of soybean cells</article-title>
.
<source>BMC genomics</source>
<volume>13</volume>
:
<fpage>437</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2164-13-437">10.1186/1471-2164-13-437</ext-link>
</comment>
<pub-id pub-id-type="pmid">22938179</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref020">
<label>20</label>
<mixed-citation publication-type="journal">
<name>
<surname>Zhu</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Dahmen</surname>
<given-names>JL</given-names>
</name>
,
<name>
<surname>Stacey</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Cheng</surname>
<given-names>J</given-names>
</name>
(
<year>2013</year>
)
<article-title>Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data</article-title>
.
<source>BMC bioinformatics</source>
<volume>14</volume>
:
<fpage>278</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-14-278">10.1186/1471-2105-14-278</ext-link>
</comment>
<pub-id pub-id-type="pmid">24053776</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref021">
<label>21</label>
<mixed-citation publication-type="journal">
<name>
<surname>Giardine</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Riemer</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Hardison</surname>
<given-names>RC</given-names>
</name>
,
<name>
<surname>Burhans</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Elnitski</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Shah</surname>
<given-names>P</given-names>
</name>
,
<etal>et al</etal>
(
<year>2005</year>
)
<article-title>Galaxy: a platform for interactive large-scale genome analysis</article-title>
.
<source>Genome research</source>
<volume>15</volume>
:
<fpage>1451</fpage>
<lpage>1455</lpage>
.
<pub-id pub-id-type="pmid">16169926</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref022">
<label>22</label>
<mixed-citation publication-type="other">Department of Energy Systems Biology Knowledgebase (KBase).
<ext-link ext-link-type="uri" xlink:href="http://kbase.us">http://kbase.us</ext-link>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref023">
<label>23</label>
<mixed-citation publication-type="journal">
<name>
<surname>Goff</surname>
<given-names>SA</given-names>
</name>
,
<name>
<surname>Vaughn</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>McKay</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>Lyons</surname>
<given-names>E</given-names>
</name>
,
<name>
<surname>Stapleton</surname>
<given-names>AE</given-names>
</name>
,
<name>
<surname>Gessler</surname>
<given-names>D</given-names>
</name>
,
<etal>et al</etal>
(
<year>2011</year>
)
<article-title>The iPlant Collaborative: Cyberinfrastructure for Plant Biology</article-title>
.
<source>Frontiers in plant science</source>
<volume>2</volume>
:
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.3389/fpls.2011.00034">10.3389/fpls.2011.00034</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref024">
<label>24</label>
<mixed-citation publication-type="journal">
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
(
<year>2009</year>
)
<article-title>TopHat: discovering splice junctions with RNA-Seq</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:
<fpage>1105</fpage>
<lpage>1111</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btp120">10.1093/bioinformatics/btp120</ext-link>
</comment>
<pub-id pub-id-type="pmid">19289445</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref025">
<label>25</label>
<mixed-citation publication-type="journal">
<name>
<surname>Wu</surname>
<given-names>TD</given-names>
</name>
,
<name>
<surname>Nacu</surname>
<given-names>S</given-names>
</name>
(
<year>2010</year>
)
<article-title>Fast and SNP-tolerant detection of complex variants and splicing in short reads</article-title>
.
<source>Bioinformatics</source>
<volume>26</volume>
:
<fpage>873</fpage>
<lpage>881</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btq057">10.1093/bioinformatics/btq057</ext-link>
</comment>
<pub-id pub-id-type="pmid">20147302</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref026">
<label>26</label>
<mixed-citation publication-type="journal">
<name>
<surname>Karolchik</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Baertsch</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Diekhans</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
,
<name>
<surname>Hinrichs</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Lu</surname>
<given-names>YT</given-names>
</name>
,
<etal>et al</etal>
(
<year>2003</year>
)
<article-title>The UCSC genome browser database</article-title>
.
<source>Nucleic acids research</source>
<volume>31</volume>
:
<fpage>51</fpage>
<lpage>54</lpage>
.
<pub-id pub-id-type="pmid">12519945</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref027">
<label>27</label>
<mixed-citation publication-type="journal">
<name>
<surname>Pruitt</surname>
<given-names>KD</given-names>
</name>
,
<name>
<surname>Tatusova</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Maglott</surname>
<given-names>DR</given-names>
</name>
(
<year>2007</year>
)
<article-title>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</article-title>
.
<source>Nucleic acids research</source>
<volume>35</volume>
:
<fpage>D61</fpage>
<lpage>D65</lpage>
.
<pub-id pub-id-type="pmid">17130148</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref028">
<label>28</label>
<mixed-citation publication-type="journal">
<name>
<surname>Li</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Handsaker</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Wysoker</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Fennell</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Ruan</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Homer</surname>
<given-names>N</given-names>
</name>
,
<etal>et al</etal>
(
<year>2009</year>
)
<article-title>The sequence alignment/map format and SAMtools</article-title>
.
<source>Bioinformatics</source>
<volume>25</volume>
:
<fpage>2078</fpage>
<lpage>2079</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/bioinformatics/btp352">10.1093/bioinformatics/btp352</ext-link>
</comment>
<pub-id pub-id-type="pmid">19505943</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref029">
<label>29</label>
<mixed-citation publication-type="journal">
<name>
<surname>Radivojac</surname>
<given-names>P</given-names>
</name>
,
<name>
<surname>Clark</surname>
<given-names>WT</given-names>
</name>
,
<name>
<surname>Oron</surname>
<given-names>TR</given-names>
</name>
,
<name>
<surname>Schnoes</surname>
<given-names>AM</given-names>
</name>
,
<name>
<surname>Wittkop</surname>
<given-names>T</given-names>
</name>
,
<name>
<surname>Sokolov</surname>
<given-names>A</given-names>
</name>
,
<etal>et al</etal>
(
<year>2013</year>
)
<article-title>A large-scale evaluation of computational protein function prediction</article-title>
.
<source>Nature methods</source>
<volume>10</volume>
:
<fpage>221</fpage>
<lpage>227</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/nmeth.2340">10.1038/nmeth.2340</ext-link>
</comment>
<pub-id pub-id-type="pmid">23353650</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref030">
<label>30</label>
<mixed-citation publication-type="journal">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
,
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
,
<name>
<surname>Schäffer</surname>
<given-names>AA</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
,
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
,
<etal>et al</etal>
(
<year>1997</year>
)
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
.
<source>Nucleic acids research</source>
<volume>25</volume>
:
<fpage>3389</fpage>
<lpage>3402</lpage>
.
<pub-id pub-id-type="pmid">9254694</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref031">
<label>31</label>
<mixed-citation publication-type="journal">
<name>
<surname>Söding</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Biegert</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Lupas</surname>
<given-names>A</given-names>
</name>
(
<year>2005</year>
)
<article-title>The HHpred interactive server for protein homology detection and structure prediction</article-title>
.
<source>Nucleic Acids Research</source>
<volume>33</volume>
:
<fpage>W244</fpage>
<lpage>W248</lpage>
.
<pub-id pub-id-type="pmid">15980461</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref032">
<label>32</label>
<mixed-citation publication-type="journal">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
,
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
,
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
,
<name>
<surname>Butler</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Cherry</surname>
<given-names>JM</given-names>
</name>
,
<etal>et al</etal>
(
<year>2000</year>
)
<article-title>Gene ontology: tool for the unification of biology</article-title>
.
<source>Nature Genetics</source>
<volume>25</volume>
:
<fpage>25</fpage>
<lpage>29</lpage>
.
<pub-id pub-id-type="pmid">10802651</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref033">
<label>33</label>
<mixed-citation publication-type="journal">
<name>
<surname>Boeckmann</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Bairoch</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Blatter</surname>
<given-names>MC</given-names>
</name>
,
<name>
<surname>Estreicher</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Gasteiger</surname>
<given-names>E</given-names>
</name>
,
<etal>et al</etal>
(
<year>2003</year>
)
<article-title>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</article-title>
.
<source>Nucleic Acids Research</source>
<volume>31</volume>
:
<fpage>365</fpage>
<lpage>370</lpage>
.
<pub-id pub-id-type="pmid">12520024</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref034">
<label>34</label>
<mixed-citation publication-type="journal">
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Coin</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Durbin</surname>
<given-names>R</given-names>
</name>
,
<name>
<surname>Finn</surname>
<given-names>RD</given-names>
</name>
,
<name>
<surname>Hollich</surname>
<given-names>V</given-names>
</name>
,
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
,
<etal>et al</etal>
(
<year>2004</year>
)
<article-title>The Pfam protein families database</article-title>
.
<source>Nucleic Acids Research</source>
<volume>32</volume>
:
<fpage>D138</fpage>
<lpage>D141</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref035">
<label>35</label>
<mixed-citation publication-type="book">
<name>
<surname>Agresti</surname>
<given-names>A</given-names>
</name>
(
<year>1990</year>
)
<chapter-title>Categorical data analysis</chapter-title>
<publisher-loc>New York</publisher-loc>
:
<publisher-name>Wiley</publisher-name>
Pages
<fpage>59</fpage>
<lpage>66</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref036">
<label>36</label>
<mixed-citation publication-type="book">
<name>
<surname>Agresti</surname>
<given-names>A</given-names>
</name>
(
<year>2002</year>
)
<chapter-title>Categorical data analysis</chapter-title>
<edition>Second edition</edition>
<publisher-loc>New York</publisher-loc>
:
<publisher-name>Wiley</publisher-name>
Pages
<fpage>91</fpage>
<lpage>101</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref037">
<label>37</label>
<mixed-citation publication-type="journal">
<name>
<surname>Fisher</surname>
<given-names>RA</given-names>
</name>
(
<year>1935</year>
)
<article-title>The logic of inductive inference</article-title>
.
<source>Journal of the Royal Statistical Society Series A</source>
<volume>98</volume>
:
<fpage>39</fpage>
<lpage>54</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref038">
<label>38</label>
<mixed-citation publication-type="journal">
<name>
<surname>Fisher</surname>
<given-names>RA</given-names>
</name>
(
<year>1962</year>
)
<article-title>Confidence limits for a cross-product ratio</article-title>
.
<source>Australian Journal of Statistics</source>
<volume>4</volume>
:
<fpage>41</fpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref039">
<label>39</label>
<mixed-citation publication-type="book">
<name>
<surname>Fisher</surname>
<given-names>RA</given-names>
</name>
(
<year>1970</year>
)
<source>Statistical Methods for Research Workers</source>
.
<publisher-name>Oliver &amp; Boyd</publisher-name>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref040">
<label>40</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mehta</surname>
<given-names>CR</given-names>
</name>
,
<name>
<surname>Patel</surname>
<given-names>NR</given-names>
</name>
(
<year>1986</year>
)
<article-title>Algorithm 643: FEXACT: a Fortran subroutine for Fisher's exact test on unordered r*c contingency tables</article-title>
.
<source>ACM Transactions on Mathematical Software (TOMS)</source>
<volume>12</volume>
:
<fpage>154</fpage>
<lpage>161</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref041">
<label>41</label>
<mixed-citation publication-type="journal">
<name>
<surname>Clarkson</surname>
<given-names>DB</given-names>
</name>
,
<name>
<surname>Fan</surname>
<given-names>Y</given-names>
</name>
,
<name>
<surname>Joe</surname>
<given-names>H</given-names>
</name>
(
<year>1993</year>
)
<article-title>A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables</article-title>
.
<source>ACM Transactions on Mathematical Software (TOMS)</source>
<volume>19</volume>
:
<fpage>484</fpage>
<lpage>488</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref042">
<label>42</label>
<mixed-citation publication-type="journal">
<name>
<surname>Patefield</surname>
<given-names>WM</given-names>
</name>
(
<year>1981</year>
)
<article-title>Algorithm AS159. An efficient method of generating r x c tables with given row and column totals</article-title>
.
<source>Applied Statistics</source>
<volume>30</volume>
:
<fpage>91</fpage>
<lpage>97</lpage>
.</mixed-citation>
</ref>
<ref id="pone.0125000.ref043">
<label>43</label>
<mixed-citation publication-type="other">Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Systems: 1695.</mixed-citation>
</ref>
<ref id="pone.0125000.ref044">
<label>44</label>
<mixed-citation publication-type="journal">
<name>
<surname>Langmead</surname>
<given-names>B</given-names>
</name>
,
<name>
<surname>Trapnell</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Pop</surname>
<given-names>M</given-names>
</name>
,
<name>
<surname>Salzberg</surname>
<given-names>SL</given-names>
</name>
(
<year>2009</year>
)
<article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title>
.
<source>Genome Biol</source>
<volume>10</volume>
:
<fpage>R25</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/gb-2009-10-3-r25">10.1186/gb-2009-10-3-r25</ext-link>
</comment>
<pub-id pub-id-type="pmid">19261174</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref045">
<label>45</label>
<mixed-citation publication-type="journal">
<name>
<surname>Lunter</surname>
<given-names>G</given-names>
</name>
,
<name>
<surname>Goodson</surname>
<given-names>M</given-names>
</name>
(
<year>2011</year>
)
<article-title>Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>936</fpage>
<lpage>939</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.111120.110">10.1101/gr.111120.110</ext-link>
</comment>
<pub-id pub-id-type="pmid">20980556</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref046">
<label>46</label>
<mixed-citation publication-type="journal">
<name>
<surname>Hardcastle</surname>
<given-names>TJ</given-names>
</name>
,
<name>
<surname>Kelly</surname>
<given-names>KA</given-names>
</name>
(
<year>2010</year>
)
<article-title>baySeq: empirical Bayesian methods for identifying differential expression in sequence count data</article-title>
.
<source>BMC bioinformatics</source>
<volume>11</volume>
:
<fpage>422</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2105-11-422">10.1186/1471-2105-11-422</ext-link>
</comment>
<pub-id pub-id-type="pmid">20698981</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref047">
<label>47</label>
<mixed-citation publication-type="journal">
<name>
<surname>Van De Wiel</surname>
<given-names>MA</given-names>
</name>
,
<name>
<surname>Leday</surname>
<given-names>GG</given-names>
</name>
,
<name>
<surname>Pardo</surname>
<given-names>L</given-names>
</name>
,
<name>
<surname>Rue</surname>
<given-names>H</given-names>
</name>
,
<name>
<surname>Van Der Vaart</surname>
<given-names>AW</given-names>
</name>
,
<name>
<surname>Van Wieringen</surname>
<given-names>WN</given-names>
</name>
(
<year>2013</year>
)
<article-title>Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors</article-title>
.
<source>Biostatistics</source>
<volume>14</volume>
:
<fpage>113</fpage>
<lpage>128</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1093/biostatistics/kxs031">10.1093/biostatistics/kxs031</ext-link>
</comment>
<pub-id pub-id-type="pmid">22988280</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref048">
<label>48</label>
<mixed-citation publication-type="journal">
<name>
<surname>Tarazona</surname>
<given-names>S</given-names>
</name>
,
<name>
<surname>García-Alcalde</surname>
<given-names>F</given-names>
</name>
,
<name>
<surname>Dopazo</surname>
<given-names>J</given-names>
</name>
,
<name>
<surname>Ferrer</surname>
<given-names>A</given-names>
</name>
,
<name>
<surname>Conesa</surname>
<given-names>A</given-names>
</name>
(
<year>2011</year>
)
<article-title>Differential expression in RNA-seq: a matter of depth</article-title>
.
<source>Genome research</source>
<volume>21</volume>
:
<fpage>2213</fpage>
<lpage>2223</lpage>
.
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1101/gr.124321.111">10.1101/gr.124321.111</ext-link>
</comment>
<pub-id pub-id-type="pmid">21903743</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref049">
<label>49</label>
<mixed-citation publication-type="journal">
<name>
<surname>Foster</surname>
<given-names>BA</given-names>
</name>
,
<name>
<surname>Gingrich</surname>
<given-names>JR</given-names>
</name>
,
<name>
<surname>Kwon</surname>
<given-names>ED</given-names>
</name>
,
<name>
<surname>Madias</surname>
<given-names>C</given-names>
</name>
,
<name>
<surname>Greenberg</surname>
<given-names>NM</given-names>
</name>
(
<year>1997</year>
)
<article-title>Characterization of prostatic epithelial cell lines derived from transgenic adenocarcinoma of the mouse prostate (TRAMP) model</article-title>
.
<source>Cancer research</source>
<volume>57</volume>
:
<fpage>3325</fpage>
<lpage>3330</lpage>
.
<pub-id pub-id-type="pmid">9269988</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref050">
<label>50</label>
<mixed-citation publication-type="journal">
<name>
<surname>Mossine</surname>
<given-names>VV</given-names>
</name>
,
<name>
<surname>Mawhinney</surname>
<given-names>TP</given-names>
</name>
(
<year>2007</year>
)
<article-title>
<italic>N</italic>
<sup>α</sup>
-(1-Deoxy-D-fructos-1-yl)-L-histidine (“D-fructose-L-histidine”): a potent copper chelator from tomato powder</article-title>
.
<source>Journal of agricultural and food chemistry</source>
<volume>55</volume>
:
<fpage>10373</fpage>
<lpage>10381</lpage>
.
<pub-id pub-id-type="pmid">18004802</pub-id>
</mixed-citation>
</ref>
<ref id="pone.0125000.ref051">
<label>51</label>
<mixed-citation publication-type="journal">
<name>
<surname>Niederhuth</surname>
<given-names>CE</given-names>
</name>
,
<name>
<surname>Patharkar</surname>
<given-names>OR</given-names>
</name>
,
<name>
<surname>Walker</surname>
<given-names>JC</given-names>
</name>
(
<year>2013</year>
)
<article-title>Transcriptional profiling of the Arabidopsis abscission mutant
<italic>hae hsl</italic>
2 by RNA-Seq</article-title>
.
<source>BMC genomics</source>
<volume>14</volume>
:
<fpage>37</fpage>
<comment>doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1186/1471-2164-14-37">10.1186/1471-2164-14-37</ext-link>
</comment>
<pub-id pub-id-type="pmid">23327667</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000073 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000073 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4406561
   |texte=   From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:25902288" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024