MERS exploration server

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow</title>
<author>
<name sortKey="Saggese, Igor" sort="Saggese, Igor" uniqKey="Saggese I" first="Igor" last="Saggese">Igor Saggese</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bona, Elisa" sort="Bona, Elisa" uniqKey="Bona E" first="Elisa" last="Bona">Elisa Bona</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Conway, Max" sort="Conway, Max" uniqKey="Conway M" first="Max" last="Conway">Max Conway</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121885934</institution-id>
<institution-id institution-id-type="GRID">grid.5335.0</institution-id>
<institution>Computer Laboratory,</institution>
<institution>University of Cambridge,</institution>
</institution-wrap>
Cambridge, CB2 1TN UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Favero, Francesco" sort="Favero, Francesco" uniqKey="Favero F" first="Francesco" last="Favero">Francesco Favero</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze della Salute,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
28100 Novara, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ladetto, Marco" sort="Ladetto, Marco" uniqKey="Ladetto M" first="Marco" last="Ladetto">Marco Ladetto</name>
<affiliation>
<nlm:aff id="Aff4">AO SS Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff6">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2336 6580</institution-id>
<institution-id institution-id-type="GRID">grid.7605.4</institution-id>
<institution>Dipartimento di Biotecnologie e Scienze per la Salute,</institution>
<institution>Università di Torino,</institution>
</institution-wrap>
10124 Torino, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Pietro" sort="Li, Pietro" uniqKey="Li P" first="Pietro" last="Li">Pietro Liò</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121885934</institution-id>
<institution-id institution-id-type="GRID">grid.5335.0</institution-id>
<institution>Computer Laboratory,</institution>
<institution>University of Cambridge,</institution>
</institution-wrap>
Cambridge, CB2 1TN UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Manzini, Giovanni" sort="Manzini, Giovanni" uniqKey="Manzini G" first="Giovanni" last="Manzini">Giovanni Manzini</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1940 4177</institution-id>
<institution-id institution-id-type="GRID">grid.5326.2</institution-id>
<institution>Istituto di Informatica e Telematica, CNR,</institution>
</institution-wrap>
56124 Pisa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mignone, Flavio" sort="Mignone, Flavio" uniqKey="Mignone F" first="Flavio" last="Mignone">Flavio Mignone</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30066630</idno>
<idno type="pmc">6069750</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069750</idno>
<idno type="RBID">PMC:6069750</idno>
<idno type="doi">10.1186/s12859-018-2174-6</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000275</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000275</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow</title>
<author>
<name sortKey="Saggese, Igor" sort="Saggese, Igor" uniqKey="Saggese I" first="Igor" last="Saggese">Igor Saggese</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Bona, Elisa" sort="Bona, Elisa" uniqKey="Bona E" first="Elisa" last="Bona">Elisa Bona</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Conway, Max" sort="Conway, Max" uniqKey="Conway M" first="Max" last="Conway">Max Conway</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121885934</institution-id>
<institution-id institution-id-type="GRID">grid.5335.0</institution-id>
<institution>Computer Laboratory,</institution>
<institution>University of Cambridge,</institution>
</institution-wrap>
Cambridge, CB2 1TN UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Favero, Francesco" sort="Favero, Francesco" uniqKey="Favero F" first="Francesco" last="Favero">Francesco Favero</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze della Salute,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
28100 Novara, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ladetto, Marco" sort="Ladetto, Marco" uniqKey="Ladetto M" first="Marco" last="Ladetto">Marco Ladetto</name>
<affiliation>
<nlm:aff id="Aff4">AO SS Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff6">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2336 6580</institution-id>
<institution-id institution-id-type="GRID">grid.7605.4</institution-id>
<institution>Dipartimento di Biotecnologie e Scienze per la Salute,</institution>
<institution>Università di Torino,</institution>
</institution-wrap>
10124 Torino, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Pietro" sort="Li, Pietro" uniqKey="Li P" first="Pietro" last="Li">Pietro Liò</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121885934</institution-id>
<institution-id institution-id-type="GRID">grid.5335.0</institution-id>
<institution>Computer Laboratory,</institution>
<institution>University of Cambridge,</institution>
</institution-wrap>
Cambridge, CB2 1TN UK</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Manzini, Giovanni" sort="Manzini, Giovanni" uniqKey="Manzini G" first="Giovanni" last="Manzini">Giovanni Manzini</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1940 4177</institution-id>
<institution-id institution-id-type="GRID">grid.5326.2</institution-id>
<institution>Istituto di Informatica e Telematica, CNR,</institution>
</institution-wrap>
56124 Pisa, Italy</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mignone, Flavio" sort="Mignone, Flavio" uniqKey="Mignone F" first="Flavio" last="Mignone">Flavio Mignone</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p id="Par1">De novo assembly of RNA-seq data allows the study of the transcriptome in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomic studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis workflow is the assembly of reads to reconstruct transcripts, thus reducing the complexity of the analysis. Although many available tools show good sensitivity, they produce a high percentage of false positives because of the high number of assemblies considered, and the frequency of false positives is likely underestimated by currently used benchmarks. The reconstruction of non-existent transcripts may distort the biological interpretation of results; for example, it may lead to overestimating the number of “novel” transcripts identified. Moreover, benchmarks are usually based on RNA-seq data from annotated genomes: assembled transcripts are compared to annotations and genomes to identify putatively correct and incorrect reconstructions, but these tests alone may lead to accepting a particular type of false positive as true, as described in more detail below.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">Here we present a novel methodology for de novo assembly, implemented in software named STAble (Short-reads Transcriptome Assembler). The novel concept of this assembler is that whole reads, rather than smaller k-mers, are used to determine possible alignments, with the aim of reducing the number of chimeras produced. Furthermore, we applied a new set of benchmarks based on simulated data to better characterize the performance of assembly methods and to carefully identify true reconstructions.</p>
<p id="Par3">STAble was also used to build a prototype workflow to analyse metatranscriptomic data in connection with a steady-state metabolic modelling algorithm. This algorithm was used to produce high-quality metabolic interpretations of small gene expression sets obtained from already published RNA-seq data that we assembled with STAble.</p>
</sec>
<sec>
<title>Conclusions</title>
<p id="Par4">The presented results, albeit preliminary, clearly suggest that with this approach it is possible to identify informative reactions not directly revealed by raw transcriptomic data.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author>
<name sortKey="Chiodini, R" uniqKey="Chiodini R">R Chiodini</name>
</author>
<author>
<name sortKey="Badr, A" uniqKey="Badr A">A Badr</name>
</author>
<author>
<name sortKey="Zhang, G" uniqKey="Zhang G">G Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, Z" uniqKey="Wang Z">Z Wang</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chang, Z" uniqKey="Chang Z">Z Chang</name>
</author>
<author>
<name sortKey="Li, G" uniqKey="Li G">G Li</name>
</author>
<author>
<name sortKey="Liu, J" uniqKey="Liu J">J Liu</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Ashby, C" uniqKey="Ashby C">C Ashby</name>
</author>
<author>
<name sortKey="Liu, D" uniqKey="Liu D">D Liu</name>
</author>
<author>
<name sortKey="Cramer, Cl" uniqKey="Cramer C">CL Cramer</name>
</author>
<author>
<name sortKey="Huang, X" uniqKey="Huang X">X Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schulz, Mh" uniqKey="Schulz M">MH Schulz</name>
</author>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Vingron, M" uniqKey="Vingron M">M Vingron</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grabherr, Mg" uniqKey="Grabherr M">MG Grabherr</name>
</author>
<author>
<name sortKey="Haas, Bj" uniqKey="Haas B">BJ Haas</name>
</author>
<author>
<name sortKey="Yassour, M" uniqKey="Yassour M">M Yassour</name>
</author>
<author>
<name sortKey="Levin, Jz" uniqKey="Levin J">JZ Levin</name>
</author>
<author>
<name sortKey="Thompson, Da" uniqKey="Thompson D">DA Thompson</name>
</author>
<author>
<name sortKey="Amit, I" uniqKey="Amit I">I Amit</name>
</author>
<author>
<name sortKey="Adiconis, X" uniqKey="Adiconis X">X Adiconis</name>
</author>
<author>
<name sortKey="Fan, L" uniqKey="Fan L">L Fan</name>
</author>
<author>
<name sortKey="Raychowdhury, R" uniqKey="Raychowdhury R">R Raychowdhury</name>
</author>
<author>
<name sortKey="Zeng, Q" uniqKey="Zeng Q">Q Zeng</name>
</author>
<author>
<name sortKey="Chen, Z" uniqKey="Chen Z">Z Chen</name>
</author>
<author>
<name sortKey="Mauceli, E" uniqKey="Mauceli E">E Mauceli</name>
</author>
<author>
<name sortKey="Hacohen, N" uniqKey="Hacohen N">N Hacohen</name>
</author>
<author>
<name sortKey="Gnirke, A" uniqKey="Gnirke A">A Gnirke</name>
</author>
<author>
<name sortKey="Rhind, N" uniqKey="Rhind N">N Rhind</name>
</author>
<author>
<name sortKey="Di Palma, F" uniqKey="Di Palma F">F Di Palma</name>
</author>
<author>
<name sortKey="Birren, Bw" uniqKey="Birren B">BW Birren</name>
</author>
<author>
<name sortKey="Nusbaum, C" uniqKey="Nusbaum C">C Nusbaum</name>
</author>
<author>
<name sortKey="Lindblad Toh, K" uniqKey="Lindblad Toh K">K Lindblad-Toh</name>
</author>
<author>
<name sortKey="Friedman, N" uniqKey="Friedman N">N Friedman</name>
</author>
<author>
<name sortKey="Regev, A" uniqKey="Regev A">A Regev</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Edgar, Rc" uniqKey="Edgar R">RC Edgar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, W" uniqKey="Huang W">W Huang</name>
</author>
<author>
<name sortKey="Li, L" uniqKey="Li L">L Li</name>
</author>
<author>
<name sortKey="Myers, Jr" uniqKey="Myers J">JR Myers</name>
</author>
<author>
<name sortKey="Marth, Gt" uniqKey="Marth G">GT Marth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Td" uniqKey="Wu T">TD Wu</name>
</author>
<author>
<name sortKey="Watanabe, Ck" uniqKey="Watanabe C">CK Watanabe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamke, J" uniqKey="Kamke J">J Kamke</name>
</author>
<author>
<name sortKey="Kittelmann, S" uniqKey="Kittelmann S">S Kittelmann</name>
</author>
<author>
<name sortKey="Soni, P" uniqKey="Soni P">P Soni</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Tavendale, M" uniqKey="Tavendale M">M Tavendale</name>
</author>
<author>
<name sortKey="Ganesh, S" uniqKey="Ganesh S">S Ganesh</name>
</author>
<author>
<name sortKey="Janssen, Ph" uniqKey="Janssen P">PH Janssen</name>
</author>
<author>
<name sortKey="Shi, W" uniqKey="Shi W">W Shi</name>
</author>
<author>
<name sortKey="Froula, J" uniqKey="Froula J">J Froula</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author>
<name sortKey="Attwood, Gt" uniqKey="Attwood G">GT Attwood</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kanehisa, M" uniqKey="Kanehisa M">M Kanehisa</name>
</author>
<author>
<name sortKey="Furumichi, M" uniqKey="Furumichi M">M Furumichi</name>
</author>
<author>
<name sortKey="Tanabe, M" uniqKey="Tanabe M">M Tanabe</name>
</author>
<author>
<name sortKey="Sato, Y" uniqKey="Sato Y">Y Sato</name>
</author>
<author>
<name sortKey="Morishima, K" uniqKey="Morishima K">K Morishima</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Conway, M" uniqKey="Conway M">M Conway</name>
</author>
<author>
<name sortKey="Angione, C" uniqKey="Angione C">C Angione</name>
</author>
<author>
<name sortKey="Li, P" uniqKey="Li P">P Liò</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hinsu, At" uniqKey="Hinsu A">AT Hinsu</name>
</author>
<author>
<name sortKey="Parmar, Nr" uniqKey="Parmar N">NR Parmar</name>
</author>
<author>
<name sortKey="Nathani, Nm" uniqKey="Nathani N">NM Nathani</name>
</author>
<author>
<name sortKey="Pandit, Rj" uniqKey="Pandit R">RJ Pandit</name>
</author>
<author>
<name sortKey="Patel, Ab" uniqKey="Patel A">AB Patel</name>
</author>
<author>
<name sortKey="Patel, Ak" uniqKey="Patel A">AK Patel</name>
</author>
<author>
<name sortKey="Joshi, Cg" uniqKey="Joshi C">CG Joshi</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">30066630</article-id>
<article-id pub-id-type="pmc">6069750</article-id>
<article-id pub-id-type="publisher-id">2174</article-id>
<article-id pub-id-type="doi">10.1186/s12859-018-2174-6</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Saggese</surname>
<given-names>Igor</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bona</surname>
<given-names>Elisa</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Conway</surname>
<given-names>Max</given-names>
</name>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Favero</surname>
<given-names>Francesco</given-names>
</name>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ladetto</surname>
<given-names>Marco</given-names>
</name>
<xref ref-type="aff" rid="Aff4">4</xref>
<xref ref-type="aff" rid="Aff6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Liò</surname>
<given-names>Pietro</given-names>
</name>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Manzini</surname>
<given-names>Giovanni</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
<xref ref-type="aff" rid="Aff5">5</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Mignone</surname>
<given-names>Flavio</given-names>
</name>
<address>
<email>flavio.mignone@uniupo.it</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze e Innovazione Tecnologica,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
15121 Alessandria, Italy</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121885934</institution-id>
<institution-id institution-id-type="GRID">grid.5335.0</institution-id>
<institution>Computer Laboratory,</institution>
<institution>University of Cambridge,</institution>
</institution-wrap>
Cambridge, CB2 1TN UK</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000000121663741</institution-id>
<institution-id institution-id-type="GRID">grid.16563.37</institution-id>
<institution>Dipartimento di Scienze della Salute,</institution>
<institution>Università degli Studi del Piemonte Orientale,</institution>
</institution-wrap>
28100 Novara, Italy</aff>
<aff id="Aff4">
<label>4</label>
AO SS Antonio e Biagio e Cesare Arrigo, 15121 Alessandria, Italy</aff>
<aff id="Aff5">
<label>5</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 1940 4177</institution-id>
<institution-id institution-id-type="GRID">grid.5326.2</institution-id>
<institution>Istituto di Informatica e Telematica, CNR,</institution>
</institution-wrap>
56124 Pisa, Italy</aff>
<aff id="Aff6">
<label>6</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2336 6580</institution-id>
<institution-id institution-id-type="GRID">grid.7605.4</institution-id>
<institution>Dipartimento di Biotecnologie e Scienze per la Salute,</institution>
<institution>Università di Torino,</institution>
</institution-wrap>
10124 Torino, Italy</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>9</day>
<month>7</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>9</day>
<month>7</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>19</volume>
<issue>Suppl 7</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests and that none of the Supplement Editors were involved in the peer review process for any articles for which they are an author.</issue-sponsor>
<elocation-id>184</elocation-id>
<permissions>
<copyright-statement>© The Author(s). 2018</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p id="Par1">De novo assembly of RNA-seq data allows the study of the transcriptome in the absence of a reference genome, whether the data are obtained from a single organism or from a mixed sample, as in metatranscriptomic studies. Given the high number of sequences obtained from NGS approaches, a critical step in any analysis workflow is the assembly of reads to reconstruct transcripts, thus reducing the complexity of the analysis. Although many available tools show good sensitivity, they produce a high percentage of false positives because of the high number of assemblies considered, and the frequency of false positives is likely underestimated by currently used benchmarks. The reconstruction of non-existent transcripts may distort the biological interpretation of results; for example, it may lead to overestimating the number of “novel” transcripts identified. Moreover, benchmarks are usually based on RNA-seq data from annotated genomes: assembled transcripts are compared to annotations and genomes to identify putatively correct and incorrect reconstructions, but these tests alone may lead to accepting a particular type of false positive as true, as described in more detail below.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">Here we present a novel methodology for de novo assembly, implemented in software named STAble (Short-reads Transcriptome Assembler). The novel concept of this assembler is that whole reads, rather than smaller k-mers, are used to determine possible alignments, with the aim of reducing the number of chimeras produced. Furthermore, we applied a new set of benchmarks based on simulated data to better characterize the performance of assembly methods and to carefully identify true reconstructions.</p>
<p id="Par3">STAble was also used to build a prototype workflow to analyse metatranscriptomic data in connection with a steady-state metabolic modelling algorithm. This algorithm was used to produce high-quality metabolic interpretations of small gene expression sets obtained from already published RNA-seq data that we assembled with STAble.</p>
</sec>
<sec>
<title>Conclusions</title>
<p id="Par4">The presented results, albeit preliminary, clearly suggest that with this approach it is possible to identify informative reactions not directly revealed by raw transcriptomic data.</p>
</sec>
</abstract>
<conference>
<conf-name>12th and 13th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2015/16)</conf-name>
<conf-loc>Naples, Italy and Stirling, UK</conf-loc>
<conf-date>10-12 September 2015, 1-3 September 2016</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2018</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p id="Par14">Among the many applications of Next Generation Sequencing (NGS) [
<xref ref-type="bibr" rid="CR1">1</xref>
], two techniques can be applied to the “omic” study of transcripts: RNA-seq [
<xref ref-type="bibr" rid="CR2">2</xref>
], which profiles the transcriptome of a single organism, and metatranscriptomics, which profiles the transcriptomes of a complex microbial community.</p>
<p id="Par15">The former is more established: it assesses the presence of RNA transcripts in a biological sample at a given moment and quantifies them. The latter is a more recent and less explored approach related to metagenomic studies: while metagenomics aims to identify the species present, metatranscriptomics tries to characterize the functionally active bacteria and their metabolic interactions through the identification of the expressed transcripts.</p>
<p id="Par16">Given the growing promises and challenges of clinical metagenomics, metatranscriptomic analysis might represent a critical step towards further elucidating the role of complex microbial communities in the physiology and pathology of host organisms, with a growing impact on clinical applications. Indeed, most of the evidence accumulated so far is linked to the role of specific species, genera or families rather than to their metabolic output. While this focus might be optimal for studying immune recognition, immune education and the triggering of autoimmune processes, it may be insufficient to fully elucidate the impact of microbial communities on processes such as metabolic diseases, inflammatory response and nutrient availability, which are potentially more closely related to the global metabolic output than to the phylogeny of the species composing a specific microbiota.</p>
<p id="Par17">From the perspective of data analysis, current NGS platforms do not output whole transcripts but short reads, each representing a fragment of the original sequence. Assembling reads to reconstruct full transcripts is a crucial point in data analysis, and all subsequent steps in the analysis of transcriptomic data rely heavily on the quality of the reconstructions. Even when a reference genome is available for the organism under study, a preliminary assembly of reads can prove useful to reduce the complexity of the analysis by both increasing the length and lowering the number of input sequences. The current state-of-the-art tools for reconstructing transcripts from RNA-seq data are Bridger [
<xref ref-type="bibr" rid="CR3">3</xref>
], Oases [
<xref ref-type="bibr" rid="CR4">4</xref>
] and Trinity [
<xref ref-type="bibr" rid="CR5">5</xref>
]. They share a similar approach, as they rely on the identification of k-mer sequences. Bridger then uses this information to build and traverse splicing graphs, while Oases and Trinity rely on de Bruijn graphs.</p>
<p id="Par18">Despite exhibiting good sensitivity, all of them show two main limitations: i) a high number of false positive reconstructions and ii) very high demands on computational power.</p>
<p id="Par19">When working with real data in the absence of any reference, it is not trivial, and maybe not even possible, to determine the correctness of a reconstruction, so it is advisable to use approaches that minimize the production of false reconstructions. The high sensitivity claimed in benchmarks is often obtained by increasing the number of reconstructions, at the cost of increasing the number of false positives too, but this aspect is usually neglected. Furthermore, current approaches are very demanding in terms of hardware specifications and require dedicated infrastructures, which are not always available.</p>
<p id="Par20">Here we present STAble, a prototype of a new de novo assembler built around an approach quite different from the state of the art: whole reads, rather than smaller k-mers, are used to determine possible alignments, with the aim of drastically reducing the number of chimeras produced. STAble consists of three different modules (see Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
). The first step is the efficient detection of potential head-tail alignments between reads, possibly with mismatches. This information is then used by the second module to build an unweighted directed graph, which is traversed by a custom algorithm that takes into account biological properties of input data. Finally, the third module performs some post-processing on results assuming no reference information is available.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>STAble’s analysis workflow. The first module detects potential head-tail alignments between reads, the second one uses this information to build and traverse a directed unweighted graph to reconstruct transcripts that are then post-processed before returning final output</p>
</caption>
<graphic xlink:href="12859_2018_2174_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p id="Par21">In benchmarks, STAble has shown a sensitivity comparable to current tools, while producing a smaller number of false positive reconstructions. STAble is designed to be parallelizable and grid-friendly: input datasets can be split into blocks that are processed sequentially or in parallel, which makes it possible to perform analyses even in the absence of dedicated computing infrastructures. Moreover, STAble was tested with both simulated and real metatranscriptomics data. With simulated data we were able to evaluate the ability of our system to correctly reconstruct transcripts, while with real data we tested a prototype implementation of a new approach based on the integration of transcriptomics data with metabolic networks.</p>
</sec>
<sec id="Sec2">
<title>Methods</title>
<p id="Par22">STAble implements an original approach based on the idea of letting whole reads, rather than smaller k-mers, guide the assembly process, with the aim of reducing false positive reconstructions. The analysis workflow is shown in Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
and consists of three main modules:
<list list-type="order">
<list-item>
<p id="Par23">Efficient detection of head-tail alignments.</p>
</list-item>
<list-item>
<p id="Par24">Construction and traversal of an unweighted directed graph.</p>
</list-item>
<list-item>
<p id="Par25">Post-processing of results.</p>
</list-item>
</list>
</p>
<p id="Par26">The first module identifies overlapping reads: it starts from a FASTQ file containing the input sequences and finds all “valid” head-tail overlaps between pairs of reads. More precisely, the module is based on a custom procedure to identify head-tail overlaps that works as follows. Computation starts by recoding the input FASTQ from 8-bit ASCII characters to a 2-bit alphabet: this reduces memory consumption and speeds up subsequent operations. No special symbol is assigned to ambiguous bases such as N; instead, the symbol reserved for C is used. This choice was made to keep the size of the new alphabet as small as possible. Result quality is not affected, since reads with too many ambiguous bases are usually discarded by pre-processing steps because of low quality, so false matches with C are expected to be rare.</p>
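<p>As a rough illustration of the recoding step, the sketch below packs a read into an integer using two bits per base. The specific codes (A=0, C=1, G=2, T=3) and the function names are assumptions made for this example and are not taken from STAble; only the choice of mapping ambiguous bases such as N to the code of C follows the description above.</p>

```python
# Hypothetical 2-bit codes; the text does not state the actual assignment.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_read(seq):
    """Pack a read into an integer, two bits per base (first base most significant)."""
    packed = 0
    for base in seq.upper():
        # Ambiguous bases (e.g. N) reuse the code of C, as described in the text.
        packed = packed * 4 + CODES.get(base, CODES["C"])
    return packed


def decode_read(packed, length):
    """Unpack an integer produced by encode_read back into a string of bases."""
    letters = "ACGT"
    bases = []
    for _ in range(length):
        packed, code = divmod(packed, 4)
        bases.append(letters[code])
    return "".join(reversed(bases))
```

<p>Because N shares the code of C, decoding a read that contained N returns C at that position; the original ambiguity is not recoverable from the packed form.</p>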
<p id="Par27">After initialisation is done, the algorithm proceeds to analyse the input sequences one at a time, and each 7 nt long anchor is indexed. The first and last
<italic>anchor scope</italic>
(default: 5) anchors are searched in the anchor index to detect potentially aligning reads. Read pairs are then shifted to align the anchor, and the Hamming distance of the overlapping area is efficiently computed as the number of mismatches by using XOR operations. The module returns a list of triples [
<italic>i; j; k</italic>
] where
<italic>i</italic>
and
<italic>j</italic>
represent two reads and
<italic>k</italic>
is the length of the overlap found between the tail of read
<italic>i</italic>
and the head of read
<italic>j</italic>
. A head-tail overlap is considered “valid” only if it satisfies the following two conditions:
<list list-type="order">
<list-item>
<p id="Par28">The Hamming distance between the length-k tail of sequence i and the length-k head of sequence j must not be greater than max_errors, where max_errors is the maximum number of mismatches allowed (default: 10% of the overlap length).</p>
</list-item>
<list-item>
<p id="Par29">The overlap length k is a value between min_len and max_len, where min_len is the minimum length allowed for overlaps (default: 20% of the longer sequence of the overlapping pair) and max_len is the maximum length allowed for overlaps (default: 90% of the shorter sequence of the overlapping pair). Although RNA-seq reads are supposed to all have the same length, our algorithm also works on reads of different lengths. This is useful if sequences have been previously quality filtered.</p>
</list-item>
</list>
</p>
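<p>The two validity conditions can be sketched as follows. This is a simplified illustration under stated assumptions: it compares plain strings character by character rather than XOR-ing 2-bit encoded reads, and the function and parameter names (hamming, is_valid_overlap, max_err_frac, min_frac, max_frac) are hypothetical. The default fractions are the ones given in the text.</p>

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def is_valid_overlap(read_i, read_j, k, max_err_frac=0.10, min_frac=0.20, max_frac=0.90):
    """Check whether the length-k tail of read_i validly overlaps the length-k head of read_j."""
    longer = max(len(read_i), len(read_j))
    shorter = min(len(read_i), len(read_j))
    min_len = min_frac * longer    # default: 20% of the longer read
    max_len = max_frac * shorter   # default: 90% of the shorter read
    # Condition 2: the overlap length must lie between min_len and max_len.
    if min_len > k or k > max_len:
        return False
    # Condition 1: mismatches must not exceed 10% of the overlap length.
    mismatches = hamming(read_i[-k:], read_j[:k])
    return max_err_frac * k >= mismatches
```

<p>For two 10 nt reads, for example, only overlaps of 2 to 9 nt pass the length condition with the default fractions.</p>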
<p id="Par30">The first condition is pivotal to guarantee a good alignment and avoid the reconstruction of chimeric transcripts.</p>
<p id="Par31">Regarding the second condition, a minimum overlap length is required to avoid alignments caused by chance similarities.</p>
<p id="Par32">Similarly, a maximum length must be set to deal with the redundancy of information caused by high sequencing depths: an alignment with an excessive overlap generates a poorly informative contig (just a few bases longer than a single read).</p>
<p id="Par33">The triples returned by the first module are used to build an unweighted directed graph
<italic>G</italic>
where each node represents a read and an arc a head-tail alignment between two reads. Ideally, every path in
<italic>G</italic>
from a source (node without incoming edges) to a sink (node without outgoing edges) would represent a transcript, or a fragment of it. However, due to high sequencing depths, the same transcript or fragment could be obtained by many paths differing in a small number of nodes, and it would be too expensive to generate all of them. In addition, the presence of alternative splicing and of head-tail alignments over repeated regions may lead to chimeric reconstructions. To take all these issues into account we have developed a custom traversal algorithm, which is the core of the second module.</p>
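<p>A minimal sketch of the graph construction, assuming the triples are available as Python tuples (i, j, k). The function name build_graph and the returned structures are hypothetical. The graph itself is unweighted; the overlap lengths are kept in a separate dictionary only because they are needed later to merge reads.</p>

```python
from collections import defaultdict


def build_graph(triples):
    """Build adjacency lists from (i, j, k) triples: arc i -> j with overlap length k."""
    successors = defaultdict(list)
    overlap = {}
    nodes = set()
    has_incoming = set()
    for i, j, k in triples:
        successors[i].append(j)
        overlap[(i, j)] = k
        nodes.update((i, j))
        has_incoming.add(j)
    # Sources have no incoming arcs; sinks have no outgoing arcs.
    sources = sorted(n for n in nodes if n not in has_incoming)
    sinks = sorted(n for n in nodes if not successors.get(n))
    return dict(successors), overlap, sources, sinks
```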
<p id="Par34">The traversal algorithm executes a depth-first search starting from each source node in
<italic>G</italic>
. When a sink node is reached, the current path is output if its length is greater than the parameter
<italic>minLenght</italic>
. During the depth-first search we discard the current path if it turns out to be “too similar” to a prefix of an already generated path originating from the same source. For this purpose, two paths are considered “too similar” if they have the same first and last nodes and one path can be obtained from the other by replacing at most
<italic>simThreshold</italic>
nodes. Another technique to reduce the number of paths produced by the traversal algorithm is to enforce that each path should contain a minimum number of “new” nodes. This is achieved as follows. Initially all nodes are colored white. When a path is output all its nodes are colored black, and we output a new path only if it contains at least
<italic>whiteThreshold</italic>
white nodes. At the end of the graph traversal, all produced paths are transformed into transcripts by replacing each node with the read it represents and combining the reads, taking into account the length of their overlaps. This set of transcripts is the output of the second module.</p>
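<p>The sketch below illustrates a simplified version of the traversal and of the final merging of reads. It is not STAble’s algorithm: the “too similar” prefix filter is omitted, path length is assumed to be measured in nodes, a guard against revisiting a node on the current path is added, and recursion is used for brevity although it would not scale to deep graphs. All function and parameter names are hypothetical.</p>

```python
def traverse(successors, sources, min_length=1, white_threshold=1):
    """Depth-first enumeration of source-to-sink paths with the white-node filter."""
    black = set()  # nodes already used by an emitted path
    paths = []

    def dfs(path):
        nexts = successors.get(path[-1], [])
        if not nexts:  # sink reached
            white = sum(1 for n in path if n not in black)
            if len(path) > min_length and white >= white_threshold:
                paths.append(list(path))
                black.update(path)
            return
        for nxt in nexts:
            if nxt not in path:  # cycle guard (an assumption of this sketch)
                path.append(nxt)
                dfs(path)
                path.pop()

    for src in sources:
        dfs([src])
    return paths


def merge_path(path, reads, overlap):
    """Concatenate the reads along a path, skipping the overlapping prefix of each read."""
    seq = reads[path[0]]
    for prev, cur in zip(path, path[1:]):
        seq += reads[cur][overlap[(prev, cur)]:]
    return seq
```

<p>On a diamond-shaped graph with two source-to-sink paths that differ in one node, the second path is kept with white_threshold=1 and dropped with white_threshold=2.</p>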
<p id="Par35">Finally, the third module processes the resulting transcripts by performing various operations: the most important one is the clustering of sequences to remove the residual redundancy that is not detected by the traversal algorithm.</p>
<p id="Par36">This redundancy removal relies on clustering algorithms. Currently we use the Usearch algorithm [
<xref ref-type="bibr" rid="CR6">6</xref>
] for a fast removal of duplicated sequences.</p>
<p id="Par37">Finally, all reconstructed transcripts are weighted by a quick Bowtie alignment against the raw reads.</p>
<p id="Par38">STAble is designed to be parallelizable and grid-friendly in order to speed up the analysis and reduce hardware requirements. The idea is to randomly split the input dataset into smaller blocks of size k: each block is then processed with the three modules described above. Blocks can be processed sequentially or in parallel, even on common desktop computers. Results are then merged, clustered and used as input for a new iteration: computation stops when the dataset size becomes smaller than k.</p>
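<p>The block-wise iteration can be sketched as below. The callbacks assemble_block and merge_and_cluster stand in for the three modules and for the merge-and-cluster step; they, the seed parameter and the early exit when an iteration does not shrink the dataset are assumptions of this sketch, not details taken from STAble.</p>

```python
import random


def iterative_assembly(sequences, k, assemble_block, merge_and_cluster, seed=0):
    """Repeat block-wise assembly until the dataset is smaller than the block size k."""
    rng = random.Random(seed)
    current = list(sequences)
    while len(current) >= k:
        rng.shuffle(current)  # random split into blocks of size k
        blocks = [current[s:s + k] for s in range(0, len(current), k)]
        # Each block is independent, so this loop could run in parallel.
        results = [assemble_block(block) for block in blocks]
        merged = merge_and_cluster(results)
        if len(merged) >= len(current):
            break  # no reduction: stop rather than loop forever (sketch-only safeguard)
        current = merged
    return current
```

<p>Because blocks are independent, memory use is bounded by the block size rather than by the size of the whole dataset, which is consistent with the goal of running on common desktop computers.</p>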
<sec id="Sec3">
<title>Known limitations</title>
<p id="Par39">The current version of STAble suffers from some known limitations. First, it treats paired-end reads as single-end and does not take advantage of the information provided by the paired-end approach. Moreover, the head-tail alignment of reads does not handle reverse-complement pairing. This leads to the redundant identification of each transcript on both the forward and reverse strands. The issue is mitigated by the post-processing clustering, but it would be advisable to upgrade the analysis procedure to correctly handle reverse-complement pairing, which is expected to improve the reconstructions.</p>
</sec>
<sec id="Sec4">
<title>Benchmark</title>
<p id="Par40">Simulated datasets were generated by selecting random transcripts from the human genome or from bacteria and producing reads using ART [
<xref ref-type="bibr" rid="CR7">7</xref>
] as Illumina 150 bp single-end reads with 20× fold coverage and the HiSeq 2500 quality profile. Reads were used to reconstruct transcripts with STAble and with the other assemblers (default parameters were used). Reconstructed transcripts were aligned to the database used for simulations using BLASTn: reconstructed transcripts not aligning as a single match for at least 85% of their length to any reference sequence were marked as False Positives. False Positives were then aligned to the genome with GMAP [
<xref ref-type="bibr" rid="CR8">8</xref>
]. If the mapping showed a realistic pattern of introns and exons, the reconstructed transcript was labelled as False Positive class A (FPA). A match was considered “realistic” if GMAP reported a single path covering 90% of the transcript with at least 90% similarity. False positive reconstructions not satisfying these criteria were labelled as False Positive class B (FPB) (see Results for details).</p>
<p id="Par41">True Positive transcripts reconstructing reference sequences for at least 90% of their length were labelled as full-length reconstructed.</p>
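<p>The labelling rules above can be summarized by the following sketch. It operates on already computed alignment summaries and does not call BLASTn or GMAP; the function name and its arguments are hypothetical, and the 90% coverage criterion is interpreted here as “at least 90%”, which is an assumption.</p>

```python
def classify_reconstruction(blast_cov, gmap_single_path, gmap_cov, gmap_identity):
    """Label a reconstructed transcript as 'TP', 'FPA' or 'FPB' following the benchmark rules.

    blast_cov: fraction of the transcript aligned as a single BLASTn match to a reference.
    gmap_single_path: True if GMAP reports a single path on the genome.
    gmap_cov, gmap_identity: fraction of the transcript covered by that path, and its similarity.
    """
    if blast_cov >= 0.85:
        return "TP"
    # Not in the simulation database, but mapping realistically onto the genome.
    if gmap_single_path and gmap_cov >= 0.90 and gmap_identity >= 0.90:
        return "FPA"
    # Everything else is treated as a chimeric or unrealistic reconstruction.
    return "FPB"
```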
</sec>
<sec id="Sec5">
<title>Hardware</title>
<p id="Par42">STAble was run on a desktop computer equipped with a dual-core Intel Core i3 processor and 8 GB of RAM. The other tools were tested on an Intel Xeon with 8 cores and 48 GB of RAM.</p>
</sec>
<sec id="Sec6">
<title>Real datasets</title>
<p id="Par43">Raw data described in [
<xref ref-type="bibr" rid="CR9">9</xref>
] were downloaded from the National Center for Biotechnology Information Sequence Read Archive, accession number SRA075938, BioProject number PRJNA202380 [
<xref ref-type="bibr" rid="CR10">10</xref>
]. We downloaded a total number of six metatranscriptomic samples with the following names according to [
<xref ref-type="bibr" rid="CR9">9</xref>
] (sheep tag = SRA run accession, with the methane yield group in parentheses): S1234 = SRR1206249 (high), S1494 = SRR873453 (low), S1333 = SRR873463 (high), SRR1283 = SRR873451 (low), S1265 = SRR873454 (low), S1586 = SRR873461 (high). Raw datasets were downloaded in FASTQ format and used as input for our analysis workflow. The first step was the assembly of reads with STAble to reconstruct transcripts. We then downloaded bacterial FASTA sequences of orthologous genes of several pathways (glycolysis/gluconeogenesis, butanoate metabolism, methane metabolism, carbon fixation pathways, phosphotransferase system) from the KEGG ortholog database [
<xref ref-type="bibr" rid="CR11">11</xref>
]. Reconstructed transcripts were aligned to bacterial genes using BLAST, accepting matches with at least 92% similarity and allowing up to 20 mismatched nucleotides over flanking regions. The contingency tables with read counts for each orthologous gene were processed with metabolic models to interpret gene expression. The method adopted is described in [
<xref ref-type="bibr" rid="CR12">12</xref>
]. Briefly, we performed a blind Monte-Carlo simulation over feasible flux configurations. Specifically, we sampled from the set of flux configurations that provide near optimal biomass, while also providing optimality against a second random set of objectives. We then regard this large set of flux configurations as the set of possible populations (G), and then find the subset (termed L) of G which is consistent with the experimentally determined gene expression vectors. This is achieved by gene-by-gene parametric comparison between G and the set of gene expression vectors. Finally, we compare L to G to understand which reactions are most strongly influenced by the gene expressions tested. The overall method is depicted in Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Workflow of metatranscriptomic analysis integrated with metabolic network.
<bold>a</bold>
Raw reads were assembled with a default STAble analysis.
<bold>b</bold>
Reconstructed transcripts were assigned to orthologous transcripts included in several metabolic pathways as annotated in KEGG database. A contingency table with KEGG reference genes binned with reconstructed transcripts is generated.
<bold>c</bold>
Metabolic model flux analysis to interpret gene expression using the method described in [
<xref ref-type="bibr" rid="CR13">13</xref>
]</p>
</caption>
<graphic xlink:href="12859_2018_2174_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
</sec>
</sec>
<sec id="Sec7">
<title>Results and discussion</title>
<p id="Par44">STAble’s performance was compared with that of Bridger, Oases and Trinity. The prototype was tested on a large set of simulated data to allow a deeper evaluation of result quality. Benchmarks are usually performed on real data, using RNA-seq data from organisms for which a reference genome is available. Reconstructed transcripts are then compared with annotated transcripts to identify good quality reconstructions. By aligning reconstructed transcripts to the genome it is possible to identify chimeric or unrealistic transcripts (i.e. mapping onto multiple chromosomes, with unlikely long introns or with inversions).</p>
<p id="Par45">We benchmarked STAble with simulated data because they allow the unambiguous identification of true and false assembled transcripts, which is only partially possible with real datasets. By working with simulated datasets we highlighted a kind of false positive reconstruction that is not visible with real data. This false positive type (which we named False Positive class A, FPA) is depicted in Fig. 
<xref rid="Fig3" ref-type="fig">3</xref>
. Suppose that t1, t2 and t3 are annotated alternative splicing forms of the same gene and that only t1 and t2 are present in the sample: the reads may still allow t3 to be reconstructed even though it is not actually transcribed, so t3 has to be considered a false positive. However, with real data it is not possible to identify FPA (as t3 is a real transcript, albeit not expressed in the sample under analysis), so the rate of false positives is likely to be underestimated.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Supposing that only splicing variants t1 and t2 are present in the sample, the reads may allow the reconstruction of t3, which is a valid annotated alternative but has to be considered a false positive</p>
</caption>
<graphic xlink:href="12859_2018_2174_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
<p id="Par46">In the following discussion we label as FPA (False Positive class A) false positive reconstructions that do not match any sequence in the database used for simulation but correctly match the genome, while we label as FPB (False Positive class B) chimeric reconstructions.</p>
<p id="Par47">Simulated data were generated and analysed as described in Methods.</p>
<p id="Par48">Table 
<xref rid="Tab1" ref-type="table">1</xref>
summarizes the results obtained by assembling 147,800 simulated reads from a pool of 200 transcripts and 1,088,271 reads from a pool of 6309 transcripts randomly picked from the human transcriptome. Results show that STAble performs similarly to the other tools in terms of sensitivity. While Oases and Trinity show a slightly higher number of transcripts reconstructed at 100%, it has to be noted that they are affected by a high rate of false positives. Bridger and Oases show the highest rate of FPB, while Oases and Trinity show a very high number of FPA. Only STAble performs reconstructions with a low rate of both FPA and FPB. Moreover, it is important to underline that, when considering reference transcripts reconstructed to at least 70%, STAble’s performance is almost the same as Trinity’s. It is interesting to note that on benchmark datasets based on real data with a reference genome (for which the set of actually expressed sequences is not available), where it is not possible to detect FPA, Trinity would have shown a very low false positive rate, as FPA would have been counted as True Positives.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Results on 200 (Dataset A) and 6309 (Dataset B) random human transcripts. STAble returned the most reliable set of results showing a sensitivity comparable to other assemblers while producing only 3 false positives</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td colspan="10">Dataset A</td>
</tr>
<tr>
<td> Assembler</td>
<td># of results</td>
<td># of FP</td>
<td>FPA</td>
<td>FPB</td>
<td>100%</td>
<td>70%</td>
<td>S100</td>
<td>S70</td>
<td>FPR</td>
</tr>
<tr>
<td> STAble</td>
<td>227</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>152</td>
<td>161</td>
<td>76%</td>
<td>81%</td>
<td>0.44%</td>
</tr>
<tr>
<td> Bridger</td>
<td>210</td>
<td>58</td>
<td>30</td>
<td>28</td>
<td>143</td>
<td>148</td>
<td>72%</td>
<td>74%</td>
<td>28%</td>
</tr>
<tr>
<td> Oases</td>
<td>321</td>
<td>106</td>
<td>89</td>
<td>17</td>
<td>159</td>
<td>165</td>
<td>80%</td>
<td>83%</td>
<td>33%</td>
</tr>
<tr>
<td> Trinity</td>
<td>258</td>
<td>56</td>
<td>48</td>
<td>8</td>
<td>157</td>
<td>167</td>
<td>79%</td>
<td>84%</td>
<td>22%</td>
</tr>
<tr>
<td colspan="10">Dataset B</td>
</tr>
<tr>
<td> Assembler</td>
<td># of results</td>
<td># of FP</td>
<td>FPA</td>
<td>FPB</td>
<td>100%</td>
<td>70%</td>
<td>S100</td>
<td>S70</td>
<td>FPR</td>
</tr>
<tr>
<td> STAble</td>
<td>8906</td>
<td>2285</td>
<td>1053</td>
<td>1232</td>
<td>3295</td>
<td>4179</td>
<td>52%</td>
<td>66%</td>
<td>26%</td>
</tr>
<tr>
<td> Bridger</td>
<td>5697</td>
<td>1820</td>
<td>945</td>
<td>875</td>
<td>2728</td>
<td>3315</td>
<td>43%</td>
<td>53%</td>
<td>32%</td>
</tr>
<tr>
<td> Oases</td>
<td>16,895</td>
<td>5722</td>
<td>2835</td>
<td>2887</td>
<td>3550</td>
<td>4156</td>
<td>56%</td>
<td>66%</td>
<td>34%</td>
</tr>
<tr>
<td> Trinity</td>
<td>8300</td>
<td>2543</td>
<td>2223</td>
<td>320</td>
<td>3603</td>
<td>4315</td>
<td>57%</td>
<td>68%</td>
<td>31%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>Assembler</italic>
Name of the assembler,
<italic># of results</italic>
Total number of reconstructed transcripts,
<italic># of FP</italic>
Number of False Positive results,
<italic>FPA</italic>
False Positive class A,
<italic>FPB</italic>
False Positive class B,
<italic>100%</italic>
Number of full reconstructed transcripts,
<italic>70%</italic>
Number of transcripts reconstructed at 70%,
<italic>S100</italic>
Percentage of full reconstructed transcripts,
<italic>S70</italic>
Percentage of transcripts reconstructed at 70%,
<italic>FPR</italic>
False Positive Ratio</p>
</table-wrap-foot>
</table-wrap>
</p>
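<p>The derived columns of Table 1 can be reproduced with the small helper below. The definitions used (S100 and S70 as the share of the reference pool reconstructed at 100% and at 70%, FPR as false positives over the total number of results) are not stated explicitly in the text; they are inferred here because they match the tabulated values. The function name is hypothetical.</p>

```python
def summary_metrics(n_results, n_fp, n_full, n_70, n_reference):
    """Return (S100, S70, FPR) as percentages."""
    s100 = 100.0 * n_full / n_reference  # share of reference transcripts fully reconstructed
    s70 = 100.0 * n_70 / n_reference     # share reconstructed to at least 70%
    fpr = 100.0 * n_fp / n_results       # false positives among all reconstructions
    return s100, s70, fpr
```

<p>For STAble on Dataset A, summary_metrics(227, 1, 152, 161, 200) gives 76%, 80.5% and about 0.44%, in line with the corresponding row of the table after rounding.</p>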
<p id="Par49">Finally, we performed some benchmarks on simulated bacterial metatranscriptomic datasets. Annotated transcripts from 10 different species were mixed and used to generate two additional simulated datasets: 1,242,040 reads from a pool of 11,815 mixed bacterial transcripts, and 2,382,790 reads from a pool of 43,578 mixed bacterial transcripts. Results are summarized in Table 
<xref rid="Tab2" ref-type="table">2</xref>
. STAble has shown the highest sensitivity, with an FPR comparable to that of the other programs. It is interesting to note that, due to the absence of alternative splicing in bacterial transcriptomes, it is not possible to produce FPA class errors (see Table
<xref rid="Tab2" ref-type="table">2</xref>
). Notably, STAble, running on a desktop computer equipped with 8 GB of RAM, was the only assembler capable of completing the assembly task with the larger dataset. All the other tools terminated with an out-of-memory error even on a computer with 48 GB of RAM.
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Results on 11,815 (Dataset C) and 43,578 (Dataset D) mixed bacterial transcripts. STAble showed the best sensitivity while producing the lowest false positive ratio alongside Trinity. Due to the absence of alternative splicing in bacterial transcriptomes it is not possible to produce FPA class errors. With the larger dataset it is not possible to compare results with existing assemblers, as they terminated with an out-of-memory error</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td colspan="8">Dataset C</td>
</tr>
<tr>
<td> Assembler</td>
<td># of results</td>
<td># of FP</td>
<td>100%</td>
<td>70%</td>
<td>S100</td>
<td>S70</td>
<td>FPR</td>
</tr>
<tr>
<td> STAble</td>
<td>13,985</td>
<td>983</td>
<td>10,007</td>
<td>10,263</td>
<td>85%</td>
<td>87%</td>
<td>7%</td>
</tr>
<tr>
<td> Bridger</td>
<td>5873</td>
<td>253</td>
<td>8510</td>
<td>9075</td>
<td>72%</td>
<td>77%</td>
<td>4%</td>
</tr>
<tr>
<td> Oases</td>
<td>5579</td>
<td>268</td>
<td>6687</td>
<td>8603</td>
<td>57%</td>
<td>73%</td>
<td>5%</td>
</tr>
<tr>
<td> Trinity</td>
<td>7597</td>
<td>145</td>
<td>9136</td>
<td>9565</td>
<td>77%</td>
<td>81%</td>
<td>2%</td>
</tr>
<tr>
<td colspan="8">Dataset D</td>
</tr>
<tr>
<td> Assembler</td>
<td># of results</td>
<td># of FP</td>
<td>100%</td>
<td>70%</td>
<td>S100</td>
<td>S70</td>
<td>FPR</td>
</tr>
<tr>
<td> STAble</td>
<td>134,110</td>
<td>1040</td>
<td>20,800</td>
<td>35,424</td>
<td>48%</td>
<td>81%</td>
<td>0.8%</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<italic>Assembler</italic>
Name of the assembler,
<italic># of results</italic>
Total number of reconstructed transcripts,
<italic># of FP</italic>
Number of False Positive results,
<italic>FPA</italic>
False Positive class A,
<italic>FPB</italic>
False Positive class B,
<italic>100%</italic>
Number of full reconstructed transcripts,
<italic>70%</italic>
Number of transcripts reconstructed at 70%,
<italic>S100</italic>
Percentage of full reconstructed transcripts,
<italic>S70</italic>
Percentage of transcripts reconstructed at 70%,
<italic>FPR</italic>
False Positive Ratio</p>
</table-wrap-foot>
</table-wrap>
</p>
<p id="Par50">To test our workflow on real data we took advantage of the work by Kamke and colleagues [
<xref ref-type="bibr" rid="CR9">9</xref>
]. In their paper, they compare the rumen microbiomes of high and low methane yield sheep using metatranscriptomics. We downloaded raw reads from SRA for 3 high and 3 low methane yield samples and processed them as described in Methods, then compared our results with those discussed by Kamke and colleagues [
<xref ref-type="bibr" rid="CR9">9</xref>
]. Briefly, reads were assembled with STAble and mapped to KEGG orthologous genes of a few basic bacterial metabolic pathways. Using a few metabolic pathways instead of the more time-consuming entire gene set is consistent with our approach. Indeed, as described in Methods, the metabolic flux algorithm used can work well even with small gene expression sets. In particular, we compared reconstructed transcripts with genes involved in some metabolic pathways, such as the glycolysis/gluconeogenesis pathway (as an example of a basic bacterial metabolic pathway), the butanoate metabolism pathway and the methane metabolism pathway. We also used the carbon fixation pathways in prokaryotes and the membrane transport pathway of the phosphotransferase system, which is one of the pathways cited and analysed by Kamke and colleagues [
<xref ref-type="bibr" rid="CR9">9</xref>
].</p>
<p id="Par51">The contingency tables with genes and their abundance were used to feed a metabolic model network to interpret gene expression. The simplest approach when performing this kind of analysis is to directly design a mapping function, which projects the gene expressions as constraints on their associated reactions in the metabolic model. This gives a one to one mapping between gene expression vectors and metabolic models, and necessitates a great degree of care in the design of the mapping function. Specifically, the mapping function needs to produce detectable differences between metabolic models, while also ensuring that predicted fluxes are all within the bounds of what is biologically feasible.</p>
<p id="Par52">Here, we take a radically different approach. Rather than parameterizing metabolic models using gene expression vectors directly, we instead perform a blind Monte-Carlo simulation over flux configurations that provide near optimal biomass. We then regard this large set of flux configurations as the set of possible populations (G), and then find the subset (termed L) of G, which is consistent with the experimentally determined gene expression vectors. Finally, we compare L to G to understand which reactions are most strongly influenced by the gene expressions tested (summarized in Table 
<xref rid="Tab3" ref-type="table">3</xref>
and Table 
<xref rid="Tab4" ref-type="table">4</xref>
).
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>List of all bacterial metabolic reactions identified in high methane yield animals</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Abbreviation</th>
<th>Subsystem</th>
<th>Official Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>NADH16pp</td>
<td>Oxidative Phosphorylation</td>
<td>NADH dehydrogenase (ubiquinone-8 &amp; 3 protons) (periplasm)</td>
</tr>
<tr>
<td>PROt2rpp</td>
<td>Transport</td>
<td>L-proline reversible transport via proton symport (periplasm)</td>
</tr>
<tr>
<td>PROt4pp</td>
<td>Transport</td>
<td>Na+/Proline-L symporter (periplasm)</td>
</tr>
<tr>
<td>GLCP2</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>glycogen phosphorylase</td>
</tr>
<tr>
<td>GLCS1</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>glycogen synthase (ADPGlc)</td>
</tr>
<tr>
<td>GLGC</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>glucose-1-phosphate adenylyltransferase</td>
</tr>
<tr>
<td>THRt2rpp</td>
<td>Transport</td>
<td>L-threonine reversible transport via proton symport (periplasm)</td>
</tr>
<tr>
<td>THRt4pp</td>
<td>Transport</td>
<td>L-threonine via sodium symport (periplasm)</td>
</tr>
<tr>
<td>INSt2pp</td>
<td>Transport</td>
<td>inosine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>INSt2rpp</td>
<td>Transport</td>
<td>inosine transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>PPCSCT</td>
<td>Alternate Carbon Metabolism</td>
<td>Propanoyl-CoA: succinate CoA-transferase</td>
</tr>
<tr>
<td>SUCOAS</td>
<td>Citric Acid Cycle</td>
<td>succinyl-CoA synthetase (ADP-forming)</td>
</tr>
<tr>
<td>TALA</td>
<td>Pentose Phosphate Pathway</td>
<td>transaldolase</td>
</tr>
<tr>
<td>ACCOAL</td>
<td>Alternate Carbon Metabolism</td>
<td>acetate-CoA ligase (ADP-forming)</td>
</tr>
<tr>
<td>GLUt4pp</td>
<td>Transport</td>
<td>Na+/glutamate symport (periplasm)</td>
</tr>
<tr>
<td>PPAKr</td>
<td>Alternate Carbon Metabolism</td>
<td>Propionate kinase</td>
</tr>
<tr>
<td>PTA2</td>
<td>Alternate Carbon Metabolism</td>
<td>Phosphate acetyltransferase</td>
</tr>
<tr>
<td>THFAT</td>
<td>Folate Metabolism</td>
<td>Tetrahydrofolate aminomethyltransferase</td>
</tr>
<tr>
<td>FOMETRi</td>
<td>Folate Metabolism</td>
<td>Aminomethyltransferase</td>
</tr>
<tr>
<td>ADK3</td>
<td>Nucleotide Salvage Pathway</td>
<td>adenylate kinase (GTP)</td>
</tr>
<tr>
<td>FBA3</td>
<td>Pentose Phosphate Pathway</td>
<td>7-bisphosphate D-glyceraldehyde-3-phosphate-lyase</td>
</tr>
<tr>
<td>PFK_3</td>
<td>Pentose Phosphate Pathway</td>
<td>phosphofructokinase (s7p)</td>
</tr>
<tr>
<td>URAt2pp</td>
<td>Transport</td>
<td>uracil transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>URAt2rpp</td>
<td>Transport</td>
<td>uracil transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>GLYt2pp</td>
<td>Transport</td>
<td>glycine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>GLCP</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>glycogen phosphorylase</td>
</tr>
<tr>
<td>NDPK1</td>
<td>Nucleotide Salvage Pathway</td>
<td>nucleoside-diphosphate kinase (ATP:GDP)</td>
</tr>
<tr>
<td>CA2t3pp</td>
<td>Inorganic Ion Transport and Metabolism</td>
<td>calcium (Ca+2) transport out via proton antiport (periplasm)</td>
</tr>
<tr>
<td>CAt6pp</td>
<td>Inorganic Ion Transport and Metabolism</td>
<td>calcium / sodium antiporter (1:1)</td>
</tr>
<tr>
<td>PPKr</td>
<td>Oxidative Phosphorylation</td>
<td>polyphosphate kinase</td>
</tr>
<tr>
<td>URIt2pp</td>
<td>Transport</td>
<td>uridine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>URIt2rpp</td>
<td>Transport</td>
<td>uridine transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>NADH18pp</td>
<td>Oxidative Phosphorylation</td>
<td>NADH dehydrogenase (demethylmenaquinone-8 &amp; 3 protons) (periplasm)</td>
</tr>
<tr>
<td>FRD3</td>
<td>Citric Acid Cycle</td>
<td>fumarate reductase</td>
</tr>
<tr>
<td>ALAt2pp</td>
<td>Transport</td>
<td>L-alanine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>ALAt2rpp</td>
<td>Transport</td>
<td>L-alanine reversible transport via proton symport (periplasm)</td>
</tr>
<tr>
<td>GLYt2rpp</td>
<td>Transport</td>
<td>glycine reversible transport via proton symport (periplasm)</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>List of all bacterial metabolic reactions identified in low methane yield animals</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Abbreviation</th>
<th>Subsystem</th>
<th>Official Name</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALATA_L</td>
<td>Alanine and Aspartate Metabolism</td>
<td>L-alanine transaminase</td>
</tr>
<tr>
<td>THMDt2pp</td>
<td>Transport</td>
<td>thymidine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>THMDt2rpp</td>
<td>Transport</td>
<td>thymidine transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>NAt3pp</td>
<td>Inorganic Ion Transport and Metabolism</td>
<td>sodium transport out via proton antiport (cytoplasm to periplasm)</td>
</tr>
<tr>
<td>VPAMTr</td>
<td>Valine, Leucine and Isoleucine Metabolism</td>
<td>Valine-pyruvate aminotransferase</td>
</tr>
<tr>
<td>VALTA</td>
<td>Valine, Leucine and Isoleucine Metabolism</td>
<td>valine transaminase</td>
</tr>
<tr>
<td>SUCDi</td>
<td>Oxidative Phosphorylation</td>
<td>succinate dehydrogenase (irreversible)</td>
</tr>
<tr>
<td>GLUABUTt7pp</td>
<td>Transport</td>
<td>4-aminobutyrate/glutamate antiport (periplasm)</td>
</tr>
<tr>
<td>ABUTt2pp</td>
<td>Transport</td>
<td>4-aminobutyrate transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>GLYt4pp</td>
<td>Transport</td>
<td>glycine transport in via sodium symport (periplasm)</td>
</tr>
<tr>
<td>GLUt2rpp</td>
<td>Transport</td>
<td>L-glutamate transport via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>GLDBRAN2</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>glycogen debranching enzyme (bglycogen -> glycogen)</td>
</tr>
<tr>
<td>GLYCLTt2rpp</td>
<td>Transport</td>
<td>glycolate transport via proton symport</td>
</tr>
<tr>
<td>GLYCLTt4pp</td>
<td>Transport</td>
<td>glycolate transport via sodium symport (periplasm)</td>
</tr>
<tr>
<td>ACt2rpp</td>
<td>Transport</td>
<td>acetate reversible transport via proton symport (periplasm)</td>
</tr>
<tr>
<td>ACt4pp</td>
<td>Transport</td>
<td>Na+/Acetate symport (periplasm)</td>
</tr>
<tr>
<td>ADK1</td>
<td>Nucleotide Salvage Pathway</td>
<td>adenylate kinase</td>
</tr>
<tr>
<td>PTAr</td>
<td>Pyruvate Metabolism</td>
<td>phosphotransacetylase</td>
</tr>
<tr>
<td>ACKr</td>
<td>Pyruvate Metabolism</td>
<td>acetate kinase</td>
</tr>
<tr>
<td>ACS</td>
<td>Pyruvate Metabolism</td>
<td>acetyl-CoA synthetase</td>
</tr>
<tr>
<td>SERt2rpp</td>
<td>Transport</td>
<td>L-serine reversible transport via proton symport (periplasm)</td>
</tr>
<tr>
<td>SERt4pp</td>
<td>Transport</td>
<td>L-serine via sodium symport (periplasm)</td>
</tr>
<tr>
<td>GLCtex</td>
<td>Transport</td>
<td>glucose transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>PRPPS</td>
<td>Histidine Metabolism</td>
<td>phosphoribosylpyrophosphate synthetase</td>
</tr>
<tr>
<td>PPM</td>
<td>Alternate Carbon Metabolism</td>
<td>phosphopentomutase</td>
</tr>
<tr>
<td>R15BPK</td>
<td>Alternate Carbon Metabolism</td>
<td>Ribose-1,5 bisphosphokinase</td>
</tr>
<tr>
<td>R1PK</td>
<td>Alternate Carbon Metabolism</td>
<td>ribose 1-phosphokinase</td>
</tr>
<tr>
<td>GLCtexi</td>
<td>Transport</td>
<td>D-glucose transport via diffusion (extracellular to periplasm) irreversible</td>
</tr>
<tr>
<td>ADNt2pp</td>
<td>Transport</td>
<td>adenosine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>ADNt2rpp</td>
<td>Transport</td>
<td>adenosine transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>ASPt2pp</td>
<td>Transport</td>
<td>L-aspartate transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>ASPt2rpp</td>
<td>Transport</td>
<td>L-aspartate transport in via proton symport (periplasm) reversible</td>
</tr>
<tr>
<td>INDOLEt2pp</td>
<td>Transport</td>
<td>Indole transport via proton symport irreversible (periplasm)</td>
</tr>
<tr>
<td>INDOLEt2rpp</td>
<td>Transport</td>
<td>Indole transport via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>FBA</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>fructose-bisphosphate aldolase</td>
</tr>
<tr>
<td>PFK</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>phosphofructokinase</td>
</tr>
<tr>
<td>ICHORS</td>
<td>Cofactor and Prosthetic Group Biosynthesis</td>
<td>isochorismate synthase</td>
</tr>
<tr>
<td>ICHORSi</td>
<td>Cofactor and Prosthetic Group Biosynthesis</td>
<td>Isochorismate Synthase</td>
</tr>
<tr>
<td>HPYRI</td>
<td>Alternate Carbon Metabolism</td>
<td>hydroxypyruvate isomerase</td>
</tr>
<tr>
<td>HPYRRx</td>
<td>Alternate Carbon Metabolism</td>
<td>Hydroxypyruvate reductase (NADH)</td>
</tr>
<tr>
<td>TRSARr</td>
<td>Alternate Carbon Metabolism</td>
<td>tartronate semialdehyde reductase</td>
</tr>
<tr>
<td>CYTDt2pp</td>
<td>Transport</td>
<td>cytidine transport in via proton symport (periplasm)</td>
</tr>
<tr>
<td>CYTDt2rpp</td>
<td>Transport</td>
<td>cytidine transport in via proton symport reversible (periplasm)</td>
</tr>
<tr>
<td>FRD2</td>
<td>Citric Acid Cycle</td>
<td>fumarate reductase</td>
</tr>
<tr>
<td>NADH17pp</td>
<td>Oxidative Phosphorylation</td>
<td>NADH dehydrogenase (menaquinone-8 &amp; 3 protons) (periplasm)</td>
</tr>
<tr>
<td>EX_h(e)</td>
<td>Exchange</td>
<td>H+ exchange</td>
</tr>
<tr>
<td>EX_fe3(e)</td>
<td>Exchange</td>
<td>Fe3+ exchange</td>
</tr>
<tr>
<td>EX_fe2(e)</td>
<td>Exchange</td>
<td>Fe2+ exchange</td>
</tr>
<tr>
<td>Htex</td>
<td>Transport</td>
<td>proton transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>FEROpp</td>
<td>Inorganic Ion Transport and Metabolism</td>
<td>ferroxidase</td>
</tr>
<tr>
<td>FE3tex</td>
<td>Transport</td>
<td>iron (III) transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>FE2tex</td>
<td>Transport</td>
<td>iron (II) transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>GLBRAN2</td>
<td>Glycolysis/Gluconeogenesis</td>
<td>4-alpha-glucan branching enzyme (glycogen -> bglycogen)</td>
</tr>
<tr>
<td>EX_o2(e)</td>
<td>Exchange</td>
<td>O2 exchange</td>
</tr>
<tr>
<td>EX_h2o(e)</td>
<td>Exchange</td>
<td>H2O exchange</td>
</tr>
<tr>
<td>O2tex</td>
<td>Transport</td>
<td>oxygen transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>H2Otex</td>
<td>Transport</td>
<td>H2O transport via diffusion (extracellular to periplasm)</td>
</tr>
<tr>
<td>CRNDt2rpp</td>
<td>Transport</td>
<td>D-carnitine outward transport (H+ antiport)</td>
</tr>
<tr>
<td>CRNt2rpp</td>
<td>Transport</td>
<td>L-carnitine outward transport (H+ antiport)</td>
</tr>
<tr>
<td>CRNt8pp</td>
<td>Transport</td>
<td>L-carnitine/D-carnitine antiporter (periplasm)</td>
</tr>
<tr>
<td>ALAt4pp</td>
<td>Transport</td>
<td>L-alanine transport in via sodium symport (periplasm)</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p id="Par53">The results of our metabolic network analysis are consistent with the differences in usage of the Glycolysis/Gluconeogenesis and Butanoate Biosynthesis pathways described in the paper (data not shown). Interestingly, our analysis also identified new pathways that are independent of the original set of transcripts used to feed the metabolic model network. In particular, it showed that transport channels are highly expressed in both LMY and HMY bacteria.</p>
<p id="Par54">STAble can improve the data on genes coding for membrane transport proteins and for nutrient (Fe, Ca and Na) transport in bacterial cells, in both LMY and HMY, relative to the results obtained by Kamke and coworkers [
<xref ref-type="bibr" rid="CR9">9</xref>
]. Moreover, our analysis revealed carbohydrate metabolism as the dominant category, followed by amino acid metabolism. These results agree with those reported by Hinsu and colleagues, who described functionally active bacteria and their biological processes in the rumen of buffalo (
<italic>Bubalus bubalis</italic>
) adapted to different dietary treatments [
<xref ref-type="bibr" rid="CR13">13</xref>
].</p>
<p id="Par55">These results are intriguing because they indicate that our workflow appears to produce more precise information about the metabolic pathways that are upregulated or downregulated within the same microbiome, information that is not directly tied to the transcripts identified from raw RNA-seq data.</p>
<p id="Par56">Our results highlight the potential of our new approach to de novo assembly of RNA-seq data. STAble’s sensitivity is comparable to that of other assemblers, while its rate of false positives, which has been our main focus, is lower. When working in the absence of any reference, a reasonable trade-off between sensitivity and accuracy is very important for all the subsequent analyses performed on the results. Indeed, false positive reconstructions may bias the biological interpretation of results; for example, they might lead to an overestimation of “novel” transcripts.</p>
<p id="Par57">In addition, STAble was designed to be parallelizable and grid-friendly, so that the computationally onerous assembly task can be performed even in the absence of dedicated infrastructure. It is notable that in one of the test scenarios existing assemblers failed with 48 GB of RAM, while STAble was able to run on a desktop PC.</p>
<p id="Par58">STAble was successfully integrated with a new analysis workflow based on a metabolic model network, recently described in [
<xref ref-type="bibr" rid="CR12">12</xref>
]. The combination of STAble with this workflow can be used as an “expert system” to obtain more precise information about the metabolic pathways activated in a bacterial community. The same level of information is not fully available when using only metagenomics or even metatranscriptomics data.</p>
</sec>
<sec id="Sec8">
<title>Conclusions</title>
<p id="Par59">Metatranscriptomics is the community-based evolution of RNA-Seq analysis and might represent a critical step to further elucidate the role of complex microbial communities in their environment and in the physiology and pathology of host organisms. From a clinical perspective, most of the evidence accumulated so far (and that can be collected from standard metagenomics studies) is linked to the role of specific species, genera or families rather than to their metabolic output. While this might be optimal in terms of impact on immune recognition, immune education and the triggering of autoimmune processes, this approach may be insufficient to fully elucidate the impact of microbial communities on processes such as metabolic diseases, inflammatory response, and nutrient availability, which are potentially more closely related to the global metabolic output than to the phylogeny of the species composing a specific microbiota.</p>
<p id="Par60">Integrating a robust assembler for metatranscriptomic data with a metabolic model network, thereby expanding its informative potential, could provide an improved tool to characterize actively transcribed genes in a microbial community and to predict their metabolic output.</p>
</sec>
</body>
<back>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>FPA</term>
<def>
<p id="Par5">False Positive class A</p>
</def>
</def-item>
<def-item>
<term>FPB</term>
<def>
<p id="Par6">False Positive class B</p>
</def>
</def-item>
<def-item>
<term>HMY</term>
<def>
<p id="Par7">High Methane Yield</p>
</def>
</def-item>
<def-item>
<term>KEGG</term>
<def>
<p id="Par8">Kyoto Encyclopedia of Genes and Genomes</p>
</def>
</def-item>
<def-item>
<term>LMY</term>
<def>
<p id="Par10">Low Methane Yield</p>
</def>
</def-item>
<def-item>
<term>NGS</term>
<def>
<p id="Par11">Next Generation Sequencing</p>
</def>
</def-item>
<def-item>
<term>RNA-seq</term>
<def>
<p id="Par12">RNA sequencing</p>
</def>
</def-item>
<def-item>
<term>SRA</term>
<def>
<p id="Par13">Sequence Read Archive</p>
</def>
</def-item>
</def-list>
</glossary>
<ack>
<sec id="FPar1">
<title>Funding</title>
<p id="Par61">Research and publication costs have been supported by the University of Piemonte Orientale through its local research and visiting funding program.</p>
</sec>
<sec id="FPar2">
<title>Availability of data and materials</title>
<p id="Par62">The software and datasets used for benchmarking are available upon request.</p>
</sec>
<sec id="FPar3">
<title>About this supplement</title>
<p id="Par63">This article has been published as part of
<italic>BMC Bioinformatics</italic>
Volume 19 Supplement 7, 2018: 12th and 13th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2015/16). The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-7">https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-7</ext-link>
.</p>
</sec>
</ack>
<notes notes-type="author-contribution">
<title>Authors’ contributions</title>
<p>GM and IS wrote the algorithm for transcript reconstruction. MC added metabolic pathway analysis to STAble. FF performed integration and validation tests. Interpretation of biological data was carried out by ML and EB. FM and PL conceived the work. All authors contributed to the writing. All authors read and approved the final manuscript.</p>
</notes>
<notes notes-type="COI-statement">
<sec id="FPar4">
<title>Ethics approval and consent to participate</title>
<p id="Par64">Not applicable</p>
</sec>
<sec id="FPar5">
<title>Competing interests</title>
<p id="Par65">The authors declare that they have no competing interests.</p>
</sec>
<sec id="FPar6">
<title>Publisher’s Note</title>
<p id="Par66">Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
</sec>
</notes>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Chiodini</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Badr</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>The impact of next-generation sequencing on genomics</article-title>
<source>J Genet Genomics</source>
<year>2011</year>
<volume>38</volume>
<fpage>95</fpage>
<lpage>109</lpage>
<pub-id pub-id-type="doi">10.1016/j.jgg.2011.02.003</pub-id>
<pub-id pub-id-type="pmid">21477781</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>RNA-Seq: a revolutionary tool for transcriptomics</article-title>
<source>Nat Rev Genet</source>
<year>2010</year>
<volume>10</volume>
<issue>Suppl 1</issue>
<fpage>57</fpage>
<lpage>63</lpage>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ashby</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cramer</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>X</given-names>
</name>
</person-group>
<article-title>Bridger: a new framework for de novo transcriptome assembly using RNA-seq data</article-title>
<source>Genome Biol</source>
<year>2015</year>
<volume>16</volume>
<fpage>30</fpage>
<pub-id pub-id-type="doi">10.1186/s13059-015-0596-2</pub-id>
<pub-id pub-id-type="pmid">25723335</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schulz</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Zerbino</surname>
<given-names>DR</given-names>
</name>
<name>
<surname>Vingron</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<fpage>1086</fpage>
<lpage>1092</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts094</pub-id>
<pub-id pub-id-type="pmid">22368243</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grabherr</surname>
<given-names>MG</given-names>
</name>
<name>
<surname>Haas</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Yassour</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Levin</surname>
<given-names>JZ</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Amit</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Adiconis</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Raychowdhury</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Mauceli</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Hacohen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Gnirke</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rhind</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Di Palma</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Birren</surname>
<given-names>BW</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Regev</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Full-length transcriptome assembly from RNA-Seq data without a reference genome</article-title>
<source>Nat Biotechnol</source>
<year>2011</year>
<volume>29</volume>
<fpage>644</fpage>
<lpage>652</lpage>
<pub-id pub-id-type="doi">10.1038/nbt.1883</pub-id>
<pub-id pub-id-type="pmid">21572440</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Edgar</surname>
<given-names>RC</given-names>
</name>
</person-group>
<article-title>Search and clustering orders of magnitude faster than BLAST</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>19</issue>
<fpage>2460</fpage>
<lpage>2461</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq461</pub-id>
<pub-id pub-id-type="pmid">20709691</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Marth</surname>
<given-names>GT</given-names>
</name>
</person-group>
<article-title>ART: a next-generation sequencing read simulator</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>Suppl 4</issue>
<fpage>593</fpage>
<lpage>594</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btr708</pub-id>
<pub-id pub-id-type="pmid">22199392</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Watanabe</surname>
<given-names>CK</given-names>
</name>
</person-group>
<article-title>GMAP: a genomic mapping and alignment program for mRNA and EST sequences</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<issue>9</issue>
<fpage>1859</fpage>
<lpage>1875</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti310</pub-id>
<pub-id pub-id-type="pmid">15728110</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kamke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kittelmann</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Soni</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Tavendale</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ganesh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Janssen</surname>
<given-names>PH</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Froula</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Attwood</surname>
<given-names>GT</given-names>
</name>
</person-group>
<article-title>Rumen metagenome and metatranscriptome analyses of low methane yield sheep reveals a Sharpea-enriched microbiome characterised by lactic acid formation and utilisation</article-title>
<source>Microbiome</source>
<year>2016</year>
<volume>4</volume>
<issue>Suppl 1</issue>
<fpage>56</fpage>
<pub-id pub-id-type="doi">10.1186/s40168-016-0201-2</pub-id>
<pub-id pub-id-type="pmid">27760570</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<mixed-citation publication-type="other">Sequence Read Archive. 2010.
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/sra">http://www.ncbi.nlm.nih.gov/sra</ext-link>
. Accessed 5 Jan 2017.</mixed-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kanehisa</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Furumichi</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tanabe</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sato</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Morishima</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>KEGG: new perspectives on genomes, pathways, diseases and drugs</article-title>
<source>Nucleic Acids Res</source>
<year>2017</year>
<volume>45</volume>
<issue>Suppl D1</issue>
<fpage>D353</fpage>
<lpage>D361</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkw1092</pub-id>
<pub-id pub-id-type="pmid">27899662</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conway</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Angione</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Liò</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Iterative multi level calibration of metabolic networks</article-title>
<source>Curr Bioinforma</source>
<year>2016</year>
<volume>11</volume>
<issue>Suppl 1</issue>
<fpage>93</fpage>
<lpage>105</lpage>
<pub-id pub-id-type="doi">10.2174/1574893611666151203222505</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hinsu</surname>
<given-names>AT</given-names>
</name>
<name>
<surname>Parmar</surname>
<given-names>NR</given-names>
</name>
<name>
<surname>Nathani</surname>
<given-names>NM</given-names>
</name>
<name>
<surname>Pandit</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>AB</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>AK</given-names>
</name>
<name>
<surname>Joshi</surname>
<given-names>CG</given-names>
</name>
</person-group>
<article-title>Functional gene profiling through metaRNAseq approach reveals diet-dependent variation in rumen microbiota of buffalo (Bubalus bubalis)</article-title>
<source>Anaerobe</source>
<year>2017</year>
<volume>44</volume>
<fpage>106</fpage>
<lpage>116</lpage>
<pub-id pub-id-type="doi">10.1016/j.anaerobe.2017.02.021</pub-id>
<pub-id pub-id-type="pmid">28246035</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0002750 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0002750 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021