MERS Exploration Server


Internal identifier: 000543 (Pmc/Corpus); previous: 0005429; next: 0005440



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Identifications of conserved 7-mers in 3'-UTRs and microRNAs in
<italic>Drosophila</italic>
</title>
<author>
<name sortKey="Gu, Jin" sort="Gu, Jin" uniqKey="Gu J" first="Jin" last="Gu">Jin Gu</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fu, Hu" sort="Fu, Hu" uniqKey="Fu H" first="Hu" last="Fu">Hu Fu</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Yanda" sort="Li, Yanda" uniqKey="Li Y" first="Yanda" last="Li">Yanda Li</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17996040</idno>
<idno type="pmc">2241842</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2241842</idno>
<idno type="RBID">PMC:2241842</idno>
<idno type="doi">10.1186/1471-2105-8-432</idno>
<date when="2007">2007</date>
<idno type="wicri:Area/Pmc/Corpus">000543</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000543</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Identifications of conserved 7-mers in 3'-UTRs and microRNAs in
<italic>Drosophila</italic>
</title>
<author>
<name sortKey="Gu, Jin" sort="Gu, Jin" uniqKey="Gu J" first="Jin" last="Gu">Jin Gu</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Fu, Hu" sort="Fu, Hu" uniqKey="Fu H" first="Hu" last="Fu">Hu Fu</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Zhang, Xuegong" sort="Zhang, Xuegong" uniqKey="Zhang X" first="Xuegong" last="Zhang">Xuegong Zhang</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Li, Yanda" sort="Li, Yanda" uniqKey="Li Y" first="Yanda" last="Li">Yanda Li</name>
<affiliation>
<nlm:aff id="I1">Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>MicroRNAs (miRNAs) are a class of endogenous small regulatory RNAs that play an important role in post-transcriptional regulation by targeting mRNAs for cleavage or translational repression. Base-pairing between the 5'-end of a miRNA and the 3'-UTR of its target mRNA is essential for miRNA:mRNA recognition. Recent studies show that many seed matches in 3'-UTRs, which are fully complementary to miRNA 5'-ends, are highly conserved. Based on these features, a two-stage strategy can be implemented to achieve the
<italic>de novo </italic>
identification of miRNAs by requiring complete base-pairing between the 5'-ends of miRNA candidates and potential seed matches in 3'-UTRs.</p>
</sec>
<sec>
<title>Results</title>
<p>We developed a new method that combines information from multiple pairwise conservation comparisons to identify frequently occurring, conserved 7-mers in 3'-UTRs. A pairwise conservation score (PCS) was introduced to describe the conservation of all 7-mers in 3'-UTRs between any two
<italic>Drosophila </italic>
species. Using PCSs computed from 6 pairs of flies, we developed a support vector machine (SVM) classifier ensemble, named Cons-SVM, and identified 689 conserved 7-mers, including 63 seed matches covering 32 out of 38 known miRNA families in the reference dataset. In the second stage, we searched for 90 nt conserved stem-loop regions containing sequences complementary to the identified 7-mers and analyzed these stem-loops with previously published miRNA prediction software. We predicted 47 miRNA candidates in the genome-wide screen.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Cons-SVM takes advantage of the independent evolutionary information from the 6 pairs of flies and shows high sensitivity in identifying seed matches in 3'-UTRs. By combining multiple pairwise conservation scores with a machine learning approach, we identified 47 miRNA candidates in
<italic>D. melanogaster</italic>
.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-title>BMC Bioinformatics</journal-title>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17996040</article-id>
<article-id pub-id-type="pmc">2241842</article-id>
<article-id pub-id-type="publisher-id">1471-2105-8-432</article-id>
<article-id pub-id-type="doi">10.1186/1471-2105-8-432</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Identifications of conserved 7-mers in 3'-UTRs and microRNAs in
<italic>Drosophila</italic>
</article-title>
</title-group>
<contrib-group>
<contrib id="A1" contrib-type="author">
<name>
<surname>Gu</surname>
<given-names>Jin</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>wellgoo@gmail.com</email>
</contrib>
<contrib id="A2" contrib-type="author">
<name>
<surname>Fu</surname>
<given-names>Hu</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>fu.hu.thu@gmail.com</email>
</contrib>
<contrib id="A3" contrib-type="author">
<name>
<surname>Zhang</surname>
<given-names>Xuegong</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>zhangxg@tsinghua.edu.cn</email>
</contrib>
<contrib id="A4" corresp="yes" contrib-type="author">
<name>
<surname>Li</surname>
<given-names>Yanda</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>daulyd@tsinghua.edu.cn</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, China</aff>
<pub-date pub-type="collection">
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>8</day>
<month>11</month>
<year>2007</year>
</pub-date>
<volume>8</volume>
<fpage>432</fpage>
<lpage>432</lpage>
<ext-link ext-link-type="uri" xlink:href="http://www.biomedcentral.com/1471-2105/8/432"></ext-link>
<history>
<date date-type="received">
<day>21</day>
<month>11</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>8</day>
<month>11</month>
<year>2007</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2007 Gu et al; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2007</copyright-year>
<copyright-holder>Gu et al; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0"></ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</p>
<pmc-comment> Gu Jin wellgoo@gmail.com Identifications of conserved 7-mers in 3'-UTRs and microRNAs in Drosophila 2007BMC Bioinformatics 8(1): 432-. (2007)1471-2105(2007)8:1&lt;432&gt;urn:ISSN:1471-2105</pmc-comment>
</license>
</permissions>
<abstract>
<sec>
<title>Background</title>
<p>MicroRNAs (miRNAs) are a class of endogenous small regulatory RNAs that play an important role in post-transcriptional regulation by targeting mRNAs for cleavage or translational repression. Base-pairing between the 5'-end of a miRNA and the 3'-UTR of its target mRNA is essential for miRNA:mRNA recognition. Recent studies show that many seed matches in 3'-UTRs, which are fully complementary to miRNA 5'-ends, are highly conserved. Based on these features, a two-stage strategy can be implemented to achieve the
<italic>de novo </italic>
identification of miRNAs by requiring complete base-pairing between the 5'-ends of miRNA candidates and potential seed matches in 3'-UTRs.</p>
</sec>
<sec>
<title>Results</title>
<p>We developed a new method that combines information from multiple pairwise conservation comparisons to identify frequently occurring, conserved 7-mers in 3'-UTRs. A pairwise conservation score (PCS) was introduced to describe the conservation of all 7-mers in 3'-UTRs between any two
<italic>Drosophila </italic>
species. Using PCSs computed from 6 pairs of flies, we developed a support vector machine (SVM) classifier ensemble, named Cons-SVM, and identified 689 conserved 7-mers, including 63 seed matches covering 32 out of 38 known miRNA families in the reference dataset. In the second stage, we searched for 90 nt conserved stem-loop regions containing sequences complementary to the identified 7-mers and analyzed these stem-loops with previously published miRNA prediction software. We predicted 47 miRNA candidates in the genome-wide screen.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Cons-SVM takes advantage of the independent evolutionary information from the 6 pairs of flies and shows high sensitivity in identifying seed matches in 3'-UTRs. By combining multiple pairwise conservation scores with a machine learning approach, we identified 47 miRNA candidates in
<italic>D. melanogaster</italic>
.</p>
</sec>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Background</title>
<p>MiRNAs are a class of ~22 nt endogenous small RNAs which regulate target mRNAs by repressing the translation or directly degrading the mRNA transcripts [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. MiRNAs take part in several essential biological processes, such as development, metabolism, cell differentiation and aging [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. To date, 78 miRNAs and a few miRNA:mRNA interactions have been experimentally identified in
<italic>Drosophila </italic>
[
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B4">4</xref>
]. In an early computational study, Lai et al. estimated that the fly genome contains around 110 miRNA genes [
<xref ref-type="bibr" rid="B5">5</xref>
]. Applying the high-throughput pyrosequencing method on mixed-stage
<italic>C. elegans</italic>
, Ruby et al. confidently identified 112 miRNAs while missing 19 annotated ones [
<xref ref-type="bibr" rid="B6">6</xref>
]. The
<italic>C. elegans </italic>
genome may contain around 150 miRNAs. Although a fly has fewer protein-coding genes (14,000) than a worm (18,000), it has about ten times as many body cells. The total number of miRNAs in a fly can also be expected to be around 150, similar to that in a worm. These estimates suggest that another 40~70 fly miRNAs remain to be discovered.</p>
<p>Many miRNA prediction [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
-
<xref ref-type="bibr" rid="B14">14</xref>
] and target prediction algorithms [
<xref ref-type="bibr" rid="B15">15</xref>
-
<xref ref-type="bibr" rid="B19">19</xref>
] have been introduced in recent years. However, most previous studies treated miRNA prediction and target prediction as two separate tasks. The functions of predicted miRNAs are hard to explore because the 5'-ends of mature miRNAs are predicted inaccurately. In a recent study, Nam et al. reported that the mean distance between their predicted 5'-ends of mature miRNAs and the experimentally identified 5'-ends is about 2 nt (nucleotides) [
<xref ref-type="bibr" rid="B10">10</xref>
]. Several studies showed that base-pairing between the 5'-end of the mature miRNA and the target mRNA 3'-UTR is essential for miRNA:mRNA recognition, and that the 7 or 8 nt miRNA seed matches (the 7 or 8 nt sequences in 3'-UTRs that are fully complementary to the 5'-ends of miRNAs) are highly conserved [
<xref ref-type="bibr" rid="B20">20</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. Based on these features, a new strategy combining miRNA prediction and target prediction was introduced: first, conserved motifs are identified in 3'-UTRs; second, these conserved motifs are regarded as candidate seed matches derived from miRNA binding sites and used to search for complementary sites in the genome; finally, two ~100 nt sequences are extracted around each matched locus and miRNAs are predicted from these ~100 nt sequences [
<xref ref-type="bibr" rid="B23">23</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
].</p>
<p>Comparative genomic methods are useful to identify conserved sequence motifs [
<xref ref-type="bibr" rid="B23">23</xref>
-
<xref ref-type="bibr" rid="B30">30</xref>
]. Most studies focus only on motifs in promoter regions. Seed matches corresponding to miRNA binding sites in 3'-UTRs have several features: 1) the conserved sites are 7 or 8 nt long; 2) tens of different mRNAs contain the same seed match and may be regulated by the same miRNA; 3) many seed matches are highly conserved in 3'-UTRs [
<xref ref-type="bibr" rid="B15">15</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. Xie et al. and Chan et al. presented different algorithms to analyze "common regulatory motifs" in 3'-UTRs. Xie et al. presented a motif conservation score (MCS) to identify frequently occurring, conserved motifs in 3'-UTRs from 4-way alignments of mammals and predicted 207 human miRNAs based on the identified motifs in 3'-UTRs [
<xref ref-type="bibr" rid="B23">23</xref>
]. The MCS scoring method considers only the conservation ratio of motifs. Motifs with small counts may show spuriously high or low conservation ratios, because a ratio estimated from few occurrences is unreliable (the law of large numbers applies only to large counts). Such motifs introduce noise when conserved motifs are identified. Chan et al. used a non-alignment-based method (FastCompare) to identify conserved k-mers in worm and fly [
<xref ref-type="bibr" rid="B24">24</xref>
,
<xref ref-type="bibr" rid="B31">31</xref>
]. This method avoids the problem of misaligning homologous sequences, but it needs a set of known homologous genes to start the analysis. Many recently sequenced genomes do not have accurate gene annotations. For example, in
<italic>Drosophila</italic>
, at this time, nine genomes have been sequenced, but only
<italic>D. melanogaster </italic>
and
<italic>D. pseudoobscura </italic>
gene annotations are available.</p>
<p>In this work, we present a new scoring system and a pattern recognition method that identify "conserved motifs", i.e. motifs with a high conservation ratio and frequent occurrences in aligned 3'-UTRs. We introduced a pairwise conservation score (PCS) to evaluate the "conservation" of each of the 16,384 7-mers independently in the 3'-UTRs of 6 pairs of flies. We then developed a support vector machine (SVM) ensemble, named Cons-SVM, to identify conserved 7-mers whose conservation patterns across the phylogeny resemble those of the reference seed matches. We identified 689 conserved 7-mers, including 65 out of 86 reference seed matches (seed matches are the 7-mers complementary to nt 1–7 and nt 2–8 of mature miRNAs). Further analysis showed that Cons-SVM has higher sensitivity than previous methods for identifying seed-match-like conserved 7-mers.</p>
<p>The second stage of our work was to identify miRNA candidates based on the 689 conserved 7-mers. Introducing seed match information into published miRNA prediction methods can increase specificity and can also predict more accurately the 5'-ends of the mature parts of predicted pre-miRNA candidates. Unlike previous studies using a similar strategy, we designed a more detailed method to identify pre-miRNA candidates and the corresponding mature parts. We first explored whether the 90 nt flanking sequences around any site in the whole genome that is complementary to a conserved 7-mer can form conserved stem-loops. Using the miRNA prediction method RNAmicro [
<xref ref-type="bibr" rid="B14">14</xref>
], we identified 97 pre-miRNAs, including 41 new pre-miRNA candidates not collected in miRBase. We then introduced several rules to annotate the mature parts of the predicted pre-miRNA candidates. In total, 47 mature miRNA candidates were identified on the 41 pre-miRNA candidates; eight of them have homologs in mosquito and seven in honeybee. Finally, we predicted the target genes of each miRNA candidate simply by checking whether the 3'-UTR of a gene contains one or more conserved seed matches of that candidate.</p>
<p>The two-stage method successfully identified many new miRNA candidates and their binding sites in 3'-UTRs, revealing extensive miRNA:mRNA interactions in fly.</p>
</sec>
<sec>
<title>Results and Discussions</title>
<p>We used a two-stage method to identify conserved 7-mers in 3'-UTRs and miRNA:mRNA interactions in
<italic>Drosophila </italic>
(Figure
<xref ref-type="fig" rid="F1">1</xref>
). In the first stage, conserved 7-mers were identified by considering the multiple pairwise conservations of 16,384 (4
<sup>7 </sup>
= 16,384) 7-mers in seven flies' 3'-UTRs. In the second stage, the conserved 7-mers were used to search for pre-miRNA candidates in the whole genome. Then the 5'-ends of mature miRNA candidates were annotated based on sequence features. Finally, the target genes of the miRNA candidates were analyzed.</p>
<fig position="float" id="F1">
<label>Figure 1</label>
<caption>
<p>
<bold>The flowchart of the method</bold>
. The whole method consists of two stages: in the first stage, conserved 7-mers are identified by considering all 7-mers' conservation patterns in six pairs of flies; in the second stage, pre-miRNAs and mature miRNAs are predicted by adding seed-matching information into published miRNA prediction methods in the whole genome.</p>
</caption>
<graphic xlink:href="1471-2105-8-432-1"></graphic>
</fig>
<sec>
<title>The reference dataset</title>
<p>In this work, seven flies were studied (full names and abbreviations of the studied organisms:
<italic>D. melanogaster</italic>
, Dme;
<italic>D. simulans</italic>
, Dsi;
<italic>D. yakuba</italic>
, Dya;
<italic>D. ananassae</italic>
, Dan;
<italic>D. pseudoobscura</italic>
, Dps;
<italic>D. mojavensis</italic>
, Dmo;
<italic>D. virilis</italic>
, Dvi). Of the 78 mature miRNAs collected in miRBase, 59 were identified by cloning, 16 were verified only by northern blotting and the other 3 were predicted by sequence homology [
<xref ref-type="bibr" rid="B3">3</xref>
-
<xref ref-type="bibr" rid="B5">5</xref>
]. The 5'-ends of the 59 miRNAs identified by cloning (corresponding to 61 unique pre-miRNAs) are accurately determined, so we used them as the references (Table S1, Additional file
<xref ref-type="supplementary-material" rid="S4">4</xref>
). We extracted a set of 86 non-redundant seed sequences from nt 1–7 and nt 2–8 of the 59 miRNAs. The 59 miRNAs can be clustered into 40 unique families based on seed sequence similarity. The 86 non-redundant seed matches, fully complementary to these miRNA seed sequences, were used as positive samples in the following analysis.</p>
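<p>As a minimal sketch of how such seed matches can be derived, the Python snippet below takes nt 1–7 and nt 2–8 of a mature miRNA and returns their reverse complements as DNA 7-mers. The function names are illustrative and are not part of the original pipeline, and the input is a toy sequence chosen only so that its first seed match equals the miR-12 seed match "ATACTCA" quoted in this article.</p>
<preformat>
```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def seed_matches(mirna):
    """Return the 7-mers complementary to nt 1-7 and nt 2-8 of a miRNA."""
    dna = mirna.upper().replace("U", "T")  # work in the DNA alphabet
    seeds = [dna[0:7], dna[1:8]]           # nt 1-7 and nt 2-8
    return [reverse_complement(seed) for seed in seeds]

# Toy input (hypothetical, for illustration only).
toy_mirna = "UGAGUAUUGC"
print(seed_matches(toy_mirna))
print(4 ** 7)  # number of distinct 7-mers over A, C, G, T
```
</preformat>
<p>Collecting the two 7-mers of every reference miRNA into a set would remove duplicates, which is one plausible reading of "non-redundant" here.</p>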
</sec>
<sec>
<title>The conservation ratio and the count of seed matches in 3'-UTRs</title>
<p>To investigate these two "variables" of seed matches, we compared the conservation ratios (the conservation ratio of a 7-mer is defined as its count in the conserved regions divided by its count in the original sequences) and the numbers of occurrences in 3'-UTRs among three datasets: the "seed matches" dataset, containing the 86 non-redundant reference seed matches; the "shuffled seed matches" dataset (every seed match was shuffled 5 times, so it has the same nucleotide content as the seed matches dataset); and the "all 7-mers" dataset.</p>
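<p>The conservation ratio defined above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the original 3'-UTRs and the conserved regions are assumed to be available as plain lists of sequence strings, and the helper names are hypothetical.</p>
<preformat>
```python
from collections import Counter

def count_kmers(sequences, k=7):
    """Count every overlapping k-mer in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def conservation_ratio(kmer, utrs, conserved_regions, k=7):
    """Count in conserved regions divided by count in the original UTRs."""
    total = count_kmers(utrs, k)[kmer]
    conserved = count_kmers(conserved_regions, k)[kmer]
    return conserved / total if total else 0.0

# Toy data: two occurrences in the UTR, one of them in a conserved block.
utrs = ["ATACTCAGGATACTCA"]
conserved_regions = ["ATACTCAG"]
print(conservation_ratio("ATACTCA", utrs, conserved_regions))
```
</preformat>
<p>For a genome-wide run one would count all k-mers once and reuse the two counters, rather than recounting for each 7-mer as this short version does.</p>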
<p>Seed matches tend to be conserved in 3'-UTRs [
<xref ref-type="bibr" rid="B15">15</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B20">20</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. In our study, we also found that the conservation ratios of seed matches are much larger than those of other 7-mers. Requiring Dme-Dps pairwise conservation, the mean conservation ratio of the 7-mers in the "seed matches" dataset (0.2816) is significantly higher than that in the "shuffled seed matches" dataset (0.1160, p-value: 0) and in the "all 7-mers" dataset (0.1151, p-value: 0). Tens of different mRNAs contain the same seed match, and may be regulated by the same miRNA [
<xref ref-type="bibr" rid="B15">15</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
]. We wondered whether seed matches have more occurrences than other 7-mers. In the original Dme 3'-UTRs, the mean count of 7-mers in the "seed matches" dataset (278.7) is slightly higher than that in the "shuffled seed matches" dataset (257.9, p-value: 0.2455) and in the "all 7-mers" dataset (236.9, p-value: 9.4118e-004). These results suggest that a higher conservation ratio is an effective feature for identifying seed-match-like 7-mers in 3'-UTRs, whereas a higher count may help identify seed matches with excess counts but reduces the sensitivity for seed matches with depleted counts (histograms of the conservation ratios and the numbers of occurrences are shown in Figure S1, Additional file
<xref ref-type="supplementary-material" rid="S1">1</xref>
). The Wilcoxon rank sum test was used to test the difference in means.</p>
</sec>
<sec>
<title>Computation of pairwise conservation scores for all 7-mers</title>
<p>We introduced a pairwise conservation score (PCS), defined as the log rank ratio between the counts in the original Dme 3'-UTRs and in the pairwise-conserved 3'-UTRs, to evaluate the "conservation" of each 7-mer in a pair of flies (see details in Methods). A PCS of zero means that a 7-mer is under neutral evolution, and a larger PCS means that a 7-mer is more conserved. The PCS favours 7-mers with a higher conservation ratio and also weakly prefers more occurrences. Take the reference miRNA miR-12 as an example. The 7-mer seed match complementary to nt 1–7 of miR-12, "ATACTCA", has 292 occurrences in Dme, of which 71 are conserved in the Dme-Dps pair. In contrast, a non-seed-match 7-mer, "ATACTTG", has 287 occurrences in Dme, but only 26 are conserved in the Dme-Dps pair. Another non-seed-match 7-mer, "GTAGGCC", has a similar fraction of its occurrences conserved (24.6%, 15/61), but only 15 conserved occurrences are found in the Dme-Dps pair. The PCS of the seed match "ATACTCA" is 0.889, larger than the PCSs of "ACGTCAC" and "GTAGGCC" (-0.424 and 0.498, respectively).</p>
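<p>The counts quoted in this example can be checked with a short calculation. The sketch below only recomputes the conserved fractions from the numbers given in the text; it does not reproduce the PCS itself, whose formula is given in Methods, and the function name is illustrative.</p>
<preformat>
```python
# (conserved occurrences, total occurrences in Dme) as quoted in the text
examples = {
    "ATACTCA": (71, 292),  # seed match of miR-12
    "ATACTTG": (26, 287),  # non-seed-match 7-mer
    "GTAGGCC": (15, 61),   # non-seed-match 7-mer with a low count
}

def conserved_fraction(conserved, total):
    """Fraction of occurrences that are conserved in the species pair."""
    return conserved / total

for kmer, (conserved, total) in examples.items():
    print(kmer, total, round(conserved_fraction(conserved, total), 3))
```
</preformat>
<p>The first and third 7-mers have nearly the same conserved fraction but very different counts, which is the situation in which a score that also takes the number of occurrences into account can rank them differently.</p>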
<p>3'-UTRs are highly AU-rich (AU content of 62.6% in the studied 3'-UTR set), and the different AU contents of different 7-mers might bias their PCSs. We therefore compared the mean value (in the Dme-Dps pair) and the distribution of PCSs among the three datasets defined in the previous section. The PCSs of the "seed matches" dataset have a significantly higher mean (0.97) than those of the "shuffled seed matches" dataset (-0.042) and the "all 7-mers" dataset (8.3e-006), whereas the distribution of the PCSs of the "shuffled seed matches" dataset shows no significant difference from that of the "all 7-mers" dataset (p-value: 0.1262). The near-zero mean and the similar distribution suggest that the shuffled seed matches have PCSs similar to the background (all 7-mers). In summary, the PCSs of the "seed matches" dataset differ significantly from those of the background, but the PCSs of the "shuffled seed matches" dataset, which has the same nucleotide content as the "seed matches" dataset, do not. This result indicates that nucleotide content does not bias the PCSs of different 7-mers. The Wilcoxon rank sum test was used to test the difference in means, and the two-sample Kolmogorov-Smirnov test was used to test the difference in distributions.</p>
<p>For all 16,384 (4
<sup>7</sup>
) 7-mers, we computed their PCSs in 6 pairs of flies (Dme-Dsi, Dme-Dya, Dme-Dan, Dme-Dps, Dme-Dmo and Dme-Dvi pairs). The histograms and tables of PCSs (Figure S2, Additional file
<xref ref-type="supplementary-material" rid="S2">2</xref>
) and conservation ratios (Figure S3, Additional file
<xref ref-type="supplementary-material" rid="S3">3</xref>
) are shown in Table S2 (Additional file
<xref ref-type="supplementary-material" rid="S5">5</xref>
). In the Dme-Dsi pair, the distribution of the PCSs of the 86 reference seed matches is indistinguishable from that of all 7-mers and centered around 0, although the PCSs of the seed matches tend to be larger than 0. As the evolutionary distance increases, the PCSs of the seed matches gradually disperse. The numbers of seed matches scoring higher than 0 are 75, 78, 80, 80, 80 and 79 in the six pairs of flies, and 80 for the average score.</p>
</sec>
<sec>
<title>Identification of "conserved" 7-mers by Cons-SVM</title>
<p>To identify "conserved" 7-mers with conservation patterns similar to those of seed matches, we combined the 6 PCSs, which characterize the conservation pattern of each 7-mer across the
<italic>Drosophila </italic>
phylogeny, into a feature vector and then developed an SVM classifier ensemble to identify the 7-mers whose conservation patterns resemble those of the 86 reference seed matches.</p>
<p>We used the PCSs of each 7-mer in the 6 pairs of flies as features describing its conservation in the seven studied flies. The 86 seed matches derived from the 59 reference miRNAs were used as positive training samples, and another 86 7-mers randomly sampled from all other 7-mers were used as negative training samples. An SVM classifier was trained on these two sample sets and then used to classify all 16,384 7-mers as conserved or non-conserved. To control for the variation introduced by randomly sampling the negative samples, we repeated the sampling 500 times and trained 500 SVMs. The outputs of the 500 SVMs were combined into a classifier ensemble by a voting strategy. To reduce false positives, we used a stringent voting rule: a sample was classified as positive only if all 500 SVMs classified it as positive. We call this classifier ensemble Cons-SVM.</p>
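<p>The resampling and unanimous-voting scheme described above can be sketched as follows. This is a simplified illustration, not Cons-SVM itself: to keep the example self-contained, a nearest-centroid rule stands in for SVM training, the feature vectors are toy 2-dimensional values rather than 6 PCSs, and all function names are hypothetical.</p>
<preformat>
```python
import random

def train_centroid_classifier(positives, negatives):
    """Stand-in for SVM training: a nearest-centroid decision rule."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos_c, neg_c = centroid(positives), centroid(negatives)
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Positive when the vector lies closer to the positive centroid.
    return lambda vec: sq_dist(vec, neg_c) > sq_dist(vec, pos_c)

def unanimous_ensemble(features, positive_ids, n_models=500, seed=0,
                       train=train_centroid_classifier):
    """Keep only the samples that every resampled model calls positive."""
    rng = random.Random(seed)
    positive_set = set(positive_ids)
    positives = [features[k] for k in positive_ids]
    pool = [k for k in features if k not in positive_set]
    accepted = set(features)
    for _ in range(n_models):
        # Draw as many random negatives as there are positives.
        negatives = [features[k] for k in rng.sample(pool, len(positives))]
        classify = train(positives, negatives)
        accepted = {k for k in accepted if classify(features[k])}
    return accepted

toy_features = {
    "AAA": [2.0, 2.0], "AAC": [2.2, 1.8], "ACC": [2.1, 2.1],
    "AAG": [0.0, 0.1], "AAT": [-0.1, 0.0], "ACA": [0.1, -0.1],
}
print(sorted(unanimous_ensemble(toy_features, ["AAA", "AAC"], n_models=20)))
```
</preformat>
<p>Any trainer that returns a yes/no function, for example a wrapper around an SVM library, could be passed through the <monospace>train</monospace> argument; requiring all models to agree trades some sensitivity for fewer false positives.</p>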
<p>When Cons-SVM was applied to all 16,384 7-mers, 689 of them were classified as positive, including 65 reference seed matches. To estimate the number of false positives among the identified "conserved" 7-mers, we repeated the same approach on a random dataset: PCSs were computed from randomized 3'-UTRs, and the same Cons-SVM was applied to the combined PCS vectors. All samples derived from the random dataset should be negative, so we regarded the 56 7-mers identified in the random dataset as false positives. The false positive rate was thus estimated as 8.1% (56/689). We then used cross validation (see details in the Methods section) to test the sensitivity of Cons-SVM: 63 out of 86 seed matches were identified as positives (sensitivity 73.3%). These 63 seed matches match 52 out of 59 reference miRNAs (33 out of 40 miRNA families, sensitivity 82.5%). Cons-SVM thus achieves high sensitivity in identifying "conserved" 7-mers whose conservation patterns resemble those of reference seed matches in 3'-UTRs, but a few real seed matches with weaker conservation and lower counts are missed.</p>
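<p>The percentages reported in this paragraph follow directly from the quoted counts. The short calculation below is only an arithmetic check of those figures; the helper name is illustrative.</p>
<preformat>
```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

false_positive_rate = percent(56, 689)  # random-dataset hits / all hits
seed_sensitivity = percent(63, 86)      # cross-validated seed matches
family_sensitivity = percent(33, 40)    # miRNA families covered
print(false_positive_rate, seed_sensitivity, family_sensitivity)
```
</preformat>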
</sec>
<sec>
<title>Comparisons with other methods</title>
<p>We next compared the results with the MCS algorithm and the FastCompare algorithm [
<xref ref-type="bibr" rid="B23">23</xref>
,
<xref ref-type="bibr" rid="B24">24</xref>
]. The MCS algorithm quantifies the excess conservation of a motif by how far its observed conservation rate exceeds the conservation rate of comparable random motifs in multiple alignments. The FastCompare algorithm uses network-level conservation [
<xref ref-type="bibr" rid="B31">31</xref>
] to evaluate the conservation of k-mers in pairs of species.</p>
<p>We used the 689 highest scoring 7-mers from the other two methods to compare with our results of Cons-SVM. Results show that Cons-SVM has higher sensitivity than FastCompare and the MCS algorithm. Because only Dme-Dps conservation information was used in the FastCompare algorithm, we also compared the performance of the three algorithms under the same condition (we used PCSs computed only in Dme-Dps pair instead of Cons-SVM). Results show that the PCS scoring method also shows higher sensitivity than the other two algorithms (Table
<xref ref-type="table" rid="T1">1</xref>
). The specificities were not compared because the algorithms require different strategies for producing randomized data: in our work we randomized the Dme 3'-UTRs and the 6 pairwise alignments, whereas the MCS algorithm would require randomizing the multiple alignments and FastCompare would require randomizing the 3'-UTRs of the two studied species.</p>
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Performance comparisons of different algorithms</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left">Organisms analyzed</td>
<td align="left">Algorithm</td>
<td align="left">Number of identified reference seed matches
<sup>1</sup>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Dme Dsi Dya Dan Dps</td>
<td align="left">Cons-SVM</td>
<td align="left">63
<sup>2 </sup>
(33
<sup>3</sup>
)</td>
</tr>
<tr>
<td align="left">Dmo Dvi</td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">Dme Dsi Dya Dan Dps</td>
<td align="left">MCS [23]</td>
<td align="left">58(29)</td>
</tr>
<tr>
<td align="left">Dmo Dvi</td>
<td></td>
<td></td>
</tr>
<tr>
<td align="left">Dme Dps</td>
<td align="left">FastCompare [24]</td>
<td align="left">52(29)</td>
</tr>
<tr>
<td align="left">Dme Dps</td>
<td align="left">PCS
<sup>4</sup>
</td>
<td align="left">59(32)</td>
</tr>
<tr>
<td align="left">Dme Dps</td>
<td align="left">MCS [23]</td>
<td align="left">47(26)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>1 </sup>
Because Cons-SVM identifies 689 candidate seed matches, we evaluated every algorithm on its 689 highest-ranking 7-mers.</p>
<p>
<sup>2 </sup>
This number was obtained by LOOCV (leave-one-out cross validation). When all 7-mers are classified directly, without cross validation, the number of identified reference seed matches is 65.</p>
<p>
<sup>3 </sup>
The numbers in parentheses indicate how many miRNA families are covered by the identified seed matches.</p>
<p>
<sup>4 </sup>
We only used the PCSs computed from Dme-Dps pairs and used the 689 highest-score 7-mers in the analysis.</p>
</table-wrap-foot>
</table-wrap>
<p>Among the 689 high-scoring 7-mers from each method, 277 are identified by all three methods, 236 by exactly two methods, and 764 by only one method. This suggests that the three methods extract different information from the genomic data and that new experimental data are needed to evaluate their accuracy. The results are much more consistent for miRNA seed matches: 46 seed matches (25 families) are identified by all three methods, 12 (5 families) by exactly two methods, and 11 (6 families) by only one method (Figure
<xref ref-type="fig" rid="F2">2</xref>
). Tabulated details for each reference miRNA are presented in Table S3 (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
).</p>
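<p>The overlap counts can be cross-checked against the per-method totals. Assuming that the three methods compared are Cons-SVM, MCS on all seven species and FastCompare (our reading of Table 1), the overlap counts weighted by the number of methods sharing each item should add up to the sum of the three per-method totals. The Python sketch below does this bookkeeping; the function and variable names are ours.</p>
<preformat><![CDATA[
```python
def total_memberships(by_all_three: int, by_two: int, by_one: int) -> int:
    """Sum of the three per-method list sizes implied by three-way overlap counts."""
    return 3 * by_all_three + 2 * by_two + by_one


# 7-mers: 277 shared by all three methods, 236 by two, 764 by one.
kmer_memberships = total_memberships(277, 236, 764)
# Seed matches: 46 shared by all three methods, 12 by two, 11 by one.
seed_memberships = total_memberships(46, 12, 11)
# miRNA families: 25 shared by all three methods, 5 by two, 6 by one.
family_memberships = total_memberships(25, 5, 6)

print(kmer_memberships == 3 * 689)          # three lists of 689 7-mers each
print(seed_memberships == 63 + 58 + 52)     # seed matches per method (Table 1)
print(family_memberships == 33 + 29 + 29)   # families per method (Table 1)
```
]]></preformat>
<p>Under the stated assumption, all three checks agree with the counts reported in Table 1.</p>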
<fig position="float" id="F2">
<label>Figure 2</label>
<caption>
<p>
<bold>Comparison of the results of the three methods</bold>
. The number in each block indicates the number of 7-mers in that part. The number in parentheses indicates the number of reference miRNA families in that block.</p>
</caption>
<graphic xlink:href="1471-2105-8-432-2"></graphic>
</fig>
</sec>
<sec>
<title>Prediction of pre-miRNAs</title>
<p>Besides miRNA seed matches, 3'-UTRs contain many other conserved regulatory elements. The AU-rich elements (UAAUUUA, UUAUUUA), the proneural box (aauggaAGACAAU), and the alcohol dehydrogenase 3'-UTR downregulation control element (AAGGCUGa) are also found among the 689 identified conserved 7-mers. What remains to be answered is how many of the conserved 7-mers are potential miRNA target sites. We performed genome-wide miRNA prediction using two published miRNA prediction methods with one additional feature: whether the predicted miRNA has at least one conserved site complementary to one of the 689 identified conserved 7-mers.</p>
<p>All 689 identified conserved 7-mers were searched against both strands of the Dme genome, excluding all annotated exons, tRNAs, snRNAs, rRNAs and other noncoding gene regions. At each matched locus, two 90 nt sequences were extracted, one from -15 nt to +75 nt and the other from -55 nt to +35 nt. We filtered these sequences by free energy and basic stem-loop structural features (see Methods for details) and then predicted pre-miRNA candidates with two miRNA prediction methods, triplet-SVM and RNAmicro. These two methods were chosen because triplet-SVM shows higher sensitivity while RNAmicro has higher specificity [
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B14">14</xref>
]. We then analyzed whether the predicted pre-miRNA loci were conserved in each pair of flies. The pairwise alignments of each pre-miRNA were extracted from the whole-genome pairwise alignments of each pair of flies (downloaded from the UCSC Genome Browser ftp site). A predicted pre-miRNA locus was regarded as conserved between the two flies if 1) the corresponding regions were aligned in the UCSC pairwise alignments, 2) the "seed" sequence (the 7 nt fully complementary to any conserved 7-mer) was identical in both species [
<xref ref-type="bibr" rid="B7">7</xref>
], and 3) the aligned sequence of the second organism was also predicted as a pre-miRNA by the miRNA prediction method. A predicted pre-miRNA locus was retained for further analysis if it was conserved in at least four pairs of flies. Pre-miRNA candidates with overlapping genomic locations were then clustered into a single miRNA locus, and the candidate with the minimal free energy was selected as the representative pre-miRNA candidate of the cluster. Due to limited space, we present here only the results obtained with RNAmicro. The results obtained with triplet-SVM are reported on our website [
<xref ref-type="bibr" rid="B45">45</xref>
].</p>
<p>Following the above steps, we identified 97 pre-miRNA candidates (using RNAmicro), including 46 of the 61 reference pre-miRNAs (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). Of the 15 missed reference pre-miRNAs, 4 did not pass the pre-processing filter because of their predicted double-loop structures (mir-2c, mir-31a, mir-31b, mir-286) and another 2 because their predicted free energy failed the filter threshold (mir-309, mir-311). Excluding these 6, the sensitivity on the reference set is 83.6% (46/55). Another 7 pre-miRNAs collected in miRBase are also identified. Of the remaining 44 predicted pre-miRNAs, 3 map to the minus strand of reference pre-miRNAs (mir-5, mir-9c, mir-iab-4), and the other 41 are new pre-miRNA candidates, which we named "dme-pmir-1" to "dme-pmir-41" (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
). Three pre-miRNA candidates are located in alternative regions of protein-coding genes: pmir-29 (intron:-:Brf-RA|exon:+:CG5319-RA), pmir-13 (exon:+:Glycogenin-RB|intron:+:Glycogenin-RA) and pmir-18 (intron:-:CG9238-RA|exon:-:CG9238-RB). In human, the mir-17~92-2 cluster is located in an alternative region of a protein-coding gene, and the miRNAs in the cluster may be related to cancers [
<xref ref-type="bibr" rid="B32">32</xref>
,
<xref ref-type="bibr" rid="B33">33</xref>
]. We therefore kept the three predictions.</p>
<p>Lai et al. reported the 210 top scoring pre-miRNA candidates using the miRSeeker pipeline [
<xref ref-type="bibr" rid="B5">5</xref>
]. They identified 47 reference pre-miRNAs and a set of 9 pre-miRNAs also collected in miRBase, of which 40 and 4, respectively, are identified by our method. Of their remaining predictions, 15 candidate miRNAs are also predicted by our method. Chan et al. predicted 92 pre-miRNAs in their work [
<xref ref-type="bibr" rid="B24">24</xref>
]. They predicted only 12 reference pre-miRNAs. Of the remaining 80 new predictions, 10 overlap with those of Lai et al. and 4 with ours (pmir-26-5, pmi-24a; pmir-9-5, pmi-287a; pmir-16-5, pmi-238a; pmir-5-3, pmi-148c). Only 2 predictions are reported by all three methods (rank 24, pmir-16-5, pmi-238a; rank 57, pmir-5-3, pmi-148c). These results indicate that our method for identifying miRNAs has high sensitivity, but its specificity remains unclear because of the limited consistency among the predictions.</p>
</sec>
<sec>
<title>Identifications of mature miRNAs</title>
<p>Next, we annotated the mature parts on the two arms of the predicted pre-miRNAs. We observed that significantly more conserved 7-mers matched the 1–7 nt and 2–8 nt of known mature miRNAs than matched other positions (Figure
<xref ref-type="fig" rid="F3">3</xref>
). We also observed that the first nucleotide of mature miRNAs favoured "U" (Figure
<xref ref-type="fig" rid="F4">4</xref>
). This phenomenon was reported in an early study in
<italic>C. elegans </italic>
[
<xref ref-type="bibr" rid="B34">34</xref>
]. Based on these two observations, we introduced several rules to identify the 5'-ends of the mature parts on the two arms of the predicted pre-miRNA candidates. The rules consider the number of conserved sites complementary to the conserved 7-mers and whether these conserved sites have "U" as the first nucleotide (see Methods for details).</p>
<fig position="float" id="F3">
<label>Figure 3</label>
<caption>
<p>
<bold>The 689 conserved 7-mers identified by Cons-SVM matching with the 59 reference miRNAs</bold>
. Many more sites match the 1–7 nt or 2–8 nt of the mature miRNAs than other positions.</p>
</caption>
<graphic xlink:href="1471-2105-8-432-3"></graphic>
</fig>
<fig position="float" id="F4">
<label>Figure 4</label>
<caption>
<p>
<bold>The nucleotide composition of the 59 reference miRNAs</bold>
. The 5' first nucleotide of mature miRNAs significantly favours "U". Other sites do not show similar nucleotide bias. The logo plot is produced by WebLogo [49].</p>
</caption>
<graphic xlink:href="1471-2105-8-432-4"></graphic>
</fig>
<p>The 46 identified reference pre-miRNAs contain 43 unique reference mature miRNAs (33 families). Following the rules presented in Methods, we correctly predicted the 5'-ends (the first or second nucleotide) of 33 mature miRNAs (27 families), an accuracy of 76.7% (33/43) (Table S3, Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
). Of the 33 correct predictions, 29 are exact 5'-ends (23 families) and 4 are off by +1 nt. MiR-133, miR-219, miR-263a, miR-274, miR-281-2*, miR-282, miR-283 and miR-310, which are collected in miRBase without cloning evidence, are also identified. However, the predicted 5'-ends of miR-263a, miR-274, miR-282 and miR-283 differ considerably from the current annotations: we predict that they start at the 6th, 3rd, 4th and 3rd nucleotide of the currently annotated mature sequences, respectively. These four miRNAs were computationally predicted and validated by northern blot [
<xref ref-type="bibr" rid="B5">5</xref>
]. The sequences of the four miRNAs are much longer than those of other miRNAs (24, 26, 28 and 21 nt long, respectively). These results suggest that the exact 5'-ends of these miRNAs should be further validated. For two pre-miRNAs, mature miRNAs were identified only on the star (*) arm (mir-10, mir-285). For ten pre-miRNAs, mature parts were also predicted on the star (*) arm (mir-305, mir-79, let-7, mir-2a-2, mir-8, mir-7, mir-9a, mir-316, mir-34, mir-12).</p>
<p>Using RNAmicro, we identified 41 new pre-miRNA candidates, on which we then predicted 47 mature miRNA candidates. Three of the mature candidates show sequence homology to known miRNAs in other species, and 8 and 7 have homologs in the mosquito and honeybee genomes, respectively (Table
<xref ref-type="table" rid="T2">2</xref>
). Detailed information on all predicted pre-miRNA candidates and their corresponding mature parts is presented in Table S4 (Additional file
<xref ref-type="supplementary-material" rid="S7">7</xref>
).</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Predicted miRNAs that are homologous to known miRNAs or conserved in other insects</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="center">
<bold>MiRNA</bold>
</td>
<td align="left">
<bold>Mature Sequence</bold>
</td>
<td align="left">
<bold>Genomic location</bold>
</td>
<td align="left">
<bold>Other</bold>
</td>
<td align="center">
<bold>Ag</bold>
</td>
<td align="center">
<bold>Am</bold>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="center">pmir-1</td>
<td align="left">TAAGCGTAtagcttttcccct</td>
<td align="left">chr2L:Minus:Intron
<break></break>
243041–243130</td>
<td align="center">Rank#197
<sup>a</sup>
</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-11</td>
<td align="left">TTATTGCTtgagaatacacgt</td>
<td align="left">chr2R:Minus:Intergenic
<break></break>
11580118–11580207</td>
<td align="center">tni-miR-137
<break></break>
Rank#55</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-16</td>
<td align="left">GATATGTttgatattcttggt</td>
<td align="left">chr3L:Plus:Intron
<break></break>
8545755–8545844</td>
<td align="center">cbr-miR-50
<break></break>
Rank#24
<break></break>
pmi-238a
<sup>b</sup>
</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-20</td>
<td align="left">AATTGACTctagtagggagtc</td>
<td align="left">chr3R:Plus:Intron
<break></break>
121093–121182</td>
<td align="center">Rank#5</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-26</td>
<td align="left">TAAGTACtagtgccgcaggag</td>
<td align="left">chr3R:Minus:Intron
<break></break>
9289943–9290032</td>
<td align="center">cel-mir-252
<break></break>
cbr-mir-252
<break></break>
pmi-24a</td>
<td align="center">+</td>
<td></td>
</tr>
<tr>
<td align="center">pmir-29</td>
<td align="left">ATGCAACgttgctgggaagtg</td>
<td align="left">chr3R:Plus:Intron
<break></break>
13213562–13213651</td>
<td></td>
<td></td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-31</td>
<td align="left">TGTTAACtgtaagactgtgtc</td>
<td align="left">chr3R:Minus:Intron
<break></break>
17623957–17624046</td>
<td></td>
<td align="center">+</td>
<td></td>
</tr>
<tr>
<td align="center">pmir-33</td>
<td align="left">TATTGTCCtgtcacagcagta</td>
<td align="left">chr3R:Minus:Intergenic
<break></break>
21414590–21414679</td>
<td align="center">Rank#119</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
<tr>
<td align="center">pmir-37</td>
<td align="left">TTCGTTGTcgacgaaacctgc</td>
<td align="left">chrX:Minus:Intergenic
<break></break>
1645018–1645107</td>
<td align="center">Rank#15</td>
<td align="center">+</td>
<td align="center">+</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>
<sup>a </sup>
The miRNA candidates also predicted by Lai et al. [5].</p>
<p>
<sup>b </sup>
The miRNA candidates also predicted by Chan et al. [24].</p>
</table-wrap-foot>
</table-wrap>
</sec>
<sec>
<title>Analysis of miRNA targets</title>
<p>We predicted the target genes of the 47 predicted mature miRNA candidates simply by checking whether the conserved regions (conserved in the Dme-Dps pair) of each gene's 3'-UTR contain one or more seed matches of each miRNA. We then used GeneMerge to analyze the functional enrichment of the target genes of each miRNA. GeneMerge is a program that provides statistical rank scores for the over-representation of particular GO categories [
<xref ref-type="bibr" rid="B36">36</xref>
-
<xref ref-type="bibr" rid="B38">38</xref>
] for a given set of genes [
<xref ref-type="bibr" rid="B35">35</xref>
]. Significant functional categories (Bonferroni corrected p-value < 0.001) are reported in Table
<xref ref-type="table" rid="T3">3</xref>
. The target genes of 5 miRNA candidates are enriched in transcriptional activity (pmiR-7-5, 8-5, 10-3, 15-5, 32-5), and those of 2 are enriched in protein binding (pmiR-3-5, 25-5).</p>
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>The list of predicted miRNAs which have significant GO categories (with Bonferroni corrected P-value less than 0.001)</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td align="left">
<bold>MiRNA</bold>
</td>
<td align="left">
<bold>GO Category Description</bold>
</td>
<td align="left">
<bold>P-value</bold>
</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left">pmiR-3-5</td>
<td align="left">protein binding</td>
<td align="left">1.59E-03</td>
</tr>
<tr>
<td align="left">pmiR-5-3</td>
<td align="left">receptor activity</td>
<td align="left">8.77E-03</td>
</tr>
<tr>
<td></td>
<td align="left">cell adhesion molecule binding</td>
<td align="left">1.45E-04</td>
</tr>
<tr>
<td align="left">pmiR-7-5</td>
<td align="left">transcription factor activity</td>
<td align="left">5.07E-03</td>
</tr>
<tr>
<td align="left">pmiR-8-5</td>
<td align="left">transcription factor activity</td>
<td align="left">2.70E-03</td>
</tr>
<tr>
<td></td>
<td align="left">specific RNA polymerase II transcription factor activity</td>
<td align="left">1.61E-03</td>
</tr>
<tr>
<td></td>
<td align="left">structural constituent of cytoskeleton</td>
<td align="left">7.38E-04</td>
</tr>
<tr>
<td align="left">pmiR-10-3</td>
<td align="left">DNA binding</td>
<td align="left">9.60E-03</td>
</tr>
<tr>
<td></td>
<td align="left">SH3 domain binding</td>
<td align="left">3.19E-03</td>
</tr>
<tr>
<td></td>
<td align="left">specific RNA polymerase II transcription factor activity</td>
<td align="left">2.14E-03</td>
</tr>
<tr>
<td align="left">pmiR-13-3</td>
<td align="left">cell adhesion molecule binding</td>
<td align="left">1.78E-03</td>
</tr>
<tr>
<td align="left">pmiR-15-5</td>
<td align="left">transcription factor activity</td>
<td align="left">4.73E-07</td>
</tr>
<tr>
<td></td>
<td align="left">RNA polymerase II transcription factor activity</td>
<td align="left">6.85E-07</td>
</tr>
<tr>
<td></td>
<td align="left">protein serine/threonine kinase activity</td>
<td align="left">6.71E-05</td>
</tr>
<tr>
<td align="left">pmiR-24-3</td>
<td align="left">DNA binding</td>
<td align="left">2.93E-03</td>
</tr>
<tr>
<td></td>
<td align="left">structural constituent of cytoskeleton</td>
<td align="left">8.08E-03</td>
</tr>
<tr>
<td align="left">pmiR-25-5</td>
<td align="left">protein binding</td>
<td align="left">6.78E-03</td>
</tr>
<tr>
<td align="left">pmiR-28-3</td>
<td align="left">guanyl-nucleotide exchange factor activity</td>
<td align="left">7.01E-03</td>
</tr>
<tr>
<td align="left">pmiR-31-3</td>
<td align="left">potassium channel activity</td>
<td align="left">7.01E-03</td>
</tr>
<tr>
<td align="left">pmiR-32-5</td>
<td align="left">specific RNA polymerase II transcription factor activity</td>
<td align="left">3.75E-04</td>
</tr>
<tr>
<td align="left">pmiR-36-5</td>
<td align="left">phosphatidylcholine-sterol O-acyltransferase activity</td>
<td align="left">1.61E-03</td>
</tr>
<tr>
<td align="left">pmiR-39-5</td>
<td align="left">receptor binding</td>
<td align="left">5.22E-03</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We also analyzed the target and anti-target gene groups with the same GO categories [
<xref ref-type="bibr" rid="B39">39</xref>
]. We calculated the significance of seed enrichment in specific groups of genes for three datasets: the 59 reference miRNAs, the 47 new miRNA candidates and the 9 candidates with additional conservation in mosquito or honeybee (see Methods for details). Interestingly, the target and anti-target groups are not consistent between the reference miRNAs and the new candidates. Many GO categories enriched for seed matches of the reference miRNAs (target groups), such as
<italic>nervous system development</italic>
,
<italic>regulation of transcription from RNA polymerase II promoter </italic>
and
<italic>DNA binding</italic>
, are not enriched for the seed matches of the new candidates.
<italic>Eye development </italic>
(corrected p-value: 0.0022143) and
<italic>integral to membrane </italic>
(corrected p-value: 0.05618) are the two top target GO categories of the new candidates. For anti-target GO categories,
<italic>structural constituent of ribosome </italic>
genes avoid both the seed matches of the reference miRNAs and the new candidates;
<italic>DNA binding </italic>
(corrected p-value: 0.045038) and
<italic>specific RNA polymerase II transcription factor activity </italic>
(corrected p-value: 0.089004) genes even significantly avoid the seed matches of the 9 ultra-conserved candidates. The reason for this difference remains unclear and requires further study. Detailed results are presented in Table S5 (Additional file
<xref ref-type="supplementary-material" rid="S8">8</xref>
).</p>
</sec>
</sec>
<sec>
<title>Conclusion</title>
<p>To reveal miRNA-directed posttranscriptional regulations in
<italic>Drosophila</italic>
, we used a two-stage method. In the first stage, we used the conservation pattern along the phyla to identify conserved 7-mers. A pairwise conservation score (PCS) was introduced to describe the pairwise conservation of all 7-mers, and an SVM ensemble was developed to combine the PCSs from 6 different pairs of flies. This stage identified 689 conserved 7-mers. In the second stage, we sought to identify the candidate seed matches potentially involved in miRNA regulation and their corresponding miRNAs. We used all the identified 7-mers to search for pre-miRNAs, and then manually annotated the predicted miRNA genes and the 5'-ends of mature miRNAs according to conservation and sequence information. In total, we identified 47 miRNA candidates and analyzed the target genes of each. The results show that many target and anti-target GO categories differ between the known miRNAs and the new predictions.</p>
</sec>
<sec sec-type="methods">
<title>Methods</title>
<sec>
<title>The sequences of the genomes and the 3'-UTRs</title>
<p>The genome of
<italic>D. melanogaster </italic>
(dm2) and the pairwise alignments of 6 pairs of flies (Dme-Dsi, Dme-Dya, Dme-Dan, Dme-Dps, Dme-Dmo and Dme-Dvi; the following genome assemblies were used to construct the alignments: dm2, droSim1, droYak, droAna, dp3, droMoj1 and droVir1) were downloaded from the UCSC Genome Browser ftp site [
<xref ref-type="bibr" rid="B40">40</xref>
,
<xref ref-type="bibr" rid="B41">41</xref>
]. The Dme 3'-UTRs were extracted from the genome sequences according to FlyBase 3'-UTR annotations (version 4.2.1). The pairwise alignments of 3'-UTRs were likewise extracted from the whole-genome pairwise alignments according to the FlyBase annotations. If a gene had multiple 3'-UTRs, they were merged into one sequence with the maximum coverage. The final 3'-UTR dataset contains 9,803 fly genes.</p>
<p>The mosquito (anoGam1) and honeybee (apiMel2) genomes were also downloaded from UCSC Genome Browser ftp site.</p>
<p>The Dme 3'-UTRs were randomized using Python scripts written by Peter Clote for the Altschul-Erickson algorithm [
<xref ref-type="bibr" rid="B42">42</xref>
]. The pairwise alignments of 3'-UTRs were randomized using Perl scripts written by Stefan Washietl [
<xref ref-type="bibr" rid="B43">43</xref>
].</p>
</sec>
<sec>
<title>The sequences of miRNAs</title>
<p>The sequences of 78 pre-miRNAs and 78 mature miRNAs were downloaded from miRBase (Version 8.0) [
<xref ref-type="bibr" rid="B44">44</xref>
]. Of the 78 mature miRNAs, 59 were identified by cloning, 16 were computationally predicted and validated by northern blotting, and the other 3 were verified by distant homology. The list of the 59 cloning-identified miRNAs can be found in Table S3 (Additional file
<xref ref-type="supplementary-material" rid="S6">6</xref>
).</p>
<p>The 59 cloning-identified miRNAs and the corresponding 61 unique pre-miRNAs were used as the reference dataset in this work. The seed matches of each miRNA were derived as the sequences fully complementary to the 1–7 nt and 2–8 nt of the miRNA. Seed matches with identical sequences were counted only once.</p>
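<p>The derivation of seed matches can be sketched in a few lines of Python. This is a minimal illustration rather than the scripts used in this work: the function names are ours, Watson-Crick pairing only is assumed (no G:U wobble), and the input sequence is an illustrative 21 nt example.</p>
<preformat><![CDATA[
```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_matches(mature_mirna: str) -> set:
    """7-mers fully complementary to nt 1-7 and nt 2-8 of a mature miRNA."""
    seq = mature_mirna.upper().replace("T", "U")
    return {reverse_complement(seq[0:7]), reverse_complement(seq[1:8])}


def reference_seed_matches(mirnas) -> set:
    """Union over all miRNAs, so identical seed matches are counted once."""
    matches = set()
    for mirna in mirnas:
        matches |= seed_matches(mirna)
    return matches


example = "UGAGGUAGUAGGUUGUAUAGU"  # illustrative input sequence
print(sorted(seed_matches(example)))
```
]]></preformat>
<p>Collecting the results in a set mirrors the rule that seed matches with the same sequence are considered only once.</p>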
</sec>
<sec>
<title>Pairwise conservation score (PCS)</title>
<p>The pairwise conservation score (PCS) is defined as follows,</p>
<p>
<disp-formula>
<graphic xlink:href="1471-2105-8-432-i1.gif"></graphic>
</disp-formula>
</p>
<p>r
<sub>k0 </sub>
is the rank of the number of the occurrences of the studied 7-mer in Dme 3'-UTRs, and r
<sub>ki </sub>
is the rank of the number of occurrences of the studied 7-mer in the studied pairwise alignments of 3'-UTRs. A larger PCS for a k-mer means that a larger proportion and a larger number of the k-mer sites are retained through evolution. The Perl script for computing PCSs is freely available on our website [
<xref ref-type="bibr" rid="B45">45</xref>
].</p>
</sec>
<sec>
<title>Cons-SVM</title>
<p>We used the bagging method [
<xref ref-type="bibr" rid="B46">46</xref>
] to alleviate the variation caused by the imbalance between the numbers of positive and negative samples. We used the 86 reference seed matches as positive training samples and randomly sampled 86 of the other 7-mers as negative training samples to train an SVM. The procedure was repeated 500 times, and all 500 SVMs were combined into an ensemble. A sample was regarded as positive only if it was classified as positive by all 500 SVMs. We used the LibSVM package [
<xref ref-type="bibr" rid="B47">47</xref>
] for all analyses. Each SVM was trained with a linear kernel and the default parameters.</p>
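<p>The balanced bagging scheme and the unanimous vote can be sketched as follows. To keep the sketch dependency-free, the member classifier is a nearest-centroid rule, which is a stand-in of our own and not an SVM; in practice a linear SVM trainer would be passed as the train argument. All names in the code are hypothetical.</p>
<preformat><![CDATA[
```python
import random


def train_centroid_classifier(pos, neg):
    """Stand-in member classifier (NOT an SVM): nearest class centroid."""
    dim = len(pos[0])
    mu_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mu_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
        return d_pos < d_neg

    return predict


def train_ensemble(positives, others, n_models=500, seed=0,
                   train=train_centroid_classifier):
    """Train n_models classifiers, each on all positives plus an equally
    sized random sample of the remaining samples used as negatives."""
    rng = random.Random(seed)
    return [train(positives, rng.sample(others, len(positives)))
            for _ in range(n_models)]


def is_positive(ensemble, x):
    """Unanimous vote: positive only if every member predicts positive."""
    return all(member(x) for member in ensemble)


# Toy 2-D vectors standing in for the combined PCS vectors.
toy_pos = [(0.9 + 0.01 * i, 0.9) for i in range(5)]
toy_others = [(0.01 * i, 0.0) for i in range(20)]
toy_ensemble = train_ensemble(toy_pos, toy_others, n_models=10)
print(is_positive(toy_ensemble, (1.0, 1.0)), is_positive(toy_ensemble, (0.0, 0.0)))
```
]]></preformat>
<p>Requiring agreement of all members makes the ensemble conservative: a single dissenting member is enough to reject a sample.</p>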
<p>We used leave-one-out cross validation (LOOCV) at the level of miRNA families to test the sensitivity of Cons-SVM. In each round, the seed matches of one of the 40 miRNA families were held out as testing samples, and the seed matches of the other 39 families were used as positive training samples to train a new Cons-SVM following the above procedure. The newly trained Cons-SVM was then used to classify the testing samples. The total number of seed matches classified as positive over all rounds was taken as the final result.</p>
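<p>The family-wise cross validation loop can be sketched as below. The function leave_one_family_out and the callback train_and_classify are hypothetical names; the callback stands for training a new ensemble and returning its decision function. The toy callback in the usage part is a simple one-dimensional threshold, used only to exercise the loop.</p>
<preformat><![CDATA[
```python
def leave_one_family_out(families, others, train_and_classify):
    """Count held-out seed matches that are classified as positive.

    families: dict mapping a family name to its list of samples.
    others: samples from which negatives are drawn during training.
    train_and_classify(training_positives, others) returns a function
    mapping a sample to True (positive) or False (negative).
    """
    n_identified = 0
    for held_out, test_samples in families.items():
        training = [sample
                    for name, samples in families.items() if name != held_out
                    for sample in samples]
        classify = train_and_classify(training, others)
        n_identified += sum(1 for sample in test_samples if classify(sample))
    return n_identified


def toy_train_and_classify(training, others):
    """Toy stand-in: threshold halfway between the two group means."""
    threshold = (sum(training) / len(training) + sum(others) / len(others)) / 2
    return lambda sample: sample > threshold


toy_families = {"a": [1.0, 1.1], "b": [0.9], "c": [0.1]}
toy_others = [0.0, 0.05, -0.1]
print(leave_one_family_out(toy_families, toy_others, toy_train_and_classify))
```
]]></preformat>
<p>Holding out whole families rather than single seed matches keeps closely related seed matches from appearing in both the training and the testing sets.</p>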
</sec>
<sec>
<title>Pre-miRNAs prediction</title>
<p>Pre-miRNAs were predicted in several steps: 1) all identified conserved 7-mers were searched against both strands of the Dme genome, excluding all annotated exons, tRNAs, snRNAs, rRNAs and other noncoding gene regions; 2) for each matched locus, two 90 nt sequences were extracted, one from -15 to +74 and the other from -54 to +35 (corresponding to the two potential pre-miRNAs, because a mature miRNA can be located on either the 5'-arm or the 3'-arm of the pre-miRNA); 3) these 90 nt sequences were folded by RNAfold [
<xref ref-type="bibr" rid="B48">48</xref>
], and sequences with a free energy higher than -25 kcal/mol, more than one terminal loop, fewer than 20 base pairs in the stem, or a distance of less than 21 bp from the matched 7-mer to the terminal loop were filtered out (these filters are widely used in miRNA prediction algorithms); 4) candidate pre-miRNAs were predicted using triplet-SVM and RNAmicro [
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B14">14</xref>
]. The alignments of each candidate pre-miRNA were then extracted from the pairwise alignments of the 6 pairs of flies. A pre-miRNA candidate was regarded as conserved in a pair of flies if 1) the 7 nt fully complementary to any conserved 7-mer were identical in both species, and 2) the aligned sequence in the second organism was also predicted as a "real" pre-miRNA by the miRNA prediction method. A pre-miRNA candidate conserved in at least 4 pairs of flies was regarded as a conserved pre-miRNA candidate. Conserved pre-miRNA candidates with overlapping genomic locations were then clustered into one pre-miRNA locus, and the candidate whose predicted structure had the lowest free energy was taken as the representative of the cluster.</p>
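<p>Steps 2 and 3 above can be illustrated with a short Python sketch. It assumes that offsets are counted from the first nucleotide of the matched site on the searched strand, with that nucleotide at offset 0, so that -15 to +74 and -54 to +35 each span 90 nt; strand handling and the folding itself are omitted. The FoldSummary record is a hypothetical container for values derived from a folding program, not part of any RNAfold interface.</p>
<preformat><![CDATA[
```python
from dataclasses import dataclass


def candidate_windows(genome: str, site_start: int):
    """Return the 90-nt windows around a matched site (0-based index).

    Window offsets are inclusive: -15..+74 and -54..+35. Windows that
    would run past either end of the sequence are skipped.
    """
    windows = []
    for left, right in ((-15, 74), (-54, 35)):
        start, end = site_start + left, site_start + right + 1
        if start >= 0 and end <= len(genome):
            windows.append(genome[start:end])
    return windows


@dataclass
class FoldSummary:
    """Hypothetical summary of a predicted secondary structure."""
    free_energy: float           # kcal/mol
    terminal_loops: int
    stem_base_pairs: int
    site_to_loop_distance: int   # matched 7-mer to terminal loop


def passes_filters(fold: FoldSummary) -> bool:
    """Keep a window only if none of the four exclusion criteria applies."""
    return (fold.free_energy <= -25.0
            and fold.terminal_loops <= 1
            and fold.stem_base_pairs >= 20
            and fold.site_to_loop_distance >= 21)


toy_genome = "ACGU" * 50  # 200-nt toy sequence
print([len(w) for w in candidate_windows(toy_genome, 100)])
print(passes_filters(FoldSummary(-30.0, 1, 25, 22)))
```
]]></preformat>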
</sec>
<sec>
<title>Mature miRNA prediction</title>
<p>We introduced several rules to identify the mature parts on the two arms of each predicted pre-miRNA: 1) if the predicted pre-miRNA contained only a single conserved site complementary to any conserved 7-mer, that site was regarded as the 1–7 nt of the mature miRNA; 2) if the predicted pre-miRNA contained several conserved sites complementary to conserved 7-mers, the 5'-most such site with "U" as its first nucleotide was regarded as the 1–7 nt of the mature miRNA, and if none of these sites had "U" as its first nucleotide, the 5'-most site was used instead. The 21 nt region starting from the predicted 5'-end of each mature miRNA was annotated as the candidate mature sequence.</p>
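<p>The two rules can be expressed compactly in Python. In this sketch, which uses names of our own choosing, an arm is given as a 5' to 3' sequence and the conserved sites as 0-based start positions on that arm; how the sites are located is outside the scope of the sketch.</p>
<preformat><![CDATA[
```python
def predict_mature(arm: str, site_starts, length: int = 21):
    """Pick the 5'-end of the mature miRNA on one arm.

    Prefers the 5'-most conserved site starting with U; if no site starts
    with U, falls back to the 5'-most site (this also covers the case of
    a single site). Returns (start, sequence) or None without sites.
    """
    if not site_starts:
        return None
    seq = arm.upper().replace("T", "U")
    ordered = sorted(site_starts)
    starting_with_u = [s for s in ordered if seq[s] == "U"]
    start = starting_with_u[0] if starting_with_u else ordered[0]
    return start, seq[start:start + length]


toy_arm = "GGAUGCAUACGUAGCUAGCUAGCUAGGAUCC"  # toy 31-nt arm
print(predict_mature(toy_arm, [2, 7]))
```
]]></preformat>
<p>With sites at positions 2 and 7 of the toy arm, the site at position 7 is chosen because it starts with "U" while the 5'-most site does not.</p>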
<p>All predicted mature miRNA candidates were searched for homologs in miRBase and in the mosquito and honeybee genomes with the BLAST program [
<xref ref-type="bibr" rid="B50">50</xref>
]. Hits with an aligned length longer than 19 nt and at most one mismatch were regarded as homologs.</p>
</sec>
<sec>
<title>Target Analysis</title>
<p>We first analyzed the enriched functional categories of the target genes of each candidate miRNA. Target genes were predicted simply by searching the aligned 3'-UTRs of each gene in the Dme-Dps pair for conserved 7-mers that are complementary to the 5'-end (1–7 nt and 2–8 nt) of the mature miRNA and belong to the 689 conserved 7-mers. We then used GeneMerge [
<xref ref-type="bibr" rid="B35">35</xref>
] to analyze the GO categories of the target genes of each miRNA candidate.</p>
<p>We then analyzed the target and anti-target groups of genes sharing the same GO categories. This analysis, proposed by Stark et al., tests whether the 3'-UTRs in a functional category are specifically enriched for miRNA target sites over what is expected given their length [
<xref ref-type="bibr" rid="B39">39</xref>
]. First, we calculated the frequency of all 16,384 7-mers in all 3'-UTRs. We denoted the counts of seed match 7-mers and of all 7-mers in the Dme-Dps conserved 3'-UTRs as SeedM
<sub>Gene_All </sub>
and All_7M
<sub>Gene_All</sub>
, respectively. Then, we calculated the frequency of all 7-mers in the 3'-UTRs of a specific group of genes (for example, the genes annotated as
<italic>central nervous system development</italic>
). We denoted the counts of seed match 7-mers and of all 7-mers in the 3'-UTRs of the specific group of genes as SeedM
<sub>Gene_Specific </sub>
and All_7M
<sub>Gene_Specific</sub>
, respectively. Finally, we assessed the significance of seed enrichment for a group of genes by calculating the binomial probability (p-value) that the observed level of enrichment arises by chance, where the ratios for all genes define the background probability:</p>
<p>
<disp-formula>
<graphic xlink:href="1471-2105-8-432-i2.gif"></graphic>
</disp-formula>
</p>
<p>A Bonferroni corrected p-value was also calculated.</p>
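<p>A binomial enrichment test of this kind can be sketched in Python as follows. The sketch assumes an upper-tail test for target groups (an anti-target test would use the lower tail instead) and a background probability strictly between 0 and 1; it is our reading of the description above, not a transcription of the formula, and all function names are ours. The probability mass is evaluated in log space so that large counts do not overflow.</p>
<preformat><![CDATA[
```python
from math import exp, lgamma, log


def log_binomial_pmf(i: int, n: int, p: float) -> float:
    """Natural log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_choose + i * log(p) + (n - i) * log(1.0 - p)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return min(1.0, sum(exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1)))


def seed_enrichment_p(seed_specific: int, all_specific: int,
                      seed_all: int, all_all: int) -> float:
    """Chance of seeing at least seed_specific seed matches among
    all_specific 7-mers, given the background ratio seed_all / all_all."""
    background = seed_all / all_all
    return binomial_upper_tail(seed_specific, all_specific, background)


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni correction, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy counts: 3 seed matches among 4 sites, background ratio 1/2.
print(seed_enrichment_p(3, 4, 1, 2))
```
]]></preformat>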
</sec>
</sec>
<sec>
<title>Authors' contributions</title>
<p>JG designed the algorithm, developed the PCS and Cons-SVM programs in Perl, and wrote most of the manuscript. HF pre-processed the 3'-UTR dataset, developed the MCS algorithm, and re-developed the PCS program in C++. XZ provided guidance on the experimental design and manuscript preparation. YL initiated the project and guided the whole work.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="S4">
<caption>
<title>Additional file 4</title>
<p>
<bold>The list of the 59 reference miRNAs identified by cloning</bold>
. Table S1. The list of the 59 reference miRNAs. All entries are clustered according to miRNA family information. The file presents the number of homologs in six flies, the results of the different motif finding methods and the results of mature miRNA identification.</p>
</caption>
<media xlink:href="1471-2105-8-432-S4.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S1">
<caption>
<title>Additional file 1</title>
<p>
<bold>The distributions of conservation ratios and the counts in Dme-Dps pair</bold>
. Figure S1. The distributions are computed and plotted for three datasets: reference seed matches, shuffled seed matches and all 7-mers. A) The distribution of conservation ratios. B) The distribution of counts in Dme 3'-UTRs. C) The distribution of counts in Dme-Dps conserved 3'-UTRs.</p>
</caption>
<media xlink:href="1471-2105-8-432-S1.png" mimetype="image" mime-subtype="png">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S2">
<caption>
<title>Additional file 2</title>
<p>
<bold>The histograms of PCSs and the trends of PCSs along the phyla</bold>
. Figure S2. A)-G) The PCSs of all 7-mers, from left to right: the average PCSs and the PCSs in Dme-Dsi, Dme-Dya, Dme-Dan, Dme-Dps, Dme-Dmo, Dme-Dvi. The top panel of each sub-figure shows the histogram of the PCSs of all 7-mers and the bottom panel shows an enlarged view. H) The trends of the PCSs of 86 seed matches along the phyla. Because the evolutionary distances of the Dme-Dmo and Dme-Dvi pairs are the same, only the PCSs of the Dme-Dmo pair are displayed.</p>
</caption>
<media xlink:href="1471-2105-8-432-S2.png" mimetype="image" mime-subtype="png">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S3">
<caption>
<title>Additional file 3</title>
<p>
<bold>The histograms of conservation ratios along the phyla</bold>
. Figure S3. A)-F) The conservation ratios of all 7-mers, from left to right: the conservation ratios in Dme-Dsi, Dme-Dya, Dme-Dan, Dme-Dps, Dme-Dmo, Dme-Dvi. The top panel of each sub-figure shows the histogram of the conservation ratios of all 7-mers and the bottom panel shows an enlarged view.</p>
</caption>
<media xlink:href="1471-2105-8-432-S3.png" mimetype="image" mime-subtype="png">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S5">
<caption>
<title>Additional file 5</title>
<p>
<bold>The list of PCSs, counts and conservation ratios of all 7-mers</bold>
. Table S2. The list of all 16,384 7-mers. The "Type" column describes whether a 7-mer is derived from the 5'-end of cloned/northern blotting/homology miRNAs or predicted miRNA candidates. The "Class" column describes whether a 7-mer is classified as positive by Cons-SVM. The "PCS:XX" columns present the pairwise conservation scores in the "XX" condition. The "BC:X" columns present the counts in the 3'-UTRs of the single species "X". The "AC:XX" columns present the counts in the "XX" conserved 3'-UTRs. The "CR:XX" columns present the conservation ratios in the "XX" pair.</p>
</caption>
<media xlink:href="1471-2105-8-432-S5.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S6">
<caption>
<title>Additional file 6</title>
<p>
<bold>The list of the 78 pre-miRNAs in miRBase and related annotations</bold>
. Table S3. The annotations of 78 pre-miRNAs and corresponding mature miRNAs.</p>
</caption>
<media xlink:href="1471-2105-8-432-S6.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S7">
<caption>
<title>Additional file 7</title>
<p>
<bold>The list of the 41 predicted pre-miRNAs using RNAmicro</bold>
. Table S4. The annotations of 41 pre-miRNAs such as corresponding mature parts, genomic location, homologies, etc.</p>
</caption>
<media xlink:href="1471-2105-8-432-S7.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
<supplementary-material content-type="local-data" id="S8">
<caption>
<title>Additional file 8</title>
<p>
<bold>The target and anti-target GO categories of miRNAs</bold>
. Table S5. The raw p-values and Bonferroni corrected p-values are reported for each GO category. The "Known_XX" columns are the p-values calculated for the 59 reference miRNAs. The "New_XX" columns are the p-values calculated for the 47 miRNA candidates. The "Cons_XX" columns are the p-values calculated for the 9 miRNA candidates with additional conservation in mosquito or honeybee.</p>
</caption>
<media xlink:href="1471-2105-8-432-S8.xls" mimetype="application" mime-subtype="vnd.ms-excel">
<caption>
<p>Click here for file</p>
</caption>
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<sec>
<title>Acknowledgements</title>
<p>We thank Tao He and Zhengpeng Wu for their critical reading of the manuscript and their suggestions. We thank Anita Jerome for carefully reading and revising the manuscript. We thank Dr. Michael Q Zhang for his discussion and reading of the manuscript. We also thank Tao Peng for his help with the classifier design, Dr. Chenghai Xue for his discussion of the manuscript structure, and Xiaowo Wang, Jing Zhang and Yunfei Pei for their helpful discussions. This work was supported in part by the National Natural Science Foundation of China (NSFC60572086 and NSFC30625012) and the National Basic Research Program of China (2004CB518605).</p>
</sec>
</ack>
<ref-list>
<ref id="B1">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>MicroRNAs: genomics, biogenesis, mechanism, and function</article-title>
<source>Cell</source>
<year>2004</year>
<volume>116</volume>
<fpage>281</fpage>
<lpage>297</lpage>
<pub-id pub-id-type="pmid">14744438</pub-id>
<pub-id pub-id-type="doi">10.1016/S0092-8674(04)00045-5</pub-id>
</citation>
</ref>
<ref id="B2">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ambros</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>The functions of animal microRNAs</article-title>
<source>Nature</source>
<year>2004</year>
<volume>431</volume>
<fpage>350</fpage>
<lpage>355</lpage>
<pub-id pub-id-type="pmid">15372042</pub-id>
<pub-id pub-id-type="doi">10.1038/nature02871</pub-id>
</citation>
</ref>
<ref id="B3">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lagos-Quintana</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rauhut</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lendeckel</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>Identification of novel genes coding for small expressed RNAs</article-title>
<source>Science</source>
<year>2001</year>
<volume>294</volume>
<fpage>853</fpage>
<lpage>858</lpage>
<pub-id pub-id-type="pmid">11679670</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1064921</pub-id>
</citation>
</ref>
<ref id="B4">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Aravin</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Lagos-Quintana</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Yalcin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Zavolan</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marks</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gaasterland</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Meyer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>The small RNA profile during Drosophila melanogaster development</article-title>
<source>Dev Cell</source>
<year>2003</year>
<volume>5</volume>
<fpage>337</fpage>
<lpage>350</lpage>
<pub-id pub-id-type="pmid">12919683</pub-id>
<pub-id pub-id-type="doi">10.1016/S1534-5807(03)00228-4</pub-id>
</citation>
</ref>
<ref id="B5">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lai</surname>
<given-names>EC</given-names>
</name>
<name>
<surname>Tomancak</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Williams</surname>
<given-names>RW</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>GM</given-names>
</name>
</person-group>
<article-title>Computational identification of Drosophila microRNA genes</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>4</volume>
<fpage>R42</fpage>
<pub-id pub-id-type="pmid">12844358</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2003-4-7-r42</pub-id>
</citation>
</ref>
<ref id="B6">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ruby</surname>
<given-names>JG</given-names>
</name>
<name>
<surname>Jan</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Player</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Axtell</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Nusbaum</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ge</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans</article-title>
<source>Cell</source>
<year>2006</year>
<volume>127</volume>
<fpage>1193</fpage>
<lpage>1207</lpage>
<pub-id pub-id-type="pmid">17174894</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2006.10.040</pub-id>
</citation>
</ref>
<ref id="B7">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Berezikov</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Guryev</surname>
<given-names>V</given-names>
</name>
<name>
<surname>van de Belt</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Wienholds</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Plasterk</surname>
<given-names>RH</given-names>
</name>
<name>
<surname>Cuppen</surname>
<given-names>E</given-names>
</name>
</person-group>
<article-title>Phylogenetic shadowing and computational identification of human microRNA genes</article-title>
<source>Cell</source>
<year>2005</year>
<volume>120</volume>
<fpage>21</fpage>
<lpage>24</lpage>
<pub-id pub-id-type="pmid">15652478</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2004.12.031</pub-id>
</citation>
</ref>
<ref id="B8">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lim</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Lau</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Weinstein</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Abdelhakim</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Yekta</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rhoades</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Burge</surname>
<given-names>CB</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>The microRNAs of Caenorhabditis elegans</article-title>
<source>Genes Dev</source>
<year>2003</year>
<volume>17</volume>
<fpage>991</fpage>
<lpage>1008</lpage>
<pub-id pub-id-type="pmid">12672692</pub-id>
<pub-id pub-id-type="doi">10.1101/gad.1074403</pub-id>
</citation>
</ref>
<ref id="B9">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Washietl</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hofacker</surname>
<given-names>IL</given-names>
</name>
<name>
<surname>Lukasser</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Huttenhofer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Stadler</surname>
<given-names>PF</given-names>
</name>
</person-group>
<article-title>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</article-title>
<source>Nat Biotechnol</source>
<year>2005</year>
<volume>23</volume>
<fpage>1383</fpage>
<lpage>1390</lpage>
<pub-id pub-id-type="pmid">16273071</pub-id>
<pub-id pub-id-type="doi">10.1038/nbt1144</pub-id>
</citation>
</ref>
<ref id="B10">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nam</surname>
<given-names>JW</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>KR</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kim</surname>
<given-names>VN</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>BT</given-names>
</name>
</person-group>
<article-title>Human microRNA prediction through a probabilistic co-learning model of sequence and structure</article-title>
<source>Nucleic Acids Res</source>
<year>2005</year>
<volume>33</volume>
<fpage>3570</fpage>
<lpage>3581</lpage>
<pub-id pub-id-type="pmid">15987789</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gki668</pub-id>
</citation>
</ref>
<ref id="B11">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>He</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>MicroRNA identification based on sequence and structure alignment</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>3610</fpage>
<lpage>3614</lpage>
<pub-id pub-id-type="pmid">15994192</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bti562</pub-id>
</citation>
</ref>
<ref id="B12">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xue</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
<name>
<surname>He</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
</person-group>
<article-title>Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>310</fpage>
<pub-id pub-id-type="pmid">16381612</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-6-310</pub-id>
</citation>
</ref>
<ref id="B13">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sewer</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Paul</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Landgraf</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Aravin</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Pfeffer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Brownstein</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T</given-names>
</name>
<name>
<surname>van Nimwegen</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Zavolan</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Identification of clustered microRNAs using an ab initio prediction method</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>267</fpage>
<pub-id pub-id-type="pmid">16274478</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-6-267</pub-id>
</citation>
</ref>
<ref id="B14">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hertel</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stadler</surname>
<given-names>PF</given-names>
</name>
</person-group>
<article-title>Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>e197</fpage>
<lpage>e202</lpage>
<pub-id pub-id-type="pmid">16873472</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btl257</pub-id>
</citation>
</ref>
<ref id="B15">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Burge</surname>
<given-names>CB</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets</article-title>
<source>Cell</source>
<year>2005</year>
<volume>120</volume>
<fpage>15</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="pmid">15652477</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2004.12.035</pub-id>
</citation>
</ref>
<ref id="B16">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grun</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>YL</given-names>
</name>
<name>
<surname>Langenberger</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Gunsalus</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Rajewsky</surname>
<given-names>N</given-names>
</name>
</person-group>
<article-title>microRNA target predictions across seven Drosophila species and comparison to mammalian targets</article-title>
<source>PLoS Comput Biol</source>
<year>2005</year>
<volume>1</volume>
<fpage>e13</fpage>
<pub-id pub-id-type="pmid">16103902</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0010013</pub-id>
</citation>
</ref>
<ref id="B17">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Enright</surname>
<given-names>AJ</given-names>
</name>
<name>
<surname>John</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Gaul</surname>
<given-names>U</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Sander</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Marks</surname>
<given-names>DS</given-names>
</name>
</person-group>
<article-title>MicroRNA targets in Drosophila</article-title>
<source>Genome Biol</source>
<year>2003</year>
<volume>5</volume>
<fpage>R1</fpage>
<pub-id pub-id-type="pmid">14709173</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2003-5-1-r1</pub-id>
</citation>
</ref>
<ref id="B18">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lewis</surname>
<given-names>BP</given-names>
</name>
<name>
<surname>Shih</surname>
<given-names>IH</given-names>
</name>
<name>
<surname>Jones-Rhoades</surname>
<given-names>MW</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
<name>
<surname>Burge</surname>
<given-names>CB</given-names>
</name>
</person-group>
<article-title>Prediction of mammalian microRNA targets</article-title>
<source>Cell</source>
<year>2003</year>
<volume>115</volume>
<fpage>787</fpage>
<lpage>798</lpage>
<pub-id pub-id-type="pmid">14697198</pub-id>
<pub-id pub-id-type="doi">10.1016/S0092-8674(03)01018-3</pub-id>
</citation>
</ref>
<ref id="B19">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Krek</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Grun</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Poy</surname>
<given-names>MN</given-names>
</name>
<name>
<surname>Wolf</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Rosenberg</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Epstein</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>MacMenamin</surname>
<given-names>P</given-names>
</name>
<name>
<surname>da Piedade</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Gunsalus</surname>
<given-names>KC</given-names>
</name>
<name>
<surname>Stoffel</surname>
<given-names>M</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Combinatorial microRNA target predictions</article-title>
<source>Nat Genet</source>
<year>2005</year>
<volume>37</volume>
<fpage>495</fpage>
<lpage>500</lpage>
<pub-id pub-id-type="pmid">15806104</pub-id>
<pub-id pub-id-type="doi">10.1038/ng1536</pub-id>
</citation>
</ref>
<ref id="B20">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lai</surname>
<given-names>EC</given-names>
</name>
</person-group>
<article-title>Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation</article-title>
<source>Nat Genet</source>
<year>2002</year>
<volume>30</volume>
<fpage>363</fpage>
<lpage>364</lpage>
<pub-id pub-id-type="pmid">11896390</pub-id>
<pub-id pub-id-type="doi">10.1038/ng865</pub-id>
</citation>
</ref>
<ref id="B21">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stark</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brennecke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>RB</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>SM</given-names>
</name>
</person-group>
<article-title>Identification of Drosophila MicroRNA targets</article-title>
<source>PLoS Biol</source>
<year>2003</year>
<volume>1</volume>
<fpage>E60</fpage>
<pub-id pub-id-type="pmid">14691535</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0000060</pub-id>
</citation>
</ref>
<ref id="B22">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brennecke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Stark</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>RB</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>SM</given-names>
</name>
</person-group>
<article-title>Principles of microRNA-target recognition</article-title>
<source>PLoS Biol</source>
<year>2005</year>
<volume>3</volume>
<fpage>e85</fpage>
<pub-id pub-id-type="pmid">15723116</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pbio.0030085</pub-id>
</citation>
</ref>
<ref id="B23">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kulbokas</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Golub</surname>
<given-names>TR</given-names>
</name>
<name>
<surname>Mootha</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Lindblad-Toh</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Kellis</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals</article-title>
<source>Nature</source>
<year>2005</year>
<volume>434</volume>
<fpage>338</fpage>
<lpage>345</lpage>
<pub-id pub-id-type="pmid">15735639</pub-id>
<pub-id pub-id-type="doi">10.1038/nature03441</pub-id>
</citation>
</ref>
<ref id="B24">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chan</surname>
<given-names>CS</given-names>
</name>
<name>
<surname>Elemento</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Tavazoie</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Revealing posttranscriptional regulatory elements through network-level conservation</article-title>
<source>PLoS Comput Biol</source>
<year>2005</year>
<volume>1</volume>
<fpage>e69</fpage>
<pub-id pub-id-type="pmid">16355253</pub-id>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.0010069</pub-id>
</citation>
</ref>
<ref id="B25">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sinha</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Blanchette</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>170</fpage>
<pub-id pub-id-type="pmid">15511292</pub-id>
<pub-id pub-id-type="doi">10.1186/1471-2105-5-170</pub-id>
</citation>
</ref>
<ref id="B26">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gertz</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Riles</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Turnbaugh</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>BA</given-names>
</name>
</person-group>
<article-title>Discovery, validation, and genetic dissection of transcription factor binding sites by comparative and functional genomics</article-title>
<source>Genome Res</source>
<year>2005</year>
<volume>15</volume>
<fpage>1145</fpage>
<lpage>1152</lpage>
<pub-id pub-id-type="pmid">16077013</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.3859605</pub-id>
</citation>
</ref>
<ref id="B27">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Loots</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Ovcharenko</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Dubchak</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
</person-group>
<article-title>rVista for comparative sequence-based discovery of functional transcription factor binding sites</article-title>
<source>Genome Res</source>
<year>2002</year>
<volume>12</volume>
<fpage>832</fpage>
<lpage>839</lpage>
<pub-id pub-id-type="pmid">11997350</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.225502</pub-id>
<comment>Article published online before print in April 2002</comment>
</citation>
</ref>
<ref id="B28">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boffelli</surname>
<given-names>D</given-names>
</name>
<name>
<surname>McAuliffe</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Ovcharenko</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lewis</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Ovcharenko</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Pachter</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
</person-group>
<article-title>Phylogenetic shadowing of primate sequences to find functional regions of the human genome</article-title>
<source>Science</source>
<year>2003</year>
<volume>299</volume>
<fpage>1391</fpage>
<lpage>1394</lpage>
<pub-id pub-id-type="pmid">12610304</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1081331</pub-id>
</citation>
</ref>
<ref id="B29">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cliften</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Sudarsanam</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Desikan</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Fulton</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Majors</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Waterston</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>BA</given-names>
</name>
<name>
<surname>Johnston</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Finding functional features in Saccharomyces genomes by phylogenetic footprinting</article-title>
<source>Science</source>
<year>2003</year>
<volume>301</volume>
<fpage>71</fpage>
<lpage>76</lpage>
<pub-id pub-id-type="pmid">12775844</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1084337</pub-id>
</citation>
</ref>
<ref id="B30">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Siepel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bejerano</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Hou</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rosenbloom</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Clawson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Spieth</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Hillier</surname>
<given-names>LW</given-names>
</name>
<name>
<surname>Richards</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes</article-title>
<source>Genome Res</source>
<year>2005</year>
<volume>15</volume>
<fpage>1034</fpage>
<lpage>1050</lpage>
<pub-id pub-id-type="pmid">16024819</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.3715005</pub-id>
</citation>
</ref>
<ref id="B31">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elemento</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Tavazoie</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach</article-title>
<source>Genome Biol</source>
<year>2005</year>
<volume>6</volume>
<fpage>R18</fpage>
<pub-id pub-id-type="pmid">15693947</pub-id>
<pub-id pub-id-type="doi">10.1186/gb-2005-6-2-r18</pub-id>
</citation>
</ref>
<ref id="B32">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>He</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Hemann</surname>
<given-names>MT</given-names>
</name>
<name>
<surname>Hernando-Monge</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Mu</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Goodson</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Powers</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Cordon-Cardo</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lowe</surname>
<given-names>SW</given-names>
</name>
<name>
<surname>Hannon</surname>
<given-names>GJ</given-names>
</name>
<name>
<surname>Hammond</surname>
<given-names>SM</given-names>
</name>
</person-group>
<article-title>A microRNA polycistron as a potential human oncogene</article-title>
<source>Nature</source>
<year>2005</year>
<volume>435</volume>
<fpage>828</fpage>
<lpage>833</lpage>
<pub-id pub-id-type="pmid">15944707</pub-id>
<pub-id pub-id-type="doi">10.1038/nature03552</pub-id>
</citation>
</ref>
<ref id="B33">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hayashita</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Osada</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Tatematsu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Yamada</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Yanagisawa</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Tomida</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Yatabe</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Kawahara</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Sekido</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Takahashi</surname>
<given-names>T</given-names>
</name>
</person-group>
<article-title>A polycistronic microRNA cluster, miR-17-92, is overexpressed in human lung cancers and enhances cell proliferation</article-title>
<source>Cancer Res</source>
<year>2005</year>
<volume>65</volume>
<fpage>9628</fpage>
<lpage>9632</lpage>
<pub-id pub-id-type="pmid">16266980</pub-id>
<pub-id pub-id-type="doi">10.1158/0008-5472.CAN-05-2352</pub-id>
</citation>
</ref>
<ref id="B34">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lau</surname>
<given-names>NC</given-names>
</name>
<name>
<surname>Lim</surname>
<given-names>LP</given-names>
</name>
<name>
<surname>Weinstein</surname>
<given-names>EG</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans</article-title>
<source>Science</source>
<year>2001</year>
<volume>294</volume>
<fpage>858</fpage>
<lpage>862</lpage>
<pub-id pub-id-type="pmid">11679671</pub-id>
<pub-id pub-id-type="doi">10.1126/science.1065062</pub-id>
</citation>
</ref>
<ref id="B35">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castillo-Davis</surname>
<given-names>CI</given-names>
</name>
<name>
<surname>Hartl</surname>
<given-names>DL</given-names>
</name>
</person-group>
<article-title>GeneMerge-postgenomic analysis, datamining, and hypothesis testing</article-title>
<source>Bioinformatics</source>
<year>2003</year>
<volume>19</volume>
<fpage>891</fpage>
<lpage>892</lpage>
<pub-id pub-id-type="pmid">12724301</pub-id>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btg114</pub-id>
</citation>
</ref>
<ref id="B36">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ashburner</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ball</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Blake</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Botstein</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Butler</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Cherry</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Davis</surname>
<given-names>AP</given-names>
</name>
<name>
<surname>Dolinski</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dwight</surname>
<given-names>SS</given-names>
</name>
<name>
<surname>Eppig</surname>
<given-names>JT</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium</article-title>
<source>Nat Genet</source>
<year>2000</year>
<volume>25</volume>
<fpage>25</fpage>
<lpage>29</lpage>
<pub-id pub-id-type="pmid">10802651</pub-id>
<pub-id pub-id-type="doi">10.1038/75556</pub-id>
</citation>
</ref>
<ref id="B37">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Magrane</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Barrell</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Lee</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Dimmer</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Maslen</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Binns</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Harte</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Lopez</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Apweiler</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology</article-title>
<source>Nucleic Acids Res</source>
<year>2004</year>
<volume>32</volume>
<fpage>D262</fpage>
<lpage>D266</lpage>
<pub-id pub-id-type="pmid">14681408</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkh021</pub-id>
</citation>
</ref>
<ref id="B38">
<citation citation-type="journal">
<person-group person-group-type="author">
<collab>Gene Ontology Consortium</collab>
</person-group>
<article-title>The Gene Ontology (GO) project in 2006</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>D322</fpage>
<lpage>D326</lpage>
</citation>
</ref>
<ref id="B39">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Stark</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Brennecke</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bushati</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Russell</surname>
<given-names>RB</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>SM</given-names>
</name>
</person-group>
<article-title>Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution</article-title>
<source>Cell</source>
<year>2005</year>
<volume>123</volume>
<fpage>1133</fpage>
<lpage>1146</lpage>
<pub-id pub-id-type="pmid">16337999</pub-id>
<pub-id pub-id-type="doi">10.1016/j.cell.2005.11.023</pub-id>
</citation>
</ref>
<ref id="B40">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Karolchik</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Diekhans</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Hinrichs</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>YT</given-names>
</name>
<name>
<surname>Roskin</surname>
<given-names>KM</given-names>
</name>
<name>
<surname>Schwartz</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sugnet</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Thomas</surname>
<given-names>DJ</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The UCSC Genome Browser Database</article-title>
<source>Nucleic Acids Res</source>
<year>2003</year>
<volume>31</volume>
<fpage>51</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="pmid">12519945</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkg129</pub-id>
</citation>
</ref>
<ref id="B41">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hinrichs</surname>
<given-names>AS</given-names>
</name>
<name>
<surname>Karolchik</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Baertsch</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Barber</surname>
<given-names>GP</given-names>
</name>
<name>
<surname>Bejerano</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Clawson</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Diekhans</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Furey</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Harte</surname>
<given-names>RA</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>F</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The UCSC Genome Browser Database: update 2006</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>D590</fpage>
<lpage>D598</lpage>
<pub-id pub-id-type="pmid">16381938</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkj144</pub-id>
</citation>
</ref>
<ref id="B42">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Clote</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>The Altschul-Erickson algorithm</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bioinformatics.bc.edu/clotelab/RNAdinucleotideShuffle/dinucleotideShuffle.html"></ext-link>
</citation>
</ref>
<ref id="B43">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Washietl</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Alifoldz algorithm</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.tbi.univie.ac.at/papers/SUPPLEMENTS/Alifoldz/"></ext-link>
</citation>
</ref>
<ref id="B44">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Grocock</surname>
<given-names>RJ</given-names>
</name>
<name>
<surname>van Dongen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Bateman</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Enright</surname>
<given-names>AJ</given-names>
</name>
</person-group>
<article-title>miRBase: microRNA sequences, targets and gene nomenclature</article-title>
<source>Nucleic Acids Res</source>
<year>2006</year>
<volume>34</volume>
<fpage>D140</fpage>
<lpage>D144</lpage>
<pub-id pub-id-type="pmid">16381832</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/gkj112</pub-id>
</citation>
</ref>
<ref id="B45">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Gu</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>The pairwise conservation score program</article-title>
<ext-link ext-link-type="uri" xlink:href="http://bioinfo.au.tsinghua.edu.cn/member/~gujin/pcs/"></ext-link>
</citation>
</ref>
<ref id="B46">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Valentini</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Dietterich</surname>
<given-names>TG</given-names>
</name>
</person-group>
<article-title>Low Bias Bagged Support Vector Machines</article-title>
<source>The Twentieth International Conference on Machine Learning, ICML</source>
<year>2003</year>
<fpage>752</fpage>
<lpage>759</lpage>
</citation>
</ref>
<ref id="B47">
<citation citation-type="other">
<person-group person-group-type="author">
<name>
<surname>Chang</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Lin</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>LIBSVM: a library for support vector machines</article-title>
<ext-link ext-link-type="uri" xlink:href="http://www.csie.ntu.edu.tw/~cjlin/libsvm"></ext-link>
</citation>
</ref>
<ref id="B48">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hofacker</surname>
<given-names>IL</given-names>
</name>
<name>
<surname>Fontana</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Stadler</surname>
<given-names>PF</given-names>
</name>
<name>
<surname>Bonhoeffer</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Tacker</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schuster</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Fast Folding and Comparison of RNA Secondary Structures</article-title>
<source>Monatshefte f Chemie</source>
<year>1994</year>
<volume>125</volume>
<fpage>167</fpage>
<lpage>188</lpage>
<pub-id pub-id-type="doi">10.1007/BF00818163</pub-id>
</citation>
</ref>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crooks</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>JM</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>WebLogo: A sequence logo generator</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<fpage>1188</fpage>
<lpage>1190</lpage>
<pub-id pub-id-type="pmid">15173120</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.849004</pub-id>
</citation>
</ref>
<ref id="B50">
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Altschul</surname>
<given-names>SF</given-names>
</name>
<name>
<surname>Madden</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Schaffer</surname>
<given-names>AA</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Miller</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Lipman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
<article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title>
<source>Nucleic Acids Res</source>
<year>1997</year>
<volume>25</volume>
<fpage>3389</fpage>
<lpage>3402</lpage>
<pub-id pub-id-type="pmid">9254694</pub-id>
<pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id>
</citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000543  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000543  | SxmlIndent | more
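
Once a record has been printed with the commands above and saved, its reference list can be summarized with a short script. The following is only a minimal sketch in Python, under stated assumptions: the saved text is well-formed XML, and the `<ref>`/`<citation>` layout matches the one shown in this record. The `SAMPLE` string and the function name `summarize_refs` are hypothetical and used for illustration; they are not part of Dilib.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the <ref>/<citation> layout of this record.
SAMPLE = """<ref-list>
<ref id="B49">
<citation citation-type="journal">
<person-group person-group-type="author">
<name><surname>Crooks</surname><given-names>GE</given-names></name>
<name><surname>Hon</surname><given-names>G</given-names></name>
</person-group>
<article-title>WebLogo: A sequence logo generator</article-title>
<source>Genome Res</source>
<year>2004</year>
<pub-id pub-id-type="pmid">15173120</pub-id>
<pub-id pub-id-type="doi">10.1101/gr.849004</pub-id>
</citation>
</ref>
</ref-list>"""


def summarize_refs(xml_text):
    """Return one dict per <ref>: id, first author surname, title, year, DOI."""
    root = ET.fromstring(xml_text)
    rows = []
    for ref in root.iter("ref"):
        cit = ref.find("citation")
        if cit is None:
            continue  # skip <ref> elements without a <citation> child
        # Pick the first <pub-id> whose type is "doi"; None when absent.
        doi = next(
            (p.text for p in cit.findall("pub-id")
             if p.get("pub-id-type") == "doi"),
            None,
        )
        rows.append({
            "id": ref.get("id"),
            "first_author": cit.findtext("person-group/name/surname"),
            "title": cit.findtext("article-title"),
            "year": cit.findtext("year"),
            "doi": doi,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_refs(SAMPLE):
        print(row)
```

Entries that lack a field (for example a citation without a DOI) yield `None` for that key rather than raising an error, which makes missing data easy to spot.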

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}


This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021