Cyberinfrastructure exploration server

More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature

Internal identifier: 000570 (Pmc/Corpus); previous: 000569; next: 000571

Authors: Istvan Ladunga

Source:

RBID : PMC:1802606

Abstract

Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the most complete possible hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature that involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at http://optirna.unl.edu/.


Url:
DOI: 10.1093/nar/gkl1065
PubMed: 17169992
PubMed Central: 1802606

Links to Exploration step

PMC:1802606

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature</title>
<author>
<name sortKey="Ladunga, Istvan" sort="Ladunga, Istvan" uniqKey="Ladunga I" first="Istvan" last="Ladunga">Istvan Ladunga</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">17169992</idno>
<idno type="pmc">1802606</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1802606</idno>
<idno type="RBID">PMC:1802606</idno>
<idno type="doi">10.1093/nar/gkl1065</idno>
<date when="2006">2006</date>
<idno type="wicri:Area/Pmc/Corpus">000570</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature</title>
<author>
<name sortKey="Ladunga, Istvan" sort="Ladunga, Istvan" uniqKey="Ladunga I" first="Istvan" last="Ladunga">Istvan Ladunga</name>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the most complete possible hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature that involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at
<ext-link ext-link-type="uri" xlink:href="http://optirna.unl.edu/">http://optirna.unl.edu/</ext-link>
.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Huesken, D" uniqKey="Huesken D">D. Huesken</name>
</author>
<author>
<name sortKey="Lange, J" uniqKey="Lange J">J. Lange</name>
</author>
<author>
<name sortKey="Mickanin, C" uniqKey="Mickanin C">C. Mickanin</name>
</author>
<author>
<name sortKey="Weiler, J" uniqKey="Weiler J">J. Weiler</name>
</author>
<author>
<name sortKey="Asselbergs, F" uniqKey="Asselbergs F">F. Asselbergs</name>
</author>
<author>
<name sortKey="Warner, J" uniqKey="Warner J">J. Warner</name>
</author>
<author>
<name sortKey="Meloon, B" uniqKey="Meloon B">B. Meloon</name>
</author>
<author>
<name sortKey="Engel, S" uniqKey="Engel S">S. Engel</name>
</author>
<author>
<name sortKey="Rosenberg, A" uniqKey="Rosenberg A">A. Rosenberg</name>
</author>
<author>
<name sortKey="Cohen, D" uniqKey="Cohen D">D. Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elbashir, S M" uniqKey="Elbashir S">S.M. Elbashir</name>
</author>
<author>
<name sortKey="Martinez, J" uniqKey="Martinez J">J. Martinez</name>
</author>
<author>
<name sortKey="Patkaniowska, A" uniqKey="Patkaniowska A">A. Patkaniowska</name>
</author>
<author>
<name sortKey="Lendeckel, W" uniqKey="Lendeckel W">W. Lendeckel</name>
</author>
<author>
<name sortKey="Tuschl, T" uniqKey="Tuschl T">T. Tuschl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Reynolds, A" uniqKey="Reynolds A">A. Reynolds</name>
</author>
<author>
<name sortKey="Leake, D" uniqKey="Leake D">D. Leake</name>
</author>
<author>
<name sortKey="Boese, Q" uniqKey="Boese Q">Q. Boese</name>
</author>
<author>
<name sortKey="Scaringe, S" uniqKey="Scaringe S">S. Scaringe</name>
</author>
<author>
<name sortKey="Marshall, W S" uniqKey="Marshall W">W.S. Marshall</name>
</author>
<author>
<name sortKey="Khvorova, A" uniqKey="Khvorova A">A. Khvorova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yuan, B" uniqKey="Yuan B">B. Yuan</name>
</author>
<author>
<name sortKey="Latek, R" uniqKey="Latek R">R. Latek</name>
</author>
<author>
<name sortKey="Hossbach, M" uniqKey="Hossbach M">M. Hossbach</name>
</author>
<author>
<name sortKey="Tuschl, T" uniqKey="Tuschl T">T. Tuschl</name>
</author>
<author>
<name sortKey="Lewitter, F" uniqKey="Lewitter F">F. Lewitter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ui Tei, K" uniqKey="Ui Tei K">K. Ui-Tei</name>
</author>
<author>
<name sortKey="Naito, Y" uniqKey="Naito Y">Y. Naito</name>
</author>
<author>
<name sortKey="Takahashi, F" uniqKey="Takahashi F">F. Takahashi</name>
</author>
<author>
<name sortKey="Haraguchi, T" uniqKey="Haraguchi T">T. Haraguchi</name>
</author>
<author>
<name sortKey="Ohki Hamazaki, H" uniqKey="Ohki Hamazaki H">H. Ohki-Hamazaki</name>
</author>
<author>
<name sortKey="Juni, A" uniqKey="Juni A">A. Juni</name>
</author>
<author>
<name sortKey="Ueda, R" uniqKey="Ueda R">R. Ueda</name>
</author>
<author>
<name sortKey="Saigo, K" uniqKey="Saigo K">K. Saigo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amarzguioui, M" uniqKey="Amarzguioui M">M. Amarzguioui</name>
</author>
<author>
<name sortKey="Prydz, H" uniqKey="Prydz H">H. Prydz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khvorova, A" uniqKey="Khvorova A">A. Khvorova</name>
</author>
<author>
<name sortKey="Reynolds, A" uniqKey="Reynolds A">A. Reynolds</name>
</author>
<author>
<name sortKey="Jayasena, S D" uniqKey="Jayasena S">S.D. Jayasena</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, N S" uniqKey="Lee N">N.S. Lee</name>
</author>
<author>
<name sortKey="Dohjima, T" uniqKey="Dohjima T">T. Dohjima</name>
</author>
<author>
<name sortKey="Bauer, G" uniqKey="Bauer G">G. Bauer</name>
</author>
<author>
<name sortKey="Li, H" uniqKey="Li H">H. Li</name>
</author>
<author>
<name sortKey="Li, M J" uniqKey="Li M">M.J. Li</name>
</author>
<author>
<name sortKey="Ehsani, A" uniqKey="Ehsani A">A. Ehsani</name>
</author>
<author>
<name sortKey="Salvaterra, P" uniqKey="Salvaterra P">P. Salvaterra</name>
</author>
<author>
<name sortKey="Rossi, J" uniqKey="Rossi J">J. Rossi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bohula, E A" uniqKey="Bohula E">E.A. Bohula</name>
</author>
<author>
<name sortKey="Salisbury, A J" uniqKey="Salisbury A">A.J. Salisbury</name>
</author>
<author>
<name sortKey="Sohail, M" uniqKey="Sohail M">M. Sohail</name>
</author>
<author>
<name sortKey="Playford, M P" uniqKey="Playford M">M.P. Playford</name>
</author>
<author>
<name sortKey="Riedemann, J" uniqKey="Riedemann J">J. Riedemann</name>
</author>
<author>
<name sortKey="Southern, E M" uniqKey="Southern E">E.M. Southern</name>
</author>
<author>
<name sortKey="Macaulay, V M" uniqKey="Macaulay V">V.M. Macaulay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kretschmer Kazemi Far, R" uniqKey="Kretschmer Kazemi Far R">R. Kretschmer-Kazemi Far</name>
</author>
<author>
<name sortKey="Sczakiel, G" uniqKey="Sczakiel G">G. Sczakiel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B. Schölkopf</name>
</author>
<author>
<name sortKey="Smola, A J" uniqKey="Smola A">A.J. Smola</name>
</author>
<author>
<name sortKey="Williamson, R C" uniqKey="Williamson R">R.C. Williamson</name>
</author>
<author>
<name sortKey="Bartlett, P L" uniqKey="Bartlett P">P.L. Bartlett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camps Valls, G" uniqKey="Camps Valls G">G. Camps-Valls</name>
</author>
<author>
<name sortKey="Chalk, A M" uniqKey="Chalk A">A.M. Chalk</name>
</author>
<author>
<name sortKey="Serrano Lopez, A J" uniqKey="Serrano Lopez A">A.J. Serrano-Lopez</name>
</author>
<author>
<name sortKey="Martin Guerrero, J D" uniqKey="Martin Guerrero J">J.D. Martin-Guerrero</name>
</author>
<author>
<name sortKey="Sonnhammer, E L" uniqKey="Sonnhammer E">E.L. Sonnhammer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="S Trom, P" uniqKey="S Trom P">P. Sætrom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shabalina, S A" uniqKey="Shabalina S">S.A. Shabalina</name>
</author>
<author>
<name sortKey="Spiridonov, A N" uniqKey="Spiridonov A">A.N. Spiridonov</name>
</author>
<author>
<name sortKey="Ogurtsov, A Y" uniqKey="Ogurtsov A">A.Y. Ogurtsov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Holen, T" uniqKey="Holen T">T. Holen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chalk, A M" uniqKey="Chalk A">A.M. Chalk</name>
</author>
<author>
<name sortKey="Wahlestedt, C" uniqKey="Wahlestedt C">C. Wahlestedt</name>
</author>
<author>
<name sortKey="Sonnhammer, E L" uniqKey="Sonnhammer E">E.L. Sonnhammer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="S Trom, P" uniqKey="S Trom P">P. Sætrom</name>
</author>
<author>
<name sortKey="Snove, J O" uniqKey="Snove J">J.O. Snove</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xia, T" uniqKey="Xia T">T. Xia</name>
</author>
<author>
<name sortKey="Santalucia, J" uniqKey="Santalucia J">J., SantaLucia</name>
</author>
<author>
<name sortKey="Burkard, M E" uniqKey="Burkard M">M.E. Burkard</name>
</author>
<author>
<name sortKey="Kierzek, R" uniqKey="Kierzek R">R. Kierzek</name>
</author>
<author>
<name sortKey="Schroeder, S J" uniqKey="Schroeder S">S.J. Schroeder</name>
</author>
<author>
<name sortKey="Jiao, X" uniqKey="Jiao X">X. Jiao</name>
</author>
<author>
<name sortKey="Cox, C" uniqKey="Cox C">C. Cox</name>
</author>
<author>
<name sortKey="Turner, D H" uniqKey="Turner D">D.H. Turner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Muckstein, U" uniqKey="Muckstein U">U. Muckstein</name>
</author>
<author>
<name sortKey="Tafer, H" uniqKey="Tafer H">H. Tafer</name>
</author>
<author>
<name sortKey="Hackermuller, J" uniqKey="Hackermuller J">J. Hackermuller</name>
</author>
<author>
<name sortKey="Bernhart, S H" uniqKey="Bernhart S">S.H. Bernhart</name>
</author>
<author>
<name sortKey="Stadler, P F" uniqKey="Stadler P">P.F. Stadler</name>
</author>
<author>
<name sortKey="Hofacker, I L" uniqKey="Hofacker I">I.L. Hofacker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, Y" uniqKey="Ding Y">Y. Ding</name>
</author>
<author>
<name sortKey="Lawrence, C E" uniqKey="Lawrence C">C.E. Lawrence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, Y" uniqKey="Ding Y">Y. Ding</name>
</author>
<author>
<name sortKey="Lawrence, C E" uniqKey="Lawrence C">C.E. Lawrence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, Y" uniqKey="Ding Y">Y. Ding</name>
</author>
<author>
<name sortKey="Lawrence, C E" uniqKey="Lawrence C">C.E. Lawrence</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dantzig, G B" uniqKey="Dantzig G">G.B. Dantzig</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Goldfarb, D" uniqKey="Goldfarb D">D. Goldfarb</name>
</author>
<author>
<name sortKey="Todd, M" uniqKey="Todd M">M. Todd</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Scholkopf, B" uniqKey="Scholkopf B">B. Schölkopf</name>
</author>
<author>
<name sortKey="Smola, A J" uniqKey="Smola A">A.J. Smola</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tsang, I W" uniqKey="Tsang I">I.W. Tsang</name>
</author>
<author>
<name sortKey="Kwok, J T" uniqKey="Kwok J">J.T. Kwok</name>
</author>
<author>
<name sortKey="Cheung, P M" uniqKey="Cheung P">P.-M. Cheung</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Joachims, T" uniqKey="Joachims T">T. Joachims</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ladunga, I" uniqKey="Ladunga I">I. Ladunga</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bennett, K" uniqKey="Bennett K">K. Bennett</name>
</author>
<author>
<name sortKey="Mangasarian, O L" uniqKey="Mangasarian O">O.L. Mangasarian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chvatal, V" uniqKey="Chvatal V">V. Chvàtal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huber, P J" uniqKey="Huber P">P.J. Huber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kohavi, R" uniqKey="Kohavi R">R. Kohavi</name>
</author>
<author>
<name sortKey="John, G H" uniqKey="John G">G.H. John</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Guyon, I" uniqKey="Guyon I">I. Guyon</name>
</author>
<author>
<name sortKey="Weston, J" uniqKey="Weston J">J. Weston</name>
</author>
<author>
<name sortKey="Barnhill, S" uniqKey="Barnhill S">S. Barnhill</name>
</author>
<author>
<name sortKey="Vapnik, V" uniqKey="Vapnik V">V. Vapnik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Furlanello, C" uniqKey="Furlanello C">C. Furlanello</name>
</author>
<author>
<name sortKey="Serafini, M" uniqKey="Serafini M">M. Serafini</name>
</author>
<author>
<name sortKey="Merler, S" uniqKey="Merler S">S. Merler</name>
</author>
<author>
<name sortKey="Jurman, G" uniqKey="Jurman G">G. Jurman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Block, P" uniqKey="Block P">P. Block</name>
</author>
<author>
<name sortKey="Paern, J" uniqKey="Paern J">J. Paern</name>
</author>
<author>
<name sortKey="Hullermeier, E" uniqKey="Hullermeier E">E. Hullermeier</name>
</author>
<author>
<name sortKey="Sanschagrin, P" uniqKey="Sanschagrin P">P. Sanschagrin</name>
</author>
<author>
<name sortKey="Sotriffer, C A" uniqKey="Sotriffer C">C.A. Sotriffer</name>
</author>
<author>
<name sortKey="Klebe, G" uniqKey="Klebe G">G. Klebe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hoffmann, R" uniqKey="Hoffmann R">R. Hoffmann</name>
</author>
<author>
<name sortKey="Valencia, A" uniqKey="Valencia A">A. Valencia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mladenic, D" uniqKey="Mladenic D">D. Mladenic</name>
</author>
<author>
<name sortKey="Brank, J" uniqKey="Brank J">J. Brank</name>
</author>
<author>
<name sortKey="Grobelnik, M" uniqKey="Grobelnik M">M. Grobelnik</name>
</author>
<author>
<name sortKey="Milic Frayling, N" uniqKey="Milic Frayling N">N. Milic-Frayling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Joachims, T" uniqKey="Joachims T">T. Joachims</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Le Cun, Y" uniqKey="Le Cun Y">Y. Le Cun</name>
</author>
<author>
<name sortKey="Denker, J S" uniqKey="Denker J">J.S. Denker</name>
</author>
<author>
<name sortKey="Solla, S A" uniqKey="Solla S">S.A. Solla</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boese, Q" uniqKey="Boese Q">Q. Boese</name>
</author>
<author>
<name sortKey="Leake, D" uniqKey="Leake D">D. Leake</name>
</author>
<author>
<name sortKey="Reynolds, A" uniqKey="Reynolds A">A. Reynolds</name>
</author>
<author>
<name sortKey="Read, S" uniqKey="Read S">S. Read</name>
</author>
<author>
<name sortKey="Scaringe, S A" uniqKey="Scaringe S">S.A. Scaringe</name>
</author>
<author>
<name sortKey="Marshall, W S" uniqKey="Marshall W">W.S. Marshall</name>
</author>
<author>
<name sortKey="Khvorova, A" uniqKey="Khvorova A">A. Khvorova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mathews, D H" uniqKey="Mathews D">D.H. Mathews</name>
</author>
<author>
<name sortKey="Turner, D H" uniqKey="Turner D">D.H. Turner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eisenfeld, J" uniqKey="Eisenfeld J">J. Eisenfeld</name>
</author>
<author>
<name sortKey="Vajda, S" uniqKey="Vajda S">S. Vajda</name>
</author>
<author>
<name sortKey="Sugar, I" uniqKey="Sugar I">I. Sugar</name>
</author>
<author>
<name sortKey="Delisi, C" uniqKey="Delisi C">C. DeLisi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matranga, C" uniqKey="Matranga C">C. Matranga</name>
</author>
<author>
<name sortKey="Tomari, Y" uniqKey="Tomari Y">Y. Tomari</name>
</author>
<author>
<name sortKey="Shin, C" uniqKey="Shin C">C. Shin</name>
</author>
<author>
<name sortKey="Bartel, D P" uniqKey="Bartel D">D.P. Bartel</name>
</author>
<author>
<name sortKey="Zamore, P D" uniqKey="Zamore P">P.D. Zamore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Far, R K" uniqKey="Far R">R.K. Far</name>
</author>
<author>
<name sortKey="Nedbal, W" uniqKey="Nedbal W">W. Nedbal</name>
</author>
<author>
<name sortKey="Sczakiel, G" uniqKey="Sczakiel G">G. Sczakiel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Patzel, V" uniqKey="Patzel V">V. Patzel</name>
</author>
<author>
<name sortKey="Rutz, S" uniqKey="Rutz S">S. Rutz</name>
</author>
<author>
<name sortKey="Dietrich, I" uniqKey="Dietrich I">I. Dietrich</name>
</author>
<author>
<name sortKey="Koberle, C" uniqKey="Koberle C">C. Koberle</name>
</author>
<author>
<name sortKey="Scheffold, A" uniqKey="Scheffold A">A. Scheffold</name>
</author>
<author>
<name sortKey="Kaufmann, S H E" uniqKey="Kaufmann S">S.H.E. Kaufmann</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="pmc">nar</journal-id>
<journal-id journal-id-type="publisher-id">Nucleic Acids Research</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">17169992</article-id>
<article-id pub-id-type="pmc">1802606</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkl1065</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Computational Biology</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ladunga</surname>
<given-names>Istvan</given-names>
</name>
<xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<aff>
<institution>Center for Biotechnology and Department of Statistics, University of Nebraska–Lincoln</institution>
<addr-line>Lincoln, NE 68588-0665, USA</addr-line>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor1">
<sup>*</sup>
Tel: +1 402 472 6074; Email:
<email>sladunga@unl.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>1</month>
<year>2007</year>
</pub-date>
<pub-date pub-type="epub">
<day>14</day>
<month>12</month>
<year>2006</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>14</day>
<month>12</month>
<year>2006</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>35</volume>
<issue>2</issue>
<fpage>433</fpage>
<lpage>440</lpage>
<history>
<date date-type="received">
<day>05</day>
<month>10</month>
<year>2006</year>
</date>
<date date-type="rev-recd">
<day>14</day>
<month>11</month>
<year>2006</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>11</month>
<year>2006</year>
</date>
</history>
<permissions>
<copyright-statement>© 2006 The Author(s).</copyright-statement>
<copyright-year>2006</copyright-year>
<license license-type="openaccess">
<license-p>
<pmc-comment>CREATIVE COMMONS</pmc-comment>
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/2.0/uk/">http://creativecommons.org/licenses/by-nc/2.0/uk/</ext-link>
) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Highly accurate knockdown functional analyses based on RNA interference (RNAi) require the most complete possible hydrolysis of the targeted mRNA while avoiding the degradation of untargeted genes (off-target effects). This in turn requires significant improvements to target selection for two reasons. First, the average silencing activity of randomly selected siRNAs is as low as 62%. Second, applying more than five different siRNAs may lead to saturation of the RNA-induced silencing complex (RISC) and to the degradation of untargeted genes. Therefore, selecting a small number of highly active siRNAs is critical for maximizing knockdown and minimizing off-target effects. To satisfy these needs, a publicly available and transparent machine learning tool is presented that ranks all possible siRNAs for each targeted gene. Support vector machines (SVMs) with polynomial kernels and constrained optimization models select and utilize the most predictive effective combinations from 572 sequence, thermodynamic, accessibility and self-hairpin features over 2200 published siRNAs. This tool reaches an accuracy of 92.3% in cross-validation experiments. We fully present the underlying biophysical signature that involves free energy, accessibility and dinucleotide characteristics. We show that while complete silencing is possible at certain structured target sites, accessibility information improves the prediction of the 90% active siRNA target sites. Fast siRNA activity predictions can be performed on our web server at
<ext-link ext-link-type="uri" xlink:href="http://optirna.unl.edu/">http://optirna.unl.edu/</ext-link>
.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>INTRODUCTION</title>
<p>It is a major challenge to select those target sites where a gene can be silenced most completely. Posttranscriptional regulation can silence tens of thousands of genes to different degrees (
<xref ref-type="bibr" rid="b1">1</xref>
). This indicates that whereas a wide spectrum of target sites responds to RNA interference, the knockdown remains incomplete for most sites. In contrast to this diversity, active siRNAs have to conform to requirements specific to the RNA-induced silencing complex (RISC) (
<xref ref-type="bibr" rid="b2">2</xref>
). As indicated by the 62% average activity of randomly selected siRNAs (
<xref ref-type="bibr" rid="b3">3</xref>
), these criteria are poorly satisfied by the majority of target sites. This paradox has inspired a number of researchers to capture these criteria in heuristic rules, statistical formulations or machine learning algorithms. Tuschl and his coworkers' rules (
<xref ref-type="bibr" rid="b2">2</xref>
,
<xref ref-type="bibr" rid="b4">4</xref>
) (
<ext-link ext-link-type="uri" xlink:href="http://www.rockefeller.edu/labheads/tuschl/sirna.html">http://www.rockefeller.edu/labheads/tuschl/sirna.html</ext-link>
) specify a pattern of UU(N19)AA, limit the G + C content to a range of 30–70%, and suggest avoiding four or more consecutive A's or U's that act as terminator signals in vectors that utilize RNA polymerase III. Ui-Tei
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="b5">5</xref>
) expressed a preference for siRNAs with A/U at the 5′ end, G/C at the 3′ terminus, at least 5 A/U nucleotides in the 5′ third of the antisense strand, and the absence of any G/C runs of 9 or more nucleotides. Amarzguioui and Prydz (
<xref ref-type="bibr" rid="b6">6</xref>
) propose an A/U differential between the 5′ and 3′ trinucleotides, C/G at position 1, A at 6 and A/U at 19, while associating the motifs U1 and G19 with lack of functionality. Translating these sequence patterns to changes in Gibbs free energy (Δ
<italic>G</italic>
) shows that most sequence rules correlate highly with thermodynamic profiles (
<xref ref-type="bibr" rid="b7">7</xref>
). In contrast to the wider acceptance of the above rules, the effects of secondary structures at the target site remain debated (
<xref ref-type="bibr" rid="b2">2</xref>
). While certain structures like stable hairpins have been shown to decrease or abolish silencing efficiency (
<xref ref-type="bibr" rid="b8">8</xref>
–
<xref ref-type="bibr" rid="b10">10</xref>
), many other structures do not seem to attenuate RNAi.</p>
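<p>The simplest of the sequence heuristics summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the cited authors or from this work: the function names gc_fraction, has_long_au_run and passes_basic_rules are hypothetical, the example sequence is made up, and only two of the rules are checked (a G + C content of 30–70% and no run of four or more consecutive A's or U's).</p>
<preformat>
```python
import re


def gc_fraction(seq):
    """Return the fraction of G and C nucleotides in an RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_au_run(seq, run_length=4):
    """Return True if seq holds run_length or more consecutive A's or U's."""
    pattern = "A{%d,}|U{%d,}" % (run_length, run_length)
    return re.search(pattern, seq.upper()) is not None


def passes_basic_rules(seq):
    """Check the G + C window (30-70%) and the absence of long A or U runs."""
    gc = gc_fraction(seq)
    return gc >= 0.30 and 0.70 >= gc and not has_long_au_run(seq)


if __name__ == "__main__":
    # Made-up 21-nt sequence, used only to exercise the two checks.
    candidate = "GCAUGCAUGCAUGCAUGCAUG"
    print(round(gc_fraction(candidate), 3), passes_basic_rules(candidate))
```
</preformat>
<p>Such hard filters accept or reject a candidate outright, whereas the machine learning methods discussed next weigh many features jointly.</p>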
<p>Machine learning methods select the best targets more accurately than the heuristic rules. Key to this success is rigorous optimization over high numbers of features. Support vector machines (SVMs) (
<xref ref-type="bibr" rid="b11">11</xref>
) perform accurate binary classifications (BCs) between low- and high-activity molecules and regression analyses (
<xref ref-type="bibr" rid="b12">12</xref>
) and helped to formulate the Stockholm rules (
<xref ref-type="bibr" rid="b12">12</xref>
). Long and degenerate sequence patterns are revealed by the GPboost genetic algorithm (
<xref ref-type="bibr" rid="b13">13</xref>
). Among the artificial neural networks, BIOPREDsi (
<xref ref-type="bibr" rid="b1">1</xref>
) was trained on the largest number of siRNAs, but the method was limited to undisclosed sequence features. The neural network model of Shabalina
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="b14">14</xref>
) generated position-dependent consensus patterns from a smaller number of molecules by using both sequence and thermodynamic features. Unfortunately, these patterns remain to be disclosed.</p>
<p>Here we present a practical, freely accessible and transparent tool for the identification of target sites with over 90% knockdown activity. Our work is based on two postulates. First, we expected that optimal selection from a significantly more comprehensive set of initial features may lead to the discovery of a complex and probabilistic signature. In turn, the signature(s) may lead to more sensitive and selective predictions. That Holen (
<xref ref-type="bibr" rid="b15">15</xref>
) needed to apply as many as 73 positional mononucleotide occurrence rules in order to achieve reliable predictions supports this postulate. We have compiled the most comprehensive possible set of 572 sequence, thermodynamic and accessibility features as further direct evidence. Global and positional mono- and dinucleotide frequencies and the numbers of longer runs of each nucleotide, of C or G, and of A or U were computed. Global and positional values of Δ
<italic>G</italic>
and change in enthalpy (Δ
<italic>H</italic>
) and entropy (Δ
<italic>S</italic>
) as well as the Δ
<italic>H</italic>
/Δ
<italic>S</italic>
ratio were calculated. Multiple predictors of the target site accessibility were computed (see
<xref ref-type="table" rid="tbl1">Table 1</xref>
; Supplementary Table S1 and Materials and Methods). Each of these individual features was correlated with the activities of the 2252 siRNAs in the Novartis dataset (
<xref ref-type="bibr" rid="b1">1</xref>
) (see Materials and Methods). No Pearson correlation coefficient exceeded
<italic>r</italic>
= 0.38 and only 15 features have
<italic>r</italic>
≥ 0.2 or
<italic>r</italic>
≤ −0.2 (
<xref ref-type="table" rid="tbl2">Table 2</xref>
). Several of these latter features represent the same phenomenon. For example, the decreased stability at the 5′ terminus of the antisense strand is represented in free energy, enthalpy, mono- or dinucleotide features, such as selection against extreme negative free energy, and GG, CC, GC and CG dinucleotides. The inferior performance of individual features is an even more serious issue. This performance is measured by the large overlaps in feature distributions between ≥90% and ≤80% active siRNAs (
<xref ref-type="fig" rid="fig1">Figure 1</xref>
). Because previous machine learning methods (
<xref ref-type="bibr" rid="b1">1</xref>
,
<xref ref-type="bibr" rid="b13">13</xref>
,
<xref ref-type="bibr" rid="b14">14</xref>
,
<xref ref-type="bibr" rid="b16">16</xref>
) used considerably less representative sets of features, significant improvements over their 86% prediction accuracy can be expected. This level is not satisfactory; even when applying multiple siRNA species, the risk of incomplete silencing remains substantial. However, training a new method on 572 features with only 2252 siRNAs in the Novartis dataset would have led to overtraining, i.e. inferior performance on independent test sets. To avoid this, we applied constrained optimization models and SVMs for the optimal selection of a considerably smaller subset of features with the highest combined predictive value. We accomplished this objective by iteratively solving the models below with a stepwise elimination of the feature(s) using different methods. The comparability of diverse features was ensured by standardization to zero mean and unit SD.</p>
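<p>The single-feature screening described above can be illustrated with a minimal sketch. It assumes each feature is stored as a column of numbers aligned with the measured activities; the function names, the toy feature names and the toy activities are hypothetical and are not taken from the Novartis dataset.</p>

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(feature_columns, activities, threshold=0.2):
    """Return {name: r} for features whose |r| reaches the threshold."""
    selected = {}
    for name, column in feature_columns.items():
        r = pearson_r(column, activities)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical toy data: four siRNAs and two made-up features.
toy_features = {
    "U_at_position_1": [1, 0, 1, 0],
    "constant_feature": [1, 1, 1, 1],
}
toy_activities = [95.0, 60.0, 90.0, 55.0]
strong = screen_features(toy_features, toy_activities)
```

<p>The default threshold of 0.2 mirrors the cutoff quoted in the text; any other value could be passed instead.</p>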
<table-wrap id="tbl1" position="float">
<label>Table 1</label>
<caption>
<p>Overview of the 572 sequence, thermodynamic and accessibility features of the siRNAs</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Both global and positional features:</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Δ
<italic>G</italic>
, Δ
<italic>H</italic>
and Δ
<italic>S</italic>
during the transition from double-stranded to single-stranded state of the RNA (
<xref ref-type="bibr" rid="b18">18</xref>
);</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • The ratio Δ
<italic>H</italic>
/Δ
<italic>S</italic>
as above;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Average probabilities of target site positions to form secondary structures (mono-, di- and tetranucleotides);</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • G + C content.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Global features covering the complete antisense strand:</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Δ
<italic>G</italic>
during complex formation between the siRNA and the target mRNA;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Relative frequencies of mono- or dinucleotides;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Relative frequencies of homotri- and tetranucleotides;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Maximal length of the G/C runs;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Minimal free energy of the secondary structures at the mRNA target site;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Melting temperature of the double-stranded siRNA;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • The probability and Δ
<italic>G</italic>
of forming a self-hairpin;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Position of the target locus at the mRNA relative to the translation initiation site;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Concentration of the siRNA.</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Features specific to each position of the antisense strand:</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Presence or absence of mono- and dinucleotides;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Presence of G or C mononucleotides;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Probability of the target site positions to form secondary structures;</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">    • Change in free energy during complex formation between the siRNA and the target mRNA.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Both global and positional features were used. SVM and constrained optimization methods performed the iterative selection of the most predictive features shown in Table 2 and Supplementary Table S1.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="tbl2" position="float">
<label>Table 2</label>
<caption>
<p>The predictive performance of features</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th colspan="6" align="left" rowspan="1">Predictive performance</th>
</tr>
<tr>
<th colspan="3" align="left" rowspan="1">Individual
<sup>a</sup>
</th>
<th colspan="3" align="left" rowspan="1">Combined
<sup>b</sup>
</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">Feature</th>
<th align="left" rowspan="1" colspan="1">Position</th>
<th align="left" rowspan="1" colspan="1">
<italic>r</italic>
</th>
<th align="left" rowspan="1" colspan="1">Feature</th>
<th align="left" rowspan="1" colspan="1">Position</th>
<th align="left" rowspan="1" colspan="1">Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">1–2</td>
<td align="left" rowspan="1" colspan="1">0.38</td>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="char" char="." rowspan="1" colspan="1">0.146</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">0.36</td>
<td align="left" rowspan="1" colspan="1">CC</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="char" char="." rowspan="1" colspan="1">−0.134</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">G</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.31</td>
<td align="left" rowspan="1" colspan="1">
<italic>p</italic>
<sub>3</sub>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="char" char="." rowspan="1" colspan="1">−0.128</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>H</italic>
</td>
<td align="left" rowspan="1" colspan="1">1–2</td>
<td align="left" rowspan="1" colspan="1">0.30</td>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="char" char="." rowspan="1" colspan="1">0.109</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>S</italic>
</td>
<td align="left" rowspan="1" colspan="1">1–2</td>
<td align="left" rowspan="1" colspan="1">0.27</td>
<td align="left" rowspan="1" colspan="1">Δ
<italic>H</italic>
</td>
<td align="left" rowspan="1" colspan="1">18–19</td>
<td align="char" char="." rowspan="1" colspan="1">−0.107</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.26</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">19</td>
<td align="char" char="." rowspan="1" colspan="1">−0.099</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.25</td>
<td align="left" rowspan="1" colspan="1">G</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="char" char="." rowspan="1" colspan="1">−0.094</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UU</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">0.23</td>
<td align="left" rowspan="1" colspan="1">UU</td>
<td align="left" rowspan="1" colspan="1">18–19</td>
<td align="char" char="." rowspan="1" colspan="1">−0.086</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">G</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">−0.22</td>
<td align="left" rowspan="1" colspan="1">Δ
<italic>H</italic>
</td>
<td align="left" rowspan="1" colspan="1">20–21</td>
<td align="char" char="." rowspan="1" colspan="1">0.084</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>H</italic>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.22</td>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="char" char="." rowspan="1" colspan="1">0.068</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>ΔG</italic>
</td>
<td align="left" rowspan="1" colspan="1">3–5 − 19–21</td>
<td align="left" rowspan="1" colspan="1">0.21</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="char" char="." rowspan="1" colspan="1">0.066</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>ΔG</italic>
</td>
<td align="left" rowspan="1" colspan="1">1–3 − 19–21</td>
<td align="left" rowspan="1" colspan="1">0.21</td>
<td align="left" rowspan="1" colspan="1">AU</td>
<td align="left" rowspan="1" colspan="1">6–7</td>
<td align="char" char="." rowspan="1" colspan="1">−0.063</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>S</italic>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.21</td>
<td align="left" rowspan="1" colspan="1">AA</td>
<td align="left" rowspan="1" colspan="1">17–18</td>
<td align="char" char="." rowspan="1" colspan="1">−0.059</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GG</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.20</td>
<td align="left" rowspan="1" colspan="1">GG</td>
<td align="left" rowspan="1" colspan="1">20–21</td>
<td align="char" char="." rowspan="1" colspan="1">0.058</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GC</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.20</td>
<td align="left" rowspan="1" colspan="1">AA</td>
<td align="left" rowspan="1" colspan="1">18–19</td>
<td align="char" char="." rowspan="1" colspan="1">−0.056</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UA</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.18</td>
<td align="left" rowspan="1" colspan="1">AU</td>
<td align="left" rowspan="1" colspan="1">9–10</td>
<td align="char" char="." rowspan="1" colspan="1">−0.055</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">3–4</td>
<td align="char" char="." rowspan="1" colspan="1">0.055</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">C</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.17</td>
<td align="left" rowspan="1" colspan="1">C</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="char" char="." rowspan="1" colspan="1">−0.054</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GG</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">−0.17</td>
<td align="left" rowspan="1" colspan="1">GG</td>
<td align="left" rowspan="1" colspan="1">16–17</td>
<td align="char" char="." rowspan="1" colspan="1">−0.053</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>ΔG</italic>
</td>
<td align="left" rowspan="1" colspan="1">1–5 − 17–21</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">CG</td>
<td align="left" rowspan="1" colspan="1">1–2</td>
<td align="char" char="." rowspan="1" colspan="1">−0.052</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">18</td>
<td align="left" rowspan="1" colspan="1">−0.17</td>
<td align="left" rowspan="1" colspan="1">AG</td>
<td align="left" rowspan="1" colspan="1">20–21</td>
<td align="char" char="." rowspan="1" colspan="1">0.052</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">13</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">G</td>
<td align="left" rowspan="1" colspan="1">14</td>
<td align="char" char="." rowspan="1" colspan="1">−0.050</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>G</italic>
</td>
<td align="left" rowspan="1" colspan="1">2</td>
<td align="left" rowspan="1" colspan="1">0.17</td>
<td align="left" rowspan="1" colspan="1">UG</td>
<td align="left" rowspan="1" colspan="1">4–5</td>
<td align="char" char="." rowspan="1" colspan="1">−0.049</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">GC</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">−0.16</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">20</td>
<td align="char" char="." rowspan="1" colspan="1">−0.047</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CC</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">−0.16</td>
<td align="left" rowspan="1" colspan="1">UG</td>
<td align="left" rowspan="1" colspan="1">20–21</td>
<td align="char" char="." rowspan="1" colspan="1">0.046</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">UU</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">0.16</td>
<td align="left" rowspan="1" colspan="1">CC</td>
<td align="left" rowspan="1" colspan="1">13–14</td>
<td align="char" char="." rowspan="1" colspan="1">−0.044</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CG</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.16</td>
<td align="left" rowspan="1" colspan="1">GU</td>
<td align="left" rowspan="1" colspan="1">5–6</td>
<td align="char" char="." rowspan="1" colspan="1">0.040</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">19</td>
<td align="left" rowspan="1" colspan="1">−0.16</td>
<td align="left" rowspan="1" colspan="1">A</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="char" char="." rowspan="1" colspan="1">0.039</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Δ
<italic>H</italic>
/Δ
<italic>S</italic>
</td>
<td align="left" rowspan="1" colspan="1">All</td>
<td align="left" rowspan="1" colspan="1">−0.15</td>
<td align="left" rowspan="1" colspan="1">CC</td>
<td align="left" rowspan="1" colspan="1">20–21</td>
<td align="char" char="." rowspan="1" colspan="1">−0.036</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CC</td>
<td align="left" rowspan="1" colspan="1">1</td>
<td align="left" rowspan="1" colspan="1">−0.15</td>
<td align="left" rowspan="1" colspan="1">U</td>
<td align="left" rowspan="1" colspan="1">7</td>
<td align="char" char="." rowspan="1" colspan="1">0.035</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>Weights were optimized by an SVM with linear kernel. The absolute value of the weight indicates the contribution of that feature to the prediction in the linear kernel limited to 30 features. Note that the practical predictions use 142 features, shown in Supplementary Table S1 online.
<italic>p</italic>
<sub>3</sub>
is the probability that each base of the tetranucleotide (
<italic>i</italic>
,
<italic>i</italic>
+ 1,
<italic>i</italic>
+ 2,
<italic>i</italic>
+ 3) is paired as predicted by the
<italic>sfold</italic>
algorithm.</p>
</fn>
<fn>
<p>
<sup>a</sup>
The 30 features with the strongest correlations to siRNA activity in the Novartis dataset.</p>
</fn>
<fn>
<p>
<sup>b</sup>
Features that
<italic>in combination</italic>
account for the most accurate predictions of the siRNA knockdown activity.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<fig id="fig1" position="float">
<label>Figure 1</label>
<caption>
<p>Overlap between the distributions of Δ
<italic>G</italic>
at positions 1 and 2, global G + C content and U content between siRNAs with >90% (full-lines) and <80% activity (dotted lines) in the Novartis dataset (
<xref ref-type="bibr" rid="b1">1</xref>
).</p>
</caption>
<graphic xlink:href="gkl1065f1"></graphic>
</fig>
</sec>
<sec sec-type="materials|methods">
<title>MATERIALS AND METHODS</title>
<p>The comparability of the conditions of RNAi experiments underlying the prediction methods has to be ensured. Only experiments with a single siRNA species are useful to us since it is difficult to discern the effects of individual molecules from multi-siRNA experiments. Comparability may be violated by using 19mers (
<xref ref-type="bibr" rid="b3">3</xref>
) instead of 21mers (
<xref ref-type="bibr" rid="b1">1</xref>
). Knockdown activity has to be measured at the same time following transfection while maintaining similar cellular concentrations of siRNAs. The latter requirement can be approximated by using identical cell lines, transfection agents and extracellular siRNA concentrations. These criteria are satisfied in two large datasets known to us. First, activities and sequences of 2252 siRNAs targeted to 34 mRNA species were obtained from a Novartis study (
<xref ref-type="bibr" rid="b1">1</xref>
). These 21mers included two deoxynucleotide overhangs at the antisense strand complementary to the mRNA. NCI-H1299 and HeLa cells were transfected using combined Lipofectamine™ and Oligofectamine™ agents. Second, 240 19mer siRNA molecules designed to silence human or humanized targets were taken from Dharmacon (
<xref ref-type="bibr" rid="b3">3</xref>
). While this study targeted as few as eight genes, a major advantage is that all experiments were conducted in HEK293 cells using Lipofectamine™ maintained at 95% transfection efficacy or higher, and the siRNA concentration was held constant at 100 nM. Knockdown activity was measured after 24 h. Holen's (
<xref ref-type="bibr" rid="b15">15</xref>
) collection of 176 additional siRNAs and the database published by Sætrom (
<xref ref-type="bibr" rid="b17">17</xref>
) were also analyzed.</p>
<sec>
<title>Features</title>
<p>SVMs and constrained optimization methods effectively selected the optimal subset of features from several hundred initial features in reasonable central processing unit (CPU) time. This allowed us to select from an unprecedented set of 572 sequence, thermodynamic and target accessibility features (
<xref ref-type="table" rid="tbl1">Table 1</xref>
). Sequence features included the global frequencies of mono- and dinucleotides and the presence or absence of mono- and dinucleotides at each of the 21 positions. Longer runs of identical bases were also considered since homotri- and tetranucleotides can act as termination signals for the RNA polymerase III enzyme used in certain vectors. Thermodynamic features, including the Gibbs free energy (Δ
<italic>G</italic>
), enthalpy (Δ
<italic>H</italic>
) and entropy (Δ
<italic>S</italic>
) differentials, and the Δ
<italic>H</italic>
/Δ
<italic>S</italic>
ratio, which is the major determinant of
<italic>T</italic>
<sub>m</sub>
(melting point), were calculated according to Xia
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="b18">18</xref>
). Their derivative feature is the thermodynamic differential between the 5′ ends of the antisense and sense strands, which has been proposed as a distinctive feature of potent siRNAs (
<xref ref-type="bibr" rid="b7">7</xref>
). Δ
<italic>G</italic>
and the number of hydrogen-bonded nucleotide pairs characterize self-hairpins that can obstruct duplex formation. These features were predicted as described in (
<xref ref-type="bibr" rid="b19">19</xref>
). Target accessibility predictions require Bayesian sampling from a large number of alternative mRNA structures. The probability that the mRNA forms secondary structures and the free energy of these structures were calculated by the
<italic>sfold</italic>
tool (
<xref ref-type="bibr" rid="b20">20</xref>
–
<xref ref-type="bibr" rid="b22">22</xref>
) implemented at
<ext-link ext-link-type="uri" xlink:href="http://sfold.wadswort.org">http://sfold.wadswort.org</ext-link>
.</p>
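<p>The sequence features listed above can be sketched as follows. This is a minimal illustration that covers only global k-mer frequencies, positional presence or absence indicators and the longest G/C run; the thermodynamic, self-hairpin and accessibility features are not reproduced here. The function names and the 21-nt sequence are hypothetical.</p>

```python
from collections import Counter

BASES = "ACGU"


def global_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers (k=1 mono, k=2 di)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return {kmer: count / total for kmer, count in counts.items()}


def positional_presence(seq):
    """Binary indicators: 1 if a base occupies a 1-based position, else 0."""
    return {
        f"{base}@{pos}": int(seq[pos - 1] == base)
        for pos in range(1, len(seq) + 1)
        for base in BASES
    }


def longest_run(seq, alphabet):
    """Length of the longest run made only of letters in `alphabet`."""
    best = current = 0
    for base in seq:
        current = current + 1 if base in alphabet else 0
        best = max(best, current)
    return best


# Hypothetical 21-nt antisense strand, invented for illustration only.
antisense = "UUGACGGCAUCGAUUCAGGAA"
mono = global_frequencies(antisense, 1)
di = global_frequencies(antisense, 2)
presence = positional_presence(antisense)
gc_run = longest_run(antisense, "GC")
```

<p>Positional dinucleotide indicators could be added in the same way by sliding a window of two bases along the strand.</p>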
<p>Feature selection required that the feature distributions be comparable. Therefore, for the constrained optimization methods, feature values were standardized to a mean of zero and an SD of unity. For SVMs, feature values were normalized to the interval [0,1].</p>
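<p>The two rescaling schemes can be sketched as below. The sketch assumes the population SD for standardization, since the text does not state whether the population or the sample SD was used, and it maps constant columns to zero as an arbitrary convention. The free energy values are hypothetical.</p>

```python
import statistics


def standardize(values):
    """Rescale to zero mean and unit SD (population SD is an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / sd for v in values]


def min_max(values):
    """Rescale linearly to the interval [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical free energy values (kcal/mol) for four siRNAs.
delta_g = [-1.0, -2.5, -0.5, -3.0]
z_scores = standardize(delta_g)   # input for constrained optimization
unit_range = min_max(delta_g)     # input for SVMs
```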
</sec>
<sec>
<title>Methods</title>
<p>We applied existing and created new machine learning methods for feature selection and predictions. Constrained optimization (mathematical programming or operations research) (
<xref ref-type="bibr" rid="b23">23</xref>
) is a powerful mathematical tool for maximizing or minimizing an objective function. Here we perform the optimal allocation of the regression plane to minimize the sum of deviations from this plane. Constrained optimization finds the globally optimal solution for a very large set of equations or inequalities in practically polynomial time (
<xref ref-type="bibr" rid="b24">24</xref>
).</p>
<p>SVMs are supervised learning methods used for classification and regression (
<xref ref-type="bibr" rid="b25">25</xref>
). SVMs transform the original data with nonlinear relationships into a higher-dimensional space to allow linear regression. SVMs have provided solutions to numerous biological problems as reviewed in Camps-Valls
<italic>et al</italic>
. (
<xref ref-type="bibr" rid="b12">12</xref>
). Support vectors were generated by the core vector machine (
<xref ref-type="bibr" rid="b26">26</xref>
) and the SVMlight (
<xref ref-type="bibr" rid="b27">27</xref>
) packages using linear, polynomial and Gaussian radial basis function kernels. To assess the robustness of the predictions and the underlying features, we implemented fundamentally different methods using constrained optimization. First, we created a BC model to separate above-average (>70% knockdown) siRNAs from those with <60% activity. A nontraditional multivariate regression was performed for the molecules predicted as above-average. Experimenting with other cutoffs for high- and low-activity siRNAs resulted in lower accuracy in the combined BC-MVR cross-validation analyses (data not shown).</p>
<p>Robust BC is performed by the iterative elimination of features and misclassified objects (
<xref ref-type="bibr" rid="b28">28</xref>
), a highly reliable method for feature selection, applying Misclassification Minimization models (
<xref ref-type="bibr" rid="b29">29</xref>
). The score
<italic>z
<sub>s</sub>
</italic>
for each sequence
<italic>s</italic>
is defined as the optimally weighted sum of values of the features
<italic>f</italic>
in the set of all features
<italic>F</italic>
:
<disp-formula id="e1">
<label>1</label>
<mml:math id="M1">
<mml:mrow>
<mml:msub>
<mml:mi>z</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>·</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>s</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
where
<italic>w
<sub>f</sub>
</italic>
is the weight for feature
<italic>f</italic>
. Scores for the highly active molecules are expected to exceed the scores of less active molecules by a value not less than a positive threshold parameter δ, which is the width of the separating zone between the two classes. Increasing δ improves the robustness of the solution: when predicting untrained molecules, we can reduce the number of misclassified molecules. This comes at the cost of increasing the number of unpredicted molecules since scores within the separating zone are not significant enough to classify the underlying siRNA.</p>
<p>The sets of above-average and low-activity siRNAs are linearly inseparable. To make the solution of the model feasible, nonnegative error variables ɛ
<italic>
<sub>h</sub>
</italic>
are introduced for each sequence
<italic>h</italic>
in the set
<italic>H</italic>
of sequences with experimentally determined high activity:
<disp-formula id="e2">
<label>2</label>
<mml:math id="M2">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>h</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
<mml:mo>≥</mml:mo>
<mml:mi>γ</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>δ</mml:mi>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
where the geometric interpretation of γ is the intersection with the vertical axis. For each sequence
<italic>l</italic>
in the set
<italic>L</italic>
of low-activity sequences we require that
<disp-formula id="e3">
<label>3</label>
<mml:math id="M3">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
<mml:mo>≤</mml:mo>
<mml:mi>γ</mml:mi>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>The sum of absolute values of weights
<italic>w
<sub>f</sub>
</italic>
must be limited to keep the model from growing unbounded:
<disp-formula id="e4">
<label>4</label>
<mml:math id="M4">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Here ‖
<italic>w</italic>
‖
<sub>1</sub>
is the standard mathematical notation for the sum of the absolute values (first norm). We solve the system of the above inequalities and equations to minimize the sum of the error variables ɛ
<italic>
<sub>h</sub>
</italic>
.
<disp-formula id="e5">
<label>5</label>
<mml:math id="M5">
<mml:mrow>
<mml:mo>min</mml:mo>
<mml:mfrac>
<mml:mi>λ</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>H</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>·</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>h</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>H</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>h</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:mi>λ</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>n</mml:mi>
<mml:mi>L</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mfrac>
<mml:mo>·</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>l</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>ψ</mml:mi>
<mml:mo>·</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
</p>
<p>Here the user-defined parameter 0 < λ < 1 fine-tunes the balance between sensitivity and selectivity. When λ is set to a value higher than 0.5, errors related to above-average activity molecules are decreased by allowing more errors in the low-activity molecules.
<italic>n
<sub>H</sub>
</italic>
and
<italic>n
<sub>L</sub>
</italic>
are the number of the above-average and low-activity molecules in the training set, respectively. ψ is a small factor necessary for the calculation of the absolute values of the weights.</p>
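<p>To make the role of λ, the class sizes and ψ concrete, the following sketch evaluates the objective of Equation 5 for a fixed, hypothetical weight vector. It does not solve the optimization, which the text delegates to a linear programming package. It assumes that the error of a high-activity molecule is the shortfall of its score below γ + δ and that the error of a low-activity molecule is the excess of its score above γ; this sign convention and all numerical values are assumptions.</p>

```python
def bc_objective(weights, gamma, delta, lam, psi, high_rows, low_rows):
    """Evaluate the weighted average error plus the L1 penalty."""

    def score(row):
        return sum(w * r for w, r in zip(weights, row))

    # Smallest nonnegative errors that satisfy each class constraint.
    eps_high = [max(0.0, gamma + delta - score(row)) for row in high_rows]
    eps_low = [max(0.0, score(row) - gamma) for row in low_rows]
    l1_norm = sum(abs(w) for w in weights)
    value = (
        lam / len(high_rows) * sum(eps_high)
        + (1 - lam) / len(low_rows) * sum(eps_low)
        + psi * l1_norm
    )
    return value, eps_high, eps_low


# Hypothetical standardized feature rows for two molecules per class.
toy_high = [[1.0, 0.0], [0.2, 0.5]]
toy_low = [[-1.0, 0.5], [0.5, 0.0]]
objective, errors_high, errors_low = bc_objective(
    weights=[0.6, -0.4], gamma=0.0, delta=0.2, lam=0.5, psi=0.01,
    high_rows=toy_high, low_rows=toy_low,
)
```

<p>Raising lam above 0.5 in this sketch increases the cost of errors on the high-activity rows relative to the low-activity rows, in line with the description of λ.</p>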
<p>Solving the above system of linear inequalities by constrained optimization packages (e.g. CPLEX from ILOG, Incline Village, Nevada) leads to the minimization of errors by selecting the optimal values for the weights
<italic>w
<sub>f</sub>
</italic>
and the additive variable γ. Provided that the model has a unique, globally optimal solution, any of the simplex, dual or barrier algorithms (
<xref ref-type="bibr" rid="b23">23</xref>
,
<xref ref-type="bibr" rid="b30">30</xref>
) finds it in practically polynomial time (
<xref ref-type="bibr" rid="b24">24</xref>
).</p>
<p>Note that the solution for the above model is more sensitive to a few large errors than to several smaller ones. Incorrect experimental measurements of the knockdown activity may considerably exceed the magnitude of real prediction errors. Such incorrect input data may dislocate the separating zone, resulting in an unjustifiably large number of misclassified molecules. We reduce this effect by iteratively eliminating the siRNA with the largest error in the previous optimization. The saved basic solution allows solving the model about ten times faster than the first time. This is the key to the computational feasibility of several hundred iterations during feature selection (
<xref ref-type="bibr" rid="b28">28</xref>
).</p>
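<p>The iterative removal of the worst-fitting siRNA can be sketched generically. In this illustration the model is a stand-in (the mean of the remaining values) rather than the constrained optimization model, and the warm start from a saved basic solution is not reproduced. All names and values are hypothetical.</p>

```python
def trim_largest_errors(samples, fit, error, n_rounds):
    """Refit repeatedly, each time dropping the sample with the largest error."""
    kept = list(samples)
    removed = []
    for _ in range(n_rounds):
        model = fit(kept)
        worst = max(range(len(kept)), key=lambda i: error(model, kept[i]))
        removed.append(kept.pop(worst))
    return fit(kept), kept, removed


def mean_fit(values):
    """Stand-in model: the arithmetic mean of the remaining values."""
    return sum(values) / len(values)


def abs_error(model, value):
    """Absolute deviation of one sample from the stand-in model."""
    return abs(value - model)


# Hypothetical activities with one suspicious measurement (50.0).
toy_samples = [10.0, 11.0, 9.0, 10.0, 50.0]
final_model, kept_samples, dropped = trim_largest_errors(
    toy_samples, mean_fit, abs_error, n_rounds=1
)
```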
<p>For the numerical prediction of the knockdown activities, brute force traditional multivariate regression analysis has limited utility due to the high number of features. Robust Regression (
<xref ref-type="bibr" rid="b31">31</xref>
) was not as accurate as constrained optimization methods or SVMs (data not shown). In our regression model, for each sequence
<italic>s</italic>
, we minimize the absolute value distance from the regression plane:</p>
<disp-formula id="e6">
<label>6</label>
<mml:math id="M6">
<mml:mrow>
<mml:msub>
<mml:mi>a</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>f</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>F</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>w</mml:mi>
<mml:mi>f</mml:mi>
</mml:msub>
<mml:mo>·</mml:mo>
<mml:msub>
<mml:mi>r</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>f</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:mi>γ</mml:mi>
</mml:mrow>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where
<italic>a
<sub>s</sub>
</italic>
is the experimentally determined knockdown activity of molecule
<italic>s</italic>
. Now we minimize the sum of the error variables ɛ
<italic>
<sub>s</sub>
</italic>
and the sum of the absolute values of the
<italic>w
<sub>f</sub>
</italic>
weights:</p>
<disp-formula id="e7">
<label>7</label>
<mml:math id="M7">
<mml:mrow>
<mml:mo>min</mml:mo>
<mml:mstyle displaystyle="true">
<mml:munder>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi>S</mml:mi>
</mml:mrow>
</mml:munder>
</mml:mstyle>
<mml:mrow>
<mml:msub>
<mml:mi>ɛ</mml:mi>
<mml:mi>s</mml:mi>
</mml:msub>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>ψ</mml:mi>
<mml:mo>·</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>‖</mml:mo>
<mml:mi>w</mml:mi>
<mml:mo>‖</mml:mo>
</mml:mrow>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>Here ψ is a small factor for the contribution of absolute values.</p>
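<p>The regression objective can likewise be evaluated for a fixed weight vector. The sketch below assumes that the predicted activity is the weighted feature sum minus γ and that the error of each molecule is the absolute deviation between measured and predicted activity; both are assumptions made for illustration, and the numbers are hypothetical. Finding the optimal weights would again require a linear programming solver.</p>

```python
def mvr_objective(weights, gamma, psi, rows, activities):
    """Sum of absolute deviations plus the L1 penalty on the weights."""
    residuals = []
    for row, measured in zip(rows, activities):
        predicted = sum(w * r for w, r in zip(weights, row)) - gamma
        residuals.append(abs(measured - predicted))
    l1_norm = sum(abs(w) for w in weights)
    return sum(residuals) + psi * l1_norm, residuals


# Hypothetical feature rows and measured activities for three molecules.
toy_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_measured = [3.0, 1.0, 2.5]
value, deviations = mvr_objective(
    weights=[2.0, -1.0], gamma=-1.0, psi=0.1,
    rows=toy_rows, activities=toy_measured,
)
```

<p>Because deviations enter linearly rather than squared, a single large residual influences this objective less than it would in least-squares regression.</p>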
<p>Feature (property or variable) selection has emerged as a highly successful new technique (
<xref ref-type="bibr" rid="b32">32</xref>
) for finding those biological or physical features that indicate or cause a certain effect; e.g. a disease. Selecting the most predictive features by traditional manual methods from among several hundred initial features over thousands of observations is prohibitively time-consuming. Fortunately, machine learning tools can perform such complex tasks in short processor time. Examples include differentially expressed genes as indicators and/or causative agents of cancer (
<xref ref-type="bibr" rid="b33">33</xref>
), semi-supervised learning for molecular profiling (
<xref ref-type="bibr" rid="b34">34</xref>
), optimal selection of hydrophobicity-related, structural and other features determining protein secretion signals (
<xref ref-type="bibr" rid="b28">28</xref>
), physicochemical descriptors to discriminate protein–protein interactions (
<xref ref-type="bibr" rid="b35">35</xref>
), and automatic parsing of the biomedical literature (
<xref ref-type="bibr" rid="b36">36</xref>
). These studies revealed diagnostic combinations of features that frequently constituted some important biological signature. Feature selection also reduces overtraining. This is a fundamental issue when we do not have 5–10 times more observations than features (
<xref ref-type="bibr" rid="b32">32</xref>
).</p>
<p>For linear SVMs and constrained optimization models, we use a weight-based feature elimination algorithm (
<xref ref-type="bibr" rid="b28">28</xref>
). For comparability with related algorithms below, we abbreviate this algorithm as WFE. A feature's weight is proportional to its contribution to the prediction (
<xref ref-type="disp-formula" rid="e2">Equations 2</xref>
and
<xref ref-type="disp-formula" rid="e3">3</xref>
). Features with zero weights do not contribute to the model and therefore should be eliminated. In each of the subsequent iterations, the feature with the lowest absolute weight is eliminated. This iteration is repeated until the number of features reaches a user-specified limit and the cross-validation accuracy decreases. Fortunately, the
<italic>w
<sub>f</sub>
</italic>
feature weights are transparent in constrained optimization models. In SVMs with linear kernels,
<italic>w
<sub>f</sub>
</italic>
= ∑
<italic>
<sub>v</sub>
a
<sub>v</sub>
r
<sub>f,v</sub>
</italic>
, where
<italic>a
<sub>v</sub>
</italic>
is the Lagrangian multiplier of support vector
<italic>v</italic>
and
<italic>r
<sub>f,v</sub>
</italic>
is the normalized value of feature
<italic>f</italic>
in support vector
<italic>v</italic>
(
<xref ref-type="bibr" rid="b37">37</xref>
). For the compatibility of features measured in different units, feature values are normalized in SVMs since SVMlight (
<xref ref-type="bibr" rid="b38">38</xref>
) and similar implementations limit feature values to the [0,1] interval. In constrained optimization, we standardize feature values to zero mean and unit SD. Standardization is less sensitive to a few outliers than the above normalization.</p>
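<p>The scaling steps, the linear-kernel weight formula and the elimination loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published code: all function names are hypothetical, the population SD is assumed for standardization, the callable fit stands in for retraining a model on the surviving features, and the stopping rule is simplified to a feature count without the cross-validation check.</p>
<preformat>
```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale one feature to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale one feature to zero mean and unit SD (population SD assumed)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def linear_svm_weights(multipliers, support_vectors):
    """w_f = sum over support vectors v of a_v * r_{f,v}."""
    n_features = len(support_vectors[0])
    return [
        sum(a * sv[f] for a, sv in zip(multipliers, support_vectors))
        for f in range(n_features)
    ]


def weight_based_elimination(features, fit, min_features):
    """Repeatedly drop the feature with the smallest absolute weight.

    fit(names) is a hypothetical callable that retrains the model on the
    surviving features and returns a dict mapping each name to its weight.
    """
    surviving = list(features)
    while len(surviving) > min_features:
        weights = fit(surviving)
        weakest = min(surviving, key=lambda name: abs(weights[name]))
        surviving.remove(weakest)
    return surviving


# A single large outlier compresses the other normalized values toward 0,
# whereas standardized values are not confined to a fixed interval.
raw = [1.0, 2.0, 3.0, 100.0]
print(min_max_normalize(raw))
print(standardize(raw))
```
</preformat>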
<p>For nonlinear SVMs, the effect of leaving out a feature on the objective function is more informative than the weight itself (
<xref ref-type="bibr" rid="b39">39</xref>
). This justifies the computationally much more intensive recursive feature elimination (RFE) (
<xref ref-type="bibr" rid="b33">33</xref>
) method. In every iteration, a leave-one-out procedure is performed for each of the surviving features, and the feature with the smallest effect on the objective function is removed.</p>
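<p>A minimal sketch of this elimination loop is given here for illustration. It assumes a hypothetical callable objective that returns the value of the objective function for a model trained on a given feature subset; the function names and the toy objective are not taken from the original work.</p>
<preformat>
```python
def recursive_feature_elimination(features, objective, min_features):
    """Drop, one at a time, the feature whose removal changes the objective least.

    objective(names) is a hypothetical callable that trains a model on the
    given feature names and returns the value of its objective function.
    """
    surviving = list(features)
    while len(surviving) > min_features:
        baseline = objective(surviving)

        def effect(name):
            # Leave one feature out and measure the change in the objective.
            reduced = [other for other in surviving if other != name]
            return abs(objective(reduced) - baseline)

        least_important = min(surviving, key=effect)
        surviving.remove(least_important)
    return surviving


# Toy additive objective used only to exercise the loop.
importance = {"x": 3.0, "y": 0.1, "z": 1.0}
toy_objective = lambda names: sum(importance[n] for n in names)
print(recursive_feature_elimination(["x", "y", "z"], toy_objective, 1))
```
</preformat>
<p>Each iteration calls the objective once for every surviving feature, in addition to the baseline evaluation. This is why the procedure is much more expensive than reading the weights from a single trained model.</p>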
</sec>
<sec>
<title>Validation</title>
<p>Ten independent cross-validation experiments were used. In each experiment, the Novartis data were divided into a training set and a test set of equal size using a random number generator. siRNAs with 16 or more identities were eliminated. Blind tests were performed using a large enough dataset (either the Novartis or the Dharmacon data) for training and any other set for testing.</p>
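<p>The repeated random division into training and test halves can be sketched as follows. The function names, the seed and the placeholder identifiers are assumptions made for illustration; the random number generator actually used is not specified here, and the identity filter is not reproduced.</p>
<preformat>
```python
import random


def half_split(items, rng):
    """Randomly divide items into two halves (sizes differ by one if odd)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def repeated_half_splits(items, n_experiments=10, seed=0):
    """Return n_experiments independent (training, test) partitions."""
    rng = random.Random(seed)
    return [half_split(items, rng) for _ in range(n_experiments)]


# Placeholder identifiers standing in for the siRNAs of one dataset.
sirna_ids = ["siRNA_%d" % i for i in range(8)]
for training, test in repeated_half_splits(sirna_ids):
    print(len(training), len(test))
```
</preformat>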
</sec>
</sec>
<sec sec-type="results">
<title>RESULTS</title>
<p>Predictions with 92.3% accuracy were achieved by SVMs with a polynomial kernel using WFE (
<xref ref-type="bibr" rid="b28">28</xref>
) in 10× cross-validation experiments (
<xref ref-type="fig" rid="fig2">Figures 2</xref>
and
<xref ref-type="fig" rid="fig3">3</xref>
). This accuracy is defined as 100 minus the average percentage difference between predicted and observed knockdown activities. SVMs with a Gaussian radial basis function or a linear kernel provided less accurate predictions than those with the polynomial kernel. BC between <60% and >70% active siRNAs was 94% accurate. Here we set the parameter λ to 0.35 to reduce false positives. The subsequent MVR on the >70% active molecules was ∼95% accurate. Altogether, the BC-MVR combination predicted 89% of the ≥90% active siRNAs with a 12% false-positive rate. Regressing 19mers [from the Dharmacon (
<xref ref-type="bibr" rid="b3">3</xref>
), Holen's (
<xref ref-type="bibr" rid="b15">15</xref>
) and Sætrom's (
<xref ref-type="bibr" rid="b17">17</xref>
) sets] by any method trained on 21mers with deoxynucleotide overhangs in the Novartis set (
<xref ref-type="bibr" rid="b1">1</xref>
) or vice versa reduced the accuracy to 78% or lower (data not shown). Supplementing the missing two nucleotides did not lead to significant improvement.</p>
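<p>The accuracy measure defined above can be written as a short function. This sketch assumes that activities are expressed on a 0 to 100 percentage scale and that the percentage difference is the absolute difference between the predicted and the observed activity; both are interpretations, and the function name and example values are hypothetical.</p>
<preformat>
```python
def accuracy(predicted, observed):
    """100 minus the mean absolute difference of activities given in percent."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    differences = [abs(p - o) for p, o in zip(predicted, observed)]
    return 100.0 - sum(differences) / len(differences)


# Hypothetical predicted and observed knockdown activities (percent).
print(accuracy([90.0, 60.0], [80.0, 65.0]))
```
</preformat>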
<fig id="fig2" position="float">
<label>Figure 2</label>
<caption>
<p>The accuracy of SVM using different kernels and constrained optimization methods as functions of the number of features. Results of 10× cross-validation experiments (see Materials and Methods) are shown. Note that constrained optimization eliminated all but 72 features in the first iteration.</p>
</caption>
<graphic xlink:href="gkl1065f2"></graphic>
</fig>
<fig id="fig3" position="float">
<label>Figure 3</label>
<caption>
<p>Observed versus predicted activities in the Novartis dataset (
<xref ref-type="bibr" rid="b1">1</xref>
). Predictions were performed by the polynomial kernel SVM using the 142 features shown in Supplementary Table S1.</p>
</caption>
<graphic xlink:href="gkl1065f3"></graphic>
</fig>
<p>BC and MVR automatically reduced the number of features at the first iteration to 72 and 86, respectively. At identical feature numbers, WFE led to an unexpected result: essentially the same features were selected by the constrained optimization methods and by linear SVMs. This agreement increases confidence that the selected features reflect the biological and thermodynamic signature of RNAi.</p>
<p>As a rule, either identical or analogous features are selected by WFE over linear methods and by RFE using a polynomial kernel (Supplementary Table S1). Although WFE requires as many as 142 features to reach maximal accuracy compared to 68 features with RFE/polynomial kernel, 30 features are shared between these two sets. More importantly, several remaining features form analogous combinations (
<xref ref-type="fig" rid="fig4">Figure 4</xref>
). As an example, the selection against AAA starting at 18 is expressed in WFE by selection against AA at positions 18 and 19. Analogously, RFE indicates selection against A at 18 and AA at 19. Another example is the negative preference for CC at 12, which is expressed in RFE by that single feature. However, WFE uses two features, AC at 11 and CC at 13, to the same effect. Yet another example is disfavoring C at 9 and CC at 10 in RFE, which is expressed by selection against AC at 8 and CC at 8, 9 and 10 in WFE.</p>
<fig id="fig4" position="float">
<label>Figure 4</label>
<caption>
<p>Three examples of aligned feature (dinucleotide) combinations selected by RFE and/or WFE with common sequence motifs. All of these features decrease siRNA activity. The selection against the dinucleotide CC at position 9 is expressed by disfavoring cytosines at position 9 in RFE and the dinucleotide CC at position 10 in both methods. In WFE, the selection against CC at 9 is expressed both directly (CC at 9) and indirectly by disfavoring AC and CC at 8, and CC at positions 8, 9, and 10 (see text).</p>
</caption>
<graphic xlink:href="gkl1065f4"></graphic>
</fig>
<p>As a more complex example, the global G + C content is selected by the polynomial kernels used in RFE, whereas WFE chooses a wide array of local mono- and dinucleotide features that are clearly related to the global G + C content. We postulated that the features selected by WFE account for a more accurate prediction than the G + C content. To test this postulate, we complemented the feature set selected by WFE with G + C. As expected, adding G + C did not increase prediction accuracy, even with polynomial kernels.</p>
<p>However, the position of the target site was important for RFE but eliminated by WFE. We believe that the polynomial kernel uses this feature better since loci too close to or too far from the translation initiation site appear to decrease activity. To improve predictions, we overruled WFE and manually complemented it by the target site feature. The accessibility of the target site as measured by the
<italic>sfold p</italic>
<sub>3</sub>
feature is one of the most heavily weighted features of WFE both in MVR and in SVM with a linear kernel. However, RFE with a polynomial kernel eliminated
<italic>p</italic>
<sub>3</sub>
.</p>
<p>Although WFE outperformed RFE by a small margin in our study, this does not substantiate far-reaching conclusions. WFE with a linear kernel is more robust and better at handling a high number of features. However, RFE can identify features that have highly nonlinear effects on silencing activity, such as the distance of the target from the translation initiation site. Such features may be missed by WFE.</p>
</sec>
<sec sec-type="discussion">
<title>DISCUSSION</title>
<p>Highly active siRNA molecules, although diverse in sequences, appear to conform to a widespread dinucleotide, thermodynamic and accessibility signature. This signature is highly probabilistic, meaning that there are numerous exceptions to each ‘rule.’ Fortunately, appropriate methods allow accurate prediction, which in turn lets us identify the most active siRNAs for the gene to be silenced.</p>
<p>An accuracy of 92.3% was achieved with weight-based feature elimination. The most accurate predictions in cross-validation experiments required as many as 142 features (Supplementary Table S1). For brevity,
<xref ref-type="table" rid="tbl2">Table 2</xref>
shows the linear kernel that was limited to 30 features. Further indications include the need for ∼150 features and the lack of high weights (over 5% of the sum of the absolute values). RFE on polynomial kernels was somewhat less accurate (89.4%) than the weight-based feature elimination. However, this accuracy was achieved using as few as 68 features (Supplementary Table S1). Of these, 30 features are shared with the 142 obtained with weight-based feature elimination.</p>
<p>The lack of absolute criteria may be due to sequence diversity. Since a large number of genes are subject to posttranscriptional regulation, a wide spectrum of mRNA segments is sensitive to RNA interference. This diversity requirement can still accommodate probabilistic criteria specific for the RISC complex (see below). Silencing activity appears to be determined by a wide range of flexible combinations of weighted sequence, thermodynamic and accessibility features.</p>
<p>A wide spectrum of sequences can fit this thermodynamic profile (
<xref ref-type="bibr" rid="b40">40</xref>
), which can provide a (partial) solution for the paradox of sequence diversity versus RISC-specific criteria. Accurate and rigorous analysis and prediction of RNAi in free energy terms may be a real possibility, akin to structural predictions of RNA (
<xref ref-type="bibr" rid="b41">41</xref>
) or proteins (
<xref ref-type="bibr" rid="b42">42</xref>
). Machine learning is also facilitated by the 16-fold reduction in dimensionality of the Δ
<italic>G</italic>
profile as compared to dinucleotides.</p>
<p>Several key features are related to the change in free energy, enthalpy or entropy associated with duplex formation. Global Δ
<italic>G</italic>
is assigned the highest weight by SVMs. For the 500 most active siRNAs, the average of Δ
<italic>G</italic>
is −164.43 kJ/mol, whereas for the 500 least active siRNAs it is −180.20 kJ/mol. In siRNAs with >90% activity, preference for lower stability is also indicated by the selection against CC and GG dinucleotides along the whole antisense strand. In contrast to the expected dinucleotide frequency of 0.0625 (1/16 for equally likely dinucleotides), CC dinucleotides occur with a frequency of 0.0489 and GG with a frequency of 0.0540. CC was assigned a weight of −0.04503 and GG received a weight of −0.03433. The general preference for less negative global Δ
<italic>G</italic>
is fine-tuned by a preference at the 5′-terminus of the antisense for A and U and selection against G, C, CG and UG. The 3′ end shows a preference for C, G, GG, AG, UG, GU and a negative selection against A, UU, AA and CC. The putative cleavage site for the
<italic>Argonaute-2</italic>
(
<xref ref-type="bibr" rid="b43">43</xref>
) or similar endonuclease at around position 7 is rich in U, but GU is preferred to AU. These results complement the thermodynamic profile reported earlier (
<xref ref-type="bibr" rid="b7">7</xref>
) and the proposition that lower terminal stability facilitates duplex unwinding by the topoisomerase enzyme (
<xref ref-type="bibr" rid="b44">44</xref>
).</p>
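<p>The comparison of observed dinucleotide frequencies with the uniform expectation of 1/16 can be illustrated with a short sketch. The function name and the example strand are hypothetical, and the function counts overlapping dinucleotides in a single strand, whereas the frequencies quoted above refer to the siRNA collection.</p>
<preformat>
```python
from collections import Counter


def dinucleotide_frequencies(sequence):
    """Relative frequencies of overlapping dinucleotides in one strand."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}


EXPECTED = 1 / 16  # 0.0625 if all 16 dinucleotides were equally likely

# Hypothetical 21-nucleotide strand used only as an example.
frequencies = dinucleotide_frequencies("UUCGAAGUACUCAGCGUAAGU")
print(frequencies.get("CC", 0.0), EXPECTED)

# Ratios of the frequencies quoted in the text to the uniform expectation.
print(0.0489 / EXPECTED, 0.0540 / EXPECTED)
```
</preformat>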
<p>Using WFE, the accessibility of the target site emerges as the most predictive of the 142 features (Supplementary Table S1) and the third most important feature among the 30 shown in
<xref ref-type="table" rid="tbl2">Table 2</xref>
. An extreme negative weight is assigned to
<italic>p</italic>
<sub>3</sub>
, the probability that all bases of a tetranucleotide are involved in secondary structures.
<italic>p</italic>
<sub>3</sub>
is estimated by a Bayesian sampling from the Boltzmann probability distribution of conformations as implemented in the
<italic>sfold</italic>
algorithm (
<xref ref-type="bibr" rid="b20">20</xref>
). Therefore, it is not surprising that
<italic>p</italic>
<sub>3</sub>
consistently received more significant weights than the Δ
<italic>G</italic>
of the single most stable structure. However, for BC between <60% and >70% active siRNAs, all accessibility features receive zero weights (data not shown). This indicates that most structured target sites can be silenced by <70% efficacy. Whereas the correlation between activity and
<italic>p</italic>
<sub>3</sub>
is low (
<italic>r</italic>
= 0.0584), this is significant at the
<italic>p</italic>
= 0.0035 level. The considerable weight assigned to
<italic>p</italic>
<sub>3</sub>
indicates that the target sites of siRNAs with ≥90% activity are either highly accessible or other features must compensate for limited accessibility.</p>
<p>The formation of self-hairpins within a single strand may inhibit silencing action (
<xref ref-type="bibr" rid="b45">45</xref>
). SVMs with over 100 features (Supplementary Table S1), BC and MVR assigned strong negative weights to this feature, which was estimated by the
<italic>RNAup</italic>
package (
<xref ref-type="bibr" rid="b19">19</xref>
). While self-hairpin probability received zero weights in the SVM models with <50 features, it was strongly penalized indirectly by
<italic>p</italic>
<sub>3</sub>
from the
<italic>sfold</italic>
predictions and sequence patterns that decrease the chances for Watson–Crick base pairing between the 5′ and the 3′ ends. Interestingly, while the 5′–3′ thermodynamic differential was eliminated during feature selection, high weights were assigned to sequence features that express the same thermodynamic differential. These include a preference for U and A at positions 1 and 2 but selection against these nucleotides at position 19. AG and UG are preferred at positions 20–21, whereas AA at 17–18, AA and UU at 18–19 and U at 20 are less frequent than expected on a random basis.</p>
<p>Contrary to some earlier rules (
<xref ref-type="bibr" rid="b2">2</xref>
), we found 12 siRNA molecules with ≥90% knockdown that contain
<italic>GGGG</italic>
tetranucleotide(s), which may form highly stable tetraplexes. Ten other highly active siRNAs contained overly stable runs of 7 or more G or C bases.</p>
<p>The distribution of weights along the sequence follows a consistent pattern across SVMs, BC and MVR with widely varying numbers of features (
<xref ref-type="fig" rid="fig5">Figure 5</xref>
). The first and second antisense positions dominate the predictions of all models except BC and MVR. SVMs had another major peak at position 19, in line with the hypothesis that loose termini facilitate duplex unwinding by the topoisomerase enzyme (
<xref ref-type="bibr" rid="b7">7</xref>
). The importance of the possible
<italic>Argonaute-2</italic>
(
<xref ref-type="bibr" rid="b43">43</xref>
) cleavage site at position 7 was pronounced only with BC and SVM with 60 features. The most accurate models specified preferences for all positions. However, when the number of features was limited to 30, all features at positions 8, 11, 12 and 15 were eliminated. The accuracy of predictions dropped at such a low number of features (
<xref ref-type="fig" rid="fig2">Figure 2</xref>
).</p>
<fig id="fig5" position="float">
<label>Figure 5</label>
<caption>
<p>Contributions of the individual antisense sequence positions to the predictions in MVR, BC, SVM/WFE with linear kernel, and SVM/RFE with a polynomial kernel. For the first four methods, we show the sum of weights (absolute values) for the features at that position. For RFE (magenta line), we display the total decrease in the prediction accuracy when features specific to a given position are eliminated.</p>
</caption>
<graphic xlink:href="gkl1065f5"></graphic>
</fig>
<p>Cross-validation experiments and blind tests on data not used for training show the robustness (stable high performance on new data) of the biophysical signature and the predictions. Dinucleotide preferences form a marked pattern that cannot be attributed purely to energetic or entropic factors. We postulate that these patterns are related to at least three sets of criteria. First, siRNAs need to be integrated into the RISC complex and have to facilitate helix unwinding by the topoisomerase and cleavage by
<italic>Argonaute-2</italic>
enzymes. Second, accessible target sites are preferred or other features should compensate for reduced accessibility. Third, there is a selection against strands that can form self-hairpin structures.</p>
<p>
<italic>Availability</italic>
: fast siRNA activity predictions can be performed on our web server at
<ext-link ext-link-type="uri" xlink:href="http://optirna.unl.edu/">http://optirna.unl.edu/</ext-link>
.</p>
</sec>
<sec sec-type="supplementary-material">
<title>SUPPLEMENTARY DATA</title>
<p>Supplementary Data are available at NAR online.</p>
</sec>
</body>
<back>
<ack>
<p>The author is grateful to Drs M. E. Fromm, W. W. Stroup and J. J. M. Riethoven and to J. Gardner for comments and suggestions and to Dr F. Ma for systems administration. The web page was implemented by M. Eirich, E. Moss and A. Guru. Special thanks to Drs T. Holen, A. Khvorova and P. Sætrom for their siRNA collections. Support from the National Science Foundation, the Tobacco Settlement Fund and a Cyberinfrastructure Development Grant from the University of Nebraska–Lincoln is gratefully acknowledged. Funding to pay the Open Access publication charges for this article was provided by the National Science Foundation EPS-0346476.</p>
<p>
<italic>Conflict of interest statement.</italic>
None declared.</p>
</ack>
<ref-list>
<title>REFERENCES</title>
<ref id="b1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huesken</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lange</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mickanin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Weiler</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Asselbergs</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Warner</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Meloon</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Engel</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Rosenberg</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Cohen</surname>
<given-names>D.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Design of a genome-wide siRNA library using an artificial neural network</article-title>
<source>Nat. Biotechnol.</source>
<year>2005</year>
<volume>23</volume>
<fpage>995</fpage>
<lpage>1001</lpage>
<pub-id pub-id-type="pmid">16025102</pub-id>
</element-citation>
</ref>
<ref id="b2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elbashir</surname>
<given-names>S.M.</given-names>
</name>
<name>
<surname>Martinez</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Patkaniowska</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Lendeckel</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Functional anatomy of siRNAs for mediating efficient RNAi in
<italic>Drosophila melanogaster</italic>
embryo lysate</article-title>
<source>EMBO J.</source>
<year>2001</year>
<volume>20</volume>
<fpage>6877</fpage>
<lpage>6888</lpage>
<pub-id pub-id-type="pmid">11726523</pub-id>
</element-citation>
</ref>
<ref id="b3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Reynolds</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Leake</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Boese</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Scaringe</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>W.S.</given-names>
</name>
<name>
<surname>Khvorova</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Rational siRNA design for RNA interference</article-title>
<source>Nat. Biotechnol.</source>
<year>2004</year>
<volume>22</volume>
<fpage>326</fpage>
<lpage>330</lpage>
<pub-id pub-id-type="pmid">14758366</pub-id>
</element-citation>
</ref>
<ref id="b4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yuan</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Latek</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Hossbach</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tuschl</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Lewitter</surname>
<given-names>F.</given-names>
</name>
</person-group>
<article-title>siRNA Selection Server: an automated siRNA oligonucleotide prediction server</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>W130</fpage>
<lpage>W134</lpage>
<pub-id pub-id-type="pmid">15215365</pub-id>
</element-citation>
</ref>
<ref id="b5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ui-Tei</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Naito</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Takahashi</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Haraguchi</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Ohki-Hamazaki</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Juni</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Ueda</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Saigo</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference</article-title>
<source>Nucleic Acids Res.</source>
<year>2004</year>
<volume>32</volume>
<fpage>936</fpage>
<lpage>948</lpage>
<pub-id pub-id-type="pmid">14769950</pub-id>
</element-citation>
</ref>
<ref id="b6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amarzguioui</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Prydz</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>An algorithm for selection of functional siRNA sequences</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<year>2004</year>
<volume>316</volume>
<fpage>1050</fpage>
<lpage>1058</lpage>
<pub-id pub-id-type="pmid">15044091</pub-id>
</element-citation>
</ref>
<ref id="b7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khvorova</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Reynolds</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jayasena</surname>
<given-names>S.D.</given-names>
</name>
</person-group>
<article-title>Functional siRNAs and miRNAs exhibit strand bias</article-title>
<source>Cell</source>
<year>2003</year>
<volume>115</volume>
<fpage>209</fpage>
<lpage>216</lpage>
<pub-id pub-id-type="pmid">14567918</pub-id>
</element-citation>
</ref>
<ref id="b8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lee</surname>
<given-names>N.S.</given-names>
</name>
<name>
<surname>Dohjima</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Bauer</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M.J.</given-names>
</name>
<name>
<surname>Ehsani</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Salvaterra</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Rossi</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Expression of small interfering RNAs targeted against HIV-1 rev transcripts in human cells</article-title>
<source>Nat. Biotechnol.</source>
<year>2002</year>
<volume>20</volume>
<fpage>500</fpage>
<lpage>505</lpage>
<pub-id pub-id-type="pmid">11981565</pub-id>
</element-citation>
</ref>
<ref id="b9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bohula</surname>
<given-names>E.A.</given-names>
</name>
<name>
<surname>Salisbury</surname>
<given-names>A.J.</given-names>
</name>
<name>
<surname>Sohail</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Playford</surname>
<given-names>M.P.</given-names>
</name>
<name>
<surname>Riedemann</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Southern</surname>
<given-names>E.M.</given-names>
</name>
<name>
<surname>Macaulay</surname>
<given-names>V.M.</given-names>
</name>
</person-group>
<article-title>The efficacy of small interfering RNAs targeted to the type 1 insulin-like growth factor receptor (IGF1R) is influenced by secondary structure in the IGF1R transcript</article-title>
<source>J. Biol. Chem.</source>
<year>2003</year>
<volume>278</volume>
<fpage>15991</fpage>
<lpage>15997</lpage>
<pub-id pub-id-type="pmid">12604614</pub-id>
</element-citation>
</ref>
<ref id="b10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kretschmer-Kazemi Far</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sczakiel</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>The activity of siRNA in mammalian cells is related to structural target accessibility: a comparison with antisense oligonucleotides</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>4417</fpage>
<lpage>4424</lpage>
<pub-id pub-id-type="pmid">12888501</pub-id>
</element-citation>
</ref>
<ref id="b11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schölkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A.J.</given-names>
</name>
<name>
<surname>Williamson</surname>
<given-names>R.C.</given-names>
</name>
<name>
<surname>Bartlett</surname>
<given-names>P.L.</given-names>
</name>
</person-group>
<article-title>New support vector algorithms</article-title>
<source>Neural Comput.</source>
<year>2000</year>
<volume>12</volume>
<fpage>1207</fpage>
<lpage>1245</lpage>
<pub-id pub-id-type="pmid">10905814</pub-id>
</element-citation>
</ref>
<ref id="b12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camps-Valls</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Chalk</surname>
<given-names>A.M.</given-names>
</name>
<name>
<surname>Serrano-Lopez</surname>
<given-names>A.J.</given-names>
</name>
<name>
<surname>Martin-Guerrero</surname>
<given-names>J.D.</given-names>
</name>
<name>
<surname>Sonnhammer</surname>
<given-names>E.L.</given-names>
</name>
</person-group>
<article-title>Profiled support vector machines for antisense oligonucleotide efficacy prediction</article-title>
<source>BMC Bioinformatics</source>
<year>2004</year>
<volume>5</volume>
<fpage>135</fpage>
<pub-id pub-id-type="pmid">15383156</pub-id>
</element-citation>
</ref>
<ref id="b13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sætrom</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming</article-title>
<source>Bioinformatics</source>
<year>2004</year>
<volume>20</volume>
<fpage>3055</fpage>
<lpage>3063</lpage>
<pub-id pub-id-type="pmid">15201190</pub-id>
</element-citation>
</ref>
<ref id="b14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shabalina</surname>
<given-names>S.A.</given-names>
</name>
<name>
<surname>Spiridonov</surname>
<given-names>A.N.</given-names>
</name>
<name>
<surname>Ogurtsov</surname>
<given-names>A.Y.</given-names>
</name>
</person-group>
<article-title>Computational models with thermodynamic and composition features improve siRNA design</article-title>
<source>BMC Bioinformatics</source>
<year>2006</year>
<volume>7</volume>
<fpage>65</fpage>
<pub-id pub-id-type="pmid">16472402</pub-id>
</element-citation>
</ref>
<ref id="b15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Holen</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Efficient prediction of siRNAs with siRNArules 1.0: an open-source JAVA approach to siRNA algorithms</article-title>
<source>RNA</source>
<year>2006</year>
<volume>12</volume>
<fpage>1620</fpage>
<lpage>1625</lpage>
<pub-id pub-id-type="pmid">16870995</pub-id>
</element-citation>
</ref>
<ref id="b16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chalk</surname>
<given-names>A.M.</given-names>
</name>
<name>
<surname>Wahlestedt</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Sonnhammer</surname>
<given-names>E.L.</given-names>
</name>
</person-group>
<article-title>Improved and automated prediction of effective siRNA</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<year>2004</year>
<volume>319</volume>
<fpage>264</fpage>
<lpage>274</lpage>
<pub-id pub-id-type="pmid">15158471</pub-id>
</element-citation>
</ref>
<ref id="b17">
<label>17</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sætrom</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Snove</surname>
<given-names>J.O.</given-names>
</name>
</person-group>
<article-title>A comparison of siRNA efficacy predictors</article-title>
<source>Biochem. Biophys. Res. Commun.</source>
<year>2004</year>
<volume>321</volume>
<fpage>247</fpage>
<lpage>253</lpage>
<pub-id pub-id-type="pmid">15358242</pub-id>
</element-citation>
</ref>
<ref id="b18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xia</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>SantaLucia</surname>
<given-names>J.</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Burkard</surname>
<given-names>M.E.</given-names>
</name>
<name>
<surname>Kierzek</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Schroeder</surname>
<given-names>S.J.</given-names>
</name>
<name>
<surname>Jiao</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Cox</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>D.H.</given-names>
</name>
</person-group>
<article-title>Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs</article-title>
<source>Biochemistry</source>
<year>1998</year>
<volume>37</volume>
<fpage>14719</fpage>
<lpage>14735</lpage>
<pub-id pub-id-type="pmid">9778347</pub-id>
</element-citation>
</ref>
<ref id="b19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Muckstein</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Tafer</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hackermuller</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bernhart</surname>
<given-names>S.H.</given-names>
</name>
<name>
<surname>Stadler</surname>
<given-names>P.F.</given-names>
</name>
<name>
<surname>Hofacker</surname>
<given-names>I.L.</given-names>
</name>
</person-group>
<article-title>Thermodynamics of RNA-RNA binding</article-title>
<source>Bioinformatics</source>
<year>2006</year>
<volume>22</volume>
<fpage>1177</fpage>
<lpage>1182</lpage>
<pub-id pub-id-type="pmid">16446276</pub-id>
</element-citation>
</ref>
<ref id="b20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>C.E.</given-names>
</name>
</person-group>
<article-title>A statistical sampling algorithm for RNA secondary structure prediction</article-title>
<source>Nucleic Acids Res.</source>
<year>2003</year>
<volume>31</volume>
<fpage>7280</fpage>
<lpage>7301</lpage>
<pub-id pub-id-type="pmid">14654704</pub-id>
</element-citation>
</ref>
<ref id="b21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>C.E.</given-names>
</name>
</person-group>
<article-title>Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond</article-title>
<source>Nucleic Acids Res.</source>
<year>2001</year>
<volume>29</volume>
<fpage>1034</fpage>
<lpage>1046</lpage>
<pub-id pub-id-type="pmid">11222752</pub-id>
</element-citation>
</ref>
<ref id="b22">
<label>22</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Lawrence</surname>
<given-names>C.E.</given-names>
</name>
</person-group>
<article-title>A Bayesian statistical algorithm for RNA secondary structure prediction</article-title>
<source>Comput. Chem.</source>
<year>1999</year>
<volume>23</volume>
<fpage>387</fpage>
<lpage>400</lpage>
<pub-id pub-id-type="pmid">10404626</pub-id>
</element-citation>
</ref>
<ref id="b23">
<label>23</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Dantzig</surname>
<given-names>G.B.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Koopmans</surname>
<given-names>T.J.C.</given-names>
</name>
</person-group>
<article-title>Maximization of a linear function of variables subject to linear inequalities</article-title>
<source>Activity Analysis of Production and Allocation</source>
<year>1951</year>
<publisher-loc>NY</publisher-loc>
<publisher-name>Wiley</publisher-name>
<fpage>339</fpage>
<lpage>347</lpage>
</element-citation>
</ref>
<ref id="b24">
<label>24</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Goldfarb</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Todd</surname>
<given-names>M.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Nemhauser</surname>
<given-names>G.L.</given-names>
</name>
<name>
<surname>Rinnooy Kan</surname>
<given-names>M.J.</given-names>
</name>
<name>
<surname>Todd</surname>
<given-names>M.J.</given-names>
</name>
</person-group>
<article-title>Linear programming</article-title>
<source>Optimization</source>
<year>1994</year>
<volume>1</volume>
<publisher-loc>Amsterdam</publisher-loc>
<publisher-name>Elsevier</publisher-name>
<fpage>73</fpage>
<lpage>170</lpage>
</element-citation>
</ref>
<ref id="b25">
<label>25</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Schölkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A.J.</given-names>
</name>
</person-group>
<source>Learning with Kernels. Support Vector Machines, Regularization, Optimization, and Beyond</source>
<year>2002</year>
<publisher-loc>Cambridge, MA</publisher-loc>
<publisher-name>MIT Press</publisher-name>
</element-citation>
</ref>
<ref id="b26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tsang</surname>
<given-names>I.W.</given-names>
</name>
<name>
<surname>Kwok</surname>
<given-names>J.T.</given-names>
</name>
<name>
<surname>Cheung</surname>
<given-names>P.-M.</given-names>
</name>
</person-group>
<article-title>Core vector machines: fast SVM training on very large data sets</article-title>
<source>J. Mach. Learn. Res.</source>
<year>2005</year>
<volume>6</volume>
<fpage>363</fpage>
<lpage>392</lpage>
</element-citation>
</ref>
<ref id="b27">
<label>27</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Joachims</surname>
<given-names>T.</given-names>
</name>
</person-group>
<source>Learning to Classify Text Using Support Vector Machines. Methods, Theory and Algorithms</source>
<year>2002</year>
<publisher-loc>Berlin</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="b28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ladunga</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids</article-title>
<source>Bioinformatics</source>
<year>1999</year>
<volume>15</volume>
<fpage>1028</fpage>
<lpage>1038</lpage>
<pub-id pub-id-type="pmid">10745993</pub-id>
</element-citation>
</ref>
<ref id="b29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bennett</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Mangasarian</surname>
<given-names>O.L.</given-names>
</name>
</person-group>
<article-title>Robust linear programming discrimination of two linearly inseparable sets</article-title>
<source>Optim. Meth. Software</source>
<year>1992</year>
<volume>1</volume>
<fpage>23</fpage>
<lpage>34</lpage>
</element-citation>
</ref>
<ref id="b30">
<label>30</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Chvátal</surname>
<given-names>V.</given-names>
</name>
</person-group>
<source>Linear Programming</source>
<year>1983</year>
<publisher-loc>NY</publisher-loc>
<publisher-name>Freeman</publisher-name>
</element-citation>
</ref>
<ref id="b31">
<label>31</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huber</surname>
<given-names>P.J.</given-names>
</name>
</person-group>
<article-title>Robust estimation of a location parameter</article-title>
<source>Ann. Math. Stat.</source>
<year>1964</year>
<volume>35</volume>
<fpage>73</fpage>
<lpage>101</lpage>
</element-citation>
</ref>
<ref id="b32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kohavi</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>John</surname>
<given-names>G.H.</given-names>
</name>
</person-group>
<article-title>Wrappers for feature subset selection</article-title>
<source>Int. J. Digit. Libr.</source>
<year>1997</year>
<volume>1</volume>
<fpage>108</fpage>
<lpage>121</lpage>
</element-citation>
</ref>
<ref id="b33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Guyon</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Weston</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Barnhill</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Vapnik</surname>
<given-names>V.</given-names>
</name>
</person-group>
<article-title>Gene selection for cancer classification using support vector machines</article-title>
<source>Mach. Learn.</source>
<year>2002</year>
<volume>46</volume>
<fpage>389</fpage>
<lpage>422</lpage>
</element-citation>
</ref>
<ref id="b34">
<label>34</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Furlanello</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Serafini</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Merler</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Jurman</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Semisupervised learning for molecular profiling</article-title>
<source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source>
<year>2005</year>
<volume>2</volume>
<fpage>110</fpage>
<lpage>118</lpage>
<pub-id pub-id-type="pmid">17044176</pub-id>
</element-citation>
</ref>
<ref id="b35">
<label>35</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Block</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Paern</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Hullermeier</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Sanschagrin</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sotriffer</surname>
<given-names>C.A.</given-names>
</name>
<name>
<surname>Klebe</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Physicochemical descriptors to discriminate protein–protein interactions in permanent and transient complexes selected by means of machine learning algorithms</article-title>
<source>Proteins</source>
<year>2006</year>
<volume>65</volume>
<fpage>607</fpage>
<lpage>622</lpage>
<pub-id pub-id-type="pmid">16955490</pub-id>
</element-citation>
</ref>
<ref id="b36">
<label>36</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hoffmann</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Valencia</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Implementing the iHOP concept for navigation of biomedical literature</article-title>
<source>Bioinformatics</source>
<year>2005</year>
<volume>21</volume>
<fpage>ii252</fpage>
<lpage>ii258</lpage>
<pub-id pub-id-type="pmid">16204114</pub-id>
</element-citation>
</ref>
<ref id="b37">
<label>37</label>
<element-citation publication-type="confproc">
<person-group person-group-type="author">
<name>
<surname>Mladenic</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Brank</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Grobelnik</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Milic-Frayling</surname>
<given-names>N.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Järvelin</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Allan</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Bruza</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Sanderson</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Feature selection using linear classifier weights: interaction with classifier models</article-title>
<year>2004</year>
<conf-name>27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</conf-name>
<publisher-loc>Sheffield, UK</publisher-loc>
<publisher-name>ACM Press</publisher-name>
<fpage>234</fpage>
<lpage>241</lpage>
</element-citation>
</ref>
<ref id="b38">
<label>38</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Joachims</surname>
<given-names>T.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Schölkopf</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Burges</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Smola</surname>
<given-names>A.J.</given-names>
</name>
</person-group>
<article-title>Making large-scale SVM learning practical</article-title>
<source>Advances in Kernel Methods—Support Vector Learning</source>
<year>1998</year>
<publisher-loc>Cambridge, MA</publisher-loc>
<publisher-name>MIT Press</publisher-name>
<fpage>41</fpage>
<lpage>56</lpage>
</element-citation>
</ref>
<ref id="b39">
<label>39</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Le Cun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Denker</surname>
<given-names>J.S.</given-names>
</name>
<name>
<surname>Solla</surname>
<given-names>S.A.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Touretzky</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>Optimal brain damage</article-title>
<source>Advances in Neural Information Processing Systems</source>
<year>1990</year>
<volume>2</volume>
<publisher-loc>San Francisco, CA</publisher-loc>
<publisher-name>Morgan Kaufmann</publisher-name>
<fpage>598</fpage>
<lpage>605</lpage>
</element-citation>
</ref>
<ref id="b40">
<label>40</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boese</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Leake</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Reynolds</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Read</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Scaringe</surname>
<given-names>S.A.</given-names>
</name>
<name>
<surname>Marshall</surname>
<given-names>W.S.</given-names>
</name>
<name>
<surname>Khvorova</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Mechanistic insights aid computational short interfering RNA design</article-title>
<source>Meth. Enzymol.</source>
<year>2005</year>
<volume>392</volume>
<fpage>73</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="pmid">15644176</pub-id>
</element-citation>
</ref>
<ref id="b41">
<label>41</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mathews</surname>
<given-names>D.H.</given-names>
</name>
<name>
<surname>Turner</surname>
<given-names>D.H.</given-names>
</name>
</person-group>
<article-title>Prediction of RNA secondary structure by free energy minimization</article-title>
<source>Curr. Opin. Struct. Biol.</source>
<year>2006</year>
<volume>16</volume>
<fpage>270</fpage>
<lpage>278</lpage>
<pub-id pub-id-type="pmid">16713706</pub-id>
</element-citation>
</ref>
<ref id="b42">
<label>42</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eisenfeld</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Vajda</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Sugar</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>DeLisi</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>Constrained optimization and protein structure determination</article-title>
<source>Am. J. Physiol.</source>
<year>1991</year>
<volume>261</volume>
<fpage>C376</fpage>
<lpage>C386</lpage>
<pub-id pub-id-type="pmid">1872378</pub-id>
</element-citation>
</ref>
<ref id="b43">
<label>43</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matranga</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Tomari</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Shin</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Bartel</surname>
<given-names>D.P.</given-names>
</name>
<name>
<surname>Zamore</surname>
<given-names>P.D.</given-names>
</name>
</person-group>
<article-title>Passenger-strand cleavage facilitates assembly of siRNA into Ago2-containing RNAi enzyme complexes</article-title>
<source>Cell</source>
<year>2005</year>
<volume>123</volume>
<fpage>607</fpage>
<lpage>620</lpage>
<pub-id pub-id-type="pmid">16271386</pub-id>
</element-citation>
</ref>
<ref id="b44">
<label>44</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Far</surname>
<given-names>R.K.</given-names>
</name>
<name>
<surname>Nedbal</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Sczakiel</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Concepts to automate the theoretical design of effective antisense oligonucleotides</article-title>
<source>Bioinformatics</source>
<year>2001</year>
<volume>17</volume>
<fpage>1058</fpage>
<lpage>1061</lpage>
<pub-id pub-id-type="pmid">11724735</pub-id>
</element-citation>
</ref>
<ref id="b45">
<label>45</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Patzel</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Rutz</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Dietrich</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Koberle</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Scheffold</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Kaufmann</surname>
<given-names>S.H.E.</given-names>
</name>
</person-group>
<article-title>Design of siRNAs producing unstructured guide-RNAs results in improved RNA interference efficiency</article-title>
<source>Nat. Biotechnol.</source>
<year>2005</year>
<volume>23</volume>
<fpage>1440</fpage>
<lpage>1444</lpage>
<pub-id pub-id-type="pmid">16258545</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000570 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000570 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:1802606
   |texte=   More complete gene silencing by fewer siRNAs: transparent optimized design and biophysical signature
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:17169992" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024