SARS exploration server

Uncovering text mining: A survey of current work on web-based epidemic intelligence

Internal identifier: 001033 (Pmc/Corpus); previous: 001032; next: 001034

Authors: Nigel Collier

Source :

RBID : PMC:3438486

Abstract

Real-world pandemics such as SARS 2002, as well as popular fiction like the movie Contagion, graphically depict the health threat of a global pandemic and the key role of epidemic intelligence (EI). While EI relies heavily on established indicator sources, a new class of methods based on event alerting from unstructured digital Internet media is rapidly becoming acknowledged within the public health community. At the heart of automated information gathering systems is a technology called text mining. My contribution here is to provide an overview of the role that text mining technology plays in detecting epidemics and to synthesise my existing research on the BioCaster project.


Url:
DOI: 10.1080/17441692.2012.699975
PubMed: 22783909
PubMed Central: 3438486

Links to Exploration step

PMC:3438486

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Uncovering text mining: A survey of current work on web-based epidemic intelligence</title>
<author>
<name sortKey="Collier, Nigel" sort="Collier, Nigel" uniqKey="Collier N" first="Nigel" last="Collier">Nigel Collier</name>
<affiliation>
<nlm:aff id="A1">National Institute of Informatics, Tokyo, Japan</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">22783909</idno>
<idno type="pmc">3438486</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438486</idno>
<idno type="RBID">PMC:3438486</idno>
<idno type="doi">10.1080/17441692.2012.699975</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">001033</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001033</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Uncovering text mining: A survey of current work on web-based epidemic intelligence</title>
<author>
<name sortKey="Collier, Nigel" sort="Collier, Nigel" uniqKey="Collier N" first="Nigel" last="Collier">Nigel Collier</name>
<affiliation>
<nlm:aff id="A1">National Institute of Informatics, Tokyo, Japan</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Global Public Health</title>
<idno type="ISSN">1744-1692</idno>
<idno type="eISSN">1744-1706</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>Real-world pandemics such as SARS 2002, as well as popular fiction like the movie Contagion, graphically depict the health threat of a global pandemic and the key role of epidemic intelligence (EI). While EI relies heavily on established indicator sources, a new class of methods based on event alerting from unstructured digital Internet media is rapidly becoming acknowledged within the public health community. At the heart of automated information gathering systems is a technology called
<italic>text mining.</italic>
My contribution here is to provide an overview of the role that text mining technology plays in detecting epidemics and to synthesise my existing research on the BioCaster project.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Berry, M W" uniqKey="Berry M">M.W. Berry</name>
</author>
<author>
<name sortKey="Kogan, M" uniqKey="Kogan M">M. Kogan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brownstein, J" uniqKey="Brownstein J">J. Brownstein</name>
</author>
<author>
<name sortKey="Freifeld, C" uniqKey="Freifeld C">C. Freifeld</name>
</author>
<author>
<name sortKey="Reis, B" uniqKey="Reis B">B. Reis</name>
</author>
<author>
<name sortKey="Mandl, K" uniqKey="Mandl K">K. Mandl</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buckeridge, D" uniqKey="Buckeridge D">D. Buckeridge</name>
</author>
<author>
<name sortKey="Burkom, H" uniqKey="Burkom H">H. Burkom</name>
</author>
<author>
<name sortKey="Campbell, M" uniqKey="Campbell M">M. Campbell</name>
</author>
<author>
<name sortKey="Hogan, W R" uniqKey="Hogan W">W.R. Hogan</name>
</author>
<author>
<name sortKey="Moore, A W" uniqKey="Moore A">A.W. Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chanlekha, H" uniqKey="Chanlekha H">H. Chanlekha</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaudet, H" uniqKey="Chaudet H">H. Chaudet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Doan, S" uniqKey="Doan S">S. Doan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Doan, S" uniqKey="Doan S">S. Doan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Doan, S" uniqKey="Doan S">S. Doan</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
<author>
<name sortKey="Matsuda Goodwin, R" uniqKey="Matsuda Goodwin R">R. Matsuda Goodwin</name>
</author>
<author>
<name sortKey="Conway, M" uniqKey="Conway M">M. Conway</name>
</author>
<author>
<name sortKey="Tateno, Y" uniqKey="Tateno Y">Y. Tateno</name>
</author>
<author>
<name sortKey="Ngo, Q" uniqKey="Ngo Q">Q. Ngo</name>
</author>
<author>
<name sortKey="Dien, D" uniqKey="Dien D">D. Dien</name>
</author>
<author>
<name sortKey="Kawtrakul, A" uniqKey="Kawtrakul A">A. Kawtrakul</name>
</author>
<author>
<name sortKey="Takeuchi, K" uniqKey="Takeuchi K">K. Takeuchi</name>
</author>
<author>
<name sortKey="Shigematsu, M" uniqKey="Shigematsu M">M. Shigematsu</name>
</author>
<author>
<name sortKey="Taniguchi, K" uniqKey="Taniguchi K">K. Taniguchi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
<author>
<name sortKey="Jin, L" uniqKey="Jin L">L. Jin</name>
</author>
<author>
<name sortKey="Shigematsu, M" uniqKey="Shigematsu M">M. Shigematsu</name>
</author>
<author>
<name sortKey="Dien, D" uniqKey="Dien D">D. Dien</name>
</author>
<author>
<name sortKey="Barrero, R" uniqKey="Barrero R">R. Barrero</name>
</author>
<author>
<name sortKey="Takeuchi, K" uniqKey="Takeuchi K">K. Takeuchi</name>
</author>
<author>
<name sortKey="Kawtrakul, A" uniqKey="Kawtrakul A">A. Kawtrakul</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
<author>
<name sortKey="Shigematsu, M" uniqKey="Shigematsu M">M. Shigematsu</name>
</author>
<author>
<name sortKey="Taniguchi, K" uniqKey="Taniguchi K">K. Taniguchi</name>
</author>
<author>
<name sortKey="Jin, L" uniqKey="Jin L">L. Jin</name>
</author>
<author>
<name sortKey="Mccrae, J" uniqKey="Mccrae J">J. McCrae</name>
</author>
<author>
<name sortKey="Dien, D" uniqKey="Dien D">D. Dien</name>
</author>
<author>
<name sortKey="Hung, Q" uniqKey="Hung Q">Q. Hung</name>
</author>
<author>
<name sortKey="Takeuchi, K" uniqKey="Takeuchi K">K. Takeuchi</name>
</author>
<author>
<name sortKey="Kawtrakul, A" uniqKey="Kawtrakul A">A. Kawtrakul</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Goodwin, R M" uniqKey="Goodwin R">R.M. Goodwin</name>
</author>
<author>
<name sortKey="Mccrae, J" uniqKey="Mccrae J">J. McCrae</name>
</author>
<author>
<name sortKey="Doan, S" uniqKey="Doan S">S. Doan</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Conway, M" uniqKey="Conway M">M. Conway</name>
</author>
<author>
<name sortKey="Doan, S" uniqKey="Doan S">S. Doan</name>
</author>
<author>
<name sortKey="Kawazoe, A" uniqKey="Kawazoe A">A. Kawazoe</name>
</author>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corley, C D" uniqKey="Corley C">C.D. Corley</name>
</author>
<author>
<name sortKey="Cook, D J" uniqKey="Cook D">D.J. Cook</name>
</author>
<author>
<name sortKey="Mikler, A R" uniqKey="Mikler A">A.R. Mikler</name>
</author>
<author>
<name sortKey="Singh, K P" uniqKey="Singh K">K.P. Singh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Culotta, A" uniqKey="Culotta A">A. Culotta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Damianos, L" uniqKey="Damianos L">L. Damianos</name>
</author>
<author>
<name sortKey="Ponte, J" uniqKey="Ponte J">J. Ponte</name>
</author>
<author>
<name sortKey="Wohlever, S" uniqKey="Wohlever S">S. Wohlever</name>
</author>
<author>
<name sortKey="Reeder, F" uniqKey="Reeder F">F. Reeder</name>
</author>
<author>
<name sortKey="Day, D" uniqKey="Day D">D. Day</name>
</author>
<author>
<name sortKey="Wilson, G" uniqKey="Wilson G">G. Wilson</name>
</author>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L. Hirschman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Eysenbach, G" uniqKey="Eysenbach G">G. Eysenbach</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fayyad, U" uniqKey="Fayyad U">U. Fayyad</name>
</author>
<author>
<name sortKey="Piatetsky Shapiro, G" uniqKey="Piatetsky Shapiro G">G. Piatetsky-Shapiro</name>
</author>
<author>
<name sortKey="Smyth, P" uniqKey="Smyth P">P. Smyth</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Feldman, R" uniqKey="Feldman R">R. Feldman</name>
</author>
<author>
<name sortKey="Sanger, J" uniqKey="Sanger J">J. Sanger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fuller, S" uniqKey="Fuller S">S. Fuller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ginsberg, J" uniqKey="Ginsberg J">J. Ginsberg</name>
</author>
<author>
<name sortKey="Mohebbi, M" uniqKey="Mohebbi M">M. Mohebbi</name>
</author>
<author>
<name sortKey="Patel, R" uniqKey="Patel R">R. Patel</name>
</author>
<author>
<name sortKey="Brammer, L" uniqKey="Brammer L">L. Brammer</name>
</author>
<author>
<name sortKey="Smolinski, M" uniqKey="Smolinski M">M. Smolinski</name>
</author>
<author>
<name sortKey="Brilliant, L" uniqKey="Brilliant L">L. Brilliant</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grishman, R" uniqKey="Grishman R">R. Grishman</name>
</author>
<author>
<name sortKey="Huttunen, S" uniqKey="Huttunen S">S. Huttunen</name>
</author>
<author>
<name sortKey="Yangarber, R" uniqKey="Yangarber R">R. Yangarber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hartley, D" uniqKey="Hartley D">D. Hartley</name>
</author>
<author>
<name sortKey="Nelson, N" uniqKey="Nelson N">N. Nelson</name>
</author>
<author>
<name sortKey="Walters, R" uniqKey="Walters R">R. Walters</name>
</author>
<author>
<name sortKey="Arthury, R" uniqKey="Arthury R">R. Arthury</name>
</author>
<author>
<name sortKey="Yangarber, R" uniqKey="Yangarber R">R. Yangarber</name>
</author>
<author>
<name sortKey="Madoff, L" uniqKey="Madoff L">L. Madoff</name>
</author>
<author>
<name sortKey="Linge, Y" uniqKey="Linge Y">Y. Linge</name>
</author>
<author>
<name sortKey="Mawudeku, A" uniqKey="Mawudeku A">A. Mawudeku</name>
</author>
<author>
<name sortKey="Collier, N" uniqKey="Collier N">N. Collier</name>
</author>
<author>
<name sortKey="Brownstein, J" uniqKey="Brownstein J">J. Brownstein</name>
</author>
<author>
<name sortKey="Thinus, G" uniqKey="Thinus G">G. Thinus</name>
</author>
<author>
<name sortKey="Lightfoot, N" uniqKey="Lightfoot N">N. Lightfoot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hearst, M" uniqKey="Hearst M">M. Hearst</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hirschman, L" uniqKey="Hirschman L">L. Hirschman</name>
</author>
<author>
<name sortKey="Park, J C" uniqKey="Park J">J.C. Park</name>
</author>
<author>
<name sortKey="Tsujii, J" uniqKey="Tsujii J">J. Tsujii</name>
</author>
<author>
<name sortKey="Wong, L" uniqKey="Wong L">L. Wong</name>
</author>
<author>
<name sortKey="Wu, C H" uniqKey="Wu C">C.H. Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Humphreys, B" uniqKey="Humphreys B">B. Humphreys</name>
</author>
<author>
<name sortKey="Lindberg, D" uniqKey="Lindberg D">D. Lindberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hutwagner, L" uniqKey="Hutwagner L">L. Hutwagner</name>
</author>
<author>
<name sortKey="Thompson, W" uniqKey="Thompson W">W. Thompson</name>
</author>
<author>
<name sortKey="Seeman, M G" uniqKey="Seeman M">M.G. Seeman</name>
</author>
<author>
<name sortKey="Treadwell, T" uniqKey="Treadwell T">T. Treadwell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Janson, B" uniqKey="Janson B">B. Janson</name>
</author>
<author>
<name sortKey="Spink, A" uniqKey="Spink A">A. Spink</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jones, E" uniqKey="Jones E">E. Jones</name>
</author>
<author>
<name sortKey="Patel, N" uniqKey="Patel N">N. Patel</name>
</author>
<author>
<name sortKey="Levy, M" uniqKey="Levy M">M. Levy</name>
</author>
<author>
<name sortKey="Storeygard, A" uniqKey="Storeygard A">A. Storeygard</name>
</author>
<author>
<name sortKey="Balk, D" uniqKey="Balk D">D. Balk</name>
</author>
<author>
<name sortKey="Gittleman, J" uniqKey="Gittleman J">J. Gittleman</name>
</author>
<author>
<name sortKey="Daszak, P" uniqKey="Daszak P">P. Daszak</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Keller, M" uniqKey="Keller M">M. Keller</name>
</author>
<author>
<name sortKey="Freifeld, C C" uniqKey="Freifeld C">C.C. Freifeld</name>
</author>
<author>
<name sortKey="Brownstein, J S" uniqKey="Brownstein J">J.S. Brownstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kosala, R" uniqKey="Kosala R">R. Kosala</name>
</author>
<author>
<name sortKey="Blockeel, H" uniqKey="Blockeel H">H. Blockeel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lampos, V" uniqKey="Lampos V">V. Lampos</name>
</author>
<author>
<name sortKey="Cristianini, N" uniqKey="Cristianini N">N. Cristianini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lin, S" uniqKey="Lin S">S. Lin</name>
</author>
<author>
<name sortKey="Ho, J" uniqKey="Ho J">J. Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lowe, H" uniqKey="Lowe H">H. Lowe</name>
</author>
<author>
<name sortKey="Barnett, G" uniqKey="Barnett G">G. Barnett</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lyon, A" uniqKey="Lyon A">A. Lyon</name>
</author>
<author>
<name sortKey="Nunn, M" uniqKey="Nunn M">M. Nunn</name>
</author>
<author>
<name sortKey="Grossel, G" uniqKey="Grossel G">G. Grossel</name>
</author>
<author>
<name sortKey="Burgman, M" uniqKey="Burgman M">M. Burgman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Madoff, L C" uniqKey="Madoff L">L.C. Madoff</name>
</author>
<author>
<name sortKey="Woodall, J P" uniqKey="Woodall J">J.P. Woodall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mawudeku, A" uniqKey="Mawudeku A">A. Mawudeku</name>
</author>
<author>
<name sortKey="Blench, M" uniqKey="Blench M">M. Blench</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mccallum, A" uniqKey="Mccallum A">A. McCallum</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W. Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nadeau, D" uniqKey="Nadeau D">D. Nadeau</name>
</author>
<author>
<name sortKey="Sekine, S" uniqKey="Sekine S">S. Sekine</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Paquet, C" uniqKey="Paquet C">C. Paquet</name>
</author>
<author>
<name sortKey="Coulombier, D" uniqKey="Coulombier D">D. Coulombier</name>
</author>
<author>
<name sortKey="Kaiser, R" uniqKey="Kaiser R">R. Kaiser</name>
</author>
<author>
<name sortKey="Ciotti, M" uniqKey="Ciotti M">M. Ciotti</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Polgreen, P M" uniqKey="Polgreen P">P.M. Polgreen</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y. Chen</name>
</author>
<author>
<name sortKey="Pennock, D M" uniqKey="Pennock D">D.M. Pennock</name>
</author>
<author>
<name sortKey="Nelson, F D" uniqKey="Nelson F">F.D. Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Price, C" uniqKey="Price C">C. Price</name>
</author>
<author>
<name sortKey="Spackman, K" uniqKey="Spackman K">K. Spackman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rosse, C" uniqKey="Rosse C">C. Rosse</name>
</author>
<author>
<name sortKey="Mejino, J L V" uniqKey="Mejino J">J.L.V. Mejino</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Signorini, A" uniqKey="Signorini A">A. Signorini</name>
</author>
<author>
<name sortKey="Segre, A M" uniqKey="Segre A">A.M. Segre</name>
</author>
<author>
<name sortKey="Polgreen, P M" uniqKey="Polgreen P">P.M. Polgreen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soergel, D" uniqKey="Soergel D">D. Soergel</name>
</author>
<author>
<name sortKey="Lauser, B" uniqKey="Lauser B">B. Lauser</name>
</author>
<author>
<name sortKey="Liang, A" uniqKey="Liang A">A. Liang</name>
</author>
<author>
<name sortKey="Fisseha, F" uniqKey="Fisseha F">F. Fisseha</name>
</author>
<author>
<name sortKey="Keizer, J" uniqKey="Keizer J">J. Keizer</name>
</author>
<author>
<name sortKey="Katz, S" uniqKey="Katz S">S. Katz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Steinberger, R" uniqKey="Steinberger R">R. Steinberger</name>
</author>
<author>
<name sortKey="Flavio, F" uniqKey="Flavio F">F. Flavio</name>
</author>
<author>
<name sortKey="Van Der Goot, E" uniqKey="Van Der Goot E">E. van der Goot</name>
</author>
<author>
<name sortKey="Best, C" uniqKey="Best C">C. Best</name>
</author>
<author>
<name sortKey="Von Etter, P" uniqKey="Von Etter P">P. von Etter</name>
</author>
<author>
<name sortKey="Yangarber, R" uniqKey="Yangarber R">R. Yangarber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Swanson, D R" uniqKey="Swanson D">D.R. Swanson</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tolentino, H" uniqKey="Tolentino H">H. Tolentino</name>
</author>
<author>
<name sortKey="Kamadjeu, R" uniqKey="Kamadjeu R">R. Kamadjeu</name>
</author>
<author>
<name sortKey="Fontelo, P" uniqKey="Fontelo P">P. Fontelo</name>
</author>
<author>
<name sortKey="Liu, F" uniqKey="Liu F">F. Liu</name>
</author>
<author>
<name sortKey="Matters, M" uniqKey="Matters M">M. Matters</name>
</author>
<author>
<name sortKey="Pollack, M" uniqKey="Pollack M">M. Pollack</name>
</author>
<author>
<name sortKey="Madoff, L" uniqKey="Madoff L">L. Madoff</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Torii, M" uniqKey="Torii M">M. Torii</name>
</author>
<author>
<name sortKey="Yin, L" uniqKey="Yin L">L. Yin</name>
</author>
<author>
<name sortKey="Nguyen, T" uniqKey="Nguyen T">T. Nguyen</name>
</author>
<author>
<name sortKey="Mazumdar, C T" uniqKey="Mazumdar C">C.T. Mazumdar</name>
</author>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H. Liu</name>
</author>
<author>
<name sortKey="Hartlet, D M" uniqKey="Hartlet D">D.M. Hartlet</name>
</author>
<author>
<name sortKey="Nelson, N P" uniqKey="Nelson N">N.P. Nelson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vaillant, L" uniqKey="Vaillant L">L. Vaillant</name>
</author>
<author>
<name sortKey="Nys, J" uniqKey="Nys J">J. Nys</name>
</author>
<author>
<name sortKey="Gastellu Etchegorry, M" uniqKey="Gastellu Etchegorry M">M. Gastellu-Etchegorry</name>
</author>
<author>
<name sortKey="Barboza, P" uniqKey="Barboza P">P. Barboza</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vaillant, L" uniqKey="Vaillant L">L. Vaillant</name>
</author>
<author>
<name sortKey="Barboza, P" uniqKey="Barboza P">P. Barboza</name>
</author>
<author>
<name sortKey="Arthur, R R" uniqKey="Arthur R">R.R. Arthur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Von Etter, P" uniqKey="Von Etter P">P. von Etter</name>
</author>
<author>
<name sortKey="Huttunen, S" uniqKey="Huttunen S">S. Huttunen</name>
</author>
<author>
<name sortKey="Vihavainen, A" uniqKey="Vihavainen A">A. Vihavainen</name>
</author>
<author>
<name sortKey="Vourinen, M" uniqKey="Vourinen M">M. Vourinen</name>
</author>
<author>
<name sortKey="Yangarber, R" uniqKey="Yangarber R">R. Yangarber</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wagner, M M" uniqKey="Wagner M">M.M. Wagner</name>
</author>
<author>
<name sortKey="Tsui, F C" uniqKey="Tsui F">F.C. Tsui</name>
</author>
<author>
<name sortKey="Espino, J U" uniqKey="Espino J">J.U. Espino</name>
</author>
<author>
<name sortKey="Dato, V M" uniqKey="Dato V">V.M. Dato</name>
</author>
<author>
<name sortKey="Sittig, D F" uniqKey="Sittig D">D.F. Sittig</name>
</author>
<author>
<name sortKey="Caruana, R A" uniqKey="Caruana R">R.A. Caruana</name>
</author>
<author>
<name sortKey="Mcginnis, L F" uniqKey="Mcginnis L">L.F. McGinnis</name>
</author>
<author>
<name sortKey="Deerfield, D W" uniqKey="Deerfield D">D.W. Deerfield</name>
</author>
<author>
<name sortKey="Druzdzel, M J" uniqKey="Druzdzel M">M.J. Druzdzel</name>
</author>
<author>
<name sortKey="Fridsma, D B" uniqKey="Fridsma D">D.B. Fridsma</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilks, Y" uniqKey="Wilks Y">Y. Wilks</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zamite, J" uniqKey="Zamite J">J. Zamite</name>
</author>
<author>
<name sortKey="Silva, F A B" uniqKey="Silva F">F.A.B. Silva</name>
</author>
<author>
<name sortKey="Couto, F" uniqKey="Couto F">F. Couto</name>
</author>
<author>
<name sortKey="Silva, M J" uniqKey="Silva M">M.J. Silva</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Glob Public Health</journal-id>
<journal-id journal-id-type="iso-abbrev">Glob Public Health</journal-id>
<journal-id journal-id-type="publisher-id">rgph</journal-id>
<journal-title-group>
<journal-title>Global Public Health</journal-title>
</journal-title-group>
<issn pub-type="ppub">1744-1692</issn>
<issn pub-type="epub">1744-1706</issn>
<publisher>
<publisher-name>Taylor &amp; Francis</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">22783909</article-id>
<article-id pub-id-type="pmc">3438486</article-id>
<article-id pub-id-type="doi">10.1080/17441692.2012.699975</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Uncovering text mining: A survey of current work on web-based epidemic intelligence</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Collier</surname>
<given-names>Nigel</given-names>
</name>
<xref ref-type="aff" rid="A1"></xref>
<xref ref-type="corresp" rid="COR1">*</xref>
</contrib>
<aff id="A1">National Institute of Informatics, Tokyo, Japan</aff>
</contrib-group>
<author-notes>
<corresp id="COR1">
<label>*</label>
Email:
<email>collier@nii.ac.jp</email>
</corresp>
</author-notes>
<pub-date pub-type="epub">
<day>11</day>
<month>7</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="ppub">
<month>8</month>
<year>2012</year>
</pub-date>
<volume>7</volume>
<issue>7</issue>
<fpage>731</fpage>
<lpage>749</lpage>
<history>
<date date-type="received">
<day>20</day>
<month>10</month>
<year>2011</year>
</date>
<date date-type="rev-recd">
<day>6</day>
<month>3</month>
<year>2012</year>
</date>
</history>
<permissions>
<copyright-statement>© 2012 Taylor &amp; Francis</copyright-statement>
<copyright-year>2012</copyright-year>
<license license-type="open-access" xlink:href="http://www.informaworld.com/mpp/uploads/iopenaccess_tcs.pdf">
<license-p>This is an open access article distributed under the
<ext-link ext-link-type="uri" xlink:href="http://www.informaworld.com/mpp/uploads/iopenaccess_tcs.pdf">Supplemental Terms and Conditions for iOpenAccess articles published in Taylor &amp; Francis journals</ext-link>
, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract>
<p>Real-world pandemics such as SARS 2002, as well as popular fiction like the movie Contagion, graphically depict the health threat of a global pandemic and the key role of epidemic intelligence (EI). While EI relies heavily on established indicator sources, a new class of methods based on event alerting from unstructured digital Internet media is rapidly becoming acknowledged within the public health community. At the heart of automated information gathering systems is a technology called
<italic>text mining.</italic>
My contribution here is to provide an overview of the role that text mining technology plays in detecting epidemics and to synthesise my existing research on the BioCaster project.</p>
</abstract>
<kwd-group>
<kwd>natural language processing</kwd>
<kwd>BioCaster</kwd>
<kwd>text mining</kwd>
<kwd>artificial intelligence</kwd>
<kwd>ontologies</kwd>
<kwd>evaluation</kwd>
<kwd>web-based discovery</kwd>
<kwd>social media</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>Introduction</title>
<p>Epidemic intelligence (EI) is the early identification, assessment and verification of potential public health hazards (
<xref ref-type="bibr" rid="R41">Paquet
<italic>et al.</italic>
2006</xref>
) and the timely dissemination of alerts to appropriate stakeholders. The discipline includes both indicator surveillance techniques, such as sentinel networks of physicians, and event techniques that gather data from Internet-based digital news media (
<xref ref-type="bibr" rid="R24">Hartley
<italic>et al.</italic>
2010</xref>
) as well as from official sources such as World Health Organisation (WHO) alerts. Event techniques in particular, with their emphasis on sifting through large volumes of dynamically changing unstructured data, lie at the crossroads of public health and informatics. The technological discipline that has grown from this and similar interactions is called text mining (
<xref ref-type="bibr" rid="R25">Hearst 1999</xref>
). Text mining is a relatively new human language processing technology that aims to meet the knowledge discovery needs of professionals struggling under the pressure of information overload, whether from the need to find facts and opinions on the Internet or to make new discoveries in literature databases such as PubMed's Medline (
<xref ref-type="bibr" rid="R48">Swanson 1986</xref>
). Text mining aims to discover novel information in a timely manner from large-scale text collections by developing high-performance algorithms for sourcing unstructured textual data, converting it to a machine-understandable format and then filtering it according to the needs of its users. In later stages, text mining systems perform domain analysis (e.g., to determine topical details or identify aberrations from past norms) and deliver results in customised forms so that users can rapidly synthesise situations of interest (
<xref ref-type="bibr" rid="R20">Feldman and Sanger 2006</xref>
).</p>
<p>Whilst dictionary-based search techniques certainly have their role to play, text mining usually goes far beyond the keyword searching used by traditional search engines to find needles in the proverbial haystack. Rather, the task can be characterised as a race to find a needle with a particular colour, weight and length. Uncovering documents on the topic of malaria, for example, is no guarantee that the information contained in them is relevant to discovering a new epidemic. What is needed is to condense the facts contained in the document into a fixed format, an event frame, that embodies all aspects of interest to the expert. Is there a case reported? What are the symptoms and how severe are they? Where and when did the event happen? By incorporating sophisticated knowledge models, text mining aims to understand the meaning (the semantics) of texts, albeit in a limited area of human expertise.</p>
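<p>The idea of an event frame can be pictured as a small record with one slot per question an expert would ask. The following is only a minimal sketch in Python under stated assumptions, not the BioCaster implementation: the class name EventFrame, its field names and the helper is_candidate_alert are hypothetical, and the example values are invented for illustration.</p>
<preformat>
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventFrame:
    """Hypothetical fixed-format record for one reported health event."""
    disease: str                      # topic of the document, e.g. "malaria"
    case_reported: bool               # is there a case reported?
    symptoms: List[str] = field(default_factory=list)  # what are the symptoms?
    severity: Optional[str] = None    # how severe are they?
    location: Optional[str] = None    # where did the event happen?
    date: Optional[str] = None        # when did it happen (ISO 8601 string)?


def is_candidate_alert(frame: EventFrame) -> bool:
    """Assumed filter: keep frames that report a case and say where it happened."""
    return frame.case_reported and frame.location is not None


# A document that merely discusses malaria fills the topic slot only ...
background = EventFrame(disease="malaria", case_reported=False)

# ... whereas an outbreak report fills the slots an expert cares about.
# All values below are made up for illustration.
outbreak = EventFrame(
    disease="malaria",
    case_reported=True,
    symptoms=["fever"],
    severity="severe",
    location="example province",
    date="2012-01-01",
)

print(is_candidate_alert(background))  # False
print(is_candidate_alert(outbreak))    # True
```
</preformat>
<p>Both example documents are about the same disease, yet only the second passes the filter. This mirrors the point that matching a topic keyword alone does not make a document relevant to detecting a new epidemic.</p>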
<p>While text mining has application in many real life scenarios as diverse as business intelligence, patent searching and market surveying, my focus here will be to highlight its contribution to the alerting of public health hazards in the online media and to briefly categorise the relevant methods and resources available. I conclude this article by discussing possible future trends and research issues.</p>
</sec>
<sec id="s2">
<title>Background</title>
<p>As shown by Hartley
<italic>et al.'s</italic>
survey paper (2010), event-driven surveillance systems are now widely used by national and trans-national public health organisations such as the WHO, the Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), the Public Health Agency of Canada (PHAC) and many other agencies. In November 2002, at the start of the SARS epidemic, the Global Public Health Intelligence Network (GPHIN) system (
<xref ref-type="bibr" rid="R38">Mawudeku and Blench 2006</xref>
) at PHAC was among the earliest, along with the ProMED network (
<xref ref-type="bibr" rid="R37">Madoff and Woodall 2005</xref>
), to provide early warning of the impending near-pandemic starting in Guangdong Province in southern China. During the A(H1N1) influenza pandemic in 2009, a number of systems are credited with the timely discovery of early events, including MedISys (
<xref ref-type="bibr" rid="R47">Steinberger
<italic>et al.</italic>
2008</xref>
), Veratect (
<xref ref-type="bibr" rid="R56">Wikipedia 2009</xref>
), HealthMap (
<xref ref-type="bibr" rid="R2">Brownstein
<italic>et al.</italic>
2008</xref>
) and BioCaster (
<xref ref-type="bibr" rid="R10">Collier
<italic>et al.</italic>
2008</xref>
). Tools such as Riff from InSTEDD (
<xref ref-type="bibr" rid="R21">Fuller 2010</xref>
) were used to enhance decision support by integrating signals from virtual teams of experts with multiple streams of data from EI systems such as EpiSpider (
<xref ref-type="bibr" rid="R50">Tolentino
<italic>et al.</italic>
2007</xref>
), SMS and electronic medical records in OpenMRS. Additionally, the MEDCollector system aims to integrate multiple Web-based sources (
<xref ref-type="bibr" rid="R58">Zamite
<italic>et al.</italic>
2010</xref>
). Of historical interest are two early systems: Proteus-Bio (
<xref ref-type="bibr" rid="R23">Grishman
<italic>et al.</italic>
2002</xref>
) and MiTAP (
<xref ref-type="bibr" rid="R17">Damianos
<italic>et al.</italic>
2002</xref>
).</p>
<p>
<xref ref-type="fig" rid="F1">Figure 1</xref>
illustrates the range of services available in the BioCaster EI system, produced by an international team based in Japan. As an example of the power of semantics-driven text mining, consider the following scenario. A public health expert is interested in finding out about a possible fatal case of person-to-person transmission of A(H5N1) in a family in Thailand. The expert, who is in the field, logs into a public Web portal on her smartphone and enters A(H5N1) as the search term along with
<italic>Thailand,</italic>
the date range of interest and requests only English language news articles. Internally the system recognises that the first term is an English variant of an index term in its disease ontology
<italic>(highly pathogenic H5N1 avian influenza).</italic>
The search is performed over thousands of possible events stored in the database but the results do not appear relevant to the expert's need. The system then offers the user the choice of searching using the disease symptoms. The user selects to search using symptoms such as
<italic>cough</italic>
,
<italic>high fever, pneumonia</italic>
,
<italic>acute respiratory distress</italic>
and all their synonyms. This time an article is found but the report is already two weeks out of date and missing some vital pieces of information about the name of the district and hospital. The user then chooses to search the Thai news and the search is automatically repeated using Thai term equivalents. A structured table is produced summarising each event in English with a flag indicating high priority items. The expert then finds the event that she is searching for and initiates a risk analysis procedure by transferring the event data to a secure watchboard for sharing with colleagues. In summary, the key component in this system is the analyst herself, but the technology has enabled her to increase her productivity by rapidly gaining insight into the context of a cluster outbreak so she can help her colleagues make a more informed decision. The EI system has enabled her to supplement whatever indicator-based information sources might have been available to her and to communicate better with her human network of contacts. Though I do not claim that mining the Web for reports is the only viable solution to EI, it is possible that without this service the expert might initially have had to rely on word-of-mouth, circulated news clippings or hit-and-miss ad hoc searches.</p>
<fig id="F1" position="float">
<label>Figure 1.</label>
<caption>
<p>The BioCaster portal (
<ext-link ext-link-type="uri" xlink:href="http://born.nii.ac.jp">http://born.nii.ac.jp</ext-link>
) is a 24/7 system designed to deliver a variety of methods for enhanced access to epidemic events reported in news and social media.</p>
</caption>
<graphic xlink:href="rgph7_731_f1"></graphic>
</fig>
<p>The aforementioned scenario represents the high end of automated EI systems but is feasible by fully applying today's technology. The availability of Web 2.0 services such as mapping (e.g., Google Maps
<xref ref-type="fn" rid="FN1">
<sup>1</sup>
</xref>
/Bing Maps
<xref ref-type="fn" rid="FN2">
<sup>2</sup>
</xref>
), news aggregation (e.g., Google News
<xref ref-type="fn" rid="FN3">
<sup>3</sup>
</xref>
), photo sharing (e.g., Flickr
<xref ref-type="fn" rid="FN4">
<sup>4</sup>
</xref>
), video sharing (e.g., YouTube
<xref ref-type="fn" rid="FN5">
<sup>5</sup>
</xref>
), social media (e.g., Twitter
<xref ref-type="fn" rid="FN6">
<sup>6</sup>
</xref>
), text mining services (e.g., Open Calais
<xref ref-type="fn" rid="FN7">
<sup>7</sup>
</xref>
) and data converters (e.g., Google Translate
<xref ref-type="fn" rid="FN8">
<sup>8</sup>
</xref>
) along with traditional Linux-Apache-MySQL-Python (LAMP) architectures has made it possible to rapidly and cheaply deploy systems that can ingest, filter and visualise news data and individual reports posted on microblogging sites like Twitter. As I illustrated in the example, high-end systems combine such generic services into so-called Web 2.0
<italic>mashups</italic>
together with specialised knowledge of the domain in order to reduce ambiguity and increase precision. Interfaces often employ web-mapping services such as Google Maps to organise data simply across time and space. Users can then explore domain-specific relations, drill down, aggregate across events and communicate their findings and interpretations to colleagues.</p>
<p>Text mining services running on the back-end of such systems incorporate a rich fusion of technologies from natural language processing, machine translation (MT), ontologies and reasoning. The challenges to these technologies are to make accurate interpretations of massive volumes of multilingual text in near real-time and then make judgements about whether the detected events violate domain norms. Seemingly innocuous contexts such as vaccination campaigns, bursts of media interest in politicians/pop idols such as
<italic>Obama Fever/Bieber Fever</italic>
, and vague reports of mystery illnesses are all challenge areas for automated text understanding. Trying to see through the fog of media interest to extrapolate case counts is also a challenge area complicated by the seeming lack of correlation with published news reports.</p>
<p>In the remainder of this article I will look in more detail at some of the issues surrounding text mining services which lie at the heart of semantic data extraction from free text at the same time as synthesising my group's research in this area over the last six years.</p>
</sec>
<sec id="s3">
<title>Core technologies</title>
<p>In this section I aim to give a broad impression of the automated technologies involved in text mining for EI. Events start with the biology in the real world and then through a process we still know too little about, media organisations report some of these events in digital form. From this point text mining systems have a chance to pick up the story in a trawl of the Web and convert the free text data into a structured event frame for sharing (see
<xref ref-type="table" rid="T1">Table 1</xref>
). The news story as a structured event frame is then analysed using both statistics and human analysts. This might lead to the event being flagged as an immediate alert for verification, put on a watch list or archived for future reference.</p>
<table-wrap id="T1" position="float">
<label>Table 1.</label>
<caption>
<p>Summary of steps in text mining systems for epidemic intelligence.</p>
</caption>
<table frame="hsides" rules="groups">
<tbody>
<tr>
<td rowspan="1" colspan="1">
<italic>Data ingestion</italic>
is usually the first stage with a variety of textual sources such as emails, homepages, Really Simple Syndication (RSS) feeds, Microsoft Office files and Portable Document Format (PDF) documents.
<break></break>
<italic>Data cleaning</italic>
is vital in practice to remove unwanted noise from the text (such as advertisements or links to unrelated news stories) and to join together broken sentences. At this stage systems often try to break down large documents that talk about multiple topics into separate sections in a process called zoning in order to remove noise or reclassify the document (
<xref ref-type="bibr" rid="R4">Chanlekha
<italic>et al.</italic>
2010</xref>
).
<break></break>
<italic>Data triage</italic>
assigns the document a topic category for either trashing, in the case of nonrelevant documents, or subsequent processing using detailed fact extraction. At this stage redundant information (multiple reports of the same event) is detected through document clustering. This stage is also intended to remove the most obvious true negatives but systems may struggle to handle the more subtle cases on the borderline of their task definitions, leading to high numbers of false positives.
<break></break>
<italic>Fact extraction</italic>
obtains structured information about an event such as the name of the disease, the type of agent, the number of victims and the time and location where the event happened. With this information the computer can then begin to answer questions such as what happened, to whom, where and when.
<break></break>
<italic>Ranking</italic>
is done by applying rules on the results of earlier stages of processing. High-end systems will use sophisticated statistical analysis to assign an alerting level based on a comparison of aggregated data in the present and past. In practice, this is often the most difficult stage for systems to perform automatically with high levels of accuracy.
<break></break>
<italic>Human judgement</italic>
is a key stage in the process. It is almost always needed to understand what is abnormal, to discover rare events that the system may have missed, to make the final decision about vague reports and to link together disparate events. The limitations of the system will be most visible to the user at this stage and they have to apply their own judgement to correct for nuances of meaning that are clear to people but opaque to the computer software. Human analysts will also be able to discover regularities in the data that can lead them to investigate new paths not available to current automated approaches.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>While my focus is on automated methods, human users naturally have a vital role to play at many levels: (1) skilled human analysts perform risk analysis and verification, (2) the general public can help suggest or rate reports in a process called crowdsourcing, e.g., in HealthMap and (3) users can comment on their own health conditions on open access social media sites such as Twitter, and these comments can be aggregated for trend detection, e.g., in BioCaster's DIZIE project (
<xref ref-type="bibr" rid="R7">Collier and Doan 2011</xref>
).</p>
<sec id="s4">
<title>Data sourcing</title>
<p>Whilst accurate statistics are hard to find, the World Wide Web (Web) is now one of the primary information sources for people seeking information (
<xref ref-type="bibr" rid="R29">Janson and Spink 2006</xref>
). Anyone with Web browsing software has almost instant low-cost access to an extensive range of electronic news reports, blogs, search, academic bulletins, etc. EI systems can tap into this data in a variety of ways.</p>
<p>The lowest cost option for computers to systematically work through this wealth of information is to harness a Web crawler. When pointed at a list of news sites this software will systematically trawl the links and download any pages that are new. Such an approach though incurs a hidden cost in the maintenance of software to decode the HTML template for each Web site so that informative content can be separated from non-relevant content such as metadata, adverts, images, headlines for other stories and hyperlinks. Given the huge variety of templates and their constant revision the manual effort in maintaining such software is considerable. Several groups have developed generic content discovery algorithms based on heuristic rules and statistical models, e.g., (
<xref ref-type="bibr" rid="R34">Lin and Ho 2002</xref>
), but ready to use software may be difficult to find in the public domain.</p>
<p>A more efficient approach to locating news is to use Really Simple Syndication (RSS) feeds, which provide syndicated news in a structured XML format. This option allows EI systems to regularly poll news servers, pull out links to new stories and download their content. The issue of content discovery on the news page is still a problem, though.</p>
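<p>As a rough illustration of the polling step, the following Python sketch parses an RSS 2.0 document with the standard library and returns only the links that have not been seen before. The feed content, the example URLs and the function name new_story_links are hypothetical; a working system would fetch the feed over HTTP on a schedule and would still need content discovery on each downloaded page.</p>
<preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real system would download this from a news server.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example health news</title>
    <item>
      <title>Suspected avian influenza cluster reported</title>
      <link>http://news.example.org/story/101</link>
    </item>
    <item>
      <title>Vaccination campaign begins</title>
      <link>http://news.example.org/story/102</link>
    </item>
  </channel>
</rss>"""


def new_story_links(feed_xml, seen):
    """Return links in the feed that are not yet in `seen`, and record them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# One story is already known, so only the other link is reported as new.
seen_links = {"http://news.example.org/story/101"}
print(new_story_links(SAMPLE_FEED, seen_links))
```
]]></preformat>
<p>Keeping the set of seen links between polls is one simple way to avoid downloading the same story twice; it does not detect the same event reported under different URLs.</p>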
<p>Although freely available public news aggregators such as Google News and Yahoo News have access to a very wide range of sources, for mission-critical systems as well as to ensure coverage, several EI systems have contracts with private news aggregation companies such as Factiva and LexisNexis. These companies offer the widest possible range of sources across a variety of languages with clean content. A practical concern for system builders is ensuring the quality of geographic coverage. This is not always so simple to achieve given the inherent biases in each media source.</p>
</sec>
<sec id="s5">
<title>Text analysis</title>
<p>Once news articles have been captured, the first stage of semantic analysis is to filter them for topical relevancy. The techniques used here that have enjoyed the most success are usually data-driven, based on either supervised (
<xref ref-type="bibr" rid="R14">Conway
<italic>et al.</italic>
2009</xref>
), semisupervised (
<xref ref-type="bibr" rid="R51">Torii
<italic>et al.</italic>
2011</xref>
) or unsupervised machine learning. These techniques are distinguished by how much use they make of pre-classified example data.</p>
<p>Text mining systems are designed around a clearly defined task specification such as a case definition. For example, ‘Identify all infectious disease outbreak reports that contain evidence for human to human transmission’, or ‘Identify all events consistent with the International Health Regulations Annex 2 Decision Instrument’.</p>
<p>To convert the unstructured data from a Web document into a structured event frame the computer requires knowledge about the syntactic and semantic structure of the language as well as the target output structure. This requirement tends to make text mining a language and domain-specific technology requiring interdisciplinary collaboration to develop system rulebooks. Building expert knowledge into a computer system for a specific task is economical only if the text collection is very large, as the Web is, and the nature of the information being found makes it very valuable to users. In addition to custom-built EI systems such as BioCaster, HealthMap, EpiSpider and MedISys, several private companies market generic text mining solutions including SAS, SPSS, Nstein and LexisNexis. Widely used open source toolkits include NLTK
<xref ref-type="fn" rid="FN9">
<sup>9</sup>
</xref>
, the R project's text mining package
<xref ref-type="fn" rid="FN10">
<sup>10</sup>
</xref>
and Sheffield University's GATE project.
<xref ref-type="fn" rid="FN11">
<sup>11</sup>
</xref>
</p>
<p>Extracting high quality information from text requires some degree of linguistic understanding. Systems typically require two sets of knowledge: domain knowledge that shows the classes of objects of interest and their relationships, and the patterns that show how these relationships are realised in the language of an actual text.</p>
<p>Most text mining systems start with a specialised module for recognising the names of important entities in the text — a process called named entity recognition (NER) (
<xref ref-type="bibr" rid="R40">Nadeau and Sekine 2007</xref>
), which can be done using either data driven techniques such as support vector machines (SVMs) or rule-based techniques. We illustrate this with an example from the BioCaster system's rule book which has the following pattern:
<disp-quote>
<p>D21:- name(disease) {list(%virus) ‘outbreak’}</p>
</disp-quote>
</p>
<p>In the language of SRL (
<xref ref-type="bibr" rid="R4">Collier
<italic>et al.</italic>
2010</xref>
) this rule, indexed as D21, identifies objects of type DISEASE. It states that a sequence of words should be labelled as a DISEASE type if it matches an entry in the virus list and is followed by the string ‘outbreak’. The output of this rule is to insert information into the text in the form of inline XML annotation for use in later processing steps. For example, the text ‘The AH1N1 outbreak occurred in communities across the region’ would be recognised internally as ‘The AH1N1 outbreak occurred across the region’. NER is usually followed by a stage of normalisation so that surface forms of names get linked to a unique identifier in a dictionary or ontology (i.e., a structured conceptual representation of the terms and relationships in the domain).</p>
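<p>The behaviour described for rule D21 can be approximated outside SRL with an ordinary regular expression. The Python sketch below is not the BioCaster implementation: the entries in the virus list, the function name annotate_disease and the inline NAME tag with its cl attribute are all assumptions made for illustration.</p>
<preformat><![CDATA[
```python
import re

# Hypothetical stand-in for the %virus list referred to by the rule.
VIRUS_LIST = ["AH1N1", "H5N1", "Ebola virus"]

# A virus list entry that is immediately followed by the word 'outbreak'.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in VIRUS_LIST) + r")(?=\s+outbreak\b)"
)


def annotate_disease(text):
    """Wrap matching virus names in an inline XML tag (the tag format is assumed)."""
    return _PATTERN.sub(r'<NAME cl="DISEASE">\1</NAME>', text)


print(annotate_disease("The AH1N1 outbreak occurred in communities across the region"))
```
]]></preformat>
<p>A virus name that is not followed by ‘outbreak’ is left untouched, which mirrors the context-sensitive nature of the rule but also shows how many surface variants such a pattern would miss.</p>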
<p>In SRL more sophisticated rules can be made to identify relations consisting of one or more objects like DISEASE, VIRUS, PERSON, SYMPTOM, ORGANIZATION, LOCATION and so on. For example:
<disp-quote>
<p>FW99: farm_worker(‘true’):- ‘death’ ‘of’ name*(person, P) {list(@farming_occupation)}</p>
</disp-quote>
</p>
<p>Rule FW99 is another string matching rule that looks for sequences of words showing the death of farm workers. If the rule matches then it outputs ‘farm_worker(“true”)’, i.e., the left hand side of the ‘:-’. The rule states that the string must match with a PERSON type containing a farming occupation listed in the dictionary such as abattoir workers, breeders, livestock handlers, veterinarians, ranchers etc. So, for example, the text ‘The ministry announced the death of 2 slaughterhouse workers from the virus’ would successfully match this rule.</p>
<p>While regular expression patterns like those in SRL can be quite effective, their sensitivity is limited because the large variety of surface patterns must be explicitly modelled. As in biomedical applications, more robust solutions are expected to come from full sentence parsing to uncover grammatical relations between words and phrases. Full parsing will also help to capture subtle aspects of the event such as polarity, certainty and temporality that can be hard to capture using regular expressions. However, full parsing may come at a cost to computational efficiency, potentially creating a bottleneck when timeliness is a key criterion for usability. This is particularly important during bursts of information that can occur during major epidemics.</p>
<p>Understanding time and location are key foundations for high quality EI (
<xref ref-type="bibr" rid="R4">Chanlekha
<italic>et al.</italic>
2010</xref>
). In practice, though, there are many pitfalls. Document time stamps, for example, are not necessarily the best guides to deciding on the time when a reported event took place. For example, a document dated 2 October 2008 might report ‘Last Tuesday avian influenza virus A was identified as the cause of an outbreak in two southern provinces of Viet Nam’. We would expect the text mining system to record the date of the case as 30 September 2008.</p>
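<p>The date arithmetic in this example can be sketched in a few lines of Python. The function name resolve_last_weekday is hypothetical, and the sketch assumes that ‘last Tuesday’ means the most recent Tuesday strictly before the document date; other readings of the phrase are possible and a real system would need to choose between them.</p>
<preformat><![CDATA[
```python
import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_last_weekday(document_date, weekday_name):
    """Return the most recent date strictly before `document_date` on the named weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    days_back = (document_date.weekday() - target) % 7
    if days_back == 0:
        days_back = 7  # assumed reading: 'last Tuesday' said on a Tuesday means a week earlier
    return document_date - datetime.timedelta(days=days_back)


# Document dated 2 October 2008 (a Thursday) that mentions 'last Tuesday'.
print(resolve_last_weekday(datetime.date(2008, 10, 2), "Tuesday"))
```
]]></preformat>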
<p>In practice, location names are also often highly ambiguous. For example, an equine influenza outbreak in Camden during the summer of 2007 would have to be identified as Camden near Sydney, Australia and not as Camden in London, UK. Equally confusing for automated systems is the fact that an outbreak of Venezuelan haemorrhagic fever might not be taking place in Venezuela and an outbreak of a food-borne disease from eating satsumas would probably have no relation to Japan. Much research has taken place on identifying geo-political named entities such as countries and cities in general news texts, e.g., (
<xref ref-type="bibr" rid="R39">McCallum and Li 2003</xref>
), with performance for English place names generally in the 80s to low 90s F-score on unseen texts, where F-score is the harmonic mean of recall and precision.
<xref ref-type="bibr" rid="R31">Keller
<italic>et al.</italic>
(2009)</xref>
provide a review of the issues for epidemic surveillance and present a new method for tackling the identification of a disease outbreak location based on neural networks trained on surface feature patterns in a window around geo-entity expressions. The resulting 64% F-score appears at first sight to be lower than we might have expected. The performance gap may be due to the variety of contexts in which geographic expressions for disease outbreaks occur and the lack of training data available. Deciding which of the many locations mentioned in a report is the actual disease outbreak location often depends on contextual clues outside the scope of a single sentence. For example, a local hospital may be mentioned as the place of treatment and the attributed source may be a health ministry spokesperson from the country's government. Since local names tend to be highly ambiguous both within and across countries, an EI system has a high chance of making a mistake in geocoding the event based only on this first piece of information. It requires a combination of clues from the health ministry name and the local name to fix the actual specific location.</p>
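<p>For readers unfamiliar with the F-score figures quoted above, the short Python sketch below computes the balanced F-score as the harmonic mean of precision and recall. The counts in the example are invented purely to show the calculation and are not taken from any of the cited evaluations.</p>
<preformat><![CDATA[
```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Invented counts for a location tagger evaluated on unseen text.
print(round(f_score(true_positives=64, false_positives=36, false_negatives=36), 2))
```
]]></preformat>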
<p>Because geo-temporal disambiguation is so difficult and because of the variety of ways in which cases are described across different news reports, it is challenging to completely de-duplicate news reports about events and obtain accurate tracking of case counts. An approach that might begin to tackle this was the spatio-temporal event calculus proposed by
<xref ref-type="bibr" rid="R5">Chaudet (2006)</xref>
. Although the knowledge representation seems stable and repeatable, it is not clear yet how easily this can be operationalised.</p>
</sec>
<sec id="s6">
<title>Ontologies</title>
<p>It is clear that some a priori knowledge over and above that supplied in the media report is necessary for the text mining system to make sense of the report, e.g., to resolve sense ambiguities such as knowing that A(H1N1) influenza, swine flu and swine flu A all refer to the same disease, to understand idiomatic expressions such as Venezuelan haemorrhagic fever and to exclude implausible contexts such as vaccination campaigns. Where does domain knowledge come from? Working systems often incorporate a fusion of knowledge both statistical and symbolic. For example,
<xref ref-type="bibr" rid="R31">Keller
<italic>et al.</italic>'s (2009)</xref>
use of a neural network to detect the focus location of the outbreak is a statistical approach, and BioCaster's use of SRL rules for resolving the focus disease agent is a symbolic approach. Here I focus on the role of ontologies in EI, which is to help automate human understanding of key concepts and relations so that the desired level of filtering accuracy can be achieved.</p>
<p>One of the most important functions of ontologies is to decide how alike two concepts are to each other. Biomedical ontologies minimally contain lists of terms and their human definitions, which are then given unique identifiers and arranged into classes with common properties. These classes are then structured according to principles of classification such as the subsumption (is a) relation. For example, the Medical Subject Headings (MeSH) ontology (
<xref ref-type="bibr" rid="R35">Lowe and Barnett 1994</xref>
) says that the term ‘influenza, human’ is a type of
<italic>respiratory tract infection.</italic>
Other widely known examples of ontologies for human understanding include SNOMED Clinical Terms (
<xref ref-type="bibr" rid="R43">Price and Spackman 2000</xref>
), the Foundation Model of Anatomy (
<xref ref-type="bibr" rid="R44">Rosse and Mejino 2008</xref>
), the Unified Medical Language System (UMLS) (
<xref ref-type="bibr" rid="R27">Humphreys and Lindberg 1993</xref>
) and AGROVOC (
<xref ref-type="bibr" rid="R46">Soergel
<italic>et al.</italic>
2004</xref>
). Community efforts such as the Open Biomedical Ontologies (
<xref ref-type="bibr" rid="R49">OBO 2011</xref>
) have come a long way in recent years towards forming standards for ontology construction, highlighting common pitfalls in their construction and promoting inter-operability.</p>
<p>In the domain of EI it is necessary to identify and link term classes such as DISEASE, SYMPTOM and SPECIES in order to separate reports about human, animal or crop diseases. We might also include a CHEMICAL class if knowledge of chemical or nucleotide agents were important. In order to capture geospatial reference we also need to define types for COUNTRY, PROVINCE and CITY. This would help to integrate information from the system with geospatial browsers such as Google Maps or NASA's World Wind.</p>
<p>Currently there are few dedicated publicly available ontologies that contain all the terms necessary for EI systems. In addition to the general purpose biomedical ontologies mentioned earlier, the commercial knowledge management tool Gideon
<xref ref-type="fn" rid="FN12">
<sup>12</sup>
</xref>
has extensive coverage, contains a sophisticated reasoning engine and is widely used to support expert diagnosis but is closed source and not designed to interoperate with automated text analytics. Among open source resources, we have provided the BioCaster ontology (BCO) version 3 (
<xref ref-type="bibr" rid="R4">Collier
<italic>et al.</italic>
2010</xref>
) in the OWL Semantic Web language to support automated reasoning across technical and laymen's terms in 12 languages for 336 conditions. The BCO supports a variety of relation types including term equivalence across languages, preferred term, causality between agents and conditions and between agents and symptoms. For example, if we find that a news document contains the disease ‘chicken pox’ then the ontology informs the system that the causal agent is the ‘varicella-zoster virus’, or if the news article mentions a disease outbreak of ‘swine flu’ and another of ‘swine influenza A’ then the ontology can provide a unifying root term of ‘A(H1N1) influenza’. Another application for the ontology is in helping to choose appropriate levels of generality for disease names. For example, if the document mentions both ‘Highly pathogenic H5N1 avian influenza’ and ‘avian influenza’ then the event will be designated as the more specific of the two. In addition to human diseases it also covers animal diseases where the disease is a potential zoonotic threat to humans or can have severe economic consequences for society.</p>
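<p>The three uses of the ontology described above (unifying synonyms under a root term, looking up a causal agent and preferring the more specific disease name) can be sketched with plain Python dictionaries. This is a toy fragment built only from the examples in the text; the data structures and the function names normalise and most_specific are assumptions and do not reflect how the BCO is actually stored or queried in OWL.</p>
<preformat><![CDATA[
```python
# Toy fragment of an ontology; terms and structure are illustrative only.
ROOT_TERM = {
    "swine flu": "A(H1N1) influenza",
    "swine influenza a": "A(H1N1) influenza",
    "a(h1n1) influenza": "A(H1N1) influenza",
    "chicken pox": "chicken pox",
    "avian influenza": "avian influenza",
    "highly pathogenic h5n1 avian influenza": "highly pathogenic H5N1 avian influenza",
}
CAUSAL_AGENT = {"chicken pox": "varicella-zoster virus"}
# Subsumption (is-a) links from a more specific term to its parent.
PARENT = {"highly pathogenic H5N1 avian influenza": "avian influenza"}


def normalise(surface_form):
    """Map a surface form to its root term, or None if it is unknown."""
    return ROOT_TERM.get(surface_form.strip().lower())


def most_specific(terms):
    """Drop any term that is an ancestor of another term in the set."""
    ancestors = set()
    for term in terms:
        parent = PARENT.get(term)
        while parent is not None:
            ancestors.add(parent)
            parent = PARENT.get(parent)
    return set(terms) - ancestors


print(normalise("swine flu"), "|", normalise("Swine influenza A"))
print(CAUSAL_AGENT[normalise("chicken pox")])
print(most_specific({"avian influenza", "highly pathogenic H5N1 avian influenza"}))
```
]]></preformat>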
<p>As a final note it is important to consider how to keep the ontology up to date. Although disease vocabulary is relatively stable, when new types of diseases strike, such as ‘swine flu’ during 2009, the nomenclature can evolve surprisingly rapidly. In the future we would like to explore community efforts to harness expertise for solving this issue.</p>
</sec>
<sec id="s7">
<title>Machine translation</title>
<p>Given the very large volumes of media reports and the variety of human languages in which they are written, high throughput MT (
<xref ref-type="bibr" rid="R57">Wilks 2009</xref>
) is usually required in order to make sense of news events in the timeliest manner. MT systems have been in widespread use for many years, e.g., the Systran system used by the European Commission, or Yahoo!'s Babelfish used for Web page translation. The fidelity of MT output generally varies from high for cognate language pairs such as English-French to mediocre for non-cognate pairs such as English-Japanese or English-Arabic. One issue complicating the choice of MT system is that it is not clear yet how quality of output impacts on the final performance of the EI system although we have seen in our own evaluations that MT output has proven useful for improving the timeliness and sensitivity of alerting (
<xref ref-type="bibr" rid="R18">Eysenbach 2002</xref>
).</p>
<p>A variety of general purpose MT systems are available from commercial companies, such as Google Translate or Microsoft's Bing Translate, each allowing a wide range of language pairs at a cost that is typically based on the volume of text translated per month. Systems that can be installed and run on a local server, such as the commercial Systran or the freely available MOSES, have at least one advantage over general purpose MT systems: they can usually be customised to the domain vocabulary if sufficient quantities of example texts exist in both the source and target languages.</p>
<p>Machine translation is often employed before text analysis — translating all languages to a common target language such as English so that rule books do not need to be developed and maintained for each language. MT is also useful to help analysts make a first pass at understanding the topicality and significance of news reports. However, in the absence of fully automated high quality MT, end users will need access to bilingual analysts who can interpret the content and context of the source language directly.</p>
</sec>
<sec id="s8">
<title>Aberration detection</title>
<p>Being able to detect a news report about a public health event is not enough to make an EI system useful. In order to have value, EI systems must be able to differentiate between mundane and unusual reports in a timely manner and supply this information to people who can initiate the appropriate actions. Such systems must be flexible enough to adapt to changing patterns of diseases without any bias for a particular country or language. In practice, human experts familiar with the country concerned will almost always be necessary to analyse and interpret warning signals. The question for text mining researchers and users is how far the technology can be trusted to detect aberrations and what kinds of aberrations are amenable to automated analysis. Given that the state of the physical world with regard to disease incidence is always changing and that new pathogens are constantly evolving, this is not a problem that can be tackled solely using the static ontologies I discussed earlier.</p>
<p>Detecting aberrations relies on identifying metrics that strongly correlate to the target objectives of the system designers, which is the concern of infodemiology, a term coined by
<xref ref-type="bibr" rid="R18">Eysenbach (2002)</xref>
. News reports push the limits of what can be achieved using early warning data because of their biases, inaccuracies and vagueness. For example, the data can be strongly driven by fear and socio-economic biases which need to be compensated for. In addition to natural language processing, making sense of underlying trends draws on several established empirical disciplines: (1) knowledge discovery in databases (
<xref ref-type="bibr" rid="R19">Fayyad
<italic>et al.</italic>
1996</xref>
) and (2) time series analysis (
<xref ref-type="bibr" rid="R55">Wagner
<italic>et al.</italic>
2001</xref>
,
<xref ref-type="bibr" rid="R3">Buckeridge
<italic>et al.</italic>
2005</xref>
) for change point detection. Many algorithms exist in both areas that can be adapted to the task at hand and compared.</p>
<p>The first stage in modelling begins by deciding on the objectives of the system such as coverage, alerting speed or low false alarm rates. A set of features is then identified, for example, the name of the disease and the country or province where it occurred, before establishing strong temporal and spatial baselines based on aggregated counts of these features over a history period. Deviation from such baselines by a significant margin constitutes an alert. Deciding on how to calculate the baseline and deviation, e.g., using statistical process control methods, is an ongoing research topic (
<xref ref-type="bibr" rid="R3">Buckeridge
<italic>et al.</italic>
2005</xref>
).</p>
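<p>As a minimal sketch of the aggregation step described above, the following Python fragment counts events per day for a single disease and country pair. It is an illustration only and not BioCaster's implementation: the function name daily_counts, the tuple layout of an event and the sample records are all assumptions made for this example.</p>
<preformat>
```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(events, disease, country, start, end):
    """Count events per day for one (disease, country) pair.

    `events` is an iterable of (day, disease, country) tuples, a
    hypothetical simplification of a structured event frame. Days with
    no matching report are filled with zero so the series has no gaps.
    """
    tally = Counter(day for day, d, c in events if d == disease and c == country)
    n_days = (end - start).days + 1
    return [tally[start + timedelta(days=i)] for i in range(n_days)]


# Toy records, invented for illustration only.
events = [
    (date(2010, 8, 2), "influenza", "Japan"),
    (date(2010, 8, 2), "influenza", "Japan"),
    (date(2010, 8, 4), "influenza", "Japan"),
    (date(2010, 8, 4), "dengue", "Thailand"),
]
series = daily_counts(events, "influenza", "Japan", date(2010, 8, 1), date(2010, 8, 5))
print(series)
```
</preformat>
<p>Filling empty days with zero matters because a baseline computed only over days that happen to have reports would overstate normal activity.</p>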
<p>My previous work in BioCaster has looked at flagging aberrations for a broad range of diseases using features from the structured event frame, specifically the disease and country where the event took place. By using aggregated counts of news events I was able to obtain high levels of alerting performance on a range of diseases and outbreak sizes against ProMED as the silver standard baseline. I could also compare a range of models and feature types. Since the actual state of the physical world is not usually known, I considered ProMED's human moderated network to be a reasonable standard for event alerting. My comparisons of English and multilingual news (
<xref ref-type="bibr" rid="R6">Collier 2010</xref>
,
<xref ref-type="bibr" rid="R7">2011</xref>
) showed high levels of performance for the C2 and C3 models of the CDC's Early Aberration Reporting System (EARS) (
<xref ref-type="bibr" rid="R28">Hutwagner
<italic>et al.</italic>
2003</xref>
) with a 7-day baseline and a 2-day buffer period. Both algorithms showed a good balance of F-score, timeliness and false alarm rate.</p>
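<p>To make the idea of a baseline and a buffer period concrete, the sketch below scores each day against the mean and standard deviation of a 7-day window that ends 2 days earlier, in the style of a C2 detector. This is one common formulation rather than the exact code evaluated in my experiments: the threshold of 3, the floor min_sd on the standard deviation and the sample counts are assumptions introduced for illustration.</p>
<preformat>
```python
from statistics import mean, stdev


def c2_scores(counts, baseline=7, buffer=2, min_sd=0.5):
    """Return a C2-style score for each day that has a full history window.

    For day t the baseline window covers the `baseline` days that end
    `buffer` days before t, so the most recent counts cannot inflate it.
    `min_sd` is an assumed floor that avoids division by zero on flat data.
    """
    scores = {}
    for t in range(baseline + buffer, len(counts)):
        window = counts[t - buffer - baseline : t - buffer]
        sd = max(stdev(window), min_sd)
        scores[t] = (counts[t] - mean(window)) / sd
    return scores


def c2_alerts(counts, threshold=3.0, **kwargs):
    """Indices of days whose score exceeds the (assumed) threshold."""
    return [t for t, s in c2_scores(counts, **kwargs).items() if s > threshold]


# Toy daily counts, invented for illustration: a spike on day index 9.
counts = [1, 0, 2, 1, 1, 0, 2, 1, 1, 9, 1]
print(c2_alerts(counts))
```
</preformat>
<p>The buffer keeps the first days of an emerging outbreak out of the baseline, which would otherwise raise the mean and delay the alert.</p>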
<p>A different approach is adopted by
<xref ref-type="bibr" rid="R54">von Etter
<italic>et al.</italic>
(2010)</xref>
, who use supervised classification on textual features, with naive Bayes and SVM classifiers, to categorise outbreak events on a 0–5 scale of relevance (F-score of 79.24% for an SVM with an RBF kernel).</p>
</sec>
<sec id="s9">
<title>Dissemination</title>
<p>Notifying users and other systems of alerts is the final key stage. At present, no interoperable standard for message structure, semantics or vocabulary appears to have been agreed internationally among Web-based EI systems. Although standards such as the Common Alerting Protocol have been proposed, the most popular format currently in use may be GeoRSS, a lightweight XML format for syndicating links to Web content that encodes geographic information. The minimal necessary elements might include, for example, a unique message identifier, the time of the message, the time of the event, a uniformly agreed name for the disease, the outbreak location, the species affected, a description of the reporting source, the degree of certainty, the level of confidentiality of the report, the status of the report (e.g., a trial exercise), the message type (e.g., an update or an error notification) and a unique identifier for the event assigned by the reporting system.</p>
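<p>The list of minimal elements above could be captured in a simple record type, sketched here in Python. No such schema has been agreed, so every field name, the choice of JSON as a serialisation and all sample values are hypothetical and serve only to show how the elements might fit together.</p>
<preformat>
```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AlertMessage:
    """Hypothetical alert record; field names are assumptions, not a standard."""

    message_id: str       # unique message identifier
    sent_time: str        # time of the message (ISO 8601 text)
    event_time: str       # time of the event (ISO 8601 text)
    disease: str          # uniformly agreed disease name
    location: str         # outbreak location
    species: str          # species affected
    source: str           # description of the reporting source
    certainty: str        # degree of certainty
    confidentiality: str  # level of confidentiality
    status: str           # e.g. "actual" or "exercise"
    message_type: str     # e.g. "alert", "update" or "error"
    event_id: str         # event identifier assigned by the reporting system

    def to_json(self):
        """Serialise the record with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)


# Sample values are invented for illustration only.
msg = AlertMessage(
    message_id="msg-0001",
    sent_time="2011-10-01T09:00:00Z",
    event_time="2011-09-30",
    disease="influenza",
    location="Tokyo, Japan",
    species="human",
    source="news report",
    certainty="unverified",
    confidentiality="public",
    status="exercise",
    message_type="alert",
    event_id="evt-0042",
)
print(msg.to_json())
```
</preformat>
<p>Keeping the message identifier separate from the event identifier allows several updates or corrections to refer to the same underlying event.</p>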
</sec>
</sec>
<sec id="s10">
<title>Case study: BioCaster</title>
<sec id="s11">
<title>Background</title>
<p>BioCaster is a fully automated experimental system for near real-time 24/7 global health intelligence based at the National Institute of Informatics in Tokyo. Major goals of the research are (1) to explore advanced algorithms for the semantic annotation of documents, (2) to acquire knowledge which can empower human language technologies and (3) to investigate early alerting methods from news and open access social media signals. Analysis and validation of signals is assumed to take place downstream of the system by the community of users.</p>
<p>The concept of BioCaster (
<xref ref-type="bibr" rid="R10">Collier
<italic>et al.</italic>
2008</xref>
) began in 2006 when grant-in-aid funding from the Japan Society for the Promotion of Science enabled the construction of a core high performance system (
<xref ref-type="bibr" rid="R12">Collier
<italic>et al.</italic>
2007</xref>
) for semantic indexing of news related to disease outbreaks. At the start BioCaster's focus was on Asia-Pacific languages due to the perceived risk of newly emerging and re-emerging health threats in the region (
<xref ref-type="bibr" rid="R30">Jones
<italic>et al.</italic>
2008</xref>
) such as highly pathogenic A(H5N1) influenza. Work therefore began in 2006 on the construction of a multilingual ontology (
<xref ref-type="bibr" rid="R11">Collier
<italic>et al.</italic>
2006</xref>
) that would form the conceptual framework for the system — a freely available community resource containing a structured public health vocabulary.</p>
<p>The core team involved in BioCaster's development at the National Institute of Informatics usually comprises three or four members with expertise in computational linguistics and software engineering. In 2006, collaboration was quickly established with a network of academic partners, including groups at the National Institute of Infectious Diseases (Japan), Okayama University (Japan), the National Institute of Genetics (NIG, Japan), Kasetsart University (Thailand) and the Vietnam National University (VNU, Vietnam). These groups provide expertise in software engineering, public health, genetics and computational linguistics across several languages. Since 2007, BioCaster has partnered with the Early Alerting and Reporting Project of the Global Health Security Action Group, a G7 + Mexico + EC + WHO initiative bringing together stakeholders, EI experts and system owners to share expertise and develop a common Web-based platform.</p>
</sec>
<sec id="s12">
<title>Funding</title>
<p>BioCaster is a non-governmental system developed with grant-in-aid support from national funding organisations. In 2009 BioCaster was awarded a 3-year grant-in-aid by the Japan Science and Technology Agency (JST) under the Sakigake programme to investigate enhanced health threat understanding by computers.</p>
</sec>
<sec id="s13">
<title>Output</title>
<p>BioCaster's implicitly intended users are analysts working at national and international public health agencies but there has also been considerable interest from physicians, veterinarians, researchers and the general public. Unique user numbers tend to be in the thousands per month but can rise substantially during major epidemics such as pandemic A(H1N1) and cholera in Haiti. As shown in
<xref ref-type="fig" rid="F1">Figure 1</xref>
, BioCaster makes its output available in several formats, such as Google Maps, graphs, GeoRSS feeds and email alerts. The Web portal operates in two modes: (1) a freely accessible mapping and graphing interface called the Global Health Monitor (see
<xref ref-type="fig" rid="F1">Figure 1</xref>
) and (2) a password-restricted alerting interface, which is currently used by a small test community of public and animal health experts. Additionally, the open access multilingual ontology provides structured term sets in 12 languages and has been downloaded by over 250 academic, industrial and public health groups worldwide, including the WHO.</p>
</sec>
<sec id="s14">
<title>Coverage</title>
<p>On a typical day BioCaster processes 30,000 reports. Of these approximately 55% will be in English, 11% in Chinese, 7% in German, 7% in Russian, 6% in Korean, 5% in French, 3% in Vietnamese, 2% in Portuguese, 2% in Chinese and the remainder in Thai, Italian and Arabic. Approximately 200 reports will be considered relevant after full analysis has taken place. About 80% of these reports will pertain to human cases and the remainder to animals with a very small number of plant diseases.</p>
<p>The range of health threats in BioCaster was prioritised according to the notifiable diseases of health ministries in major countries in the Asia-Pacific region, Europe and North America, as well as discussions with veterinary and CBRN experts. In October 2011 the BioCaster database (GENI-DB) (
<xref ref-type="bibr" rid="R7">Collier 2011</xref>
) contained news event records (without personal identifiers) for over 176 infectious diseases and chemicals, while the rulebook has the potential to find 182 human diseases, 143 zoonotic diseases, 46 animal diseases and 21 plant diseases. In addition, 40 chemicals and 9 radionuclides are under surveillance.</p>
</sec>
<sec id="s15">
<title>Signals</title>
<p>In addition to direct signals on 18 concept types such as DISEASE, VIRUS, BACTERIUM, SYMPTOM and LOCATION names, BioCaster also looks for various event features such as international travel and drug resistance, as well as a number of STEEP (Social, Technological, Economic, Environmental, Political) indicators. These include school closures, shortages of vaccines and panic buying of commodities.</p>
</sec>
<sec id="s16">
<title>Data sources</title>
<p>Data are ingested on a 1-hour cycle, with approximately 27,000 news items analysed per day from news sources at a commercial news aggregation company, Google News, as well as various NPO and official sources such as WHO, OIE and Europe Media Monitor alerts. Additionally, BioCaster's sister project in social media analysis (DIZIE) is analysing syndromic signals from the Twitter microblogging service. After testing is completed we expect to integrate DIZIE alerts within BioCaster.</p>
</sec>
</sec>
<sec id="s17">
<title>User feedback</title>
<p>BioCaster has been used by a variety of public health organisations including the ECDC, the US CDC, the WHO and the Ministry of Health in Japan. User feedback has been encouraging about both the quality of the information the system provides and its scope. Public health analysts have asked us to customise the system to monitor mass gathering events such as the Shanghai Expo in 2010 or the London Olympics in 2012, as well as possible outcomes of environmental disasters such as the Gulf of Mexico oil spill in 2010. Animal health analysts have begun to see the potential of systems like BioCaster and have asked us to expand the range of diseases we monitor to include notifiable conditions for animals.</p>
<p>The area where we receive the most requests is the user interface. In 2006 we focused on presenting information on a global bio-geographic map. As BioCaster's coverage has increased, we have found that the map can easily overwhelm users and that an adaptable alerting system is needed. In 2010 we therefore introduced hotspot alerts to draw the user's attention to specific reports. However, there is still much to be done, for example in removing duplication, clustering related events and integrating reports across languages and media types.</p>
<p>The information we provide is inevitably biased by BioCaster's input sources, which rely heavily on Google News. In recent years we have expanded BioCaster's language coverage to include news in several other languages such as Spanish, Vietnamese and Chinese, but the source engine still appears to have a US-centric focus with significant gaps for sub-Saharan Africa and parts of central Asia. We are currently trying to supplement the system with other sources such as news aggregators in China. In a seminal study of EI systems,
<xref ref-type="bibr" rid="R36">Lyon
<italic>et al.</italic>
(2011)</xref>
compared BioCaster, HealthMap and EpiSPIDER over the period from 2 to 30 August 2010 and found similar timeliness among the system alerts as well as complementarities in geographical and language focus among all three systems. The report highlighted the issue of automated location detection, e.g., BioCaster's failure to detect Pakistan during the study period. We have since corrected this anomaly, but in the process discovered a number of issues stemming from the transliteration of place names into English in certain locations.</p>
</sec>
<sec id="s18">
<title>Future developments</title>
<p>Our current work on aberration detection has touched upon only the explicitly stated facts in news media reports. More sophisticated text mining techniques hold out the potential for greater accuracy. For example, using multivariate features such as STEEP indicators or symptom severity might help to piece together seemingly disparate facts in order to better understand the significance of rare events. An improved model for the spatial dispersion of events would also help. For example, a report of a mystery illness in two villages in north-eastern Italy might not in itself be significant enough to trigger an alert. However, the report could take on more significance if it were combined with the facts that (1) there was an unusually high number of cases, (2) several victims complained of mild to severe joint pain and severe headache, (3) the first cases included a traveller from Kerala, India, (4) there had been a recent severe outbreak of Chikungunya in Kerala and (5) the health authorities were recommending precautions to prevent contact with mosquitoes and had suspended all blood donations.</p>
<p>As a first measure, the coarse granularity of time and location needs to be improved so that events can be pinpointed to at least a city and a day of occurrence, reducing the ‘late warning’ issue that I noted in (
<xref ref-type="bibr" rid="R4">Collier 2010</xref>
) where the tail of news reports about past events gets confused with newer events that share the same geographic feature.</p>
<p>On the issue of evaluation, other domains of text mining such as literature mining for bioinformatics (
<xref ref-type="bibr" rid="R17">Hirschman
<italic>et al.</italic>
2002</xref>
) have made enormous progress in assessing quality, expanding participation and improving performance by organising shared evaluation challenges. In evaluations such as the DARPA-sponsored TREC, TIPSTER and MUC, systems are compared against a common task-based benchmark, allowing for both technical comparison and user-based evaluation. However, adequate care needs to be taken to avoid ‘inbreeding’ of participating systems through over-sharing of methods and resources. In contrast, in Web-based EI there has been relatively little community organisation around evaluation or the sharing of tools and data. One recent study by
<xref ref-type="bibr" rid="R52">Vaillant
<italic>et al.</italic>
(2011a)</xref>
shows progress in this area by comparing seven EI systems for CBRN threats with a focus on sensitivity evaluation from a French public health perspective. Vaillant
<italic>et al.</italic>
show that, by combining data from at least four systems, a sensitivity of over 94% can be achieved. This result corroborates an earlier extrinsic evaluation highlighting the high sensitivity and timeliness perceived by users, including international EI experts (
<xref ref-type="bibr" rid="R53">Vaillant
<italic>et al.</italic>
2011b</xref>
).</p>
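<p>The effect of pooling systems on sensitivity can be illustrated with a short Python sketch. It is not the evaluation procedure of the study cited above: the system names, the event identifiers and the resulting figures are invented, and sensitivity is taken simply as the fraction of reference events found by at least one of the pooled systems.</p>
<preformat>
```python
from itertools import combinations


def sensitivity(detected, reference):
    """Fraction of reference events that appear in the detected set."""
    return len(reference.intersection(detected)) / len(reference)


def pooled(systems, names):
    """Union of the events reported by the named systems."""
    return set().union(*(systems[name] for name in names))


def best_combination(systems, reference, k):
    """Return (sensitivity, names) for the best-performing union of k systems."""
    candidates = combinations(sorted(systems), k)
    best = max(candidates, key=lambda names: sensitivity(pooled(systems, names), reference))
    return sensitivity(pooled(systems, best), reference), best


# Toy data, invented for illustration: event identifiers 1 to 10 form the reference set.
reference = set(range(1, 11))
systems = {
    "system_a": {1, 2, 3, 4},
    "system_b": {3, 4, 5, 6, 7},
    "system_c": {8, 9},
}
for k in (1, 2, 3):
    print(k, best_combination(systems, reference, k))
```
</preformat>
<p>Because systems differ in geographical and language focus, the union can cover events that no single system detects, at the likely cost of more duplicate and false alerts, which this sketch does not measure.</p>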
<p>So far I have implicitly assumed that digital news reports should be the main source of information for EI systems. In reality, the landscape of digital sources is much richer: search queries, micro-blogs, digital radio, discussion boards, images, livecasts etc. Several works have already appeared looking at the potential to make use of individual health reports in Twitter (
<xref ref-type="bibr" rid="R15">Corley
<italic>et al.</italic>
2010</xref>
,
<xref ref-type="bibr" rid="R16">Culotta 2010</xref>
,
<xref ref-type="bibr" rid="R33">Lampos and Cristianini 2010</xref>
,
<xref ref-type="bibr" rid="R45">Signorini
<italic>et al.</italic>
2011</xref>
) for tracking influenza-like illness. Pearson correlations with CDC surveillance reports from sentinel providers and UK GP reports have been very encouraging. Although microblogs have no editorial control, they contain a direct real-time view into the health conditions of individuals. Another source that has received attention is search engine query trends from Google and Yahoo! (
<xref ref-type="bibr" rid="R22">Ginsberg
<italic>et al.</italic>
2008</xref>
,
<xref ref-type="bibr" rid="R42">Polgreen
<italic>et al.</italic>
2008</xref>
). As with all short message sources, the challenge here is to interpret the context of the search query: a user may query about a particular drug or health condition for a variety of reasons, e.g., general interest, a school report or concern about a health condition. Ginsberg's study clearly showed the potential to closely correlate query counts with CDC influenza data, but research questions remain, particularly about geographic coverage as well as coverage across particular age groups, e.g., the young or old, who may not be familiar with or have access to the Internet. Other sources such as digital radio (potentially useful for countries in parts of Africa), SMS and livecast reports have yet to be explored.</p>
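<p>The Pearson correlation used in these comparisons can be computed as in the following Python sketch. The weekly message counts and surveillance rates are invented for illustration and do not come from any of the studies cited above.</p>
<preformat>
```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x * var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy weekly series, invented for illustration only.
weekly_messages = [120, 150, 310, 480, 390]   # messages mentioning flu symptoms
weekly_ili_rate = [1.1, 1.3, 2.4, 3.9, 3.0]   # reported influenza-like illness rate
r = pearson(weekly_messages, weekly_ili_rate)
print(round(r, 3))
```
</preformat>
<p>A high coefficient shows only that the two series rise and fall together over the sampled weeks; it says nothing about coverage across regions or age groups.</p>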
<p>The need for high performance computing to process data in real-time and adjust to surges during pandemics is a practical barrier to entry. Future systems may develop based around cloud computing services that are becoming available from companies such as Amazon, Google and Microsoft.</p>
</sec>
<sec sec-type="conclusion" id="s19">
<title>Conclusion</title>
<p>In this article I have just begun to uncover the surface of the complex technical aspects that Web-based EI system developers have grappled with over the last decade. Future developments in text mining will undoubtedly be necessary to harness the increasingly massive volumes of media and social network data and to combine this with non-media sources. Readers who wish to delve further into the issues raised here may find more detailed sources in several survey papers.
<xref ref-type="bibr" rid="R24">Hartley
<italic>et al.</italic>
(2010)</xref>
outline several active EI systems; Kosala and Blockeel's paper on mining the Web (
<xref ref-type="bibr" rid="R32">Kosala and Blockeel 2000</xref>
) raises many issues that are still relevant today and Howard Burkom's tutorial slides
<xref ref-type="fn" rid="FN13">
<sup>13</sup>
</xref>
from ISDS 2008 provide an excellent foundation for getting to grips with aberration detection along with R project software packages.
<xref ref-type="fn" rid="FN10">
<sup>10</sup>
</xref>
Among text mining books, two accessible sources are
<xref ref-type="bibr" rid="R1">Berry and Kogan (2010)</xref>
and
<xref ref-type="bibr" rid="R20">Feldman and Sanger (2006)</xref>
. Data counts from the BioCaster system are available for study in the GENI-DB database
<xref ref-type="fn" rid="FN14">
<sup>14</sup>
</xref>
(
<xref ref-type="bibr" rid="R9">Collier and Doan 2012</xref>
).</p>
</sec>
</body>
<back>
<sec id="s20">
<title>Notes</title>
<fn-group>
<fn id="FN1">
<label>1.</label>
<p>Google Maps:
<ext-link ext-link-type="uri" xlink:href="http://maps.google.com">http://maps.google.com</ext-link>
</p>
</fn>
<fn id="FN2">
<label>2.</label>
<p>Bing Maps:
<ext-link ext-link-type="uri" xlink:href="http://www.bing.com/maps">http://www.bing.com/maps</ext-link>
</p>
</fn>
<fn id="FN3">
<label>3.</label>
<p>Google News:
<ext-link ext-link-type="uri" xlink:href="http://news.google.com">http://news.google.com</ext-link>
</p>
</fn>
<fn id="FN4">
<label>4.</label>
<p>Flickr:
<ext-link ext-link-type="uri" xlink:href="http://www.flickr.com">http://www.flickr.com</ext-link>
</p>
</fn>
<fn id="FN5">
<label>5.</label>
<p>YouTube:
<ext-link ext-link-type="uri" xlink:href="http://www.youtube.com">http://www.youtube.com</ext-link>
</p>
</fn>
<fn id="FN6">
<label>6.</label>
<p>Twitter:
<ext-link ext-link-type="uri" xlink:href="http://twitter.com">http://twitter.com</ext-link>
</p>
</fn>
<fn id="FN7">
<label>7.</label>
<p>Open Calais:
<ext-link ext-link-type="uri" xlink:href="http://www.opencalais.com">http://www.opencalais.com</ext-link>
</p>
</fn>
<fn id="FN8">
<label>8.</label>
<p>Google Translate:
<ext-link ext-link-type="uri" xlink:href="http://translate.google.com">http://translate.google.com</ext-link>
</p>
</fn>
<fn id="FN9">
<label>9.</label>
<p>The Natural Language Toolkit:
<ext-link ext-link-type="uri" xlink:href="http://www.nltk.org/">http://www.nltk.org/</ext-link>
</p>
</fn>
<fn id="FN10">
<label>10.</label>
<p>The R project:
<ext-link ext-link-type="uri" xlink:href="http://cran.r-project.org/">http://cran.r-project.org/</ext-link>
</p>
</fn>
<fn id="FN11">
<label>11.</label>
<p>Sheffield University's GATE project:
<ext-link ext-link-type="uri" xlink:href="http://gate.ac.uk">http://gate.ac.uk</ext-link>
</p>
</fn>
<fn id="FN12">
<label>12.</label>
<p>Gideon:
<ext-link ext-link-type="uri" xlink:href="http://gideononline.com">http://gideononline.com</ext-link>
</p>
</fn>
<fn id="FN13">
<label>13.</label>
<p>Howard Burkom's 2008 ISDS tutorial slides:
<ext-link ext-link-type="uri" xlink:href="http://isds.wikispaces.com/ISDS+Conference+Workshop+Materials">http://isds.wikispaces.com/ISDS+Conference+Workshop+Materials</ext-link>
</p>
</fn>
<fn id="FN14">
<label>14.</label>
<p>The GENI-DB database:
<ext-link ext-link-type="uri" xlink:href="http://born.nii.ac.jp/">http://born.nii.ac.jp/</ext-link>
</p>
</fn>
</fn-group>
</sec>
<ref-list>
<title>References</title>
<ref id="R1">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Berry</surname>
<given-names>M.W.</given-names>
</name>
<name>
<surname>Kogan</surname>
<given-names>J.</given-names>
</name>
</person-group>
<source>Text mining: applications and theory</source>
<year>2010</year>
<publisher-loc>Edison, NJ</publisher-loc>
<publisher-name>Wiley</publisher-name>
</element-citation>
</ref>
<ref id="R2">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brownstein</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Freifeld</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Reis</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Mandl</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>Surveillance sans frontières: Internet-based emerging infectious disease intelligence and the HealthMap project</article-title>
<source>Public Library of Science Medicine</source>
<year>2008</year>
<volume>5</volume>
<issue>7</issue>
<fpage>1019</fpage>
<lpage>1024</lpage>
</element-citation>
</ref>
<ref id="R3">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Buckeridge</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Burkom</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Campbell</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Hogan</surname>
<given-names>W.R.</given-names>
</name>
<name>
<surname>Moore</surname>
<given-names>A.W.</given-names>
</name>
</person-group>
<article-title>Algorithms for rapid outbreak detection: a research synthesis</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2005</year>
<volume>38</volume>
<issue>2</issue>
<fpage>99</fpage>
<lpage>113</lpage>
<pub-id pub-id-type="pmid">15797000</pub-id>
</element-citation>
</ref>
<ref id="R4">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chanlekha</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>A framework for enhancing spatial and temporal granularity in report-based health surveillance systems</article-title>
<source>BMC Medical Informatics and Decision Making</source>
<year>2010</year>
<volume>10</volume>
<issue>1</issue>
<fpage>e43</fpage>
</element-citation>
</ref>
<ref id="R5">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chaudet</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>Extending the event calculus for tracking epidemic spread</article-title>
<source>Artificial Intelligence in Medicine</source>
<year>2006</year>
<volume>38</volume>
<issue>2</issue>
<fpage>137</fpage>
<lpage>156</lpage>
<pub-id pub-id-type="pmid">16076554</pub-id>
</element-citation>
</ref>
<ref id="R6">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>What's unusual in online disease outbreak news?</article-title>
<source>Journal of Biomedical Semantics</source>
<year>2010</year>
<volume>1</volume>
<issue>1</issue>
<fpage>2</fpage>
<pub-id pub-id-type="pmid">20618980</pub-id>
</element-citation>
</ref>
<ref id="R7">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>Towards cross-lingual alerting for bursty epidemic events</article-title>
<source>Journal of Biomedical Semantics</source>
<year>2011</year>
<volume>2</volume>
<issue>Suppl. 5</issue>
<fpage>S10</fpage>
<pub-id pub-id-type="pmid">22166371</pub-id>
</element-citation>
</ref>
<ref id="R8">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Syndromic classification of Twitter messages</article-title>
<source>Proceedings of eHealth</source>
<year>2011</year>
<fpage>21</fpage>
<lpage>23</lpage>
<comment>November, Malaga, Spain, arXiv:1110.3094</comment>
</element-citation>
</ref>
<ref id="R9">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>GENI-DB: A database of global events for epidemic intelligence</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>8</issue>
<fpage>1186</fpage>
<lpage>1188</lpage>
<pub-id pub-id-type="pmid">22383735</pub-id>
</element-citation>
</ref>
<ref id="R10">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Matsuda Goodwin</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Conway</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Tateno</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Ngo</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Dien</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kawtrakul</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Takeuchi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Shigematsu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taniguchi</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>BioCaster: detecting public health rumors with a Web-based text mining system</article-title>
<source>Bioinformatics</source>
<year>2008</year>
<volume>24</volume>
<issue>24</issue>
<fpage>2940</fpage>
<lpage>2941</lpage>
<pub-id pub-id-type="pmid">18922806</pub-id>
</element-citation>
</ref>
<ref id="R11">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Shigematsu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Dien</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Barrero</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Takeuchi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kawtrakul</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>A multilingual ontology for infectious disease surveillance: rationale, design and challenges</article-title>
<source>Language Resources and Evaluation</source>
<year>2006</year>
<volume>40</volume>
<issue>3–4</issue>
<fpage>405</fpage>
<lpage>413</lpage>
</element-citation>
</ref>
<ref id="R12">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Shigematsu</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Taniguchi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Jin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>McCrae</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Dien</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Hung</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Takeuchi</surname>
<given-names>K.</given-names>
</name>
<name>
<surname>Kawtrakul</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Ontology-driven influenza surveillance from Web rumours</article-title>
<source>Proceedings on Options for the Control of Influenza VI (Options 2007)</source>
<year>2007</year>
<fpage>17</fpage>
<lpage>23</lpage>
<comment>June, Toronto, Ontario, Canada</comment>
</element-citation>
</ref>
<ref id="R13">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Goodwin</surname>
<given-names>R.M.</given-names>
</name>
<name>
<surname>McCrae</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>An ontology-driven system for detecting global health events</article-title>
<source>Proceedings of the 23rd International Conference on Computational Linguistics (COLING)</source>
<year>2010</year>
<fpage>215</fpage>
<lpage>222</lpage>
<comment>23–27 August, Beijing, China</comment>
</element-citation>
</ref>
<ref id="R14">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Conway</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Doan</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Kawazoe</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>Classifying disease outbreak reports using n-grams and semantic features</article-title>
<source>International Journal of Medical Informatics</source>
<year>2009</year>
<volume>78</volume>
<issue>12</issue>
<fpage>e47</fpage>
<lpage>e58</lpage>
<pub-id pub-id-type="pmid">19447070</pub-id>
</element-citation>
</ref>
<ref id="R15">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corley</surname>
<given-names>C.D.</given-names>
</name>
<name>
<surname>Cook</surname>
<given-names>D.J.</given-names>
</name>
<name>
<surname>Mikler</surname>
<given-names>A.R.</given-names>
</name>
<name>
<surname>Singh</surname>
<given-names>K.P.</given-names>
</name>
</person-group>
<article-title>Text and structure data mining of influenza mentions in Web and social media</article-title>
<source>International Journal of Environmental Research and Public Health</source>
<year>2010</year>
<volume>7</volume>
<fpage>596</fpage>
<lpage>615</lpage>
<pub-id pub-id-type="pmid">20616993</pub-id>
</element-citation>
</ref>
<ref id="R16">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Culotta</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Detecting influenza outbreaks by analyzing Twitter messages</article-title>
<source>Southeastern Louisiana University Technical Report</source>
<year>2010</year>
<comment>Available from: arXiv:1007.4748v1 [cs.IR] [Accessed 25 July 2012]</comment>
</element-citation>
</ref>
<ref id="R17">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Damianos</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Ponte</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wohlever</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Reeder</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Day</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Hirschman</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>MiTAP for bio-security: a case study</article-title>
<source>AI Magazine</source>
<year>2002</year>
<volume>23</volume>
<issue>4</issue>
<fpage>13</fpage>
<lpage>29</lpage>
</element-citation>
</ref>
<ref id="R18">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Eysenbach</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Infodemiology: the epidemiology of (mis)information</article-title>
<source>American Journal of Medicine</source>
<year>2002</year>
<volume>113</volume>
<issue>9</issue>
<fpage>763</fpage>
<lpage>765</lpage>
<pub-id pub-id-type="pmid">12517369</pub-id>
</element-citation>
</ref>
<ref id="R19">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fayyad</surname>
<given-names>U.</given-names>
</name>
<name>
<surname>Piatetsky-Shapiro</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Smyth</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>From data mining to knowledge discovery in databases</article-title>
<source>AI Magazine</source>
<year>1996</year>
<volume>17</volume>
<issue>3</issue>
<fpage>37</fpage>
<lpage>54</lpage>
</element-citation>
</ref>
<ref id="R20">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Feldman</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Sanger</surname>
<given-names>J.</given-names>
</name>
</person-group>
<source>The text mining handbook: advanced approaches in analyzing unstructured data</source>
<year>2006</year>
<publisher-loc>Cambridge</publisher-loc>
<publisher-name>Cambridge University Press</publisher-name>
</element-citation>
</ref>
<ref id="R21">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fuller</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Tracking the global express: new tools addressing disease threats across the world</article-title>
<source>Epidemiology</source>
<year>2010</year>
<volume>21</volume>
<issue>6</issue>
<fpage>769</fpage>
<lpage>771</lpage>
<pub-id pub-id-type="pmid">20924231</pub-id>
</element-citation>
</ref>
<ref id="R22">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ginsberg</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Mohebbi</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Brammer</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Smolinski</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Brilliant</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Detecting influenza epidemics using search engine query data</article-title>
<source>Nature</source>
<year>2008</year>
<volume>457</volume>
<fpage>1012</fpage>
<lpage>1014</lpage>
<pub-id pub-id-type="pmid">19020500</pub-id>
</element-citation>
</ref>
<ref id="R23">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grishman</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Huttunen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Yangarber</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Information extraction for enhanced access to disease outbreak reports</article-title>
<source>Journal of Biomedical Informatics</source>
<year>2002</year>
<volume>35</volume>
<issue>4</issue>
<fpage>236</fpage>
<lpage>246</lpage>
<pub-id pub-id-type="pmid">12755518</pub-id>
</element-citation>
</ref>
<ref id="R24">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hartley</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Walters</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Arthur</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Yangarber</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Madoff</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Linge</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Mawudeku</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Collier</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Brownstein</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Thinus</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Lightfoot</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>The landscape of international event-based biosurveillance</article-title>
<source>Emerging Health Threats Journal</source>
<year>2010</year>
<volume>3</volume>
<fpage>e3</fpage>
<pub-id pub-id-type="pmid">22460393</pub-id>
</element-citation>
</ref>
<ref id="R25">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Hearst</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Untangling text data mining</article-title>
<source>Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics</source>
<year>1999</year>
<comment>20–26 June 1999, Maryland, USA, 3–10</comment>
</element-citation>
</ref>
<ref id="R26">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hirschman</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Park</surname>
<given-names>J.C.</given-names>
</name>
<name>
<surname>Tsujii</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Wong</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>C.H.</given-names>
</name>
</person-group>
<article-title>Accomplishments and challenges in literature data mining for biology</article-title>
<source>Bioinformatics</source>
<year>2002</year>
<volume>18</volume>
<issue>12</issue>
<fpage>1553</fpage>
<lpage>1561</lpage>
<pub-id pub-id-type="pmid">12490438</pub-id>
</element-citation>
</ref>
<ref id="R27">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Humphreys</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Lindberg</surname>
<given-names>D.</given-names>
</name>
</person-group>
<article-title>The UMLS project: making the conceptual connection between users and the information they need</article-title>
<source>Bulletin of the Medical Library Association</source>
<year>1993</year>
<volume>81</volume>
<issue>2</issue>
<fpage>170</fpage>
<pub-id pub-id-type="pmid">8472002</pub-id>
</element-citation>
</ref>
<ref id="R28">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hutwagner</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Thompson</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Seeman</surname>
<given-names>M.G.</given-names>
</name>
<name>
<surname>Treadwell</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>The bioterrorism preparedness and response early aberration and reporting system (EARS)</article-title>
<source>Journal of Urban Health</source>
<year>2003</year>
<volume>80</volume>
<issue>2</issue>
<fpage>i89</fpage>
<lpage>i96</lpage>
<pub-id pub-id-type="pmid">12791783</pub-id>
</element-citation>
</ref>
<ref id="R29">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jansen</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Spink</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>How are we searching the World Wide Web? A comparison of nine search engine transaction logs</article-title>
<source>Information Processing and Management</source>
<year>2006</year>
<volume>42</volume>
<issue>1</issue>
<fpage>248</fpage>
<lpage>263</lpage>
</element-citation>
</ref>
<ref id="R30">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jones</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Patel</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Levy</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Storeygard</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Balk</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Gittleman</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Daszak</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Global trends in emerging infectious diseases</article-title>
<source>Nature</source>
<year>2008</year>
<volume>451</volume>
<fpage>990</fpage>
<lpage>993</lpage>
<pub-id pub-id-type="pmid">18288193</pub-id>
</element-citation>
</ref>
<ref id="R31">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Keller</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Freifeld</surname>
<given-names>C.C.</given-names>
</name>
<name>
<surname>Brownstein</surname>
<given-names>J.S.</given-names>
</name>
</person-group>
<article-title>Automated vocabulary discovery for geo-parsing online epidemic intelligence</article-title>
<source>BMC Bioinformatics</source>
<year>2009</year>
<volume>10</volume>
<fpage>385</fpage>
</element-citation>
</ref>
<ref id="R32">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kosala</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Blockeel</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>Web mining research: a survey</article-title>
<source>Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining Explorations</source>
<year>2000</year>
<volume>2</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>15</lpage>
</element-citation>
</ref>
<ref id="R33">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Lampos</surname>
<given-names>V.</given-names>
</name>
<name>
<surname>Cristianini</surname>
<given-names>N.</given-names>
</name>
</person-group>
<article-title>Tracking the flu pandemic by monitoring the social web</article-title>
<source>2nd IAPR Workshop on Cognitive Information Processing (CIP 2010)</source>
<year>2010</year>
<comment>14–16 June 2010, Tuscany, Italy, 411–416</comment>
</element-citation>
</ref>
<ref id="R34">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Lin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ho</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Discovering informative content blocks from Web documents</article-title>
<source>Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD)</source>
<year>2002</year>
<comment>23–26 July 2002, Alberta, Canada</comment>
</element-citation>
</ref>
<ref id="R35">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lowe</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Barnett</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches</article-title>
<source>Journal of the American Medical Association</source>
<year>1994</year>
<volume>271</volume>
<fpage>1103</fpage>
<lpage>1108</lpage>
<pub-id pub-id-type="pmid">8151853</pub-id>
</element-citation>
</ref>
<ref id="R36">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Lyon</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Nunn</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Grossel</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Burgman</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Comparison of Web-based biosecurity intelligence systems: BioCaster, EpiSPIDER and HealthMap</article-title>
<source>Transboundary and Emerging Diseases [E-publication ahead of print]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://onlinelibrary.wiley.com/doi/10.1111/j.1865-1682.2011.01258.x/abstract">http://onlinelibrary.wiley.com/doi/10.1111/j.1865-1682.2011.01258.x/abstract</ext-link>
[Accessed 25 July 2012]</source>
<year>2011</year>
</element-citation>
</ref>
<ref id="R37">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Madoff</surname>
<given-names>L.C.</given-names>
</name>
<name>
<surname>Woodall</surname>
<given-names>J.P.</given-names>
</name>
</person-group>
<article-title>The Internet and the global monitoring of emerging diseases: lessons from the first 10 years of ProMED</article-title>
<source>Archives of Medical Research</source>
<year>2005</year>
<volume>36</volume>
<fpage>724</fpage>
<lpage>730</lpage>
<pub-id pub-id-type="pmid">16216654</pub-id>
</element-citation>
</ref>
<ref id="R38">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Mawudeku</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Blench</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Global Public Health Intelligence Network (GPHIN)</article-title>
<source>Proceedings of the 7th Conference of the Association for Machine Translation in the Americas</source>
<year>2006</year>
<comment>8–12 August, Cambridge, MA</comment>
</element-citation>
</ref>
<ref id="R39">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>McCallum</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W.</given-names>
</name>
</person-group>
<article-title>Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons</article-title>
<source>Proceedings of the Seventh Conference on Natural Language Learning</source>
<year>2003</year>
<comment>31 May-1 June 2003, Edmonton, Canada, 188–191</comment>
</element-citation>
</ref>
<ref id="R40">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nadeau</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Sekine</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>A survey of named entity recognition and classification</article-title>
<source>Linguisticae Investigationes</source>
<year>2007</year>
<volume>30</volume>
<issue>1</issue>
<fpage>3</fpage>
<lpage>26</lpage>
</element-citation>
</ref>
<ref id="R41">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Paquet</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Coulombier</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Kaiser</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Ciotti</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Epidemic intelligence: a new framework for strengthening disease intelligence in Europe</article-title>
<source>EuroSurveillance</source>
<year>2006</year>
<volume>11</volume>
<issue>12</issue>
<fpage>665</fpage>
</element-citation>
</ref>
<ref id="R42">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Polgreen</surname>
<given-names>P.M.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Pennock</surname>
<given-names>D.M.</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>F.D.</given-names>
</name>
</person-group>
<article-title>Using Internet searches for influenza surveillance</article-title>
<source>Clinical Infectious Diseases</source>
<year>2008</year>
<volume>47</volume>
<issue>11</issue>
<fpage>1443</fpage>
<lpage>1448</lpage>
<pub-id pub-id-type="pmid">18954267</pub-id>
</element-citation>
</ref>
<ref id="R43">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Price</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Spackman</surname>
<given-names>K.</given-names>
</name>
</person-group>
<article-title>SNOMED clinical terms</article-title>
<source>British Journal of Healthcare Computing &amp; Information Management</source>
<year>2000</year>
<volume>17</volume>
<issue>3</issue>
<fpage>27</fpage>
<lpage>31</lpage>
</element-citation>
</ref>
<ref id="R44">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Rosse</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Mejino</surname>
<given-names>J.L.V.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Burger</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Davidson</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Baldock</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>The foundational model of anatomy ontology</article-title>
<source>Anatomy ontologies for bioinformatics: principles and practice</source>
<year>2008</year>
<volume>6</volume>
<publisher-loc>London</publisher-loc>
<publisher-name>Springer</publisher-name>
<fpage>59</fpage>
<lpage>117</lpage>
</element-citation>
</ref>
<ref id="R45">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Signorini</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Segre</surname>
<given-names>A.M.</given-names>
</name>
<name>
<surname>Polgreen</surname>
<given-names>P.M.</given-names>
</name>
</person-group>
<article-title>The use of Twitter to track levels of disease activity and public concern in the U.S. during the influenza A H1N1 pandemic</article-title>
<source>Public Library of Science One</source>
<year>2011</year>
<volume>6</volume>
<issue>5</issue>
<fpage>e19467</fpage>
</element-citation>
</ref>
<ref id="R46">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Soergel</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Lauser</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Fisseha</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Keizer</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Katz</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Reengineering thesauri for new applications: the AGROVOC example</article-title>
<source>Journal of Digital Information</source>
<year>2004</year>
<volume>4</volume>
<issue>4</issue>
<comment>Available from:
<ext-link ext-link-type="uri" xlink:href="http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel">http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel</ext-link>
</comment>
</element-citation>
</ref>
<ref id="R47">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Steinberger</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fuart</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>van der Goot</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Best</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>von Etter</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Yangarber</surname>
<given-names>R.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Fogelman-Soulié</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Perrotta</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Piskorski</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Steinberger</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Text mining from the web for medical intelligence</article-title>
<source>Mining massive data sets for security</source>
<year>2008</year>
<publisher-loc>Amsterdam, The Netherlands</publisher-loc>
<publisher-name>IOS Press</publisher-name>
<fpage>295</fpage>
<lpage>310</lpage>
</element-citation>
</ref>
<ref id="R48">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Swanson</surname>
<given-names>D.R.</given-names>
</name>
</person-group>
<article-title>Fish oil, Raynaud's syndrome, and undiscovered public knowledge</article-title>
<source>Perspectives in Biology and Medicine</source>
<year>1986</year>
<volume>30</volume>
<issue>1</issue>
<fpage>7</fpage>
<lpage>18</lpage>
<pub-id pub-id-type="pmid">3797213</pub-id>
</element-citation>
</ref>
<ref id="R49">
<element-citation publication-type="other">
<collab>The Open Biomedical Ontologies (OBO),</collab>
<source>The open biomedical ontologies [online]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://www.obofoundry.org/">http://www.obofoundry.org/</ext-link>
[Accessed 25 September 2011]</source>
<year>2011</year>
</element-citation>
</ref>
<ref id="R50">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tolentino</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kamadjeu</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Fontelo</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Matters</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Pollack</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Madoff</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>Scanning the emerging infectious disease horizon — visualizing ProMED emails using EpiSpider</article-title>
<source>Advances in Disease Surveillance</source>
<year>2007</year>
<volume>2</volume>
<fpage>169</fpage>
</element-citation>
</ref>
<ref id="R51">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Torii</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yin</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Nguyen</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Mazumdar</surname>
<given-names>C.T.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Hartley</surname>
<given-names>D.M.</given-names>
</name>
<name>
<surname>Nelson</surname>
<given-names>N.P.</given-names>
</name>
</person-group>
<article-title>An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics</article-title>
<source>International Journal of Medical Informatics</source>
<year>2011</year>
<volume>80</volume>
<issue>1</issue>
<fpage>56</fpage>
<lpage>66</lpage>
<pub-id pub-id-type="pmid">21134784</pub-id>
</element-citation>
</ref>
<ref id="R52">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Vaillant</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Nys</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Gastellu-Etchegorry</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Barboza</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Enhancement of sensitivity with gathering Internet-based systems for early threat detection within the global health security initiative (GHSI): the EAR project</article-title>
<source>Proceedings of eHealth</source>
<year>2011a</year>
<comment>21–23 November, Malaga, Spain, (in press). Available from:
<ext-link ext-link-type="uri" xlink:href="http://electronic-health.org/poster_abstracts/ehealth2011_poster_GHSAG.pdf">http://electronic-health.org/poster_abstracts/ehealth2011_poster_GHSAG.pdf</ext-link>
[Accessed 3 July 2012]</comment>
</element-citation>
</ref>
<ref id="R53">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>Vaillant</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Barboza</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Arthur</surname>
<given-names>R.R.</given-names>
</name>
</person-group>
<article-title>Epidemic intelligence: assessing event-based tools and user's perception in the GHSAG community</article-title>
<source>Proceedings of IMED 2011</source>
<year>2011b</year>
<comment>4–7 February, Vienna, Austria</comment>
</element-citation>
</ref>
<ref id="R54">
<element-citation publication-type="other">
<person-group person-group-type="author">
<name>
<surname>von Etter</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Huttunen</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Vihavainen</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Vuorinen</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Yangarber</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Assessment of utility in Web mining for the domain of public health</article-title>
<source>Proceedings of NAACL HLT 2010 Workshop on Text and Data Mining of Health Documents</source>
<year>2010</year>
<comment>5 June 2010, California, USA, 29–37</comment>
</element-citation>
</ref>
<ref id="R55">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wagner</surname>
<given-names>M.M.</given-names>
</name>
<name>
<surname>Tsui</surname>
<given-names>F.C.</given-names>
</name>
<name>
<surname>Espino</surname>
<given-names>J.U.</given-names>
</name>
<name>
<surname>Dato</surname>
<given-names>V.M.</given-names>
</name>
<name>
<surname>Sittig</surname>
<given-names>D.F.</given-names>
</name>
<name>
<surname>Caruana</surname>
<given-names>R.A.</given-names>
</name>
<name>
<surname>McGinnis</surname>
<given-names>L.F.</given-names>
</name>
<name>
<surname>Deerfield</surname>
<given-names>D.W.</given-names>
</name>
<name>
<surname>Druzdzel</surname>
<given-names>M.J.</given-names>
</name>
<name>
<surname>Fridsma</surname>
<given-names>D.B.</given-names>
</name>
</person-group>
<article-title>The emerging science of very early detection of disease outbreak</article-title>
<source>Journal of Public Health Management and Practice</source>
<year>2001</year>
<volume>7</volume>
<issue>6</issue>
<fpage>51</fpage>
<lpage>59</lpage>
</element-citation>
</ref>
<ref id="R56">
<element-citation publication-type="other">
<collab>Wikipedia,</collab>
<source>2009 flu pandemic timeline [online]. Available from:
<ext-link ext-link-type="uri" xlink:href="http://en.wikipedia.org/wiki/2009_flu_pandemic_timeline">http://en.wikipedia.org/wiki/2009_flu_pandemic_timeline</ext-link>
[Accessed 25 September 2011]</source>
<year>2009</year>
</element-citation>
</ref>
<ref id="R57">
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Wilks</surname>
<given-names>Y.</given-names>
</name>
</person-group>
<source>Machine translation — its scope and limits</source>
<year>2009</year>
<publisher-loc>London</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="R58">
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zamite</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>F.A.B.</given-names>
</name>
<name>
<surname>Couto</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Silva</surname>
<given-names>M.J.</given-names>
</name>
</person-group>
<article-title>MEDCollector: multisource epidemic data collector</article-title>
<source>Lecture Notes in Computer Science</source>
<year>2010</year>
<volume>6266</volume>
<fpage>16</fpage>
<lpage>30</lpage>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/SrasV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001033 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 001033 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    SrasV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3438486
   |texte=   Uncovering text mining: A survey of current work on web-based epidemic intelligence
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:22783909" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a SrasV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 28 14:49:16 2020. Site generation: Sat Mar 27 22:06:49 2021