Cyberinfrastructure Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

An evaluation framework for comparing geocoding systems

Internal identifier: 000339 (Pmc/Corpus); previous: 000338; next: 000340

Authors: Daniel W. Goldberg; Morven Ballard; James H. Boyd; Narelle Mullan; Carol Garfield; Diana Rosman; Anna M. Ferrante; James B. Semmens

RBID : PMC:3834528

Abstract

Background

Geocoding, the process of converting textual information describing a location into one or more digital geographic representations, is a routine task performed at large organizations and government agencies across the globe. In a health context, this task is often a fundamental first step performed prior to all operations that take place in a spatially-based health study. As such, the quality of the geocoding system used within these agencies is of paramount concern to the agency (the producer) and to the researchers or policy-makers who wish to use these data (consumers). However, geocoding systems are continually evolving, with new products regularly coming on the market. Agencies must develop and use criteria across a number of axes when faced with decisions about building, buying, or maintaining any particular geocoding system. To date, published criteria have focused on one or more aspects of geocode quality without taking a holistic view of a geocoding system’s role within a large organization. The primary purpose of this study is to develop and test an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs.

Methods

A geocoding platform evaluation framework is derived through an examination of prior literature on geocoding accuracy. The framework extends commonly used geocoding metrics to take into account the specific concerns of large organizations for which geocoding is a fundamental operational capability tightly knit into their core mission of processing health data records. A case study is performed to evaluate the strengths and weaknesses of five geocoding platforms currently available in the Australian geospatial marketplace.

Results

The evaluation framework developed in this research proved successful in differentiating between key capabilities of geocoding systems that are important in the context of a large organization with significant investments in geocoding resources. Results from the proposed methodology highlight important differences across all axes of geocoding system comparison, including spatial data output accuracy, reference data coverage, system flexibility, the potential for tight integration, and the need for specialized staff and/or development time and funding. Such results can empower decision-makers within large organizations as they decide on and invest in geocoding systems.
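
As a rough illustration of how the comparison axes listed above could be combined into a single ranking, the following minimal sketch uses a weighted sum. This is an assumption made for illustration only: the abstract does not state how the framework aggregates its criteria, and the axis keys, the 0 to 1 rating scale, the weights, and the platform names are all hypothetical.

```python
from dataclasses import dataclass

# Axis keys paraphrase the comparison axes named in the Results paragraph.
# The keys themselves are hypothetical labels, not terms from the study.
AXES = (
    "spatial_accuracy",
    "reference_data_coverage",
    "system_flexibility",
    "integration_potential",
    "staffing_and_cost",
)


@dataclass(frozen=True)
class Platform:
    name: str
    ratings: dict  # axis key -> rating on an assumed 0..1 scale


def weighted_score(platform, weights):
    """Return the weighted mean of the platform's per-axis ratings."""
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[axis] * platform.ratings[axis] for axis in AXES)
    return weighted_sum / total_weight


def rank(platforms, weights):
    """Sort platforms from highest to lowest weighted score."""
    return sorted(platforms, key=lambda p: weighted_score(p, weights), reverse=True)


if __name__ == "__main__":
    # Made-up ratings for two hypothetical platforms.
    equal_weights = {axis: 1.0 for axis in AXES}
    platform_a = Platform("Platform A", {axis: 1.0 for axis in AXES})
    platform_b = Platform("Platform B", {axis: 0.5 for axis in AXES})
    for p in rank([platform_b, platform_a], equal_weights):
        print(p.name, weighted_score(p, equal_weights))
```

An organization that cares more about one axis, for example integration potential, would raise that weight; the ranking then reflects its own operational priorities rather than a universal ordering.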


DOI: 10.1186/1476-072X-12-50
PubMed: 24207169
PubMed Central: 3834528

Links to Exploration step

PMC:3834528

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An evaluation framework for comparing geocoding systems</title>
<author>
<name sortKey="Goldberg, Daniel W" sort="Goldberg, Daniel W" uniqKey="Goldberg D" first="Daniel W" last="Goldberg">Daniel W. Goldberg</name>
<affiliation>
<nlm:aff id="I1">Department of Geography, Texas A&amp;M University, College Station, Texas, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ballard, Morven" sort="Ballard, Morven" uniqKey="Ballard M" first="Morven" last="Ballard">Morven Ballard</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boyd, James H" sort="Boyd, James H" uniqKey="Boyd J" first="James H" last="Boyd">James H. Boyd</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mullan, Narelle" sort="Mullan, Narelle" uniqKey="Mullan N" first="Narelle" last="Mullan">Narelle Mullan</name>
<affiliation>
<nlm:aff id="I3">Cooperative Research Centre for Spatial Information, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garfield, Carol" sort="Garfield, Carol" uniqKey="Garfield C" first="Carol" last="Garfield">Carol Garfield</name>
<affiliation>
<nlm:aff id="I4">Data Linkage Branch, Western Australia Department of Health, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rosman, Diana" sort="Rosman, Diana" uniqKey="Rosman D" first="Diana" last="Rosman">Diana Rosman</name>
<affiliation>
<nlm:aff id="I4">Data Linkage Branch, Western Australia Department of Health, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ferrante, Anna M" sort="Ferrante, Anna M" uniqKey="Ferrante A" first="Anna M" last="Ferrante">Anna M. Ferrante</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Semmens, James B" sort="Semmens, James B" uniqKey="Semmens J" first="James B" last="Semmens">James B. Semmens</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">24207169</idno>
<idno type="pmc">3834528</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3834528</idno>
<idno type="RBID">PMC:3834528</idno>
<idno type="doi">10.1186/1476-072X-12-50</idno>
<date when="2013">2013</date>
<idno type="wicri:Area/Pmc/Corpus">000339</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">An evaluation framework for comparing geocoding systems</title>
<author>
<name sortKey="Goldberg, Daniel W" sort="Goldberg, Daniel W" uniqKey="Goldberg D" first="Daniel W" last="Goldberg">Daniel W. Goldberg</name>
<affiliation>
<nlm:aff id="I1">Department of Geography, Texas A&amp;M University, College Station, Texas, USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ballard, Morven" sort="Ballard, Morven" uniqKey="Ballard M" first="Morven" last="Ballard">Morven Ballard</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Boyd, James H" sort="Boyd, James H" uniqKey="Boyd J" first="James H" last="Boyd">James H. Boyd</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mullan, Narelle" sort="Mullan, Narelle" uniqKey="Mullan N" first="Narelle" last="Mullan">Narelle Mullan</name>
<affiliation>
<nlm:aff id="I3">Cooperative Research Centre for Spatial Information, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Garfield, Carol" sort="Garfield, Carol" uniqKey="Garfield C" first="Carol" last="Garfield">Carol Garfield</name>
<affiliation>
<nlm:aff id="I4">Data Linkage Branch, Western Australia Department of Health, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rosman, Diana" sort="Rosman, Diana" uniqKey="Rosman D" first="Diana" last="Rosman">Diana Rosman</name>
<affiliation>
<nlm:aff id="I4">Data Linkage Branch, Western Australia Department of Health, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Ferrante, Anna M" sort="Ferrante, Anna M" uniqKey="Ferrante A" first="Anna M" last="Ferrante">Anna M. Ferrante</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Semmens, James B" sort="Semmens, James B" uniqKey="Semmens J" first="James B" last="Semmens">James B. Semmens</name>
<affiliation>
<nlm:aff id="I2">Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">International Journal of Health Geographics</title>
<idno type="eISSN">1476-072X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Geocoding, the process of converting textual information describing a location into one or more digital geographic representations, is a routine task performed at large organizations and government agencies across the globe. In a health context, this task is often a fundamental first step performed prior to all operations that take place in a spatially-based health study. As such, the quality of the geocoding system used within these agencies is of paramount concern to the agency (the producer) and to the researchers or policy-makers who wish to use these data (consumers). However, geocoding systems are continually evolving, with new products regularly coming on the market. Agencies must develop and use criteria across a number of axes when faced with decisions about building, buying, or maintaining any particular geocoding system. To date, published criteria have focused on one or more aspects of geocode quality without taking a holistic view of a geocoding system’s role within a large organization. The primary purpose of this study is to develop and test an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs.</p>
</sec>
<sec>
<title>Methods</title>
<p>A geocoding platform evaluation framework is derived through an examination of prior literature on geocoding accuracy. The framework extends commonly used geocoding metrics to take into account the specific concerns of large organizations for which geocoding is a fundamental operational capability tightly knit into their core mission of processing health data records. A case study is performed to evaluate the strengths and weaknesses of five geocoding platforms currently available in the Australian geospatial marketplace.</p>
</sec>
<sec>
<title>Results</title>
<p>The evaluation framework developed in this research proved successful in differentiating between key capabilities of geocoding systems that are important in the context of a large organization with significant investments in geocoding resources. Results from the proposed methodology highlight important differences across all axes of geocoding system comparison, including spatial data output accuracy, reference data coverage, system flexibility, the potential for tight integration, and the need for specialized staff and/or development time and funding. Such results can empower decision-makers within large organizations as they decide on and invest in geocoding systems.</p>
</sec>
</div>
</front>
</TEI>
<pmc article-type="research-article" xml:lang="en">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Int J Health Geogr</journal-id>
<journal-id journal-id-type="iso-abbrev">Int J Health Geogr</journal-id>
<journal-title-group>
<journal-title>International Journal of Health Geographics</journal-title>
</journal-title-group>
<issn pub-type="epub">1476-072X</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">24207169</article-id>
<article-id pub-id-type="pmc">3834528</article-id>
<article-id pub-id-type="publisher-id">1476-072X-12-50</article-id>
<article-id pub-id-type="doi">10.1186/1476-072X-12-50</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An evaluation framework for comparing geocoding systems</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes" id="A1">
<name>
<surname>Goldberg</surname>
<given-names>Daniel W</given-names>
</name>
<xref ref-type="aff" rid="I1">1</xref>
<email>daniel.goldberg@tamu.edu</email>
</contrib>
<contrib contrib-type="author" id="A2">
<name>
<surname>Ballard</surname>
<given-names>Morven</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>morven.ballard@curtin.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A3">
<name>
<surname>Boyd</surname>
<given-names>James H</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>j.boyd@curtin.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A4">
<name>
<surname>Mullan</surname>
<given-names>Narelle</given-names>
</name>
<xref ref-type="aff" rid="I3">3</xref>
<email>n.mullan@curtin.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A5">
<name>
<surname>Garfield</surname>
<given-names>Carol</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>carol.garfield@health.wa.gov.au</email>
</contrib>
<contrib contrib-type="author" id="A6">
<name>
<surname>Rosman</surname>
<given-names>Diana</given-names>
</name>
<xref ref-type="aff" rid="I4">4</xref>
<email>diana.rosman@health.wa.gov.au</email>
</contrib>
<contrib contrib-type="author" id="A7">
<name>
<surname>Ferrante</surname>
<given-names>Anna M</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>a.ferrante@curtin.edu.au</email>
</contrib>
<contrib contrib-type="author" id="A8">
<name>
<surname>Semmens</surname>
<given-names>James B</given-names>
</name>
<xref ref-type="aff" rid="I2">2</xref>
<email>james.semmens@curtin.edu.au</email>
</contrib>
</contrib-group>
<aff id="I1">
<label>1</label>
Department of Geography, Texas A&amp;M University, College Station, Texas, USA</aff>
<aff id="I2">
<label>2</label>
Centre for Population Health Research, Curtin University, Perth, Western Australia, Australia</aff>
<aff id="I3">
<label>3</label>
Cooperative Research Centre for Spatial Information, Perth, Western Australia, Australia</aff>
<aff id="I4">
<label>4</label>
Data Linkage Branch, Western Australia Department of Health, Perth, Western Australia, Australia</aff>
<pub-date pub-type="collection">
<year>2013</year>
</pub-date>
<pub-date pub-type="epub">
<day>8</day>
<month>11</month>
<year>2013</year>
</pub-date>
<volume>12</volume>
<fpage>50</fpage>
<lpage>50</lpage>
<history>
<date date-type="received">
<day>7</day>
<month>8</month>
<year>2013</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>9</month>
<year>2013</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright © 2013 Goldberg et al.; licensee BioMed Central Ltd.</copyright-statement>
<copyright-year>2013</copyright-year>
<copyright-holder>Goldberg et al.; licensee BioMed Central Ltd.</copyright-holder>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/2.0">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/2.0">http://creativecommons.org/licenses/by/2.0</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri xlink:href="http://www.ij-healthgeographics.com/content/12/1/50"></self-uri>
<abstract>
<sec>
<title>Background</title>
<p>Geocoding, the process of converting textual information describing a location into one or more digital geographic representations, is a routine task performed at large organizations and government agencies across the globe. In a health context, this task is often a fundamental first step performed prior to all operations that take place in a spatially-based health study. As such, the quality of the geocoding system used within these agencies is of paramount concern to the agency (the producer) and researchers or policy-makers who wish to use these data (consumers). However, geocoding systems are continually evolving, with new products regularly coming on the market. Agencies must develop and use criteria across a number of axes when faced with decisions about building, buying, or maintaining any particular geocoding system. To date, published criteria have focused on one or more aspects of geocode quality without taking a holistic view of a geocoding system’s role within a large organization. The primary purpose of this study is to develop and test an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs.</p>
</sec>
<sec>
<title>Methods</title>
<p>A geocoding platform evaluation framework is derived through an examination of prior literature on geocoding accuracy. The framework developed extends commonly used geocoding metrics to take into account the specific concerns of large organizations for which geocoding is a fundamental operational capability tightly knit into their core mission of processing health data records. A case study is performed to evaluate the strengths and weaknesses of five geocoding platforms currently available in the Australian geospatial marketplace.</p>
</sec>
<sec>
<title>Results</title>
<p>The evaluation framework developed in this research proved successful in differentiating between key capabilities of geocoding systems that are important in the context of a large organization with significant investments in geocoding resources. Results from the proposed methodology highlight important differences across all axes of geocoding system comparisons including spatial data output accuracy, reference data coverage, system flexibility, the potential for tight integration, and the need for specialized staff and/or development time and funding. Such results can empower decision-makers within large organizations as they make decisions about and investments in geocoding systems.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Geocoding</kwd>
<kwd>Georeferencing</kwd>
<kwd>Record linkage</kwd>
<kwd>Postal address data</kwd>
<kwd>Health records</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec sec-type="intro">
<title>Introduction</title>
<p>Across the world, individuals, research groups, and organizations of all sizes ranging from non-profit and commercial entities to local-, state- and national-level government agencies are often required to perform geocoding for numerous mission-critical tasks [
<xref ref-type="bibr" rid="B1">1</xref>
,
<xref ref-type="bibr" rid="B2">2</xref>
]. Within a health context, geocoding – the process of converting textual information describing a location into one or more digital geographic representations – is used for processes as diverse as linking the individual-level addresses associated with health records to census enumeration units for disease surveillance at state and national levels, determining individual-level exposures to environmental contaminants, and identifying the accessibility of healthy food choices for populations of interest [
<xref ref-type="bibr" rid="B3">3</xref>
-
<xref ref-type="bibr" rid="B10">10</xref>
].</p>
<p>The person or organizational group responsible for performing the geocoding is faced with a number of choices regarding which geocoding system to employ to achieve a result that is sufficient for the purposes intended [
<xref ref-type="bibr" rid="B11">11</xref>
-
<xref ref-type="bibr" rid="B15">15</xref>
]. These choices may greatly affect the output data quality of the geocoding process, and any loss of quality propagates to subsequent studies that use these geocoded data as input [
<xref ref-type="bibr" rid="B4">4</xref>
,
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B11">11</xref>
,
<xref ref-type="bibr" rid="B12">12</xref>
,
<xref ref-type="bibr" rid="B14">14</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
-
<xref ref-type="bibr" rid="B22">22</xref>
]. Despite the best intentions of those responsible for providing geocoded data, many of the choices may be conditioned by the constraints of the organization for whom or within which the geocoded data are produced. Within large organizations such as state and national Health Departments and Disease Registries, for example, existing operational workflows, data confidentiality requirements, and strategic partnerships between external organizations and agencies may preclude a geocoding system from being purchased, implemented, or integrated [
<xref ref-type="bibr" rid="B4">4</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
-
<xref ref-type="bibr" rid="B25">25</xref>
].</p>
<p>As can be expected with any mission-critical operational task, evaluating the factors that one might use to determine which geocoding system will best meet the needs of a specific organization or individual is often not a simple matter. Likewise, these factors may not be readily transferable from one situation to another [
<xref ref-type="bibr" rid="B2">2</xref>
]. Switching between geocoding systems represents the potential for expending significant levels of time, effort and funding due to the need to integrate a new system within existing production workflows, perform evaluation testing, re-train staff, etc. Given these up-front and continuing costs, the decision to change geocoding systems is generally not entered into lightly. An evaluation which compares the benefits versus the costs of each geocoding system is a useful way of determining which geocoding system is the correct choice for a particular individual, group, or organization given the specific scenario within which it must operate and the user-base it should serve.</p>
<p>In recent years, several research groups have undertaken studies which have evaluated and compared the performance of various geocoding systems. Contributions to the body of geocoding literature have included evaluations of the spatial accuracy and match rates of geocodes resulting from the use of commercially available systems versus in-house custom-built solutions [
<xref ref-type="bibr" rid="B4">4</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B21">21</xref>
,
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B27">27</xref>
]; the use of various forms of reference data files - building centroids, address points, areal units describing parcels, street centerline files, etc. [
<xref ref-type="bibr" rid="B7">7</xref>
,
<xref ref-type="bibr" rid="B13">13</xref>
,
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B28">28</xref>
-
<xref ref-type="bibr" rid="B31">31</xref>
]; the use of different interpolation algorithms - address range, uniform lot, population-weighted centroids, geographic imputation, etc. [
<xref ref-type="bibr" rid="B18">18</xref>
,
<xref ref-type="bibr" rid="B22">22</xref>
,
<xref ref-type="bibr" rid="B32">32</xref>
,
<xref ref-type="bibr" rid="B33">33</xref>
]; the use of different feature matching methods - probabilistic, deterministic, etc. [
<xref ref-type="bibr" rid="B28">28</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
-
<xref ref-type="bibr" rid="B37">37</xref>
]; the use of pre-processing techniques - address standardization, normalization, etc. [
<xref ref-type="bibr" rid="B38">38</xref>
]; and the use and effectiveness of manual/clerical review processes to improve non-matchable addresses [
<xref ref-type="bibr" rid="B39">39</xref>
].</p>
<p>Other authors have investigated the non-random effects that an urban, rural, or remote geographic context has on the accuracy, completeness and correctness of input address data, reference data layers and ultimate geocode output [
<xref ref-type="bibr" rid="B5">5</xref>
,
<xref ref-type="bibr" rid="B16">16</xref>
,
<xref ref-type="bibr" rid="B26">26</xref>
,
<xref ref-type="bibr" rid="B40">40</xref>
-
<xref ref-type="bibr" rid="B43">43</xref>
]. Similarly, research has investigated the non-random distribution of geocoding quality by demographic characteristics such as race, ethnicity, and income [
<xref ref-type="bibr" rid="B11">11</xref>
,
<xref ref-type="bibr" rid="B43">43</xref>
-
<xref ref-type="bibr" rid="B45">45</xref>
].</p>
<p>This rich body of prior work into geocoding comparisons has provided valuable insight into the role that various components of a geocoding system play in the quality of output produced and the effect these choices may have on subsequent research projects [
<xref ref-type="bibr" rid="B3">3</xref>
,
<xref ref-type="bibr" rid="B46">46</xref>
]. However, despite this diverse set of resources that detail the factors which influence geocode quality, there remains a lack of up-to-date guidance that an organization or individual could use to assist in the determination of which geocoding system is right for a particular application/usage context. In particular, these prior studies have not considered the particular operational, technical, policy and legal issues that are present in large organizations responsible for securely collecting, linking, curating, producing and/or disseminating health-related geocoded data such as state-level Health Departments and Disease Registries [
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B23">23</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
]. Given that any number of high-quality commercial off-the-shelf (COTS) geocoding systems are now available, this issue is particularly relevant if the data maintained by these agencies are to be employed to their full potential to best serve the public at large.</p>
<p>The primary purpose of this study is to develop an evaluation framework to assist a large organization in determining which geocoding systems will meet its operational needs. The decision criteria presented offer an enumeration of the capabilities that a government agency can consider, ranging from the most basic principles of how the software gets installed to advanced requirements such as the flexibility of the system for allowing specialized matching rules for particular types of input data. The framework developed is applicable to organizations of all sizes across all regions of the world. The example used herein provides the upper bound of requirements in terms of operational needs and confidentiality requirements. While it is expected that, globally, similar agencies would have comparably high-level needs, individual researchers working on small-scope projects may not face such stringent requirements and as such would be able to make different choices than those required at an agency responsible for safeguarding confidential information.</p>
<p>However, to be clear, the current research does not recommend the usage of any particular geocoding system; instead, it offers a methodology and a set of criteria by which an organization or individual could make such a decision for themselves. The strengths and weaknesses of the proposed approach are evaluated through this case study in Western Australia. In this case study, none of the geocoding systems evaluated is listed by name due to non-disclosure agreements with the vendors who participated in the study.</p>
<p>The remainder of this paper is organized as follows. We first develop the evaluation framework by defining several axes of criteria by which a geocoding system can be characterized and measured. Within each, specific examples of capabilities, constraints, and features are provided. We next describe the context within which the current evaluation was performed. This includes the general characteristics of the types of input data, geocoding systems, and reference data that were used. Only general characteristics are provided because of the confidential nature of the data processed and non-disclosure agreements. This limits the specific details that can be reported about the geocoding systems evaluated and the data tested, but nonetheless provides an opportunity to evaluate the proposed approach. Following the descriptions of the data and systems used, we present the results of the evaluation process and offer a discussion as to their meaning.</p>
<sec>
<title>Evaluation framework</title>
<p>The evaluation framework developed and used to facilitate the experiments contained herein is a combination of traditional geocoding system performance tests (match rates, spatial variation, etc.) and a series of evaluations which capture the applicability of a geocoding system to a particular user scenario (workflow integration, cost, etc.). While both aspects are important, the combination of the two serves to highlight the balance that must be struck between performance and utility in order for an organization to decide upon an appropriate system given the requirements, limitations and constraints of any particular organization or individual.</p>
</sec>
<sec>
<title>Geocode quality</title>
<p>Output geocode quality is a primary concern for geocoded data producers and end-users of these data. Table 
<xref ref-type="table" rid="T1">1</xref>
lists the typical metrics used to measure geocode quality. These are: (a)
<bold>
<italic>match rates</italic>
</bold>
– the proportion of input data a geocoding system was capable of successfully geocoding; (b)
<bold>
<italic>match type</italic>
</bold>
– the level of geographic match for a geocode (parcel, street centroid, postcode, etc.); (c)
<bold>
<italic>match score</italic>
</bold>
– the level of similarity between the input address data requested and the reference geographic feature matched to; and (d)
<bold>
<italic>spatial accuracy</italic>
</bold>
– the distance between the true location and the computed geocode location [
<xref ref-type="bibr" rid="B2">2</xref>
,
<xref ref-type="bibr" rid="B6">6</xref>
,
<xref ref-type="bibr" rid="B34">34</xref>
,
<xref ref-type="bibr" rid="B47">47</xref>
,
<xref ref-type="bibr" rid="B48">48</xref>
]. In addition to these,
<bold>
<italic>administrative unit concordance</italic>
</bold>
is often used to indicate cases where two geocoding systems (or different configurations of the same system) result in the assignment of differing administrative unit codes.</p>
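<p>The first two metrics can be illustrated with a minimal Python sketch. The record layout (a <italic>match_type</italic> field that is empty when no geocode was produced) and the sample values are assumptions made for illustration only; they are not the output format of any system evaluated in this study.</p>
<preformat>
```python
from collections import Counter

# Hypothetical geocoder output: one dict per input record.
# "match_type" is None when the system could not geocode the record.
results = [
    {"id": 1, "match_type": "parcel"},
    {"id": 2, "match_type": "street centroid"},
    {"id": 3, "match_type": None},
    {"id": 4, "match_type": "parcel"},
    {"id": 5, "match_type": "postcode"},
]


def match_rate(records):
    """Percentage of all input records that received any geocode."""
    if not records:
        return 0.0
    matched = sum(1 for r in records if r["match_type"] is not None)
    return 100.0 * matched / len(records)


def match_type_distribution(records):
    """Percentage of matched records at each geographic level."""
    levels = Counter(
        r["match_type"] for r in records if r["match_type"] is not None
    )
    total = sum(levels.values())
    return {level: 100.0 * n / total for level, n in levels.items()}


print(match_rate(results))               # 80.0
print(match_type_distribution(results))  # parcel 50.0, others 25.0 each
```
</preformat>
<p>Note that the match type percentages are computed over matched records only, mirroring the "percentages of matchable geocodes at each level" wording used in Table 1.</p>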
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption>
<p>Geocoding system quality metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Quality Metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Match rate (%)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Percentage of all records capable of being geocoded
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Match type (% by geographic level)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Geographic levels of geocode match – building level, parcel level, street centroid level, postcode level, etc. and percentages of matchable geocodes at each level
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Match score (% at score levels)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Frequency distribution of match scores for matchable geocodes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Spatial accuracy (% at distance levels)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Frequency distribution of distances between matchable geocodes and ground truth locations
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Spatial accuracy variation (% variation from other systems)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Frequency distribution of distances between the same geocode produced by multiple geocoding systems.
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Administrative unit concordance (% variation from other systems)</bold>
</td>
<td align="left">Frequency distribution of administrative unit concordance between the same geocode produced by multiple geocoding systems.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the current study, the first three of these metrics were measured directly for each of the geocoding system configurations (i.e., combinations of input data, geocoding system, and reference data). Ground truth GPS points were not available for this research, so variation metrics were computed and reported for spatial accuracy and administrative unit concordance. Instances of high variation between geocoding configurations for particular addresses were used to guide the investigation of individual addresses that performed differently between geocoding configurations. Census unit concordance was not evaluated.</p>
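<p>When ground truth is unavailable, variation metrics of this kind can be computed by comparing the outputs of two configurations record by record. The following Python sketch is one possible way to do so and is not the procedure used in this study: the haversine formula, the distance bands (10 m, 100 m, 1000 m), the field names, and the sample coordinates and unit codes are all assumptions chosen for illustration.</p>
<preformat>
```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_band(d, upper_bounds=(10, 100, 1000)):
    """Label a distance with the first band whose upper bound covers it."""
    for bound in upper_bounds:
        if bound >= d:
            return f"up to {bound} m"
    return f"over {upper_bounds[-1]} m"


def compare(system_a, system_b):
    """Return (distance band percentages, unit concordance percentage)
    over the records geocoded by both systems."""
    shared = sorted(set(system_a).intersection(system_b))
    if not shared:
        return {}, 0.0
    bands, agree = {}, 0
    for rid in shared:
        a, b = system_a[rid], system_b[rid]
        label = distance_band(haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]))
        bands[label] = bands.get(label, 0) + 1
        if a["unit"] == b["unit"]:
            agree += 1
    n = len(shared)
    return {k: 100.0 * v / n for k, v in bands.items()}, 100.0 * agree / n


# Hypothetical outputs keyed by record id; "unit" is an administrative unit code.
system_a = {
    1: {"lat": -31.9500, "lon": 115.8600, "unit": "U01"},
    2: {"lat": -31.9600, "lon": 115.8700, "unit": "U02"},
    3: {"lat": -32.0000, "lon": 115.9000, "unit": "U03"},
}
system_b = {
    1: {"lat": -31.9500, "lon": 115.8600, "unit": "U01"},
    2: {"lat": -31.9605, "lon": 115.8700, "unit": "U09"},
}

bands, concordance = compare(system_a, system_b)
```
</preformat>
<p>Records with a large separation or a differing unit code are the natural candidates for the kind of address-by-address investigation described above.</p>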
</sec>
<sec>
<title>Geocoding system operating characteristics</title>
<p>The integration of a new geocoding system within an organization potentially represents a great deal of time, effort, and training, among other costs. Thus the decision to scrap an old system and integrate a new one is generally not made lightly. As noted above, the qualities of geocode output (match rate, spatial accuracy, etc.) are but one of the axes by which a geocoding system must be evaluated when considering the adoption of a geocoding system at an organizational level. The applicability of a geocoding system to a particular user scenario (workflow integration, cost, etc.) is paramount in the decision to adopt a new system. A brief overview of the categories and a few example metrics related to geocoding system operation that can be used to compare the applicability and appropriateness of geocoding systems to a particular set of user needs and usage scenarios are displayed in Table 
<xref ref-type="table" rid="T2">2</xref>
; each is discussed in detail in the following sections. These listings are not intended to be exhaustive because different organizations will have different needs.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption>
<p>Geocoding system operational capabilities metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Category and notes</bold>
</th>
<th align="left">
<bold>Metric</bold>
</th>
<th align="left">
<bold>Description and/or example</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>System Flexibility</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">User defined reference layers
<hr></hr>
</td>
<td align="left" valign="bottom">Is it possible to use any available reference data?
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="3" align="left" valign="top">
<bold>
<italic>Ability of the user of the system to make changes and additions to data sources and methods used by the system</italic>
</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Specialized address parsing
<hr></hr>
</td>
<td align="left" valign="bottom">Add in support for new street types, named places
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Specialized matching algorithms
<hr></hr>
</td>
<td align="left" valign="bottom">Consider neighboring areas for matches
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Customized feature hierarchies
<hr></hr>
</td>
<td align="left" valign="bottom">Hierarchy based on organizational policy
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>System Integration</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Operating system support
<hr></hr>
</td>
<td align="left" valign="bottom">Windows, Unix/Linux/Solaris
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="5" align="left" valign="top">
<bold>
<italic>Ability to merge a geocoding system into an existing production system</italic>
</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">System/Workflow Integration
<hr></hr>
</td>
<td align="left" valign="bottom">Into tools and systems used by the organization (e.g., SAS wrappers)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Varying operational modes
<hr></hr>
</td>
<td align="left" valign="bottom">Batch/interactive/manual
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Desktop Version
<hr></hr>
</td>
<td align="left" valign="bottom">Standalone product for highly sensitive data
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">In-house Server Version
<hr></hr>
</td>
<td align="left" valign="bottom">Internal server for multiple users within agency firewall
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">API Version
<hr></hr>
</td>
<td align="left" valign="bottom">Vendor or custom-written code for off-site processing
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Metadata</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Spatial confidence values
<hr></hr>
</td>
<td align="left" valign="bottom">Descriptions of the region size (geographic area) that a geocode output is known to fall within
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>
<italic>Types and level of detail reported about the quality of the output data and/or the characteristics of the geocoding system</italic>
</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Input address/matched address concordance
<hr></hr>
</td>
<td align="left" valign="bottom">Descriptions of which attributes of the input address were incorrect, incomplete, partially matched or not used in the matching process
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Capabilities</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Automatic batch geocoding
<hr></hr>
</td>
<td align="left" valign="bottom">The ability to process a data file of records using a single process
<hr></hr>
</td>
</tr>
<tr>
<td rowspan="3" align="left" valign="top">
<bold>
<italic>Baseline functionality of paramount importance to agencies working with large health data sets</italic>
</bold>
</td>
<td align="left" valign="bottom">Interactive review
<hr></hr>
</td>
<td align="left" valign="bottom">The ability to perform manual review for non-matched records to attempt to determine a correct output geocode
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">Alias tables
<hr></hr>
</td>
<td align="left" valign="bottom">The ability to incorporate tables of named places, common synonyms for street address attributes
<hr></hr>
</td>
</tr>
<tr>
<td align="left">Weighted centroids</td>
<td align="left">The ability to bias the output location of a geocode based on a known distribution of a characteristic of interest such as the distribution of population or specific subsets of a population in an area</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>System flexibility</title>
<p>The flexibility of a geocoding system describes the ability of the user of the system to make changes and additions to the data sources and methods used by the system (Table 
<xref ref-type="table" rid="T3">3</xref>
). In this evaluation, flexibility was determined by the ability of a geocoding system to: (a) permit the utilization of
<bold>
<italic>user</italic>
</bold>
<italic>-</italic>
<bold>
<italic>defined reference data layers</italic>
</bold>
(points, lines, and polygons), e.g., import and use any reference data sources; (b) create and use
<bold>
<italic>specialized address parsing rules</italic>
</bold>
, e.g., add in support for new street types, named places, etc.; (c) create and use
<bold>
<italic>specialized matching algorithms</italic>
</bold>
, e.g., look in neighboring postcodes or localities for matches; and (d) create and use
<bold>
<italic>specialized feature selection hierarchies</italic>
</bold>
based on organizational policies, e.g., search in parcels first, then localities or postcodes, or alternatively, choose whichever has the smaller area.</p>
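<p>The two feature selection policies mentioned in item (d) can be sketched in Python as follows. The candidate structure, the layer names, and the area values are hypothetical; a real system would expose its own representation of candidate matches.</p>
<preformat>
```python
def select_by_order(candidates, layer_order):
    """Policy 1: return the candidate from the first layer in
    layer_order that produced a match, or None if no layer matched."""
    by_layer = {c["layer"]: c for c in candidates}
    for layer in layer_order:
        if layer in by_layer:
            return by_layer[layer]
    return None


def select_smallest_area(candidates):
    """Policy 2: return the candidate whose reference feature
    covers the smallest area, or None if there are no candidates."""
    return min(candidates, key=lambda c: c["area_km2"], default=None)


# Hypothetical candidates for one address with no parcel-level match.
candidates = [
    {"layer": "locality", "area_km2": 4.5},
    {"layer": "postcode", "area_km2": 3.0},
]

ordered = select_by_order(candidates, ("parcel", "locality", "postcode"))
smallest = select_smallest_area(candidates)
print(ordered["layer"], smallest["layer"])  # locality postcode
```
</preformat>
<p>The two policies can return different features for the same address, which is why the ability to configure the hierarchy matters to an organization with its own policy.</p>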
<table-wrap position="float" id="T3">
<label>Table 3</label>
<caption>
<p>Geocoding system flexibility metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Flexibility Metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>User-defined reference data layers (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the user have the ability to include his/her own custom reference data layers? Example – including one’s own parcel layer for a locality if it is available.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Specialized address parsing rules (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the user have the ability to include his/her own custom address parsing rules? Example – including a parsing approach where the “St.” in “St. Patrick” is converted to “Saint” to provide higher match rates given a reference data source that has the term listed as “Saint”.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Specialized matching algorithms (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the user have the ability to include his/her own custom matching rules? Example – Inspecting nearby postal codes for similarly named streets and providing a higher match score for candidate match features that are found in adjacent postal codes and a lower match score for candidate match features found in non-adjacent postal codes.
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Specialized feature selection hierarchies (Y/N)</bold>
</td>
<td align="left">Does the user have the ability to include his/her own custom ordering of reference layers? Example – Adding the ability to search first in postal codes then in municipalities in urban regions (where postal codes are small and municipalities are big) and municipalities first then postal codes in rural regions (where municipalities are small and postal codes are large).</td>
</tr>
</tbody>
</table>
</table-wrap>
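<p>The parsing and matching examples in Table 3 can be sketched in Python as shown below. This is an illustrative sketch only: the regular expression assumes that the street component has already been separated from the locality, and the adjacency table, postcodes, and penalty values are hypothetical rather than taken from any evaluated system.</p>
<preformat>
```python
import re

# "St" or "St." followed by whitespace and another word is read as "Saint".
# A trailing "St" (a street type) is left untouched because nothing follows it.
SAINT_PREFIX = re.compile(r"\bSt\.?\s+(?=[A-Za-z])", re.IGNORECASE)


def expand_saint(street):
    """Custom parsing rule: expand a leading 'St.' to 'Saint'."""
    return SAINT_PREFIX.sub("Saint ", street)


# Hypothetical adjacency table: postcode -> set of neighbouring postcodes.
ADJACENT = {"6000": {"6003", "6005"}}


def postcode_adjusted_score(base_score, input_postcode, candidate_postcode,
                            penalty_adjacent=10, penalty_far=40):
    """Custom matching rule: keep the score for an exact postcode match,
    apply a small penalty for an adjacent postcode and a large penalty
    for a non-adjacent one. Scores never drop below zero."""
    if candidate_postcode == input_postcode:
        return base_score
    if candidate_postcode in ADJACENT.get(input_postcode, set()):
        return max(base_score - penalty_adjacent, 0)
    return max(base_score - penalty_far, 0)


print(expand_saint("12 St. Patrick St"))            # 12 Saint Patrick St
print(postcode_adjusted_score(90, "6000", "6003"))  # 80
```
</preformat>
<p>A system that scores a "Y" on these flexibility metrics would let the organization register rules of this kind without vendor involvement.</p>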
</sec>
<sec>
<title>System integration</title>
<p>The ability to merge a geocoding system into an existing production system is a major concern for large organizations that routinely perform geocoding as one aspect of a larger data processing system. One example is the Data Linkage Branch of the Western Australia (WA) Department of Health (DoH), where the current study was undertaken. This group is responsible for providing data linkage services that consolidate data from numerous health-related sources for data consumers within the WA DoH and other local-, state-, and national-level agencies to facilitate research and policy-making. Geocoding services are provided as data are processed (linked) in order to associate census enumeration unit values with each record as part of the larger data linkage process. Similar systems are found in other Health Departments at the local, regional, state, and national levels around the world, as well as in Disease Registries where data consolidation, cleaning, and/or linkage tasks take place. In each scenario, the geocoding component of the overall organizational mission is tightly integrated into other dependent workflows. The geocoding process occurs either as data are streamed through the system or in batch mode, after which the results are linked back to the output linked/consolidated records.</p>
<p>Table 
<xref ref-type="table" rid="T4">4</xref>
lists the primary concerns for these organizations in terms of system integration. These are: (a)
<bold>
<italic>operating system support</italic>
</bold>
– the geocoding system must be executable on the operating systems used by the organization (Windows, Unix/Linux/Solaris, etc.); (b)
<bold>
<italic>system and workflow integration</italic>
</bold>
– the geocoding system should be integrable with the tools and systems used by the organization (SAS wrappers, COM components, dynamically linked libraries, APIs, etc.); and (c)
<bold>
<italic>operational modes</italic>
</bold>
– the geocoding system must be usable in the modes necessary to support the organizational mission (batch-mode, interactive-mode, manual review/rematching, etc.).</p>
<table-wrap position="float" id="T4">
<label>Table 4</label>
<caption>
<p>Geocoding system integration metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Integration metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Operating system support (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system work on the operating system used by the organization? Example – Windows, Linux, Unix.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>System and workflow integration (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Can the system be integrated into existing systems and workflows used by the organization? Example – A system that can be wrapped as a SAS component so it can be integrated into automated SAS data processing workflows already used by the organization.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Operational mode integration – Batch mode (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system have the ability to geocode records in batch? Example – Uploading a large data set to a server and running the geocoding process over the whole file.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Operational mode integration – Interactive mode (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system have the ability to allow a user to interactively geocode records? Example – Displaying an interface that allows a user to geocode one record at a time.
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Operational mode integration – Manual review mode (Y/N)</bold>
</td>
<td align="left">Does the system have the ability to allow a user to interactively geocode records that do not process correctly in batch mode? Example – Displaying an interface that lists records that did not match in batch processing and allows the user to research, correct, and re-geocode individual records one-by-one.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Table 
<xref ref-type="table" rid="T5">5</xref>
lists various system interface modes that a geocoding platform could provide to a user. These refer to the ways in which a user would interact with the system. These interface modes are important because
<bold>
<italic>data security and/or confidentiality constraints</italic>
</bold>
may dictate certain forms of data processing. For example, most health-related records cannot be transmitted outside the secure environment in which they are housed, so
<bold>
<italic>desktop or in-house server geocoding</italic>
</bold>
platforms may be the only option. In contrast, it may be acceptable to transmit non-confidential data over the Internet for offsite processing on a vendor’s servers through an
<bold>
<italic>application programmer interface (API)</italic>
</bold>
using custom-written code or a vendor-provided thin client.</p>
<table-wrap position="float" id="T5">
<label>Table 5</label>
<caption>
<p>Geocoding system interface metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Interface metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Desktop-based geocoding (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system work on a desktop computer?
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Server-based geocoding (Y/N)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system work on a server?
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Application programmer interface (API) geocoding (Y/N)</bold>
</td>
<td align="left">Does the system provide an API for which custom programs can be developed?</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Such APIs, along with other online batch-processing services in which users upload a database of addresses to be geocoded, can be categorized as web-based geocoding options. Many commercial providers offer such services, including the APIs available from Google, Yahoo, and Esri [
<xref ref-type="bibr" rid="B49">49</xref>
-
<xref ref-type="bibr" rid="B51">51</xref>
]. Similar community-specific geocoding services, such as those offered by the North American Association of Central Cancer Registries (NAACCR), have been created to meet the needs of specific research and practice communities [
<xref ref-type="bibr" rid="B52">52</xref>
]. Within the context of health data specifically, organizations must be able to ensure data privacy, security, and confidentiality through data confidentiality and use agreements with these service providers. Current research into cyber-enabled GIS infrastructures (CyberGIS) [
<xref ref-type="bibr" rid="B53">53</xref>
] as well as secure computing environments for health data [
<xref ref-type="bibr" rid="B54">54</xref>
] is broadening the scope of what is considered acceptable. However, in some instances health organizations may be specifically prohibited from using web-based geocoding services. At the time of this study, the organization performing it had this restriction in place. Therefore, web-based geocoding systems were not included in the present evaluation.</p>
</sec>
<sec>
<title>Cost</title>
<p>The true cost of a geocoding system can be difficult to quantify, although some components are straightforward. The price of a
<bold>
<italic>software license for the geocoding system</italic>
</bold>
, the price of a
<bold>
<italic>license for the required reference data layers</italic>
</bold>
, and the
<bold>
<italic>price for a support contract</italic>
</bold>
are examples of one-time (or yearly) fixed costs that can readily be obtained from a software vendor or assumed to be zero for open source software (Table 
<xref ref-type="table" rid="T6">6</xref>
). Each of these costs is a common expense in jurisdictions around the world, although there are free geocoding systems such as geocoder.us
<sup>a</sup>
, free reference data layers such as the US Census Bureau TIGER/Line files
<sup>b</sup>
, and unsupported geocoding systems such as the Postal Address Geocoder (PACG)
<sup>c</sup>
. However, other components that must be considered when estimating overall cost are more complicated because they involve computing time and effort for staff members.</p>
<table-wrap position="float" id="T6">
<label>Table 6</label>
<caption>
<p>Geocoding system cost metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Cost metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Software license cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Price of licensing the software
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Reference data layer cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Additional costs for licensing reference data
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Support contract cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Cost of support contract
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>FTE support cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Cost of full time equivalent support (in-house)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>FTE development cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Cost of full time equivalent development (in-house)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>FTE maintenance cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Cost of full time equivalent maintenance (in-house)
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>FTE training cost ($)</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Cost of full time equivalent training (in-house)
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>FTE specialized skills (Y/N)</bold>
</td>
<td align="left">Full time equivalent specialization required (in-house)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Table 
<xref ref-type="table" rid="T6">6</xref>
lists these costs, which include: (a) the level of effort and/or number of full time equivalent positions (FTE) required to
<bold>
<italic>support the geocoding system</italic>
</bold>
– e.g., time/effort for a staff member to identify, respond to, and/or fix errors reported by end-users; (b) the level of effort and/or number of FTE required to
<bold>
<italic>develop the geocoding system</italic>
</bold>
– e.g., time/effort for a staff member to build additional components into the geocoding system as needed; and (c) the level of effort and/or number of FTE required to
<bold>
<italic>maintain the geocoding system</italic>
</bold>
– e.g., time/effort for a staff member to update the system to use the latest reference data files.</p>
<p>When purchasing a commercial, off-the-shelf (COTS) package, many of these items disappear because the
<bold>
<italic>vendor may charge fees</italic>
</bold>
to provide them to the customer; however, the flexibility of the geocoding system may decline because the vendor may not be capable of building in all of the custom functionality required by the user. In contrast, when building and using a custom in-house geocoding solution, flexibility is maximized, but it requires the
<bold>
<italic>availability and retention of specialized staff</italic>
</bold>
with particular training and familiarity with the geocoding system and the programming languages and environments upon which it is built. In the evaluation performed in this report, cost is not considered as a factor due to non-disclosure agreements with the vendors who participated. However, when the framework described herein is used to make geocoding decisions within an organization, it is expected that cost would be a highly weighted metric.</p>
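<p>As a minimal sketch of how the cost metrics in Table 6 might be combined into a single yearly figure, the following Python example adds the fixed vendor costs to the in-house staff effort. The class name, function name, the expression of staff effort as FTE fractions, the salary figure, and all dollar amounts are illustrative assumptions and are not taken from this study.</p>

```python
from dataclasses import dataclass


@dataclass
class GeocoderCosts:
    """Yearly cost inputs mirroring Table 6 (all values are hypothetical)."""
    software_license: float   # vendor price, dollars per year
    reference_data: float     # reference data license, dollars per year
    support_contract: float   # vendor support contract, dollars per year
    fte_support: float        # in-house effort, fraction of one FTE
    fte_development: float
    fte_maintenance: float
    fte_training: float


def yearly_total(costs, fte_salary):
    """Sum fixed vendor costs and staff effort priced at an assumed FTE salary."""
    fixed = (costs.software_license + costs.reference_data
             + costs.support_contract)
    fte = (costs.fte_support + costs.fte_development
           + costs.fte_maintenance + costs.fte_training)
    return fixed + fte * fte_salary


# Hypothetical comparison: a COTS package versus an in-house build.
cots = GeocoderCosts(50000.0, 20000.0, 10000.0, 0.25, 0.0, 0.25, 0.0)
in_house = GeocoderCosts(0.0, 20000.0, 0.0, 0.5, 1.0, 0.5, 0.25)

print(yearly_total(cots, 100000.0))
print(yearly_total(in_house, 100000.0))
```

<p>Under these assumed inputs the in-house option is dominated by staff effort rather than license fees, which is the trade-off described above. An organization applying the framework would substitute its own quotes and salary figures.</p>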
</sec>
<sec>
<title>Metadata reporting</title>
<p>The level of metadata reported by a geocoding system represents a critical factor that discriminates one geocoding system from another. As described above and in numerous research reports [
<xref ref-type="bibr" rid="B29">29</xref>
,
<xref ref-type="bibr" rid="B45">45</xref>
,
<xref ref-type="bibr" rid="B46">46</xref>
,
<xref ref-type="bibr" rid="B55">55</xref>
,
<xref ref-type="bibr" rid="B56">56</xref>
], geocoding quality indicators both at the per-record level (match type, match score, spatial accuracy) and overall process level (match rate) are important factors that describe how well a geocoding system performs. However, these are not the only forms of metadata that a geocoding system could report. Other metadata items that data producers and consumers could be concerned with include: (a)
<bold>
<italic>spatial confidence values</italic>
</bold>
– descriptions of the region size (geographic area) that a geocode output is known to fall within; (b)
<bold>
<italic>input address/matched address concordance</italic>
</bold>
– descriptions of which attributes of the input address were incorrect, incomplete, partially matched with corrections, not used in the matching process, etc. (Table 
<xref ref-type="table" rid="T7">7</xref>
).</p>
<table-wrap position="float" id="T7">
<label>Table 7</label>
<caption>
<p>Geocoding system metadata metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Metadata metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Spatial confidence values</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system output spatial confidence intervals with each geocoded location? Example – Returning a buffer around the location within which the true geocode is known to be located
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Input address/Matched address concordance</bold>
</td>
<td align="left">Does the system return an indication of the similarity between the input address requested and the address of the geographic reference feature matched? Example – Providing a list of the input address attributes that matched or did not match the address attributes associated with the geographic reference feature used for interpolation</td>
</tr>
</tbody>
</table>
</table-wrap>
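<p>To illustrate what an input address/matched address concordance report could contain, the following Python sketch compares the two addresses attribute by attribute. The field names and the four status labels are hypothetical and do not reflect the output format of any system evaluated here.</p>

```python
def address_concordance(input_address, matched_address):
    """Label each input attribute relative to the matched reference feature.

    Labels (hypothetical): 'match', 'corrected', 'missing', 'not_used'.
    """
    report = {}
    for field, given in input_address.items():
        matched = matched_address.get(field)
        given_norm = (given or "").strip().upper()
        if matched is None:
            # The reference feature carries no value for this attribute.
            report[field] = "not_used"
        elif not given_norm:
            # The input left this attribute blank.
            report[field] = "missing"
        elif given_norm == matched.strip().upper():
            report[field] = "match"
        else:
            # The matcher substituted a different value.
            report[field] = "corrected"
    return report


input_address = {"number": "12", "street_name": "Smith", "suffix": "St",
                 "locality": "", "postcode": "6001", "unit": "3"}
matched_address = {"number": "12", "street_name": "SMITH", "suffix": "ST",
                   "locality": "PERTH", "postcode": "6000"}

report = address_concordance(input_address, matched_address)
print(report)
```

<p>A per-attribute report of this kind lets a data consumer see, for example, that a geocode was produced after the postcode was corrected, rather than receiving only a single match score.</p>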
</sec>
<sec>
<title>Capabilities</title>
<p>The baseline capabilities that a geocoding system provides are of paramount importance when evaluating the appropriateness of a geocoding system within a particular usage scenario. In addition to simply providing the ability to geocode a data set of input addresses, other capabilities that a geocoding system either does or does not provide include: (a)
<bold>
<italic>automated batch geocoding</italic>
</bold>
– the ability to process a data file of records using a single process; (b)
<bold>
<italic>interactive review</italic>
</bold>
– the ability to perform manual review for non-matched records to attempt to determine a correct output geocode; (c)
<bold>
<italic>alias tables</italic>
</bold>
– the ability to incorporate tables of named places and common synonyms for street address attributes (suffixes, additional street names, etc.); and (d)
<bold>
<italic>weighted centroids</italic>
</bold>
– the ability to bias the output location based on a known distribution of a characteristic of interest such as the distribution of population or specific subsets of a population in an area (postcode, locality, etc.) (Table 
<xref ref-type="table" rid="T8">8</xref>
).</p>
<table-wrap position="float" id="T8">
<label>Table 8</label>
<caption>
<p>Geocoding system capability metrics</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Capability metric</bold>
</th>
<th align="left">
<bold>Description</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>Automated batch geocoding</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system provide the ability to process a database of address records in batch mode? Example – Running the geocoding system over a database of records in a text file.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Interactive review</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system provide an interface that allows a user to review address records that do not match on a case-by-case basis? Example – Providing a graphical user interface (GUI) that allows a user to review geocoded results, make corrections and re-geocode.
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Alias tables</bold>
<hr></hr>
</td>
<td align="left" valign="bottom">Does the system provide the ability to add address alias tables into the geocoding process? Example – Providing the user with a capability to include the coordinates of named places, such as nursing homes, caravan parks, or prisons.
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Weighted centroids</bold>
</td>
<td align="left">Does the system allow for the use of weighting schemes to bias the placement of centroid-level output? Example – Including a population density layer that moves the output of a postcode-level geocode closer to the location within the postcode that has the highest level of population density.</td>
</tr>
</tbody>
</table>
</table-wrap>
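<p>One possible weighting scheme for the weighted centroid capability is a population-weighted mean of sub-area locations. The Python sketch below assumes planar (projected) coordinates and uses hypothetical coordinates and population counts. It is an illustration of the idea only, not the method used by any of the evaluated systems.</p>

```python
def weighted_centroid(points):
    """Return the weighted mean (x, y) of (x, y, weight) tuples.

    Assumes planar coordinates; weights could be population counts.
    """
    points = list(points)
    total = sum(w for _, _, w in points)
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x = sum(px * w for px, _, w in points) / total
    y = sum(py * w for _, py, w in points) / total
    return x, y


# Two hypothetical sub-areas of one postcode: (x, y, population).
blocks = [(0.0, 0.0, 100.0), (10.0, 0.0, 300.0)]

print(weighted_centroid(blocks))                            # population-weighted
print(weighted_centroid((x, y, 1.0) for x, y, _ in blocks))  # unweighted
```

<p>With these inputs the weighted output moves from the unweighted midpoint toward the more populous sub-area, which is the behavior described for centroid-level output in Table 8.</p>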
</sec>
<sec>
<title>End-user needs and expectations</title>
<p>Although the producers of geocoded data often make use of these data in-house within research projects and policy-making initiatives, it is often the case that the ultimate end-user of geocoded data may be in another area of an organization or be within a completely separate organization. In each case, the
<bold>
<italic>user expectations and requirements</italic>
</bold>
for data quality will vary by end-user. For example, an end-user computing disease rates at the state level would have an entirely different expectation for the accuracy of census unit assignments than one who sought to quantify individual-level exposure metrics at a micro-scale environmental level, such as indoor residential exposure to pesticides. Similarly, the
<bold>
<italic>user knowledge of the geocoding process</italic>
</bold>
and
<bold>
<italic>user capacity to handle different levels of detail</italic>
</bold>
(metadata) about the geocoding process will vary by end-user group. For example, a policy-maker or legislative analyst may be overwhelmed if provided with detail about the input postal address attributes that did and did not match in a geocoded result. In contrast, a spatial statistician may wish to know that a proximate postcode was used to produce an output geocode for an input address where the input postcode was incorrect but the locality name was correct. The evaluation of a geocoding system must take into account the
<bold>
<italic>end-user needs, wants, and abilities</italic>
</bold>
to determine which features of a geocoding system are absolutely critical given the usage scenarios that are anticipated in the end-user communities which an organization’s geocoded data serve.</p>
<p>Similarly, the
<bold>
<italic>frequency of geocoding requests</italic>
</bold>
that are expected of a geocoding provider from end-users is an important aspect to consider, as is the
<bold>
<italic>volume of records that must be processed</italic>
</bold>
in each instance. A time-consuming geocoding process that produces highly accurate results may be an acceptable option if the staff who must perform the geocoding are asked to do so infrequently or the data files are small. In contrast, organizations that must continually process large amounts of data, or do so as part of an automated process, simply cannot spend a great deal of time on a per-record basis and, as such, may be willing to sacrifice some level of accuracy or metadata for processing speed. These issues relate to the means by which the geocoding process is integrated into the organizational workflow, and whether or not the geocoding is
<bold>
<italic>performed on a per-project basis</italic>
</bold>
(one at a time), or if the process is tightly integrated into the mission of the organization and is
<bold>
<italic>an integral part</italic>
</bold>
of the services which the organization provides.</p>
</sec>
<sec>
<title>Operating performance</title>
<p>The operating performance of a geocoding system describes the characteristics of the system that affect how fast records can be processed. In most modern computing environments, per-record processing speed is of little concern because many commercially available geocoding systems can process on the order of millions of records per hour. However, if large volumes of data must be continually processed or re-processed, speed may be an issue that can be used to discriminate between geocoding systems. An extreme example would be the need for real-time geocoding in a disaster or health emergency scenario such as a disease outbreak. Here, geocoded data are needed immediately to help resolve or understand a phenomenon as it is unfolding on the ground to assist in the decision-making process, determine where resources are needed, and identify a course of action to pursue to save lives and property.</p>
</sec>
</sec>
<sec sec-type="materials">
<title>Materials</title>
<sec>
<title>Geocoding systems evaluated</title>
<p>Five desktop geocoding systems were evaluated. The geocoding systems used in this analysis were chosen from among the members of the Cooperative Research Centre for Spatial Information (CRC-SI). All 43 industrial partners of the CRC-SI were solicited to participate in this project through an expression of interest (EOI) process which requested information on the geocoding platforms provided by each partner. A set of conditions had to be met, the main one being that the platform had to be a stand-alone desktop system. Of those that responded, five were able to provide evaluation licenses and reference data that could be installed and tested as part of the evaluation. Four of the five systems represent state-of-the-art and well-known commercial geocoding system offerings from companies that provide geocoding solutions for Australia and elsewhere in the world. All systems remain anonymous in this paper as per non-disclosure agreements and are indicated simply by the names “Geocoder A” through “Geocoder E”; position in this list of five (A – E) was assigned randomly. Each geocoding system was tested using each applicable reference data source and input data combination.</p>
</sec>
<sec>
<title>Reference data sources</title>
<p>The reference data sources utilized in these experiments include the most up-to-date and accurate reference data files available for both the state of Western Australia (WA) and the entire country of Australia. The state-level files used were the Property Street Address (PSA) data files distributed by the Western Australian Land Information Authority (Landgate) [
<xref ref-type="bibr" rid="B57">57</xref>
]. These files include digital parcel boundaries (polygons) and parcel centroids (points) for all addresses in WA. Also used was an extension to the PSA, called PSA+ within this report, which included spatially referenced place names also known as “alias tables”. These files are updated continuously and are the official government land records of the state, which include the current postal address associated with each property.</p>
<p>The national-level files used in this study were the Geocoded National Address File (G-NAF) maintained and distributed by the Public Sector Mapping Agencies (PSMA) Australia Limited [
<xref ref-type="bibr" rid="B58">58</xref>
]. These files are the authoritative nationwide address data source for Australia. These data are collected from local, state, and national-level government agencies (including Landgate for WA), then cleaned, integrated, and prepared for dissemination by PSMA. These data include the digital parcel boundaries (polygons) and parcel centroids (points) for nearly all addresses in Australia, along with the current postal address associated with each property.</p>
</sec>
<sec>
<title>Input data sources</title>
<p>The input data used for this study were chosen to represent three tiers of data types: health service utilization data, administrative list data, and gold standard data. The quality of these data ranges from exceptionally clean data that have been manually corrected, which all geocoding systems should be able to process correctly, to exceptionally dirty data that are known to contain high levels of challenging geocoding scenarios, which should cause errors in all geocoding systems. These diverse sets of input data were chosen in order to compare how each of the geocoding systems handled different input data qualities and to tease out how the internal processing techniques of each geocoder added to or subtracted from the resulting geocode quality. Data use agreements with the data stewards responsible for the collection, curation, and maintenance of the data sets (including the gold standard data) used in this evaluation preclude the naming of the data sets or the government agencies that provided them.</p>
</sec>
<sec>
<title>Gold standard data</title>
<p>The gold standard data used for this study represent an exceptionally clean data set (data set A, n = 2,203): a data source with no errors that should be correctly processed by all geocoding systems, so that non-matches on this data set would be considered false negatives. This data set contained address data drawn from a previous, larger study. Each record represented an address that could not be successfully geocoded by an automated geocoding system. These records were manually reviewed and processed to improve their output quality by verifying and/or correcting postal address attributes and the true location of the geocoded point following a method similar to that presented in Goldberg et al. (2008) [
<xref ref-type="bibr" rid="B39">39</xref>
]. The records were ground-truthed using a variety of methods including aerial imagery, online “street view” software, contacting the parties responsible for the address to confirm address attributes, and linkage with official government records and public domain data sources. The result of these painstaking efforts was the construction of an input data set of addresses with attribute data (number, street name, suffix, locality, postcode, etc.) that were manually confirmed to be correct.</p>
</sec>
<sec>
<title>Administrative data</title>
<p>The administrative data set (data set B, n = 1,364,058) used for this study was drawn from official records of a large WA administrative database. These data contain the official addresses of a subset of residents of WA, and represent input address data that should be of fairly high quality. These data are representative of many administrative lists that are used to send out government mailings, confirm postal delivery addresses, and support other essential government services.</p>
</sec>
<sec>
<title>Health service utilization data</title>
<p>The health service utilization data set (data set C, n = 1,264,941) used for this study was chosen to represent a data source with numerous errors in the input address which would be the most difficult to geocode and result in the highest number of non-matches, false positive matches (incorrect matches), and false negative non-matches (incorrect non-matches). These data were drawn from the health service utilization records of a specific Western Australian health agency and are representative of the quality of data that occur when data are collected through a patient-facing organization where the patient self-reports his/her postal address.</p>
<p>The primary challenges of these data were threefold:</p>
<p>• Blank fields in addresses resulting in input data with limited input address fields, sometimes with just a locality and/or just a postcode;</p>
<p>• Named places such as prisons, nursing homes, and Aboriginal communities, instead of street addresses; and</p>
<p>• Historical data spanning a number of years and many versions of data input systems, each of which captured data in a different way.</p>
<p>Variations to data collection procedures through time include:</p>
<p>• Truncations to save characters;</p>
<p>• Transposition and introduction of new fields as user interfaces were updated; and</p>
<p>• Use of various codes for unknown/missing information (e.g., entering postcode 9999 when the postcode was unknown versus leaving it blank or entering 0000).</p>
<p>These data also included numerous other frequently occurring errors, including misspellings in all components of the input address (number, street name, suffix, locality, postcode, etc.), the use of incorrect locality names and postcodes, and all combinations of missing attributes across the fields of the input address.</p>
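<p>The differing codes for unknown postcodes described above (blank, 0000, or 9999) could be harmonized before geocoding with a small pre-processing step. The Python sketch below is a hypothetical illustration of such a step. It was not part of the evaluation reported here, and the function and constant names are assumptions.</p>

```python
# Placeholder values named in the text as standing for an unknown postcode.
MISSING_POSTCODES = {"", "0000", "9999"}


def normalize_postcode(raw):
    """Return a trimmed postcode, or None when the value means 'unknown'."""
    value = (raw or "").strip()
    if value in MISSING_POSTCODES:
        return None
    return value


for raw in ["6000", " 6009 ", "9999", "0000", "", None]:
    print(repr(raw), normalize_postcode(raw))
```

<p>Mapping every placeholder to a single missing value prevents a geocoder from attempting to match a postcode that was never actually reported.</p>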
</sec>
<sec>
<title>Experimental design</title>
<p>The experiments performed for this research attempted to apply the framework and metrics described above in the context of the Western Australia (WA) Department of Health (DoH) as a test-case for evaluating their applicability for comparing a set of available geocoding platforms. To do so, the characteristics of each geocoding system were assessed across each aspect of the evaluation framework presented earlier. Table 
<xref ref-type="table" rid="T9">9</xref>
was constructed in consultation with the WA DoH and lists the features and capabilities of geocoding systems that were important to the organization. Each system was evaluated based on published literature and documentation of the geocoding systems. Additional communication with each vendor was necessary to determine all capabilities because not all vendors use the same terminology for all items.</p>
<table-wrap position="float" id="T9">
<label>Table 9</label>
<caption>
<p>Operational capabilities results</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Evaluation metric</bold>
</th>
<th align="center">
<bold>Geocoder A</bold>
</th>
<th align="center">
<bold>Geocoder B</bold>
</th>
<th align="center">
<bold>Geocoder C</bold>
</th>
<th align="center">
<bold>Geocoder D</bold>
</th>
<th align="center">
<bold>Geocoder E</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>User-defined reference data layers license fee (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Specialized address parsing rules (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Specialized matching algorithms (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Specialized feature selection hierarchies (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes*
<hr></hr>
</td>
<td align="center" valign="bottom">Yes*
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Integration</bold>
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Operating system support (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes (Unix)
<hr></hr>
</td>
<td align="center" valign="bottom">No (Windows)
<hr></hr>
</td>
<td align="center" valign="bottom">No (Windows)
<hr></hr>
</td>
<td align="center" valign="bottom">Yes (Windows, Unix, Linux)
<hr></hr>
</td>
<td align="center" valign="bottom">Windows
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Native system and workflow integration (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Batch mode (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Interactive mode (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Manual review (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Confidentiality maintained (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Desktop version (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>In-house server version (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Online API version (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Metadata</bold>
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Match rate (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Match type (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Match score (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Available (not by default)
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Spatial confidence (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Input address/matched address concordance (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Capabilities</bold>
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Automatic batch geocoding (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Manual review (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Alias tables (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>Weighted centroids (Y/N)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>Nearby address matching (Y/N)</bold>
</td>
<td align="center">Yes</td>
<td align="center">No</td>
<td align="center">No</td>
<td align="center">Yes</td>
<td align="center">Yes</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>* Only if street centroids, suburb and postcode reference data are available.</p>
</table-wrap-foot>
</table-wrap>
<p>The project team attempted to install each system ‘out-of-the-box’, with as little customization as possible. For some of the systems this included importing reference data layers, i.e., for those that did not ship the reference data as part of the software and instead required a geocoding reference data layer to be constructed or specified. The exception was the programming required to install Geocoder A, which is described below.</p>
<p>The three input data sets were batch-processed through each of the geocoding systems in sequence on the same team member’s computer. No data filtering, data cleansing, address standardization, or address normalization operations were applied to any of the input data before geocoding. All data were processed exactly as received from the data custodians, although the first step in most batch geocoding systems is to standardize and normalize the input data internally within the geocoding system [
<xref ref-type="bibr" rid="B59">59</xref>
].</p>
<p>The experiments controlled for differences in geocoding quality due to the three main components of geocoding systems: (a) input data quality; (b) the geocoding algorithms, which include all components of the geocoding system that are beyond the control of a geocode user (address standardization and normalization, feature matching, and feature interpolation); and (c) the reference data layers used. To do so, each of these three components was evaluated separately by constructing usage scenarios that varied one component while holding the other two constant.</p>
<p>For example, to test the effect of input data quality across each geocoding system, all three data sets were processed by each geocoder using the same reference data sources (as far as the differing reference data set support of each geocoder allowed). Holding the reference data sets static and changing the input data set allowed for analysis of the overall effect of excellent (Gold Standard), moderate (Administrative), and poor (Health) quality data on each geocoding system. Similarly, the effect of reference data set usage was evaluated by holding the input data set constant and processing it with different combinations of reference data layers, per geocoding system.</p>
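<p>The vary-one-component design described above can be sketched as a simple enumeration of runs. This is only an illustrative sketch: the support matrix is transcribed from Table 10, while the names SUPPORTED, INPUTS, runs_varying_input and runs_varying_reference are hypothetical and do not come from the study.</p>

```python
# Reference layers each geocoder was run with, transcribed from Table 10.
SUPPORTED = {
    "A": ["PSA", "PSA+"],
    "B": ["G-NAF", "PSA", "PSA+"],
    "C": ["PSA", "PSA+"],
    "D": ["G-NAF", "PSA"],
    "E": ["G-NAF"],
}

# The three input data sets, ordered from best to poorest quality.
INPUTS = ["Gold Standard", "Administrative", "Health"]


def runs_varying_input(reference):
    """Hold the reference layer fixed and vary the input data set."""
    return [
        (geocoder, reference, dataset)
        for geocoder, layers in SUPPORTED.items()
        if reference in layers
        for dataset in INPUTS
    ]


def runs_varying_reference(dataset):
    """Hold the input data set fixed and vary the reference layer."""
    return [
        (geocoder, reference, dataset)
        for geocoder, layers in SUPPORTED.items()
        for reference in layers
    ]


# Hypothetical usage: every run needed to study input quality with PSA.
psa_runs = runs_varying_input("PSA")
health_runs = runs_varying_reference("Health")
```

<p>Under these assumptions, fixing the PSA layer yields twelve runs (four geocoders by three input data sets), and fixing the Health data set yields ten runs, one per supported geocoder and reference layer pair.</p>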
</sec>
</sec>
<sec>
<title>Results and discussion</title>
<sec>
<title>Reference data layers</title>
<p>Table 
<xref ref-type="table" rid="T10">10</xref>
lists the reference data layers supported by each geocoding system. Geocoding systems were evaluated on their ability to support the G-NAF, PSA, and PSA+ data (PSA with additional alias names). Only one of the geocoding systems tested, Geocoder B, could support all three reference data layers without any additional development work and associated costs. All four other geocoding systems
<italic>could</italic>
have supported the additional reference data layers, but this would have required specialized customization and/or development work by the system providers, which was beyond the budget and scope of the current research.</p>
<table-wrap position="float" id="T10">
<label>Table 10</label>
<caption>
<p>Reference data set support and setup time</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Geocoder</bold>
</th>
<th align="left">
<bold>G-NAF</bold>
</th>
<th align="left">
<bold>PSA</bold>
</th>
<th align="left">
<bold>PSA+</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom">
<bold>A</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – 20 mins
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – 3 weeks
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>B</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – 1 day
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>C</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>D</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
<td align="center" valign="bottom">Yes – < 5 mins
<hr></hr>
</td>
<td align="center" valign="bottom">No
<hr></hr>
</td>
</tr>
<tr>
<td align="left">
<bold>E</bold>
</td>
<td align="center">Yes – < 5 mins</td>
<td align="center">No</td>
<td align="center">No</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The most striking result shown in Table 
<xref ref-type="table" rid="T10">10</xref>
is the fact that Geocoder A took several weeks to import the latest data layers available. This time was needed for specialized staff to perform custom programming that added support for modern data formats (shapefiles and geodatabases instead of older formats). This update represented a large one-time investment for the Geocoder A system.</p>
</sec>
<sec>
<title>Processing time</title>
<p>Table 
<xref ref-type="table" rid="T11">11</xref>
lists the processing times required to geocode all records within each input data set using each applicable reference data layer within each geocoding system. In general, all but Geocoder A processed data at roughly the same speed given the number of records. In all instances, the processing speed was deemed acceptable for the number of records because they were processed in batch for non-real-time purposes.</p>
<table-wrap position="float" id="T11">
<label>Table 11</label>
<caption>
<p>Processing time by geocoding system, reference data set, and input data set (h = hours, m = minutes; a dash marks a reference data set that was not run for that geocoder)</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Dataset</bold>
</th>
<th align="left">
<bold>Reference data</bold>
</th>
<th align="left">
<bold>Geocoder A</bold>
</th>
<th align="left">
<bold>Geocoder B</bold>
</th>
<th align="left">
<bold>Geocoder C</bold>
</th>
<th align="left">
<bold>Geocoder D</bold>
</th>
<th align="left">
<bold>Geocoder E</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center" valign="bottom">
<bold>A – Gold Standard</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">
<bold>(n = 2,203)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA+
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">G-NAF
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
<td align="center" valign="bottom"><2 m
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">
<bold>B – Administrative</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA
<hr></hr>
</td>
<td align="center" valign="bottom">45 m
<hr></hr>
</td>
<td align="center" valign="bottom">19 m
<hr></hr>
</td>
<td align="center" valign="bottom">13 m
<hr></hr>
</td>
<td align="center" valign="bottom">17 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">
<bold>(n = 1,364,058)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA+
<hr></hr>
</td>
<td align="center" valign="bottom">39 m
<hr></hr>
</td>
<td align="center" valign="bottom">19 m
<hr></hr>
</td>
<td align="center" valign="bottom">12 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">G-NAF
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom">24 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom">34 m
<hr></hr>
</td>
<td align="center" valign="bottom">13 m
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">
<bold>C – Health</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA
<hr></hr>
</td>
<td align="center" valign="bottom">55 m
<hr></hr>
</td>
<td align="center" valign="bottom">16 m
<hr></hr>
</td>
<td align="center" valign="bottom">16 m
<hr></hr>
</td>
<td align="center" valign="bottom">25 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center" valign="bottom">
<bold>(n = 998,066)</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">PSA+
<hr></hr>
</td>
<td align="center" valign="bottom">2 h 25 m
<hr></hr>
</td>
<td align="center" valign="bottom">22 m
<hr></hr>
</td>
<td align="center" valign="bottom">17 m
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
<td align="center" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="center"> </td>
<td align="center">G-NAF</td>
<td align="center">-</td>
<td align="center">19 m</td>
<td align="center">-</td>
<td align="center">30 m</td>
<td align="center">23 m</td>
</tr>
</tbody>
</table>
</table-wrap>
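<p>To compare the processing times in Table 11 on a common scale, they can be converted to throughput. The following is a minimal sketch, assuming durations are written as in the table (for example “45 m” or “2 h 25 m”); the helper names minutes and records_per_minute are hypothetical, and entries that are not exact durations (dashes and the under-two-minute cells) are rejected rather than guessed.</p>

```python
import re


def minutes(duration):
    """Convert a duration such as '45 m' or '2 h 25 m' to whole minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, mins = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + mins


def records_per_minute(n_records, duration):
    """Average throughput for one batch run."""
    return n_records / minutes(duration)


# Health data set (n = 998,066) geocoded against PSA+, values from Table 11.
HEALTH_N = 998_066
health_psa_plus = {"A": "2 h 25 m", "B": "22 m", "C": "17 m"}

throughput = {
    geocoder: round(records_per_minute(HEALTH_N, duration))
    for geocoder, duration in health_psa_plus.items()
}
```

<p>On these figures Geocoder A processes roughly 6,900 records per minute against roughly 58,700 for Geocoder C, which makes the gap noted in the text explicit.</p>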
</sec>
<sec>
<title>Operating metric comparison</title>
<p>Each of the five geocoding systems was evaluated using the operational capabilities described above. Table 
<xref ref-type="table" rid="T9">9</xref>
displays the evaluation results of each geocoding system against an operational capabilities matrix derived from the above metrics, which were deemed important within the context of the WA DoH usage scenario. Using these data, it is possible to make comparative assessments of the match rates across the geocoding systems. It is expected that different organizations and/or usage scenarios would choose or develop alternative or additional framework metrics to evaluate geocoding systems based on the most important operational and performance needs of the organization.</p>
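<p>One way an organization might summarise such a capabilities matrix is a simple tally of the criteria each system meets. The sketch below is an assumption-laden illustration, not a method used in this study: it takes a handful of Yes/No rows from Table 9, assumes the columns are ordered Geocoder A to E, and introduces a hypothetical tally function with optional weights that an organization would have to choose for itself.</p>

```python
GEOCODERS = ["A", "B", "C", "D", "E"]

# A subset of Table 9 rows, one Yes/No answer per geocoder (A to E).
CAPABILITIES = {
    "Interactive mode": ["Yes", "No", "No", "Yes", "No"],
    "In-house server version": ["Yes", "No", "No", "Yes", "No"],
    "Online API version": ["No", "Yes", "Yes", "Yes", "No"],
    "Match type": ["No", "No", "No", "Yes", "Yes"],
    "Spatial confidence": ["Yes", "No", "No", "Yes", "No"],
}


def tally(capabilities, weights=None):
    """Sum the (optionally weighted) criteria that each geocoder satisfies."""
    weights = weights or {}
    totals = dict.fromkeys(GEOCODERS, 0.0)
    for criterion, answers in capabilities.items():
        weight = weights.get(criterion, 1.0)  # hypothetical default weight
        for geocoder, answer in zip(GEOCODERS, answers):
            if answer == "Yes":
                totals[geocoder] += weight
    return totals


unweighted = tally(CAPABILITIES)
# Hypothetical weighting for an agency that must keep data on site.
on_site = tally(CAPABILITIES, {"In-house server version": 3.0})
```

<p>A tally of this kind only ranks systems on the chosen rows; entries that are not a plain Yes or No (for example “Available (not by default)”) would need an explicit rule before being included.</p>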
</sec>
<sec>
<title>Match type and match rate summary</title>
<p>The match type and match rate results from each of the five geocoding systems are displayed in Tables 
<xref ref-type="table" rid="T12">12</xref>
,
<xref ref-type="table" rid="T13">13</xref>
, and
<xref ref-type="table" rid="T14">14</xref>
. These results are divided by input data set and applicable reference data layer for each geocoding system, and further into ‘Parcel’ level and ‘Non-Parcel’ level matches. For the geocoding systems that indicate a match type (Geocoder D and Geocoder E), this output was used directly to determine ‘Parcel’ level and ‘Non-Parcel’ level matches. For the systems that do not indicate a match type but instead assign only a match score, that is, a level of similarity between the input and reference data (Geocoders A, B, and C), match score thresholds were selected to represent ‘Parcel’ level and ‘Non-Parcel’ level geocodes.</p>
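<p>The threshold approach used for the score-only systems can be sketched as follows. The thresholds actually selected are not stated in this section, so the values 90 and 60 below are purely hypothetical, as are the function name classify and the “Unmatched” label for scores under the lower threshold.</p>

```python
from collections import Counter


def classify(match_score, parcel_threshold, match_threshold):
    """Map a match score to a match level using two cut-offs."""
    if match_threshold > parcel_threshold:
        raise ValueError("match_threshold must not exceed parcel_threshold")
    if match_score >= parcel_threshold:
        return "Parcel"
    if match_score >= match_threshold:
        return "Non-Parcel"
    return "Unmatched"


# Hypothetical scores and thresholds, for illustration only.
scores = [98, 91, 75, 60, 42]
levels = Counter(classify(s, parcel_threshold=90, match_threshold=60) for s in scores)
```

<p>Because the cut-offs are chosen by the analyst, ‘Parcel’ level counts derived this way are not strictly equivalent to the match types reported directly by Geocoders D and E.</p>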
<table-wrap position="float" id="T12">
<label>Table 12</label>
<caption>
<p>Input data A (Gold Standard) match type and match rate summary (n = 2,203 records)</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Geocoder</bold>
</th>
<th align="left">
<bold>Reference</bold>
</th>
<th colspan="2" align="right">
<bold>‘Parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>‘Non-Parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>Geocoded</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>Data</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>A</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1875
<hr></hr>
</td>
<td align="right" valign="bottom">85.1
<hr></hr>
</td>
<td align="right" valign="bottom">303
<hr></hr>
</td>
<td align="right" valign="bottom">13.8
<hr></hr>
</td>
<td align="right" valign="bottom">2178
<hr></hr>
</td>
<td align="right" valign="bottom">98.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1875
<hr></hr>
</td>
<td align="right" valign="bottom">85.1
<hr></hr>
</td>
<td align="right" valign="bottom">303
<hr></hr>
</td>
<td align="right" valign="bottom">13.8
<hr></hr>
</td>
<td align="right" valign="bottom">2178
<hr></hr>
</td>
<td align="right" valign="bottom">98.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>B</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1765
<hr></hr>
</td>
<td align="right" valign="bottom">80.1
<hr></hr>
</td>
<td align="right" valign="bottom">67
<hr></hr>
</td>
<td align="right" valign="bottom">3.0
<hr></hr>
</td>
<td align="right" valign="bottom">1832
<hr></hr>
</td>
<td align="right" valign="bottom">83.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1624
<hr></hr>
</td>
<td align="right" valign="bottom">73.7
<hr></hr>
</td>
<td align="right" valign="bottom">77
<hr></hr>
</td>
<td align="right" valign="bottom">3.5
<hr></hr>
</td>
<td align="right" valign="bottom">1701
<hr></hr>
</td>
<td align="right" valign="bottom">77.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1624
<hr></hr>
</td>
<td align="right" valign="bottom">73.7
<hr></hr>
</td>
<td align="right" valign="bottom">77
<hr></hr>
</td>
<td align="right" valign="bottom">3.5
<hr></hr>
</td>
<td align="right" valign="bottom">1701
<hr></hr>
</td>
<td align="right" valign="bottom">77.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>C</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1696
<hr></hr>
</td>
<td align="right" valign="bottom">77.0
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">1.0
<hr></hr>
</td>
<td align="right" valign="bottom">1717
<hr></hr>
</td>
<td align="right" valign="bottom">77.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1696
<hr></hr>
</td>
<td align="right" valign="bottom">77.0
<hr></hr>
</td>
<td align="right" valign="bottom">21
<hr></hr>
</td>
<td align="right" valign="bottom">1.0
<hr></hr>
</td>
<td align="right" valign="bottom">1717
<hr></hr>
</td>
<td align="right" valign="bottom">77.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>D</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1959
<hr></hr>
</td>
<td align="right" valign="bottom">88.9
<hr></hr>
</td>
<td align="right" valign="bottom">236
<hr></hr>
</td>
<td align="right" valign="bottom">10.7
<hr></hr>
</td>
<td align="right" valign="bottom">2195
<hr></hr>
</td>
<td align="right" valign="bottom">99.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1938
<hr></hr>
</td>
<td align="right" valign="bottom">88.0
<hr></hr>
</td>
<td align="right" valign="bottom">257
<hr></hr>
</td>
<td align="right" valign="bottom">11.7
<hr></hr>
</td>
<td align="right" valign="bottom">2195
<hr></hr>
</td>
<td align="right" valign="bottom">99.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>E</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1991
<hr></hr>
</td>
<td align="right" valign="bottom">90.4
<hr></hr>
</td>
<td align="right" valign="bottom">212
<hr></hr>
</td>
<td align="right" valign="bottom">9.6
<hr></hr>
</td>
<td align="right" valign="bottom">2203
<hr></hr>
</td>
<td align="right" valign="bottom">100.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="center">
<bold>PSA+</bold>
</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
</tr>
</tbody>
</table>
</table-wrap>
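<p>The percentages in Table 12 follow directly from the counts and the number of input records. A minimal sketch of that arithmetic is given below; the function name match_rates is hypothetical, and the rounding to one decimal place is an assumption that reproduces the published figures.</p>

```python
def match_rates(parcel, non_parcel, total):
    """Return counts and percentages for one geocoder and reference layer."""
    geocoded = parcel + non_parcel

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "parcel_pct": pct(parcel),
        "non_parcel_pct": pct(non_parcel),
        "geocoded": geocoded,
        "geocoded_pct": pct(geocoded),
    }


# Rows from Table 12 (Gold Standard input, n = 2,203 records).
geocoder_d_gnaf = match_rates(parcel=1959, non_parcel=236, total=2203)
geocoder_b_gnaf = match_rates(parcel=1765, non_parcel=67, total=2203)
```

<p>Note that the overall rate is computed from the summed counts rather than by adding the rounded percentages, which is why Geocoder B with G-NAF shows 80.1 and 3.0 yet 83.2 overall.</p>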
<table-wrap position="float" id="T13">
<label>Table 13</label>
<caption>
<p>Input data B (Administrative) match type and match rate summary (n = 1,364,058 records)</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Geocoder</bold>
</th>
<th align="left">
<bold>Reference</bold>
</th>
<th colspan="2" align="right">
<bold>‘Parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>‘Non-Parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>Geocoded</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>Data</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>A</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1306310
<hr></hr>
</td>
<td align="right" valign="bottom">95.8
<hr></hr>
</td>
<td align="right" valign="bottom">55907
<hr></hr>
</td>
<td align="right" valign="bottom">4.1
<hr></hr>
</td>
<td align="right" valign="bottom">1362217
<hr></hr>
</td>
<td align="right" valign="bottom">99.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1313046
<hr></hr>
</td>
<td align="right" valign="bottom">96.3
<hr></hr>
</td>
<td align="right" valign="bottom">49805
<hr></hr>
</td>
<td align="right" valign="bottom">3.7
<hr></hr>
</td>
<td align="right" valign="bottom">1362851
<hr></hr>
</td>
<td align="right" valign="bottom">99.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>B</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1136220
<hr></hr>
</td>
<td align="right" valign="bottom">83.3
<hr></hr>
</td>
<td align="right" valign="bottom">36915
<hr></hr>
</td>
<td align="right" valign="bottom">2.7
<hr></hr>
</td>
<td align="right" valign="bottom">1173135
<hr></hr>
</td>
<td align="right" valign="bottom">86.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1165034
<hr></hr>
</td>
<td align="right" valign="bottom">85.4
<hr></hr>
</td>
<td align="right" valign="bottom">58664
<hr></hr>
</td>
<td align="right" valign="bottom">4.3
<hr></hr>
</td>
<td align="right" valign="bottom">1223698
<hr></hr>
</td>
<td align="right" valign="bottom">89.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1165034
<hr></hr>
</td>
<td align="right" valign="bottom">85.4
<hr></hr>
</td>
<td align="right" valign="bottom">58664
<hr></hr>
</td>
<td align="right" valign="bottom">4.3
<hr></hr>
</td>
<td align="right" valign="bottom">1223698
<hr></hr>
</td>
<td align="right" valign="bottom">89.7
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>C</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1219245
<hr></hr>
</td>
<td align="right" valign="bottom">89.4
<hr></hr>
</td>
<td align="right" valign="bottom">21932
<hr></hr>
</td>
<td align="right" valign="bottom">1.6
<hr></hr>
</td>
<td align="right" valign="bottom">1241177
<hr></hr>
</td>
<td align="right" valign="bottom">91.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1219245
<hr></hr>
</td>
<td align="right" valign="bottom">89.4
<hr></hr>
</td>
<td align="right" valign="bottom">21932
<hr></hr>
</td>
<td align="right" valign="bottom">1.6
<hr></hr>
</td>
<td align="right" valign="bottom">1241177
<hr></hr>
</td>
<td align="right" valign="bottom">91.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>D</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1318281
<hr></hr>
</td>
<td align="right" valign="bottom">96.6
<hr></hr>
</td>
<td align="right" valign="bottom">43825
<hr></hr>
</td>
<td align="right" valign="bottom">3.2
<hr></hr>
</td>
<td align="right" valign="bottom">1362106
<hr></hr>
</td>
<td align="right" valign="bottom">99.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1325911
<hr></hr>
</td>
<td align="right" valign="bottom">97.2
<hr></hr>
</td>
<td align="right" valign="bottom">35442
<hr></hr>
</td>
<td align="right" valign="bottom">2.6
<hr></hr>
</td>
<td align="right" valign="bottom">1361353
<hr></hr>
</td>
<td align="right" valign="bottom">99.8
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>E</bold>
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">1329627
<hr></hr>
</td>
<td align="right" valign="bottom">97.5
<hr></hr>
</td>
<td align="right" valign="bottom">34431
<hr></hr>
</td>
<td align="right" valign="bottom">2.5
<hr></hr>
</td>
<td align="right" valign="bottom">1364058
<hr></hr>
</td>
<td align="right" valign="bottom">100.0
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="center" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="center">
<bold>PSA+</bold>
</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap position="float" id="T14">
<label>Table 14</label>
<caption>
<p>Input data C (Health) match type and match rate summary (n = 998066 records)</p>
</caption>
<table frame="hsides" rules="groups" border="1">
<colgroup>
<col align="left"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
<col align="center"></col>
</colgroup>
<thead valign="top">
<tr>
<th align="left">
<bold>Geocoder</bold>
</th>
<th align="right">
<bold>Reference</bold>
</th>
<th colspan="2" align="right">
<bold>‘Parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>‘Non-parcel’ level</bold>
</th>
<th colspan="2" align="right">
<bold>Geocoded</bold>
</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>Data</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>N</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>%</bold>
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>A</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">712645
<hr></hr>
</td>
<td align="right" valign="bottom">71.4
<hr></hr>
</td>
<td align="right" valign="bottom">149309
<hr></hr>
</td>
<td align="right" valign="bottom">15.0
<hr></hr>
</td>
<td align="right" valign="bottom">861954
<hr></hr>
</td>
<td align="right" valign="bottom">86.4
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">724326
<hr></hr>
</td>
<td align="right" valign="bottom">72.6
<hr></hr>
</td>
<td align="right" valign="bottom">145595
<hr></hr>
</td>
<td align="right" valign="bottom">14.6
<hr></hr>
</td>
<td align="right" valign="bottom">869921
<hr></hr>
</td>
<td align="right" valign="bottom">87.2
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>B</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">446182
<hr></hr>
</td>
<td align="right" valign="bottom">44.7
<hr></hr>
</td>
<td align="right" valign="bottom">101049
<hr></hr>
</td>
<td align="right" valign="bottom">10.1
<hr></hr>
</td>
<td align="right" valign="bottom">547231
<hr></hr>
</td>
<td align="right" valign="bottom">54.8
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">486188
<hr></hr>
</td>
<td align="right" valign="bottom">48.7
<hr></hr>
</td>
<td align="right" valign="bottom">78508
<hr></hr>
</td>
<td align="right" valign="bottom">7.9
<hr></hr>
</td>
<td align="right" valign="bottom">564696
<hr></hr>
</td>
<td align="right" valign="bottom">56.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">486188
<hr></hr>
</td>
<td align="right" valign="bottom">48.7
<hr></hr>
</td>
<td align="right" valign="bottom">78508
<hr></hr>
</td>
<td align="right" valign="bottom">7.9
<hr></hr>
</td>
<td align="right" valign="bottom">564696
<hr></hr>
</td>
<td align="right" valign="bottom">56.6
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>C</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">440062
<hr></hr>
</td>
<td align="right" valign="bottom">44.1
<hr></hr>
</td>
<td align="right" valign="bottom">27806
<hr></hr>
</td>
<td align="right" valign="bottom">2.8
<hr></hr>
</td>
<td align="right" valign="bottom">467868
<hr></hr>
</td>
<td align="right" valign="bottom">46.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">440062
<hr></hr>
</td>
<td align="right" valign="bottom">44.1
<hr></hr>
</td>
<td align="right" valign="bottom">27806
<hr></hr>
</td>
<td align="right" valign="bottom">2.8
<hr></hr>
</td>
<td align="right" valign="bottom">467868
<hr></hr>
</td>
<td align="right" valign="bottom">46.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>D</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">734518
<hr></hr>
</td>
<td align="right" valign="bottom">73.6
<hr></hr>
</td>
<td align="right" valign="bottom">211175
<hr></hr>
</td>
<td align="right" valign="bottom">21.2
<hr></hr>
</td>
<td align="right" valign="bottom">945693
<hr></hr>
</td>
<td align="right" valign="bottom">94.8
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">725115
<hr></hr>
</td>
<td align="right" valign="bottom">72.7
<hr></hr>
</td>
<td align="right" valign="bottom">217965
<hr></hr>
</td>
<td align="right" valign="bottom">21.8
<hr></hr>
</td>
<td align="right" valign="bottom">943080
<hr></hr>
</td>
<td align="right" valign="bottom">94.5
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA+</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom">
<bold>E</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>G-NAF</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">716241
<hr></hr>
</td>
<td align="right" valign="bottom">71.8
<hr></hr>
</td>
<td align="right" valign="bottom">271326
<hr></hr>
</td>
<td align="right" valign="bottom">27.2
<hr></hr>
</td>
<td align="right" valign="bottom">987567
<hr></hr>
</td>
<td align="right" valign="bottom">98.9
<hr></hr>
</td>
</tr>
<tr>
<td align="left" valign="bottom"> 
<hr></hr>
</td>
<td align="right" valign="bottom">
<bold>PSA</bold>
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
<td align="right" valign="bottom">-
<hr></hr>
</td>
</tr>
<tr>
<td align="left"> </td>
<td align="right">
<bold>PSA+</bold>
</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
<td align="right">-</td>
</tr>
</tbody>
</table>
</table-wrap>
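<p>The percentages in Table 14 follow from dividing each count by the number of input records given in the caption (n = 998066). The sketch below reproduces that arithmetic for the Geocoder A / PSA row; the function name <italic>match_rate</italic> and the rounding to one decimal place are assumptions made for illustration, not a description of the software used in the study.</p>

```python
# Counts copied from Table 14 (Geocoder A, PSA reference data).
TOTAL_RECORDS = 998066  # n reported in the Table 14 caption


def match_rate(count, total=TOTAL_RECORDS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


parcel = 712645       # 'Parcel' level matches
non_parcel = 149309   # 'Non-parcel' level matches
geocoded = parcel + non_parcel  # records matched at either level

print(match_rate(parcel))      # parcel-level match rate
print(match_rate(non_parcel))  # non-parcel-level match rate
print(match_rate(geocoded))    # overall match rate
```

<p>Under these assumptions the three printed values agree with the 71.4, 15.0 and 86.4 reported for that row.</p>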
</sec>
<sec>
<title>Interpretation and discussion</title>
<p>Functionally, the biggest issues affecting geocoder performance were 1) the ability to include additional reference data layers; and 2) the ability to include alias tables. The geocoding systems evaluated in this project spanned the spectrum of flexibility in this regard. For example, Geocoder A included alias tables but could not include the G-NAF data. Geocoders B and C allowed users to include any parcel-based file (G-NAF and PSA) but encountered challenges when including alias tables, although the documentation reports that these could be added if the data layers can be obtained and formatted properly. Geocoder D could include both G-NAF and PSA but could not utilize alias tables (PSA+), while Geocoder E could only utilize G-NAF without costly development at the Geocoder E organization to include the PSA or PSA+ files.</p>
<p>The impact of including alias tables was evident when inspecting the results for data set C (Health). Geocoder A was the only one that could include these and, as a result, was the only system that performed well on this data set. These data were known to include a high proportion of named places, such as nursing homes and caravan parks, which are not geocodable without the inclusion of alias tables. Conversely, the lack of support for G-NAF data did not appear to be a major problem affecting the quality of Geocoder A’s performance. Australia has a unique addressing system, which is why address-range geocoding systems [
<xref ref-type="bibr" rid="B32">32</xref>
] are used less frequently than parcel or address-point based systems [
<xref ref-type="bibr" rid="B28">28</xref>
]. The increase in quality of output data from systems which included alias tables may also be an artefact of the addressing system used in Australia.</p>
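<p>To illustrate why named places depend on alias tables, the following minimal sketch maps a place name to a street address before matching. The table entries, the function names and the normalization rule are all hypothetical; none of them is taken from the study data or from any of the evaluated geocoders.</p>

```python
# Hypothetical alias table: named places mapped to street addresses.
# The entries are invented for illustration and are not study data.
ALIAS_TABLE = {
    "SUNSET NURSING HOME": "12 EXAMPLE ST, PERTH WA 6000",
    "RIVERSIDE CARAVAN PARK": "340 SAMPLE RD, MANDURAH WA 6210",
}


def normalize(text):
    """Upper-case the text and collapse runs of whitespace."""
    return " ".join(text.upper().split())


def resolve_alias(raw_address, aliases=ALIAS_TABLE):
    """Return the aliased street address for a known named place.

    Inputs that are not in the alias table are returned in normalized
    form, unchanged, so they can go on to ordinary address matching.
    """
    key = normalize(raw_address)
    return aliases.get(key, key)


print(resolve_alias("  Sunset   nursing home "))
print(resolve_alias("5 High St"))
```

<p>A record such as “Sunset nursing home” carries no street number or street name, so a purely address-based matcher has nothing to match on until a lookup of this kind supplies one.</p>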
<p>Other differences between geocoding systems related to the amount and quality of metadata returned along with a result. Geocoder A returned a quantitative value describing an area within which the geocode is considered to fall. Geocoders B and C, on the other hand, returned a match score describing the similarity between the input address and the geographic feature to which it was matched. Both Geocoders D and E provided a greater degree of detail about the specific attributes of the input address that matched the reference feature, as well as details about the geographic level of the match and/or mismatch of these attributes. These types of details give a user a greater understanding of the match quality than a simple match score, but do not provide a quantitative spatial measure with which to understand how spatially accurate or inaccurate an output geocode could be.</p>
<p>As noted above, the most pronounced operational distinctions between geocoding systems were the setup time necessary to build a geocoding system and the amount of specialized skill required to maintain the system. Geocoder A was the most difficult to set up for the evaluation due to required programming. With in-depth documentation and the upgrade to modern data formats completed, this time may be reduced going forward, but in all cases it will remain a time-consuming task which requires specially-trained staff to be in-house experts. All other geocoding systems could be installed, set up, and run in a fraction of the time required to update Geocoder A to the latest version of the PSA and PSA+ data.</p>
<p>As the results demonstrate, the overall quality of input data had a pronounced impact on the quality of the output results. Data set C (Health), known to include a high proportion of difficult cases such as named places, resulted in the worst output geocode quality across systems. Similarly, the high quality data (Gold Standard) resulted in the highest quality matches. These results indicate that input data quality matters and that, wherever feasible, input data should be cleaned as close to the source of collection as possible.</p>
<p>To account for these and other errors in the input data, the geocoding algorithms, or the reference data used by the geocoding systems, manual geocoding may need to be performed to correct or otherwise assign records that could not be processed. The degree to which manual geocoding procedures are linked into the automated geocode process varied between the geocoding systems. Geocoders A, B and C included a post-processing step to automatically update the output files. These geocoding systems offer the ability for a user to review specific types of records, make corrections, and offer candidate matches. Geocoders A, B and C take roughly the same amount of time to process individual records and offer the key benefit that they work directly on the output data file and update an output geocode’s value once it is reprocessed, so that table joining between processed and post-processed data is not required.</p>
<p>A central question a reader should be asking at this point is: How should the findings presented here, or a similar evaluation performed by another organization or on a different set of geocoders, be used to decide which geocoding system should be the correct choice? The answer is unfortunately not straightforward. As discussed above, every organization is different and will value certain aspects or capabilities of geocoding systems more or less than another organization. Every organization will have different strengths (in-house programming skills, for example) or resources (access to reference data layers, for example) which will affect the cost-benefit equation used to rank geocoding choices.</p>
<p>One potential and simple method that could be used to determine the correct choice would be to borrow from suitability research [
<xref ref-type="bibr" rid="B60">60</xref>
]. First, determine which geocoding system criteria are important and which are not. This list may include each of the criteria we have described here, a subset thereof, or others that may be important to an organization but were not listed in the set presented here. Next, assign a relative weight of importance to each of these criteria so that some things are more important than others (i.e., nice-to-haves versus must-haves). Next, perform a capability analysis across each of the criteria for each geocoder and assign the appropriate binary (1/0) or scaled scores, depending on the data type determined for each criterion (i.e., nominal, ordinal, ratio, or interval data). These analyses could simply assess capabilities like those listed in Tables 
<xref ref-type="table" rid="T2">2</xref>
,
<xref ref-type="table" rid="T3">3</xref>
,
<xref ref-type="table" rid="T4">4</xref>
,
<xref ref-type="table" rid="T5">5</xref>
,
<xref ref-type="table" rid="T6">6</xref>
,
<xref ref-type="table" rid="T7">7</xref>
and
<xref ref-type="table" rid="T8">8</xref>
or they could include large-scale geocoding system performance tests as we have done here in order to determine a subset of the performance metrics listed in Table 
<xref ref-type="table" rid="T1">1</xref>
.</p>
<p>Once all geocoders are scored across all criteria, the most promising option should rise to the top. A central purpose of performing the current research to develop a methodology of assessing geocoding systems was to enable just this type of analysis for making geocoding system decisions at the WA DoH. However, the exact criteria and their weightings to be used in the WA DoH decision-making process are not presented here; instead, only the methodology that organizations could follow to perform similar tasks on their own is given.</p>
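<p>The weighting and scoring steps described above can be sketched as a normalized weighted sum. Everything in this example is hypothetical: the criteria names, the weights, the two systems and their scores are invented for illustration and do not correspond to the WA DoH criteria or to Geocoders A to E.</p>

```python
# Hypothetical criteria weights (higher means more important).
WEIGHTS = {
    "supports_alias_tables": 3.0,  # binary criterion (1/0)
    "supports_gnaf": 1.0,          # binary criterion (1/0)
    "ease_of_setup": 2.0,          # scaled criterion in [0, 1]
}

# Hypothetical capability scores for two made-up systems.
SCORES = {
    "System X": {"supports_alias_tables": 1, "supports_gnaf": 0, "ease_of_setup": 0.2},
    "System Y": {"supports_alias_tables": 0, "supports_gnaf": 1, "ease_of_setup": 0.9},
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted sum of criterion scores, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(all_scores, weights=WEIGHTS):
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in all_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in rank(SCORES):
    print(name, round(score, 3))
```

<p>With these invented weights, the system that supports alias tables ranks first despite its lower ease-of-setup score; a different organization, weighting setup effort more heavily, could reach the opposite ordering from the same capability scores.</p>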
</sec>
<sec>
<title>Evaluation framework limitations</title>
<p>Not every combination of geocoding test scenarios could be performed due to limitations in the flexibility of various geocoding systems. For example, the use of alias tables could not be turned off in Geocoder A; nor could G-NAF data be loaded. This means that results from Geocoder A could not be included in the analyses that determined the benefits of (a) local versus national reference data files, and (b) the use versus non-use of alias tables. Similarly, all but Geocoder B had limitations on the types of reference data layers that could be utilized.</p>
</sec>
</sec>
<sec sec-type="conclusions">
<title>Conclusion</title>
<p>The central goal of this paper was to present an objective methodology for comparing geocoding systems. The purpose of such a methodology is to assist in the decision-making process when evaluating the performance and utility of a range of geocoding systems. The particular evaluation context investigated here was a case study involving a typical geocoding use-case performed within a large government agency for which geocoding is a mission-critical task. This organizational case study and the current techniques employed within the organization for geocoding can, in many ways, be seen as representative of many large organizations within the public or private sector around the globe. Like others, the organization in this study has spent a considerable amount of time and effort developing a geocoding process that is integral to its core business. The geocoding system currently in place is tightly integrated into other core operational and workflow systems, has been highly tailored to the type of data it encounters, and has produced results of sufficient quality for a range of users.</p>
<p>Notwithstanding current arrangements, there are many reasons why decision-makers may wish to perform an analysis of other available geocoding platforms, in part to identify other alternatives that might work better, be cheaper, or offer an enhanced set of services. In particular, government systems continue to be enhanced, the cost of hardware continues to drop, and data processing operations within government agencies are continually reviewed for opportunities for modernization and streamlining to better serve the public at lower costs. Government departments and private industry continually re-evaluate practices to seek better ways of operating.</p>
<p>The purpose of the methodology developed here is to act as a tool for gathering data for use by decision-makers. The quantitative data generated by the framework presented here must be used in coordination with other strategic initiatives within an organization in order to make the most informed and rational decision, given the specific context and plan of an organization.</p>
</sec>
<sec>
<title>Endnotes</title>
<p>
<sup>a</sup>
<ext-link ext-link-type="uri" xlink:href="http://geocoder.us/">http://geocoder.us/</ext-link>
</p>
<p>
<sup>b</sup>
<ext-link ext-link-type="uri" xlink:href="http://www.census.gov/geo/www/tiger/">http://www.census.gov/geo/www/tiger/</ext-link>
</p>
<p>
<sup>c</sup>
<ext-link ext-link-type="uri" xlink:href="http://www.pagcgeo.org/">http://www.pagcgeo.org/</ext-link>
</p>
</sec>
<sec>
<title>Competing interests</title>
<p>The authors declare they have no competing interests.</p>
</sec>
<sec>
<title>Authors’ contributions</title>
<p>DWG conceived the study design, performed data analysis, and drafted the manuscript. MB contributed to study design, ran the experiments, and performed data analysis. NM contributed to study design and facilitated the acquisition of test software and reference data. GC and DR contributed to study design, facilitated the acquisition and preparation of study data, and enabled study execution. AF, JB and JS contributed to study design, data preparation, and analysis. All authors contributed to and reviewed the final manuscript.</p>
</sec>
</body>
<back>
<sec>
<title>Acknowledgements</title>
<p>This study was performed with support from the Cooperative Research Center for Spatial Information (CRC-SI) under contract number P4.43 and the WA DoH under contract number P4.43. Dr. Goldberg was supported in part by award number 5P30ES007048 from the US National Institute of Environmental Health Sciences, contract number N01-PC-35139 from the US National Cancer Institute, and by cooperative agreement number 1H13EH000793-01 from the US Centers for Disease Control and Prevention.</p>
<p>The authors wish to thank staff from the Western Australia Department of Health, Curtin University, CRC-SI, the New South Wales Department of Health, and Landgate for contributions which facilitated the execution of this work.</p>
</sec>
<ref-list>
<ref id="B1">
<mixed-citation publication-type="journal">
<name>
<surname>Zandbergen</surname>
<given-names>PA</given-names>
</name>
<article-title>Geocoding quality and implications for spatial analysis</article-title>
<source>Geogr Compass</source>
<year>2009</year>
<volume>3</volume>
<issue>2</issue>
<fpage>647</fpage>
<lpage>680</lpage>
<pub-id pub-id-type="doi">10.1111/j.1749-8198.2008.00205.x</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<mixed-citation publication-type="book">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<source>A geocoding best practices guide</source>
<year>2008</year>
<publisher-name>North American Association of Central Cancer Registries: Springfield, IL</publisher-name>
</mixed-citation>
</ref>
<ref id="B3">
<mixed-citation publication-type="journal">
<name>
<surname>Mazumdar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rushton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>BJ</given-names>
</name>
<name>
<surname>Zimmerman</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Donham</surname>
<given-names>KJ</given-names>
</name>
<article-title>Geocoding accuracy and the recovery of relationships between environmental exposures and health</article-title>
<source>Int J Health Geogr</source>
<year>2008</year>
<volume>7</volume>
<fpage>13</fpage>
<lpage>31</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-7-13</pub-id>
<pub-id pub-id-type="pmid">18387189</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<mixed-citation publication-type="journal">
<name>
<surname>McElroy</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Remington</surname>
<given-names>PL</given-names>
</name>
<name>
<surname>Trentham-Dietz</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Roberts</surname>
<given-names>SA</given-names>
</name>
<name>
<surname>Newcomb</surname>
<given-names>PA</given-names>
</name>
<article-title>Geocoding addresses from a large population based study: lessons learned</article-title>
<source>Epidemiology</source>
<year>2003</year>
<volume>14</volume>
<issue>4</issue>
<fpage>399</fpage>
<lpage>407</lpage>
<pub-id pub-id-type="pmid">12843762</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<mixed-citation publication-type="journal">
<name>
<surname>Oliver</surname>
<given-names>MN</given-names>
</name>
<name>
<surname>Matthews</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Siadaty</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Hauck</surname>
<given-names>FR</given-names>
</name>
<name>
<surname>Pickle</surname>
<given-names>LW</given-names>
</name>
<article-title>Geographic bias related to geocoding in epidemiologic studies</article-title>
<source>Int J Health Geogr</source>
<year>2005</year>
<volume>4</volume>
<fpage>29</fpage>
<lpage>38</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-4-29</pub-id>
<pub-id pub-id-type="pmid">16281976</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<mixed-citation publication-type="journal">
<name>
<surname>Rushton</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Armstrong</surname>
<given-names>MP</given-names>
</name>
<name>
<surname>Gittler</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Greene</surname>
<given-names>BR</given-names>
</name>
<name>
<surname>Pavlik</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>West</surname>
<given-names>MM</given-names>
</name>
<name>
<surname>Zimmerman</surname>
<given-names>DL</given-names>
</name>
<article-title>Geocoding in cancer research: a review</article-title>
<source>Am J Prev Med</source>
<year>2006</year>
<volume>30</volume>
<issue>2</issue>
<fpage>S16</fpage>
<lpage>S24</lpage>
<pub-id pub-id-type="doi">10.1016/j.amepre.2005.09.011</pub-id>
<pub-id pub-id-type="pmid">16458786</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<mixed-citation publication-type="journal">
<name>
<surname>Schootman</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Sterling</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Struthers</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Laboube</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Emo</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Higgs</surname>
<given-names>G</given-names>
</name>
<article-title>Positional accuracy and geographic bias of four methods of geocoding in epidemiologic research</article-title>
<source>Ann Epidemiol</source>
<year>2007</year>
<volume>17</volume>
<issue>6</issue>
<fpage>379</fpage>
<lpage>387</lpage>
</mixed-citation>
</ref>
<ref id="B8">
<mixed-citation publication-type="journal">
<name>
<surname>Skelly</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Black</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Hearnden</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Eyles</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Weinstein</surname>
<given-names>P</given-names>
</name>
<article-title>Disease surveillance in rural communities is compromised by address geocoding uncertainty: a case study of campylobacteriosis</article-title>
<source>Aust J Rural Health</source>
<year>2002</year>
<volume>10</volume>
<issue>2</issue>
<fpage>87</fpage>
<lpage>93</lpage>
<pub-id pub-id-type="pmid">12047502</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<mixed-citation publication-type="journal">
<name>
<surname>Zandbergen</surname>
<given-names>PA</given-names>
</name>
<article-title>Influence of geocoding quality on environmental exposure assessment of children living near high traffic roads</article-title>
<source>BMC Public Health</source>
<year>2007</year>
<volume>7</volume>
<fpage>37</fpage>
<lpage>50</lpage>
<pub-id pub-id-type="doi">10.1186/1471-2458-7-37</pub-id>
<pub-id pub-id-type="pmid">17367533</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<mixed-citation publication-type="journal">
<name>
<surname>Zhan</surname>
<given-names>FB</given-names>
</name>
<name>
<surname>Brender</surname>
<given-names>JD</given-names>
</name>
<name>
<surname>De Lima</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Suarez</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Langlois</surname>
<given-names>PH</given-names>
</name>
<article-title>Match rate and positional accuracy of two geocoding methods for epidemiologic research</article-title>
<source>Ann Epidemiol</source>
<year>2006</year>
<volume>16</volume>
<issue>11</issue>
<fpage>842</fpage>
<lpage>849</lpage>
<pub-id pub-id-type="doi">10.1016/j.annepidem.2006.08.001</pub-id>
<pub-id pub-id-type="pmid">17027286</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<mixed-citation publication-type="journal">
<name>
<surname>Gilboa</surname>
<given-names>SM</given-names>
</name>
<name>
<surname>Mendola</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Olshan</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Harness</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Loomis</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Langlois</surname>
<given-names>PH</given-names>
</name>
<name>
<surname>Savitz</surname>
<given-names>DA</given-names>
</name>
<name>
<surname>Herring</surname>
<given-names>AH</given-names>
</name>
<article-title>Comparison of residential geocoding methods in population-based study of air quality and birth defects</article-title>
<source>Environ Res</source>
<year>2006</year>
<volume>101</volume>
<issue>2</issue>
<fpage>256</fpage>
<lpage>262</lpage>
<pub-id pub-id-type="doi">10.1016/j.envres.2006.01.004</pub-id>
<pub-id pub-id-type="pmid">16483563</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<mixed-citation publication-type="journal">
<name>
<surname>Johnson</surname>
<given-names>SD</given-names>
</name>
<article-title>Address matching with stand-alone geocoding engines: part 2</article-title>
<source>Bus Geogr</source>
<year>1998</year>
<volume>6</volume>
<fpage>30</fpage>
<lpage>36</lpage>
</mixed-citation>
</ref>
<ref id="B13">
<mixed-citation publication-type="book">
<name>
<surname>Lixin</surname>
<given-names>Y</given-names>
</name>
<source>Development and evaluation of a framework for assessing the efficiency and accuracy of street address geocoding strategies</source>
<year>1996</year>
<publisher-name>Albany, NY: University at Albany, State University of New York</publisher-name>
</mixed-citation>
</ref>
<ref id="B14">
<mixed-citation publication-type="journal">
<name>
<surname>Lovasi</surname>
<given-names>GS</given-names>
</name>
<name>
<surname>Weiss</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Hoskins</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Whitsel</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Rice</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Erickson</surname>
<given-names>CF</given-names>
</name>
<name>
<surname>Psaty</surname>
<given-names>BM</given-names>
</name>
<article-title>Comparing a single-stage geocoding method to a multi-stage geocoding method: how much and where do they disagree?</article-title>
<source>Int J Health Geogr</source>
<year>2007</year>
<volume>6</volume>
<fpage>12</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-6-12</pub-id>
<pub-id pub-id-type="pmid">17367520</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<mixed-citation publication-type="book">
<name>
<surname>Swift</surname>
<given-names>JN</given-names>
</name>
<name>
<surname>Goldberg</surname>
<given-names>DW</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>JP</given-names>
</name>
<source>Geocoding best practices: review of eight commonly used geocoding systems</source>
<year>2008</year>
<publisher-name>Los Angeles, CA: University of Southern California GIS Research Laboratory</publisher-name>
</mixed-citation>
</ref>
<ref id="B16">
<mixed-citation publication-type="journal">
<name>
<surname>Cayo</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Talbot</surname>
<given-names>TO</given-names>
</name>
<article-title>Positional error in automated geocoding of residential addresses</article-title>
<source>Int J Health Geogr</source>
<year>2003</year>
<volume>2</volume>
<fpage>10</fpage>
<lpage>22</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-2-10</pub-id>
<pub-id pub-id-type="pmid">14687425</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<mixed-citation publication-type="journal">
<name>
<surname>Davis</surname>
<given-names>CA</given-names>
<suffix>Jr</suffix>
</name>
<name>
<surname>Fonseca</surname>
<given-names>FT</given-names>
</name>
<article-title>Assessing the certainty of locations produced by an address geocoding system</article-title>
<source>GeoInformatica</source>
<year>2007</year>
<volume>11</volume>
<issue>1</issue>
<fpage>103</fpage>
<lpage>129</lpage>
<pub-id pub-id-type="doi">10.1007/s10707-006-0015-7</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<mixed-citation publication-type="journal">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cockburn</surname>
<given-names>M</given-names>
</name>
<article-title>The effect of administrative boundaries and geocoding error on cancer rates in California</article-title>
<source>Spat Spatio-Temporal Epidemiol</source>
<year>2012</year>
<volume>3</volume>
<issue>1</issue>
<fpage>39</fpage>
<lpage>54</lpage>
<pub-id pub-id-type="doi">10.1016/j.sste.2012.02.005</pub-id>
<pub-id pub-id-type="pmid">22469490</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<mixed-citation publication-type="journal">
<name>
<surname>Karimi</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Durcik</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rasdorf</surname>
<given-names>W</given-names>
</name>
<article-title>Evaluation of uncertainties associated with geocoding techniques</article-title>
<source>J Comput Aided Civ Infrastruct Eng</source>
<year>2004</year>
<volume>19</volume>
<issue>3</issue>
<fpage>170</fpage>
<lpage>185</lpage>
<pub-id pub-id-type="doi">10.1111/j.1467-8667.2004.00346.x</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<mixed-citation publication-type="book">
<name>
<surname>Nicoara</surname>
<given-names>G</given-names>
</name>
<source>Exploring the geocoding process: a municipal case study using crime data</source>
<year>2005</year>
<publisher-name>Dallas, TX: University of Texas at Dallas</publisher-name>
</mixed-citation>
</ref>
<ref id="B21">
<mixed-citation publication-type="journal">
<name>
<surname>Whitsel</surname>
<given-names>EA</given-names>
</name>
<name>
<surname>Quibrera</surname>
<given-names>PM</given-names>
</name>
<name>
<surname>Smith</surname>
<given-names>RL</given-names>
</name>
<name>
<surname>Catellier</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Henley</surname>
<given-names>AC</given-names>
</name>
<name>
<surname>Heiss</surname>
<given-names>G</given-names>
</name>
<article-title>Accuracy of commercial geocoding: assessment and implications</article-title>
<source>Epidemiol Perspect Innov</source>
<year>2006</year>
<volume>3</volume>
<fpage>8</fpage>
<lpage>20</lpage>
<pub-id pub-id-type="doi">10.1186/1742-5573-3-8</pub-id>
<pub-id pub-id-type="pmid">16857050</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<mixed-citation publication-type="journal">
<name>
<surname>Zandbergen</surname>
<given-names>PA</given-names>
</name>
<article-title>A comparison of address point, parcel and street geocoding techniques</article-title>
<source>Comput Environ Urban Syst</source>
<year>2008</year>
<volume>32</volume>
<fpage>214</fpage>
<lpage>232</lpage>
<pub-id pub-id-type="doi">10.1016/j.compenvurbsys.2007.11.006</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<mixed-citation publication-type="book">
<name>
<surname>Abe</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Stinchcomb</surname>
<given-names>DG</given-names>
</name>
<person-group person-group-type="editor">Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MM, Zimmerman DL</person-group>
<article-title>Geocoding practices in cancer registries</article-title>
<source>Geocoding health data - the use of geographic codes in cancer prevention and control, research, and practice</source>
<year>2008</year>
<publisher-name>Boca Raton, FL: CRC Press</publisher-name>
<fpage>195</fpage>
<lpage>223</lpage>
</mixed-citation>
</ref>
<ref id="B24">
<mixed-citation publication-type="book">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<source>Geocoding best practices survey</source>
<year>2009</year>
<publisher-name>Los Angeles, CA: University of Southern California GIS Research Laboratory</publisher-name>
</mixed-citation>
</ref>
<ref id="B25">
<mixed-citation publication-type="book">
<name>
<surname>Sperling</surname>
<given-names>J</given-names>
</name>
<source>Enabling the national spatial data infrastructure: the need for a national geocoding service center</source>
<year>2002</year>
<publisher-name>U.S. Federal Geographic Data Committee</publisher-name>
</mixed-citation>
</ref>
<ref id="B26">
<mixed-citation publication-type="journal">
<name>
<surname>Ward</surname>
<given-names>MH</given-names>
</name>
<name>
<surname>Nuckols</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Giglierano</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Bonner</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Wolter</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Airola</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Mix</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Colt</surname>
<given-names>JS</given-names>
</name>
<name>
<surname>Hartge</surname>
<given-names>P</given-names>
</name>
<article-title>Positional accuracy of two methods of geocoding</article-title>
<source>Epidemiology</source>
<year>2005</year>
<volume>16</volume>
<issue>4</issue>
<fpage>542</fpage>
<lpage>547</lpage>
<pub-id pub-id-type="doi">10.1097/01.ede.0000165364.54925.f3</pub-id>
<pub-id pub-id-type="pmid">15951673</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<mixed-citation publication-type="other">
<name>
<surname>Johnson</surname>
<given-names>SD</given-names>
</name>
<article-title>Address matching with commercial spatial data: part 1</article-title>
<source>Bus Geogr</source>
<year>1998</year>
<fpage>24</fpage>
<lpage>32</lpage>
</mixed-citation>
</ref>
<ref id="B28">
<mixed-citation publication-type="book">
<name>
<surname>Christen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Churches</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Willmore</surname>
<given-names>A</given-names>
</name>
<source>A probabilistic geocoding system based on a national address file</source>
<year>2004</year>
<publisher-name>Proceedings of the Australasian Data Mining Conference: Cairns, AU</publisher-name>
</mixed-citation>
</ref>
<ref id="B29">
<mixed-citation publication-type="journal">
<name>
<surname>Zandbergen</surname>
<given-names>PA</given-names>
</name>
<article-title>Geocoding accuracy considerations in determining residency restrictions for sex offenders</article-title>
<source>Crim Justice Policy Rev</source>
<year>2009</year>
<volume>20</volume>
<issue>1</issue>
<fpage>62</fpage>
<lpage>90</lpage>
</mixed-citation>
</ref>
<ref id="B30">
<mixed-citation publication-type="journal">
<name>
<surname>Zandbergen</surname>
<given-names>PA</given-names>
</name>
<article-title>Influence of street reference data on geocoding quality</article-title>
<source>Geocarto Int</source>
<year>2010</year>
<volume>26</volume>
<issue>1</issue>
<fpage>35</fpage>
<lpage>47</lpage>
<comment>Corrected proof published online December 4, 2010</comment>
</mixed-citation>
</ref>
<ref id="B31">
<mixed-citation publication-type="journal">
<name>
<surname>Wu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Funk</surname>
<given-names>TH</given-names>
</name>
<name>
<surname>Lurmann</surname>
<given-names>FW</given-names>
</name>
<name>
<surname>Winer</surname>
<given-names>AM</given-names>
</name>
<article-title>Improving spatial accuracy of roadway networks and geocoded addresses</article-title>
<source>Trans GIS</source>
<year>2005</year>
<volume>9</volume>
<issue>4</issue>
<fpage>585</fpage>
<lpage>601</lpage>
<pub-id pub-id-type="doi">10.1111/j.1467-9671.2005.00236.x</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<mixed-citation publication-type="book">
<name>
<surname>Bakshi</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Knoblock</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Thakkar</surname>
<given-names>S</given-names>
</name>
<article-title>Exploiting online sources to accurately geocode addresses</article-title>
<source>Proceedings of the 12th annual ACM international workshop on geographic information systems</source>
<year>2004</year>
<publisher-name>Washington, DC: ACM Press</publisher-name>
<fpage>194</fpage>
<lpage>203</lpage>
</mixed-citation>
</ref>
<ref id="B33">
<mixed-citation publication-type="journal">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cockburn</surname>
<given-names>M</given-names>
</name>
<article-title>Improving geocode accuracy with candidate selection criteria</article-title>
<source>Trans GIS</source>
<year>2010</year>
<volume>14</volume>
<issue>s1</issue>
<fpage>149</fpage>
<lpage>176</lpage>
</mixed-citation>
</ref>
<ref id="B34">
<mixed-citation publication-type="book">
<name>
<surname>Boscoe</surname>
<given-names>FP</given-names>
</name>
<person-group person-group-type="editor">Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MM, Zimmerman DL</person-group>
<article-title>The science and art of geocoding</article-title>
<source>Geocoding health data - the use of geographic codes in cancer prevention and control, research, and practice</source>
<year>2008</year>
<publisher-name>Boca Raton, FL: CRC Press</publisher-name>
<fpage>95</fpage>
<lpage>109</lpage>
</mixed-citation>
</ref>
<ref id="B35">
<mixed-citation publication-type="book">
<name>
<surname>Christen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Churches</surname>
<given-names>T</given-names>
</name>
<source>A probabilistic deduplication, record linkage and geocoding system</source>
<year>2005</year>
<publisher-name>Proceedings of the Australian Research Council Health Data Mining Workshop: Canberra, AU</publisher-name>
</mixed-citation>
</ref>
<ref id="B36">
<mixed-citation publication-type="book">
<name>
<surname>Jaro</surname>
<given-names>M</given-names>
</name>
<article-title>Record linkage research and the calibration of record linkage algorithms</article-title>
<source>Statistical research division report series</source>
<year>1984</year>
<publisher-name>Washington, DC: United States Census Bureau</publisher-name>
</mixed-citation>
</ref>
<ref id="B37">
<mixed-citation publication-type="journal">
<name>
<surname>Jaro</surname>
<given-names>M</given-names>
</name>
<article-title>Advances in record-linkage methodology as applied to matching the 1985 Census of Tampa, Florida</article-title>
<source>J Am Stat Assoc</source>
<year>1989</year>
<volume>84</volume>
<issue>406</issue>
<fpage>414</fpage>
<lpage>420</lpage>
</mixed-citation>
</ref>
<ref id="B38">
<mixed-citation publication-type="journal">
<name>
<surname>Yang</surname>
<given-names>D-H</given-names>
</name>
<name>
<surname>Bilaver</surname>
<given-names>LM</given-names>
</name>
<name>
<surname>Hayes</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Goerge</surname>
<given-names>R</given-names>
</name>
<article-title>Improving geocoding practices: evaluation of geocoding tools</article-title>
<source>J Med Syst</source>
<year>2004</year>
<volume>28</volume>
<issue>4</issue>
<fpage>361</fpage>
<lpage>370</lpage>
<pub-id pub-id-type="pmid">15366241</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<mixed-citation publication-type="journal">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>JP</given-names>
</name>
<name>
<surname>Knoblock</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ritz</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Cockburn</surname>
<given-names>M</given-names>
</name>
<article-title>An effective and efficient approach for manually improving geocoded data</article-title>
<source>Int J Health Geogr</source>
<year>2008</year>
<volume>7</volume>
<fpage>60</fpage>
<lpage>80</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-7-60</pub-id>
<pub-id pub-id-type="pmid">19032791</pub-id>
</mixed-citation>
</ref>
<ref id="B40">
<mixed-citation publication-type="journal">
<name>
<surname>Bonner</surname>
<given-names>MR</given-names>
</name>
<name>
<surname>Han</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Nie</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Rogerson</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Vena</surname>
<given-names>JE</given-names>
</name>
<name>
<surname>Freudenheim</surname>
<given-names>JL</given-names>
</name>
<article-title>Positional accuracy of geocoded addresses in epidemiologic research</article-title>
<source>Epidemiology</source>
<year>2003</year>
<volume>14</volume>
<issue>4</issue>
<fpage>408</fpage>
<lpage>411</lpage>
<pub-id pub-id-type="pmid">12843763</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<mixed-citation publication-type="journal">
<name>
<surname>Hurley</surname>
<given-names>SE</given-names>
</name>
<name>
<surname>Saunders</surname>
<given-names>TM</given-names>
</name>
<name>
<surname>Nivas</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Hertz</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Reynolds</surname>
<given-names>P</given-names>
</name>
<article-title>Post office box addresses: a challenge for geographic information system-based studies</article-title>
<source>Epidemiology</source>
<year>2003</year>
<volume>14</volume>
<fpage>386</fpage>
<lpage>391</lpage>
<pub-id pub-id-type="pmid">12843760</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<mixed-citation publication-type="journal">
<name>
<surname>Kravets</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Hadden</surname>
<given-names>WC</given-names>
</name>
<article-title>The accuracy of address coding and the effects of coding errors</article-title>
<source>Health Place</source>
<year>2007</year>
<volume>13</volume>
<issue>1</issue>
<fpage>293</fpage>
<lpage>298</lpage>
<pub-id pub-id-type="doi">10.1016/j.healthplace.2005.08.006</pub-id>
<pub-id pub-id-type="pmid">16162420</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<mixed-citation publication-type="journal">
<name>
<surname>Wey</surname>
<given-names>CL</given-names>
</name>
<name>
<surname>Griesse</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kightlinger</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Wimberly</surname>
<given-names>MC</given-names>
</name>
<article-title>Geographic variability in geocoding success for West Nile virus cases in South Dakota</article-title>
<source>Health Place</source>
<year>2009</year>
<volume>15</volume>
<issue>4</issue>
<fpage>1108</fpage>
<lpage>1114</lpage>
<pub-id pub-id-type="doi">10.1016/j.healthplace.2009.06.001</pub-id>
<pub-id pub-id-type="pmid">19577505</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<mixed-citation publication-type="journal">
<name>
<surname>Krieger</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Soobader</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Subramanian</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>R</given-names>
</name>
<article-title>Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: does the choice of area-based measure and geographic level matter?</article-title>
<source>Am J Epidemiol</source>
<year>2002</year>
<volume>156</volume>
<issue>5</issue>
<fpage>471</fpage>
<lpage>482</lpage>
<pub-id pub-id-type="doi">10.1093/aje/kwf068</pub-id>
<pub-id pub-id-type="pmid">12196317</pub-id>
</mixed-citation>
</ref>
<ref id="B45">
<mixed-citation publication-type="journal">
<name>
<surname>Krieger</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>PD</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Soobader</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Subramanian</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Carson</surname>
<given-names>R</given-names>
</name>
<article-title>ZIP code caveat: bias due to spatiotemporal mismatches between ZIP codes and US census-defined areas: the public health disparities geocoding project</article-title>
<source>Am J Public Health</source>
<year>2002</year>
<volume>92</volume>
<issue>7</issue>
<fpage>1100</fpage>
<lpage>1102</lpage>
<pub-id pub-id-type="doi">10.2105/AJPH.92.7.1100</pub-id>
<pub-id pub-id-type="pmid">12084688</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<mixed-citation publication-type="journal">
<name>
<surname>Krieger</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Waterman</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Lemieux</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Zierler</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Hogan</surname>
<given-names>JW</given-names>
</name>
<article-title>On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research</article-title>
<source>Am J Public Health</source>
<year>2001</year>
<volume>91</volume>
<issue>7</issue>
<fpage>1114</fpage>
<lpage>1116</lpage>
<pub-id pub-id-type="pmid">11441740</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<mixed-citation publication-type="journal">
<name>
<surname>Zimmerman</surname>
<given-names>DL</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Mazumdar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rushton</surname>
<given-names>G</given-names>
</name>
<article-title>Modeling the probability distribution of positional errors incurred by residential address geocoding</article-title>
<source>Int J Health Geogr</source>
<year>2007</year>
<volume>6</volume>
<fpage>1</fpage>
<lpage>16</lpage>
<pub-id pub-id-type="doi">10.1186/1476-072X-6-1</pub-id>
<pub-id pub-id-type="pmid">17214903</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<mixed-citation publication-type="journal">
<name>
<surname>Duncan</surname>
<given-names>DT</given-names>
</name>
<name>
<surname>Castro</surname>
<given-names>MC</given-names>
</name>
<name>
<surname>Blossom</surname>
<given-names>JC</given-names>
</name>
<name>
<surname>Bennett</surname>
<given-names>GG</given-names>
</name>
<name>
<surname>Gortmaker</surname>
<given-names>SL</given-names>
</name>
<article-title>Evaluation of the positional difference between two common geocoding methods</article-title>
<source>Geospat Health</source>
<year>2011</year>
<volume>5</volume>
<issue>2</issue>
<fpage>265</fpage>
<lpage>273</lpage>
<pub-id pub-id-type="pmid">21590677</pub-id>
</mixed-citation>
</ref>
<ref id="B49">
<mixed-citation publication-type="other">
<source>Maps API reference - Google maps API - Google code</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://code.google.com/apis/maps/documentation/reference.html">http://code.google.com/apis/maps/documentation/reference.html</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B50">
<mixed-citation publication-type="other">
<source>Yahoo! maps Web services - geocoding API</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://developer.yahoo.com/maps/rest/V1/geocode.html">http://developer.yahoo.com/maps/rest/V1/geocode.html</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B51">
<mixed-citation publication-type="other">
<source>ArcGIS online geocoding service</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://geocode.arcgis.com/arcgis/index.html">http://geocode.arcgis.com/arcgis/index.html</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B52">
<mixed-citation publication-type="other">
<source>North American association of central cancer registries</source>
<comment>
<ext-link ext-link-type="uri" xlink:href="http://www.naaccr.org">http://www.naaccr.org</ext-link>
</comment>
</mixed-citation>
</ref>
<ref id="B53">
<mixed-citation publication-type="journal">
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<article-title>A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis</article-title>
<source>Ann Assoc Am Geogr</source>
<year>2010</year>
<volume>100</volume>
<issue>3</issue>
<fpage>535</fpage>
<lpage>557</lpage>
<pub-id pub-id-type="doi">10.1080/00045601003791243</pub-id>
</mixed-citation>
</ref>
<ref id="B54">
<mixed-citation publication-type="book">
<name>
<surname>Moncrieff</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Venkatesh</surname>
<given-names>S</given-names>
</name>
<name>
<surname>West</surname>
<given-names>G</given-names>
</name>
<source>A framework for the design of privacy preserving pervasive healthcare</source>
<year>2009</year>
<publisher-name>In: IEEE International Conference on Multimedia and Expo. IEEE</publisher-name>
<fpage>1696</fpage>
<lpage>1699</lpage>
</mixed-citation>
</ref>
<ref id="B55">
<mixed-citation publication-type="book">
<name>
<surname>Beyer</surname>
<given-names>KMM</given-names>
</name>
<name>
<surname>Schultz</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Rushton</surname>
<given-names>G</given-names>
</name>
<person-group person-group-type="editor">Rushton G, Armstrong MP, Gittler J, Greene BR, Pavlik CE, West MM, Zimmerman DL</person-group>
<article-title>Using ZIP codes as geocodes in cancer research</article-title>
<source>Geocoding health data - the use of geographic codes in cancer prevention and control, research, and practice</source>
<year>2008</year>
<publisher-name>Boca Raton, FL: CRC Press</publisher-name>
<fpage>37</fpage>
<lpage>68</lpage>
</mixed-citation>
</ref>
<ref id="B56">
<mixed-citation publication-type="book">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Cockburn</surname>
<given-names>M</given-names>
</name>
<article-title>Toward quantitative geocode accuracy metrics</article-title>
<source>Proceedings of the ninth international symposium on spatial accuracy assessment in natural resources and environmental sciences</source>
<year>2010</year>
<publisher-name>Leicester, UK</publisher-name>
<fpage>329</fpage>
<lpage>332</lpage>
</mixed-citation>
</ref>
<ref id="B57">
<mixed-citation publication-type="book">
<collab>Western Australian Land Information Authority</collab>
<source>Property street address</source>
<year>2012</year>
<publisher-name>Perth, WA: Western Australian Land Information Authority</publisher-name>
</mixed-citation>
</ref>
<ref id="B58">
<mixed-citation publication-type="book">
<collab>Public Sector Mapping Agency Australia Limited</collab>
<source>Geocoded national address file</source>
<year>2012</year>
<publisher-name>Canberra, ACT: PSMA Australia</publisher-name>
</mixed-citation>
</ref>
<ref id="B59">
<mixed-citation publication-type="journal">
<name>
<surname>Goldberg</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Wilson</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Knoblock</surname>
<given-names>C</given-names>
</name>
<article-title>From text to geographic coordinates: the current state of geocoding</article-title>
<source>Urisa J</source>
<year>2007</year>
<volume>19</volume>
<issue>1</issue>
<fpage>33</fpage>
<lpage>47</lpage>
</mixed-citation>
</ref>
<ref id="B60">
<mixed-citation publication-type="journal">
<name>
<surname>Malczewski</surname>
<given-names>J</given-names>
</name>
<article-title>GIS-based land-use suitability analysis: a critical overview</article-title>
<source>Prog Plann</source>
<year>2004</year>
<volume>62</volume>
<issue>1</issue>
<fpage>3</fpage>
<lpage>65</lpage>
<pub-id pub-id-type="doi">10.1016/j.progress.2003.09.002</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000339 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000339 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:3834528
   |texte=   An evaluation framework for comparing geocoding systems
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:24207169" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 


This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024