AIDS in Sub-Saharan Africa (exploration server)

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping

Internal identifier: 002287 (Pmc/Corpus); previous: 002286; next: 002288

Authors: Maria T. Patterson; Robert L. Grossman

Source :

RBID : PMC:5647508

Abstract

We introduce a method called neighbor-based bootstrapping (NB2) that can be used to quantify the geospatial variation of a variable. We applied this method to an analysis of the incidence rates of disease from electronic medical record data (International Classification of Diseases, Ninth Revision codes) for ∼100 million individuals in the United States over a period of 8 years. We considered the incidence rate of disease in each county and its geospatially contiguous neighbors and rank ordered diseases in terms of their degree of geospatial variation as quantified by the NB2 method. We show that this method yields results in good agreement with established methods for detecting spatial autocorrelation (Moran's I method and kriging). Moreover, the NB2 method can be tuned to identify both large area and small area geospatial variations. This method also applies more generally in any parameter space that can be partitioned to consist of regions and their neighbors.


Url:
DOI: 10.1089/big.2017.0028
PubMed: 28933946
PubMed Central: 5647508

Links to Exploration step

PMC:5647508

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping</title>
<author>
<name sortKey="Patterson, Maria T" sort="Patterson, Maria T" uniqKey="Patterson M" first="Maria T." last="Patterson">Maria T. Patterson</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grossman, Robert L" sort="Grossman, Robert L" uniqKey="Grossman R" first="Robert L." last="Grossman">Robert L. Grossman</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28933946</idno>
<idno type="pmc">5647508</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647508</idno>
<idno type="RBID">PMC:5647508</idno>
<idno type="doi">10.1089/big.2017.0028</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">002287</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">002287</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping</title>
<author>
<name sortKey="Patterson, Maria T" sort="Patterson, Maria T" uniqKey="Patterson M" first="Maria T." last="Patterson">Maria T. Patterson</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Grossman, Robert L" sort="Grossman, Robert L" uniqKey="Grossman R" first="Robert L." last="Grossman">Robert L. Grossman</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff4"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Big Data</title>
<idno type="ISSN">2167-6461</idno>
<idno type="eISSN">2167-647X</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>We introduce a method called neighbor-based bootstrapping (NB2) that can be used to quantify the geospatial variation of a variable. We applied this method to an analysis of the incidence rates of disease from electronic medical record data (International Classification of Diseases, Ninth Revision codes) for ∼100 million individuals in the United States over a period of 8 years. We considered the incidence rate of disease in each county and its geospatially contiguous neighbors and rank ordered diseases in terms of their degree of geospatial variation as quantified by the NB2 method. We show that this method yields results in good agreement with established methods for detecting spatial autocorrelation (Moran's
<italic>I</italic>
method and kriging). Moreover, the NB2 method can be tuned to identify both large area and small area geospatial variations. This method also applies more generally in any parameter space that can be partitioned to consist of regions and their neighbors.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Friedman, Dj" uniqKey="Friedman D">DJ Friedman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Murdoch, T" uniqKey="Murdoch T">T Murdoch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brownstein, Js" uniqKey="Brownstein J">JS Brownstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Salathe, M" uniqKey="Salathe M">M Salathe</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Generous, N" uniqKey="Generous N">N Generous</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elliott, P" uniqKey="Elliott P">P Elliott</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rushton, G" uniqKey="Rushton G">G Rushton</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Beale, L" uniqKey="Beale L">L Beale</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cromley, Ek" uniqKey="Cromley E">EK Cromley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Noble, D" uniqKey="Noble D">D Noble</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Becker, Km" uniqKey="Becker K">KM Becker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tiefelsdorf, M" uniqKey="Tiefelsdorf M">M. Tiefelsdorf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moran, Pa" uniqKey="Moran P">PA Moran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ripley, Bd" uniqKey="Ripley B">BD Ripley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kitron, U" uniqKey="Kitron U">U Kitron</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hay, S" uniqKey="Hay S">S Hay</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moonan, Pk" uniqKey="Moonan P">PK Moonan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sasaki, S" uniqKey="Sasaki S">S Sasaki</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kamadjeu, R" uniqKey="Kamadjeu R">R Kamadjeu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nuckols, Jr" uniqKey="Nuckols J">JR Nuckols</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weis, Bk" uniqKey="Weis B">BK Weis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Huang, Y L" uniqKey="Huang Y">Y-L Huang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jarup, L" uniqKey="Jarup L">L Jarup</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Graves, Ba" uniqKey="Graves B">BA Graves</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Dean, Hd" uniqKey="Dean H">HD Dean</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Harrison, Km" uniqKey="Harrison K">KM Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gatrell, Ac" uniqKey="Gatrell A">AC Gatrell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luther, Sl" uniqKey="Luther S">SL Luther</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Geraghty, Em" uniqKey="Geraghty E">EM Geraghty</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Comer, Kf" uniqKey="Comer K">KF Comer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rodriguez, Ra" uniqKey="Rodriguez R">RA Rodriguez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rzhetsky, A" uniqKey="Rzhetsky A">A Rzhetsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Klein, Rj" uniqKey="Klein R">RJ Klein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Day, Jc" uniqKey="Day J">JC Day</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Waller, La" uniqKey="Waller L">LA Waller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bivand, Rs" uniqKey="Bivand R">RS Bivand</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benedict, K" uniqKey="Benedict K">K Benedict</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mujib, M" uniqKey="Mujib M">M Mujib</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grossman, Rl" uniqKey="Grossman R">RL Grossman</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Big Data</journal-id>
<journal-id journal-id-type="iso-abbrev">Big Data</journal-id>
<journal-id journal-id-type="publisher-id">big</journal-id>
<journal-title-group>
<journal-title>Big Data</journal-title>
</journal-title-group>
<issn pub-type="ppub">2167-6461</issn>
<issn pub-type="epub">2167-647X</issn>
<publisher>
<publisher-name>Mary Ann Liebert, Inc.</publisher-name>
<publisher-loc>140 Huguenot Street, 3rd Floor, New Rochelle, NY 10801, USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28933946</article-id>
<article-id pub-id-type="pmc">5647508</article-id>
<article-id pub-id-type="publisher-id">10.1089/big.2017.0028</article-id>
<article-id pub-id-type="doi">10.1089/big.2017.0028</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Patterson</surname>
<given-names>Maria T.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1,</sup>
</xref>
<xref ref-type="author-notes" rid="fn1">
<sup>*</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Grossman</surname>
<given-names>Robert L.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1,</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2,</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3,</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4,</sup>
</xref>
<xref ref-type="corresp" rid="corr1">
<sup></sup>
</xref>
</contrib>
<aff id="aff1">
<label>
<sup>1</sup>
</label>
Center for Data Intensive Science,
<institution>University of Chicago</institution>
, Chicago, Illinois.</aff>
<aff id="aff2">
<label>
<sup>2</sup>
</label>
Computation Institute,
<institution>University of Chicago</institution>
, Chicago, Illinois.</aff>
<aff id="aff3">
<label>
<sup>3</sup>
</label>
Section of Computational Biomedicine and Biomedical Data Science, Department of Medicine,
<institution>University of Chicago</institution>
, Chicago, Illinois.</aff>
<aff id="aff4">
<label>
<sup>4</sup>
</label>
Institute for Genomics and Systems Biology,
<institution>University of Chicago</institution>
, Chicago, Illinois.</aff>
</contrib-group>
<author-notes>
<fn id="fn1" fn-type="other">
<label>
<sup>*</sup>
</label>
<p>Current affiliation: Department of Astronomy, University of Washington, Seattle, Washington.</p>
</fn>
<corresp id="corr1">
<label>
<sup></sup>
</label>
Address correspondence to:
<italic>Robert L. Grossman, Center for Data Intensive Science, University of Chicago, 900 East 57th Street, KCBD 10142 Chicago, IL 60637,</italic>
E-mail:
<email xlink:href="mailto:robert.grossman@uchicago.edu">robert.grossman@uchicago.edu</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>01</day>
<month>9</month>
<year>2017</year>
<pmc-comment>string-date: September 2017</pmc-comment>
</pub-date>
<pub-date pub-type="epub">
<day>01</day>
<month>9</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>01</day>
<month>9</month>
<year>2017</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>5</volume>
<issue>3</issue>
<fpage>213</fpage>
<lpage>224</lpage>
<permissions>
<copyright-statement>© Maria T. Patterson and Robert L. Grossman 2017; Published by Mary Ann Liebert, Inc.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="open-access">
<license-p> This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="big.2017.0028.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>We introduce a method called neighbor-based bootstrapping (NB2) that can be used to quantify the geospatial variation of a variable. We applied this method to an analysis of the incidence rates of disease from electronic medical record data (International Classification of Diseases, Ninth Revision codes) for ∼100 million individuals in the United States over a period of 8 years. We considered the incidence rate of disease in each county and its geospatially contiguous neighbors and rank ordered diseases in terms of their degree of geospatial variation as quantified by the NB2 method. We show that this method yields results in good agreement with established methods for detecting spatial autocorrelation (Moran's
<italic>I</italic>
method and kriging). Moreover, the NB2 method can be tuned to identify both large area and small area geospatial variations. This method also applies more generally in any parameter space that can be partitioned to consist of regions and their neighbors.</p>
</abstract>
<kwd-group kwd-group-type="author">
<title>
<bold>Keywords:</bold>
</title>
<kwd>geospatial variation of disease incidence</kwd>
<kwd>geospatial correlation</kwd>
<kwd>electronic medical records</kwd>
</kwd-group>
<counts>
<fig-count count="5"></fig-count>
<table-count count="3"></table-count>
<ref-count count="39"></ref-count>
<page-count count="12"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s001">
<title>Introduction</title>
<p>As the number of sources and the volume of electronic medical records (EMR) and electronic health records increase, there is a growing ability to aggregate these data and extract information about population and public health.
<sup>
<xref rid="B1" ref-type="bibr">1</xref>
,
<xref rid="B2" ref-type="bibr">2</xref>
</sup>
Over the past several years, new sources of digital health data, such as web searches, social media, mobile phones, and personal health sensors, have increased the number of sources and data volumes even more.
<sup>
<xref rid="B3" ref-type="bibr">3–5</xref>
</sup>
Many of these data can be geocoded with location information so that techniques from spatial epidemiology can be used to explore geospatial variation in disease, health outcomes, and population health using disease mapping, disease cluster analysis, and related techniques.
<sup>
<xref rid="B6" ref-type="bibr">6–9</xref>
</sup>
With this much geocoded digital health data, there is a need for simple tools and algorithms that can be used by researchers across disciplines for identifying the presence of spatial autocorrelation in disease incidence data, especially in large datasets.
<sup>
<xref rid="B10" ref-type="bibr">10</xref>
</sup>
</p>
<p>A starting point for evaluating the presence of patterns in disease or other geocoded data is determining whether the data are spatially autocorrelated, that is, whether the disease rates or values of interest are similar in nearby areas and fall off with distance, which could indicate the presence of core areas of disease risk.
<sup>
<xref rid="B11" ref-type="bibr">11</xref>
,
<xref rid="B12" ref-type="bibr">12</xref>
</sup>
</p>
<p>We introduce a Monte Carlo-based algorithm, which we call neighbor-based bootstrapping (NB2), that can be used to quantify geospatial autocorrelation. We apply this algorithm to ∼100 million geocoded EMR and rank order 548 diseases, as determined by International Classification of Diseases, Ninth Revision (ICD-9) codes, from those with the strongest geospatial autocorrelation to those with the weakest. We compare this method's results to Moran's
<italic>I</italic>
statistic
<sup>
<xref rid="B13" ref-type="bibr">13</xref>
</sup>
and to kriging,
<sup>
<xref rid="B14" ref-type="bibr">14</xref>
</sup>
<sup>(p.44)</sup>
two other techniques that have been used to quantify geospatial autocorrelation. The spatial size scale of disease patterns may range widely from small, localized affected regions to larger affected areas, depending on the nature of the underlying factors. We have developed two versions of the NB2 ranking, one favoring patterns of tight clusters and the other favoring broader less peaked patterns. In the
<xref ref-type="supplementary-material" rid="SD1">Supplementary Data</xref>
section, we provide the results of these two versions of the NB2 ranking by category of disease and detail the spatial patterning of highly ranked ICD-9 codes (Supplementary Data are available online at
<uri xlink:type="simple" xlink:href="http://www.liebertpub.com/big">www.liebertpub.com/big</uri>
).</p>
<p>Applying geospatial analysis and visualization techniques to geocoded health data has long been understood to be important for identifying risk factors from the physical environment and for providing insights into the transmission of infectious and vector-borne diseases.
<sup>
<xref rid="B15" ref-type="bibr">15–21</xref>
</sup>
For example, spatial analysis of health data can be used to identify and manage risk associated with proximity to potentially harmful environmental exposures, such as chemical toxins or air pollutants.
<sup>
<xref rid="B18" ref-type="bibr">18</xref>
,
<xref rid="B22" ref-type="bibr">22</xref>
,
<xref rid="B23" ref-type="bibr">23</xref>
</sup>
More generally, these techniques are also important for understanding a broader range of risk factors, including risk factors from the demographic, economic, social, cultural, regulatory, or legal environments.
<sup>
<xref rid="B24" ref-type="bibr">24–32</xref>
</sup>
</p>
</sec>
<sec sec-type="materials|methods" id="s002">
<title>Materials and Methods</title>
<sec id="s003">
<title>Data</title>
<p>The dataset consists of EMR data from the Truven Health MarketScan Commercial Claims and Encounters Database, which includes approved inpatient and outpatient insurance claim information for a total of ∼100 million unique and de-identified individuals across the United States for the time period from 2003 to 2010. The records include 1.3 billion diagnostic ICD-9 subdivided codes (12.89 unique codes per person), geotagged by county Federal Information Processing Standards code. Here, we restrict our analysis to the ∼800 non-subdivided ICD-9 codes from 001 to 799, which excludes injuries, poisonings, and accidents. We refer to ICD-9 codes by the three-digit integer group (e.g., “005: Other bacterial poisoning” includes “005.0 Staphylococcal food poisoning”).</p>
<p>We also restrict our analysis to the 3109 counties in the continental United States and to the ICD-9 codes that have data for two-thirds or more of the counties. This leaves 548 ICD-9 codes.</p>
<p>For each of these 548 codes, we adjust for age and gender by using standard populations
<sup>
<xref rid="B33" ref-type="bibr">33</xref>
</sup>
as follows. We determine crude incidence rates for the standard 19 groups of age populations for each gender by taking raw counts for each group and dividing by the population at risk, which in this case we take to be the total number of records for each county for each age/gender group converted to 100,000 person-year units:
<disp-formula>
<tex-math id="eq1" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} Y_ { crude } ^ { age , gender } = { \frac { { \kern 1pt } { \rm { cases } } \; { \rm { of } } \; { \rm { ICD-9 } } { \kern 1pt } } { { \kern 1pt } { \rm { total } } \; { \rm { cases } } { \kern 1pt } } } \times { \frac { 100000 { \kern 1pt } { \rm { persons } } { \kern 1pt } } { 8 { \kern 1pt } { \rm { years } } { \kern 1pt } } } . \tag { 1 } \end{align*} \end{document}</tex-math>
</disp-formula>
</p>
<p>In Equation (1), for each of the 548 truncated ICD-9 codes, by
<italic>cases of ICD-9</italic>
, we mean the count of the number of occurrences in the data of that ICD-9 code with the specified age and gender.</p>
<p>The age and gender adjusted rate is calculated by multiplying the crude rate for each group by the appropriate weight using the Census 2010 standard population and summing the products
<sup>
<xref rid="B34" ref-type="bibr">34</xref>
</sup>
:
<disp-formula>
<tex-math id="eq2" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} { Y_ { adjusted } } = { \Sigma _ { age , gender } } Y_ { crude } ^ { age , gender } \times { \frac { { \kern 1pt } { \rm { group } } \; { \rm { population } } { \kern 1pt } } { { \kern 1pt } { \rm { total } } \; { \rm { population } } { \kern 1pt } } } \tag { 2 } \end{align*} \end{document}</tex-math>
</disp-formula>
</p>
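As an illustration of Equations (1) and (2), the following minimal Python sketch computes a crude rate for each age/gender group and then combines the groups using standard-population weights. The function names, group labels, and all counts are hypothetical values chosen for the example; they are not taken from the MarketScan data.

```python
def crude_rate(cases, total_records, years=8, per=100_000):
    """Eq. (1): occurrences of one ICD-9 code per 100,000 person-years in one group."""
    return cases * per / (total_records * years)


def adjusted_rate(crude_rates, standard_population):
    """Eq. (2): crude rates weighted by each group's share of the standard population."""
    total = sum(standard_population.values())
    return sum(
        rate * standard_population[group] / total
        for group, rate in crude_rates.items()
    )


# Hypothetical county with two age/gender groups (illustrative numbers only).
crude = {
    ("20-24", "F"): crude_rate(cases=8, total_records=1000),
    ("20-24", "M"): crude_rate(cases=16, total_records=1000),
}
standard = {("20-24", "F"): 3000, ("20-24", "M"): 1000}
county_rate = adjusted_rate(crude, standard)
```

In this toy case the two crude rates are 100 and 200 per 100,000 person-years, and the weights 0.75 and 0.25 give an adjusted rate of 125.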
</sec>
<sec id="s004">
<title>NB2 method</title>
<p>The NB2 method uses resampling to evaluate whether the value of a variable in a region (in this example, the incidence rate of a disease in a county) can be accurately estimated from its values in neighboring regions. The first step in this method is to define regions and neighbors of regions. Here we define regions as counties and neighboring counties as counties that are geospatially contiguous to the county's polygon border, including vertices (Queen style), though it is important to note that there are many options to consider when defining neighbor relationships (contiguity, distance, spatial weights) that have varying effects on results.
<sup>
<xref rid="B35" ref-type="bibr">35</xref>
,
<xref rid="B36" ref-type="bibr">36</xref>
</sup>
In this article, we focus on geospatially defined neighbors, but an advantage of this method is that it is applicable without change to neighbors in any space of features, not just neighbors in two- or three-dimensional physical space.</p>
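To make the Queen-style definition concrete, here is a small Python sketch on a hypothetical rectangular grid of cells, where each cell stands in for a county. Two cells are treated as neighbors if they share an edge or a corner. Real county polygons would require a GIS library to test for shared borders or vertices; the grid layout and the function name queen_neighbors are assumptions made purely for illustration.

```python
def queen_neighbors(rows, cols):
    """Map each (row, col) cell to the cells that share an edge or a vertex with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue  # a cell is not its own neighbor
                    nr, nc = r + dr, c + dc
                    # Keep only offsets that stay inside the grid.
                    if nr in range(rows) and nc in range(cols):
                        adjacent.append((nr, nc))
            neighbors[(r, c)] = adjacent
    return neighbors


grid = queen_neighbors(3, 3)
```

On a 3 by 3 grid the center cell has eight Queen neighbors, an edge cell has five, and a corner cell has three. A Rook-style definition would drop the diagonal offsets.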
<p>We compute a bootstrapped estimate as follows. Fix a county
<italic>Y</italic>
. For each ICD-9 code, we sample with replacement a set of neighboring counties and a set of random counties and compare the normalized disease incidence [from Eq. (2)].</p>
<p>More explicitly, fix a county
<italic>Y</italic>
and assume that it has
<italic>n
<sup>Y</sup>
</italic>
neighbors. We estimate the log incidence rate
<italic>Z
<sub>neighbor</sub>
</italic>
for county
<italic>Y</italic>
as the average log incidence rate of a list of
<italic>n
<sup>Y</sup>
</italic>
randomly chosen (with replacement) neighboring counties. We also estimate for each county the log incidence rate
<italic>Z
<sub>random</sub>
</italic>
for county
<italic>Y</italic>
as the average log incidence rate of
<italic>n
<sup>Y</sup>
</italic>
randomly chosen (with replacement) counties from the full set of all counties. These counties may or may not be neighbors.</p>
<p>We compare the two estimates (neighbors vs. random) to the known log incidence for each of the drawn counties in two separate ways (see Algorithms 1 and 2).</p>
<p>In the first implementation, we take the difference between the actual value and each of the two estimates,
<italic>Z
<sub>neighbor</sub>
</italic>
and
<italic>Z
<sub>random</sub>
</italic>
. We then use a paired Student's
<italic>t</italic>
-test to evaluate whether neighbor-based predictions are a significant improvement over random prediction. For ICD-9 codes with significant underlying spatial patterns, we expect that the
<italic>Z
<sub>neighbor</sub>
</italic>
estimates will be significantly closer to actual than the
<italic>Z
<sub>random</sub>
</italic>
estimates.</p>
<p>We repeat this process to obtain 1000 estimates of the neighbor-based versus random differences, and for each of these compute the paired
<italic>t</italic>
-test. We then take the median
<italic>t</italic>
-test value from these 1000 estimates. This gives us one
<italic>t</italic>
-test statistic value per ICD-9 code, describing how closely related incidence rates of that ICD-9 are in neighboring counties as compared with a random selection of counties.</p>
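The first implementation can be sketched in Python as follows, using only the standard library. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the "difference from actual" is taken as an absolute difference, every county is assumed to have at least one neighbor, and the paired differences are assumed not to be all identical. With these choices, a strongly negative statistic means that neighbor-based estimates are closer to the actual values than random estimates.

```python
import math
import random
import statistics


def paired_t(first, second):
    """Paired Student's t statistic for two equal-length lists."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    squared_dev = sum((d - mean) ** 2 for d in diffs)
    return mean * math.sqrt(n * (n - 1) / squared_dev)


def nb2_t_statistic(log_rate, neighbors, repetitions=1000, seed=0):
    """Median paired t statistic over bootstrap repetitions (sketch)."""
    rng = random.Random(seed)
    counties = list(log_rate)
    t_values = []
    for _ in range(repetitions):
        d_neighbor, d_random = [], []
        # Sample N counties with replacement.
        for y in rng.choices(counties, k=len(counties)):
            n_y = len(neighbors[y])
            b_neighbor = rng.choices(neighbors[y], k=n_y)
            b_random = rng.choices(counties, k=n_y)
            z_neighbor = sum(log_rate[c] for c in b_neighbor) / n_y
            z_random = sum(log_rate[c] for c in b_random) / n_y
            # Assumption: error is the absolute difference from the actual value.
            d_neighbor.append(abs(z_neighbor - log_rate[y]))
            d_random.append(abs(z_random - log_rate[y]))
        t_values.append(paired_t(d_neighbor, d_random))
    return statistics.median(t_values)
```

The inputs are a dictionary mapping each county to its log incidence rate and a dictionary mapping each county to its list of neighbors. Fixing the seed makes the resampling reproducible.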
<p>In the second implementation, we compare the neighbor and random estimates by counting, for each pair of bootstraps, the number of samples where the neighbor estimate is closer to actual than the random estimate. We then repeat this 1000 times, take the median number, and use this to calculate the log odds that the neighbor estimate is more accurate than the random estimate.</p>
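A minimal Python sketch of the second implementation is given below. It is a sketch under assumptions rather than the authors' implementation: the function name is hypothetical, ties are counted as the neighbor estimate not being closer, every county is assumed to have at least one neighbor, and the log odds are computed as log(k / (N - k)), where k is the median count of samples in which the neighbor estimate is closer and N is the number of sampled counties.

```python
import math
import random
import statistics


def nb2_log_odds(log_rate, neighbors, repetitions=1000, seed=0):
    """Log odds that the neighbor estimate beats the random estimate (sketch)."""
    rng = random.Random(seed)
    counties = list(log_rate)
    n = len(counties)
    counts = []
    for _ in range(repetitions):
        closer = 0
        # Sample N counties with replacement.
        for y in rng.choices(counties, k=n):
            n_y = len(neighbors[y])
            b_neighbor = rng.choices(neighbors[y], k=n_y)
            b_random = rng.choices(counties, k=n_y)
            z_neighbor = sum(log_rate[c] for c in b_neighbor) / n_y
            z_random = sum(log_rate[c] for c in b_random) / n_y
            # Count the sample when the neighbor estimate is strictly closer to actual.
            if abs(z_random - log_rate[y]) > abs(z_neighbor - log_rate[y]):
                closer += 1
        counts.append(closer)
    k = statistics.median(counts)
    # Guard the degenerate cases where the odds are not finite.
    if k == n:
        return math.inf
    if k == 0:
        return -math.inf
    return math.log(k / (n - k))
```

Positive values indicate that the neighbor estimate is closer to actual in more than half of the sampled counties.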
<table-wrap id="T4" orientation="portrait" position="float">
<label>Algorithm 1:</label>
<caption>
<p>Neighbor-based bootstrapping method with paired
<italic>t</italic>
-test</p>
</caption>
<table frame="hsides" rules="groups">
<col align="left"></col>
<tbody>
<tr>
<td align="left">
<bold>INPUT:</bold>
Set of records, {
<italic>counties</italic>
} with length N; the value of interest (log incidence rate
<italic>Z</italic>
) for each record
<italic>Y</italic>
; and the list of each record's neighbors {
<italic>neighbors</italic>
(
<italic>Y</italic>
)}.</td>
</tr>
<tr>
<td align="left">
<bold>OUTPUT:</bold>
<italic>NB</italic>
2 statistic using paired
<italic>t</italic>
-test</td>
</tr>
<tr>
<td align="left">
<bold>for</bold>
m ← 1 to M repetitions
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left">  
<bold>for</bold>
N samples (with replacement) of
<italic>Y</italic>
∈ {
<italic>counties</italic>
}
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left">    
<italic>Z
<sup>Y</sup>
</italic>
← log (incidence rate in county
<italic>Y</italic>
)</td>
</tr>
<tr>
<td align="left">    
<italic>n
<sup>Y</sup>
</italic>
← number of elements in {
<italic>neighbors</italic>
(
<italic>Y</italic>
)}</td>
</tr>
<tr>
<td align="left">    Choose
<italic>n
<sup>Y</sup>
</italic>
counties ∈ {
<italic>neighbors</italic>
(
<italic>Y</italic>
)} with replacement, call this
<italic>B
<sub>neighbor</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    Choose
<italic>n
<sup>Y</sup>
</italic>
counties ∈ {
<italic>counties</italic>
} with replacement, call this
<italic>B
<sub>random</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    
<inline-formula>
<tex-math id="eq3">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{neighbor}^Y \leftarrow$$ \end{document}</tex-math>
</inline-formula>
average
<italic>Z</italic>
of
<italic>B
<sub>neighbor</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    
<inline-formula>
<tex-math id="eq4">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{random}^Y \leftarrow$$ \end{document}</tex-math>
</inline-formula>
average
<italic>Z</italic>
of
<italic>B
<sub>random</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">  
<bold>end for</bold>
</td>
</tr>
<tr>
<td align="left">  
<italic>D
<sub>neighbor</sub>
</italic>
← List of
<inline-formula>
<tex-math id="eq5">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{neighbor}^Y$$ \end{document}</tex-math>
</inline-formula>
−
<italic>Z
<sup>Y</sup>
</italic>
for N sampled counties</td>
</tr>
<tr>
<td align="left">  
<italic>D
<sub>random</sub>
</italic>
← List of
<inline-formula>
<tex-math id="eq6">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{random}^Y$$ \end{document}</tex-math>
</inline-formula>
−
<italic>Z
<sup>Y</sup>
</italic>
for N sampled counties</td>
</tr>
<tr>
<td align="left">  Set
<italic>t
<sup>m</sup>
</italic>
equal to the paired Student's
<italic>t</italic>
-test statistic for
<italic>D
<sub>neighbor</sub>
</italic>
and
<italic>D
<sub>random</sub>
</italic>
:</td>
</tr>
<tr>
<td align="left">  
<inline-formula>
<tex-math id="eq7">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$ { t^m } = ( { \bar D_ { neighbor } } - { \bar D_ { random } } ) \sqrt { { \frac { l ( l - 1 ) } { \sum \nolimits_ { i = 1 } ^l \left( \hat D_ { neighbor } ^i - \hat D_ { random } ^i \right) ^2 } } } $$ \end{document}</tex-math>
</inline-formula>
</td>
</tr>
<tr>
<td align="left">
<bold>end for</bold>
</td>
</tr>
<tr>
<td align="left">
<italic>t</italic>
← List of
<italic>t
<sup>m</sup>
</italic>
for all M repetitions</td>
</tr>
<tr>
<td align="left">
<italic>NB</italic>
2 statistic = median(
<italic>t</italic>
)</td>
</tr>
</tbody>
</table>
</table-wrap>
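<p>Algorithm 1 can be sketched in plain Python as shown below. This is a minimal illustration, not the implementation used for the analysis: the function names paired_t and nb2_t, the dictionary inputs, the fixed seed, and the use of absolute prediction errors for the two difference lists are assumptions made for the example, and every county is assumed to have at least one neighbor.</p>

```python
import math
import random
import statistics


def paired_t(a, b):
    """Paired Student's t statistic for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n))


def nb2_t(z, neighbors, n_reps=20, seed=0):
    """Median paired t statistic over n_reps bootstrap repetitions.

    z: dict mapping county -> log incidence rate.
    neighbors: dict mapping county -> list of contiguous counties
               (assumed non-empty for every county).
    """
    rng = random.Random(seed)
    counties = sorted(z)
    t_values = []
    for _ in range(n_reps):
        d_neighbor, d_random = [], []
        # N samples of counties, drawn with replacement.
        for y in rng.choices(counties, k=len(counties)):
            n_y = len(neighbors[y])
            b_neighbor = rng.choices(neighbors[y], k=n_y)
            b_random = rng.choices(counties, k=n_y)
            z_neighbor = statistics.fmean([z[c] for c in b_neighbor])
            z_random = statistics.fmean([z[c] for c in b_random])
            # Assumption: errors are taken as absolute differences.
            d_neighbor.append(abs(z_neighbor - z[y]))
            d_random.append(abs(z_random - z[y]))
        t_values.append(paired_t(d_neighbor, d_random))
    return statistics.median(t_values)
```

<p>Under the absolute-error assumption, spatially smooth data give neighbor errors that are smaller than random errors, so this sketch returns a negative value for such data; the sign convention used for ranking may differ.</p>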
<table-wrap id="T5" orientation="portrait" position="float">
<label>Algorithm 2:</label>
<caption>
<p>Neighbor-based bootstrapping method with log odds</p>
</caption>
<table frame="hsides" rules="groups">
<col align="left"></col>
<tbody>
<tr>
<td align="left">
<bold>INPUT:</bold>
Set of records, {
<italic>counties</italic>
} with length N; the value of interest (log incidence rate
<italic>Z</italic>
) for each record
<italic>Y</italic>
; and the list of each record's neighbors {
<italic>neighbors</italic>
(
<italic>Y</italic>
)}.</td>
</tr>
<tr>
<td align="left">
<bold>OUTPUT:</bold>
<italic>NB</italic>
2 statistic using log odds</td>
</tr>
<tr>
<td align="left">
<bold>for</bold>
m ← 1 to M repetitions
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left">  
<bold>for</bold>
N samples (with replacement) of
<italic>Y</italic>
∈ {
<italic>counties</italic>
}
<bold>do</bold>
</td>
</tr>
<tr>
<td align="left">    
<italic>Z
<sup>Y</sup>
</italic>
← log (incidence rate in county
<italic>Y</italic>
)</td>
</tr>
<tr>
<td align="left">    
<italic>n
<sup>Y</sup>
</italic>
← number of elements in {
<italic>neighbors</italic>
(
<italic>Y</italic>
)}</td>
</tr>
<tr>
<td align="left">    Choose
<italic>n
<sup>Y</sup>
</italic>
counties ∈ {
<italic>neighbors</italic>
(
<italic>Y</italic>
)} with replacement, call this
<italic>B
<sub>neighbor</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    Choose
<italic>n
<sup>Y</sup>
</italic>
counties ∈ {
<italic>counties</italic>
} with replacement, call this
<italic>B
<sub>random</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    
<inline-formula>
<tex-math id="eq8">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{neighbor}^Y \leftarrow$$ \end{document}</tex-math>
</inline-formula>
average
<italic>Z</italic>
of
<italic>B
<sub>neighbor</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">    
<inline-formula>
<tex-math id="eq9">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{random}^Y \leftarrow$$ \end{document}</tex-math>
</inline-formula>
average
<italic>Z</italic>
of
<italic>B
<sub>random</sub>
</italic>
</td>
</tr>
<tr>
<td align="left">  
<bold>end for</bold>
</td>
</tr>
<tr>
<td align="left">  
<italic>Z
<sub>neighbor</sub>
</italic>
← List of
<inline-formula>
<tex-math id="eq10">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{neighbor}^Y$$ \end{document}</tex-math>
</inline-formula>
for N sampled counties</td>
</tr>
<tr>
<td align="left">  
<italic>Z
<sub>random</sub>
</italic>
← List of
<inline-formula>
<tex-math id="eq11">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$Z_{random}^Y$$ \end{document}</tex-math>
</inline-formula>
for N sampled counties</td>
</tr>
<tr>
<td align="left">  Set
<italic>u
<sup>m</sup>
</italic>
equal to the number of samples for which the neighbor estimate is closer to the actual value than the random estimate is:</td>
</tr>
<tr>
<td align="left">  
<italic>u
<sup>m</sup>
</italic>
 = length(which(abs(
<italic>Z
<sub>neighbor</sub>
</italic>
−
<italic>Z
<sup>Y</sup>
</italic>
) < abs(
<italic>Z
<sub>random</sub>
</italic>
−
<italic>Z
<sup>Y</sup>
</italic>
)))</td>
</tr>
<tr>
<td align="left">
<bold>end for</bold>
</td>
</tr>
<tr>
<td align="left">
<italic>u</italic>
← List of
<italic>u
<sup>m</sup>
</italic>
for all M repetitions</td>
</tr>
<tr>
<td align="left">
<italic>NB</italic>
2 statistic
<inline-formula>
<tex-math id="eq12">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$= \log \left( { { \frac { median ( u ) } { N - median ( u ) } } } \right)$$ \end{document}</tex-math>
</inline-formula>
</td>
</tr>
</tbody>
</table>
</table-wrap>
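<p>The log odds variant in Algorithm 2 can be sketched in the same way. The function name nb2_log_odds, the dictionary inputs, and the fixed seed are assumptions for illustration only. The sketch does not guard against the degenerate cases in which the median count equals 0 or N, where the log odds are undefined.</p>

```python
import math
import random
import statistics


def nb2_log_odds(z, neighbors, n_reps=20, seed=0):
    """Log odds that a neighbor-based estimate beats a random one.

    z: dict mapping county -> log incidence rate.
    neighbors: dict mapping county -> list of contiguous counties
               (assumed non-empty for every county).
    """
    rng = random.Random(seed)
    counties = sorted(z)
    n = len(counties)
    counts = []
    for _ in range(n_reps):
        u_m = 0
        # N samples of counties, drawn with replacement.
        for y in rng.choices(counties, k=n):
            n_y = len(neighbors[y])
            b_neighbor = rng.choices(neighbors[y], k=n_y)
            b_random = rng.choices(counties, k=n_y)
            z_neighbor = statistics.fmean([z[c] for c in b_neighbor])
            z_random = statistics.fmean([z[c] for c in b_random])
            # Count the sample when the neighbor estimate is strictly closer.
            if abs(z_random - z[y]) > abs(z_neighbor - z[y]):
                u_m += 1
        counts.append(u_m)
    u = statistics.median(counts)
    return math.log(u / (n - u))
```

<p>A positive value indicates that the neighbor estimate is closer to the actual value in more than half of the sampled counties.</p>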
</sec>
</sec>
<sec sec-type="results" id="s005">
<title>Results</title>
<sec id="s006">
<title>Performance</title>
<p>We first evaluated the impact of varying the number of times
<italic>M</italic>
that we resampled. Running the entire procedure and resampling
<italic>M</italic>
 = 1000 times for all 548 diseases takes just over 28 hours on a virtual machine with 8 Xeon cores running at 2.00 GHz with 16 GB of RAM. This is about 25 minutes per disease using a single core.</p>
<p>For 100 bootstraps, the run time for all 548 diseases on 8 cores is about 220 minutes, or a little over 3 minutes per disease on a single core. Comparing the NB2 statistic values for 1000 versus 100 bootstraps, the mean difference is 0.2% and the maximum difference is 1.9%. For 10 bootstraps, the total run time is about 30 minutes, or about 30 seconds per disease on a single core. The mean difference between NB2 statistic values for 1000 and 10 bootstraps is 0.2%, and the maximum difference is 5.1%. The results are summarized in the table below.</p>
<table-wrap id="T6" orientation="portrait" position="float">
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<tbody>
<tr>
<td align="left">
<italic>No. of bootstraps (M)</italic>
</td>
<td align="center">
<italic>Time per disease, single core (minutes)</italic>
</td>
<td align="center">
<italic>Mean difference</italic>
</td>
<td align="center">
<italic>Max. difference</italic>
</td>
<td align="center">
<italic>Standard deviation difference</italic>
</td>
</tr>
<tr>
<td align="left">1000</td>
<td align="center">25</td>
<td align="center">NA</td>
<td align="center">NA</td>
<td align="center">NA</td>
</tr>
<tr>
<td align="left">100</td>
<td align="center">3</td>
<td align="center">0.2%</td>
<td align="center">1.9%</td>
<td align="center">0.4%</td>
</tr>
<tr>
<td align="left">10</td>
<td align="center">0.5</td>
<td align="center">0.2%</td>
<td align="center">5.1%</td>
<td align="center">0.9%</td>
</tr>
</tbody>
</table>
</table-wrap>
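<p>The per-disease, single-core times quoted above follow from the total wall-clock times by simple arithmetic. The short sketch below reproduces that conversion; the function name is hypothetical, and it assumes that diseases are processed independently and that the load is spread evenly over the 8 cores.</p>

```python
def core_minutes_per_disease(wall_minutes, n_cores=8, n_diseases=548):
    """Convert total wall-clock minutes to single-core minutes per disease.

    Assumption: diseases are processed independently and the work is
    balanced evenly across the cores.
    """
    return wall_minutes * n_cores / n_diseases


# Wall-clock totals quoted in the text: about 28 hours, 220 minutes, 30 minutes.
for m, wall in [(1000, 28 * 60), (100, 220), (10, 30)]:
    print(m, round(core_minutes_per_disease(wall), 1))
```

<p>This gives roughly 24.5, 3.2, and 0.4 minutes for 1000, 100, and 10 bootstraps, in line with the rounded values in the table.</p>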
<p>In the analysis that follows, we are primarily focused on the rank ordering of the ICD-9 codes according to these two implementations of the NB2 method. The rank orderings obtained with 100 or 10 repeated bootstraps do not differ significantly from those obtained with 1000.</p>
</sec>
<sec id="s007">
<title>Comparison with Moran's
<italic>I</italic>
statistic</title>
<p>We compare the neighbor-based bootstrapping results to the global Moran's
<italic>I</italic>
statistic for detecting spatial autocorrelation, which is the weighted sum of cross-products of the mean-adjusted outcome of interest over all pairs of units, normalized by the total weight and by the sum of squared deviations from the mean. Moran's
<italic>I</italic>
is defined as
<sup>
<xref rid="B13" ref-type="bibr">13</xref>
</sup>
:
<disp-formula>
<tex-math id="eq13" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} I = \frac { n } { { { \Sigma _i } { \Sigma _j } { w_ { ij } } } } { \frac { { \Sigma _i } { \Sigma _j } { w_ { ij } } ( { y_i } - \bar y ) ( { y_j } - \bar y ) } { { \Sigma _i } { { ( { y_i } - \bar y ) } ^2 } } } \tag { 3 } \end{align*} \end{document}</tex-math>
</disp-formula>
</p>
<p>where
<italic>n</italic>
is the total number of spatial polygons (counties),
<italic>y
<sub>i</sub>
</italic>
is the value of interest of the
<italic>i</italic>
th polygon,
<inline-formula>
<tex-math id="eq14" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\bar y$$ \end{document}</tex-math>
</inline-formula>
is the global mean, and
<italic>w
<sub>ij</sub>
</italic>
is the spatial weight of the link between polygon
<italic>i</italic>
and
<italic>j</italic>
.</p>
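<p>Equation (3) can be evaluated directly from a list of values and a table of pairwise weights. The sketch below is illustrative only: the function name morans_i and the dictionary-of-pairs weight format are assumptions, and binary contiguity weights are used in the example purely for convenience.</p>

```python
def morans_i(values, weights):
    """Global Moran's I as in Equation (3).

    values: list of y_i, one per spatial polygon.
    weights: dict mapping (i, j) index pairs to the spatial weight w_ij;
             pairs that are absent are treated as weight 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


# Four polygons on a line; adjacent polygons have weight 1 in both directions.
line_weights = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
print(morans_i([0, 1, 0, 1], line_weights))
print(morans_i([0, 0, 1, 1], line_weights))
```

<p>For these four polygons, the alternating values 0, 1, 0, 1 give I = −1, and the clustered values 0, 0, 1, 1 give I = 1/3.</p>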
<p>Moran's
<italic>I</italic>
ranges from −1 (perfect dispersion, as in a black and white checkerboard pattern) to 1 (perfect clustering, as in black squares on one side and white squares on the other). A random distribution would have
<italic>I</italic>
close to 0. We compare the values of Moran's
<italic>I</italic>
for the set of log incidence rates across counties for each ICD-9 code to both the implementation of the NB2 method using the paired Student's
<italic>t</italic>
-test evaluation and the implementation using the log odds evaluation. If the geospatial variation that the NB2 method detects is similar to Moran's
<italic>I</italic>
, then the NB2 statistic values should increase as Moran's
<italic>I</italic>
goes to 1. In
<xref ref-type="fig" rid="f1">Figure 1</xref>
, we show the NB2
<italic>t</italic>
-test statistic estimate (left) and the log odds estimate (right) plotted against Moran's
<italic>I</italic>
statistic for all ICD-9 codes tested; each panel contains one data point per ICD-9 code. Generally, the NB2 statistic values increase as Moran's
<italic>I</italic>
statistic increases, though there is noticeable scatter.</p>
<fig id="f1" fig-type="figure" orientation="portrait" position="float">
<label>
<bold>FIG. 1.</bold>
</label>
<caption>
<p>Comparison of the NB2 method with Moran's
<italic>I</italic>
statistic for detecting spatial autocorrelation. We show the NB2 experiment's ability to detect spatial correlation (
<italic>y</italic>
-axis) measured by the paired Student's
<italic>t</italic>
-test estimate (left) and the log odds estimate (right) for the neighbor county predictions versus random county predictions plotted against the Moran's
<italic>I</italic>
statistic estimate (
<italic>x</italic>
-axis). NB2, neighbor-based bootstrapping.</p>
</caption>
<graphic xlink:href="fig-1"></graphic>
</fig>
<p>We rank ordered the ICD-9 codes using the two NB2 method implementations and the Moran's
<italic>I</italic>
statistic to produce three ordered lists of ICD-9 codes from the strongest spatial correlation (largest Moran's
<italic>I</italic>
statistic, largest NB2
<italic>t</italic>
-test, largest NB2 log odds test) to the weakest.
<xref ref-type="table" rid="T1 T2 T3">Tables 1–3</xref>
contain the top 25 ICD-9 codes for both NB2 implementations and Moran's
<italic>I</italic>
. Here, we will compare the properties of the spatial distributions for ICD-9 codes ranked highly by the two NB2 procedures with those ranked highly by Moran's
<italic>I</italic>
statistic.</p>
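<p>The fractional ranks that appear in the tables (for example, 1.5 or 10.5) are consistent with tied values receiving the average of the ranks they span. Assuming that convention, the ranking step can be sketched as follows; the function name average_ranks is hypothetical.</p>

```python
def average_ranks(scores):
    """Rank scores from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while len(order) > pos:
        # Extend 'end' over the run of scores tied with the one at 'pos'.
        end = pos
        while len(order) > end + 1 and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos + 1 through end + 1.
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


print(average_ranks([0.9, 0.9, 0.5, 0.7]))
```

<p>In this example the two largest scores are tied and share rank 1.5, and the remaining scores receive ranks 3 and 4.</p>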
<table-wrap id="T1" orientation="portrait" position="float">
<label>Table 1.</label>
<caption>
<p>Top 25 ICD-9 codes as ranked by neighbor-based bootstrapping (
<italic>t</italic>
-test)</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead>
<tr>
<th align="left">
<italic>NB2 (</italic>
t
<italic>-test)</italic>
</th>
<th align="center">
<italic>NB2 (odds)</italic>
</th>
<th align="center">
<italic>Moran</italic>
</th>
<th align="center">
<italic>ICD-9 diagnosis name</italic>
</th>
<th align="center">
<italic>ICD-9</italic>
</th>
<th align="center">
<italic>Range</italic>
</th>
<th align="center">
<italic>Sill</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="center">19</td>
<td align="center">1.5</td>
<td align="left">Hypertensive heart disease</td>
<td align="center">402</td>
<td align="center">1890</td>
<td align="center">2.32</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">85</td>
<td align="center">3</td>
<td align="left">Trichomoniasis</td>
<td align="center">131</td>
<td align="center">610</td>
<td align="center">1.33</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">48</td>
<td align="center">5</td>
<td align="left">Legally induced abortion</td>
<td align="center">635</td>
<td align="center">330</td>
<td align="center">1.13</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">110</td>
<td align="center">4</td>
<td align="left">Other arthropod-borne diseases</td>
<td align="center">88</td>
<td align="center">580</td>
<td align="center">1.41</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">34.5</td>
<td align="center">7</td>
<td align="left">Histoplasmosis</td>
<td align="center">115</td>
<td align="center">830</td>
<td align="center">1.55</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">101</td>
<td align="center">17</td>
<td align="left">Other benign neoplasm of uterus</td>
<td align="center">219</td>
<td align="center">90</td>
<td align="center">0.95</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">9</td>
<td align="center">6</td>
<td align="left">Angina pectoris</td>
<td align="center">413</td>
<td align="center">790</td>
<td align="center">0.7</td>
</tr>
<tr>
<td align="left">8</td>
<td align="center">28</td>
<td align="center">1.5</td>
<td align="left">Nonallopathic lesions not elsewhere classified</td>
<td align="center">739</td>
<td align="center">490</td>
<td align="center">0.59</td>
</tr>
<tr>
<td align="left">9</td>
<td align="center">202</td>
<td align="center">26</td>
<td align="left">Other disorders of prostate</td>
<td align="center">602</td>
<td align="center">120</td>
<td align="center">0.81</td>
</tr>
<tr>
<td align="left">10.5</td>
<td align="center">167</td>
<td align="center">21</td>
<td align="left">Other venereal diseases</td>
<td align="center">99</td>
<td align="center">280</td>
<td align="center">0.66</td>
</tr>
<tr>
<td align="left">10.5</td>
<td align="center">32</td>
<td align="center">31</td>
<td align="left">Disorders of tooth development and eruption</td>
<td align="center">520</td>
<td align="center">130</td>
<td align="center">0.45</td>
</tr>
<tr>
<td align="left">12</td>
<td align="center">127.5</td>
<td align="center">21</td>
<td align="left">Ill-defined intestinal infections</td>
<td align="center">9</td>
<td align="center">190</td>
<td align="center">0.64</td>
</tr>
<tr>
<td align="left">13</td>
<td align="center">30</td>
<td align="center">10</td>
<td align="left">Other acute and subacute forms of ischemic heart disease</td>
<td align="center">411</td>
<td align="center">1020</td>
<td align="center">0.6</td>
</tr>
<tr>
<td align="left">14</td>
<td align="center">252.5</td>
<td align="center">51.5</td>
<td align="left">Fetus or newborn affected by other complications of labor and delivery</td>
<td align="center">763</td>
<td align="center">60</td>
<td align="center">0.78</td>
</tr>
<tr>
<td align="left">15</td>
<td align="center">63</td>
<td align="center">14.5</td>
<td align="left">Other deficiency anemias</td>
<td align="center">281</td>
<td align="center">1090</td>
<td align="center">0.69</td>
</tr>
<tr>
<td align="left">16</td>
<td align="center">238</td>
<td align="center">44</td>
<td align="left">Human immunodeficiency virus (HIV) infection</td>
<td align="center">42</td>
<td align="center">410</td>
<td align="center">0.72</td>
</tr>
<tr>
<td align="left">17</td>
<td align="center">254.5</td>
<td align="center">37</td>
<td align="left">Long labor</td>
<td align="center">662</td>
<td align="center">100</td>
<td align="center">1.01</td>
</tr>
<tr>
<td align="left">18</td>
<td align="center">71</td>
<td align="center">21</td>
<td align="left">Pulmonary congestion and hypostasis</td>
<td align="center">514</td>
<td align="center">480</td>
<td align="center">0.58</td>
</tr>
<tr>
<td align="left">19</td>
<td align="center">82</td>
<td align="center">26</td>
<td align="left">Vitamin D deficiency</td>
<td align="center">268</td>
<td align="center">310</td>
<td align="center">0.54</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">196</td>
<td align="center">74.5</td>
<td align="left">Other arthropod-borne viral diseases</td>
<td align="center">66</td>
<td align="center">370</td>
<td align="center">1.34</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">7</td>
<td align="center">9</td>
<td align="left">Other diseases of endocardium</td>
<td align="center">424</td>
<td align="center">570</td>
<td align="center">0.4</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">8</td>
<td align="center">11</td>
<td align="left">Influenza</td>
<td align="center">487</td>
<td align="center">830</td>
<td align="center">0.46</td>
</tr>
<tr>
<td align="left">23</td>
<td align="center">161</td>
<td align="center">44</td>
<td align="left">Sarcoidosis</td>
<td align="center">135</td>
<td align="center">540</td>
<td align="center">0.49</td>
</tr>
<tr>
<td align="left">24</td>
<td align="center">107</td>
<td align="center">26</td>
<td align="left">Other endocrine disorders</td>
<td align="center">259</td>
<td align="center">200</td>
<td align="center">0.49</td>
</tr>
<tr>
<td align="left">25</td>
<td align="center">231.5</td>
<td align="center">88</td>
<td align="left">Chronic laryngitis and laryngotracheitis</td>
<td align="center">476</td>
<td align="center">50</td>
<td align="center">0.62</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tf1">
<p>NB2, neighbor-based bootstrapping.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap id="T2" orientation="portrait" position="float">
<label>Table 2.</label>
<caption>
<p>Top 25 ICD-9 codes as ranked by neighbor-based bootstrapping (log odds)</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead>
<tr>
<th align="left">
<italic>NB2 (odds)</italic>
</th>
<th align="center">
<italic>NB2 (</italic>
t
<italic>-test)</italic>
</th>
<th align="center">
<italic>Moran</italic>
</th>
<th align="center">
<italic>ICD-9 diagnosis name</italic>
</th>
<th align="center">
<italic>ICD-9</italic>
</th>
<th align="center">
<italic>Range</italic>
</th>
<th align="center">
<italic>Sill</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="center">171</td>
<td align="center">37</td>
<td align="left">Essential hypertension</td>
<td align="center">401</td>
<td align="center">530</td>
<td align="center">0.07</td>
</tr>
<tr>
<td align="left">2</td>
<td align="center">47</td>
<td align="center">8</td>
<td align="left">Allergic rhinitis</td>
<td align="center">477</td>
<td align="center">630</td>
<td align="center">0.19</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">81</td>
<td align="center">12.5</td>
<td align="left">Other cellulitis and abscess</td>
<td align="center">682</td>
<td align="center">530</td>
<td align="center">0.14</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">70</td>
<td align="center">12.5</td>
<td align="left">Menopausal and postmenopausal disorders</td>
<td align="center">627</td>
<td align="center">530</td>
<td align="center">0.15</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">149</td>
<td align="center">21</td>
<td align="left">Diseases of esophagus</td>
<td align="center">530</td>
<td align="center">530</td>
<td align="center">0.11</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">75</td>
<td align="center">17</td>
<td align="left">Inflammatory disease of cervix vagina and vulva</td>
<td align="center">616</td>
<td align="center">630</td>
<td align="center">0.22</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">21</td>
<td align="center">9</td>
<td align="left">Other diseases of endocardium</td>
<td align="center">424</td>
<td align="center">570</td>
<td align="center">0.4</td>
</tr>
<tr>
<td align="left">8</td>
<td align="center">21</td>
<td align="center">11</td>
<td align="left">Influenza</td>
<td align="center">487</td>
<td align="center">830</td>
<td align="center">0.46</td>
</tr>
<tr>
<td align="left">9</td>
<td align="center">7</td>
<td align="center">6</td>
<td align="left">Angina pectoris</td>
<td align="center">413</td>
<td align="center">790</td>
<td align="center">0.7</td>
</tr>
<tr>
<td align="left">10</td>
<td align="center">261.5</td>
<td align="center">21</td>
<td align="left">Other disorders of urethra and urinary tract</td>
<td align="center">599</td>
<td align="center">530</td>
<td align="center">0.08</td>
</tr>
<tr>
<td align="left">11</td>
<td align="center">383</td>
<td align="center">74.5</td>
<td align="left">Disorders of lipoid metabolism</td>
<td align="center">272</td>
<td align="center">530</td>
<td align="center">0.06</td>
</tr>
<tr>
<td align="left">12</td>
<td align="center">108</td>
<td align="center">74.5</td>
<td align="left">Other forms of chronic ischemic heart disease</td>
<td align="center">414</td>
<td align="center">530</td>
<td align="center">0.15</td>
</tr>
<tr>
<td align="left">13</td>
<td align="center">56</td>
<td align="center">17</td>
<td align="left">Gastritis and duodenitis</td>
<td align="center">535</td>
<td align="center">720</td>
<td align="center">0.3</td>
</tr>
<tr>
<td align="left">14</td>
<td align="center">165.5</td>
<td align="center">51.5</td>
<td align="left">Contact dermatitis and other eczema</td>
<td align="center">692</td>
<td align="center">530</td>
<td align="center">0.12</td>
</tr>
<tr>
<td align="left">15</td>
<td align="center">245.5</td>
<td align="center">101</td>
<td align="left">Symptoms involving cardiovascular system</td>
<td align="center">785</td>
<td align="center">530</td>
<td align="center">0.09</td>
</tr>
<tr>
<td align="left">16</td>
<td align="center">401.5</td>
<td align="center">88</td>
<td align="left">Other symptoms involving abdomen and pelvis</td>
<td align="center">789</td>
<td align="center">530</td>
<td align="center">0.06</td>
</tr>
<tr>
<td align="left">17</td>
<td align="center">431</td>
<td align="center">101</td>
<td align="left">Symptoms involving respiratory system and other chest symptoms</td>
<td align="center">786</td>
<td align="center">530</td>
<td align="center">0.05</td>
</tr>
<tr>
<td align="left">18</td>
<td align="center">110</td>
<td align="center">37</td>
<td align="left">Nonspecific abnormal results of function studies</td>
<td align="center">794</td>
<td align="center">530</td>
<td align="center">0.15</td>
</tr>
<tr>
<td align="left">19</td>
<td align="center">1</td>
<td align="center">1.5</td>
<td align="left">Hypertensive heart disease</td>
<td align="center">402</td>
<td align="center">1890</td>
<td align="center">2.32</td>
</tr>
<tr>
<td align="left">20</td>
<td align="center">413</td>
<td align="center">44</td>
<td align="left">General symptoms</td>
<td align="center">780</td>
<td align="center">530</td>
<td align="center">0.05</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">101</td>
<td align="center">61.5</td>
<td align="left">Other and unspecified anemias</td>
<td align="center">285</td>
<td align="center">630</td>
<td align="center">0.19</td>
</tr>
<tr>
<td align="left">22.5</td>
<td align="center">289</td>
<td align="center">74.5</td>
<td align="left">Diabetes mellitus</td>
<td align="center">250</td>
<td align="center">530</td>
<td align="center">0.09</td>
</tr>
<tr>
<td align="left">22.5</td>
<td align="center">102</td>
<td align="center">61.5</td>
<td align="left">Calculus of kidney and ureter</td>
<td align="center">592</td>
<td align="center">630</td>
<td align="center">0.17</td>
</tr>
<tr>
<td align="left">24</td>
<td align="center">168.5</td>
<td align="center">88</td>
<td align="left">Dermatophytosis</td>
<td align="center">110</td>
<td align="center">530</td>
<td align="center">0.14</td>
</tr>
<tr>
<td align="left">25.5</td>
<td align="center">210</td>
<td align="center">74.5</td>
<td align="left">Functional digestive disorders not elsewhere classified</td>
<td align="center">564</td>
<td align="center">530</td>
<td align="center">0.1</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T3" orientation="portrait" position="float">
<label>Table 3.</label>
<caption>
<p>Top 25 ICD-9 codes as ranked by Moran's
<italic>I</italic>
</p>
</caption>
<table frame="hsides" rules="groups">
<colgroup>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
<col align="left"></col>
</colgroup>
<thead>
<tr>
<th align="left">
<italic>Moran</italic>
</th>
<th align="center">
<italic>NB2 (</italic>
t
<italic>-test)</italic>
</th>
<th align="center">
<italic>NB2 (odds)</italic>
</th>
<th align="center">
<italic>ICD-9 diagnosis name</italic>
</th>
<th align="center">
<italic>ICD-9</italic>
</th>
<th align="center">
<italic>Range</italic>
</th>
<th align="center">
<italic>Sill</italic>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1.5</td>
<td align="center">1</td>
<td align="center">19</td>
<td align="left">Hypertensive heart disease</td>
<td align="center">402</td>
<td align="center">1890</td>
<td align="center">2.32</td>
</tr>
<tr>
<td align="left">1.5</td>
<td align="center">8</td>
<td align="center">28</td>
<td align="left">Nonallopathic lesions not elsewhere classified</td>
<td align="center">739</td>
<td align="center">490</td>
<td align="center">0.59</td>
</tr>
<tr>
<td align="left">3</td>
<td align="center">2</td>
<td align="center">85</td>
<td align="left">Trichomoniasis</td>
<td align="center">131</td>
<td align="center">610</td>
<td align="center">1.33</td>
</tr>
<tr>
<td align="left">4</td>
<td align="center">4</td>
<td align="center">110</td>
<td align="left">Other arthropod-borne diseases</td>
<td align="center">88</td>
<td align="center">580</td>
<td align="center">1.41</td>
</tr>
<tr>
<td align="left">5</td>
<td align="center">3</td>
<td align="center">48</td>
<td align="left">Legally induced abortion</td>
<td align="center">635</td>
<td align="center">330</td>
<td align="center">1.13</td>
</tr>
<tr>
<td align="left">6</td>
<td align="center">7</td>
<td align="center">9</td>
<td align="left">Angina pectoris</td>
<td align="center">413</td>
<td align="center">790</td>
<td align="center">0.7</td>
</tr>
<tr>
<td align="left">7</td>
<td align="center">5</td>
<td align="center">34.5</td>
<td align="left">Histoplasmosis</td>
<td align="center">115</td>
<td align="center">830</td>
<td align="center">1.55</td>
</tr>
<tr>
<td align="left">8</td>
<td align="center">47</td>
<td align="center">2</td>
<td align="left">Allergic rhinitis</td>
<td align="center">477</td>
<td align="center">630</td>
<td align="center">0.19</td>
</tr>
<tr>
<td align="left">9</td>
<td align="center">21</td>
<td align="center">7</td>
<td align="left">Other diseases of endocardium</td>
<td align="center">424</td>
<td align="center">570</td>
<td align="center">0.4</td>
</tr>
<tr>
<td align="left">10</td>
<td align="center">13</td>
<td align="center">30</td>
<td align="left">Other acute and subacute forms of ischemic heart disease</td>
<td align="center">411</td>
<td align="center">1020</td>
<td align="center">0.6</td>
</tr>
<tr>
<td align="left">11</td>
<td align="center">21</td>
<td align="center">8</td>
<td align="left">Influenza</td>
<td align="center">487</td>
<td align="center">830</td>
<td align="center">0.46</td>
</tr>
<tr>
<td align="left">12.5</td>
<td align="center">70</td>
<td align="center">4</td>
<td align="left">Menopausal and postmenopausal disorders</td>
<td align="center">627</td>
<td align="center">530</td>
<td align="center">0.15</td>
</tr>
<tr>
<td align="left">12.5</td>
<td align="center">81</td>
<td align="center">3</td>
<td align="left">Other cellulitis and abscess</td>
<td align="center">682</td>
<td align="center">530</td>
<td align="center">0.14</td>
</tr>
<tr>
<td align="left">14.5</td>
<td align="center">53</td>
<td align="center">34.5</td>
<td align="left">Neoplasm of uncertain behavior of other and unspecified sites and tissues</td>
<td align="center">238</td>
<td align="center">220</td>
<td align="center">0.3</td>
</tr>
<tr>
<td align="left">14.5</td>
<td align="center">15</td>
<td align="center">63</td>
<td align="left">Other deficiency anemias</td>
<td align="center">281</td>
<td align="center">1090</td>
<td align="center">0.69</td>
</tr>
<tr>
<td align="left">17</td>
<td align="center">6</td>
<td align="center">101</td>
<td align="left">Other benign neoplasm of uterus</td>
<td align="center">219</td>
<td align="center">90</td>
<td align="center">0.95</td>
</tr>
<tr>
<td align="left">17</td>
<td align="center">56</td>
<td align="center">13</td>
<td align="left">Gastritis and duodenitis</td>
<td align="center">535</td>
<td align="center">720</td>
<td align="center">0.3</td>
</tr>
<tr>
<td align="left">17</td>
<td align="center">75</td>
<td align="center">6</td>
<td align="left">Inflammatory disease of cervix vagina and vulva</td>
<td align="center">616</td>
<td align="center">630</td>
<td align="center">0.22</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">12</td>
<td align="center">127.5</td>
<td align="left">Ill-defined intestinal infections</td>
<td align="center">9</td>
<td align="center">190</td>
<td align="center">0.64</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">10.5</td>
<td align="center">167</td>
<td align="left">Other venereal diseases</td>
<td align="center">99</td>
<td align="center">280</td>
<td align="center">0.66</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">18</td>
<td align="center">71</td>
<td align="left">Pulmonary congestion and hypostasis</td>
<td align="center">514</td>
<td align="center">480</td>
<td align="center">0.58</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">149</td>
<td align="center">5</td>
<td align="left">Diseases of esophagus</td>
<td align="center">530</td>
<td align="center">530</td>
<td align="center">0.11</td>
</tr>
<tr>
<td align="left">21</td>
<td align="center">261.5</td>
<td align="center">10</td>
<td align="left">Other disorders of urethra and urinary tract</td>
<td align="center">599</td>
<td align="center">530</td>
<td align="center">0.08</td>
</tr>
<tr>
<td align="left">26</td>
<td align="center">24</td>
<td align="center">107</td>
<td align="left">Other endocrine disorders</td>
<td align="center">259</td>
<td align="center">200</td>
<td align="center">0.49</td>
</tr>
<tr>
<td align="left">26</td>
<td align="center">19</td>
<td align="center">82</td>
<td align="left">Vitamin D deficiency</td>
<td align="center">268</td>
<td align="center">310</td>
<td align="center">0.54</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s008">
<title>Scale of spatial influence</title>
<p>We applied a geostatistical ordinary kriging procedure using the R package automap to fit semivariogram models describing the spatial variation across the continental United States for the incidence rates of each of the ICD-9 diagnostic codes. The semivariograms show the mean semivariance of values in binned separation distances between all pairs of spatial points. Here, we use as input values the log incidence of the given ICD-9 code in a county and approximate the spatial location of the observation as the county population centroid given by the 2010 US Census.</p>
<p>The semivariogram describes the distance within which the incidence rate is spatially autocorrelated. At separation distances where the semivariance is low, points have similar incidence rates. To quantify the size of spatial variation, we fit exponential semivariogram models to the data. The semivariogram model range describes the distance at which the model flattens to a constant semivariance. The semivariogram model sill describes the semivariance value at the range.</p>
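<p>As an illustration of the quantities described above, the following minimal Python sketch computes a binned empirical semivariogram and evaluates an exponential model. It is not the automap procedure used in this study: the function names, the planar Euclidean distance, and the parameterization (nugget, partial sill, range parameter) are assumptions made for illustration. Under this parameterization the curve approaches the sill asymptotically, so the distance at which it is effectively flat is larger than the range parameter itself.</p>
<preformat>
```python
import math


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Mean semivariance of value pairs, binned by separation distance."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: coordinates are planar, so Euclidean distance applies.
            d = math.dist(points[i], points[j])
            b = int(d // bin_width)
            if n_bins > b:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    # Bins with no pairs are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


def exponential_model(h, nugget, psill, range_):
    """Exponential semivariogram: rises from nugget toward nugget + psill."""
    return nugget + psill * (1.0 - math.exp(-h / range_))
```
</preformat>
<p>In this sketch, bins containing no pairs are returned as None rather than zero so that they are not mistaken for distances at which values are perfectly similar.</p>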
<p>In
<xref ref-type="fig" rid="f2">Figure 2</xref>
, we show two sample semivariograms, one for an ICD-9 code ranked highly by both the NB2
<italic>t</italic>
-test implementation and Moran's
<italic>I</italic>
(219: Other benign neoplasms of uterus) but relatively low by the NB2 log odds implementation and one for an ICD-9 code ranked highly by both the NB2 log odds implementation and Moran's
<italic>I</italic>
but relatively low by the NB2
<italic>t</italic>
-test implementation (477: Allergic rhinitis). The semivariogram model for 219 has a steep rise that quickly flattens (shorter range), whereas the semivariogram model for 477 continues to rise at large distances. The incidence rate maps in the bottom of
<xref ref-type="fig" rid="f2">Figure 2</xref>
correspondingly show smaller, highly peaked cluster patterns of spatial variation for 219 (bottom left) and a larger scale gradation for 477 (bottom right).</p>
<fig id="f2" fig-type="figure" orientation="portrait" position="float">
<label>
<bold>FIG. 2.</bold>
</label>
<caption>
<p>Example semivariograms and incidence rate maps for two ICD-9 codes. Top: Semivariograms for ICD-9 219, other benign neoplasms of uterus, (left) and 477, allergic rhinitis, (right). Bottom: corresponding incidence rate maps. For ease of reading, the figure can be viewed online at
<uri xlink:type="simple" xlink:href="http://www.liebertpub.com/big">www.liebertpub.com/big</uri>
</p>
</caption>
<graphic xlink:href="fig-2"></graphic>
</fig>
<p>We compare the results of semivariogram modeling for the highest ranked ICD-9 codes using the two NB2 method implementations to the semivariogram models for the highest ranked ICD-9 codes using Moran's
<italic>I</italic>
statistic. Specifically, we compare average semivariogram model properties between groups of the top
<italic>N</italic>
ranked ICD-9 codes for increasing values of
<italic>N</italic>
using the two NB2 rankings and Moran's
<italic>I</italic>
ranking. We will refer to
<italic>N</italic>
as the rank threshold.</p>
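<p>The rank threshold comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, the codes are placeholders, and the range values are invented for the example and are not results from this study.</p>
<preformat>
```python
def mean_property_by_rank_threshold(ranked_codes, model_property, thresholds):
    """For each N, average a semivariogram property over the top N ranked codes."""
    result = {}
    for n in thresholds:
        # Codes without a fitted value (e.g., a failed fit) are skipped.
        top = [model_property[c] for c in ranked_codes[:n] if c in model_property]
        result[n] = sum(top) / len(top) if top else None
    return result


# Hypothetical inputs: placeholder codes ordered best-first, invented ranges in km.
ranked = ["code_a", "code_b", "code_c", "code_d"]
ranges_km = {"code_a": 600.0, "code_b": 400.0, "code_c": 200.0, "code_d": 1200.0}
curve = mean_property_by_rank_threshold(ranked, ranges_km, [2, 4])
```
</preformat>
<p>Running the same function once per ranking (each NB2 implementation and Moran's I) with the same set of thresholds gives curves that can be compared directly.</p>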
<p>In the top of
<xref ref-type="fig" rid="f3">Figure 3</xref>
(left), we show the mean semivariogram range versus the rank threshold
<italic>N</italic>
using the NB2 method with
<italic>t</italic>
-test comparison (black) and Moran's
<italic>I</italic>
statistic (gray). This shows the average distance range within which the incidence rates are autocorrelated for the top
<italic>N</italic>
ranked ICD-9 codes by each method. For example, the mean semivariogram model range for the top 25 ICD-9 codes ranked by the NB2 method is 495 versus 625 km for the top 25 Moran's
<italic>I</italic>
statistic rankings. For the top 100 ICD-9 codes, the mean semivariogram model range is 404 and 509 km for the NB2 method with
<italic>t</italic>
-test comparison and Moran's
<italic>I</italic>
statistic, respectively. Generally, the NB2 method using the
<italic>t</italic>
-test comparison implementation ranks more highly ICD-9 codes showing spatial variation with smaller ranges, or smaller areas of autocorrelation.</p>
<fig id="f3" fig-type="figure" orientation="portrait" position="float">
<label>
<bold>FIG. 3.</bold>
</label>
<caption>
<p>Average semivariogram properties for groups of top
<italic>N</italic>
ICD-9 codes by the NB2 and Moran's
<italic>I</italic>
methods. We show the mean range (left) and mean sill (right) for both the NB2 method (black points; top:
<italic>t</italic>
-test implementation, bottom: log odds implementation) and Moran's
<italic>I</italic>
statistic (gray crosses) plotted against the rank threshold
<italic>N</italic>
. Compared to Moran's
<italic>I</italic>
, the NB2 method with
<italic>t</italic>
-test implementation ranks more highly spatial variation with smaller ranges (autocorrelation within smaller distances) and larger sills (greater variance), whereas the NB2 method with log odds implementation ranks more highly spatial variation with larger ranges (autocorrelation within larger distances) and smaller sills (lower variance).</p>
</caption>
<graphic xlink:href="fig-3"></graphic>
</fig>
<p>In the bottom of
<xref ref-type="fig" rid="f3">Figure 3</xref>
(left), for comparison we show the same plot of the highest ranking semivariogram range properties for the NB2 method with log odds comparison (black) and Moran's
<italic>I</italic>
statistic (gray). Generally, the NB2 method using the log odds comparison implementation ranks more highly ICD-9 codes showing spatial variation with larger ranges, or larger regions of autocorrelation.</p>
<p>In the top of
<xref ref-type="fig" rid="f3">Figure 3</xref>
(right), we show the mean semivariogram sill versus the rank threshold
<italic>N</italic>
using the NB2
<italic>t</italic>
-test implementation (black) and Moran's
<italic>I</italic>
statistic (gray). This essentially shows an estimate of the average variance in incidence rates across the United States for the top
<italic>N</italic>
ranked ICD-9 codes by each method. For example, the mean sill for the top 25 ICD-9 codes ranked by the NB2 method is 0.85 versus 0.67 for the top 25 Moran's
<italic>I</italic>
statistic rankings. For the top 100 ICD-9 codes, the mean semivariogram model sill is 0.52 and 0.42 for the NB2 method
<italic>t</italic>
-test implementation and Moran's
<italic>I</italic>
statistic, respectively. In this case the NB2
<italic>t</italic>
-test implementation generally ranks more highly ICD-9 codes with larger variance in the incidence rates across the United States.</p>
<p>In the bottom of
<xref ref-type="fig" rid="f3">Figure 3</xref>
(right), we show the same plot of the highest ranking semivariogram sill properties for the NB2 method with log odds comparison (black) and Moran's
<italic>I</italic>
statistic (gray). Generally, the NB2 method using the log odds comparison implementation ranks more highly the autocorrelated ICD-9 codes with smaller variance.</p>
</sec>
</sec>
<sec sec-type="discussion" id="s009">
<title>Discussion</title>
<p>There are many possible explanations for spatial patterns in the incidence rates of ICD-9 EMR data, and the rank ordering of ICD-9 codes with the described methods does not attempt to attribute any inferred pattern to a specific cause or suggest that the spatial variation is due to a physical environmental factor. Rather, we provide here a spatial autocorrelation method that can be implemented in multiple ways depending on the type of spatial pattern of interest. This flexibility is useful given that different categories of underlying factors as well as categories of disease can manifest as different spatial patterns, as we discuss below.</p>
<p>Applying the NB2 algorithm to this dataset identified various known geospatial disease patterns. For example, histoplasmosis (ICD-9 code 115) is known to be associated with bats in caves around the Ohio and Mississippi River valleys,
<sup>
<xref rid="B37" ref-type="bibr">37</xref>
</sup>
and this pattern was picked up on the fifth row of
<xref ref-type="table" rid="T1">Table 1</xref>
. As another example, hypertensive heart disease (ICD-9 code 402) and essential hypertension (ICD-9 code 401) are known to follow a geospatial “heart failure belt.”
<sup>
<xref rid="B38" ref-type="bibr">38</xref>
</sup>
Hypertensive heart disease is the highest rank under the NB2
<italic>t</italic>
-test in
<xref ref-type="table" rid="T1">Table 1</xref>
and essential hypertension is the highest rank under NB2 using the log odds test in
<xref ref-type="table" rid="T2">Table 2</xref>
. On the other hand, the underlying reasons for many of the other highly ranked ICD-9 codes remain to be investigated.</p>
<p>Incidence levels of diseases are influenced by a variety of factors, including:
<list list-type="simple">
<list-item>
<p>• 
<italic>Physical environment</italic>
—Some diseases are known to be related to the physical environment.</p>
</list-item>
<list-item>
<p>• 
<italic>Socioeconomic environment</italic>
—The incidence levels of some diseases are impacted by socioeconomic or regional cultural differences.</p>
</list-item>
<list-item>
<p>• 
<italic>Structural environment</italic>
—The incidence levels of some diseases reflect in part geospatial differences in insurance, provider billing or reimbursement patterns, local regulations, and related factors.</p>
</list-item>
</list>
</p>
<p>We show several incidence rate maps in
<xref ref-type="fig" rid="f4">Figure 4</xref>
as examples of patterns corresponding to these three types. These ICD-9 codes are all ranked in the top 25 according to at least one implementation of the NB2 method. In the top left is a map showing ICD-9 code 088: Other arthropod-borne diseases, which includes Lyme disease, a disease carried by ticks and known to have a regional concentration in the northeastern United States and western Wisconsin areas. We consider this as an example of an ICD-9 code with spatial variation due to the
<italic>physical environment</italic>
. In the top right is a map showing ICD-9 code 635: Legally induced abortion. The spatial variation for this ICD-9 code shows clear delineation of the borders between states, which is likely to be due to differences in the
<italic>structural environment</italic>
. The delineation is particularly apparent on the borders between California and Nevada and New York and Pennsylvania. In the bottom left of
<xref ref-type="fig" rid="f4">Figure 4</xref>
is a map showing ICD-9 code 402: Hypertensive heart disease, which is the ICD-9 code ranked highest by the NB2 method. The spatial variation shows a pattern of higher incidence rates across a large crescent in the southern United States. Given this cross-state regionally concentrated pattern, we consider this to be an example of differences in the
<italic>socioeconomic environment</italic>
. In the bottom right we show a map of ICD-9 code 763: Fetus or newborn affected by other complications of labor and delivery, which is not as easily classified as the previous three examples.</p>
<fig id="f4" fig-type="figure" orientation="portrait" position="float">
<label>
<bold>FIG. 4.</bold>
</label>
<caption>
<p>Incidence maps for several ICD-9 codes with different types of spatial variation. We show here 088: other arthropod-borne diseases (top left); 635: legally induced abortion (top right); 402: hypertensive heart disease (bottom left); and 763: fetus or newborn affected by other complications of labor and delivery (bottom right). For ease of reading, the figure can be viewed online at
<uri xlink:type="simple" xlink:href="http://www.liebertpub.com/big">www.liebertpub.com/big</uri>
</p>
</caption>
<graphic xlink:href="fig-4"></graphic>
</fig>
<p>As can be seen from
<xref ref-type="table" rid="T1">Tables 1</xref>
to 3, for patterns with hot spots that are large, well isolated, and sharply peaked (e.g., see ICD-9 code 402 in
<xref ref-type="table" rid="T1">Table 1</xref>
), all three methods rank such patterns high in the list. On the other hand, for patterns that are more diffuse or that have multiple smaller peaks closer together, the NB2 method with the
<italic>t</italic>
-test ranks such patterns higher than Moran's
<italic>I</italic>
test (e.g., see ICD-9 code 763 in
<xref ref-type="table" rid="T1">Table 1</xref>
).</p>
<sec id="s010">
<title>Characteristics of disease categories</title>
<p>In building semivariogram models describing the spatial variation for each ICD-9 code, we also looked at the model properties for categories of disease collectively. We grouped the ICD-9 codes according to standard categories, for example, 001–139 Infectious and Parasitic Diseases, 140–239 Neoplasms, etc. For each group we found the mean semivariogram model range, excluding ICD-9 codes where the semivariogram model range fit failed to iterate beyond the initial starting value, which leaves 286 individual ICD-9 codes in 17 categories. In
<xref ref-type="fig" rid="f5">Figure 5</xref>
, we show a box plot of the semivariogram model ranges for each category ordered by increasing mean range.</p>
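<p>The grouping step described above can be sketched in Python as follows. The chapter boundaries are the standard ICD-9 numeric chapters; the function names and the convention of marking a failed semivariogram fit as None are assumptions made for this illustration, and the supplementary V and E codes are not handled.</p>
<preformat>
```python
from collections import defaultdict

# Standard ICD-9 chapter boundaries (first code, last code, chapter name).
ICD9_CHAPTERS = [
    (1, 139, "Infectious and Parasitic Diseases"),
    (140, 239, "Neoplasms"),
    (240, 279, "Endocrine, Nutritional and Metabolic Diseases, and Immunity Disorders"),
    (280, 289, "Diseases of the Blood and Blood-forming Organs"),
    (290, 319, "Mental Disorders"),
    (320, 389, "Diseases of the Nervous System and Sense Organs"),
    (390, 459, "Diseases of the Circulatory System"),
    (460, 519, "Diseases of the Respiratory System"),
    (520, 579, "Diseases of the Digestive System"),
    (580, 629, "Diseases of the Genitourinary System"),
    (630, 679, "Complications of Pregnancy, Childbirth, and the Puerperium"),
    (680, 709, "Diseases of the Skin and Subcutaneous Tissue"),
    (710, 739, "Diseases of the Musculoskeletal System and Connective Tissue"),
    (740, 759, "Congenital Anomalies"),
    (760, 779, "Certain Conditions Originating in the Perinatal Period"),
    (780, 799, "Symptoms, Signs, and Ill-defined Conditions"),
    (800, 999, "Injury and Poisoning"),
]


def chapter_of(code):
    """Return the chapter name for a three-digit numeric ICD-9 code."""
    number = int(code)
    for first, last, name in ICD9_CHAPTERS:
        if last >= number >= first:
            return name
    raise ValueError(f"no chapter for ICD-9 code {code!r}")


def mean_range_by_chapter(model_ranges):
    """Average fitted range per chapter, ordered by increasing mean range."""
    grouped = defaultdict(list)
    for code, range_km in model_ranges.items():
        # Assumption: a failed fit is recorded as None and excluded.
        if range_km is not None:
            grouped[chapter_of(code)].append(range_km)
    means = {name: sum(v) / len(v) for name, v in grouped.items()}
    return dict(sorted(means.items(), key=lambda item: item[1]))
```
</preformat>
<p>The returned mapping is ordered by increasing mean range, which matches the ordering used for the categories in the box plot.</p>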
<fig id="f5" fig-type="figure" orientation="portrait" position="float">
<label>
<bold>FIG. 5.</bold>
</label>
<caption>
<p>Box plot of variogram model ranges for categories of ICD-9 codes. The categories are in increasing order by the average variogram model range (marked by red triangles) for each category.</p>
</caption>
<graphic xlink:href="fig-5"></graphic>
</fig>
<p>There is no correlation between the mean semivariogram model ranges of categories and the number of ICD-9 codes grouped into each category. The categories with the fewest remaining ICD-9 codes are Symptoms, Signs, and Ill-defined Conditions (three codes), Diseases of the Blood and Blood-forming Organs (six codes), and Diseases of the Skin and Subcutaneous Tissue (eight codes). The categories of Neoplasms and Infectious and Parasitic Diseases have the most codes with 29 and 26 codes, respectively. However, the diseases with the smallest ranges generally also have low mean incidence rates across the United States.</p>
<p>Given the variation of typical range values across different disease categories, one or the other presented implementation of the NB2 method may be appropriate for the detection of a spatial pattern for the type of disease of interest.</p>
</sec>
</sec>
<sec sec-type="conclusions" id="s011">
<title>Conclusions</title>
<p>We have described here a bootstrap method that can be implemented in multiple ways for detecting patterns in spatial variation based upon a region's neighbors. The NB2 method is a procedure for quantifying how much more accurate an estimate of the value of interest is based on values from bootstrapped neighboring units than bootstrapped randomly chosen units.</p>
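<p>The general idea stated above can be illustrated with a short Python sketch. It is a simplified stand-in and not either of the NB2 implementations compared in this article: instead of a t-test or log odds comparison it reports a plain difference in mean absolute error, and the function names, the number of bootstrap draws, and the choice of drawing as many random units as there are neighbors are all assumptions made for illustration.</p>
<preformat>
```python
import random
import statistics


def bootstrap_errors(target, pool, n_boot, rng):
    """Absolute errors of resampled-mean estimates of target drawn from pool."""
    errors = []
    for _ in range(n_boot):
        sample = [rng.choice(pool) for _ in pool]  # resample with replacement
        errors.append(abs(statistics.fmean(sample) - target))
    return errors


def neighbor_vs_random(values, neighbors, n_boot=200, seed=0):
    """Mean error from neighbors minus mean error from random units.

    A negative result means neighbors estimate a unit's value more accurately.
    """
    rng = random.Random(seed)
    units = list(values)
    diffs = []
    for unit in units:
        nbrs = neighbors.get(unit, [])
        if not nbrs:
            continue  # units without neighbors are skipped
        others = [u for u in units if u != unit]
        random_units = rng.sample(others, len(nbrs))
        err_n = bootstrap_errors(values[unit], [values[u] for u in nbrs], n_boot, rng)
        err_r = bootstrap_errors(values[unit], [values[u] for u in random_units], n_boot, rng)
        diffs.append(statistics.fmean(err_n) - statistics.fmean(err_r))
    return statistics.fmean(diffs)
```
</preformat>
<p>For values that vary smoothly across adjacent units, the returned difference is expected to be negative, since neighboring units then give the more accurate estimate.</p>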
<p>We have compared two implementations of the NB2 method to Moran's
<italic>I</italic>
statistic for measuring spatial autocorrelation. Generally, the NB2 method and Moran's
<italic>I</italic>
statistic are in rough agreement, although with some scatter and some notable differences. Looking at the rank orderings of ICD-9 code county incidence rates across the United States ranked by the NB2 method and by Moran's
<italic>I</italic>
statistic shows that, by choosing one or the other implementation of the NB2 method, we can favor spatial variation with autocorrelation within smaller distances or of larger scale. Compared with Moran's
<italic>I</italic>
statistic, the NB2 method allows more flexibility in controlling the type of spatial autocorrelation of interest.</p>
<p>Compared to Moran's
<italic>I</italic>
statistic, the NB2 method using the
<italic>t</italic>
-test comparison ranks more highly the ICD-9 codes that appear to have multiple small clusters over a region whereas the NB2 method using a log odds comparison ranks more highly the ICD-9 codes with large regional gradients. We also compared the spatial properties of categories of disease by looking at the mean fitted semivariogram properties of each category and found that different categories of disease as a whole may have larger or smaller size scales of autocorrelation, as measured by average semivariogram model ranges. For example, ICD-9 codes related to conditions originating in the perinatal period generally have spatial variation that is autocorrelated within smaller distance ranges than ICD-9 codes related to diseases of the blood and blood-forming organs. Given this difference in spatial variation scale, one or the other implementation of the NB2 method may be more appropriate depending on the category of disease.</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="SD1">
<caption>
<title>Supplemental data</title>
</caption>
<media mimetype="application" mime-subtype="pdf" xlink:href="Supp_Data.pdf" orientation="portrait" xlink:type="simple" id="d38e2645" position="anchor"></media>
</supplementary-material>
</sec>
</body>
<back>
<sec id="s012" sec-type="ack">
<title>Acknowledgments</title>
<p>This material is based in part upon work supported by the National Science Foundation under grant number 1129076 and The National Cancer Institute (NCI) under Contract NIH/Leidos Biomedical Research, Inc. 13XS021/HHSN261200800001E. This work made use of the Open Science Data Cloud (OSDC), managed by the Open Commons Consortium (OCC) and funded in part by grants from the Gordon and Betty Moore Foundation.
<sup>
<xref rid="B39" ref-type="bibr">39</xref>
</sup>
</p>
</sec>
<sec id="s013" sec-type="COI-statement">
<title>Author Disclosure Statement</title>
<p>No competing financial interests exist.</p>
</sec>
<ref-list content-type="parsed">
<title>References</title>
<ref id="B1">
<label>1</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Friedman</surname>
<given-names>DJ</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Parrish</surname>
<given-names>RG</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Ross</surname>
<given-names>DA</given-names>
</name>
</person-group>
<article-title>Electronic health records and US public health: Current realities and future promise</article-title>
.
<source>Am J Public Health</source>
.
<year>2013</year>
;
<volume>103</volume>
:
<fpage>1560</fpage>
<lpage>1567</lpage>
<pub-id pub-id-type="pmid">23865646</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<label>2</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Murdoch</surname>
<given-names>T</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Detsky</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>The inevitable application of big data to health care</article-title>
.
<source>JAMA</source>
.
<year>2013</year>
;
<volume>309</volume>
:
<fpage>1351</fpage>
<lpage>1352</lpage>
<pub-id pub-id-type="pmid">23549579</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<label>3</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Brownstein</surname>
<given-names>JS</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Freifeld</surname>
<given-names>CC</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Madoff</surname>
<given-names>LC</given-names>
</name>
</person-group>
<article-title>Digital disease detection: Harnessing the web for public health surveillance</article-title>
.
<source>N Engl J Med</source>
.
<year>2009</year>
;
<volume>360</volume>
:
<fpage>2153</fpage>
<lpage>2157</lpage>
<pub-id pub-id-type="pmid">19423867</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<label>4</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Salathe</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Bengtsson</surname>
<given-names>L</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Bodnar</surname>
<given-names>TJ</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Digital epidemiology</article-title>
.
<source>PLoS Comput Biol</source>
.
<year>2012</year>
;
<volume>8</volume>
:
<fpage>e1002616</fpage>
</mixed-citation>
</ref>
<ref id="B5">
<label>5</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Generous</surname>
<given-names>N</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Fairchild</surname>
<given-names>G</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Deshpande</surname>
<given-names>A</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Global disease monitoring and forecasting with Wikipedia</article-title>
.
<source>PLoS Comput Biol</source>
.
<year>2014</year>
;
<volume>10</volume>
:
<fpage>e1003892</fpage>
</mixed-citation>
</ref>
<ref id="B6">
<label>6</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Elliott</surname>
<given-names>P</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Wartenberg</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Spatial epidemiology: Current approaches and future challenges</article-title>
.
<source>Environ Health Perspect</source>
.
<year>2004</year>
;
<volume>112</volume>
:
<fpage>998</fpage>
<lpage>1006</lpage>
<pub-id pub-id-type="pmid">15198920</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<label>7</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Rushton</surname>
<given-names>G</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Armstrong</surname>
<given-names>MP</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Gittler</surname>
<given-names>J</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Geocoding health data: The use of geographic codes in cancer prevention and control, research and practice</article-title>
.
<publisher-name>CRC Press</publisher-name>
<year>2007</year>
</mixed-citation>
</ref>
<ref id="B8">
<label>8</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Beale</surname>
<given-names>L</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Abellan</surname>
<given-names>JJ</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Hodgson</surname>
<given-names>S</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Jarup</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Methodologic issues and approaches to spatial epidemiology</article-title>
.
<source>Environ Health Perspect</source>
.
<year>2008</year>
;
<volume>116</volume>
:
<fpage>1105</fpage>
<lpage>1110</lpage>
<pub-id pub-id-type="pmid">18709139</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<label>9</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Cromley</surname>
<given-names>EK</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>McLafferty</surname>
<given-names>SL</given-names>
</name>
</person-group>
<article-title>GIS and public health</article-title>
.
<publisher-name>Guilford Press</publisher-name>
<year>2011</year>
</mixed-citation>
</ref>
<ref id="B10">
<label>10</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Noble</surname>
<given-names>D</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Smith</surname>
<given-names>D</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Mathur</surname>
<given-names>R</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Feasibility study of geospatial mapping of chronic disease risk to inform public health commissioning</article-title>
.
<source>BMJ Open</source>
.
<year>2012</year>
;
<volume>2</volume>
:
<fpage>e000711</fpage>
</mixed-citation>
</ref>
<ref id="B11">
<label>11</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Becker</surname>
<given-names>KM</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Glass</surname>
<given-names>GE</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Brathwaite</surname>
<given-names>W</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Geographic epidemiology of gonorrhea in Baltimore, Maryland, using a geographic information system</article-title>
.
<source>Am J Epidemiol</source>
.
<year>1998</year>
;
<volume>147</volume>
:
<fpage>709</fpage>
<lpage>716</lpage>
<pub-id pub-id-type="pmid">9554611</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<label>12</label>
<mixed-citation publication-type="web">
<person-group person-group-type="author">
<name>
<surname>Tiefelsdorf</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Modelling spatial processes: The identification and analysis of spatial relationships in regression residuals by means of Moran's I (Germany)</article-title>
. Theses and Dissertations (Comprehensive), 480,
<year>1998</year>
Available at:
<uri xlink:type="simple" xlink:href="http://scholars.wlu.ca/etd/480">http://scholars.wlu.ca/etd/480</uri>
</mixed-citation>
</ref>
<ref id="B13">
<label>13</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moran</surname>
<given-names>PA</given-names>
</name>
</person-group>
<article-title>Notes on continuous stochastic phenomena</article-title>
.
<source>Biometrika</source>
.
<year>1950</year>
;
<volume>37</volume>
:
<fpage>17</fpage>
<lpage>23</lpage>
<pub-id pub-id-type="pmid">15420245</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<label>14</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Ripley</surname>
<given-names>BD</given-names>
</name>
</person-group>
<article-title>Spatial statistics</article-title>
, vol.
<volume>575</volume>
<publisher-loc>Hoboken, NJ</publisher-loc>
:
<publisher-name>John Wiley &amp; Sons</publisher-name>
<year>2005</year>
</mixed-citation>
</ref>
<ref id="B15">
<label>15</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kitron</surname>
<given-names>U</given-names>
</name>
</person-group>
<article-title>Landscape ecology and epidemiology of vector-borne diseases: Tools for spatial analysis</article-title>
.
<source>J Med Entomol</source>
.
<year>1998</year>
;
<volume>35</volume>
:
<fpage>435</fpage>
<lpage>445</lpage>
<pub-id pub-id-type="pmid">9701925</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<label>16</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hay</surname>
<given-names>S</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Omumbo</surname>
<given-names>J</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Craig</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Snow</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Earth observation, geographic information systems and Plasmodium falciparum malaria in sub-Saharan Africa</article-title>
.
<source>Adv Parasitol</source>
.
<year>2000</year>
;
<volume>47</volume>
:
<fpage>173</fpage>
<lpage>215</lpage>
<pub-id pub-id-type="pmid">10997207</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<label>17</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Moonan</surname>
<given-names>PK</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Using GIS technology to identify areas of tuberculosis transmission and incidence</article-title>
.
<source>Int J Health Geogr</source>
.
<year>2004</year>
;
<volume>3</volume>
:
<fpage>2</fpage>
<lpage>3</lpage>
<pub-id pub-id-type="pmid">14748926</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<label>18</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sasaki</surname>
<given-names>S</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Suzuki</surname>
<given-names>H</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Igarashi</surname>
<given-names>K</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Spatial analysis of risk factor of cholera outbreak for 2003–2004 in a peri-urban area of Lusaka, Zambia</article-title>
.
<source>Am J Trop Med Hyg</source>
.
<year>2008</year>
;
<volume>79</volume>
:
<fpage>414</fpage>
<lpage>421</lpage>
<pub-id pub-id-type="pmid">18784235</pub-id>
</mixed-citation>
</ref>
<ref id="B19">
<label>19</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kamadjeu</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Tracking the polio virus down the Congo River: A case study on the use of Google Earth in public health planning and mapping</article-title>
.
<source>Int J Health Geogr</source>
.
<year>2009</year>
;
<volume>8</volume>
:
<fpage>4</fpage>
<pub-id pub-id-type="pmid">19161606</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<label>20</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nuckols</surname>
<given-names>JR</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Ward</surname>
<given-names>MH</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Jarup</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Using geographic information systems for exposure assessment in environmental epidemiology studies</article-title>
.
<source>Environ Health Perspect</source>
.
<year>2004</year>
;
<volume>112</volume>
:
<fpage>1007</fpage>
<lpage>1015</lpage>
<pub-id pub-id-type="pmid">15198921</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<label>21</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weis</surname>
<given-names>BK</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Balshaw</surname>
<given-names>D</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Barr</surname>
<given-names>JR</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Personalized exposure assessment: Promising approaches for human environmental health research</article-title>
.
<source>Environ Health Perspect</source>
.
<year>2005</year>
;
<volume>113</volume>
:
<fpage>840</fpage>
<lpage>848</lpage>
<pub-id pub-id-type="pmid">16002370</pub-id>
</mixed-citation>
</ref>
<ref id="B22">
<label>22</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Huang</surname>
<given-names>Y-L</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Batterman</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Residence location as a measure of environmental exposure: A review of air pollution epidemiology studies</article-title>
.
<source>J Expo Anal Environ Epidemiol</source>
.
<year>1999</year>
;
<volume>10</volume>
:
<fpage>66</fpage>
<lpage>85</lpage>
</mixed-citation>
</ref>
<ref id="B23">
<label>23</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jarup</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Health and environment information systems for exposure and disease mapping, and risk assessment</article-title>
.
<source>Environ Health Perspect</source>
.
<year>2004</year>
;
<volume>112</volume>
:
<fpage>995</fpage>
<lpage>997</lpage>
<pub-id pub-id-type="pmid">15198919</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<label>24</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Graves</surname>
<given-names>BA</given-names>
</name>
</person-group>
<article-title>Integrative literature review: A review of literature related to geographical information systems, healthcare access, and health outcomes</article-title>
.
<source>Perspect Health Inf Manag</source>
.
<year>2008</year>
;
<volume>5</volume>
:
<fpage>5</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">18458788</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<label>25</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Dean</surname>
<given-names>HD</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Fenton</surname>
<given-names>KA</given-names>
</name>
</person-group>
<article-title>Addressing social determinants of health in the prevention and control of HIV/AIDS, viral hepatitis, sexually transmitted infections, and tuberculosis</article-title>
.
<source>Public Health Rep</source>
.
<year>2010</year>
;
<volume>125</volume>
:
<fpage>1</fpage>
</mixed-citation>
</ref>
<ref id="B26">
<label>26</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Harrison</surname>
<given-names>KM</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Dean</surname>
<given-names>HD</given-names>
</name>
</person-group>
<article-title>Use of data systems to address social determinants of health: A need to do more</article-title>
.
<source>Public Health Rep</source>
.
<year>2011</year>
;
<volume>126</volume>
:
<fpage>1</fpage>
</mixed-citation>
</ref>
<ref id="B27">
<label>27</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Gatrell</surname>
<given-names>AC</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Elliott</surname>
<given-names>SJ</given-names>
</name>
</person-group>
<article-title>Geographies of health: An introduction</article-title>
.
<publisher-loc>West Sussex, UK</publisher-loc>
:
<publisher-name>John Wiley &amp; Sons</publisher-name>
<year>2014</year>
</mixed-citation>
</ref>
<ref id="B28">
<label>28</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Luther</surname>
<given-names>SL</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Studnicki</surname>
<given-names>J</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Kromrey</surname>
<given-names>J</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>A method to measure the impact of primary care programs targeted to reduce racial and ethnic disparities in health outcomes</article-title>
.
<source>J Public Health Manag Pract</source>
.
<year>2003</year>
;
<volume>9</volume>
:
<fpage>243</fpage>
<lpage>248</lpage>
<pub-id pub-id-type="pmid">12747322</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<label>29</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Geraghty</surname>
<given-names>EM</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Balsbaugh</surname>
<given-names>T</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Nuovo</surname>
<given-names>J</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Using geographic information systems (GIS) to assess outcome disparities in patients with type 2 diabetes and hyperlipidemia</article-title>
.
<source>J Am Board Fam Med</source>
.
<year>2010</year>
;
<volume>23</volume>
:
<fpage>88</fpage>
<lpage>96</lpage>
<pub-id pub-id-type="pmid">20051547</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<label>30</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Comer</surname>
<given-names>KF</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Grannis</surname>
<given-names>S</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Dixon</surname>
<given-names>BE</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Incorporating geospatial capacity within clinical data systems to address social determinants of health</article-title>
.
<source>Public Health Rep</source>
.
<year>2011</year>
;
<volume>126</volume>
:
<fpage>54</fpage>
</mixed-citation>
</ref>
<ref id="B31">
<label>31</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rodriguez</surname>
<given-names>RA</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Hotchkiss</surname>
<given-names>JR</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>O'Hare</surname>
<given-names>AM</given-names>
</name>
</person-group>
<article-title>Geographic information systems and chronic kidney disease: Racial disparities, rural residence and forecasting</article-title>
.
<source>J Nephrol</source>
.
<year>2013</year>
;
<volume>26</volume>
:
<fpage>3</fpage>
<pub-id pub-id-type="pmid">23065915</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<label>32</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rzhetsky</surname>
<given-names>A</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Bagley</surname>
<given-names>SC</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>K</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>Environmental and state-level regulatory factors affect the incidence of autism and intellectual disability</article-title>
.
<source>PLoS Comput Biol</source>
.
<year>2014</year>
;
<volume>10</volume>
:
<fpage>e1003518</fpage>
</mixed-citation>
</ref>
<ref id="B33">
<label>33</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Klein</surname>
<given-names>RJ</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Schoenborn</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Age adjustment using the 2000 projected US population. Healthy People Statistical Notes</article-title>
.
<publisher-loc>Hyattsville, MD</publisher-loc>
:
<publisher-name>National Center for Health Statistics</publisher-name>
,
<year>2001</year>
;
<fpage>20</fpage>
</mixed-citation>
</ref>
<ref id="B34">
<label>34</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Day</surname>
<given-names>JC</given-names>
</name>
</person-group>
<article-title>Population projections of the United States, by age, sex, race, and Hispanic origin: 1992 to 2050</article-title>
.
<series>P25-1092</series>
,
<publisher-name>US Department of Commerce, Economics and Statistics Administration, Bureau of the Census</publisher-name>
<publisher-loc>Washington, DC</publisher-loc>
,
<year>1992</year>
</mixed-citation>
</ref>
<ref id="B35">
<label>35</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Waller</surname>
<given-names>LA</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Gotway</surname>
<given-names>CA</given-names>
</name>
</person-group>
<article-title>Applied spatial statistics for public health data</article-title>
, vol.
<volume>368</volume>
<publisher-loc>Hoboken, NJ</publisher-loc>
:
<publisher-name>John Wiley &amp; Sons</publisher-name>
<year>2004</year>
</mixed-citation>
</ref>
<ref id="B36">
<label>36</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Bivand</surname>
<given-names>RS</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Pebesma</surname>
<given-names>E</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Gómez-Rubio</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>Applied spatial data analysis with R</article-title>
.
<publisher-loc>New York</publisher-loc>
:
<publisher-name>Springer</publisher-name>
<year>2013</year>
</mixed-citation>
</ref>
<ref id="B37">
<label>37</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Benedict</surname>
<given-names>K</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Mody</surname>
<given-names>RK</given-names>
</name>
</person-group>
<article-title>Epidemiology of histoplasmosis outbreaks, United States, 1938–2013</article-title>
.
<source>Emerg Infect Dis</source>
.
<year>2016</year>
;
<volume>22</volume>
:
<fpage>370</fpage>
</mixed-citation>
</ref>
<ref id="B38">
<label>38</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Mujib</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Feller</surname>
<given-names>MA</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Ahmed</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Evidence of a “heart failure belt” in the southeastern United States</article-title>
.
<source>Am J Cardiol</source>
.
<year>2011</year>
;
<volume>107</volume>
:
<fpage>935</fpage>
<lpage>937</lpage>
<pub-id pub-id-type="pmid">21247536</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<label>39</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Grossman</surname>
<given-names>RL</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Greenway</surname>
<given-names>M</given-names>
</name>
</person-group>
,
<person-group person-group-type="author">
<name>
<surname>Heath</surname>
<given-names>AP</given-names>
</name>
,
<etal>et al.</etal>
</person-group>
<article-title>The design of a community science cloud: The open science data cloud perspective</article-title>
. In:
<publisher-name>SC Companion, IEEE Computer Society</publisher-name>
,
<year>2012</year>
pp.
<fpage>1051</fpage>
<lpage>1057</lpage>
</mixed-citation>
</ref>
</ref-list>
<ref-list>
<p>
<bold>Cite this article as:</bold>
Patterson MT, Grossman RL (2017) Detecting spatial patterns of disease in large collections of electronic medical records using neighbor-based bootstrapping.
<italic>Big Data</italic>
5:3, 213–224, DOI: 10.1089/big.2017.0028.</p>
</ref-list>
<glossary>
<title>Abbreviations Used</title>
<def-list>
<def-item>
<term id="G1">EMR</term>
<def>
<p>electronic medical records</p>
</def>
</def-item>
<def-item>
<term id="G2">NB2</term>
<def>
<p>neighbor-based bootstrapping</p>
</def>
</def-item>
</def-list>
</glossary>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/SidaSubSaharaV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002287 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 002287 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    SidaSubSaharaV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:5647508
   |texte=   Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:28933946" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a SidaSubSaharaV1 


This area was generated with Dilib version V0.6.32.
Data generation: Mon Nov 13 19:31:10 2017. Site generation: Wed Mar 6 19:14:32 2024