Assessing local patterns of spatial heteroskedasticity

As outlined by Ord and Getis, consider a 10 x 10 grid of property values. Within this grid, there is a central high-rent district (identified by cells with a value of 1) and a surrounding lower-value area (identified by cells with a value of 0). We can visualize this spatial arrangement as:

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

While we can see that the inner core of the high-rent district has values similar to those of its neighbors (i.e., cells of 1 surrounded by other cells of 1), this is less true for the outer rim of the high-rent district. This rim represents a possible transition region between the stable inner high-rent area and the stable outer lower-value area. Ord and Getis use the LOSH statistic (\(H_i\)) to identify this transitional region between the two areas.
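As a minimal sketch (not part of the original example, and using our own variable names), this grid and a contiguity structure for it could be built with numpy and libpysal before being passed to the LOSH estimator introduced below:

import numpy as np
import libpysal

# Rebuild the 10 x 10 grid: a 4 x 4 high-rent core of 1s surrounded by 0s
grid = np.zeros((10, 10))
grid[3:7, 3:7] = 1

# Queen contiguity weights for a regular 10 x 10 lattice
w_grid = libpysal.weights.lat2W(10, 10, rook=False)

# grid.flatten() and w_grid can then be passed to the LOSH estimator
# shown later in this notebook.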

Understanding the LOSH statistic

Ord and Getis begin by defining a local mean, \(\bar{x}_i (d)\), as a re-scaled form of their \(G^*_i\) statistic. This local mean is:

Eq. 1

\[\bar{x}_i (d) = \frac{\sum_j w_{ij} (d) x_j}{\sum_j w_{ij}(d)}\]
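As an illustrative aside (the toy weights and values below are our own, not from Ord and Getis), Eq. 1 amounts to a weighted average over each unit's neighborhood:

import numpy as np

# Toy example: a binary weight matrix for four units (self-neighbors included,
# as in the G*_i formulation) and their attribute values
W = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
x = np.array([2.0, 4.0, 6.0, 8.0])

# Eq. 1: the local mean is the weighted sum of values divided by the sum of weights
x_bar = (W @ x) / W.sum(axis=1)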

With a local mean defined, we can understand local residuals as the value of a local unit minus the local mean, or:

Eq. 2

\[e_j(d) = x_j - \bar{x}_i(d), j \in N(i,d)\]

These local residuals are then substituted for the raw values in Eq. 1 to form a final local dispersion statistic, \(H_i\):

Eq. 3

\[H_i(d) = \frac{\sum_j w_{ij}(d) |e_j(d)|^a}{\sum_j w_{ij}(d)}\]

Note the addition of a new parameter, \(a\), which controls how the residuals are measured. As explained by Ord and Getis:

When \(a = 1\), we have an absolute deviations measure, \(H_{i1}\), and when \(a = 2\) a variance measure, \(H_{i2}\). Clearly, other choices are possible, along with various robust forms to avoid outliers. In order to produce a standard measure, we should divide by the mean absolute deviation or variance for the whole data set.

The default settings of the losh() function set \(a=2\).
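Putting Eqs. 1-3 together, a minimal from-scratch sketch (again with toy data and \(a = 2\), matching the default noted above) might look like:

import numpy as np

# Toy weights (self-neighbors included) and values, as in the earlier sketch
W = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
x = np.array([2.0, 4.0, 6.0, 8.0])
a = 2

# Eq. 1: local means
x_bar = (W @ x) / W.sum(axis=1)

# Eq. 2: residuals e_j(d) = x_j - x_bar_i(d); row i holds the residuals of
# every unit j relative to the local mean of unit i
e = x[np.newaxis, :] - x_bar[:, np.newaxis]

# Eq. 3: weighted average of |e_j|^a over each neighborhood
H = (W * np.abs(e) ** a).sum(axis=1) / W.sum(axis=1)

# As the quoted passage notes, dividing by the corresponding measure for the
# whole data set (the variance when a = 2) would standardize H_i.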

Interpreting the LOSH statistic

Ord and Getis suggest that the \(H_i\) statistic benefits from joint interpretation with the \(G^*_i\) statistic: the \(G^*_i\) value speaks to the local mean of the phenomenon of interest, whereas \(H_i\) speaks to the local heterogeneity. Ord and Getis provide a simplified guide for interpretation that crosses the local mean (\(|G^*_i|\)) with the local variance (\(H_i\)):

- Large \(|G^*_i|\), high \(H_i\): a hot spot with heterogeneous local conditions.

- Large \(|G^*_i|\), low \(H_i\): a hot spot with similar surrounding areas; the map would indicate whether the affected region is larger than the single ‘cell’.

- Small \(|G^*_i|\), high \(H_i\): heterogeneous local conditions but at a low average level (an unlikely event).

- Small \(|G^*_i|\), low \(H_i\): homogeneous local conditions and a low average level.
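To make this guide concrete, here is a hedged sketch (the thresholds and category labels are our own choices, not prescribed by Ord and Getis) that cross-classifies \(G^*_i\) z-scores and \(H_i\) values for the illustrative lattice built earlier:

import numpy as np
import libpysal
import esda
from esda.losh import LOSH

# Rebuild the illustrative 10 x 10 lattice and its weights from the earlier sketch
grid = np.zeros((10, 10))
grid[3:7, 3:7] = 1
y = grid.flatten()
w = libpysal.weights.lat2W(10, 10, rook=False)

# Local mean (G*_i, via its z-scores) and local dispersion (H_i)
g = esda.G_Local(y, w, star=True)
h = LOSH(connectivity=w, inference="chi-square").fit(y)

# Illustrative cutoffs for "large" |G*_i| and "high" H_i
large_g = np.abs(g.Zs) > 1.96
high_h = h.Hi > np.median(h.Hi)

# Assign each cell to one of the four combinations described above
category = np.select(
    [large_g & high_h, large_g & ~high_h, ~large_g & high_h],
    ["hot spot, heterogeneous", "hot spot, homogeneous", "low level, heterogeneous"],
    default="low level, homogeneous",
)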

Inference on the LOSH statistic

The current inference in the PySAL implementation of the LOSH statistic uses a \(\chi^2\) approximation. For each unit, we calculate a test statistic \(Z_i\) as:

\[Z_i = 2H_i / V_i\]

which follows a \(\chi^2\) distribution with \(2/V_i\) degrees of freedom. \(V_i\) is the variance of \(H_i\) for the unit, calculated as:

\[Var_{p}(H_i) = \frac{1}{n-1} \left( \frac{1}{h_1 W_{i1}}\right) ^2 (h_2 - h_1^2) \left[nW_{i2} - W_{i1}^2\right]\]
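As a small numeric sketch of this inference (the \(H_i\) and variance values below are made up purely for illustration), the p-value follows from the \(\chi^2\) survival function:

import numpy as np
from scipy import stats

# Made-up H_i values and variances for three hypothetical units
Hi = np.array([0.20, 0.52, 0.80])
VarHi = np.array([0.05, 0.08, 0.12])

# Test statistic and degrees of freedom from the formulas above
Zi = 2 * Hi / VarHi
dof = 2 / VarHi

# Upper-tail p-values under the chi-square approximation
pval = stats.chi2.sf(Zi, dof)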

Applying the LOSH statistic on a dataset

We first load the Bostonhsg example dataset from libpysal:

[1]:
import libpysal
import geopandas
boston = libpysal.examples.load_example('Bostonhsg')
boston_ds = geopandas.read_file(boston.get_path('boston.shp'))

Then we construct a Queen weight structure:

[2]:
w = libpysal.weights.Queen.from_dataframe(boston_ds)

We can now import the LOSH estimator. Note that it follows the scikit-learn estimator pattern, so we first instantiate it with a set of arguments and then call .fit().

[3]:
from esda.losh import LOSH
ls = LOSH(connectivity=w, inference="chi-square").fit(boston_ds['NOX'])

We can now examine the LOSH (\(H_i\)) values and their significance.

[4]:
ls.Hi[0:10]
[4]:
array([0.19690679, 0.51765774, 0.80382881, 0.80854441, 0.530667  ,
       0.525579  , 0.83291425, 0.84215733, 0.48875154, 0.41955327])
[5]:
ls.pval[0:10]
[5]:
array([0.86292242, 0.61157688, 0.45697742, 0.34426167, 0.57934554,
       0.55430556, 0.4135546 , 0.40999792, 0.54025022, 0.57801529])

If we want to map the LOSH (\(H_i\)) values, we need to add them back to the boston_ds dataframe.

[6]:
boston_ds['Hi'] = ls.Hi
boston_ds['Hi_pval'] = ls.pval

We can now map the LOSH (\(H_i\)) values. Note that we choose a Quantile break scheme here for visualization purposes.

[7]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
boston_ds.plot(column='Hi', scheme='Quantiles',
               k=5, cmap='GnBu',
               legend=True, ax=ax)
[7]:
<AxesSubplot:>
[Figure: quantile choropleth map of the LOSH \(H_i\) values]

We can also examine the significance of the \(H_i\) values, using cutoffs of 0.01, 0.05, and 0.10 (all values above 0.10 fall in the final class).

[8]:
fig, ax = plt.subplots(figsize=(12,10), subplot_kw={'aspect':'equal'})
boston_ds.plot(column='Hi_pval', cmap='RdBu',
               legend=True, ax=ax,
               scheme='user_defined',
               classification_kwds={'bins':[0.01, 0.05, 0.10, 1]})
[8]:
<AxesSubplot:>
[Figure: map of \(H_i\) p-values classified with bins at 0.01, 0.05, 0.10, and 1]