Bivariate Spatial Association (Lee's L) (Spatial Statistics)

Summary

Calculates the spatial association between two continuous variables using the Lee's L statistic.

The Lee's L statistic characterizes both the degree of correlation and the degree of copatterning (similarity of spatial clustering) between the variables. The value will be between -1 and 1 and is conceptually similar to a correlation coefficient but is adjusted to account for spatial autocorrelation of the two variables. Lee's L values close to 1 indicate that the variables are highly positively correlated and that each variable has high positive spatial autocorrelation (high and low values of each variable tend to cluster together). Values close to -1 indicate that the variables are highly negatively correlated and that each variable has high positive spatial autocorrelation. Values close to 0 indicate that the variables are uncorrelated, not spatially autocorrelated, or both.
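To make the interpretation of values near 1, -1, and 0 concrete, the following is a minimal sketch of the global Lee's L formula for row-standardized weights that include each feature in its own neighborhood. It is not the tool's implementation: the function names, the one-dimensional chain of features, and the sample values are all assumptions chosen for illustration.

```python
import math


def neighborhood_weights(n):
    """Row-standardized weights for n hypothetical features on a line.

    Each neighborhood holds the feature itself plus its adjacent
    features, and the weights in each row sum to 1.
    """
    weights = []
    for i in range(n):
        members = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
        weights.append({j: 1.0 / len(members) for j in members})
    return weights


def lee_l(x, y, weights):
    """Global Lee's L for row-standardized weights (illustrative only)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Weighted average of each neighborhood (the spatial lag).
    lag_x = [sum(w * x[j] for j, w in row.items()) for row in weights]
    lag_y = [sum(w * y[j] for j, w in row.items()) for row in weights]
    cross = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(lag_x, lag_y))
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    return cross / math.sqrt(ss_x * ss_y)


n = 10
weights = neighborhood_weights(n)
gradient = [float(i + 1) for i in range(n)]  # smooth trend, strongly clustered
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]  # no clustering

l_clustered = lee_l(gradient, gradient, weights)               # close to 1
l_inverse = lee_l(gradient, [-v for v in gradient], weights)   # close to -1
l_checker = lee_l(alternating, alternating, weights)           # close to 0
```

The alternating variable is perfectly correlated with itself, yet its Lee's L is near 0 because it has no positive spatial autocorrelation. This is the "or both" case in the last sentence above.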

The Lee's L statistic can be partitioned among the input features into local Lee's L statistics that show the local spatial association of each feature and its neighbors. The local statistics can be used to identify areas that have higher or lower spatial association than the global Lee's L statistic. The local statistics can also be classified into one of several categories based on the values of the neighbors of each feature. Both the global and local statistics are tested for statistical significance using permutations.
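The partition into local statistics can be sketched as follows, assuming row-standardized weights that include each feature itself. Under that assumption the local values average to the global statistic. The function name `local_lee_l` and the sample data are hypothetical, and the tool's own calculation may differ in detail.

```python
import math


def local_lee_l(x, y, weights):
    """Return one local Lee's L value per feature (illustrative only).

    weights[i] maps neighbor index -> weight; each row includes the
    feature itself and sums to 1.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    scale = math.sqrt(
        sum((v - mean_x) ** 2 for v in x) * sum((v - mean_y) ** 2 for v in y)
    )
    local = []
    for row in weights:
        lag_x = sum(w * x[j] for j, w in row.items())
        lag_y = sum(w * y[j] for j, w in row.items())
        local.append(n * (lag_x - mean_x) * (lag_y - mean_y) / scale)
    return local


# Hypothetical data: six features on a line, low values then high values.
x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y = [2.0, 1.0, 3.0, 9.0, 8.0, 10.0]
weights = []
for i in range(len(x)):
    members = [j for j in (i - 1, i, i + 1) if 0 <= j < len(x)]
    weights.append({j: 1.0 / len(members) for j in members})

local = local_lee_l(x, y, weights)
global_l = sum(local) / len(local)  # mean of the local values
# Features whose local association exceeds the global statistic.
above_global = [i for i, value in enumerate(local) if value > global_l]
```

In this sketch the features at the boundary between the low and high groups have the weakest local association, because their neighborhood averages sit close to the overall means.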

Learn more about how Bivariate Spatial Association (Lee's L) works

Illustration

Bivariate Spatial Association (Lee's L) tool illustration
The two variables on the top row have a positive spatial association, and the two variables on the bottom row have a negative spatial association.

Usage

  • The two analysis variables must be continuous (not binary or categorical), and the variables should have a linear relationship. If the relationship is not linear, use the Transform Field tool to apply transformations to the analysis variables to linearize the relationship and rerun the tool with the transformed values.

  • The tool returns a variety of outputs that allow you to investigate the spatial association between the two analysis variables. The geoprocessing messages display the Lee's L statistic and the p-value, and the output feature class contains fields summarizing the local Lee's L statistics, p-values, and statistical significance results. When run in an active map, the output feature layer will draw based on the local spatial association categories: Not Significant, High-High, Low-Low, High-Low, and Low-High. For example, if the local Lee's L statistic is statistically significant at a confidence level of at least 90 percent, the first analysis variable is higher than the mean value, and the second variable is lower than the mean value, the category will be High-Low.

    Learn more about the outputs of the tool

  • The p-values for testing the global and local spatial associations for statistical significance are calculated using permutations.

  • Use at least 50 input features and include at least 8 neighbors for each feature.

  • The neighborhood of each feature always includes the feature itself. If a spatial weights file is used to define neighbors, a weight of 1 will be assigned from each feature to itself, even if the spatial weights file does not define that weight. The weights of each neighborhood are row standardized so that they sum to 1.

  • The Random Number Generator environment can be used to reproduce the permutations and p-values. If no seed value is specified, the global and local p-values may change due to randomness. However, if the Parallel Processing Factor environment is set to a value larger than 1 (the default), the permutations will not be consistent, even with a fixed seed value of the random number generator.

  • Reversing the order of the two analysis variables will not change the global or local Lee's L statistics, but the p-values may change due to randomness of the permutations. The High-Low and Low-High categories will also reverse.
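The permutation-based p-values mentioned in the usage notes above can be sketched as follows. This is one common two-sided construction that is consistent with the smallest possible p-values listed for the Number of Permutations parameter (for example, 0.02 with 99 permutations). The tool's actual permutation scheme is not described here, so the shuffling of one variable, the placeholder statistic `cross_product` (standing in for Lee's L), and all function names are assumptions.

```python
import random


def smallest_p_value(num_permutations):
    """Smallest two-sided p-value obtainable from a permutation test."""
    return 2.0 / (num_permutations + 1)


def two_sided_p_value(observed, reference):
    """Two-sided pseudo p-value from a list of permuted statistics."""
    at_least = sum(1 for r in reference if r >= observed)
    at_most = sum(1 for r in reference if r <= observed)
    p = 2.0 * (min(at_least, at_most) + 1) / (len(reference) + 1)
    return min(p, 1.0)


def permutation_p_value(x, y, statistic, num_permutations, seed):
    """Shuffle y relative to x and compare the observed statistic."""
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = statistic(x, y)
    shuffled = list(y)
    reference = []
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        reference.append(statistic(x, shuffled))
    return two_sided_p_value(observed, reference)


def cross_product(x, y):
    """Placeholder statistic standing in for Lee's L (assumption)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.5, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.4]

# Repeating the test with the same seed reproduces the same p-value.
p_first = permutation_p_value(x, y, cross_product, 99, seed=42)
p_again = permutation_p_value(x, y, cross_product, 99, seed=42)
```

Because the count of extreme permutations is an integer, every p-value from this construction is a multiple of `smallest_p_value(num_permutations)`. Without a fixed seed, repeated runs can return different p-values for the same data.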

Parameters

Label | Explanation | Data Type
Input Features

The input features containing the fields of the two analysis variables.

Feature Layer
Analysis Field 1

The field of the first analysis variable. The field must be numeric.

Field
Analysis Field 2

The field of the second analysis variable. The field must be numeric.

Field
Output Features

The output features containing the local Lee's L statistics, spatial association categories, p-values, and the weighted averages of the neighbors of each feature.

Feature Class
Neighborhood Type
(Optional)

Specifies how neighbors of each feature will be determined. The feature is always included in the neighborhood, and all neighborhood weights are normalized to sum to 1.

  • Fixed distance band: Features within a specified critical distance of each feature will be included as neighbors. This is the default for point features.
  • K nearest neighbors: The closest k features will be included as neighbors.
  • Contiguity edges only: Polygon features that share an edge will be included as neighbors.
  • Contiguity edges corners: Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features.
  • Delaunay triangulation: Points whose Delaunay triangulation (Thiessen polygons) share an edge or corner will be included as neighbors.
  • Get spatial weights from file: Neighbors and weights will be defined by a spatial weights file.
String
Distance Band
(Optional)

The distance band that will be used to determine neighbors around the focal feature. If no value is provided, the distance will be the shortest distance such that each feature has at least one other neighbor in its neighborhood. For polygons, the distance between centroids will be used to determine neighbors.

Linear Unit
Number of Neighbors
(Optional)

The number of neighbors that will be included in the neighborhood of each feature. The value does not include the feature itself. For example, specifying 6 will use the feature and its six closest neighbors (seven features total). The default is 8. The value must be at least 2.

Long
Weights Matrix File
(Optional)

The path and file name of the spatial weights matrix file that defines the neighbors and weights between features.

File
Local Weighting Scheme
(Optional)

Specifies the weighting scheme that will be applied to neighbors when calculating spatial associations.

  • Unweighted: Neighbors will not be weighted. This is the default.
  • Bisquare: Neighbors will be weighted using a bisquare (quartic) kernel.
String
Kernel Bandwidth
(Optional)

The bandwidth for the bisquare kernel. The bandwidth defines how quickly the weights decrease with distance. Larger bandwidths will provide comparatively larger weights to neighbors that are farther away from the feature. For the k nearest neighbors neighborhood, the default value (empty) will use an adaptive bandwidth equal to the distance to the (k+1)th neighbor of the focal feature. For the fixed distance band neighborhood, the default (empty) will use the same value as the distance band.

Linear Unit
Number of Permutations
(Optional)

Specifies the number of permutations that will be used to create reference distributions when calculating global and local p-values. All p-values are calculated using two-sided hypothesis tests.

  • 99: The analysis will use 99 permutations. With 99 permutations, the smallest possible p-value is 0.02, and all other p-values will be multiples of this value.
  • 199: The analysis will use 199 permutations. With 199 permutations, the smallest possible p-value is 0.01, and all other p-values will be multiples of this value.
  • 499: The analysis will use 499 permutations. With 499 permutations, the smallest possible p-value is 0.004, and all other p-values will be multiples of this value.
  • 999: The analysis will use 999 permutations. With 999 permutations, the smallest possible p-value is 0.002, and all other p-values will be multiples of this value. This option is recommended for 90 percent confidence tests. This is the default.
  • 4999: The analysis will use 4,999 permutations. With 4,999 permutations, the smallest possible p-value is 0.0004, and all other p-values will be multiples of this value. This option is recommended for 95 percent confidence tests.
  • 9999: The analysis will use 9,999 permutations. With 9,999 permutations, the smallest possible p-value is 0.0002, and all other p-values will be multiples of this value. This option is recommended for 99 percent confidence tests.
Long

Derived Output

Label | Explanation | Data Type
Lee's L

The Lee's L statistic for the analysis variables.

Double
P-value

The p-value for the Lee's L statistic.

Double
Pearson Correlation

The Pearson correlation between the analysis variables.

Double
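The bisquare weighting scheme and the adaptive bandwidth described for the Local Weighting Scheme and Kernel Bandwidth parameters can be sketched as follows. The kernel form shown is the standard bisquare (quartic) kernel. The function names, the sample points, and the way the feature itself is included and the weights are row standardized are assumptions for illustration, not the tool's confirmed implementation.

```python
import math


def bisquare(distance, bandwidth):
    """Standard bisquare (quartic) kernel: 0 at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def knn_bisquare_weights(points, focal, k):
    """Row-standardized bisquare weights for the k nearest neighbors.

    The adaptive bandwidth is the distance to the (k+1)th neighbor, so
    the k nearest neighbors usually receive nonzero weights. The focal
    feature is included at distance 0. Requires at least k + 2 points.
    """
    fx, fy = points[focal]
    dists = sorted(
        (math.hypot(px - fx, py - fy), j)
        for j, (px, py) in enumerate(points)
        if j != focal
    )
    bandwidth = dists[k][0]  # distance to the (k+1)th neighbor
    raw = {focal: bisquare(0.0, bandwidth)}
    for distance, j in dists[:k]:
        raw[j] = bisquare(distance, bandwidth)
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()}


# Hypothetical points spaced one unit apart along a line.
points = [(float(i), 0.0) for i in range(6)]
w = knn_bisquare_weights(points, focal=0, k=3)
```

Increasing `k` in this sketch also increases the adaptive bandwidth, which gives comparatively larger weights to neighbors that are farther from the focal feature.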

arcpy.stats.BivariateSpatialAssociation(in_features, analysis_field1, analysis_field2, out_features, {neighborhood_type}, {distance_band}, {num_neighbors}, {weights_matrix_file}, {local_weighting_scheme}, {kernel_bandwidth}, {num_permutations})
Name | Explanation | Data Type
in_features

The input features containing the fields of the two analysis variables.

Feature Layer
analysis_field1

The field of the first analysis variable. The field must be numeric.

Field
analysis_field2

The field of the second analysis variable. The field must be numeric.

Field
out_features

The output features containing the local Lee's L statistics, spatial association categories, p-values, and the weighted averages of the neighbors of each feature.

Feature Class
neighborhood_type
(Optional)

Specifies how neighbors of each feature will be determined. The feature is always included in the neighborhood, and all neighborhood weights are normalized to sum to 1.

  • DISTANCE_BAND: Features within a specified critical distance of each feature will be included as neighbors. This is the default for point features.
  • K_NEAREST_NEIGHBORS: The closest k features will be included as neighbors.
  • CONTIGUITY_EDGES_ONLY: Polygon features that share an edge will be included as neighbors.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features.
  • DELAUNAY_TRIANGULATION: Points whose Delaunay triangulation (Thiessen polygons) share an edge or corner will be included as neighbors.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Neighbors and weights will be defined by a spatial weights file.
String
distance_band
(Optional)

The distance band that will be used to determine neighbors around the focal feature. If no value is provided, the distance will be the shortest distance such that each feature has at least one other neighbor in its neighborhood. For polygons, the distance between centroids will be used to determine neighbors.

Linear Unit
num_neighbors
(Optional)

The number of neighbors that will be included in the neighborhood of each feature. The value does not include the feature itself. For example, specifying 6 will use the feature and its six closest neighbors (seven features total). The default is 8. The value must be at least 2.

Long
weights_matrix_file
(Optional)

The path and file name of the spatial weights matrix file that defines the neighbors and weights between features.

File
local_weighting_scheme
(Optional)

Specifies the weighting scheme that will be applied to neighbors when calculating spatial associations.

  • UNWEIGHTED: Neighbors will not be weighted. This is the default.
  • BISQUARE: Neighbors will be weighted using a bisquare (quartic) kernel.
String
kernel_bandwidth
(Optional)

The bandwidth for the bisquare kernel. The bandwidth defines how quickly the weights decrease with distance. Larger bandwidths will provide comparatively larger weights to neighbors that are farther away from the feature. For the k nearest neighbors neighborhood, the default value (empty) will use an adaptive bandwidth equal to the distance to the (k+1)th neighbor of the focal feature. For the fixed distance band neighborhood, the default (empty) will use the same value as the distance band.

Linear Unit
num_permutations
(Optional)

Specifies the number of permutations that will be used to create reference distributions when calculating global and local p-values. All p-values are calculated using two-sided hypothesis tests.

  • 99: The analysis will use 99 permutations. With 99 permutations, the smallest possible p-value is 0.02, and all other p-values will be multiples of this value.
  • 199: The analysis will use 199 permutations. With 199 permutations, the smallest possible p-value is 0.01, and all other p-values will be multiples of this value.
  • 499: The analysis will use 499 permutations. With 499 permutations, the smallest possible p-value is 0.004, and all other p-values will be multiples of this value.
  • 999: The analysis will use 999 permutations. With 999 permutations, the smallest possible p-value is 0.002, and all other p-values will be multiples of this value. This option is recommended for 90 percent confidence tests. This is the default.
  • 4999: The analysis will use 4,999 permutations. With 4,999 permutations, the smallest possible p-value is 0.0004, and all other p-values will be multiples of this value. This option is recommended for 95 percent confidence tests.
  • 9999: The analysis will use 9,999 permutations. With 9,999 permutations, the smallest possible p-value is 0.0002, and all other p-values will be multiples of this value. This option is recommended for 99 percent confidence tests.
Long

Derived Output

Name | Explanation | Data Type
lee_l

The Lee's L statistic for the analysis variables.

Double
p_value

The p-value for the Lee's L statistic.

Double
corr

The Pearson correlation between the analysis variables.

Double

Code sample

BivariateSpatialAssociation example 1 (Python window)

The following Python window script demonstrates how to use the BivariateSpatialAssociation function.

# Calculate the Lee's L statistic using eight nearest neighbors
# and adaptive bandwidth.
import arcpy

arcpy.env.workspace = r"c:\data\project_data.gdb"
arcpy.stats.BivariateSpatialAssociation(
    in_features=r"myFeatureClass",
    analysis_field1="myAnalysisField1",
    analysis_field2="myAnalysisField2",
    out_features=r"myOutputFeatureClass",
    neighborhood_type="K_NEAREST_NEIGHBORS",
    distance_band=None,
    num_neighbors=8,
    weights_matrix_file=None,
    local_weighting_scheme="BISQUARE",
    kernel_bandwidth=None,
    num_permutations=9999
)

BivariateSpatialAssociation example 2 (stand-alone script)

The following stand-alone script demonstrates how to use the BivariateSpatialAssociation function.

# Calculate the Lee's L statistic for two analysis fields.

import arcpy

# Set the current workspace.
arcpy.env.workspace = r"c:\data\project_data.gdb"

# Run the tool using contiguity (edges and corners) neighbors.
arcpy.stats.BivariateSpatialAssociation(
    in_features=r"myFeatureClass",
    analysis_field1="myAnalysisField1",
    analysis_field2="myAnalysisField2",
    out_features=r"myOutputFeatureClass",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",
    distance_band=None,
    num_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None,
    num_permutations=9999
)

# Print the messages. The messages include the Lee's L statistic, p-value, 
# Pearson correlations, and spatial smoothing scalars.

print(arcpy.GetMessages())
