Spatial Autoregression (Spatial Statistics)

Summary

Estimates a global spatial regression model for a point or polygon feature class.

The assumptions of traditional linear regression models are often violated when using spatial data. When spatial autocorrelation is present in a dataset, coefficient estimates may be biased and lead to overconfident inference. This tool can be used to estimate a regression model that is robust in the presence of spatial dependence and heteroskedasticity, as well as to measure spatial spillovers. The tool uses Lagrange Multiplier (LM) diagnostic tests, also known as Rao score tests, to determine the most appropriate model. Based on the LM diagnostics, an ordinary least squares (OLS) model, spatial lag model (SLM), spatial error model (SEM), or spatial autoregressive combined model (SAC) may be estimated.
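
For orientation, the four candidate models are usually written as follows in the spatial econometrics literature. The notation is an assumption of this sketch, not something stated on this page: y is the dependent variable, X holds the explanatory variables, W is the spatial weights matrix, \rho and \lambda are the spatial lag and spatial error coefficients, and \varepsilon is an independent error term.

```latex
\begin{aligned}
\text{OLS:} \quad & y = X\beta + \varepsilon \\
\text{SLM:} \quad & y = \rho W y + X\beta + \varepsilon \\
\text{SEM:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SAC:} \quad & y = \rho W y + X\beta + u, \qquad u = \lambda W u + \varepsilon
\end{aligned}
```

In this form, the SAC reduces to the SLM when \lambda = 0, to the SEM when \rho = 0, and to OLS when both are zero.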

Learn more about how Spatial Autoregression works

Illustration

Spatial Autoregression tool illustration

Usage

  • The tool accepts only point and polygon inputs.

  • The dependent variable must be continuous (not binary or categorical).

  • Explanatory variables must be continuous (not binary or categorical). Do not use binary variables (containing only the values 0 and 1), as they may violate model assumptions and cause an error.

  • The output of the tool includes a Moran’s Scatter Plot of Residuals that can be used to identify autocorrelation in the model’s residuals.

  • The spatial weights matrix cannot have more than 30 percent connectivity. To prevent biased estimates, the tool returns an error if this threshold is exceeded.

  • When using k nearest neighbors with a local weighting scheme, an adaptive bandwidth will be calculated if no bandwidth is provided.

  • A spatial Durbin model can be estimated by fitting a SLM that includes each explanatory variable and its spatial lag. Use the Neighborhood Summary Statistics tool to calculate spatial lags.

  • The models are estimated using the following methods related to heteroskedasticity and normality:

    • SLM uses Spatial Two Stage Least Squares regression (S2SLS).
    • SEM uses Generalized Method of Moments (GMM).
    • SAC uses Generalized S2SLS (GS2SLS).
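
Two of the notes above, the connectivity limit and the spatial lags used for a spatial Durbin model, can be illustrated with a small plain-Python sketch. It assumes that connectivity means the share of nonzero entries in the n-by-n weights matrix, which this page does not define, and it uses hypothetical function names and toy data. In practice the Neighborhood Summary Statistics tool would calculate the lags.

```python
def row_standardize(neighbors):
    """Turn {feature_id: [neighbor_ids]} into {feature_id: {neighbor_id: weight}}
    so that each feature's weights sum to 1 (unweighted neighbors)."""
    weights = {}
    for i, nbrs in neighbors.items():
        weights[i] = {j: 1.0 / len(nbrs) for j in nbrs} if nbrs else {}
    return weights


def percent_connectivity(weights):
    """Percentage of nonzero entries in the n-by-n weights matrix
    (assumed definition of connectivity)."""
    n = len(weights)
    nonzero = sum(len(row) for row in weights.values())
    return 100.0 * nonzero / (n * n)


def spatial_lag(weights, values):
    """Weighted average of each feature's neighbors' values (W times x)."""
    return {i: sum(w * values[j] for j, w in row.items())
            for i, row in weights.items()}


# Hypothetical data: four polygons in a row (0-1-2-3) with edge contiguity.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
income = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0}

w = row_standardize(neighbors)
print(percent_connectivity(w))  # 6 nonzero entries out of 16
print(spatial_lag(w, income))   # lagged income, usable as an extra explanatory variable
```

A four-feature toy dataset is necessarily densely connected, so it lands above the 30 percent limit; the point is only to show how the quantity could be computed.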

Parameters

Label | Explanation | Data Type
Input Features

The input features containing the dependent and explanatory variables.

Feature Layer
Dependent Variable

The numeric field that will be predicted in the regression model.

Field
Explanatory Variables

A list of fields that will be used to predict the dependent variable in the regression model.

Field
Output Features

The output feature class containing the predicted values of the dependent variable and the residuals.

Feature Class
Model Type

The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data.

  • Auto-detect: LM diagnostic tests will be used to determine whether an OLS, SLM, SEM, or SAC will be estimated. This is the default.
  • Spatial error model (SEM): A SEM will be estimated regardless of the LM diagnostics.
  • Spatial lag model (SLM): A SLM will be estimated regardless of the LM diagnostics.
  • Spatial autoregressive combined (SAC): A SAC will be estimated regardless of the LM diagnostics.
String
Neighborhood Type
(Optional)

Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature.

  • Fixed distance band: Features within a specified distance of each feature will be considered neighbors.
  • K nearest neighbors: The closest k features will be considered neighbors. The number of neighbors is specified using the Number of Neighbors parameter.
  • Contiguity edges only: Polygon features that share an edge will be included as neighbors.
  • Contiguity edges corners: Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features.
  • Delaunay triangulation: Features whose Delaunay triangulation share an edge or corner will be included as neighbors. This is the default for point features.
  • Get spatial weights from file: Neighbors and weights will be defined by a specified spatial weights file. The file is specified using the Weights Matrix File parameter.
String
Distance Band
(Optional)

The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message.

Linear Unit
Number of Neighbors
(Optional)

The number of nearest features that will be included as neighbors. The number does not include the focal feature. The default is 8.

Long
Weights Matrix File
(Optional)

The path and file name of the spatial weights matrix file that defines spatial relationships among features.

File
Local Weighting Scheme
(Optional)

Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided.

  • Unweighted: Neighbors will be assigned a weight equal to 1. This is the default.
  • Bisquare: Neighbors will be weighted using a bisquare (quartic) kernel.
  • Gaussian: Neighbors will be weighted using a Gaussian (normal distribution) kernel.
String
Kernel Bandwidth
(Optional)

The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the maximum distance from a neighbor to a focal feature as the bandwidth.

Linear Unit
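
As a rough illustration of the Local Weighting Scheme and Kernel Bandwidth parameters, the sketch below applies textbook bisquare and Gaussian kernels to the distances between one focal feature and its neighbors, then row-standardizes the result. The kernel formulas, the function names, and the sample distances are assumptions for illustration; the tool's exact kernel definitions are not stated on this page.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare (quartic) kernel: falls to 0 at the bandwidth and beyond."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gaussian(distance, bandwidth):
    """Gaussian kernel: decays smoothly and never reaches exactly 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_weights(distances, kernel, bandwidth=None):
    """Row-standardized weights for one focal feature.

    distances: distances from the focal feature to each of its neighbors.
    bandwidth: if None, fall back on the adaptive rule described above,
               the maximum distance from a neighbor to the focal feature.
    """
    if bandwidth is None:
        bandwidth = max(distances)
    raw = [kernel(d, bandwidth) for d in distances]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


# Hypothetical distances (map units) from one feature to its 3 nearest neighbors.
distances = [100.0, 200.0, 400.0]
gaussian_w = kernel_weights(distances, gaussian)
bisquare_w = kernel_weights(distances, bisquare)
print(gaussian_w)
print(bisquare_w)
```

Under these assumed formulas, the adaptive bandwidth gives the farthest neighbor a bisquare weight of exactly zero, while the Gaussian kernel still gives it a small positive weight. The tool may handle that edge case differently.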

arcpy.stats.SAR(in_features, dependent_variable, explanatory_variables, out_features, model_type, {neighborhood_type}, {distance_band}, {number_of_neighbors}, {weights_matrix_file}, {local_weighting_scheme}, {kernel_bandwidth})
Name | Explanation | Data Type
in_features

The input features containing the dependent and explanatory variables.

Feature Layer
dependent_variable

The numeric field that will be predicted in the regression model.

Field
explanatory_variables
[explanatory_variables,...]

A list of fields that will be used to predict the dependent variable in the regression model.

Field
out_features

The output feature class containing the predicted values of the dependent variable and the residuals.

Feature Class
model_type

The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data.

  • AUTO: LM diagnostic tests will be used to determine whether an OLS, SLM, SEM, or SAC will be estimated. This is the default.
  • ERROR: A SEM will be estimated regardless of the LM diagnostics.
  • LAG: A SLM will be estimated regardless of the LM diagnostics.
  • COMBINED: A SAC will be estimated regardless of the LM diagnostics.
String
neighborhood_type
(Optional)

Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature.

  • DISTANCE_BAND: Features within a specified distance of each feature will be considered neighbors.
  • K_NEAREST_NEIGHBORS: The closest k features will be considered neighbors. The number of neighbors is specified using the number_of_neighbors parameter.
  • CONTIGUITY_EDGES_ONLY: Polygon features that share an edge will be included as neighbors.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share an edge or corner will be included as neighbors. This is the default for polygon features.
  • DELAUNAY_TRIANGULATION: Features whose Delaunay triangulation share an edge or corner will be included as neighbors. This is the default for point features.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Neighbors and weights will be defined by a specified spatial weights file. The file is specified using the weights_matrix_file parameter.
String
distance_band
(Optional)

The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message.

Linear Unit
number_of_neighbors
(Optional)

The number of nearest features that will be included as neighbors. The number does not include the focal feature. The default is 8.

Long
weights_matrix_file
(Optional)

The path and file name of the spatial weights matrix file that defines spatial relationships among features.

File
local_weighting_scheme
(Optional)

Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided.

  • UNWEIGHTED: Neighbors will be assigned a weight equal to 1. This is the default.
  • BISQUARE: Neighbors will be weighted using a bisquare (quartic) kernel.
  • GAUSSIAN: Neighbors will be weighted using a Gaussian (normal distribution) kernel.
String
kernel_bandwidth
(Optional)

The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the maximum distance from a neighbor to a focal feature as the bandwidth.

Linear Unit
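
Some of the parameter relationships listed above can be checked before the tool is called. The helper below is a hypothetical convenience wrapper, not part of arcpy: the name build_sar_kwargs and its checks are assumptions based only on the parameter descriptions in this table. It runs without arcpy and returns a dictionary of keyword arguments.

```python
def build_sar_kwargs(in_features, dependent_variable, explanatory_variables,
                     out_features, model_type="AUTO", neighborhood_type=None,
                     number_of_neighbors=None, weights_matrix_file=None):
    """Hypothetical helper: validate a few documented parameter rules and
    return keyword arguments intended for arcpy.stats.SAR."""
    model_types = {"AUTO", "ERROR", "LAG", "COMBINED"}
    if model_type not in model_types:
        raise ValueError(f"model_type must be one of {sorted(model_types)}")
    if not explanatory_variables:
        raise ValueError("explanatory_variables must list at least one field")
    if (neighborhood_type == "GET_SPATIAL_WEIGHTS_FROM_FILE"
            and not weights_matrix_file):
        raise ValueError("weights_matrix_file is required when weights come from a file")
    return {
        "in_features": in_features,
        "dependent_variable": dependent_variable,
        "explanatory_variables": list(explanatory_variables),
        "out_features": out_features,
        "model_type": model_type,
        "neighborhood_type": neighborhood_type,
        "number_of_neighbors": number_of_neighbors,
        "weights_matrix_file": weights_matrix_file,
    }


kwargs = build_sar_kwargs(
    in_features="health_factors_CA",
    dependent_variable="Diabetes",
    explanatory_variables=["Drink", "Inactivity"],
    out_features="Diabetes_SAR",
    model_type="LAG",
    neighborhood_type="K_NEAREST_NEIGHBORS",
    number_of_neighbors=8,
)
print(kwargs["model_type"])
# In an ArcGIS Pro Python environment, the call would then be:
# arcpy.stats.SAR(**kwargs)
```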

Code sample

SAR example 1 (Python window)

The following Python window script demonstrates how to use the SAR function.

# Fit SAR model and auto-detect the regression model.
arcpy.stats.SAR(
    in_features=r"C:\data\data.gdb\house_price",
    dependent_variable="price",
    explanatory_variables=["crime", "income", "school_rate"],
    out_features=r"C:\data\data.gdb\house_price_SAR",
    model_type="AUTO",
    neighborhood_type="DELAUNAY_TRIANGULATION",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)
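
Example 1 relies on model_type="AUTO", where LM diagnostics pick the model. The exact rule the tool applies is not described on this page, so the sketch below should be read as an assumption: it is a simplified variant of the classic LM decision rule from the spatial econometrics literature, extended with an assumed SAC branch for the case where both robust tests are significant. The function name, the p-value inputs, and the 0.05 threshold are all hypothetical.

```python
def choose_model(p_lm_lag, p_lm_error, p_robust_lag, p_robust_error, alpha=0.05):
    """Pick "OLS", "LAG" (SLM), "ERROR" (SEM), or "COMBINED" (SAC)
    from the p-values of four LM tests. Illustrative rule only."""
    lag_sig = p_lm_lag < alpha
    err_sig = p_lm_error < alpha

    # No evidence of spatial dependence: keep the non-spatial model.
    if not lag_sig and not err_sig:
        return "OLS"
    # Only one standard test rejects: use the matching spatial model.
    if lag_sig and not err_sig:
        return "LAG"
    if err_sig and not lag_sig:
        return "ERROR"

    # Both standard tests reject: consult the robust versions.
    robust_lag_sig = p_robust_lag < alpha
    robust_err_sig = p_robust_error < alpha
    if robust_lag_sig and robust_err_sig:
        return "COMBINED"
    if robust_lag_sig:
        return "LAG"
    if robust_err_sig:
        return "ERROR"
    # Neither robust test is decisive: prefer the smaller robust p-value.
    return "LAG" if p_robust_lag <= p_robust_error else "ERROR"


print(choose_model(0.40, 0.60, 0.50, 0.70))      # no spatial dependence detected
print(choose_model(0.001, 0.002, 0.30, 0.01))    # robust error test decides
print(choose_model(0.001, 0.002, 0.01, 0.01))    # both robust tests reject
```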

SAR example 2 (stand-alone script)

The following stand-alone script demonstrates how to use the SAR function.

# Fit SAR model using SLM.

# Import modules
import arcpy

# Set the current workspace
arcpy.env.workspace = r"C:\data\data.gdb"

# Run SAR tool with Spatial Lag model
arcpy.stats.SAR(
    in_features=r"health_factors_CA",
    dependent_variable="Diabetes",
    explanatory_variables=["Drink", "Inactivity"],
    out_features=r"Diabetes_SAR",
    model_type="LAG",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)
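
The Usage notes mention a Moran's scatter plot of the residuals. With row-standardized weights, the slope of such a plot corresponds to the global Moran's I statistic, which the plain-Python sketch below computes. The residual values and weights are hypothetical toy data, since the output field names are not listed on this page, and the function name morans_i is an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I for {id: value} and {id: {neighbor_id: weight}}."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {i: v - mean for i, v in values.items()}            # deviations from the mean
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j]
                for i, row in weights.items()
                for j, w in row.items())
    denom = sum(d * d for d in z.values())
    return (n / s0) * (cross / denom)


# Four features in a row (0-1-2-3) with row-standardized contiguity weights.
weights = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {2: 1.0}}

# Residuals that cluster in space versus residuals that alternate in sign.
clustered = {0: 2.0, 1: 1.0, 2: -1.0, 3: -2.0}
alternating = {0: 1.0, 1: -1.0, 2: 1.0, 3: -1.0}

print(morans_i(clustered, weights))    # positive: similar residuals are neighbors
print(morans_i(alternating, weights))  # negative: neighbors have opposite residuals
```

A value near zero for the residuals of a fitted model suggests that little spatial autocorrelation remains; clearly positive or negative values suggest the model has not fully accounted for spatial dependence.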
