Label | Explanation | Data Type |
Input Features | The input features containing the dependent and explanatory variables. | Feature Layer |
Dependent Variable | The numeric field that will be predicted in the regression model. | Field |
Explanatory Variables | A list of fields that will be used to predict the dependent variable in the regression model. | Field |
Output Features | The output feature class containing the predicted values of the dependent variable and the residuals. | Feature Class |
Model Type | The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data. | String |
Neighborhood Type (Optional) | Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature. | String |
Distance Band (Optional) | The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message. | Linear Unit |
Number of Neighbors (Optional) | The number of neighbors that will be included for each input feature. The number does not include the focal feature. The default is 8. | Long |
Weights Matrix File (Optional) | The path and file name of the spatial weights matrix file that defines spatial relationships among features. | File |
Local Weighting Scheme (Optional) | Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided. | String |
Kernel Bandwidth (Optional) | The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the distance from the focal feature to its farthest neighbor as the bandwidth. | Linear Unit |
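The Local Weighting Scheme parameter states that weights are row-standardized unless a spatial weights matrix file is provided. Row standardization divides each of a feature's neighbor weights by their sum, so every row of the weights matrix sums to 1. The pure-Python sketch below illustrates the idea on a small neighbor dictionary; the `row_standardize` helper and the dictionary layout are hypothetical and are not part of arcpy.

```python
def row_standardize(neighbors):
    """Rescale each feature's neighbor weights so that they sum to 1.

    `neighbors` maps a feature ID to a dict of {neighbor_id: raw_weight}.
    A feature with no neighbors (an island) keeps an empty row.
    """
    standardized = {}
    for fid, row in neighbors.items():
        total = sum(row.values())
        if total == 0:
            standardized[fid] = {}
        else:
            standardized[fid] = {nid: w / total for nid, w in row.items()}
    return standardized


# Hypothetical binary contiguity links for four features along a line: A-B-C-D.
raw_links = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
weights = row_standardize(raw_links)
print(weights["B"])  # {'A': 0.5, 'C': 0.5}
```

Because each row sums to 1, multiplying the weights by a field's values yields the average value among a feature's neighbors.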
Summary
Estimates a global spatial regression model for a point or polygon feature class.
The assumptions of traditional linear regression models are often violated when using spatial data. When spatial autocorrelation is present in a dataset, coefficient estimates may be biased, and inference may be overconfident. This tool estimates a regression model that is robust to spatial dependence and heteroskedasticity, and it can also measure spatial spillovers. The tool uses Lagrange Multiplier (LM) diagnostic tests, also known as Rao score tests, to determine the most appropriate model. Based on the LM diagnostics, an ordinary least squares (OLS) model, spatial lag model (SLM), spatial error model (SEM), or spatial autoregressive combined (SAC) model may be estimated.
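For reference, the four candidate models can be written in standard spatial econometrics notation, where y is the dependent variable, X the explanatory variables, W the spatial weights matrix, ρ the spatial lag coefficient, and λ the spatial error coefficient. This is the textbook formulation, with symbols chosen here for illustration; it is not a statement of the tool's internal parameterization.

```latex
\begin{aligned}
\text{SAC:}\quad & y = \rho W y + X\beta + u, && u = \lambda W u + \varepsilon \\
\text{SLM:}\quad & y = \rho W y + X\beta + \varepsilon && (\lambda = 0) \\
\text{SEM:}\quad & y = X\beta + u, && u = \lambda W u + \varepsilon \quad (\rho = 0) \\
\text{OLS:}\quad & y = X\beta + \varepsilon && (\rho = \lambda = 0) \\[4pt]
\text{Reduced form (SLM, SAC):}\quad & y = (I - \rho W)^{-1}(X\beta + u)
\end{aligned}
```

In the reduced form (with u = ε for the SLM), the multiplier (I − ρW)^{-1} shows how a change in an explanatory variable at one feature propagates to its neighbors, which is the source of the spatial spillovers.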
Usage
The tool accepts only point and polygon inputs.
The dependent variable must be continuous (not binary or categorical).
Explanatory variables must be continuous (not binary or categorical). Do not use binary variables (variables containing only the values 0 and 1), as they may violate model assumptions and cause an error.
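Before running the tool, it can help to screen candidate explanatory fields for binary values. The sketch below is a hypothetical pre-check, not part of the tool: in ArcGIS Pro the rows could be read with `arcpy.da.SearchCursor(in_features, field_names)`, but here they are a hard-coded list (with a made-up `has_pool` field) so the sketch runs without arcpy.

```python
def is_binary(values):
    """Return True if every non-null value is 0 or 1."""
    distinct = {v for v in values if v is not None}
    return bool(distinct) and distinct <= {0, 1}


def flag_binary_fields(rows, field_names):
    """Return the names of fields whose values are all 0 or 1.

    `rows` is a sequence of tuples ordered like `field_names`.
    """
    flagged = []
    for i, name in enumerate(field_names):
        if is_binary(row[i] for row in rows):
            flagged.append(name)
    return flagged


# Hypothetical attribute rows: crime rate, median income, and a 0/1 pool flag.
fields = ["crime", "income", "has_pool"]
rows = [
    (3.2, 51000.0, 1),
    (1.7, 64000.0, 0),
    (4.1, 48000.0, 1),
]
print(flag_binary_fields(rows, fields))  # ['has_pool']
```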
The output of the tool includes a Moran’s scatter plot of the residuals, which can be used to identify spatial autocorrelation remaining in the model’s residuals.
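A Moran's scatter plot places each residual on the horizontal axis and the weighted average of its neighbors' residuals (its spatial lag) on the vertical axis. With row-standardized weights, the slope of the line fitted through that cloud equals global Moran's I. The sketch below computes Moran's I with the standard formula; the `morans_i` helper and the residual values are hypothetical, and this is not the tool's own code.

```python
def morans_i(values, weights):
    """Global Moran's I for values keyed by feature ID.

    `weights` maps a feature ID to {neighbor_id: weight}.
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {fid: v - mean for fid, v in values.items()}
    # Spatial lag of the deviations: weighted sum over each feature's neighbors.
    lag = {
        fid: sum(w * z[nid] for nid, w in weights.get(fid, {}).items())
        for fid in z
    }
    s0 = sum(sum(row.values()) for row in weights.values())  # sum of all weights
    numerator = sum(z[fid] * lag[fid] for fid in z)
    denominator = sum(zi * zi for zi in z.values())
    return (n / s0) * numerator / denominator


# Row-standardized neighbors for four features along a line: A-B-C-D.
w = {
    "A": {"B": 1.0},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"B": 0.5, "D": 0.5},
    "D": {"C": 1.0},
}
clustered = {"A": 1.0, "B": 1.0, "C": -1.0, "D": -1.0}
alternating = {"A": 1.0, "B": -1.0, "C": 1.0, "D": -1.0}
print(morans_i(clustered, w))    # 0.5
print(morans_i(alternating, w))  # -1.0
```

A positive value indicates that similar residuals cluster together, a negative value indicates that neighboring residuals tend to differ, and values close to zero suggest little remaining spatial autocorrelation.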
The spatial weights matrix cannot have more than 30 percent connectivity. To prevent biased estimates, an error will occur if this threshold is exceeded.
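The connectivity of a weights matrix can be checked before running the tool. The sketch below assumes that connectivity means the share of possible feature pairs that are linked, that is, nonzero off-diagonal weights divided by n(n − 1); the tool may define it differently, and the `connectivity_percent` helper is hypothetical.

```python
def connectivity_percent(weights):
    """Percentage of possible feature pairs that are linked as neighbors.

    Assumes connectivity = nonzero off-diagonal links / (n * (n - 1)).
    """
    n = len(weights)
    if n < 2:
        return 0.0
    links = sum(
        1
        for fid, row in weights.items()
        for nid, w in row.items()
        if nid != fid and w != 0
    )
    return 100.0 * links / (n * (n - 1))


MAX_CONNECTIVITY = 30.0  # threshold stated in the tool documentation

# Four features along a line: A-B-C-D.
line = {
    "A": {"B": 1.0},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"B": 0.5, "D": 0.5},
    "D": {"C": 1.0},
}
pct = connectivity_percent(line)
print(pct)  # 50.0
if pct > MAX_CONNECTIVITY:
    print("Too dense: use fewer neighbors or a smaller distance band.")
```

Under this definition, very small datasets exceed the threshold easily, because each link is a large share of the few possible pairs.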
When using k nearest neighbors with a local weighting scheme, an adaptive bandwidth will be calculated if no bandwidth is provided.
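The adaptive bandwidth for k nearest neighbors can be illustrated as follows: for each focal feature, the bandwidth is the distance to the farthest of its k neighbors, so denser areas get narrower kernels. The sketch assumes a Gaussian kernel purely for illustration (the weighting schemes available in the tool are not listed here), row-standardizes the result, and uses a hypothetical `knn_adaptive_weights` helper with made-up coordinates.

```python
import math


def knn_adaptive_weights(coords, fid, k):
    """Kernel weights for the k nearest neighbors of one feature.

    The adaptive bandwidth is the distance to the farthest of the k neighbors.
    A Gaussian kernel is assumed for illustration; weights are row-standardized.
    """
    origin = coords[fid]
    nearest = sorted(
        (math.dist(origin, xy), nid) for nid, xy in coords.items() if nid != fid
    )[:k]
    bandwidth = nearest[-1][0]
    if bandwidth == 0:
        raise ValueError("all k neighbors coincide with the focal feature")
    raw = {nid: math.exp(-0.5 * (d / bandwidth) ** 2) for d, nid in nearest}
    total = sum(raw.values())
    return bandwidth, {nid: w / total for nid, w in raw.items()}


# Hypothetical projected coordinates (map units).
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (5.0, 5.0)}
bandwidth, weights = knn_adaptive_weights(coords, "A", k=2)
print(bandwidth)  # 2.0
print(weights)    # B receives more weight than C because it is closer
```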
A Spatial Durbin model can be estimated by fitting an SLM that includes each explanatory variable along with its spatial lag. Use the Neighborhood Summary Statistics tool to calculate the spatial lags.
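The Spatial Durbin model adds the lagged explanatory variables to the SLM, y = ρWy + Xβ + WXθ + ε. With row-standardized weights, the spatial lag of a field is the weighted average of that field over each feature's neighbors, excluding the focal feature itself. The pure-Python sketch below shows what such a lagged field contains; the `spatial_lag` helper and the `income` values are hypothetical and stand in for the output of the Neighborhood Summary Statistics tool.

```python
def spatial_lag(values, weights):
    """Return the spatial lag W x: the weighted average of each feature's neighbors.

    The focal feature's own value is excluded, as is standard for a spatial lag.
    """
    return {
        fid: sum(w * values[nid] for nid, w in weights.get(fid, {}).items())
        for fid in values
    }


# Row-standardized neighbors for four features along a line: A-B-C-D.
w = {
    "A": {"B": 1.0},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"B": 0.5, "D": 0.5},
    "D": {"C": 1.0},
}
income = {"A": 40.0, "B": 50.0, "C": 70.0, "D": 60.0}
lag_income = spatial_lag(income, w)
print(lag_income)  # {'A': 50.0, 'B': 55.0, 'C': 55.0, 'D': 70.0}
# A Durbin-style design passes both income and lag_income as explanatory fields.
```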
The models are estimated using the following methods, which are robust to heteroskedasticity and do not assume normally distributed errors:
- SLM uses spatial two-stage least squares (S2SLS) regression.
- SEM uses Generalized Method of Moments (GMM).
- SAC uses Generalized S2SLS (GS2SLS).
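As a point of reference, the textbook S2SLS estimator for the SLM treats Wy as endogenous and instruments it with spatial lags of the explanatory variables. The form below is the standard one from the spatial econometrics literature; the exact instrument set used by the tool is not documented here, and the constant column is usually omitted from WX and W²X because, with row-standardized W, it would duplicate the intercept.

```latex
\begin{aligned}
Z &= [\,X,\; W y\,], \qquad H = [\,X,\; W X,\; W^{2} X\,] \\
\hat{Z} &= H\,(H^{\top} H)^{-1} H^{\top} Z \\
\begin{bmatrix} \hat{\beta} \\ \hat{\rho} \end{bmatrix} &= (\hat{Z}^{\top} Z)^{-1}\, \hat{Z}^{\top} y
\end{aligned}
```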
Parameters
arcpy.stats.SAR(in_features, dependent_variable, explanatory_variables, out_features, model_type, {neighborhood_type}, {distance_band}, {number_of_neighbors}, {weights_matrix_file}, {local_weighting_scheme}, {kernel_bandwidth})
Name | Explanation | Data Type |
in_features | The input features containing the dependent and explanatory variables. | Feature Layer |
dependent_variable | The numeric field that will be predicted in the regression model. | Field |
explanatory_variables [explanatory_variables,...] | A list of fields that will be used to predict the dependent variable in the regression model. | Field |
out_features | The output feature class containing the predicted values of the dependent variable and the residuals. | Feature Class |
model_type | The model type that will be used for the estimation. By default, LM diagnostic tests will be used to determine the model that is the most appropriate for the input data. | String |
neighborhood_type (Optional) | Specifies how neighbors will be chosen for each input feature. To identify local spatial patterns, neighboring features must be identified for each input feature. | String |
distance_band (Optional) | The distance within which features will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message. | Linear Unit |
number_of_neighbors (Optional) | The number of neighbors that will be included for each input feature. The number does not include the focal feature. The default is 8. | Long |
weights_matrix_file (Optional) | The path and file name of the spatial weights matrix file that defines spatial relationships among features. | File |
local_weighting_scheme (Optional) | Specifies the weighting scheme that will be applied to neighbors. Weights will always be row-standardized unless a spatial weights matrix file is provided. | String |
kernel_bandwidth (Optional) | The bandwidth of the weighting kernel. If no value is provided, an adaptive kernel will be used. An adaptive kernel uses the distance from the focal feature to its farthest neighbor as the bandwidth. | Linear Unit |
Code sample
The following Python window script demonstrates how to use the SAR function.
# Fit SAR model and auto-detect the regression model.
arcpy.stats.SAR(
    in_features=r"C:\data\data.gdb\house_price",
    dependent_variable="price",
    explanatory_variables=["crime", "income", "school_rate"],
    out_features=r"C:\data\data.gdb\house_price_SAR",
    model_type="AUTO",
    neighborhood_type="DELAUNAY_TRIANGULATION",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)
The following stand-alone script demonstrates how to use the SAR function.
# Fit SAR model using SLM.

# Import modules
import arcpy

# Set the current workspace
arcpy.env.workspace = r"C:\data\data.gdb"

# Run SAR tool with Spatial Lag model
arcpy.stats.SAR(
    in_features=r"health_factors_CA",
    dependent_variable="Diabetes",
    explanatory_variables=["Drink", "Inactivity"],
    out_features=r"Diabetes_SAR",
    model_type="LAG",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",
    distance_band=None,
    number_of_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None
)