Label | Explanation | Data Type |
Input Features | The input features containing fields of the explanatory and dependent variables that will be used in a prediction model. | Feature Layer |
Input Fields | The input fields of the explanatory and dependent variables that will be used in a prediction model. | Field |
Output Features | The output features that will contain fields of the spatial components that can be used as additional explanatory variables in a prediction model. | Feature Class |
Append All Fields From Input Features (Optional) | Specifies whether all fields will be copied from the input features to the output feature class. | Boolean |
Input Spatial Weights Matrix File (Optional) | The input SWM file (.swm). If a value is provided, the file will be used to define neighbors and weights of the input features. If no value is provided, the tool will test 28 different neighborhoods and use the one that creates components that are most effective as explanatory variables. | File |
Output Spatial Weights Matrix File (Optional) | The output SWM file (.swm) of the neighbors and weights selected by the tool. This parameter does not apply if you provide an input .swm file. | File |
Unique ID Field (Optional) | The unique ID field of the output .swm file. The field must be an integer and must have a unique value for each input feature. | Field |
Summary
Creates a set of spatial component fields that best describe the spatial patterns of one or more numeric fields and serve as useful explanatory variables in a prediction or regression model.
The input fields should be the explanatory and dependent variables that will be used in a prediction model. The resulting spatial component fields (called Moran eigenvectors) can be used as explanatory variables (in addition to the original explanatory variables) that will often improve the predictive power of the model by accounting for spatial patterns of the other variables.
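The Moran eigenvectors mentioned above can be illustrated with a small NumPy sketch. It assumes binary rook-contiguity weights on a regular grid and uses the construction common in eigenvector spatial filtering: the eigenvectors of the doubly centered matrix MWM, where M = I - 11'/n. Whether the tool builds its components exactly this way (for example, how it weights or standardizes neighbors) is not stated here, and the helper names `rook_weights`, `moran_eigenvectors`, and `morans_i` are hypothetical. Each eigenvector with a nonzero eigenvalue has a mean of zero, and its Moran's I equals (n/S0)λ, where λ is its eigenvalue and S0 is the sum of all weights, so eigenvectors with large positive eigenvalues describe broad, strongly autocorrelated patterns.

```python
import numpy as np


def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols grid (hypothetical helper)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # east neighbor
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < rows:  # south neighbor
                W[i, i + cols] = W[i + cols, i] = 1.0
    return W


def moran_eigenvectors(W):
    """Eigenvalues and eigenvectors of M W M, most positive eigenvalue first."""
    n = W.shape[0]
    W = (W + W.T) / 2.0                    # symmetrize so the symmetric solver eigh applies
    M = np.eye(n) - np.ones((n, n)) / n    # centering operator I - 11'/n
    vals, vecs = np.linalg.eigh(M @ W @ M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def morans_i(z, W):
    """Global Moran's I of the values z for the weights W."""
    zc = z - z.mean()
    return len(z) / W.sum() * (zc @ W @ zc) / (zc @ zc)
```

For example, with `W = rook_weights(4, 5)` and `vals, vecs = moran_eigenvectors(W)`, `morans_i(vecs[:, 0], W)` matches `W.shape[0] / W.sum() * vals[0]`, and the first column is the most strongly autocorrelated candidate component.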
Usage
The tool creates spatial components that can most accurately predict the values of the input fields. Each component represents a spatial pattern, and the tool selects the components whose patterns most closely resemble the patterns of the input fields. For example, if a field has a broad west-to-east trend but also contains small clusters of low and high values, the pattern could be represented by combining two components: one representing the west-to-east trend and the other representing the clusters. Including explanatory variables that resemble the spatial patterns of the explanatory and dependent variables accounts for spatial effects in prediction and regression tools such as Generalized Linear Regression and Forest-based and Boosted Classification and Regression. By accounting for spatial effects, these nonspatial prediction models will usually predict more accurately, and spatial bias (such as spatial patterns in the residuals) will often be reduced. This helps keep the model from systematically underpredicting or overpredicting in certain areas. In addition, coefficients of the explanatory variables are easier to interpret because they estimate the direct relationship between each explanatory variable and the dependent variable while filtering out the noise introduced by spatial effects.
This tool is intended to create explanatory variables that can be used in prediction models; however, the Filter Spatial Autocorrelation From Field tool can also be used for this purpose by removing the spatial autocorrelation from the residual or standardized residual field of a prediction model. The spatial components that effectively filter residual autocorrelation are frequently useful explanatory variables and can often improve the model as much as the components from this tool while using fewer of them. It is recommended that you try both tools and compare the results of including the spatial components from each one in the original prediction model (for example, by comparing the adjusted R-squared or AIC values).
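A comparison like the one recommended above can be sketched with ordinary least squares in NumPy. The function name `fit_summary` and the variable names in the usage note are hypothetical, and the AIC shown is the common Gaussian form n·ln(RSS/n) + 2k with additive constants dropped. The AIC or AICc values reported by ArcGIS regression tools may use a different form, so compare values produced the same way rather than mixing sources.

```python
import numpy as np


def fit_summary(y, X):
    """OLS fit of y on X (intercept added); returns (R-squared, adjusted R-squared, AIC)."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, k = A.shape                         # k counts the intercept as a parameter
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    aic = n * np.log(rss / n) + 2 * k      # Gaussian AIC up to an additive constant
    return r2, adj_r2, aic
```

For example, `fit_summary(y, X_original)` can be compared with `fit_summary(y, np.column_stack([X_original, components]))`, where `components` holds the spatial component fields from either tool. A higher adjusted R-squared and a lower AIC favor the augmented model; running the comparison once per tool shows which set of components helps more.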
The spatial components will be returned as fields in the output feature class, and when the tool is run in an active map, the output feature layer will draw based on the first spatial component. The input fields will also be included in the output feature class so that the original explanatory variables and the spatial component explanatory variables can be used to predict the dependent variable in prediction tools without needing to merge the input and output feature classes.
The geoprocessing messages include the following two tables that summarize how the spatial components were selected as explanatory variables for the input fields:
- Neighborhood Search History—For each of the 28 spatial weight matrices (SWMs) that were tested, details of the SWM (such as the number of neighbors and weighting scheme), the p-value and adjusted R-squared value when using all components, the adjusted R-squared value when using only the selected components, and the number of components that were selected are displayed. The SWM with the highest adjusted R-squared value using the selected components will be used to create the components and will be indicated with bold text and an asterisk.
- Spatial Component Search History—For the selected SWM, the ID value of each component (for example, ID 4 means that it was the fourth spatial component), the Moran's I value and p-value of the component, and the adjusted R-squared value of the component (including all previously selected components) are displayed. The rows are ordered by the components that individually predicted the input fields most effectively (highest R-squared value).
The tool selects an SWM for the input features (unless one is provided in the Input Spatial Weights Matrix File parameter) and selects component explanatory variables using the following procedure:
- For each of 28 candidate SWMs, the SWM is tested for statistical significance by predicting the input fields using all spatial components as explanatory variables. The significance test uses the combined R-squared from all input fields and performs a Šidák correction to the p-value to account for the number of SWMs tested. Any SWM that is not statistically significant will be removed from the candidate list.
- For each of the remaining candidate SWMs, spatial components are sequentially added as explanatory variables until either the next component is not statistically significant alone (the p-value is greater than 0.05) or the adjusted R-squared value of the component (and all previously selected components) exceeds the adjusted R-squared value when using all components of the SWM. Each new component is selected by finding the one with the highest statistical significance (lowest p-value) when used to predict the input fields.
- The SWM with the largest resulting adjusted R-squared value is selected as the final SWM, and its set of selected spatial components is returned as fields in the output feature class.
This procedure is called the FWD (Forward) selection method and is fully described in the following reference:
Blanchet, F. Guillaume, Pierre Legendre, and Daniel Borcard. 2008. "Forward selection of explanatory variables." Ecology 89, no. 9: 2623-2632. https://doi.org/10.1890/07-0986.1.
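The selection procedure described above can be sketched end to end in NumPy under several stated assumptions, so the numbers will not match the tool's output. The sketch uses a single input field rather than the combined R-squared of several fields, keeps only eigenvectors with positive eigenvalues as candidate components, and estimates p-values by permutation (999 permutations) instead of whatever test the tool uses internally. The Šidák correction is applied as p_adj = 1 - (1 - p)^k for k tested SWMs, and the two stopping rules follow the forward selection steps listed above. All function names are hypothetical.

```python
import numpy as np


def positive_moran_eigenvectors(W, tol=1e-10):
    """Eigenvectors of M W M with positive eigenvalues, most positive first (assumption)."""
    n = W.shape[0]
    W = (W + W.T) / 2.0
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ W @ M)
    order = [i for i in np.argsort(vals)[::-1] if vals[i] > tol]
    return vecs[:, order]


def r2(y, X):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)


def adj_r2(y, X):
    n, p = X.shape
    return 1.0 - (1.0 - r2(y, X)) * (n - 1) / (n - p - 1)


def perm_pvalue(y, X, rng, n_perm=999):
    """Permutation p-value of the R-squared of y on X. With many candidate SWMs,
    fewer permutations can make the Sidak-corrected test impossible to pass."""
    observed = r2(y, X)
    hits = sum(r2(rng.permutation(y), X) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def forward_select(y, E, rng, alpha=0.05):
    """Add components one at a time until either stopping rule is met."""
    global_adj = adj_r2(y, E)              # adjusted R-squared using all components
    selected, remaining = [], list(range(E.shape[1]))
    while remaining:
        # Highest individual R-squared is the lowest p-value for a single predictor.
        best = max(remaining, key=lambda j: r2(y, E[:, [j]]))
        if perm_pvalue(y, E[:, [best]], rng) > alpha:
            break                          # next component is not significant alone
        trial = selected + [best]
        if adj_r2(y, E[:, trial]) > global_adj:
            break                          # would exceed the all-components adjusted R-squared
        selected = trial
        remaining.remove(best)
    return selected


def choose_swm(y, candidates, alpha=0.05, seed=0):
    """Return (name, adjusted R-squared, selected indices) for the best candidate, or None."""
    rng = np.random.default_rng(seed)
    k = len(candidates)
    best = None
    for name, W in candidates.items():
        E = positive_moran_eigenvectors(W)
        p_global = perm_pvalue(y, E, rng)
        if 1.0 - (1.0 - p_global) ** k > alpha:   # Sidak-corrected global test
            continue
        sel = forward_select(y, E, rng, alpha)
        if sel:
            score = adj_r2(y, E[:, sel])
            if best is None or score > best[1]:
                best = (name, score, sel)
    return best
```

For example, `choose_swm(y, {"rook": W_rook, "queen": W_queen})` returns the name of the better-scoring weights matrix, its adjusted R-squared, and the indices of the selected components, or `None` if no candidate passes the corrected global test. Because Moran eigenvectors are mutually orthogonal, ranking candidates by their individual R-squared is a reasonable stand-in for a partial test at each step in this sketch.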
Parameters
arcpy.stats.CreateSpatialComponentExplanatoryVariables(in_features, input_fields, out_features, {append_all_fields}, {in_swm}, {out_swm}, {id_field})
Name | Explanation | Data Type |
in_features | The input features containing fields of the explanatory and dependent variables that will be used in a prediction model. | Feature Layer |
input_fields [input_fields,...] | The input fields of the explanatory and dependent variables that will be used in a prediction model. | Field |
out_features | The output features that will contain fields of the spatial components that can be used as additional explanatory variables in a prediction model. | Feature Class |
append_all_fields (Optional) | Specifies whether all fields will be copied from the input features to the output feature class. | Boolean |
in_swm (Optional) | The input SWM file (.swm). If a value is provided, the file will be used to define neighbors and weights of the input features. If no value is provided, the tool will test 28 different neighborhoods and use the one that creates components that are most effective as explanatory variables. | File |
out_swm (Optional) | The output SWM file (.swm) of the neighbors and weights selected by the tool. This parameter does not apply if you provide an input .swm file. | File |
id_field (Optional) | The unique ID field of the output .swm file. The field must be an integer and must have a unique value for each input feature. | Field |
Code sample
The following Python window script demonstrates how to use the CreateSpatialComponentExplanatoryVariables function.
# Create fields that describe the spatial patterns of POPULATION.
arcpy.env.workspace = r"c:\data\project_data.gdb"
arcpy.stats.CreateSpatialComponentExplanatoryVariables(
    in_features="states",
    input_fields="POPULATION",
    out_features=r"myOutputFeatureClass",
    append_all_fields="ALL",
    in_swm=None,
    out_swm=None,
    id_field=None
)
The following stand-alone script demonstrates how to use the CreateSpatialComponentExplanatoryVariables function.
# Create fields that describe the spatial patterns of two analysis fields.
import arcpy
# Set the current workspace.
arcpy.env.workspace = r"c:\data\project_data.gdb"
# Run the tool.
arcpy.stats.CreateSpatialComponentExplanatoryVariables(
    in_features="myFeatureClass",
    input_fields="myAnalysisField1;myAnalysisField2",
    out_features=r"myOutputFeatureClass",
    append_all_fields="ALL",
    in_swm=None,
    out_swm=None,
    id_field=None
)
# Print the messages.
print(arcpy.GetMessages())