Incremental Spatial Autocorrelation (Spatial Statistics)

Summary

Measures spatial autocorrelation for a series of distances and optionally creates a line graph of those distances and their corresponding z-scores. Z-scores reflect the intensity of spatial clustering, and statistically significant peak z-scores indicate distances where spatial processes promoting clustering are most pronounced. These peak distances are often appropriate values to use for tools with a Distance Band or Distance Radius parameter.

Illustration

Incremental Spatial Autocorrelation tool illustration
Z-score peaks reflect distances where the spatial processes promoting clustering are most pronounced.

Usage

  • Use this tool to specify an appropriate Distance Threshold or Radius parameter value for tools that have these parameters, such as Hot Spot Analysis or Point Density.

  • The Incremental Spatial Autocorrelation tool measures spatial autocorrelation for a series of distance increments and reports, for each distance increment, the associated Moran's Index, Expected Index, Variance, z-score and p-value. The values are written as messages when the tool runs. The messages also include a Spatial Autocorrelation by Distance line chart that displays the z-score for each distance.

  • When more than one statistically significant peak is present, clustering is pronounced at each of those distances. Select the peak distance that best corresponds to the scale of analysis you are interested in; often this is the first statistically significant peak encountered.

  • The Input Field parameter value should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; the statistic cannot be computed if all input values are 1, for example. To use this tool to analyze the spatial pattern of incident data, consider aggregating the incident data.

  • When the Input Feature Class parameter value is not projected (that is, when coordinates are given in degrees, minutes, and seconds) or when the output coordinate system is set to a geographic coordinate system, distances are computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide good estimates of true geodesic distances, at least for points within about 30 degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of a line, passing through the three-dimensional earth, to connect those two points. Chordal distances are reported in meters.

    Caution:

    Ensure that you project the data if the study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.

  • When chordal distances are used in the analysis, the Beginning Distance and Distance Increment parameter values, if provided, should be in meters.

  • For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.

  • Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.

  • For polygon features, you should almost always specify Row for the Row Standardization parameter. Row standardization mitigates bias when the number of neighbors each feature has is a function of the aggregation scheme or sampling process, rather than reflecting the actual spatial distribution of the variable you are analyzing.

  • If no Beginning Distance parameter value is specified, the default value is the minimum distance required to ensure that each feature in the dataset has at least one neighbor (the maximum nearest neighbor distance among all features). This may not be the most appropriate beginning distance if the dataset includes locational outliers.

  • If no Distance Increment parameter value is specified, the smaller of either the average nearest neighbor distance or (Td - B) / I is used, where Td is a maximum threshold distance, B is the Beginning Distance parameter value, and I is the Number of Distance Bands parameter value. This algorithm ensures that calculations will always be performed for the specified Number of Distance Bands value and that the largest distance bands won't be so large that some features have all or almost all other features as neighbors.

  • If the Beginning Distance or Distance Increment parameter value results in a distance band that is larger than the maximum threshold distance, the Distance Increment value will automatically be scaled down. To avoid this adjustment, you can decrease the Distance Increment value or decrease the Number of Distance Bands value.

  • It is possible to run out of memory when you run this tool. This generally occurs when you specify a Beginning Distance or Distance Increment parameter value that results in features having thousands of neighbors. It's best not to create spatial relationships in which features have thousands of neighbors. Use a smaller Distance Increment value and temporarily remove locational outliers so that you can start with a smaller Beginning Distance value.

  • Even when the tool calculates the Beginning Distance and Increment Distance parameter values, processing time can be long for large datasets. You can improve performance by doing the following:

    • Temporarily remove locational outliers (as stated above).
    • Run the analysis on selected features in a representative portion of the study area rather than on all features.
    • Take a random sample of features from the dataset and run the analysis on the sampled features.

  • Distances are always based on the Output Coordinate System environment setting. The default option for the Output Coordinate System environment is Same as Input. Input features are projected to the output coordinate system before the analysis is run.

  • The optional Output Table parameter value will contain the distance value at each iteration, the Moran's I Index value, the expected Moran's I index value, the variance, the z-score, and the p-value. A peak is an increase in the z-score value followed by a decrease in the z-score value. For example, if the tool finds z-scores of 2.95, 3.68, and 3.12 for distances of 50, 100, and 150 meters, respectively, the peak is at 100 meters. The output table also includes a Spatial Autocorrelation by Distance line chart that displays the z-score for each distance; you can use this chart to identify the peaks.

  • When using this tool from Python, the result object returned from running the tool has the following outputs:

    Position    Description    Data Type
    0           First Peak     Double
    1           Max Peak       Double
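
The peak rule described in the usage notes above (a z-score that rises from the previous distance and falls at the next one) can be sketched in plain Python. This is only an illustration of that rule, not the tool's implementation: the function names find_peaks and first_and_max_peak are hypothetical, and the sketch does not test statistical significance, which the tool's reported peaks take into account.

```python
def find_peaks(distances, z_scores):
    """Return (distance, z-score) pairs where the z-score rises and then falls.

    Hypothetical helper for illustration; it ignores statistical significance.
    """
    peaks = []
    # The first and last bands cannot be peaks because they lack a neighbor
    # on one side.
    for i in range(1, len(z_scores) - 1):
        if z_scores[i - 1] < z_scores[i] > z_scores[i + 1]:
            peaks.append((distances[i], z_scores[i]))
    return peaks


def first_and_max_peak(distances, z_scores):
    """Return the first peak and the peak with the largest z-score."""
    peaks = find_peaks(distances, z_scores)
    if not peaks:
        return None, None
    return peaks[0], max(peaks, key=lambda peak: peak[1])


# Values taken from the example in the usage notes.
distances = [50, 100, 150]
z_scores = [2.95, 3.68, 3.12]
print(find_peaks(distances, z_scores))  # [(100, 3.68)]
```

The same helpers could be applied to the distance and z-score columns read from the optional output table.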

Parameters

Label    Explanation    Data Type
Input Features

The feature class for which spatial autocorrelation will be measured over a series of distances.

Feature Layer
Input Field

The numeric field that will be used in assessing spatial autocorrelation.

Field
Number of Distance Bands

The number of times the neighborhood size will be incremented and the dataset will be analyzed for spatial autocorrelation. The starting point and size of the increment are specified by the Beginning Distance and Distance Increment parameters, respectively.

Long
Beginning Distance
(Optional)

The distance at which the analysis of spatial autocorrelation will start and from which the distance will be incremented. The value provided for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance Increment
(Optional)

The amount by which the distance will be increased after each iteration. The distance used in the analysis starts at the Beginning Distance parameter value and increases by the amount specified in the Distance Increment parameter value. The value provided for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance Method
(Optional)

Specifies how distances will be calculated from each feature to neighboring features.

  • Euclidean: The distances will be calculated using the straight-line distance between two points (as the crow flies). This is the default.
  • Manhattan: The distances will be calculated using the distance between two points measured along axes at right angles (city block), which is calculated by summing the (absolute) difference between the x- and y-coordinates.
String
Row Standardization
(Optional)

Specifies whether spatial weights will be standardized. Row standardization is recommended whenever feature distribution is potentially biased due to sampling design or an imposed aggregation scheme.

  • Checked: Spatial weights will be standardized. Each weight will be divided by its row sum (the sum of the weights of all neighboring features). This is the default.
  • Unchecked: Spatial weights will not be standardized.
Boolean
Output Table
(Optional)

The table that will be created with each distance band and associated z-score result.

Table
Output Report File
(Optional)

The .pdf file that will be created containing a line graph summarizing results.

File

Derived Output

Label    Explanation    Data Type
First Peak

The first peak z-score.

Double
Maximum Peak

The maximum peak z-score.

Double
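
To make the Row Standardization option described above more concrete, the following sketch builds a small weights matrix and divides each weight by its row sum. It assumes binary weights (1 for every neighbor within a fixed distance band, 0 otherwise) and Euclidean distances; the function name row_standardized_weights and the sample coordinates are hypothetical, and the tool's internal weighting may differ.

```python
import math


def row_standardized_weights(points, distance_band):
    """Build binary distance-band weights and divide each by its row sum.

    Assumption for illustration: a neighbor is any other point within
    distance_band (Euclidean), and each neighbor starts with a weight of 1.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= distance_band:
                weights[i][j] = 1.0
        row_sum = sum(weights[i])
        # A feature with no neighbors keeps a row of zeros.
        if row_sum > 0:
            weights[i] = [w / row_sum for w in weights[i]]
    return weights


# Hypothetical coordinates: three clustered points and one outlier.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
for row in row_standardized_weights(points, distance_band=1.5):
    print(row)
```

After standardization, every feature that has at least one neighbor has weights that sum to 1, so features with many neighbors do not dominate the statistic simply because of how the data was sampled or aggregated.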

arcpy.stats.IncrementalSpatialAutocorrelation(Input_Features, Input_Field, Number_of_Distance_Bands, {Beginning_Distance}, {Distance_Increment}, {Distance_Method}, {Row_Standardization}, {Output_Table}, {Output_Report_File})
Name    Explanation    Data Type
Input_Features

The feature class for which spatial autocorrelation will be measured over a series of distances.

Feature Layer
Input_Field

The numeric field that will be used in assessing spatial autocorrelation.

Field
Number_of_Distance_Bands

The number of times the neighborhood size will be incremented and the dataset will be analyzed for spatial autocorrelation. The starting point and size of the increment are specified by the Beginning_Distance and Distance_Increment parameters, respectively.

Long
Beginning_Distance
(Optional)

The distance at which the analysis of spatial autocorrelation will start and from which the distance will be incremented. The value provided for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance_Increment
(Optional)

The amount by which the distance will be increased after each iteration. The distance used in the analysis starts at the Beginning_Distance parameter value and increases by the amount specified in the Distance_Increment parameter value. The value provided for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance_Method
(Optional)

Specifies how distances will be calculated from each feature to neighboring features.

  • EUCLIDEAN: The distances will be calculated using the straight-line distance between two points (as the crow flies). This is the default.
  • MANHATTAN: The distances will be calculated using the distance between two points measured along axes at right angles (city block), which is calculated by summing the (absolute) difference between the x- and y-coordinates.
String
Row_Standardization
(Optional)

Specifies whether spatial weights will be standardized. Row standardization is recommended whenever feature distribution is potentially biased due to sampling design or an imposed aggregation scheme.

  • ROW_STANDARDIZATION: Spatial weights will be standardized. Each weight will be divided by its row sum (the sum of the weights of all neighboring features). This is the default.
  • NO_STANDARDIZATION: Spatial weights will not be standardized.
Boolean
Output_Table
(Optional)

The table that will be created with each distance band and associated z-score result.

Table
Output_Report_File
(Optional)

The .pdf file that will be created containing a line graph summarizing results.

File

Derived Output

Name    Explanation    Data Type
First_Peak

The first peak z-score.

Double
Max_Peak

The maximum peak z-score.

Double
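
For intuition about what is measured at each distance, the following sketch computes the standard global Moran's I and its expected value for a series of distance bands. It is a simplified illustration under stated assumptions, not the tool's code: it uses unstandardized binary weights and Euclidean distances, it does not compute the variance, z-score, or p-value, and the names morans_i and expected_index as well as the sample data are hypothetical.

```python
import math


def morans_i(points, values, distance_band):
    """Global Moran's I with binary fixed-distance-band weights (a sketch)."""
    n = len(values)
    if max(values) == min(values):
        # With no variation, the denominator of the statistic is zero.
        raise ValueError("Input values have no variation.")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    sum_sq = sum(d * d for d in deviations)
    cross_products = 0.0
    total_weight = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= distance_band:
                cross_products += deviations[i] * deviations[j]
                total_weight += 1.0
    if total_weight == 0:
        raise ValueError("No feature has a neighbor within the distance band.")
    return (n / total_weight) * (cross_products / sum_sq)


def expected_index(n):
    """Expected Moran's I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical data: counts that increase steadily along a line.
points = [(x, 0) for x in range(6)]
values = [1, 2, 3, 4, 5, 6]
for band in (1.0, 2.0, 3.0):
    print(band, morans_i(points, values, band), expected_index(len(values)))
```

The guard against constant input mirrors the usage note that the statistic requires some variation in the analysis field.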

Code sample

IncrementalSpatialAutocorrelation example 1 (Python window)

The following Python window script demonstrates how to use the IncrementalSpatialAutocorrelation function.

import arcpy
arcpy.env.workspace = r"C:\ISA"
arcpy.stats.IncrementalSpatialAutocorrelation("911CallsCount.shp", "ICOUNT", 
                                              "20", "", "", "EUCLIDEAN",
                                              "ROW_STANDARDIZATION", 
                                              "outTable.dbf")

IncrementalSpatialAutocorrelation example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the IncrementalSpatialAutocorrelation function.

# Hot Spot Analysis of 911 calls in a metropolitan area
# using the Incremental Spatial Autocorrelation and Hot Spot Analysis Tools

# Import system modules
import arcpy

# Set property to overwrite existing output, by default
arcpy.env.overwriteOutput = True

# Local variables
workspace = r"C:\ISA"

try:
    # Set the current workspace (to avoid having to specify the full path to 
    # the feature classes each time)
    arcpy.env.workspace = workspace

    # Copy the input feature class and integrate the points to snap together at 
    # 30 feet
    # Process: Copy Features and Integrate
    cf = arcpy.management.CopyFeatures("911Calls.shp", "911Copied.shp")
    integrate = arcpy.management.Integrate("911Copied.shp #", "30 Feet")

    # Use Collect Events to count the number of calls at each location
    # Process: Collect Events
    ce = arcpy.stats.CollectEvents("911Copied.shp", "911Count.shp")

    # Use Incremental Spatial Autocorrelation to get the peak distance
    # Process: Incremental Spatial Autocorrelation
    isa = arcpy.stats.IncrementalSpatialAutocorrelation(ce, "ICOUNT", "20", "", 
                     "", "EUCLIDEAN", "ROW_STANDARDIZATION", "outTable.dbf", 
                     "outReport.pdf")

    # Hot Spot Analysis of 911 Calls
    # Process: Hot Spot Analysis (Getis-Ord Gi*)
    distance = isa.getOutput(2)
    hs = arcpy.stats.HotSpots(ce, "ICOUNT", "911HotSpots.shp", "Fixed Distance Band",
                     "Euclidean Distance", "None",  distance, "", "")

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())
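
Both samples above pass empty strings for Beginning_Distance and Distance_Increment, so the tool chooses these values. The default increment rule from the usage notes, the smaller of the average nearest neighbor distance or (Td - B) / I, can be sketched as follows. The function names and input numbers are hypothetical, and the list of distances assumes that the first band is the beginning distance and that each later band adds one increment.

```python
def default_increment(avg_nn_distance, max_threshold, beginning_distance, n_bands):
    """Smaller of the average nearest neighbor distance or (Td - B) / I."""
    return min(avg_nn_distance, (max_threshold - beginning_distance) / n_bands)


def distance_bands(beginning_distance, increment, n_bands):
    """Distances analyzed, assuming the series starts at the beginning distance."""
    return [beginning_distance + k * increment for k in range(n_bands)]


# Hypothetical inputs, in the units of the output coordinate system.
increment = default_increment(
    avg_nn_distance=120.0,
    max_threshold=5000.0,
    beginning_distance=300.0,
    n_bands=20,
)
print(increment)  # 120.0
print(distance_bands(300.0, increment, 20)[:3])  # [300.0, 420.0, 540.0]
```

Because the increment never exceeds (Td - B) / I, none of the bands in the series goes past the maximum threshold distance.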

Environments

Special cases

Output Coordinate System

Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.
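
A chordal distance is the straight-line distance through the earth between two surface points. The sketch below illustrates the idea by converting latitude and longitude to three-dimensional earth-centered coordinates on an oblate spheroid and measuring the straight line between them. The WGS84 constants and the function names are assumptions made for this illustration; the spheroid and method that the tool uses internally may differ.

```python
import math

# Assumed spheroid for illustration: WGS84 semi-major axis (meters) and flattening.
WGS84_A = 6378137.0
WGS84_F = 1 / 298.257223563


def to_earth_centered(lat_deg, lon_deg):
    """Convert a surface point (degrees) to earth-centered x, y, z in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared
    n = WGS84_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1 - e2) * math.sin(lat)
    return x, y, z


def chordal_distance(point_a, point_b):
    """Length in meters of the straight line through the earth between two points."""
    return math.dist(to_earth_centered(*point_a), to_earth_centered(*point_b))


# Two points on the equator, 90 degrees of longitude apart (latitude, longitude).
print(chordal_distance((0.0, 0.0), (0.0, 90.0)))
```

For points that are close together, the chord and the geodesic are nearly the same length; the gap widens as the angular separation grows, which is why projecting the data is advised for study areas that extend beyond about 30 degrees.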
