Hot Spot Analysis (Getis-Ord Gi*) (Spatial Statistics)

Summary

Given a set of weighted features, identifies statistically significant hot spots and cold spots using the Getis-Ord Gi* statistic.

Learn more about how Hot Spot Analysis (Getis-Ord Gi*) works

Illustration

Hot Spot Analysis tool illustration

Usage

  • This tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots). It creates an output feature class with a z-score, p-value, and confidence level bin field (Gi_Bin) for each feature in the input features.

  • The z-scores and p-values are measures of statistical significance you can use to determine whether to reject the null hypothesis, feature by feature. In effect, they indicate whether the observed spatial clustering of high or low values is more pronounced than would be expected in a random distribution of those same values. The z-score and p-value fields do not reflect any type of false discovery rate (FDR) correction.

  • The Gi_Bin field identifies statistically significant hot and cold spots regardless of whether the FDR correction is applied. Features in the +/-3 bins reflect statistical significance with a 99 percent confidence level; features in the +/-2 bins reflect a 95 percent confidence level; features in the +/-1 bins reflect a 90 percent confidence level; and the clustering for features in bin 0 is not statistically significant. Without FDR correction, statistical significance is based on the p-value and z-score fields. When you check the Apply False Discovery Rate (FDR) Correction parameter, the critical p-values determining confidence levels are reduced to account for multiple testing and spatial dependence.

  • A high z-score and small p-value for a feature indicate a spatial clustering of high values. A low negative z-score and small p-value indicate a spatial clustering of low values. The higher (or lower) the z-score, the more intense the clustering. A z-score near zero indicates no apparent spatial clustering.
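The following Python sketch shows how a feature's z-score and p-value relate to its Gi_Bin class when FDR correction is not applied. The function name gi_bin is hypothetical. Treating the 0.01, 0.05, and 0.10 cutoffs as strict upper bounds is an assumption, and the tool's exact boundary handling may differ.

```python
def gi_bin(z_score: float, p_value: float) -> int:
    """Map one feature's z-score and p-value to a Gi_Bin-style class.

    Hypothetical helper. It assumes no FDR correction, so significance comes
    directly from the p-value, and the sign of the z-score separates hot (+)
    from cold (-) spots.
    """
    if p_value < 0.01:
        level = 3  # 99 percent confidence
    elif p_value < 0.05:
        level = 2  # 95 percent confidence
    elif p_value < 0.10:
        level = 1  # 90 percent confidence
    else:
        return 0  # clustering is not statistically significant
    return level if z_score > 0 else -level
```

For example, a feature with a z-score of -2.1 and a p-value of 0.036 falls in bin -2, which is a cold spot at the 95 percent confidence level.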

  • The z-score is based on the randomization null hypothesis computation. For more information on z-scores, see What is a z-score? What is a p-value?
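The following Python sketch computes z-scores using the commonly published Getis-Ord Gi* formula. For feature i, it takes the weighted sum of values over its neighbors, including i itself, and subtracts the global mean multiplied by the sum of the weights. It then divides the result by S·sqrt((n·Σw² − (Σw)²)/(n − 1)), where S is the population standard deviation of the field. This is not Esri's implementation. The helper names are hypothetical, the weights are assumed to be a dense n × n matrix, and the p-values assume a two-sided standard normal distribution. The guard for a field with no variation reflects the requirement that the input field contain a variety of values.

```python
import math
from typing import List, Sequence


def getis_ord_gi_star(values: Sequence[float],
                      weights: Sequence[Sequence[float]]) -> List[float]:
    """Return a Gi* z-score for every feature (hypothetical helper).

    weights[i][j] is the spatial weight between features i and j. Gi* counts
    the feature itself, so weights[i][i] is normally 1.
    """
    n = len(values)
    mean = sum(values) / n
    # Population variance of the analysis field.
    variance = sum(x * x for x in values) / n - mean * mean
    if variance <= 0:
        raise ValueError("input field has no variation; Gi* cannot be computed")
    s = math.sqrt(variance)

    z_scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)                                  # sum of w_ij
        sum_w2 = sum(wij * wij for wij in w)            # sum of w_ij squared
        sum_wx = sum(wij * x for wij, x in zip(w, values))
        numerator = sum_wx - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Consider four features in a row with values 10, 10, 0, and 0, where each feature's neighbors are itself and the features on either side. For this input, the sketch returns z-scores of about 1.73, 1, -1, and -1.73.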

  • When the input features are not projected (that is, when coordinates are in degrees, minutes, and seconds) or when the output coordinate system is set to a geographic coordinate system, distances are computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide very good estimates of true geodesic distances, at least for points within about 30 degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of a line, passing through the three-dimensional earth, to connect those two points. Chordal distances are reported in meters.

    Caution:

    Be sure to project the data if the study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.
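The following Python sketch illustrates a chordal distance. It converts each point to earth-centered x, y, z coordinates on an oblate spheroid and measures the straight line between them. The WGS 1984 spheroid parameters are an assumption, because the tool uses the spheroid of the data's geographic coordinate system. The helper names are hypothetical.

```python
import math

# Assumed spheroid: WGS 1984.
A = 6378137.0              # semi-major axis, meters
F = 1 / 298.257223563      # flattening
E2 = F * (2 - F)           # first eccentricity squared


def to_ecef(lat_deg: float, lon_deg: float):
    """Earth-centered x, y, z (meters) of a point on the spheroid surface."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1 - E2) * math.sin(lat)
    return x, y, z


def chordal_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Length in meters of the straight line through the earth between two points."""
    return math.dist(to_ecef(lat1, lon1), to_ecef(lat2, lon2))
```

On the equator, the spheroid's cross section is a circle of radius A. For two points 30 degrees apart there, the chord is 2A·sin(15°), which is about 1.1 percent shorter than the arc length A·π/6. The gap grows with separation, which is why projecting the data is recommended for larger study areas.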

  • When chordal distances are used in the analysis, the Distance Band or Threshold Distance parameter value, if specified, should be in meters.

  • For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.
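The following Python sketch shows the weighted mean center calculation for a multipart feature. It assumes each part's own centroid and weight are already known: 1 for a point, the length for a line part, or the area for a polygon part. The function name is hypothetical.

```python
from typing import Iterable, Tuple

Part = Tuple[float, float, float]  # (centroid x, centroid y, weight) of one part


def weighted_mean_center(parts: Iterable[Part]) -> Tuple[float, float]:
    """Weighted mean center of all parts of one feature (hypothetical helper)."""
    total_w = sum_x = sum_y = 0.0
    for x, y, w in parts:
        total_w += w
        sum_x += w * x
        sum_y += w * y
    if total_w == 0:
        raise ValueError("parts have zero total weight")
    return sum_x / total_w, sum_y / total_w
```

Consider a polygon with a 30-unit part centered at (0, 0) and a 10-unit part centered at (10, 0). Its center is (2.5, 0), which is pulled toward the larger part.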

  • The input field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; it cannot solve if all input values are 1, for example. To use this tool to analyze the spatial pattern of incident data, consider aggregating the incident data or using the Optimized Hot Spot Analysis tool.

    Note:

    Incident data are points representing events (crime, traffic accidents) or objects (trees, stores) where your focus is on presence or absence rather than some measured attribute associated with each point.

  • The Optimized Hot Spot Analysis tool evaluates the data to automatically select parameter settings that will optimize the hot spot results. It will aggregate incident data, select an appropriate scale of analysis, and adjust results for multiple testing and spatial dependence. The parameter options it selects are written as messages, which may help you refine your parameter choices when you use this tool. This tool allows you full control and flexibility over the parameter settings.

  • The option you specify for the Conceptualization of Spatial Relationships parameter should reflect inherent relationships among the features you are analyzing. The more realistically you can model how features interact with each other in space, the more accurate the results will be. Recommendations are outlined in Best practices for selecting a conceptualization of spatial relationships. The following are additional tips:

    • The Fixed distance band option is the default. The default Distance Band or Threshold Distance parameter value ensures that each feature has at least one neighbor. This is important, but this default is often not the most appropriate distance for an analysis. Additional strategies for selecting an appropriate scale (distance band) for the analysis are outlined in Best practices for selecting a fixed distance band value.

    • For the Inverse distance and Inverse distance squared options, when 0 is provided as the Distance Band or Threshold Distance parameter value, all features are considered neighbors of all other features; when this parameter is left empty, the default distance will be applied.

      Weights for distances less than 1 become unstable when they are inverted. Consequently, features separated by less than 1 unit of distance are assigned a weight of 1.

      For the inverse distance options (Inverse distance, Inverse distance squared, and Zone of indifference), any two coincident points are assigned a weight of 1 to avoid division by zero. This ensures that features are not excluded from the analysis.
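The following Python sketch applies the inverse distance weighting rules for the Inverse distance (power 1) and Inverse distance squared (power 2) options. Distances under 1 unit, including coincident points, get a weight of 1. A threshold of 0 means that no cutoff is applied. The function name is hypothetical, and counting a feature that lies exactly at the threshold as a neighbor is an assumption.

```python
def inverse_distance_weight(distance: float, power: int = 1,
                            threshold: float = 0.0) -> float:
    """Inverse distance weight for one pair of features (hypothetical helper)."""
    # A threshold of 0 means "no cutoff": every feature is a neighbor.
    if threshold > 0 and distance > threshold:
        return 0.0
    # Distances under 1 unit, including coincident points, are clamped to 1
    # to avoid unstable weights and division by zero.
    if distance < 1.0:
        return 1.0
    return 1.0 / distance ** power
```

The clamp works on raw distance values, so the weights for the same data differ depending on whether the distances are measured in meters or in kilometers.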

  • Additional options for the Conceptualization of Spatial Relationships parameter, including space-time relationships, are available using the Generate Spatial Weights Matrix tool. To take advantage of these additional options, construct a spatial weights matrix file before the analysis; choose Get spatial weights from file for the Conceptualization of Spatial Relationships parameter; and for the Weights Matrix File parameter, provide the path to the spatial weights file you created.

  • More information about space-time cluster analysis is provided in the Space-Time Analysis documentation.

  • Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.

  • If you provide a weights matrix file with an .swm extension, a spatial weights matrix file created using either the Generate Spatial Weights Matrix or Generate Network Spatial Weights tool is expected; otherwise, an ASCII-formatted spatial weights matrix file is expected. In some cases, tool behavior is different depending on the type of spatial weights matrix file you use:

    • ASCII-formatted spatial weights matrix file
      • Weights are used as is. Missing feature-to-feature relationships are treated as zeros.
      • The default weight for self potential is 0, unless you specify a Self Potential Field parameter value or include self potential weights explicitly.
      • Asymmetric relationships are honored, allowing a feature to have a neighboring feature that doesn't have a neighbor. This means the neighboring feature is included in the local mean calculations for the original feature, but the neighboring feature is not included in the calculations for the global mean.
      • If the weights are row standardized, results may be incorrect for analyses on selection sets. To run an analysis on a selection set, convert the ASCII spatial weights file to an .swm file by reading the ASCII data into a table and using the Convert table option for the Conceptualization of Spatial Relationships parameter with the Generate Spatial Weights Matrix tool.
    • SWM-formatted spatial weights matrix file
      • If the weights are row standardized, they will be restandardized for selection sets; otherwise, weights are used as is.
      • The default weight for self potential is 1, unless you specify a Self Potential Field parameter value.

  • Running an analysis with an ASCII-formatted spatial weights matrix file is memory intensive. For analyses on more than 5,000 features, consider converting the ASCII-formatted spatial weights matrix file to an SWM-formatted file. First put the ASCII weights into a formatted table (using Excel, for example). Next, run the Generate Spatial Weights Matrix tool using the Convert table option for the Conceptualization of Spatial Relationships parameter. The output will be an SWM-formatted spatial weights matrix file.

  • The output feature class from this tool is automatically added to the Contents pane with default rendering applied to the Gi_Bin field. The hot-to-cold rendering is defined by a layer file in <ArcGIS Pro>\Resources\ArcToolBox\Templates\Layers. If needed, you can restore the default rendering by reapplying the layer symbology.

  • The output of this tool includes a histogram charting the values of the input field, which can be accessed under the output feature class in the Contents pane.

  • The Modeling Spatial Relationships help topic provides additional information about this tool's parameters.

  • Caution:

    When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may store or interpret null values as zero. In some cases, nulls are stored as very large negative values in shapefiles. This can lead to unexpected results. See Geoprocessing considerations for shapefile output for more information.

    Legacy:

    Row standardization has no impact on this tool: Results from this tool would be identical with or without row standardization. The Standardization parameter is disabled; it remains only to support backward compatibility.

  • When using this tool in Python, the result object returned from the tool has the following outputs:

    Index position | Description | Data type
    0 | Output feature class | Feature Class
    1 | Results field name (GiZScore) | Field
    2 | Probability field name (GiPValue) | Field
    3 | Source ID field name (SOURCE_ID) | Field

Parameters

Label | Explanation | Data Type
Input Feature Class

The feature class for which hot spot analysis will be performed.

Feature Layer
Input Field

The numeric field (for example, number of victims, crime rate, test scores, and so on) to be evaluated.

Field
Output Feature Class

The output feature class that will receive the z-score and p-value results.

Feature Class
Conceptualization of Spatial Relationships

Specifies how spatial relationships among features will be defined.

  • Inverse distance: Nearby neighboring features will have a larger influence on the computations for a target feature than features that are far away.
  • Inverse distance squared: This is the same as Inverse distance except that the slope is sharper, so influence will drop off more quickly, and only a target feature's closest neighbors will exert substantial influence on computations for that feature.
  • Fixed distance band: Each feature will be analyzed within the context of neighboring features. Neighboring features inside the specified critical distance (Distance Band or Threshold Distance) will receive a weight of 1 and exert influence on computations for the target feature. Neighboring features outside the critical distance will receive a weight of 0 and have no influence on a target feature's computations.
  • Zone of indifference: Features within the specified critical distance (Distance Band or Threshold Distance) of a target feature will receive a weight of 1 and influence computations for that feature. Once the critical distance is exceeded, weights (and the influence a neighboring feature has on target feature computations) will diminish with distance.
  • K nearest neighbors: The closest k features will be included in the analysis; k is a specified numeric parameter.
  • Contiguity edges only: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • Contiguity edges corners: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • Get spatial weights from file: Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the Weights Matrix File parameter.
String
Distance Method

Specifies how distances will be calculated from each feature to neighboring features.

  • Euclidean: The straight-line distance between two points (as the crow flies) will be used.
  • Manhattan: The distance between two points measured along axes at right angles (city block), calculated by summing the absolute differences between the x- and y-coordinates, will be used.
String
Standardization

Row standardization has no impact on this tool: Results from this tool would be identical with or without row standardization. This parameter is disabled; it remains only to support backward compatibility.

  • None: No standardization of spatial weights is applied.
  • Row: Spatial weights are standardized; each weight is divided by its row sum (the sum of the weights of all neighboring features).
String
Distance Band or Threshold Distance
(Optional)

The cutoff distance for the inverse distance and fixed distance options. Features outside the specified cutoff for a target feature will be ignored in analyses for that feature. However, for Zone of indifference, the influence of features outside the given distance will be reduced with distance, while those inside the distance threshold will be equally considered. The distance value should be specified in the linear units of the output coordinate system.

For the inverse distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance will be applied; when this parameter is left blank, a default threshold value will be computed and applied. The default value is the Euclidean distance that ensures every feature has at least one neighbor.

This parameter has no effect when polygon contiguity (Contiguity edges only or Contiguity edges corners) or the Get spatial weights from file spatial conceptualization option is specified.

Double
Self Potential Field
(Optional)

The field representing self potential, which is the distance or weight between a feature and itself.

Field
Weights Matrix File
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
Apply False Discovery Rate (FDR) Correction
(Optional)

Specifies whether statistical significance will be assessed based on the FDR correction.

  • Checked—Statistical significance will be based on the FDR correction.
  • Unchecked—Statistical significance will not be based on the FDR correction; it will be based on the p-value and z-score fields. This is the default.
Boolean
Number of Neighbors
(Optional)

An integer specifying the number of neighbors that will be included in the analysis.

Long
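The following Python sketch shows how several of the parameters above define neighborhoods: Distance Method, the Fixed distance band and K nearest neighbors options, Number of Neighbors, and the default Distance Band or Threshold Distance. It assumes projected x,y coordinates and a brute-force search. It also assumes that the default threshold is the largest nearest-neighbor distance and that features exactly at the threshold are counted as neighbors. Esri's exact computation may differ, and all function names are hypothetical. The neighbor lists leave out the feature itself, which Gi* adds separately.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
DistanceFn = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    """Straight-line (as the crow flies) distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p: Point, q: Point) -> float:
    """City-block distance: sum of the absolute x and y differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def default_threshold(points: Sequence[Point], dist: DistanceFn = euclidean) -> float:
    """Smallest cutoff that gives every feature at least one neighbor."""
    return max(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def fixed_band_neighbors(points: Sequence[Point], threshold: float,
                         dist: DistanceFn = euclidean) -> List[List[int]]:
    """Indices of features within the critical distance (weight 1; others 0)."""
    return [
        [j for j, q in enumerate(points) if j != i and dist(p, q) <= threshold]
        for i, p in enumerate(points)
    ]


def k_nearest_neighbors(points: Sequence[Point], k: int,
                        dist: DistanceFn = euclidean) -> List[List[int]]:
    """Indices of the k closest other features for each feature."""
    return [
        sorted((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))[:k]
        for i in range(len(points))
    ]
```

For the points (0, 0), (1, 0), (5, 0), and (5, 3), the default threshold is 3, which is the distance between the last two points. With that band, each feature has exactly one neighbor.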

Derived Output

Label | Explanation | Data Type
Results Field

The results field name (GiZScore).

Field
Probability Field

The probability field name (GiPValue).

Field
Source_ID

The source ID field name (SOURCE_ID).

Field

arcpy.stats.HotSpots(Input_Feature_Class, Input_Field, Output_Feature_Class, Conceptualization_of_Spatial_Relationships, Distance_Method, Standardization, {Distance_Band_or_Threshold_Distance}, {Self_Potential_Field}, {Weights_Matrix_File}, {Apply_False_Discovery_Rate__FDR__Correction}, {number_of_neighbors})
Name | Explanation | Data Type
Input_Feature_Class

The feature class for which hot spot analysis will be performed.

Feature Layer
Input_Field

The numeric field (for example, number of victims, crime rate, test scores, and so on) to be evaluated.

Field
Output_Feature_Class

The output feature class that will receive the z-score and p-value results.

Feature Class
Conceptualization_of_Spatial_Relationships

Specifies how spatial relationships among features will be defined.

  • INVERSE_DISTANCE: Nearby neighboring features will have a larger influence on the computations for a target feature than features that are far away.
  • INVERSE_DISTANCE_SQUARED: This is the same as INVERSE_DISTANCE except that the slope is sharper, so influence will drop off more quickly, and only a target feature's closest neighbors will exert substantial influence on computations for that feature.
  • FIXED_DISTANCE_BAND: Each feature will be analyzed within the context of neighboring features. Neighboring features inside the specified critical distance (Distance_Band_or_Threshold_Distance) will receive a weight of 1 and exert influence on computations for the target feature. Neighboring features outside the critical distance will receive a weight of 0 and have no influence on a target feature's computations.
  • ZONE_OF_INDIFFERENCE: Features within the specified critical distance (Distance_Band_or_Threshold_Distance) of a target feature will receive a weight of 1 and influence computations for that feature. Once the critical distance is exceeded, weights (and the influence a neighboring feature has on target feature computations) will diminish with distance.
  • K_NEAREST_NEIGHBORS: The closest k features will be included in the analysis; k is a specified numeric parameter.
  • CONTIGUITY_EDGES_ONLY: Only neighboring polygon features that share a boundary or overlap will influence computations for the target polygon feature.
  • CONTIGUITY_EDGES_CORNERS: Polygon features that share a boundary, share a node, or overlap will influence computations for the target polygon feature.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: Spatial relationships will be defined by a specified spatial weights file. The path to the spatial weights file is specified by the Weights_Matrix_File parameter.
String
Distance_Method

Specifies how distances will be calculated from each feature to neighboring features.

  • EUCLIDEAN_DISTANCE: The straight-line distance between two points (as the crow flies) will be used.
  • MANHATTAN_DISTANCE: The distance between two points measured along axes at right angles (city block), calculated by summing the absolute differences between the x- and y-coordinates, will be used.
String
Standardization

Row standardization has no impact on this tool: Results from this tool would be identical with or without row standardization. This parameter is disabled; it remains only to support backward compatibility.

  • NONE: No standardization of spatial weights is applied.
  • ROW: Spatial weights are standardized; each weight is divided by its row sum (the sum of the weights of all neighboring features).
String
Distance_Band_or_Threshold_Distance
(Optional)

The cutoff distance for the inverse distance and fixed distance options. Features outside the specified cutoff for a target feature will be ignored in analyses for that feature. However, for ZONE_OF_INDIFFERENCE, the influence of features outside the given distance will be reduced with distance, while those inside the distance threshold will be equally considered. The distance value should be specified in the linear units of the output coordinate system.

For the inverse distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance will be applied; when this parameter is left blank, a default threshold value will be computed and applied. The default value is the Euclidean distance that ensures every feature has at least one neighbor.

This parameter has no effect when polygon contiguity (CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS) or the GET_SPATIAL_WEIGHTS_FROM_FILE spatial conceptualization option is specified.

Double
Self_Potential_Field
(Optional)

The field representing self potential, which is the distance or weight between a feature and itself.

Field
Weights_Matrix_File
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
Apply_False_Discovery_Rate__FDR__Correction
(Optional)

Specifies whether statistical significance will be assessed based on the FDR correction.

  • APPLY_FDR: Statistical significance will be based on the FDR correction.
  • NO_FDR: Statistical significance will not be based on the FDR correction; it will be based on the p-value and z-score fields. This is the default.
Boolean
number_of_neighbors
(Optional)

An integer specifying the number of neighbors that will be included in the analysis.

Long
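The following Python sketch shows how an FDR correction can lower the critical p-value for each confidence level. It uses the standard Benjamini-Hochberg procedure. Treating this as the tool's method is an assumption: the sketch handles multiple testing only and does not model any spatial dependence adjustment the tool may apply. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def fdr_critical_p(p_values: Sequence[float], alpha: float) -> Optional[float]:
    """Largest p-value that stays significant under Benjamini-Hochberg.

    Returns None when no feature survives the correction.
    """
    m = len(p_values)
    critical = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank / m * alpha:
            critical = p  # keep the largest p that passes its rank threshold
    return critical


def fdr_gi_bins(z_scores: Sequence[float], p_values: Sequence[float]) -> List[int]:
    """Gi_Bin-style classes using an FDR-reduced cutoff for each confidence level."""
    cutoffs = [(3, fdr_critical_p(p_values, 0.01)),   # 99 percent
               (2, fdr_critical_p(p_values, 0.05)),   # 95 percent
               (1, fdr_critical_p(p_values, 0.10))]   # 90 percent
    bins = []
    for z, p in zip(z_scores, p_values):
        level = next((lvl for lvl, cut in cutoffs
                      if cut is not None and p <= cut), 0)
        bins.append(level if z > 0 else -level)
    return bins
```

Consider the p-values 0.01, 0.02, 0.03, and 0.5. No feature passes the corrected 99 percent cutoff. The first three features pass at 95 percent, because the critical p-value becomes 0.03.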

Derived Output

Name | Explanation | Data Type
Results_Field

The results field name (GiZScore).

Field
Probability_Field

The probability field name (GiPValue).

Field
Source_ID

The source ID field name (SOURCE_ID).

Field

Code sample

HotSpots example 1 (Python window)

The following Python window script demonstrates how to use the HotSpots function.

import arcpy
arcpy.env.workspace = "C:/data"
arcpy.stats.HotSpots("911Count.shp", "ICOUNT", "911HotSpots.shp",
                     "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE", 
                     "NONE", "#", "#", "euclidean6Neighs.swm", "NO_FDR")
HotSpots example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the HotSpots function.


# Analyze the spatial distribution of 911 calls in a metropolitan area
# using the Hot Spot Analysis Tool (Local Gi*)

# Import system modules
import arcpy

# Set property to overwrite existing output, by default
arcpy.env.overwriteOutput = True

# Local variables...
workspace = "C:/Data"

try:
    # Set the current workspace (to avoid having to specify the full path to 
    # the feature classes each time)
    arcpy.env.workspace = workspace

    # Copy the input feature class and integrate the points to snap
    # together at 500 feet
    # Process: Copy Features and Integrate
    cf = arcpy.management.CopyFeatures("911Calls.shp", "911Copied.shp")

    integrate = arcpy.management.Integrate("911Copied.shp #", "500 Feet")

    # Use Collect Events to count the number of calls at each location;
    # the counts are written to a field named ICOUNT
    # Process: Collect Events
    ce = arcpy.stats.CollectEvents("911Copied.shp", "911Count.shp")

    # Add a unique ID field to the count feature class
    # Process: Add Field and Calculate Field
    af = arcpy.management.AddField("911Count.shp", "MyID", "LONG", "#", "#", "#", "#",
                                   "NON_NULLABLE", "NON_REQUIRED", "#")

    # Copy the FID values into MyID (ArcGIS Pro does not support VB expressions)
    calc = arcpy.management.CalculateField("911Count.shp", "MyID", "!FID!", "PYTHON3")

    # Create Spatial Weights Matrix for Calculations
    # Process: Generate Spatial Weights Matrix... 
    swm = arcpy.stats.GenerateSpatialWeightsMatrix("911Count.shp", "MyID",
                        "euclidean6Neighs.swm",
                        "K_NEAREST_NEIGHBORS",
                        "#", "#", "#", 6,
                        "NO_STANDARDIZATION") 

    # Hot Spot Analysis of 911 Calls
    # Process: Hot Spot Analysis (Getis-Ord Gi*)
    hs = arcpy.stats.HotSpots("911Count.shp", "ICOUNT", "911HotSpots.shp", 
                     "GET_SPATIAL_WEIGHTS_FROM_FILE",
                     "EUCLIDEAN_DISTANCE", "NONE",
                     "#", "#", "euclidean6Neighs.swm", "NO_FDR")

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())

Environments

Special cases

Output Coordinate System

Feature geometry is projected to the output coordinate system prior to analysis, so values entered for the Distance Band or Threshold Distance parameter should match those specified in the output coordinate system. All mathematical computations are based on the spatial reference of the output coordinate system. When the output coordinate system is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances in meters.
