Label | Explanation | Data Type |
Input Feature Class | The feature class for which the General G statistic will be calculated. | Feature Layer |
Input Field | The numeric field to be evaluated. | Field |
Generate Report (Optional) | Specifies whether a graphical summary of results will be created as an .html file. | Boolean |
Conceptualization of Spatial Relationships | Specifies how spatial relationships among features are defined. | String |
Distance Method | Specifies how distances are calculated from each feature to neighboring features. | String |
Standardization | Specifies whether standardization of spatial weights will be applied. Row standardization is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme. | String |
Distance Band or Threshold Distance (Optional) | Specifies a cutoff distance for the inverse distance and fixed distance options. Features outside the specified cutoff for a target feature are ignored in analyses for that feature. However, for Zone of indifference, the influence of features outside the given distance is reduced with distance, while those inside the distance threshold are equally considered. The distance value entered should be in the units of the output coordinate system. For the inverse distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance is applied; when this parameter is left blank, a default threshold value is computed and applied. This default value is the Euclidean distance that ensures that every feature has at least one neighbor. This parameter has no effect when the polygon contiguity (Contiguity edges only or Contiguity edges corners) or Get spatial weights from file spatial conceptualizations are selected. | Double |
Weights Matrix File (Optional) | The path to a file containing weights that define spatial, and potentially temporal, relationships among features. | File |
Number of Neighbors (Optional) | An integer specifying the number of neighbors that will be included in the analysis. | Long |
Summary
Measures the degree of clustering for either high or low values using the Getis-Ord General G statistic.
Learn more about how High/Low Clustering: Getis-Ord General G works
Usage
The High/Low Clustering tool returns four values: Observed General G, Expected General G, z-score, and p-value. The values are written as messages at the bottom of the Geoprocessing pane during tool execution and passed as derived output values for potential use in models or scripts. You can access the messages by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages for a previously run tool via the geoprocessing history. Optionally, you can use this tool to create an HTML report file with a graphical summary of results. The path to the report will be included with the messages summarizing the tool execution parameters. Click that path to open the report file.
The Input Field should contain a variety of nonnegative values. An error message will appear if the Input Field contains negative values. In addition, the math for this statistic requires some variation in the variable being analyzed; the statistic cannot be computed if all input values are 1, for example. To use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data. The Optimized Hot Spot Analysis tool can also be used to analyze the spatial pattern of incident data.
Note:
Incident data are points that represent events (crime, traffic accidents) or objects (trees, stores) where the focus is on presence or absence rather than a measured attribute associated with each point.
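The following is a minimal sketch, not the tool's implementation, of why variation in the input values matters. It assumes the commonly published form of the statistic, in which the observed General G is the sum of w_ij * x_i * x_j over all pairs with i != j divided by the sum of x_i * x_j over the same pairs, and the expected General G is the sum of the weights divided by n(n - 1). The function name general_g and the nested-list weights matrix are illustrative assumptions; the variance and z-score computations are omitted.

```python
def general_g(values, weights):
    """Return (observed_g, expected_g) for n values and an n x n weights matrix.

    Illustrative only: `weights` is a plain nested list, not an .swm file.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two features are required")
    if any(v < 0 for v in values):
        raise ValueError("General G requires nonnegative values")

    weighted_sum = 0.0   # sum of w_ij * x_i * x_j for i != j
    product_sum = 0.0    # sum of x_i * x_j for i != j
    weight_total = 0.0   # sum of w_ij for i != j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            weighted_sum += weights[i][j] * values[i] * values[j]
            product_sum += values[i] * values[j]
            weight_total += weights[i][j]

    if product_sum == 0:
        raise ValueError("all cross-products are zero; G is undefined")
    return weighted_sum / product_sum, weight_total / (n * (n - 1))


# Four features in a row; each feature neighbors the adjacent ones.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = general_g([10, 10, 1, 1], W)   # high values sit next to each other
constant = general_g([1, 1, 1, 1], W)      # no variation in the input
```

In this sketch, the clustered values produce an observed General G above the expected value, while the constant values produce an observed value equal to the expected value for any arrangement, leaving nothing for a significance test to measure.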
The z-score and p-value are measures of statistical significance. These values can help you determine whether to reject the null hypothesis. For this tool, the null hypothesis states that the values associated with features are randomly distributed.
The z-score is based on the randomization null hypothesis computation. For more information about z-scores, see What is a z-score? What is a p-value?
The higher (or lower) the z-score, the stronger the intensity of the clustering. A z-score near zero indicates no apparent clustering within the study area. A positive z-score indicates clustering of high values. A negative z-score indicates clustering of low values.
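As a rough illustration of how a z-score relates to a p-value and to the interpretation above, the following sketch assumes a two-tailed test against the standard normal distribution. The tool reports its own p-value, so this is only a way to reason about the reported numbers; the function names and the 0.05 significance level are assumptions chosen for the example.

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def describe(z, alpha=0.05):
    """Summarize a z-score using the interpretation given for this tool."""
    if two_tailed_p(z) >= alpha:
        return "no significant clustering"
    return "high values cluster" if z > 0 else "low values cluster"


for z in (2.5, -3.0, 0.3):
    print(z, round(two_tailed_p(z), 4), describe(z))
```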
When the Input Feature Class parameter value is not projected (that is, when coordinates are in degrees, minutes, and seconds) or when the Output Coordinate System environment is set to a geographic coordinate system, distances will be computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide good estimates of true geodesic distances, at least for points within approximately 30 degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of a line, passing through the three-dimensional earth, to connect the two points. Chordal distances are reported in meters.
Caution:
Ensure that you project the data if the study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.
When chordal distances are used in the analysis, the Distance Band or Threshold Distance parameter value, if specified, should be in meters.
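The chordal distance described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's own code: it assumes the WGS84 ellipsoid, converts each latitude and longitude to three-dimensional Earth-centered coordinates, and measures the straight line between them. The function names are hypothetical.

```python
import math

# WGS84 ellipsoid parameters (an assumption; the tool's spheroid may differ).
WGS84_A = 6378137.0              # semi-major axis in meters
WGS84_F = 1 / 298.257223563      # flattening


def to_ecef(lat_deg, lon_deg):
    """Convert geodetic latitude/longitude (degrees, height 0) to ECEF meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = WGS84_F * (2 - WGS84_F)                       # eccentricity squared
    n = WGS84_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1 - e2) * math.sin(lat)
    return (x, y, z)


def chordal_distance(p, q):
    """Straight-line distance in meters through the earth between two (lat, lon) points."""
    return math.dist(to_ecef(*p), to_ecef(*q))


# Two points one degree of longitude apart on the equator.
print(chordal_distance((0.0, 0.0), (0.0, 1.0)))
```

Because the chord cuts through the earth, it is always somewhat shorter than the path along the surface, and the gap grows with separation, which is consistent with the 30 degree guidance above.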
For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.
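The weighted mean center described above can be sketched as follows, assuming each part has already been reduced to its own centroid and a weight (1 for points, length for lines, area for polygons). The function name and the input structure are illustrative assumptions.

```python
def weighted_mean_center(parts):
    """Return the (x, y) weighted mean center of feature parts.

    `parts` is a sequence of ((x, y), weight) pairs, where weight is 1 for a
    point part, the length of a line part, or the area of a polygon part.
    """
    parts = list(parts)
    total = sum(weight for _, weight in parts)
    if total <= 0:
        raise ValueError("total weight must be positive")
    x = sum(center[0] * weight for center, weight in parts) / total
    y = sum(center[1] * weight for center, weight in parts) / total
    return (x, y)


# A two-part polygon: a small part (area 1) and a larger part (area 3).
print(weighted_mean_center([((0.0, 0.0), 1.0), ((10.0, 0.0), 3.0)]))
```

The larger part pulls the centroid toward itself, so the location used in distance computations may fall outside every individual part.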
Your choice for the Conceptualization of Spatial Relationships parameter should reflect inherent relationships among the features you are analyzing. The more realistically you can model how features interact with each other in space, the more accurate your results will be. Recommendations are outlined in Selecting a conceptualization of spatial relationships: Best practices. The following are additional tips:
- Fixed distance band
The default Distance Band or Threshold Distance parameter value will ensure that each feature has at least one neighbor. This is important, but often the calculated default will not be the most appropriate distance to use for your analysis. Additional strategies for selecting an appropriate scale (distance band) for your analysis are outlined in Selecting a fixed distance band value.
- Inverse distance or Inverse distance squared
When zero is entered for the Distance Band or Threshold Distance parameter, all features are considered neighbors of all other features; when this parameter is left blank, the default distance will be applied.
Weights for distances less than 1 become unstable when they are inverted. Consequently, features separated by less than 1 unit of distance are given a weight of 1.
For the inverse distance options (Inverse distance, Inverse distance squared, and Zone of indifference), any two points that are coincident will be given a weight of 1 to avoid division by zero. This ensures that features are not excluded from the analysis.
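The distance tips above can be illustrated with a short sketch. It is not the tool's implementation: it assumes planar point coordinates, treats the default threshold as the largest nearest-neighbor distance (the smallest distance at which every feature has at least one neighbor), and covers only the inverse distance rules, not Zone of indifference. The function names are hypothetical.

```python
import math


def default_threshold(points):
    """Smallest distance that gives every point at least one neighbor."""
    return max(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def inverse_distance_weight(d, threshold=0.0, power=1):
    """Weight for a pair of features separated by distance d.

    threshold=0 means no cutoff is applied; power=2 gives inverse distance squared.
    """
    if threshold > 0 and d > threshold:
        return 0.0            # outside the cutoff: ignored for this target feature
    if d < 1.0:
        return 1.0            # includes coincident points (d == 0)
    return 1.0 / d ** power


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(default_threshold(points))                  # the isolated point sets the default
print(inverse_distance_weight(4.0, power=2))
```

Note how a single isolated feature stretches the default distance for every feature, which is one reason the calculated default is often not the most appropriate scale for an analysis.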
Additional options for the Conceptualization of Spatial Relationships parameter, including three-dimensional and space-time relationships, are available using the Generate Spatial Weights Matrix tool. To use these additional options, construct a spatial weights matrix file prior to analysis; for the Conceptualization of Spatial Relationships parameter, use the Get spatial weights from file option, and for the Weights Matrix File parameter, specify the path to the spatial weights file you created.
Map layers can be specified as the Input Feature Class parameter value. When using a layer with a selection, only the selected features will be included in the analysis.
If you provide a Weights Matrix File parameter value with a .swm extension, the file is expected to be a spatial weights matrix file created using the Generate Spatial Weights Matrix tool; otherwise, an ASCII-formatted spatial weights matrix file is expected. In some cases, behavior differs depending on which of the following types of spatial weights matrix file you use:
- ASCII-formatted spatial weights matrix file
- Weights will be used as is. Missing feature-to-feature relationships will be treated as zeros.
- If the weights are row standardized, results may be incorrect for analyses on selection sets. To run an analysis on a selection set, convert the ASCII spatial weights file to a .swm file by reading the ASCII data into a table, and using the Convert table option with the Generate Spatial Weights Matrix tool.
- SWM-formatted spatial weights matrix file
- If the weights are row standardized, they will be restandardized for selection sets; otherwise, weights will be used as is.
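The following sketch suggests why row-standardized weights need attention on selection sets. It assumes weights are held in a plain dictionary of dictionaries rather than in an .swm file, and the function names are hypothetical. Once neighbors outside the selection are dropped, the remaining rows no longer sum to 1 until they are standardized again.

```python
def row_standardize(weights):
    """Divide each weight by the sum of its row so that every row sums to 1."""
    result = {}
    for source, neighbors in weights.items():
        total = sum(neighbors.values())
        result[source] = {
            target: (w / total if total else 0.0) for target, w in neighbors.items()
        }
    return result


def restrict_to_selection(weights, selected):
    """Keep only relationships where both features are in the selection."""
    selected = set(selected)
    return {
        source: {t: w for t, w in neighbors.items() if t in selected}
        for source, neighbors in weights.items()
        if source in selected
    }


weights = {
    1: {2: 1.0, 3: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {1: 1.0, 2: 1.0},
}
standardized = row_standardize(weights)                 # every weight is 0.5
subset = restrict_to_selection(standardized, [1, 2])    # rows now sum to 0.5
restandardized = row_standardize(subset)                # rows sum to 1 again
```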
Running an analysis with an ASCII-formatted spatial weights matrix file is memory intensive. For analyses of more than 5,000 features, consider converting the ASCII-formatted spatial weights matrix file to an SWM-formatted file. First, put the ASCII weights into a formatted table (using Excel, for example). Next, run the Generate Spatial Weights Matrix tool using Convert table for the Conceptualization of Spatial Relationships parameter value. The output will be an SWM-formatted spatial weights matrix file.
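As an alternative to Excel for the first step, the ASCII weights could be read into a table with a short script. The sketch below rests on assumptions to verify against the Modeling spatial relationships and Generate Spatial Weights Matrix topics: that the ASCII file starts with a line naming the unique ID field, that each following line holds a from ID, a to ID, and a weight separated by spaces, and that the converted table uses NID and WEIGHT as the neighbor and weight field names. The function names are hypothetical.

```python
import csv
import io


def ascii_swm_to_rows(text):
    """Parse ASCII weights text into (id_field, [(from_id, to_id, weight), ...])."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    id_field = lines[0]                      # assumed: first line is the ID field name
    rows = []
    for line in lines[1:]:
        from_id, to_id, weight = line.split()
        rows.append((int(from_id), int(to_id), float(weight)))
    return id_field, rows


def rows_to_csv(id_field, rows):
    """Write the relationships as CSV text with assumed NID and WEIGHT columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([id_field, "NID", "WEIGHT"])
    writer.writerows(rows)
    return buffer.getvalue()


sample = "MyID\n1 2 0.5\n2 1 0.5\n"
id_field, rows = ascii_swm_to_rows(sample)
table_text = rows_to_csv(id_field, rows)
print(table_text)
```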
For additional information about this tool's parameters, see the Modeling spatial relationships help topic.
Caution:
Shapefiles cannot store null values. Tools or other procedures that create shapefiles from other types of inputs may store or interpret null values as zero. In some cases, null values are stored as large negative values in shapefiles, which can lead to unexpected results. See Geoprocessing considerations for shapefile output for more information.
Parameters
arcpy.stats.HighLowClustering(Input_Feature_Class, Input_Field, {Generate_Report}, Conceptualization_of_Spatial_Relationships, Distance_Method, Standardization, {Distance_Band_or_Threshold_Distance}, {Weights_Matrix_File}, {number_of_neighbors})
Name | Explanation | Data Type |
Input_Feature_Class | The feature class for which the General G statistic will be calculated. | Feature Layer |
Input_Field | The numeric field to be evaluated. | Field |
Generate_Report (Optional) | Specifies whether a graphical summary of results will be created as an .html file. | Boolean |
Conceptualization_of_Spatial_Relationships | Specifies how spatial relationships among features are defined. | String |
Distance_Method | Specifies how distances are calculated from each feature to neighboring features. | String |
Standardization | Specifies whether standardization of spatial weights will be applied. Row standardization is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme. | String |
Distance_Band_or_Threshold_Distance (Optional) | Specifies a cutoff distance for the inverse distance and fixed distance options. Features outside the specified cutoff for a target feature are ignored in analyses for that feature. However, for ZONE_OF_INDIFFERENCE, the influence of features outside the given distance is reduced with distance, while those inside the distance threshold are equally considered. The distance value entered should be in the units of the output coordinate system. For the inverse distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance is applied; when this parameter is left blank, a default threshold value is computed and applied. This default value is the Euclidean distance that ensures that every feature has at least one neighbor. This parameter has no effect when the polygon contiguity (CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS) or GET_SPATIAL_WEIGHTS_FROM_FILE spatial conceptualizations are selected. | Double |
Weights_Matrix_File (Optional) | The path to a file containing weights that define spatial, and potentially temporal, relationships among features. | File |
number_of_neighbors (Optional) | An integer specifying the number of neighbors that will be included in the analysis. | Long |
Derived Output
Name | Explanation | Data Type |
Observed_General_G | The observed General G statistic. | Double |
ZScore | The z-score. | Double |
PValue | The p-value. | Double |
Report_File | An HTML file with a graphical summary of results. | File |
Code sample
The following Python window script demonstrates how to use the HighLowClustering function.
import arcpy
arcpy.env.workspace = r"C:\data"
arcpy.stats.HighLowClustering("911Count.shp", "ICOUNT", "false",
                              "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE",
                              "NONE", "#", "euclidean6Neighs.swm")
The following stand-alone Python script demonstrates how to use the HighLowClustering function.
# Analyze the spatial distribution of 911 calls in a metropolitan area
# using the High/Low Clustering (Getis-Ord General G) tool
# Import system modules
import arcpy
# Set property to overwrite existing outputs
arcpy.env.overwriteOutput = True
# Local variables...
workspace = r"C:\Data"
try:
    # Set the current workspace (to avoid having to specify the full path to the feature classes each time)
    arcpy.env.workspace = workspace
    # Copy the input feature class and integrate the points to snap
    # together at 500 feet
    # Process: Copy Features and Integrate
    cf = arcpy.management.CopyFeatures("911Calls.shp", "911Copied.shp",
                                       "#", 0, 0, 0)
    integrate = arcpy.management.Integrate("911Copied.shp #", "500 Feet")
    # Use Collect Events to count the number of calls at each location
    # Process: Collect Events
    ce = arcpy.stats.CollectEvents("911Copied.shp", "911Count.shp", "Count", "#")
    # Add a unique ID field to the count feature class
    # Process: Add Field and Calculate Field
    af = arcpy.management.AddField("911Count.shp", "MyID", "LONG", "#", "#", "#", "#",
                                   "NON_NULLABLE", "NON_REQUIRED", "#",
                                   "911Count.shp")
    cf = arcpy.management.CalculateField("911Count.shp", "MyID", "!FID!", "PYTHON")
    # Create Spatial Weights Matrix for Calculations
    # Process: Generate Spatial Weights Matrix...
    swm = arcpy.stats.GenerateSpatialWeightsMatrix("911Count.shp", "MYID",
                                                   "euclidean6Neighs.swm",
                                                   "K_NEAREST_NEIGHBORS",
                                                   "#", "#", "#", 6,
                                                   "NO_STANDARDIZATION")
    # Cluster Analysis of 911 Calls
    # Process: High/Low Clustering (Getis-Ord General G)
    hs = arcpy.stats.HighLowClustering("911Count.shp", "ICOUNT",
                                       "false",
                                       "GET_SPATIAL_WEIGHTS_FROM_FILE",
                                       "EUCLIDEAN_DISTANCE", "NONE",
                                       "#", "euclidean6Neighs.swm")
except arcpy.ExecuteError:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())
Environments
Special cases
- Output Coordinate System
Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.