Available in big data analytics.
The Find Hot Spots tool identifies statistically significant hot spots and cold spots in the spatial pattern of the data using the Getis-Ord Gi* statistic.
Workflow diagram
Examples
The following are example uses of the Find Hot Spots tool:
- A police department is conducting an analysis to determine if there is a relationship between violent crimes and unemployment rates. An expanded summer job program will be implemented for high schools in areas where there is high violent crime and high unemployment. The Find Hot Spots tool can be used to find areas with statistically significant crime and unemployment hot spots.
- A conservation officer is studying disease in trees to prioritize which areas of the forest should receive treatment and to learn more about areas that are showing some resistance. The Find Hot Spots tool can be used to find clusters of diseased (hot spots) and healthy (cold spots) trees.
Usage notes
Keep the following in mind when working with the Find Hot Spots tool:
- Input features must be a point layer. Points will be aggregated into a square grid (bins) of a specified size before analysis.
- The output layer will have additional fields containing information such as the statistical significance of each feature, the p-value, and the z-score.
- During analysis, the input points are aggregated into bins of a specified size, and the bins are then analyzed to determine hot spots. The aggregated bins must contain a variety of values (the count of points per bin should be highly variable).
- The z-scores and p-values are measures of statistical significance that indicate whether the observed spatial clustering of high or low values is more pronounced than one would expect in a random distribution of those same values. Use them to decide, bin by bin, whether the null hypothesis can be rejected. The z-score and p-value fields do not reflect any kind of False Discovery Rate (FDR) correction.
- A high z-score and small p-value for a feature indicate an intense presence of point incidents. A low negative z-score and small p-value indicate an absence of point incidents. The higher (or lower) the z-score, the more intense the clustering. A z-score near zero indicates no apparent spatial clustering.
- The z-score is based on the randomization null hypothesis computation. For more information on z-scores, see What is a z-score? What is a p-value?
- The Find Hot Spots tool allows you to analyze using time steps. Each time step is analyzed independently of features outside of the time step. To use time stepping, the input data must be time enabled and represent an instant in time. When time stepping is applied, output features will be time intervals represented by the StartTime and EndTime fields.
- The Time Step Reference parameter can be a date and time value or solely a date value; it cannot be solely a time value.
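The binning and scoring described in the notes above can be illustrated with a minimal sketch. It uses the standard published Getis-Ord Gi* formula with binary weights and a normal approximation for the two-sided p-value. The function names (`bin_points`, `gi_star`), the grid origin at (0, 0), and the neighborhood expressed in bin widths are assumptions made for illustration; the tool's actual weighting and grid placement are not documented here.

```python
import math
from collections import Counter


def bin_points(points, bin_size):
    """Count points per square bin over the bounding extent (empty bins get 0).

    Hypothetical helper: bins are keyed by (column, row) grid indices,
    with the grid anchored at the coordinate origin.
    """
    hits = Counter(
        (math.floor(x / bin_size), math.floor(y / bin_size)) for x, y in points
    )
    cols = [c for c, _ in hits]
    rows = [r for _, r in hits]
    return {
        (c, r): hits.get((c, r), 0)
        for c in range(min(cols), max(cols) + 1)
        for r in range(min(rows), max(rows) + 1)
    }


def gi_star(counts, neighborhood):
    """Return {bin: (z_score, p_value)} using binary spatial weights.

    Assumption: bin j is a neighbor of bin i (weight 1) when their grid
    indices are within `neighborhood` bin widths; each bin neighbors itself.
    """
    bins = list(counts)
    n = len(bins)
    values = [counts[b] for b in bins]
    mean = sum(values) / n
    variance = max(0.0, sum(v * v for v in values) / n - mean * mean)
    s = math.sqrt(variance)
    if n < 2 or s == 0:
        raise ValueError("bin counts must vary for the statistic to be defined")

    results = {}
    for bi in bins:
        local_sum = 0.0
        w_sum = 0
        for bj, v in zip(bins, values):
            if math.dist(bi, bj) <= neighborhood:
                local_sum += v
                w_sum += 1
        # With binary weights, the sum of squared weights equals w_sum.
        spread = (n * w_sum - w_sum ** 2) / (n - 1)
        if spread == 0:
            z = 0.0  # the neighborhood covers every bin, so nothing is local
        else:
            z = (local_sum - mean * w_sum) / (s * math.sqrt(spread))
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[bi] = (z, p)
    return results
```

A call such as `gi_star(bin_points(points, 1.0), 1.5)` returns one z-score and p-value per bin. The check on `s` reflects the note that bin counts must vary: if every bin holds the same count, the statistic is undefined.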
Parameters
The following are the parameters for the Find Hot Spots tool:
Parameter | Description | Data type |
---|---|---|
Input Layer | The point features for which hot spots will be calculated. | Features |
Bin Type | The bin shape that will be used to create the regular bins. The default is Square. | String |
Bin Size | The distance interval that represents the size of the bins into which the input points will be aggregated. | String |
Neighborhood Size (optional) | The spatial extent of the analysis neighborhood. This value determines which features are analyzed together to assess local clustering. | String |
Time Step Interval (optional) | The interval for the time step. This parameter is only used if the input points' schema has a field tagged with the Start Time key field. | String |
Time Step Alignment (optional) | Specifies how time steps will be aligned. This parameter is only available if the input points are time enabled and represent an instant in time. | String |
Time Step Reference (optional) | The reference time for time steps and time intervals to align with. This parameter only appears if Reference Time is used for the Time Step Alignment parameter. | Date |
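One way to picture how the time step parameters interact is the sketch below, which assigns each instant to a fixed-width interval counted from a reference time. The function names and the exact alignment rule (half-open intervals that start at the reference time) are assumptions for illustration, not the tool's documented behavior. Only the `StartTime` and `EndTime` field names come from this page.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_step_bounds(timestamp, interval, reference):
    """Return the (start, end) of the step containing `timestamp`.

    Assumption: steps are half-open intervals [start, end) of width
    `interval`, laid out forward and backward from `reference`.
    """
    index = (timestamp - reference) // interval  # floor division yields an int
    start = reference + index * interval
    return start, start + interval


def group_by_time_step(features, interval, reference):
    """Group (timestamp, x, y) features by time step.

    Each group could then be binned and analyzed independently of the others.
    """
    groups = defaultdict(list)
    for timestamp, x, y in features:
        start, end = time_step_bounds(timestamp, interval, reference)
        groups[(start, end)].append((x, y))
    # Keys play the role of the StartTime and EndTime output fields.
    return dict(groups)
```

Because each group is keyed by its own interval, features in one step never influence the statistics computed for another.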
Output layer
The output layer will contain the following fields in place of the original fields:
Field name | Description | Field type |
---|---|---|
value | The number of features in that bin. | Float64 |
GiZScore | The z-score of features in that bin. | Float64 |
GiPValue | The p-value of features in that bin. | Float64 |
Gi_Bin | The confidence level used to identify statistically significant hot and cold spots. Features with a Gi_Bin value of +/-3 reflect statistical significance with a 99 percent confidence level; features with a Gi_Bin value of +/-2 reflect a 95 percent confidence level; features with a Gi_Bin value of +/-1 reflect a 90 percent confidence level; and the clustering for features with a Gi_Bin value of 0 is not statistically significant. | Float64 |
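The relationship between the GiZScore, GiPValue, and Gi_Bin fields can be sketched as a small classification function. It assumes the usual correspondence between confidence levels and two-sided p-value cutoffs (0.01, 0.05, and 0.10) and uses strict inequalities; the function name and the exact boundary handling are illustrative assumptions.

```python
def gi_bin(z_score, p_value):
    """Map a z-score and p-value to a Gi_Bin-style confidence class.

    Assumed cutoffs: p < 0.01 is 99 percent, p < 0.05 is 95 percent,
    and p < 0.10 is 90 percent confidence. The sign of the z-score
    separates hot spots (positive) from cold spots (negative).
    """
    if p_value < 0.01:
        level = 3
    elif p_value < 0.05:
        level = 2
    elif p_value < 0.10:
        level = 1
    else:
        return 0  # not statistically significant
    return level if z_score > 0 else -level
```

For example, a bin with a strongly negative z-score and a p-value below 0.01 would be classed as -3, a cold spot at the 99 percent confidence level.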