Hot Spot Analysis Comparison (Spatial Statistics)

Summary

Compares two hot spot analysis result layers and measures their similarity and association.

The similarity and association between the hot spot result layers are determined by comparing the significance level categories between corresponding features in both input layers. The similarity measures how closely the hot spots, cold spots, and nonsignificant areas of both hot spot results spatially align. The association (or dependence) measures the strength of the underlying statistical relationship between the hot spot variables (similar to correlation for continuous variables).

Learn more about how Hot Spot Analysis Comparison works

Illustration

Hot Spot Analysis Comparison tool illustration
Two hot spot analysis result layers are compared. Deeper shades of orange indicate larger differences between the layers.

Usage

  • All comparisons are performed by comparing the significance level categories (99% hot, 95% hot, 90% hot, not significant, 90% cold, 95% cold, and 99% cold) between corresponding features and their neighbors in both input layers. The similarity measures how closely the hot spots, cold spots, and nonsignificant areas of both hot spot results spatially align. The association (or dependence) measures the strength of the underlying statistical relationship between the hot spot variables (similar to correlation for continuous variables). The distinction between similarity and association is important because it is common for two hot spot results to be highly similar (many corresponding features and their neighbors have the same significance level) but still have little association or dependence. This means that despite the similarity of the significance levels, attempts to influence one variable (such as mitigation efforts) will not produce changes in the other variable. Highly similar but unassociated results often occur when both hot spot results are dominated by a single category, such as not significant, or when both results have large clusters of features with the same significance level.

    The similarity between the hot spot results is measured by a similarity value between 0 and 1. If many corresponding features in both results have the same significance level, the value will be close to 1. If many corresponding features do not have matching significance levels, the value will be close to 0. The association is measured by a kappa value: strongly associated results will have kappa values close to 1, and unassociated (independent) results will have kappa values close to 0 (or slightly negative). The kappa value is a rescaled version of the similarity value that accounts for spatial clustering and category frequencies in order to isolate the statistical association between the hot spot results. Both values use fuzzy set membership to allow partial matches between corresponding features based on significance level similarity and spatial neighborhoods. For example, 99% hot spots can be considered perfect matches to other 99% hot spots, partial matches to 95% hot spots, and complete mismatches to 99% cold spots. Two corresponding features can also be considered partial matches if the features themselves do not have the same significance level but their neighboring features do.

    The tool calculates a global similarity and global kappa value to measure the overall similarity and association between the hot spot results, and local versions are also calculated for each pair of corresponding features. This allows you to map the comparisons to explore areas that have higher or lower similarity or association than the global values. The output features also include charts and custom symbology that highlight areas where the hot spot results are most dissimilar and summarize the significance level pairs of all corresponding features.

  • The Input Hot Spot Result 1 and Input Hot Spot Result 2 parameter values must be the output features of the Hot Spot Analysis (Getis-Ord Gi*) or Optimized Hot Spot Analysis tools. Every feature in each result must be paired with a single feature of the other result so that their significance level categories can be compared. If the features of the two input hot spot results do not spatially align (such as polygons that do not have the same borders), the two feature layers will be intersected before the analysis, and the comparisons will be made on the feature intersections. Use caution when the two hot spot results are polygons of different sizes because the intersection will subdivide large polygons into many smaller polygons and change the frequencies of the significance level categories. At least 20 feature intersections are required to use the tool.

  • The results of the comparisons are returned through geoprocessing messages, a group layer of the output feature class, and charts.

    The messages display information about overall comparisons between the hot spot results. The messages display the following information:

    • Similarity Value—A value between 0 and 1 measuring the overall similarity between the hot spot result layers. The value can be interpreted as a fuzzy probability that any pair of corresponding features have the same significance level category.
    • Expected Similarity Value—The expected value of the similarity under the assumption that the two hot spot result layers are unassociated (independent). If the similarity value is larger than its expected value, this suggests an underlying dependence between the two maps. The value is mostly informational and is used to scale the similarity value when calculating the kappa value. The value is calculated by pairing each feature with random features in the other hot spot result and calculating the similarity. By pairing each feature with random features (rather than its corresponding feature), the expected value is spatially adjusted to account for spatial clustering and category frequencies in both hot spot results. The Number of Permutations parameter specifies the number of random pairings of each feature, and the expected similarity value is the average of the similarity values of the permutations.
    • Spatial Fuzzy Kappa—A measure of the association between the hot spot analysis variables that is calculated by scaling the similarity value by its expected value. Hot spot results that are perfectly associated will have the value 1, and unassociated (independent) results will have a value close to 0. Negative values indicate a negative relationship between the hot spot analysis variables. While the value has no lower bound, the values are rarely less than -3 in practice.
    • Summaries of the weights between each hot spot significance level pair.
    • Message tables displaying counts and percentages of each hot spot significance level pair. In the tables, the counts and percentages of the significance levels of the second hot spot result layer are broken down by the categories of the first result layer. For example, among 90% significant hot spots in the first result layer, you can see the count and percent that were also 90% significant hot spots in the second result layer, along with the counts and percentages for all other significance level categories. This is especially useful when the two hot spot results represent the same variable measured at different times. In this case, the table allows you to see how the categories transitioned in the time between the measurements.
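The relationship between the three global values in the messages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's implementation: it uses exact matching instead of fuzzy weights, ignores neighbors, shuffles features without any spatial adjustment, and assumes the conventional kappa rescaling (similarity minus expected similarity, divided by 1 minus expected similarity), which this page does not state explicitly. All function names are hypothetical.

```python
import random

def similarity(bins_1, bins_2):
    """Share of corresponding features with the same Gi_Bin category."""
    matches = sum(1 for a, b in zip(bins_1, bins_2) if a == b)
    return matches / len(bins_1)

def expected_similarity(bins_1, bins_2, num_perms=499, seed=0):
    """Average similarity when each feature is paired with a random feature."""
    rng = random.Random(seed)
    shuffled = list(bins_2)
    total = 0.0
    for _ in range(num_perms):
        rng.shuffle(shuffled)
        total += similarity(bins_1, shuffled)
    return total / num_perms

def kappa(sim, expected):
    """Assumed rescaling: 1 for perfect agreement, about 0 for independence."""
    return (sim - expected) / (1.0 - expected)

# Gi_Bin values: -3 to -1 are cold spots, 0 is not significant, 1 to 3 are hot spots.
result_1 = [0] * 16 + [3, 3, -3, -3]
result_2 = [0] * 16 + [3, -3, 3, -3]

sim = similarity(result_1, result_2)
exp_sim = expected_similarity(result_1, result_2)
print(sim, exp_sim, kappa(sim, exp_sim))
```

Because both example results are dominated by the not significant category, the expected similarity is already high, so the kappa value is noticeably lower than the similarity value.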

    The output features contain fields of the similarity value, expected similarity value, kappa value, and significance level categories of each pair of corresponding features. When the tool is run in a map, three layers will be added to a group layer that allow you to explore and investigate the similarity, association, and significance level pairs spatially. The first layer displays the similarity values classified into five equal intervals between 0 and 1, and lower similarity values are in darker colors to emphasize the areas that are most dissimilar. The second layer displays the spatial fuzzy kappa values symbolized with equal intervals and six classes. The third layer displays each significance level combination with custom symbology to identify features where one input hot spot result was a statistically significant hot spot and the other was a statistically significant cold spot (in the custom symbology, 90%, 95%, and 99% significance is not distinguished in order to reduce the number of combinations).

    The final layer also has a heat chart and customized bar chart to further investigate the significance level pairs. These charts display the same information as the tables in the messages, but the charts are colored by the counts and percentages for ease of interpretation. You can also use selections between the charts and map to, for example, select all features that were 99% hot spots in one result and 99% cold spots in the other result, indicating the largest possible differences.

    Learn more about tool outputs

  • The Similarity Weighting Method parameter defines the similarity between each combination of significance level categories using fuzzy set membership. Each weight is a value between 0 and 1 that indicates how similarly the categories will be treated when performing comparisons. For example, you can define a weight of 0.75 between the 99% hot and 95% hot categories to indicate that they are not exactly the same, but they are more similar than they are different.

    The default Fuzzy weights option weights categories by the closeness of the significance level (determined by critical value ratios). Other options allow you to combine categories by assigning a weight value of 1 between them. For example, the Combine 95% and 99% significant option combines 99% hot and 95% hot into a single category, combines 99% cold and 95% cold, and combines 90% hot, not significant, and 90% cold. This option treats all hot (or cold) spots at or above 95% significance as being the same (statistically significant) and all features below 95% significance as being the same (not statistically significant). This is useful when you intended to perform the two hot spot analyses at a 95% significance level, and you want to treat all 90% significant hot and cold spots as if they are not significant. The Reverse hot and cold relationships option assigns large similarity weights between hot and cold spots. For example, 99% hot spots are considered perfectly similar to 99% cold spots and completely dissimilar to other 99% hot spots. This option is useful for measuring the similarity and association between variables that have a negative relationship, such as comparing hot spots of infant mortality to cold spots of median income.

    The Custom weights option allows you to define custom similarity weights to merge categories and define your preferences. You can provide the custom weights in the Category Similarity Weights parameter. The parameter displays as a pop-out matrix with the 49 (7 by 7) significance level combinations. To specify a weight between a category pair, type the value into the associated cell and press Enter. You can export the custom weights to a table from the pop-out dialog box so that they can be reused later with the Get weights from table option.

    Note:

    Similarity weights only affect the calculation of the similarity and kappa values. Even if significance level categories are combined using similarity weights, the message tables, output layer symbology, and charts will treat them as separate categories.

    Learn more about categorical similarity
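The weighting options described above can be expressed as small functions over the Gi_Bin values (-3 to 3) of the input layers. The following is a minimal sketch, not the tool's code: the partial weights 0.71, 0.55, and 0.78 are the default fuzzy weights listed for hot spots on this page, and applying the same values to cold spots is an assumption. The function names are hypothetical.

```python
def fuzzy_weight(bin_a, bin_b):
    """Similarity weight between two Gi_Bin categories under the default fuzzy option."""
    partial = {
        frozenset((3, 2)): 0.71,  # 99% vs. 95%
        frozenset((3, 1)): 0.55,  # 99% vs. 90%
        frozenset((2, 1)): 0.78,  # 95% vs. 90%
    }
    if bin_a == bin_b:
        return 1.0
    if bin_a * bin_b <= 0:  # hot vs. cold, or either one not significant
        return 0.0
    # Assumption: cold spots use the same partial weights as hot spots.
    return partial[frozenset((abs(bin_a), abs(bin_b)))]

def above_95_weight(bin_a, bin_b):
    """Combine 95% and 99% significant: merge categories, then match exactly."""
    def merged(b):
        return 0 if abs(b) < 2 else (1 if b > 0 else -1)
    return 1.0 if merged(bin_a) == merged(bin_b) else 0.0

def reverse_weight(bin_a, bin_b):
    """Reverse hot and cold relationships: flip the sign of the second category."""
    return fuzzy_weight(bin_a, -bin_b)

# Print the 7 by 7 matrix of default fuzzy weights.
bins = [-3, -2, -1, 0, 1, 2, 3]
for a in bins:
    print([fuzzy_weight(a, b) for b in bins])
```

Writing each option as a function of two categories mirrors the 7 by 7 matrix of the Category Similarity Weights parameter: every cell is the weight for one pair of categories.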

  • When large proportions of each hot spot result are not significant, the similarity value will be large due to the matching of nonsignificant areas. However, if the nonsignificant features are not of research interest, you may not want the similarity and kappa values to reflect only the abundance of nonsignificant areas in both results. You can use the Exclude Nonsignificant Features parameter to exclude any pair of corresponding features from the comparisons if both hot spot results are not statistically significant. If excluded, the tool calculates conditional similarity and kappa values that compare only the statistically significant hot and cold spots, so the values reflect the similarity and association of those features alone.

    Note:

    If any significance level categories are combined with the nonsignificant category by providing a relative similarity weight of 1, those categories will also be excluded from the comparisons.
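The effect of excluding nonsignificant pairs can be illustrated with a short sketch. It assumes exact matching and ignores neighbors, so it only shows why the conditional value can differ sharply from the unconditional one. The helper names are hypothetical.

```python
def conditional_pairs(bins_1, bins_2):
    """Drop pairs where both results are not significant (Gi_Bin == 0)."""
    return [(a, b) for a, b in zip(bins_1, bins_2) if not (a == 0 and b == 0)]

def share_matching(pairs):
    """Share of pairs whose two categories are identical."""
    return sum(a == b for a, b in pairs) / len(pairs)

result_1 = [0, 0, 0, 0, 0, 0, 3, 3, -2, 0]
result_2 = [0, 0, 0, 0, 0, 0, 3, -3, -2, 1]

all_pairs = list(zip(result_1, result_2))
kept = conditional_pairs(result_1, result_2)

print(share_matching(all_pairs))  # 0.8
print(share_matching(kept))       # 0.5
```

A pair is dropped only when both features are not significant. The last pair (0, 1) is kept because one of its features is a hot spot.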

  • If either of the input hot spot result layers contains overlapping polygons, the overlaps will be intersected into new features. This can cause similarity values to not equal 1 even for result layers with identical significance level categories. Use the XY Tolerance environment to remove unintended overlaps, such as those caused by geocoding errors. It is recommended that you review the number of output features to determine whether there are more intersections than expected.

  • The Number of Neighbors parameter specifies the number of additional neighboring features that will be used for distance similarity. As with the similarity weighting method, distance similarity allows partial matches when the features themselves do not have the same significance level but other features in their neighborhood do have matching significance levels. Because hot spot analysis is a spatial method that uses local neighborhoods, the significance level of each feature is a characterization of the values of the feature and its closest neighbors, not just the feature. In this sense, if any neighboring feature is similar, it should contribute somewhat to the similarity of its neighbors.

    Partial similarity through neighbors is incorporated using a distance weight based on the ordering of the neighbors. The feature receives a distance weight of 1, and the weights decrease consistently for each additional neighbor. The overall similarity between any two features is their categorical similarity (from the similarity weighting method) multiplied by their distance similarity.

    Learn more about distance similarity and neighbor weighting
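The combination of categorical similarity and distance similarity can be sketched as follows. The exact decay of the distance weights and the way partial matches are aggregated are not specified on this page, so both are assumptions here: the sketch uses a hypothetical linear decay and takes the best weighted match among the corresponding feature and its neighbors. It also compares in one direction only.

```python
def distance_weights(num_neighbors):
    """Hypothetical linear decay: 1 for the feature itself, smaller per neighbor."""
    return [1 - k / (num_neighbors + 1) for k in range(num_neighbors + 1)]

def local_similarity(own_bin, other_bins_by_distance, categorical):
    """Best partial match among the corresponding feature and its neighbors.

    other_bins_by_distance[0] is the corresponding feature in the other result;
    the remaining entries are its neighbors ordered from nearest to farthest.
    """
    weights = distance_weights(len(other_bins_by_distance) - 1)
    return max(
        categorical(own_bin, other) * w
        for other, w in zip(other_bins_by_distance, weights)
    )

def exact(a, b):
    """Exact significance level matching as the categorical similarity."""
    return 1.0 if a == b else 0.0

print(distance_weights(3))                       # [1.0, 0.75, 0.5, 0.25]
print(local_similarity(3, [0, 3, 0, 3], exact))  # 0.75
```

In the second call, the corresponding feature is not significant, but its nearest neighbor is a 99% hot spot, so the pair receives a partial match instead of a complete mismatch.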

  • Changing the order of input hot spot results will not affect the similarity values, but the expected similarity and kappa values will change slightly due to randomness in permutations. The axes of the message tables and charts will also reverse, which will make it easier to interpret in some cases. Because the messages and charts display the significance level categories of the second hot spot result broken down by categories of the first result, you can instead display the categories of the first result broken down by categories of the second result by reversing the order of the input layers.
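The message tables of significance level pairs, and the effect of reversing the input order, can be reproduced with a simple cross-tabulation. This is an illustrative sketch with made-up Gi_Bin values and hypothetical function names. It does not read the tool's output.

```python
from collections import Counter

def pair_counts(bins_1, bins_2):
    """Count each (first result, second result) significance level pair."""
    return Counter(zip(bins_1, bins_2))

def row_percentages(counts, category):
    """Percentages of second-result categories among one first-result category."""
    row = {b: n for (a, b), n in counts.items() if a == category}
    total = sum(row.values())
    return {b: 100 * n / total for b, n in row.items()}

# Same variable measured at two times (illustrative values only).
earlier = [3, 3, 3, 3, 0, 0, -1, -1]
later = [3, 3, 2, 0, 0, 0, -1, 0]

counts = pair_counts(earlier, later)
print(row_percentages(counts, 3))  # {3: 50.0, 2: 25.0, 0: 25.0}

# Reversing the input order transposes the table.
reversed_counts = pair_counts(later, earlier)
print(row_percentages(reversed_counts, 0))
```

Read by rows, the first table shows how the 99% hot spots of the earlier result transitioned: half stayed 99% hot, a quarter dropped to 95% hot, and a quarter became not significant.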

Parameters

Label | Explanation | Data Type
Input Hot Spot Result 1

The first hot spot analysis result layer.

Feature Layer
Input Hot Spot Result 2

The second hot spot analysis result layer.

Feature Layer
Output Features

The output feature class that will contain the local measures of similarity and association.

Feature Class
Number of Neighbors
(Optional)

The number of neighbors around each feature that will be used for distance weighting. Distance weighting is one component of the overall similarity, and any features with matching significance levels within the neighborhood will be considered partial matches when calculating similarity and association.

Long
Number of Permutations
(Optional)

The number of permutations that will be used to estimate the expected similarity and kappa values. A larger number of permutations will increase the precision of the estimates but will also increase calculation time.

  • 99: The analysis will use 99 permutations.
  • 199: The analysis will use 199 permutations.
  • 499: The analysis will use 499 permutations. This is the default.
  • 999: The analysis will use 999 permutations.
  • 9999: The analysis will use 9,999 permutations.
Long
Similarity Weighting Method
(Optional)

Specifies how similarity weights between significance level categories will be defined. Similarity weights are numbers between 0 and 1 that define the categories of one result that are expected to match the categories of the other result. A value of 1 indicates that the categories will be considered exactly the same, and a value of 0 indicates that the categories will be considered completely different. Values between 0 and 1 indicate degrees of partial similarity between the categories. For example, 99% significant hot spots can be considered perfectly similar to other 99% hot spots, partially similar to 95% hot spots, and completely dissimilar to 99% cold spots.

  • Fuzzy weights: Similarity weights will be fuzzy (nonbinary) and determined by the closeness of significance levels. For example, 99% significant hot spots will be perfectly similar to other 99% significant hot spots (weight = 1), but they will be partially similar to 95% significant hot spots (weight = 0.71) and 90% significant hot spots (weight = 0.55). The weight between 95% significant and 90% significant is 0.78. All hot spots will be completely dissimilar to all cold spots and nonsignificant features (weight = 0). This is the default.
  • Exact significance level matching: Features must have the same significance level to be considered similar. For example, 99% significant hot spots will be considered completely dissimilar to 95% and 90% significant hot spots.
  • Combine 90%, 95%, and 99% significant: Features that are 90%, 95%, and 99% significant hot spots will be considered perfectly similar to each other, and all features that are 90%, 95%, and 99% significant cold spots will be considered perfectly similar to each other. This option treats all features at or above 90% significance as being the same (statistically significant) and all features below 90% confidence as being the same (nonsignificant). This option is recommended when the hot spot analyses were performed at a 90% significance level.
  • Combine 95% and 99% significant: Features that are 95% and 99% significant hot spots will be considered perfectly similar to each other, and features that are 95% and 99% significant cold spots will be considered perfectly similar to each other. 90% significant hot and cold spots will be considered completely dissimilar to higher significance levels. This option treats all features at or above 95% confidence as being the same (statistically significant) and all features below 95% confidence as being the same (nonsignificant). This option is recommended when the hot spot analyses were performed at a 95% significance level.
  • Use only 99% significant: Only features that are 99% significant hot (or cold) spots will be considered perfectly similar to each other. This option treats all features below 99% significance as being nonsignificant. This option is recommended when the hot spot analyses were performed at a 99% significance level.
  • Custom weights: Custom similarity weights provided in the Category Similarity Weights parameter will be used.
  • Get weights from table: Similarity weights between significance levels will be defined by an input table. Provide the table in the Input Weights Table parameter.
  • Reverse hot and cold relationships: The default fuzzy weights will be used, but hot spots of the first hot spot result will be considered similar to the cold spots of the second hot spot result. For example, 99% significant hot spots in one result will be considered perfectly similar to 99% cold spots in the other result and partially similar to 95% and 90% cold spots in the other result. This option is recommended when the hot spot analysis variables have a negative relationship. For example, you can measure how closely hot spots of infant mortality correspond to cold spots of healthcare access.
String
Category Similarity Weights
(Optional)

The custom similarity weights between significance level categories. The weights are values between 0 and 1 and indicate how similar to consider the two categories. A value of 0 indicates the categories are completely dissimilar, a value of 1 indicates the categories are perfectly similar, and values between 0 and 1 indicate the categories are partially similar. In the weight matrix pop-out, click a cell, type the weight value, and press Enter to apply the weight.

Value Table
Input Weights Table
(Optional)

The table containing custom similarity weights for each combination of hot spot significance level categories. The table must contain CATEGORY1, CATEGORY2, and WEIGHT fields. Provide the significance level categories of the pair (the Gi_Bin field values of the input layers) in the category fields and the similarity weight between them in the weight field. If a combination is not provided in the table, the weight for the combination is assumed to be 0.

Table View
Exclude Nonsignificant Features
(Optional)

Specifies whether pairs of features will be excluded from the comparisons if both hot spot results are nonsignificant. If excluded, conditional similarity and kappa values will be calculated that compare only the statistically significant hot and cold spots. Excluding features is recommended when you are interested only in whether the hot and cold spots of the input layers align, not whether the nonsignificant areas align, such as comparing whether hot and cold spots of median income correspond to hot and cold spots of food access.

  • Checked—Nonsignificant features will be excluded, and the comparisons will be conditional on statistically significant hot and cold spots.
  • Unchecked—Nonsignificant features will be included. This is the default.

If any significance level categories are assigned a similarity weight of 1 to the nonsignificant category (indicating that the category will be treated the same as the nonsignificant category), features with that category will also be excluded from comparisons if they are paired with another nonsignificant feature.

Boolean

Derived Output

Label | Explanation | Data Type
Global Similarity Value

The similarity value between the hot spot results.

Double
Global Expected Similarity Value

The expected value of the similarity between the hot spot results.

Double
Global Spatial Fuzzy Kappa

The spatially adjusted fuzzy kappa value between the hot spot results.

Double
Output Layer Group

A group layer of the output layers.

Group Layer

arcpy.stats.HotSpotAnalysisComparison(in_hot_spot_1, in_hot_spot_2, out_features, {num_neighbors}, {num_perms}, {weighting_method}, {similarity_weights}, {in_weights_table}, {exclude_nonsig_features})
Name | Explanation | Data Type
in_hot_spot_1

The first hot spot analysis result layer.

Feature Layer
in_hot_spot_2

The second hot spot analysis result layer.

Feature Layer
out_features

The output feature class that will contain the local measures of similarity and association.

Feature Class
num_neighbors
(Optional)

The number of neighbors around each feature that will be used for distance weighting. Distance weighting is one component of the overall similarity, and any features with matching significance levels within the neighborhood will be considered partial matches when calculating similarity and association.

Long
num_perms
(Optional)

The number of permutations that will be used to estimate the expected similarity and kappa values. A larger number of permutations will increase the precision of the estimates but will also increase calculation time.

  • 99: The analysis will use 99 permutations.
  • 199: The analysis will use 199 permutations.
  • 499: The analysis will use 499 permutations. This is the default.
  • 999: The analysis will use 999 permutations.
  • 9999: The analysis will use 9,999 permutations.
Long
weighting_method
(Optional)

Specifies how similarity weights between significance level categories will be defined. Similarity weights are numbers between 0 and 1 that define the categories of one result that are expected to match the categories of the other result. A value of 1 indicates that the categories will be considered exactly the same, and a value of 0 indicates that the categories will be considered completely different. Values between 0 and 1 indicate degrees of partial similarity between the categories. For example, 99% significant hot spots can be considered perfectly similar to other 99% hot spots, partially similar to 95% hot spots, and completely dissimilar to 99% cold spots.

  • FUZZY: Similarity weights will be fuzzy (nonbinary) and determined by the closeness of significance levels. For example, 99% significant hot spots will be perfectly similar to other 99% significant hot spots (weight = 1), but they will be partially similar to 95% significant hot spots (weight = 0.71) and 90% significant hot spots (weight = 0.55). The weight between 95% significant and 90% significant is 0.78. All hot spots will be completely dissimilar to all cold spots and nonsignificant features (weight = 0). This is the default.
  • EXACT_MATCH: Features must have the same significance level to be considered similar. For example, 99% significant hot spots will be considered completely dissimilar to 95% and 90% significant hot spots.
  • ABOVE_90: Features that are 90%, 95%, and 99% significant hot spots will be considered perfectly similar to each other, and all features that are 90%, 95%, and 99% significant cold spots will be considered perfectly similar to each other. This option treats all features at or above 90% significance as being the same (statistically significant) and all features below 90% confidence as being the same (nonsignificant). This option is recommended when the hot spot analyses were performed at a 90% significance level.
  • ABOVE_95: Features that are 95% and 99% significant hot spots will be considered perfectly similar to each other, and features that are 95% and 99% significant cold spots will be considered perfectly similar to each other. 90% significant hot and cold spots will be considered completely dissimilar to higher significance levels. This option treats all features at or above 95% confidence as being the same (statistically significant) and all features below 95% confidence as being the same (nonsignificant). This option is recommended when the hot spot analyses were performed at a 95% significance level.
  • ABOVE_99: Only features that are 99% significant hot (or cold) spots will be considered perfectly similar to each other. This option treats all features below 99% significance as being nonsignificant. This option is recommended when the hot spot analyses were performed at a 99% significance level.
  • CUSTOM: Custom similarity weights provided in the similarity_weights parameter will be used.
  • TABLE: Similarity weights between significance levels will be defined by an input table. Provide the table in the in_weights_table parameter.
  • REVERSE: The default fuzzy weights will be used, but hot spots of the first hot spot result will be considered similar to the cold spots of the second hot spot result. For example, 99% significant hot spots in one result will be considered perfectly similar to 99% cold spots in the other result and partially similar to 95% and 90% cold spots in the other result. This option is recommended when the hot spot analysis variables have a negative relationship. For example, you can measure how closely hot spots of infant mortality correspond to cold spots of healthcare access.
String
similarity_weights
[similarity_weights,...]
(Optional)

The custom similarity weights between significance level categories. The weights are values between 0 and 1 and indicate how similar to consider the two categories. A value of 0 indicates the categories are completely dissimilar, a value of 1 indicates the categories are perfectly similar, and values between 0 and 1 indicate the categories are partially similar.

Value Table
in_weights_table
(Optional)

The table containing custom similarity weights for each combination of hot spot significance level categories. The table must contain CATEGORY1, CATEGORY2, and WEIGHT fields. Provide the significance level categories of the pair (the Gi_Bin field values of the input layers) in the category fields and the similarity weight between them in the weight field. If a combination is not provided in the table, the weight for the combination is assumed to be 0.

Table View
exclude_nonsig_features
(Optional)

Specifies whether pairs of features will be excluded from the comparisons if both hot spot results are nonsignificant. If excluded, conditional similarity and kappa values will be calculated that compare only the statistically significant hot and cold spots. Excluding features is recommended when you are interested only in whether the hot and cold spots of the input layers align, not whether the nonsignificant areas align, such as comparing whether hot and cold spots of median income correspond to hot and cold spots of food access.

  • EXCLUDE: Nonsignificant features will be excluded, and the comparisons will be conditional on statistically significant hot and cold spots.
  • NO_EXCLUDE: Nonsignificant features will be included. This is the default.

If any significance level categories are assigned a similarity weight of 1 to the nonsignificant category (indicating that the category will be treated the same as the nonsignificant category), features with that category will also be excluded from comparisons if they are paired with another nonsignificant feature.

Boolean

Derived Output

Name | Explanation | Data Type
SIM_VALUE

The similarity value between the hot spot results.

Double
EXP_SIM_VALUE

The expected value of the similarity between the hot spot results.

Double
KAPPA

The spatially adjusted fuzzy kappa value between the hot spot results.

Double
output_layer_group

A group layer of the output layers.

Group Layer

Code sample

HotSpotAnalysisComparison example 1 (Python window)

The following Python script demonstrates how to use the HotSpotAnalysisComparison function.


import arcpy
arcpy.stats.HotSpotAnalysisComparison(
    "c:/data/boston.gdb/robbery_hotspot",
    "c:/data/boston.gdb/social_disorder_hotspot", "robbery_disorder_comparison",
    8, 499, "FUZZY", None, None, "EXCLUDE")

HotSpotAnalysisComparison example 2 (stand-alone script)

The following Python script demonstrates how to use the HotSpotAnalysisComparison function.


# Compare hot spot analysis results for robberies and social disorder.

# Import required modules.
import arcpy

# Set the workspace.
arcpy.env.workspace = "c:/data/boston.gdb"

# Create hot spot result for robberies in Boston.
robbery_hs = arcpy.stats.HotSpots(
    "boston_ecometrics_hex", "robbery", "robbery_hotspot",
    "K_NEAREST_NEIGHBORS", None, None, None, None, None, None, 8
)

# Create hot spot result of social disorder in Boston.
social_disorder_hs = arcpy.stats.HotSpots(
    "boston_ecometrics_hex", "scl_dsr", "social_disorder_hotspot", 
    "K_NEAREST_NEIGHBORS", None, None, None, None, None, None, 8
)

# Compare robbery and social disorder hot spot results.
try:
    hs_compare = arcpy.stats.HotSpotAnalysisComparison(
        robbery_hs, social_disorder_hs, "robbery_disorder_comparison", 8, 999,
        "FUZZY", None, None, "NO_EXCLUDE"
    )
except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the messages and
    # stop, because the outputs read below would not exist.
    print(arcpy.GetMessages())
    raise

# Collect the output feature class and the derived outputs by index.
result_vals = [hs_compare.getOutput(i) for i in range(hs_compare.outputCount)]

# Apply labels to the outputs.
results_names = ["output_fc", "similarity", "expected_similarity", "fuzzy_kappa",
    "output_layer"]

# Combine to dictionary and print the outputs.
results = dict(zip(results_names, result_vals))
print(results)
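A table for the in_weights_table parameter can be prepared outside ArcGIS. The following is a minimal sketch that writes the three required fields (CATEGORY1, CATEGORY2, and WEIGHT) to a CSV file using only the Python standard library. The weighting scheme, function names, and file name are hypothetical, and whether the tool accepts a CSV file directly or requires the rows to be imported into a geodatabase table first is not confirmed by this page.

```python
import csv
import os
import tempfile

GI_BINS = [-3, -2, -1, 0, 1, 2, 3]  # Gi_Bin values of the input layers

def weight_rows(weight_func):
    """One row per ordered pair of Gi_Bin categories (49 rows in total)."""
    return [
        {"CATEGORY1": a, "CATEGORY2": b, "WEIGHT": weight_func(a, b)}
        for a in GI_BINS
        for b in GI_BINS
    ]

def write_weights_csv(path, rows):
    """Write the weight rows with the field names the tool expects."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["CATEGORY1", "CATEGORY2", "WEIGHT"])
        writer.writeheader()
        writer.writerows(rows)

def same_direction(a, b):
    """Hypothetical scheme: 1 when both are hot, both cold, or both not significant."""
    return 1.0 if (a > 0) == (b > 0) and (a < 0) == (b < 0) else 0.0

rows = weight_rows(same_direction)
path = os.path.join(tempfile.gettempdir(), "similarity_weights.csv")
write_weights_csv(path, rows)
print(len(rows), "rows written to", path)
```

All 49 combinations are written explicitly because any combination missing from the table is assumed to have a weight of 0.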
