Local Bivariate Relationships (Spatial Statistics)

Summary

Analyzes two variables for statistically significant relationships using local entropy. Each feature is classified into one of six categories based on the type of relationship. The output can be used to visualize areas where the variables are related and explore how their relationship changes across the study area.

Learn more about how Local Bivariate Relationships works

Illustration

Local Bivariate Relationships tool illustration
Detect and visualize the local relationship between two variables.

Usage

  • This tool accepts points and polygons as input and should be used with continuous variables. It is not appropriate for binary or categorical data.

  • It is recommended that you store the output features in a geodatabase rather than as a shapefile (.shp). Shapefiles cannot store null values in attributes and cannot store charts in their pop-up dialog boxes.

  • Each input feature will be classified into one of the following relationship categories based on how reliably the Explanatory Variable parameter value predicts the Dependent Variable parameter value:

    • Not Significant—The relationship between the variables is not statistically significant.
    • Positive Linear—The dependent variable increases linearly as the explanatory variable increases.
    • Negative Linear—The dependent variable decreases linearly as the explanatory variable increases.
    • Concave—The dependent variable follows a concave curve as the explanatory variable increases.
    • Convex—The dependent variable follows a convex curve as the explanatory variable increases.
    • Undefined Complex—The variables are significantly related, but the type of relationship cannot be reliably described by any other category.

  • Whether there is a relationship between two variables does not depend on which is labeled as the explanatory variable and which is labeled as the dependent variable. For example, if diabetes is related to obesity, obesity is similarly related to diabetes. However, the classification of the type of relationship may change depending on which variable is labeled as the explanatory variable and which is labeled as the dependent variable. One variable may accurately predict a second variable, but the second variable may not accurately predict the first. If you are unsure which variable should be labeled explanatory and which should be dependent, run the tool twice and try both.

  • This tool supports parallel processing and uses 50 percent of available processors by default. The number of processors can be increased or decreased using the Parallel Processing Factor environment.
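Two of the usage notes above lend themselves to scripting: running the tool twice with the variable roles swapped, and changing the Parallel Processing Factor environment. The following is a minimal sketch under stated assumptions rather than a recommended workflow. The helper names (`parallel_factor`, `swapped_runs`, `run_both`) and the output naming scheme are hypothetical; the keyword names are taken from the function signature on this page. The `arcpy` import is deferred so that the argument-building step can be tried without an ArcGIS Pro installation.

```python
def parallel_factor(percent):
    """Format a processor percentage as a string such as '75%'."""
    # This sketch only handles whole percentages from 1 to 100.
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100 in this sketch")
    return f"{percent}%"


def swapped_runs(in_features, var_a, var_b, out_prefix):
    """Build keyword arguments for two runs with the variable roles swapped."""
    runs = []
    for dependent, explanatory in ((var_a, var_b), (var_b, var_a)):
        runs.append({
            "in_features": in_features,
            "dependent_variable": dependent,
            "explanatory_variable": explanatory,
            # Hypothetical naming scheme: <prefix>_<dependent>_by_<explanatory>
            "output_features": f"{out_prefix}_{dependent}_by_{explanatory}",
        })
    return runs


def run_both(in_features, var_a, var_b, out_prefix, percent=50):
    """Run the tool once per role assignment (requires ArcGIS Pro)."""
    import arcpy  # deferred import: only needed when the tool is actually run

    arcpy.env.parallelProcessingFactor = parallel_factor(percent)
    for kwargs in swapped_runs(in_features, var_a, var_b, out_prefix):
        arcpy.stats.LocalBivariateRelationships(**kwargs)


# Inspect the two argument sets without running the tool.
runs = swapped_runs("ObesityDiabetes", "DiabetesRate", "ObesityRate", "LBR")
for run in runs:
    print(run["dependent_variable"], "predicted by", run["explanatory_variable"])
```

Comparing the categorized relationship in the two outputs shows whether the classification depends on which variable is treated as explanatory.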

Parameters

Label | Explanation | Data Type
Input Features

The feature class containing fields representing the Dependent Variable and Explanatory Variable values.

Feature Layer
Dependent Variable

The numeric field representing the values of the dependent variable. When categorizing the relationships, the Explanatory Variable value is used to predict the Dependent Variable value.

Field
Explanatory Variable

The numeric field representing the values of the explanatory variable. When categorizing the relationships, the Explanatory Variable value is used to predict the Dependent Variable value.

Field
Output Features

The output feature class containing all input features with fields representing the Dependent Variable value, Explanatory Variable value, entropy score, pseudo p-value, level of significance, type of categorized relationship, and diagnostics related to the categorization.

Feature Class
Number of Neighbors
(Optional)

The number of neighbors around each feature (including the feature) that will be used to test for a local relationship between the variables. The number of neighbors must be between 30 and 1,000, and the default is 30. The provided value should be large enough to detect the relationship between features, but small enough to still identify local patterns.

Long
Number of Permutations
(Optional)

Specifies the number of permutations that will be used to calculate the pseudo p-value for each feature. Choosing a number of permutations is a balance between precision in the pseudo p-value and increased processing time.

  • 99 permutations—With 99 permutations, the smallest possible pseudo p-value is 0.01, and all other pseudo p-values will be multiples of this value.
  • 199 permutations—With 199 permutations, the smallest possible pseudo p-value is 0.005, and all other pseudo p-values will be multiples of this value. This is the default.
  • 499 permutations—With 499 permutations, the smallest possible pseudo p-value is 0.002, and all other pseudo p-values will be multiples of this value.
  • 999 permutations—With 999 permutations, the smallest possible pseudo p-value is 0.001, and all other pseudo p-values will be multiples of this value.
Long
Enable Local Scatterplot Pop-ups
(Optional)

Specifies whether scatterplot pop-ups will be generated for each output feature. Each scatterplot displays the values of the explanatory (horizontal axis) and dependent (vertical axis) variables in the local neighborhood along with a fitted line or curve visualizing the form of the relationship. Scatterplot charts are not supported for shapefile outputs.

  • Checked—Local scatterplot pop-ups will be generated for each feature in the dataset. This is the default.
  • Unchecked—Local scatterplot pop-ups will not be generated.
Boolean
Level of Confidence
(Optional)

Specifies the confidence level of the hypothesis test for significant relationships.

  • 90%—The confidence level is 90 percent. This is the default.
  • 95%—The confidence level is 95 percent.
  • 99%—The confidence level is 99 percent.
String
Apply False Discovery Rate (FDR) Correction
(Optional)

Specifies whether False Discovery Rate (FDR) correction will be applied to the pseudo p-values.

  • Checked—Statistical significance will be based on the FDR correction. This is the default.
  • Unchecked—Statistical significance will be based on the pseudo p-value.
Boolean
Scaling Factor (Alpha)
(Optional)

The level of sensitivity to subtle relationships between the variables. Larger values (closer to one) can detect relatively weak relationships, while smaller values (closer to zero) will only detect strong relationships. Smaller values are also more robust to outliers. The value must be between 0.01 and 1, and the default is 0.5.

Double
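
The smallest pseudo p-values listed for the Number of Permutations parameter match the common permutation-test convention in which the observed statistic counts as one outcome among the permutations plus one, giving a resolution of 1 / (permutations + 1). The sketch below assumes that convention, and assumes that larger statistics are more extreme. Both function names are hypothetical, and this is not the tool's internal code.

```python
def smallest_pseudo_p(permutations):
    """Smallest attainable pseudo p-value for a given number of permutations."""
    return 1 / (permutations + 1)


def pseudo_p_value(observed, permuted):
    """Share of outcomes at least as large as the observed statistic.

    The observed statistic is counted as one of the outcomes, which is why
    the result can never be smaller than 1 / (len(permuted) + 1).
    """
    as_extreme = sum(1 for value in permuted if value >= observed)
    return (as_extreme + 1) / (len(permuted) + 1)


# Resolution for each of the four permutation choices.
resolution = {n: smallest_pseudo_p(n) for n in (99, 199, 499, 999)}
for n, p in resolution.items():
    print(n, p)
```

Under this convention, a 99 percent confidence level cannot be reached with 99 permutations unless the observed statistic exceeds every permuted one, which is one reason to choose more permutations when a stricter confidence level is used.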

arcpy.stats.LocalBivariateRelationships(in_features, dependent_variable, explanatory_variable, output_features, {number_of_neighbors}, {number_of_permutations}, {enable_local_scatterplot_popups}, {level_of_confidence}, {apply_false_discovery_rate_fdr_correction}, {scaling_factor})
Name | Explanation | Data Type
in_features

The feature class containing fields representing the dependent_variable and explanatory_variable values.

Feature Layer
dependent_variable

The numeric field representing the values of the dependent variable. When categorizing the relationships, the explanatory_variable value is used to predict the dependent_variable value.

Field
explanatory_variable

The numeric field representing the values of the explanatory variable. When categorizing the relationships, the explanatory_variable value is used to predict the dependent_variable value.

Field
output_features

The output feature class containing all input features with fields representing the dependent_variable value, explanatory_variable value, entropy score, pseudo p-value, level of significance, type of categorized relationship, and diagnostics related to the categorization.

Feature Class
number_of_neighbors
(Optional)

The number of neighbors around each feature (including the feature) that will be used to test for a local relationship between the variables. The number of neighbors must be between 30 and 1,000, and the default is 30. The provided value should be large enough to detect the relationship between features, but small enough to still identify local patterns.

Long
number_of_permutations
(Optional)

Specifies the number of permutations that will be used to calculate the pseudo p-value for each feature. Choosing a number of permutations is a balance between precision in the pseudo p-value and increased processing time.

  • 99—With 99 permutations, the smallest possible pseudo p-value is 0.01, and all other pseudo p-values will be multiples of this value.
  • 199—With 199 permutations, the smallest possible pseudo p-value is 0.005, and all other pseudo p-values will be multiples of this value. This is the default.
  • 499—With 499 permutations, the smallest possible pseudo p-value is 0.002, and all other pseudo p-values will be multiples of this value.
  • 999—With 999 permutations, the smallest possible pseudo p-value is 0.001, and all other pseudo p-values will be multiples of this value.
Long
enable_local_scatterplot_popups
(Optional)

Specifies whether scatterplot pop-ups will be generated for each output feature. Each scatterplot displays the values of the explanatory (horizontal axis) and dependent (vertical axis) variables in the local neighborhood along with a fitted line or curve visualizing the form of the relationship. Scatterplot charts are not supported for shapefile outputs.

  • CREATE_POPUP—Local scatterplot pop-ups will be generated for each feature in the dataset. This is the default.
  • NO_POPUP—Local scatterplot pop-ups will not be generated.
Boolean
level_of_confidence
(Optional)

Specifies the confidence level of the hypothesis test for significant relationships.

  • 90%—The confidence level is 90 percent. This is the default.
  • 95%—The confidence level is 95 percent.
  • 99%—The confidence level is 99 percent.
String
apply_false_discovery_rate_fdr_correction
(Optional)

Specifies whether False Discovery Rate (FDR) correction will be applied to the pseudo p-values.

  • APPLY_FDR—Statistical significance will be based on the FDR correction. This is the default.
  • NO_FDR—Statistical significance will be based on the pseudo p-value.
Boolean
scaling_factor
(Optional)

The level of sensitivity to subtle relationships between the variables. Larger values (closer to one) can detect relatively weak relationships, while smaller values (closer to zero) will only detect strong relationships. Smaller values are also more robust to outliers. The value must be between 0.01 and 1, and the default is 0.5.

Double
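
The documented limits for the optional parameters can be checked before the tool is called, so that a long run does not fail on an out-of-range value. The following is a minimal sketch; `check_lbr_parameters` is a hypothetical helper, not part of arcpy, and it encodes only the ranges and choices stated in the parameter table above.

```python
VALID_PERMUTATIONS = {99, 199, 499, 999}
VALID_CONFIDENCE = {"90%", "95%", "99%"}


def check_lbr_parameters(number_of_neighbors=30, number_of_permutations=199,
                         level_of_confidence="90%", scaling_factor=0.5):
    """Return a list of problems; an empty list means every value is in range.

    Defaults mirror the documented defaults of the tool.
    """
    problems = []
    if not 30 <= number_of_neighbors <= 1000:
        problems.append("number_of_neighbors must be between 30 and 1000")
    # The code samples pass the permutation count as a string, so accept both.
    if int(number_of_permutations) not in VALID_PERMUTATIONS:
        problems.append("number_of_permutations must be 99, 199, 499, or 999")
    if level_of_confidence not in VALID_CONFIDENCE:
        problems.append("level_of_confidence must be '90%', '95%', or '99%'")
    if not 0.01 <= scaling_factor <= 1:
        problems.append("scaling_factor must be between 0.01 and 1")
    return problems


defaults_ok = check_lbr_parameters()
bad = check_lbr_parameters(number_of_neighbors=10, scaling_factor=0)
for message in bad:
    print(message)
```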

Code sample

LocalBivariateRelationships example 1 (Python window)

The following Python window script demonstrates how to use the LocalBivariateRelationships function.

import arcpy
arcpy.env.workspace = 'C:\\LBR\\MyData.gdb'
arcpy.stats.LocalBivariateRelationships('ObesityDiabetes', 'ObesityRate',
                   'DiabetesRate', 'LBR_Results', 30, '199', 'CREATE_POPUP',
                   '95%', 'APPLY_FDR', 0.5)

LocalBivariateRelationships example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the LocalBivariateRelationships function.

# Use the Local Bivariate Relationships tool to study the relationship between
# obesity and diabetes.

# Import system modules.
import arcpy
import os

# Set property to overwrite existing output by default.
arcpy.env.overwriteOutput = True

try:
    # Set the workspace and input features.
    arcpy.env.workspace = r"C:\LBR\MyData.gdb"
    inputFeatures = 'ObesityDiabetes'

    # Set the output workspace and output name.
    outws = 'C:\\LBR\\outputs.gdb'
    outputName = 'LBR_Results'

    # Set the dependent variable and the explanatory variable.
    depVar = 'DiabetesRate'
    explVar = 'ObesityRate'

    # Set number of neighbors and permutations.
    numNeighbors = 50
    numPerms = '999'

    # Choose to create pop-ups.
    popUps = 'CREATE_POPUP'

    # Choose confidence level and apply False Discovery Rate correction.
    confLevel = '95%'
    fdr = 'APPLY_FDR'

    # Set the scaling factor.
    scaleFactor = 0.5

    # Run Local Bivariate Relationships.
    arcpy.stats.LocalBivariateRelationships(inputFeatures, depVar, explVar, 
                                            os.path.join(outws, outputName), 
                                            numNeighbors, numPerms, popUps, 
                                            confLevel, fdr, scaleFactor)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())
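
After the tool runs, a quick tally of the six relationship categories gives an overview of the result. The sketch below is an assumption-laden illustration: `tally_relationships` and `read_labels` are hypothetical helpers, the name of the output field that holds the relationship type is not given on this page and must be supplied by the caller, and the stored values are assumed to match the category labels listed under Usage.

```python
from collections import Counter

# Category labels as listed in the Usage section.
CATEGORIES = (
    "Not Significant",
    "Positive Linear",
    "Negative Linear",
    "Concave",
    "Convex",
    "Undefined Complex",
)


def tally_relationships(labels):
    """Count features per category, reporting zero for unused categories."""
    counts = Counter(labels)
    return {category: counts.get(category, 0) for category in CATEGORIES}


def read_labels(output_features, type_field):
    """Read the relationship type of every output feature (requires ArcGIS Pro).

    type_field is the name of the relationship-type field in the output;
    check the output attribute table for the actual field name.
    """
    import arcpy  # deferred import: only needed when reading real output

    with arcpy.da.SearchCursor(output_features, [type_field]) as cursor:
        return [row[0] for row in cursor]


# Illustrative labels only, not real tool output.
summary = tally_relationships(
    ["Concave", "Positive Linear", "Concave", "Not Significant"]
)
for category, count in summary.items():
    print(category, count)
```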
