Calculate Composite Index (Spatial Statistics)

Summary

Combines multiple numeric variables to create a single index.

Composite indices are used across social and environmental domains to represent complex information from multiple indicators as a single metric that can measure progress toward a goal and facilitate decisions. The tool supports the three main steps of the index creation process: standardize input variables to a common scale (preprocessing), combine variables to a single index variable (combination), and scale and classify the resulting index to meaningful values (postprocessing).

Learn more about how Calculate Composite Index works

Illustration

Calculate Composite Index tool illustration

Usage

  • Creating an appropriate index depends on careful consideration of the question the index is trying to answer, variable choice, and the methods applied. These decisions should be made in consultation with domain experts and end users.

    Learn more about best practices when creating composite indices

  • Use the Input Variables parameter to designate numeric fields to use in the index. The tool omits records with missing values in any input variable.

  • You can use the Preset Method to Scale and Combine Variables parameter to specify a method for creating an index. For example, the Combine values (Mean of scaled values) option scales the input variables between 0 and 1 and uses the mean of the rescaled input variables as the index.

    • The Preset Method to Scale and Combine Variables parameter will change the values of the Method to Scale Input Variables and Method to Combine Scaled Variables parameters. To customize these further, choose the Custom option to set their values manually.

  • The Method to Scale Input Variables parameter will apply the selected method to all input variables.

    • The Minimum-maximum option is the simplest, as it preserves the distribution of the input variables and rescales each variable to a 0 to 1 range that is easy to interpret.
    • When working with variables that have skewed distributions or outliers, use the Percentile or Rank method, which consider the rank of the data, or use the Flag by threshold (binary) option, which converts the variables to binary (0 or 1) values.
    • To create an index that will be re-created over time as new data is available—such as a yearly performance index—use the Minimum-maximum (custom data ranges) or Z-score (custom) option. Using either of these options, you can set stable benchmarks that allow comparisons across data with different ranges and distributions.
    • The Raw values option is useful when the variables are on a comparable scale, such as when using percentages or when the variables have been preprocessed using other tools.

  • Use the Transform Field, Standardize Field, or Reclassify Field tools if you need to apply preprocessing methods that are not available in the tool or when different preprocessing methods are needed for each input variable.

    • When performing preprocessing, ensure that the input variables are on a comparable scale.
    • Use the Raw values option of the Method to Scale Input Variables parameter when performing your own preprocessing.

  • If all the input variables are on a common measurement scale, such as percentages, use the Raw values option of the Method to Scale Input Variables parameter.

  • The Flag by threshold (binary) option of the Method to Scale Input Variables parameter can be used to convert the input variables to values of 0 and 1 based on thresholds. Use the Method to Scale for Thresholds parameter to optionally apply a preprocessing step to all variables prior to setting the threshold. For example, the following steps count the number of input variables that are above the 90th percentile for each location:

    1. Set the Method to Scale Input Variables parameter value to Flag by threshold (binary).
    2. Set the Method to Scale for Thresholds parameter value to Percentile.
    3. Set the Thresholds parameter value to Greater than 0.9 for each variable.
    4. Set the Method to Combine Scaled Variables parameter value to Sum.

  • The Method to Combine Scaled Variables parameter includes additive methods (sum and mean) and multiplicative methods (multiply and geometric mean).

    • Additive methods allow a variable with a high value to compensate for variables with low values.
    • Multiplicative methods do not allow high values to compensate for low values. High index values occur only when there are high values in multiple variables.

  • You can use the Weights parameter (in the Variable Weights parameter category) to indicate the relative importance of each input variable. All weights are set to 1 by default, meaning that each variable is equally weighted.

    • If you know that a variable should be twice as important as another variable, set the variable to a weight of 2, and the other variable to a weight of 1.
    • You can also set weights that add up to 1; for example, if three variables are used and one should be considered to be twice as important as the other two, you can use weight values of 0.5, 0.25, and 0.25.

  • Weights have a significant impact on the resulting index. Setting the relative importance of variables is a subjective part of the analysis and should be driven by domain knowledge and documented justification.

  • The tool will create an index field, a rank field, and a percentile field. An index raw field will also be created when the index is reversed or scaled to a new minimum and maximum. Additional fields will be added for each of the classification options specified in the Additional Classified Outputs parameter. If the input is a feature class and a feature class is specified in the Output Features or Table parameter, the tool will provide a group layer displaying a layer for the index field, the percentile field, and each of the selected classification options.

  • The output index layer will include charts that show the distribution of the index, help you identify whether the preprocessing steps achieved the intended result, and reveal correlations among the input variables and the index.

  • The output index layer includes a pop-up visualization you can use to examine the values of the resulting index and input variables at specific locations.

  • The concept that the index is measuring may be represented by multiple dimensions. For example, a vulnerability index might be composed of housing, transportation, and income domains, each comprising multiple variables. Consider constructing subindices to represent each dimension. This is achieved by running the tool multiple times, once for each dimension, and using the results as the input variables to the final index.

  • When constructing an index that comprises subindices, consider using ModelBuilder or a notebook in ArcGIS AllSource to streamline this process. If using ModelBuilder, create new features for each subindex by unchecking the Append Fields to Input Table parameter. This will create a separate output for each subindex, which should be joined together before creating the final index. The tool does not support chaining multiple indices together in ModelBuilder when the Append Fields to Input Table parameter is checked.

  • A best practice is to reduce the number of input variables while keeping enough to capture the essential information needed for the index. A large number of input variables may result in difficulty when interpreting the index. Additionally, if multiple variables pertain to the same domain, for example, median income and poverty, the influence of this domain may be overrepresented in the index.
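The threshold-counting workflow described earlier in this list (Flag by threshold (binary), Percentile, Greater than 0.9, Sum) can be sketched in plain Python without arcpy to show what the resulting index counts. This is a minimal illustration under stated assumptions, not the tool's implementation: the percentile formula (number of values strictly below a value, divided by n - 1) and the function names `percentile_scale` and `count_above` are hypothetical.

```python
# Sketch of the "count of variables above the 90th percentile" workflow.
# Assumption: percentile = (number of values strictly below x) / (n - 1),
# so the smallest value maps to 0 and the largest to 1.

def percentile_scale(values):
    """Scale a list of numbers to percentiles between 0 and 1 (assumed formula)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [sum(v < x for v in values) / (n - 1) for x in values]


def count_above(variables, threshold=0.9):
    """Count, per location, how many variables have a percentile > threshold.

    variables: dict mapping a field name to a list of values, one per location.
    """
    # Steps 1-3: percentile-scale every variable, then flag values above the threshold
    flags = {
        name: [1 if p > threshold else 0 for p in percentile_scale(vals)]
        for name, vals in variables.items()
    }
    # Step 4: combine the binary flags with a sum
    n_locations = len(next(iter(variables.values())))
    return [sum(f[i] for f in flags.values()) for i in range(n_locations)]


# Hypothetical data for 11 locations
data = {
    "var_a": list(range(11)),          # highest at the last location
    "var_b": list(range(10, -1, -1)),  # highest at the first location
    "var_c": [5] * 11,                 # constant: never above the threshold
}
print(count_above(data))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Each location's index value is simply the number of variables on which it falls in the top tail, which is why this approach suits identifying the most extreme locations.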

Parameters

Label | Explanation | Data Type
Input Table

The table or features containing the variables that will be combined into the index.

Table View
Input Variables

A list of numeric fields representing the variables that will be combined as an index. Check the Reverse Direction check box to reverse the values of the variables. This means that the feature or record that originally had the highest value will have the lowest value, and vice versa.

Value Table
Append Fields to Input Table
(Optional)

Specifies whether the results will be appended to the input data or provided as an output feature class or table.

  • Checked—The results will be appended to the input data. This option modifies the input data.
  • Unchecked—An output feature class or table will be created containing the results. This is the default.

Boolean
Output Features or Table
(Optional)

The output features or table that will include the results.

Table; Feature Class
Preset Method to Scale and Combine Variables
(Optional)

Specifies the workflow that will be used when creating the index. The options represent common index creation workflows; each option sets default values for the Method to Scale Input Variables and Method to Combine Scaled Variables parameters.

  • Combine values (Mean of scaled values): An index will be created by scaling the input variables between 0 and 1 and averaging the scaled values. This method is useful for creating an index that is easy to interpret. The shape of the distribution and outliers in the input variables will impact the index. This is the default.
  • Combine ranks (Mean of percentiles): An index will be created by scaling the ranks of the input variables between 0 and 1 and averaging the scaled ranks. This option is useful when the rankings of the variable values are more important than the differences between values. The shape of the distribution and outliers in the input variables will not impact the index.
  • Compound differences (Geometric mean of scaled values): An index will be created by scaling the input variables between 0 and 1 and calculating the geometric average of the scaled values. High values will not fully compensate for low values, so this option is useful for creating an index in which higher index values will occur only when there are high values in multiple variables.
  • Highlight extremes (Count of values above 90th percentile): An index will be created that counts the number of input variables with values greater than or equal to the 90th percentile. This method is useful for identifying locations that may be considered the most extreme or the most in need.
  • Custom: An index will be created using customized variable scaling and combination options.
String
Method to Scale Input Variables
(Optional)

Specifies the method that will be used to convert the input variables to a common scale.

  • Minimum-maximum: Variables will be scaled between 0 and 1 using the minimum and maximum values of each variable. This is the default.
  • Minimum-maximum (custom data ranges): Variables will be scaled between 0 and 1 using the possible minimum and possible maximum values for each variable, as specified by the Custom Data Ranges parameter. This method has many uses, including specifying the minimum and maximum based on a benchmark, on a reference statistic, or on theoretical values. For example, if ozone recordings for a single day range between 5 and 27 parts per million (ppm), you can set the possible minimum and possible maximum to theoretical values, based on prior observation and domain expertise, rather than to the observed range. This ensures that the index can be compared across multiple days.
  • Percentile: Variables will be converted to percentiles between 0 and 1 by scaling the rank of the data values. This option is useful when you want to ignore absolute differences between the data values, such as with outliers or skewed distributions.
  • Rank: Variables will be ranked. The smallest value is assigned rank value 1, the next value is assigned rank value 2, and so on. Ties are assigned the average of their ranks.
  • Z-score: Each variable will be standardized by subtracting the mean value and dividing by the standard deviation. The result, called a z-score, is the number of standard deviations above or below the mean value. This option is useful when the means of the variables are important comparison points. Values above the mean will receive positive z-scores, and values below the mean will receive negative z-scores.
  • Z-score (custom): Each variable will be standardized by subtracting a custom mean value and dividing by a custom standard deviation. Provide the custom values in the Custom Standardization parameter. This option is useful when the means and standard deviations of the variables are known from previous research.
  • Flag by threshold (binary): Variables will be identified when they are above or below a defined threshold. The resulting field contains binary (0 or 1) values indicating whether the threshold was exceeded. You can also use the Method to Scale for Thresholds parameter to scale the input variable values before defining the threshold, and use the Thresholds parameter to specify the threshold values. This method is useful when the values of the variables are less important than whether they exceed a particular threshold, such as a safety limit of a pollutant.
  • Raw values: The original values of the variables will be used. Use this method only when all variables are measured on a comparable scale, such as percentages or rates, or when the variables have been standardized before using this tool.
String
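To make the scaling options above concrete, the following plain-Python sketch reproduces three of them on small lists: Minimum-maximum, Rank (with tied values sharing the average of their ranks), and Z-score. It is illustrative only and assumes a population standard deviation for the z-score; the tool's exact formulas and its handling of constant variables are not documented here, and the function names are hypothetical.

```python
import statistics


def min_max(values):
    """Minimum-maximum: scale to 0-1 using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant variable")
    return [(x - lo) / (hi - lo) for x in values]


def rank_average(values):
    """Rank: smallest value gets rank 1; ties get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the run of tied values that begins at sorted position `start`
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + 1 + end + 1) / 2  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return ranks


def z_score(values):
    """Z-score: subtract the mean and divide by the population standard deviation
    (population vs. sample standard deviation is an assumption of this sketch)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]


print(min_max([5, 10, 15]))               # [0.0, 0.5, 1.0]
print(rank_average([10, 20, 20, 30]))     # [1.0, 2.5, 2.5, 4.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Note how the z-scores are negative below the mean, which is why the multiplicative combination methods cannot be used with them.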
Method to Scale for Thresholds
(Optional)

Specifies the method that will be used to convert the input variables to a common scale prior to setting thresholds.

  • Minimum-maximum: Variables will be scaled between 0 and 1 using the minimum and maximum values of each variable.
  • Minimum-maximum (custom data ranges): Variables will be scaled between 0 and 1 using the possible minimum and possible maximum values for each variable.
  • Percentile: Variables will be converted to percentiles between 0 and 1.
  • Z-score: Each variable will be standardized by subtracting the mean value and dividing by the standard deviation.
  • Z-score (custom): Each variable will be standardized by subtracting a custom mean value and dividing by a custom standard deviation.
  • Raw values: The values of the variables will be used without change. This is the default.
String
Custom Standardization
(Optional)

The custom mean value and custom standard deviation that will be used when standardizing each input variable. For each variable, provide the custom mean in the Mean column and the custom standard deviation in the Standard Deviation column.

Value Table
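The custom z-score differs from the standard one only in where the mean and standard deviation come from. This short sketch assumes the straightforward formula (value minus custom mean, divided by custom standard deviation); the reference mean of 50 and standard deviation of 10 are hypothetical values standing in for figures from previous research.

```python
def custom_z_score(values, custom_mean, custom_sd):
    """Z-score (custom): standardize with a user-supplied mean and standard
    deviation instead of statistics computed from the data."""
    if custom_sd <= 0:
        raise ValueError("custom standard deviation must be positive")
    return [(x - custom_mean) / custom_sd for x in values]


# Hypothetical reference values: mean 50, standard deviation 10
print(custom_z_score([40, 50, 65], 50, 10))  # [-1.0, 0.0, 1.5]
```

Because the mean and standard deviation are fixed, the same raw value always receives the same score, even when new data with a different distribution is processed.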
Custom Data Ranges
(Optional)

The possible minimum and maximum values of each variable, in the units of that variable. Each variable will be scaled between 0 and 1 based on these possible minimum and maximum values.

Value Table
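The ozone example from the Minimum-maximum (custom data ranges) option can be sketched in plain Python as follows. The bounds of 0 and 40 ppm and the second day's readings are hypothetical; the point is that both days are scaled against the same fixed bounds instead of each day's observed range. In this sketch, readings outside the bounds would fall outside 0 to 1; whether the tool clips such values is not stated here.

```python
def custom_min_max(values, possible_min, possible_max):
    """Minimum-maximum (custom data ranges): scale with fixed, user-supplied bounds."""
    if possible_max <= possible_min:
        raise ValueError("possible_max must be greater than possible_min")
    span = possible_max - possible_min
    return [(x - possible_min) / span for x in values]


# Hypothetical benchmark bounds of 0 and 40 ppm applied to two days of readings
day_1 = [5, 16, 27]
day_2 = [10, 20, 30]
print(custom_min_max(day_1, 0, 40))  # [0.125, 0.4, 0.675]
print(custom_min_max(day_2, 0, 40))  # [0.25, 0.5, 0.75]
```

With the observed-range Minimum-maximum option, each day's lowest reading would map to 0 and highest to 1, so the two days' scaled values would not be directly comparable.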
Thresholds
(Optional)

The threshold that determines whether a feature will be flagged. Specify the value in the units of the scaled variables and specify whether values above or below the threshold value will be flagged.

Value Table
Method to Combine Scaled Variables
(Optional)

Specifies the method that will be used to combine the scaled variables into a single value.

You cannot multiply or calculate a geometric mean when any variables are scaled using z-scores, because z-scores are negative for values below the mean.

  • Sum: The values will be added.
  • Mean: The arithmetic (additive) mean of the values will be calculated. This is the default.
  • Multiply: The values will be multiplied. All scaled values must be greater than or equal to zero.
  • Geometric mean: The geometric (multiplicative) mean of the values will be calculated. All scaled values must be greater than or equal to zero.
String
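The difference between additive and multiplicative combination can be seen with two hypothetical locations: one that is high on one variable and low on the other, and one that is moderate on both. This unweighted sketch uses the index_method keywords as labels, but the function itself is illustrative, not the tool's code.

```python
import math


def combine(values, method):
    """Combine one location's scaled values into a single (unweighted) index value."""
    if method in ("PRODUCT", "GEOMETRIC_MEAN") and any(v < 0 for v in values):
        raise ValueError("multiplicative methods need values >= 0")
    if method == "SUM":
        return sum(values)
    if method == "MEAN":
        return sum(values) / len(values)
    if method == "PRODUCT":
        return math.prod(values)
    if method == "GEOMETRIC_MEAN":
        return math.prod(values) ** (1 / len(values))
    raise ValueError(f"unknown method: {method}")


unbalanced = [1.0, 0.0]  # high on one variable, low on the other
balanced = [0.5, 0.5]    # moderate on both
for method in ("MEAN", "GEOMETRIC_MEAN"):
    print(method, combine(unbalanced, method), combine(balanced, method))
# MEAN 0.5 0.5
# GEOMETRIC_MEAN 0.0 0.5
```

The mean lets the high value compensate for the low one, so both locations score 0.5; the geometric mean does not, so the unbalanced location drops to 0.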
Weights
(Optional)

The weights that will set the relative influence of each input variable on the index. Each weight has a default value of 1, meaning that each variable has equal contribution. Increase or decrease the weights to reflect the relative importance of the variables. For example, if a variable is twice as important as another, use a weight value of 2. Using weight values larger than 1 while multiplying to combine scaled values can result in indices with very large values.

Value Table
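Under the assumption that weights enter a Mean combination as a normalized weighted arithmetic mean, sum(w * x) / sum(w), only the relative sizes of the weights matter. The sketch below checks that the 2, 1, 1 weights and the 0.5, 0.25, 0.25 weights from the usage notes give the same result. The scaled values are hypothetical, and how the tool applies weights with the Sum, Multiply, or Geometric mean methods is not covered here.

```python
import math


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w) (assumed weighting scheme)."""
    if len(values) != len(weights):
        raise ValueError("need one weight per variable")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


scaled = [0.9, 0.3, 0.6]  # hypothetical scaled values for one location
a = weighted_mean(scaled, [2, 1, 1])
b = weighted_mean(scaled, [0.5, 0.25, 0.25])
print(math.isclose(a, b))  # True
```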
Output Index Name
(Optional)

The name of the index. The value is used in the visualization of the outputs, such as field aliases and chart labels. The value is not used when the output (or appended input) is a shapefile.

String
Reverse Output Index Values
(Optional)

Specifies whether the output index values will be reversed in direction (for example, to treat high index values as low values).

  • Checked—The index values will be reversed in direction.
  • Unchecked—The index values will not be reversed in direction. This is the default.

Boolean
Output Index Minimum and Maximum Values
(Optional)

The minimum and maximum of the output index values. This scaling is applied after combining the scaled variables. If no values are provided, the output index is not scaled.

Value Table
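A sketch of post-combination scaling, assuming the tool maps the lowest combined index value to the requested minimum and the highest to the requested maximum (a linear rescale). The optional reversal shown here, flipping each value's position within the range, is also an assumption about how the Reverse Output Index Values option behaves; the function name is hypothetical.

```python
def rescale(index, new_min, new_max, reverse=False):
    """Linearly rescale index values to [new_min, new_max], optionally reversed."""
    lo, hi = min(index), max(index)
    if hi == lo:
        raise ValueError("cannot rescale a constant index")
    out = []
    for x in index:
        t = (x - lo) / (hi - lo)  # position within the observed range, 0-1
        if reverse:
            t = 1 - t             # flip direction: highest becomes lowest
        out.append(new_min + t * (new_max - new_min))
    return out


print(rescale([2, 5, 8], 0, 100))                # [0.0, 50.0, 100.0]
print(rescale([2, 5, 8], 0, 100, reverse=True))  # [100.0, 50.0, 0.0]
```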
Additional Classified Outputs
(Optional)

Specifies the method that will be used to classify the output index. An additional output field will be provided for each selected option.

  • Equal interval: Classes will be created by dividing the range of values into equally sized intervals.
  • Quantile: Classes will be created in which each class includes an equal number of records.
  • Standard deviation: Classes will be created corresponding to the number of standard deviations above and below the average of the index. The resulting values will be between -3 and 3.
  • Custom: Class breaks and class values will be specified using the Output Index Custom Classes parameter.
String
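The equal interval and quantile options can be sketched as follows. This is a simplified illustration: class numbering from 1, placing the maximum value in the top class, and breaking ties by sorted position in the quantile method are all assumptions of the sketch rather than documented tool behavior.

```python
def equal_interval(index, num_classes):
    """Assign classes 1..num_classes by splitting the value range into equal widths."""
    lo, hi = min(index), max(index)
    if hi == lo:
        raise ValueError("cannot classify a constant index")
    width = (hi - lo) / num_classes
    # The maximum would otherwise land in class num_classes + 1, so cap it
    return [min(int((x - lo) / width) + 1, num_classes) for x in index]


def quantile(index, num_classes):
    """Assign classes so that each class holds (about) the same number of records."""
    n = len(index)
    order = sorted(range(n), key=lambda i: index[i])
    classes = [0] * n
    for position, i in enumerate(order):
        classes[i] = position * num_classes // n + 1
    return classes


print(equal_interval([0, 24, 25, 50, 100], 4))  # [1, 1, 2, 3, 4]
print(quantile([60, 10, 40, 20, 50, 30], 3))     # [3, 1, 2, 1, 3, 2]
```

Equal interval classes depend on the spread of values, so a skewed index can leave some classes nearly empty; quantile classes always hold similar counts regardless of the distribution.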
Output Index Number of Classes
(Optional)

The number of classes that will be used for the equal interval and quantile classification methods.

Long
Output Index Custom Classes
(Optional)

The upper bounds and class values for the custom classification method. For example, you can use this parameter to classify an index containing values between 0 and 100 into classes representing low, medium, and high values based on custom break values.

Value Table
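The low, medium, and high example above can be sketched with upper bounds and class values. The break values 33, 66, and 100 and the class values 1, 2, and 3 are hypothetical, and treating each upper bound as inclusive is an assumption of this sketch.

```python
import bisect


def custom_classes(index, upper_bounds, class_values):
    """Assign each value the class of the first upper bound it does not exceed.

    upper_bounds must be sorted in ascending order; bounds are treated as inclusive.
    """
    if len(upper_bounds) != len(class_values):
        raise ValueError("need one class value per upper bound")
    result = []
    for x in index:
        pos = bisect.bisect_left(upper_bounds, x)  # first bound >= x
        if pos == len(upper_bounds):
            raise ValueError(f"{x} exceeds the largest upper bound")
        result.append(class_values[pos])
    return result


# Hypothetical breaks for a 0-100 index: low (<= 33), medium (<= 66), high (<= 100)
print(custom_classes([0, 33, 34, 66, 67, 100], [33, 66, 100], [1, 2, 3]))
# [1, 1, 2, 2, 3, 3]
```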

Derived Output

Label | Explanation | Data Type
Updated Input Table

The updated input table.

Table View
Output Layer Group

If the input was a feature class and a feature class is specified for the Output Features or Table parameter, a group layer is provided displaying a layer for the index field, the percentile field, and each of the selected classification options.

Group Layer

arcpy.stats.CalculateCompositeIndex(in_table, in_variables, {append_to_input}, {out_table}, {index_preset}, {preprocessing}, {pre_threshold_scaling}, {pre_custom_zscore}, {pre_min_max}, {pre_thresholds}, {index_method}, {index_weights}, {out_index_name}, {out_index_reverse}, {post_min_max}, {post_reclass}, {post_num_classes}, {post_custom_classes})
Name | Explanation | Data Type
in_table

The table or features containing the variables that will be combined into the index.

Table View
in_variables
[[var1, reverse1],[var2, reverse2],...]

A list of numeric fields representing the variables that will be combined as an index. The Reverse Direction column reverses the values of the variables. This means that the feature or record that originally had the highest value will have the lowest value, and vice versa. Values will be reversed after scaling.

Value Table
append_to_input
(Optional)

Specifies whether the results will be appended to the input data or provided as an output feature class or table.

  • APPEND_TO_INPUT: The results will be appended to the input data. This option modifies the input data.
  • NEW_FEATURES: An output feature class or table will be created containing the results. This is the default.
Boolean
out_table
(Optional)

The output features or table that will include the results.

Table; Feature Class
index_preset
(Optional)

Specifies the workflow that will be used when creating the index. The options represent common index creation workflows; each option sets default values for the preprocessing and index_method parameters.

  • MEAN_SCALED: An index will be created by scaling the input variables between 0 and 1 and averaging the scaled values. This method is useful for creating an index that is easy to interpret. The shape of the distribution and outliers in the input variables will impact the index. This is the default.
  • MEAN_PCTL: An index will be created by scaling the ranks of the input variables between 0 and 1 and averaging the scaled ranks. This option is useful when the rankings of the variable values are more important than the differences between values. The shape of the distribution and outliers in the input variables will not impact the index.
  • GEOMEAN_SCALED: An index will be created by scaling the input variables between 0 and 1 and calculating the geometric average of the scaled values. High values will not fully compensate for low values, so this option is useful for creating an index in which higher index values will occur only when there are high values in multiple variables.
  • SUM_FLAGSPCTL: An index will be created that counts the number of input variables with values greater than or equal to the 90th percentile. This method is useful for identifying locations that may be considered the most extreme or the most in need.
  • CUSTOM: An index will be created using customized variable scaling and combination options.
String
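Reading the preset descriptions side by side suggests how each preset maps to preprocessing and index_method keywords, which can help when reproducing a preset with CUSTOM and then adjusting one setting. The mapping below is inferred from the option text only and is not read from the tool; in particular, the 0.9 threshold for SUM_FLAGSPCTL is noted in a comment rather than encoded, because the pre_thresholds syntax for the comparison method is not spelled out here.

```python
# Preset-to-keyword mapping inferred from the option descriptions (an assumption,
# not values read from the tool).
PRESETS = {
    "MEAN_SCALED": {"preprocessing": "MINMAX", "index_method": "MEAN"},
    "MEAN_PCTL": {"preprocessing": "PERCENTILE", "index_method": "MEAN"},
    "GEOMEAN_SCALED": {"preprocessing": "MINMAX", "index_method": "GEOMETRIC_MEAN"},
    # Also implies a threshold of 0.9 (90th percentile) on each variable
    "SUM_FLAGSPCTL": {
        "preprocessing": "BINARY",
        "pre_threshold_scaling": "THRESHOLD_PERCENTILE",
        "index_method": "SUM",
    },
}


def expand_preset(name):
    """Return the scaling and combination keywords a preset appears to stand for."""
    if name == "CUSTOM":
        return {}  # CUSTOM: set preprocessing and index_method explicitly
    return dict(PRESETS[name])


print(expand_preset("MEAN_PCTL"))
```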
preprocessing
(Optional)

Specifies the method that will be used to convert the input variables to a common scale.

  • MINMAX: Variables will be scaled between 0 and 1 using the minimum and maximum values of each variable. This is the default.
  • CUST_MINMAX: Variables will be scaled between 0 and 1 using the possible minimum and possible maximum values for each variable, as specified by the pre_min_max parameter. This method has many uses, including specifying the minimum and maximum based on a benchmark, on a reference statistic, or on theoretical values. For example, if ozone recordings for a single day range between 5 and 27 parts per million (ppm), you can use the theoretical minimum and maximum, based on prior observation and domain expertise, to ensure that the index can be compared across multiple days.
  • PERCENTILE: Variables will be converted to percentiles between 0 and 1 by calculating the percent of data values less than the data value. This option is useful when you want to ignore absolute differences between the data values, such as with outliers or skewed distributions.
  • RANK: Variables will be ranked. The smallest value is assigned rank value 1, the next value is assigned rank value 2, and so on. Ties are assigned the average of their ranks.
  • ZSCORE: Each variable will be standardized by subtracting the mean value and dividing by the standard deviation. The result, called a z-score, is the number of standard deviations above or below the mean value. This option is useful when the means of the variables are important comparison points. Values above the mean will receive positive z-scores, and values below the mean will receive negative z-scores.
  • CUST_ZSCORE: Each variable will be standardized by subtracting a custom mean value and dividing by a custom standard deviation. Provide the custom values in the pre_custom_zscore parameter. This option is useful when the means and standard deviations of the variables are known from previous research.
  • BINARY: Variables will be identified when they are above or below a defined threshold. The resulting field contains binary (0 or 1) values indicating whether the threshold was exceeded. You can also use the pre_threshold_scaling parameter to scale the input variable values before defining the threshold, and use the pre_thresholds parameter to specify the threshold values. This method is useful when the values of the variables are less important than whether they exceed a particular threshold, such as a safety limit of a pollutant.
  • RAW: The original values of the variables will be used. Use this method only when all variables are measured on a comparable scale, such as percentages or rates, or when the variables have been standardized before using this tool.
String
pre_threshold_scaling
(Optional)

Specifies the method that will be used to convert the input variables to a common scale prior to setting thresholds.

  • THRESHOLD_MINMAX: Variables will be scaled between 0 and 1 using the minimum and maximum values of each variable.
  • THRESHOLD_CUST_MINMAX: Variables will be scaled between 0 and 1 using the possible minimum and possible maximum values for each variable.
  • THRESHOLD_PERCENTILE: Variables will be converted to percentiles between 0 and 1.
  • THRESHOLD_ZSCORE: Each variable will be standardized by subtracting the mean value and dividing by the standard deviation.
  • THRESHOLD_CUST_ZSCORE: Each variable will be standardized by subtracting a custom mean value and dividing by a custom standard deviation.
  • THRESHOLD_RAW: The values of the variables will be used without change. This is the default.
String
pre_custom_zscore
[[field1, mean1, stdev1], [field2, mean2, stdev2],...]
(Optional)

The custom mean value and custom standard deviation that will be used when standardizing each input variable. For each variable, provide the custom mean in the Mean column and the custom standard deviation in the Standard Deviation column.

Value Table
pre_min_max
[[field1, min1, max1], [field2, min2, max2],...]
(Optional)

The possible minimum and maximum values of each variable, in the units of that variable. Each variable will be scaled between 0 and 1 based on these possible minimum and maximum values.

Value Table
pre_thresholds
[[field1, method1, threshold1], [field2, method2, threshold2],...]
(Optional)

The threshold that determines whether a feature will be flagged. Specify the value in the units of the scaled variables and specify whether values above or below the threshold value will be flagged.

Value Table
index_method
(Optional)

Specifies the method that will be used to combine the scaled variables into a single value.

  • SUM: The values will be added.
  • MEAN: The arithmetic (additive) mean of the values will be calculated. This is the default.
  • PRODUCT: The values will be multiplied. All scaled values must be greater than or equal to zero.
  • GEOMETRIC_MEAN: The geometric (multiplicative) mean of the values will be calculated. All scaled values must be greater than or equal to zero.

You cannot multiply or calculate a geometric mean when any variables are scaled using z-scores, because z-scores are negative for values below the mean.

String
index_weights
[[field1, weight1], [field2, weight2],...]
(Optional)

The weights that will set the relative influence of each input variable on the index. Each weight has a default value of 1, meaning that each variable has equal contribution. Increase or decrease the weights to reflect the relative importance of the variables. For example, if a variable is twice as important as another, use a weight value of 2. Using weight values larger than 1 while multiplying to combine scaled values can result in indices with very large values.

Value Table
out_index_name
(Optional)

The name of the index. The value is used in the visualization of the outputs, such as field aliases and chart labels. The value is not used when the output (or appended input) is a shapefile.

String
out_index_reverse
(Optional)

Specifies whether the output index values will be reversed in direction (for example, to treat high index values as low values).

  • REVERSE: The index values will be reversed in direction.
  • NO_REVERSE: The index values will not be reversed in direction. This is the default.
Boolean
post_min_max
[min, max]
(Optional)

The minimum and maximum of the output index values. This scaling is applied after combining the scaled variables. If no values are provided, the output index is not scaled.

Value Table
post_reclass
[post_reclass,...]
(Optional)

Specifies the method that will be used to classify the output index. An additional output field will be provided for each selected option.

  • EQINTERVAL: Classes will be created by dividing the range of values into equally sized intervals.
  • QUANTILE: Classes will be created in which each class includes an equal number of records.
  • STDDEV: Classes will be created corresponding to the number of standard deviations above and below the average of the index. The resulting values will be between -3 and 3.
  • CUST: Class breaks and class values will be specified using the post_custom_classes parameter.
String
post_num_classes
(Optional)

The number of classes that will be used for the equal interval and quantile classification methods.

Long
post_custom_classes
[[min1, max1], [min2, max2],...]
(Optional)

The upper bounds and class values for the custom classification method. For example, you can use this parameter to classify an index containing values between 0 and 100 into classes representing low, medium, and high values based on custom break values.

Value Table

Derived Output

Name | Explanation | Data Type
updated_table

The updated input table.

Table View
output_layer_group

If the input was a feature class and a feature class is specified for the out_table parameter, a group layer is provided displaying a layer for the index field, the percentile field, and each of the selected classification options.

Group Layer

Code sample

CalculateCompositeIndex example 1 (Python window)

The following Python window script demonstrates how to use the CalculateCompositeIndex function with the default Combine values (Mean of scaled values) preset.


import arcpy
arcpy.stats.CalculateCompositeIndex(
    in_table=r"C:\MyData.gdb\CommunityCharacteristics",
    out_table=r"C:\MyData.gdb\CommunityCharacteristicsIndex",
    in_variables=["ASTHMA_Prevalence_Percent", "Health_NoInsurance_Percent",
                  "BelowPovertyLine_Percent"],
    index_preset="MEAN_SCALED")

CalculateCompositeIndex example 2 (stand-alone script)

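The following stand-alone script demonstrates how to use the CalculateCompositeIndex function with percentile scaling, variable weights, a rescaled output range, and quantile classification.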


# Import system modules 
import arcpy 
import os 

try: 
    # Set the workspace and overwrite properties
    arcpy.env.workspace = r"C:\temp\temp.gdb" 
    arcpy.env.overwriteOutput = True 
    
    # Set the input features
    input_features = os.path.join(arcpy.env.workspace, "CommunityCharacteristics")

    # Set a list of variables that will be combined into an index
    input_variables = ["ASTHMA_Prevalence_Percent", "Health_NoInsurance_Percent", 
                       "BelowPovertyLine_Percent"]

    # Set the output name that will contain the index values.
    output_features = os.path.join(arcpy.env.workspace, "CommunityCharacteristicsIndex")

    # Set the method to scale the input variables
    preprocessing_method = "PERCENTILE"

    # Set the method to combine the scaled variables, and weight asthma
    # prevalence twice as heavily as each of the other two variables
    combination_method = "MEAN"
    variable_weights = [["ASTHMA_Prevalence_Percent", 2],
                        ["Health_NoInsurance_Percent", 1],
                        ["BelowPovertyLine_Percent", 1]]

    # Set the output settings
    output_index_name = "Asthma_Needs_Index"
    output_index_range = "0 100"
    output_classification = "QUANTILE"
    output_classification_num_classes = 5

    # Call the tool using the parameters defined above.
    arcpy.stats.CalculateCompositeIndex(
        in_table=input_features,
        in_variables=input_variables, 
        out_table=output_features,
        index_preset="CUSTOM",
        preprocessing=preprocessing_method,
        index_method=combination_method,
        index_weights=variable_weights,
        out_index_name=output_index_name,
        post_min_max=output_index_range,
        post_reclass=output_classification,
        post_num_classes=output_classification_num_classes)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())
