Label | Explanation | Data Type |
Input Model File | The spatial statistics model file that will be used to make new predictions. | File |
Prediction Type | Specifies the operation mode that will be used. The tool can predict new features or create a prediction raster surface. | String |
Input Prediction Features (Optional) | The feature class representing locations where predictions will be made. This feature class must also contain any explanatory variables provided as fields that correspond to those used to train the input model. | Feature Layer |
Output Predicted Features (Optional) | The output feature class containing the prediction results. | Feature Class |
Output Predicted Raster (Optional) | The output raster containing the prediction results. The default cell size will be the maximum cell size of the input rasters. | Raster Dataset |
Match Explanatory Variables (Optional) | A list of the explanatory variables of the input model and corresponding fields of the input prediction features. For each explanatory variable in the Training column, provide the corresponding prediction field in the Prediction column. The Categorical column specifies whether the variable is categorical or continuous. | Value Table |
Match Distance Features (Optional) | A list of the explanatory distance features of the input model and corresponding prediction distance features. For each explanatory distance feature in the Training column, provide the corresponding prediction distance feature in the Prediction column. | Value Table |
Match Explanatory Rasters (Optional) | A list of the explanatory rasters of the input model and corresponding prediction rasters. For each explanatory raster in the Training column, provide the corresponding prediction raster in the Prediction column. The Categorical column specifies whether the raster is categorical or continuous. | Value Table |
Summary
Predicts continuous or categorical values using a trained spatial statistics model (.ssm file).
Usage
The following are example scenarios for use of this tool:
- For a forest-based classification and regression model trained on the occurrence of seagrass using a number of environmental explanatory variables represented as both attributes and rasters, in addition to distances to factories upstream and major ports, future seagrass occurrence can be predicted based on future projections for those environmental explanatory variables.
- A model trained by an expert can be shared with others to make predictions without sharing sensitive data. For example, using a model trained on the blood lead levels of children and the tax parcel ID of their homes, combined with parcel-level attributes such as age of home, census-level data such as income and education levels, and national datasets reflecting toxic release of lead and lead compounds, others can predict the risk of lead exposure for parcels without blood lead level data. These risk predictions can drive policies and education programs in the area.
- A wildlife ecologist has collected field data for observed presence locations of an endangered species. They need to estimate the species' presence in a broader study area and share their work with other researchers. Using the known presence locations and providing underlying factors as rasters, the ecologist can model the species' presence using presence-only prediction and share the trained model without sharing sensitive information about species occurrence. The model can be used to create a map of predicted locations where the species is most likely to be found.
The Input Model File parameter value is an .ssm file created by various tools in the Modeling Spatial Relationships toolset of the Spatial Statistics toolbox. You can create the model file using the Generalized Linear Regression, Forest-based and Boosted Classification and Regression, and Presence-only Prediction (MaxEnt) tools by specifying the Output Trained Model File parameter value in each tool.
See How Generalized Linear Regression works, How Forest-based and Boosted Classification and Regression works, and How Presence-only Prediction works to learn how each model type makes predictions.
When using the Predict to features option for the Prediction Type parameter, use the Output Predicted Features parameter to create a feature class with the predictions. When using the Predict to raster option, use the Output Predicted Raster parameter to create a raster of the predicted values.
To predict to raster, the .ssm file must be trained using only rasters.
An ArcGIS Spatial Analyst extension license is required to use an .ssm file that was trained using rasters.
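The pairing between prediction type and output parameter can be sketched in Python. The helper below is hypothetical (`build_predict_args` is not part of ArcPy) and does not call the tool. It only assembles keyword arguments using the parameter names from this tool's syntax and, following the raster-only requirement above, refuses field or distance matching when predicting to raster. How the tool itself reacts to such input is not something this sketch asserts.

```python
def build_predict_args(input_model, prediction_type, output,
                       features_to_predict=None,
                       variable_matching=None,
                       distance_matching=None,
                       raster_matching=None):
    """Assemble keyword arguments for arcpy.stats.PredictUsingSSMFile.

    Hypothetical helper for illustration: it builds a dict and never
    imports or calls ArcPy.
    """
    args = {"input_model": input_model, "prediction_type": prediction_type}

    if prediction_type == "PREDICT_FEATURES":
        if not features_to_predict:
            raise ValueError("PREDICT_FEATURES requires features_to_predict")
        args["features_to_predict"] = features_to_predict
        args["output_features"] = output
        optional = {
            "explanatory_variable_matching": variable_matching,
            "explanatory_distance_matching": distance_matching,
            "explanatory_rasters_matching": raster_matching,
        }
    elif prediction_type == "PREDICT_RASTER":
        # A model used to predict to raster must be trained only on rasters,
        # so field and distance matching make no sense here.
        if variable_matching or distance_matching:
            raise ValueError("PREDICT_RASTER accepts raster matching only")
        if not raster_matching:
            raise ValueError("PREDICT_RASTER requires raster matching")
        args["output_raster"] = output
        optional = {"explanatory_rasters_matching": raster_matching}
    else:
        raise ValueError(f"Unknown prediction type: {prediction_type!r}")

    # Keep only the optional parameters that were actually provided.
    args.update({k: v for k, v in optional.items() if v})
    return args


raster_args = build_predict_args(
    "Suitability.ssm", "PREDICT_RASTER", "suitability_predicted_raster.tif",
    raster_matching=[["Climate_Bio2050.tif", "Bio", None]])  # "Bio" is a made-up name
print(sorted(raster_args))
```

In an environment where ArcPy is available, the resulting dictionary could be passed on as `arcpy.stats.PredictUsingSSMFile(**raster_args)`.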
Note:
It is recommended that you run the Describe Spatial Statistics Model File tool before running this tool to learn about the variable names, types, descriptions, and units to prepare the data accordingly. You can also use the model diagnostics to assess the quality of the input model file.
Explanatory variables can come from fields, be calculated from distance features, or be extracted from rasters. The combination of the explanatory variables should match the input model file.
If an explanatory variable or raster was marked as categorical when the model file was created, the Categorical column will be checked automatically and the matching prediction variable will be treated as categorical. You can use the Describe Spatial Statistics Model File tool before running this tool to determine which variables are categorical in the model file.
The training and prediction variables should have compatible field types. For example, any numeric field type can be matched to any other numeric field type, but if the training field is text, the corresponding prediction field should also be text.
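The compatibility rule above can be expressed as a small check. This is a minimal sketch under stated assumptions: the type names follow those reported by `arcpy.ListFields` (`Field.type`), while the grouping into numeric and text families and the helper names `type_family` and `compatible` are illustrative, not something the tool exposes.

```python
# Field type names as reported by arcpy.ListFields (Field.type).
# The grouping into families is an assumption made for this sketch.
NUMERIC_TYPES = {"SmallInteger", "Integer", "BigInteger", "Single", "Double"}
TEXT_TYPES = {"String"}


def type_family(field_type):
    """Map a field type name to 'numeric', 'text', or 'other'."""
    if field_type in NUMERIC_TYPES:
        return "numeric"
    if field_type in TEXT_TYPES:
        return "text"
    return "other"


def compatible(training_type, prediction_type):
    """Return True when both field types fall in the same supported family."""
    family = type_family(training_type)
    return family != "other" and family == type_family(prediction_type)


print(compatible("Integer", "Double"))  # numeric matched to numeric
print(compatible("String", "Double"))   # text matched to numeric
```

With ArcPy available, the prediction field types could be collected with `{f.name: f.type for f in arcpy.ListFields("MedicareSpendingData")}` and checked before running the tool.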
Note:
It is recommended that you set the variable units before matching the training and prediction variables. If the units of the trained model file and the prediction variables differ, the results may be incorrect. For example, if you train a model using an income variable in United States dollars but match that variable with income in Indian rupees when making predictions, the value ranges of the training and prediction variables will be inconsistent, resulting in inaccurate predictions.
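A unit check of this kind can be scripted before matching. The sketch below is hypothetical: `unit_mismatches`, the variable names, and both dictionaries are illustrative. It assumes you have already noted the model's units (for example, from the Describe Spatial Statistics Model File messages) and know the units of your prediction fields.

```python
def unit_mismatches(model_units, prediction_units):
    """Return {variable: (model unit, prediction unit)} where units differ.

    A prediction unit of None means the unit is unknown; unknown units
    are reported as mismatches so they get reviewed.
    """
    mismatches = {}
    for name, model_unit in model_units.items():
        pred_unit = prediction_units.get(name)
        if pred_unit != model_unit:
            mismatches[name] = (model_unit, pred_unit)
    return mismatches


# Illustrative values only.
model_units = {"INCOME": "USD", "AGE_OF_HOME": "years"}
prediction_units = {"INCOME": "INR", "AGE_OF_HOME": "years"}
print(unit_mismatches(model_units, prediction_units))
# prints {'INCOME': ('USD', 'INR')}
```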
This tool also creates messages and charts that describe the performance of the model. To access the messages, hover over the progress bar and click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access the messages for a previous run of the tool through the geoprocessing history. The messages include model diagnostics and other information about the model.
In the geoprocessing messages, the Model Parameters table describes the variable and field type to predict and the explanatory variables used to create the model. The table also contains units (if set using the Set Spatial Statistics Model File Properties tool) for each variable to help ensure that they align with the prediction variables when using the model to make predictions.
Caution:
It is recommended that you assess the model diagnostics before trusting the prediction results. If a model was trained without withholding any validation data, the accuracy of the predictions cannot be assessed.
Caution:
When running the tool using ArcPy, the order and case of the variables provided in the Match Explanatory Variables, Match Distance Features, and Match Explanatory Rasters parameter value tables are important. For example, if you have two explanatory variables representing temperature and humidity, and the temperature value is expected before the humidity value, you must provide the variables in that order. Use the derived outputs from the Describe Spatial Statistics Model File tool to get the correct order of the variables stored in the input model file.
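One way to guard against ordering and case mistakes is to build the matching rows from the model's own variable order. The sketch below is an illustration, not part of ArcPy: `order_matches` is a hypothetical helper, and it assumes the variable names are available as a semicolon-delimited string, as the stand-alone script in this document suggests for the Describe Spatial Statistics Model File derived outputs.

```python
def order_matches(model_order, matches):
    """Return [[prediction, training, categorical], ...] in model order.

    model_order: semicolon-delimited training variable names (assumed format).
    matches: dict mapping each training name to (prediction name, categorical).
    The lookup is deliberately case sensitive.
    """
    rows = []
    for train_name in model_order.split(";"):
        train_name = train_name.strip()
        if not train_name:
            continue
        if train_name not in matches:
            # Point out near misses that differ only by case.
            near = [k for k in matches if k.lower() == train_name.lower()]
            hint = f" (did you mean {near[0]!r}?)" if near else ""
            raise KeyError(f"No match for {train_name!r}{hint}")
        prediction_name, categorical = matches[train_name]
        rows.append([prediction_name, train_name, categorical])
    return rows


# Illustrative names: the model expects temperature before humidity.
rows = order_matches(
    "TEMP;HUMIDITY",
    {"HUMIDITY": ("hum_2050", False), "TEMP": ("temp_2050", False)})
print(rows)
# prints [['temp_2050', 'TEMP', False], ['hum_2050', 'HUMIDITY', False]]
```

Keeping the matches in a dictionary lets you declare them in any order, while the model file remains the single source of truth for the order that is actually passed to the tool.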
Parameters
arcpy.stats.PredictUsingSSMFile(input_model, prediction_type, {features_to_predict}, {output_features}, {output_raster}, {explanatory_variable_matching}, {explanatory_distance_matching}, {explanatory_rasters_matching})
Name | Explanation | Data Type |
input_model | The spatial statistics model file that will be used to make new predictions. | File |
prediction_type | Specifies the operation mode that will be used. The tool can predict new features or create a prediction raster surface. | String |
features_to_predict (Optional) | The feature class representing locations where predictions will be made. This feature class must also contain any explanatory variables provided as fields that correspond to those used to train the input model. | Feature Layer |
output_features (Optional) | The output feature class containing the prediction results. | Feature Class |
output_raster (Optional) | The output raster containing the prediction results. The default cell size will be the maximum cell size of the input rasters. | Raster Dataset |
explanatory_variable_matching [[pred1, train1, cat1], [pred2, train2, cat2],...] (Optional) | A list of the explanatory variables of the input model and corresponding fields of the input prediction features. For each explanatory variable in the Training column, provide the corresponding prediction field in the Prediction column. The Categorical column specifies whether the variable is categorical or continuous. | Value Table |
explanatory_distance_matching [[pred1, train1], [pred2, train2],...] (Optional) | A list of the explanatory distance features of the input model and corresponding prediction distance features. For each explanatory distance feature in the Training column, provide the corresponding prediction distance feature in the Prediction column. | Value Table |
explanatory_rasters_matching [[pred1, train1, cat1], [pred2, train2, cat2],...] (Optional) | A list of the explanatory rasters of the input model and corresponding prediction rasters. For each explanatory raster in the Training column, provide the corresponding prediction raster in the Prediction column. The Categorical column specifies whether the raster is categorical or continuous. | Value Table |
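The code samples in this document pass value tables in two forms: a nested list and a semicolon-delimited string. A small converter between the two might look like the sketch below. `to_value_table_string` is a hypothetical helper, and the conventions it uses (space-separated cells, `true`/`false` for Booleans, `#` for an empty cell) are inferred from the Python window sample in this document rather than from a formal specification. Names containing spaces are not handled.

```python
def to_value_table_string(rows):
    """Join nested-list rows into a semicolon-delimited value table string."""
    parts = []
    for row in rows:
        cells = []
        for cell in row:
            if cell is None:
                cells.append("#")  # empty cell, as in the Python window sample
            elif isinstance(cell, bool):
                cells.append("true" if cell else "false")
            else:
                cells.append(str(cell))
        parts.append(" ".join(cells))
    return ";".join(parts)


variable_rows = [
    ["AVERAGE_HCC_SCORE_2010_CAT", "AVERAGE_HCC_SCORE_2010_CAT", True],
    ["HOSPBEDSD_INT", "HOSPBEDSD_INT", False],
]
print(to_value_table_string(variable_rows))
print(to_value_table_string([["EVANDMAND_RASTER", "EVANDMAND", None]]))
```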
Code sample
The following Python window script demonstrates how to use the PredictUsingSSMFile function.
import arcpy
arcpy.stats.PredictUsingSSMFile(
    "PredictAsthma_Forest.ssm", "PREDICT_FEATURES",
    "MedicareSpendingData", "Predicted_features", None,
    # Adjacent string literals are joined into one value table string.
    "AVERAGE_HCC_SCORE_2010_CAT AVERAGE_HCC_SCORE_2010_CAT true;"
    "HOSPBEDSD_INT HOSPBEDSD_INT false;"
    "PERCENT_ASTHMA_2010_DBL PERCENT_ASTHMA_2010_DBL false",
    "Distance_Hospital DF_POLY", "EVANDMAND_RASTER EVANDMAND #")
The following stand-alone Python script demonstrates how to use the PredictUsingSSMFile function.
# Predict to raster using the Predict Using Spatial Statistics Model File tool.
# Import system modules.
import arcpy
# Set workspace.
arcpy.env.workspace = r"C:\Analysis"
arcpy.env.overwriteOutput = True
# Read the explanatory raster order and variable names using Describe Spatial
# Statistics Model File tool.
in_model = "Suitability.ssm"
desc_result = arcpy.stats.DescribeSSMFile(in_model)
# Print the list of explanatory rasters.
print(desc_result[2])
# Split the explanatory raster strings into a list of variable names.
exp_ras = desc_result[2].split(";")
# Set parameters for prediction.
prediction_type = "PREDICT_RASTER"
out_raster = "suitability_predicted_raster.tif"
match_exp_ras0 = "Climate_Bio2050.tif"
match_exp_ras1 = "Climate_Temp2050.tif"
match_exp_ras2 = "Climate_Solar2050.tif"
match_rasters = [[match_exp_ras0, exp_ras[0], None],
                 [match_exp_ras1, exp_ras[1], None],
                 [match_exp_ras2, exp_ras[2], None]]
# Run tool.
arcpy.stats.PredictUsingSSMFile(in_model, prediction_type, "", "", out_raster,
                                "", "", match_rasters)
Environments
Special cases
- Random number generator
The Random Generator Type used is always Mersenne Twister.
Related topics
- An overview of the Modeling Spatial Relationships toolset
- Find a geoprocessing tool
- Introduction to spatial statistics model files
- Describe Spatial Statistics Model File
- Set Spatial Statistics Model File Properties
- Generalized Linear Regression
- Presence-only Prediction (MaxEnt)
- Forest-based and Boosted Classification and Regression