Label | Explanation | Data Type |
Input Rasters | The single-band, multidimensional, or multiband raster datasets, or mosaic datasets containing explanatory variables. | Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String |
Target Raster or Points | The raster or point feature class containing the target variable (dependent variable) data. | Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service |
Output Regression Definition File | A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the classifier. | File |
Target Value Field (Optional) | The name of the field in the target point feature class or raster dataset that contains the values to model. | Field |
Target Dimension Field (Optional) | A date field or numeric field in the target point feature class that defines the dimension values. | Field |
Raster Dimension (Optional) | The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data. | String |
Output Importance Table (Optional) | A table describing the importance of each explanatory variable used in the model. A larger value indicates that the corresponding variable is more strongly correlated with the predicted variable and contributes more to the prediction. Values range from 0 to 1, and all values sum to 1. | Table |
Max Number of Trees (Optional) | The maximum number of trees in the forest. Increasing the number of trees generally increases accuracy, although the improvement levels off. Processing time increases linearly with the number of trees. The default is 50. | Long |
Max Tree Depth (Optional) | The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create to reach a decision. Trees will not grow deeper than this setting. The default is 30. | Long |
Max Number of Samples (Optional) | The maximum number of samples that will be used for the regression analysis. A value less than or equal to 0 means that all samples from the input target raster or point feature class will be used to train the regression model. The default is 10,000. | Long |
Average Points Per Cell (Optional) | Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. | Boolean |
Percent of Samples for Testing (Optional) | The percentage of sample points that will be used for error checking. The tool checks for three types of errors: errors on training points, errors on test points, and errors on test location points. The default is 10. | Double |
Output Scatter Plots (pdf or html) (Optional) | The output scatter plots in PDF or HTML format. The output will include scatter plots of training data, test data, and location test data. | File |
Output Sample Features (Optional) | The output feature class that will contain target values and predicted values for training points, test points, and location test points. | Feature Class |
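The Average Points Per Cell behavior can be pictured with a short Python sketch. It is not the tool's implementation: the function name average_points_per_cell, the upper-left origin convention, and the floor-based cell assignment are assumptions made for illustration. Points whose coordinates fall into the same cell are pooled, and their target values are averaged.

```python
import math
from collections import defaultdict


def average_points_per_cell(points, origin_x, origin_y, cell_size):
    """Hypothetical helper: average target values of points sharing a cell.

    points: iterable of (x, y, value) tuples.
    origin_x, origin_y: upper-left corner of the raster grid (assumed).
    Returns a dict mapping (row, col) to the mean target value in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in points:
        col = math.floor((x - origin_x) / cell_size)
        row = math.floor((origin_y - y) / cell_size)  # rows increase downward
        sums[(row, col)] += value
        counts[(row, col)] += 1
    # One averaged training value per occupied cell
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

For example, two points with values 4.0 and 6.0 in the same 1 x 1 cell collapse into a single training value of 5.0.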
Available with Image Analyst license.
Summary
Models the relationship between explanatory variables (independent variables) and a target dataset (dependent variable).
Usage
The tool can train a model from a variety of data types. The input rasters (explanatory variables) can be a single raster or a list of rasters, a single-band raster, a multiband raster in which each band is an explanatory variable, a multidimensional raster in which each variable is an explanatory variable, or a combination of these data types.
An input mosaic dataset will be treated as a raster dataset (not a collection of rasters). To use a collection of rasters as input, build multidimensional info for the mosaic dataset and use the result as input.
The input target can be a point feature class or a raster. When the target is a feature class, the Target Value Field parameter must be set to a numeric field.
If the input target feature class has a date field or a field that defines a dimension, specify values for both the Target Value Field and Target Dimension Field parameters.
The input raster target can also be a multidimensional raster.
If the input target is multidimensional, at least one of the input explanatory variables must be a multidimensional raster. Explanatory variables that intersect the target dimensions will be used in training, and any dimensionless rasters in the list will be applied to all dimensions. If no explanatory variables intersect the target dimensions, or if they are all dimensionless, no training will occur.
If the input target is dimensionless and the explanatory variables have dimension, the first slice will be used.
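The dimension-matching rule for a multidimensional target can be sketched in Python for a single target dimension value. This is a hypothetical illustration, not the tool's code: each explanatory raster is modeled as a set of dimension values, or None when it is dimensionless, and the function name select_explanatory is invented for the example.

```python
def select_explanatory(target_value, rasters):
    """Hypothetical helper: pick explanatory rasters for one target slice.

    target_value: a dimension value of the target (for example, a date).
    rasters: dict mapping raster name to a set of dimension values,
             or None if the raster is dimensionless.
    Returns the raster names used for this slice, or [] if no training occurs.
    """
    # Multidimensional rasters whose dimension values include the target value
    matched = [name for name, dims in rasters.items()
               if dims is not None and target_value in dims]
    if not matched:
        # No intersecting multidimensional variable (or all dimensionless)
        return []
    # Dimensionless rasters are applied to every dimension value
    dimensionless = [name for name, dims in rasters.items() if dims is None]
    return matched + dimensionless
```

With a weather cube covering two dates and a dimensionless DEM, a target observation on one of those dates uses both rasters; an observation on any other date is not trained.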
If the output is a multidimensional raster, use CRF format. If the output is a dimensionless raster, it can be stored in any output raster format.
The cell sizes of the input explanatory variables will affect the training result and the processing time. By default, the tool uses the cell size of the first explanatory raster; you can change it using the Cell Size environment setting. In general, training with a cell size smaller than that of your data is not recommended.
Use the Output Importance Table parameter value to analyze how much each explanatory variable contributes to predicting the target variable.
Use the Percent of Samples for Testing parameter to compute three types of errors: errors on training points, errors on test points, and errors on test location points. For example, if the percentage is set to 10, 10 percent of the training sample points will be set aside as reference points based on location. These reference points, called test location points, will be used to measure the error for interpolation in space. The remaining training sample points will be divided into two groups: one containing 90 percent of the points and the other containing 10 percent. The 90 percent group will be used to train the regression model, and the 10 percent group will be used for testing to derive the accuracy.
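The three-way split can be approximated in Python. This is a hypothetical sketch, not the tool's algorithm: the documentation says test location points are chosen based on location without specifying how, so the sketch assumes that a percentage of distinct (x, y) locations is held out and every sample at those locations becomes a test location point. The function name split_samples is invented for the example.

```python
import random


def split_samples(samples, percent_testing=10.0, seed=0):
    """Hypothetical helper: split (x, y, value) samples three ways.

    Returns (train, test, location_test) lists of sample tuples.
    """
    rng = random.Random(seed)

    # Assumed rule: hold out a percentage of distinct locations; all samples
    # at those locations become test location points (spatial interpolation).
    locations = sorted({(x, y) for x, y, _ in samples})
    n_locations = round(len(locations) * percent_testing / 100.0)
    held_out = set(rng.sample(locations, n_locations))
    location_test = [s for s in samples if (s[0], s[1]) in held_out]
    remaining = [s for s in samples if (s[0], s[1]) not in held_out]

    # Split the remaining samples at random: (100 - p)% train, p% test
    rng.shuffle(remaining)
    n_test = round(len(remaining) * percent_testing / 100.0)
    return remaining[n_test:], remaining[:n_test], location_test
```

With 100 locations observed on 10 dates each (1,000 samples) and the default of 10 percent, the sketch holds out 10 locations (100 location test samples), then splits the remaining 900 samples into 810 training and 90 test samples.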
Setting the Percent of Samples for Testing parameter will produce a scatter plot of the predicted versus reference training sample values. The coefficient of determination (R-squared) is also computed as an estimate of the goodness of fit.
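R-squared is commonly defined as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; whether the tool uses exactly this formula, rather than an alternative such as the squared correlation coefficient, is an assumption. The function name r_squared is illustrative.

```python
def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    # Total sum of squares: spread of the reference values around their mean
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant reference values")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and predicting the reference mean everywhere gives 0.0.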
To create a scatter plot of predicted values and training values, you can use the Sample tool to extract predicted values from the predicted rasters. Then perform a table join using the LocationID field in the Sample tool output and the ObjectID field in the target feature class. If the target input is a raster, you can generate random points and extract values from both the input target raster and the predicted raster.
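The join step pairs each predicted value with its observed target value. The pure-Python sketch below mimics an inner join of the Sample output (LocationID) to the target feature class (ObjectID); the function name join_predictions and the row tuples are hypothetical, and in practice the rows could be read with arcpy.da.SearchCursor.

```python
def join_predictions(sample_rows, target_rows):
    """Hypothetical helper: pair observed and predicted values.

    sample_rows: iterable of (location_id, predicted_value) from the Sample output.
    target_rows: iterable of (object_id, observed_value) from the target features.
    Returns a list of (observed, predicted) pairs ready for a scatter plot.
    """
    observed_by_oid = dict(target_rows)
    pairs = []
    for location_id, predicted in sample_rows:
        observed = observed_by_oid.get(location_id)
        if observed is not None:  # inner join: skip IDs with no match
            pairs.append((observed, predicted))
    return pairs
```

The resulting pairs can be plotted directly, or passed to an R-squared calculation to summarize the fit.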
Parameters
TrainRandomTreesRegressionModel(in_rasters, in_target_data, out_regression_definition, {target_value_field}, {target_dimension_field}, {raster_dimension}, {out_importance_table}, {max_num_trees}, {max_tree_depth}, {max_samples}, {average_points_per_cell}, {percent_testing}, {out_scatterplots}, {out_sample_features})
Name | Explanation | Data Type |
in_rasters [in_rasters,...] | The single-band, multidimensional, or multiband raster datasets, or mosaic datasets containing explanatory variables. | Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String |
in_target_data | The raster or point feature class containing the target variable (dependent variable) data. | Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service |
out_regression_definition | A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the classifier. | File |
target_value_field (Optional) | The name of the field in the target point feature class or raster dataset that contains the values to model. | Field |
target_dimension_field (Optional) | A date field or numeric field in the target point feature class that defines the dimension values. | Field |
raster_dimension (Optional) | The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data. | String |
out_importance_table (Optional) | A table describing the importance of each explanatory variable used in the model. A larger value indicates that the corresponding variable is more strongly correlated with the predicted variable and contributes more to the prediction. Values range from 0 to 1, and all values sum to 1. | Table |
max_num_trees (Optional) | The maximum number of trees in the forest. Increasing the number of trees generally increases accuracy, although the improvement levels off. Processing time increases linearly with the number of trees. The default is 50. | Long |
max_tree_depth (Optional) | The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create to reach a decision. Trees will not grow deeper than this setting. The default is 30. | Long |
max_samples (Optional) | The maximum number of samples that will be used for the regression analysis. A value less than or equal to 0 means that all samples from the input target raster or point feature class will be used to train the regression model. The default is 10,000. | Long |
average_points_per_cell (Optional) | Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. | Boolean |
percent_testing (Optional) | The percentage of sample points that will be used for error checking. The tool checks for three types of errors: errors on training points, errors on test points, and errors on test location points. The default is 10. | Double |
out_scatterplots (Optional) | The output scatter plots in PDF or HTML format. The output will include scatter plots of training data, test data, and location test data. | File |
out_sample_features (Optional) | The output feature class that will contain target values and predicted values for training points, test points, and location test points. | Feature Class |
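The importance values described for out_importance_table lie between 0 and 1 and sum to 1, which is what normalizing nonnegative raw scores by their total produces. The Python sketch below shows only that normalization and a ranking step; how the tool computes the raw scores is not documented here, and the function names and variable names are hypothetical.

```python
def normalize_importance(raw_scores):
    """Hypothetical helper: scale nonnegative raw scores so they sum to 1."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw importance scores must have a positive sum")
    return {name: score / total for name, score in raw_scores.items()}


def rank_variables(importance):
    """Sort variables from most to least important."""
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)
```

For raw scores of 3.0 for a temperature variable and 1.0 for elevation, the normalized importances are 0.75 and 0.25.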
Code sample
This Python window script models the relationship between explanatory variables and a target dataset.
# Import system modules
import arcpy
from arcpy.ia import *
# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")
# Execute
arcpy.ia.TrainRandomTreesRegressionModel(["weather_variables.crf", "dem.tif"], "pm2.5.shp", r"c:\data\pm2.5_trained.ecd", "mean_pm2.5", "date_collected", "StdTime", r"c:\data\pm2.5_importance.csv", 50, 30, 10000)
This Python stand-alone script models the relationship between explanatory variables and a target dataset.
# Import system modules
import arcpy
from arcpy.ia import *
# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")
# Define input parameters
in_weather_variables = "C:/Data/ClimateVariables.crf"
in_dem_variable = "C:/Data/dem.tif"
in_target = "C:/Data/pm2.5_observations.shp"
target_value_field = "mean_pm2.5"
target_date_field = "date_collected"
raster_dimension = "StdTime"
out_model_definition = "C:/Data/pm2.5_trained_model.ecd"
out_importance_table = "C:/Data/pm2.5_importance_table.csv"
max_num_trees = 50
max_tree_depth = 30
max_num_samples = 10000
# Execute - train the random trees regression model
# The explanatory rasters are passed as a list, and out_importance_table
# comes before the tree settings in the positional argument order.
arcpy.ia.TrainRandomTreesRegressionModel([in_weather_variables, in_dem_variable], in_target, out_model_definition, target_value_field, target_date_field, raster_dimension, out_importance_table, max_num_trees, max_tree_depth, max_num_samples)