Label | Explanation | Data Type |
Input Point Features | The point features representing locations where presence of a phenomenon of interest is known to occur. | Feature Layer |
Contains Background Points (Optional) | Specifies whether the input point features contain background points. If the input points do not contain background points, the tool will generate background points using cells in the explanatory training rasters. The tool uses background points to model the characteristics of the landscape in unknown locations and compare them to landscape characteristics in known presence locations. Therefore, background points can be considered as the study area. Generally, these are locations where presence of a phenomenon of interest is unknown. However, if any information is known about the background points, the Relative Weight of Presence to Background parameter can be used to indicate this. | Boolean |
Presence Indicator Field (Optional) | The field from the input point features containing binary values that indicate each point as presence (1) or background (0). The field must be numeric (Short, Long, Float, or Double types). | Field |
Explanatory Training Variables (Optional) | A list of fields representing the explanatory variables that will help predict the probability of presence. You can specify whether each variable is categorical or numeric. Check the Categorical check box for each variable that represents a class or category (such as land cover). | Value Table |
Explanatory Training Distance Features (Optional) | A list of feature layers or feature classes that will be used to automatically create explanatory variables that represent the distance from the input point features to the nearest provided distance features. If the input explanatory training distance features are polygons or lines, the distance attributes are calculated as the distance between the closest segment and the point. | Feature Layer |
Explanatory Training Rasters (Optional) | A list of rasters that will be used to automatically create explanatory training variables in the model whose values are extracted from rasters. For each feature (presence and background points) in the input point features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. You can specify whether each raster value is categorical or numeric. Check the Categorical check box for each raster that represents a class or category (such as land cover). | Value Table |
Explanatory Variable Expansions (Basis Functions) (Optional) | Specifies the basis function that will be used to transform the provided explanatory variables for use in the model. If multiple basis functions are selected, the tool will produce multiple transformed variables and attempt to use them in the model. | String |
Number of Knots (Optional) | The number of knots that will be used by the hinge and threshold explanatory variable expansions. The value controls how many thresholds are created, which are used to create multiple explanatory variable expansions using each threshold. The value must be between 2 and 50. The default is 10. | Long |
Study Area (Optional) | Specifies the type of study area that will be used to define where presence is possible when the input point features do not contain background points. | String |
Study Area Polygon (Optional) | A feature class containing the polygons that define a custom study area. The input point features must be located within the custom study area covered by the polygon features. A study area can be composed of multiple polygons. | Feature Layer |
Apply Spatial Thinning (Optional) | Specifies whether spatial thinning will be applied to presence and background points before training the model. Spatial thinning helps to reduce sampling bias by removing points and ensuring that remaining points have a minimum nearest-neighbor distance, set in the Minimum Nearest Neighbor Distance parameter. Spatial thinning is also applied to background points whether they are provided in input point features or generated by the tool. | Boolean |
Minimum Nearest Neighbor Distance (Optional) | The minimum distance between any two presence points or any two background points when spatial thinning is applied. | Linear Unit |
Number of Iterations for Thinning (Optional) | The number of runs that will be used to find the optimal spatial thinning solution, seeking to maintain as many presence and background points as possible while ensuring that no two presence or two background points are within the specified Minimum Nearest Neighbor Distance parameter value. The minimum possible is 1 iteration and the maximum possible is 50 iterations. The default is 10. This parameter is only applicable for spatial thinning applied to presence and background points in the input point features. Background points generated from raster cells are instead thinned by resampling the raster cells to the specified Minimum Nearest Neighbor Distance parameter value, without needing to iterate for an optimal solution. | Long |
Relative Weight of Presence to Background (Optional) | A value between 1 and 100 that specifies the relative information weight of presence points to background points. The default is 100. A higher value indicates that presence points are the primary source of information; it is unknown whether background points represent presence or absence and background points receive lower weight in the model. A lower value indicates that background points also contribute valuable information that can be used in conjunction with presence points; there is greater confidence that background points represent absence and their information can be used in the model as absence locations. | Long |
Presence Probability Transformation (Link Function) (Optional) | Specifies the function that will convert the unbounded outputs of the model to a number between 0 and 1. This value can be interpreted as the probability of presence at the location. Each option converts the same continuous value to a different probability. | String |
Presence Probability Cutoff (Optional) | A cutoff value between 0.01 and 0.99 that establishes which probabilities correspond with presence in the resulting classification. The cutoff value is used to help evaluate the model's performance using training data and known presence points. Classification diagnostics are provided in geoprocessing messages and in the output trained features. | Double |
Output Trained Features (Optional) | An output feature class that will contain all features and explanatory variables used in the training of the model. | Feature Class |
Output Trained Raster (Optional) | The output raster with cell values indicating the probability of presence using the selected link function. The default cell size is the maximum of the cell sizes of the explanatory training rasters. An output trained raster can only be created if the input point features do not contain background points. | Raster Dataset |
Output Response Curve Table (Optional) | The output table that will contain diagnostics from the training model that indicate the effect of each explanatory variable on the probability of presence after accounting for the average effects of all other explanatory variables in the model. The table will have up to two derived charts of partial dependence plots: one set of line charts for continuous variables and one set of bar charts for categorical variables. | Table |
Output Sensitivity Table (Optional) | The output table that will contain diagnostics of training model accuracy as the presence probability cutoff changes from 0 to 1. | Table |
Input Prediction Features (Optional) | The feature class representing locations where predictions will be made. The feature class must contain any provided explanatory variable fields that were used from the input point features. When using spatial thinning, you can use the original input point features as input prediction features to receive a prediction for the entire dataset. | Feature Layer |
Output Prediction Features (Optional) | The output feature class that will contain the results of the prediction model applied to the input prediction features. | Feature Class |
Output Prediction Raster (Optional) | The output raster containing the prediction results at each cell of the matched explanatory rasters. The default cell size is the maximum of the cell sizes of the explanatory training rasters. | Raster Dataset |
Match Explanatory Variables (Optional) | The matching explanatory variable fields for the input point features and input prediction features. | Value Table |
Match Distance Features (Optional) | The matching distance features for the training and prediction. | Value Table |
Match Explanatory Rasters (Optional) | The matching rasters for the training and prediction. | Value Table |
Allow Predictions Outside of Data Ranges (Optional) | Specifies whether the prediction will allow extrapolation when explanatory variable values are out of the range of values used in training. | Boolean |
Resampling Scheme (Optional) | Specifies the method that will be used to perform cross validation of the prediction model. Cross validation excludes a portion of the data during training of the model and uses it to test the model's performance after it is trained. | String |
Number of Groups (Optional) | The number of groups that will be used in cross validation for the random resampling scheme. A field in the output trained features indicates the group that each point was assigned to. The default is 3. A minimum of 2 groups and a maximum of 10 groups are allowed. | Long |
Output Trained Model File (Optional) | An output model file that will save the trained model, which can be used later for prediction. | File |
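The Presence Probability Transformation (Link Function) row says that each option converts the same continuous value to a different probability. A minimal sketch of that idea follows, assuming the standard textbook forms of the complementary log-log and logistic transforms. The function names are hypothetical and are not part of arcpy, and the tool's internal computation may differ.

```python
import math


def cloglog(raw_output):
    """Complementary log-log transform: 1 - exp(-exp(x))."""
    return 1.0 - math.exp(-math.exp(raw_output))


def logistic(raw_output):
    """Logistic transform: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-raw_output))


# The same unbounded model output maps to a different probability
# under each transform.
for raw in (-2.0, 0.0, 2.0):
    print(raw, round(cloglog(raw), 3), round(logistic(raw), 3))
```

Both transforms stay between 0 and 1 and increase with the raw output, so they rank locations the same way. They differ in the probability assigned to a given location, which matters when a fixed cutoff is applied.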
Summary
Models the presence of a phenomenon given known presence locations and explanatory variables using a maximum entropy approach (MaxEnt). The tool provides output features and rasters that include the probability of presence and can be applied to problems in which only presence is known and absence is not known.
Learn more about how Presence-only Prediction (MaxEnt) works
Usage
The tool works with three primary inputs to create a presence prediction model: known presence locations, a study area where presence is possible, and explanatory variables.
- The Input Point Features parameter value is used to designate known presence locations of a phenomenon of interest.
- The study area is characterized by background points. Background points are locations distributed across the study area where presence of the phenomenon of interest may be possible but unknown. These can be automatically created by the tool or manually included with the input point features by checking the Contains Background Points parameter.
- The tool accepts explanatory variables in the form of rasters, fields, and distance features.
The tool can be run in two modes that are specified by the Contains Background Points parameter:
- Unchecked—The tool will run with presence-only points and only accept explanatory variables from raster sources.
- Checked—The tool will run with presence and background points and allow explanatory variable sources to include rasters, fields in the input point features, and distance features.
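The two modes above can be summarized as a small lookup. This is only a sketch: `allowed_sources` and `check_sources` are hypothetical helper names, and the source labels are illustrative strings rather than arcpy keywords.

```python
def allowed_sources(contains_background):
    """Return the explanatory variable sources accepted in each mode."""
    if contains_background:
        # Presence and background points: all three source types are accepted.
        return {"rasters", "fields", "distance_features"}
    # Presence-only points: only raster sources are accepted.
    return {"rasters"}


def check_sources(contains_background, requested):
    """Raise ValueError if a requested source is not accepted in this mode."""
    rejected = set(requested) - allowed_sources(contains_background)
    if rejected:
        raise ValueError(
            "Not accepted in this mode: " + ", ".join(sorted(rejected)))
    return True


print(sorted(allowed_sources(contains_background=False)))
print(sorted(allowed_sources(contains_background=True)))
```

A check like this can run before the tool is called, so that a presence-only run does not silently depend on field or distance feature inputs.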
An ArcGIS Spatial Analyst extension license is required to use rasters as inputs to or outputs from the tool.
The Output Trained Model File parameter can be used to save the trained model results as a reusable file. The Predict Using Spatial Statistics Model File tool can be used to predict to new features using the model file.
The tool requires at least two presence points in the input point features to create a model. If the input features contain background points, the tool also requires at least two background points to create a model.
The Explanatory Training Distance Features parameter is inactive when the Contains Background Points parameter is unchecked. To include distances to features as explanatory variables for presence-only data, distance rasters can be calculated using the Distance Accumulation tool, and the distance rasters can be included in the Explanatory Training Rasters parameter.
The spatial resolution of the Explanatory Training Rasters parameter values is important in the following ways:
- The cell sizes have a significant impact on processing time. The higher the raster resolution, the longer the processing time.
- The tool will use cell centroids of the rasters to generate background points when using presence-only data (the Contains Background Points parameter is unchecked). The proportion of background points to presence points impacts the model; it is recommended that you consider the cell size of the rasters and investigate the resulting background points using the Output Trained Features parameter to ensure that assumptions about the study area are appropriate for your question.
Note:
You can use the Resample tool to decrease the spatial resolution of explanatory training rasters.
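To get a feel for how cell size drives the number of candidate background points, the arithmetic can be sketched as below. The helper name `estimate_cell_count` is hypothetical, and the result is only an upper bound for a rectangular extent: the actual study area may cover less than its bounding box, and spatial thinning reduces the count further.

```python
import math


def estimate_cell_count(extent_width, extent_height, cell_size):
    """Upper bound on raster cells (candidate background points) in an extent.

    All three arguments must use the same linear unit, for example meters.
    """
    columns = math.ceil(extent_width / cell_size)
    rows = math.ceil(extent_height / cell_size)
    return columns * rows


# A 10 km by 10 km extent at two cell sizes.
print(estimate_cell_count(10_000, 10_000, 30))   # 334 * 334 cells
print(estimate_cell_count(10_000, 10_000, 100))  # 100 * 100 cells
```

Comparing the estimate with the number of presence points gives a quick sense of the background-to-presence proportion before the tool is run.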
The defined study area, whether from the Study Area parameter or from the locations of input point features that include background points, contributes to the model’s outcome. The extent used will determine which raster cells are used as background points. These cells establish the environmental conditions that are compared with presence conditions to establish a relative occurrence rate, which affects the prediction results.
Use the Relative Weight of Presence to Background parameter to specify the meaning of background points. Use a value of 100 when background points represent locations with unknown presence. Use a value of 1 when background points represent locations with observed absence.
- The value affects how the model operates and the tool’s resulting predictions. When the value is close to 100, the model penalizes each misclassified presence point 100 times more than each misclassified background point (assuming that the correct classification of background is absence) and the traditional MaxEnt approach is applied. When the value is 1, the model penalizes each presence and background point equally and is similar to logistic regression.
- A value of 1 should be used cautiously when using presence-only mode (the Contains Background Points parameter is unchecked), since the tool generates background points that are treated as absence and weighted equally to provided presence points.
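The penalty ratio described above can be illustrated with a toy weighted error count. This sketch only shows the ratio between presence and background penalties; it is not the tool's objective function, and both function names are hypothetical.

```python
def observation_weights(labels, relative_weight=100):
    """Weight presence points (1) relative_weight times background points (0)."""
    if not 1 <= relative_weight <= 100:
        raise ValueError("relative_weight must be between 1 and 100")
    return [float(relative_weight) if label == 1 else 1.0 for label in labels]


def weighted_error(labels, predicted, relative_weight=100):
    """Sum the weights of misclassified points.

    Background points are treated as absence (0) for the comparison.
    """
    weights = observation_weights(labels, relative_weight)
    return sum(w for w, y, p in zip(weights, labels, predicted) if y != p)


labels = [1, 1, 0, 0]      # two presence points, two background points
predicted = [0, 1, 1, 0]   # one presence missed, one background flagged

print(weighted_error(labels, predicted, relative_weight=100))  # 101.0
print(weighted_error(labels, predicted, relative_weight=1))    # 2.0
```

With a weight of 100, the missed presence point dominates the total. With a weight of 1, both mistakes count equally, which matches the logistic-regression-like behavior described above.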
Sampling bias is inherent to most presence data and impacts the results of the analysis. You can use the Apply Spatial Thinning parameter to help reduce this impact. However, while spatial thinning is a useful remediation to reduce the effects of sampling bias, it is recommended that you use data from structured surveys to further minimize the impact of sampling bias.
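The general idea of spatial thinning can be sketched with a greedy, randomized procedure: visit points in random order, keep a point only if it is at least the minimum distance from every point already kept, and repeat for several iterations to retain as many points as possible. This is an assumed illustration of the concept, not the algorithm the tool uses, and `thin_points` is a hypothetical name.

```python
import math
import random


def thin_points(points, min_distance, iterations=10, seed=0):
    """Return the largest thinned subset found across several random runs."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        order = list(points)
        rng.shuffle(order)
        kept = []
        for candidate in order:
            # Keep the candidate only if it is far enough from all kept points.
            if all(math.dist(candidate, other) >= min_distance
                   for other in kept):
                kept.append(candidate)
        if len(kept) > len(best):
            best = kept
    return best


# Two tight clusters and one isolated point, coordinates in meters.
presence = [(0, 0), (1, 0), (100, 0), (101, 0), (500, 500)]
print(len(thin_points(presence, min_distance=50)))  # 3
```

Because the minimum distance applies between any two presence points or any two background points, a sketch like this would be run once per class rather than on the combined set.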
Classification diagnostics are available from geoprocessing messages and from the Classification Result Percentages chart that is provided with the resulting layer from the Output Trained Features parameter value. The chart displays a comparison of the observed and predicted classifications, and you can use it to assess the model’s prediction performance on known presence points. For example, you can assess the model’s ability to predict presence by focusing on the portion of misclassified presence points in the training input point features. In use cases in which presence prediction on background points is important, you can also use the chart to view and select the background points that are predicted to have presence.
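As a rough illustration of the diagnostic described above, the following sketch classifies predicted probabilities with a cutoff and reports the share of known presence points that were misclassified. The function names are hypothetical, and treating a probability equal to the cutoff as presence is an assumption.

```python
def classify(probabilities, cutoff=0.5):
    """Label each probability as presence (1) or not (0) using the cutoff."""
    if not 0.01 <= cutoff <= 0.99:
        raise ValueError("cutoff must be between 0.01 and 0.99")
    # Assumption: a probability equal to the cutoff counts as presence.
    return [1 if p >= cutoff else 0 for p in probabilities]


def misclassified_presence_share(observed, probabilities, cutoff=0.5):
    """Share of observed presence points that the model labels as not present."""
    predicted = classify(probabilities, cutoff)
    presence = [pred for obs, pred in zip(observed, predicted) if obs == 1]
    if not presence:
        raise ValueError("no presence points were provided")
    return presence.count(0) / len(presence)


observed = [1, 1, 1, 1, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]

print(misclassified_presence_share(observed, probabilities, cutoff=0.5))  # 0.5
print(misclassified_presence_share(observed, probabilities, cutoff=0.3))  # 0.25
```

Lowering the cutoff reduces the share of missed presence points here, at the cost of labeling more background points as presence.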
You can use the tool in two ways. You can focus on training and evaluating candidate models, or you can focus on predicting presence probabilities across a new dataset.
- Training and evaluating candidate models—Run the tool without specifying outputs to evaluate the model diagnostics included in geoprocessing messages. Once the diagnostic results seem appropriate, specify an Output Trained Features parameter value and use the classification diagnostic charts to further evaluate prediction performance across the training data. The charts included in the Output Sensitivity Table and Output Response Curve Table parameter values are diagnostic metrics for the training data and will also be useful as you adjust and find an appropriate model.
- Prediction—Specify the parameters in the Prediction Outputs parameter category to apply the model to new locations that are not part of the training data. The Input Prediction Features and the resulting Output Prediction Features parameter values represent new point locations where a prediction is needed. In addition to point features, a prediction surface can be created by specifying an Output Prediction Raster parameter value. Prediction features and prediction rasters must be used in conjunction with matched explanatory variables in the same form that was used in the training data (raster, fields, or distance features).
Spatial thinning can result in the training data not including all the input point features. To test the model’s performance across all points when spatial thinning is used, provide the same feature class for the Input Point Features and Input Prediction Features parameters.
The tool specifies coordinate systems for outputs by honoring the coordinate system of a feature dataset used in the output path. Otherwise, the tool will use the coordinate system specified in the Output Coordinate System environment. If you don't specify a feature dataset or an environment setting, the tool uses the following approaches for each output:
- For the Output Trained Features and Output Trained Raster parameter values, the tool uses the coordinate system of the Input Point Features parameter value.
- For the Output Prediction Features parameter value, the tool uses the coordinate system of the Input Prediction Features parameter value.
- For the Output Prediction Raster parameter value, the tool uses the coordinate system defined by the Output Prediction Features parameter value. If the output prediction features are not specified, the tool uses the coordinate system of the first raster provided in the Match Explanatory Rasters parameter.
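The precedence rules above can be written out as a small decision function. This is a sketch of the documented order only: `resolve_output_crs`, the output labels, and the dictionary keys are all hypothetical, and coordinate systems are represented as plain strings.

```python
def resolve_output_crs(output, feature_dataset=None, environment=None,
                       sources=None):
    """Return the coordinate system name that would apply to one output."""
    sources = sources or {}
    if feature_dataset:
        # A feature dataset in the output path takes priority.
        return feature_dataset
    if environment:
        # Next comes the Output Coordinate System environment.
        return environment
    if output in ("trained_features", "trained_raster"):
        return sources.get("input_point_features")
    if output == "prediction_features":
        return sources.get("input_prediction_features")
    if output == "prediction_raster":
        # Fall back to the first matched raster when no output
        # prediction features are specified.
        return (sources.get("output_prediction_features")
                or sources.get("first_matched_raster"))
    raise ValueError("unknown output: " + output)


known = {"input_point_features": "NAD 1983 UTM Zone 11N",
         "first_matched_raster": "WGS 1984"}
print(resolve_output_crs("trained_raster", sources=known))
print(resolve_output_crs("prediction_raster", sources=known))
print(resolve_output_crs("prediction_raster", environment="WGS 1984 Web Mercator",
                         sources=known))
```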
The Explanatory Variable Expansions (Basis Functions) parameter options have restrictions. The Smoothed step (Hinge) and Discrete step (Threshold) options are mutually exclusive; when one is selected the other one cannot be selected. When an explanatory variable is specified as Categorical, only the Original (Linear) option will be used.
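These restrictions can be checked before building the semicolon-delimited string passed to the basis_expansion_functions parameter. In this sketch the keywords LINEAR, QUADRATIC, PRODUCT, and HINGE are taken from the code sample on this page, while THRESHOLD is an assumed keyword for the Discrete step (Threshold) option. Both helper names are hypothetical.

```python
# THRESHOLD is an assumed keyword; the others appear in the code sample.
KNOWN_FUNCTIONS = ("LINEAR", "QUADRATIC", "PRODUCT", "HINGE", "THRESHOLD")


def build_basis_functions(selected):
    """Validate the selection and return a semicolon-delimited string."""
    names = [name.upper() for name in selected]
    unknown = [name for name in names if name not in KNOWN_FUNCTIONS]
    if unknown:
        raise ValueError("unknown basis functions: " + ", ".join(unknown))
    if "HINGE" in names and "THRESHOLD" in names:
        raise ValueError("HINGE and THRESHOLD are mutually exclusive")
    return ";".join(names)


def effective_functions(names, categorical):
    """Categorical variables only use the original (linear) form."""
    return ["LINEAR"] if categorical else list(names)


print(build_basis_functions(["linear", "quadratic", "hinge"]))
print(effective_functions(["LINEAR", "QUADRATIC"], categorical=True))
```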
When the Resampling Scheme parameter is set to Random, the tool will group the data and validate the model's performance on a subset of the grouped data. Each training group is subject to the same data requirements of the broader model: at least two presence and two background points are required. If these requirements are not fulfilled after 10 attempts, the tool will stop attempting to cross-validate and warn that cross-validation was not possible.
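The retry behavior described above can be sketched as follows. How the tool actually assigns points to groups is not documented here, so uniform random assignment is an assumption, and the function names are hypothetical.

```python
import random


def training_group_ok(labels, groups, held_out):
    """Check that the training portion keeps at least two points of each kind."""
    training = [label for label, group in zip(labels, groups)
                if group != held_out]
    return training.count(1) >= 2 and training.count(0) >= 2


def assign_groups(labels, number_of_groups=3, max_attempts=10, seed=0):
    """Return a group index per point, or None if no valid grouping is found."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        groups = [rng.randrange(number_of_groups) for _ in labels]
        if all(training_group_ok(labels, groups, held_out)
               for held_out in range(number_of_groups)):
            return groups
    # Mirrors the documented outcome: give up after the allowed attempts.
    return None


labels = [1] * 30 + [0] * 30   # 30 presence points, 30 background points
groups = assign_groups(labels)
print(groups is not None)

# Too few points: no training group can hold two presence points.
print(assign_groups([1, 0]))   # None
```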
Parameters
arcpy.stats.PresenceOnlyPrediction(input_point_features, {contains_background}, {presence_indicator_field}, {explanatory_variables}, {distance_features}, {explanatory_rasters}, {basis_expansion_functions}, {number_knots}, {study_area_type}, {study_area_polygon}, {spatial_thinning}, {thinning_distance_band}, {number_of_iterations}, {relative_weight}, {link_function}, {presence_probability_cutoff}, {output_trained_features}, {output_trained_raster}, {output_response_curve_table}, {output_sensitivity_table}, {features_to_predict}, {output_pred_features}, {output_pred_raster}, {explanatory_variable_matching}, {explanatory_distance_matching}, {explanatory_rasters_matching}, {allow_predictions_outside_of_data_ranges}, {resampling_scheme}, {number_of_groups}, {output_trained_model})
Name | Explanation | Data Type |
input_point_features | The point features representing locations where presence of a phenomenon of interest is known to occur. | Feature Layer |
contains_background (Optional) | Specifies whether the input point features contain background points. If the input points do not contain background points, the tool will generate background points using cells in the explanatory training rasters. The tool uses background points to model the characteristics of the landscape in unknown locations and compare them to landscape characteristics in known presence locations. Therefore, background points can be considered as the study area. Generally, these are locations where presence of a phenomenon of interest is unknown. However, if any information is known about the background points, the relative_weight parameter can be used to indicate this. | Boolean |
presence_indicator_field (Optional) | The field from the input point features containing binary values that indicate each point as presence (1) or background (0). The field must be numeric (Short, Long, Float, or Double types). | Field |
explanatory_variables [[Variable, Categorical],...] (Optional) | A list of fields representing the explanatory variables that will help predict the probability of presence. You can specify whether each variable is categorical or numeric. Specify the CATEGORICAL option for each variable that represents a class or category (such as land cover). | Value Table |
distance_features [distance_features,...] (Optional) | A list of feature layers or feature classes that will be used to automatically create explanatory variables that represent the distance from the input point features to the nearest provided distance features. If the input explanatory training distance features are polygons or lines, the distance attributes are calculated as the distance between the closest segment and the point. | Feature Layer |
explanatory_rasters [[Variable, Categorical],...] (Optional) | A list of rasters that will be used to automatically create explanatory training variables in the model whose values are extracted from rasters. For each feature (presence and background points) in the input point features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. You can specify whether each raster value is categorical or numeric. Specify the CATEGORICAL option for each raster that represents a class or category (such as land cover). | Value Table |
basis_expansion_functions [basis_expansion_functions,...] (Optional) | Specifies the basis function that will be used to transform the provided explanatory variables for use in the model. If multiple basis functions are selected, the tool will produce multiple transformed variables and attempt to use them in the model. | String |
number_knots (Optional) | The number of knots that will be used by the hinge and threshold explanatory variable expansions. The value controls how many thresholds are created, which are used to create multiple explanatory variable expansions using each threshold. The value must be between 2 and 50. The default is 10. | Long |
study_area_type (Optional) | Specifies the type of study area that will be used to define where presence is possible when the input point features do not contain background points. | String |
study_area_polygon (Optional) | A feature class containing the polygons that define a custom study area. The input point features must be located within the custom study area covered by the polygon features. A study area can be composed of multiple polygons. | Feature Layer |
spatial_thinning (Optional) | Specifies whether spatial thinning will be applied to presence and background points before training the model. Spatial thinning helps to reduce sampling bias by removing points and ensuring that remaining points have a minimum nearest-neighbor distance, set in the thinning_distance_band parameter. Spatial thinning is also applied to background points whether they are provided in input point features or generated by the tool. | Boolean |
thinning_distance_band (Optional) | The minimum distance between any two presence points or any two background points when spatial thinning is applied. | Linear Unit |
number_of_iterations (Optional) | The number of runs that will be used to find the optimal spatial thinning solution, seeking to maintain as many presence and background points as possible while ensuring that no two presence or two background points are within the specified thinning_distance_band parameter value. The minimum possible is 1 iteration and the maximum possible is 50 iterations. The default is 10. This parameter is only applicable for spatial thinning applied to presence and background points in the input point features. Background points generated from raster cells are instead thinned by resampling the raster cells to the specified thinning_distance_band parameter value, without needing to iterate for an optimal solution. | Long |
relative_weight (Optional) | A value between 1 and 100 that specifies the relative information weight of presence points to background points. The default is 100. A higher value indicates that presence points are the primary source of information; it is unknown whether background points represent presence or absence and background points receive lower weight in the model. A lower value indicates that background points also contribute valuable information that can be used in conjunction with presence points; there is greater confidence that background points represent absence and their information can be used in the model as absence locations. | Long |
link_function (Optional) | Specifies the function that will convert the unbounded outputs of the model to a number between 0 and 1. This value can be interpreted as the probability of presence at the location. Each option converts the same continuous value to a different probability. | String |
presence_probability_cutoff (Optional) | A cutoff value between 0.01 and 0.99 that establishes which probabilities correspond with presence in the resulting classification. The cutoff value is used to help evaluate the model's performance using training data and known presence points. Classification diagnostics are provided in geoprocessing messages and in the output trained features. | Double |
output_trained_features (Optional) | An output feature class that will contain all features and explanatory variables used in the training of the model. | Feature Class |
output_trained_raster (Optional) | The output raster with cell values indicating the probability of presence using the selected link function. The default cell size is the maximum of the cell sizes of the explanatory training rasters. An output trained raster can only be created if the input point features do not contain background points. | Raster Dataset |
output_response_curve_table (Optional) | The output table that will contain diagnostics from the training model that indicate the effect of each explanatory variable on the probability of presence after accounting for the average effects of all other explanatory variables in the model. The table will have up to two derived charts of partial dependence plots: one set of line charts for continuous variables and one set of bar charts for categorical variables. | Table |
output_sensitivity_table (Optional) | The output table that will contain diagnostics of training model accuracy as the presence probability cutoff changes from 0 to 1. | Table |
features_to_predict (Optional) | The feature class representing locations where predictions will be made. The feature class must contain any provided explanatory variable fields that were used from the input point features. When using spatial thinning, you can use the original input point features as input prediction features to receive a prediction for the entire dataset. | Feature Layer |
output_pred_features (Optional) | The output feature class that will contain the results of the prediction model applied to the input prediction features. | Feature Class |
output_pred_raster (Optional) | The output raster containing the prediction results at each cell of the matched explanatory rasters. The default cell size is the maximum of the cell sizes of the explanatory training rasters. | Raster Dataset |
explanatory_variable_matching [[Prediction, Training],...] (Optional) | The matching explanatory variable fields for the input point features and input prediction features. | Value Table |
explanatory_distance_matching [[Prediction, Training],...] (Optional) | The matching distance features for the training and prediction. | Value Table |
explanatory_rasters_matching [[Prediction, Training],...] (Optional) | The matching rasters for the training and prediction. | Value Table |
allow_predictions_outside_of_data_ranges (Optional) | Specifies whether the prediction will allow extrapolation when explanatory variable values are out of the range of values used in training. | Boolean |
resampling_scheme (Optional) | Specifies the method that will be used to perform cross validation of the prediction model. Cross validation excludes a portion of the data during training of the model and uses it to test the model's performance after it is trained. | String |
number_of_groups (Optional) | The number of groups that will be used in cross validation for the random resampling scheme. A field in the output trained features indicates the group that each point was assigned to. The default is 3. A minimum of 2 groups and a maximum of 10 groups are allowed. | Long |
output_trained_model (Optional) | An output model file that will save the trained model, which can be used later for prediction. | File |
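Several of the numeric parameters above have documented ranges. A small pre-flight check, sketched below, can catch out-of-range values before the tool is called. The ranges are copied from the table above; `validate_parameters` and `PARAMETER_RANGES` are hypothetical names and are not part of arcpy.

```python
# Documented inclusive ranges for the numeric parameters.
PARAMETER_RANGES = {
    "number_knots": (2, 50),
    "number_of_iterations": (1, 50),
    "relative_weight": (1, 100),
    "presence_probability_cutoff": (0.01, 0.99),
    "number_of_groups": (2, 10),
}


def validate_parameters(**params):
    """Return params unchanged if every ranged value is within its range."""
    for name, value in params.items():
        if name in PARAMETER_RANGES and value is not None:
            low, high = PARAMETER_RANGES[name]
            if not low <= value <= high:
                raise ValueError(
                    f"{name}={value} is outside the range {low} to {high}")
    return params


checked = validate_parameters(number_knots=10, relative_weight=100,
                              presence_probability_cutoff=0.5,
                              number_of_groups=3)
print(checked)
```

The returned dictionary could then be unpacked as keyword arguments into arcpy.stats.PresenceOnlyPrediction alongside the required input_point_features value.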
Code sample
The following Python script demonstrates how to use the PresenceOnlyPrediction function.
# Import system modules
import arcpy
# Call Presence-only Prediction (MaxEnt)
arcpy.stats.PresenceOnlyPrediction(
    input_point_features=r"C:\MyData.gdb\Presence_Points",
    contains_background="PRESENCE_ONLY_POINTS",
    presence_indicator_field=None,
    explanatory_variables=None,
    distance_features=None,
    explanatory_rasters=[[r"C:\MyData.gdb\Elevation", "false"],
                         [r"C:\MyData.gdb\Canopy", "false"],
                         [r"C:\MyData.gdb\ClimacticWaterDeficit", "false"],
                         [r"C:\MyData.gdb\LandCoverClassification", "true"],
                         [r"C:\MyData.gdb\UpperSlope", "false"],
                         [r"C:\MyData.gdb\LowerSlope", "false"]],
    basis_expansion_functions="LINEAR;QUADRATIC;PRODUCT;HINGE",
    number_knots=10,
    study_area_type="CONVEX_HULL",
    study_area_polygon=None,
    spatial_thinning="THINNING",
    thinning_distance_band="500 Meters",
    number_of_iterations=10,
    relative_weight=100,
    link_function="CLOGLOG",
    presence_probability_cutoff=0.5,
    output_trained_features=r"C:\MyData.gdb\Out_Trained_Features",
    output_trained_raster=r"C:\MyData.gdb\Out_Trained_Raster",
    output_response_curve_table=r"C:\MyData.gdb\Out_Response_Curve_Table",
    output_sensitivity_table=r"C:\MyData.gdb\Out_Sensitivity_Table",
    features_to_predict=r"C:\MyData.gdb\In_Prediction_Features",
    output_pred_features=r"C:\MyData.gdb\Out_Prediction_Features",
    output_pred_raster=r"C:\MyData.gdb\Out_Prediction_Raster",
    explanatory_variable_matching=None,
    explanatory_distance_matching=None,
    explanatory_rasters_matching=[[r"C:\MyData.gdb\Prediction_Elevation", "false"],
                                  [r"C:\MyData.gdb\Prediction_Canopy", "false"],
                                  [r"C:\MyData.gdb\Prediction_ClimacticWaterDeficit", "false"],
                                  [r"C:\MyData.gdb\Prediction_LandCoverClassification", "true"],
                                  [r"C:\MyData.gdb\Prediction_UpperSlope", "false"],
                                  [r"C:\MyData.gdb\Prediction_LowerSlope", "false"]],
    allow_predictions_outside_of_data_ranges="ALLOWED",
    resampling_scheme="RANDOM",
    number_of_groups=3)
The following stand-alone Python script demonstrates how to use the PresenceOnlyPrediction function to train an initial model from presence-only points and explanatory training rasters.
# This example is a simple run of the tool using presence-only points and
# explanatory training rasters to train an initial model. No outputs are
# specified, as the intent is to interrogate geoprocessing messages to gain
# an initial sense of model performance.
# Import system modules
import arcpy
try:
    # Set the workspace and overwrite properties
    arcpy.env.workspace = r"C:\MyData.gdb"
    arcpy.env.overwriteOutput = True

    # Set the input point feature parameters
    in_point_features = "presence_observations"
    contains_background = "PRESENCE_ONLY_POINTS"

    # Set the explanatory training variables, using only explanatory rasters
    # Note the categorical setting for the LandCoverClassification raster
    explanatory_rasters = [["Elevation", "false"],
                           ["Canopy", "false"],
                           ["ClimacticWaterDeficit", "false"],
                           ["LandCoverClassification", "true"],
                           ["UpperSlope", "false"],
                           ["LowerSlope", "false"]]

    # Set basis functions, adding quadratic to use the square of each variable
    basis_functions = "LINEAR;QUADRATIC"
    number_knots = 10

    # Set the study area
    study_area_type = "CONVEX_HULL"
    study_area_polygon = None

    # Set cross-validation options
    resampling_scheme = "RANDOM"
    number_of_groups = 3

    # Call the tool using the parameters defined above.
    arcpy.stats.PresenceOnlyPrediction(
        input_point_features=in_point_features,
        contains_background=contains_background,
        explanatory_rasters=explanatory_rasters,
        basis_expansion_functions=basis_functions,
        study_area_type=study_area_type,
        resampling_scheme=resampling_scheme,
        number_of_groups=number_of_groups)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error messages.
    print(arcpy.GetMessages())
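The resampling_scheme and number_of_groups settings in the script above request cross-validation with randomly assigned groups. The sketch below illustrates the general idea of random k-fold grouping in plain Python. It is a conceptual illustration only, not the tool's internal implementation, and assign_random_groups is a hypothetical helper.

```python
import random


def assign_random_groups(n_points, number_of_groups, seed=0):
    """Return a group id (0..number_of_groups-1) for each point index."""
    # Python's random module is based on the Mersenne Twister generator.
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    groups = [0] * n_points
    # Deal the shuffled points into groups in round-robin order.
    for position, index in enumerate(indices):
        groups[index] = position % number_of_groups
    return groups


groups = assign_random_groups(n_points=10, number_of_groups=3, seed=42)
for held_out in range(3):
    validation = [i for i, g in enumerate(groups) if g == held_out]
    training = [i for i, g in enumerate(groups) if g != held_out]
    print(f"group {held_out}: train on {len(training)}, validate on {len(validation)}")
```

Each group is held out once for validation while the remaining groups are used for training, so every point contributes to validation exactly once.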
The following Python script demonstrates how to use the PresenceOnlyPrediction function.
# This example uses presence and background points and explanatory
# variables from rasters, fields, and distance features to train a
# model, using additional parameters to apply basis functions, use
# spatial thinning, perform cross-validation, and receive diagnostic
# training outputs.
# Import system modules
import arcpy
try:
    # Set the workspace and overwrite properties
    arcpy.env.workspace = r"C:\MyData.gdb"
    arcpy.env.overwriteOutput = True

    ### MODEL INPUTS ###
    # Set the input point feature parameters
    in_point_features = "presence_observations"
    contains_background = "PRESENCE_AND_BACKGROUND_POINTS"
    presence_indicator_field = "Presence"

    # Set the explanatory training variables
    explanatory_fields = [["Survey_Region", "true"],
                          ["Temperature", "false"],
                          ["Humidity", "false"]]
    explanatory_rasters = [["Elevation", "false"],
                           ["Canopy", "false"],
                           ["ClimacticWaterDeficit", "false"],
                           ["LandCoverClassification", "true"],
                           ["UpperSlope", "false"],
                           ["LowerSlope", "false"]]
    explanatory_dist_features = [["Streams", "false"],
                                 ["Lakes", "false"],
                                 ["Roads", "false"]]

    ### MODEL CONFIGURATION ###
    # Set basis functions
    basis_functions = "LINEAR;QUADRATIC;PRODUCT;HINGE"
    number_knots = 10

    # Set the study area
    study_area_type = "CONVEX_HULL"
    study_area_polygon = None

    # Set spatial thinning
    spatial_thinning = "THINNING"
    min_nearest_neighbor_distance = "500 Meters"
    number_of_iterations = 10

    # Set the relative weight of presence to background and link function, using
    # background points as observed absence
    relative_weight = 1
    link_function = "LOGISTIC"

    # Set the presence probability cutoff
    cutoff = 0.3

    ### MODEL OUTPUTS AND VALIDATION ###
    # Set training outputs for model evaluation
    out_trained_features = "Out_Trained_Features"
    out_trained_raster = "Out_Trained_Raster"
    out_response_curve_table = "Out_Response_Curves"
    out_sensitivity_table = "Out_Sensitivity_Table"

    # Set cross-validation options
    resampling_scheme = "RANDOM"
    number_of_groups = 3

    # Call the tool using the parameters defined above.
    arcpy.stats.PresenceOnlyPrediction(
        input_point_features=in_point_features,
        contains_background=contains_background,
        presence_indicator_field=presence_indicator_field,
        explanatory_variables=explanatory_fields,
        explanatory_rasters=explanatory_rasters,
        distance_features=explanatory_dist_features,
        basis_expansion_functions=basis_functions,
        number_knots=number_knots,
        study_area_type=study_area_type,
        spatial_thinning=spatial_thinning,
        thinning_distance_band=min_nearest_neighbor_distance,
        number_of_iterations=number_of_iterations,
        relative_weight=relative_weight,
        link_function=link_function,
        presence_probability_cutoff=cutoff,
        output_trained_features=out_trained_features,
        output_trained_raster=out_trained_raster,
        output_response_curve_table=out_response_curve_table,
        output_sensitivity_table=out_sensitivity_table,
        resampling_scheme=resampling_scheme,
        number_of_groups=number_of_groups)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error messages.
    print(arcpy.GetMessages())
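The link_function and presence_probability_cutoff arguments in the script above control how the model's unbounded output becomes a probability and then a presence classification. The following sketch shows the standard mathematical forms of the complementary log-log and logistic transforms, followed by a simple cutoff. The function names are hypothetical, the comparison used at the cutoff is an assumption, and any internal adjustment the tool applies before the transform is not represented here.

```python
import math


def cloglog(raw):
    """Complementary log-log transform: maps any real value into (0, 1)."""
    return 1.0 - math.exp(-math.exp(raw))


def logistic(raw):
    """Logistic transform: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw))


def is_presence(probability, cutoff):
    """Assumption: a point counts as presence when probability >= cutoff."""
    return probability >= cutoff


# Compare the two transforms on the same illustrative raw scores.
for raw in (-2.0, 0.0, 1.0):
    p_cloglog = cloglog(raw)
    p_logistic = logistic(raw)
    print(f"raw={raw:+.1f}  cloglog={p_cloglog:.3f}  logistic={p_logistic:.3f}  "
          f"presence at 0.3 (logistic): {is_presence(p_logistic, 0.3)}")
```

For the same raw score, the two transforms give different probabilities, so a cutoff chosen for one link function does not necessarily carry over to the other.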
Environments
Special cases
- Parallel Processing Factor
Parallel processing is only used when making predictions.
- Random number generator
The Mersenne Twister random number generator is always used.