Label | Explanation | Data Type |
Input Features | The point features that will be used to build the spatial outlier detection model. Each point will be classified as an outlier or inlier based on its local outlier factor. | Feature Layer |
Output Features | The output feature class containing the local outlier factor for each input feature as well as an indicator of whether the point is a spatial outlier. | Feature Class |
Number of Neighbors (Optional) | The number of neighbors that will be used to detect spatial outliers for each input point. For local outlier detection, the value must be at least 2, and all features within the neighborhood will be used as neighbors. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. For global outlier detection, only the farthest neighbor in the neighborhood will be used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point. | Long |
Percent of Locations Considered Outliers (Optional) | The percent of locations that will be identified as spatial outliers by defining the threshold of the local outlier factor. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. A maximum of 50 percent of the features can be identified as spatial outliers. | Double |
Output Prediction Raster (Optional) | The output raster containing the local outlier factors at each cell, which is calculated based on the spatial distribution of the input features. This parameter is only available with a Desktop Advanced license. | Raster Dataset |
Outlier Type (Optional) | Specifies the type of outlier that will be detected. A global outlier is a point that is far away from all other points in the feature class. A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area. | String |
Detection Sensitivity (Optional) | Specifies the sensitivity level that will be used to detect global outliers. The higher the sensitivity, the more points that will be detected as outliers. The sensitivity value will determine the threshold, and any point with a neighbor distance larger than this threshold will be identified as a global outlier. The thresholds are determined using the box plot rule, in which the threshold for high sensitivity is one interquartile range above the third quartile. For medium sensitivity, the threshold is 1.5 interquartile ranges above the third quartile. For low sensitivity, the threshold is two interquartile ranges above the third quartile. | String |
Keep Only Spatial Outliers (Optional) | Specifies whether the output features will contain all input features or only features identified as spatial outliers. | Boolean |
Summary
Identifies global or local spatial outliers in point features.
A global outlier is a point that is far away from all other points in a feature class. Global outliers are detected by examining distances between each point and one of its closest neighbors (by default, the closest neighbor) and detecting points where the distance is large.
A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area. Local outliers are detected by calculating the local outlier factor (LOF) of each feature. The LOF is a measure that describes how isolated a location is compared to its local neighbors. A higher LOF value indicates greater isolation. The tool can also be used to produce a raster prediction surface that can be used to estimate whether new features will be classified as outliers based on the spatial distribution of the data.
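The LOF calculation described above can be sketched in plain Python, following the definition in Breunig et al. (2000). This is a minimal illustration, not the tool's implementation: the function names are hypothetical, neighbors are found by brute force, exactly k neighbors are used (ties in distance are broken arbitrarily), and the points are assumed to be distinct.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbors of point i.

    The point itself is excluded from its own neighborhood.
    """
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]

def local_outlier_factors(points, k):
    """Compute a LOF value for every point (brute force, assumes distinct points)."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of each neighbor's density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# A small regular grid plus one isolated location.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
lof = local_outlier_factors(points, k=3)
most_isolated = max(range(len(points)), key=lambda i: lof[i])
print(points[most_isolated], round(lof[most_isolated], 2))
```

Points in the grid have densities similar to their neighbors, so their LOF values stay close to 1, while the isolated location is much less dense than its neighbors and receives a much larger value.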
Usage
This tool classifies each point supplied in the Input Features parameter as either a spatial outlier or a spatial inlier. Use the Keep Only Spatial Outliers parameter to return only the points identified as outliers.
The tool uses a local neighborhood around each feature, specified in the Number of Neighbors parameter. For local outlier detection, all points within the neighborhood are used, and the default is estimated by the tool at run time. For global outlier detection, only the farthest neighbor in the neighborhood is used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point.
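For global outlier detection, the combination of a k-th nearest neighbor distance and the box plot rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names and the lowercase sensitivity labels are hypothetical, neighbors are found by brute force, and the quartiles come from Python's `statistics.quantiles` with its default method, which may differ from the quartile method the tool uses.

```python
import math
import statistics

# Interquartile-range multipliers from the box plot rule:
# high sensitivity = 1 IQR, medium = 1.5 IQRs, low = 2 IQRs above the third quartile.
# The dictionary keys are illustrative labels, not the tool's keyword values.
IQR_MULTIPLIER = {"high": 1.0, "medium": 1.5, "low": 2.0}

def kth_neighbor_distances(points, k=1):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def global_outlier_flags(points, k=1, sensitivity="medium"):
    """Flag points whose neighbor distance exceeds Q3 + multiplier * IQR."""
    dists = kth_neighbor_distances(points, k)
    q1, _, q3 = statistics.quantiles(dists, n=4)
    threshold = q3 + IQR_MULTIPLIER[sensitivity] * (q3 - q1)
    return [d > threshold for d in dists], threshold

# Ten evenly spaced points on a line plus one distant point.
points = [(x, 0) for x in range(10)] + [(50, 0)]
flags, threshold = global_outlier_flags(points, k=3, sensitivity="medium")
print(threshold, [p for p, is_outlier in zip(points, flags) if is_outlier])
```

Because the multiplier grows from high to low sensitivity, the threshold rises and fewer points are flagged as the sensitivity is lowered.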
For local outlier detection, the Percent of Locations Considered Outliers parameter is used to establish a threshold for the LOF to designate each point feature as an outlier or inlier.
Note:
Small differences in values for the Percent of Locations Considered Outliers parameter may result in the same count of output features designated as outliers. This can occur when similarities in spatial distribution for features result in the same LOF value for multiple features.
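The effect of tied LOF values can be illustrated with a small sketch. The thresholding rule used here is an assumption made for illustration only (the threshold is the LOF value at the rank implied by the percentage, and every point at or above it is flagged); the tool's exact rule is not documented in this topic. The function name and the LOF values are hypothetical.

```python
import math

def flag_by_percent(lof_values, percent):
    """Flag points at or above the LOF value found at the requested percent rank.

    Assumed rule for illustration: the threshold is the LOF value ranked
    ceil(n * percent / 100) from the top, and ties with it are flagged too.
    """
    if not 0 < percent <= 50:
        raise ValueError("percent must be greater than 0 and at most 50")
    ranked = sorted(lof_values, reverse=True)
    rank = max(1, math.ceil(len(ranked) * percent / 100))
    threshold = ranked[rank - 1]
    return [v >= threshold for v in lof_values], threshold

# Hypothetical LOF values in which three locations share the highest value.
lof_values = [1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 2.5, 2.5, 2.5]
for percent in (10, 20, 30, 40):
    flags, threshold = flag_by_percent(lof_values, percent)
    print(percent, threshold, sum(flags))
```

Under this assumed rule, the three tied locations are flagged together, so several different percentages produce the same outlier count until the percentage is large enough to reach the next distinct LOF value.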
The output layer includes two charts. The first is a bar chart that displays counts of outliers and inliers. The second chart is a histogram. For local outlier detection, the histogram displays the distribution of LOF values for all point features and the LOF threshold used to determine whether a feature is an outlier or an inlier. For global outlier detection, the histogram shows the distribution of neighbor distances and the associated threshold.
If the Input Features parameter value has a z-coordinate, the tool will honor the 3D nature of the data by detecting spatial outliers in 3D space. When added to a scene view, the output features display in 3D to visualize the 3D spatial outliers. If the unit (for example, meters) of the z-coordinate is not defined in a vertical coordinate system, the unit is assumed to be the same as that of the x,y coordinates.
The Output Prediction Raster parameter is an optional output that displays the values used to determine whether each cell is an outlier as a continuous surface across the study area. For local outlier detection, the raster contains the LOF value calculated for the cell. For global outlier detection, the raster contains the distance to the nearest neighbor. The output can be used to determine whether future observations are outliers without needing to recalculate the value of the new point. The output can only be created for 2D input features.
Note:
The neighbor distances and LOF values of the points will not match the values of the raster cells under each point, even if the points coincide with a cell center of the raster. This is because the feature does not use itself as a neighbor, but the raster cell does use the feature as a neighbor, so each calculation uses different neighbors and produces a different value.
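The difference described in this note can be seen with nearest neighbor distances in a small sketch. The function names and coordinates are hypothetical, and the code only illustrates the reasoning (a feature skips itself, a cell does not); it is not how the tool builds the raster.

```python
import math

def nearest_distance_for_feature(points, i):
    """Nearest neighbor distance for input point i; the point is not its own neighbor."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)

def nearest_distance_for_cell(points, cell_center):
    """Nearest neighbor distance for a raster cell; every input point is a candidate."""
    return min(math.dist(cell_center, q) for q in points)

points = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]

# The feature at (0, 0) looks past itself to the point at (3, 4).
feature_value = nearest_distance_for_feature(points, 0)

# A cell centered on the same location finds the feature itself as its nearest neighbor.
cell_value = nearest_distance_for_cell(points, points[0])

print(feature_value, cell_value)
```

The two values differ even though the feature and the cell center share a location, which is why values read from the raster should be compared with other raster values rather than with the values stored on the output features.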
For more information about the local outlier factor and optimizing parameters, see the following references:
- Breunig, M. M., Kriegel, H. P., Ng, R. T., Sander, J. (2000). "LOF: identifying density-based local outliers." Proceedings of the 2000 ACM SIGMOD international conference on Management of data. (pp. 93-104).
- Xu, Z., Kakde, D., Chaudhuri, A. (2019). "Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection." 2019 IEEE International Conference on Big Data. (pp. 4201-4207).
Parameters
arcpy.stats.SpatialOutlierDetection(in_features, output_features, {n_neighbors}, {percent_outlier}, {output_raster}, {outlier_type}, {sensitivity}, {keep_type})
Name | Explanation | Data Type |
in_features | The point features that will be used to build the spatial outlier detection model. Each point will be classified as an outlier or inlier based on its local outlier factor. | Feature Layer |
output_features | The output feature class containing the local outlier factor for each input feature as well as an indicator of whether the point is a spatial outlier. | Feature Class |
n_neighbors (Optional) | The number of neighbors that will be used to detect spatial outliers for each input point. For local outlier detection, the value must be at least 2, and all features within the neighborhood will be used as neighbors. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. For global outlier detection, only the farthest neighbor in the neighborhood will be used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point. | Long |
percent_outlier (Optional) | The percent of locations that will be identified as spatial outliers by defining the threshold of the local outlier factor. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. A maximum of 50 percent of the features can be identified as spatial outliers. | Double |
output_raster (Optional) | The output raster containing the local outlier factors at each cell, which is calculated based on the spatial distribution of the input features. This parameter is only available with a Desktop Advanced license. | Raster Dataset |
outlier_type (Optional) | Specifies the type of outlier that will be detected. A global outlier is a point that is far away from all other points in the feature class. A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area. | String |
sensitivity (Optional) | Specifies the sensitivity level that will be used to detect global outliers. The higher the sensitivity, the more points that will be detected as outliers. The sensitivity value will determine the threshold, and any point with a neighbor distance larger than this threshold will be identified as a global outlier. The thresholds are determined using the box plot rule, in which the threshold for high sensitivity is one interquartile range above the third quartile. For medium sensitivity, the threshold is 1.5 interquartile ranges above the third quartile. For low sensitivity, the threshold is two interquartile ranges above the third quartile. | String |
keep_type (Optional) | Specifies whether the output features will contain all input features or only features identified as spatial outliers. | Boolean |
Code sample
The following Python window script demonstrates how to use the SpatialOutlierDetection function.
import arcpy
arcpy.stats.SpatialOutlierDetection("Transaction_Locations",
                                    "Transactions_SpatialOutliers", 20, 5,
                                    "Transactions_OutliersPredictionSurface")
The following stand-alone Python script demonstrates how to use the SpatialOutlierDetection function.
# Import system modules.
import arcpy

try:
    # Set the workspace and input features.
    arcpy.env.workspace = 'C:\\SpatialOutlierDetection\\MyData.gdb'
    inputFeatures = "PM25_AirQualityStations"

    # Set the name of the output features.
    outputFeatures = "AirQualityStations_SpatialOutliers"

    # Set the number of neighbors.
    numberNeighbors = 8

    # Set the percentage of locations considered outliers.
    pcntLocationsAsOutliers = 10

    # Set the name of the output prediction raster (a string, not a variable).
    outputPredictionRaster = "airQualityStations_OutPredictionRaster"

    # Run the Spatial Outlier Detection tool.
    arcpy.stats.SpatialOutlierDetection(inputFeatures, outputFeatures,
                                        numberNeighbors, pcntLocationsAsOutliers,
                                        outputPredictionRaster)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())
Environments
Special cases
- Cell Size
This environment only impacts the output raster.
- Mask
This environment only impacts the output raster.
- Snap Raster
This environment only impacts the output raster.
- Extent
This environment only impacts the output raster.