The Find Outliers tool identifies statistically significant hot spots, cold spots, and spatial outliers using the Anselin Local Moran's I statistic.
A police precinct would like to identify the areas within its boundaries that have consistently higher numbers of burglaries. The precinct uses the Find Outliers tool to identify the streets that are hot spots and outliers with high values. Police officers use the results to design prevention strategies, allocate their scarce resources, and initiate neighborhood watch programs.
Find Outliers includes configurations for input features, outlier settings, and the result layer.
The Input features group includes the following parameter:
- Input layer specifies the point or polygon layer on which cluster and outlier analysis will be performed.
The Outlier settings group includes the following parameters:
- Variable type determines whether cluster and outlier analysis is performed on field values or on point counts. The options are as follows:
- Field—Cluster and outlier analysis will be applied to the values of the field specified by the Analysis field parameter.
- Point counts—Point features will be aggregated into polygons or cells and counted. Cluster and outlier analysis will be applied to the aggregated point counts. This option is available when Input layer is a point layer.
- Aggregation shape type specifies the shape of the cells within which the point features will be aggregated. This parameter is available when Variable type is Point counts. The following shape options are available:
- Fishnet cells—Point features will be aggregated within fishnet (square) cells.
- Hexagon cells—Point features will be aggregated within hexagon cells.
- Polygon layer—Point features will be aggregated within polygon features specified by Aggregation polygon layer.
- Aggregation polygon layer specifies the layer that contains the polygon features within which the points will be aggregated. This parameter is available when Aggregation shape type is Polygon layer.
- Define where points are possible specifies the layer that will define the extent of the cluster and outlier analysis. Points that fall outside of the bounds of the layer will not be included in the cluster and outlier analysis. This parameter is available when Aggregation shape type is either Fishnet cells or Hexagon cells.
- Analysis field specifies the field that will be analyzed to determine outliers. This parameter is available when Variable type is Field.
- Divided by field specifies the field whose values will be used to divide the values of the field specified by Analysis field, if Variable type is Field, or the aggregated point counts, if Variable type is Point counts.
- Optimization option specifies whether the number of permutations will be chosen to optimize the speed of the tool (Speed), the precision of the pseudo p-value (Precision), or both (Balance). The features in a target feature's neighborhood will be permuted to evaluate the observed Local Moran's I value and to determine the likelihood of finding the observed spatial distribution around the target feature. Each permutation randomly rearranges the features in the target feature's neighborhood and then calculates a Local Moran's I value. Repeating this over many permutations produces a distribution of Local Moran's I values for the target feature. The pseudo p-value is then calculated by comparing the observed Local Moran's I value to this distribution. The following optimization options are available:
- Speed—Runs 199 permutations to optimize the speed at which the tool runs. The smallest possible pseudo p-value is 0.005.
- Balance—Runs 499 permutations to optimize both speed and precision. The smallest possible pseudo p-value is 0.002.
- Precision—Runs 999 permutations to optimize the precision of the pseudo p-value. The smallest possible pseudo p-value is 0.001.
- Cell size is a numeric value that defines the length of a side of each cell specified by Aggregation shape type.
- Cell size unit specifies the units of the Cell size value.
- Distance band is a numeric value that defines the neighborhood of a target feature: all features that fall within this distance of the target feature will be included in its neighborhood. The entire neighborhood will be used to determine whether the target feature is part of a cluster of high or low values and whether it is an outlier.
- Distance band unit specifies the units of the Distance band value.
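The permutation test described under Optimization option, together with Point counts aggregation into fishnet cells and the Distance band neighborhood, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's implementation: it assumes projected coordinates with Euclidean distances, binary distance-band weights that are row-standardized, a conditional permutation that keeps the target feature's value fixed and draws its neighbors at random from all other features, and a folded pseudo p-value of (R + 1) / (N + 1), where R is the number of permutations at least as extreme as the observed value and N is the number of permutations. Because of these choices, the I values may be scaled differently from the tool's output. All function names are hypothetical.

```python
import math
import random

# Permutation counts documented for the Optimization option.
PERMUTATIONS = {"Speed": 199, "Balance": 499, "Precision": 999}


def smallest_pseudo_p(optimization):
    """Smallest pseudo p-value reachable when no permutation is as extreme (R = 0)."""
    return 1 / (PERMUTATIONS[optimization] + 1)


def fishnet_counts(points, cell_size):
    """Count points in square cells covering the points' bounding box; empty cells are kept."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    cols = int((max(x for x, _ in points) - min_x) // cell_size) + 1
    rows = int((max(y for _, y in points) - min_y) // cell_size) + 1
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        counts[int((y - min_y) // cell_size)][int((x - min_x) // cell_size)] += 1
    centers, values = [], []
    for r in range(rows):
        for c in range(cols):
            centers.append((min_x + (c + 0.5) * cell_size, min_y + (r + 0.5) * cell_size))
            values.append(counts[r][c])
    return centers, values


def distance_band_neighbors(locations, band):
    """For each location, list the indices of the other locations within `band`."""
    return [
        [j for j, (xj, yj) in enumerate(locations)
         if j != i and math.hypot(xi - xj, yi - yj) <= band]
        for i, (xi, yi) in enumerate(locations)
    ]


def standardize(values):
    """Convert values to z-scores; identical values cannot be analyzed."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values are identical; there must be variation")
    return [(v - mean) / sd for v in values]


def local_morans_i(z, i, neighbor_idx):
    """I_i = z_i times the mean z of its neighbors (row-standardized binary weights)."""
    if not neighbor_idx:
        return 0.0
    return z[i] * sum(z[j] for j in neighbor_idx) / len(neighbor_idx)


def pseudo_p_value(z, i, neighbor_idx, permutations, rng):
    """Compare the observed I_i with I_i recomputed for randomly drawn neighborhoods."""
    observed = local_morans_i(z, i, neighbor_idx)
    others = [j for j in range(len(z)) if j != i]
    as_extreme = 0
    for _ in range(permutations):
        # Conditional permutation: z_i stays fixed, neighbors are drawn at random.
        sim = local_morans_i(z, i, rng.sample(others, len(neighbor_idx)))
        if (observed >= 0 and sim >= observed) or (observed < 0 and sim <= observed):
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


def find_outliers_sketch(locations, values, distance_band, optimization="Balance", seed=0):
    """Return (Local Moran's I, pseudo p-value) for every location."""
    rng = random.Random(seed)
    z = standardize(values)
    neighbors = distance_band_neighbors(locations, distance_band)
    n_perm = PERMUTATIONS[optimization]
    return [pseudo_p_value(z, i, neighbors[i], n_perm, rng) for i in range(len(z))]
```

For example, `centers, counts = fishnet_counts(points, cell_size=1.0)` followed by `find_outliers_sketch(centers, counts, distance_band=1.5, optimization="Speed")` scores every fishnet cell. Under the (R + 1) / (N + 1) assumption, `smallest_pseudo_p` returns 0.005, 0.002, and 0.001 for Speed, Balance, and Precision, which matches the limits listed for the optimization options. The neighbor search compares every pair of locations, which is fine for a sketch but would call for a spatial index on real data.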
The Result layer group includes the following parameters:
- Output name determines the name of the layer that is created and added to the map. The name must be unique. If a layer with the same name already exists in your organization, the tool will fail and you will be prompted to use a different name.
- Save in folder specifies the name of a folder in My Content where the result is saved.
The following limitations apply to the tool:
- If Variable type is Point counts, the following limitations apply:
- The input layer must contain at least 60 point features.
- At a minimum, 30 aggregation cells or polygons must contain at least one point feature.
- The point counts within the aggregation cells or polygons cannot be identical. There must be variation in the point counts between aggregation cells or polygons.
- If Variable type is Field, the following limitations apply:
- At a minimum, 30 features must contain non-null values in the specified analysis field.
- The values in the specified analysis field cannot be identical. There must be variation in the values.
- At a minimum, 30 points must fall within the bounding area specified by Define where points are possible.
- The Cell size value cannot exceed the Distance band value.
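The limitations above amount to a few checks that could be run on locally computed summaries before running the tool. The following is a hypothetical helper, not part of any ArcGIS API; it assumes the per-cell point counts and the number of points inside the Define where points are possible layer have already been computed, and that Cell size and Distance band have been converted to the same unit.

```python
from typing import List, Optional, Sequence


def find_outliers_input_problems(
    variable_type: str,
    num_points: int = 0,
    cell_counts: Sequence[int] = (),
    field_values: Sequence[Optional[float]] = (),
    points_in_bounds: Optional[int] = None,
    cell_size: Optional[float] = None,
    distance_band: Optional[float] = None,
) -> List[str]:
    """Return reasons the inputs would violate the documented limitations (hypothetical helper)."""
    problems = []
    if variable_type == "Point counts":
        if num_points < 60:
            problems.append("input layer needs at least 60 point features")
        if sum(1 for c in cell_counts if c > 0) < 30:
            problems.append("at least 30 cells or polygons must contain a point")
        if len(set(cell_counts)) < 2:
            problems.append("point counts must vary between cells or polygons")
    elif variable_type == "Field":
        non_null = [v for v in field_values if v is not None]
        if len(non_null) < 30:
            problems.append("at least 30 features need non-null analysis field values")
        if len(set(non_null)) < 2:
            problems.append("analysis field values must vary")
    else:
        raise ValueError(f"unknown variable type: {variable_type!r}")
    # Applies only when a Define where points are possible layer is used.
    if points_in_bounds is not None and points_in_bounds < 30:
        problems.append("at least 30 points must fall within Define where points are possible")
    # Assumes both values are already expressed in the same unit.
    if cell_size is not None and distance_band is not None and cell_size > distance_band:
        problems.append("Cell size cannot exceed Distance band")
    return problems
```

An empty list means none of the listed limitations was detected; a non-empty list names each one that was.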
Analysis environment settings are additional parameters that affect a tool's results. You can access the tool's analysis environment settings from the Environment settings parameter group.
This tool honors the following analysis environments:
This tool consumes credits.
Use Estimate credits to calculate the number of credits that will be required to run the tool. For more information, see Understand credits for spatial analysis.
The tool outputs a polygon layer with the results of the cluster and outlier analysis. The layer includes fields for the count, cluster-outlier type, Local Moran's I value, p-value, z-score, number of neighbors, spatial lag, and z-transform of each feature. The cluster-outlier type field distinguishes between a statistically significant cluster of high values (HH), a cluster of low values (LL), a high-value outlier surrounded by low values (HL), and a low-value outlier surrounded by high values (LH). The Local Moran's I value indicates whether the feature and its neighborhood have similar (positive) or dissimilar (negative) values, so outlier features (HL and LH) have a negative Local Moran's I value.
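The relationship between the cluster-outlier type and the sign of Local Moran's I can be illustrated with a small helper. This sketch assumes a feature's z-transformed value and its spatial lag (taken here as the mean z-value of its neighbors) are available, and it applies a hypothetical fixed significance level of 0.05. The tool may use a different significance threshold or correction, so the function names and threshold are illustrative only.

```python
def cluster_outlier_type(z_value, spatial_lag, p_value, alpha=0.05):
    """Return "HH", "LL", "HL", "LH", or "" when the feature is not significant.

    alpha is a hypothetical significance level, not a documented tool setting.
    """
    if p_value >= alpha:
        return ""
    feature_high = z_value > 0
    neighbors_high = spatial_lag > 0
    if feature_high and neighbors_high:
        return "HH"  # cluster of high values
    if not feature_high and not neighbors_high:
        return "LL"  # cluster of low values
    return "HL" if feature_high else "LH"  # outliers


def local_i_sign(z_value, spatial_lag):
    """Return 1 when the feature resembles its neighborhood, -1 when it differs, 0 otherwise."""
    product = z_value * spatial_lag
    return (product > 0) - (product < 0)
```

With these definitions, `cluster_outlier_type(2.0, -0.8, 0.01)` returns "HL" and `local_i_sign(2.0, -0.8)` returns -1, consistent with outliers having a negative Local Moran's I value, while an HH or LL feature yields a positive sign.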
This tool requires the following licensing and configurations:
- Creator or GIS Professional user type
- Publisher, Facilitator, or Administrator role, or an equivalent custom role
Use the following resources to learn more:
- Optimized Outlier Analysis in ArcGIS Pro
- Cluster and Outlier Analysis (Anselin Local Moran's I) in ArcGIS Pro
- Find Outliers in ArcGIS REST API
- find_outliers in ArcGIS API for Python