The Find Point Clusters tool identifies clusters of point features and separates them from surrounding noise based on their spatial distribution.
Examples
Example uses of this tool include the following:
- An organization that studies a particular pest-borne disease wants to identify where in their study area to begin treatment and extermination of these pests. An analyst has a point dataset that represents the infested and noninfested households in the study area. The analyst uses the Find Point Clusters tool to find the largest cluster of infested households.
- A disaster response organization needs to determine where to deploy their resources for rescue and evacuation following a natural disaster. An analyst uses the Find Point Clusters tool to identify clusters of geolocated tweets that mention the event. The organization uses the size and location of the clusters to map the impacted area and inform their relief efforts.
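In the first example, finding the largest cluster comes down to counting points per Cluster ID in the result and ignoring noise points, which have a Cluster ID of -1 (see Outputs). The following Python sketch assumes the Cluster ID values have already been read from the result layer into a list. How you export them is not shown, and `largest_cluster` is a hypothetical helper, not part of any ArcGIS API.

```python
from collections import Counter

NOISE_ID = -1  # Cluster ID value the tool assigns to noise points

def largest_cluster(cluster_ids):
    """Return (cluster_id, point_count) for the biggest non-noise cluster, or None."""
    counts = Counter(cid for cid in cluster_ids if cid != NOISE_ID)
    if not counts:
        return None  # every point was classified as noise
    # On ties, the cluster encountered first in the list wins.
    return counts.most_common(1)[0]

# Hypothetical Cluster ID values for nine households
print(largest_cluster([0, 0, 1, 1, 1, -1, -1, -1, -1]))  # (1, 3)
```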
Usage notes
The Find Point Clusters tool includes configurations for input features, cluster settings, and the result layer.
Input features
The Input features group includes the Input layer parameter, which is the layer with point features that will be grouped into clusters based on their spatial distribution.
For feature inputs, a count of features is displayed below the layer name. The count includes all features in the layer, except features that have been removed using a filter. Environment settings, such as Processing extent, are not reflected in the feature count.
Note:
Web Mercator is not an appropriate projection for spatial analysis. If the spatial reference system of the input layer is WGS 1984 Web Mercator (Auxiliary Sphere), the data will be converted to a geographic coordinate system to use chordal distances in the analysis.
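A chordal distance is the straight-line distance through the earth between two locations, as opposed to the curved distance along the surface. The following sketch shows the idea under a stated assumption: the earth is treated as a sphere with a mean radius of 6,371,008.8 meters, while the tool's actual geographic computation may use an ellipsoid. The function names are illustrative only.

```python
import math

# Assumption: spherical earth with mean radius in meters (not necessarily the tool's model).
EARTH_RADIUS_M = 6_371_008.8

def to_xyz(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))

def chordal_distance(p, q):
    """Straight-line distance through the sphere between two (lat, lon) points."""
    return math.dist(to_xyz(*p), to_xyz(*q))

def great_circle_distance(p, q):
    """Surface (arc) distance, derived from the chord: arc = 2R * asin(chord / 2R)."""
    ratio = min(1.0, chordal_distance(p, q) / (2 * EARTH_RADIUS_M))  # clamp float error
    return 2 * EARTH_RADIUS_M * math.asin(ratio)

a, b = (34.05, -118.25), (34.06, -118.24)
print(chordal_distance(a, b), great_circle_distance(a, b))
```

For nearby points the two distances are almost identical. The chord is always the shorter of the two, and the gap grows as points get farther apart.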
Cluster settings
The Cluster settings group includes the following parameters:
- Clustering method specifies the method that will be used to identify clusters.
  - Defined distance (DBSCAN)—Identifies clusters by searching within a specified search distance. This method is appropriate when all the meaningful clusters have similar densities.
  - Self-adjusting (HDBSCAN)—Uses a range of distances to separate clusters of varying densities from sparser noise. This method is the most data-driven of the clustering methods, so it does not need a search distance.
  - Multi-scale (OPTICS)—Identifies clusters using the distance between neighbors and a reachability plot. The method first determines the minimum reachability distance for all the points, which is the distance from a point to its nearest neighbor that has not yet been visited by the search. The tool then constructs a reachability plot, which plots each point's reachability distance in reachability order and reveals the clustering structure of the points. Finally, the method uses the Cluster sensitivity value to identify clusters. Like the HDBSCAN method, the OPTICS method can identify clusters with varying densities.
- Minimum points per cluster is the minimum number of points required for a grouping of points to be considered a cluster. In general, the smaller the value, the more clusters will be detected. The value must be at least 2 and no greater than the number of points in the layer.
- Search distance specifies the maximum distance around each point that will be considered. If the Clustering method value is Defined distance (DBSCAN), the Search distance value is the maximum distance around each point feature to search for other points that can be included in its cluster. A point that has at least the minimum number of points within its search distance is a core point. A point that does not meet this minimum but falls within the search distance of a core point is a border point. Clusters are composed of both core points and border points. If the Clustering method value is Multi-scale (OPTICS), the Search distance value is the maximum distance around each point to search for points to assign a reachability distance. Reachability distance is the distance from a point to its nearest neighbor that has not yet been visited by the search. Points within the core distance of a point are assigned the core distance as their reachability distance. The core distance of a point is the distance required to reach the defined minimum number of features from that point.
- Search distance unit is the unit of the Search distance value.
- Time field is the field from the input layer that contains a timestamp for each feature. This parameter is available if the Clustering method value is Defined distance (DBSCAN) or Multi-scale (OPTICS). If a Time field value is specified, you must also provide Search distance and Search distance unit values.
- Search time interval is the time interval that will be used to determine whether features form a space-time cluster. The search time interval spans before and after the time of each feature; for example, a search time interval of 3 days around a feature will include all features starting 3 days before and ending 3 days after the time of the feature.
- Search time unit is the unit of the Search time interval value.
- Cluster sensitivity determines how the shape (both slope and height) of peaks within the reachability plot will be used to separate clusters. The reachability plot shows the reachability distance of each point in reachability order. A very high Cluster sensitivity value (close to 100) will treat even the smallest peaks in the reachability plot as a separation between clusters. A very low Cluster sensitivity value (close to 0) will treat only the steepest, highest peaks in the reachability plot as a separation between clusters. If left blank, the tool will determine a sensitivity value using the Kullback-Leibler divergence.
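The Defined distance (DBSCAN) rules described for Minimum points per cluster, Search distance, and Search time interval can be sketched in plain Python. This is a minimal O(n²) illustration, not the tool's implementation. It assumes projected x,y coordinates with planar distance, numeric timestamps in the same unit as the search time interval (for example, days), and that a point counts toward its own neighborhood when testing for a core point. `dbscan` and its parameter names are hypothetical.

```python
import math
from collections import deque

NOISE_ID = -1  # Cluster ID the tool reports for noise points

def dbscan(points, min_pts, search_distance, times=None, search_time=None):
    """Label each (x, y) point with a cluster ID or NOISE_ID.

    If times is given, neighbors must also fall within +/- search_time.
    """
    n = len(points)
    if not 2 <= min_pts <= n:
        raise ValueError("min_pts must be at least 2 and at most the number of points")
    if times is not None and search_time is None:
        raise ValueError("search_time is required when times is given")

    def is_neighbor(i, j):
        if math.dist(points[i], points[j]) > search_distance:
            return False
        # Space-time: the interval spans before and after each feature's time.
        return times is None or abs(times[i] - times[j]) <= search_time

    neighbors = [[j for j in range(n) if is_neighbor(i, j)] for i in range(n)]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbors]  # counts the point itself

    labels = [None] * n
    next_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = next_id
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = next_id
                    queue.append(q)
        next_id += 1
    # Anything never reached from a core point is noise.
    return [NOISE_ID if lab is None else lab for lab in labels]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, min_pts=3, search_distance=1.5))  # [0, 0, 0, -1, -1, -1]
print(dbscan(pts, 3, 1.5, times=[0, 0, 5, 0, 0, 0], search_time=1))  # all -1
```

In the second call, the third point is close in space but 5 time units away, so the first group no longer reaches three points and becomes noise.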
Result layer
The Result layer group includes the following parameters:
- Output name specifies the name of the layer that is created and displayed. The name must be unique. If a layer with the same name already exists in your organization, the tool will fail and you will be prompted to use a different name.
- Save in folder specifies the name of a folder in My content where the result will be saved.
Environments
Analysis environment settings are additional parameters that affect a tool's results. You can access the tool's analysis environment settings from the Environment settings parameter group.
This tool honors the following analysis environments:
- Output coordinate system
- Processing extent
Note:
The default processing extent is Full extent. This default is different from Map Viewer Classic in which Use current map extent is enabled by default.
Credits
This tool consumes credits.
Use Estimate credits to calculate the number of credits that will be required to run the tool. For more information, see Understand credits for spatial analysis.
Outputs
The tool outputs a point layer. If the Clustering method parameter value is Self-adjusting (HDBSCAN) or Multi-scale (OPTICS), the tool also outputs a chart. For all Clustering method options, the output layer includes Cluster ID, Source ID, and Color ID fields. The Cluster ID field identifies the cluster each point belongs to; noise points have a value of -1. The Source ID field value is a unique identifier. The Color ID field value represents the color assigned to a point and its cluster. If the output layer includes more than nine clusters, some colors will be assigned to more than one cluster; however, neighboring clusters will be assigned different colors to keep them visually distinct. If the Clustering method parameter value is Self-adjusting (HDBSCAN), the output point layer will contain the following additional fields:
- Probability is a value between 0 and 1 that denotes the probability that a point belongs to its assigned cluster. Noise points will have a value of 0.
- Outlier is a value between 0 and 1 that denotes whether a point may be an outlier within its own cluster. The noise points will be considered a single cluster. A high value indicates that the point is more likely to be an outlier.
- Exemplar is a value between 0 and 1 that denotes whether a point is most representative of its cluster.
- Stability is a value that reflects the persistence of each cluster across a range of scales. A larger value indicates that the cluster persists over a wide range of distance scales.
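The documentation does not describe how Color ID values are chosen. One common way to reuse a small palette while keeping neighboring clusters distinct is greedy graph coloring, sketched below. This is not necessarily the tool's method, and it relies on two assumptions: which clusters count as neighbors is supplied as a symmetric adjacency mapping (a hypothetical input), and the palette has nine colors numbered 0 to 8 (the tool's actual Color ID values may differ).

```python
NUM_COLORS = 9  # assumption: nine-color palette, numbered 0..8

def assign_colors(adjacency):
    """Greedily map each cluster ID to a color index in range(NUM_COLORS).

    adjacency: dict of cluster ID -> set of neighboring cluster IDs (symmetric).
    """
    colors = {}
    # Color the most-connected clusters first (Welsh-Powell ordering).
    for cid in sorted(adjacency, key=lambda c: len(adjacency[c]), reverse=True):
        used = [colors[nb] for nb in adjacency[cid] if nb in colors]
        free = [c for c in range(NUM_COLORS) if c not in used]
        # If every color is already taken by a neighbor, reuse the least-used one.
        colors[cid] = free[0] if free else min(range(NUM_COLORS), key=used.count)
    return colors

# Three mutually adjacent clusters plus one isolated cluster
print(assign_colors({0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}))  # {0: 0, 1: 1, 2: 2, 3: 0}
```

Greedy coloring guarantees distinct neighbor colors only when no cluster has nine or more neighbors; the fallback branch handles the rare case where it does.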
If the Clustering method parameter value is Multi-scale (OPTICS), the output layer will include the following additional fields:
- Reachability order is the order in which the input features were processed in the analysis.
- Reachability distance is the distance between each point and its closest unvisited neighbor.
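The core distance, reachability distance, and reachability order can be illustrated with a compact version of the standard OPTICS ordering loop. This is a simplified O(n²) sketch, not the tool's implementation. It assumes planar x,y coordinates, counts a point toward its own minimum-points neighborhood, and leaves the first point of each ordering run with an undefined (infinite) reachability distance. Extracting clusters with the Cluster sensitivity value is not shown, and `optics_order` is a hypothetical name.

```python
import heapq
import math

def optics_order(points, min_pts, search_distance):
    """Return (reachability order, reachability distance per point)."""
    n = len(points)
    reach = [math.inf] * n  # undefined until a core point reaches the point
    processed = [False] * n
    order = []

    def neighbors(i):
        # Includes i itself (distance 0).
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= search_distance]

    def core_distance(i, nbrs):
        # Distance to the min_pts-th closest point (counting i itself),
        # or None if i is not a core point within search_distance.
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def expand(i, seeds):
        processed[i] = True
        order.append(i)
        nbrs = neighbors(i)
        core = core_distance(i, nbrs)
        if core is None:
            return
        for j in nbrs:
            if processed[j]:
                continue
            # Points inside the core distance get the core distance itself.
            new_reach = max(core, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        seeds = []
        expand(start, seeds)
        while seeds:
            r, j = heapq.heappop(seeds)
            if processed[j] or r > reach[j]:
                continue  # stale heap entry superseded by a smaller distance
            expand(j, seeds)
    return order, reach

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
order, reach = optics_order(pts, min_pts=2, search_distance=20)
print(order)  # [0, 1, 2, 3, 4, 5]
print(reach)  # [inf, 1.0, 1.0, 8.0, 1.0, 1.0]
```

Plotting `reach` in `order` gives the reachability plot: the peak of 8.0 at the fourth point is the valley boundary that separates the two groups.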
If the Clustering method parameter value is Self-adjusting (HDBSCAN) or Multi-scale (OPTICS), the tool will output a chart. Multi-scale (OPTICS) outputs a reachability plot, which can be used to evaluate the density of each cluster. Self-adjusting (HDBSCAN) outputs a distribution of membership probability chart, which displays the distribution of the probability that each feature belongs to its assigned cluster. To view the chart, click Charts on the Contents toolbar.
If a Time field value is specified, the output will include a Time Span per Cluster chart displaying the time span of each space-time cluster. To view the chart, click Charts on the Contents toolbar. The output layer will also include the following fields that summarize the time span of the cluster that each point belongs to:
- Start Time is the start time of the cluster that the feature belongs to.
- End Time is the end time of the cluster that the feature belongs to.
- Mean Time is the mean time of the cluster that the feature belongs to.
- Time Exaggeration
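The Start Time, End Time, and Mean Time fields are per-cluster summaries copied onto each member point. The following sketch uses Python's datetime module and assumes that noise points (Cluster ID -1) receive no summary, which the documentation does not state. `cluster_time_fields` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timedelta

NOISE_ID = -1

def cluster_time_fields(cluster_ids, times):
    """Return a (start, end, mean) tuple per point for its cluster, or None for noise."""
    by_cluster = defaultdict(list)
    for cid, t in zip(cluster_ids, times):
        if cid != NOISE_ID:
            by_cluster[cid].append(t)

    summary = {}
    for cid, ts in by_cluster.items():
        start, end = min(ts), max(ts)
        # Average the offsets from the start time; datetimes cannot be summed directly.
        mean = start + sum((t - start for t in ts), timedelta()) / len(ts)
        summary[cid] = (start, end, mean)

    return [summary.get(cid) for cid in cluster_ids]

ids = [0, 0, -1]
times = [datetime(2024, 1, 1), datetime(2024, 1, 3), datetime(2024, 1, 2)]
print(cluster_time_fields(ids, times)[0])  # start Jan 1, end Jan 3, mean Jan 2
```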
You can use these fields to visualize the space-time clusters over time by enabling time on the layer and then using the time slider.
To enable time on the layer in Map Viewer, click Layers on the Contents toolbar. Click the layer. On the Settings toolbar, click Properties. Expand the Information drop-down menu. Click the layer under Source Layer. This opens the layer's item page. On the Overview tab, click Edit under Time Settings. The Time Settings window appears. Check the Enable time check box. Choose specific events in time and choose Mean Time as the Time field.
To enable time on the layer in Scene Viewer, click Layer Manager on the Designer toolbar. Click Options and click Layer properties. Under Time, turn on the Layer visibility period toggle button to expand the time period. Under Start, provide a start date and time for the layer visibility. Under End, provide an end date and time for the layer visibility.
To view additional details about the analysis, open the History pane and click the successful tool run. The analysis details open on the Results tab. You can also view the details on the layer's item page by clicking the options button next to the output layer and clicking View details.
Note:
In ModelBuilder, you can view the additional details about the analysis only on the output layer's item page.
Licensing requirements
This tool requires the following user type and configurations:
- Creator, Professional, or Professional Plus user type
- Publisher, Facilitator, or Administrator role, or an equivalent custom role
Resources
Use the following resources to learn more:
- Density-based Clustering in ArcGIS Pro
- Find Point Clusters in ArcGIS REST API
- find_point_clusters in ArcGIS API for Python