Find Point Clusters—ArcGIS Velocity

Available in big data analytics.

The Find Point Clusters tool finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.

Example

A nongovernmental organization is studying a particular pest-borne disease and has a point dataset representing households in a study area, some of which are infested, some of which are not. Using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.

Usage notes

Keep the following in mind when working with the Find Point Clusters tool:

The input for this tool is a single point layer.
All results will include a field named CLUSTER_ID that indicates which cluster each feature belongs to and a field named COLOR_ID that is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both fields, a value of -1 indicates that a feature has been labeled as noise.
The Clustering method parameter determines whether a defined distance (DBSCAN) or self-adjusting (HDBSCAN) clustering algorithm will be used. DBSCAN identifies clusters of points that are in close proximity based on a specified search range. HDBSCAN finds clusters in a similar way to DBSCAN but uses varying search ranges, allowing for clusters with varying densities based on cluster probability (or stability).
- If DBSCAN is chosen, clusters can be found in two-dimensional space only or in both space and time. If time is used and the input layer is time enabled with a time type of instant, DBSCAN will discover spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration.
- HDBSCAN currently only supports spatial clustering and will not use time to discover clusters.
If the DBSCAN clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:
- FEAT_TIME—The original instant time of each feature.
- START_DATETIME—The start time of the time extent of the cluster to which a feature belongs.
- END_DATETIME—The end time of the time extent of the cluster to which a feature belongs. The resulting layer's time will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring in most cases all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
If the HDBSCAN clustering method is used, results will also include the following fields:
- PROB—The probability that a feature belongs in its assigned cluster.
- OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier.
- EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
- STABILITY—The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales.
The Minimum features per cluster parameter is used differently depending on the Clustering method chosen:
- Defined distance (DBSCAN)—Specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the Search distance parameter. When using time to find clusters, an additional search duration is required and is set using the Search duration parameter. When searching for cluster members, the specified minimum features per cluster must be found within the specified search distance and search duration to form a cluster. Note that the search distance and duration are not related to the diameter or time extent of the point clusters discovered.
- Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
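
The defined distance behavior described in the notes above can be illustrated with a minimal sketch of the textbook DBSCAN procedure in plain Python. This is not the tool's implementation. The function name `find_clusters`, the tuple layout of the points, the choice to count a point as its own neighbor, and the numbering of clusters from 1 are all assumptions made for illustration; only the use of -1 for noise comes from the documentation.

```python
from math import hypot

NOISE = -1  # the documented label for features treated as noise


def find_clusters(points, search_distance, min_features, search_duration=None):
    """Label points with a cluster ID using a textbook DBSCAN procedure.

    points: list of (x, y) tuples, or (x, y, t) tuples when search_duration is set.
    Assumption: a point counts as one of its own neighbors.
    """
    def neighbors(i):
        xi, yi = points[i][0], points[i][1]
        found = []
        for j, p in enumerate(points):
            if hypot(p[0] - xi, p[1] - yi) > search_distance:
                continue  # too far away in space
            if search_duration is not None and abs(p[2] - points[i][2]) > search_duration:
                continue  # too far away in time
            found.append(j)
        return found

    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_features:
            labels[i] = NOISE  # may later be claimed by a cluster as a border point
            continue
        cluster_id += 1  # point i is dense enough to start forming a cluster
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_features:
                queue.extend(j_nbrs)  # dense point: keep growing the cluster
    return labels


# Hypothetical coordinates: two tight groups and one isolated point.
households = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(find_clusters(households, search_distance=1.5, min_features=3))
```

In this sketch a cluster grows by chaining dense points together, which is why the search distance and duration do not bound the diameter or time extent of the resulting clusters.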

Parameters

The following are the parameters for the Find Point Clusters tool:


Parameter	Description	Data type
Input Layer	The point features from which to find point clusters.	Features
Clustering method	The clustering method used by the tool to determine point clusters. The two options are as follows: DBSCAN—Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the fastest of the clustering methods but is only appropriate if there is a clear distance that works well to define all clusters that may be present. This method results in clusters that have similar densities. This is the default. HDBSCAN—Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the most data-driven of the clustering methods and requires the least user input.	String
Minimum features per cluster	This parameter is used differently depending on the Clustering method chosen as follows: Defined distance (DBSCAN)—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the Search distance parameter. Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.	Int64
Use Time	Whether to use time in identification of point clusters. This option is only available for the DBSCAN clustering method.	Boolean
Search distance	The maximum distance to be considered. The specified Minimum features per cluster value must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.	Float64
Search duration	The time duration to be considered when time is used to find clusters. When searching for cluster members, the specified Minimum features per cluster value must be found within this time duration to form a cluster.	String
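
The dependencies between these parameters can be summarized in a small sketch. The class and attribute names below are hypothetical and do not correspond to any ArcGIS Velocity API. The sketch also assumes that Use Time is off unless requested and that Search distance is required for DBSCAN. The format of the Search duration string is not specified in this topic, so it is treated as an opaque value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FindPointClustersParams:
    """Hypothetical container for the tool parameters (not a real API)."""
    minimum_features_per_cluster: int
    clustering_method: str = "DBSCAN"      # DBSCAN is the documented default
    use_time: bool = False                 # assumption: off unless requested
    search_distance: Optional[float] = None
    search_duration: Optional[str] = None  # string format not specified here

    def problems(self) -> List[str]:
        """Return a list of rule violations; an empty list means consistent."""
        issues = []
        if self.clustering_method not in ("DBSCAN", "HDBSCAN"):
            issues.append("Clustering method must be DBSCAN or HDBSCAN.")
        if self.use_time and self.clustering_method != "DBSCAN":
            issues.append("Use Time is only available for DBSCAN.")
        if self.clustering_method == "DBSCAN" and self.search_distance is None:
            issues.append("DBSCAN needs a Search distance.")  # assumed rule
        if self.use_time and self.search_duration is None:
            issues.append("Using time needs a Search duration.")
        return issues


ok = FindPointClustersParams(5, search_distance=100.0)
bad = FindPointClustersParams(5, clustering_method="HDBSCAN", use_time=True)
print(ok.problems())
print(bad.problems())
```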

Output layer

The output layer generated will contain different fields depending upon the clustering method selected and whether time is used in the identification of point clusters.

Output fields added when the DBSCAN clustering method is chosen and time is utilized


Field name	Description	Field type
All input fields are retained	All input fields from the input dataset are retained.	any
CLUSTER_ID	The Cluster ID indicates which cluster each feature belongs to.	Int32
COLOR_ID	The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both the CLUSTER_ID and COLOR_ID fields, a value of -1 indicates a feature has been labeled as noise.	Int32
FEAT_TIME	The original instant time of each feature.	Date
START_DATETIME	The start time of the time extent of the cluster to which a feature belongs.	Date
END_DATETIME	The end time of the time extent of the cluster to which a feature belongs.	Date
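
As a rough illustration of how the three time fields relate, the sketch below derives START_DATETIME and END_DATETIME from FEAT_TIME and CLUSTER_ID. It assumes that a cluster's time extent runs from the earliest to the latest FEAT_TIME among its members, which this topic does not state explicitly. The function name and the sample records are hypothetical.

```python
from datetime import datetime


def add_cluster_time_extent(features):
    """Return copies of the features with START_DATETIME and END_DATETIME added.

    Assumption: a cluster's extent is the min and max FEAT_TIME of its members.
    """
    extents = {}
    for f in features:
        cid = f["CLUSTER_ID"]
        if cid == -1:
            continue  # noise features do not contribute to any cluster extent
        t = f["FEAT_TIME"]
        start, end = extents.get(cid, (t, t))
        extents[cid] = (min(start, t), max(end, t))

    result = []
    for f in features:
        t = f["FEAT_TIME"]
        # Noise features fall back to their own instant, as documented.
        start, end = extents.get(f["CLUSTER_ID"], (t, t))
        result.append({**f, "START_DATETIME": start, "END_DATETIME": end})
    return result


# Hypothetical records: two members of cluster 1 and one noise feature.
rows = [
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 5, 1, 8, 0)},
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 5, 1, 9, 30)},
    {"CLUSTER_ID": -1, "FEAT_TIME": datetime(2024, 5, 2, 12, 0)},
]
for row in add_cluster_time_extent(rows):
    print(row["CLUSTER_ID"], row["START_DATETIME"], row["END_DATETIME"])
```

Because every member of a cluster shares the same interval, a time slider that filters on that interval shows or hides the members together.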

Output fields added when the DBSCAN clustering method is chosen and time is not utilized


Field name	Description	Field type
All input fields are retained	All input fields from the input dataset are retained.	any
CLUSTER_ID	The Cluster ID indicates which cluster each feature belongs to.	Int32
COLOR_ID	The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both the CLUSTER_ID and COLOR_ID fields, a value of -1 indicates a feature has been labeled as noise.	Int32

Output fields added when the HDBSCAN clustering method is chosen


Field name	Description	Field type
All input fields are retained	All input fields from the input dataset are retained.	any
CLUSTER_ID	The Cluster ID indicates which cluster each feature belongs to.	Int32
COLOR_ID	The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both the CLUSTER_ID and COLOR_ID fields, a value of -1 indicates a feature has been labeled as noise.	Int32
PROB	The probability that a feature belongs in its assigned cluster.	Float64
STABILITY	The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales.	Float64
OUTLIER	The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier.	Float64
EXEMPLAR	Indicates which features are most representative of each cluster. These features are indicated by a value of 1.	Int32
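
One way to read these fields together is to summarize the output per cluster. The sketch below is only an illustration of working with exported records. The function name, the `min_prob` threshold, and the sample values are hypothetical, and the sketch assumes that every member of a cluster carries the same STABILITY value.

```python
from collections import defaultdict


def summarize_hdbscan_output(features, min_prob=0.5):
    """Count members, confident members, and exemplars per cluster.

    Returns (summary_by_cluster_id, noise_count).
    min_prob is a hypothetical threshold chosen for illustration.
    """
    summary = defaultdict(
        lambda: {"members": 0, "confident": 0, "exemplars": 0, "stability": None}
    )
    noise = 0
    for f in features:
        cid = f["CLUSTER_ID"]
        if cid == -1:
            noise += 1  # -1 marks features labeled as noise
            continue
        s = summary[cid]
        s["members"] += 1
        if f["PROB"] >= min_prob:
            s["confident"] += 1
        if f["EXEMPLAR"] == 1:
            s["exemplars"] += 1
        s["stability"] = f["STABILITY"]  # assumed identical across a cluster
    return dict(summary), noise


# Hypothetical output records.
records = [
    {"CLUSTER_ID": 1, "PROB": 1.0, "OUTLIER": 0.0, "EXEMPLAR": 1, "STABILITY": 2.5},
    {"CLUSTER_ID": 1, "PROB": 0.4, "OUTLIER": 0.6, "EXEMPLAR": 0, "STABILITY": 2.5},
    {"CLUSTER_ID": -1, "PROB": 0.0, "OUTLIER": 0.0, "EXEMPLAR": 0, "STABILITY": 0.0},
]
clusters, noise_count = summarize_hdbscan_output(records)
print(clusters, noise_count)
```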
