Available in big data analytics.
The Find Point Clusters tool finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.
Example
A nongovernmental organization is studying a particular pest-borne disease and has a point dataset representing households in a study area, some of which are infested, some of which are not. Using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.
Usage notes
Keep the following in mind when working with the Find Point Clusters tool:
- The input for this tool is a single point layer.
- All results will include a field named CLUSTER_ID that indicates which cluster each feature belongs to and a field named COLOR_ID that is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both fields, a value of -1 indicates that a feature has been labeled as noise.
- The Clustering method parameter determines whether a defined distance (DBSCAN) or self-adjusting (HDBSCAN) clustering algorithm is used. DBSCAN identifies clusters of points that are in close proximity based on a specified search range. HDBSCAN finds clusters in a similar way but uses varying search ranges, which allows it to find clusters of varying densities based on cluster probability (or stability).
- If DBSCAN is chosen, clusters can be found either in two-dimensional space only or in both space and time. If you choose to use time and the input layer is time enabled with an instant time type, DBSCAN discovers spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration.
- HDBSCAN currently only supports spatial clustering and will not use time to discover clusters.
- If the DBSCAN clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:
  - FEAT_TIME—The original instant time of each feature.
  - START_DATETIME—The start time of the time extent of the cluster to which a feature belongs.
  - END_DATETIME—The end time of the time extent of the cluster to which a feature belongs. The resulting layer's time will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring in most cases that all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
- If the HDBSCAN clustering method is used, results will also include the following fields:
  - PROB—The probability that a feature belongs in its assigned cluster.
  - OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier.
  - EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
  - STABILITY—The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales.
- The Minimum features per cluster parameter is used differently depending on the Clustering method chosen:
  - Defined distance (DBSCAN)—Specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the Search distance parameter. When using time to find clusters, an additional search duration is required and is set using the Search duration parameter. When searching for cluster members, the specified minimum features per cluster must be found within the specified search distance and search duration to form a cluster. Note that the search distance and duration are not related to the diameter or time extent of the point clusters discovered.
  - Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
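The two interpretations of Minimum features per cluster described above can be illustrated with a small, plain-Python sketch. This is not the tool's implementation: it assumes planar Euclidean coordinates, treats times as plain numbers (for example, seconds), and assumes a point counts itself toward the minimum. The function names `dbscan` and `core_distance` are hypothetical, and the cluster IDs produced here start at 0, which may not match the tool's CLUSTER_ID numbering. `dbscan` shows the defined distance behavior (with an optional search duration), and `core_distance` shows how a self-adjusting method uses the same number as a neighbor count when estimating density.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _neighbors(i: int, pts: Sequence[Point], times: Optional[Sequence[float]],
               dist: float, dur: float) -> List[int]:
    """Indices of points within `dist` of point i (and within `dur` in time,
    when times are given). Point i itself is included (an assumption)."""
    found = []
    for j, p in enumerate(pts):
        if math.dist(pts[i], p) > dist:
            continue
        if times is not None and abs(times[i] - times[j]) > dur:
            continue
        found.append(j)
    return found


def dbscan(pts: Sequence[Point], min_features: int, dist: float,
           times: Optional[Sequence[float]] = None, dur: float = 0.0) -> List[int]:
    """Label each point with a 0-based cluster ID, or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(pts)  # None = not yet visited
    next_id = 0
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        seeds = _neighbors(i, pts, times, dist, dur)
        if len(seeds) < min_features:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cid = next_id
        next_id += 1
        labels[i] = cid
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cid
            nbrs = _neighbors(j, pts, times, dist, dur)
            if len(nbrs) >= min_features:
                queue.extend(nbrs)  # j is a core point: keep growing the cluster
    return [-1 if lab is None else lab for lab in labels]


def core_distance(i: int, pts: Sequence[Point], min_features: int) -> float:
    """HDBSCAN-style core distance: distance from point i to its
    min_features-th nearest point, counting point i itself as the first."""
    dists = sorted(math.dist(pts[i], p) for p in pts)
    return dists[min_features - 1]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, min_features=3, dist=1.5))  # [0, 0, 0, 1, 1, 1, -1]
# With time, the third point is 100 time units away from its spatial neighbors,
# so the first group no longer reaches the minimum and becomes noise.
print(dbscan(pts, 3, 1.5, times=[0, 0, 100, 0, 1, 2, 0], dur=5))  # [-1, -1, -1, 0, 0, 0, -1]
```

The spatiotemporal variant simply adds a second condition to the neighbor test, which mirrors the note that the minimum must be met within both the search distance and the search duration.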
Parameters
The following are the parameters for the Find Point Clusters tool:
Parameter | Description | Data type |
---|---|---|
Input Layer | The point features from which to find point clusters. | Features |
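Clustering method | The clustering method used by the tool to determine point clusters. The two options are as follows: Defined distance (DBSCAN), which finds clusters of points in close proximity based on a specified search distance (and, optionally, a search duration); and Self-adjusting (HDBSCAN), which uses varying search ranges, allowing for clusters of varying densities. | String |
Minimum features per cluster | This parameter is used differently depending on the Clustering method chosen as follows: for Defined distance (DBSCAN), it is the number of features that must be found within the search distance (and search duration, if time is used) of a point for that point to start forming a cluster; for Self-adjusting (HDBSCAN), it is the number of features neighboring each point (including the point itself) considered when estimating density, and also the minimum cluster size allowed. | Int64 |
Use Time | Specifies whether time is used to identify point clusters. This option is only available for the DBSCAN clustering method. | Boolean |
Search distance | The maximum distance to be considered when using the DBSCAN clustering method. The Minimum features per cluster specified must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster. | Float64 |
Search duration | The maximum time duration to be considered when using time with the DBSCAN clustering method. When searching for cluster members, the Minimum features per cluster specified must be found within this duration, as well as within the search distance, to form a cluster. | String |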
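The dependencies between these parameters can be summarized as a few validation rules. The sketch below is a hypothetical helper, not part of any ArcGIS API: the `FindPointClustersParams` class, its field names, and the `validate` function are illustrative only. The rules come from the notes above (Use Time only with DBSCAN, a search distance for DBSCAN, a search duration when time is used); the positive lower bounds are assumptions, and the duration string is not parsed because its format is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FindPointClustersParams:
    """Hypothetical container mirroring the parameter table; not a real API."""
    clustering_method: str                 # "DBSCAN" or "HDBSCAN"
    min_features_per_cluster: int
    use_time: bool = False
    search_distance: Optional[float] = None
    search_duration: Optional[str] = None  # a String in the table; format unspecified


def validate(p: FindPointClustersParams) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    errors = []
    if p.clustering_method not in ("DBSCAN", "HDBSCAN"):
        errors.append("Clustering method must be DBSCAN or HDBSCAN")
    if p.min_features_per_cluster < 1:  # assumed lower bound
        errors.append("Minimum features per cluster must be at least 1")
    if p.use_time and p.clustering_method != "DBSCAN":
        errors.append("Use Time is only available for the DBSCAN clustering method")
    if p.clustering_method == "DBSCAN" and not (p.search_distance and p.search_distance > 0):
        errors.append("DBSCAN requires a positive Search distance")  # positivity assumed
    if p.use_time and not p.search_duration:
        errors.append("Search duration is required when Use Time is enabled")
    return errors


print(validate(FindPointClustersParams("HDBSCAN", 5, use_time=True)))
```

For the HDBSCAN call above, the helper reports both that time is unsupported for that method and that no search duration was given.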
Output layer
The output layer generated will contain different fields depending upon the clustering method selected and whether time is used in the identification of point clusters.
Output fields added when the DBSCAN clustering method is chosen and time is used
Field name | Description | Field type |
---|---|---|
All input fields are retained | All input fields from the input dataset are retained. | any |
CLUSTER_ID | The Cluster ID indicates which cluster each feature belongs to. | Int32 |
COLOR_ID | The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise. | Int32 |
FEAT_TIME | The original instant time of each feature. | Date |
START_DATETIME | The start time of the time extent of the cluster to which a feature belongs. | Date |
END_DATETIME | The end time of the time extent of the cluster to which a feature belongs. | Date |
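How START_DATETIME and END_DATETIME relate to CLUSTER_ID and FEAT_TIME can be sketched as follows. This assumes the time extent of a cluster runs from the earliest to the latest FEAT_TIME among its members, which is an interpretation rather than a documented formula, and it does not model the "in most cases" exceptions. Features are represented as plain dictionaries with made-up sample times; `add_cluster_time_extent` and `visible_at` are hypothetical helpers.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def add_cluster_time_extent(features: List[dict]) -> List[dict]:
    """Fill START_DATETIME/END_DATETIME from CLUSTER_ID and FEAT_TIME (in place)."""
    extents: Dict[int, Tuple[datetime, datetime]] = {}
    for f in features:
        cid, t = f["CLUSTER_ID"], f["FEAT_TIME"]
        if cid == -1:
            continue  # noise has no cluster extent
        lo, hi = extents.get(cid, (t, t))
        extents[cid] = (min(lo, t), max(hi, t))
    for f in features:
        if f["CLUSTER_ID"] == -1:
            # Noise: the interval collapses to the feature's own instant
            f["START_DATETIME"] = f["END_DATETIME"] = f["FEAT_TIME"]
        else:
            f["START_DATETIME"], f["END_DATETIME"] = extents[f["CLUSTER_ID"]]
    return features


def visible_at(f: dict, t: datetime) -> bool:
    """A feature drawn on its START/END interval is visible when t falls inside it."""
    return f["START_DATETIME"] <= t <= f["END_DATETIME"]


feats = add_cluster_time_extent([
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 5, 1, 8)},
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 5, 1, 10)},
    {"CLUSTER_ID": -1, "FEAT_TIME": datetime(2024, 5, 2, 9)},
])
# At 08:30 both cluster members are visible, even though one has FEAT_TIME 10:00.
print([visible_at(f, datetime(2024, 5, 1, 8, 30)) for f in feats])  # [True, True, False]
```

Drawing on the shared interval rather than on FEAT_TIME is what keeps all members of a cluster on screen together as a time slider moves.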
Output fields added when the DBSCAN clustering method is chosen and time is not used
Field name | Description | Field type |
---|---|---|
All input fields are retained | All input fields from the input dataset are retained. | any |
CLUSTER_ID | The Cluster ID indicates which cluster each feature belongs to. | Int32 |
COLOR_ID | The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise. | Int32 |
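A common first step with any of these outputs is to count features per cluster and set the noise aside, for example to see how many infested households fall in each cluster in the scenario described earlier. The sketch below groups by CLUSTER_ID rather than COLOR_ID, since COLOR_ID is described only as a drawing label that keeps neighboring clusters distinct. The `summarize_clusters` helper and the sample IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_clusters(cluster_ids: Iterable[int]) -> Tuple[Dict[int, int], int]:
    """Return ({cluster_id: feature_count}, noise_count) from CLUSTER_ID values."""
    counts = Counter(cluster_ids)
    noise = counts.pop(-1, 0)  # -1 marks noise, not a cluster
    return dict(sorted(counts.items())), noise


sizes, noise = summarize_clusters([1, 1, 2, -1, 2, 2, -1])
print(sizes, noise)  # {1: 2, 2: 3} 2
```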
Output fields added when the HDBSCAN clustering method is chosen
Field name | Description | Field type |
---|---|---|
All input fields are retained | All input fields from the input dataset are retained. | any |
CLUSTER_ID | The Cluster ID indicates which cluster each feature belongs to. | Int32 |
COLOR_ID | The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise. | Int32 |
PROB | The probability that a feature belongs in its assigned cluster. | Float64 |
STABILITY | The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales. | Float64 |
OUTLIER | The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier. | Float64 |
EXEMPLAR | Indicates which features are most representative of each cluster. These features are indicated by a value of 1. | Int32 |
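The HDBSCAN-specific fields lend themselves to a quick report: rank clusters by STABILITY, list the EXEMPLAR features of each cluster, and flag members with a high OUTLIER value for review. This is a minimal sketch over plain dictionaries with made-up values. It assumes STABILITY is repeated with the same value on every member of a cluster, and the 0.8 outlier threshold is an arbitrary example, not a recommended setting. `hdbscan_report` and the `OID` key are hypothetical.

```python
from typing import Dict, List, Tuple


def hdbscan_report(features: List[dict], outlier_threshold: float = 0.8
                   ) -> Tuple[List[int], Dict[int, List[dict]], List[dict]]:
    """Rank clusters by STABILITY, collect EXEMPLAR features, flag high-OUTLIER members."""
    stability: Dict[int, float] = {}
    exemplars: Dict[int, List[dict]] = {}
    flagged: List[dict] = []
    for f in features:
        cid = f["CLUSTER_ID"]
        if cid == -1:
            continue  # skip noise features
        stability[cid] = f["STABILITY"]  # assumed identical for all members of a cluster
        if f["EXEMPLAR"] == 1:
            exemplars.setdefault(cid, []).append(f)
        if f["OUTLIER"] >= outlier_threshold:
            flagged.append(f)
    # Larger STABILITY = cluster persists over a wider range of distance scales
    ranked = sorted(stability, key=lambda c: stability[c], reverse=True)
    return ranked, exemplars, flagged


rows = [
    {"OID": 1, "CLUSTER_ID": 1, "OUTLIER": 0.05, "EXEMPLAR": 1, "STABILITY": 0.4},
    {"OID": 2, "CLUSTER_ID": 1, "OUTLIER": 0.9, "EXEMPLAR": 0, "STABILITY": 0.4},
    {"OID": 3, "CLUSTER_ID": 2, "OUTLIER": 0.1, "EXEMPLAR": 1, "STABILITY": 1.3},
    {"OID": 4, "CLUSTER_ID": -1, "OUTLIER": 1.0, "EXEMPLAR": 0, "STABILITY": 0.0},
]
ranked, ex, flagged = hdbscan_report(rows)
print(ranked, [f["OID"] for f in flagged])  # [2, 1] [2]
```

Noise features are skipped before the outlier check, so the flagged list only contains cluster members that look unusual within their own cluster.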