Find Point Clusters

Available in big data analytics.

The Find Point Clusters tool finds clusters of point features in surrounding noise based on their spatial or spatiotemporal distribution.

Workflow diagram

Find Point Clusters workflow diagram

Example

A nongovernmental organization is studying a particular pest-borne disease and has a point dataset representing households in a study area, some of which are infested, some of which are not. Using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.

Usage notes

  • The input for this tool is a single point layer.
  • All results include a CLUSTER_ID field, which indicates the cluster each feature belongs to, and a COLOR_ID field, which is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both fields, a value of -1 indicates that a feature has been labeled as noise.
  • The Clustering method parameter determines whether a defined distance (DBSCAN) or self-adjusting (HDBSCAN) clustering algorithm will be used. DBSCAN identifies clusters of points that are in close proximity based on a specified search range. HDBSCAN finds clusters in a similar way to DBSCAN but uses varying search ranges, which allows for clusters with varying densities based on cluster probability (or stability).
    • If DBSCAN is chosen, clusters can be found either in two-dimensional space only or in both space and time. When time is used, and the input layer is time enabled with a time type of instant, DBSCAN discovers spatiotemporal clusters of points that are in close proximity based on a specified search distance and search duration.
    • HDBSCAN currently only supports spatial clustering and will not use time to discover clusters.
  • If the DBSCAN clustering method is used with time to discover spatiotemporal clusters, results will also include the following fields:
    • FEAT_TIME—The original instant time of each feature.
    • START_DATETIME—The start time of the time extent of the cluster to which a feature belongs.
    • END_DATETIME—The end time of the time extent of the cluster to which a feature belongs. The resulting layer's time will be set as an interval on the START_DATETIME and END_DATETIME fields, ensuring in most cases all cluster members are drawn together when visualizing spatiotemporal clusters with a time slider. For noise features, START_DATETIME and END_DATETIME will be equal to FEAT_TIME.
  • If the HDBSCAN clustering method is used, results will also include the following fields:
    • PROB—The probability that a feature belongs in its assigned cluster.
    • OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier.
    • EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
    • STABILITY—The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales.
  • The Minimum features per cluster parameter is used differently depending on the Clustering method chosen:
    • Defined distance (DBSCAN)—Specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the Search distance parameter. When using time to find clusters, an additional search duration is required and is set using the Search duration parameter. When searching for cluster members, the specified minimum features per cluster must be found within the specified search distance and search duration to form a cluster. Note that the search distance and duration are not related to the diameter or time extent of the point clusters discovered.
    • Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
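The defined distance (DBSCAN) behavior described in the notes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names `region_query` and `dbscan` are hypothetical, the neighbor count is assumed to include the point itself, distances are planar, and cluster IDs are assumed to start at 1. Noise is labeled -1, matching the CLUSTER_ID convention.

```python
from math import hypot

NOISE = -1  # same value the tool uses to mark noise in CLUSTER_ID


def region_query(points, i, search_distance):
    """Return indices of all points within search_distance of points[i].

    Assumption: the point itself is included in the result.
    """
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= search_distance]


def dbscan(points, search_distance, min_features):
    """Label each point with a cluster ID (1, 2, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, search_distance)
        if len(neighbors) < min_features:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Enough neighbors: this point starts forming a new cluster.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, search_distance)
            if len(j_neighbors) >= min_features:
                queue.extend(j_neighbors)  # dense point: keep growing
    return labels


# Illustrative coordinates only (not real data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, search_distance=1.5, min_features=3)
print(labels)  # [1, 1, 1, 1, 2, 2, 2, -1]
```

Note that the search distance only controls how neighbors are found. A cluster grown this way can span a much larger area than the search distance, which is consistent with the note that the search distance is not related to the diameter of the clusters discovered.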

Parameters

Parameter | Description | Data type

Input Layer

The point features from which to find point clusters.

Features

Clustering method

The clustering method used by the tool to determine point clusters. The two options are as follows:

  • DBSCAN—Uses a specified distance to separate dense clusters from sparser noise. DBSCAN is the faster of the two clustering methods but is only appropriate if there is a clear distance that works well to define all clusters that may be present. This method results in clusters that have similar densities. This is the default.
  • HDBSCAN—Uses varying distances to separate clusters of varying densities from sparser noise. HDBSCAN is the more data-driven of the two clustering methods and requires less user input.

String

Minimum features per cluster

This parameter is used differently depending on the Clustering method chosen as follows:

  • Defined distance (DBSCAN)—Specifies the number of features that must be found within a certain distance of a point for that point to start to form a cluster. The distance is defined using the Search distance parameter.
  • Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.

Int64

Use Time

Specifies whether time is used to identify point clusters. This option is only available for the DBSCAN clustering method.

Boolean

Search distance

The maximum distance to be considered.

The number of features specified by the Minimum features per cluster parameter must be found within this distance for cluster membership. Individual clusters will be separated by at least this distance. If a feature is located farther than this distance from the next closest feature in the cluster, it will not be included in the cluster.

Float64

Search duration

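The time duration to search when looking for cluster members. The number of features specified by the Minimum features per cluster parameter must be found within this duration, as well as within the search distance, to form a cluster. This parameter applies only when time is used with the DBSCAN clustering method.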

String
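How the Use Time, Search distance, and Search duration parameters combine can be illustrated with a short Python sketch. It assumes that a feature counts as a neighbor only when it is within both the search distance and the search duration of the point being tested. The names `is_neighbor` and `count_neighbors` are hypothetical, the duration is modeled as a `timedelta` for convenience (the tool takes it as a string), and the sample values are illustrative only.

```python
from datetime import datetime, timedelta
from math import hypot


def is_neighbor(a, b, search_distance, search_duration):
    """True if features a and b are close in both space and time.

    Each feature is an (x, y, instant) tuple with a planar position.
    """
    (ax, ay, at), (bx, by, bt) = a, b
    close_in_space = hypot(ax - bx, ay - by) <= search_distance
    close_in_time = abs(at - bt) <= search_duration
    return close_in_space and close_in_time


def count_neighbors(features, i, search_distance, search_duration):
    """Count features near features[i], including the feature itself."""
    return sum(
        is_neighbor(features[i], other, search_distance, search_duration)
        for other in features
    )


features = [
    (0.0, 0.0, datetime(2024, 1, 1, 12, 0)),
    (1.0, 0.0, datetime(2024, 1, 1, 12, 30)),
    (0.0, 1.0, datetime(2024, 1, 2, 12, 0)),  # nearby, but a day later
]

within_hour = count_neighbors(features, 0, 2.0, timedelta(hours=1))
within_two_days = count_neighbors(features, 0, 2.0, timedelta(days=2))
print(within_hour, within_two_days)  # 2 3
```

With a minimum of 3 features per cluster, the first feature would start forming a cluster only under the longer search duration in this example.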

Output layer

The output layer generated will contain different fields depending upon the clustering method selected and whether time is used in the identification of point clusters.

Output fields added when the DBSCAN clustering method is chosen and time is used

Field name | Description | Field type

All input fields are retained

All input fields from the input dataset are retained.

any

CLUSTER_ID

The Cluster ID indicates which cluster each feature belongs to.

Int32

COLOR_ID

The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise.

Int32

FEAT_TIME

The original instant time of each feature.

Date

START_DATETIME

The start time of the time extent of the cluster to which a feature belongs.

Date

END_DATETIME

The end time of the time extent of the cluster to which a feature belongs.

Date
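The relationship between FEAT_TIME, START_DATETIME, and END_DATETIME can be sketched as follows. This is an illustration only: it assumes the time extent of a cluster runs from the earliest to the latest FEAT_TIME of its members, and the function name `add_time_fields` and the sample records are hypothetical. Noise features (CLUSTER_ID of -1) get a start and end equal to their own FEAT_TIME, as described in the usage notes.

```python
from datetime import datetime

NOISE = -1


def add_time_fields(records):
    """Return copies of records with START_DATETIME and END_DATETIME added.

    Each record is a dict with CLUSTER_ID and FEAT_TIME keys.
    """
    # First pass: earliest and latest member time for each cluster.
    extents = {}
    for rec in records:
        cid, t = rec["CLUSTER_ID"], rec["FEAT_TIME"]
        if cid == NOISE:
            continue
        start, end = extents.get(cid, (t, t))
        extents[cid] = (min(start, t), max(end, t))

    # Second pass: attach the cluster extent, or the feature's own
    # time when the feature is noise.
    result = []
    for rec in records:
        t = rec["FEAT_TIME"]
        start, end = extents.get(rec["CLUSTER_ID"], (t, t))
        result.append({**rec, "START_DATETIME": start, "END_DATETIME": end})
    return result


records = [
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 1, 1, 8)},
    {"CLUSTER_ID": 1, "FEAT_TIME": datetime(2024, 1, 1, 10)},
    {"CLUSTER_ID": -1, "FEAT_TIME": datetime(2024, 1, 3, 9)},
]
with_times = add_time_fields(records)
for rec in with_times:
    print(rec["CLUSTER_ID"], rec["START_DATETIME"], rec["END_DATETIME"])
```

Because every member of a cluster shares the same interval, all members appear together when the layer is viewed with a time slider.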

Output fields added when the DBSCAN clustering method is chosen and time is not used

Field name | Description | Field type

All input fields are retained

All input fields from the input dataset are retained.

any

CLUSTER_ID

The Cluster ID indicates which cluster each feature belongs to.

Int32

COLOR_ID

The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise.

Int32
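Once results are exported, the CLUSTER_ID field is enough to summarize the output. The sketch below assumes the CLUSTER_ID values have already been read into a Python list. The function name `summarize_clusters` and the sample values are hypothetical.

```python
from collections import Counter

NOISE = -1  # CLUSTER_ID value that marks noise features


def summarize_clusters(cluster_ids):
    """Return (features per cluster, number of noise features)."""
    sizes = Counter(cid for cid in cluster_ids if cid != NOISE)
    noise_count = sum(1 for cid in cluster_ids if cid == NOISE)
    return dict(sizes), noise_count


# Illustrative CLUSTER_ID values only.
sizes, noise_count = summarize_clusters([1, 1, 2, -1, 2, 2, -1])
print(sizes, noise_count)  # {1: 2, 2: 3} 2
```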

Output fields added when the HDBSCAN clustering method is chosen

Field name | Description | Field type

All input fields are retained

All input fields from the input dataset are retained.

any

CLUSTER_ID

The Cluster ID indicates which cluster each feature belongs to.

Int32

COLOR_ID

The Color ID is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters in most cases. For both CLUSTER_ID and COLOR_ID, a value of -1 indicates a feature has been labeled as noise.

Int32

PROB

The probability that a feature belongs in its assigned cluster.

Float64

STABILITY

The persistence of each cluster across a range of scales. A larger score indicates a cluster persists over a wider range of distance scales.

Float64

OUTLIER

The likelihood that a feature is an outlier within its own cluster. A larger value indicates the feature is more likely to be an outlier.

Float64

EXEMPLAR

Indicates which features are most representative of each cluster. These features are indicated by a value of 1.

Int32
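The HDBSCAN output fields can be used to review cluster quality after the tool runs. The sketch below assumes the output has been read into a list of dictionaries keyed by field name. The function names, the `min_prob` and `max_outlier` thresholds, and the sample values are all hypothetical and should be adjusted for the data at hand.

```python
NOISE = -1


def exemplars(records):
    """Features flagged as most representative of their cluster."""
    return [rec for rec in records if rec["EXEMPLAR"] == 1]


def weak_members(records, min_prob=0.5, max_outlier=0.8):
    """Clustered features with low PROB or high OUTLIER values.

    The default thresholds are arbitrary placeholders.
    """
    return [
        rec for rec in records
        if rec["CLUSTER_ID"] != NOISE
        and (rec["PROB"] < min_prob or rec["OUTLIER"] > max_outlier)
    ]


# Made-up records for illustration; not output from the tool.
records = [
    {"ID": "a", "CLUSTER_ID": 1, "PROB": 1.0, "OUTLIER": 0.0, "EXEMPLAR": 1},
    {"ID": "b", "CLUSTER_ID": 1, "PROB": 0.4, "OUTLIER": 0.9, "EXEMPLAR": 0},
    {"ID": "c", "CLUSTER_ID": -1, "PROB": 0.0, "OUTLIER": 1.0, "EXEMPLAR": 0},
]
exemplar_ids = [rec["ID"] for rec in exemplars(records)]
weak_ids = [rec["ID"] for rec in weak_members(records)]
print(exemplar_ids, weak_ids)  # ['a'] ['b']
```

Noise features are skipped in `weak_members` because they are not assigned to any cluster.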