Find K-Means Clusters finds natural clusters of features based on either location or attribute values using the K-Means algorithm. The algorithm classifies the features so that the features within a cluster are as similar as possible, while the clusters are as different as possible.
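The general K-Means procedure can be sketched in a few lines of Python. This is an illustration of the standard algorithm (Lloyd's iteration), not the tool's actual implementation. The function name `kmeans`, the `max_iter` parameter, and seeding the centroids with the first k points are assumptions made for this example; production implementations usually use randomized seeding.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of numeric tuples; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = list(points[:k])
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Two visibly separate groups of (x, y) locations, interleaved.
sample = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
labels, centroids = kmeans(sample, 2)
print(labels)
```

The same routine works for location-based clustering (tuples of coordinates) and attribute-based clustering (tuples of number field values), because it only depends on distances between tuples.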
Examples
The following are example scenarios using Find K-Means Clusters:
- A nongovernmental organization collects data on abandoned fishing gear and other large offshore debris. The debris locations can be analyzed to find clusters, which can help the organization determine the main sources of the abandoned equipment and debris.
- Customers of a retail location can be analyzed based on their demographic characteristics and buying patterns. Clusters based on properties such as disposable income and spending can be used to design a marketing strategy for the store.
Run Find K-Means Clusters
Find K-Means Clusters can be run on map, chart, or table cards using point, line, or area features.
Complete the following steps to find natural clusters:
- Click the map, chart, or table card to activate it if necessary.
  A card is active when the toolbar and Action button appear.
- Click the Action button and do one of the following:
  - For a map card, on the Spatial analysis tab, click Find K-Means Clusters.
  - For chart and table cards, click How is it distributed and click Find K-Means Clusters.
- For Choose a layer, select the layer for which you want to find clusters.
- For Analysis fields, choose one of the following options:
  - To run Find K-Means Clusters spatially, select a location field.
  - To run Find K-Means Clusters nonspatially, select one or more number fields.
- Expand Additional options and provide a value for the Number of clusters parameter if necessary.
- Click Run.
Usage notes
The Choose a layer parameter is used to select a dataset in which to find clusters. The dataset can have point, line, or area features, or it can be a nonspatial table (available when using the capability from a chart or table).
The Analysis fields parameter is used to select the field or fields on which the clusters will be based. You can select either a single location field, in which case the clusters are based on geographic location, or one or more number or rate/ratio fields, in which case the clusters are based on similarity between attribute values. A combination of location and number or rate/ratio fields is not supported.
You can expand Additional options to reveal the Number of clusters parameter. If the analysis requires a specific number of clusters, provide that value in the Number of clusters parameter. If no value is provided, the number of clusters is calculated using the Davies-Bouldin index (Davies and Bouldin 1979), which selects the number that best balances similarity within clusters against differences between clusters.
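To make the role of the Davies-Bouldin index concrete, the following sketch computes it for candidate clusterings and keeps the one with the lowest value. It follows the published definition: for each cluster, take the worst-case ratio of combined within-cluster scatter to the distance between centroids, then average those ratios. The function name `davies_bouldin` and the hand-written candidate labelings are assumptions for illustration; which candidate numbers of clusters the tool evaluates is not described here.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index for one labeling; lower values indicate
    tighter, better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = sorted(clusters)
    if len(ids) < 2:
        raise ValueError("the index needs at least two clusters")
    # Centroid of each cluster (mean of its members, per dimension).
    centroids = {
        c: tuple(sum(axis) / len(clusters[c]) for axis in zip(*clusters[c]))
        for c in ids
    }
    # Scatter: average distance from a cluster's members to its centroid.
    scatter = {
        c: sum(math.dist(p, centroids[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }
    # For each cluster, take the worst (largest) similarity to any other cluster.
    # Assumes no two centroids coincide, otherwise the ratio is undefined.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ids if j != i
        )
        for i in ids
    ]
    return sum(worst) / len(ids)

sample = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
# Hypothetical candidate labelings, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],  # the two natural groups
    3: [0, 0, 2, 1, 1, 1],  # one natural group split in two
}
best_k = min(candidates, key=lambda k: davies_bouldin(sample, candidates[k]))
print(best_k)
```

Splitting a natural group places two centroids close together, which inflates the ratio for those clusters, so the two-cluster labeling scores lower and is selected.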
Limitations
This tool is not supported for read-only connections to Google BigQuery, Snowflake, and database platforms that are not supported out of the box.
Cross filters, filter widgets, and temporal filter widgets can be applied to the results of Find K-Means Clusters, but the tool is not rerun each time a filter is changed.
References
Davies, David L., and Donald W. Bouldin. 1979. "A Cluster Separation Measure." IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1, no. 2 (April): 224-227. https://doi.org/10.1109/TPAMI.1979.4766909.