Find K-Means Clusters

Find K-Means Clusters finds natural clusters of features based on either location or attribute values using the K-Means algorithm. The algorithm works to classify the features so that the features within a cluster are as similar as possible, while the clusters are as different as possible.
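
This page does not say which K-Means variant the capability uses, so the following is only a minimal sketch of the standard iterative procedure (Lloyd's algorithm) in Python. The function name k_means and its parameters are illustrative assumptions and are not part of the product.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups.

    Illustrative sketch of Lloyd's algorithm, not the product's implementation.
    """
    rng = random.Random(seed)
    # Start from k input points chosen at random (an assumed initialization).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: the centroids no longer move.
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of three points each.
points = [(0, 0), (1, 2), (2, 1), (50, 50), (51, 52), (52, 51)]
labels, centroids = k_means(points, k=2)
```

For this input, the first three points end up sharing one label and the last three share the other. Which group receives label 0 depends on the random starting centroids.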

Examples

A nongovernmental organization collects data on abandoned fishing gear and other large offshore debris. The location of the debris can be analyzed to find clusters of debris, which can help the organization determine the main sources of the abandoned equipment and debris.

Customers of a retail location can be analyzed based on their demographic characteristics and buying patterns. Clusters based on properties such as disposable income and spending can be used to design a marketing strategy for the store.

Use the Find K-Means Clusters capability

Find K-Means Clusters can be run on map, chart, or table cards using point, line, or area features.

Complete the following steps to run the Find K-Means Clusters analysis capability:

  1. Click the card to activate it, if necessary.

    A card is active when the toolbar and Action button appear.

  2. Click the Action button and do one of the following:
    • For a map card, on the Spatial analysis tab, click Find K-Means Clusters.
    • For chart and table cards, click How is it distributed, and then click Find K-Means Clusters.
  3. For Choose a layer, select the layer for which you want to find clusters.
  4. For Analysis fields, choose one of the following options:
    • To run Find K-Means Clusters spatially, select a location field.
    • To run Find K-Means Clusters nonspatially, select one or more number fields.
  5. Expand Additional options and enter a value for the Number of clusters parameter, if necessary.
  6. Click Run.

Usage notes

The Choose a layer parameter is used to select the dataset in which to find clusters. The dataset can have point, line, or area features, or it can be a nonspatial table (available when the capability is run from a chart or table card).

The Analysis fields parameter is used to select the field or fields on which the clusters will be based. You can select either a single location field, in which case the clusters are based on geographic location, or one or more number or rate/ratio fields, in which case the clusters are based on similarity between attribute values. A combination of location and number or rate/ratio fields is not supported.
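
The two field choices amount to clustering on different feature vectors. The sketch below illustrates that idea in Python under stated assumptions: build_feature_vectors is a hypothetical helper, records are plain dictionaries, and the location field holds point coordinates. None of these names come from the product.

```python
def build_feature_vectors(records, location_field=None, number_fields=()):
    """Return one numeric tuple per record for clustering.

    Hypothetical helper: give a location field or number fields, never both.
    """
    if location_field and number_fields:
        raise ValueError("Choose a location field or number fields, not both.")
    if location_field:
        # Spatial run: cluster on the coordinates stored in the location field.
        return [tuple(r[location_field]) for r in records]
    if number_fields:
        # Nonspatial run: cluster on the selected attribute values.
        return [tuple(float(r[f]) for f in number_fields) for r in records]
    raise ValueError("Select a location field or at least one number field.")


# Made-up sample records for illustration only.
records = [
    {"location": (1.0, 2.0), "income": 40000, "spending": 1200},
    {"location": (3.5, 0.5), "income": 85000, "spending": 4100},
]
spatial = build_feature_vectors(records, location_field="location")
nonspatial = build_feature_vectors(records, number_fields=("income", "spending"))
```

Passing both a location field and number fields raises ValueError, mirroring the restriction described above.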

You can expand Additional options to reveal the Number of clusters parameter. If your analysis requires a specific number of clusters, enter that value in the Number of clusters parameter. If no value is entered, the number of clusters is calculated using the Davies-Bouldin index (Davies & Bouldin, 1979), which favors clusters whose members are similar to each other and different from the members of other clusters.
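
To make the index concrete, here is a minimal Python sketch of the Davies-Bouldin calculation for a labeled set of points, using Euclidean distance. Lower values indicate tighter, better-separated clusters. The function names are illustrative assumptions, and the way the product searches over candidate cluster counts is not documented on this page.

```python
import math


def centroid(members):
    """Mean position of a non-empty list of equal-length numeric tuples."""
    return tuple(sum(v) / len(members) for v in zip(*members))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labeling (needs two or more clusters
    with distinct centroids). Lower is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = sorted(clusters)
    centers = {c: centroid(clusters[c]) for c in ids}
    # Scatter: mean distance from each member to its own cluster centroid.
    scatter = {
        c: sum(math.dist(p, centers[c]) for p in clusters[c]) / len(clusters[c])
        for c in ids
    }
    total = 0.0
    for i in ids:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


points = [(0, 0), (1, 2), (2, 1), (50, 50), (51, 52), (52, 51)]
two_groups = [0, 0, 0, 1, 1, 1]    # matches the visible structure
three_groups = [0, 0, 1, 1, 2, 2]  # splits the data awkwardly
score_two = davies_bouldin(points, two_groups)
score_three = davies_bouldin(points, three_groups)
```

Here score_two is smaller than score_three, so a selection rule that keeps the candidate clustering with the lowest index would prefer two clusters for this data.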

References

Davies, D. L., & Bouldin, D. W. (1979). A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224-227. https://doi.org/10.1109/TPAMI.1979.4766909