Find Similar Locations

Available in big data analytics.

The Find Similar Locations tool identifies candidate features that are most similar or least similar to one or more reference features based on feature attributes.

Workflow diagram

Find Similar Locations workflow diagram

Examples

The following are example uses of the Find Similar Locations tool:

  • Of your production facilities, determine which are the most similar to your most productive facility based on the relationship between numeric attribute values.
  • A crime analyst wants to search a database of all crimes to determine whether a recent crime may be part of a larger pattern or trend.
  • Determine which other villages are at high risk of a disease based on the characteristics of the villages hit hardest by that disease.

Usage notes

Keep the following in mind when working with the Find Similar Locations tool:

  • Tabular, point, polyline, or polygon features can be used.
  • Search (candidate) features are required and will be ranked by similarity or dissimilarity to the reference locations.
  • A maximum of 10,000 search layer features will be returned.
  • If more than one feature exists in the Target layer (reference location features) parameter value, matching is based on averaged reference feature values. For example, if there are two reference features and one of the analysis fields is a population variable, the tool will search the Join layer (candidate search features) parameter value for candidates with populations that are similar to the average population value. If the population values are 100 and 102, for example, the tool will search for candidates with populations near 101.
    Note:

    If more than one feature exists in the Target layer (reference location features) parameter, choose Base similarity on (analysis fields) attributes whose values are similar across the reference features. If, for example, the population value for one of the features is 100 and the other is 100,000, the tool will search for matches with populations near the average of those two values: 50,050. This averaged value is far from the population value of either reference feature.

  • Use the Similarity parameter to search for features that are either most similar or least similar to the reference features using the Most similar or Least similar option, respectively. In some cases, you may want to see both. If the number of results specified for the Similarity parameter is 3 and the option is set to Most and least similar, for example, the tool will find the three most similar and the three least similar candidate features.
  • Any given solution match in the output is either most similar or least similar to the reference features; a single solution cannot be both, and solution matches are not duplicated in the output features. Consequently, when the Similarity parameter value is Most and least similar, the maximum possible number of results is half the number of features in the join layer.
  • The Match method parameter has two options:
    • Attribute values: The most similar candidates will have the smallest sum of squared differences across all analysis fields. All values are standardized before differences are calculated.
    • Attribute profiles: The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields named A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match method parameter is set to Attribute profiles, the tool will search for candidates with those same attribute relationships. Because this method finds relationships between attributes, you must specify a minimum of two analysis fields. You can use the cosine similarity method (the Attribute profiles option) to find places similar to Los Angeles but at a different scale, for example, by matching the profile of population compared to number of cars and number of residents less than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the output feature's cosimindex field.
  • The fields specified for the Base similarity on parameter must be numeric and must be present, with the same field name and field type, in both the reference location features (target layer) and candidate search features (join layer) datasets. If the tool does not find corresponding fields in the candidate search features (join layer), a validation warning appears indicating that identical field names must be present.
  • All of the attributes used for matching are written to the output. By default, all fields are added. Use the Append fields parameter to select only specific fields from the join layer to add to the output table.
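
The reference averaging and the two match methods described in the notes above can be sketched in plain Python. This is an illustrative sketch only, not the tool's implementation. All function names are hypothetical, and the z-score standardization (using the mean and standard deviation of the candidate values) is an assumption, since the notes state only that values are standardized.

```python
import math

def average_reference(references):
    """Average each analysis field across all reference features."""
    n = len(references)
    return [sum(column) / n for column in zip(*references)]

def standardize(candidates, target):
    """Z-score every field using the candidates' mean and standard deviation.

    Assumption: the notes say values are standardized, but not how.
    """
    columns = list(zip(*candidates))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 when a field is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    return [scale(row) for row in candidates], scale(target)

def sum_squared_differences(a, b):
    """Attribute values: a smaller result means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Attribute profiles: 1.0 is the same profile, -1.0 the opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two reference features with populations 100 and 102 are matched as 101.
print(average_reference([[100], [102]]))     # [101.0]
# Widely different references average to a value far from either one.
print(average_reference([[100], [100000]]))  # [50050.0]
```

For instance, `cosine_similarity([1, 2, 2], [10, 20, 20])` is approximately 1.0 because the two profiles have the same proportions at different magnitudes, whereas `sum_squared_differences` on the same values is large. The cosine function is applied to unstandardized vectors here only to illustrate that it ignores scale.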

Parameters

The following are the parameters for the Find Similar Locations tool:

Parameter | Description | Data type

Target layer (reference location features)

The target layer containing reference features. The reference features can be further reduced or filtered using the Reference locations expression or Reference locations extent parameters (below).

Features

Join layer (candidate search features)

The join layer containing search or candidate features. The tool will evaluate search features to find those with similar analysis field attribute values compared to the Target layer (reference features) parameter.

Features

Reference locations expression

An Arcade attribute expression that is evaluated to determine which target layer (reference) features to retain. The expression is configured in the Arcade expression builder, accessed by clicking Configure an Arcade Expression.

Each record is evaluated; records that evaluate to true are retained, and those that evaluate to false are discarded.

String (Arcade expression)

Reference locations extent

Optionally, provide an extent to filter reference locations. Only features within the spatial extent specified for this parameter will be retained as reference features.

The tool configuration includes an extent selector component that allows you to draw a reference location extent.

EsriJSON envelope

Base similarity on

Specifies one or more numeric attributes (analysis fields) of interest. The values of these attribute fields are first calculated for the Target layer (reference location features) parameter value.

Then, the features from the Join layer (candidate search features) parameter will be evaluated to determine which search features are most or least similar to the reference features.

String (Field names)

Similarity

Specifies whether the results returned from this tool will be most or least similar to the Target layer (reference location features) parameter value provided for the specified analysis fields.

The maximum number of results that can be returned is 10,000.

You can choose the Most similar, Least similar, or Most and least similar option to return features.

Integer and String

Match method

There are two match methods available:

  • Attribute values: The most similar candidates will have the smallest sum of squared differences across all analysis fields. All values are standardized before differences are calculated.
  • Attribute profiles: The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields named A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match method parameter value is Attribute profiles, the tool will search for candidates with those same attribute relationships. Because this method finds relationships between attributes, you must specify a minimum of two analysis fields. You can use the cosine similarity method (the Attribute profiles option) to find places similar to Los Angeles but at a different scale, for example, by matching the profile of population compared to number of cars and number of residents less than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the output feature's cosimindex field.

String

Append fields

All of the attributes used for matching are written to the output. By default, all fields are added. Use the Append fields parameter to select only specific fields from the Join layer (candidate search features) parameter value to add to the output table.

String (Field names)

Reference ID field

(optional)

The field that contains unique IDs in the Target layer (reference location features) schema.

If a field is not selected, the tool will generate unique IDs for the features.

String (Field name)

Candidate ID field

(optional)

The field that contains unique IDs in the Join layer (candidate search features) schema.

If a field is not selected, the tool will generate unique IDs for the features.

String (Field name)

Output layer

All of the features in the Target layer (reference location features) parameter and matches in the Join layer (candidate search features) parameter are written to the output features along with the attributes from the Base similarity on and Append fields parameters. In addition, the following fields are included in the output features:

Field name | Description | Notes

location_type

A string indicating whether each feature is a target (reference) feature or a join (candidate search) feature.

simrank

When you select Most similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1.

This field is only included in the output features when you choose Most similar or Most and least similar for the Similarity parameter.

dissimrank

When you select Least similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1.

This field is only included in the output features when you choose Least similar or Most and least similar for the Similarity parameter.

simindex

This field quantifies how similar each solution match is to the target reference features. When you specify Attribute values as the Match method parameter value, this value represents the sum of squared value differences.

This field is only included in the output features when you choose Attribute values for the Match method parameter.

cosimindex

This field quantifies how similar each solution match is to the target features. When you specify Attribute profiles for the Match method parameter, this value represents the cosine similarity.

This field is only included in the output features when you choose Attribute profiles for the Match method parameter.

labelrank

This field is for display purposes only. The tool uses this field to provide default rendering of the analysis results.

reference_id

A unique ID value for target reference features. Join search (candidate) features are given a null value.

If the Reference ID field parameter is not specified, a unique ID value will be generated for reference features.

search_id

A unique ID value for join search (candidate) features. Target reference features are given a null value.

If the Candidate ID field parameter is not specified, a unique ID value will be generated for candidate or search features.
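
As a rough illustration of how the simrank and dissimrank fields described above relate to each other, the following sketch ranks hypothetical candidates by a precomputed sum of squared differences (smaller means more similar). The function name `rank_matches`, the dictionary layout, the option strings, and the sample scores are assumptions made for illustration; only the field names and the rule that a candidate cannot be both most and least similar come from this page. It is not the tool's implementation.

```python
def rank_matches(scores, number_of_results, similarity="most"):
    """Assign simrank/dissimrank values to candidates.

    scores: {candidate_id: sum of squared differences}
    similarity: "most", "least", or "both" (hypothetical option names)
    """
    ordered = sorted(scores, key=scores.get)  # most similar first
    if similarity == "both":
        # A candidate cannot be both most and least similar,
        # so each group is capped at half the candidates.
        k = min(number_of_results, len(ordered) // 2)
    else:
        k = min(number_of_results, len(ordered))

    ranked = {}
    if similarity in ("most", "both"):
        for rank, cid in enumerate(ordered[:k], start=1):
            ranked.setdefault(cid, {})["simrank"] = rank
    if similarity in ("least", "both"):
        least_first = reversed(ordered[len(ordered) - k:])
        for rank, cid in enumerate(least_first, start=1):
            ranked.setdefault(cid, {})["dissimrank"] = rank
    return ranked

scores = {"a": 0.1, "b": 2.0, "c": 0.5, "d": 9.0, "e": 4.0}
# Requesting 3 of each from 5 candidates yields 2 of each.
print(rank_matches(scores, 3, "both"))
```

With these sample scores, candidates `a` and `c` receive simrank values 1 and 2, and candidates `d` and `e` receive dissimrank values 1 and 2; candidate `b` is not returned.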