Find Similar Locations

The Find Similar Locations tool identifies candidate features that are most similar or least similar to one or more reference features based on feature attributes.

Workflow diagram

Find Similar Locations workflow diagram

Examples

  • Determine which of your production facilities are most similar to your most productive facility, based on the relationship between numeric attribute values.
  • A crime analyst wants to search a database of all crimes to see if a recent crime may be part of a larger pattern or trend.
  • Determine other villages that are at a high risk of a disease based on characteristics of the villages hit hardest by a disease.

Usage notes

  • Tabular, point, polyline, or polygon features can be used.
  • Search (candidate) features are required and will be ranked by similarity or dissimilarity to the reference locations.
  • A maximum of 10,000 search layer features will be returned.
  • If more than one feature exists in the Target layer (reference features), matching is based on averaged reference feature values. For example, if there are two reference features and one of the analysis fields is a population variable, the tool will search the Join layer (search or candidate features) for features with populations similar to the average population value. If the population values are 100 and 102, for example, the tool will search for candidates with populations near 101.
    Note:

    If more than one feature exists in the Target layer (reference features), choose Base similarity on (analysis fields) attributes that have similar values across the reference features. If, for example, the population value is 100 for one reference feature and 100,000 for the other, the tool will search for matches with populations near the average of those two values: 50,050. This averaged value is far from the population value of either reference feature.

  • Use the Similarity (number of results and most or least similar) parameter to search for features that are either most similar or least similar to the reference features using the Most similar or Least similar option, respectively. In some cases, you may want to see both. If the Similarity Number of Results parameter value is 3 and the Similarity parameter value is set to both, for example, the tool will find the three most similar and the three least similar candidate features.
  • Any given solution match in the output from the tool will be either a solution that is most similar or a solution that is least similar to the reference features; a single solution cannot be both (and solution matches won't be duplicated in the output features). Consequently, when the Similarity parameter value is Both, the maximum possible number of results is half the number of features in the Join layer.
  • There are two Match methods available:
    • Attribute values—The most similar candidates will have the smallest sum of squared differences for all analysis fields attributes. All values are standardized before differences are calculated.
    • Attribute profiles—The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields called A1, A2, and A3. A2 is twice as large as A1, and A3 is almost equal to A2. If the Match method parameter is set to Attribute profiles, the tool will search for candidates with those same attribute relationships: A2 is twice as large as A1, and A3 is almost equal to A2. Because this method finds relationships between attributes, you must specify a minimum of two analysis fields. You could use the cosine similarity method (the Attribute profiles option) to find places similar to Los Angeles but at a different scale, for example, places with the same profile of population compared to number of cars and number of residents less than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity). The cosine similarity index is written to the output features cosimindex field.
  • The fields specified for the Base similarity on (analysis fields) parameter must be numeric and must be present, with the same field name and field type, in both the Target layer (reference features) and the Join layer (search or candidate features). If the tool does not find corresponding fields in the Join layer, a validation warning appears indicating that identical field names must be present.
  • All of the attributes used for matching are written to the output. The Append fields parameter allows you to specify fields to add to the output table. By default, all fields are added. Use the Append fields parameter to select specific fields from the Join layer that you want to add.
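The averaging of reference features and the two match methods described in the notes above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function names and sample data are hypothetical, and the standardization step is assumed to be a z-score computed over the averaged reference and the candidates together, since the exact formula is not given here.

```python
import math
from statistics import mean, pstdev


def average_reference(reference_rows):
    """Average each analysis field across all reference features."""
    return [mean(column) for column in zip(*reference_rows)]


def standardize(rows):
    """Z-score every column (an assumed form of standardization)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 so a constant field does not divide by zero.
    spreads = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, spreads)] for row in rows]


def sum_squared_differences(a, b):
    """Attribute values method: a smaller value means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Attribute profiles method: 1.0 is the same profile, -1.0 the opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical data with two analysis fields: (population, cars).
reference_rows = [[100.0, 40.0], [102.0, 44.0]]
candidates = {"a": [101.0, 41.0], "b": [500.0, 210.0], "c": [90.0, 80.0]}

reference = average_reference(reference_rows)  # [101.0, 42.0]

# Standardize the averaged reference together with the candidates.
names = list(candidates)
scaled = standardize([reference] + [candidates[n] for n in names])
scaled_reference, scaled_candidates = scaled[0], dict(zip(names, scaled[1:]))

ssd = {n: sum_squared_differences(scaled_reference, v)
       for n, v in scaled_candidates.items()}
most_similar_first = sorted(ssd, key=ssd.get)

print(reference)
print(most_similar_first)
```

Calling cosine_similarity directly on two vectors with the same proportions at different scales, such as [1, 2, 2] and [2, 4, 4], gives 1.0, which is the sense in which Attribute profiles matches relationships rather than magnitudes.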

Parameters

Parameter | Description | Data Type

Target layer (reference location features)

The target layer containing reference features. The reference features can be further filtered by using the Reference locations expression or Reference locations extent parameters (below).

Features

Join layer (candidate search features)

The join layer containing search or candidate features. The tool will evaluate search features to find those with similar analysis field attribute values compared to the Target layer (reference feature(s)).

Features

Reference locations expression

An Arcade attribute expression that filters the target layer (reference features). The expression is configured in the Arcade expression builder, accessed by clicking Configure an Arcade Expression.

Each record is evaluated; records that evaluate to true are retained, and records that evaluate to false are discarded.

String (Arcade expression)

Reference locations extent

Optionally, provide an extent to filter reference locations. Only reference features within the extent specified for this parameter will be retained.

In the tool configuration, there is an extent selector component to allow drawing of a reference location extent.

EsriJSON envelope

Base similarity on

Specifies one or more numeric attributes (analysis fields) of interest. The values present for these attribute fields will be calculated for the Target layer (reference features).

Then, the Join layer (search or candidate) features will be evaluated to determine which search features are most or least similar to the reference features.

String (Field names)

Similarity

Determines whether the results returned from this tool should be most or least similar to the Target layer (reference features) provided for the specified analysis fields.

The maximum number of results that can be returned is 10,000.

You can return Most similar, Least similar, or Most and least similar features.

Integer and String

Match method

There are two match methods available:

  • Attribute values—The most similar candidates will have the smallest sum of squared differences for all Analysis Fields attributes. All values are standardized before differences are calculated.
  • Attribute profiles—The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields called A1, A2, and A3. A2 is twice as large as A1, and A3 is almost equal to A2. If the Match method parameter value is Attribute profiles, the tool will search for candidates with those same attribute relationships: A2 is twice as large as A1, and A3 is almost equal to A2. Because this method finds relationships between attributes, you must specify a minimum of two analysis fields. You could use the cosine similarity method (the Attribute profiles option) to find places similar to Los Angeles but at a different scale, for example, places with the same profile of population compared to number of cars and number of residents less than 20 years old. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity). The cosine similarity index is written to the output features cosimindex field.

String

Append fields

All of the attributes used for matching are written to the output. The Append fields parameter allows you to choose which additional fields from the Join layer are added to the output table. By default, all fields are added.

String (Field names)

Reference ID field

(optional)

Specifies which field contains unique IDs in the Target layer schema.

If a field is not selected, the tool will generate unique IDs for the features.

String (Field name)

Candidate ID field

(optional)

Specifies which field contains unique IDs in the Join layer schema.

If a field is not selected, the tool will generate unique IDs for the features.

String (Field name)
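As an illustration of the Reference locations extent parameter, the sketch below filters reference points against an envelope written in EsriJSON form (xmin, ymin, xmax, ymax, and a spatialReference). It rests on several assumptions: the coordinates and feature IDs are hypothetical, the reference features are points in the same spatial reference as the envelope, and points on the boundary are treated as inside. How the tool itself treats boundary cases, lines, or polygons is not stated here.

```python
def within_envelope(x, y, envelope):
    """Return True when the point lies inside or on the envelope boundary."""
    return (envelope["xmin"] <= x <= envelope["xmax"]
            and envelope["ymin"] <= y <= envelope["ymax"])


# EsriJSON-style envelope; hypothetical coordinates in WGS 84 (wkid 4326).
extent = {
    "xmin": -118.7, "ymin": 33.7, "xmax": -117.6, "ymax": 34.4,
    "spatialReference": {"wkid": 4326},
}

# Hypothetical reference features as (id, longitude, latitude).
reference_points = [("r1", -118.24, 34.05), ("r2", -122.42, 37.77)]

# Keep only the reference features that fall within the extent.
retained = [fid for fid, x, y in reference_points if within_envelope(x, y, extent)]
print(retained)
```

Here only r1 falls inside the extent, so it would be the sole reference feature used for matching.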

Output layer

All of the features in the Target layer (reference features) and matches in the Join layer (search or candidate features) are written to the output features along with the attributes from the Base similarity on (analysis fields) and Append fields parameters. In addition, the following fields are included in the output features.

Field name | Description | Notes

location_type

A string indicating whether a feature is a target (reference) feature or a join (search or candidate) feature.

simrank

When you select Most similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1.

This field is only included in the output features when you choose Most similar or Most and least similar for the Similarity parameter.

dissimrank

When you select Least similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1.

This field is only included in the output features when you choose Least similar or Most and least similar for the Similarity parameter.

simindex

This field quantifies how similar each solution match is to the target reference feature(s). When you specify Attribute values as the Match method parameter value, this value represents the sum of squared value differences.

This field is only included in the output features when you choose Attribute values as the Match method.

cosimindex

This field quantifies how similar each solution match is to the target feature(s). When you specify Attribute profiles in the Match method parameter, this value represents the cosine similarity.

This field is only included in the output features when you choose Attribute profiles as the Match method.

labelrank

This field is for display purposes only. The tool uses this field to provide default rendering of the analysis results.

reference_id

A unique ID value for target reference features. Join search (candidate) features are given a null value.

If the Reference ID field parameter is not specified, a unique ID value will be generated for reference features.

search_id

A unique ID value for join search (candidate) features. Target reference features are given a null value.

If the Candidate ID field parameter is not specified, a unique ID value will be generated for candidate/search features.
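The relationship between the Similarity parameter and the simrank and dissimrank fields can be sketched as follows. This is a hedged illustration based on the field descriptions above, not the tool's code: rank_matches and its mode values are hypothetical names, and the input is assumed to be a sum-of-squared-differences simindex, where a smaller value means more similar. For cosine similarity the sort order would be reversed.

```python
def rank_matches(simindex, number_of_results, mode="most"):
    """Select matches and rank them.

    simindex: dict mapping candidate ID to a sum of squared differences.
    mode: "most", "least", or "both" (most and least similar).
    """
    ordered = sorted(simindex, key=simindex.get)  # most similar first
    n = min(number_of_results, len(ordered))
    if mode == "most":
        matches = ordered[:n]
    elif mode == "least":
        matches = ordered[len(ordered) - n:]
    elif mode == "both":
        # A candidate cannot be both, so cap at half of the candidates.
        n = min(n, len(ordered) // 2)
        matches = ordered[:n] + ordered[len(ordered) - n:]
    else:
        raise ValueError(f"unknown mode: {mode}")

    rows = {cid: {"simindex": simindex[cid]} for cid in matches}
    if mode in ("most", "both"):
        # Rank 1 is the most similar match.
        for rank, cid in enumerate(matches, start=1):
            rows[cid]["simrank"] = rank
    if mode in ("least", "both"):
        # Rank 1 is the least similar match.
        for rank, cid in enumerate(reversed(matches), start=1):
            rows[cid]["dissimrank"] = rank
    return rows


# Hypothetical simindex values for five candidates.
scores = {"a": 0.1, "b": 4.0, "c": 0.5, "d": 9.0, "e": 2.0}

both = rank_matches(scores, 3, mode="both")
most = rank_matches(scores, 2, mode="most")
print(both)
print(most)
```

With five candidates and three results requested in "both" mode, the cap reduces the request to two most similar and two least similar matches, so no candidate appears twice.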