Find Similar Locations—ArcGIS Velocity

Available in big data analytics.

The Find Similar Locations tool identifies candidate features that are most similar or least similar to one or more reference features based on feature attributes.

Workflow diagram

Examples

The following are example uses of the Find Similar Locations tool:

Determine which of your production facilities are the most similar to your most productive facility based on the relationship between numeric attribute values.
Perform a crime analysis by searching a database of all crimes to determine whether a recent crime may be part of a larger pattern or trend.
Determine other villages that are at high risk of a disease based on the characteristics of the villages hit hardest by that disease.

Usage notes

Keep the following in mind when working with the Find Similar Locations tool:

You can use tabular, point, polyline, or polygon features.
Search (candidate) features are required and are ranked by similarity or dissimilarity to the reference locations.
The tool returns at most 10,000 search layer features.
If the Target layer (reference location features) parameter value contains more than one feature, matching is based on the averaged values of the reference features. For example, if there are two reference features and one of the analysis fields is a population variable, the tool searches the Join layer (candidate search features) parameter value for features with populations similar to the average population value. If the population values are 100 and 102, the tool searches for candidates with populations near 101.
Note:
If the Target layer (reference location features) parameter value contains more than one feature, choose Base similarity on (analysis fields) attributes whose values are similar across the reference features. For example, if the population value for one reference feature is 100 and for the other is 100,000, the tool searches for matches with populations near the average of those two values: 50,050. This averaged value is far from the population value of either reference feature.
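The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the `reference_profile` function and the one-dictionary-per-feature layout are hypothetical, and only the arithmetic (a per-field mean across reference features) comes from the notes above.

```python
def reference_profile(reference_features, analysis_fields):
    """Average each analysis field across all reference features.

    Hypothetical helper: when several reference features are supplied,
    candidates are compared with these averaged values rather than with
    any single reference feature.
    """
    profile = {}
    for field in analysis_fields:
        values = [feature[field] for feature in reference_features]
        profile[field] = sum(values) / len(values)
    return profile


# Close values: the average (101) represents both reference features well.
close = reference_profile([{"population": 100}, {"population": 102}], ["population"])

# Very different values: the average (50,050) is far from both originals,
# so candidates near 50,050 resemble neither reference feature.
far = reference_profile([{"population": 100}, {"population": 100_000}], ["population"])
```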
Use the Similarity parameter to search for features that are most similar or least similar to the reference features using the Most similar or Least similar option, respectively. In some cases, you may want to see both. For example, if the number of results for the Similarity parameter is 3 and the similarity option is set to Most and least similar, the tool finds the three most similar and the three least similar candidate features.
Any given solution match in the output is either most similar or least similar to the reference features; a single solution cannot be both, and solution matches are not duplicated in the output features. Consequently, when the Similarity parameter value is Most and least similar, the maximum number of results is half the number of features in the join layer.
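A minimal Python sketch of why Most and least similar cannot exceed half the join layer, assuming the candidates have already been ranked from most to least similar. `split_most_least` is a hypothetical helper, and rounding down for an odd number of candidates is an assumption rather than documented behavior.

```python
def split_most_least(ranked_candidate_ids, requested):
    """Select the most and least similar candidates without overlap.

    ranked_candidate_ids is ordered from most similar to least similar.
    Capping the count at half the candidates (rounded down, an assumption)
    guarantees that no candidate appears in both groups.
    """
    n = min(requested, len(ranked_candidate_ids) // 2)
    most_similar = ranked_candidate_ids[:n]
    # Reverse so the least similar candidate comes first (dissimrank 1).
    least_similar = ranked_candidate_ids[::-1][:n]
    return most_similar, least_similar


# Five ranked candidates but three results requested: only two of each fit.
most, least = split_most_least(["a", "b", "c", "d", "e"], 3)
print(most, least)  # ['a', 'b'] ['e', 'd']
```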
Two options for the Match method parameter are as follows:
- Attribute values: The most similar candidates have the smallest sum of squared differences across all analysis fields. All values are standardized before differences are calculated.
- Attribute profiles: The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields named A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match method parameter is set to Attribute profiles, the tool searches for candidates with the same attribute relationships: A2 twice as large as A1, and A3 almost equal to A2. Because this method compares relationships between attributes, you must specify at least two analysis fields. You can use the cosine similarity method (the Attribute profiles option) to find places that are similar to Los Angeles but at a different scale, for example, by matching the profile of population compared to number of cars compared to number of residents younger than 20. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the output feature's cosimindex field.
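The two match methods can be sketched in plain Python to show how they differ. This is a minimal illustration under stated assumptions, not the tool's implementation: z-score standardization with the population standard deviation, computed over every feature passed in, is an assumption, and all function names are hypothetical. Only the scoring rules (smallest sum of squared differences versus cosine similarity in the range -1.0 to 1.0) come from the list above.

```python
import math
from statistics import mean, pstdev


def standardize(columns):
    """Convert each field's values to z-scores (mean 0, standard deviation 1).

    columns maps field name -> list of raw values, one per feature. The
    statistics are taken over every feature passed in (an assumption).
    """
    result = {}
    for field, values in columns.items():
        mu, sigma = mean(values), pstdev(values)
        # A constant field carries no information; map it to zeros.
        result[field] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return result


def sum_squared_differences(reference, candidate):
    """Attribute values: a smaller sum means a more similar candidate."""
    return sum((r - c) ** 2 for r, c in zip(reference, candidate))


def cosine_similarity(reference, candidate):
    """Attribute profiles: 1.0 is perfect similarity, -1.0 perfect dissimilarity."""
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm = math.hypot(*reference) * math.hypot(*candidate)
    return dot / norm if norm else 0.0


# Hypothetical data: the averaged reference values first, then two candidates.
columns = {"A1": [10.0, 20.0, 12.0], "A2": [20.0, 40.0, 30.0], "A3": [19.0, 39.0, 10.0]}
rows = list(zip(*standardize(columns).values()))  # one tuple of z-scores per feature
reference, candidates = rows[0], rows[1:]
ssd_scores = [sum_squared_differences(reference, c) for c in candidates]
cos_scores = [cosine_similarity(reference, c) for c in candidates]
```

Because cosine similarity depends only on the direction of the standardized vector, a candidate whose values are all scaled by the same factor as the reference scores 1.0 even though its magnitudes differ, which is what lets Attribute profiles match places "at a different scale".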
The fields specified for the Base similarity on parameter should be numeric and present, with the same field name and field type, in both the reference location features (target layer) and candidate search features (join layer) datasets. If the tool does not find corresponding fields in the candidate search (join layer) features, a validation warning appears indicating that identical field names must be present.
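The field requirements above can be expressed as a small validation check. This is a hypothetical sketch, not the tool's validator: the schema layout (field name mapped to a type string), the `NUMERIC_TYPES` names, and the warning texts are all assumptions made for illustration.

```python
# Hypothetical numeric type names; the actual type labels may differ.
NUMERIC_TYPES = {"Int32", "Int64", "Float32", "Float64"}


def validate_analysis_fields(analysis_fields, target_schema, join_schema):
    """Return warning messages for analysis fields that cannot be matched.

    Each schema maps field name -> field type string (a hypothetical layout).
    A field passes only if it exists in both schemas with the same numeric type.
    """
    warnings = []
    for field in analysis_fields:
        target_type = target_schema.get(field)
        join_type = join_schema.get(field)
        if target_type is None or join_type is None:
            warnings.append(f"{field}: identical field names must be present in both layers")
        elif target_type != join_type:
            warnings.append(f"{field}: field types differ ({target_type} vs {join_type})")
        elif target_type not in NUMERIC_TYPES:
            warnings.append(f"{field}: analysis fields must be numeric")
    return warnings
```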
All of the attributes used for matching are written to the output. Use the Append fields parameter to choose specific fields from the join layer to add to the output as well. By default, all fields are added.

Parameters

The following are the parameters for the Find Similar Locations tool:


Parameter	Description	Data type
Target layer (reference location features)	The target layer containing reference features. The reference features can be further reduced or filtered using the Reference locations expression or Reference locations extent parameters (below).	Features
Join layer (candidate search features)	The join layer containing search or candidate features. The tool evaluates search features to find those with similar analysis field attribute values compared to the Target layer (reference features) parameter.	Features
Reference locations expression	An Arcade expression that filters which target layer (reference) features are retained. The expression is configured in the Arcade expression builder, accessed by clicking Configure an Arcade Expression. Each record is evaluated; records that evaluate to true are retained, and records that evaluate to false are discarded.	String (Arcade expression)
Reference locations extent (Optional)	When used, only reference features within the spatial extent specified for this parameter are retained as reference features. The tool configuration includes an extent selector component for drawing a reference location extent.	Esri JSON envelope
Base similarity on (analysis fields)	Specifies one or more numeric attributes (analysis fields) of interest. The values of these fields are calculated for the Target layer (reference location features) parameter value, averaged across features when there is more than one reference feature. The features from the Join layer (candidate search features) parameter value are then evaluated to determine which search features are most or least similar to the reference features.	String (Field names)
Similarity	Specifies the number of results to return and whether those results are the most similar, least similar, or most and least similar to the Target layer (reference location features) parameter value for the specified analysis fields. The maximum number of results that can be returned is 10,000.	Integer and String
Match method	Specifies how similarity is measured. Two match methods are available. Attribute values: The most similar candidates have the smallest sum of squared differences across all analysis fields. All values are standardized before differences are calculated. Attribute profiles: The cosine similarity is measured. Cosine similarity searches for the same relationships among standardized attribute values rather than trying to match magnitudes. For example, suppose there are three analysis fields named A1, A2, and A3, where A2 is twice as large as A1 and A3 is almost equal to A2. If the Match method parameter value is Attribute profiles, the tool searches for candidates with the same attribute relationships: A2 twice as large as A1, and A3 almost equal to A2. Because this method compares relationships between attributes, you must specify at least two analysis fields. You can use the cosine similarity method (the Attribute profiles option) to find places that are similar to Los Angeles but at a different scale, for example, by matching the profile of population compared to number of cars compared to number of residents younger than 20. The cosine similarity index ranges from 1.0 (perfect similarity) to -1.0 (perfect dissimilarity) and is written to the output feature's cosimindex field.	String
Append fields	Specifies the fields to add to the output. By default, all fields are added. Use the Append fields parameter to choose specific fields from the Join layer (candidate search features) parameter value that you want to add.	String (Field names)
Reference Id field (Optional)	The field that contains unique IDs in the Target layer (reference location features) schema. If a field is not selected, the tool generates unique IDs for the features.	String (Field name)
Candidate Id field (Optional)	The field that contains unique IDs in the Join layer (candidate search features) schema. If a field is not selected, the tool generates unique IDs for the features.	String (Field name)
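The two reference filters (Reference locations expression and Reference locations extent) can be sketched in Python. This is a hypothetical illustration, not how Velocity evaluates them: a Python callable named `keep` stands in for the Arcade expression, features are assumed to be Esri JSON-style point features with `attributes` and `geometry` members, and the envelope is assumed to share the features' spatial reference (no reprojection is done).

```python
import json


def within_envelope(point, envelope):
    """True when an (x, y) point falls inside an Esri JSON envelope (edges inclusive)."""
    x, y = point
    return (envelope["xmin"] <= x <= envelope["xmax"]
            and envelope["ymin"] <= y <= envelope["ymax"])


def filter_reference_features(features, envelope_json=None, keep=None):
    """Retain reference features that pass an optional attribute filter and extent.

    keep is a hypothetical stand-in for the Arcade expression: it receives a
    feature's attributes and returns True to retain the feature. Only point
    geometries (x/y members) are handled in this sketch.
    """
    envelope = json.loads(envelope_json) if envelope_json else None
    retained = []
    for feature in features:
        if keep is not None and not keep(feature["attributes"]):
            continue  # Filtered out by the attribute expression.
        if envelope is not None:
            geometry = feature["geometry"]
            if not within_envelope((geometry["x"], geometry["y"]), envelope):
                continue  # Outside the reference locations extent.
        retained.append(feature)
    return retained
```

Applying the attribute filter before the extent check is an arbitrary choice here; because both conditions must hold for a feature to be retained, the order does not change the result.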

Output layer

All of the features in the target layer and matches in the join layer are written to the output features along with the attributes from the Base similarity on and Append fields parameters. In addition, the following fields are included in the output features:


Field name	Description
location_type	A string indicating whether a feature is a target reference feature or a join search (candidate) feature.
simrank	When you choose Most similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from most similar to least similar. The most similar solution match has a rank value of 1. Note: This field is only included in the output features when you choose Most similar or Most and least similar for the Similarity parameter.
dissimrank	When you choose Least similar or Most and least similar for the Similarity parameter, all of the solution matches are ranked from least similar to most similar. The solution that is least similar has a rank value of 1. Note: This field is only included in the output features when you choose Least similar or Most and least similar for the Similarity parameter.
simindex	This field quantifies how similar each solution match is to the target reference features. When you specify Attribute values as the Match method parameter value, this value represents the sum of squared value differences. Note: This field is only included in the output features when you choose Attribute values for the Match method parameter.
cosimindex	This field quantifies how similar each solution match is to the target features. When you specify Attribute profiles for the Match method parameter, this value represents the cosine similarity. Note: This field is only included in the output features when you choose Attribute profiles for the Match method parameter.
labelrank	This field is for display purposes only. The tool uses this field to provide default rendering of the analysis results.
reference_id	A unique ID value for target reference features. Join search (candidate) features are given a null value. If the Reference ID field parameter is not specified, a unique ID value is generated for reference features.
search_id	A unique ID value for join search (candidate) features. Target reference features are given a null value. If the Candidate ID field parameter is not specified, a unique ID value is generated for candidate or search features.
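The ID and rank fields in the table above can be illustrated with a short Python sketch that assembles output rows. Everything here is hypothetical: the `build_output_rows` helper, the `"reference"` and `"candidate"` values for location_type, and the sequential ID scheme used when no ID field is given are assumptions. Only the field names and the null and rank conventions come from the table.

```python
from itertools import count


def build_output_rows(reference_features, matches, rank_field="simrank",
                      reference_id_field=None, candidate_id_field=None):
    """Combine reference features and ranked matches into output rows.

    matches must already be ordered so that the first entry gets rank 1
    (most similar for "simrank", least similar for "dissimrank").
    The location_type values and the generated-ID scheme are hypothetical.
    """
    next_id = count(1)  # Shared counter so generated IDs never collide.
    rows = []
    for feature in reference_features:
        ref_id = feature[reference_id_field] if reference_id_field else next(next_id)
        rows.append({**feature, "location_type": "reference",
                     "reference_id": ref_id, "search_id": None})
    for rank, feature in enumerate(matches, start=1):
        search_id = feature[candidate_id_field] if candidate_id_field else next(next_id)
        rows.append({**feature, "location_type": "candidate",
                     "reference_id": None, "search_id": search_id,
                     rank_field: rank})
    return rows
```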
