Skip To Content

Join

The Join tool joins datasets based on the specified relationships. Datasets can be joined using matching attributes, spatial relationships, temporal relationships, or any combination of the three.

Examples

The Join tool can be used in scenarios such as the following:

  • Join county polygons with county census data based on matching countyID field values.
  • Join records that occur at a relative time and place using both a temporal relationship and a spatial relationship.

Parameters

The following table outlines the parameters used in the Join tool:

ParameterDescription

Target dataset

The dataset that will have records joined to it.

Join dataset

The dataset that will be joined to the target dataset.

Join operation

Specifies whether all the matching records will be joined (one to many) or whether all the matching records will be summarized together (one to one).

Join type

Specifies whether target records that do not have a matching join record are returned in the output.

Summary fields

Determines which join dataset fields will have statistics calculated.

Use matching fields

Specifies whether an attribute relationship will be used.

Target field

The field from the target dataset that will be used to match records in the join dataset.

Join field

The field from the join dataset that will be used to match records in the target dataset.

Use spatial relationship

Specifies whether a spatial relationship will be used .

Spatial relationship

Specifies the spatial relationship type that will be used to define the join condition between the target and join datasets.

Near distance

The linear distance threshold that will be used to determine locations that are near each other. This parameter is only available for spatial relationships of type Near planar or Near geodesic.

Near distance unit

The unit of the linear distance threshold. This parameter is only available for spatial relationships of type Near planar or Near geodesic.

Target geometry

The geometry field from the target dataset that will be used to identify spatial relationships with the join dataset.

Join geometry

The geometry field from the join dataset that will be used to identify spatial relationships with the target dataset.

Use temporal relationship

Specifies whether a temporal relationship will be used.

Temporal relationship

Specifies the temporal relationship type that will be used to define the join condition between the target and join datasets.

Near distance

The temporal distance threshold that will be used to determine records that are near each other in time. This parameter is only available for temporal relationships of type Near, Near before, or Near after.

Near distance unit

The unit of the temporal distance threshold. This parameter is only available for temporal relationships of type Near, Near before, or Near after.

Target start time

The datetime field from the target dataset that will be used to specify the start time of a record.

Target end time

The datetime field from the target dataset that will be used to specify the end time of a record. This parameter is required if you want to join using times of type interval. See Temporal relationships in Data Pipelines for details.

Join start time

The datetime field from the join dataset that will be used to specify the start time of a record.

Join end time

The datetime field from the join dataset that will be used to specify the end time of a record. This parameter is required if you want to join using times of type interval. See Temporal relationships in Data Pipelines for details.

Usage notes

Use the Target dataset parameter to identify the dataset to which records will be joined. Use the Join dataset parameter to identify the dataset that contains the records that will be joined to the target records.

If multiple join records match the same target record, you can specify whether all the matching records will be joined (Join one to many option) or all the matching records will be summarized together (Join one to one option) as follows:

  • Join one to many—All the matching records in the join dataset will be joined to the target dataset. The output dataset will contain multiple records of the target record.
  • Join one to one—All of the matching join records will be summarized with each record in the target dataset. Only the records that have a match will be included in the summary and output. The count of joined records will be added. By default, only those target records that have the specified relationships will be maintained in the output record class (Inner join). If Left join is specified, all input target records will be written to the output record class (this is known as a left outer join).
Examples of joining one to many and one to one
In this example, the one-to-one join only includes the count.

The Summary fields parameter determines which fields from the join dataset will be summarized. The Summary fields parameter is available if the join operation is Join one to many. The following summary statistics are available:

  • Sum—The sum of numeric values in a field. The sum of [null, 1, 3] is 4.
  • Min—The minimum value of a numeric or date field. The minimum of [0, 2, null] is 0.
  • Max—The maximum value of a numeric or date field. The maximum of [0, 2, null] is 2.
  • Mean—The mean of numeric values. The mean of [0, 2, null] is 1.
  • Standard deviation—The standard deviation of a numeric field. The standard deviation of [null, 1, 1, 1] is 0.
  • Count—The number of nonnull values. It can be used on a field of any type. The count of [null, "hello", "world"] is 2.
  • Any—A sample value from a field of any type.

You can join records based on a spatial relationship, a temporal relationship, an attribute relationship, or a combination of the three. A spatial join matches records (rows) from the input records (the Target dataset and Join dataset parameter values) based on their spatial locations. A temporal join matches records from the input records based on their temporal relationships. An attribute join matches records based on field values. The following table describes each type of join relationship:

RelationshipDescription

Attribute

The attribute relationship that will determine whether records are joined to each other. Records are matched when the field values in the join dataset are equal to the field values in the target dataset. It is recommended to use fields of the same type.

Spatial

The spatial relationship that will determine whether records are joined to each other. The available relationships depend on the geometry type (point, line, or area) of the datasets being joined. Available spatial relationships are as follows:

  • Intersects
  • Equals
  • Near planar—Uses planar distances
  • Near geodesic—Uses geodesic distances
  • Contains
  • Within
  • Touches
  • Crosses
  • Overlaps

Temporal

The temporal relationship that will determine whether records are joined to each other. The available relationships depend on the time type (instant or interval) of the datasets being joined. Available temporal relationships are as follows:

  • Meets
  • Met by
  • Overlaps
  • Overlapped by
  • During
  • Contains
  • Equals
  • Finishes
  • Finished by
  • Starts
  • Started by
  • Intersects
  • Near
  • Near before
  • Near after

If target and join geometries are in different coordinate systems, the coordinate system of the target dataset will be used in analyzing spatial relationships.

When the Spatial relationship parameter value is Near planar, the Target geometry parameter value must be in a projected coordinate system.

Outputs

When running a one-to-many join, all fields from the target dataset and the joined dataset will be included in the output. If the same field name exists in both datasets, the joined field will start with the word join appended (for example, join_fieldname).

When running a one-to-one join, in addition to all fields (including geometry) from the target dataset, the following fields will be included in the output:

Field nameDescription

COUNT

The count of records that have been joined. This is always included in the output.

STATISTIC_fieldname

Specified statistics will each create an attribute field, named in the following format: STATISTIC_fieldname. For example, the maximum of the id field is MAX_id.

START_DATETIME

This field represents the start time and is only included when using temporal joins.

END_DATETIME

This field represents the end time and is only included when using temporal joins with interval time values.

Licensing requirements

The following licensing and configurations are required:

  • Creator or Professional user type
  • Publisher, Facilitator, or Administrator role, or an equivalent custom role

To learn more about Data Pipelines requirements, see Requirements.

Related topics

See the following topics for additional information: