The Join tool joins datasets based on the specified relationships. Datasets can be joined using matching attributes, spatial relationships, temporal relationships, or any combination of the three.
Examples
The Join tool can be used in scenarios such as the following:
- Join county polygons with county census data based on matching countyID field values.
- Join records that occur at a relative time and place using both a temporal relationship and a spatial relationship.
Parameters
The following table outlines the parameters used in the Join tool:
Parameter | Description |
---|---|
Target dataset | The dataset that will have records joined to it. |
Join dataset | The dataset that will be joined to the target dataset. |
Join operation | Specifies whether all the matching records will be joined (one to many) or whether all the matching records will be summarized together (one to one). |
Join type | Specifies whether target records that do not have a matching join record are returned in the output. |
Summary fields | Determines which join dataset fields will have statistics calculated. |
Use matching fields | Specifies whether an attribute relationship will be used. |
Target field | The field from the target dataset that will be used to match records in the join dataset. |
Join field | The field from the join dataset that will be used to match records in the target dataset. |
Use spatial relationship | Specifies whether a spatial relationship will be used . |
Spatial relationship | Specifies the spatial relationship type that will be used to define the join condition between the target and join datasets. |
Near distance | The linear distance threshold that will be used to determine locations that are near each other. This parameter is only available for spatial relationships of type Near planar or Near geodesic. |
Near distance unit | The unit of the linear distance threshold. This parameter is only available for spatial relationships of type Near planar or Near geodesic. |
Target geometry | The geometry field from the target dataset that will be used to identify spatial relationships with the join dataset. |
Join geometry | The geometry field from the join dataset that will be used to identify spatial relationships with the target dataset. |
Use temporal relationship | Specifies whether a temporal relationship will be used. |
Temporal relationship | Specifies the temporal relationship type that will be used to define the join condition between the target and join datasets. |
Near distance | The temporal distance threshold that will be used to determine records that are near each other in time. This parameter is only available for temporal relationships of type Near, Near before, or Near after. |
Near distance unit | The unit of the temporal distance threshold. This parameter is only available for temporal relationships of type Near, Near before, or Near after. |
Target start time | The datetime field from the target dataset that will be used to specify the start time of a record. |
Target end time | The datetime field from the target dataset that will be used to specify the end time of a record. This parameter is required if you want to join using times of type interval. See Temporal relationships in Data Pipelines for details. |
Join start time | The datetime field from the join dataset that will be used to specify the start time of a record. |
Join end time | The datetime field from the join dataset that will be used to specify the end time of a record. This parameter is required if you want to join using times of type interval. See Temporal relationships in Data Pipelines for details. |
Usage notes
Use the Target dataset parameter to identify the dataset to which records will be joined. Use the Join dataset parameter to identify the dataset that contains the records that will be joined to the target records.
If multiple join records match the same target record, you can specify whether all the matching records will be joined (Join one to many option) or all the matching records will be summarized together (Join one to one option) as follows:
- Join one to many—All the matching records in the join dataset will be joined to the target dataset. The output dataset will contain multiple records of the target record.
- Join one to one—All of the matching join records will be summarized with each record in the target dataset. Only the records that have a match will be included in the summary and output. The count of joined records will be added. By default, only those target records that have the specified relationships will be maintained in the output record class (Inner join). If Left join is specified, all input target records will be written to the output record class (this is known as a left outer join).
The Summary fields parameter determines which fields from the join dataset will be summarized. The Summary fields parameter is available if the join operation is Join one to many. The following summary statistics are available:
- Sum—The sum of numeric values in a field. The sum of [null, 1, 3] is 4.
- Min—The minimum value of a numeric or date field. The minimum of [0, 2, null] is 0.
- Max—The maximum value of a numeric or date field. The maximum of [0, 2, null] is 2.
- Mean—The mean of numeric values. The mean of [0, 2, null] is 1.
- Standard deviation—The standard deviation of a numeric field. The standard deviation of [null, 1, 1, 1] is 0.
- Count—The number of nonnull values. It can be used on a field of any type. The count of [null, "hello", "world"] is 2.
- Any—A sample value from a field of any type.
You can join records based on a spatial relationship, a temporal relationship, an attribute relationship, or a combination of the three. A spatial join matches records (rows) from the input records (the Target dataset and Join dataset parameter values) based on their spatial locations. A temporal join matches records from the input records based on their temporal relationships. An attribute join matches records based on field values. The following table describes each type of join relationship:
Relationship | Description |
---|---|
Attribute | The attribute relationship that will determine whether records are joined to each other. Records are matched when the field values in the join dataset are equal to the field values in the target dataset. It is recommended to use fields of the same type. |
Spatial | The spatial relationship that will determine whether records are joined to each other. The available relationships depend on the geometry type (point, line, or area) of the datasets being joined. Available spatial relationships are as follows:
|
Temporal | The temporal relationship that will determine whether records are joined to each other. The available relationships depend on the time type (instant or interval) of the datasets being joined. Available temporal relationships are as follows:
|
If target and join geometries are in different coordinate systems, the coordinate system of the target dataset will be used in analyzing spatial relationships.
When the Spatial relationship parameter value is Near planar, the Target geometry parameter value must be in a projected coordinate system.
Outputs
When running a one-to-many join, all fields from the target dataset and the joined dataset will be included in the output. If the same field name exists in both datasets, the joined field will start with the word join appended (for example, join_fieldname).
When running a one-to-one join, in addition to all fields (including geometry) from the target dataset, the following fields will be included in the output:
Field name | Description |
---|---|
COUNT | The count of records that have been joined. This is always included in the output. |
STATISTIC_fieldname | Specified statistics will each create an attribute field, named in the following format: STATISTIC_fieldname. For example, the maximum of the id field is MAX_id. |
START_DATETIME | This field represents the start time and is only included when using temporal joins. |
END_DATETIME | This field represents the end time and is only included when using temporal joins with interval time values. |
Licensing requirements
The following licensing and configurations are required:
- Creator or Professional user type
- Publisher, Facilitator, or Administrator role, or an equivalent custom role
To learn more about Data Pipelines requirements, see Requirements.
Related topics
See the following topics for additional information: