Create relationships to join datasets

Insights in ArcGIS Online
Insights in ArcGIS Enterprise
Insights desktop

Relationships are used to join attributes from two or more datasets based on common fields or locations. The common field identifies which rows from each dataset will be joined to each other, and the relationship type determines which rows are kept in the joined dataset.

Relationships can be used to add attributes from a look up table to another dataset, to combine attributes from two datasets to enable different types of analysis, and to enable location in nonspatial tables. The joined dataset is displayed as a result dataset Results in the data pane and allows you to expand your analysis by using the data from all of the joined datasets as a single layer.

Relationships can be created as part of your analysis or as part of a data engineering workflow.

Example

A contact tracer for a public health office is investigating an outbreak of an infectious disease. The contact tracer has access to a look up table with case data (including test result) and a dataset with the activities and locations where contacts were made preceding testing. The two datasets are joined so that the case data is added to the location data. Using the joined dataset, the contact tracer can create a reference table showing the case, activity, activity date, and test result and a treemap showing the test result and number of activities for each case.

See the Track virus spread with ArcGIS Insights lesson for the complete workflow.

Create a relationship in an analysis workflow

Relationships created in an analysis workflow exist only in the workbook where the join was created. If you want the result dataset to exist outside of the workbook, you can create a new feature layer by sharing the dataset or use data engineering.

Complete the following steps to create a relationship in an analysis workflow:

  1. Open a workbook and add two or more datasets, if necessary.
  2. Click the Create relationships button Create relationships above the data pane.

    The Create relationships window appears and displays a list of datasets for your current page.

  3. Select the first dataset you want to join.

    When you select the first dataset, the view updates to show only those datasets that are eligible for a relationship.

  4. Select the second dataset you want to join.

    Insights analyzes the datasets you added and suggests fields to use for the join, if possible. The suggested fields are based on the type of data the field contains and the name of the field.

  5. Click the Edit relationship button Edit relationship to change the following relationship properties:
    Note:

    If no suggested fields are chosen, the dialog opens automatically.

    • To change the fields in the join, click the arrows for the boxes in the Choose fields parameter and choose a new field.
    • To add fields (for example, when the dataset uses a composite key), click the Add fields button and choose the new fields from the menu.
    • To change the join type, select Inner (default), Full, Left, or Right from the Choose relationship type parameter.
  6. Close the Edit relationship window.
  7. Repeat steps 4 to 6 to add additional datasets, if necessary.
  8. Click Finish to create the relationship.

The new joined dataset is added to the data pane.

Create a relationship with data engineering

Insights desktop
Note:

Data engineering is available in Insights desktop. All Insights in ArcGIS Online and Insights in ArcGIS Enterprise users have access to Insights desktop. For more information, see Overview of ArcGIS Insights.

Data engineering is currently in Preview.

The Create relationships tool can be added to a data model and used to join two or more datasets to create a single output dataset.

Complete the following steps to create a relationship in a data engineering workflow:

  1. Open a data workbook in Insights desktop and add data, if necessary.
  2. Click the Create button Create.
  3. From the menu, choose Create relationships.

    The Create relationships window appears.

  4. Select the first dataset you want to join in the data pane.
  5. Select the second dataset you want to join.

    Insights analyzes the datasets you added and suggests columns to use for the join, if possible. The suggested columns are based on the type of data the column contains and the name of the column.

  6. Click the Edit relationship button Edit relationship to change the following relationship properties:
    Note:

    If no suggested columns are chosen, the dialog opens automatically.

    • To change the columns in the join, click the arrows for the boxes in the Choose columns parameter and choose a new column.
    • To add columns (for example, when the dataset uses a composite key), click the Add columns button and choose the new columns from the menu.
    • To change the join type, select Inner (default), Full, Left, or Right from the Choose relationship type parameter.
  7. Close the Edit relationship window.
  8. Repeat steps 5 to 7 to add additional datasets, if necessary.
  9. Optional: Click Preview to create a view of the result dataset based on the current join parameters.
  10. Click Finish to create the relationship.

The new joined dataset is added to the data pane and the table appears. The tool is added to the data model.

Supported data

The following data types can be used to create relationships:

  • Feature layers
  • Files added to your workbook
  • Datasets from a database connection (available in Insights in ArcGIS Enterprise and Insights desktop)
  • Result datasets from spatial analysis, including results from unsupported data types

Definition queries in feature services are not supported for creating relationships. Data in your workbook that is not supported for relationships will be disabled in the Create relationships window.

Note:

Relationships are not supported in analysis workflows for the following data sources:

  • Remote feature layers, such as ArcGIS Living Atlas layers.
  • Database tables that are not from the same database and instance.
  • Datasets from different data stores.

If your datasets are not compatible with joining in an analysis workflow, you can copy the datasets to your workbook and use the copied datasets to create the relationship. Alternatively, you can join datasets from different databases or data stores using data engineering.

Relationship types

The relationship type you use, along with the common field you choose, determines the records that you keep in the joined dataset. The relationship types supported in Insights are Inner, Full, Left, and Right.

Note:

Spatial joins (joins made using location fields) are created using Intersects as the spatial relationship. However, you can still choose Inner, All, Left, or Right as the relationship type.

The following subsections provide the results of joining two tables using a common field for each relationship type.

The first table (Orders) tracks transactions from different branches. The table includes a unique ID for each transaction, the branch ID where the transaction took place, the cost of the transaction in dollars, and the payment method.

Orders

Order IDBranch IDTotal amount ($)Payment method

1204

1

91.48

Credit

1295

1

103.86

Credit

1316

3

97.81

Debit

1406

2

103.24

Credit

1421

2

44.95

Cash

1426

2

153.71

Debit

1454

3

168.95

Credit

1533

7

32.99

Cash

The second table (Branches) is a lookup table containing the branch ID numbers and branch locations.

Branches

Branch IDBranch location

1

Redlands

2

Portland

3

Washington, DC

4

Arlington

5

Charlotte

6

Denver

The Branch ID field is shared between the two tables and is used to create the join. In each example, the Orders table is on the left side of the join.

Inner

Inner keeps only the records that match both datasets. Null records are not added.

Inner is the default relationship type.

The following table shows the result of a join with an Inner relationship type between the Orders and Branches tables:

Order IDBranch IDTotal amount ($)Payment methodBranch location

1204

1

91.48

Credit

Redlands

1295

1

103.86

Credit

Redlands

1316

3

97.81

Debit

Washington, DC

1406

2

103.24

Credit

Portland

1421

2

44.95

Cash

Portland

1426

2

153.71

Debit

Portland

1454

3

168.95

Credit

Washington, DC

The records that were excluded were records for branch ID 7 from the Orders table and branch ID 4, 5, and 6 from the Branches table because those records did not exist in both tables.

Full

The Full relationship type (full outer join) keeps all records from both datasets and adds null values where no matches exist.

The following table shows the result of a join with a Full relationship type between the Orders and Branches tables:

Order IDBranch IDTotal amount ($)Payment methodBranch location

1204

1

91.48

Credit

Redlands

1295

1

103.86

Credit

Redlands

1316

3

97.81

Debit

Washington, DC

1406

2

103.24

Credit

Portland

1421

2

44.95

Cash

Portland

1426

2

153.71

Debit

Portland

1454

3

168.95

Credit

Washington, DC

1533

7

32.99

Cash

<No Data>

<No Data>

4

<No Data>

Arlington

<No Data>

5

<No Data>

Charlotte

<No Data>

6

<No Data>

Denver

The resulting dataset contains null values for branch ID 4, 5, 6, and 7 because those branch IDs are not present in both datasets.

Left

The Left relationship type (left outer join) keeps all records from the left table and matching records from the right table. When no matching values exist in the right table, null values are used.

When using the Left relationship type, you must ensure that the table from which you want to retain all records is on the left side of the join by selecting that table first.

The following table shows the result of a join with a Left relationship type between the Orders and Branches tables:

Order IDBranch IDTotal amount ($)Payment methodBranch location

1204

1

91.48

Credit

Redlands

1295

1

103.86

Credit

Redlands

1316

3

97.81

Debit

Washington, DC

1406

2

103.24

Credit

Portland

1421

2

44.95

Cash

Portland

1426

2

153.71

Debit

Portland

1454

3

168.95

Credit

Washington, DC

1533

7

32.99

Cash

<No Data>

The resulting dataset contains a null value for branch ID 7 because that branch ID is not present in the Branches dataset.

Right

The Right relationship type (right outer join) keeps all records from the right table and matching records from the left table. When no matching values exist in the left table, null values are used.

When using the Right relationship type, you must ensure that the table from which you want to retain all records is on the right side of the join by selecting that table second.

The following table shows the result of a join with a Right relationship type between the Orders and Branches tables:

Order IDTotal amount ($)Payment methodBranch IDBranch location

1204

91.48

Credit

1

Redlands

1295

103.86

Credit

1

Redlands

1316

97.81

Debit

3

Washington, DC

1406

103.24

Credit

2

Portland

1421

44.95

Cash

2

Portland

1426

153.71

Debit

2

Portland

1454

168.95

Credit

3

Washington, DC

<No Data>

<No Data>

4

Arlington

<No Data>

<No Data>

5

Charlotte

<No Data>

<No Data>

6

Denver

The resulting dataset contains a null value for branch ID 4, 5, and 6 because those branch IDs are not present in the Orders dataset.

Best practices

It's best practice to place the most detailed dataset on the left side of the join. In the relationship type example, the Orders table is on the left.

Use multiple fields to create joins when a single field is not unique. For example, if you are joining City Name to City Name, you might also have to specify county, state, and country because values in City Name are duplicated in the dataset.

Consider the cardinality (one-to-one, one-to-many, or many-to-many) of a join when deciding which visualizations to create with the joined dataset. Cardinality does not affect the relationship type (Inner, Full, Left, or Right), but it does have implications for aggregating fields in the joined dataset or showing the fields on your map. For example, if the joined datasets have a one-to-many or many-to-many relationship, visualizations that aggregate the data, like bar charts, may display artificially high statistics.