Federate data with external catalogs

If your organization uses cataloging software such as CKAN, or works with other organizations that do, you can federate your hub site's data catalog to make public content more discoverable through search engines and third-party catalogs.

Note:

Every site's data catalog generates a public feed output URL that conforms to DCAT US 1.1 at <siteURL>/data.json. In early 2022, ArcGIS Hub officially migrated to a new endpoint at <siteURL>/api/feed/dcat-us/1.1.json. To learn more, see changes to DCAT configurations on ArcGIS Hub sites.

How site DCAT feeds work

Each site has a catalog (content library) containing all of the content that you want to share through the site. To federate your site’s catalog, share the public feed output URL that is automatically generated for every public Hub site. This catalog feed (for example, www.yourhubsite.gov/api/feed/dcat-us/1.1.json) conforms to DCAT US 1.1. You can also edit the content of your site's catalog using the DCAT configuration editor in ArcGIS Hub.
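
As a rough illustration of consuming this feed, the Python sketch below builds the feed URL from a site URL and lists the dataset titles in an already-parsed catalog. The helper names (feed_url, dataset_titles) and the sample catalog are hypothetical; the top-level "dataset" array follows the DCAT US 1.1 layout.

```python
import json

FEED_PATH = "/api/feed/dcat-us/1.1.json"


def feed_url(site_url: str) -> str:
    """Return the DCAT US 1.1 feed URL for a hub site URL."""
    return site_url.rstrip("/") + FEED_PATH


def dataset_titles(catalog: dict) -> list[str]:
    """Collect the title of every dataset record in a parsed feed."""
    return [record.get("title", "") for record in catalog.get("dataset", [])]


# Hypothetical, heavily trimmed feed content used only for illustration.
sample_feed = json.loads("""
{
  "conformsTo": "https://project-open-data.cio.gov/v1.1/schema",
  "dataset": [
    {"title": "Parcels"},
    {"title": "Street Centerlines"}
  ]
}
""")

print(feed_url("https://www.yourhubsite.gov/"))
print(dataset_titles(sample_feed))
```

To read a live feed instead of the sample, the JSON could be fetched with urllib.request.urlopen and parsed with json.load before calling dataset_titles.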

Caution:

Only data items that are shared publicly populate the <DCAT type>.json catalog. Private content within your organization cannot currently be shared or federated through the DCAT catalog method.

In the United States, you can modify the output to work specifically with the national Data.gov catalog. This type of interoperability means that you can point these third-party aggregators to a dataset's multiple download format options (.shp, .kml, and .csv files) and APIs (Geoservices, WMS, and GeoJSON) on a CKAN platform.
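
To make the idea of multiple distributions per dataset concrete, here is a minimal Python sketch that groups one feed record's distribution URLs by format. It assumes the record follows the DCAT US 1.1 "distribution" layout ("format", "downloadURL", "accessURL"); the function name and the example URLs are placeholders, not real endpoints.

```python
from collections import defaultdict


def distributions_by_format(dataset: dict) -> dict[str, list[str]]:
    """Group a dataset's distribution URLs by their declared format."""
    grouped = defaultdict(list)
    for dist in dataset.get("distribution", []):
        # DCAT US 1.1 allows either a direct download URL or an access URL.
        url = dist.get("downloadURL") or dist.get("accessURL")
        if url:
            grouped[dist.get("format", "unknown")].append(url)
    return dict(grouped)


# Hypothetical record; the URLs are placeholders.
record = {
    "title": "Parcels",
    "distribution": [
        {"format": "CSV", "downloadURL": "https://example.com/parcels.csv"},
        {"format": "KML", "downloadURL": "https://example.com/parcels.kml"},
        {"format": "GeoJSON", "accessURL": "https://example.com/parcels.geojson"},
    ],
}

for fmt, urls in distributions_by_format(record).items():
    print(fmt, urls)
```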

Federate with CKAN

ArcGIS Hub feed editors allow site managers to standardize how they describe the data they contribute. Site managers can choose which metadata values are displayed for each dataset in the feed before it’s harvested.

Before you get started

Your CKAN instance must be properly configured to support data harvesting. First, you must install and configure two extensions that are developed and maintained by the CKAN team and used by Data.gov and others to harvest datasets: the CKAN Harvesting extension and the CKAN DCAT extension.

After confirming that these extensions are installed, ensure that you have the Harvester Gather_Consumer and Fetch_Consumer services running as background services.

  1. Activate your local Python environment: . /usr/lib/ckan/default/bin/activate
  2. Activate the Gather process: paster --plugin=ckanext-harvest harvester gather_consumer --config='/path/to/your config.ini'
  3. Activate the Fetch process: paster --plugin=ckanext-harvest harvester fetch_consumer --config='/path/to/your config.ini'
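
If you script these steps, a configuration path that contains spaces needs careful quoting. The Python sketch below assembles the two consumer commands shown above and lets shlex.join handle the quoting. The function name consumer_command is hypothetical, and the config path is the same placeholder used in the steps.

```python
import shlex


def consumer_command(process: str, config_path: str) -> str:
    """Build a shell-safe paster command line for a harvester process."""
    argv = [
        "paster",
        "--plugin=ckanext-harvest",
        "harvester",
        process,  # for example gather_consumer or fetch_consumer
        f"--config={config_path}",
    ]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(argv)


# Placeholder path; substitute the location of your own CKAN config file.
for process in ("gather_consumer", "fetch_consumer"):
    print(consumer_command(process, "/path/to/your config.ini"))
```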

Harvest the ArcGIS Hub catalog

To harvest the catalog, complete the following steps:

  1. Go to your CKAN harvest administration page and sign in at http://yourCKANinstance/harvest.
  2. Select add harvest source and provide information about your hub site:
    • Fill in the URL with http://yourOpenDataSite/data.json
    • Give the harvest source a title similar to the title of your site.
    • Optionally, fill in the description box.
    • Select DCAT JSON Harvester as the source type.
    • For update frequency, select manual.
    • Click Save when you're finished.
  3. Select admin and select reharvest.
  4. Run harvest jobs on your CKAN instance.
  5. Activate your Python environment: . /usr/lib/ckan/default/bin/activate
  6. Enter the command: paster --plugin=ckanext-harvest harvester run --config='/path/to/your config.ini'

    CKAN processes your data.json file and includes all of your datasets. You can see what is harvested by viewing the harvest source. All of your descriptions, tags, and dataset distributions from ArcGIS Hub are accessible from the CKAN instance.

Note:

You may experience some delays the first time you preview a .csv or .json file because ArcGIS Hub generates a cache of the data and CKAN cannot identify how to handle this while the data is processing. This will not occur the next time you preview the file.

Federate with Data.gov

Site managers can choose which attributes and values are applied to a site’s DCAT US 1.1 output feed. In the feed editor, you must supply valid keys corresponding to a dataset’s metadata.

Note:

See changes to DCAT configurations on ArcGIS Hub sites for information on the DCAT endpoint.

  1. Click the edit button to open the site in edit mode.
  2. Click to open the site menu in the top navigation bar and choose Content Library.
  3. Click the more actions button and choose Configure Feeds.
  4. In the DCAT Configuration editor, paste your key/value pairs anywhere after a comma and before the final closing brace.

Default schema

ArcGIS Hub uses a schema written in JSON to determine which metadata properties appear for each record in the corresponding feed. For example, below is the default DCAT US 1.1 schema. It contains key/value pairs such as "title": "{{name}}" and "description": "{{description}}". For each record in the feed, you will see the key ("title") and the templated value ("<item’s metadata title>"). The schema’s design is based on the most straightforward mapping between ArcGIS item metadata and the DCAT US 1.1 standard.

{
	"title": "{{name}}",
	"description": "{{description}}",
	"keyword": "{{tags}}",
	"issued": "{{created:toISO}}",
	"modified": "{{modified:toISO}}",
	"publisher": {
		"name": "{{source}}"
	},
	"contactPoint": {
		"fn": "{{owner}}",
		"hasEmail": "{{orgContactEmail}}"
	}
}
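
To show how a schema like this maps item metadata onto a feed record, here is a minimal Python sketch of template substitution. It is not Hub's implementation: the render and to_iso helpers are hypothetical, only flat {{key}} and {{key:toISO}} templates are handled, and the exact timestamp format Hub emits for toISO is an assumption.

```python
import re
from datetime import datetime, timezone

# Matches a whole-string template such as {{name}} or {{created:toISO}}.
TEMPLATE = re.compile(r"\{\{\s*(\w+)(?::(\w+))?\s*\}\}")


def to_iso(epoch_ms: int) -> str:
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


def render(schema, item: dict):
    """Recursively replace templates in a schema with values from item."""
    if isinstance(schema, dict):
        return {key: render(value, item) for key, value in schema.items()}
    if isinstance(schema, list):
        return [render(value, item) for value in schema]
    if isinstance(schema, str):
        match = TEMPLATE.fullmatch(schema)
        if match:
            value = item.get(match.group(1))
            if match.group(2) == "toISO" and value is not None:
                return to_iso(value)
            return value
    return schema  # literals pass through unchanged


# A subset of the default schema and a hypothetical item.
schema = {
    "title": "{{name}}",
    "issued": "{{created:toISO}}",
    "publisher": {"name": "{{source}}"},
}
item = {"name": "DCAT US Example 1", "created": 1634586445000, "source": "Example Org"}
record = render(schema, item)
print(record)
```

Each key in the schema is kept as-is in the output record, while each templated value is looked up in the item's metadata.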

Custom schema example

You can customize the schema by adding, updating, or removing key/value pairs. For example, below is a custom DCAT US 1.1 schema with several modifications including the following:

  • Adding a key/value pair
  • Updating a key/value pair
  • Adding a fallback for a key/value pair

{
	"title": "{{name}}",
	"description": "{{description}}",
	"keyword": "{{tags}}",
	"issued": "{{created:toISO}}",
	"modified": "{{modified:toISO}}",
	"publisher": {
		"name": "{{source}}"
	},
	"contactPoint": {
		"fn": "{{owner}}",
		"hasEmail": "{{orgContactEmail}}"
	},
	"culture": "{{culture}}",
	"summary": "{{snippet}}",
	"platform": "ArcGIS Hub",
	"bureauCode": [
		"010:86",
		"010:04"
	],
	"programCode": [
		"015:001",
		"015:002"
	]
}

Note:

The custom DCAT US 1.1 schema adds five new keys: “culture”, “summary”, “platform”, “bureauCode”, and “programCode”. The keys “culture” and “summary” have template values that pull from the Hub V3 API, the latest version of the Hub API. The keys “platform”, “bureauCode”, and “programCode” have literal values: a string for “platform” and arrays of strings for “bureauCode” and “programCode”.
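
Because "bureauCode" and "programCode" are entered by hand as literals, a quick format check before saving can catch typos. The Python sketch below assumes the DCAT US 1.1 patterns for these fields (three digits, a colon, then two digits for bureau codes and three digits for program codes); the invalid_codes helper is hypothetical.

```python
import re

BUREAU_CODE = re.compile(r"\d{3}:\d{2}")
PROGRAM_CODE = re.compile(r"\d{3}:\d{3}")


def invalid_codes(schema: dict) -> dict[str, list[str]]:
    """Return bureauCode and programCode entries that do not match the expected pattern."""
    problems = {}
    for key, pattern in (("bureauCode", BUREAU_CODE), ("programCode", PROGRAM_CODE)):
        bad = [code for code in schema.get(key, []) if not pattern.fullmatch(code)]
        if bad:
            problems[key] = bad
    return problems


# The literal values from the custom schema above.
custom = {
    "bureauCode": ["010:86", "010:04"],
    "programCode": ["015:001", "015:002"],
}
print(invalid_codes(custom))
```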

Custom value examples

To match an organization's metadata standards, many site managers will want to adjust the metadata that appears in a feed. A key can be any literal string, such as “title”, but keys should generally conform to a target metadata standard. The corresponding value can be a string literal or a template that pulls a key from the Hub V3 API. For templates, you can supply any key returned from the V3 API, either top-level or nested.

For example, on the ArcGIS Hub feeds example site at dc.esri.com, there is a public layer titled “DCAT US Example 1”. You can see JSON metadata for that dataset by accessing the layer’s ID c58b40e9a26c4e21a22a31ecb17a99c6_0 (item ID + layer number) using the Hub V3 API (https://hub.arcgis.com/api/v3/datasets/c58b40e9a26c4e21a22a31ecb17a99c6_0).

When you access the example API URL above, you should see a JSON response that starts like the following:

"data": {
	"id": "c58b40e9a26c4e21a22a31ecb17a99c6_0",
	"type": "dataset",
	"attributes": {
		"errors": [],
		"access": "public",
		"additionalResources": [],
		"advancedQueryCapabilities": {
			"supportsSqlExpression": true,
			"supportsQueryWithResultType": true,
			"supportsQueryRelatedPagination": true,
			"supportsQueryWithCacheHint": true,

If you scroll down, you’ll see more keys that you can use as template values in the editor, such as “created”, which represents the date that the content was created. To use a value from the Hub V3 API, add a template value in the feed editor for any Hub V3 API key underneath “attributes”. For example, suppose you want to include “created” in your feed records, as in the following:

{
...
	"bureauCode": ["010:86","010:04"],
	"programCode": ["015:001","015:002"],
	"created": 1634586445000,
...
}

In this same example, you would add the following line to the custom schema:

	"programCode": [
		"015:001",
		"015:002"
	],
	"created": "{{item.created}}"

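To see which keys are available for templates, it can help to list everything under "attributes" in a V3 API response. The Python sketch below does this for an already-parsed response. The helper names and the trimmed sample response are hypothetical, and the dotted paths are used here only to display nesting, not as confirmed template syntax.

```python
def flatten_keys(node: dict, prefix: str = "") -> list[str]:
    """List dotted paths for every key in a nested dict, parents included."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}{key}"
        paths.append(path)
        if isinstance(value, dict):
            paths.extend(flatten_keys(value, prefix=f"{path}."))
    return paths


def template_candidates(response: dict) -> list[str]:
    """Return sorted key paths found under data.attributes in a response."""
    attributes = response.get("data", {}).get("attributes", {})
    return sorted(flatten_keys(attributes))


# Trimmed, hypothetical response shaped like the example above.
response = {
    "data": {
        "id": "c58b40e9a26c4e21a22a31ecb17a99c6_0",
        "type": "dataset",
        "attributes": {
            "errors": [],
            "access": "public",
            "advancedQueryCapabilities": {"supportsSqlExpression": True},
            "created": 1634586445000,
        },
    }
}

for path in template_candidates(response):
    print(path)
```
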
You can edit the "spatial" attribute of DCAT US and DCAT AP feeds. By default, new templates use the item extent. For items with no extent value, the spatial attribute is removed. To supply an alternative instead, set the value to a template with a fallback, such as "spatial": "{{extent || 'SPATIAL_FALLBACK'}}", and update the default template.
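
The fallback behavior can be pictured with a small Python sketch: take the first key that has a usable value, otherwise use the quoted literal. This is only an illustration of the idea, not Hub's template engine; the resolve helper and the sample extent string are hypothetical.

```python
def resolve(expression: str, item: dict):
    """Return the first usable value from a "key || 'literal'" style expression."""
    for part in (p.strip() for p in expression.split("||")):
        if len(part) >= 2 and part[0] == part[-1] == "'":
            return part[1:-1]  # quoted literal fallback
        value = item.get(part)
        if value not in (None, "", [], {}):
            return value
    return None


expression = "extent || 'SPATIAL_FALLBACK'"

# Item with an extent: the extent is used.
print(resolve(expression, {"extent": "-77.12,38.79,-76.91,39.00"}))

# Item without an extent: the literal fallback is used.
print(resolve(expression, {}))
```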

Content managers can configure a feed to include additional custom distributions. These distributions are appended to the existing distributions that Hub automatically generates for a content item's downloadable resources.