Federate data with DCAT and other external catalogs

Your site's data catalog consists of the items that have been added to your site's content library, including items shared to groups that you've added using your site's groups manager.

You can federate this data catalog with external catalogs, such as CKAN, using the URL that is automatically generated for every hub site at www.yourhubsite.gov/data.json. You can also edit the content of your site's catalog using the DCAT Configuration editor in ArcGIS Hub. In the United States, you can modify the output to work specifically with the national Data.gov catalog.

Caution:

Only data items that are shared publicly populate the data.json catalog. Private content within your organization cannot currently be shared or federated through the DCAT catalog method.
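To see which items actually appear in the public feed, you can read the data.json catalog and list its entries. The following is a minimal Python sketch, assuming the feed follows the usual DCAT layout with a top-level "dataset" array whose entries carry a "title". The function names and the sample catalog are hypothetical and used only for illustration.

```python
import json
from urllib.request import urlopen


def load_catalog(url: str) -> dict:
    """Download and parse a data.json catalog (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def dataset_titles(catalog: dict) -> list:
    """Return the title of every dataset entry in the catalog."""
    return [entry.get("title", "(untitled)") for entry in catalog.get("dataset", [])]


# Offline example with a hand-written catalog (illustrative data only).
# For a real site, use load_catalog("https://www.yourhubsite.gov/data.json").
sample_catalog = {
    "dataset": [
        {"title": "Street Trees"},
        {"title": "Bike Lanes"},
    ]
}
print(dataset_titles(sample_catalog))
```

An item that you expect to see but that is missing from the list is likely not shared publicly.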

Federate with CKAN

The DCAT catalog allows organizations to standardize how they describe their datasets so that people can find the content through search engines and third-party catalogs. If your organization uses cataloging software such as CKAN, or works with other organizations that do, you can use the ArcGIS Hub data catalog to configure your site's automatic DCAT output. This interoperability means that you can point third-party aggregators, such as a CKAN platform, to all of your datasets' download formats (.shp, .kml, and .csv files) and APIs (GeoServices, WMS, and GeoJSON).
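The download formats and APIs mentioned above are exposed to aggregators as the distributions of each dataset. As a rough sketch, assuming each dataset entry has a "distribution" array whose items carry a "format" or "mediaType" label, the following Python counts what a catalog advertises. The helper name and the sample data are hypothetical.

```python
from collections import Counter


def distribution_formats(catalog: dict) -> Counter:
    """Count how many distributions of each format the catalog advertises."""
    counts = Counter()
    for entry in catalog.get("dataset", []):
        for dist in entry.get("distribution", []):
            # Fall back to mediaType when no explicit format label is present.
            label = dist.get("format") or dist.get("mediaType") or "unknown"
            counts[label] += 1
    return counts


# Illustrative catalog only; a real feed would come from your site's data.json.
sample_catalog = {
    "dataset": [
        {
            "title": "Street Trees",
            "distribution": [
                {"format": "CSV"},
                {"format": "KML"},
                {"mediaType": "application/vnd.geo+json"},
            ],
        },
        {"title": "Bike Lanes", "distribution": [{"format": "CSV"}]},
    ]
}
for label, total in sorted(distribution_formats(sample_catalog).items()):
    print(label, total)
```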

Before you get started

Your CKAN instance must be properly configured to support data harvesting. First, you must install and configure two extensions that are developed and maintained by the CKAN team and used by Data.gov and others to harvest datasets: the CKAN Harvesting extension and the CKAN DCAT extension.

After confirming that these extensions are installed, ensure that you have the Harvester Gather_Consumer and Fetch_Consumer services running as background services.

  1. Activate your local Python environment: . /usr/lib/ckan/default/bin/activate
  2. Activate the Gather process: paster --plugin=ckanext-harvest harvester gather_consumer --config='/path/to/your config.ini'
  3. Activate the Fetch process: paster --plugin=ckanext-harvest harvester fetch_consumer --config='/path/to/your config.ini'
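Before starting the consumers, it can help to confirm that the two extensions are enabled in the CKAN configuration file. The sketch below reads the ckan.plugins setting from the [app:main] section of a config.ini. The plug-in names harvest and dcat_json_harvester are assumptions based on the extensions' own documentation, and the helper name and sample text are hypothetical. Adjust them to match your installation.

```python
import configparser

# Assumed plug-in names; change these if your installation uses different ones.
REQUIRED_PLUGINS = {"harvest", "dcat_json_harvester"}


def missing_plugins(ini_text: str) -> set:
    """Return the required plug-ins that are absent from ckan.plugins."""
    # Interpolation is disabled so values such as %(here)s are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    enabled = set(parser.get("app:main", "ckan.plugins", fallback="").split())
    return REQUIRED_PLUGINS - enabled


# Illustrative configuration text; in practice, read your own config.ini file.
sample_ini = """
[app:main]
ckan.site_url = http://yourCKANinstance
ckan.plugins = stats text_view harvest ckan_harvester dcat
"""
print(missing_plugins(sample_ini))
```

An empty set means that both assumed plug-ins are listed. Anything else names what still needs to be added to ckan.plugins.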

Harvest the ArcGIS Hub catalog

To harvest the ArcGIS Hub catalog, complete the following steps:

  1. Go to your CKAN harvest administration page and sign in at http://yourCKANinstance/harvest.
  2. Select add harvest source and provide information about your hub site:
    • Fill in the URL with http://yourOpenDataSite/data.json
    • Give the harvest source a title similar to the title of your site.
    • Optionally, fill in the description box.
    • Select DCAT JSON Harvester as the source type.
    • For update frequency, select manual.
    • Click Save when you're finished.
  3. Select admin and select reharvest.
  4. Run harvest jobs on your CKAN instance.
  5. Activate your Python environment: . /usr/lib/ckan/default/bin/activate
  6. Enter the command: paster --plugin=ckanext-harvest harvester run --config='/path/to/your config.ini'

    CKAN processes your data.json file and includes all of your datasets. You can see what is harvested by viewing the harvest source. All of your descriptions, tags, and dataset distributions from ArcGIS Hub are accessible from the CKAN instance.

Note:

You may experience some delays the first time you preview a .csv or .json file because ArcGIS Hub generates a cache of the data and CKAN cannot identify how to handle this while the data is processing. This will not occur the next time you preview the file.
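In addition to viewing the harvest source, you can check the result of a harvest through CKAN's Action API. The following sketch builds a package_search request and reads the total dataset count from the reply. The helper names are hypothetical, and the base URL is the same placeholder used in the steps above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(ckan_base: str, query: str = "*:*", rows: int = 5) -> str:
    """Build a package_search request URL for CKAN's Action API."""
    params = urlencode({"q": query, "rows": rows})
    return f"{ckan_base.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_count(api_response: dict) -> int:
    """Read the total number of matching datasets from a package_search reply."""
    if not api_response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return api_response["result"]["count"]


def fetch_dataset_count(ckan_base: str) -> int:
    """Query a live CKAN instance (requires network access)."""
    with urlopen(package_search_url(ckan_base), timeout=30) as response:
        return dataset_count(json.load(response))


# Offline example using a hand-written reply in the Action API's shape.
sample_reply = {"success": True, "result": {"count": 42, "results": []}}
print(package_search_url("http://yourCKANinstance/"))
print(dataset_count(sample_reply))
```

Comparing the count before and after a harvest job gives a quick indication of how many datasets were added.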

Federate with Data.gov

To federate your open data with Data.gov, you must comply with the Project Open Data (POD) standard v1.1, which is slightly different from the default DCAT standard provided at your /data.json URL. You can configure your data.json feed by adding the required bureau code and program code in the DCAT Configuration editor.

To federate with Data.gov, complete the following steps:

  1. In a new browser window, open the site you want to integrate.
  2. Click the edit button to open the site in edit mode.
  3. Click to open the site drop-down menu in the edit navigation bar.
  4. Click Content Library.
  5. Click the more button and choose Configure DCAT.
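  6. In the DCAT Configuration editor, copy and paste one of the following snippets between two top-level properties of the template, that is, after the comma that ends one property and before the final closing brace. If the snippet becomes the last property, remove its trailing comma so that the JSON stays valid.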

    Note:
    The bureau codes and program codes pasted here are applied to every dataset in your data.json feed. If you need different codes to apply to different datasets in your catalog, contact Esri Technical Support for assistance.

    • For one bureau code and one program code, use the following:
      "bureauCode": [
        "010:86"
      ],
      "programCode": [
        "015:001"
      ],
    • For more than one code, use the following:
      "bureauCode": [
        "010:86",
        "010:04"
      ],
      "programCode": [
        "015:001",
        "015:002"
      ],
    Tip:

    For example, a bureau code and program code could be formatted as follows:

    {
      "title": "{{default.name}}",
      "description": "{{default.description}}",
      "keyword": "{{item.tags}}",
      "issued": "{{item.created:toISO}}",
      "modified": "{{item.modified:toISO}}",
      "publisher": {
        "source": "{{default.source.source}}"
      },
      "bureauCode": [
        "010:86"
      ],
      "programCode": [
        "015:001"
      ],
      "contactPoint": {
        "fn": "{{item.owner}}",
        "hasEmail": "{{org.portalProperties.links.contactUs.url}}"
      }
    }

  7. Replace the bureau code and program code with the correct codes for your organization. For more information about formatting bureau codes and program codes, see the notes provided on the Project Open Data site.
  8. Verify that your DCAT feed is working by pasting your site's DCAT URL in the Project Open Data Validator.
  9. Click Save below the editor on the DCAT Configuration page when you're ready to confirm your changes.
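Before pasting codes into the editor, you may want to check their format. The sketch below assumes the patterns suggested by the examples in step 6: three digits, a colon, and two digits for a bureau code ("010:86"), and three digits, a colon, and three digits for a program code ("015:001"). The function name and the shortened template are hypothetical; the Project Open Data Validator remains the authoritative check.

```python
import json
import re

# Assumed formats, inferred from the example codes shown in step 6.
BUREAU_CODE = re.compile(r"\d{3}:\d{2}")    # for example "010:86"
PROGRAM_CODE = re.compile(r"\d{3}:\d{3}")   # for example "015:001"


def with_federal_codes(template: dict, bureau_codes: list, program_codes: list) -> dict:
    """Return a copy of the template with validated codes added."""
    bad = [code for code in bureau_codes if not BUREAU_CODE.fullmatch(code)]
    bad += [code for code in program_codes if not PROGRAM_CODE.fullmatch(code)]
    if bad:
        raise ValueError(f"Badly formatted codes: {bad}")
    updated = dict(template)
    updated["bureauCode"] = list(bureau_codes)
    updated["programCode"] = list(program_codes)
    return updated


# Shortened, illustrative template; your editor shows the full version.
template = {
    "title": "{{default.name}}",
    "description": "{{default.description}}",
}
result = with_federal_codes(template, ["010:86"], ["015:001"])
print(json.dumps(result, indent=2))
```

Because the function serializes a complete object, the printed JSON has no trailing comma and can be compared directly against the template in the editor.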