Data engineering is available in Insights desktop. All Insights in ArcGIS Online and Insights in ArcGIS Enterprise users have access to Insights desktop. For more information, see Overview of ArcGIS Insights.
Data engineering is currently in Preview.
Tools from the Import options menu are applied to datasets when they're added to a data workbook, but they are not added to the data model.
The following tools are available when importing a dataset to a data workbook:
- Filter dataset: Apply an advanced filter to the dataset and select fields to include in the sample dataset.
- Trim empty spaces: Remove empty spaces from the beginning and end of string values. This tool is enabled by default.
- Sampling method: Choose how the dataset sample is created. This tool is available for datasets with more than 250,000 records.
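The effect of trimming empty spaces can be illustrated outside of Insights with a minimal Python sketch. This is not how Insights implements the tool; the function name `trim_empty_spaces`, the record layout, and the `city` field are hypothetical, and Python's `str.strip()` removes all leading and trailing whitespace (including tabs and newlines), which may differ from what Insights treats as an empty space.

```python
def trim_empty_spaces(records, field):
    """Return copies of the records with leading and trailing whitespace
    removed from the given string field. Non-string values are left as-is."""
    trimmed = []
    for record in records:
        cleaned = dict(record)  # copy so the input rows are not modified
        value = cleaned.get(field)
        if isinstance(value, str):
            cleaned[field] = value.strip()
        trimmed.append(cleaned)
    return trimmed


# Hypothetical rows: the same city name with stray spaces around it.
rows = [{"city": "  Ottawa"}, {"city": "Ottawa "}, {"city": "Ottawa"}]
cleaned_rows = trim_empty_spaces(rows, "city")

# Before trimming, the three spellings count as three unique values;
# after trimming, they collapse into one.
distinct_before = {row["city"] for row in rows}
distinct_after = {row["city"] for row in cleaned_rows}
```

Untrimmed values like these would otherwise appear as separate categories when the field is aggregated or used to build a relationship.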
Use the import options
Complete the following steps to apply import tools to a dataset:
- Open the Add to page window using one of the following options:
  - Create a data workbook. The Add to page window appears when the data workbook is created.
  - Click the Add to page button above the data pane in an existing data workbook.
- Select a dataset to add to the data workbook.
- Click the Import options button to access the following tools:
  - Choose Filter dataset to apply an advanced filter and select fields to include in the sample dataset.
  - Choose whether to trim empty spaces from the beginning and end of strings (enabled by default).
  - If your dataset has more than 250,000 records, choose whether to use the Random (default) or Fixed sampling method.
- Click Add.
There are two methods for creating sampled data: Random and Fixed.
The Random sampling method selects 250,000 records randomly from the dataset. This method is likely to create a representative sample of unique values and number ranges. However, values with relatively few occurrences may not be selected in the sample. For example, a typo in a string column that appears only once may not be selected in the random sample, so you will not know to fix the typo as part of your data engineering workflow.
The Random method is the preferred sampling method for most datasets.
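To make the tradeoff concrete, the following Python sketch estimates how likely a rare value is to appear in a random sample. It assumes records are drawn uniformly without replacement, which the documentation does not confirm for Insights; the names `random_sample` and `chance_of_inclusion` are hypothetical.

```python
import random

SAMPLE_SIZE = 250_000  # sample size described for the Random method


def random_sample(records, size=SAMPLE_SIZE, seed=None):
    """Draw `size` records uniformly at random, without replacement.
    Datasets no larger than `size` are returned in full."""
    if len(records) <= size:
        return list(records)
    return random.Random(seed).sample(records, size)


def chance_of_inclusion(total_records, occurrences=1, size=SAMPLE_SIZE):
    """Probability that at least one of `occurrences` matching records
    lands in a sample of `size` drawn without replacement."""
    if total_records <= size:
        return 1.0
    miss = 1.0
    for i in range(occurrences):
        # Chance that the i-th matching record is also left out of the sample.
        remaining_unsampled = max(total_records - size - i, 0)
        miss *= remaining_unsampled / (total_records - i)
    return 1.0 - miss


# A typo that occurs once in a 1,000,000-record dataset:
print(chance_of_inclusion(1_000_000))  # 0.25
```

Under these assumptions, a value that occurs once in a dataset of 1,000,000 records has a one in four chance of appearing in the sample, and the chance shrinks as the dataset grows.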
Database connectors that are not supported out of the box must have updated configuration files to support random sampling. If you are not using the latest configuration files for a connector, you must remove the connector type, and then re-add the connector with the latest files.
Data-only connections to ArcGIS Enterprise support random sampling only if Insights 2022.2 or later is installed in the organization.
Random sampling may not be supported for data from the Living Atlas and ArcGIS public tabs.
The Fixed method selects the records in the order they occur in the dataset. The default sample size is 250,000 records, but you can increase or decrease the sample size when you import the dataset.
Use the Fixed method when you want to increase the sample size, or when you have a dataset that will provide a representative sample using the records in the order they occur.
Do not use the Fixed method for datasets that are ordered in a way that affects which values appear in the sample. For example, a dataset has several years' worth of weather data across a country, but the first 250,000 records include only the first two months of data. A Fixed sample of that dataset will therefore not be representative of the dates, temperatures, precipitation amounts, and other weather conditions recorded in the full dataset.
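The weather example can be reproduced at a smaller scale with a short Python sketch. The data below is synthetic and the sizes are scaled down (12,000 records and a sample of 2,000 instead of 250,000); `fixed_sample` is a hypothetical helper that takes records in the order they occur, and the random draw assumes uniform sampling without replacement.

```python
import random


def fixed_sample(records, size):
    """Take the first `size` records in the order they occur."""
    return records[:size]


# Synthetic dataset ordered by date: 1,000 readings for each of 12 months.
records = [
    {"month": month, "reading": reading}
    for month in range(1, 13)
    for reading in range(1000)
]

sample_size = 2000
fixed = fixed_sample(records, sample_size)
randomized = random.Random(0).sample(records, sample_size)

# Which months are represented in each sample?
fixed_months = sorted({record["month"] for record in fixed})
random_months = sorted({record["month"] for record in randomized})

print(fixed_months)        # [1, 2]
print(len(random_months))  # number of months covered by the random sample
```

Because the records are ordered by month, the fixed sample only ever sees the first two months, while a uniform random draw of the same size is overwhelmingly likely to cover the whole year.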