Note:
Data engineering is available in Insights desktop. All users of Insights in ArcGIS Online and Insights in ArcGIS Enterprise have access to Insights desktop. For more information, see Overview of ArcGIS Insights.
Data engineering is currently in Preview.
Data engineering is the process of exploring, visualizing, cleaning, and preparing your data for analysis. It is normally performed before you start an analysis workflow.
Data engineering can be completed in Insights using a data model. The data model is created by running data engineering tools on your dataset or a sample of your dataset. Running the data model applies the tools to the full dataset and creates a new output dataset that is ready to use for analysis.
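The relationship between building a data model on a sample and running it on the full dataset can be illustrated with a short Python sketch. This is only a conceptual illustration, not how Insights is implemented: the `DataModel` class, the step functions, and the `station` and `pm25` columns are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


class DataModel:
    """Hypothetical stand-in for a data model: an ordered list of recorded steps."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step, rows: List[Row]) -> List[Row]:
        """Record the step and return its result on the rows being previewed."""
        self.steps.append(step)
        return step(rows)

    def run(self, rows: List[Row]) -> List[Row]:
        """Replay every recorded step, in order, and return a new dataset."""
        for step in self.steps:
            rows = step(rows)
        return rows


def drop_missing_pm25(rows: List[Row]) -> List[Row]:
    # Keep only rows that have a pm25 reading.
    return [r for r in rows if r["pm25"] is not None]


def round_pm25(rows: List[Row]) -> List[Row]:
    # Return copies of the rows with pm25 rounded to the nearest integer.
    return [{**r, "pm25": round(r["pm25"])} for r in rows]


# Hypothetical full dataset and a small sample of it.
full_dataset = [
    {"station": "A", "pm25": 12.4},
    {"station": "B", "pm25": None},
    {"station": "C", "pm25": 30.6},
    {"station": "D", "pm25": 8.2},
]
sample = full_dataset[:2]

# Build the model interactively on the sample.
model = DataModel()
preview = model.apply(drop_missing_pm25, sample)
preview = model.apply(round_pm25, preview)

# Run the model on the full dataset to produce the output dataset.
output = model.run(full_dataset)
print(output)
```

In this sketch the steps never modify the input rows, which mirrors the idea that running the data model creates a new output dataset rather than changing the original.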
Example
A GIS analyst is preparing air quality data for analysis in Insights. The analyst loads the data into a data workbook in Insights desktop, which automatically trims all extra spaces from the beginning and end of strings. The analyst uses Show column summary to explore the columns in the dataset and finds that the value 9999 is used to represent missing values, then uses Find and replace to search for 9999 values and replace them with null values. The analyst also uses Advanced filter to filter the dataset to the desired study area.
Once the analyst is satisfied that the dataset has been prepared for analysis, they can run the data model to create a new output dataset. The analyst decides to save the output as a local dataset in Insights desktop. They can also export the local dataset to another format, such as a compressed shapefile, to share with members of their organization or use in Insights in ArcGIS Online or Insights in ArcGIS Enterprise.
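For readers who think in code, the cleaning steps in this example (trimming strings, replacing 9999 with null, and filtering to a study area) can be sketched in plain Python. In Insights these steps are performed with the tools named above; the function names, column names, and sample values below are hypothetical and serve only to show the logic.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

MISSING_SENTINEL = 9999  # placeholder value used for missing readings


def trim_strings(rows: List[Row]) -> List[Row]:
    # Remove extra spaces from the beginning and end of every string value.
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in rows
    ]


def replace_sentinel(rows: List[Row], sentinel: Any = MISSING_SENTINEL) -> List[Row]:
    # Replace the sentinel with None (null); booleans are left untouched.
    return [
        {
            k: None if not isinstance(v, bool) and v == sentinel else v
            for k, v in r.items()
        }
        for r in rows
    ]


def filter_study_area(rows: List[Row], region: str) -> List[Row]:
    # Keep only rows that fall in the requested study area.
    return [r for r in rows if r["region"] == region]


# Hypothetical air quality records.
raw_rows = [
    {"station": "  North ", "region": "Downtown ", "pm25": 14.2},
    {"station": "South", "region": " Downtown", "pm25": 9999},
    {"station": " East", "region": "Harbor", "pm25": 21.0},
]

cleaned = filter_study_area(replace_sentinel(trim_strings(raw_rows)), "Downtown")
print(cleaned)
```

Trimming runs first in this sketch so that the study area filter matches values such as " Downtown" that carry stray spaces.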
Perform data engineering
Every data engineering workflow differs slightly based on the requirements of the individual dataset. The following workflow can be used as a general guideline for performing data engineering in Insights:
- Create a data workbook in Insights desktop.
- Add data and apply import options, if necessary.
- Apply dataset and column tools to clean and prepare your data. The tools are added to the data model automatically.
- Run the data model to create an output dataset.