Note:
Data engineering is available in Insights desktop. All Insights in ArcGIS Online and Insights in ArcGIS Enterprise users have access to Insights desktop. For more information, see Overview of ArcGIS Insights.
Data engineering is currently in Preview.
Column tools are used to make changes to data in a specific column and are accessed from individual columns in the tables below the data model. Column tools are added to the data model after they are applied.
Convert data type
Data types are applied to every column in Insights. Depending on the formatting in your input data, the default data type may not be correct for all columns. For example, .csv files do not support date/time formatting, so all date/time columns will be imported as strings.
Columns can be converted to string, integer, double, or date/time data types.
Complete the following steps to use Convert data type:
- Open a data workbook in Insights desktop and add data, if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If your view is model or map only, click the Show table button .
- Choose Convert data type.
- For Output data type, choose the data type to which you want to convert the column and apply the necessary formatting options:
- If you are converting a column to date/time, input the date/time format.
- If you are converting to integer or double, select the type of decimal separator used in your data (Point or Comma).
- Click Apply.
The column is converted to the selected data type.
Integer
Integer data types are numbers with no decimal values.
Any column with numbers can be converted to an integer. If the column contains characters that aren't numbers, those characters are removed.
If the column contains decimal values, the numbers after the decimal are removed and the integer is not rounded. If you want your number to be rounded, you can convert the column to double, and calculate a new column using the ROUND() function.
Integer data types are read as number fields in an Insights workbook.
Double
Double (or double-precision floating-point) data types are numbers that allow decimal values.
Any column with numbers can be converted to double. If the column contains characters that aren't numbers, those characters are removed.
Double data types are read as number fields in an Insights workbook.
Date/time
Date/time columns can be date only, time only, or date and time.
A date/time format is required to convert a column to a date/time data type. The default date/time format is ISO 8601. If your data is not using the ISO 8601 format, you must change the format in the Custom format box to match your data.
Note:
All dates and times in a single column must use the same format. It is best practice to include a period component for all time values using a 12-hour clock to ensure the correct time is used.
The following variables are supported to create your date/time format:
Date/time component | Variable | Supported values |
---|---|---|
Month | MM | 01 to 12 |
M | 1 to 12 | |
Day | DD | 01 to 31 |
D | 1 to 31 | |
Year | YYYY | Four digit year |
YY | Two digit year (00 to 99) | |
Hour | HH | 00 to 24 |
H | 0 to 24 | |
hh | 01 to 12 | |
h | 1 to 12 | |
Minute | mm | 00 to 59 |
m | 0 to 59 | |
Second | ss | 00 to 59 |
s | 0 to 59 | |
Period | A | AM, PM, or the supported period format based on your locale. |
a | ||
Time zone | ZZ | +HHmm or -HHmm |
TZD | +hh:mm or -hh:mm |
Filter values
Attribute filters use queries to remove unnecessary data from your dataset. This tool can be used to reduce the size of the dataset and include only data that is relevant to your analysis. For example, you may want to remove entries with null values or filter data only to a specified study area or date range.
Complete the following steps to use Filter values:
- Open a data workbook in Insights desktop and add data, if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If your view is model or map only, click the Show table button .
- Choose Filter values.
- Do one of the following:
- To filter string columns, select or deselect unique values that you want to display or exclude.
- To filter numbers or rate/ratio columns, adjust the slider at the lower and upper ends of the range, or click the nodes and provide a new value.
- To filter date/time columns, use the calendar and clock buttons to adjust the start and end date and time, respectively. You can also type a date or time directly in the filter.
For more information on applying filter settings for each data type, see Filter data.
- Click Apply.
The filter is applied and rows that do not match the filter criteria are removed.
Find and replace
Note:
Find and replace is currently available for string columns. To replace values in number or date/time columns, you must convert the column to a string, replace the string, and convert the column back to the original data type.
Find and replace is used to search for specific values in the column and replace them with a new value. This tool can be used to replace placeholder or default values with null values, fix misspelled strings, or create consistent casing, among other uses.
Complete the following steps to use Find and replace:
- Open a data workbook in Insights desktop and add data, if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If your view is model or map only, click the Show table button .
- Choose Find and replace.
- Do one of the following:
- To find and replace custom strings, remain on the String tab (default). In the Find box, type a custom string to search for in the column. Select Match cases to make the search case-sensitive and select Match whole word only to return only full string matches, if necessary.
- To find and replace empty string values, click the Empty string tab.
- To find and replace null values, click the Null tab.
- In the Replace with box, type the string with which you want to replace the matching strings. Leave the Replace with box to replace with empty strings. If you are replacing empty strings with null values (Empty string tab), select Replace with null.
- Optionally, click Find to search for and highlight the matching strings.
- Click Replace to search for and replace the matching strings.
- Repeat the steps to find and replace additional strings, or click Done to close the dialog box.
The matching strings are updated.
Remove column
The Remove column tool is used to exclude unnecessary columns from the output dataset. A removed column cannot be shown later during analysis using Show hidden fields. Removing a column only affects the output dataset and does not delete the column from the source dataset.
Note:
Datasets in data workbooks must have at least one column. Remove column is disabled for datasets with only one column.
Complete the following steps to remove a column from the dataset.
- Open a data workbook in Insights desktop and add data, if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If your view is model or map only, click the Show table button .
- Choose Remove column.
The column is removed from the table.
Show column summary
Show column summary creates a distribution graph and calculates statistics using data from the column. The summary can be used to find issues in your data like misspelled strings or outliers.
Note:
Show column summary creates a visualization of the values in a column, rather than changing the values. Therefore, Show column summary is not added to the data model.
Complete the following steps to create a column summary:
- Open a data workbook in Insights desktop and add data, if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If your view is model or map only, click the Show table button .
- Click Show column summary.
A side panel appears with a chart and statistics for the column.
String columns
String columns create a bar chart summary. The bar chart displays the unique values in the column with the count of features and the summary displays statistics for total count of features, count of null values, count of empty strings, and total number of unique values. Use the slider on the side of the chart to zoom in and see all unique values, if necessary. Use the Sort button to sort the chart ascending, descending, or alphabetically.
You can make selections on the bar chart summary by clicking individual bars. Selections on the column summary are reflected on the table view or map view, if applicable.
Look for the following issues in a string column summary:
- Does the column contain null values and empty strings? Null values and empty strings will be analyzed separately. Use the Find and replace tool to replace null values with empty strings, or vice versa.
- Do all values use consistent casing? Inconsistent casing can lead to incorrectly separating values that should be aggregated as a single unique value. Use the Find and replace tool to do a case-insensitive search for a string and replace all matching values with a single value.
- Does the column contain misspelled values? Spelling inconsistencies will result in inaccurate aggregations. Use the Find and replace tool to search for the misspelled value and replace it with the correct spelling.
Number columns
Number columns create a histogram summary. The histogram aggregates the numbers into bins and displays the frequency of values within each bin. The mean and median values are displayed on the histogram and the summary displays statistics for total count of features, count of null values, mean, approximate median, upper and lower quartile, standard deviation, skewness, and excess kurtosis. Use the slider under the x-axis to change the number of bins on the histogram.
You can make selections on the histogram summary by clicking individual bars. Selections on the histogram summary are reflected on the table view or map view, if applicable.
Look for the following issues in a number column summary:
- Does the dataset contain outliers or values outside of the expected range? For example, if the column includes percent values, the expected range is 0 to 100. Use the Filter values tool to remove the rows with incorrect values.
- Is the data normally distributed? Some statistical analyses, such as Create Regression Model, require normally distributed data. Use the Calculate column tool to apply a transformation to the column.
- Does the dataset contain placeholder or default values that should be removed (for example, 9999)? Use the Find and replace tool to replace the values with nulls, if appropriate, or use the Filter values tool to remove the rows containing placeholder or default values.
Note:
Find and replace currently only supports string columns. You must convert the number column to a string, replace the string, then convert the column back to a number (integer or double).
Date/time columns
Date/time columns create a time series summary. The time series aggregates the date/time values into intervals and displays the count for each interval. The summary displays the total count of features, minimum date, maximum date, and count of null values. Use the slider above the time series to zoom in and view the graph at a more granular scale. If the zoom level passes a required threshold, the date/time interval updates to the appropriate interval.
You can make selections on the time series summary by clicking individual time points. Selections on the time series summary are reflected on the table view or map view, if applicable.
Look for the following issues in a date/time column summary:
- Are the dates within the required date range? Use the Filter values tool to keep only values within the desired date/time range.
- Does your dataset contain placeholder or default values that should be removed (for example, 1000-01-01)? Use the Find and replace tool to replace the values with nulls, if appropriate, or use the Filter values tool to remove the rows containing placeholder or default values.
Note:
Find and replace currently only supports string columns. You must convert the date/time column to a string, replace the string, and convert the column back to date/time.