Note:
Data engineering is available in Insights desktop. All Insights in ArcGIS Online and Insights in ArcGIS Enterprise users have access to Insights desktop. For more information, see Introduction to ArcGIS Insights.
Data engineering is currently in Preview.
Column tools are used to make changes to data in a specific column and are accessed from individual columns in the tables below the data model. Column tools are added to the data model after they are applied.
Convert data type
Data types are applied to every column in Insights. Depending on the formatting of the input data, the default data type may not be correct for all columns. For example, .csv files do not support date/time formatting, so all date/time columns will be imported as strings.
Columns can be converted to string, integer, double, or date/time data types using the Convert data type tool.
Complete the following steps to convert the data type:
- Open a data workbook in Insights desktop and add data if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If the view is model or map only, click the Show table button .
- Choose Convert data type.
- For Output data type, choose the data type to which you want to convert the column and apply the necessary formatting options:
- To convert a column to date/time, input the date/time format.
- To convert to integer or double, select the type of decimal separator used in the data (Point or Comma).
- Click Apply.
The column is converted to the selected data type.
Integer
Integer data types are numbers with no decimal values.
Any column with numbers can be converted to an integer. If the column contains characters that aren't numbers, those characters are removed.
If the column contains decimal values, the numbers after the decimal are removed and the integer is not rounded. To round a number, you can convert the column to double, and calculate a new column using the ROUND() function.
Integer data types are read as number fields in an Insights workbook.
Double
Double (or double-precision floating-point) data types are numbers that allow decimal values.
Any column with numbers can be converted to double. If the column contains characters that aren't numbers, those characters are removed.
Double data types are read as number fields in an Insights workbook.
Date/time
Date/time columns can be date only, time only, or date and time.
A date/time format is required to convert a column to a date/time data type. The default date/time format is ISO 8601. If the data is not using the ISO 8601 format, you must change the format in the Custom format box to match the data.
Note:
All dates and times in a single column must use the same format. It is best practice to include a period component for all time values using a 12-hour clock to ensure that the correct time is used.
The following variables are supported for the date/time format:
Date/time component | Variable | Supported values |
---|---|---|
Month | MM | 01 to 12 |
M | 1 to 12 | |
Day | DD | 01 to 31 |
D | 1 to 31 | |
Year | YYYY | Four digit year |
YY | Two digit year (00 to 99) | |
Hour | HH | 00 to 24 |
H | 0 to 24 | |
hh | 01 to 12 | |
h | 1 to 12 | |
Minute | mm | 00 to 59 |
m | 0 to 59 | |
Second | ss | 00 to 59 |
s | 0 to 59 | |
Period | A | AM, PM, or the supported period format based on your locale. |
a | ||
Time zone | ZZ | +HHmm or -HHmm |
TZD | +hh:mm or -hh:mm |
Filter values
Attribute filters use queries to remove unnecessary data from a dataset. You can use the Filter values tool to reduce the size of the dataset and include only data that is relevant to the analysis. For example, you can remove entries with null values or filter data only to a specified study area or date range.
Complete the following steps to filter values:
- Open a data workbook in Insights desktop and add data if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If the view is model or map only, click the Show table button .
- Choose Filter values.
- To filter columns, do one of the following:
- For string columns, select or deselect unique values that you want to display or exclude.
- For numbers or rate/ratio columns, adjust the slider at the lower and upper ends of the range, or click the nodes and provide a new value.
- For date/time columns, use the calendar and clock buttons to adjust the start and end date and time, respectively. You can also type a date or time directly in the filter.
For more information about applying filter settings for each data type, see Filter data.
- Click Apply.
The filter is applied and rows that do not match the filter criteria are removed.
Find and replace
Note:
The Find and replace tool is currently only available for string columns. To replace values in number or date/time columns, you must convert the column to a string, replace the string, and convert the column back to the original data type.
Find and replace is used to search for specific values in the column and replace them with a new value. This tool can be used to replace placeholder or default values with null values, correct misspelled strings, or create consistent casing, among other uses.
Complete the following steps to use find and replace values:
- Open a data workbook in Insights desktop and add data if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If the view is model or map only, click the Show table button .
- Choose Find and replace.
- To find and replace values, do one of the following:
- For custom strings, remain on the String tab (default), and in the Find text box, type a custom string to search for in the column. Optionally, select Match cases to make the search case sensitive and select Match whole word only to return only full string matches.
- For empty string values, click the Empty string tab.
- For null values, click the Null tab.
- Specify the replace value by doing one of the following:
- In the Replace with text box, type the string to use to replace the matching strings.
- Leave the Replace with text box empty to replace with empty strings.
- To replace empty strings with null values (Empty string tab), select Replace with null.
- Optionally, click Find to search for and highlight the matching strings.
- Click Replace to search for and replace the matching strings.
- Repeat the steps above to find and replace additional strings, or click Done to close the dialog box.
The matching strings are updated.
Remove column
Use the Remove column tool to exclude unnecessary columns from the output dataset. A removed column cannot be shown later during analysis using Show hidden fields. Removing a column only affects the output dataset and does not delete the column from the source dataset.
Note:
Datasets in data workbooks must have at least one column. The Remove column tool is disabled for datasets with only one column.
Complete the following steps to remove a column from the dataset:
- Open a data workbook in Insights desktop and add data if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If the view is model or map only, click the Show table button .
- Choose Remove column.
The column is removed from the table.
Show column summary
The Show column summary tool creates a distribution graph and calculates statistics using data from the column. The summary can be used to find issues in the data such as misspelled strings or outliers.
Note:
The Show column summary tool creates a visualization of the values in a column, rather than changing the values. Show column summary is not added to the data model.
Complete the following steps to create a column summary:
- Open a data workbook in Insights desktop and add data if necessary.
- Click the arrow next to a column name to expand the column tools menu.
Note:
If the view is model or map only, click the Show table button .
- Click Show column summary.
A side panel appears with a chart and statistics for the column.
String columns
String columns create a bar chart summary. The bar chart displays the unique values in the column with the count of features, and the summary displays statistics for total count of features, count of null values, count of empty strings, and total number of unique values. Use the slider on the side of the chart to zoom in on all the unique values if necessary. Use the Sort button to sort the chart in ascending, descending, or alphabetical order.
To make selections on the bar chart summary, click individual bars. Selections on the column summary are reflected on the table view or map view if applicable.
The following are potential issues in a string column summary:
- Does the column contain null values and empty strings? Null values and empty strings will be analyzed separately. Use the Find and replace tool to replace null values with empty strings, or vice versa.
- Do all values use consistent casing? Inconsistent casing can lead to incorrectly separating values that should be aggregated as a single unique value. Use the Find and replace tool to perform a search that is not case sensitive for a string and replace all matching values with a single value.
- Does the column contain misspelled values? Spelling inconsistencies will result in inaccurate aggregations. Use the Find and replace tool to search for the misspelled value and replace it with the correct spelling.
Number columns
Number columns create a histogram summary. The histogram aggregates the numbers into bins and displays the frequency of values within each bin. The mean and median values are displayed on the histogram, and the summary displays statistics for total count of features, count of null values, mean, approximate median, upper and lower quartile, standard deviation, skewness, and excess kurtosis. Use the slider under the x-axis to change the number of bins on the histogram.
To make selections on the histogram summary, click individual bars. Selections on the histogram summary are reflected on the table view or map view if applicable.
The following are potential issues in a number column summary:
- Does the dataset contain outliers or values outside of the expected range? For example, if the column includes percent values, the expected range is 0 to 100. Use the Filter values tool to remove the rows with incorrect values.
- Is the data normally distributed? Some statistical analyses, such as Create Regression Model, require normally distributed data. Use the Calculate column tool to apply a transformation to the column.
- Does the dataset contain placeholder or default values that should be removed (for example, 9999)? Use the Find and replace tool to replace the values with nulls if appropriate, or use the Filter values tool to remove the rows containing placeholder or default values.
Note:
The Find and replace tool currently only supports string columns. You must convert the number column to a string, replace the string, and convert the column back to a number (integer or double).
Date/time columns
Date/time columns create a time series summary. The time series aggregates the date/time values into intervals and displays the count for each interval. The summary displays the total count of features, minimum date, maximum date, and count of null values. Use the slider above the time series to zoom in and view the graph at a more granular scale. If the zoom level passes a required threshold, the date/time interval updates to the appropriate interval.
To make selections on the time series summary, click individual time points. Selections on the time series summary are reflected on the table view or map view, if applicable.
The following are potential issues in a date/time column summary:
- Are the dates within the required date range? Use the Filter values tool to keep only values within the desired date/time range.
- Does the dataset contain placeholder or default values that should be removed (for example, 1000-01-01)? Use the Find and replace tool to replace the values with nulls if appropriate, or use the Filter values tool to remove the rows containing placeholder or default values.
Note:
The Find and replace tool currently only supports string columns. You must convert the date/time column to a string, replace the string, and convert the column back to date/time.