You can add and calculate new fields for your dataset using the data table or data engineering. Calculated fields allow you to create values, such as growth rates, percent loss, and change over time by choosing fields from your dataset and applying operators and functions.
You can also calculate fields using Calculate % Change, Calculate Ratio, and Calculate Z-Score in Find answers.
Tip:
Use the data table to add normalized data to your dataset for maps, charts, and tables. After the data is calculated and added to your dataset, change the field type to identify it as a rate/ratio field . For more information on data normalization, see Choropleth maps.
Calculate a field in the data table
Fields calculated in the data table only appear in the workbook and are not added to the original dataset. If you want to preserve the calculated field outside of the workbook, you can create a new feature layer by sharing the dataset or using data engineering.
Note:
The data table provides a representative view of your data and has a display limit of 2,000 rows. Sorting your table in ascending and descending order allows you to view the top 2,000 rows and the bottom 2,000 rows. All calculations are performed on the full dataset.
Complete the following steps to calculate a field in the data table:
- Open a workbook and add data, if necessary.
- From the data pane, click the Dataset options button next to the dataset where you want to add a calculated field.
- Click View data table.
- Click + Field.
A column named New Field is appended to the table.
Note:
You can resize and reorder the columns, but these changes are not saved.
- Click the heading of the new column and provide a more descriptive name.
- Click fx or Enter calculate function to display a menu with functions, field names, and operators to build your formula.
Equivalent keyboard commands can also be used to replace any of the buttons from the fx menu.
- Use the functions, fields, and operators to complete your calculation as required.
- Click Run.
If the Run button is not enabled, there is an error in the syntax of your calculation.
The new calculated field appears at the end of your dataset. You can remove a calculated field by selecting it and clicking the Delete calculated field button .
Calculate a column using data engineering
Note:
Data engineering is available in Insights desktop. All Insights in ArcGIS Online and Insights in ArcGIS Enterprise users have access to Insights desktop. For more information, see Overview of ArcGIS Insights.
Data engineering is currently in Preview.
The Calculate column tool can be added to a data model and used to add columns to the output dataset. Columns calculated in a data workbook will be permanently added to the output dataset, rather than existing only in the workbook.
Complete the following steps to calculate a column using data engineering:
- Open a data workbook in Insights desktop and add data, if necessary.
- On the side toolbar, click the Calculate column button .
- For the New column name parameter, type the name of the new column.
Note:
The supported databases in Insights only support letters, numbers, and underscores for column names. Unsupported characters in column names are converted to underscores when saved to a database (for example, population/sq mile is converted to population_sq_mile). For best results, apply a column name that uses only supported characters then update the column alias with other characters, if necessary.
- In the Column expression box, use the functions, column names, and operators to complete your calculation as required.
Equivalent keyboard commands can also be used to replace any of the buttons.
Note:
If the data workbook has more than one dataset, the input options are based on the active table.
- Click Run.
If the Run button is not enabled, there is an error in the syntax of your calculation.
The new calculated column is added to the table and the Calculate column tool is added to the data model.
Operators
Calculating a field can require both mathematical and logical operators. The following operators are available for calculating fields:
Operator | Use |
---|---|
+ | Addition. |
- | Subtraction. |
× | Multiplication. The equivalent keyboard command is *. |
÷ | Division. The equivalent keyboard command is /. |
xy | Power function. The equivalent keyboard command is ^. |
< | Less than. |
> | Greater than. |
= | Equal to. |
<= | Less than or equal to. |
>= | Greater than or equal to. |
<> | Not equal to. |
, | Comma, used as a separator between syntax components in functions. |
( | Left bracket. |
) | Right bracket. |
AND | Logical operator where all conditions must be met. |
OR | Logical operator where one of the conditions must be met. |
IS NULL | A binary operator that tests if values in a field are null. An output value of 0 is assigned when the tested value is not null, and an output value of 1 is assigned when the tested value is null. |
IS NOT NULL | A binary operator that tests if values in a field are not null. An output value of 0 is assigned when the tested value is null, and an output value of 1 is assigned when the tested value is not null. |
Note:
Mathematical operators, such as +, -, >, and =, are compatible with number and rate/ratio fields. Functions, such as CONCATENATE() and DATEDIF(), can be used in place of certain mathematical operators for string fields and date/time fields, respectively.
The IS NULL and IS NOT NULL operators are compatible with all field types (string, number, rate/ratio, and date/time) except locations. Some string fields with null values may be configured as empty strings. In those cases, IS NULL and IS NOT NULL will not return the expected results. Instead, the expressions field="" and field<>"" should be used in place of field IS NULL and field IS NOT NULL, respectively.
The AND and OR operators can be used to join conditions with different syntax than their corresponding logical functions. The following examples show equivalent calculations using the functions and operators:
Note:
When calculating fields, AND and OR must be used within the IF() function.
Function | Operator |
---|---|
IF(AND(MeanAge>=18, MeanAge<=33),"Millennial","Other") | IF(MeanAge>=18 AND MeanAge<=33,"Millennial","Other") |
IF(OR(Crime="Theft", Crime="Theft from vehicle", Crime="Shoplifting"), "Larceny", "Other") | IF(Crime="Theft" OR Crime="Theft from vehicle" OR Crime="Shoplifting", "Larceny", "Other") |
Functions
Functions can be accessed using the fx button or the Enter calculate function field in the data table. There are four types of functions: string, numeric, date, and logical.
When you add a function to your calculation, a pop-up displays the function syntax. You can remove the pop-up using the close button , or redisplay the pop-up by clicking the function in the Enter calculate function field.
String functions
Most string functions use string inputs to produce string outputs. The two exceptions are the VALUE() function and the FIND() function, which both use string inputs to produce numeric outputs.
Input text in string functions can be either literal (text surrounded by quotation marks) or category field values. The following table uses category fields for its examples, along with example values that could be found in those fields. Quotation marks are used in the field values to demonstrate that the fields have categorical data.
Syntax | Description | Example |
---|---|---|
CONCATENATE(text1, [text2], ... ) | Concatenates two or more string values. | A dataset of schools in California contains fields for street address, city, and ZIP Code. A single field of mailing addresses can be created using the CONCATENATE() function:
|
FIND(find_text, within_text, [start_num]) | Gives the position of the specified text (character or characters) within a string or text field. The FIND() function may be most useful when used in conjunction with other functions, such as MID(), LEFT(), or RIGHT(). | A dataset contains a field with street addresses (including number and street name). To classify the data by street, the street name has to be removed from the rest of the address using the MID() function. However, each number is a different length, so the start_num will be different for each row. The start_num can be found using the FIND() function:
|
LEFT(text, [num_chars]) | Returns part of a text field.
| A dataset of traffic accidents includes a category field with the day of the accident, including the day of the week, date, and year. To study the accidents by the day of the week, a new field can be calculated to show the first three characters of the field (starting with day of the week) using the LEFT() function:
|
LOWER(text) | Returns a character expression with all data converted to lowercase. | A public works department is compiling a list of street signs that need to be replaced. As new entries are added to the list, the format of the Status field has become unstandardized, making it difficult to display the signs with unique values. The Status field can be standardized to have all lowercase letters using the LOWER() function:
|
MID(text, start_num, [num_chars]) | Returns part of a text field.
| A dataset of schools in California contains fields for street address, city, and ZIP Code. The street name can be isolated from the street address using the MID() function:
|
RIGHT(text, [num_chars]) | Returns part of a text field.
| A dataset of national parks includes a field with the name of the park and the two-digit state code. To symbolize the parks by state, a new field can be added and calculated using the RIGHT() function:
|
SUBSTITUTE(source_text, old_text, new_text) | Replaces the old text in a string with the specified new text.
| A dataset of pothole locations contains a field with street names. The dataset needs to be updated when Main Street is renamed 5th Street. The street_name field can be updated using the SUBSTITUTE() function:
Note:The SUBSTITUTE() function will replace all occurrences of old_text with new_text. For example, in the function SUBSTITUTE("aba", "a", "c") the result text is "cbc". |
TRIM(text) | Returns the string with extra spaces removed from the ends. | A feature service contains text fields with extra spaces at the beginning and end of their values. The extra spaces can be removed using the TRIM() function:
|
UPPER(text) | Returns a character expression with all data converted to uppercase. | A dataset with locations of NGO headquarters contains a field with the organizations' full names and their acronyms when applicable. The acronyms can be standardized to have all capital letters using the UPPER() function:
|
VALUE(text, [format]) | Converts text to a number.
Note:Nested functions within the VALUE() function may output unexpected results if a format is not provided. Therefore, it is recommended that you specify a format when using the VALUE() function. The VALUE() function does not currently support converting text to negative numbers. | A dataset of retail stores has a category field with the revenue amounts. The Revenue field can be converted to numerical values using the VALUE() function.
|
Numeric functions
Numeric functions use number inputs to produce number outputs. Number functions are most likely to be used in conjunction with other functions or as a method of transforming data.
Input numbers can be either literal numbers or number fields. Some of the examples below use numbers as input rather than fields to better display the usage of each function.
Syntax | Description | Example |
---|---|---|
ABS(number) | Returns the absolute value. |
|
AVG(number) | Returns the mean value. |
|
CEILING(number) | Rounds a number up to the nearest multiple of 1. |
|
COS(number) | The trigonometric cosine of the specified angle, in radians. The following formula can be used to convert degrees to radians:
|
|
FLOOR(number) | Rounds a number down to the nearest multiple of 1. |
|
LN(number) | The natural logarithm of the specified float expression. The natural logarithm uses the constant e as the base value (approximately 2.72) |
|
LOG(number) | The logarithm of a number to a specific base. The default is base 10. |
|
MAX(number) or MAX(date) | Returns the maximum value. |
|
MIN(number) or MIN(date) | Returns the minimum value. |
|
POWER(number, power) | The value of the expression raised to the specified power. |
|
ROUND(number, num_digits) | Rounds numeric values to the specified number of digits.
|
|
SIN(number) | The trigonometric sine of the specified angle, in radians. The following formula can be used to convert degrees to radians:
|
|
STDEV(number) | The standard deviation of a number field. |
|
SUM(number) | Returns the total value. |
|
TAN(number) | Returns the tangent of the input expression. The following formula can be used to convert degrees to radians:
|
|
Date functions
Date functions can use date fields, text, or no input, depending on which function is being used. The DATEVALUE() function can be used to replace a date field in the DATEDIF() or TIMEDIF() function.
Note:
Date/time fields used in DATEDIF() and DATEADD() must be in a format that includes a date (in other words, either date and time or date only). Time-only date/time fields will not be accepted.
Date/time fields used in TIMEDIF() must be in a format that includes a time (in other words, either date and time or time only). Date-only date/time fields will not be accepted.
Syntax | Description | Example |
---|---|---|
DATE(year, month, day, [hour, minute, second], [AM/PM]) | Creates a date/time field using three or more separate fields or values. The following specifications are accepted:
Note:All input fields can be numbers or strings, except the AM/PM parameter. Numbers must be integer values. Two-digit years must be strings. If you are using the DATE() function with a database dataset and you have specified any time parameters (hour, minute, or second), you must enter an input value for all time parameters. | A dataset of restaurants includes string fields with the date and time of the most recent health inspections. A date field for the inspections can be created using the DATE() function:
Note:The result dates in this example will be formatted using a 24-hour clock. To use a 12-hour clock, you must specify AM or PM. The same example function can be written as DATE(2016, Month, Day, Hour, Min, 00, "AM"). |
TIME(hour, minute, second) | Creates a date/time field using three separate fields or values. The following specifications are accepted:
Note:All input fields can be numbers or strings, except the AM/PM parameter. Numbers must be integer values. | A dataset of hospital records includes patient arrival times. A time field for the records can be created using the TIME() function:
Note:The result times in this example will be formatted using a 24-hour clock. To use a 12-hour clock, you must specify AM or PM. The same example function can be written as TIME(Hour, Min, 00, "AM"). |
DATEADD(date, integer, [unit]) | Adds a specified amount of time to the dates in the input field.
If no unit is provided, "D" will be used. When months are added to a date/time field, the output will be the same day of the month, regardless of the number of days in the month. If the corresponding day of the month is not valid in the new month, the last day of the month is used. For example, if a calculation adds three months to the input field, and one of the original dates is November 30, 2021, the output value of February 30, 2022 is not valid. The result will be recorded as February 28, 2022, since 2022 is not a leap year. | A public health unit is responsible for keeping vaccine records for patients within the health district. A specific vaccine has two doses that must be taken at least 60 days apart and no more than 180 days apart. DATEADD() can be used to create fields with the first date the patients are eligible for their second dose and the deadline for the final dose. Eligibility date:
Final date:
|
TIMEADD(time, integer, unit) | Adds a specified amount of time to the times in the input field.
| A dataset has dates and times recorded in coordinated universal time (UTC) and needs to be converted to central standard time (CST). A date/time field with times in CST can be created using the TIMEADD() function:
|
DATEDIF(start_date, end_date, [unit]) | Calculates the elapsed time between two dates. The start_date must take place before the end_date; otherwise, the output will be a negative value.
If no unit is provided, "D" will be used. Rounding is performed on the results for each unit as follows:
| A dataset of restaurants includes the dates of their most recent health inspections. The number of days elapsed between health inspections can be calculated using the DATEDIF() function: Example 1:
Example 2:
Example 3:
|
TIMEDIF(start_time, end_time, [unit]) | Calculates the elapsed time between two date/time values. The start_time must take place before the end_time; otherwise, the output will be a negative value.
If no unit is provided, "ss" will be used. Rounding is performed on the results for each unit as follows:
| A dataset of hospital operation records includes the start and end time of surgeries. The time each patient spends in surgery can be calculated using the TIMEDIF() function:
|
DATEVALUE(date_text, [format]) | Converts text to a date.
Note:Nested functions within the DATEVALUE() function may output unexpected results if a format is not provided. Therefore, it is recommended that you specify a format when nesting functions within DATEVALUE(). Dates specified without separators (for example, "10312016") will be treated as milliseconds if no format is provided. Therefore, it is recommended that you specify a format when entering dates without separators. For dates specified with separators but without a format, the format will be guessed. If the format cannot be guessed, the default "MM-DD-YY" will be applied. The DATEVALUE() function can only read text entered inside quotation marks or string fields. A date/time field cannot be used as input to the DATEVALUE() function. | A dataset of restaurants includes the dates of their most recent health inspections. The number of days elapsed since the last inspection can be calculated using the DATEVALUE() function to specify the current date:
Dates and times can be formatted in several ways. The following examples are some of the ways you can format text into dates:
|
NOW() | Returns the current date and time in date/time format. The time is recorded based on coordinated universal time (UTC). | A dataset of restaurants includes the dates of their most recent health inspections. The number of days elapsed since the last inspection can be calculated using the NOW() function to specify the current date:
|
Logical functions
Logical functions use string or number inputs to produce string or number outputs. The inputs can be either literal (text surrounded by quotation marks or constant numbers) or field values. The following table uses a combination of field and literal values for its examples, along with example values that could be found in the given fields.
Syntax | Description | Example |
---|---|---|
AND(condition,condition, ...) | Tests two or more conditions and returns TRUE if all conditions are met. AND() should be used within the condition parameter of the IF() function.
| A dataset contains the mean age of people living within each block group. A market researcher wants to know which block groups contain mostly millennials. The block groups with a mean age within the age range for millennials can be found using a combination of the IF() and AND() functions:
|
IF(condition, TRUE_expression, [FALSE_expression]) | Tests a condition and returns a TRUE or FALSE value based on the result.
TRUE_expression and FALSE_expression can be any valid expression, including a nested IF() function. The data type for TRUE_expression and FALSE_expression should be the same type (for example, both strings or both numbers). | A dataset contains fields with store revenue and expenses, which can be used to find the status of the stores (whether they have posted a net profit or a net loss). The status of each store can be found using the IF() function:
|
ISNULL(field)* | Tests if values in a field are null. An output value of 0 is assigned when the tested value is not null, and an output value of 1 is assigned when the tested value is null.
ISNULL() can be used on its own, or within the condition parameter of the IF() function. | A dataset contains locations of traffic collisions involving cyclists, as well as the type of bike route present on the street where the collision took place. For collisions that took place on a street without a bike route, the RouteType field is left blank. The dataset can be updated to fill in null values using a combination of the IF() and ISNULL() functions:
|
ISNOTNULL(field)* | Tests if values in a field are not null. An output value of 0 is assigned when the tested value is null, and an output value of 1 is assigned when the tested value is not null.
ISNOTNULL() can be used on its own, or within the condition parameter of the IF() function. | A dataset contains locations of traffic collisions involving cyclists, as well as the type of bike route present on the street where the collision took place. For collisions that took place on a street without a bike route, the RouteType field is left blank. The dataset can be updated to fill in null values using a combination of the IF() and ISNOTNULL() functions:
|
OR(condition,condition, ...) | Tests two or more conditions and returns TRUE if at least one condition is met. OR() should be used within the condition parameter of the IF() function.
| A dataset contains locations of crimes across a city. A crime analyst wants to analyze the pattern of larceny compared to other similar crimes. To compare larceny to other crimes, the analyst must first determine whether or not each crime was considered larceny. The type of crime can be determined using a combination of the IF() and OR() functions:
|
*Some fields with no data may be configured as empty rather than null. In those cases, ISNULL() and ISNOTNULL() will not return the expected results. Instead, the expressions field="" and field<>"" should be used in place of ISNULL(field) and ISNOTNULL(field), respectively.