Summarize Attributes (Big data)

Available in big data analytics.

The Summarize Attributes tool summarizes field values to generate a summary table. The resulting output table displays the count of summarized features and any additional specified statistics.

Workflow diagram

Summarize Attributes workflow from input features through state stores calculations to an output table

Example

The following is an example use case for the tool:

Tornadoes and hurricanes are some of the most destructive types of storms in the United States. To understand how their impact differs, you can analyze property damage and financial losses caused by both tornadoes and hurricanes. You have access to a dataset that includes tornado and hurricane data across the United States and you want to summarize values for all hurricanes and tornadoes. Using the storm type as a grouping variable, you can generate summary statistics for hurricanes and tornadoes separately.

Usage notes

Note the following when working with the tool:

  • Inputs can be a tabular layer or a layer with geometry (points, lines, or areas).
  • You can use this tool with spatial data; however, the result is tabular. You can then join the results to spatial data using the Join Features tool.
  • This tool is a tabular analysis tool, not a spatial analysis tool. The output table consists of fields containing the result of the statistical operation.
  • Using the Group fields parameter, you can specify one or more fields to summarize by, or specify no fields to summarize all features together. When you summarize by a single field, statistics are calculated for each unique attribute value. When you summarize by multiple fields, statistics are calculated for each unique combination of attribute values.
  • The output table of this tool always includes a count of the number of features that have been summarized.
  • Additional statistics can be calculated using the Summary fields (optional) parameter. The available summary fields statistics depend on the field type you are summarizing.
    • A string attribute field can use the Any, Count, and Count (distinct) statistics.
    • A numeric attribute field can use the Any, Count, Count (distinct), Sum, Square Sum, Min, Max, Mean, Range, Variance, and Standard Deviation statistics.
    • A date attribute field can use the Any, Count, Min, Max, and Range statistics.
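
The grouping behavior described in the notes above can be sketched in plain Python. This is only an illustration of the idea, not the tool's implementation: the records, the field names (StormType, State, Damage), and the output keys COUNT and SUM are all hypothetical.

```python
from collections import defaultdict

# Hypothetical storm records; field names and values are invented for illustration.
storms = [
    {"StormType": "Tornado", "State": "OK", "Damage": 1.5},
    {"StormType": "Tornado", "State": "OK", "Damage": 2.0},
    {"StormType": "Tornado", "State": "KS", "Damage": 0.5},
    {"StormType": "Hurricane", "State": "FL", "Damage": 10.0},
    {"StormType": "Hurricane", "State": "FL", "Damage": 7.5},
]


def summarize(records, group_fields, sum_field):
    """Group records by each unique combination of group_fields values.

    Returns {group_key: {"COUNT": number of records in the group,
                         "SUM": sum of sum_field over the group}}.
    An empty group_fields list puts every record in one group.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in group_fields)
        groups[key].append(rec)
    return {
        key: {"COUNT": len(recs), "SUM": sum(r[sum_field] for r in recs)}
        for key, recs in groups.items()
    }


by_type = summarize(storms, ["StormType"], "Damage")                  # one row per storm type
by_type_state = summarize(storms, ["StormType", "State"], "Damage")  # one row per combination
all_storms = summarize(storms, [], "Damage")                         # a single summary row
```

In this sketch, grouping by StormType yields two rows, grouping by StormType and State yields three rows (one per combination that actually occurs in the data), and grouping by no fields yields a single row covering all five records.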

How the tool works

The Summarize Attributes tool calculates variance and summarizes input layers into groups with matching field values. The sections below describe the equations, calculations, parameters, and the output table.

Equations

Variance is calculated using the following equation:

Variance equation
Variance variables

Standard deviation is calculated as the square root of the variance.
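
The equation image is not reproduced in this text. As a sketch only, assuming the tool uses the sample variance (an n - 1 denominator, which would be consistent with the variance of a single value being reported as null), the two statistics can be written as follows. The symbols x_i, x̄, n, and s are notation chosen here, not taken from the original figure.

```latex
s^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1},
\qquad
s = \sqrt{s^2}
```

Here x_i is the i-th nonnull value, x̄ is the mean of the nonnull values, and n is the number of nonnull values. Under this assumption, the numerator is the quantity described for the Square Sum statistic.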

Calculations

Input layers are summarized into groups with matching field values. The results are tabular, so they cannot be visualized on a map. You can use an output type that includes a feature layer. For more information about the output table generated by the feature layer, refer to the Output table section below.

The tables below illustrate the statistical calculations for a layer that is summarized using matching field values. In this example, the tool uses the VO2 field to calculate numeric statistics (Count, Sum, Min, Max, Range, Mean, Standard Deviation, and Variance) and the Rating field to calculate string statistics (Count and Any).

Input layer to be summarized

The table above was summarized on the Designation field, and the VO2 field was used to calculate the numeric statistics (Count, Sum, Min, Max, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer. The result is a table with two records, one for each distinct value of Designation.

Input layer that was summarized using the Designation field

The table below represents how the first few fields appear when the layer is summarized using the Designation and AgeGroup fields. Statistics are calculated using the same methods as the previous example.

Input layer summarized using the Designation and AgeGroup fields

The Count statistic (for string and numeric fields) counts the number of nonnull values. For example, the count of [0, 1, 10, 5, null, 6] is 5, and the count of [Primary, Secondary, null, Tertiary] is 3.
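
The counting rule can be checked with a few lines of Python. This is a minimal sketch that assumes null values map to Python's None. The function name count_nonnull is hypothetical.

```python
def count_nonnull(values):
    """Count the values that are not null (None in Python)."""
    return sum(1 for v in values if v is not None)


# A value of 0 is still a value, so it is counted; only None is skipped.
numeric_count = count_nonnull([0, 1, 10, 5, None, 6])
string_count = count_nonnull(["Primary", "Secondary", None, "Tertiary"])
print(numeric_count, string_count)  # 5 3
```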

Parameters

The following are the parameters for the tool:

Parameter | Explanation | Data type
Input layer

The input with features to be summarized.

Data type: Features

Group fields (optional)

The fields used to summarize similar features. Either a single field or multiple fields can be used. For example, if you choose a single field called PropertyType that includes values of Commercial and Residential, all Residential features are summarized together, all Commercial features are summarized together as a separate group, and summary statistics are calculated for each group.

If more than one field is chosen, each unique combination of values is summarized, and summary statistics are calculated for those combinations. For example, consider a first field called PropertyType with the values of Commercial and Residential, and a second field called Occupied that includes the values Yes and No. There are four possible combinations that can be summarized; summary statistics are calculated separately for each of these four groups.

Data type: String

Summary fields (optional)

The statistics calculated for the specified fields. Available statistics vary based on whether the field is a string, numeric, or date field.

The supported statistics types are as follows:

  • Any—A sample string taken from a field containing string values.
  • Count—Calculates the number of nonnull values. It can be used on fields with numeric or string values. The count for [null, 0, 2] is 2.
  • Count (distinct)—Calculates the number of distinct, nonnull values. It can be used on fields with numeric or string values. The count distinct result for [null, 4, 3, 4] is 2.
  • Sum—The sum of numeric values in a field. The sum for [null, 1, 3] is 4.
  • Square Sum—Calculates the sum of squared differences of each observation from the overall mean. The sum of squares for [null, 2.2, 3.1, 4.7] is approximately 3.207.
  • Min—The minimum value of a numeric field. The minimum value for [0, 2, null] is 0.
  • Max—The maximum value of a numeric field. The maximum value for [0, 2, null] is 2.
  • Mean—The mean of numeric values. The mean for [0, 2, null] is 1.
  • Range—The range of a numeric field. This is calculated as the minimum value subtracted from the maximum value. The range for [0, null, 1] is 1, and the range for [null, 4] is 0.
  • Variance—The variance of a numeric field. The variance for [1] is null, and the variance for [null, 1, 1, 1] is 0.
  • Standard Deviation—The standard deviation of a numeric field. The standard deviation for [1] is null, and the standard deviation for [null, 1, 1, 1] is 0.

Data type: String
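
The statistics listed above can be reproduced with a short Python sketch. It is an approximation of the described behavior, not the tool's implementation. It assumes that nulls map to None, that variance uses an n - 1 denominator (consistent with the variance of a single value being null), and that Square Sum is the sum of squared differences from the mean, as described. The function name and dictionary keys are hypothetical.

```python
import math


def summarize_values(values):
    """Compute summary statistics over the non-null entries of values."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n == 0:
        return None  # assumption: all-null input is not modeled in this sketch
    mean = sum(data) / n
    # Sum of squared differences of each observation from the mean.
    square_sum = sum((v - mean) ** 2 for v in data)
    # Sample variance is undefined when there is only one value.
    variance = square_sum / (n - 1) if n > 1 else None
    return {
        "count": n,
        "count_distinct": len(set(data)),
        "sum": sum(data),
        "square_sum": square_sum,
        "min": min(data),
        "max": max(data),
        "mean": mean,
        "range": max(data) - min(data),
        "variance": variance,
        "std_dev": math.sqrt(variance) if variance is not None else None,
    }


stats = summarize_values([None, 4, 3, 4])
single = summarize_values([1])
```

For [null, 4, 3, 4], this sketch gives a count of 3, a distinct count of 2, and a range of 1. For the single value [1], the variance and standard deviation are None, matching the null result described above.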

Output table

The tool output is a table containing the fields provided in the Group fields parameter, a count attribute with the number of features summarized by each record, and any summarized attributes specified in the Summary fields parameter.

If a spatiotemporal feature layer is used as an output type, both a spatiotemporal feature layer and a map image layer are created. If an ArcGIS Online hosted feature layer is used as an output type, the output table is a table (hosted).