Summarize Attributes

Available in big data analytics.

The Summarize Attributes tool summarizes similar field values to generate a summary table. The resulting output table displays the count of features summarized as well as any additional statistics that have been specified.

Workflow diagram

Summarize Attributes tool workflow diagram

Example

The following is an example use case of the Summarize Attributes tool:

Tornadoes and hurricanes are some of the most destructive types of storms in the United States. To understand how their impact differs, you want to analyze property damage and financial losses caused by both tornadoes and hurricanes. You have access to tornado and hurricane data across the United States in a single dataset and you want to summarize all the information to see a summary of values for all hurricanes and a summary of values for all tornadoes. You can summarize your data using the type of storm to determine the statistics for each storm type.

Usage notes

Keep the following in mind when working with the Summarize Attributes tool:

  • Inputs can be a tabular layer or a layer with geometry (points, lines, or areas).
  • You can use this tool with spatial data. However, the result is tabular. You can then join your results to spatial data using the Join Features tool.
  • This tool is a tabular analysis tool, not a spatial analysis tool. The output table consists of fields containing the result of the statistical operation.
  • Using the Fields parameter, you can specify one or more fields to summarize by or summarize all features. When you summarize by a single field, statistics are calculated for each unique attribute value. When you summarize by multiple fields, statistics are calculated for each unique combination of attribute values.
  • The output table of this tool always includes a count of the number of features that have been summarized.
  • Additional statistics can be calculated using the Summary fields parameter. The summary fields statistics available depend on the field type you are summarizing.
    • A string attribute field can use the statistics Any, Count, and Count distinct.
    • A numeric attribute field can use the statistics Any, Count, Count distinct, Sum, Sum of squares, Min, Max, Mean, Range, Variance, and Standard deviation.
    • A date attribute field can use the statistics Any, Count, Min, Max, and Range.
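
The grouping behavior described in these notes can be sketched in plain Python. This is only an illustration of the concept, not the tool's implementation: the `summarize` helper, the sample records, and the field names (`storm_type`, `state`, `damage`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
storms = [
    {"storm_type": "Tornado", "state": "OK", "damage": 120.0},
    {"storm_type": "Tornado", "state": "KS", "damage": 80.0},
    {"storm_type": "Hurricane", "state": "FL", "damage": 900.0},
    {"storm_type": "Hurricane", "state": "FL", "damage": None},
    {"storm_type": "Tornado", "state": "OK", "damage": 40.0},
]


def summarize(records, fields, summary_field):
    """Group records by the values of `fields` and compute a count and a sum.

    An empty `fields` list places every record in a single group, which
    mirrors summarizing all features.
    """
    groups = defaultdict(list)
    for record in records:
        # One field gives one group per unique value; several fields give
        # one group per unique combination of values.
        key = tuple(record[f] for f in fields)
        groups[key].append(record)

    rows = []
    for key, members in groups.items():
        # Null (None) values are skipped when computing the sum.
        values = [m[summary_field] for m in members if m[summary_field] is not None]
        row = dict(zip(fields, key))
        row["count"] = len(members)  # number of features summarized
        row[f"{summary_field}_sum"] = sum(values)
        rows.append(row)
    return rows


by_type = summarize(storms, ["storm_type"], "damage")
by_type_and_state = summarize(storms, ["storm_type", "state"], "damage")
everything = summarize(storms, [], "damage")
```

Here `by_type` holds one row per storm type, `by_type_and_state` holds one row per combination of storm type and state, and `everything` holds a single row covering all five records.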

How the Summarize Attributes tool works

The Summarize Attributes tool summarizes input layers into groups with matching field values and calculates statistics, such as variance, for each group. The equations, calculations, parameters, and the output table are described in the sections below.

Equations

Variance is calculated using the following equation:

Variance equation
Variance variables

Standard deviation is calculated as the square root of the variance.
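
The equation images are not reproduced in this text. As a hedged reconstruction, assuming the tool uses the sample variance (an assumption that is consistent with the variance of a single value being reported as null), the two statistics can be written as follows, where n is the number of non-null values in a group, x_i is the i-th value, and x̄ is the mean of those values:

```latex
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{s^2}
```

Under this assumption, the denominator n - 1 is zero for a single value, so the variance and standard deviation are undefined in that case.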

Calculations

Input layers are summarized into groups with matching field values. The results are tabular, so they cannot be visualized on your map. You can use an output type that includes a feature layer. For more information on the output table generated by the feature layer, refer to the Output table section of this page.

The tables below illustrate the statistical calculations of a layer that is summarized using similar field values.

Metadata fields for the input layer to be summarized

The table above was summarized on the Designation field, and the VO2 field was used to calculate the numeric statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer. The result is a table with two features, representing the distinct values of Designation.

Aggregate data for the summary layer using the Designation field
When the input layer is summarized using the Designation field, the sum, minimum, and maximum values are provided.

The following table represents how the first few fields appear when the layer is summarized using the Designation and Age Group fields. Statistics are calculated using the same methods as the previous example.

Tabular summary of the Designation and AgeGroup fields
The input layer is summarized using the Designation and Age Group fields.

The Count statistic for string and numeric fields counts the number of non-null values. For example, the count of [0, 1, 10, 5, null, 6] is 5, and the count of [Primary, Primary, Secondary, null] is 3.
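
The two count examples can be checked with a minimal Python sketch. It assumes null is represented as `None`; the helper name `count_non_null` is hypothetical.

```python
def count_non_null(values):
    """Return the number of entries that are not None (null)."""
    return sum(1 for v in values if v is not None)


numeric_count = count_non_null([0, 1, 10, 5, None, 6])  # 5
string_count = count_non_null(["Primary", "Primary", "Secondary", None])  # 3
```

The comparison uses `is not None` rather than truthiness so that a value of 0 is still counted.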

Parameters

The Summarize Attributes tool parameters are described below:

Parameter | Explanation | Data type

Input layer

The tabular layer or layer with geometry (points, lines, or areas) to be summarized.

Features

Fields (optional)

The fields used to summarize similar features. Either a single field or multiple fields can be used. For example, if you choose a single field called PropertyType that includes values of Commercial and Residential, all residential features are summarized together, commercial features are summarized separately, and summary statistics are calculated for each group.

If more than one field is chosen, each unique combination of values is summarized, and summary statistics are calculated for those combinations. For example, consider a first field called PropertyType with the values of Commercial and Residential, and a second field called Occupied that includes the values Yes and No. There are four possible combinations that can be summarized; summary statistics are calculated separately for each of these four groups.

String

Summary fields (optional)

The statistics calculated for the specified fields. Available statistics vary based on whether the field is a string, numeric, or date field.

The following are available statistics types:

  • Any—A sample string taken from a field containing string values.
  • Count—Calculates the number of non-null values. It can be used on fields with numeric or string values. The count for [null, 0, 2] is 2.
  • Count distinct—Calculates the number of distinct, non-null values. It can be used on fields with numeric or string values. The count distinct result for [null, 4, 3, 4] is 2.
  • Sum—The sum of numeric values in a field. The sum for [null, 1, 3] is 4.
  • Sum of squares—Calculates the sum of squared differences of each observation from the overall mean. The sum of squares for [null, 2.2, 3.1, 4.7] is approximately 3.207.
  • Min—The minimum value of a numeric field. The minimum value for [0, 2, null] is 0.
  • Max—The maximum value of a numeric field. The maximum value for [0, 2, null] is 2.
  • Mean—The mean of numeric values. The mean for [0, 2, null] is 1.
  • Range—The range of a numeric field. This is calculated as the minimum value subtracted from the maximum value. The range for [0, null, 1] is 1, while the range for [null, 4] is 0.
  • Variance—The variance of a numeric field. The variance for [1] is null, while the variance for [null, 1, 1, 1] is 0.
  • Standard deviation—The standard deviation of a numeric field. The standard deviation for [1] is null, while the standard deviation for [null, 1, 1, 1] is 0.

String
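
The null handling described for these statistics can be sketched with Python's standard library. This is an illustration only: it assumes null is represented as `None` and that variance is the sample variance (n - 1 denominator), which is consistent with a single value producing null. The helper names are hypothetical, and the input [None, 1, 0, 1, 1] is an extra illustrative example.

```python
import math
import statistics


def non_null(values):
    """Drop null (None) entries before any statistic is computed."""
    return [v for v in values if v is not None]


def count_distinct(values):
    return len(set(non_null(values)))


def sum_of_squares(values):
    # Sum of squared differences of each observation from the mean.
    data = non_null(values)
    mean = statistics.fmean(data)
    return sum((v - mean) ** 2 for v in data)


def value_range(values):
    data = non_null(values)
    return max(data) - min(data)


def sample_variance(values):
    # Assumption: sample variance, undefined (None) for fewer than two values.
    data = non_null(values)
    if len(data) < 2:
        return None
    return statistics.variance(data)


def sample_std_dev(values):
    variance = sample_variance(values)
    return None if variance is None else math.sqrt(variance)


print(count_distinct([None, 4, 3, 4]))          # 2
print(sum_of_squares([None, 2.2, 3.1, 4.7]))    # about 3.2067
print(value_range([0, None, 1]))                # 1
print(value_range([None, 4]))                   # 0
print(sample_variance([1]))                     # None
print(sample_variance([None, 1, 0, 1, 1]))      # 0.25
print(sample_std_dev([None, 1, 0, 1, 1]))       # 0.5
```

Filtering nulls once in `non_null` keeps each statistic consistent with the Count statistic, which also ignores null values.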

Output table

The output of this tool is a table containing the fields provided in the Fields parameter, a count attribute of the number of features summarized by that record, and any summarized attributes as specified in the Summary fields parameter.

If a spatiotemporal feature layer is used as an output type, both a spatiotemporal feature layer and a map image layer are created. If an ArcGIS Online hosted feature layer is used as an output type, the output table is a Table (hosted).