Create and use a box plot

Box plots provide a quick visual summary of the variability of values in a dataset. They show the median, upper and lower quartiles, minimum and maximum values, and any outliers in the dataset. Outliers can reveal mistakes or unusual occurrences in data. A box plot is created using a number or rate/ratio field on the y-axis.

Box plots can answer questions about data, such as the following:

  • How is the data distributed?
  • Are there any outliers in the dataset?
  • What are the variations in the spread of several series in the dataset?

Examples

A market researcher is studying the performance of a retail chain. A box plot of the annual revenue at each store can be used to determine the distribution of sales, including the minimum, maximum, and median values.

A box plot of store revenue

The box plot above shows that the median sales amount is $1,111,378 (shown by hovering over the chart or by using the Flip card button to flip the card over). The distribution seems fairly even: the median is in the middle of the box and the whiskers are similar in size. There are also low and high outliers, which give the analyst an indication of which stores are over- and underperforming.

Learn more about the components of a box plot

To delve deeper into the data, the analyst decides to create individual box plots for each region where the stores are located. She does this by changing the Group by field to Region. The result is four individual box plots that can be compared to discern information about each region.

A box plot of store revenue for each region

Based on the box plots, the analyst can tell that there are few differences between regions: the medians are consistent across the four box plots, the boxes are similar in size, and all regions have outliers at both the minimum and maximum ends. However, the whiskers for the Northern and Central regions are slightly more compact than those for the Bay Area and Southern regions, which implies that the Northern and Central regions have more consistent performance. In the Bay Area and Southern regions, the whiskers are a bit longer, which implies that those regions have both stores that are performing poorly and stores that are performing well. The analyst may want to focus her analysis on those two regions to determine why performance varies so much.

Create a box plot

To create a box plot, complete the following steps:

  1. Select one of the following combinations of data:
    • A number field or rate/ratio field.
    • A number field or rate/ratio field plus a string field.
    Note:

    You can search for fields using the search bar in the data pane.

  2. Create the chart using the following steps:
    1. Drag the selected fields to a new card.
    2. Hover over the Chart drop zone.
    3. Drop the selected fields on Box Plot.
Tip:

You can also create charts using the Chart menu above the data pane or the Visualization type button on an existing card. For the Chart menu, only charts that are compatible with the data selection will be enabled. For the Visualization type menu, only compatible visualizations (including maps, charts, or tables) will be displayed.

Note:

Box plots created from database datasets must have at least five records. Box plots with fewer than five records are most likely to occur when grouping your box plot using a string field or applying a filter to your dataset or card. Database datasets are available through database connections in Insights in ArcGIS Enterprise and Insights desktop.
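The five-record minimum can be checked before charting. The following is a minimal Python sketch of that check, not part of Insights: the function name, the record layout (group value first), and the sample data are all hypothetical and used only for illustration.

```python
from collections import Counter

MIN_RECORDS = 5  # minimum record count stated in the note above


def groups_below_minimum(records, group_index=0, minimum=MIN_RECORDS):
    """Return the group values that have fewer than `minimum` records."""
    counts = Counter(record[group_index] for record in records)
    return sorted(group for group, n in counts.items() if n < minimum)


# Hypothetical (region, revenue) records left after applying a filter.
records = [("Northern", 1.0)] * 5 + [("Central", 2.0)] * 3
small = groups_below_minimum(records)
print(small)
```

Any group returned by this check would have too few records for a box plot once the data is grouped by that field.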

Usage notes

This visualization creates a result dataset in the data pane, which includes the fields used to create the chart. The result dataset can be used to create additional visualizations, rename the fields on the chart axes or in the pop-ups, or apply filters to the chart.

A key feature of a box plot is the determination of outliers. Outliers are values that are much larger or smaller than the rest of the data. The whiskers on a box plot represent the thresholds beyond which values are considered outliers. If there are no outliers, the whiskers stretch to the minimum and maximum values in the dataset. In Insights, the ranges of the lower and upper outlier values are indicated on the box plot as circles linked by dotted lines.

If a group by field is used, side-by-side box plots are created, with each box plot representing the spread of data in each category.
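To illustrate what each side-by-side box plot summarizes, the sketch below groups records by a category and computes the quartiles for each group. It is a rough illustration under stated assumptions: the store records are invented, and the "inclusive" quartile method from Python's statistics module is an arbitrary choice that may not match how Insights calculates quartiles.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical store records: (region, annual revenue in dollars).
stores = [
    ("Northern", 900_000), ("Northern", 1_000_000), ("Northern", 1_100_000),
    ("Northern", 1_200_000), ("Northern", 1_300_000),
    ("Southern", 500_000), ("Southern", 800_000), ("Southern", 1_100_000),
    ("Southern", 1_400_000), ("Southern", 1_700_000),
]

# Collect the revenue values for each category of the group by field.
by_region = defaultdict(list)
for region, revenue in stores:
    by_region[region].append(revenue)

# One box plot per category: quartiles and interquartile range.
summary = {}
for region, values in by_region.items():
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    summary[region] = {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

for region, stats in summary.items():
    print(region, stats)
```

In this invented data, both regions share the same median, but the Southern region has a wider interquartile range, which would appear as a taller box on the chart.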

Each statistic or range in the box plot can be selected by clicking the chart.

Use the Layer options button to open the Layer options pane and do the following to update the configuration options:

  • Use the Legend tab to view the symbols on the chart. The Pop out legend button displays the legend as a separate card on the page. You can use the legend to make selections on the chart (available for unique symbols).

    To change the color associated with a value, click the symbol and choose a color from the palette or provide a hexadecimal value. Changing the symbol on the Legend tab is only available for unique symbols.
  • Use the Appearance tab to change the symbol color on the chart (single symbol only).

Use the Card filter button to remove any unwanted data from the card. Filters can be applied to all string, number, rate/ratio, and date/time fields. A card filter does not affect other cards using the same dataset.

Use the Selection tools button to select features on the chart using the single select tool, or invert the selection.

Use the Visualization type button to switch directly between a box plot and other visualizations, such as a graduated symbols map, summary table, or histogram. If the box plot includes a Group by field, the visualization can be changed to other charts, such as a line graph or column chart.

Use the Maximize button to enlarge the card. Other cards on the page will be reduced to thumbnails. The card can be returned to its previous size using the Restore down button.

Use the Enable cross filters button to allow filters to be created on the card using selections on other cards. Cross filters can be removed using the Disable cross filters button.

Use the Flip card button to view the back of the card. The Card info tab provides information about the data on the card, and the Export data tab allows users to export the data from the card.

Use the Card options button to access the following options:

  • Appearance: Change the background color, foreground color, and border of the card.
  • Edit labels: Create custom labels for the chart axes. To edit the labels, click the Edit labels button and click the axis to make it editable.
  • Order: Move the card forward or backward relative to other cards on the page.
  • Delete: Remove the card from the page. If you did not intend to delete the card, you can retrieve it using the Undo button.

How box plots work

A box plot consists of the following components:

A labeled diagram of a box plot

Label | Component | Description
1 | Whisker | The range of data less than the first quartile and greater than the third quartile. Each whisker covers about 25 percent of the data. A whisker typically extends no more than 1.5 times the IQR from the box, which sets the threshold for outliers.
2 | Box | The range of data between the first and third quartiles. 50 percent of the data lies within this range. The range between the first and third quartiles is also known as the interquartile range (IQR).
3 | Maximum | The largest value in the dataset or the largest value that is not outside the threshold set by the whiskers.
4 | Third quartile | The value where 75 percent of the data is less than the value, and 25 percent of the data is greater than the value.
5 | Median | The middle number in the dataset. Half of the numbers are greater than the median and half are less than the median. The median can also be called the second quartile.
6 | First quartile | The value where 25 percent of the data is less than the value, and 75 percent of the data is greater than the value.
7 | Minimum | The smallest value in the dataset or the smallest value that is not outside the threshold set by the whiskers.
8 | Outliers | Data values that are higher or lower than the limits set by the whiskers.
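
The components above can be worked through numerically. The following Python sketch is an illustration only, not the Insights implementation: the function name and sample values are hypothetical, the 1.5 times IQR threshold is the typical value mentioned above, and the "inclusive" quartile method is an assumption (other quartile conventions give slightly different results).

```python
from statistics import quantiles


def box_plot_stats(values, k=1.5):
    """Return the box plot components for a list of numbers."""
    data = sorted(values)
    # Quartile convention is an assumption; Insights may use another one.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Thresholds beyond which values are treated as outliers.
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "minimum": inside[0],    # smallest value within the threshold
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "maximum": inside[-1],   # largest value within the threshold
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

For this sample, the quartiles are 4, 6, and 8, so the IQR is 4 and the outlier thresholds are -2 and 14. The value 30 falls outside the upper threshold and is reported as an outlier, and the upper whisker ends at 9, the largest value inside the threshold.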
