Time Series Cross Correlation (Space Time Pattern Mining)

Summary

Calculates the cross correlation at various time lags between two time series stored in a space-time cube.

The cross correlation is calculated by pairing the corresponding values of each time series and calculating a Pearson correlation coefficient. The second time series is then shifted by one time step, and a new correlation is calculated. This shifting repeats up to a specified maximum number of time steps. The time lag (shift) with the strongest correlation is an estimate of the delay between changes in one time series and responses in the other (for example, the delay between advertising spending and sales revenue). You can filter and remove trends from the time series to test for statistically significant dependence between the variables. You can also include spatial neighbors in the calculations to incorporate spatial relationships between the two time series.
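
As a rough illustration of the pair, correlate, and shift procedure described above, the following pure-Python sketch computes a Pearson correlation at each time lag for a single pair of time series. It is not the tool's implementation: the function names, the sample data, and the choice to correlate only the overlapping time steps after each shift are assumptions made for illustration. The sign convention follows the Usage notes on this page (a positive lag shifts the secondary series forward in time).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(primary, secondary, lag):
    """Correlate primary[t] with secondary[t - lag].

    A positive lag shifts the secondary series forward (right) in time,
    a negative lag shifts it backward. Only overlapping steps are paired
    (an assumption of this sketch).
    """
    n = len(primary)
    if lag >= 0:
        return pearson(primary[lag:], secondary[:n - lag])
    return pearson(primary[:n + lag], secondary[-lag:])


def cross_correlations(primary, secondary, max_lag):
    """Return {lag: correlation} for every lag from -max_lag to max_lag."""
    return {
        lag: lagged_correlation(primary, secondary, lag)
        for lag in range(-max_lag, max_lag + 1)
    }


# Hypothetical data: the primary series repeats the secondary series
# three time steps later, so the secondary variable leads by 3 steps.
secondary = [2, 5, 1, 7, 3, 8, 4, 9, 2, 6, 1, 7, 5, 3, 8, 2]
primary = [0, 0, 0] + secondary[:-3]

results = cross_correlations(primary, secondary, max_lag=5)
strongest_lag = max(results, key=lambda lag: abs(results[lag]))
print(strongest_lag, round(results[strongest_lag], 3))
```

In this made-up example the strongest correlation falls at a lag of 3, which matches the interpretation that a positive lag indicates changes in the secondary variable occurring before changes in the primary variable.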

Learn more about how Time Series Cross Correlation works

Illustration

Time Series Cross Correlation tool illustration
Cross correlations are calculated between two time series at all locations of a space-time cube across various time lags.

Usage

  • The sign (positive or negative) of a time lag value is interpreted as the shift of the secondary analysis variable relative to the primary analysis variable. For example, a time lag value of 5 means that the secondary variable is shifted five time steps forward (right on the time axis) before calculating the cross correlation. If the time lag with the strongest correlation is positive, it means that changes in the value of the secondary analysis variable occur before changes in the primary analysis variable. Similarly, a time lag value of -3 means that the secondary time series is shifted three time steps backward (left on the time axis). If the time lag with strongest correlation is negative, it means that changes in the primary analysis variable occur before changes in the secondary analysis variable.

    Learn more about time lags

  • The primary output of the tool is a feature class containing the cross correlation results of each location for all time lags. In a map, a group layer will be added containing six layers from different fields of the output features: three layers of the strongest correlations (strongest positive, strongest negative, and strongest in absolute value) and three layers of the associated time lags for each of the strongest correlations. You can use these layers to quickly identify which locations had the strongest correlations and which time lags produced the correlations.

    Optionally, you can create pop-up charts on the output features summarizing and visualizing the correlations across all lags at each location. You can also create output tables containing all individual correlations between locations at every time lag.

    Learn more about tool outputs

  • Use the Spatial Neighbors to Include in Calculations parameter to calculate the cross correlations using neighborhoods around each location. This is appropriate when the time series of nearby locations tend to be more similar than time series of locations that are farther away. If neighbors are used, the cross correlation of a location is a weighted average of the correlations between the primary variable of the focal location and the secondary variable of each of its neighbors (including itself). For example, if a location has five neighbors, the cross correlation of the location is a weighted average of six correlations: the correlation between the primary variable of the focal location and secondary variable of the focal location, the correlation between the primary variable of the focal location and the secondary variable of the first neighbor, the correlation between the primary variable of the focal location and the secondary variable of the second neighbor, and so on. The Spatial Neighbor Weighting Method parameter specifies the weights that will be used in the weighted average.

  • To test the statistical significance of the cross correlations at each lag, the Filter and Remove Trends parameter must be checked. When checked, p-values and 95 percent confidence intervals will be calculated for all lags at all locations. Additionally, significance testing can only be performed on pairwise correlations between two time series (rather than a weighted average of multiple correlations), so if you include spatial neighbors in calculations, only the output pairwise correlations table will contain p-values and confidence intervals. If neighbors are not included, the output features and the output lagged correlations table will contain p-value and confidence interval fields.

    Caution:

    The statistical significance tests are independently performed for each time lag of each location, and there is no correction for multiple hypothesis testing. Be cautious when interpreting the significance of any particular p-value or confidence interval.

    Learn more about removing trends and filtering autocorrelation

  • The same analysis variable can be entered for both the primary and secondary analysis variables (called an autocorrelation analysis). However, the results may be difficult to interpret because a time series is always perfectly correlated with itself when the time lag value is zero (unshifted). The output features and correlation tables will contain the correlation results of all time lags, and the results at time lag zero can be filtered or deselected.
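
The neighbor-based calculation described in the Usage notes above (a weighted average of the correlations between the focal location's primary variable and the secondary variable of each neighbor, including itself) can be sketched as follows. The correlation values and the helper name `weighted_average` are hypothetical, and the sketch assumes the weights are normalized by their sum; the page does not state how the tool normalizes them.

```python
def weighted_average(values, weights):
    """Weighted average of values, normalizing by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical pairwise correlations at one time lag for one focal location:
# the first value pairs the focal primary series with the focal secondary
# series, and the remaining five pair it with each neighbor's secondary series.
pairwise_correlations = [0.80, 0.60, 0.70, 0.50, 0.90, 0.40]

# Equal weights reduce the weighted average to a plain mean of the six values.
equal_weights = [1.0] * len(pairwise_correlations)
local_correlation = weighted_average(pairwise_correlations, equal_weights)
print(round(local_correlation, 3))
```

With unequal weights, the same function would let closer neighbors contribute more to the location's cross correlation than distant ones.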

Parameters

Label | Explanation | Data Type
Input Space Time Cube

The space-time cube containing the variable to be analyzed. Space-time cubes have a .nc file extension and are created using various tools in the Space Time Pattern Mining toolbox.

File
Primary Analysis Variable

The numeric variable of the space-time cube containing the time series values of the primary variable.

String
Secondary Analysis Variable

The numeric variable of the space-time cube containing the secondary analysis variable. When using time lags, the secondary analysis variable is shifted relative to the primary analysis variable.

String
Output Features

The output features containing the cross correlations of all locations for all time lags. The output will also have fields of the strongest correlations (positive, negative, and absolute) and fields of the correlations of all time lags. If you filter and remove trends, and you do not use neighbors, the output will contain fields of p-values and 95 percent confidence intervals of all cross correlations.

Feature Class
Enable Time Series Pop-ups
(Optional)

Specifies whether time series charts will be created in the pop-ups of each output feature showing the cross correlation results. Time series pop-ups are not supported for shapefile outputs.

  • Checked—Time series charts will be created for the output features.
  • Unchecked—Time series charts will not be created. This is the default.
Boolean
Maximum Time Lag
(Optional)

The maximum number of time lags that will be used to shift the secondary analysis variable. Cross correlations will be calculated for every time lag value up to the maximum. Provide a positive value even for negative time lags; for example, if 10 is provided for this parameter and the time lag direction shifts the secondary variable in both directions, cross correlations will be calculated for all time lags between -10 and 10. If no value is provided, a value will be determined based on the length of the time series. Provide a value of 0 to calculate only the raw correlation between the time series without any time lags.

Long
Secondary Variable Lag Direction
(Optional)

Specifies the direction of the time lag. The secondary variable can be shifted forward in time (relative to the primary variable), backward in time, or in both directions.

  • Shift secondary variable both directions: The secondary analysis variable will be shifted in both directions. For example, if the maximum time lag is 5, the correlations for all time lags between -5 and 5 will be calculated. This is the default.
  • Shift secondary variable forward in time: The secondary analysis variable will be shifted forward in time (right on the time axis). For example, if the maximum time lag is 5, the correlations for all time lags between 0 and 5 will be calculated. This option is appropriate when changes in the secondary analysis variable occur before changes in the primary analysis variable.
  • Shift secondary variable backward in time: The secondary analysis variable will be shifted backward in time (left on the time axis). For example, if the maximum time lag is 5, the correlations for all time lags between -5 and 0 will be calculated. This option is appropriate when changes in the primary analysis variable occur before changes in the secondary analysis variable.
String
Spatial Neighbors to Include in Calculations
(Optional)

Specifies the neighbors around each location that will be used in calculations. If neighbors are used, the cross correlation of a location is the weighted average of the correlations between the primary variable of the focal location and the secondary variable of each of its neighbors (including itself).

  • No neighbors: No spatial neighbors will be included in the calculations.
  • Distance band: Locations within a specified distance of each location will be included as neighbors in calculations.
  • K nearest neighbors: A given number of nearest locations will be included as neighbors in calculations.
  • Contiguity edges only: Polygons that share an edge will be included as neighbors (rook contiguity).
  • Contiguity edges corners: Polygons that share an edge or a corner will be included as neighbors (queen contiguity).
String
Number of Spatial Neighbors
(Optional)

The number of nearest locations that will be included as neighbors in the calculations.

Long
Distance Band
(Optional)

All locations within this distance will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message. If the specified distance results in more than 1,000 neighbors, only the closest 1,000 locations will be included as neighbors. For polygons, the distance between centroids is used to determine neighbors.

Linear Unit
Spatial Neighbor Weighting Method
(Optional)

Specifies the weighting scheme that will be applied to spatial neighbors when calculating the correlations. The weights are used when calculating the weighted average of the correlation between the focal feature and each neighbor.

  • Equal weights: Each neighbor will receive equal weight (unweighted). This is the default.
  • Bisquare kernel: Neighbors will be weighted using a bisquare kernel.
  • Gaussian kernel: Neighbors will be weighted using a Gaussian kernel.
String
Filter and Remove Trends
(Optional)

Specifies whether trends, seasonality, and autocorrelation will be removed from the primary analysis variable and used to filter the secondary analysis variable.

  • Checked—Trends, seasonality, and autocorrelation will be removed.
  • Unchecked—The time series values will not be altered. This is the default.
Boolean
Output Lagged Correlations Table
(Optional)

A table containing the correlations of every time lag of every location.

Table
Output Pairwise Correlations Table
(Optional)

A table containing the pairwise correlations between each location and each neighbor at all time lags.

Table

Derived Output

Label | Explanation | Data Type
Output Layer Group

A group layer of the output layers.

Group Layer
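
The Spatial Neighbor Weighting Method options above name two distance kernels. This page does not give the kernel formulas or say how the bandwidth is chosen, so the following sketch uses the common textbook forms of the bisquare and Gaussian kernels purely as an assumption. The function names and the `bandwidth` parameter are hypothetical, and the weights the tool actually computes may differ.

```python
import math


def equal_weight(distance, bandwidth):
    """Every neighbor receives the same weight regardless of distance."""
    return 1.0


def bisquare_weight(distance, bandwidth):
    """Textbook bisquare kernel (assumed form): zero at or beyond the bandwidth."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gaussian_weight(distance, bandwidth):
    """Textbook Gaussian kernel (assumed form): decays smoothly, never reaches zero."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


# Hypothetical distances from a focal location to itself and three neighbors.
distances = [0.0, 2.5, 5.0, 10.0]
bandwidth = 10.0  # hypothetical value, in the same units as the distances

for name, kernel in [("equal", equal_weight),
                     ("bisquare", bisquare_weight),
                     ("gaussian", gaussian_weight)]:
    weights = [round(kernel(d, bandwidth), 4) for d in distances]
    print(name, weights)
```

Under these assumed forms, both kernels give the focal location a weight of 1 and reduce the weight as distance grows, while equal weighting treats all neighbors alike.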

arcpy.stpm.TimeSeriesCrossCorrelation(in_cube, analysis_variable_1, analysis_variable_2, output_features, {enable_pop_ups}, {max_lag}, {lag_direction}, {neighborhood_type}, {num_nbrs}, {distance_band}, {spatial_weights}, {filter_option}, {out_corr_table}, {out_pair_table})
Name | Explanation | Data Type
in_cube

The space-time cube containing the variable to be analyzed. Space-time cubes have a .nc file extension and are created using various tools in the Space Time Pattern Mining toolbox.

File
analysis_variable_1

The numeric variable of the space-time cube containing the time series values of the primary variable.

String
analysis_variable_2

The numeric variable of the space-time cube containing the secondary analysis variable. When using time lags, the secondary analysis variable is shifted relative to the primary analysis variable.

String
output_features

The output features containing the cross correlations of all locations for all time lags. The output will also have fields of the strongest correlations (positive, negative, and absolute) and fields of the correlations of all time lags. If you filter and remove trends, and you do not use neighbors, the output will contain fields of p-values and 95 percent confidence intervals of all cross correlations.

Feature Class
enable_pop_ups
(Optional)

Specifies whether time series charts will be created in the pop-ups of each output feature showing the cross correlation results. Time series pop-ups are not supported for shapefile outputs.

  • CREATE_POPUP: Time series charts will be created for the output features.
  • NO_POPUP: Time series charts will not be created. This is the default.
Boolean
max_lag
(Optional)

The maximum number of time lags that will be used to shift the secondary analysis variable. Cross correlations will be calculated for every time lag value up to the maximum. Provide a positive value even for negative time lags; for example, if 10 is provided for this parameter and the time lag direction shifts the secondary variable in both directions, cross correlations will be calculated for all time lags between -10 and 10. If no value is provided, a value will be determined based on the length of the time series. Provide a value of 0 to calculate only the raw correlation between the time series without any time lags.

Long
lag_direction
(Optional)

Specifies the direction of the time lag. The secondary variable can be shifted forward in time (relative to the primary variable), backward in time, or in both directions.

  • BOTH: The secondary analysis variable will be shifted in both directions. For example, if the maximum time lag is 5, the correlations for all time lags between -5 and 5 will be calculated. This is the default.
  • FORWARD: The secondary analysis variable will be shifted forward in time (right on the time axis). For example, if the maximum time lag is 5, the correlations for all time lags between 0 and 5 will be calculated. This option is appropriate when changes in the secondary analysis variable occur before changes in the primary analysis variable.
  • BACKWARD: The secondary analysis variable will be shifted backward in time (left on the time axis). For example, if the maximum time lag is 5, the correlations for all time lags between -5 and 0 will be calculated. This option is appropriate when changes in the primary analysis variable occur before changes in the secondary analysis variable.
String
neighborhood_type
(Optional)

Specifies the neighbors around each location that will be used in calculations. If neighbors are used, the cross correlation of a location is the weighted average of the correlations between the primary variable of the focal location and the secondary variable of each of its neighbors (including itself).

  • NO_NBRS: No spatial neighbors will be included in the calculations.
  • FIXED_DISTANCE: Locations within a specified distance of each location will be included as neighbors in calculations.
  • K_NEAREST_NEIGHBORS: A given number of nearest locations will be included as neighbors in calculations.
  • CONTIGUITY_EDGES_ONLY: Polygons that share an edge will be included as neighbors (rook contiguity).
  • CONTIGUITY_EDGES_CORNERS: Polygons that share an edge or a corner will be included as neighbors (queen contiguity).
String
num_nbrs
(Optional)

The number of nearest locations that will be included as neighbors in the calculations.

Long
distance_band
(Optional)

All locations within this distance will be included as neighbors. If no value is provided, one will be estimated during processing and included as a geoprocessing message. If the specified distance results in more than 1,000 neighbors, only the closest 1,000 locations will be included as neighbors. For polygons, the distance between centroids is used to determine neighbors.

Linear Unit
spatial_weights
(Optional)

Specifies the weighting scheme that will be applied to spatial neighbors when calculating the correlations. The weights are used when calculating the weighted average of the correlation between the focal feature and each neighbor.

  • EQUAL: Each neighbor will receive equal weight (unweighted). This is the default.
  • BISQUARE: Neighbors will be weighted using a bisquare kernel.
  • GAUSSIAN: Neighbors will be weighted using a Gaussian kernel.
String
filter_option
(Optional)

Specifies whether trends, seasonality, and autocorrelation will be removed from the primary analysis variable and used to filter the secondary analysis variable.

  • FILTER: Trends, seasonality, and autocorrelation will be removed.
  • NO_FILTER: The time series values will not be altered. This is the default.
Boolean
out_corr_table
(Optional)

A table containing the correlations of every time lag of every location.

Table
out_pair_table
(Optional)

A table containing the pairwise correlations between each location and each neighbor at all time lags.

Table

Derived Output

Name | Explanation | Data Type
output_layer_group

A group layer of the output layers.

Group Layer
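
To make the interaction between max_lag and lag_direction concrete, the following sketch lists the time lags that would be evaluated for a given combination, based on the parameter descriptions above. The helper `lags_to_evaluate` is hypothetical and is not part of arcpy; it only mirrors the documented ranges and does not reproduce how the tool picks a default when max_lag is omitted.

```python
def lags_to_evaluate(max_lag, lag_direction="BOTH"):
    """Return the list of time lags implied by max_lag and lag_direction."""
    if max_lag < 0:
        raise ValueError("max_lag must be zero or positive")
    if lag_direction == "BOTH":
        return list(range(-max_lag, max_lag + 1))
    if lag_direction == "FORWARD":
        return list(range(0, max_lag + 1))
    if lag_direction == "BACKWARD":
        return list(range(-max_lag, 1))
    raise ValueError(f"unknown lag_direction: {lag_direction!r}")


print(lags_to_evaluate(5, "BOTH"))      # lags from -5 through 5
print(lags_to_evaluate(5, "FORWARD"))   # lags from 0 through 5
print(lags_to_evaluate(5, "BACKWARD"))  # lags from -5 through 0
print(lags_to_evaluate(0))              # only the unshifted correlation
```

A max_lag of 0 leaves only lag 0 in every direction, which corresponds to the raw correlation without any shift.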

Code sample

TimeSeriesCrossCorrelation example 1 (Python window)

The following Python script demonstrates how to use the TimeSeriesCrossCorrelation function.

import arcpy
arcpy.stpm.TimeSeriesCrossCorrelation(
    in_cube=r"c:\data\Sales.nc",
    analysis_variable_1="MARKETING",
    analysis_variable_2="REVENUE",
    output_features=r"CrossCorrResults",
    enable_pop_ups="NO_POPUP",
    max_lag=10,
    lag_direction="BOTH",
    neighborhood_type="K_NEAREST_NEIGHBORS",
    num_nbrs=8,
    distance_band=None,
    spatial_weights="EQUAL",
    filter_option="FILTER",
    out_corr_table=r"LagCorrTable",
    out_pair_table=r"PairCorrTable"
)
TimeSeriesCrossCorrelation example 2 (stand-alone script)

The following Python script demonstrates how to use the TimeSeriesCrossCorrelation function.

# Estimate the time lag between infection and 
# hospitalization for seasonal influenza.

# Import required modules.
import arcpy

# Set the workspace.
arcpy.env.workspace = "c:/data/data.gdb"

# Run Time Series Cross Correlation
# Use neighbors and calculate p-values
try:
    arcpy.stpm.TimeSeriesCrossCorrelation(
        in_cube=r"c:\data\FluData.nc",
        analysis_variable_1="FLU_CASES",
        analysis_variable_2="HOSPITALIZATIONS",
        output_features=r"CrossCorrResults",
        enable_pop_ups="CREATE_POPUP",
        max_lag=10,
        lag_direction="BOTH",
        neighborhood_type="K_NEAREST_NEIGHBORS",
        num_nbrs=8,
        distance_band=None,
        spatial_weights="BISQUARE",
        filter_option="FILTER",
        out_corr_table=r"LagCorrTable",
        out_pair_table=r"PairCorrTable"
    )
except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())