Train Time Series Forecasting Model (GeoAI)

Summary

Trains a deep learning-based time series forecasting model using time series data from a space-time cube. The trained model can be used to forecast values at each location of a space-time cube using the Forecast Using Time Series Model tool.

Time series data can follow various trends and have multiple levels of seasonality. Traditional time series forecasting models based on statistical approaches perform differently depending on the trend and patterns of seasonality in the data. Deep learning-based models have a high capacity to learn and can provide results across different kinds of time series, provided there is enough training data.

This tool trains time series forecasting models using various deep learning-based models, such as Fully Connected Network (FCN), Long Short-Term Memory (LSTM), InceptionTime, ResNet, and ResCNN. These models support multivariate time series, in which the model learns from more than one time-dependent variable to forecast future values. The trained model is saved as a deep learning package file (.dlpk) and can be used to forecast future values using the Forecast Using Time Series Model tool.

Learn more about how Time Series Forecasting Models work

Usage

  • You must install the proper deep learning framework for Python in ArcGIS AllSource.

    Learn how to install deep learning frameworks for ArcGIS

  • This tool accepts netCDF data created by the Create Space Time Cube By Aggregating Points, Create Space Time Cube From Defined Locations, Create Space Time Cube From Multidimensional Raster Layer, and Subset Space Time Cube tools.

  • Compared to other forecasting tools in the Time Series Forecasting toolset, this tool uses deep learning-based time series forecasting models. Deep learning models have a high capacity to learn and are appropriate for time series that follow complex trends and are difficult to model with simple mathematical functions. However, they require a larger volume of training data to learn such complex trends and use more computational resources for training and inference. A GPU is recommended for using this tool.

  • To run this tool using a GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • This tool can be used to model both univariate and multivariate time series. If the space-time cube has other variables that are related to the variable being forecast, they can be included as explanatory variables to improve the forecast.

  • Univariate time series forecasting uses only the one-step method, which is also the default method.

  • Multivariate time series forecasting can use either of two approaches: one-step forecasting or multistep forecasting. The Multi-Step parameter becomes active when multiple explanatory training variables are selected.

  • With the one-step method, the model can be updated with new data at each time step, making it suitable for real-time applications. However, because the model is updated at each time step, prediction errors can accumulate over time, leading to less accurate long-term forecasts. With multistep forecasting, the model predicts multiple future data points beyond the current time step. For example, if the goal is to forecast the next 20 time steps, the model generates all 20 consecutive predictions at once. Multistep forecasting allows the model to consider a broader view of the time series and to capture long-term trends and patterns more effectively. Because the model predicts multiple time steps ahead at once, there is less potential for error accumulation, which can lead to more accurate long-term forecasts. However, because it predicts multiple steps at once, it may be less able to adapt to real-time changes in the data. The choice between the two approaches depends on the specific requirements and characteristics of the forecasting task.

  • The Sequence Length parameter affects the outcome of a time series forecasting model and is defined as the number of past time steps used as input to predict the next time step. If the sequence length is n, the model takes the last n time steps as input to forecast the next time step. The parameter value cannot be larger than the total number of input time steps that remain after excluding validation time steps.

  • Rather than building an independent forecast model at each location of the space-time cube, this tool trains a single global forecast model that uses training data from each location. This global model will be used to forecast future values at every location using the Forecast Using Time Series Model tool.

  • The Output Feature Class parameter value will be added to the Contents pane with rendering based on the final forecasted time step.

  • Example use cases for this tool include training a model to predict demand for retail products based on historical sales data, training a model to predict the spread of diseases, or training a model to predict generation of wind power based on historical production and weather data.

  • Deciding how many time steps to exclude for validation is important. The more time steps that are excluded, the fewer time steps remain to train the model. If too few time steps are excluded, the validation RMSE will be estimated using a small amount of data and may be misleading. Exclude as many time steps as possible while maintaining sufficient time steps to train the model. If the space-time cube has enough time steps, withhold at least as many time steps for validation as the number of time steps you intend to forecast.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
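The Sequence Length, validation, and global-model points above can be illustrated with plain Python. This is a minimal sketch, not the tool's implementation: it assumes each location's series is a list of floats, that the last validation_timesteps values at every location are held out, and that training samples are sliding windows pooled across all locations into one global training set. The helper names (make_windows, build_global_training_set, rmse) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One training sample: (previous sequence_length values, next value)
Window = Tuple[List[float], float]


def make_windows(series: List[float], sequence_length: int) -> List[Window]:
    """Slide a window over one series: the last n values predict the next value."""
    return [
        (series[i : i + sequence_length], series[i + sequence_length])
        for i in range(len(series) - sequence_length)
    ]


def build_global_training_set(
    cube: Dict[str, List[float]], sequence_length: int, validation_timesteps: int
) -> Tuple[List[Window], Dict[str, List[float]]]:
    """Pool windows from every location into one training set for a single global model.

    The last validation_timesteps values of each location are held out for validation.
    """
    training: List[Window] = []
    held_out: Dict[str, List[float]] = {}
    for location, series in cube.items():
        train_part = series[:-validation_timesteps] if validation_timesteps else series
        if sequence_length > len(train_part):
            raise ValueError(f"sequence_length is too long for location {location}")
        training.extend(make_windows(train_part, sequence_length))
        held_out[location] = series[len(train_part):]
    return training, held_out


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Two hypothetical locations with 10 time steps each
cube = {
    "loc_a": [float(v) for v in range(10)],
    "loc_b": [float(v) for v in range(10, 20)],
}
windows, validation = build_global_training_set(cube, sequence_length=3, validation_timesteps=2)
print(len(windows))                               # 10: (8 - 3) windows from each location
print(validation["loc_a"])                        # [8.0, 9.0]
print(rmse(validation["loc_a"], [7.0, 10.0]))     # 1.0
```

Holding out more time steps leaves fewer windows for training, while holding out very few leaves only a handful of values for the RMSE, which is the trade-off the validation guidance describes.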

Parameters

Label | Explanation | Data Type
Input Time Series Data

The netCDF cube containing the variable that will be used to forecast to future time steps. This file must have an .nc file extension and must have been created using the Create Space Time Cube By Aggregating Points, Create Space Time Cube From Defined Locations, or Create Space Time Cube From Multidimensional Raster Layer tool.

File
Output Model

The output folder location that will store the trained model. The trained model will be saved as a deep learning package file (.dlpk).

Folder
Analysis Variable

The numeric variable in the dataset that will be forecasted to future time steps.

String
Sequence Length

The number of previous time steps that will be used when training the model. If the data contains seasonality (repeating cycles), provide the length corresponding to one season.

  • If the Multi-Step parameter is unchecked, the value of this parameter should be less than or equal to the total number of input time steps remaining after excluding the Number Of Time Steps to Exclude for Validation parameter value.
  • If the Multi-Step parameter is checked, 1.5 times the Sequence Length value should be less than or equal to the total number of time steps after excluding the Number Of Time Steps to Exclude for Validation parameter value.

Long
Explanatory Training Variables
(Optional)

Independent variables from the data that will be used to train the model. Check the Categorical check box for any variables that represent classes or categories.

Value Table
Max Epochs
(Optional)

The maximum number of epochs for which the model will be trained. The default is 20.

Long
Number Of Time Steps to Exclude for Validation
(Optional)

The number of time steps that will be excluded for validation. For example, if a value of 14 is specified, the last 14 time steps will be used as validation data. The default is 10 percent of the total time steps. Ideally, it should not be less than 5 percent of the total time steps in the input space-time cube.

  • If the Multi-Step parameter is unchecked, this parameter value should be less than 25 percent of the total number of records in the input space-time cube.
  • If the Multi-Step parameter is checked, this parameter value should be less than or equal to half of the Sequence Length parameter value.

Long
Model Type
(Optional)

Specifies the model architecture that will be used for training the model.

  • InceptionTime: The InceptionTime architecture that will be used for training the model. This is the default.
  • ResNet: The ResNet architecture that will be used for training the model.
  • ResCNN: The ResCNN architecture that will be used for training the model.
  • FCN: The FCN architecture that will be used for training the model.
  • LSTM: The LSTM architecture that will be used for training the model.
  • TimeSeriesTransformer: The TimeSeriesTransformer architecture that will be used for training the model.
String
Batch Size
(Optional)

The number of samples that will be processed at one time. The default is 64.

Depending on the available GPU memory, this number can be changed to 8, 16, 32, 64, and so on.

Long
Model Arguments
(Optional)

Additional arguments specific to each model type. These arguments can be used to adjust the model's complexity and size. See How Time Series Forecasting Models work to understand the model architectures, the supported model arguments, and their default values.

Value Table
Stop training when model no longer improves
(Optional)

Specifies whether the model training will stop when validation loss does not register improvement after five consecutive epochs.

  • Checked—The model training will stop when validation loss does not register improvement after five consecutive epochs. This is the default.
  • Unchecked—The model training will continue until the maximum number of epochs has been reached.

Boolean
Output Feature Class
(Optional)

The output feature class containing all locations in the space-time cube, with forecasted values stored as fields. The feature class is created from the trained model's predictions on the validation dataset. The output displays the forecast for the final time step and contains pop-up charts showing the time series forecast on the validation set.

Feature Class
Output Cube
(Optional)

An output space-time cube (.nc file) containing the values of the input space-time cube, with the values for the validation time steps replaced by the forecasted values.

File
Multi-Step
(Optional)

Specifies whether a one-step or multistep approach will be used for training the multivariate time series forecasting model.

  • Checked—The model training will use a multistep approach.
  • Unchecked—The model training will use the traditional one-step approach. This is the default.

Boolean
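The Stop training when model no longer improves parameter describes a patience-based early-stopping rule. The following is a minimal sketch of that rule, assuming the validation loss for each epoch is already available as a list; the function name epochs_trained is hypothetical, and treating only a strictly lower loss as an improvement is an assumption rather than a documented detail of this tool.

```python
from typing import List


def epochs_trained(val_losses: List[float], max_epochs: int = 20,
                   early_stopping: bool = True, patience: int = 5) -> int:
    """Return how many epochs run before training stops.

    val_losses[i] is the validation loss recorded after epoch i + 1.
    With early_stopping, training stops once `patience` consecutive epochs
    fail to improve on the best validation loss seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if early_stopping and epochs_without_improvement >= patience:
            return epoch
    # No early stop: training runs until the losses or max_epochs run out
    return min(len(val_losses), max_epochs)


losses = [1.0, 0.8, 0.7, 0.75, 0.74, 0.73, 0.72, 0.71, 0.5]
print(epochs_trained(losses))                        # 8: five epochs without beating 0.7
print(epochs_trained(losses, early_stopping=False))  # 9: runs through every recorded epoch
```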

Derived Output

Label | Explanation | Data Type
Output Model File

The trained model that will be saved as a deep learning package file (.dlpk) in the output model folder.

File
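The difference between the one-step and multistep approaches controlled by the Multi-Step parameter can be sketched with stand-in models. This is an illustration only: last_change and last_change_multi are hypothetical rules (extend the most recent change) standing in for a trained network, and the tool's internal procedure may differ. The one-step sketch feeds each prediction back into the next input window, which is one way errors can accumulate; the multistep sketch produces every horizon value in one call.

```python
from typing import Callable, List

OneStepModel = Callable[[List[float]], float]
MultiStepModel = Callable[[List[float], int], List[float]]


def recursive_forecast(model: OneStepModel, history: List[float],
                       sequence_length: int, horizon: int) -> List[float]:
    """One-step approach: predict one value, append it to the window, repeat."""
    window = list(history[-sequence_length:])
    forecasts: List[float] = []
    for _ in range(horizon):
        next_value = model(window)
        forecasts.append(next_value)
        # The prediction becomes input for the next step, so errors carry forward
        window = window[1:] + [next_value]
    return forecasts


def direct_forecast(model: MultiStepModel, history: List[float],
                    sequence_length: int, horizon: int) -> List[float]:
    """Multistep approach: the model emits all horizon values at once."""
    return model(list(history[-sequence_length:]), horizon)


# Hypothetical stand-in models that extend the most recent change
def last_change(window: List[float]) -> float:
    return window[-1] + (window[-1] - window[-2])


def last_change_multi(window: List[float], horizon: int) -> List[float]:
    step = window[-1] - window[-2]
    return [window[-1] + step * (k + 1) for k in range(horizon)]


def biased(window: List[float]) -> float:
    # A one-step model with a small constant error of +0.5
    return last_change(window) + 0.5


history = [1.0, 2.0, 3.0, 4.0]
print(recursive_forecast(last_change, history, 3, 3))     # [5.0, 6.0, 7.0]
print(direct_forecast(last_change_multi, history, 3, 3))  # [5.0, 6.0, 7.0]
print(recursive_forecast(biased, history, 3, 3))          # [5.5, 7.5, 10.0]
```

With the biased model, the error against the true continuation [5.0, 6.0, 7.0] grows from 0.5 to 1.5 to 3.0 because each flawed prediction is reused as input.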

arcpy.geoai.TrainTimeSeriesForecastingModel(in_cube, out_model, analysis_variable, sequence_length, {explanatory_variables}, {max_epochs}, {validation_timesteps}, {model_type}, {batch_size}, {arguments}, {early_stopping}, {out_features}, {out_cube}, {multistep})
Name | Explanation | Data Type
in_cube

The netCDF cube containing the variable that will be used to forecast to future time steps. This file must have an .nc file extension and must have been created using the Create Space Time Cube By Aggregating Points, Create Space Time Cube From Defined Locations, or Create Space Time Cube From Multidimensional Raster Layer tool.

File
out_model

The output folder location that will store the trained model. The trained model will be saved as a deep learning package file (.dlpk).

Folder
analysis_variable

The numeric variable in the dataset that will be forecasted to future time steps.

String
sequence_length

The number of previous time steps that will be used when training the model. If the data contains seasonality (repeating cycles), provide the length corresponding to one season.

  • If the multistep parameter value is False, the value of this parameter should be less than or equal to the total number of input time steps remaining after excluding the validation_timesteps parameter value.
  • If the multistep parameter value is True, 1.5 times the value of sequence_length should be less than or equal to the total number of time steps after excluding the validation_timesteps parameter value.

Long
explanatory_variables
[explanatory_variables,...]
(Optional)

Independent variables from the data that will be used to train the model. Use a True value after any variables that represent classes or categories.

Value Table
max_epochs
(Optional)

The maximum number of epochs for which the model will be trained. The default is 20.

Long
validation_timesteps
(Optional)

The number of time steps that will be excluded for validation. For example, if a value of 14 is specified, the last 14 time steps will be used as validation data. The default is 10 percent of the total time steps. Ideally, it should not be less than 5 percent of the total time steps in the input space-time cube.

  • If the multistep parameter value is False, this parameter value should be less than 25 percent of the total number of records in the input space-time cube.
  • If the multistep parameter value is True, this parameter value should be less than or equal to half of the sequence_length parameter value.

Long
model_type
(Optional)

Specifies the model architecture that will be used for training the model.

  • InceptionTime: The InceptionTime architecture that will be used for training the model. This is the default.
  • ResNet: The ResNet architecture that will be used for training the model.
  • ResCNN: The ResCNN architecture that will be used for training the model.
  • FCN: The FCN architecture that will be used for training the model.
  • LSTM: The LSTM architecture that will be used for training the model.
  • TimeSeriesTransformer: The TimeSeriesTransformer architecture that will be used for training the model.
String
batch_size
(Optional)

The number of samples that will be processed at one time. The default is 64.

Depending on the available GPU memory, this number can be changed to 8, 16, 32, 64, and so on.

Long
arguments
[arguments,...]
(Optional)

Additional arguments specific to each model type. These arguments can be used to adjust the model's complexity and size. See How Time Series Forecasting Models work to understand the model architectures, the supported model arguments, and their default values.

Value Table
early_stopping
(Optional)

Specifies whether the model training will stop when validation loss does not register improvement after five consecutive epochs.

  • TRUE: The model training will stop when validation loss does not register improvement after five consecutive epochs. This is the default.
  • FALSE: The model training will continue until the maximum number of epochs has been reached.
Boolean
out_features
(Optional)

The output feature class containing all locations in the space-time cube, with forecasted values stored as fields. The feature class is created from the trained model's predictions on the validation dataset. The output displays the forecast for the final time step and contains pop-up charts showing the time series forecast on the validation set.

Feature Class
out_cube
(Optional)

An output space-time cube (.nc file) containing the values of the input space-time cube, with the values for the validation time steps replaced by the forecasted values.

File
multistep
(Optional)

Specifies whether a one-step or multistep approach will be used for training the multivariate time series forecasting model.

  • TRUE: The model training will use a multistep approach.
  • FALSE: The model training will use the traditional one-step approach. This is the default.
Boolean

Derived Output

Name | Explanation | Data Type
out_model_file

The trained model that will be saved as a deep learning package file (.dlpk) in the output model folder.

File
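The constraints listed for sequence_length and validation_timesteps can be checked before running the tool. The sketch below encodes those documented guidelines in a hypothetical helper, check_training_parameters, that is not part of arcpy. It assumes the "25 percent of the total number of records" rule refers to time steps, that the 10 percent default is rounded down, and it treats the 5 percent minimum as a recommendation rather than a hard limit.

```python
from typing import List, Optional


def check_training_parameters(total_timesteps: int, sequence_length: int,
                              validation_timesteps: Optional[int] = None,
                              multistep: bool = False) -> List[str]:
    """Return the documented sequence/validation guidelines that these values break."""
    if validation_timesteps is None:
        # Documented default is 10 percent of the time steps; rounding down is an assumption
        validation_timesteps = total_timesteps // 10
    remaining = total_timesteps - validation_timesteps
    problems: List[str] = []
    # Integer arithmetic avoids floating-point edge cases in the percentage checks
    if multistep:
        if 3 * sequence_length > 2 * remaining:  # 1.5 * sequence_length <= remaining
            problems.append("1.5 * sequence_length exceeds the non-validation time steps")
        if 2 * validation_timesteps > sequence_length:  # validation <= sequence_length / 2
            problems.append("validation_timesteps exceeds half of sequence_length")
    else:
        if sequence_length > remaining:
            problems.append("sequence_length exceeds the non-validation time steps")
        if 4 * validation_timesteps >= total_timesteps:  # validation < 25 percent
            problems.append("validation_timesteps is not below 25 percent of the time steps")
    if 20 * validation_timesteps < total_timesteps:  # recommended minimum of 5 percent
        problems.append("validation_timesteps is below the recommended 5 percent")
    return problems


# A hypothetical cube with 40 time steps, using the values from the code sample
print(check_training_parameters(40, 12, 2))                  # []
print(check_training_parameters(40, 12, 8, multistep=True))  # half-of-sequence_length rule broken
```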

Code sample

TrainTimeSeriesForecastingModel example (stand-alone script)

This example shows how to use the TrainTimeSeriesForecastingModel function.

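# Name: TrainTimeSeriesForecastingModel.py
# Description: Train a time series forecasting model on space-time cube data
# using the InceptionTime architecture. Other model_type values can be
# substituted to train a different architecture.

# Import system modules
import os
import arcpy

# Set local variables (replace the placeholder paths with your own)
datapath = "path_to_data_for_forecasting"
out_path = "path_to_output_folder"

# Input space-time cube; the tool requires a netCDF file with an .nc extension
in_cube = os.path.join(datapath, "test_data.nc")
# Output folder that will store the trained model (.dlpk)
model_path = os.path.join(out_path, "model")
# Optional output feature class; the file geodatabase must already exist
out_features = os.path.join(out_path, "forecasted_feature.gdb", "forecasted")

# Run TrainTimeSeriesForecastingModel
arcpy.geoai.TrainTimeSeriesForecastingModel(
    in_cube=in_cube,
    out_model=model_path,
    analysis_variable="CONSUMPTION",
    sequence_length=12,
    explanatory_variables=None,
    max_epochs=20,
    validation_timesteps=2,
    model_type="InceptionTime",
    batch_size=64,
    arguments=None,
    early_stopping=True,
    out_features=out_features,
)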
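The sequence_length guidance recommends using the length of one season when the data contains repeating cycles. If the season length is not known in advance, one common heuristic (an assumption here, not something this tool does) is to choose the lag with the highest sample autocorrelation. The helpers autocorrelation and estimate_season_length below are hypothetical. Because this autocorrelation estimator shrinks at longer lags, max_lag should cover somewhat more than the expected season but not several seasons.

```python
from typing import List


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return covariance / variance


def estimate_season_length(series: List[float], max_lag: int) -> int:
    """Return the lag in 2..max_lag with the highest autocorrelation.

    Lag 1 is skipped because neighboring values of smooth series are almost
    always strongly correlated, which says little about seasonality.
    """
    if not 2 <= max_lag < len(series):
        raise ValueError("max_lag must be between 2 and len(series) - 1")
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Ten years of a hypothetical monthly pattern that repeats every 12 time steps
monthly = [float(t % 12) for t in range(120)]
print(estimate_season_length(monthly, max_lag=18))  # 12
```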