Classify Text Using Deep Learning (GeoAI)

Summary

Runs a trained text classification model on a text field in a feature class or table and updates each record with an assigned class or category label and a confidence value for each class.

Learn more about how Text Classification works

Usage

  • This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS AllSource, see Install deep learning frameworks for ArcGIS.

  • This tool requires a model definition file containing model information. The model can be trained using the Train Text Classification tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files can be stored locally or hosted on ArcGIS Living Atlas of the World.

  • This tool supports models trained using transformer-based backbones and the Mistral backbone. To install the Mistral backbone, see ArcGIS Mistral Backbone.

  • This tool supports the use of third-party language models created using the model extensibility feature. The model extensibility feature enables text classification tasks using a custom deep learning model file (.dlpk) that is not created using the Train Text Classification tool. To learn more about creating a custom deep learning (.dlpk) model file, see Use third party language models with ArcGIS.

  • This tool can run on CPU or GPU; however, deep learning is computationally intensive and a GPU is recommended. To run this tool using GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
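
The processor settings described in this list can also be applied from Python. The following is a minimal sketch, assuming the Processor Type and GPU ID environments are exposed as the processorType and gpuId attributes of arcpy.env. The helper name configure_processor is hypothetical, and it accepts any object with settable attributes so the logic can be tried without ArcGIS installed.

```python
from types import SimpleNamespace


def configure_processor(env, use_gpu=True, gpu_id=None):
    """Set processor-related environment attributes on env.

    env is expected to be arcpy.env (assumed attribute names:
    processorType and gpuId), but any object with settable
    attributes works.
    """
    env.processorType = "GPU" if use_gpu else "CPU"
    # Only pin a specific GPU when one is requested, for example on a
    # machine with more than one GPU.
    if use_gpu and gpu_id is not None:
        env.gpuId = gpu_id
    return env


# Stand-in for arcpy.env so the sketch runs without ArcGIS installed.
demo_env = configure_processor(SimpleNamespace(), use_gpu=True, gpu_id=0)
print(demo_env.processorType, demo_env.gpuId)
```

In an ArcGIS Python environment, the equivalent call would be configure_processor(arcpy.env, use_gpu=True, gpu_id=0), made before the tool is run.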

Parameters

Label | Explanation | Data Type
Input Table

The input point, line, or polygon feature class, or table containing the text that will be classified and labelled.

Feature Layer; Table View
Text Field

A text field in the input feature class or table that contains the text that will be classified.

Field
Input Model Definition File

The trained model that will be used for classification. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed before the model is used. To install the Mistral backbone, see ArcGIS Mistral Backbone.

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
Class Label Field
(Optional)

The name of the field that will contain the class or category label assigned by the model. The default field name is ClassLabel.

String
Model Arguments
(Optional)

Additional arguments that will be used by the model while performing inference. The supported model arguments include sequence_length and confidence_threshold, which will be used to adjust the model's output. The confidence_threshold model argument is only applicable to multilabel text classification.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS.

Value Table
Get explanation for every prediction
(Optional)

Specifies whether SHAP explanations will be generated. The time it takes to generate an explanation will depend on the length of the input.

  • Checked—A SHAP explanation will be generated for each row in the output table.
  • Unchecked—No SHAP explanation will be generated. This is the default.
Boolean
Batch Size
(Optional)

The number of text samples that will be processed at one time during inference. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double

Derived Output

Label | Explanation | Data Type
Updated Table

The output point, line, or polygon feature class, or table containing the classified and labelled text derived from the input data along with the confidence value for each class.

Table View; Feature Layer
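
To illustrate what a confidence threshold means for multilabel classification, the sketch below keeps every class whose confidence is at least the threshold. It is a conceptual illustration only, not the tool's internal implementation. The function name, the sample class names, the scores, and the choice of an inclusive comparison are all hypothetical.

```python
def labels_above_threshold(confidences, threshold=0.5):
    """Return class names with confidence >= threshold, highest first."""
    kept = [(name, score) for name, score in confidences.items()
            if score >= threshold]
    # Sort by confidence so the most likely labels come first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical per-class confidence values for a single record.
scores = {"flooding": 0.91, "road_damage": 0.62, "power_outage": 0.08}
print(labels_above_threshold(scores, threshold=0.5))
# prints ['flooding', 'road_damage']
```

Raising the threshold in this sketch returns fewer labels per record, and lowering it returns more.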

arcpy.geoai.ClassifyTextUsingDeepLearning(in_table, text_field, in_model_definition_file, {class_label_field}, {model_arguments}, {explain}, {batch_size})
Name | Explanation | Data Type
in_table

The input point, line, or polygon feature class, or table containing the text that will be classified and labelled.

Feature Layer; Table View
text_field

A text field in the input feature class or table that contains the text that will be classified.

Field
in_model_definition_file

The trained model that will be used for classification. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed before the model is used. To install the Mistral backbone, see ArcGIS Mistral Backbone.

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
class_label_field
(Optional)

The name of the field that will contain the class or category label assigned by the model. The default field name is ClassLabel.

String
model_arguments
[model_arguments,...]
(Optional)

Additional arguments that will be used by the model while performing inference. The supported model arguments include sequence_length and confidence_threshold, which will be used to adjust the model's output. The confidence_threshold model argument is only applicable to multilabel text classification.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS.

Value Table
explain
(Optional)

Specifies whether SHAP explanations will be generated. The time it takes to generate an explanation will depend on the length of the input.

  • ENABLE_SHAP: A SHAP explanation will be generated for each row in the output table.
  • DISABLE_SHAP: No SHAP explanation will be generated. This is the default.
Boolean
batch_size
(Optional)

The number of text samples that will be processed at one time during inference. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double

Derived Output

Name | Explanation | Data Type
updated_table

The output point, line, or polygon feature class, or table containing the classified and labelled text derived from the input data along with the confidence value for each class.

Table View; Feature Layer
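
The model_arguments value table can be thought of as rows of name and value pairs. The following is a minimal sketch, assuming the parameter accepts a list of [name, value] rows with string values, as is common for value table parameters in arcpy. The helper name build_model_arguments and the sample values are hypothetical.

```python
def build_model_arguments(**arguments):
    """Convert keyword arguments into [name, value] rows with string values."""
    return [[name, str(value)] for name, value in arguments.items()]


# Hypothetical values for the two documented model arguments.
model_arguments = build_model_arguments(
    sequence_length=512, confidence_threshold=0.5)
print(model_arguments)
# prints [['sequence_length', '512'], ['confidence_threshold', '0.5']]
```

The resulting list could then be passed as the model_arguments parameter of ClassifyTextUsingDeepLearning, provided the tool accepts that form.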

Code sample

ClassifyTextUsingDeepLearning (stand-alone script)

The following example demonstrates how to use the ClassifyTextUsingDeepLearning function.

# Name: ClassifyText.py
# Description: Classify text into multiple classes
#
# Requirements: ArcGIS Pro Advanced license

# Import system modules
import arcpy

arcpy.env.workspace = "C:/textanalysisexamples/data"

# Set local variables
in_table = "TextClassifierData"
pretrained_model_path_emd = "c:\\classifydata\\TextClassifier.emd"

# Run Classify Text Using Deep Learning
arcpy.geoai.ClassifyTextUsingDeepLearning(
    in_table, "Address", pretrained_model_path_emd)
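
The batch size guidance in the parameter descriptions can be sketched as a retry loop that halves the batch size after a failure. This is a sketch under stated assumptions: the helper run_with_batch_fallback is hypothetical, it treats any exception whose message mentions memory as an out-of-memory condition (the actual error type and message raised by the tool are not specified here), and the tool call is passed in as a callable so the logic can be exercised without ArcGIS installed.

```python
def run_with_batch_fallback(run_tool, batch_size=4, min_batch_size=1):
    """Call run_tool(batch_size), halving batch_size after memory errors.

    Returns a tuple of (result, batch_size_used). Errors that do not look
    memory related, and failures at min_batch_size, are re-raised.
    """
    while True:
        try:
            return run_tool(batch_size), batch_size
        except Exception as error:
            # Assumption: out-of-memory failures mention "memory" in the message.
            if "memory" not in str(error).lower() or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def classify(batch_size):
    """Run the tool with the sample inputs and the given batch size."""
    # Imported here so the helper above can be used without ArcGIS installed.
    import arcpy

    return arcpy.geoai.ClassifyTextUsingDeepLearning(
        "TextClassifierData",
        "Address",
        "c:\\classifydata\\TextClassifier.emd",
        batch_size=batch_size,
    )
```

In an ArcGIS Python environment, a call such as run_with_batch_fallback(classify, batch_size=16) would try batch sizes 16, 8, 4, 2, and 1 in turn until one succeeds.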

Environments