Classify Text Using Deep Learning (GeoAI)

Summary

Runs a trained text classification model on a text field in a feature class or table and updates each record with an assigned class or category label, along with a confidence value for each class.

Learn more about how Text Classification works

Usage

  • This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS AllSource, see Install deep learning frameworks for ArcGIS.

  • This tool requires a model definition file containing trained model information. The model can be trained using the Train Text Classification Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files can be stored locally or hosted on ArcGIS Living Atlas.

  • This tool can run on a CPU or a GPU. However, deep learning is computationally intensive, and a GPU is recommended. To run this tool using a GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
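
The processor settings described in the list above can also be applied from Python before the tool runs. The following is a minimal sketch, assuming the processorType and gpuId geoprocessing environments exposed on arcpy.env. The helper name configure_processor is hypothetical, and it accepts any object with those attributes so the logic can be tried without ArcGIS installed.

```python
from types import SimpleNamespace


def configure_processor(env, use_gpu=True, gpu_id=None):
    """Apply processor settings to an environment object such as arcpy.env.

    Hypothetical helper: follows the guidance that a specific GPU ID is
    set instead of the processor type when more than one GPU is present.
    """
    if not use_gpu:
        env.processorType = "CPU"
    elif gpu_id is None:
        env.processorType = "GPU"
    else:
        # More than one GPU: choose the device by ID instead.
        env.gpuId = gpu_id
    return env


# SimpleNamespace stands in for arcpy.env in this illustration.
demo_env = configure_processor(SimpleNamespace(), gpu_id=1)
print(vars(demo_env))

# With ArcGIS available, the same helper would be called as (not run here):
#   import arcpy
#   configure_processor(arcpy.env, gpu_id=1)
```

The GPU ID value is passed through unchanged, so use whatever identifier your system reports for the device.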

Parameters

Label | Explanation | Data Type
Input Table

The input point, line, or polygon feature class, or table, containing the text that will be classified and labelled.

Feature Layer; Table View
Text Field

A text field in the input feature class or table that contains the text that will be classified.

Field
Input Model Definition File

The trained model that will be used for classification. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

File
Class Label Field
(Optional)

The name of the field that will contain the class or category label assigned by the model. The default field name is ClassLabel.

String
Model Arguments
(Optional)

Additional arguments, such as sequence_length or confidence_threshold, that will be used to adjust the model's output.

The names of the arguments will be populated by the tool.

Note:

The model argument confidence_threshold is only applicable for multilabel text classification.

Value Table
Get explanation for every prediction
(Optional)

Specifies whether SHAP explanations will be generated. The time to generate an explanation will depend on the length of the input.

  • Checked—A SHAP explanation will be generated for each row in the output table.
  • Unchecked—No SHAP explanation will be generated. This is the default.
Boolean
Batch Size
(Optional)

The number of records that will be processed at one time. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double

Derived Output

Label | Explanation | Data Type
Updated Table

The output point, line, or polygon feature class, or table, containing the classified and labelled text derived from the input data along with the confidence value for each class.

Table View; Feature Layer

arcpy.geoai.ClassifyTextUsingDeepLearning(in_table, text_field, in_model_definition_file, {class_label_field}, {model_arguments}, {explain}, {batch_size})
Name | Explanation | Data Type
in_table

The input point, line, or polygon feature class, or table, containing the text that will be classified and labelled.

Feature Layer; Table View
text_field

A text field in the input feature class or table that contains the text that will be classified.

Field
in_model_definition_file

The trained model that will be used for classification. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

File
class_label_field
(Optional)

The name of the field that will contain the class or category label assigned by the model. The default field name is ClassLabel.

String
model_arguments
[model_arguments,...]
(Optional)

Additional arguments, such as sequence_length or confidence_threshold, that will be used to adjust the model's output.

The names of the arguments will be populated by the tool.

Note:

The model argument confidence_threshold is only applicable for multilabel text classification.

Value Table
explain
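(Optional)

Specifies whether SHAP explanations will be generated. The time to generate an explanation will depend on the length of the input.

  • ENABLE_SHAP: A SHAP explanation will be generated for each row in the output table.
  • DISABLE_SHAP: No SHAP explanation will be generated. This is the default.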
Boolean
batch_size
(Optional)

The number of records that will be processed at one time. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double
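
The batch size guidance above (use a smaller batch size if an out of memory error occurs) can be automated with a retry loop. This is a sketch only: the function names are hypothetical, and detecting an out of memory condition by searching the error text is an assumption, since the exact exception and message depend on the framework and hardware.

```python
def looks_like_out_of_memory(exc):
    """Heuristic (assumption): the error text mentions running out of memory."""
    return "out of memory" in str(exc).lower()


def run_with_batch_backoff(run_tool, batch_size=4, min_batch_size=1):
    """Call run_tool(batch_size), halving the batch size after memory errors.

    run_tool is any callable that takes a batch size. Other errors, and
    memory errors at the minimum batch size, are re-raised unchanged.
    """
    while True:
        try:
            return run_tool(batch_size)
        except Exception as exc:
            if not looks_like_out_of_memory(exc) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_tool(batch_size):
    """Stand-in for the tool: pretends memory only suffices for batch size 1."""
    if batch_size > 1:
        raise RuntimeError("CUDA out of memory")
    return f"ran with batch_size={batch_size}"


print(run_with_batch_backoff(fake_tool))

# With ArcGIS available, run_tool could wrap the real call (not run here):
#   run_with_batch_backoff(
#       lambda n: arcpy.geoai.ClassifyTextUsingDeepLearning(
#           in_table, text_field, in_model_definition_file, batch_size=n))
```

Starting from the default of 4 and halving keeps the number of retries small. A larger starting value may be worth trying on machines with more memory.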

Derived Output

Name | Explanation | Data Type
updated_table

The output point, line, or polygon feature class, or table, containing the classified and labelled text derived from the input data along with the confidence value for each class.

Table View; Feature Layer
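
As an illustration of the optional parameters, the sketch below assembles a model_arguments value as a list of name and value rows and enforces the note that confidence_threshold applies only to multilabel classification. The helper build_model_arguments is hypothetical, the values 512 and 0.5 are arbitrary, and passing the value table as a list of string pairs is an assumption about how the tool accepts it. The argument names a given model supports are populated by the tool.

```python
def build_model_arguments(sequence_length=None, confidence_threshold=None,
                          multilabel=False):
    """Build [name, value] rows for the model_arguments parameter.

    Hypothetical helper. Values are converted to strings, which is an
    assumption about the value table format.
    """
    rows = []
    if sequence_length is not None:
        rows.append(["sequence_length", str(sequence_length)])
    if confidence_threshold is not None:
        if not multilabel:
            raise ValueError(
                "confidence_threshold applies only to multilabel "
                "text classification")
        rows.append(["confidence_threshold", str(confidence_threshold)])
    return rows


print(build_model_arguments(sequence_length=512))
print(build_model_arguments(confidence_threshold=0.5, multilabel=True))

# With ArcGIS available, the rows could be passed with the other optional
# parameters from the syntax above (not run here):
#   arcpy.geoai.ClassifyTextUsingDeepLearning(
#       in_table, text_field, in_model_definition_file,
#       class_label_field="ClassLabel",
#       model_arguments=build_model_arguments(sequence_length=512),
#       explain="ENABLE_SHAP",
#       batch_size=4)
```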

Code sample

ClassifyTextUsingDeepLearning (Python window)

The following Python window script demonstrates how to use the ClassifyTextUsingDeepLearning function.

# Name: ClassifyText.py
# Description: Classify text into multiple classes
#
# Requirements: ArcGIS Pro Advanced license

# Import system modules
import arcpy

arcpy.env.workspace = "C:/textanalysisexamples/data"

# Set local variables
in_table = "TextClassifierData"  # table or feature class in the workspace
text_field = "Address"  # field containing the text to classify
pretrained_model_path_emd = "c:\\classifydata\\TextClassifier.emd"

# Run Classify Text Using Deep Learning
arcpy.geoai.ClassifyTextUsingDeepLearning(in_table, text_field, pretrained_model_path_emd)
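
After the tool runs, the assigned labels are stored in the class label field of the input table (ClassLabel by default). The sketch below shows one way the results might be summarized. The helper summarize_labels and the sample labels are hypothetical, and the commented cursor usage assumes the default field name was kept.

```python
from collections import Counter


def summarize_labels(rows):
    """Count how many records received each class label.

    rows is any iterable of tuples whose first item is the label, which
    is the shape of the rows returned by a cursor on a single field.
    """
    return Counter(row[0] for row in rows)


# Illustrative rows standing in for cursor output; the labels are made up.
demo_rows = [("Residential",), ("Commercial",), ("Residential",)]
for label, count in summarize_labels(demo_rows).most_common():
    print(label, count)

# With ArcGIS available, the rows could come from the updated table
# (not run here):
#   with arcpy.da.SearchCursor(in_table, ["ClassLabel"]) as cursor:
#       counts = summarize_labels(cursor)
```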

Environments