You can use the HF Text Classification model in the Classify Text Using Deep Learning tool available in the GeoAI toolbox in ArcGIS Pro. Follow the steps below to use the model for text classification.
Classify text
Complete the following steps to classify text:
- Download the HF Text Classification pretrained model from ArcGIS Living Atlas of the World.
- The tool modifies the input data by adding the new output columns to the input table itself. Ensure that the table has been added to the current geodatabase.
- On the Analysis tab, click Tools.
- Click the Toolboxes tab in the Geoprocessing pane, select GeoAI Tools, and browse to the Classify Text Using Deep Learning tool under Text Analysis.
- Set the variables on the Parameters tab as follows:
- Input Table—The input point, line, or polygon feature class, or table containing the text to be classified.
- Text Field—The text field within the input feature class or table that contains the text to be classified.
- Input Model Definition File—Select the model .dlpk file.
- Class Label Field—The field name that will store the classification result of the text in the output table. The default field name is ClassLabel.
- Model Arguments—Change the values of the arguments if required.
- huggingface_id—The model ID of a pretrained text classification model hosted on huggingface.co.
To find text classification models on the Hugging Face model hub, select the Text Classification tag under the Tasks section within the Natural Language Processing category.
The model ID has the format {username}/{repository} and is displayed at the top of the model's page.
Only models that include a config.json file are supported. You can verify that this file exists on the Files and versions tab of the model page.
- multilabel—Set this variable to True to enable multilabel classification; otherwise, set it to False.
- Confidence Threshold—Specifies the minimum confidence score a predicted class must have to be included in the output.
- Batch Size—The number of rows to be processed at once. Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used.
- Set the variables on the Environments tab as follows:
- Processor Type—Select CPU or GPU.
If a GPU is available, selecting GPU is recommended; set GPU ID to specify which GPU to use.
- Click Run.
The output columns are added to the input table.
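The tool's internals are not described on this page, so the following is only a minimal plain-Python sketch, under stated assumptions, of what the huggingface_id, Batch Size, multilabel, and Confidence Threshold settings typically control in a Hugging Face text classifier. It does not call arcpy or the Classify Text Using Deep Learning tool, and every function name in it is hypothetical. It assumes the common Hugging Face convention of a softmax over classes for single-label models and an independent sigmoid per class for multilabel models; that convention is not confirmed for this tool. The predict callable stands in for the model, and the file list passed to has_config could be obtained with list_repo_files from the huggingface_hub package, which is not called here so the sketch runs offline.

```python
import math
import re
from typing import Callable, Iterator, List, Sequence

# Assumed shape of a huggingface_id value: "{username}/{repository}".
_MODEL_ID = re.compile(r"[A-Za-z0-9][\w.-]*/[\w.-]+")


def looks_like_model_id(model_id: str) -> bool:
    """Hypothetical check that model_id has the {username}/{repository} form."""
    return _MODEL_ID.fullmatch(model_id) is not None


def has_config(repo_files: Sequence[str]) -> bool:
    """Hypothetical check that a repository's file listing includes config.json."""
    return "config.json" in repo_files


def batches(rows: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the largest logit before exponentiating to avoid overflow.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def labels_from_logits(
    logits: Sequence[float],
    class_names: Sequence[str],
    multilabel: bool,
    threshold: float,
) -> List[str]:
    """Convert one row's raw model scores into the labels kept in the output."""
    if multilabel:
        # Each class is scored independently; keep every class at or above the threshold.
        return [n for n, x in zip(class_names, logits) if _sigmoid(x) >= threshold]
    # Classes compete; keep only the top class, and only if it reaches the threshold.
    probs = _softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [class_names[best]] if probs[best] >= threshold else []


def classify_rows(
    texts: Sequence[str],
    predict: Callable[[Sequence[str]], List[List[float]]],
    class_names: Sequence[str],
    batch_size: int = 4,
    multilabel: bool = False,
    threshold: float = 0.5,
) -> List[List[str]]:
    """Score texts batch by batch with a hypothetical predict(batch) callable."""
    results: List[List[str]] = []
    for batch in batches(texts, batch_size):
        # One model call per batch: larger batches mean fewer calls but more memory.
        for logits in predict(batch):
            results.append(labels_from_logits(logits, class_names, multilabel, threshold))
    return results
```

For example, with raw scores [2.0, -1.0, 0.5] for the classes sports, politics, and tech, multilabel mode at a threshold of 0.5 keeps sports and tech (sigmoid scores of about 0.88 and 0.62), while single-label mode keeps only sports (softmax score of about 0.79) and returns no label at all if the threshold is raised to 0.9.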