Extract Entities Using Deep Learning (GeoAI)

Summary

Runs a trained named entity recognizer model on text files in a folder, or on a text field in a feature class or table, and writes the extracted entities and locations (such as addresses, place or person names, dates, and monetary values) to a table. If the extracted entities include addresses, the tool geocodes them using the specified locator and produces a feature class as output.

Learn more about how Entity Recognition works

Usage

  • This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS AllSource, see Install deep learning frameworks for ArcGIS.

  • This tool requires a model definition file containing trained model information. The model can be trained using the Train Entity Recognition Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files can be stored locally or hosted on ArcGIS Living Atlas of the World.

  • This tool supports models trained using transformer-based backbones and the Mistral backbone. To install the Mistral backbone, see ArcGIS Mistral Backbone.

  • This tool supports the use of third-party language models created using the model extensibility feature. The model extensibility feature enables entity extraction tasks using a custom deep learning model file (.dlpk) that is not created using the Train Entity Recognition Model tool. To learn more about creating a custom deep learning (.dlpk) model file, see Use third party language models with ArcGIS.

  • This tool can run on CPU or GPU; however, deep learning is computationally intensive and a GPU is recommended. To run this tool using GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
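
The processor guidance above can be sketched in Python. This is a minimal sketch, not the tool's required setup: it assumes the Processor Type and GPU ID environments are exposed as `arcpy.env.processorType` and `arcpy.env.gpuId`, and setting both on a multi-GPU machine is a choice made here for illustration. The helper names `processor_settings` and `apply_settings` are hypothetical, and a stand-in object replaces `arcpy.env` so the sketch runs without ArcGIS installed.

```python
from types import SimpleNamespace


def processor_settings(use_gpu, gpu_id=None):
    """Return the environment values for the chosen processor (hypothetical helper)."""
    settings = {"processorType": "GPU" if use_gpu else "CPU"}
    # Only pin a specific device when the caller names one (multi-GPU machines).
    if use_gpu and gpu_id is not None:
        settings["gpuId"] = gpu_id
    return settings


def apply_settings(env, settings):
    """Copy each setting onto an environment object such as arcpy.env."""
    for name, value in settings.items():
        setattr(env, name, value)
    return env


# Stand-in object so the sketch runs without ArcGIS; pass arcpy.env in practice.
demo_env = apply_settings(SimpleNamespace(), processor_settings(True, gpu_id=0))
print(demo_env.processorType, demo_env.gpuId)
```

With ArcGIS installed, `apply_settings(arcpy.env, processor_settings(True, gpu_id=0))` would be run before calling the tool.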

Parameters

Label | Explanation | Data Type
Input Folder or Table

The input to this parameter can be either of the following:

  • A feature class or table containing the text column on which named entity extraction will be performed.
  • A folder containing the text files on which named entity extraction will be performed.
Folder; Feature Layer; Table View; Feature Class
Output Table

The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses.

Feature Class; Table; Feature Layer
Input Model Definition File

The trained model that will be used to extract entities from text. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed before the model is used. To install the Mistral backbone, see ArcGIS Mistral Backbone.

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
Model Arguments
(Optional)

Additional arguments that will be used by the model while performing inference. The supported model argument is sequence_length, which will be used to adjust the model's output.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS.

Value Table
Batch Size
(Optional)

The number of input samples that will be processed at one time during inference. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double
Location Zone
(Optional)

The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results.

String
Input Locator
(Optional)

The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Address Locator
Text Field

A text field in the input feature class or table that contains the text that will be used by the model as input. This parameter is required when the Input Folder or Table parameter value is a feature class or table.

Field

arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, in_model_definition_file, {model_arguments}, {batch_size}, {location_zone}, {in_locator}, text_field)
Name | Explanation | Data Type
in_folder

The input to this parameter can be either of the following:

  • A feature class or table containing the text column on which named entity extraction will be performed.
  • A folder containing the text files on which named entity extraction will be performed.
Folder; Feature Layer; Table View; Feature Class
out_table

The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses.

Feature Class; Table; Feature Layer
in_model_definition_file

The trained model that will be used to extract entities from text. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed before the model is used. To install the Mistral backbone, see ArcGIS Mistral Backbone.

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
model_arguments
[model_arguments,...]
(Optional)

Additional arguments that will be used by the model while performing inference. The supported model argument is sequence_length, which will be used to adjust the model's output.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS.

Value Table
batch_size
(Optional)

The number of input samples that will be processed at one time during inference. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double
location_zone
(Optional)

The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results.

String
in_locator
(Optional)

The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Address Locator
text_field

A text field in the input feature class or table that contains the text that will be used by the model as input. This parameter is required when the in_folder parameter value is a feature class or table.

Field
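
The parameter rules above (text_field is needed for feature class or table input, and model_arguments is a value table) can be sketched as a small helper that assembles keyword arguments. This is an illustrative sketch under assumptions: `build_tool_kwargs` and its `input_is_folder` flag are hypothetical, the paths and field name are invented, and passing the value table as a list of [name, value] rows is assumed here rather than stated in this reference.

```python
def build_tool_kwargs(in_data, out_table, model_file, input_is_folder,
                      text_field=None, sequence_length=None, batch_size=4,
                      location_zone=None, in_locator=None):
    """Assemble keyword arguments named after the documented parameters."""
    # The text field is required when the input is a feature class or table.
    if not input_is_folder and not text_field:
        raise ValueError("text_field is required for feature class or table input")
    kwargs = {
        "in_folder": in_data,
        "out_table": out_table,
        "in_model_definition_file": model_file,
        "batch_size": batch_size,
    }
    if sequence_length is not None:
        # Value table expressed as a list of [name, value] rows (assumed form).
        kwargs["model_arguments"] = [["sequence_length", str(sequence_length)]]
    if location_zone:
        kwargs["location_zone"] = location_zone
    if in_locator:
        kwargs["in_locator"] = in_locator
    if text_field:
        kwargs["text_field"] = text_field
    return kwargs


# Hypothetical paths and field name, for illustration only.
kwargs = build_tool_kwargs(
    "C:/data/reports.gdb/incidents", "C:/data/reports.gdb/entities",
    "C:/models/EntityRecognizer.emd", input_is_folder=False,
    text_field="report_text", sequence_length=256, location_zone="California")
print(kwargs)
```

With ArcGIS installed, the resulting dictionary could be passed as `arcpy.geoai.ExtractEntitiesUsingDeepLearning(**kwargs)`.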

Code sample

ExtractEntitiesUsingDeepLearning (stand-alone script)

The following example demonstrates how to use the ExtractEntitiesUsingDeepLearning function.

# Name: ExtractEntities.py
# Description: Extract useful entities such as "Address" and "Date" from text.

# Import system modules
import arcpy
import os

arcpy.env.workspace = "C:/textanalysisexamples/data"
dbpath = "C:/textanalysisexamples/Text_analysis_tools.gdb"

# Set local variables
in_folder = 'test_data'
out_table = os.path.join(dbpath, "ExtractedEntities")

pretrained_model_path_emd = "c:\\extractentities\\EntityRecognizer.emd"

# Run Extract Entities Using Deep Learning
arcpy.geoai.ExtractEntitiesUsingDeepLearning(
    in_folder, out_table, pretrained_model_path_emd)
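
The batch size guidance in the parameter description (use a smaller batch size if an out of memory error occurs) could be automated with a retry wrapper along these lines. This is a sketch under assumptions: `run_with_batch_backoff` is a hypothetical helper, halving is just one possible reduction strategy, and because this reference does not say how an out of memory error surfaces, the caller supplies the `is_out_of_memory` check. A stand-in function replaces the tool call so the sketch runs without ArcGIS.

```python
def run_with_batch_backoff(run, batch_size, is_out_of_memory, min_batch_size=1):
    """Call run(batch_size), halving batch_size after each out-of-memory failure."""
    while True:
        try:
            return run(batch_size), batch_size
        except Exception as error:
            # Re-raise errors that are not memory related, or when the
            # batch size cannot be reduced any further.
            if not is_out_of_memory(error) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


# Stand-in for the tool call: it fails for any batch size above 2.
def fake_tool(batch_size):
    if batch_size > 2:
        raise MemoryError("out of memory")
    return "done"


result, used = run_with_batch_backoff(
    fake_tool, 8, lambda e: "out of memory" in str(e).lower())
print(result, used)
```

In practice, `run` could be a function that calls `arcpy.geoai.ExtractEntitiesUsingDeepLearning` with `batch_size` set to the value it receives, and the predicate would need to match whatever error the tool actually reports.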

Environments