Extract Entities Using Deep Learning (GeoAI)

Summary

Runs a trained named entity recognizer model on text files in a folder and writes the extracted entities and locations (such as addresses, place or person names, dates, and monetary values) to a table. If the extracted entities contain an address, the tool geocodes the addresses using the specified locator and produces a feature class as output.

Learn more about how Entity Recognition works

Usage

  • This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in AllSource, see Install deep learning frameworks for ArcGIS.

  • This tool requires a model definition file containing trained model information. The model can be trained using the Train Entity Recognition Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files must be stored locally.

  • This tool can run on a CPU or a GPU. However, deep learning is computationally expensive, and a GPU is recommended. To run this tool using a GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
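In Python, the processor settings described above correspond to geoprocessing environments. The following is a minimal sketch, assuming the processorType and gpuId properties of arcpy.env. The helper name configure_processor is hypothetical. It accepts any object that allows attribute assignment, so it can be tried without an ArcGIS installation. Setting both environments when a GPU ID is supplied is a choice made in this sketch.

```python
from types import SimpleNamespace


def configure_processor(env, use_gpu=True, gpu_id=None):
    """Set the Processor Type and, optionally, the GPU ID environment.

    env is expected to be arcpy.env (an assumption of this sketch); any
    object that accepts attribute assignment also works.
    """
    if not use_gpu:
        env.processorType = "CPU"
        return env
    env.processorType = "GPU"
    if gpu_id is not None:
        # Only needed when the machine has more than one GPU.
        env.gpuId = gpu_id
    return env


# Stand-in for arcpy.env so the sketch runs anywhere.
demo_env = configure_processor(SimpleNamespace(), use_gpu=True, gpu_id=0)
print(demo_env.processorType, demo_env.gpuId)
```

In an ArcGIS Python environment, the call would be configure_processor(arcpy.env, use_gpu=True) before running the tool.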

Parameters

Label | Explanation | Data Type
Input Folder

The folder containing the text files on which named entity extraction will be performed.

Folder
Output Table

The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses.

Feature Class; Table
Input Model Definition File

The trained model that will be used for entity extraction. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) stored locally.

File
Model Arguments
(Optional)

Additional arguments, such as a confidence threshold, that will be used to adjust the sensitivity of the model.

The names of the arguments will be populated by the tool.

Value Table
Batch Size
(Optional)

The number of text samples that will be processed at one time. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double
Location Zone
(Optional)

The geographic region or zone in which the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area in which the address is located and produces better results.

String
Input Locator
(Optional)

The locator that will be used to geocode addresses found in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Address Locator
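The requirements on the Input Model Definition File parameter can be checked before the tool is run. The following is a minimal sketch using only the Python standard library. The helper name check_model_file is hypothetical, and the existence test is only an approximation of "stored locally": it confirms that the file can be reached from this machine, not that it resides on a local drive.

```python
from pathlib import Path

# File types named in the parameter description above.
SUPPORTED_MODEL_SUFFIXES = {".emd", ".dlpk"}


def check_model_file(path):
    """Return the model path if it has a supported extension and exists."""
    model = Path(path)
    if model.suffix.lower() not in SUPPORTED_MODEL_SUFFIXES:
        raise ValueError(f"Expected an .emd or .dlpk file, got: {model.name}")
    if not model.is_file():
        raise FileNotFoundError(f"Model file not found: {model}")
    return model
```

Calling check_model_file on the model path before running the tool surfaces a wrong file type or a missing file as a Python exception rather than as a tool failure.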

arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, in_model_definition_file, {model_arguments}, {batch_size}, {location_zone}, {in_locator})
Name | Explanation | Data Type
in_folder

The folder containing the text files on which named entity extraction will be performed.

Folder
out_table

The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses.

Feature Class; Table
in_model_definition_file

The trained model that will be used for entity extraction. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) stored locally.

File
model_arguments
[model_arguments,...]
(Optional)

Additional arguments, such as a confidence threshold, that will be used to adjust the sensitivity of the model.

The names of the arguments will be populated by the tool.

Value Table
batch_size
(Optional)

The number of text samples that will be processed at one time. The default value is 4.

Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size.

Double
location_zone
(Optional)

The geographic region or zone in which the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area in which the address is located and produces better results.

String
in_locator
(Optional)

The locator that will be used to geocode addresses found in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Address Locator
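The batch_size guidance above (use a smaller batch size if an out of memory error occurs) can be automated with a retry loop. The following is a minimal sketch. The helper name run_with_smaller_batches and the run_tool callback are hypothetical, and how an out of memory condition is reported is an assumption: this sketch retries on MemoryError or on any exception whose message contains "out of memory".

```python
def run_with_smaller_batches(run_tool, batch_size=4, min_batch_size=1):
    """Call run_tool(batch_size), halving the batch size after memory errors.

    run_tool is a caller-supplied function that runs the tool once with the
    given batch size. Errors that do not look like memory errors, and memory
    errors at the minimum batch size, are raised unchanged.
    """
    size = batch_size
    while True:
        try:
            return run_tool(size)
        except Exception as err:
            # Assumption: memory failures are identifiable by type or message.
            is_memory_error = (
                isinstance(err, MemoryError)
                or "out of memory" in str(err).lower()
            )
            if not is_memory_error or size <= min_batch_size:
                raise
            size = max(min_batch_size, size // 2)
```

In practice, run_tool would be a small function that calls arcpy.geoai.ExtractEntitiesUsingDeepLearning with the batch_size argument it receives.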

Code sample

ExtractEntitiesUsingDeepLearning (Python window)

The following Python window script demonstrates how to use the ExtractEntitiesUsingDeepLearning function.

# Name: ExtractEntities.py
# Description: Extract useful entities such as "Address" and "Date" from text files.
#
# Requirements: ArcGIS Pro Advanced license

# Import system modules
import arcpy
import os

arcpy.env.workspace = "C:/textanalysisexamples/data"
dbpath = "C:/textanalysisexamples/Text_analysis_tools.gdb"

# Set local variables
in_folder = 'test_data'
out_table = os.path.join(dbpath, "ExtractedEntities")

pretrained_model_path_emd = "c:\\extractentities\\EntityRecognizer.emd"

# Run Extract Entities Using Deep Learning
arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, pretrained_model_path_emd)
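The sample above uses only the required parameters. The following is a minimal sketch of a call that also supplies batch_size, location_zone, and in_locator, using the keyword names from the syntax shown earlier. The helper names, the locator path, and the location zone value are hypothetical. The import of arcpy is deferred so that the argument-building step can be tried without an ArcGIS installation.

```python
def build_tool_arguments(in_folder, out_table, model_file,
                         batch_size=4, location_zone=None, in_locator=None):
    """Collect keyword arguments, leaving out optional ones that are not set."""
    kwargs = {
        "in_folder": in_folder,
        "out_table": out_table,
        "in_model_definition_file": model_file,
        "batch_size": batch_size,
    }
    if location_zone:
        kwargs["location_zone"] = location_zone
    if in_locator:
        kwargs["in_locator"] = in_locator
    return kwargs


def run_extraction(**kwargs):
    """Run the tool. Requires ArcGIS with deep learning frameworks installed."""
    import arcpy  # imported here so the helper above works without ArcGIS
    return arcpy.geoai.ExtractEntitiesUsingDeepLearning(**kwargs)


# Hypothetical paths and values, for illustration only.
arguments = build_tool_arguments(
    in_folder="C:/textanalysisexamples/data/test_data",
    out_table="C:/textanalysisexamples/Text_analysis_tools.gdb/ExtractedEntities",
    model_file="C:/extractentities/EntityRecognizer.emd",
    batch_size=8,
    location_zone="Washington, DC",
    in_locator="C:/textanalysisexamples/locators/Addresses.loc",
)
# run_extraction(**arguments)  # uncomment in an ArcGIS Python environment
```

When a locator is supplied and the model extracts addresses, the output is a feature class of geocoded points, as described for the out_table parameter.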

Environments