Label | Explanation | Data Type |
Input Folder | The folder containing the text files on which named entity extraction will be performed. | Folder |
Output Table | The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses. | Feature Class; Table |
Input Model Definition File | The trained model that will be used for entity extraction. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk), and it must be stored locally. | File |
Model Arguments (Optional) | Additional arguments, such as a confidence threshold, that will be used to adjust the sensitivity of the model. The names of the arguments will be populated by the tool. | Value Table |
Batch Size (Optional) | The number of input samples that will be processed at one time. The default value is 4. Increasing the batch size can improve tool performance; however, larger batches use more memory. If an out-of-memory error occurs, use a smaller batch size. | Double |
Location Zone (Optional) | The geographic region or zone in which the addresses are expected to be located. The specified text will be appended to each address extracted by the model. The locator uses this information to narrow down the region or geographic area in which to search for the address, which can improve geocoding results. | String |
Input Locator (Optional) | The locator that will be used to geocode addresses found in the input text documents. A point is generated for each successfully geocoded address and stored in the output feature class. | Address Locator |
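Conceptually, the Batch Size parameter controls how many input samples are handed to the model in a single pass. The following minimal Python sketch only illustrates that grouping and why memory use grows with the batch size; the helper name `make_batches` is hypothetical and is not how the tool is implemented internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Each slice is one batch held in memory at once; only the last may be smaller.
        yield list(items[start:start + batch_size])


docs = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
print([len(b) for b in make_batches(docs)])  # [4, 1]
```

With the default of 4, five inputs form one full batch and one partial batch; a larger batch size means fewer, larger groups held in memory at the same time.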
Summary
Runs a trained named entity recognition model on text files in a folder to extract entities and locations (such as addresses, place or person names, dates, and monetary values) into a table. If the extracted entities include addresses, the tool geocodes them using the specified locator and produces a feature class as the output.
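The rule that decides between a table and a feature class can be written as a small predicate. This is a hedged sketch: the function name is hypothetical, and it assumes the model labels address entities as "Address" (the label used in the code sample's description); the locator path is illustrative only.

```python
from typing import Iterable, Optional


def expected_output_type(in_locator: Optional[str], entity_types: Iterable[str]) -> str:
    """Return "Feature Class" or "Table" per the rule in the Summary (hypothetical helper)."""
    # Assumption: address entities carry the label "Address".
    has_address = any(t.lower() == "address" for t in entity_types)
    # Addresses are geocoded into points only when a locator is supplied.
    if in_locator and has_address:
        return "Feature Class"
    return "Table"


print(expected_output_type(None, ["Address", "Date"]))  # Table
print(expected_output_type("C:/locators/Hypothetical.loc", ["Address", "Date"]))  # Feature Class
```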
Usage
This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS AllSource, see Install deep learning frameworks for ArcGIS.
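Before running the tool, you may want a quick pre-flight check from the same Python environment. The sketch below is an assumption-laden illustration: it assumes the framework installation provides the `torch` and `arcgis` packages, and `check_packages` is a hypothetical helper. It only tests whether each package can be located, not whether versions are compatible.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str] = ("torch", "arcgis")) -> Dict[str, bool]:
    """Report whether each top-level package can be found (hypothetical helper)."""
    # find_spec returns None when a top-level package is not importable.
    return {name: importlib.util.find_spec(name) is not None for name in names}


missing = [name for name, ok in check_packages().items() if not ok]
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All listed packages were found.")
```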
This tool requires a model definition file containing trained model information. The model can be trained using the Train Entity Recognition Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files must be stored locally.
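Because the model must be a local .emd or .dlpk file, a script can reject bad paths before calling the tool. This is a minimal sketch using only the standard library; `validate_model_file` is a hypothetical helper, and the tool performs its own validation regardless.

```python
from pathlib import Path

# Extensions named in the tool documentation.
ALLOWED_SUFFIXES = {".emd", ".dlpk"}


def validate_model_file(path: str) -> Path:
    """Check that path is an existing local .emd or .dlpk file (hypothetical helper)."""
    model = Path(path)
    if model.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Expected .emd or .dlpk, got {model.suffix!r}")
    # A missing path (or a URL, which is not a local file) fails this check.
    if not model.is_file():
        raise FileNotFoundError(f"Model file not found locally: {model}")
    return model


# Example with the path used in the code sample:
# validate_model_file(r"c:\extractentities\EntityRecognizer.emd")
```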
This tool can run on CPU or GPU. However, deep learning is computationally expensive, so a GPU is recommended. To run this tool using GPU, set the Processor Type environment to GPU. If you have more than one GPU, also specify the GPU ID environment to choose which one is used.
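In a script, these environments correspond to `arcpy.env.processorType` and `arcpy.env.gpuId`. The sketch below builds the values separately from applying them so the logic can be checked without ArcGIS installed; both helper names are hypothetical, and the GPU ID of 0 is only an example.

```python
from typing import Dict, Optional, Union


def processor_settings(use_gpu: bool, gpu_id: Optional[int] = None) -> Dict[str, Union[str, int]]:
    """Build Processor Type and GPU ID environment values (hypothetical helper)."""
    settings: Dict[str, Union[str, int]] = {"processorType": "GPU" if use_gpu else "CPU"}
    # gpuId is only added when the GPU is used and a specific device is requested.
    if use_gpu and gpu_id is not None:
        settings["gpuId"] = gpu_id
    return settings


def apply_settings(settings: Dict[str, Union[str, int]]) -> None:
    """Copy the settings onto arcpy.env; requires an ArcGIS Python environment."""
    import arcpy  # imported here so the rest of the sketch runs without ArcGIS

    for name, value in settings.items():
        setattr(arcpy.env, name, value)


print(processor_settings(True, gpu_id=0))  # {'processorType': 'GPU', 'gpuId': 0}
```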
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
Parameters
arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, in_model_definition_file, {model_arguments}, {batch_size}, {location_zone}, {in_locator})
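Since the optional parameters can be passed by name, one way to keep calls tidy is to assemble keyword arguments and drop unset optional values. This is a sketch under assumptions: `build_tool_kwargs` is hypothetical, the Value Table is assumed to accept a list of [name, value] pairs, and the argument name `threshold` is illustrative only, because the real names come from the model.

```python
from typing import Any, Dict, List, Optional


def build_tool_kwargs(
    in_folder: str,
    out_table: str,
    in_model_definition_file: str,
    model_arguments: Optional[Dict[str, Any]] = None,
    batch_size: Optional[float] = None,
    location_zone: Optional[str] = None,
    in_locator: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments, leaving out unset optional parameters (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "in_folder": in_folder,
        "out_table": out_table,
        "in_model_definition_file": in_model_definition_file,
    }
    if model_arguments:
        # Assumption: the Value Table is supplied as a list of [name, value] pairs.
        pairs: List[List[str]] = [[str(k), str(v)] for k, v in model_arguments.items()]
        kwargs["model_arguments"] = pairs
    optional = {"batch_size": batch_size, "location_zone": location_zone, "in_locator": in_locator}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_tool_kwargs("test_data", "ExtractedEntities", "EntityRecognizer.emd",
                           model_arguments={"threshold": 0.5}, batch_size=4)
print(kwargs["model_arguments"])  # [['threshold', '0.5']]
# With ArcGIS installed: arcpy.geoai.ExtractEntitiesUsingDeepLearning(**kwargs)
```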
Name | Explanation | Data Type |
in_folder | The folder containing the text files on which named entity extraction will be performed. | Folder |
out_table | The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses. | Feature Class; Table |
in_model_definition_file | The trained model that will be used for entity extraction. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk), and it must be stored locally. | File |
model_arguments [model_arguments,...] (Optional) | Additional arguments, such as a confidence threshold, that will be used to adjust the sensitivity of the model. The names of the arguments will be populated by the tool. | Value Table |
batch_size (Optional) | The number of input samples that will be processed at one time. The default value is 4. Increasing the batch size can improve tool performance; however, larger batches use more memory. If an out-of-memory error occurs, use a smaller batch size. | Double |
location_zone (Optional) | The geographic region or zone in which the addresses are expected to be located. The specified text will be appended to each address extracted by the model. The locator uses this information to narrow down the region or geographic area in which to search for the address, which can improve geocoding results. | String |
in_locator (Optional) | The locator that will be used to geocode addresses found in the input text documents. A point is generated for each successfully geocoded address and stored in the output feature class. | Address Locator |
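The location_zone behavior (text appended to each extracted address before geocoding) can be pictured with a one-line string operation. This is an illustrative sketch only: `with_location_zone` is hypothetical, and the ", " separator is an assumption, since the exact format the tool uses when appending is not documented here.

```python
def with_location_zone(address: str, location_zone: str = "", separator: str = ", ") -> str:
    """Append location zone text to an extracted address (hypothetical helper)."""
    address = address.strip()
    zone = location_zone.strip()
    # With no zone, the address is passed to the locator unchanged.
    if not zone:
        return address
    # Assumption: a comma and space separate the address from the zone text.
    return f"{address}{separator}{zone}"


print(with_location_zone("380 New York St", "Redlands, CA"))  # 380 New York St, Redlands, CA
```

A street address alone can match many places; the appended zone gives the locator the regional context it needs to pick the right one.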
Code sample
The following Python window script demonstrates how to use the ExtractEntitiesUsingDeepLearning function.
# Name: ExtractEntities.py
# Description: Extract useful entities like "Address", "Date" from text.
#
# Requirements: ArcGIS Pro Advanced license
# Import system modules
import arcpy
import os
arcpy.env.workspace = "C:/textanalysisexamples/data"
dbpath = "C:/textanalysisexamples/Text_analysis_tools.gdb"
# Set local variables
in_folder = 'test_data'
out_table = os.path.join(dbpath, "ExtractedEntities")
pretrained_model_path_emd = "c:\\extractentities\\EntityRecognizer.emd"
# Run Extract Entities Using Deep Learning
arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, pretrained_model_path_emd)
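When the sample returns no entities, a common cause is an empty or mistyped input folder. The sketch below lists candidate files before the tool runs. It assumes the input documents use a .txt extension and that a relative in_folder is resolved against the workspace, as the sample implies; `list_text_files` is a hypothetical helper.

```python
import os
from typing import List


def list_text_files(workspace: str, in_folder: str, suffix: str = ".txt") -> List[str]:
    """Return sorted paths of files ending in suffix inside workspace/in_folder (hypothetical helper)."""
    # Assumption: a relative folder is interpreted relative to the workspace.
    folder = in_folder if os.path.isabs(in_folder) else os.path.join(workspace, in_folder)
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"Input folder not found: {folder}")
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(suffix.lower()) and os.path.isfile(os.path.join(folder, name))
    )


# Example with the values from the sample:
# files = list_text_files("C:/textanalysisexamples/data", "test_data")
```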