As part of the ArcGIS LocateXT extension, the ArcGIS AllSource Extract Locations pane allows you to scan documents and text for spatial coordinates and custom locations. Open the map to which you want to add the locations that are found. Points representing the locations are stored in a feature class and are added as a layer to the active map.
Open the Extract Locations pane
A map must be active in ArcGIS AllSource to open the Extract Locations pane.
- Create or open a map. For example, on the Map tab, in the Insert group, click New Map.
- On the Data tab, in the Import group, click Extract Locations.
The Extract Locations pane appears.
Extract locations
In the Extract Locations pane, you can specify the following on the Extract tab:
- The files, folders, or text that will be scanned for locations
- The name of the map layer and output feature class that will be created or updated
- The coordinate system of the output feature class, when one is created
Each time you extract locations from documents or text, you can choose whether to create a feature class and add a new layer to the active map, update an existing map layer and its feature class, or overwrite an existing feature class.
Add a new layer to the map
A feature class is created to store the extracted locations. A map layer is created in the active map to display the contents of the feature class.
- Open the Extract Locations pane.
- Provide a name for the new map layer and feature class that will be created by doing one of the following:
  - Type a name for the new map layer and feature class in the Name text box. A new feature class is created with this name in the project's default geodatabase.
  - Click the Browse button, and in the New Feature Class dialog box, browse to the location where you want to create a feature class or shapefile. Type a name for the new item in the Name text box and click Save.
Caution:
If you select an existing feature class instead of providing a name for a new feature class, a warning appears in the Extract Locations pane. The existing feature class is deleted, and a new feature class with the same name is created. Other maps may be affected.
- Click the Coordinate System drop-down list or the Select coordinate system button and click the coordinate system you want to use for the output feature class.
The coordinate system of the input features is specified independently on the Coordinates tab and in the custom locations file. The locations that are found are transformed to the output feature class's coordinate system.
- Click the Files and Folders tab and specify the items to scan for locations.
  - Drag files and folders from Windows Explorer onto the tab.
  - Click Browse, and on the Add Files and Folders dialog box, browse to and select the appropriate files or folders and click OK. Click Add More to add files and folders to the list.
- Click the Text tab and specify the text to scan for locations.
  - Copy text from a document, email, or web page, and paste it on the tab.
  - Select the text to scan in a document, email, or web page and drag it to ArcGIS AllSource and onto the tab.
- Click Extract.
You can cancel the process at any time. When the process is complete, a message appears at the bottom of the pane indicating whether it was successful.
The specified feature class is created and locations that are found are stored in the feature class as points. A map layer referencing the feature class is added to the active map. If no locations are found in the documents and text, the feature class and map layer will be empty.
Note:
If you chose to overwrite an existing feature class that was previously added to the map, a new map layer that references the new feature class is created and added to the map.
To extract locations from a different set of documents or text captured from a different location, click Clear All Input at the bottom of the Extract tab. All files are removed from the list on the Files and Folders tab, and all text is removed from the Text tab. Specify a new set of items to process.
Update an existing layer in the map
You can progressively add locations to an existing feature class. For example, every week you can process a new set of reports and add locations from those files to the existing set. Or, after processing a sample set of documents, when you are satisfied with the results, you can process additional documents and add those locations to the existing feature class.
- Open the Extract Locations pane.
- Click the Name drop-down list and click the existing map layer to update.
Locations extracted from the documents and text will be added to the existing feature class referenced by the map layer. The controls used to specify the coordinate system of the output feature class will be disabled.
- Click the Files and Folders tab and specify the items to scan for locations.
- Click the Text tab and specify the text to scan for locations.
- Click Extract.
The Field Matching panel appears in the Extract Locations pane.
- Specify the field in the existing layer's attribute table to store the information extracted from the documents and text.
The full set of fields that can be populated in the output feature class is described below.
- If no fields in the existing feature class can store the extracted information, click Back and select a different output layer or create a layer instead.
- When you are satisfied with the match between the existing layer's fields and the fields of information that are extracted from the documents and text, click OK.
You can cancel the process at any time. When the process is complete, a message appears at the bottom of the pane indicating whether it was successful.
If locations are found when the documents and text are scanned, those locations are added to the specified feature class. The existing map layer and its attribute table are updated to show the new locations.
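The idea behind field matching can be illustrated with a short Python sketch. This is not how the Field Matching panel works internally. It only shows one plausible way to pair extracted fields with an existing layer's fields, assuming a case-insensitive comparison of field names. The function name `propose_field_matches` and the sample field list are hypothetical.

```python
def propose_field_matches(extracted_fields, existing_fields):
    """Pair each extracted field with an existing field of the same name.

    Assumption: names are compared case-insensitively. An extracted field
    with no counterpart maps to None, meaning its values have nowhere to go.
    """
    lookup = {name.lower(): name for name in existing_fields}
    return {field: lookup.get(field.lower()) for field in extracted_fields}


# A few of the default output fields listed in this topic.
extracted = ["Name", "Pre_Text", "Ext_Text", "Post_Text", "Std_Coord"]

# Hypothetical fields in an existing layer's attribute table.
existing = ["OBJECTID", "Shape", "name", "EXT_TEXT", "Report_Week"]

matches = propose_field_matches(extracted, existing)
unmatched = [field for field, target in matches.items() if target is None]
```

In this sketch, `unmatched` lists the extracted fields that have no home in the existing layer. If that list covers everything you care about, choosing a different output layer or creating a layer is the better option.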
Review the extracted locations
After documents and text have been scanned and the output feature class has been created, the output map layer is added to the map and selected in the Contents pane. Click a location that was found to learn more about it. The pop-up window shows the location that was extracted, the document it was extracted from, and information extracted from the document around the location that provides context. Open the layer's attribute table to compare the full range of locations that were found. As you assess the data, you can delete locations beyond your current scope or export a subset of locations that represent your primary interest.
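To make the relationship between an extracted location and its surrounding context concrete, here is a minimal Python sketch. It is not the LocateXT implementation. It assumes a simplified decimal degrees pattern, and the names `DD_PATTERN`, `find_dd_locations`, and `context_chars` are hypothetical. For each match it returns the text as found, plus short excerpts before and after it.

```python
import re

# Simplified, hypothetical pattern for decimal degrees such as
# "52.825°N, 169.944°W". Real extraction supports many more formats.
DD_PATTERN = re.compile(
    r"(\d{1,2}(?:\.\d+)?)°?\s*([NS])[,\s]+(\d{1,3}(?:\.\d+)?)°?\s*([EW])"
)


def find_dd_locations(text, context_chars=20):
    """Return one dict per match: the matched text, context, and signed values."""
    results = []
    for match in DD_PATTERN.finditer(text):
        # South latitudes and west longitudes are stored as negative numbers.
        lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
        lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
        results.append({
            "pre_text": text[max(0, match.start() - context_chars):match.start()],
            "ext_text": match.group(0),
            "post_text": text[match.end():match.end() + context_chars],
            "lat": lat,
            "lon": lon,
        })
    return results


sample = "The vessel was sighted at 52.825°N, 169.944°W on August 18, 2019."
locations = find_dd_locations(sample)
```

The `pre_text`, `ext_text`, and `post_text` keys are chosen to mirror the kind of context shown in the pop-up window. The excerpt length used here is an arbitrary value for illustration.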
The Extract Locations pane uses various default settings to recognize the most common locations. When you have a better understanding of the locations present in the data, you can adjust those settings on the Properties tab to extract additional locations or more focused information in the output fields.
Learn about the settings used to extract locations and attributes
Output field definitions
When a new output feature class is created to store the extracted locations, the feature class will have the following default fields and any additional fields defined by a custom attributes file:
Learn about custom attributes files
Field name | Field alias | Data type | Description |
---|---|---|---|
Name | Name | Text—50 characters, by default | The name of the file that was processed, or Text to indicate text was processed. The size is controlled by settings on the Output tab. |
Pre_Text | Pre-Text | Text—254 characters, by default | An excerpt of the file or text preceding the location that was found. The size is controlled by settings on the Output tab. |
Ext_Text | Extracted Text | Text—120 characters, by default | The location that was found, as it was found in the file or text, for example, 52.825°N, 169.944°W for a spatial coordinate, or LAX for a custom location that associates an airport code with a spatial coordinate. The size is controlled by settings on the Output tab. |
Ext_Type | Extracted Type | Text—50 characters, by default | The type of location that was found, for example, a decimal degrees (DD) coordinate. When a custom location is found, the location defined in the custom location file that was matched is recorded. The size is controlled by settings on the Output tab. |
Post_Text | Post-Text | Text—254 characters, by default | An excerpt of the file or text following the location that was found. The size is controlled by settings on the Output tab. |
Precision | Precision (m) | Long | For spatial coordinates, the level of precision on the ground to which the location is accurate, in meters. For example, a decimal degrees coordinate with many decimal places is more precise and has a smaller value. For custom locations, the number of letters that did not match when comparing the original text to the matched location. When fuzzy match is disabled, an exact match is required and the value is 0. When fuzzy match is enabled and the misspelled location Redalnds is matched to Redlands, the value is 2. |
Std_Coord | Stand. Coord. | Text—30 characters | A standardized version of the extracted location, for example, 52.825000N 169.944000W. The format of this coordinate is controlled by settings on the Output tab. |
First_Date | First Date | Date | The first date found in the file or text, if dates are extracted. Otherwise, the field contains null values. Dates are only extracted if they fall within the range specified on the Output tab, the date is not set to be skipped, and the limit on the number of dates extracted has not been reached. |
Early_Date | Earliest Date | Date | The oldest date that was found in the file or text, if dates are extracted. Otherwise, the field contains null values. Dates are only extracted if they fall within the range specified on the Output tab, the date is not set to be skipped, and the limit on the number of dates extracted has not been reached. |
Late_Date | Latest Date | Date | The most recent date found in the file or text, if dates are extracted. Otherwise, the field contains null values. Dates are only extracted if they fall within the range specified on the Output tab, the date is not set to be skipped, and the limit on the number of dates extracted has not been reached. |
All_Dates | All Dates | Text—254 characters, by default | A comma-delimited list of all dates found in the text, if dates are extracted. Otherwise, the field contains null values. All dates are standardized in yyyy-mm-dd format. Dates are only extracted if they fall within the range specified on the Output tab, the date is not set to be skipped, and the limit on the number of dates extracted has not been reached. If the comma-delimited list of dates is too large for the size of this field, the list will be truncated. The size is controlled by settings on the Output tab. |
ExDateText | Extracted Date Text | Text—254 characters, by default | The dates that were found, as they were found in the file or text, for example, August 18, 2019 or 2/3/2020. If the comma-delimited list of dates is too large for the size of this field, the list will be truncated. The size is controlled by settings on the Output tab. |
Filename | Filename | Text—254 characters, by default | The full path to the file that was processed, or a null value if text was processed. You can choose the files to process or skip. The size is controlled by settings on the Output tab. |
File_Type | File Type | Text—10 characters, by default | The format of the file that was processed, or a null value if text was processed. You can choose to process specific file types. The size is controlled by settings on the Output tab. |
Modified | Modified (UTC) | Text—20 characters | The date and time when the file was last modified in yyyy-mm-dd hh:mm:ss format. |
Scanned | Scanned (UTC) | Text—20 characters | The date and time when the file was processed in yyyy-mm-dd hh:mm:ss format. |
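A few of the field values in the table can be reproduced with a short Python sketch. It is an illustration only, under stated assumptions: the standardized coordinate uses six decimal places as in the example above, mismatched letters are counted position by position (the actual fuzzy match measure may differ), and an overlong date list is truncated by a simple character cut. The function names are hypothetical.

```python
from datetime import date


def standardize_dd(lat, lon, decimals=6):
    """Format signed decimal degrees like the Std_Coord example."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.{decimals}f}{ns} {abs(lon):.{decimals}f}{ew}"


def count_mismatched_letters(found, matched):
    """Count positions where the letters differ, plus any length difference.

    Assumption: a position-by-position comparison, ignoring case.
    """
    differing = sum(a != b for a, b in zip(found.lower(), matched.lower()))
    return differing + abs(len(found) - len(matched))


def summarize_dates(dates, field_width=254):
    """Derive the date fields from dates listed in the order they were found."""
    if not dates:
        return None  # the date fields contain null values when no dates are extracted
    all_dates = ",".join(d.isoformat() for d in dates)  # yyyy-mm-dd
    return {
        "First_Date": dates[0],
        "Early_Date": min(dates),
        "Late_Date": max(dates),
        "All_Dates": all_dates[:field_width],  # truncated if too long for the field
    }


std_coord = standardize_dd(52.825, -169.944)   # "52.825000N 169.944000W"
mismatches = count_mismatched_letters("Redalnds", "Redlands")   # 2
summary = summarize_dates([date(2020, 2, 3), date(2019, 8, 18)])
```

In the last call, the first date found (2020-02-03) is also the latest, while the second date found (2019-08-18) is the earliest. This shows why First Date and Earliest Date are stored in separate fields.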
Evaluate results
The first time you scan a document, you may not get the locations you expect. Two log files can be created in addition to the output map layer and feature class: a scan log and an invalid coordinates log. If you provided a document as input and you know its content, and the number of locations created in the output feature class does not match the number you expect, the log files can help you assess the results.
After documents and text have been scanned and the output feature class has been created, a message appears at the bottom of the Extract Locations pane indicating the process has completed successfully. The message includes links to the log files, which are temporary. To keep them for future review, open the files and save them to a permanent location such as the project's home folder. Consider giving each saved file a descriptive name; for example, add the name of the map layer or feature class with which the log file is associated.
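If you save log files regularly, the copy can be scripted. The following Python sketch assumes you already know the path of the temporary log file, for example from the title bar of the application that opened it. The function name `keep_log` and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def keep_log(log_path, destination_folder, layer_name):
    """Copy a temporary log file to a permanent folder.

    The layer or feature class name is added as a prefix so the saved
    log can be associated with its output later.
    """
    source = Path(log_path)
    destination = Path(destination_folder)
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / f"{layer_name}_{source.name}"
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target


# Example usage with hypothetical paths:
# keep_log(r"C:\Temp\scan_log.txt", r"C:\Projects\Reports\Logs", "Reports_Week12")
```

Copying rather than moving leaves the temporary file in place, so the link in the pane keeps working for the rest of the session.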
Scan log
Click the View scan log link in the message at the bottom of the Extract Locations pane to open the scan log file. For each document that is scanned, the log indicates the following information:
- The document's file name and its location on the local or network computer
- A message indicating a problem was encountered when scanning the document, if appropriate
- How many potential locations were found
- How many unique dates were found
A potential location is text found in the document's content that resembles a spatial coordinate or a custom location. When text is provided as input, a file name and location are not provided in the scan log, but the rest of the information in the log file is the same.
If you expected nine locations to be extracted but only six locations were created as output, for example, the scan log can explain what happened. The log may indicate only six possible locations were found based on the current settings in the Extract Locations pane. The log may also indicate more dates were found than expected—a coordinate may have been interpreted as a date. Adjust the settings before attempting to extract locations from the document again.
Invalid coordinates log
An invalid coordinates log is created if a potential location was evaluated and found to be invalid. Click View bad coordinates log to open it.
The invalid coordinates log indicates the following:
- The document in which the potential location was found
- The original text that was determined to be a potential location
- The coordinate format that was used to evaluate the location
For example, if a latitude and longitude coordinate was found but the latitude of the coordinate is greater than 90 degrees, the coordinate is considered invalid. You may find the potential locations in the document were evaluated using a different coordinate format than you expected. Adjust the settings before attempting to extract locations from the document again.
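The range check in the example above can be sketched in Python. This is a simplified illustration, not the validation performed by the extension. It assumes latitude and longitude have already been parsed as signed decimal degrees. The function name `check_candidate` and the record keys are hypothetical, chosen to mirror the three items the log reports.

```python
def check_candidate(document, original_text, lat, lon, coord_format):
    """Return None for a valid coordinate, or a log-style record if invalid."""
    lat_ok = -90.0 <= lat <= 90.0
    lon_ok = -180.0 <= lon <= 180.0
    if lat_ok and lon_ok:
        return None
    return {
        "document": document,            # where the potential location was found
        "original_text": original_text,  # the text that looked like a location
        "format": coord_format,          # the format used to evaluate it
    }


valid = check_candidate("report.docx", "52.825°N, 169.944°W", 52.825, -169.944, "DD")
invalid = check_candidate("report.docx", "95.100°N, 10.000°E", 95.1, 10.0, "DD")
```

Here `valid` is `None` because both values are in range, while `invalid` is a record because the latitude exceeds 90 degrees.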
If you do not find the invalid coordinates log helpful, you can uncheck the Log invalid coordinates check box on the Coordinates tab so invalid coordinates are not recorded for the spatial coordinate formats you are using.